The TM method is a promising tool for extracting bearing fault features under noisy working conditions. It overcomes the difficulty of parameter setting in the VMD method by uniting modes that contain similar fault content but are obtained with different parameters. Compared with traditional VMD methods, the TM method offers better computational efficiency and feature extraction performance. This paper aims to further improve the TM method by developing the LTSA algorithm. The improved TM method addresses two issues: the construction of a personalized local data distribution and the formation of a symmetric TM feature, which are described in detail below.
3.1 Construction of Personalized Local Data Distribution
In the original TM method, the local data distributions of the high-dimensional multi-bandwidth mode data established in Eq. (3) are all constructed with the same number of neighboring data points via the traditional k-nearest neighbor algorithm in the LTSA. The multi-bandwidth modes consist of fault-induced transient components and fault-unrelated components. The fault-induced transient components are distributed in sparse areas because of their impulsive characteristics, while the fault-unrelated components are distributed in dense areas because they are regarded as noise. If k is larger than the number of data points of one fault impulse in the sparse areas, some noise data points in the dense areas are selected as neighbors of the impulse data points. This corrupts the inherent regularity of the fault impulse, so that the fault-induced transients in the learned TM feature are no longer dominant or are even submerged by the noise. If k is smaller than the number of data points of one fault impulse in the sparse areas, the noise data points in the dense areas cannot obtain enough neighbors to show their difference from the impulse data points, so that some noise components are treated as manifold structure and kept in the TM feature. Because the local linearity and data density differ between the impulse areas and the noise areas, the local data distributions of different data points should be represented by different numbers of neighboring data points. Moreover, determining the neighborhood size in the original TM method is time-consuming because the LTSA algorithm must be run repeatedly over a series of neighborhood sizes.
This paper proposes to construct a personalized local data distribution for each data point by introducing the natural nearest neighbor (TN) algorithm into the LTSA. The TN algorithm is a scale-free nearest neighbor method: it does not require presetting a specific scale that determines the performance of manifold learning, such as the neighborhood size k in the traditional k-nearest neighbor algorithm. If the traditional k-nearest neighbor algorithm is regarded as an active neighbor search process, the TN algorithm is a completely passive neighbor confirmation process. The idea of the TN algorithm is to assign the number of neighbors of each data point according to the density of its data area, which yields a personalized local data distribution. The specific steps of the TN algorithm are as follows:
1) Calculate the distances between each data point zi and other data points.
2) Given the initial value of k, record the nearest k data points for each point zi.
3) For each data point zi, the data points that take zi as one of the nearest k points are considered as the natural nearest neighbors of zi.
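The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this paper: the function name `natural_neighbors` is hypothetical, Euclidean distance is assumed, and the data are assumed to be stored as rows of a NumPy array.

```python
import numpy as np

def natural_neighbors(Z, k):
    """Return, for each point z_i, the indices of the points that list z_i
    among their own k nearest points (hypothetical helper, Euclidean distance)."""
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]

    # Step 1: pairwise squared Euclidean distances between all data points.
    sq = np.sum(Z ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbor

    # Step 2: record the k nearest points of every point.
    knn = np.argsort(dist2, axis=1)[:, :k]

    # Step 3: z_j becomes a neighbor of z_i if z_i is among the k nearest of z_j.
    neighbors = [[] for _ in range(n)]
    for j in range(n):
        for i in knn[j]:
            neighbors[i].append(j)
    return [np.array(nb, dtype=int) for nb in neighbors]

# Toy example: the isolated point at 10.0 is chosen by nobody.
Z = np.array([[0.0], [1.0], [2.5], [10.0]])
for idx, nb in enumerate(natural_neighbors(Z, k=1)):
    print(idx, nb.tolist())
```

In this toy example, the point in the densest region collects the most neighbors and the isolated point collects none, which is the density-dependent behavior the steps are meant to produce.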
According to the TN algorithm, the impulse data points will have fewer neighbors than the noise data points. In addition, because the amplitudes of the fault-induced transients are usually larger than those of the noise, the impulse data points are unlikely to be treated as neighbors of the noise data points, and vice versa. Thus, the difference in local data distribution between the fault-related transient components and the fault-unrelated components is strengthened. The TN algorithm therefore helps to select a proper neighborhood size for each data point and to construct a personalized local data distribution, which benefits the TM feature learning by the LTSA technique. The LTSA based on the TN is called TN-LTSA in this paper. Some data points may still be wrongly assigned as neighbors of improper areas by the TN algorithm. In this case, the inconsistency between the constructed local data distribution and the others in the same area increases, but it is alleviated in the dimensionality reduction process of the LTSA method. By introducing the TN algorithm into the LTSA, the performance of the TM feature is expected to improve.
Because the construction of the local data distribution in the TN algorithm depends only on the local data density, the parameter k has little effect on the feature learning performance. Therefore, the TN-LTSA needs to be performed only once for the TM feature learning, which significantly improves the efficiency of the TM method. The value of k is generally chosen between 10 and 50. Without loss of generality, k is set to 30 in this paper.
3.2 Formation of Symmetric TM Feature
In the original TM method, the manifold output is a one-dimensional vector and is regarded as the TM feature representing the fault transient components. However, the TM feature is likely to be asymmetric in the vertical direction, which does not reflect the real waveform pattern of the fault transient components. There are many reasons for this problem, including but not limited to unreasonable local information extraction, improper local space construction, and theoretical limitations of the LTSA in processing one-dimensional signals.
To alleviate the asymmetry of the TM feature, a weight-based feature compensation strategy is proposed in this paper to form a synthetic TM feature. The dimension d of the manifold output is set to two in the improved TM method. Then, the TN-LTSA output is written as
$$ WM = [w^{1} \; w^{2}]^{\mathrm{T}}. \tag{7} $$
The two vectors w1 and w2 are two eigenvectors of the alignment matrix constructed in the TN-LTSA, with corresponding eigenvalues λ1 and λ2, where λ1 < λ2. The smaller the eigenvalue, the lower the affine error of the manifold feature in the corresponding dimension. Therefore, w1 is more similar to the fault-related transients than w2. It is observed that, when w1 is asymmetric in the vertical direction, w2 has a waveform pattern complementary to that of w1. This motivates combining the second vector with the first to compensate for the asymmetry of the first vector. Considering the different amplitude properties of the two vectors, each eigenvalue is used as the weight coefficient of the opposite eigenvector. The synthetic TM feature is formed by the weight-based feature compensation strategy as
$$ w = \frac{\lambda_{1} w^{2} \pm \lambda_{2} w^{1}}{\left\| \lambda_{1} w^{2} \pm \lambda_{2} w^{1} \right\|_{2}}, \tag{8} $$
where the plus or minus sign is determined according to the respective waveforms, and ||·||2 denotes the L2 norm. The symmetry of the synthetic TM feature is much enhanced, so it is more suitable for representing the fault transient components than the feature obtained by the original TM method.
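As a small numerical illustration of Eq. (8), the weighted combination and normalization can be sketched as follows. The function name `synthetic_tm_feature` and the `sign` argument are hypothetical; the rule for choosing the sign from the waveforms is not reproduced here and is left to the caller.

```python
import numpy as np

def synthetic_tm_feature(w1, w2, lam1, lam2, sign=1.0):
    """Combine the two manifold outputs as in Eq. (8).

    Each eigenvalue weights the opposite eigenvector; `sign` (+1 or -1)
    stands for the plus or minus sign and must be chosen by the caller.
    """
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    combined = lam1 * w2 + sign * lam2 * w1
    norm = np.linalg.norm(combined)  # L2 norm of the combined vector
    if norm == 0.0:
        raise ValueError("combined vector is zero; cannot normalize")
    return combined / norm

# Toy vectors, not real manifold outputs.
w = synthetic_tm_feature([1.0, 0.0], [0.0, 1.0], lam1=3.0, lam2=4.0, sign=1.0)
print(w)
```

Because λ1 < λ2, the first vector w1 receives the larger weight, so the result stays dominated by the component with the lower affine error while w2 acts as a correction.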
3.3 Summary of the Improved TM Method
The flowchart of the improved TM method is illustrated in Figure 2, and the specific procedures are briefly described as follows.
Step 1: Perform the RVMD on the original signal repeatedly with 10 values of the bandwidth balance parameter α, which are selected equidistantly from the range [100, 5000].
Step 2: Construct the multi-bandwidth modes by selecting the fault-related modes from the decomposed modes based on the Gini index.
Step 3: Apply the proposed TN-LTSA to the matrix of multi-bandwidth modes with k = 30 and d = 2. The manifold output consists of two-dimensional data with complementary waveforms.
Step 4: Form the symmetric TM feature by the weight-based feature compensation strategy via Eq. (8). The possible bearing fault characteristic period is expected to be easily identified in the obtained symmetric TM feature.
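Two small ingredients of Steps 1 and 2 can be illustrated in Python. The sketch below rests on assumptions that are not stated in this section: the 10 values of α are taken to include both endpoints of the range, and the Gini index is computed in a commonly used sparsity form on the absolute values of a mode. The quantity to which the index is applied and the rule for selecting modes from it are not reproduced here, and the name `gini_index` is hypothetical.

```python
import numpy as np

# Step 1 (assumed reading): 10 equally spaced values of alpha on [100, 5000],
# with both endpoints included.
alphas = np.linspace(100.0, 5000.0, 10)

def gini_index(x):
    """Gini index of |x| in a commonly used sparsity form (assumed definition).

    Returns 0 for a constant vector and approaches 1 for a single spike.
    """
    a = np.sort(np.abs(np.asarray(x, dtype=float)).ravel())  # ascending order
    n = a.size
    total = a.sum()
    if total == 0.0:
        return 0.0
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum((a / total) * (n - k + 0.5) / n)

print(alphas)
print(gini_index([1.0, 1.0, 1.0, 1.0]))  # flat signal
print(gini_index([0.0, 0.0, 0.0, 1.0]))  # single impulse
```

A more impulsive mode yields a larger index than a flat or noise-like one, which is why such an index can serve to rank decomposed modes by their fault relevance.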