Fault Diagnosis Based on BP Neural Network Optimized by Beetle Algorithm

In Wavelet Analysis, only the low-frequency components are decomposed further, while the high-frequency components are left as they are, so the frequency resolution decreases as the frequency increases. Therefore, in this paper, Wavelet Packet Decomposition is first used to extract features from vibration signals, which makes up for the shortcomings of Wavelet Analysis in extracting fault features of nonlinear vibration signals, and the energy values in different frequency bands are obtained. The features are visualized by the K-Means clustering method, and the results show that the extracted energy features can accurately distinguish the different states of the bearing. Then a fault diagnosis model based on a BP Neural Network optimized by the Beetle Algorithm is proposed to identify bearing faults. Compared with the Particle Swarm Algorithm, the Beetle Algorithm can quickly find the error extreme value, which greatly reduces the training time of the model. Finally, two experiments are conducted, which show that the accuracy of the model can exceed 95% and that the model has a certain anti-interference ability.


Introduction
According to statistics [1], 70% of rotating machinery failures are related to bearing failures. Once a bearing fails, it can cause a chain of further failures, which seriously affects the operational safety of the entire equipment [2]. At present, the diagnosis of bearing faults is mostly based on vibration signals. Data-driven methods generally extract features from vibration signals of limited length, such as mean square error, kurtosis, and nonlinear dynamic parameters, for fault diagnosis [3]. To obtain richer features, the signal samples are generally decomposed first, and the corresponding parameters of each sub-signal are then calculated as the fault characteristics. Fourier Analysis, Wavelet Transform, Empirical Mode Decomposition, and Local Mean Decomposition [4,5] have all been very effective linear system analysis methods in the past. In recent years, many new signal processing methods have emerged, such as Wavelet Packet Analysis, which provide broader room for development in bearing fault diagnosis research [6,7].
The classification of faults is essentially a process of identifying the size and type of a fault. Scholars at home and abroad have proposed many supervised models for fault classification and recognition. Yan et al. [8] extracted time-domain and frequency-domain features of the signal and used a particle swarm optimization support vector machine (PSO-SVM) classification model to recognize multiple fault states of rolling bearings. Kankar et al. [9] used the continuous wavelet transform to extract statistical features and compared support vector machines with artificial neural networks and self-organizing maps. The results showed that the features selected by the Meyer wavelet achieve higher fault classification efficiency with the SVM classifier. Zhou et al. [10] extracted bearing fault features based on the NCA method and merged multi-channel data through the CHMM method, which effectively removed redundant information and improved the fault diagnosis effect. Zaragoza et al. [11] combined a hidden Markov model and a deep perceptron to model variable-length symbol sequences correctly through non-segmented training. The results show that the proposed hybrid MLP-HMM method is, to a large extent, better than previous work. Chen [12] combined the classification ability of the SVM with the ability of the HMM to distinguish dynamic time series, built bearing fault diagnosis models of SVM and HMM with the help of the Sigmoid function and a Gaussian model, and established a feature vector of AR parameters for diagnosis, which effectively improved the accuracy of bearing fault diagnosis. Although these supervised models can achieve fault classification, they are not suitable for situations where the number of samples is large.
In response to this defect, some scholars have proposed unsupervised learning methods such as deep learning and deep neural networks [13]. A deep neural network is built on the basis of a shallow neural network by changing the number and structure of the intermediate layers. Shallow neural networks mainly include BP neural networks and LVQ neural networks, both of which are widely applied. Pu [14] used a three-layer BP Neural Network for bearing fault diagnosis. Wu [15] used signal processing techniques to extract the crest factor, waveform factor, pulse factor, margin factor, and kurtosis of rolling bearing vibration time-domain signals as features to diagnose bearing failure. Jiang [16] introduced signal multi-resolution technology into the analysis of bearing fault vibration signals and established a fault diagnosis model for rolling bearings using a Learning Vector Quantization (LVQ) Neural Network. Zhao [17] proposed a new BP Neural Network model based on an improved frog jumping algorithm and applied it to identify bearing failures; the model has good generalization ability and strong robustness. The BP Neural Network has been widely used in many fields, but it still has some shortcomings in application [18]. In this paper, a fault diagnosis model based on a BP Neural Network optimized by the Beetle Algorithm is proposed to identify bearing faults, which greatly improves the performance of the BP network and achieves an accuracy of more than 95%.

BP Neural Network
The BP Neural Network [19] is usually described as a multi-layer feedforward network composed of three parts: an input layer, a hidden layer, and an output layer. Figure 1 shows the structure of the BP Neural Network. The BP Neural Network [20] is trained through error backpropagation, and it can learn and store a large number of mapping relationships without explicit knowledge of the mathematical equations that describe them. After the training data enters the input layer of the network, the weights and activation functions of each layer are used to compute the actual output value. The actual output is then compared with the expected output to calculate the error [21]. If the actual error exceeds the expected error, the error is propagated backward through the network to adjust the weights, and training continues with new training data. The calculation stops either when the accuracy requirement is met or when the maximum number of iterations is reached. The algorithm process of the BP Neural Network is shown in Figure 1.
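To make the training loop described above more concrete, the following is a minimal Python sketch of a three-layer network trained by error backpropagation. It is an illustration only: the sigmoid activation, the hidden layer size, the learning rate, and the toy data set `OR_DATA` are assumptions chosen for brevity and are not the configuration used in the experiments of this paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return hidden activations and outputs for one input vector."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, y


def mse(data, W1, b1, W2, b2):
    """Mean squared error over all (input, target) pairs."""
    total = 0.0
    for x, t in data:
        _, y = forward(x, W1, b1, W2, b2)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / len(data)


def train(data, n_hidden=3, lr=0.5, max_epochs=2000, target_error=1e-3, seed=0):
    """Train by backpropagation; stop on target error or max_epochs."""
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    initial_error = mse(data, W1, b1, W2, b2)
    for _ in range(max_epochs):
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            # Output-layer error terms (constant factor absorbed into lr).
            d_out = [(yi - ti) * yi * (1 - yi) for yi, ti in zip(y, t)]
            # Propagate the error back to the hidden layer before updating W2.
            d_hid = [hj * (1 - hj) * sum(d_out[k] * W2[k][j] for k in range(n_out))
                     for j, hj in enumerate(h)]
            for k in range(n_out):
                for j in range(n_hidden):
                    W2[k][j] -= lr * d_out[k] * h[j]
                b2[k] -= lr * d_out[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid[j] * x[i]
                b1[j] -= lr * d_hid[j]
        if mse(data, W1, b1, W2, b2) < target_error:
            break  # accuracy requirement met
    return (W1, b1, W2, b2), initial_error, mse(data, W1, b1, W2, b2)


# Hypothetical toy data (logical OR), used only to exercise the loop.
OR_DATA = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
           ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
params, error_before, error_after = train(OR_DATA)
print(error_before, error_after)
```

The two stopping conditions in `train` correspond to the accuracy requirement and the maximum number of iterations mentioned in the text.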

Beetle Algorithm
The Beetle Algorithm was first proposed in 2017. Compared with the particle swarm optimization algorithm, the Beetle Algorithm requires only one individual [22], so its computational efficiency is much higher than that of the Particle Swarm Algorithm. Even without an explicit expression of the function or gradient information, the Beetle Algorithm can still carry out the required optimization. The Beetle Algorithm can be described as follows.
(1) The body of the beetle is simplified to a centroid, with the two antennae on either side of it. (2) The ratio of the step size of the beetle to the distance d0 between the two antennae is a fixed constant. Consequently, a large beetle, whose antennae are far apart, takes large steps, while a small beetle, whose antennae are close together, takes small steps. (3) After the beetle moves to its next position, the orientation of its head is random.
The specific modeling steps of the Beetle Algorithm are as follows: (1) For an optimization problem in n-dimensional space, let $x_l$ denote the coordinates of the left antenna, $x_r$ the coordinates of the right antenna, and $x$ the coordinates of the centroid. The distance between the two antennae is $d_0$. Since the orientation of the beetle's head is random, the direction of the vector from the right antenna to the left antenna is also random, so it can be represented by the random vector dir = rands(n, 1). Normalizing this vector, dir = dir/norm(dir), gives $x_l - x_r = d_0 \cdot \mathrm{dir}$; $x_l$ and $x_r$ can then be expressed through the position of the centroid:
$$x_l = x + \frac{d_0 \cdot \mathrm{dir}}{2}, \qquad x_r = x - \frac{d_0 \cdot \mathrm{dir}}{2}$$
(2) For the function $f$ to be optimized, the function values at the two antennae, $f_{left} = f(x_l)$ and $f_{right} = f(x_r)$, are calculated and compared.
When $f_{left} < f_{right}$, the beetle moves one step of length $\mathrm{step}$ in the direction of the left antenna:
$$x = x + \mathrm{step} \cdot \mathrm{dir}$$
When $f_{left} > f_{right}$, the beetle moves one step in the direction of the right antenna:
$$x = x - \mathrm{step} \cdot \mathrm{dir}$$
Both cases search for the minimum value of $f$ and can be written uniformly using the sign function:
$$x = x - \mathrm{step} \cdot \mathrm{dir} \cdot \mathrm{sign}(f_{left} - f_{right})$$
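The search procedure described above (a random head orientation, a comparison of the function values at the two antennae, and a step toward the antenna with the smaller value) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the step decay factor `decay`, the ratio `c` between step size and antenna distance, the iteration count, and the test function `sphere` are all assumptions introduced here.

```python
import math
import random


def beetle_search(f, x0, step=1.0, c=5.0, decay=0.95, iterations=200, seed=0):
    """Minimize f with a single beetle; returns the best point and its value."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for _ in range(iterations):
        d0 = step / c  # antenna distance kept at a fixed ratio to the step size
        # Random head orientation, normalized to a unit vector.
        direction = [rng.uniform(-1.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in direction)) or 1.0
        direction = [v / norm for v in direction]
        # Antenna positions on either side of the centroid.
        x_left = [xi + 0.5 * d0 * di for xi, di in zip(x, direction)]
        x_right = [xi - 0.5 * d0 * di for xi, di in zip(x, direction)]
        f_left, f_right = f(x_left), f(x_right)
        s = (f_left > f_right) - (f_left < f_right)  # sign(f_left - f_right)
        # Move toward the antenna with the smaller function value.
        x = [xi - step * s * di for xi, di in zip(x, direction)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
        step *= decay  # assumed schedule: shrink the step over time
    return best_x, best_f


def sphere(v):
    """Hypothetical test function with its minimum at the origin."""
    return sum(vi * vi for vi in v)


start = [3.0, -2.0]
best_x, best_f = beetle_search(sphere, start)
print(best_x, best_f)
```

Only two function evaluations at the antennae and one at the new centroid are needed per iteration, which is the reason a single-individual search is cheap compared with a population-based one.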

Wavelet Packet Decomposition and Reconstruction
In practice, the rolling bearing vibration signal usually carries a large amount of noise, which can easily mask the weak characteristic signal hidden in the vibration signal, so that the useful information cannot be obtained. Wavelet packet analysis decomposes the original noisy signal; the energy of the characteristic signal is then concentrated in a few frequency bands, while the energy of the noise is evenly dispersed across all frequency bands. Therefore, the noise can be separated from the characteristic signal by choosing an appropriate threshold that sets the wavelet coefficients of the noise to zero, which achieves noise reduction. The specific process of using the Wavelet Packet Transform is as follows.
(1) The Wavelet Packet Transform is used to decompose the collected original vibration signal. An appropriate wavelet function is selected and the number n of decomposition layers is determined, and the vibration signal is then decomposed into n layers.
(2) The best tree of the wavelet packet is calculated, based on a given entropy criterion. (3) The Wavelet Packet Decomposition coefficients are threshold-quantized. An appropriate threshold and its calculation method are selected, and the coefficients are then processed by threshold quantization. (4) The signal is reconstructed by wavelet packet reconstruction, based on the wavelet packet decomposition coefficients and the quantized coefficients.
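The decompose, threshold, and reconstruct steps above can be illustrated with a short Python sketch. Several simplifications are assumed here and are not taken from this paper: the Haar wavelet is used because it can be written in a few lines, the full tree is used instead of an entropy-based best tree, soft thresholding with a single fixed threshold `thr` is applied to every node, and the signal length must be divisible by 2 raised to the number of levels.

```python
import math

ROOT2 = math.sqrt(2.0)


def haar_split(signal):
    """One Haar analysis step: low-pass and high-pass halves."""
    approx = [(signal[i] + signal[i + 1]) / ROOT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / ROOT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_merge(approx, detail):
    """Inverse of haar_split."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / ROOT2, (a - d) / ROOT2])
    return out


def wp_decompose(signal, levels):
    """Full wavelet packet tree: both halves are split again at every level."""
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes


def wp_reconstruct(nodes):
    """Merge sibling nodes level by level until one signal remains."""
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]


def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero; those below thr become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def wp_denoise(signal, levels, thr):
    nodes = wp_decompose(signal, levels)
    nodes = [soft_threshold(node, thr) for node in nodes]
    return wp_reconstruct(nodes)


noisy = [1.0, 3.0, -2.0, 0.5, 4.0, 4.0, -1.0, 2.0]
print(wp_denoise(noisy, levels=2, thr=0.5))
```

With `thr=0.0` the function returns the input signal up to rounding error, which is a quick check that decomposition and reconstruction are consistent with each other.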

Wavelet Packet Energy Feature Extraction
The energy value of each wavelet packet is represented by the sum of the squares of the corresponding subspace signal. Once the rolling bearing fails, the energy value in each frequency band of its vibration signal changes considerably. Different fault types lead to different amounts of energy in each frequency band of the vibration signal. Therefore, the type of fault can be determined from the distribution of energy values across the frequency bands of the vibration signal. The wavelet packet energy values obtained after the fault signal is decomposed by the wavelet packet are taken as features, and pattern recognition of the fault is then performed on these features. The main steps of extracting features from vibration signals are as follows.
(1) Wavelet Packet Decomposition is performed. The same wavelet basis is selected for the noise-reduced signals, and each noise-reduced signal is decomposed into $N$ layers. The components of the $2^N$ frequency bands in the $N$-th layer are taken, indexed by $j = 0, 1, 2, \ldots, 2^N - 1$, where $j$ is the node index. (2) The signals of the $2^N$ frequency band components are reconstructed. With $S_{Nj}$ denoting the reconstructed signal of $X_{Nj}$, the total signal $S$ can be expressed as
$$S = \sum_{j=0}^{2^N - 1} S_{Nj}$$
The energy values are then normalized to facilitate subsequent data processing, where the vector $T'$ is the feature vector after normalization.
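As an illustration of the feature extraction steps above, the following Python sketch computes one energy value per terminal node of a wavelet packet tree and normalizes the result. The choices here are assumptions for illustration: the Haar wavelet stands in for whichever wavelet basis is selected, the energy is computed directly from the node coefficients (for an orthonormal wavelet this equals the energy of the reconstructed band signal), the energies are divided by their sum, which is only one possible normalization, and the nodes are returned in natural tree order, which is not necessarily the order of increasing frequency.

```python
import math


def haar_packet_nodes(signal, levels):
    """Full Haar wavelet packet tree; returns the 2**levels terminal nodes."""
    root2 = math.sqrt(2.0)
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            pairs = list(zip(node[0::2], node[1::2]))
            next_nodes.append([(a + b) / root2 for a, b in pairs])  # low-pass half
            next_nodes.append([(a - b) / root2 for a, b in pairs])  # high-pass half
        nodes = next_nodes
    return nodes


def energy_features(signal, levels):
    """Normalized energy per terminal node (sum of squared coefficients)."""
    nodes = haar_packet_nodes(signal, levels)
    energies = [sum(c * c for c in node) for node in nodes]
    total = sum(energies)
    if total == 0.0:
        return energies  # all-zero signal: nothing to normalize
    return [e / total for e in energies]


# A constant signal has all of its energy in the lowest band.
features = energy_features([1.0] * 16, levels=2)
print(features)  # [1.0, 0.0, 0.0, 0.0]
```

The resulting vector has one entry per frequency band and sums to one, so feature vectors from signals of different amplitude can be compared directly.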
As shown in Figure 2, the wavelet packet energy features are extracted from the noise-reduced vibration signal. The input speed of the rolling bearing is about 1800 r/min, and the signal energy lies mostly in the range of 0-2500 Hz. Four Wavelet Packet Decomposition layers are used, and the first 8 wavelet packet components of the fourth layer, ordered from low to high frequency, are selected to extract the energy values.
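The choice of the first 8 components can be checked with a short calculation. The sketch below assumes ideal band splitting, in which each decomposition layer halves every band, and uses the sampling frequency of 12000 S/s reported in the experimental section; the function name `wavelet_packet_bands` is introduced here for illustration only.

```python
def wavelet_packet_bands(fs, levels):
    """Ideal frequency bands (in Hz) of the terminal nodes, low to high."""
    nyquist = fs / 2.0
    width = nyquist / (2 ** levels)
    return [(k * width, (k + 1) * width) for k in range(2 ** levels)]


bands = wavelet_packet_bands(12000, 4)
first_eight = bands[:8]
print(first_eight[0], first_eight[-1])  # (0.0, 375.0) (2625.0, 3000.0)
```

Under these assumptions, each of the 16 bands in the fourth layer is 375 Hz wide, so the first 8 components span 0-3000 Hz and cover the 0-2500 Hz range in which most of the signal energy lies.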

BP Neural Network Optimized by Beetle Algorithm
The BP Neural Network, which is suitable for the identification of complex patterns such as multiple symptoms and multiple faults, has strong self-learning, self-adaptation, and associative memory capabilities. However, because it relies on gradient descent of the error function, the BP Neural Network is a single-point search algorithm without global search ability (Figure 3). Therefore, in the process of learning, the BP Neural Network suffers from poor network performance, poor robustness, slow convergence, and a tendency to fall into local extrema. The principle of the particle swarm optimization algorithm is relatively simple and easy to understand. The velocity and position information is updated continuously to correct the search direction until the global optimal solution is found. The particle swarm optimization algorithm uses cooperation and competition between individuals to achieve global search, which greatly reduces the probability of falling into a local optimum. The BP Neural Network optimized by the Particle Swarm Algorithm [23,24] combines the advantages of both, using the strong global search ability of the particle swarm optimization algorithm to optimize the initial weights and thresholds of the neural network. Similarly, the Beetle Algorithm [25,26] can also be combined with the BP Neural Network (Figure 4). First, the global search capability of the Beetle Algorithm is used to optimize the initial weights and thresholds of the neural network. Then the BP algorithm is used to update these weights and thresholds, and the final neural network is obtained. Compared with the Particle Swarm Algorithm, the Beetle Algorithm is much simpler, which greatly
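The first stage of the combined scheme, in which the Beetle Algorithm searches for initial weights and thresholds, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all weights and thresholds are packed into one flat vector, the fitness is the mean squared error of a three-layer sigmoid network on the training data, and the step size, decay factor, ratio `c`, and toy data set `TOY_DATA` are hypothetical values chosen here.

```python
import math
import random


def network_error(vec, data, n_in, n_hid, n_out):
    """MSE of a three-layer sigmoid network whose parameters are stored in vec."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    pos = 0
    W1 = [vec[pos + j * n_in: pos + (j + 1) * n_in] for j in range(n_hid)]
    pos += n_hid * n_in
    b1 = vec[pos: pos + n_hid]
    pos += n_hid
    W2 = [vec[pos + k * n_hid: pos + (k + 1) * n_hid] for k in range(n_out)]
    pos += n_out * n_hid
    b2 = vec[pos: pos + n_out]
    total = 0.0
    for x, t in data:
        h = [sig(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
        y = [sig(sum(w * hi for w, hi in zip(row, h)) + b) for row, b in zip(W2, b2)]
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / len(data)


def bas_initial_weights(data, n_in, n_hid, n_out, step=2.0, c=5.0,
                        decay=0.95, iterations=100, seed=0):
    """Beetle search over the flat parameter vector; returns the best vector found."""
    rng = random.Random(seed)
    dim = n_hid * n_in + n_hid + n_out * n_hid + n_out
    fitness = lambda v: network_error(v, data, n_in, n_hid, n_out)
    x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    start_error = fitness(x)
    best, best_error = list(x), start_error
    for _ in range(iterations):
        d = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d)) or 1.0
        d = [v / norm for v in d]
        d0 = step / c  # antenna distance tied to the step size
        f_left = fitness([xi + 0.5 * d0 * di for xi, di in zip(x, d)])
        f_right = fitness([xi - 0.5 * d0 * di for xi, di in zip(x, d)])
        s = (f_left > f_right) - (f_left < f_right)
        x = [xi - step * s * di for xi, di in zip(x, d)]
        err = fitness(x)
        if err < best_error:
            best, best_error = list(x), err
        step *= decay
    return best, start_error, best_error


# Hypothetical toy data; in practice the inputs would be energy feature vectors.
TOY_DATA = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
init_vec, start_error, best_error = bas_initial_weights(TOY_DATA, 2, 3, 1)
print(start_error, best_error)
```

The vector `init_vec` would then be unpacked into the network and refined by ordinary backpropagation, which corresponds to the second stage described above.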

Experimental Demonstrations
The raw data used in the fault simulation come from the rolling bearing fault test bench of Case Western Reserve University in the United States, which is shown in Figure 5. The test platform mainly includes a 2-horsepower motor, a torque sensor, a power meter, and some electronic control equipment. In the experiment, an acceleration sensor was used to collect the vibration acceleration signals of the rolling bearing under different working conditions. The sensor was attached with a magnetic base to the drive end of the motor housing. The speed of the motor is 1797 r/min. Vibration signals were collected via a 16-channel DAT recorder and processed later in the MATLAB environment. The sampling frequency of the digital signal is 12000 S/s, and the sampling time is set to 20 s.

Description of Test and Training Samples
The first set of experiments is the simplest experiment (Table 1).

Visualization of Characteristic Data
To display the extracted energy features, the K-means clustering method is used to cluster the samples, and the resulting 2D points are drawn as a scatter diagram in Figure 6. Different colors represent different categories. It can be seen from the figure that the wavelet packet energy features produce good clustering performance. The data points of the third type are scattered around the center point in the figure, which may influence the diagnostic effect of the model. Overall, however, each feature set can be clearly distinguished.
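For readers unfamiliar with the clustering step, the following Python sketch shows the basic K-means iteration (assign each sample to the nearest center, then move each center to the mean of its samples). It is a generic illustration: the six 2D points in `samples` are made-up values, not the energy features of this paper, and the initialization by random sampling is an assumption.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain K-means; returns one label per point and the final centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical 2D samples forming two well-separated groups.
samples = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
           [5.0, 5.0], [5.2, 4.9], [4.9, 5.3]]
labels, centers = kmeans(samples, k=2)
print(labels)
```

When the features separate the bearing states well, samples of the same state receive the same label, which is what the scatter diagram is meant to show.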

Diagnosis Results of Model
To test the performance of the BAS-BP model, different sample sets were set up for experiments. The results are shown in Table 2. The model diagnoses a single sample set with an accuracy of 100%. However, after the data sets are mixed, the diagnosis accuracy drops noticeably. When three data sets are mixed, the test results are 99.44%, 96.5%, and 96.2%. It can be concluded that there is a coupling effect between sample set 3 (broken and wear) and sample set 4 (broken tooth), which affects the fault diagnosis result. Overall, the diagnostic accuracy of the model exceeds 95%, indicating that the model has high diagnostic performance. Figure 7 shows the error iteration curve for the training of each data set.

Description of Test and Training Samples-Inner ring
(See Table 3.)

Visualization of Characteristic Data
These data sets mainly describe bearing inner ring faults of different sizes, so the coupling interference between the data sets is relatively large, which may greatly interfere with the diagnostic effect of the model. The data sets are visualized through K-means clustering, and the aggregation of the data points is shown in Figure 8. It can be seen from Figure 8 that the aggregation of each data set is not very satisfactory, so the diagnostic accuracy for these data sets can be expected to drop to some extent.

Diagnosis Results of Model
Similarly, different sample sets were set up to test the model performance. The results are shown in Table 4. The diagnosis result for a single data set is still very accurate, while the diagnosis accuracy for the mixed data sets is reduced to a certain extent. Even with coupling interference between data sets 2 and 3, the overall diagnosis rate still reaches 89.17%, which shows that the model has a certain degree of anti-interference ability. The error iteration curve is shown in Figure 9.

Diagnosis Performance and Comparison
To study the model performance further, the constructed model is compared with network models optimized by the stochastic gradient descent method and the Particle Swarm Algorithm. The comparison covers the diagnosis rate, the update speed, and the number of iterations needed to reach the minimum fitness value. The results are shown in Figures 10 and 11. The overall diagnosis rate of the BAS-BP model is significantly higher than that of the stochastic gradient descent method and the Particle Swarm Algorithm. The stochastic gradient descent algorithm randomly selects a certain number of samples for training in each iteration, which leads to unsatisfactory training results and a tendency to fall into local extrema, and this greatly affects the performance of the model. In the test on the second sample set, its diagnostic performance almost failed. In contrast, the models optimized by the BAS algorithm and the Particle Swarm Algorithm have a certain degree of anti-coupling ability. When the samples in the sample set are aliased, the diagnostic level of the model can still be maintained above 85%. Comparing the error iteration curves of the two algorithms, the BAS curve declines and converges significantly faster than the PSO curve, and in the actual training process PSO takes significantly longer than the BAS algorithm. Therefore, it can be concluded that the BAS-BP bearing fault diagnosis model has anti-coupling properties and can realize fault identification quickly and accurately.

Conclusions
(1) A fault diagnosis model based on a BP Neural Network optimized by the Beetle Algorithm is proposed to identify the bearing failure status. The wavelet packet energy features are obtained by the Wavelet Packet Transform from different frequency bands of the signals and fed into the supervised model for health condition recognition. (2) A visual study of the extracted features has been carried out. The study shows that the wavelet packet energy features can accurately distinguish different working conditions and can be used to evaluate the health status of bearings. (3) Two experiments were carried out to test the performance of the BAS-BP model, including its diagnostic rate and anti-interference ability. The results show that the BAS-BP model alleviates the tendency of BP to fall into local extrema and identifies bearing faults in a shorter time with higher accuracy. At the same time, the coupling interference between data can be avoided to a certain extent.