Method for Fault Feature Selection for a Baler Gearbox Based on an Improved Adaptive Genetic Algorithm

The performance and efficiency of a baler deteriorate as a result of gearbox failure. One way to overcome this challenge is to select appropriate fault feature parameters for fault diagnosis and monitoring gearboxes. This paper proposes a fault feature selection method using an improved adaptive genetic algorithm for a baler gearbox. This method directly obtains the minimum fault feature parameter set that is most sensitive to fault features through attribute reduction. The main benefit of the improved adaptive genetic algorithm is its excellent performance in terms of the efficiency of attribute reduction without requiring prior information. Therefore, this method should be capable of timely diagnosis and monitoring. Experimental validation was performed and promising findings highlighting the relationship between diagnosis results and faults were obtained. The results indicate that when using the improved genetic algorithm to reduce 12 fault characteristic parameters to three without a priori information, 100% fault diagnosis accuracy can be achieved based on these fault characteristics and the time required for fault feature parameter selection using the improved genetic algorithm is reduced by half compared to traditional methods. The proposed method provides important insights into the instant fault diagnosis and fault monitoring of mechanical devices.


Introduction
The selection of fault feature parameters is a common problem in fault diagnosis and monitoring. There is increasing evidence that the fault diagnosis rate of a baler gearbox is affected by the selection of fault characteristic parameters. As a relatively new type of agricultural equipment, the maintenance of continuous-operation balers requires professional equipment, parts, and technical personnel, which further increases the time required for maintenance and delays agricultural production [1]. Therefore, we conducted a study to obtain the minimum characteristic parameter set that can effectively diagnose different fault types in baler gearboxes. Figure 1 presents a self-propelled straw baler and its gearbox.
Open Access
Chinese Journal of Mechanical Engineering 35:45
Analyzing vibration signals in the time and frequency domains and using the obtained feature parameters to identify faults are commonly used fault diagnosis methods [2][3][4][5][6][7]. Tang et al. [8] used time-domain statistical indices as features to characterize the vibration of a gearbox and combined them with the fast Fourier transform algorithm to determine fault locations. Barszcz et al. [9] analyzed the impact signal of a gearbox using kurtosis as a characteristic parameter and realized the fault diagnosis of a planetary gearbox. Long et al. [10] proposed an adaptive tunable Q-factor wavelet transform algorithm that introduces envelope spectral entropy as a fault feature parameter based on spectral kurtosis to measure the signal pulse strength and periodicity, and experimental results demonstrated that this method can effectively realize bearing fault diagnosis. Hou et al. [11] used globally optimized sparse coding and approximate singular value decomposition to extract the weak fault features of rolling bearings and realize fault diagnosis. Zappalá [12] proposed the use of the sideband power factor as a standard for evaluating the gear state and realized automatic diagnosis. Because the vibration signals of a gearbox contain redundant components and the number of candidate fault characteristic parameters is large, it is difficult to select truly effective and concise fault characteristic parameters. The feature parameter set contains many parameters that are insensitive to faults or redundant. Too many feature parameters increase the complexity of decision rules and the difficulty of fault diagnosis. Therefore, it is necessary to extract a set of meaningful parameters from a large number of fault feature parameters to make the set as small as possible and more convenient for the decision making process [13]. Rough set theory is an attribute reduction theory proposed by Pawlak, a Polish scholar [14]. Classical rough set attribute reduction generally adopts the method of determining a kernel set [15]. On this basis, Hu et al. [16] proposed a discernibility matrix method using classical rough set theory to complete the calculations of the attribute reduction process. Xu et al. [17] improved the traditional equivalence class division method using the cardinality sorting algorithm, which was proven to be beneficial for large-scale decision making in real applications. The above methods, which are based on classical rough set theory, have significant defects. Attribute reduction is a nondeterministic polynomial complete problem, and the calculations of the classical attribute reduction algorithm are very complex and inefficient when the decision table contains massive data and attribute sets. To address this challenge, Chen et al. [18] proposed an attribute reduction algorithm based on granular computing and obtained decision rules with greater generalization ability. Ganter et al. [19] proposed an attribute reduction method for formal contexts using reducible objects and attributes. Qu et al. [20] proposed a method for feature selection based on a support vector machine (SVM) and performed accurate feature classification. However, the SVM algorithm has the disadvantage of requiring training for multiple data types. Liu et al. [21] realized attribute set reduction using a genetic algorithm and effectively analyzed road traffic accidents. Compared to the kernel set and discernibility matrix methods, the genetic algorithm performs attribute reduction without requiring prior information, and calculation efficiency is improved when dealing with big data. However, to the best of our knowledge, improper parameters for a genetic algorithm will cause the algorithm to fall into local optimal solutions, which deteriorates the effectiveness of optimization.
To address these problems directly, we propose an improved version of the genetic algorithm, which is applied to attribute reduction. In our method, redundant information in the fault feature parameter set is eliminated, and fault diagnosis is realized according to a decision rule table obtained through reduction, which provides a scientific and effective method for fault feature selection and diagnosis.
The remainder of this paper is organized as follows. In Section 2, the theory and methodology of fault feature selection are presented. A fault diagnosis experiment on a baler gearbox is presented in Section 3. Finally, our conclusions are summarized in Section 4.

Attribute Reduction Based on an Improved Genetic Algorithm
Attribute reduction is a method for eliminating redundant information while maintaining classification ability. Currently, the rough set is a commonly used method for attribute reduction. It uses concepts such as indiscernible relationships and positive domains to perform traversal analysis and judge different attributes to eliminate unnecessary attributes in a decision system [22][23][24]. The traditional reduction algorithm is inefficient, and the results of reduction are limited by prior information. As a heuristic intelligent algorithm, the genetic algorithm [25][26][27] can improve the efficiency of reduction and is not restricted by prior information. In our improved genetic algorithm, the dependence of decision attributes on conditional attributes is used as a fitness function and different combinations of condition attributes are used as genetic populations. Additionally, the simplest and most important condition attributes for decision attributes are obtained through selection and cross-mutation genetic operators. The structure of the genetic algorithm is schematically illustrated in Figure 2.

Theory of Improved Genetic Algorithm
In the genetic algorithm, the crossover probability $p_c$ and mutation probability $p_m$ have a significant impact on the results. The values of $p_c$ and $p_m$ are static in the standard genetic algorithm. If the value of $p_c$ is too large, the algorithm generates new individuals quickly, but individuals with high fitness are easily destroyed. If the value of $p_c$ is too small, the evolution speed is slow. If the value of $p_m$ is too large, the convergence speed of the algorithm is slow. In contrast, if the value of $p_m$ is too small, new individuals are generated slowly and population diversity is reduced [28]. To address these issues, Cao et al. [29] proposed an adaptive genetic algorithm that introduces calculation formulas for the values of $p_m$ and $p_c$ according to the fitness value of each individual. In these formulas, $F_{\max}$ is the maximum fitness value of the population, $F_{avg}$ is the average fitness value of the population, $F'$ is the larger fitness value of the two individuals to be crossed, and $F''$ is the fitness value of the individual to be mutated. $C_1$, $C_2$, $C_3$, and $C_4$ are parameters greater than zero and less than one.
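The fitness-dependent adjustment described above can be sketched as follows. The formulas of Ref. [29] are not reproduced in this text, so the sketch assumes a widely used linear form in which the probability shrinks as the fitness of an individual approaches $F_{\max}$ and stays constant for below-average individuals. The function name, the assignment of `c1` to `c4` to the individual branches, and the sample values are illustrative assumptions.

```python
def adaptive_probabilities(f_cross, f_mut, f_max, f_avg, c1, c2, c3, c4):
    """Return (p_c, p_m) for one crossover pair and one mutation candidate.

    Assumed form (an illustration, not the formulas of Ref. [29]):
      p_c = c1 * (f_max - f_cross) / (f_max - f_avg) if f_cross >= f_avg else c2
      p_m = c3 * (f_max - f_mut)   / (f_max - f_avg) if f_mut   >= f_avg else c4
    f_cross is the larger fitness of the two individuals to be crossed,
    f_mut is the fitness of the individual to be mutated.
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Degenerate population (all fitness values equal): use the constants.
        return c2, c4
    p_c = c1 * (f_max - f_cross) / spread if f_cross >= f_avg else c2
    p_m = c3 * (f_max - f_mut) / spread if f_mut >= f_avg else c4
    return p_c, p_m


# Hypothetical values: the best individual of the population.
best_pc, best_pm = adaptive_probabilities(2.0, 2.0, 2.0, 1.0, 0.9, 0.6, 0.1, 0.05)
# Hypothetical values: an individual halfway between average and best.
mid_pc, mid_pm = adaptive_probabilities(1.5, 1.5, 2.0, 1.0, 0.9, 0.6, 0.1, 0.05)
```

Under this assumed form, the best individual receives a mutation probability of exactly zero, so it can no longer be perturbed once it dominates the population.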
The adaptive genetic algorithm adjusts the mutation and crossover probabilities according to individual fitness values, which improves the iteration speed of the algorithm. However, it does not consider the diversity and dispersion of the entire population [30][31][32]. Additionally, in the adaptive genetic algorithm, if the fitness value of an individual is close to or reaches the maximum value, then the mutation probability $p_m$ will be close to zero. In this case, the ability to generate new individuals in the early stages of the algorithm is reduced and the algorithm easily falls into local optima [33]. To address these two problems, we propose new calculation equations for the crossover and mutation probabilities of individuals whose fitness values are greater than $F_{avg}$ in Eq. (1). The coefficients in these equations are the adaptive control parameters. $\omega_1$ and $\omega_2$ are the weights of the influence of the population dispersion and individual fitness on the crossover probability, and $\omega_1 + \omega_2 = 1$. When $\omega_1$ is zero, only the influence of the individual fitness value on the crossover probability is considered. When $\omega_1$ is one, only the influence of the population dispersion on the crossover probability is considered. The parameters for population dispersion and individual fitness are the same as those in Eq. (1). Eqs. (3) and (4) also contain a term that represents the dispersion degree of the population. From these equations, one can see that if the population tends to be dispersed, then the crossover probability increases and the mutation probability decreases to improve the ability of the population to develop excellent individuals. In contrast, when the population tends to be concentrated, the crossover probability decreases and the mutation probability increases, which increases the ability of the population to produce new individuals.

Selection, Crossover, and Mutation in the Improved Genetic Algorithm
Selection, crossover, and mutation are the core operations of the genetic algorithm. The roulette wheel method is adopted for the selection of individuals in the population. First, the selection probability of each individual is set according to its fitness function value such that an individual with a larger fitness function value is more likely to be selected. The candidate individuals for crossover and mutation are then selected from the initial population by roulette wheel selection. The fitness function used to evaluate the quality of an individual is key to selection from the population. In attribute reduction using a genetic algorithm, the goal is for the decision attribute to have the greatest dependence on the reduced condition attribute set while the condition attribute set is minimized. The fitness function $F$ is defined in Eq. (5), where $L_r$ is the sum of the digits in an individual's chromosome, that is, the number of attributes retained. The more attributes are included in the reduced attribute set, the greater the value of $L_r$ and the lower the fitness $F$. $\gamma_C(D)$ is the dependence of decision attribute $D$ on condition attribute set $C$ (Eq. (6)), which reflects the importance of the condition attribute set [34]. Selected individuals from the population are randomly paired. The crossover nodes are selected at random and must be located at the same positions in both chromosomes of a pair. For each node in the crossover process, there is a certain probability of replacement with the node at the same position in the paired chromosome. Mutation is the inversion of bits in the binary code of individual chromosomes. For candidate mutant individuals, each point in a chromosome has a certain probability of mutation. The probabilities of crossover and mutation are obtained using Eqs. (3) and (4), respectively.
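To make the roles of the dependence and the chromosome length concrete, the following minimal sketch computes the rough set dependence as the fraction of instances whose retained attribute values determine the decision unambiguously, and combines it with a penalty on the number of retained attributes. The additive fitness form `(1 - L_r / L) + gamma` is an assumption made for illustration and is not necessarily Eq. (5). The decision table is hypothetical toy data.

```python
from collections import defaultdict


def dependency(rows, decisions, chromosome):
    """Fraction of instances in the positive region of the retained attributes."""
    keep = [i for i, gene in enumerate(chromosome) if gene == 1]
    labels = defaultdict(set)   # decision values seen in each equivalence class
    sizes = defaultdict(int)    # number of instances in each equivalence class
    for row, decision in zip(rows, decisions):
        key = tuple(row[i] for i in keep)
        labels[key].add(decision)
        sizes[key] += 1
    consistent = sum(sizes[k] for k, seen in labels.items() if len(seen) == 1)
    return consistent / len(rows)


def fitness(rows, decisions, chromosome):
    """Assumed fitness: fewer retained attributes and higher dependence are better."""
    length = len(chromosome)
    l_r = sum(chromosome)  # number of ones, i.e., retained attributes
    return (1 - l_r / length) + dependency(rows, decisions, chromosome)


# Hypothetical decision table with four binary condition attributes.
rows = [
    (1, 0, 1, 1), (1, 0, 1, 1), (1, 1, 1, 1),
    (1, 1, 0, 1), (1, 0, 0, 1), (1, 1, 0, 1),
    (0, 0, 0, 0), (0, 0, 0, 1), (0, 1, 0, 0),
]
decisions = [1, 1, 1, 2, 2, 2, 3, 3, 3]

full_set = fitness(rows, decisions, [1, 1, 1, 1])
reduced = fitness(rows, decisions, [1, 0, 1, 0])
```

In this toy table, the first and third attributes alone already separate the three classes, so the reduced chromosome scores higher than the full attribute set.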
A new population is generated through selection, crossover, and mutation. According to the optimal conservation strategy [35], the optimal individuals in the parent population are copied directly to the offspring population, replacing the individuals with the lowest adaptability among the offspring. The population size is kept constant.
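The operators described in this subsection can be sketched as below. This is a minimal illustration that assumes position-wise (uniform) crossover, bit-flip mutation, and a single elite individual. The function names are hypothetical, and the probabilities `p_c` and `p_m` are passed in as plain numbers rather than computed from Eqs. (3) and (4).

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, value in zip(population, fitnesses):
        running += value
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def crossover(a, b, p_c, rng):
    """Swap genes located at the same position, each with probability p_c."""
    child_a, child_b = list(a), list(b)
    for i in range(len(child_a)):
        if rng.random() < p_c:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


def mutate(chromosome, p_m, rng):
    """Invert each bit with probability p_m."""
    return [1 - gene if rng.random() < p_m else gene for gene in chromosome]


def apply_elitism(parents, parent_fit, offspring, offspring_fit):
    """Copy the best parent over the worst offspring; population size is unchanged."""
    best = max(range(len(parents)), key=lambda i: parent_fit[i])
    worst = min(range(len(offspring)), key=lambda i: offspring_fit[i])
    result = [list(c) for c in offspring]
    result[worst] = list(parents[best])
    return result


rng = random.Random(0)
swapped = crossover([1, 1, 0, 0], [0, 0, 1, 1], 1.0, rng)
flipped = mutate([1, 0, 1, 0], 1.0, rng)
kept = apply_elitism([[1, 0], [0, 1]], [2.0, 1.0], [[1, 1], [0, 0]], [0.5, 1.0])
```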

Attribute Reduction Simulation Experiment
To test the effectiveness and feasibility of attribute reduction based on the improved genetic algorithm, the algorithm was used to analyze the basic features of signals and identify their categories. Harmonic, superposition, and noise signals were the objects to be recognized. The periodicity, maximum value, frequency component, and mean value were considered as recognition features. The simulated harmonic signals {x 1 , x 2 , x 3 }, harmonic superimposed signals {x 4 , x 5 , x 6 }, and noise signals {x 7 , x 8 , x 9 } were defined as follows:

$$
\begin{aligned}
&\ldots \\
x_4 &= \sin(3t) + 2\sin(t + 26), \\
x_5 &= \sin(t) + 1.5\sin(0.5t + 17) + 0.5\sin(2t + 75), \\
x_6 &= \sin(t) + 2.5\sin(t) + \sin(t + 16), \\
x_7 &= \mathrm{rand}(1, N), \\
&\ldots
\end{aligned}
\tag{7}
$$

where $x_j$ is the amplitude of the simulated signal and $t$ is time. The attribute features were assigned according to Figure 3. If the simulated signal is periodic, then C1 = 1. Otherwise, C1 = 0. If its maximum value is greater than 2.5, then C2 = 1. Otherwise, C2 = 0. If only one frequency component is included, then C3 = 1. Otherwise, C3 = 0. If the overall mean value of the signal is zero, then C4 = 1. Otherwise, C4 = 0. The initial decision table is provided in Table 1.
As shown in Table 1, C1, C2, C3, and C4 are the features of the signals. The numbers 1, 2, and 3 in column D represent the harmonic, superposition, and noise signals, respectively. After establishing the decision information table, the adaptive genetic algorithm and improved genetic algorithm were used to reduce the attributes of the decision table.
The first step in this process is to randomly generate several binary individuals with chromosome length L. A chromosome of length L represents the L attributes in the decision table. If a gene is encoded as one, then the corresponding attribute is retained. Otherwise, it is removed. For example, for a decision system with the condition attribute set C = {C1, C2, C3, C4}, chromosome {1100} represents an attribute set composed of C1 and C2.
MATLAB was used as the simulation software to program the adaptive genetic algorithm and improved genetic algorithm. The initial population size for the algorithms was set to five, and the crossover and mutation probability control parameters of the adaptive genetic algorithm were set as k 1 = 1, k 2 = 1, k 3 = 0.1, and k 4 = 1. The control parameters of the improved genetic algorithm were set as k 1 = 0.2, k 2 = 0.8, k 3 = 0.02, and k 4 = 0.08.
The iteration termination conditions were set as follows: ① the number of iterations reaches 50; ② the fitness improvement per iteration is lower than a set threshold. If the genetic algorithm satisfies either of these two termination conditions, then the optimal individual in the current population represents the optimal solution. The adaptive genetic algorithm and improved genetic algorithm were used for attribute reduction and the iterative process is illustrated in Figure 4. In Figure 4, the red "*" curve represents the optimal value searching iteration process of the adaptive genetic algorithm. The red "o" curve represents the average value searching iteration process of the adaptive genetic algorithm. The blue "*" curve represents the optimal value searching iteration process of the improved genetic algorithm. The blue "o" curve represents the average value searching iteration process of the improved genetic algorithm. As shown in Figure 4, in terms of the optimal value, the results of the two algorithms are approximately the same, and the optimal solution is obtained in the third iteration of each algorithm. The population average fitness of the improved genetic algorithm reaches its optimum at the fifth iteration, whereas that of the adaptive genetic algorithm reaches its optimum at the eighth iteration. The optimization result of both algorithms is {1010}, and the decision rules in Table 2 can be obtained by deleting repeated elements. Figure 4 and Table 2 indicate that there is no significant difference between the optimization results of the two algorithms when the amount of simulation data is small. However, the population evolution speed of the improved genetic algorithm is significantly higher than that of the adaptive genetic algorithm.
Attribute reduction based on either the adaptive genetic algorithm or the improved genetic algorithm can effectively realize signal identification, which proves the effectiveness and feasibility of the proposed method.
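The chromosome encoding and the step of obtaining decision rules by deleting repeated elements can be illustrated with a short sketch. The decision table below is hypothetical toy data and is not Table 1. The helper names `decode` and `extract_rules` are assumptions introduced for illustration.

```python
def decode(chromosome, names):
    """Return the attribute names whose gene is encoded as one."""
    return [name for name, gene in zip(names, chromosome) if gene == 1]


def extract_rules(rows, decisions, chromosome):
    """Keep only the retained attributes and delete repeated rows."""
    keep = [i for i, gene in enumerate(chromosome) if gene == 1]
    rules = []
    for row, decision in zip(rows, decisions):
        rule = (tuple(row[i] for i in keep), decision)
        if rule not in rules:
            rules.append(rule)
    return rules


names = ["C1", "C2", "C3", "C4"]

# Hypothetical table: 1 = harmonic, 2 = superposition, 3 = noise.
rows = [
    (1, 0, 1, 1), (1, 0, 1, 1), (1, 1, 1, 1),
    (1, 1, 0, 1), (1, 0, 0, 1), (1, 1, 0, 1),
    (0, 0, 0, 0), (0, 0, 0, 1), (0, 1, 0, 0),
]
decisions = [1, 1, 1, 2, 2, 2, 3, 3, 3]

retained = decode([1, 0, 1, 0], names)
rules = extract_rules(rows, decisions, [1, 0, 1, 0])
```

For this toy table, chromosome {1010} retains C1 and C3, and the nine instances collapse to three rules, one per signal class.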

Feature Selection Experiment on a Baler Gearbox
The main method of fault diagnosis for baler gearboxes is to extract the fault features of a fault signal and then judge the occurrence and type of the fault according to the fault feature values. Different feature parameters exhibit different sensitivities and correlations for different fault types. In the past, for fault diagnosis, a variety of fault feature parameters have generally been used for comprehensive analysis, and the selection of feature parameters has lacked a unified standard. In this study, an improved genetic algorithm was used for attribute reduction to extract the minimal and most effective feature parameters for gearbox fault diagnosis.

Fault Feature Extraction
The test object we selected was a prototype of a self-propelled wheat straw baler developed by the project team, and the acquisition object was the vibration signal of the baler gearbox in five states: broken gear teeth, gear wear, no fault, bearing inner ring wear, and bearing outer ring wear. The hardware used for signal acquisition consisted of an INV982X acceleration sensor and INV3810CT acquisition instrument from the China Orient Institute of Noise & Vibration. The software used was DASP. Figure 5 presents the arrangement of the acceleration sensor on the baler gearbox. The sensor is driven by the acquisition instrument, and the acquired signal is stored by the acquisition instrument and transmitted to the host computer. The acquisition instrument is controlled by the DASP software on the host computer and the control interface (Chinese interface) is presented in Figure 6. The vibration signals of the gearbox operating at three different speeds were collected separately under no load. After data acquisition and sorting, the vibration signals of the different fault types in the same working state were selected as signals for diagnosis. Multiple sets of signals were selected for each fault type to extract fault feature values. For each signal type, 43 sets of feature values were extracted, three of which were selected as reduction data and 40 of which were selected as testing data. In the time domain, the mean value, maximum value, peak value, effective value, root-mean-squared value, square-root amplitude, skewness, kurtosis, margin, and kurtosis indicators were selected as feature parameters. In the frequency domain, the power center of gravity and power spectrum dispersion were selected as feature parameters [36]. The initial decision table is presented in Table 3, where C1 to C12 are the conditional attributes, which are the feature values of each fault extracted from the gearbox vibration signal. D is the decision attribute, where 1 represents the broken tooth fault, 2 represents no failure, 3 represents the bearing outer ring wear fault, 4 represents the bearing inner ring wear fault, and 5 represents the gear wear fault. The data in Table 3 are given to two decimal places, and scientific notation is used for large values.
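Several of the time-domain feature parameters listed above can be computed as in the following sketch. The definitions used here are common textbook forms (for example, margin taken as the peak value divided by the square-root amplitude) and are assumptions, because the text does not state the exact formulas. Only a subset of the twelve parameters is shown, and the function name and sample signal are hypothetical.

```python
import math


def time_domain_features(x):
    """Compute a subset of time-domain features of a non-constant signal x."""
    n = len(x)
    mean = sum(x) / n
    maximum = max(x)
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / n)
    # Square-root amplitude: squared mean of the square roots of |x|.
    sra = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    # Standardized third and fourth central moments (std must be non-zero).
    skewness = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * std ** 4)
    margin = peak / sra
    return {
        "mean": mean,
        "maximum": maximum,
        "peak": peak,
        "rms": rms,
        "square_root_amplitude": sra,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "margin": margin,
    }


# Hypothetical square-wave sample used only to exercise the function.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```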

Establishing a Decision Table
From the initial decision table, one can see that the values of the same attribute in different instances are continuous. For example, attribute C1 (mean) is a group of evenly distributed data points from 4.18 to 5.49. According to attribute C1, the domain can be divided into 10 sets: {{X1, X4}, {X2}, {X3}, {X5, X6}, {X7, X13}, {X8}, {X9, X15}, {X10}, {X11, X12}, {X14}}. With an increase in domain size, the number of groups increases, which increases the difficulty of data analysis. To facilitate data processing and improve the efficiency of fault diagnosis, this study used the semi-naive scalar discretization algorithm [33,34] to discretize the condition attributes C1 to C12 in Table 3. The decision table following discretization is presented in Table 4.
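The effect of discretization on the number of groups can be illustrated with a small sketch. It does not implement the semi-naive scalar discretization algorithm used in this study. As a plainly named stand-in, it binarizes an attribute at the midpoint of its range, and the readings are hypothetical.

```python
def binarize(values):
    """Stand-in discretization: 1 if the value exceeds the range midpoint, else 0."""
    cut = (min(values) + max(values)) / 2
    return [1 if v > cut else 0 for v in values]


def partition(codes):
    """Group instance indices that share the same attribute value."""
    groups = {}
    for index, code in enumerate(codes):
        groups.setdefault(code, []).append(index)
    return list(groups.values())


# Hypothetical continuous readings of one attribute for five instances.
c1 = [4.18, 4.62, 5.49, 4.18, 5.05]

groups_before = partition(c1)            # one group per distinct reading
groups_after = partition(binarize(c1))   # at most two groups
```

With continuous values, nearly every instance forms its own group, whereas the discretized attribute yields a small, fixed number of groups regardless of domain size.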

Attribute Reduction
Following discretization of the decision table, the adaptive genetic algorithm and improved genetic algorithm were used to reduce redundant attributes. First, we initialized the genetic population and used the fitness function defined in Eq. (5). After initialization, the termination conditions for algorithm iteration were set as follows: ① reaching the maximum number of iterations, where this study set the maximum number of iterations to 150, and ② the fitness value has not improved in the past 15 iterations. The setting of the probability control parameters for crossover and  mutation for the two algorithms was the same as that in the simulation experiment. The iterative reduction process is illustrated in Figure 7. In Figure 7, the red "*" curve represents the optimal value searching iteration process of the adaptive genetic algorithm. The red "o" curve represents the average value obtained by searching through the iteration process of the adaptive genetic algorithm. The blue "*" curve represents the optimal value searching iteration process of the improved genetic algorithm. The blue "o" curve represents the average value searching iteration process of the improved genetic algorithm.
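The two termination conditions can be expressed as a simple driver loop. In this sketch, `step` is a hypothetical callback that runs one generation and returns the best fitness found in it. The defaults mirror the values stated above (150 iterations, no improvement in 15 iterations), and the rest is an assumed structure rather than the implementation used in the study.

```python
def run_until_converged(step, max_iterations=150, patience=15):
    """Run generations until either termination condition is met.

    Returns the best fitness found and the number of generations executed.
    """
    best = float("-inf")
    stale = 0          # consecutive generations without improvement
    generations = 0
    for iteration in range(max_iterations):
        current = step(iteration)
        generations += 1
        if current > best:
            best, stale = current, 0
        else:
            stale += 1
        if stale >= patience:
            break
    return best, generations


# Hypothetical fitness trace that stops improving after generation 5.
best, generations = run_until_converged(lambda i: min(i, 5) * 0.1)
```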
As shown in Figure 7, the optimization speed for the attribute reduction of the improved genetic algorithm is significantly higher than that of the adaptive genetic algorithm. The improved genetic algorithm converges to the highest fitness value faster than the adaptive genetic algorithm and the average fitness of the entire population is approximately 1.9. The optimal solution of the adaptive genetic algorithm is obtained in the 40th iteration and the average fitness of the population is approximately 1.8. The improved genetic algorithm has clear advantages over the adaptive genetic algorithm in terms of convergence speed.
The improved genetic algorithm repeatedly performs selection, crossover, and mutation. After several iterations, the optimal solution is {000001101000}, and the minimal condition attribute set is {C6, C7, C9}. For the attributes in Table 4, only {C6, C7, C9} are retained, and redundant instances with the same attribute values are removed. The resulting decision rules are presented in Table 5.
According to Table 5, if the discrete set of the square-root amplitude, skewness, and margin of a vibration signal is {110}, then the gear teeth are judged to be broken. If the discrete set is {011} or {001}, then the gear teeth are considered to be worn.

Table 4 Discrete decision table of fault features
U   C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12  D
X1  1   0   1   1   1   1   1   1   0   0    0    1    1
X2  1   1   1   1   1   1   1   1   0   0    0    1    1
X3  1   1   0   1   1   1   …

Note 1: The optimal solution obtained through attribute reduction is not unique. The optimal solution obtained by the adaptive genetic algorithm in this study is different from that of the improved genetic algorithm, and the optimization result of the improved genetic algorithm is not unique. Different reduction results do not interfere with the final diagnosis result in theory. For example, by repeating the above reduction experiment, we can obtain a new minimal feature parameter set {C1, C7, C9}, which can also be used as a fault diagnosis knowledge base.
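Applying the decision rules amounts to a table lookup on the discretized values of C6, C7, and C9. The sketch below encodes only the three rules quoted in the text. The remaining rules of Table 5 are not reproduced here, so any other combination is reported as unmatched. The names `RULES` and `diagnose` are hypothetical.

```python
# Keys are (C6, C7, C9): square-root amplitude, skewness, margin.
RULES = {
    (1, 1, 0): "broken teeth",
    (0, 1, 1): "gear wear",
    (0, 0, 1): "gear wear",
}


def diagnose(c6, c7, c9):
    """Return the fault type for a discretized feature triple."""
    return RULES.get((c6, c7, c9), "no matching rule")


# Instance X1 of Table 4 has C6 = 1, C7 = 1, C9 = 0 and decision D = 1.
x1_result = diagnose(1, 1, 0)
```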

Validation of Decision Rules
For the decision rule table obtained by the improved genetic algorithm and rough set reduction, the validity of the rule table was verified through testing. For each fault type, 40 groups of data were selected as testing data. Three fault features, namely the square-root amplitude, skewness, and margin, were extracted from the testing data. Figure 8a and b present the results of fault diagnosis based on the minimal characteristic parameter sets {C6, C7, C9} and {C1, C7, C9}, respectively. In Figure 8, one can see that the fault diagnosis method based on genetic algorithm attribute reduction can effectively identify the type of fault. However, in Figure 8a, there is one misjudgment in the normal gearbox signal and two misjudgments in the bearing inner-ring wear signal. Similar misjudgments can be observed in Figure 8b. These misjudgments are reasonable considering that the number of signal groups retained for each fault type during reduction is small and that the resulting decision table knowledge base is therefore incomplete.
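As a worked check of the counts quoted above, the following sketch computes the diagnosis accuracy implied for Figure 8a. It assumes that all five fault types contribute 40 test groups each and that the three misjudgments mentioned are the only errors in that panel. The resulting figure is derived here and is not reported in the text.

```python
def accuracy(total, errors):
    """Fraction of correctly diagnosed test groups."""
    return (total - errors) / total


test_groups = 5 * 40      # five fault types, 40 test groups each
misjudged_8a = 1 + 2      # normal gearbox + bearing inner-ring wear
acc_8a = accuracy(test_groups, misjudged_8a)
print(f"{acc_8a:.1%}")
```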
Therefore, based on the original experiment, the number of signal groups used for reduction was increased from three to six, the experimental parameters were left unchanged, and the above experiment was repeated. The results are shown in Figure 9. One can see that the correct diagnosis rate based on the new decision rule table reaches 100%, which further demonstrates that the proposed method can accurately and effectively determine the occurrence and type of gearbox failures in a baler. The above fault diagnosis experiment on a self-propelled baler gearbox shows that the improved genetic algorithm is more efficient at attribute reduction and fault feature extraction, and can obtain the minimum feature parameter set. The decision table obtained through reduction can accurately diagnose different fault types of the gearbox.

Conclusions
The following conclusions can be drawn from the research presented above.
(1) A novel method for fault feature selection based on an improved adaptive genetic algorithm for attribute reduction was proposed to obtain fault characteristic parameters accurately without prior information.
(2) A comparative experiment against the adaptive genetic algorithm was designed based on the fault data of a baler gearbox, and it was observed that the improved algorithm can obtain optimization results faster, which demonstrates the significance of the improved algorithm in terms of improving the efficiency of fault diagnosis.
(3) Further verification experiments using data on baler gearbox failure were conducted, and the population average fitness of the adaptive genetic algorithm reached approximately 1.8 at the 40th iteration, whereas that of the improved genetic algorithm reached 1.9 at the 25th iteration.
(4) Fault diagnosis results indicated that the fault diagnosis accuracy of the baler gearbox based on the proposed method reached 100%. In other words, fault feature selection was completed effectively without a priori information, and fault diagnosis was realized based on the selection results, which demonstrates that the proposed method can realize fault feature selection quickly and effectively.