 Original Article
 Open Access
Model Parameter Transfer for Gear Fault Diagnosis under Varying Working Conditions
Chinese Journal of Mechanical Engineering volume 34, Article number: 13 (2021)
Abstract
Gear fault diagnosis technologies have developed rapidly and have been effectively implemented in many engineering applications. However, varying working conditions degrade diagnostic performance and make gear fault diagnosis (GFD) increasingly challenging. In this paper, a novel model parameter transfer (NMPT) method is proposed to boost the performance of GFD under varying working conditions. Building on the previous transfer strategy that controls the empirical risk of the source domain, this method further integrates the strengths of multi-task learning with the idea of transfer learning (TL). It acquires transferable knowledge by minimizing the discrepancies of separating hyperplanes between one specific working condition (target domain) and another (source domain), and then transfers both commonality and specialty parameters over tasks, so that source domain samples can assist the target GFD task when sufficient labeled samples from the target domain are unavailable. For NMPT implementation, insufficient target domain features and abundant source domain features with supervised information are fed into the NMPT model to train a robust classifier for the target GFD task. Experiments indicate that NMPT is a valuable technology for boosting practical GFD performance under various working conditions. The proposed method provides a transfer-learning-based framework to handle the problem of insufficient training samples in the target task caused by variable operating conditions.
Introduction
Gears have been used extensively in transmission systems due to their large velocity ratio, strong bearing capacity, compactness and high efficiency [1‒4]. Gear fault diagnosis (GFD) has therefore become one of the most important research hotspots in both industrial and academic communities for ensuring the safe and efficient operation of gear transmission systems. With the development of sensing methods (e.g., vibration, rotor speed, acoustic signal and others), data-driven methods [5, 6], which analyze measured data without requiring a deep understanding of the mechanical drive system, have become increasingly attractive and have proved valid in the field of gear fault diagnosis. Generally, a data-driven method involves two steps: (1) constructing a classification model based on sampled data, and (2) using the well-trained model to predict the mechanical fault type. In many existing studies, the fault diagnostic task is treated as a pattern recognition problem, which is usually composed of two technical processes: (1) feature extraction, and (2) fault recognition. The purpose of feature extraction is to obtain low-dimensional fault descriptors from high-dimensional vibration data. Many advanced signal processing methods have been proposed to provide recognizable features, such as wavelet transform (WT) [7], principal component analysis (PCA) [8], singular value decomposition (SVD) [9], empirical mode decomposition (EMD) [10], etc. Then, conventional machine learning methods (e.g., extreme learning machine, support vector machine and neural network) are employed to build a gear fault diagnostic model. However, these conventional methods usually work for GFD under constant speed and load conditions, and thus generalize poorly when facing variable working conditions.
In practice, gears often work under time-varying operating conditions. For example, the running states of gas turbines or wind power generators change frequently, and the operating parameters of the planetary gearbox vary correspondingly, so the features extracted in one time period might differ from those in the next. More importantly, conventional machine learning methods require the training data and test data to be independent and identically distributed (IID) to be effective. Recently, these problems have attracted intensive attention from researchers. For example, Song et al. [11] developed a new singular value decomposition interpolation (SVDI) based signal processing method, in which the time-domain and frequency-domain characteristic matrices extracted from vibration signals under discrete working conditions were first decomposed into singular vectors, rotation matrices and characteristic means with SVD, and these three parts were then interpolated to reconstruct the target eigenmatrix for data augmentation. Han et al. [12] utilized empirical mode decomposition (EMD) to decompose vibration signals into several intrinsic mode functions (IMFs), and extracted feature vectors consisting of time-domain indexes, frequency-domain indexes, an energy-domain characteristic parameter and the fractal box dimension from the selected IMFs, in order to capture the dynamic features of the vibration signal accurately and improve the robustness of feature vectors under different loads for GFD. Meanwhile, Zhao et al. [13] designed a synchrosqueezing transform (SST) and deep convolutional neural network (DCNN) based method for gearbox fault classification under varying operating conditions, where a new index, the envelope time-frequency representation (TFR), was calculated using SST, and a DCNN was then adopted to learn underlying features of the TFRs and determine the fault type of the planetary gearbox automatically. In general, most of these methods achieve good results by exploring advanced feature extraction methods or building a complex network classifier, but they normally rely on a sufficient labeled training dataset, and their performance can degrade when data are insufficient. In many real-world applications, however, only a small number of labeled training samples are available, which greatly hinders the adoption of these methods.
Therefore, how to train a robust and accurate model with limited labeled data is important. Recently, transfer learning (TL), a fast-growing field of machine learning, has emerged owing to its knowledge transfer ability [14]. Fortunately, although the amount of labeled target data (termed the target domain, TD) may be small, plenty of relevant data can still be obtained in the machine industry from another time period (e.g., under another speed and load) or from adjacent components (termed the source domain, SD). By utilizing TL, useful information can be extracted from existing or previous tasks to boost the learning efficiency of the target task. Model parameter transfer (MPT), one of the transfer learning architectures, is an effective tool for transferring shared parameters or prior distributions of hyperparameters. Most of these approaches are designed for multi-task learning (MTL). For example, Lawrence et al. [15] succeeded in learning parameters from multiple tasks through a shared Gaussian process (GP) prior. Bonilla et al. [16] proposed a GP-based model to learn the shared model knowledge over tasks. Schwaighofer et al. [17] succeeded in learning multiple tasks by combining the hierarchical Bayesian (HB) framework and GP. Besides, Evgeniou et al. [18] proposed a new algorithm, drawing on the HB idea, to solve multi-task learning in the framework of the support vector machine (SVM). All these methods can be easily modified for TL. Strictly speaking, MTL tries to learn different tasks jointly and simultaneously, while TL aims to improve the performance of the TD task with the help of knowledge extracted and stored from SD data. A comparison between MTL and TL is shown in Figure 1. Intuitively, we may minimize the difference in the parameters of the classification hyperplane between TD and SD to transfer the knowledge obtained from SD, so that a robust GFD model with better performance in TD can be obtained.
According to the above analysis, a novel model parameter transfer (NMPT) approach is developed to assist target gear fault identification using source domain data. It aims at extracting and transferring the shared characteristic parameters of the hyperplane to address the problems of insufficient labeled training samples and non-IID data between source and target domains. Specifically, on the basis of controlling the empirical risk of the source domain, the proposed method further integrates the advantages of conventional MPT and TL: (a) the least squares support vector machine (LSSVM) based MPT can characterize the shared and domain-specific parameters over tasks; and (b) the idea of TL is introduced to extract transferable knowledge and to minimize the distributional discrepancies between source and target domains. The novelties and main contributions of this paper can be summarized as:

Based on controlling the empirical risk of source domain features in the LSSVM framework, an improved TL model is proposed by further minimizing the discrepancies of separating hyperplanes between source and target domains, and then transferring both shared and domain-specific parameters over tasks to make use of source domain data to assist the target diagnostic task;

The model parameter transfer idea is innovatively introduced to the area of gear fault diagnosis, which provides a new idea for gear fault diagnosis under variable working conditions, especially when sufficient training data from target domain are not available.
The rest of this paper is organized as follows. In Section 2, the theoretical background is briefly presented. Section 3 concentrates on introducing details of the proposed NMPT method and then gives the whole framework of GFD. Section 4 illustrates the experimental study and proves that NMPT can achieve good results in GFD under variable working conditions. Finally, some conclusions drawn from this paper are listed in Section 5.
Theoretical Background
This study builds the NMPT model under the LSSVM framework for GFD. Therefore, in this section, the fundamental theory of LSSVM as well as its extension to MTL is briefly reviewed.
Least Squares Support Vector Machine (LSSVM)
First, the basic principle of training an SVM-based model for a classification problem is to find the optimal separating hyperplane (f = w*φ(x) + b) in a reproducing kernel Hilbert space (RKHS) [19]. According to the structural risk minimization (SRM) principle, the optimal w and b can be obtained by minimizing the following function:
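A common way to write this objective, using the symbols defined in the next sentence, is given here. The exact placement of the constant C is an assumption, since conventions vary between references:

\[ \mathop {\min }\limits_{{{\varvec{w}},b}} \; R({\varvec{w}},b) = \frac{1}{2}\left\| {\varvec{w}} \right\|^{2} + C \cdot R_{emp} \]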
where C is a positive real regularization parameter, w is the weight vector defining the orientation of the separating hyperplane, R represents the structural risk, and R_{emp} denotes the loss function that controls the error of the separating hyperplane f on the training data; different choices of R_{emp} lead to different forms of SVMs. By utilizing the squared error function, the SRM problem in LSSVM is to compute the optimal separating hyperplane from the feature vector x and its label y∈{−1,+1} by minimizing the following function subject to a constraint, which can be formulated as:
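In the standard LSSVM formulation this problem takes the following form. The factor C/2 on the error term is an assumption; some references scale or name this constant differently:

\[ \mathop {\min }\limits_{{{\varvec{w}},b,e}} \; J({\varvec{w}},e) = \frac{1}{2}\left\| {\varvec{w}} \right\|^{2} + \frac{C}{2}\sum\limits_{i = 1}^{N} {e_{i}^{2} } ,\quad {\text{s.t.}}\;\; y_{i} \left( {{\varvec{w}}^{\text{T}} \varphi ({\varvec{x}}_{i} ) + b} \right) = 1 - e_{i} ,\;\; i = 1,2, \cdots ,N \]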
where e_{i} is the error variable, φ(·) denotes a transform function that maps the input features x into the RKHS, b is a bias term, and N indicates the total number of training samples. A classification hyperplane f = w*φ(x) + b is then constructed for this task.
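To make the LSSVM training step concrete, the following minimal sketch solves the dual linear system that results from the standard LSSVM formulation with an RBF kernel. The function names (`rbf_kernel`, `lssvm_fit`, `lssvm_predict`) and the default values of `C` and `gamma` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def lssvm_fit(X, y, C=10.0, gamma=1.0):
    """Solve the standard LSSVM dual system for multipliers a and bias b.

    System (assumed standard form):
        [ 0   y^T           ] [ b ]   [ 0 ]
        [ y   Omega + I / C ] [ a ] = [ 1 ]
    with Omega[i, j] = y_i * y_j * K(x_i, x_j).
    """
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]  # multipliers a, bias b


def lssvm_predict(X_train, y_train, a, b, X_new, gamma=1.0):
    """Decision rule sign(sum_i a_i * y_i * K(x, x_i) + b)."""
    K = rbf_kernel(X_new, X_train, gamma)
    return np.sign(K @ (a * y_train) + b)
```

Because the squared loss turns the inequality constraints of the classical SVM into equalities, training reduces to one call to `np.linalg.solve` rather than a quadratic program.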
Multi-Task LSSVM (MTLSSVM)
Given m learning tasks, MTL aims to learn all tasks simultaneously rather than individually. For each task i ∈ {1, 2, …, m}, we have n_{i} training samples \(\left\{ {{\varvec{x}}_{i,j} ,y_{i,j} } \right\}_{j = 1}^{{n_{i} }}\), thus the total number of training samples is \(N = \sum\nolimits_{i = 1}^{m} {n_{i} }\).
Based on the regularization framework and hierarchical Bayesian framework, some researchers assumed that all w_{i} can be rewritten as w_{i} = w_{0} + v_{i}, where w_{0} (playing the role of mean vector) and v_{i} carry the information of commonality and specialty over tasks [20, 21], respectively. That is to say, when m learning tasks are analogous to each other, the vectors v_{i} tend to be “small”, otherwise, the vector w_{0} tends to be “small”. To this end, the following optimization problem which is similar to LSSVM for single task is solved to estimate all v_{i} as well as w_{0} simultaneously:
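A plausible form of this problem, written with the symbols defined in the next sentence, is sketched here. The scaling of λ by 1/m and the factor 1/2 on each term are assumptions that follow a common multi-task LSSVM convention:

\[ \min \; J({\varvec{w}}_{0} ,{\varvec{v}}_{i} ,{\varvec{e}}_{i} ) = \frac{1}{2}\left\| {{\varvec{w}}_{0} } \right\|^{2} + \frac{\lambda }{2m}\sum\limits_{i = 1}^{m} {\left\| {{\varvec{v}}_{i} } \right\|^{2} } + \frac{C}{2}\sum\limits_{i = 1}^{m} {{\varvec{e}}_{i}^{\text{T}} {\varvec{e}}_{i} } ,\quad {\text{s.t.}}\;\; {\varvec{Z}}_{i}^{\text{T}} ({\varvec{w}}_{0} + {\varvec{v}}_{i} ) + b_{i} {\varvec{y}}_{i} = {\varvec{1}}_{{n_{i} }} - {\varvec{e}}_{i} ,\;\; i = 1,2, \cdots ,m \]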
where C and λ are positive real regularization parameters, \({\varvec{b}} = \left\{ {b_{1} ,\;b_{2} , \cdots ,\;b_{m} } \right\}^{\text{T}} ,\) \({\varvec{e}}_{i} = \left\{ {e_{i,1} ,\;e_{i,2} , \cdots ,\;e_{{i,n_{i} }} } \right\}^{\text{T}} ,\) \({\varvec{Z}}_{i} = \left\{ {\varphi ({\varvec{x}}_{i,1} )y_{i,1} ,\;\varphi ({\varvec{x}}_{i,2} )y_{i,2} , \cdots ,\;\varphi ({\varvec{x}}_{{i,n_{i} }} )y_{{i,n_{i} }} } \right\},\) \({\varvec{y}}_{i} = \left\{ {y_{i,1} ,\;y_{i,2} , \cdots ,\;y_{{i,n_{i} }} } \right\}^{\text{T}} .\)
These previous LSSVM and MTLSSVM formulations are not designed for a target task that suffers from insufficient training data or non-IID training and testing data. Nevertheless, it is valuable to derive useful information from these existing models to enhance the TD task. Therefore, different from single-task learning and multi-task learning, the proposed NMPT utilizes SD data (related to but different from TD) to solve target domain problems with a specific structure, which is introduced in the following section.
Proposed NMPT Framework for GFD
The proposed NMPT method via transferring the knowledge of classification hyperplane from SD to TD is presented in this section.
Basic Definition
Given SD and TD, the main purpose of NMPT can be described as follows: under the LSSVM framework, NMPT aims to improve the performance of the TD classification model f_{t} = w_{t}*x_{t} + b_{t} by using the knowledge from the source domain classification model f_{s} = w_{s}*x_{s} + b_{s}, where SD and TD are different but similar in some aspects. In addition, the training data are set as follows:
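From the symbol definitions in the next sentence, the two labeled sets can be written as follows (a reconstruction from those definitions, not a verbatim copy of the original equation):

\[ D_{s} = \left\{ {({\varvec{x}}_{j}^{s} ,y_{j}^{s} )} \right\}_{j = 1}^{Ns} ,\quad D_{t} = \left\{ {({\varvec{x}}_{i}^{t} ,y_{i}^{t} )} \right\}_{i = 1}^{Nt} \]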
where Ds and Dt are the SD and TD labeled data, respectively; \({\varvec{x}}_{j}^{s}\), \(y_{j}^{s}\) denote the jth feature vector and corresponding label of the SD data; \({\varvec{x}}_{i}^{t}\), \(y_{i}^{t}\) denote the ith feature vector and corresponding label of the TD data; Ns and Nt represent the numbers of SD and TD samples, respectively. In this paper, Nt ≪ Ns.
NMPT Architecture
In this section, the proposed NMPT approach is discussed. As mentioned above, the method mainly utilizes the labeled data from SD and TD to solve the target GFD problem. First, inspired by the multi-task LSSVM framework [21, 22], we assume that the parameters w_{t} and w_{s} from both tasks can each be separated into two parts:
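Following the definitions in the next sentence and the decomposition w_{i} = w_{0} + v_{i} used for MTLSSVM, this separation reads:

\[ {\varvec{w}}_{t} = {\varvec{w}}_{0} + {\varvec{v}}_{t} ,\quad {\varvec{w}}_{s} = {\varvec{w}}_{0} + {\varvec{v}}_{s} \]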
where w_{0} is the shared parameter, and v_{s} and v_{t} are the domain-specific parameters of the SD and TD tasks, respectively. Then, based on the previous transfer strategy that controls the empirical risk of the source domain, we want to find the knowledge in w_{s} and further transfer it to w_{t}. As enough training data can prevent the model from overfitting, the parameter w_{0} from w_{s} is set as one piece of transferred knowledge. In addition, by minimizing the term μ‖v_{t} − v_{s}‖^{2} during the optimization process, we can also recognize and apply the knowledge of v_{s} learned from SD. Hence, to achieve this goal, an extension of LSSVM to the transfer learning case is built as follows:
where w_{0} and μ‖v_{t} − v_{s}‖^{2} are the transfer learning terms, and Cs, Ct, λ and μ are positive real regularization parameters. A diagram of NMPT is presented in Figure 2.
As few labeled target training samples tend to degrade the corresponding classification model, the decision boundary with parameter w_{t} from the target task could suffer from this problem. However, by utilizing the knowledge of w_{s} from the source domain, the NMPT architecture can ensure a relatively small generalization error on the target domain by focusing on the following goals: (1) learning a more accurate w_{0} for the target domain; (2) reducing the difference between model parameters by minimizing μ‖v_{t} − v_{s}‖^{2} (see the purple line in Figure 2). These two goals make the source domain model applicable to the target domain and ensure the leading role of Dt in building the classification model for the target task. In addition, by comparing Eq. (2) with Eq. (6), we find that the NMPT model tries to make the separating hyperplane of SD suitable for the TD classification task in two ways on the basis of the SRM principle: one is to minimize the margin discrepancies of training data between SD and TD to adjust the separating hyperplane, and the other is to control the loss function on SD data simultaneously. Together, these two improvements promote good generalization on TD.
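The effect of the coupling term μ‖v_{t} − v_{s}‖^{2} can be illustrated with a small primal sketch. It is not the paper's solver: it assumes linear features instead of a kernel, and it assumes an objective of the form 1/2‖w_{0}‖^{2} + λ/2(‖v_{t}‖^{2} + ‖v_{s}‖^{2}) + μ/2‖v_{t} − v_{s}‖^{2} + Ct/2 Σe_{t}^{2} + Cs/2 Σe_{s}^{2}, built only from the terms named in the text. The function name `nmpt_linear_fit` and all default values are hypothetical.

```python
import numpy as np


def nmpt_linear_fit(Xt, yt, Xs, ys, Ct=10.0, Cs=1.0, lam=1.0, mu=1.0):
    """Fit (w0, v_t, v_s, b_t, b_s) for an ASSUMED linear NMPT-style objective.

    With labels in {-1, +1}, e = 1 - y * f(x) gives e^2 = (y - f(x))^2, so the
    assumed objective is a regularized least-squares problem that can be
    solved by stacking data rows and penalty rows into one system.
    """
    d = Xt.shape[1]
    nt, ns = len(yt), len(ys)
    I, Z = np.eye(d), np.zeros((d, d))
    z2 = np.zeros((d, 2))
    # Parameter layout: [w0 (d), v_t (d), v_s (d), b_t, b_s]
    rows_t = np.sqrt(Ct) * np.hstack(
        [Xt, Xt, np.zeros((nt, d)), np.ones((nt, 1)), np.zeros((nt, 1))])
    rows_s = np.sqrt(Cs) * np.hstack(
        [Xs, np.zeros((ns, d)), Xs, np.zeros((ns, 1)), np.ones((ns, 1))])
    penalties = np.vstack([
        np.hstack([I, Z, Z, z2]),                    # ||w0||^2
        np.sqrt(lam) * np.hstack([Z, I, Z, z2]),     # lam * ||v_t||^2
        np.sqrt(lam) * np.hstack([Z, Z, I, z2]),     # lam * ||v_s||^2
        np.sqrt(mu) * np.hstack([Z, I, -I, z2]),     # mu * ||v_t - v_s||^2
    ])
    A = np.vstack([rows_t, rows_s, penalties])
    rhs = np.concatenate([np.sqrt(Ct) * yt, np.sqrt(Cs) * ys, np.zeros(4 * d)])
    theta, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    w0, vt, vs = theta[:d], theta[d:2 * d], theta[2 * d:3 * d]
    return w0, vt, vs, theta[3 * d], theta[3 * d + 1]


def nmpt_linear_predict(X, w0, vt, bt):
    """Target-domain decision rule sign((w0 + v_t) . x + b_t)."""
    return np.sign(X @ (w0 + vt) + bt)
```

In this sketch, raising `mu` pulls v_{t} toward v_{s}, so the few target samples mainly adjust the bias and a small correction to a hyperplane that is otherwise learned from the abundant source samples.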
Then, the solving process of the NMPT optimization problem (see Eq. (6)) is as follows:
First, the Lagrangian function for Eq. (6) is built as:
where a_{i} is a Lagrange multiplier. Then, according to Karush–Kuhn–Tucker (KKT) conditions, the solutions for optimality are yielded as:
where v_{t} and v_{s} can be derived as:
By eliminating w_{0}, v_{t}, v_{s} and e_{i} through substitution, one linear system can be obtained as follows:
where \({\varvec{a}} = \left[ {a_{1} ,a_{2} , \cdots ,a_{Nt} ,a_{Nt + 1} , \cdots ,a_{Nt + Ns} } \right]^{\text{T}} ,\) \({\varvec{b}} = \left[ {b_{t} ,b_{s} } \right]^{\text{T}} ,\) \({\varvec{Y}}_{1} = [y_{1}^{t} ,y_{2}^{t} , \cdots ,y_{Nt}^{t} ,y_{1}^{s} ,y_{2}^{s} , \cdots ,y_{Ns}^{s} ],\) \({\varvec{I}} = [1,1, \cdots ,1]_{(Nt + Ns) \times 1} ,\) \({\varvec{0}} = \left[ {0,0} \right],\) Y = blockdiag(y_{s}, y_{t}), \({\varvec{y}}_{t} = [y_{1}^{t} ,y_{2}^{t} , \cdots ,y_{Nt}^{t} ]^{\text{T}} ,\) \({\varvec{y}}_{s} = [y_{1}^{s} ,y_{2}^{s} , \cdots ,y_{Ns}^{s} ]^{\text{T}} ,\) Ω is an (Nt + Ns) × (Nt + Ns) symmetric matrix \({\varvec{\varOmega}}{ = }\Omega_{0} + \Omega_{1} + \frac{1}{C}{\varvec{I}}_{Nt + Ns} ,\) Ω_{1} = blockdiag(Ω_{t}, Ω_{s}), K represents the kernel function, and the elements of Ω are defined as:
The best fit values of parameters a, b_{t} and b_{s} can be finally worked out, then the corresponding decision function can be constructed as follows:
Complete Process of NMPT Model for Gear Fault Diagnosis
In the proposed framework, an intrinsic time-scale decomposition (ITD) architecture is first introduced to decompose a vibration signal into a set of proper rotation components (PRCs). Then, the energy parameter of each proper rotation component (PRC) is calculated to conduct dimensionality reduction and construct feature vectors. By structuring and solving the optimization problem of NMPT (see Eq. (6)) using the learned fault representations, the parameters of the NMPT model (including w_{0}, v_{s}, v_{t}, b_{s} and b_{t}) can be learned simultaneously. Finally, the target data are fed into NMPT to output the predicted fault categories. Figure 3 gives the overall proposed framework for NMPT-based GFD.
Experiment and Discussion
Descriptions of Experimental Simulator and Datasets
To conduct experimental verification, the testing platform, a drivetrain dynamics simulator (DDS), is shown in Figure 4. It includes a driving motor, speed regulator, planetary gearbox, reduction gearbox, brake device and brake regulator. During data collection, different speeds and loads are set through the speed regulator and brake regulator, respectively. There are altogether 7 vibration sensors (model: 608A11, sampling frequency: 5120 Hz) in the structure: one is mounted on the surface of the motor to measure the z-axis vibration signal of the motor (F1), three are mounted on the planetary gearbox (F2) and three on the reduction gearbox (F3). Besides the healthy gear (Healthy, C1), there are four different types of gear faults: a small piece of material breaking away from a tooth (Chipped, C2), a tooth fracturing at the root (Missing, C3), cracks emerging at the tooth root (Cracked, C4) and loss of material from the contacting surface of a tooth (Worn, C5). The descriptions of fault types and different experiment conditions are shown in Table 1.
Experimental Results and Analysis
Feature Extraction
Intrinsic time-scale decomposition (ITD), proposed by Frei et al. [23], is a time-frequency analysis method that can adaptively decompose a given vibration signal X into a series of proper rotation components (PRCs) and a monotonic trend signal (remaining baseline signal) with low end effects and high efficiency, which can be described as:
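Using the symbols defined in the next sentence, the decomposition can be written as the sum of the PRCs plus the residual baseline (reconstructed from those definitions):

\[ X = \sum\limits_{i = 1}^{p} {H^{i} } + L^{p} \]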
where p denotes the final decomposition level, H^{i} is the ith PRC, L^{p} is the remaining baseline signal.
Nevertheless, the PRCs obtained with ITD are too complex to be used directly as input feature vectors for fault classification. Thus, the energies of the first six levels of PRCs are calculated to reduce the dimensionality of the PRCs and to construct the fault features.
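The energy feature step can be sketched as follows, assuming the PRCs have already been produced by some ITD implementation and stacked row-wise in an array. The function name `prc_energy_features` is hypothetical, and the optional normalization by total energy is an assumption; the text does not say whether the energies are normalized.

```python
import numpy as np


def prc_energy_features(prcs, n_levels=6, normalize=True):
    """Return one energy value per PRC for the first `n_levels` components.

    prcs: array of shape (levels, samples), one proper rotation component
          per row (assumed to come from a separate ITD routine).
    """
    prcs = np.asarray(prcs, dtype=float)
    if prcs.ndim != 2 or prcs.shape[0] < n_levels:
        raise ValueError("expected at least n_levels PRCs, one per row")
    # Energy of a discrete component: sum of squared sample values.
    energies = np.sum(prcs[:n_levels] ** 2, axis=1)
    if normalize:
        total = energies.sum()
        if total > 0:
            energies = energies / total  # assumption: relative energies
    return energies
```

Each signal segment is thus reduced from thousands of sample points to a six-dimensional feature vector, which is the form a kernel classifier can consume directly.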
Experimental Study
In this part, the diagnostic performance of the proposed NMPT is first analyzed. Then, to further demonstrate the superiority of NMPT, it is compared with the following methods:

LSSVM (non-transfer): Least squares support vector machine;

MTLSSVM (non-transfer): Multi-task LSSVM;

TCA [24]: Transfer component analysis;

DSM [25]: Domain selection machine;

ELSSVM [26]: Enhanced LSSVM.
For a fair comparison, all kernel-based methods use the radial basis function (RBF) as the kernel function. In this study, 2000 sampled data points of the original vibration signal under each specific working condition were fed into the ITD model for feature extraction. In both the source and target domains, each gear fault category contains 200 samples under any chosen working condition. The datasets for the experiments are set as follows: for LSSVM, 10 samples of each fault type are selected from the target domain; for MTLSSVM and the transfer strategies, both the aforesaid 10 target domain samples and 100 source domain samples of each fault type are used. Moreover, 100 testing samples of each fault type from the target domain are used, with no overlap between training and testing samples in the target domain. Therefore, with five gear conditions (C1 to C5), the total size of the training set is 50 for LSSVM and 550 for the other methods; the total size of the testing set is 500. In order to quantitatively describe the domain differences, the Kullback-Leibler (KL) divergence is calculated by:
where KL(· ‖ ·) represents the KL divergence between Ds and Dt. Table 2 shows the descriptions of datasets (from DA1 to DA10) as well as their corresponding KL divergences. The KL indexes of all the datasets are larger than zero, which means that differences do exist between SD and TD. Signals that come from the same axis have relatively small KL divergence compared with those from different axes (e.g., transferring among different rotating speeds: DA1/DA3/DA4 vs DA2; different loads: DA5/DA7/DA8 vs DA6). Meanwhile, the KL divergence of nonadjacent mechanical components is larger than that of adjacent ones (DA10 vs DA9).
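One simple way to estimate such a divergence from two sets of feature values is a histogram-based KL estimate, sketched here. This is an illustrative estimator under stated assumptions (one-dimensional inputs, shared bin edges, a small smoothing constant `eps`); the paper's exact estimator is not given in the text, and the function name `kl_divergence_hist` is hypothetical.

```python
import numpy as np


def kl_divergence_hist(source, target, bins=20, eps=1e-10):
    """Histogram estimate of KL(source || target) for 1-D samples."""
    source = np.asarray(source, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    lo = min(source.min(), target.min())
    hi = max(source.max(), target.max())
    if hi == lo:
        return 0.0  # both samples are one identical constant value
    edges = np.linspace(lo, hi, bins + 1)  # shared bins for both samples
    p, _ = np.histogram(source, bins=edges)
    q, _ = np.histogram(target, bins=edges)
    # Smooth empty bins so the logarithm stays finite, then normalize.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))
```

The estimate is zero when both samples fall into the bins identically and grows as the two distributions separate, which matches the way the KL index is used above to rank domain differences.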
First, Figures 5, 6, 7 and 8 visualize the separating hyperplanes on four source domain datasets with three different fault types, covering varying speeds (DA3), changing loads (DA7) and adjacent mechanical parts (DA9 and DA10), to show the effectiveness of NMPT in minimizing the discrepancies of classification hyperplanes between SD and TD caused by operating conditions. Here, all datasets share the same target domain. Comparing the original classification hyperplanes in Figure 5(a), Figure 6(a), Figure 7(a), Figure 8(a) and Figure 9 shows that different working conditions produce different hyperplanes, which could easily cause erroneous diagnoses on the target task when source domain samples are used directly as auxiliary training data. In contrast, NMPT tries to generalize the distinguishing ability from the source domain to the target domain, as shown in Figure 5(b), Figure 6(b), Figure 7(b) and Figure 8(b). Among them, Figure 5(b) and Figure 6(b) show similar results, which indicates that the proposed model is relatively more robust when transferring from source domains with different speeds or loads than from adjacent mechanical components.
Then, the performance of the NMPT strategy for GFD on DA1 to DA10 is presented as confusion matrices in Figures 10, 11, and 12. In each confusion matrix, the rows and columns show the actual and predicted fault types, respectively. The diagnostic accuracies of each fault type are shown in the diagonal cells, and the misclassification rates are listed in the off-diagonal cells. From Figures 10, 11, 12 and Table 2, we can find that:
(1) Even though there exist relatively large domain differences between SD and TD in some datasets (e.g., DA9 and DA10), the NMPT model can still learn a precise classifier for the target task (e.g., Figure 12(a) and (b));
(2) The NMPT model shows very similar GFD accuracies among varying loads (from DA5 to DA8), and a similar conclusion holds for changing speeds (from DA1 to DA4), which verifies the robustness of NMPT to sensor axis factors. Meanwhile, the best performance of NMPT under different loads occurs with different sensor axes (DA6), whereas transferring along the same axis improves performance in the cases of varying rotating speeds (DA1 & DA3);
(3) The best classification performance occurs in the cases where source and target data come from the same gearbox (from DA1 to DA8); among them, the best classification accuracy of NMPT reaches 98.8% (DA1 & DA3). Besides, using motor data to assist the fault recognition of the reduction gearbox performs worse than transferring between the reduction gearbox and the planetary gearbox;
(4) Comparing the accuracy and error rates across all datasets shows that many factors can affect the model performance, among which the mechanical component that contributes the source data is the most crucial.
In general, the classification accuracy of NMPT is always over 94%. Therefore, the NMPT model can avoid overfitting in GFD under various working conditions by making reasonable use of abundant labeled data from another working condition or from adjacent components.
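The confusion matrices discussed above, with per-class accuracies on the diagonal and misclassification rates elsewhere, can be computed from predicted labels with a few lines of code. The sketch below assumes the classes C1 to C5 are encoded as integers 0 to 4; the function names are hypothetical.

```python
import numpy as np


def confusion_matrix_rates(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix: rows are actual, columns predicted."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no samples are left as zeros instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the actual label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))
```

With 100 test samples per class, as in the experiments above, each row of the normalized matrix can be read directly as percentages for one fault type.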
After investigating the classification performance of the NMPT method on all datasets, it is still meaningful to compare NMPT with other methods. Table 3 lists the comparison results from DA1 to DA10, which are calculated over all categories. Among them, the classification performance of the LSSVM model is the lowest, mainly for two reasons: (a) the LSSVM model is trained using only the insufficient target domain samples, which inevitably hinders the generalization performance according to the principle of structural risk minimization; and (b) the standard LSSVM model lacks knowledge transfer among domains, while NMPT can make the best use of source domain samples to improve the diagnostic model for the target task. Compared with the other models, NMPT achieves the highest accuracy on all datasets (with the highest diagnostic accuracy of 98.8%), which demonstrates the superiority of NMPT in utilizing source domain signals to assist GFD in the target domain and provides a practical method for improving GFD performance.
Conclusions

(1) For GFD problems under variable working conditions, an NMPT-based strategy is presented, which utilizes ITD to construct fault features for model parameter transfer. Experimental results indicate that the proposed method can achieve 97.16% diagnostic precision when the energies of the first six levels of PRCs are used as feature vectors.
(2) The visualization results verify that NMPT can generalize the distinguishing ability from the source domain to the target domain, which is beneficial for GFD under various working conditions.
(3) With regard to diagnostic performance, the NMPT model shows strong robustness under different working conditions. Meanwhile, the influence of working conditions on the GFD results is ordered as: rotating speed < load < location.
(4) The proposed model parameter transfer strategy shows better performance than other popular methods, because NMPT can further minimize the discrepancy between the two decision boundaries over tasks. Thus, the proposed strategy is expected to be an effective and feasible tool for solving GFD problems with few labeled target training data.
(5) In the future, we could explore the relationships between the KL indicator, working condition factors and GFD results to improve the universality of the NMPT model.
Abbreviations
GFD: Gear fault diagnosis
MPT: Model parameter transfer
ITD: Intrinsic time-scale decomposition
LSSVM: Least squares support vector machine
MTLSSVM: Multi-task LSSVM
DDS: Drivetrain dynamics simulator
References
 [1]
F Shen, C Chen, R Q Yan, et al. A fast multitasking solution: NMFtheoretic coclustering for gear fault diagnosis under variable working conditions. Chinese Journal of Mechanical Engineering, 2020, 33: 16.
 [2]
X H Jin, Y Sun, J H Shan, et al. Fault diagnosis and prognosis for wind turbines: An overview. Chinese Journal of Scientific Instrument, 2017, 38(5): 10411053. (in Chinese)
 [3]
L M Wang, Y M Shao. Crack fault classification for planetary gearbox based on feature selection technique and Kmeans clustering method. Chinese Journal of Mechanical Engineering, 2018, 31: 4.
 [4]
R N Liu, B Y Yang, E Zio, et al. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mechanical Systems and Signal Processing, 2018, 108: 3347.
 [5]
J Yu, Y He. Planetary gearbox fault diagnosis based on datadriven valued characteristic multigranulation model with incomplete diagnostic information. Journal of Sound and Vibration, 2018, 429: 6377.
 [6]
Z Gao, C Cecati, S X Ding. A survey of fault diagnosis and faulttolerant techniques—Part I: Fault diagnosis with modelbased and signalbased approaches. IEEE Transactions on Industrial Electronics, 2015, 62(6): 37573767.
 [7]
R Q Yan, R X Gao, X F Chen. Wavelets for fault diagnosis of rotary machines: A review with applications. Signal Processing, 2014, 96(PART A): 115.
 [8]
S J Deng, L W Tang, X T Zhang. Gear fault diagnosis based on an adaptive neighborhood incremental PCALPP manifold learning algorithm. Journal of Vibration and Shock, 2017, 36(14): 111132. (in Chinese)
 [9]
M Zeng, Y Yang, J S Cheng, et al. µSVD based denoising method and its application to gear fault diagnosis. Journal of Mechanical Engineering, 2015, 51(3): 95103. (in Chinese)
 [10]
S Park, S Kim, J Choi. Gear fault diagnosis using transmission error and ensemble empirical mode decomposition. Mechanical Systems and Signal Processing, 2018, 108: 262275.
 [11]
T Song, Y L Wang, M F Zhao, et al. Fault diagnosis for rotating machineries under variable operation conditions based on SVDI. Journal of Vibration and Shock, 2018, 37(19): 211216. (in Chinese)
 [12]
D Y Han, N Zhao, P M Shi. Gear fault feature extraction and diagnosis method under different load excitation based on EMD, PSOSVM and fractal box dimension. Journal of Mechanical Science and Technology, 2019, 33(2): 487494.
 [13]
D Z Zhao, T Y Wang, F L Chu. Deep convolutional neural network based planet bearing fault classification. Computers in Industry, 2019, 107: 5966.
 [14]
S J Pan, Q Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 13451359.
 [15]
N D Lawrence, J C Platt. Learning to learn with the informative vector machine. Proceedings of the 21th International Conference on Machine Learning, Banff, Alberta, Canada, July 48, 2004: 6572.
 [16]
E V Bonilla, K M A Chai, C K I Williams. Multitask Gaussian process prediction. Proceedings of the 22th Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 811, 2008: 153160.
 [17]
A Schwaighofer, V Tresp, K Yu. Learning Gaussian process kernels via hierarchical Bayes. Proceedings of the 18th Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 13–18, 2004: 1209–1216.
 [18]
T Evgeniou, M Pontil. Regularized multi-task learning. Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22–25, 2004: 109–117.
 [19]
L Chen, S Zhou. Sparse algorithm for robust LS-SVM in primal space. Neurocomputing, 2018, 275: 2880–2891.
 [20]
R Q Yan, F Shen, C Sun, et al. Knowledge transfer for rotary machine fault diagnosis. IEEE Sensors Journal, 2020, 20(15): 8374–8393.
 [21]
S Xu, X An, X Qiao, et al. Multi-task least-squares support vector machines. Multimedia Tools and Applications, 2014, 71(2): 699–715.
 [22]
C A Micchelli, M Pontil. Kernels for multi-task learning. Proceedings of the 18th Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 13–18, 2004: 921–928.
 [23]
M G Frei, I Osorio. Intrinsic time-scale decomposition: time–frequency–energy analysis and real-time filtering of non-stationary signals. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2007, 463(2078): 321–342.
 [24]
S J Pan, I W Tsang, J T Kwok, et al. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 2011, 22(2): 199–210.
 [25]
L X Duan, D Xu, S F Chang. Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, June 16–21, 2012: 1338–1345.
 [26]
C Chen, F Shen, R Q Yan. Enhanced least squares support vector machine-based transfer learning strategy for bearing fault diagnosis. Chinese Journal of Scientific Instrument, 2017, 38(1): 33–40. (in Chinese)
Acknowledgements
Not applicable.
Funding
Supported by the National Natural Science Foundation of China (Grant No. 51835009).
Author information
Contributions
RY and JX designed the experiment; CC and FS analyzed the data; all authors wrote and improved the paper. All authors read and approved the final manuscript.
Authors’ information
Chao Chen received his B.Sc. and M.Sc. degrees from Jiangsu University in 2011 and 2014, respectively. He is now pursuing his Ph.D. degree in the School of Instrument Science and Engineering, Southeast University. His main research interest is machine fault diagnosis.
Fei Shen received his B.Sc. and M.Sc. degrees from Southeast University in 2014 and 2016, respectively. He is now pursuing his Ph.D. degree in the School of Instrument Science and Engineering, Southeast University. His main research interest is machine fault diagnosis.
Jiawen Xu is currently an associate researcher in the School of Instrument Science and Engineering, Southeast University.
Ruqiang Yan received his B.Sc. and M.E. degrees from the University of Science and Technology of China in 1997 and 2002, respectively, and his Ph.D. degree from the University of Massachusetts, Amherst, in 2007. He is now a professor and Ph.D. supervisor at Xi’an Jiaotong University. His main research interests include machine condition monitoring and fault diagnosis, signal processing, and wireless sensor networks.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, C., Shen, F., Xu, J. et al. Model Parameter Transfer for Gear Fault Diagnosis under Varying Working Conditions. Chin. J. Mech. Eng. 34, 13 (2021). https://doi.org/10.1186/s10033-020-00520-9
Keywords
 Gear fault diagnosis
 Model parameter transfer
 Varying working conditions
 Least square support vector machine