Model Parameter Transfer for Gear Fault Diagnosis under Varying Working Conditions

Chen, Chao; Shen, Fei; Xu, Jiawen; Yan, Ruqiang

doi:10.1186/s10033-020-00520-9

Original Article
Open access
Published: 18 January 2021

Chao Chen¹, Fei Shen¹, Jiawen Xu¹ & Ruqiang Yan¹,² (ORCID: orcid.org/0000-0003-4341-6535)

Chinese Journal of Mechanical Engineering volume 34, Article number: 13 (2021)

Abstract

Gear fault diagnosis technologies have developed rapidly and have been implemented effectively in many engineering applications. However, varying working conditions degrade diagnostic performance and make gear fault diagnosis (GFD) increasingly challenging. In this paper, a novel model parameter transfer (NMPT) method is proposed to boost the performance of GFD under varying working conditions. Building on the previous transfer strategy that controls the empirical risk of the source domain, the method further integrates the strengths of multi-task learning with the idea of transfer learning (TL): it acquires transferable knowledge by minimizing the discrepancy between the separating hyperplanes of one specific working condition (target domain) and another (source domain), and then transfers both commonality and specialty parameters across tasks, so that source domain samples can assist the target GFD task when sufficient labeled samples from the target domain are unavailable. To implement NMPT, a small set of target domain features and abundant source domain features, both with supervised information, are fed into the NMPT model to train a robust classifier for the target GFD task. Experiments indicate that NMPT is a valuable technique for boosting practical GFD performance under various working conditions. The proposed method provides a transfer learning-based framework for handling the shortage of training samples in the target task caused by variable operating conditions.

1 Introduction

Gears are used extensively in transmission systems because of their large velocity ratio, strong bearing capacity, compactness and high efficiency [1‒4]. Gear fault diagnosis (GFD) has therefore become an important research topic in both industry and academia for ensuring the safe and efficient operation of gear transmission systems. With the development of sensing methods (e.g., vibration, rotor speed and acoustic signals), data-driven methods [5, 6], which analyze measured data without requiring a deep understanding of the mechanical drive system, have become increasingly attractive and have proved valid for gear fault diagnosis. A data-driven method generally involves two steps: (1) constructing a classification model from sampled data, and (2) using the trained model to predict the mechanical fault type. In much existing research, the fault diagnostic task is treated as a pattern recognition problem composed of two technical processes: (1) feature extraction, and (2) fault recognition. The purpose of feature extraction is to obtain low-dimensional fault descriptors from high-dimensional vibration data. Many advanced signal processing methods have been proposed to provide discriminative features, such as wavelet transform (WT) [7], principal component analysis (PCA) [8], singular value decomposition (SVD) [9] and empirical mode decomposition (EMD) [10]. Conventional machine learning methods (e.g., extreme learning machine, support vector machine and neural network) are then employed to build a gear fault diagnostic model. However, these conventional methods usually work for GFD under constant speed and load conditions, and thus generalize poorly under variable working conditions.
In practice, gears often work under time-varying operating conditions. For example, the running states of gas turbines or wind power generators change frequently, and the operating parameters of the planetary gearbox vary correspondingly, so the features extracted in one time period may differ from those in the next. More importantly, conventional machine learning methods require the training data and test data to be independent and identically distributed (IID). These problems have recently attracted considerable attention. For example, Song et al. [11] developed a singular value decomposition interpolation (SVDI) based signal processing method, in which the time-domain and frequency-domain characteristic matrices extracted from vibration signals under discrete working conditions were first decomposed into singular vectors, rotation matrices and characteristic means with SVD, and these three parts were then interpolated to reconstruct the target eigenmatrix for data augmentation. Han et al. [12] used empirical mode decomposition (EMD) to decompose vibration signals into several intrinsic mode functions (IMFs), and extracted feature vectors consisting of time domain indexes, frequency domain indexes, an energy domain characteristic parameter and the fractal box dimension from the selected IMFs, in order to capture the dynamic features of the vibration signal accurately and improve the robustness of the feature vectors under different loads. Meanwhile, Zhao et al. [13] designed a method based on the synchrosqueezing transform (SST) and a deep convolutional neural network (DCNN) for gearbox fault classification under varying operating conditions, in which a new index, the envelope time-frequency representation (TFR), was calculated using SST, and the DCNN was then used to mine the underlying features of the TFRs and determine the fault type of the planetary gearbox automatically. In general, most of these methods achieve good results by exploring advanced feature extraction methods or building a complex network classifier, but they normally rely on a sufficiently large labeled training dataset, and their performance can degrade when data are insufficient. In many real-world applications only a small number of labeled training samples are available, which greatly hinders the adoption of these methods.

Therefore, training a robust and accurate model with limited labeled data is important. Recently, transfer learning (TL), a fast-growing field of machine learning, has emerged because of its ability to transfer knowledge [14]. Fortunately, although the amount of labeled target data (termed the target domain, TD) may be small, plenty of relevant data can usually be obtained in industry from another time period (e.g., under another speed and load) or from adjacent components (termed the source domain, SD). With TL, useful information can be extracted from an existing or previous task to boost the learning efficiency of the target task. Model parameter transfer (MPT), one of the transfer learning architectures, is an effective tool for transferring shared parameters or prior distributions of hyperparameters. Most of these approaches were designed for multitask learning (MTL). For example, Lawrence et al. [15] learned parameters from multiple tasks through a shared Gaussian process (GP) prior. Bonilla et al. [16] proposed a GP-based model to learn shared model knowledge across tasks. Schwaighofer et al. [17] learned multiple tasks by combining a hierarchical Bayesian (HB) framework with GP. In addition, Evgeniou et al. [18] proposed an algorithm that draws on the HB idea to solve multitask learning in the framework of the support vector machine (SVM). All these methods can be easily modified for TL. Strictly speaking, MTL tries to learn different tasks jointly and simultaneously, while TL aims to improve the performance of the TD task with the help of knowledge extracted and stored from SD data. A comparison between MTL and TL is shown in Figure 1. Intuitively, minimizing the difference between the classification hyperplane parameters of TD and SD transfers the knowledge obtained from SD, so that a robust GFD model with better performance in TD can be obtained.

Based on the above analysis, a novel model parameter transfer (NMPT) approach is developed to assist target gear fault identification using source domain data. It aims to extract and transfer the shared characteristic parameters of the hyperplane in order to address insufficient labeled training samples and the non-IID relationship between the source and target domains. Specifically, on the basis of controlling the empirical risk of the source domain, the proposed method combines the advantages of conventional MPT and TL: (a) the least squares support vector machine (LSSVM) based MPT characterizes the shared and domain-specific parameters across tasks; and (b) the idea of TL is introduced to extract transferable knowledge and to minimize the distributional discrepancy between the source and target domains. The novelties and main contributions of this paper can be summarized as follows:

(1) Based on controlling the empirical risk of source domain features in the LSSVM framework, an improved TL model is proposed by further minimizing the discrepancy between the separating hyperplanes of the source and target domains, and then transferring both shared and domain-specific parameters across tasks so that source domain data can assist the target diagnostic task;
(2) The model parameter transfer idea is introduced to the area of gear fault diagnosis, providing a new approach for gear fault diagnosis under variable working conditions, especially when sufficient training data from the target domain are not available.

The rest of this paper is organized as follows. Section 2 briefly presents the theoretical background. Section 3 introduces the details of the proposed NMPT method and gives the whole framework of GFD. Section 4 describes the experimental study and shows that NMPT can achieve good results in GFD under variable working conditions. Finally, conclusions are given in Section 5.

2 Theoretical Background

This study builds the NMPT model under the LSSVM framework for GFD. Therefore, this section briefly reviews the fundamental theory of LSSVM and its extension to MTL.

2.1 Least Squares Support Vector Machine (LSSVM)

The basic principle of training an SVM-based model for a classification problem is to find the optimal separating hyperplane f = wᵀφ(x) + b in a reproducing kernel Hilbert space (RKHS) [19]. According to the structural risk minimization (SRM) principle, the optimal w and b can be obtained by minimizing the following function:

$$\min \, R = \frac{1}{2}\left\| {\varvec{w}} \right\|^{2} + C \times R_{{{\text{emp}}}} ,$$

(1)

where C is a positive real regularization parameter, w is the weight vector defining the orientation of the separating hyperplane, R represents the structural risk, and R_emp denotes the loss function that controls the error of the separating hyperplane f on the training data. Different choices of R_emp lead to different forms of SVM. With the squared error function, the SRM problem in LSSVM is to compute the optimal separating hyperplane from the feature vectors x and their labels y∈{−1,+1} by minimizing the following function subject to equality constraints:

$$\begin{array}{ll} \mathop {\min }\limits_{{{\varvec{w}},b,e}} & J({\varvec{w}},e) = \frac{1}{2}\left\| {\varvec{w}} \right\|^{2} + \frac{C}{2}\sum\limits_{i = 1}^{N} {e_{i}^{2} } , \\ {\text{s.t.}} & y_{i} \{ {\varvec{w}}^{\text{T}} \varphi ({\varvec{x}}_{i} ) + b\} = 1 - e_{i} ,\quad i = 1,2, \ldots ,N, \end{array}$$

(2)

where e_i is the error variable of the ith sample, φ(·) denotes a transform function that maps the input features x into the RKHS, b is a bias term, and N is the total number of training samples. The classification hyperplane f = wᵀφ(x) + b is then constructed for this task.
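The problem in Eq. (2) is usually solved in its dual form, where eliminating w and e leaves a single linear system in the bias b and the multipliers a. The following is a minimal NumPy sketch of that standard LSSVM solution for a binary problem; the function names, the RBF kernel width `gamma` and the default values are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def lssvm_fit(X, y, C=10.0, gamma=1.0):
    """Solve the LSSVM dual system for labels y in {-1, +1}.

    System: [[0, y^T], [y, Omega + I/C]] [b; a] = [0; 1],
    with Omega[i, j] = y_i * y_j * K(x_i, x_j).
    """
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, gamma)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = y
    M[1:, 0] = y
    M[1:, 1:] = omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(M, rhs)
    return sol[0], sol[1:]  # bias b, multipliers a

def lssvm_decision(X_train, y_train, a, b, X_new, gamma=1.0):
    """Decision value f(x) = sum_i a_i y_i K(x_i, x) + b; the class is sign(f)."""
    return rbf_kernel(X_new, X_train, gamma) @ (a * y_train) + b
```

Because the constraints are equalities, training reduces to one call to a linear solver instead of a quadratic program, which is the main practical difference from a standard SVM.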

2.2 Multi-Task LSSVM (MTLSSVM)

Given m learning tasks, MTL aims to learn all tasks simultaneously rather than individually. For each task i ∈ {1, 2, …, m}, we have n_i training samples $\left\{ {{\varvec{x}}_{i,j} ,y_{i,j} } \right\}_{j = 1}^{{n_{i} }}$, so the total number of training samples is $N = \sum\nolimits_{i = 1}^{m} {n_{i} }$.

Based on the regularization framework and the hierarchical Bayesian framework, some researchers assumed that every w_i can be written as w_i = w₀ + v_i, where w₀ (playing the role of a mean vector) carries the information common to all tasks and v_i carries the information specific to task i [20, 21]. In other words, when the m learning tasks are similar to each other, the vectors v_i tend to be “small”; otherwise, the vector w₀ tends to be “small”. The following optimization problem, which is similar to single-task LSSVM, is solved to estimate all v_i and w₀ simultaneously:

$$\begin{array}{ll} \mathop {\min }\limits_{{{\varvec{w}}_{0} ,{\varvec{v}}_{i} ,b_{i} ,{\varvec{e}}_{i} }} & J({\varvec{w}}_{0} ,\left\{ {{\varvec{v}}_{i} } \right\}_{i = 1}^{m} ,\left\{ {{\varvec{e}}_{i} } \right\}_{i = 1}^{m} ) = \frac{1}{2}\left\| {{\varvec{w}}_{0} } \right\|^{2} + \frac{1}{2} \times \frac{\lambda }{m}\sum\limits_{i = 1}^{m} {\left\| {{\varvec{v}}_{i} } \right\|^{2} } + \frac{C}{2}\sum\limits_{i = 1}^{m} {{\varvec{e}}_{i}^{{\text{T}}} {\varvec{e}}_{i} } , \\ {\text{s.t.}} & {\varvec{Z}}_{i}^{\text{T}} ({\varvec{w}}_{0} + {\varvec{v}}_{i} ) + b_{i} {\varvec{y}}_{i} = {\mathbf{1}}_{{n_{i} }} - {\varvec{e}}_{i} ,\quad i = 1,2, \cdots ,m, \end{array}$$

(3)

where C and λ are positive real regularization parameters, ${\varvec{b}} = \left[ {b_{1} ,\;b_{2} , \cdots ,\;b_{m} } \right]^{\text{T}}$, ${\varvec{e}}_{i} = \left[ {e_{i,1} ,\;e_{i,2} , \cdots ,\;e_{{i,n_{i} }} } \right]^{\text{T}}$, ${\varvec{Z}}_{i} = \left[ {\varphi ({\varvec{x}}_{i,1} )y_{i,1} ,\;\varphi ({\varvec{x}}_{i,2} )y_{i,2} , \cdots ,\;\varphi ({\varvec{x}}_{{i,n_{i} }} )y_{{i,n_{i} }} } \right]$ and ${\varvec{y}}_{i} = \left[ {y_{i,1} ,\;y_{i,2} , \cdots ,\;y_{{i,n_{i} }} } \right]^{\text{T}}$.
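As a worked step (a sketch that is not part of the original derivation), the stationarity conditions of Eq. (3) show how the shared and task-specific parts arise. Reading the constraint in column form, $\varvec{Z}_{i}^{\text{T}}(\varvec{w}_{0} + \varvec{v}_{i}) + b_{i}\varvec{y}_{i} = \mathbf{1}_{n_{i}} - \varvec{e}_{i}$, and introducing one multiplier vector $\varvec{\alpha}_{i}$ per task (a symbol assumed here), the Lagrangian and its stationary points are:

$$L = \frac{1}{2}\left\| \varvec{w}_{0} \right\|^{2} + \frac{\lambda}{2m}\sum\limits_{i=1}^{m} \left\| \varvec{v}_{i} \right\|^{2} + \frac{C}{2}\sum\limits_{i=1}^{m} \varvec{e}_{i}^{\text{T}}\varvec{e}_{i} - \sum\limits_{i=1}^{m} \varvec{\alpha}_{i}^{\text{T}}\left( \varvec{Z}_{i}^{\text{T}}(\varvec{w}_{0} + \varvec{v}_{i}) + b_{i}\varvec{y}_{i} - \mathbf{1}_{n_{i}} + \varvec{e}_{i} \right),$$

$$\frac{\partial L}{\partial \varvec{w}_{0}} = 0 \to \varvec{w}_{0} = \sum\limits_{i=1}^{m} \varvec{Z}_{i}\varvec{\alpha}_{i}, \quad \frac{\partial L}{\partial \varvec{v}_{i}} = 0 \to \varvec{v}_{i} = \frac{m}{\lambda}\varvec{Z}_{i}\varvec{\alpha}_{i}, \quad \frac{\partial L}{\partial b_{i}} = 0 \to \varvec{\alpha}_{i}^{\text{T}}\varvec{y}_{i} = 0, \quad \frac{\partial L}{\partial \varvec{e}_{i}} = 0 \to \varvec{\alpha}_{i} = C\varvec{e}_{i}.$$

Hence $\varvec{w}_{i} = \varvec{w}_{0} + \varvec{v}_{i} = \sum\nolimits_{k=1}^{m} \varvec{Z}_{k}\varvec{\alpha}_{k} + \frac{m}{\lambda}\varvec{Z}_{i}\varvec{\alpha}_{i}$: every task contributes to the shared part, only the samples of task i enter its specific part, and a larger λ shrinks $\varvec{v}_{i}$ so that the tasks are pushed toward a common hyperplane.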

LSSVM and MTLSSVM are not designed for a target task that suffers from insufficient training data or from a non-IID relationship between training and testing data. Nevertheless, it is valuable to derive useful information from these existing models to enhance the TD task. Therefore, unlike single-task learning and multitask learning, the proposed NMPT uses SD data (related to but different from TD) to solve target domain problems with a specific structure, which is introduced in the following section.

3 Proposed NMPT Framework for GFD

This section presents the proposed NMPT method, which transfers knowledge of the classification hyperplane from SD to TD.

3.1 Basic Definition

Given SD and TD, the main purpose of NMPT can be described as follows: under the LSSVM framework, NMPT aims to improve the performance of the TD classification model f_t = w_tᵀφ(x_t) + b_t by using knowledge from the source domain classification model f_s = w_sᵀφ(x_s) + b_s, where SD and TD are different but similar in some respects. The training data are defined as:

$$\begin{gathered} Ds = \left\{ {({\varvec{x}}_{j}^{s} ,y_{j}^{s} )} \right\},j = 1,2, \cdots ,Ns, \hfill \\ Dt = \left\{ {({\varvec{x}}_{i}^{t} ,y_{i}^{t} )} \right\},i = 1,2, \cdots ,Nt, \hfill \\ \end{gathered}$$

(4)

where Ds and Dt are the labeled SD and TD data, respectively; ${\varvec{x}}_{j}^{s}$ and $y_{j}^{s}$ denote the jth feature vector and its label in the SD data; ${\varvec{x}}_{i}^{t}$ and $y_{i}^{t}$ denote the ith feature vector and its label in the TD data; Ns and Nt are the numbers of SD and TD samples. In this paper, Nt << Ns.

3.2 NMPT Architecture

In this section, the proposed NMPT approach is discussed. As mentioned above, the method uses the labeled data from SD and TD to solve the target GFD problem. First, inspired by the multitask LSSVM framework [21, 22], we assume that the parameters w_t and w_s of the two tasks can each be separated into two parts:

$${\varvec{w}}_{t} = {\varvec{w}}_{0} + {\varvec{v}}_{t} ,\quad {\varvec{w}}_{s} = {\varvec{w}}_{0} + {\varvec{v}}_{s} ,$$

(5)

where w₀ is the shared parameter, and v_s and v_t are the domain-specific parameters of the SD and TD tasks, respectively. Then, based on the previous transfer strategy that controls the empirical risk of the source domain, we want to extract knowledge from w_s and transfer it to w_t. Because sufficient training data help prevent the model from overfitting, the parameter w₀, which is learned largely from the abundant SD data through w_s, is taken as one part of the transferred knowledge. In addition, by minimizing the term μ|| v_t−v_s ||² during optimization, the knowledge of v_s learned from SD can also be applied. To achieve this goal, LSSVM is extended to the transfer learning case as follows:

$$\begin{array}{ll} \mathop {\min }\limits_{{{\varvec{w}}_{0} ,{\varvec{v}}_{t} ,{\varvec{v}}_{s} ,b,e}} & J({\varvec{w}}_{0} ,{\varvec{v}}_{t} ,{\varvec{v}}_{s} ,e) = \frac{1}{2}\left\| {{\varvec{w}}_{0} } \right\|^{2} + \frac{1}{2} \times \frac{\lambda }{2}\left( {\left\| {{\varvec{v}}_{t} } \right\|^{2} + \left\| {{\varvec{v}}_{s} } \right\|^{2} } \right) + \frac{Ct}{2}\sum\limits_{i = 1}^{Nt} {e_{i}^{2} } + \frac{Cs}{2}\sum\limits_{i = Nt + 1}^{Nt + Ns} {e_{i}^{2} } + \mu \left\| {{\varvec{v}}_{t} - {\varvec{v}}_{s} } \right\|^{2} , \\ {\text{s.t.}} & y_{i}^{t} \{ ({\varvec{w}}_{0} + {\varvec{v}}_{t} )^{\text{T}} \varphi ({\varvec{x}}_{i}^{t} ) + b_{t} \} = 1 - e_{i} ,\quad i = 1,2, \cdots ,Nt, \\ & y_{j}^{s} \{ ({\varvec{w}}_{0} + {\varvec{v}}_{s} )^{\text{T}} \varphi ({\varvec{x}}_{j}^{s} ) + b_{s} \} = 1 - e_{Nt + j} ,\quad j = 1,2, \ldots ,Ns, \end{array}$$

(6)

where w₀ and μ|| v_t − v_s ||² are the transfer learning terms, and Cs, Ct, λ and μ are positive real regularization parameters. A diagram of NMPT is presented in Figure 2.

Because a small amount of labeled target training data tends to degrade the corresponding classification model, the decision boundary with parameter w_t learned from the target task alone can suffer from this problem. By using the knowledge of w_s from the source domain, however, the NMPT architecture can keep the generalization error on the target domain relatively small by focusing on two goals: (1) learning a more accurate w₀ for the target domain; and (2) reducing the difference between the model parameters by minimizing μ|| v_t−v_s ||² (see the purple line in Figure 2). These two goals make the source domain model applicable to the target domain while keeping Dt in the leading role when building the classification model for the target task. In addition, comparing Eq. (2) with Eq. (6) shows that, on the basis of the SRM principle, the NMPT model tries to make the separating hyperplane of SD suitable for the TD classification task in two ways: it minimizes the margin discrepancy between the SD and TD training data to adjust the separating hyperplane, and it simultaneously controls the loss function on the SD data. Together, these two improvements promote good generalization on TD.

The NMPT optimization problem (cf. Eq. (6)) is solved as follows.

First, the Lagrangian function for Eq. (6) is built as:

$$\begin{aligned} L({\varvec{w}}_{0} ,{\varvec{v}}_{t} ,{\varvec{v}}_{s} ,b,e,a) = {} & \frac{1}{2}\left\| {{\varvec{w}}_{0} } \right\|^{2} + \frac{1}{2} \times \frac{\lambda }{2}\left( {\left\| {{\varvec{v}}_{t} } \right\|^{2} + \left\| {{\varvec{v}}_{s} } \right\|^{2} } \right) + \frac{Ct}{2}\sum\limits_{i = 1}^{Nt} {e_{i}^{2} } + \frac{Cs}{2}\sum\limits_{i = Nt + 1}^{Nt + Ns} {e_{i}^{2} } + \mu \left\| {{\varvec{v}}_{t} - {\varvec{v}}_{s} } \right\|^{2} \\ & - \sum\limits_{i = 1}^{Nt} {a_{i} } \left\{ {y_{i}^{t} \{ ({\varvec{w}}_{0} + {\varvec{v}}_{t} )^{\text{T}} \varphi ({\varvec{x}}_{i}^{t} ) + b_{t} \} - 1 + e_{i} } \right\} \\ & - \sum\limits_{i = Nt + 1}^{Nt + Ns} {a_{i} } \left\{ {y_{i}^{s} \{ ({\varvec{w}}_{0} + {\varvec{v}}_{s} )^{\text{T}} \varphi ({\varvec{x}}_{i}^{s} ) + b_{s} \} - 1 + e_{i} } \right\}, \end{aligned}$$

(7)

where a_i is a Lagrange multiplier. According to the Karush–Kuhn–Tucker (KKT) conditions, the conditions for optimality are:

$$\begin{aligned} \frac{\partial L}{\partial {\varvec{w}}_{0} } = 0 & \to {\varvec{w}}_{0} = \sum\limits_{i = 1}^{Nt} {a_{i} y_{i}^{t} \varphi ({\varvec{x}}_{i}^{t} )} + \sum\limits_{i = Nt + 1}^{Nt + Ns} {a_{i} y_{i}^{s} \varphi ({\varvec{x}}_{i}^{s} )} , \\ \frac{\partial L}{\partial {\varvec{v}}_{t} } = 0 & \to \frac{\lambda }{2}{\varvec{v}}_{t} + 2\mu ({\varvec{v}}_{t} - {\varvec{v}}_{s} ) - \sum\limits_{i = 1}^{Nt} {a_{i} y_{i}^{t} \varphi ({\varvec{x}}_{i}^{t} )} = 0, \\ \frac{\partial L}{\partial {\varvec{v}}_{s} } = 0 & \to \frac{\lambda }{2}{\varvec{v}}_{s} + 2\mu ({\varvec{v}}_{s} - {\varvec{v}}_{t} ) - \sum\limits_{i = Nt + 1}^{Nt + Ns} {a_{i} y_{i}^{s} \varphi ({\varvec{x}}_{i}^{s} )} = 0, \\ \frac{\partial L}{\partial b_{t} } = 0 & \to \sum\limits_{i = 1}^{Nt} {a_{i} y_{i}^{t} } = 0, \\ \frac{\partial L}{\partial b_{s} } = 0 & \to \sum\limits_{i = Nt + 1}^{Nt + Ns} {a_{i} y_{i}^{s} } = 0, \\ \frac{\partial L}{\partial e_{i} } = 0 & \to a_{i} = Ct\,e_{i} \;(i = 1,2, \ldots ,Nt),\quad a_{i} = Cs\,e_{i} \;(i = Nt + 1, \ldots ,Nt + Ns), \\ \frac{\partial L}{\partial a_{i} } = 0 & \to \left\{ {\begin{array}{ll} {y_{i}^{t} \{ ({\varvec{w}}_{0} + {\varvec{v}}_{t} )^{\text{T}} \varphi ({\varvec{x}}_{i}^{t} ) + b_{t} \} - 1 + e_{i} = 0,} & {i = 1,2, \ldots ,Nt,} \\ {y_{i}^{s} \{ ({\varvec{w}}_{0} + {\varvec{v}}_{s} )^{\text{T}} \varphi ({\varvec{x}}_{i}^{s} ) + b_{s} \} - 1 + e_{i} = 0,} & {i = Nt + 1,Nt + 2, \ldots ,Nt + Ns,} \\ \end{array} } \right. \end{aligned}$$

(8)

where v_t and v_s can be derived as:

$$\begin{aligned} {\varvec{v}}_{t} & = \frac{{\left( {1 + \frac{4\mu }{\lambda }} \right){\varvec{w}}_{0} - \sum\limits_{i = Nt + 1}^{Nt + Ns} {a_{i} y_{i}^{s} \varphi ({\varvec{x}}_{i}^{s} )} }}{{\frac{\lambda }{2} + 4\mu }} = \frac{{\frac{4\mu }{\lambda }\left( {\sum\limits_{i = 1}^{Nt} {a_{i} y_{i}^{t} \varphi ({\varvec{x}}_{i}^{t} )} + \sum\limits_{i = Nt + 1}^{Nt + Ns} {a_{i} y_{i}^{s} \varphi ({\varvec{x}}_{i}^{s} )} } \right) + \sum\limits_{i = 1}^{Nt} {a_{i} y_{i}^{t} \varphi ({\varvec{x}}_{i}^{t} )} }}{{\frac{\lambda }{2} + 4\mu }}, \\ {\varvec{v}}_{s} & = \frac{{\left( {1 + \frac{4\mu }{\lambda }} \right){\varvec{w}}_{0} - \sum\limits_{i = 1}^{Nt} {a_{i} y_{i}^{t} \varphi ({\varvec{x}}_{i}^{t} )} }}{{\frac{\lambda }{2} + 4\mu }} = \frac{{\frac{4\mu }{\lambda }\left( {\sum\limits_{i = 1}^{Nt} {a_{i} y_{i}^{t} \varphi ({\varvec{x}}_{i}^{t} )} + \sum\limits_{i = Nt + 1}^{Nt + Ns} {a_{i} y_{i}^{s} \varphi ({\varvec{x}}_{i}^{s} )} } \right) + \sum\limits_{i = Nt + 1}^{Nt + Ns} {a_{i} y_{i}^{s} \varphi ({\varvec{x}}_{i}^{s} )} }}{{\frac{\lambda }{2} + 4\mu }}. \end{aligned}$$

(9)
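The closed forms in Eq. (9) can be checked numerically against the two stationarity conditions for v_t and v_s in Eq. (8), which form a 2×2 linear system in (v_t, v_s). The sketch below does this for arbitrary vectors; `p_t` and `p_s` are assumed names for the target and source sums Σ a_i y_i φ(x_i), and a finite-dimensional feature map is assumed so that they are ordinary arrays.

```python
import numpy as np

def specific_params(p_t, p_s, lam, mu):
    """Closed-form w0, v_t, v_s following Eq. (9)."""
    D = lam / 2.0 + 4.0 * mu
    w0 = p_t + p_s
    v_t = ((4.0 * mu / lam) * w0 + p_t) / D
    v_s = ((4.0 * mu / lam) * w0 + p_s) / D
    return w0, v_t, v_s

def specific_params_direct(p_t, p_s, lam, mu):
    """Solve the v_t and v_s stationarity conditions of Eq. (8) directly.

    (lam/2 + 2 mu) v_t - 2 mu v_s = p_t
    -2 mu v_t + (lam/2 + 2 mu) v_s = p_s
    """
    A = np.array([[lam / 2.0 + 2.0 * mu, -2.0 * mu],
                  [-2.0 * mu, lam / 2.0 + 2.0 * mu]])
    V = np.linalg.solve(A, np.vstack([p_t, p_s]))
    return V[0], V[1]
```

Subtracting the two conditions gives v_t − v_s = (p_t − p_s) / (λ/2 + 4μ), so increasing μ pulls the two domain-specific vectors together, which is the role of the transfer term μ|| v_t − v_s ||².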

By eliminating w₀, v_t, v_s and e_i through substitution, one linear system can be obtained as follows:

$$\left[ {\begin{array}{cc} {\varvec{0}} & {{\varvec{Y}}_{1} } \\ {\varvec{Y}} & {\varvec{\varOmega}} \\ \end{array} } \right]\left[ {\begin{array}{c} {\varvec{b}} \\ {\varvec{a}} \\ \end{array} } \right] = \left[ {\begin{array}{c} {\varvec{0}} \\ {\overline{\varvec{I}}} \\ \end{array} } \right],$$

(10)

where ${\varvec{a}} = \left[ {a_{1} ,a_{2} , \cdots ,a_{Nt} ,a_{Nt + 1} , \cdots ,a_{Nt + Ns} } \right]^{\text{T}}$, ${\varvec{b}} = \left[ {b_{t} ,b_{s} } \right]^{\text{T}}$, ${\varvec{Y}}_{1} = [y_{1}^{t} ,y_{2}^{t} , \cdots ,y_{Nt}^{t} ,y_{1}^{s} ,y_{2}^{s} , \cdots ,y_{Ns}^{s} ]$, $\overline{\varvec{I}} = [1,1, \cdots ,1]_{(Nt + Ns) \times 1}^{\text{T}}$, ${\varvec{0}} = \left[ {0,0} \right]$, ${\varvec{Y}} = {\text{blockdiag}}({\varvec{y}}_{t} ,{\varvec{y}}_{s} )$ with ${\varvec{y}}_{t} = [y_{1}^{t} ,y_{2}^{t} , \cdots ,y_{Nt}^{t} ]^{\text{T}}$ and ${\varvec{y}}_{s} = [y_{1}^{s} ,y_{2}^{s} , \cdots ,y_{Ns}^{s} ]^{\text{T}}$, and Ω is an (Nt + Ns) × (Nt + Ns) symmetric matrix, ${\varvec{\varOmega}} = \Omega_{0} + \Omega_{1} + \frac{1}{C}{\varvec{I}}_{Nt + Ns}$, with Ω₁ = blockdiag(Ω_t, Ω_s) and ${\varvec{I}}_{Nt + Ns}$ the identity matrix. Here 1/C stands for 1/Ct on the target rows and 1/Cs on the source rows, in line with Eq. (6). K denotes the kernel function, and the elements of Ω are defined as:

$$\begin{aligned} \Omega_{0ij} & = \left( {1 + \frac{{\frac{4\mu }{\lambda }}}{{\frac{\lambda }{2} + 4\mu }}} \right)y_{i} y_{j} K({\varvec{x}}_{i} ,{\varvec{x}}_{j} ),\quad y_{i} ,y_{j} \in {\varvec{Y}}_{1} ,\quad {\varvec{x}}_{i} ,{\varvec{x}}_{j} \in \left[ {{\varvec{x}}_{1}^{t} ,{\varvec{x}}_{2}^{t} , \cdots ,{\varvec{x}}_{Nt}^{t} ,{\varvec{x}}_{1}^{s} ,{\varvec{x}}_{2}^{s} , \cdots ,{\varvec{x}}_{Ns}^{s} } \right], \\ \Omega_{tij} & = \frac{1}{{\frac{\lambda }{2} + 4\mu }}y_{i}^{t} y_{j}^{t} K({\varvec{x}}_{i}^{t} ,{\varvec{x}}_{j}^{t} ),\quad i,j \in \left[ {1,Nt} \right], \\ \Omega_{sij} & = \frac{1}{{\frac{\lambda }{2} + 4\mu }}y_{i}^{s} y_{j}^{s} K({\varvec{x}}_{i}^{s} ,{\varvec{x}}_{j}^{s} ),\quad i,j \in \left[ {1,Ns} \right]. \end{aligned}$$

(11)

Solving this linear system yields the parameters a, b_t and b_s, and the decision function for the target task is then constructed as follows:

$$y = {\text{sgn}}\left[ {\left( {1 + \frac{{\frac{4\mu }{\lambda }}}{{\frac{\lambda }{2} + 4\mu }}} \right) \times \left( {\sum\limits_{i = 1}^{Nt} {a_{i} y_{i}^{t} K({\varvec{x}}_{i}^{t} ,{\varvec{x}})} + \sum\limits_{i = Nt + 1}^{Nt + Ns} {a_{i} y_{i}^{s} K({\varvec{x}}_{i}^{s} ,{\varvec{x}})} } \right) + \frac{1}{{\frac{\lambda }{2} + 4\mu }}\sum\limits_{j = 1}^{Nt} {a_{j} y_{j}^{t} K({\varvec{x}}_{j}^{t} ,{\varvec{x}})} + b_{t} } \right].$$

(12)
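The solution steps of Eqs. (10) to (12) can be sketched for a binary problem as follows. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions: target samples are ordered first; the off-diagonal blocks of the system are taken as Y = blockdiag(y_t, y_s) and its transpose so that each bias has its own constraint; the diagonal term uses 1/Ct on target rows and 1/Cs on source rows; and the function names, default values and `gamma` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nmpt_fit(Xt, yt, Xs, ys, Ct=10.0, Cs=1.0, lam=1.0, mu=1.0, gamma=1.0):
    """Solve the NMPT linear system for labels in {-1, +1}."""
    nt, ns = len(yt), len(ys)
    n = nt + ns
    X = np.vstack([Xt, Xs])
    y = np.concatenate([yt, ys])
    D = lam / 2.0 + 4.0 * mu
    c0 = 1.0 + (4.0 * mu / lam) / D            # coefficient of Omega_0, Eq. (11)
    YK = np.outer(y, y) * rbf_kernel(X, X, gamma)
    omega = c0 * YK                             # Omega_0 (all sample pairs)
    omega[:nt, :nt] += YK[:nt, :nt] / D         # Omega_t block
    omega[nt:, nt:] += YK[nt:, nt:] / D         # Omega_s block
    # Assumption: a_i = Ct * e_i on target rows, a_i = Cs * e_i on source rows.
    omega += np.diag(np.concatenate([np.full(nt, 1.0 / Ct),
                                     np.full(ns, 1.0 / Cs)]))
    Y = np.zeros((n, 2))
    Y[:nt, 0] = yt                              # column for b_t
    Y[nt:, 1] = ys                              # column for b_s
    M = np.block([[np.zeros((2, 2)), Y.T], [Y, omega]])
    rhs = np.concatenate([np.zeros(2), np.ones(n)])
    sol = np.linalg.solve(M, rhs)
    return {"b_t": sol[0], "b_s": sol[1], "a": sol[2:], "X": X, "y": y,
            "nt": nt, "c0": c0, "D": D, "gamma": gamma}

def nmpt_decision(model, X_new):
    """Target-domain decision value of Eq. (12) before taking the sign."""
    K = rbf_kernel(X_new, model["X"], model["gamma"])
    ay = model["a"] * model["y"]
    nt = model["nt"]
    shared = model["c0"] * (K @ ay)                  # contribution of w_0
    specific = (K[:, :nt] @ ay[:nt]) / model["D"]    # target-only contribution
    return shared + specific + model["b_t"]
```

The predicted label is `np.sign(nmpt_decision(model, X_new))`. The paper does not state how the five fault classes are handled; a one-vs-rest wrapper around this binary solver would be one possible choice.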

3.3 Complete Process of NMPT Model for Gear Fault Diagnosis

In the proposed framework, intrinsic time-scale decomposition (ITD) is first used to decompose a vibration signal into a set of proper rotation components (PRCs). The energy of each PRC is then calculated to reduce dimensionality and construct feature vectors. By formulating and solving the NMPT optimization problem (cf. Eq. (6)) with the learned fault representations, the parameters of the NMPT model (w₀, v_s, v_t, b_s and b_t) are learned simultaneously. Finally, the target data are fed into NMPT to output the predicted fault categories. Figure 3 gives the overall framework of NMPT-based GFD.

4 Experiment and Discussion

4.1 Descriptions of Experimental Simulator and Datasets

The testing platform used for experimental verification, a drivetrain dynamics simulator (DDS), is shown in Figure 4. It includes a driving motor, a speed regulator, a planetary gearbox, a reduction gearbox, a brake device and a brake regulator. During data collection, different speeds and loads are set through the speed regulator and the brake regulator, respectively. Seven vibration sensors (model: 608A11, sampling frequency: 5120 Hz) are installed: one is mounted on the surface of the motor to measure its z-axis vibration signal (F1), three are mounted on the planetary gearbox (F2) and three on the reduction gearbox (F3). In addition to the healthy gear (Healthy, C1), there are four types of gear fault: a small piece of material breaking away from a tooth (Chipped, C2), a tooth fractured at the root (Missing, C3), cracks emerging at the tooth root (Cracked, C4) and the loss of material from the contacting surface of a tooth (Worn, C5). The fault types and experimental conditions are described in Table 1.

Table 1 Gear fault type and working conditions

4.2 Experimental Results and Analysis

4.2.1 Feature Extraction

Intrinsic time-scale decomposition (ITD), proposed by Frei et al. [23], is a time-frequency analysis method that adaptively decomposes a given vibration signal X into a series of proper rotation components (PRCs) and a monotonic trend signal (the remaining baseline signal), with low end effects and high efficiency. It can be described as:

$$X{ = }H^{1} + H^{2} + \cdot \cdot \cdot + H^{p} + L^{p} ,$$

(13)

where p denotes the final decomposition level, Hⁱ is the ith PRC, L^p is the remaining baseline signal.

However, the PRCs obtained with ITD are too complex to be used directly as input vectors for fault classification. Thus, the energies of the first six PRC levels are calculated to reduce the dimensionality of the PRCs and to form the fault features.
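The energy-based feature construction can be sketched as below. The paper does not give the energy formula, so the usual definition (sum of squared samples per component) is assumed; the ITD step itself is not implemented here, and the array `prcs` of already-decomposed components, the function name and the optional normalization are hypothetical.

```python
import numpy as np

def prc_energy_features(prcs, n_levels=6, normalize=False):
    """Energy of the first n_levels components.

    prcs: array of shape (levels, samples), one PRC per row,
          assumed to come from a separate ITD implementation.
    Returns a feature vector of length min(n_levels, levels).
    """
    prcs = np.asarray(prcs, dtype=float)[:n_levels]
    energy = np.sum(prcs**2, axis=1)           # assumed definition: sum of squares
    if normalize:
        total = energy.sum()
        if total > 0:
            energy = energy / total            # optional: relative energy per level
    return energy
```

With 2000-point signals and six levels, each sample is reduced from a 6 × 2000 array of components to a six-dimensional feature vector.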

4.2.2 Experimental Study

In this part, the diagnostic performance of the proposed NMPT is first analyzed. To further demonstrate its advantages, NMPT is then compared with the following methods:

LSSVM (non-transfer): Least squares support vector machine;
MTLSSVM (non-transfer): Multi-task LSSVM;
TCA [24]: Transfer component analysis;
DSM [25]: Domain selection machine;
ELSSVM [26]: Enhanced LSSVM.

For a fair comparison, all kernel-based methods use the radial basis function (RBF) as the kernel function. In this study, 2000 sampled data points of the original vibration signal under each specific working condition were fed into the ITD model for feature extraction. In both the source and target domains, each gear fault category contains 200 samples under any chosen working condition. The datasets for the experiments are set as follows: for LSSVM, 10 samples of each fault type are selected from the target domain; for MTLSSVM and the transfer strategies, the same 10 target domain samples and 100 source domain samples of each fault type are used. In addition, 100 testing samples of each fault type are taken from the target domain, with no overlap between the training and testing samples in the target domain. With five gear conditions, the total size of the training set is therefore 50 for LSSVM and 550 for the other methods, and the total size of the testing set is 500. To describe the domain differences quantitatively, the Kullback-Leibler (KL) divergence is calculated by:

$${\text{KL}}(Ds,Dt) = \frac{{\text{KL}(Ds||Dt) + {\text{KL}}(Dt||Ds)}}{2},$$

(14)

where KL(·||·) represents the KL divergence between Ds and Dt. Table 2 describes the datasets (DA1 to DA10) and their corresponding KL divergences. The KL indexes of all the datasets are larger than zero, which confirms that SD and TD differ. Signals from the same axis have a relatively small KL divergence compared with those from different axes (e.g., transferring among different rotating speeds: DA1/DA3/DA4 vs DA2; different loads: DA5/DA7/DA8 vs DA6). Meanwhile, the KL divergence between nonadjacent mechanical components is larger than that between adjacent ones (DA10 vs DA9).
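The symmetric divergence of Eq. (14) can be sketched for discrete distributions as follows. The paper does not state how the densities of Ds and Dt are estimated from the feature samples, so a shared-grid histogram of a single feature is assumed here; the function names, the bin count and the smoothing constant `eps` are hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions (unnormalized counts allowed)."""
    p = np.asarray(p, dtype=float) + eps   # smoothing avoids log(0) on empty bins
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p), as in Eq. (14)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def feature_histograms(xs, xt, bins=10):
    """Histogram one source and one target feature on a shared grid (assumed estimator)."""
    lo = min(np.min(xs), np.min(xt))
    hi = max(np.max(xs), np.max(xt))
    hs, _ = np.histogram(xs, bins=bins, range=(lo, hi))
    ht, _ = np.histogram(xt, bins=bins, range=(lo, hi))
    return hs, ht
```

A value of zero means the two histograms coincide; the averaging makes the measure independent of which domain is treated as the reference.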

Table 2 Specific tests in experimental section

First, Figures 5, 6, 7 and 8 visualize the separating hyperplanes on four source domain datasets with three different fault types, covering varying speeds (DA3), changing loads (DA7) and adjacent mechanical parts (DA9 and DA10), to show how effectively NMPT minimizes the discrepancies between the classification hyperplanes of SD and TD caused by operating conditions. All datasets share the same target domain. Comparing the original classification hyperplanes in Figure 5(a), Figure 6(a), Figure 7(a), Figure 8(a) and Figure 9 shows that different working conditions produce different hyperplanes, which could easily cause erroneous diagnoses on the target task if source domain samples were used directly as auxiliary training data. In contrast, NMPT generalizes the discriminative ability from the source domain to the target domain, as shown in Figure 5(b), Figure 6(b), Figure 7(b) and Figure 8(b). Among them, Figure 5(b) and Figure 6(b) show similar results, which indicates that the proposed model is more robust when transferring from source domains with different speeds or loads than from adjacent mechanical components.

Then, the performance of the NMPT strategy for GFD on tests DA1 to DA10 is presented as confusion matrices in Figures 10, 11 and 12. In each confusion matrix, the rows and columns show the actual and predicted fault types, respectively. The diagnostic accuracy of each fault type is shown in the diagonal cells, and the misclassification rates are listed in the off-diagonal cells. From Figures 10, 11, 12 and Table 2, we can find that:

(1) Even though the domain differences between SD and TD are relatively high in some datasets (e.g., DA9 and DA10), the NMPT model can still learn a precise classifier for the target task (e.g., Figure 12(a) and (b));

(2) The NMPT model achieves very similar GFD accuracies across varying loads (DA5 to DA8), and the same holds across changing speeds (DA1 to DA4), which verifies the robustness of NMPT to sensor axis factors. Under different loads, the best performance of NMPT occurs when the source and target data come from different sensor axes (DA6), whereas under varying rotating speeds, transferring within the same axis improves performance (DA1 and DA3);

(3) The optimal classification performance occurs when the source and target data come from the same gearbox (DA1 to DA8); among these cases, the best classification accuracy of NMPT reaches 98.8% (DA1 and DA3). In addition, using motor data to assist fault recognition in the reduction gearbox performs worse than transferring between the reduction gearbox and the planetary gearbox;

(4) Comparing the accuracies and error rates across all datasets shows that many factors affect model performance; among them, the mechanical component that contributes the source data is the most crucial.

In general, the classification accuracy of NMPT is always over 94%. Therefore, the NMPT model can avoid overfitting in GFD under various working conditions by making reasonable use of abundant labeled data from another working condition or from adjacent components.
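The confusion matrices described above can be built as in the following minimal sketch, in which rows are actual fault types, columns are predicted fault types, and each row is normalized so that the diagonal holds the per-class accuracy. The labels here are toy values invented for illustration (five classes with 100 test samples each), not results from the paper, and the function name is hypothetical.

```python
import numpy as np

def confusion_matrix_rates(y_true, y_pred, n_classes):
    """Return raw counts and row-normalized rates (rows: actual, columns: predicted)."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Guard against empty rows so the division is always defined.
    rates = counts / np.maximum(row_totals, 1)
    return counts, rates

# Toy labels: 5 fault types x 100 test samples, with a few injected errors.
y_true = np.repeat(np.arange(5), 100)
y_pred = y_true.copy()
y_pred[[0, 1, 2]] = 1      # three class-0 samples misclassified as class 1
y_pred[[100, 101]] = 2     # two class-1 samples misclassified as class 2

counts, rates = confusion_matrix_rates(y_true, y_pred, n_classes=5)
per_class_accuracy = np.diag(rates)           # diagonal cells
overall = np.trace(counts) / counts.sum()     # accuracy over all categories
```

Off-diagonal entries of `rates` are the misclassification rates, and `overall` corresponds to an accuracy calculated over all categories, as reported for each test.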

After investigating the classification performance of the NMPT method on all datasets, it is still meaningful to compare NMPT with other methods. Table 3 lists the comparison results from DA1 to DA10, calculated over all fault categories. Among them, the LSSVM model has the lowest classification performance, mainly for two reasons: (a) the LSSVM model is trained only on the insufficient target domain samples, which inevitably hinders generalization according to the principle of structural risk minimization; and (b) the standard LSSVM model lacks any mechanism for transferring knowledge among domains, whereas NMPT can make the best use of source domain samples to improve the diagnostic model for the target task. Compared with the other models, NMPT achieves the highest accuracy on all datasets (with a maximum diagnostic accuracy of 98.8%), which demonstrates the advantage of NMPT in using source domain signals to assist GFD in the target domain and provides a practical method for improving GFD performance.
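To make the LSSVM baseline concrete, the sketch below trains a binary LSSVM classifier with an RBF kernel on only 10 samples per class, mirroring the small target-only training set. It follows the textbook LSSVM linear system, which may differ in notation from Section 2.1. The hyperparameters `gamma` and `sigma`, the function names, and the synthetic two-cluster data are assumptions for illustration, and a multi-class GFD task would need an additional scheme such as one-vs-rest.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, multipliers alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    """Sign of the kernel expansion sum_i alpha_i y_i K(x, x_i) + b."""
    scores = rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b
    return np.where(scores >= 0, 1.0, -1.0)

# Synthetic two-class data: 10 training samples per class (assumed, not from the paper).
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, scale=0.5, size=(10, 2))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(10, 2))
X_train = np.vstack([X_pos, X_neg])
y_train = np.concatenate([np.ones(10), -np.ones(10)])

b, alpha = lssvm_fit(X_train, y_train)
X_test = np.array([[2.0, 2.0], [-2.0, -2.0]])
pred = lssvm_predict(X_train, y_train, b, alpha, X_test)
```

Because training reduces to a single linear solve, the classifier depends entirely on the few samples in the system matrix, which is one way to see why a target-only LSSVM generalizes poorly when the target samples are scarce.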

Table 3 Total GFD accuracies from test DA1 to DA10

5 Conclusions

(1) For GFD problems under variable working conditions, an NMPT-theoretic strategy is presented, which uses the ITD technique to construct fault features for model parameter transfer. Experimental results indicate that the proposed method can achieve 97.16% diagnostic precision when the energies of the first six levels of PRCs are used as feature vectors.

(2) The visualization results verify that NMPT can generalize the distinguishing ability from the source domain to the target domain, which is beneficial for GFD under various working conditions.

(3) With regard to diagnostic performance, the NMPT model shows strong robustness under different working conditions. Meanwhile, the influence of working conditions on the GFD results is ordered as: rotating speed < load < location.

(4) The proposed model parameter transfer strategy performs better than other popular methods because NMPT can further minimize the discrepancy between the two decision boundaries over tasks. Thus, the proposed strategy is expected to be an effective and feasible tool for solving GFD problems with fewer labeled target training data.

(5) In the future, the relationships among the KL indicator, working condition factors and GFD results could be explored to improve the universality of the NMPT model.

Abbreviations

GFD: Gear fault diagnosis
MPT: Model parameter transfer
ITD: Intrinsic time-scale decomposition
LSSVM: Least squares support vector machine
MTLSSVM: Multi-task LSSVM
DDS: Drivetrain dynamics simulator

Acknowledgements

Not applicable.

Funding

Supported by National Natural Science Foundation of China (Grant No. 51835009).

Author information

Authors and Affiliations

School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
Chao Chen, Fei Shen, Jiawen Xu & Ruqiang Yan
School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, 710049, China
Ruqiang Yan

Contributions

RY and JX designed the experiment; CC and FS analyzed the data; all authors wrote and improved the paper. All authors read and approved the final manuscript.

Authors’ information

Chao Chen received his B.Sc. and M.Sc. degrees from Jiangsu University in 2011 and 2014, respectively. He is now pursuing his PhD degree at the School of Instrument Science and Engineering, Southeast University. His main research interest is machine fault diagnosis.

Fei Shen received his B.Sc. and M.Sc. degrees from Southeast University in 2014 and 2016, respectively. He is now pursuing his PhD degree at the School of Instrument Science and Engineering, Southeast University. His main research interest is machine fault diagnosis.

Jiawen Xu is currently an associate researcher at the School of Instrument Science and Engineering, Southeast University.

Ruqiang Yan received his B.Sc. and M.E. degrees from the University of Science and Technology of China in 1997 and 2002, respectively, and received his Ph.D. degree in 2007 from the University of Massachusetts, Amherst. He is now a professor and Ph.D. supervisor at Xi’an Jiaotong University. His main research interests include machine condition monitoring and fault diagnosis, signal processing, and wireless sensor networks.

Corresponding author

Correspondence to Ruqiang Yan.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chen, C., Shen, F., Xu, J. et al. Model Parameter Transfer for Gear Fault Diagnosis under Varying Working Conditions. Chin. J. Mech. Eng. 34, 13 (2021). https://doi.org/10.1186/s10033-020-00520-9

Received: 28 April 2020
Revised: 13 November 2020
Accepted: 19 November 2020
Published: 18 January 2021
DOI: https://doi.org/10.1186/s10033-020-00520-9
