Dynamic Distribution Adaptation Based Transfer Network for Cross Domain Bearing Fault Diagnosis

In machinery fault diagnosis, labeled data are usually difficult or even impossible to obtain. Transfer learning can leverage related fault diagnosis knowledge from a fully labeled source domain to enhance fault diagnosis performance in a sparsely labeled or unlabeled target domain, and it has been widely used for cross domain fault diagnosis. However, existing methods focus on either marginal distribution adaptation (MDA) or conditional distribution adaptation (CDA). In practice, the marginal and conditional distribution discrepancies both have significant but different influences on the domain divergence. In this paper, a dynamic distribution adaptation based transfer network (DDATN) is proposed for cross domain bearing fault diagnosis. DDATN uses the proposed instance-weighted dynamic maximum mean discrepancy (IDMMD) for dynamic distribution adaptation (DDA), which dynamically estimates the influences of the marginal and conditional distributions and adapts the target domain to the source domain. Experimental evaluation on cross domain bearing fault diagnosis demonstrates that DDATN outperforms state-of-the-art cross domain fault diagnosis methods.


Introduction
As a critical part of modern equipment, bearings often operate under harsh conditions and are subjected to time-varying loads, which results in a significant risk of failure [1]. Bearing failure is the main cause of machinery breakdowns and can lead to huge economic losses and severe casualties [2,3]. To ensure the operational reliability of equipment, researchers have studied bearing fault diagnosis extensively and proposed many effective methods [4][5][6][7][8].
Among these methods, deep learning based methods have shown excellent performance in recent years; they can learn diagnosis knowledge from large amounts of labeled data and reduce the dependence on expertise [9][10][11][12]. Although deep learning based methods are not expertise-dependent, they are heavily data-dependent. Unfortunately, collecting labeled machinery failure data is expensive or even impossible [13]. Under such circumstances, transfer learning [14] has attracted researchers' attention, since it can transfer related knowledge from a fully labeled source domain to enhance fault diagnosis performance in a sparsely labeled or unlabeled target domain [15,16].
Han et al. [17] introduced adversarial learning for feature distribution adaptation and transferred the source domain fault diagnosis model to the target domain. Guo et al. [18] used both maximum mean discrepancy (MMD) and adversarial learning to adapt the feature distribution, which allows the fault diagnosis model to be transferred to other machines. Yang et al. [19] transferred a fault diagnosis model from laboratory bearings to locomotive bearings with MMD-based multi-layer feature alignment and pseudo label learning. Wang et al. [20] used conditional maximum mean discrepancy (CMMD) to align feature distributions for cross-domain bearing fault diagnosis. Li et al. [21] proposed a novel cross domain fault diagnosis method that takes full advantage of the availability of target domain health labels. These transfer learning methods have achieved great success in unsupervised domain adaptation and can build fault diagnosis models for an unlabeled target domain. However, they adapt only one of the marginal or conditional distributions (MDA or CDA) between the source and target domains. In practice, both marginal and conditional distribution discrepancies have significant but different influences on the domain divergence [22]. Recently, researchers have studied joint distribution adaptation (JDA) [23,24], which adapts the marginal and conditional distributions simultaneously. Although these works achieve better performance, they allocate equal weights to the marginal and conditional distribution discrepancies and therefore cannot quantify their different contributions.

The original version of this article was revised: the affiliation has been updated.
In this paper, a dynamic distribution adaptation based transfer network (DDATN) is proposed for cross domain bearing fault diagnosis, which uses the proposed instance-weighted dynamic maximum mean discrepancy (IDMMD) for dynamic distribution adaptation (DDA). The main contributions of the paper are as follows.
(1) The DDA framework is introduced for cross domain bearing fault diagnosis; it dynamically adjusts the weights of the marginal and conditional distribution discrepancies during domain adaptation.
(2) A novel dynamic distribution discrepancy metric, IDMMD, is proposed for unsupervised DDA. IDMMD uses a novel dynamic factor estimation method to estimate the contributions of MDA and CDA dynamically, and it further considers the CDA contribution of each class. In addition, it takes the confidence of target domain pseudo labels into account when calculating the conditional distribution discrepancy.
The remainder of the paper is organized as follows. The theoretical and technical bases are introduced in Section 2. Section 3 describes the details of the proposed DDATN, which is experimentally evaluated in Section 4. Finally, conclusions are drawn in Section 5.

Dynamic Distribution Adaptation
Marginal and conditional distributions contribute differently to the domain divergence, and their contributions change dynamically during transfer learning. To improve transfer learning performance, DDA [22] was proposed as a general transfer learning framework that considers the different and ever-changing contributions of the marginal and conditional distributions to the domain divergence. In DDA, the dynamic distribution discrepancy has the general form

$$D(\Omega_s,\Omega_t) = (1-\mu)\,D(P_s,P_t) + \mu\sum_{c=1}^{C} D^{(c)}(Q_s,Q_t) \qquad (1)$$

where $P_s$ and $Q_s$ are the marginal and conditional distributions of the source domain $\Omega_s$, respectively; $P_t$ and $Q_t$ are the marginal and conditional distributions of the target domain $\Omega_t$, respectively; $D(P_s,P_t)$ is the marginal distribution discrepancy; $D^{(c)}(Q_s,Q_t)$ is the conditional distribution discrepancy for class $c$; $C$ is the number of classes; and $\mu \in [0,1]$ is the dynamic weight, which changes as training proceeds.
From Eq. (1), DDA degenerates to MDA when $\mu = 0$ and to CDA when $\mu = 1$. DDA can therefore be regarded as a more general distribution adaptation framework.
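A minimal Python sketch of Eq. (1), assuming the marginal and class-wise conditional discrepancies have already been computed as scalars; the function name `dda_discrepancy` and the input values are hypothetical. It makes the two limiting cases of the dynamic weight explicit.

```python
from typing import Sequence


def dda_discrepancy(marginal: float, conditional: Sequence[float], mu: float) -> float:
    """Dynamic distribution discrepancy in the form of Eq. (1).

    marginal    -- D(P_s, P_t), the marginal distribution discrepancy
    conditional -- [D^(1)(Q_s, Q_t), ..., D^(C)(Q_s, Q_t)], one value per class
    mu          -- dynamic weight in [0, 1]
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return (1.0 - mu) * marginal + mu * sum(conditional)


# Illustrative (made-up) discrepancy values, chosen to be exact in binary floating point.
d_m, d_c = 0.75, [0.25, 0.5, 0.125]
print(dda_discrepancy(d_m, d_c, mu=0.0))  # 0.75   -> pure MDA
print(dda_discrepancy(d_m, d_c, mu=1.0))  # 0.875  -> pure CDA (sum of class terms)
print(dda_discrepancy(d_m, d_c, mu=0.5))  # 0.8125 -> both parts weighted equally
```

In DDA the value of `mu` is not fixed but re-estimated as training goes on; here it is simply passed in.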

Maximum Mean Discrepancy
Maximum mean discrepancy (MMD) [25] is an effective distribution discrepancy metric widely used in transfer learning. Given datasets $X_s$ and $X_t$ sampled from distributions $P(X_s)$ and $P(X_t)$, the MMD between $P(X_s)$ and $P(X_t)$ can be calculated as

$$\mathrm{MMD}(X_s,X_t) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s}\phi(x_i^s) - \frac{1}{n_t}\sum_{j=1}^{n_t}\phi(x_j^t) \right\|_{\mathcal{H}} \qquad (2)$$

where $x_i^s \in X_s$ and $x_j^t \in X_t$; $n_s$ and $n_t$ are the numbers of samples in $X_s$ and $X_t$, respectively; and $\phi$ is a nonlinear mapping function into a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$.
From Eq. (2), the MMD is the distance in $\mathcal{H}$ between the mean embeddings of $X_s$ and $X_t$.
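Because $\phi$ is never computed explicitly, MMD is usually evaluated through a kernel $k(a,b)=\langle\phi(a),\phi(b)\rangle$. The following NumPy sketch computes the biased estimate of the squared MMD of Eq. (2); the Gaussian kernel and its bandwidth `sigma` are assumptions (the text does not name the kernel), and the helper names are hypothetical. Taking the square root gives the distance in Eq. (2).

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian (RBF) kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd_squared(xs: np.ndarray, xt: np.ndarray, sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD of Eq. (2) via the kernel trick.

    ||mean_i phi(xs_i) - mean_j phi(xt_j)||_H^2
        = mean k(xs, xs) + mean k(xt, xt) - 2 * mean k(xs, xt)
    """
    k_ss = gaussian_kernel(xs, xs, sigma).mean()
    k_tt = gaussian_kernel(xt, xt, sigma).mean()
    k_st = gaussian_kernel(xs, xt, sigma).mean()
    return float(k_ss + k_tt - 2.0 * k_st)


rng = np.random.default_rng(0)
xs = rng.normal(size=(200, 2))                  # "source" features
xt_same = rng.normal(size=(200, 2))             # same distribution
xt_shift = rng.normal(loc=2.0, size=(200, 2))   # shifted distribution
print(mmd_squared(xs, xt_same) < mmd_squared(xs, xt_shift))  # True
```

Samples drawn from the same distribution give a value close to zero, while a shifted distribution gives a clearly larger one, which is what makes MMD usable as an alignment loss.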

Supervised Learning
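DDATN is proposed for unsupervised domain adaptation, in which the target domain data are completely unlabeled. Supervised learning is therefore performed on the labeled source domain data, with the loss defined as

$$L_C = \frac{1}{n_s}\sum_{i=1}^{n_s} J\big(G_y(G_f(x_i^s)),\, y_i^s\big) \qquad (3)$$

where $G_f$ is the feature extractor, $G_y$ is the classifier, $J(\cdot,\cdot)$ is the cross-entropy loss function, and $y_i^s$ is the label of source domain sample $x_i^s$.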
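As an illustration of Eq. (3), the sketch below computes the mean cross-entropy from raw classifier scores. The logits array stands in for $G_y(G_f(x_i^s))$, averaging over $n_s$ is assumed, and the function name `source_cross_entropy` is hypothetical.

```python
import numpy as np


def source_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """L_C of Eq. (3): mean cross-entropy over the n_s labeled source samples.

    logits -- (n_s, C) raw classifier scores, standing in for G_y(G_f(x_i^s))
    labels -- (n_s,) integer class labels y_i^s
    """
    # Log-softmax with the row maximum subtracted for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample and average.
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Uniform scores over 3 classes give a loss of log(3), whatever the labels are.
print(source_cross_entropy(np.zeros((4, 3)), np.array([0, 1, 2, 0])))  # approx. 1.0986
```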

Instance-weighted Dynamic Maximum Mean Discrepancy (IDMMD)
In unsupervised domain adaptation, the target domain provides no label information, so the final fault diagnosis can only be performed by the shared classifier $G_y$, which is trained on labeled source domain data. To prevent interference from target-domain-specific features and from the domain divergence, it is important to extract domain-invariant features. In DDATN, a novel dynamic distribution discrepancy metric, IDMMD, is proposed to constrain $G_f$ to extract domain-invariant features. IDMMD is based on the DDA framework and accounts for the ever-changing contributions of the marginal and conditional distribution discrepancies to the domain divergence. In addition, IDMMD considers the different contributions of the conditional distribution discrepancies of different classes, as well as the confidence of the pseudo labels of target domain samples. The IDMMD between $\Omega_s$ and $\Omega_t$ is composed of a marginal distribution discrepancy $\mathrm{IDMMD}_M$ and class-wise conditional distribution discrepancies $\mathrm{IDMMD}_C^{(c)}$, each weighted by a dynamic factor; $\mu^{(c)}$ denotes the dynamic factor for $\mathrm{IDMMD}_C^{(c)}$. In the definitions of these discrepancies, $y_i^{(c)}$ is the ground-truth one-hot label of $x_i^s$ for class $c$, and $\hat{y}_j^{(c)}$ is the predicted probability of $x_j^t$ for class $c$. $\mathrm{IDMMD}_M$ takes the original form of MMD in Eq. (2). For unsupervised domain adaptation, the target domain labels required to calculate the conditional distribution discrepancy are unavailable, so the predictions on $\Omega_t$ are used as soft labels. To account for the confidence of these soft labels, each target domain sample is weighted by its predicted probability for class $c$ when calculating $\mathrm{IDMMD}_C^{(c)}$. Intuitively, MMD measures the distance between the centers of two datasets in the embedded feature space. With the proposed weighting, the center of each target domain class is pulled toward the samples with higher prediction probabilities for that class, so the negative effect of misclassification is diminished.
To quantify the ever-changing contributions of the marginal and conditional distributions, the dynamic factor for each class is calculated by Eq. (7): a class with a larger $\mathrm{IDMMD}_C^{(c)}$ value is allocated a larger dynamic factor, and the dynamic factor of the marginal distribution is calculated as in Eq. (4). This allocation guides DDA to focus on the main cause of domain shift.
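Equations (4) to (7) are not reproduced in this text, so the following NumPy sketch only follows the prose: $\mathrm{IDMMD}_M$ is plain (squared) MMD, and $\mathrm{IDMMD}_C^{(c)}$ is the squared distance between weighted class centers, with one-hot weights $y_i^{(c)}$ on source samples and predicted probabilities $\hat{y}_j^{(c)}$ on target samples. Several parts are assumptions rather than the paper's formulas: the Gaussian kernel, normalizing the weights to sum to one, and the dynamic-factor rule, which is a hypothetical stand-in that only reproduces the stated property that a class with a larger $\mathrm{IDMMD}_C^{(c)}$ receives a larger factor. All function names are hypothetical.

```python
import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian kernel matrix between the rows of a and b (kernel choice is an assumption)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def weighted_mmd2(xs, ws, xt, wt, sigma=1.0, eps=1e-12):
    """Squared RKHS distance between the weighted centers sum_i ws_i phi(xs_i) and
    sum_j wt_j phi(xt_j); weights are normalized to sum to 1 (assumption)."""
    ws = ws / max(ws.sum(), eps)
    wt = wt / max(wt.sum(), eps)
    return float(ws @ rbf(xs, xs, sigma) @ ws
                 + wt @ rbf(xt, xt, sigma) @ wt
                 - 2.0 * ws @ rbf(xs, xt, sigma) @ wt)


def idmmd_sketch(xs, ys_onehot, xt, yt_prob, sigma=1.0, eps=1e-12):
    """Hypothetical IDMMD: marginal MMD plus instance-weighted class-wise terms.

    ys_onehot -- (n_s, C) one-hot source labels y_i^(c)
    yt_prob   -- (n_t, C) predicted target probabilities y_hat_j^(c) (soft labels)
    """
    marginal = weighted_mmd2(xs, np.ones(len(xs)), xt, np.ones(len(xt)), sigma)
    conditional = np.array([
        weighted_mmd2(xs, ys_onehot[:, c], xt, yt_prob[:, c], sigma)
        for c in range(ys_onehot.shape[1])
    ])
    # HYPOTHETICAL dynamic factors (not the paper's Eq. (7)): each class factor is
    # proportional to its own discrepancy, so a larger IDMMD_C gives a larger factor,
    # and the marginal factor takes the remaining weight.
    total = marginal + conditional.sum() + eps
    mu_c = conditional / total
    mu_m = 1.0 - mu_c.sum()
    value = mu_m * marginal + float(mu_c @ conditional)
    return value, mu_m, mu_c


rng = np.random.default_rng(0)
n = 50
# Two classes; class 1 of the target is shifted relative to the source, class 0 is not.
xs = np.vstack([rng.normal(0.0, 0.3, (n, 2)), rng.normal(5.0, 0.3, (n, 2))])
xt = np.vstack([rng.normal(0.0, 0.3, (n, 2)), rng.normal(6.0, 0.3, (n, 2))])
ys = np.repeat(np.eye(2), n, axis=0)
yt = np.repeat(np.array([[0.9, 0.1], [0.2, 0.8]]), n, axis=0)  # made-up soft labels
value, mu_m, mu_c = idmmd_sketch(xs, ys, xt, yt)
print(mu_c[1] > mu_c[0])  # True: the shifted class receives the larger factor
```

Because each target sample contributes to the class-$c$ center in proportion to its predicted probability, a sample that is only weakly assigned to class $c$ moves that center very little, which is the intended damping of misclassification.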

General Procedure of DDATN
As mentioned above, DDATN contains two parts: supervised learning and DDA. The total loss function of DDATN is therefore defined as

$$L = L_C + \lambda\,\mathrm{IDMMD}(\Omega_s,\Omega_t)$$

where $L_C$ is the supervised loss and $\lambda$ is the trade-off factor.
The procedures of DDATN are presented in Figure 2 and summarized as follows.
(1) Dataset generation. The source and target domain signals are segmented and standardized to form

Dataset Description
In this section, the CWRU [26] bearing dataset (CW) and the bearing dataset from our laboratory (OL) are used to verify the effectiveness of DDATN. The CWRU test rig, shown in Figure 3, consists of a drive motor (left), a torque transducer and encoder (middle), and a dynamometer (right). The test bearing is installed at the output side of the motor and supports the motor shaft. The test covers four health conditions and four fault diameters (0.18, 0.36, 0.53, and 0.71 mm), and it is conducted under four loads (0, 1, 2, and 3 hp) with sampling rates of 12 kHz and 48 kHz. Details of the 12 kHz data used in this study are listed in Table 1.
The bearing test rig of our laboratory is shown in Figure 4. The rig is driven by a motor, and power is transmitted through a belt drive to the shaft, which is supported by the test bearing. A loading device exerts a radial force on the shaft to simulate the bearing load. In this test, inner and outer race faults of size 0.5 mm are introduced into the test bearing by wire-electrode cutting. Acceleration signals are collected at a sampling rate of 12 kHz. Details of this dataset are listed in Table 2.
For each health condition, 100 samples are segmented from the original vibration signals. Therefore, 300 and 1200 (300 for each speed) samples are obtained from the CW and OL bearing datasets, respectively. Each sample has a length of 2048 points.
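The segmentation step can be sketched as follows: 100 windows of 2048 points are cut from one vibration record and each window is standardized. The text does not state the stride or the exact standardization, so evenly spaced (possibly overlapping) window starts and per-sample z-scoring are assumptions, as are the function name and the synthetic 120,000-point record.

```python
import numpy as np


def segment_and_standardize(signal, n_samples: int = 100, length: int = 2048) -> np.ndarray:
    """Cut n_samples windows of `length` points from a 1-D vibration record and
    z-score each window.

    Window starts are spread evenly over the record, so consecutive windows overlap
    whenever len(signal) < n_samples * length (an assumption; the text does not
    state the stride).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < length:
        raise ValueError("signal is shorter than one sample window")
    starts = np.linspace(0, len(signal) - length, n_samples).astype(int)
    windows = np.stack([signal[s:s + length] for s in starts])
    # Per-sample standardization (assumed form of "standardized").
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    return (windows - mean) / np.maximum(std, 1e-12)


record = np.random.default_rng(0).normal(size=120_000)  # stand-in for one vibration record
samples = segment_and_standardize(record)
print(samples.shape)  # (100, 2048)
```

Applying this to each of the three health conditions of one operating setting yields the 300 samples per setting described above.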

Comparison Setting
Thirty-six cross equipment tasks, listed in Table 3, are conducted to verify the effectiveness of DDATN. For each target domain dataset, half of the samples are used for training and the rest serve as the testing dataset. In Table 3, S denotes the source domain dataset and T denotes the target domain training dataset. OL500 (150) denotes the OL bearing data at a speed of 500 r/min with 150 samples (50 samples for each health condition). CW0.18_1 (300) denotes the CWRU data with a 0.18 mm fault diameter and 1 hp load, with 300 samples (100 samples for each health condition).
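The half-and-half target split can be sketched as a per-class split, so that each health condition contributes equally to the training and testing parts (for example, 150 of 300 samples with 50 per condition, as in OL500 (150)). Random selection within each class and the function name `split_target_half` are assumptions; the labels of the training half are not used to train DDATN in the unsupervised setting.

```python
import numpy as np


def split_target_half(x: np.ndarray, y: np.ndarray, seed: int = 0):
    """Per-class 50/50 split of a target domain dataset.

    Half of each health condition goes to the target training set T (its labels
    are not used for training in unsupervised DA) and the rest to the test set.
    Random per-class selection is an assumption.
    """
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        half = len(idx) // 2
        train_idx.extend(idx[:half])
        test_idx.extend(idx[half:])
    train_idx, test_idx = np.array(train_idx), np.array(test_idx)
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]


# 300 target samples: 3 health conditions x 100 samples of 2048 points.
x = np.zeros((300, 2048))
y = np.repeat([0, 1, 2], 100)
x_tr, y_tr, x_te, y_te = split_target_half(x, y)
print(len(y_tr), np.bincount(y_tr))  # 150 [50 50 50]
```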
The structures of the feature extractor $G_f$ and classifier $G_y$ are presented in Table 4, where Conv1D denotes a 1D convolutional layer, MP1D a 1D max pooling layer, and FC a fully connected layer. $G_f$ and $G_y$ are trained with the Adam optimizer (learning rate = 0.001, β1 = 0.9, β2 = 0.999). The trade-off factor λ is set to 1.
The comparison methods (Methods 1 to 5) are detailed below; they all use the same CNN structure as DDATN.

Results and Discussion
The experiments are conducted on a computer with two E5-2630 v3 CPUs, an NVIDIA GeForce RTX 2080 Ti GPU (11 GB memory), and 64 GB of RAM. To avoid the influence of randomness, each task is repeated 10 times. The mean accuracies, standard deviations, and training and testing times are listed in Table 5, and the overall accuracy curves of these methods are presented in Figure 5. In addition, the average accuracy and standard deviation of each method over all tasks are reported under Avg.
The comparison shows that IWC, IWCM, and DDATN perform better than the other methods. DDC has the worst accuracy in almost all tasks except task 14. FTNN and DTN are both derived from DDC but improve on it in different directions. FTNN extends single-layer adaptation to multi-layer adaptation and introduces pseudo label learning, which proves effective in this case. DTN extends MDA to JDA and achieves a higher average accuracy than FTNN. IWC achieves an average accuracy of 91.24% with a standard deviation of 11.16%, which indicates that the proposed conditional distribution discrepancy metric is effective and robust. IWCM performs better than IWC in all tasks, which confirms that extending CDA to JDA is beneficial. In addition, IWCM can be regarded as a variant of DTN that replaces the pseudo label strategy with the instance-weighted strategy when calculating the conditional distribution discrepancy, so the comparison between IWCM and DTN demonstrates the effectiveness of the instance-weighted strategy.
Overall, DDATN outperforms the other methods in most tasks and achieves the highest average accuracy of 98.43% with the lowest standard deviation, which indicates its superior effectiveness and robustness. In tasks 3, 7, 13, 21, and 23, DDATN does not perform best, but its accuracy is very close to the highest one. In tasks 4 (CW0.18_1 to OL1400) and 8 (CW0.18_2 to OL1400), the accuracies of DDATN are relatively low (67.07% in both tasks), which indicates that DDATN cannot achieve very high accuracy in some transfer tasks. However, its accuracies in these tasks are still higher than those of the other methods. Thus, in some difficult transfer tasks DDATN may not achieve very high accuracy, but it still improves performance to some degree.
In summary, the comparison indicates that the proposed conditional distribution discrepancy metric is effective and robust, and that extending MDA, CDA, and JDA to DDA further improves cross-domain fault diagnosis performance.

Feature Visualization
All the methods used in this experiment are feature-based transfer learning methods. To further evaluate the feature alignment performance of DDATN, t-distributed stochastic neighbor embedding (t-SNE) [28] is used for feature visualization, and tasks 33 and 36 are selected. For DDC, IWC, IWCM, and DDATN, the features are visualized at the Flatten layer, whereas the FC_2 layer is used for FTNN and DTN. In Figures 6 and 7, each legend entry consists of the bearing health condition (outside the brackets) and the domain label (inside the brackets); for example, IR (T) denotes an inner race fault sample from the target domain. Source and target domain samples are marked with circles and triangles, respectively. Colors represent health conditions: blue for Normal (N), red for Inner Race fault (IR), and green for Outer Race fault (OR).
In task 33, the features of IWC, IWCM, and DDATN show good fusion of the source and target domains as well as great discriminability with respect to bearing health conditions. The features of the other methods still have relatively good interclass discriminability, but the fusion of the source and target domains is poor. In particular, with the features of DDC, the source and target domain samples are linearly separable.
In task 36, the features of IWC, IWCM, and DDATN still outperform those of the other methods and retain good fusion of the source and target domains. However, the interclass separability of IWC and IWCM degrades significantly, whereas DDATN still holds excellent interclass separability. For DDC, FTNN, and DTN, both the interclass separability and the intraclass aggregation are poor, and little fusion of the source and target domains can be observed. The feature visualization demonstrates that DDATN can effectively adapt the target domain feature distribution to that of the source domain: target samples are accurately aggregated into the corresponding source clusters, and the extracted features show good fusion of domains and excellent discriminability of bearing health conditions.

Dynamic Distribution Adaptation Based Transfer Network
In this paper, DDA is introduced to improve cross domain bearing fault diagnosis performance. The framework of DDATN is shown in Figure 1. It consists of two parts: DDA and supervised learning. The DDA part constrains the feature extractor $G_f$ to extract domain-invariant features by minimizing the proposed dynamic distribution discrepancy IDMMD. The supervised learning part, realized by minimizing the supervised loss $L_C$, guides $G_f$ to extract features that are discriminative for bearing health conditions and trains an effective classifier $G_y$ that can accurately diagnose bearing faults from these features.

Figure 3 CWRU bearing test rig

Method 1 (DDC): Deep domain confusion (DDC) [27], proposed by Tzeng et al., is a deep transfer learning method that uses MMD for single-layer feature alignment.
Method 2 (FTNN): The feature-based transfer neural network (FTNN) [19], proposed by Yang et al., applies MMD-based multi-layer feature alignment and pseudo label learning to transfer fault diagnosis knowledge from laboratory bearings to locomotive bearings.
Method 3 (DTN) [23]: The deep transfer network (DTN), proposed by Han et al., is a cross-domain fault diagnosis method that uses MMD and CMMD to evaluate the marginal and conditional distribution discrepancies, respectively; the two discrepancies are given equal weights for single-layer joint distribution adaptation.
Method 4 (IWC): IWC is a variant of DDATN that uses only the conditional part of IDMMD to evaluate the domain divergence.
Method 5 (IWCM): IWCM is another variant of DDATN that allocates equal weights to the marginal and conditional parts of IDMMD.

Conclusions
This article proposes a novel unsupervised domain adaptation method termed DDATN, which introduces DDA for cross domain intelligent bearing fault diagnosis. DDA is realized by the proposed IDMMD, which combines a novel dynamic factor estimation method with an instance-weighted conditional distribution discrepancy metric. Cross domain bearing fault diagnosis experiments were conducted to verify the effectiveness of DDATN, and DDATN achieved better performance than other state-of-the-art cross domain fault diagnosis methods. The results demonstrate that the proposed conditional distribution discrepancy metric and dynamic factor calculation method are effective and robust for DDA. Therefore, DDATN can effectively adapt the target domain feature distributions to that of the source domain for better cross domain bearing fault diagnosis.

Table 1
Details of the CWRU bearing dataset

Table 2
Details of the OL bearing dataset

Table 3
Cross equipment tasks

Table 4
Structures of G f and G y

Table 5
Comparison results (the highest accuracy in each row is indicated in bold)