  • Original Article
  • Open access

A Dual-Task Learning Approach for Bearing Anomaly Detection and State Evaluation of Safe Region


Predictive maintenance has emerged as an effective tool for curbing maintenance costs, yet prevailing research predominantly concentrates on the abnormal phases. Within the ostensibly stable healthy phase, the reliance on anomaly detection to preempt equipment malfunctions faces the challenge of sudden anomaly discernment. To address this challenge, this paper proposes a dual-task learning approach for bearing anomaly detection and state evaluation of safe regions. The proposed method transforms the execution of the two tasks into an optimization issue of the hypersphere center. By leveraging the monotonicity and distinguishability pertinent to the tasks as the foundation for optimization, it reconstructs the SVDD model to ensure equilibrium in the model's performance across the two tasks. Subsequent experiments verify the proposed method's effectiveness, which is interpreted from the perspectives of parameter adjustment and enveloping trade-offs. In the meantime, experimental results also show two deficiencies in anomaly detection accuracy and state evaluation metrics. Their theoretical analysis inspires us to focus on feature extraction and data collection to achieve improvements. The proposed method lays the foundation for realizing predictive maintenance in a healthy stage by improving condition awareness in safe regions.

1 Introduction

Equipment maintenance techniques have so far developed through three stages: reactive maintenance, preventive maintenance, and predictive maintenance [1]. Predictive maintenance effectively reduces maintenance costs [2], and its attainment relies on the awareness of state changes. Since the state change in the abnormal stage is more significant than that in the healthy stage, existing research on predictive maintenance mainly focuses on the abnormal stage. Before anomalies occur, anomaly detection provides a qualitative means of condition monitoring, which can reduce accidents before equipment damage [3]. Given its pivotal role in safeguarding the operation of industrial equipment and systems [4], anomaly detection has garnered extensive research attention. This paper organizes anomaly detection techniques into three groups: reconstruction-based methods, classification-based methods, and distance-based methods.

Reconstruction-based methods define anomalies by analyzing deviations in domain mapping and the data reconstruction process. Huang et al. [5] advanced a memory residual regression autoencoder to improve the detection accuracy; it used the reconstruction errors and surprisal values to indicate the abnormal condition of the bearing. Jiang et al. [6] proposed a generative adversarial network to realize sample reconstruction and common feature learning, which overcomes the problem of data imbalance. Mao et al. [7] presented a new self-adaptive mapping strategy for incipient fault online detection. An autoencoder was used to extract features in data reconstruction, and a classifier was introduced to distinguish these features. An adaptive threshold method was proposed in Ref. [8] based on the extreme theory, and the network reconstruction residuals were used for anomaly detection and location. However, the reconstruction error used by these methods is generally insensitive to incipient anomalies [5].

Classification-based methods treat anomaly detection as a classification task, including one-class classification and two-class classification. The one-class classification method arises from the reality that the collected operation data often only includes healthy data. Vos et al. [9] combined a long short-term memory network and a one-class support vector machine to solve this problem for vibration data. Combining the two methods can also be used for data sequences with variable length [10]. Zhang et al. [11] advanced an end-to-end algorithm, which designed a novel loss function to jointly learn shapelets and support vector data description (SVDD) decision boundary. Zhao et al. proposed a dynamic radius SVDD [12] to detect the anomalies of aircraft engines; the angle calculation was introduced in feature space to solve the neglected irregularities of the hypersphere. When healthy and abnormal data are accessible, two-class classification methods can be applied for anomaly detection. Song et al. [13] used a meta-learning-based method to achieve few-shot anomaly detection. In Ref. [14], the oil and bearing temperatures were treated as two univariate variables to construct a dual support vector machine model and analyze the adaptive threshold of the binary classification model. In addition, the time-frequency analysis method can detect anomalies by diagnosing the specific fault [15]; it considers the equipment healthy when no faults are identified.

Distance-based methods detect anomalies by measuring distances in a designated metric space. Montechiesi et al. [16] advanced a modified artificial immune system to achieve anomaly detection through the similarity calculation between antigens based on Euclidean distance minimization. Liu et al. [17] decreased the false alarm rate of anomalies based on the information fusion of two distance dimensions: the spatial dimension considered the differences between different locations simultaneously. In contrast, the time dimension considered the differences between actual and prediction values at different times in the same location.

Whichever of the three types of methods is used, anomaly detection amounts to constructing a safe region. The methods differ in how the safe region is constructed and in the metric space in which it is constructed. As illustrated in Figure 1, since the monitoring data in the healthy stage is generally stable, the safe region is treated as a black box whose state changes before the anomalies occur are ignored. In other words, the health state is assumed to be immutable in a safe region. However, this qualitative treatment necessarily makes the identification of anomalies a sudden event. Correspondingly, preventive measures are taken to deal with this uncertainty, especially for scenarios where anomalies are not allowed or the abnormal stage is very short. These measures bring many disadvantages, including increased downtime, maintenance costs, and safety risks.

Figure 1 Difference between the proposed scheme and the traditional scheme

To deal with these problems, quantifying the time-varying health state is considered in implementing anomaly detection. That is, the health state evaluation in the safe region (HSESR) is introduced into the construction of the safe region. The state information helps us understand the operating status of equipment better so that reasonable maintenance can be performed to extend equipment life, reduce safety risks, and improve production efficiency. For HSESR, the main difficulty is that the health indicators are generally stable and cannot reflect the changing trend of the health state. Based on the irreversibility of mechanical degradation [18], monotonic feature extraction is the key problem of HSESR.

Boundaries tend to tightly wrap the healthy data in constructing the safe region to prevent misjudgment. The anomalies of bearings always lead to significant changes in the distribution of monitoring data. This change makes abnormal data distinguishable from healthy data. In these cases, a tight envelope is no longer necessary, and it is very cost-effective to release the envelope for additional gain without affecting anomaly detection. Following this idea, the improvement of feature monotonicity is set as the ‘additional gain’ for HSESR.

Taking advantage of the good feature extraction performance in kernel space, we propose a new scheme of anomaly detection based on the variant of the classic SVDD modeling method. By introducing the time-varying health state into the construction of the safe region, the proposed scheme prevents the suddenness of anomalies without compromising anomaly detection. In this way, anomaly detection and HSESR are unified under the same framework. The contributions of this paper are summarized below.

  1) This paper opens a new way of thinking to deal with the suddenness of anomalies, that is, exploring trend indicators to characterize the state information in a safe region.

  2) This paper derives the solution principle of the univariate SVDD model under the condition of a fixed hypersphere center.

  3) A new framework is proposed that unifies anomaly detection and HSESR.

2 Methodology

The proposed method is inspired by SVDD, which has been proven effective in detecting anomalies. It constructs a hypersphere in a high-dimensional kernel space as the boundary of the safe region, and anomaly detection is achieved based on the distances defined in the kernel space. To achieve anomaly detection and HSESR simultaneously, we reorganize the hypersphere-solving process. In particular, the collaborative optimization of the center and boundary is changed to two independent optimizations. The center optimization obtains the health indicator (HI) sequence to evaluate the healthy state, and the boundary optimization combines the optimized center for anomaly detection. The framework of the proposed method is shown in Figure 2. It consists of four parts: data preparation, hypersphere center optimization, boundary solving, and the implementation of anomaly detection and HSESR. The following assumptions must be satisfied for application: (1) the chronological order of the data must be clear; (2) healthy and abnormal data can be distinguished well; (3) degradation features can be captured in a safe region.

Figure 2 Framework of the proposed method

2.1 Data Preparation

Data preparation is to obtain the data required by the model. It consists of three steps: dataset division, feature extraction, and feature selection.

Dataset division splits the data into a training dataset and a test dataset. Since healthy data accounts for the majority, whereas abnormal data is rare in real industrial data, half of the abnormal data are randomly assigned to the test data to prevent either dataset from being too small. At the same time, an equal number of healthy samples is randomly drawn as test data, and the remaining data is regarded as training data. Feature extraction calculates the features that reflect the health condition of the bearing. Feature selection screens the features that meet specific requirements. To present state information alongside anomaly detection, two feature indicators need to be determined. One emphasizes the realization of anomaly detection, so the ability to distinguish between healthy and abnormal phases is key for this feature. The other focuses on reflecting changes in the health state, so monotonicity is prioritized for it, because the degradation process of mechanical components is theoretically irreversible [18] and the true inherent health condition is commonly assumed to deteriorate over time. The monotonicity is generally defined as follows:

$$Mon1 = \frac{{\left| {\sum\limits_{{K - 1}} {\delta \left( {H(t + 1) - H(t)} \right)} - \sum\limits_{{K - 1}} {\delta \left( {H(t) - H(t + 1)} \right)} } \right|}}{{K - 1}},$$

where K is the total number of samples; δ(·) is the simple unit step function; and H(t) refers to the value of HI at time t.
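
As a minimal sketch of Eq. (1), the metric can be computed by counting rising and falling steps of the HI sequence. The function name `mon1` is hypothetical, and the sketch assumes that δ(x) equals 1 for x > 0 and 0 otherwise, so that equal neighbouring values are counted in neither sum.

```python
import numpy as np

def mon1(hi):
    """Sign-based monotonicity of Eq. (1) for a 1-D HI sequence ordered by time."""
    hi = np.asarray(hi, dtype=float)
    if hi.size < 2:
        raise ValueError("need at least two samples")
    diffs = np.diff(hi)                    # H(t+1) - H(t) for every adjacent pair
    n_up = np.count_nonzero(diffs > 0)     # steps counted by the first sum
    n_down = np.count_nonzero(diffs < 0)   # steps counted by the second sum
    return abs(n_up - n_down) / (hi.size - 1)
```

Because only the sign of each adjacent difference is used, a small dip and a large dip contribute equally to the result.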

However, Eq. (1) focuses merely on local monotonicity and neglects the influence of each point on the global monotonicity. Figure 3 provides two examples, and their corresponding metric results are shown in Table 1. According to Eq. (1), Line 2 of Figure 3(a) is regarded as having better monotonicity than Line 1, because only point C weakens its monotonicity while points b and d weaken that of Line 1. This contradicts what can be observed. The mistake stems from neglecting the effect of points D, E, and F on the monotonicity from a global view.

Figure 3 Two cases for monotonicity evaluation

Table 1 The monotonicity metric results of the lines in Figure 3

Additionally, the monotonicity evaluation of each point in Eq. (1) is qualitative and cannot precisely reflect differences in monotonicity. For instance, the two lines in Figure 3(b) have the same monotonicity according to Eq. (1). In fact they differ, because the two local minimum points R and r influence the monotonicity to different degrees.

To accurately describe the monotonicity, this paper introduces the inverse number as a supplement of Eq. (1). In mathematics, for a real array A, which contains N numbers, if i < j and A[i] > A[j], (A[i], A[j]) is called an inverse pair. The total number of inverse pairs in an array is called the inverse number. The inverse number considers the monotonicity from the perspective of adjacent points and the overall relationship among all points. On this basis, a new evaluation index for monotonicity has been devised, transforming the inverse number into a metric that falls within the [0,1] scope.

$$ Mon2 = 1 - \frac{{2 \times I({\textbf{A}})}}{{N\left( {N - 1} \right)}}, $$

where I(A) returns the inverse number of array A.
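
The inverse number and Eq. (2) can be sketched as follows. The merge-sort counting strategy and the names `inverse_number` and `mon2` are illustrative choices, not taken from the paper; any method that counts the pairs with i < j and A[i] > A[j] gives the same value.

```python
def _sort_and_count(a):
    """Return (sorted copy of a, number of inverse pairs in a) via merge sort."""
    if len(a) < 2:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = _sort_and_count(a[:mid])
    right, inv_right = _sort_and_count(a[mid:])
    merged, inv_split = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left
            merged.append(right[j])
            j += 1
            inv_split += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + inv_split

def inverse_number(a):
    """Number of pairs (i, j) with i < j and a[i] > a[j]."""
    return _sort_and_count(list(a))[1]

def mon2(a):
    """Monotonicity metric of Eq. (2), in the range [0, 1]."""
    n = len(a)
    if n < 2:
        raise ValueError("need at least two samples")
    return 1.0 - 2.0 * inverse_number(a) / (n * (n - 1))
```

A strictly increasing sequence has no inverse pairs and scores 1, while a strictly decreasing one has N(N - 1)/2 inverse pairs and scores 0. A single late drop such as `[1, 2, 3, 4, 5, 0]` forms an inverse pair with every earlier point, which is the global effect that Eq. (1) does not capture.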

The metric values for the four lines depicted in Figure 3 are presented in Table 1, corroborating the preceding analysis. In Figure 3(a), points D, E, and F all produce inverse pairs in Line 2, while only points c and e do so in Line 1. Thus, Mon2 yields the correct judgment because it accounts for the global monotonicity. In addition, Mon2 provides a quantitative assessment of monotonicity. In Figure 3(b), point R produces one more inverse pair than point r, which allows us to distinguish the subtle difference in the monotonicity of the two lines. Accordingly, the inverse number is used to select the feature indicator that best reflects the health state change in a safe region.

After determining the feature indicators, the proposed method fuses them into the HI. We draw on the idea of SVDD to map input features to the hyperspace and utilize the distances in the hyperspace to construct HIs. Since only the envelope is concerned, the traditional SVDD model is achieved by balancing two core elements: the center a and the radius R. The model construction of the proposed method is transformed into independent computations to consider both anomaly detection and HSESR. Accordingly, it is attained by hypersphere center optimization and boundary construction.

2.2 Hypersphere Center Optimization

Anomaly identification and state assessment depend on the HI array, while the HI array has a one-to-one correspondence with the hypersphere center. Therefore, the center optimization is a problem of HI array optimization.

2.2.1 Optimization Model

Let the sample feature set be denoted as {si}. Referring to the traditional SVDD model, the hypersphere center a is defined [19]:

$$ {\varvec{a}}\, = \,\sum\limits_{i} {\gamma_{i} \phi (s_{i} ),\quad {\text{s.t}}.\,\sum\limits_{i} {\gamma_{i} = 1,} } $$

where γi is a weight factor that reflects the contribution of the feature si to the center, and ϕ is a mapping function that implicitly maps the data to the feature space.

Then, the HI array is expressed by the distances between samples and the hypersphere center with

$$d_{i}^{2} = (\phi (s_{i} ) - {\varvec{a}})^{2} .$$

Substituting Eq. (3) into Eq. (4), we have

$$d_{i}^{2} = k(s_{i} ,s_{i} ) - 2\sum\limits_{j} {\gamma_{j} } k(s_{i} ,s_{j} ) + \sum\limits_{ij} {\gamma_{i} \gamma_{j} } k(s_{i} ,s_{j} ),$$

where k(si, sj) is the kernel function, and the Gaussian RBF kernel is adopted in this article.

$$k\,\left( {x_{i} ,x_{j} } \right) = \exp \,\left( { - \frac{{\parallel x_{i} - x_{j} \parallel ^{2} }}{{2\sigma ^{2} }}} \right),{\text{ }}\sigma > 0,$$

where σ is the bandwidth, controlling the radial range of action.
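
The kernel expansion of the squared distance can be evaluated directly from the kernel matrix. The following NumPy sketch assumes that the features are stored as an array `S` of shape (N, d) and that the weights `gamma` sum to 1; the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian RBF kernel matrix between the rows of X (n, d) and Y (m, d)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_dist = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def squared_distances(S, gamma, sigma):
    """d_i^2 between each phi(s_i) and the center a = sum_j gamma_j phi(s_j)."""
    gamma = np.asarray(gamma, dtype=float)
    K = rbf_kernel(S, S, sigma)
    self_term = np.diag(K)             # k(s_i, s_i), equal to 1 for the RBF kernel
    cross_term = K @ gamma             # sum_j gamma_j k(s_i, s_j)
    center_term = gamma @ K @ gamma    # sum_ij gamma_i gamma_j k(s_i, s_j)
    return self_term - 2.0 * cross_term + center_term
```

Putting all the weight on one sample places the center on that sample's image, so its own distance is zero, which gives a quick sanity check of the expansion.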

If we denote the HI array as H, then $H = \{d_1^2, d_2^2, \ldots, d_N^2\}$, where N is the number of samples. Apart from the hyperparameter σ, H is determined by the group of weight factors {γ1, γ2, …, γN}. Therefore, the problem of HI array optimization is further expressed by the implicit function model G(γ1, γ2, …, γN).

The optimization of the model includes monotonicity optimization with only healthy data and distinguishability optimization with healthy and abnormal data.

  (1) Optimization Model Considering Monotonicity

    To reflect the condition degradation inside the safe region, monotonicity is utilized for the model construction. Denote the feature set of healthy training data as $\{s_i^h,\ i = 1, 2, \ldots, NH\}$, where the superscript h refers to the corresponding parameters of the healthy data. Substituting them into Eqs. (3) and (5), we have

    $${\varvec{a}}_{1} \, = \,\sum\limits_{i} {\gamma_{i}^{h} \phi (s_{i}^{h} )} ,\quad {\text{s}}.{\text{t}}.,\,\sum\limits_{i} {\gamma_{i}^{h} = 1,}$$
    $$\left( {d_{i}^{h} } \right)^{2} = k\left( {s_{i}^{h} ,s_{i}^{h} } \right) - 2\sum\limits_{j} {\gamma _{j}^{h} } k\left( {s_{i}^{h} ,s_{j}^{h} } \right) + \sum\limits_{{ij}} {\gamma _{i}^{h} \gamma _{j}^{h} } k{\mkern 1mu} \left( {s_{i}^{h} ,s_{j}^{h} } \right).$$

    Then, the HI array is obtained as $H^h = \{(d_1^h)^2, (d_2^h)^2, \ldots, (d_{NH}^h)^2\}$. The inverse number is applied as the monotonicity metric, and its calculation function is denoted as I(·); the optimization model is expressed as

    $$G_{1} \,\left( {\gamma _{1}^{h} ,\gamma _{2}^{h} ,...,\gamma _{{NH}}^{h} } \right) = \min I(H^{h} ),\quad \text{s.t.},{\mkern 1mu} \sum\limits_{i} {\gamma _{i}^{h} } {\mkern 1mu} = {\mkern 1mu} 1.$$
  (2) Optimization Model Considering Distinguishability

    Distinguishability is also introduced for the model construction to describe the difference between healthy and abnormal data. In addition to the healthy data, rare but valuable abnormal data is exploited to indirectly improve the distinguishability of the boundary through center optimization. Specifically, the center described by the healthy data is kept as far away from the abnormal data as possible. Denote the feature set of abnormal training data as $\{s_p^a,\ p = 1, 2, \ldots, NA\}$, where the superscript a refers to the corresponding parameters of the abnormal data. The distance from each abnormal point to the center can be expressed as

    $$\left( {d_{p}^{a} } \right)^{2} = {\mkern 1mu} \left( {\phi (s_{p}^{a} ) - \,\varvec{a}_{2} } \right)^{2} ,$$
    $$\varvec{a}_{2} \, = \sum\limits_{i} {\gamma _{i}^{a} \phi \,\left( {s_{i}^{h} } \right),\quad } {\text{s}}.{\text{t}}.,{\mkern 1mu} \sum\limits_{i} {\gamma _{i}^{a} } {\mkern 1mu} = {\mkern 1mu} 1.$$

    Substituting Eq. (11) into Eq. (10), we have

    $$\left( {d_{p}^{a} } \right)^{2} = k{\mkern 1mu} \left( {s_{p}^{a} ,s_{p}^{a} } \right) - 2\sum\limits_{i} {\gamma _{i}^{a} } k{\mkern 1mu} \left( {s_{i}^{h} ,s_{p}^{a} } \right) + \sum\limits_{{ij}} {\gamma _{i}^{a} \gamma _{j}^{a} } k{\mkern 1mu} \left( {s_{i}^{h} ,s_{j}^{h} } \right).$$

    Their HI array is obtained as $H^a = \{(d_1^a)^2, (d_2^a)^2, \ldots, (d_{NA}^a)^2\}$. The abnormal points are expected to be as far away from the safe region as possible to achieve distinguishability. Since the weights $\gamma_i^a$ are attached to the NH healthy samples that describe the center, the optimization model is expressed as

    $$G_{2} \,\left( {\gamma _{1}^{a} ,\gamma _{2}^{a} ,...,\gamma _{{NH}}^{a} } \right){\mkern 1mu} \, = {\mkern 1mu} \max {\mkern 1mu} \left( {\sum {H^{a} } } \right),\quad \text{s.t.},{\mkern 1mu} \sum\limits_{i} {\gamma _{i}^{a} } {\mkern 1mu} = {\mkern 1mu} 1.$$

2.2.2 Model Solution and Center Expression

  (1) Model Solving Based on Genetic Algorithm

    The models of Eqs. (9) and (13) are multivariate implicit function optimization problems. We introduce the genetic algorithm to automatically seek the optimal solution based on natural selection and genetic mechanisms. The optimizations are achieved by the following steps.

    Step 1. Parameter initialization. Suppose the initial population size is M; the j-th individual $P_j$ is expressed as

    $$\begin{gathered} P_{j} = [\gamma_{1} ,\gamma_{2} , \cdots ,\gamma_{N} ]_{j} = [\gamma_{1j} ,\gamma_{2j} , \cdots ,\gamma_{Nj} ],\forall j = 1,2, \cdots ,M, \hfill \\ {\text{s.t.}}\,\sum\limits_{i = 1}^{N} {\gamma_{ij} = 1} . \hfill \\ \end{gathered}$$

    Step 2. HI calculation. Based on the M groups of Pj, we can get the distance set {H1, H2, …, HM} by Eq. (5).

    Step 3. Fitness calculation. The fitness function of each individual is calculated as the index of its fitness.

    Step 4. Convergence strategy. Because the result fluctuates greatly with σ, the maximum number of iterations is used as the stop criterion instead of a specific threshold.

    Step 5. Survival of the fittest through natural selection. After selection, crossover, and mutation, a more adaptable population is obtained for further evolution until the stop criterion is met.

  (2) Fusion Representation of the Hypersphere Center

    After applying the genetic algorithm to solve the optimizations for monotonicity and distinguishability, we obtain two sets of optimized parameters. They are further used for two optimized centers with Eqs. (7) and (11). Subsequently, the weight coefficient coef is introduced to balance them, and we obtain the final hypersphere center:

    $$\varvec{a} = f(\varvec{a}_{1} ,\varvec{a}_{2} ,coef) = \,\left[ {coef \cdot P^{h} + (1 - coef) \cdot P^{a} } \right] \cdot \varvec{\phi} ^{h} ,$$

    where $\varvec{\phi}^{h} = [\phi (s_{1}^{h}), \phi (s_{2}^{h}), \ldots, \phi (s_{NH}^{h})]^{\text{T}}$, the superscript T denotes the transpose, and

    $$P^{h} = \mathop {\arg }\limits_{{\gamma _{1}^{h} ,\gamma _{2}^{h} ,...,\gamma _{{NH}}^{h} }} \min I\,\left( {H^{h} } \right),$$
    $$P^{a} = \mathop {\arg }\limits_{{\gamma _{1}^{a} ,\gamma _{2}^{a} ,...,\gamma _{{NH}}^{a} }} \max \,\left( {\sum {H^{a} } } \right).$$

    The coef needs to be manually adjusted in the range [0,1] according to the characteristics of the data. The adjustment is to balance the monotonicity and distinguishability of HI. When the distinguishability of feature 1 is good, a large coef is feasible to improve the monotonicity of HI. Otherwise, a smaller coef is necessary to avoid the impact on anomaly detection.
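
The following sketch illustrates only Steps 1 to 3 and the fusion step under stated assumptions; it is not the authors' implementation. Selection, crossover, and mutation are omitted, so it amounts to picking the best individuals of a random initial population. Drawing the weights from a Dirichlet distribution (non-negative, summing to 1) is an assumption, since the text only requires the sum constraint, and all function names are hypothetical. The rows of `S_h` are assumed to be healthy features in chronological order.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian RBF kernel matrix between the rows of X (n, d) and Y (m, d)."""
    sq = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def center_distances(X, S_h, gamma, sigma):
    """Squared kernel distances from rows of X to the center sum_i gamma_i phi(S_h[i])."""
    K_xh = rbf_kernel(X, S_h, sigma)
    K_hh = rbf_kernel(S_h, S_h, sigma)
    return 1.0 - 2.0 * K_xh @ gamma + gamma @ K_hh @ gamma   # k(x, x) = 1 for RBF

def inverse_number(h):
    """Count pairs i < j with h[i] > h[j] (O(N^2), adequate for a sketch)."""
    h = np.asarray(h)
    return int(np.sum(np.triu(h[:, None] > h[None, :], k=1)))

def best_individuals(S_h, S_a, sigma, m=50, seed=0):
    """Steps 1-3 only: random population, HI calculation, fitness ranking."""
    rng = np.random.default_rng(seed)
    pop = rng.dirichlet(np.ones(len(S_h)), size=m)   # each row sums to 1
    mono = [inverse_number(center_distances(S_h, S_h, g, sigma)) for g in pop]
    dist = [center_distances(S_a, S_h, g, sigma).sum() for g in pop]
    P_h = pop[int(np.argmin(mono))]   # fewest inverse pairs in the healthy HI array
    P_a = pop[int(np.argmax(dist))]   # largest summed distance to abnormal data
    return P_h, P_a

def fuse(P_h, P_a, coef):
    """Weights of the fused center: coef * P_h + (1 - coef) * P_a."""
    if not 0.0 <= coef <= 1.0:
        raise ValueError("coef must lie in [0, 1]")
    return coef * P_h + (1.0 - coef) * P_a
```

Because the fused weights are a convex combination of two vectors that each sum to 1, they also sum to 1, so the fused center remains a valid weighted combination of the healthy samples.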

2.3 Boundary Construction

With the optimized hypersphere center, the boundary is solved based on a variant of SVDD. As a is fixed, the variant is a univariate model converted from the original bivariate model:

$$\begin{gathered} \min {\text{ }}F(R) = R^{2} + C\sum\limits_{i} {\xi _{i} } , \hfill \\ {\text{s}}.{\text{t}}.,\,\parallel \phi \,\left( {s_{i}^{h} } \right) - \varvec{a}\parallel ^{2} \le R^{2} + \xi _{i} ,{\text{ }}\xi _{i} \ge 0,{\text{ }}\forall i = 1,2, \ldots ,N, \hfill \\ \end{gathered}$$

where C is a trade-off parameter that balances the volume and the errors, and ξi is the slack variable, subject to ξi ≥ 0, that relaxes the enclosing constraint for individual points.

The Lagrange multiplier method is applied to incorporate the constraints into the model. The minimization problem is transformed into a maximization problem, as shown below:

$$\begin{gathered} \max {\text{ }}L\,\left( {R,\alpha _{i} ,\beta _{i} ,\xi _{i} } \right) = R^{2} + C\sum\limits_{i} {\xi _{i} } - \sum\limits_{i} {\beta _{i} \xi _{i} } \hfill \\ {\text{ }} - \sum\limits_{i} {\alpha _{i} \,\left( {R^{2} + \xi _{i} - \parallel \phi (s_{i}^{h} ) - \varvec{a}\parallel ^{2} } \right)} , \hfill \\ \end{gathered}$$

where αi and βi are Lagrange multipliers with the constraints αi ≥ 0 and βi ≥ 0, respectively.

Taking the partial derivatives of L with respect to R and ξi and setting them to 0, we have

$$\left\{ \begin{gathered} \frac{\partial L}{{\partial R}} = 0 \Rightarrow \sum\limits_{i} {\alpha_{i} = 1} , \hfill \\ \frac{\partial L}{{\partial \xi_{i} }} = 0 \Rightarrow C = \alpha_{i} + \beta_{i} . \hfill \\ \end{gathered} \right.$$

By introducing Eqs. (15) and (20), Eq. (19) is converted into Eq. (21):

$$\begin{gathered} max\,L\, = \,\underbrace {{k\,\left( {s_{i}^{h} ,s_{i}^{h} } \right)}}_{{{\text{constant}}}} + \underbrace {{{\text{sum}}\,\left( {\gamma _{{new}}^{{\text{T}}} \cdot \gamma _{{new}} \circ \,K} \right)}}_{{{\text{constant}}}} \hfill \\ - \,2\, \times \,\,{\text{sum}}\,\left( {A \cdot \,\gamma _{{new}} \circ K} \right), \hfill \\ \end{gathered}$$

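where $\gamma_{new} = coef \cdot P^{h} + (1 - coef) \cdot P^{a}$; $K = [k(s_{i}^{h}, s_{j}^{h})]_{NH \times NH}$ with i, j = 1, 2, …, NH; $A = [\alpha_{1}, \alpha_{2}, \ldots, \alpha_{NH}]^{\text{T}}$ with $0 \le \alpha_{i} \le C$ and sum(A) = 1; sum(·) is the summation function; and ∘ refers to the Hadamard product.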

In Eq. (21), the first term equals 1 when the Gaussian RBF kernel is used. The second term is also a constant because γnew has been determined with the genetic algorithm, and K can be calculated with the training data and the kernel function. Thus, only the last term varies with the elements of A. As only the data sih with αi > 0 describe the boundary, these data are called the support vectors of the description [20]. Denoting the squared radius of the hypersphere as $R^2$, we have

$$R^{2} = \frac{1}{{N_{sv} }}\sum {\parallel \phi (s_{sv} ) - a\parallel^{2} } ,$$

where Nsv is the number of support vectors; and ssv denotes the support vectors.
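
One way to read Eq. (21): because the elements of A sum to 1, the objective equals the weighted sum of the squared distances from the healthy samples to the fixed center, which is linear in A. Under the box constraint 0 ≤ αi ≤ C, such a linear objective is maximized by filling the largest distances first. The sketch below uses this greedy solver as an illustrative choice (the paper does not state which solver is used) and then averages the support-vector distances as in Eq. (22). The function names are hypothetical.

```python
import numpy as np

def solve_alpha(d2, C):
    """Maximize sum_i alpha_i * d2_i subject to sum(alpha) = 1 and 0 <= alpha_i <= C."""
    d2 = np.asarray(d2, dtype=float)
    if C * d2.size < 1.0 - 1e-12:
        raise ValueError("infeasible: C must be at least 1/N")
    alpha = np.zeros_like(d2)
    remaining = 1.0
    for i in np.argsort(-d2):          # visit the largest distances first
        alpha[i] = min(C, remaining)
        remaining -= alpha[i]
        if remaining <= 1e-12:
            break
    return alpha

def radius_squared(d2, alpha, tol=1e-12):
    """Eq. (22): mean squared distance over the support vectors (alpha_i > 0)."""
    support = np.asarray(alpha) > tol
    return float(np.mean(np.asarray(d2, dtype=float)[support]))
```

With C ≥ 1 the whole weight falls on the farthest healthy sample, so the boundary passes through it; smaller values of C spread the weight over more samples and pull the averaged radius inward.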

2.4 Implementation of Anomaly Detection and HSESR

In the proposed method, HSESR and anomaly detection are the two goals we want to achieve simultaneously.

HSESR can be achieved with the HI array of healthy data. The HIs of the training and test data are all calculated to illustrate the implementation process and effect of the proposed method. After the optimized hypersphere center is obtained, the HI array of healthy data is given by the distances from each feature point to the hypersphere center in kernel space. Once the elements of the array are sorted by time, the resulting sequence can be considered time-continuous. To make the trend more prominent, moving-average smoothing is applied to remove the effect of volatility, and the smoothed indicator displays the state change of healthy data with time.

Anomaly detection is judged using the HI arrays of both healthy and abnormal data. Unlike in HSESR, the array obtained in anomaly detection is temporally discrete. In addition, anomaly judgment requires the introduction of the boundary. In this way, anomaly detection and HSESR are unified into the same framework and implemented simultaneously.
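
The two outputs can be sketched as follows. The window length, the use of a "valid" (unpadded) moving average, and the strict comparison against the squared radius are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def moving_average(hi, window):
    """Smooth a time-ordered HI sequence; returns len(hi) - window + 1 values."""
    hi = np.asarray(hi, dtype=float)
    if not 1 <= window <= hi.size:
        raise ValueError("window must be between 1 and len(hi)")
    return np.convolve(hi, np.ones(window) / window, mode="valid")

def is_anomalous(d2, r2):
    """Flag samples whose squared kernel distance exceeds the squared radius."""
    return np.asarray(d2, dtype=float) > r2
```

The smoothed sequence serves HSESR, while the Boolean flags serve anomaly detection; both are derived from the same distances to the optimized center.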

3 Experimental Validation and Discussion

There are two dataset types for bearing prognostics and health management. One is a fault dataset whose data is collected under different bearing faults, and the other is a degradation dataset that consists of time-continuous operation data. The latter is selected for validation because it satisfies the data assumption of the proposed method. Accordingly, two bearing degradation benchmark datasets are used for experiments.

3.1 Experimental Dataset

The XJTU-SY bearing dataset contains a large amount of data and a rich variety of failure types [21]. Three different operating conditions were tested, including a 12 kN load at 2100 r/min, an 11 kN load at 2250 r/min, and a 10 kN load at 2400 r/min. Five bearings were tested under each operating condition, and two accelerometers were used to record vibration signals in the horizontal and vertical directions. The sampling frequency is 25.6 kHz, each record contains 1.28 s of data, and the interval between two adjacent records is 1 min.

The PRONOSTIA bearing dataset is the most widely used bearing degradation dataset [22]. Three different operating conditions were tested, including a 4 kN load at 1800 r/min, a 4.2 kN load at 1650 r/min, and a 5 kN load at 1500 r/min. Seven bearings were tested under each of the first two operating conditions, and three bearings were tested under the last operating condition. Two accelerometers were installed to record vibration signals in the horizontal and vertical directions. The sampling frequency is 25.6 kHz, each record contains 0.1 s of data, and the interval between two adjacent records is 10 s.

3.2 Performance Evaluation Metric

This paper evaluates the experimental results with reference to existing related research. The anomaly detection result is evaluated from the accuracy perspective, including the accuracy on healthy data, abnormal data, and all data. The result of HSESR is judged from the perspective of monotonicity and correlation. The monotonicity is measured by both the traditional metric of Eq. (1) and the newly introduced metric based on the inverse number. The correlation is estimated by two classic metrics: the Pearson coefficient shows the linear correlation of the HI array with time, and the Spearman coefficient reflects the nonlinear correlation of the HI array through monotonicity.
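
A minimal sketch of these metrics is given below. It assumes that time is represented by the sample index, that the HI values contain no ties (ranks are not averaged), and that predictions and labels are Boolean arrays marking abnormal samples; the function names are hypothetical.

```python
import numpy as np

def pearson_with_time(hi):
    """Pearson coefficient between the HI sequence and its time index."""
    t = np.arange(len(hi))
    return float(np.corrcoef(t, hi)[0, 1])

def spearman_with_time(hi):
    """Spearman coefficient: Pearson coefficient of the ranks (assumes no ties)."""
    ranks = np.argsort(np.argsort(hi))   # 0-based rank of each HI value
    t = np.arange(len(hi))
    return float(np.corrcoef(t, ranks)[0, 1])

def accuracies(pred_abnormal, true_abnormal):
    """Accuracy on healthy data, abnormal data, and all data."""
    pred = np.asarray(pred_abnormal, dtype=bool)
    true = np.asarray(true_abnormal, dtype=bool)
    return {
        "healthy": float(np.mean(~pred[~true])),
        "abnormal": float(np.mean(pred[true])),
        "all": float(np.mean(pred == true)),
    }
```

For an exponentially growing HI sequence, the rank-based coefficient equals 1 while the linear coefficient stays below 1, which is why the two metrics are reported together.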

3.3 Comparison Methods

Three methods from different domains are chosen for comparison: RMS, the negative entropy of the squared envelope spectrum (NESES), and the autoencoder. RMS is a basic metric in vibration analysis that reflects the system state by measuring vibration intensity. NESES discerns subtle variations in early-stage faults by gauging signal complexity [23]. The autoencoder, widely employed in anomaly detection [24, 25], extracts anomalous features directly from raw data.

3.4 Experimental Results

The feature indicators are selected based on the calculation results of 35 classical statistical characteristics [26] commonly used as HIs. The feature of standard deviation frequency reflects the degradation in a safe region best, and it is chosen as the first feature. Besides, the RMS value is selected as the other candidate.

After specifying the features to be extracted, we first use the XJTU-SY dataset for the experiment. The metric results of anomaly detection are shown in Table 2. Bearing 1_4 and Bearing 3_5 are excluded due to their insufficient data volume.

Table 2 Anomaly detection accuracy with the XJTU-SY dataset (%)

Across most datasets, all four methods effectively distinguish between healthy and abnormal states. The proposed method in particular is robust and delivers the highest average accuracy. Further, the healthy data is ordered by time, and the corresponding metrics for HSESR are computed. The original SVDD method is also applied as an additional comparison to show the changes before and after the improvement. The monotonicity results are shown in Figure 4.

Figure 4 Comparison of monotonicity metrics with the XJTU-SY dataset

The results reveal that the monotonicity metrics of the proposed method improve across all datasets, with the majority showing substantial improvements. Specifically, the Mon1 metric increases by an average of 281.8%, and the Mon2 metric increases by 39.2%. Further, the correlation results are illustrated in Figure 5, where Cor1 and Cor2 correspond to the Pearson and Spearman coefficients, respectively.

Figure 5 Comparison of correlation metrics with the XJTU-SY dataset

Consistent with the results above, Figure 5 shows that the proposed method markedly increases the correlation of the HIs with time, for both linear and non-linear correlations. Specifically, the linear correlation metric Cor1 rises by 132.6%, and the non-linear correlation metric Cor2 increases by 157.1%. After the improvement, the two metrics have values of 0.797 and 0.842, respectively. This indicates a shift from a previously weak correlation to a very strong correlation.

The PRONOSTIA dataset was also used to test the generalization performance of the proposed method. The metric results of anomaly detection are shown in Table 3, and the metric results of HSESR are shown in Figures 6 and 7.

Table 3 Anomaly detection accuracy with the PRONOSTIA dataset (%)

Figure 6 Comparison of monotonicity metrics with the PRONOSTIA dataset

Figure 7 Comparison of correlation metrics with the PRONOSTIA dataset

For anomaly detection, the outcomes closely align with those on the XJTU-SY dataset. The performance of RMS is decent but somewhat inconsistent, and NESES fluctuates significantly. Both the autoencoder and the proposed method perform well; however, the accuracy of the autoencoder declines on certain datasets.

For HSESR, the proposed method brings substantial improvements across all metrics. In terms of the monotonicity metrics, Mon1 rises by an average of 270.2%, while Mon2 increases by an average of 51.3%.

Regarding the correlation metrics, the linear correlation Cor1 rises by 198.1%, whereas the non-linear correlation Cor2 grows by 120.1%. After the optimization, the two metrics reach 0.806 and 0.827, respectively. This marks a transition from an initially weak correlation to a very strong one.

Consolidating the results of the proposed method on both datasets yields the following observations. In anomaly detection, the proposed method achieves a consistently high accuracy, above 99% on average. In state evaluation, all metrics improve significantly: Mon1 rises by roughly 276% and Mon2 by about 45.3% on average, while the two correlation metrics, Cor1 and Cor2, increase by 165.4% and 138.6% on average and reach average values of 0.802 and 0.835, respectively. This marks a shift from an initially weak correlation to a very strong one and supports the soundness of the state assessment. The proposed method thus performs state assessment during the healthy stage while maintaining high accuracy in anomaly detection.
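The consolidated figures can be checked against the per-dataset numbers reported above. The sketch below assumes that each consolidated value is the unweighted mean of the XJTU-SY and PRONOSTIA averages; the text does not state this explicitly, but the reported numbers are consistent with it. The variable and function names are hypothetical.

```python
# Average relative gains (%) reported in the text for each dataset.
gains = {
    "Mon1": {"XJTU-SY": 281.8, "PRONOSTIA": 270.2},
    "Mon2": {"XJTU-SY": 39.2, "PRONOSTIA": 51.3},
    "Cor1": {"XJTU-SY": 132.6, "PRONOSTIA": 198.1},
    "Cor2": {"XJTU-SY": 157.1, "PRONOSTIA": 120.1},
}

# Correlation values reported after the improvement.
final_values = {
    "Cor1": {"XJTU-SY": 0.797, "PRONOSTIA": 0.806},
    "Cor2": {"XJTU-SY": 0.842, "PRONOSTIA": 0.827},
}


def mean_over_datasets(table):
    """Unweighted mean of each metric over the datasets (an assumption)."""
    return {name: sum(per_ds.values()) / len(per_ds)
            for name, per_ds in table.items()}


avg_gains = mean_over_datasets(gains)
avg_values = mean_over_datasets(final_values)

for name, value in avg_gains.items():
    print(f"{name}: average gain {value:.2f}%")
for name, value in avg_values.items():
    print(f"{name}: average value {value:.4f}")
```

The computed means (276.0%, 45.25%, 165.35%, and 138.6% for the gains; 0.8015 and 0.8345 for the values) agree with the consolidated figures quoted above after rounding.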

3.5 Result Discussion

Based on the experimental results, this section analyzes the proposed method from two aspects: effectiveness and deficiency.

Effectiveness refers to the attainment of both anomaly detection and state evaluation. Anomaly detection rests on the second assumption of the proposed method, namely that healthy and abnormal data can be distinguished well. When abnormal data are not easily distinguished from healthy data, anomaly detection requires a tight envelope, and adjusting the boundary may degrade detection. When abnormal data are well distinguished from healthy data, the proposed method can sacrifice part of the envelopment to achieve state assessment. Although the adjusted boundary becomes looser, the optimization of distinguishability is included in the adjustment process to preserve the effectiveness of anomaly detection.

The effectiveness of HSESR is interpreted from two perspectives. Perspective 1: Parameter adjustment. According to Eq. (4), the HI array is determined by the location of the hypersphere center, which in turn is determined by the group of weight factors in Eq. (3). After a genetic algorithm is applied to optimize the weight factors, the resulting HI array better reflects the health state in the safe region. As shown in Figure 8, the example from the XJTU-SY dataset shows intuitively that the proposed method improves the monotonicity of the HI.

Figure 8 Comparison of HI array with Bearing 3_3 in the XJTU-SY dataset: (a) Training data, (b) Test data
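The dependence of the HI on the hypersphere center can be sketched with a toy example. The code below is not the paper's Eq. (3) or Eq. (4); it assumes a linear kernel, takes the center as a normalized weighted combination of the training samples, and uses the Euclidean distance to the center as the HI. The weights are hand-picked here, whereas the paper obtains them with a genetic algorithm. All names and data are hypothetical.

```python
import math


def center_from_weights(samples, weights):
    """Hypersphere center as a weighted combination of training samples.

    Assumes a linear kernel; weights are normalized to sum to 1.
    """
    total = sum(weights)
    dim = len(samples[0])
    return [sum(w * s[d] for w, s in zip(weights, samples)) / total
            for d in range(dim)]


def health_indicator(samples, center):
    """HI of each sample: Euclidean distance to the hypersphere center."""
    return [math.dist(s, center) for s in samples]


def count_decreases(hi):
    """Number of time steps at which the HI drops (0 means non-decreasing)."""
    return sum(b < a for a, b in zip(hi, hi[1:]))


# Hypothetical healthy-stage features that drift slowly over time.
samples = [[0.1 * t, 0.05 * t] for t in range(11)]

# Equal weights place the center in the middle of the data (tight envelope).
uniform = [1.0] * len(samples)
# Hand-picked weights that pull the center toward the earliest samples.
early = [0.7, 0.3] + [0.0] * 9

hi_uniform = health_indicator(samples, center_from_weights(samples, uniform))
hi_early = health_indicator(samples, center_from_weights(samples, early))

print(count_decreases(hi_uniform), round(max(hi_uniform), 3))
print(count_decreases(hi_early), round(max(hi_early), 3))
```

With equal weights the HI first falls and then rises, so it carries little trend information. Moving the center toward the early samples makes the HI non-decreasing in time, at the cost of a larger maximum distance, that is, a looser envelope around the healthy data.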

Perspective 2: Enveloping trade-offs. The optimization of the HI sequence relies on relaxing the envelope requirements, and Figure 9 shows the envelope plot of the data in Figure 8. Compared with the original SVDD model, the envelope boundary of the proposed method becomes looser, as shown in Figure 9(b). Because the data are well distinguishable, the loose boundary still separates healthy data from abnormal data very well, as Figure 9(a) shows. At the same time, the released envelope is directly converted into a monotonicity gain of the HI sequence through model optimization, which ensures the effectiveness of HSESR.

Figure 9 Envelope visualization with the Bearing 3_3 in the XJTU-SY dataset: (a) Data distribution and envelope visualization, (b) The envelope changes of the proposed method compared to the SVDD model
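The enveloping trade-off can also be illustrated numerically. The sketch below compares a tight and a loose spherical boundary on well-separated and poorly separated abnormal data. It assumes a hard boundary whose radius is the smallest one that encloses all healthy samples, which simplifies the SVDD formulation; the centers, data points, and function names are hypothetical.

```python
import math


def enclosing_radius(healthy, center):
    """Smallest radius around `center` that encloses every healthy sample."""
    return max(math.dist(s, center) for s in healthy)


def detection_accuracy(healthy, abnormal, center, radius):
    """Fraction of samples classified correctly by a hypersphere boundary.

    Healthy samples should fall inside the sphere, abnormal ones outside.
    """
    correct = sum(math.dist(s, center) <= radius for s in healthy)
    correct += sum(math.dist(s, center) > radius for s in abnormal)
    return correct / (len(healthy) + len(abnormal))


# Hypothetical 2-D features: healthy data drift slowly over time.
healthy = [[0.1 * t, 0.05 * t] for t in range(11)]
well_separated = [[3.0, 1.5], [3.2, 1.7], [3.5, 1.6]]
poorly_separated = [[0.5, 0.9], [0.4, 0.9], [0.45, 0.85]]

centers = {
    "tight": [0.5, 0.25],  # middle of the healthy data
    "loose": [0.0, 0.0],   # shifted toward the earliest healthy sample
}

results = {}
for name, center in centers.items():
    radius = enclosing_radius(healthy, center)
    results[name] = (
        radius,
        detection_accuracy(healthy, well_separated, center, radius),
        detection_accuracy(healthy, poorly_separated, center, radius),
    )
    print(name, [round(v, 3) for v in results[name]])
```

In this toy setting the loose boundary keeps full accuracy when the abnormal data lie far from the healthy data, but it misclassifies the abnormal samples that sit close to the healthy region, which the tight boundary still rejects. This mirrors the role of the distinguishability assumption discussed above.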

Two deficiencies exist in the experimental results. The first is that the accuracy of several anomaly detection results is not very high, for example for Bearing 1_1, Bearing 1_2, and Bearing 2_2 of the PRONOSTIA dataset. The reason is that the distinguishability of the feature indicators is not good enough; in other words, the data do not meet the applicable conditions of the proposed method.

The second shortcoming is that the HSESR metrics of some data are still not good enough, even though the trend features have been greatly improved. These unsatisfactory results are also closely related to the application hypotheses of the proposed method. For instance, the abnormal data of Bearing 2_5 in the PRONOSTIA dataset are not distinctly separated from the healthy data, indicating weak agreement with the second hypothesis. Accordingly, the data envelope has to be tight to prevent misjudgment of abnormality, and the HI monotonicity cannot be fully taken into account. As another example, the HSESR of Bearing 3_2 in the XJTU-SY dataset is invalid because all HSESR metrics are too small to reflect a tendency. This can be attributed to the failure to satisfy the third hypothesis, i.e., the trend of the extracted features in the safe region is too poor. The trend of the features is positively correlated with the HSESR metrics. For example, the extracted trend features do not reflect the condition degradation well for Bearing 1_2 in the XJTU-SY dataset and Bearing 2_1 in the PRONOSTIA dataset; the corresponding correlation metrics are all below 70%, lower than those of the other data.

Therefore, future improvements can be made in the following aspects.

  1) Better features must be explored to characterize the degradation in safe regions. As the degradation process of mechanical components is theoretically irreversible [18], it is commonly assumed that the true inherent health condition decreases with time [27]. Because the degradation inside the safe region is imperceptible, most studies assume that no degradation exists; however, this degradation is theoretically inevitable. The study of healthy-stage features is expected to improve predictive maintenance inside the safe region.

  2) The generalization and robustness of the features should be improved so that the indicator remains effective on more and wider data. Although the adopted features work well on most data, they still do not apply in some cases.

  3) More data should be collected. The amount of data used in modeling is insufficient, which limits the boundary's ability to generalize to unknown data.

4 Summary and Conclusions

In this paper, a dual-task learning approach is proposed to deal with the problem of suddenness in anomaly detection. By considering both the monotonicity and distinguishability of the HIs in model construction, the proposed scheme unifies anomaly detection and HSESR under the same framework. Experimental results on two datasets show that the proposed method achieves an average anomaly detection accuracy above 99% and performs well in state evaluation. The correlation indicators increase by roughly 150% on average (165.4% and 138.6%) and reach values above 0.8, which signifies a shift from a weak correlation to a very strong one. Still, some results were suboptimal because the application assumptions were not satisfied. Analysis of the results shows that the amount of data and the extracted features are the key factors affecting the method's performance, and they indicate the direction for future improvement. The proposed method lays the foundation for implementing predictive maintenance throughout the life cycle by improving state awareness in safe regions.

Availability of Data and Materials

The datasets generated and analyzed during the current study are available in the XJTU-SY dataset repository and the PRONOSTIA dataset repository, datasets/alanhabrony/ieee-phm-2012-data-challenge.


References

  1. A Bousdekis, B Magoutas, D Apostolou, et al. Review, analysis and synthesis of prognostic-based decision support methods for condition based maintenance. Journal of Intelligent Manufacturing, 2018, 29: 1303-1316.

  2. A Cubillo, S Perinpanayagam, M Esperon-Miguez. A review of physics-based models in prognostics: Application to gears and bearings of rotating machinery. Advances in Mechanical Engineering, 2016, 8(8): 1687814016664660.

  3. X L Ou, G R Wen, X Huang, et al. A deep sequence multi-distribution adversarial model for bearing abnormal condition detection. Measurement, 2021, 182: 109529.

  4. R N Liu, B Y Yang, E Zio, et al. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mechanical Systems and Signal Processing, 2018, 108: 33-47.

  5. X Huang, G R Wen, S Z Dong, et al. Memory residual regression autoencoder for bearing fault detection. IEEE Transactions on Instrumentation and Measurement, 2021, 70: 1-12.

  6. W Q Jiang, Y Hong, B T Zhou, et al. A GAN-based anomaly detection approach for imbalanced industrial time series. IEEE Access, 2019, 7: 143608-143619.

  7. W T Mao, J X Chen, X H Liang, et al. A new online detection approach for rolling bearing incipient fault via self-adaptive deep feature matching. IEEE Transactions on Instrumentation and Measurement, 2020, 69(2): 443-456.

  8. H S Zhao, H H Liu, W J Hu, et al. Anomaly detection and fault analysis of wind turbine components based on deep learning network. Renewable Energy, 2018, 127: 825-834.

  9. K Vos, Z X Peng, C Jenkins, et al. Vibration-based anomaly detection using LSTM/SVM approaches. Mechanical Systems and Signal Processing, 2022, 169: 108752.

  10. T Ergen, S S Kozat. Unsupervised anomaly detection with LSTM neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(8): 3127-3141.

  11. J T Zhang, B Zeng, W M Shen, et al. A one-class Shapelet dictionary learning method for wind turbine bearing anomaly detection. Measurement, 2022, 197: 111318.

  12. Y P Zhao, Y L Xie, Z F Ye. A new dynamic radius SVDD for fault detection of aircraft engine. Engineering Applications of Artificial Intelligence, 2021, 100: 104177.

  13. W B Song, D Wu, W M Shen, et al. Meta-learning based early fault detection for rolling bearings via few-shot anomaly detection. arXiv:2204.12637, 2022.

  14. H S Dhiman, D Deb, S M Muyeen, et al. Wind turbine gearbox anomaly detection based on adaptive threshold and twin support vector machines. IEEE Transactions on Energy Conversion, 2021, 36(4): 3462-3469.

  15. Y F Li, X Zhang, Z G Chen, et al. Time-frequency ridge estimation: An effective tool for gear and bearing fault diagnosis at time-varying speeds. Mechanical Systems and Signal Processing, 2023, 189: 110108.

  16. L Montechiesi, M Cocconcelli, R Rubini. Artificial immune system via Euclidean Distance Minimization for anomaly detection in bearings. Mechanical Systems and Signal Processing, 2016, 76-77: 380-393.

  17. Y Z Liu, Y S Zou, Y Wu, et al. A novel abnormal detection method for bearing temperature based on spatiotemporal fusion. Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit, 2021, 236(3): 317-333.

  18. Y Y Yin, Z L Liu, J H Zhang, et al. An adaptive sampling framework for life cycle degradation monitoring. Sensors, 2023, 23(2): 965.

  19. Z L Liu, M J Zuo, J L Kang, et al. Equipment health condition assessment method based on support vector data description. Chengdu: University of Electronic Science and Technology of China Press, 2022.

  20. Z L Liu, J L Kang, X J Zhao, et al. Modeling of the safe region based on support vector data description for health assessment of wheelset bearings. Applied Mathematical Modelling, 2019, 73: 19-39.

  21. B Wang, Y G Lei, N P Li, et al. A hybrid prognostics approach for estimating remaining useful life of rolling element bearings. IEEE Transactions on Reliability, 2020, 69(1): 401-412.

  22. P Nectoux, R Gouriveau, K Medjaher, et al. PRONOSTIA: An experimental platform for bearings accelerated degradation tests. IEEE International Conference on Prognostics and Health Management, Denver, USA, June 18-21, 2012: 1-8.

  23. J Antoni. The infogram: Entropic evidence of the signature of repetitive transients. Mechanical Systems and Signal Processing, 2016, 74: 73-94.

  24. J U Ko, K Na, J S Oh, et al. A new auto-encoder-based dynamic threshold to reduce false alarm rate for anomaly detection of steam turbines. Expert Systems with Applications, 2022, 189: 116094.

  25. S Givnan, C Chalmers, P Fergus, et al. Anomaly detection using autoencoder reconstruction upon industrial motors. Sensors, 2022, 22(9): 3166.

  26. Z L Liu, J Qu, M J Zuo, et al. Fault level diagnosis for planetary gearboxes using hybrid kernel feature selection and kernel Fisher discriminant analysis. The International Journal of Advanced Manufacturing Technology, 2013, 67(5): 1217-1230.

  27. Z G Tian. An artificial neural network method for remaining useful life prediction of equipment subject to condition monitoring. Journal of Intelligent Manufacturing, 2012, 23(2): 227-237.



Acknowledgements

Not applicable.


Funding

Supported by Sichuan Provincial Key Research and Development Program of China (Grant No. 2023YFG0351), and National Natural Science Foundation of China (Grant No. 61833002).

Author information

Contributions

YY designed the study, performed the assays, prepared the manuscript, and contributed to its application part; ZL conducted the optimization and assay validation studies; BG assisted with data analysis and processing; YY, ZL, and MZ participated in discussing the results and revised the manuscript. All authors read and approved the final manuscript.

Authors’ Information

Yuhua Yin, born in 1990, is currently a Ph.D. candidate at University of Electronic Science and Technology of China. He received an M.S. degree in mechanical engineering from Chongqing University, China, in 2014. His research interests include intelligent operation and maintenance, prognostic and health management.

Zhiliang Liu, born in 1984, is currently an Associate Professor at School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China. He received his Ph.D. degree in detection technology and automatic equipment from University of Electronic Science and Technology of China, in 2013. His research interests include intelligent maintenance for complex equipment using advanced signal processing and data mining methods.

Bin Guo, born in 1998, is currently pursuing an M.S. degree at University of Electronic Science and Technology of China. He received a B.S. degree in Mechanical Engineering from Chengdu University of Technology, China, in 2020. His research interests include signal processing and health maintenance of mechanical equipment.

Mingjian Zuo, born in 1962, is currently a principal scientist at Qingdao International Academician Park Research Institute, China, and a guest professor at University of Electronic Science and Technology of China. He received his Ph.D. degree in Industrial Engineering from Iowa State University, Ames, Iowa, U.S.A, in 1989. His research interests include system reliability analysis, maintenance modeling and optimization, signal processing, and fault diagnosis.

Corresponding author

Correspondence to Zhiliang Liu.

Ethics declarations

Competing Interests

The authors declare no competing financial interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Yin, Y., Liu, Z., Guo, B. et al. A Dual-Task Learning Approach for Bearing Anomaly Detection and State Evaluation of Safe Region. Chin. J. Mech. Eng. 37, 4 (2024).
