 Original Article
 Open Access
 Published:
Deep Spatiotemporal ConvolutionalNeuralNetworkBased Remaining Useful Life Estimation of Bearings
Chinese Journal of Mechanical Engineering volume 34, Article number: 62 (2021)
Abstract
The remaining useful life (RUL) estimation of bearings is critical for ensuring the reliability of mechanical systems. Owing to the rapid development of deep learning methods, a multitude of datadriven RUL estimation approaches have been proposed recently. However, the following problems remain in existing methods: 1) Most network models use raw data or statistical features as input, which renders it difficult to extract complex faultrelated information hidden in signals; 2) for current observations, the dependence between current states is emphasized, but their complex dependence on previous states is often disregarded; 3) the output of neural networks is directly used as the estimated RUL in most studies, resulting in extremely volatile prediction results that lack robustness. Hence, a novel prognostics approach is proposed based on a time–frequency representation (TFR) subsequence, threedimensional convolutional neural network (3DCNN), and Gaussian process regression (GPR). The approach primarily comprises two aspects: construction of a health indicator (HI) using the TFRsubsequence–3DCNN model, and RUL estimation based on the GPR model. The raw signals of the bearings are converted into TFRsubsequences by continuous wavelet transform and a dislocated overlapping strategy. Subsequently, the 3DCNN is applied to extract the hidden spatiotemporal features from the TFRsubsequences and construct HIs. Finally, the RUL of the bearings is estimated using the GPR model, which can also define the probability distribution of the potential function and prediction confidence. Experiments on the PRONOSTIA platform demonstrate the superiority of the proposed TFRsubsequence–3DCNN–GPR approach. The use of degradationrelated spatiotemporal features in signals is proposed herein to achieve a highly accurate bearing RUL prediction with uncertainty quantification.
Introduction
Bearings are widely used in rotating machinery, and their prognostic and health management (PHM) is crucial to the precision and reliability of mechanical systems [1,2,3]. As a significant aspect of the prognostics method, remaining useful life (RUL) estimation contributes significantly to the PHM of bearings [4]. Typical bearing RUL estimation methods primarily include two categories: modelbased and datadriven methods [5, 6]. Among them, modelbased approaches describe bearing degradation based on physical degradation models, e.g., crack propagation, corrosion, or wear [7]. However, modelbased methods are unsuitable for the prognostics of bearing with coupled fault modes or in complex mechanical systems [8, 9]. As a bottomtoup prognostics approach, datadriven methods can mine the degradation information in the acquired data automatically and have garnered wide attention recently.
In datadriven baring RUL estimation approaches, vibration velocity data are widely used to model the degradation process via statistical or machine learning methods. The primary statistical methods used are the Markov process [10], Kalman filter [11], Wiener process [12], and particle filter [13]. The rapid development of sensing and dataacquiring technologies has resulted in the significant increase in information to be employed for condition monitoring. However, high interference noise and complicated mapping render it difficult to model the statistical distribution of the data obtained [14, 15]. Recently, deep learning methods have provided a good alternative for predicting the RUL of bearings without requiring much prior knowledge regarding their degradation process [16, 17]. Many successful applications have been realized in the RUL estimation of bearings based on the deep belief network (DBN) [18], auto encoder [19], recurrent neural network (RNN) [20], long shortterm memory (LSTM) [21], and convolutional neural network (CNN) [22]. However, some problems remain in these methods.
Most previous studies use raw data or statistical features as model inputs only, which does not allow the extraction of complex information hidden in signals. However, signal analysis in a single domain exhibits limited faultrelated feature representation capability [23, 24]. An effective prediction depends on the wellinformed input for deeplearningbased approaches. However, prediction performance is still limited when powerful data analysis methods are used. Therefore, the time–frequency representation (TFR) method, which can reveal the complex structure of signals with multiple components, has been widely used to analyze nonstationary signals. Many studies have proven the effectiveness of TFRbased bearingfault diagnosis approaches. Klein [25] suggested a bearing diagnostics solution that applied imageprocessing techniques to TFRs. Shi [26] demonstrated that the effectiveness of time–frequency analysis depends on its capability in representing the signal energy. Recently, CNN models have demonstrated superiority for the analysis of images, and it can also be used in TFRbased RUL estimation. Hence, Zhu [22] performed the RUL prediction of bearings under one fixed operating condition based on the wavelet transform and the multiscale twodimensional (2D) CNN (MSCNN) model. However, two limitations exist in TFR–2DCNNbased bearing RUL predictions.

(1)
The degradation process is typically regarded as a nonlinear Markov process with nonGaussian characteristics in the 2DCNNbased RUL estimation of bearings, in which the complex temporal dependence of the data under previous states is disregarded [27, 28].

(2)
The output of neural networks is used directly as the estimated RUL in 2DCNNbased bearing RUL estimation approaches, which results in extremely volatile predictions that lack robustness. In practice, it is difficult to ensure the accuracy of each discrete predicted health indicator (HI) for the model [29].
Because the defect expansion of bearings is a gradually evolving process, the Markov property is unsuitable for RUL estimation tasks to some extent. Wang [30] constructed a degradation indicator for wear using the Monte Carlo method. For a specified time series, it is assumed that the generation of each observation result depends on an implicit variable that cannot be directly observed. Furthermore, Zhu and Liu [31] proposed a nonMarkovian hidden semiMarkov model method to provide prognostics that offered more powerful modeling capabilities for practical problems. These studies indicate the importance of temporal correlation for RUL prediction, which is disregarded in the 2DCNNbased bearing RUL estimation approach. In the gradual process of bearing degradation from normal to failure, each observation must be correlated significantly to several previous observations. Therefore, establishing a correlation between different degradation states is crucial for accurate RUL predictions. In addition, the overall degradation trend of the bearing obtained based on predicted HIs should be considered in the RUL estimation of bearings to compensate for the high volatility and poor robustness of the model.
In this regard, a novel bearing RUL estimation method is proposed based on the continuous wavelet transform (CWT) method, a threedimensional convolutional neural network (3DCNN) model, and a Gaussian process regression (GPR) model. It primarily comprises two aspects: 1) construction of HIs using the proposed TFRsubsequence–3DCNN model, and 2) bearing RUL estimation based on the GPR algorithm. A flowchart of the proposed TFR–3DCNN–GPR framework is presented in Figure 1. In practice, vibration signals in both the horizontal and vertical directions were monitored. It is meaningful to analyze signals in both directions because they can reflect the bearing status more comprehensively. The CWT method was used to convert the raw bearing signal into a TFR, and a set of TFR subsequences was created using an overlapping strategy. The 3DCNN model was applied to extract the hidden features from the TFRsubsequences and construct HIs. The GPR algorithm was adopted to model the distribution of HIs and calculate the confidence interval of the RUL prediction. The performance of the proposed approach was evaluated using vibration signals acquired from the PRONOSTIA platform.
The primary contributions of the proposed approach can be summarized as follows:

(1)
A novel TFRsubsequence–3DCNN framework is proposed for bearing degradation modeling. By effectively extracting the spatial features and temporal correlations of the TFRsubsequences generated from the measured vibration signals, the proposed method can support more accurate RUL estimation of bearings.

(2)
The GPR algorithm was adopted to model the distribution of HIs and calculate the confidence interval of the RUL prediction. Unlike most deeplearningbased bearing RUL estimation approaches, the proposed method can effectively mitigate prediction volatility and facilitate the quantification of estimation uncertainty.
The remainder of this paper is organized as follows. Section 2 presents the proposed TFRsubsequence–3DCNN model for HI construction comprehensively. In Section 3, the proposed GPRbased bearing RUL estimation approach is described. The validation experiments are presented in Section 4, in which a comparison of related studies is provided as well. The conclusions are presented in Section 5.
TFRSubsequence–3DCNN Model
The 3DCNN model was applied to extract the spatiotemporal features of the TFRsubsequences by performing a threedimensional (3D) convolution kernel [32] and constructing HIs that indicate bearing degradation. Compared with onedimensional time signal analysis or 2D time–frequency analysis, more hidden characteristic information of the bearing vibration signals can be extracted using the 3DCNN, and a more accurate RUL prediction effect is expected from using the proposed method.
ThreeDimensional Convolution Kernel for Spatiotemporal Feature Extraction
Figure 2 shows a comparison between 2D and 3D convolutions. The feature maps in each convolutional layer are connected with multiple consecutive TFRs in the upper layers in the 3D convolution kernel to capture more temporal information of the bearing TFRs. The 2D convolutional layers were connected to the local receptive field of the feature maps, where the weights were passed through activation functions. If we denote \(v_{ij}^{xy}\) as a unit at (x, y) of the jth feature map and the ith layer, then
where f refers to the activation function, b_{ij} the bias of the corresponding feature map, m the index number, and \(\omega_{ijm}^{pq}\) the value of the kernel at position (p, q); height P_{i} and width Q_{i} correspond to the ith convolution kernel. The Markov property indicates that the state of the model at t > t_{0} is independent of that before t_{0}.
The 3D convolution kernel adds a temporal scale hyperparameter R to the 2D convolution kernel to extract the information at different temporal scales. The value at \(\left( {x,y,z} \right)\) corresponding to the jth feature map of the ith layer is expressed as
where R_{i} refers to the temporal scale of the ith layer 3D kernel, and \(\omega_{ijm}^{pqr}\) is the value at (p, q, r) of the kernel corresponding to the mth feature map. A cube with fixed consecutive TFRs will be formed in the 3D convolution, and it contains both the spatial information of each single TFR and the temporal information along adjacent TFRs.
Constructed TFRSubsequence–3DCNN Model for HI Construction
The CWT enables a multiscale refinement analysis of signals through expansion and translation functions, thereby effectively extracting time–frequency domain information from signals. Compared with other time–frequency analysis methods, such as shorttime Fourier transform, WignerVille distribution (WVD), pseudo WVD (PWVD), and smooth pseudo WVD, the CWT offers advantages of low computational complexity and tight support [33]. Considering the basic wavelet function \(\psi (t)\), it can be scaled or expanded k times and translated into l steps as follows:
For a signal \(s(t)\), the CWT can be expressed as
where \(\psi^{*} (t)\) is the conjugate function of the basic \(\psi (t)\).
The proposed TFRsubsequence–3DCNN model is illustrated in Figure 3. Combining TFRs in two directions, i.e., the horizontal and vertical directions, into one subsequence with height H, width W, and channel number C=2, the TFRsubsequence for the current time can be obtained. The corresponding HI for the training dataset is defined as the degradation percentage at time t, i.e., HI values of 0 and 1 correspond to perfectly healthy and failure, respectively. Hence, a higher health indicator value indicates the worse health and lower RUL of the bearings. Using the runtofailure time T for the training data, the corresponding HI is expressed as
Details of the overall network architecture are listed in Table 1. The labeled TFRsubsequence was used as the 3D convolution input, which should be a fivedimensional tensor in the form of (N, H, W, L, C), where H and W represent the height and width, respectively; C refers to the channel number of the current TFRsubsequence; L is the overlapping number of the TFRsubsequences (L = 1 in 2D convolution); N refers to the number of samples. The outputs of the model, i.e. the HIs, can effectively reflect the degradation of the bearings. Two fully connected layers exist in the network after the three convolution layers. The sizes of the three 3D convolution kernels were 7×7×2, 3×3×2, and 3×3×2, separately. The stride of each convolution layer was set to 2×2×1. In addition, one dropout layer was added after the last convolution layer to prevent overfitting in the training. The sigmoid function was used as the activation function in the last layer of the network to fit the established HI. The rectified linear unit (ReLU) function was used as the activation function for the other layers to overcome gradient disappearance. For a variate v, the sigmoid and ReLU functions are expressed as follows [34]:
GPR for Prediction
GPR is an important tool for inferring the relationship between inputs and outputs and determining the conditional distribution of the target output. As a nonparametric, statistical, and Bayesian method, GPR was adopted in this study to model the evolution of uncertainty in timestream data over time to support the failure time prediction of the bearings. For a training set D = [(t_{i}, HI_{i} i=1, 2, …, n)], where t_{i} and HI_{i} denote the time number and target outputs, respectively. The prior distribution of the multivariate variance normal function f(t) can be specified through the mean function μ(x) and covariance matrix K.
Subsequently, the problem is converted to the prediction of the output in D^{*} using the observed training set. The covariance function must satisfy the Mercer condition, i.e., every semipositive definite symmetric function should be a kernel [35, 36]. Therefore, the optimal solution of the hyperparameters \(\phi\) can be adaptively obtained via the maximum likelihood method. Using the established regression model, the output can be obtained by substituting the input value \(t_{i}\) into the model. The GPR process in the experiment can be summarized as follows:

(1)
Establishment of the training dataset: The HIs predicted by the TFRsubsequence–3DCNN model and the corresponding time information were used to construct the training dataset. The GPR model is expected to achieve an accurate and stochastic RUL prediction.

(2)
Model training: The GPRbased model was trained using the predicted HIs of the training bearings. The mean value was set to 0. The covariance function was set as a combination of the square exponential and periodic covariance functions.

(3)
Prediction: The model was extrapolated in time to obtain the predicted HI values for RUL estimation. The failure time of this bearing was determined based on the physical meaning of the HI, and its RUL was calculated.
Case Study
Experimental Setting
The experimental data were obtained from the PRONOSTIA platform [37], and two accelerometers were mounted to measure the vibration signals. Figure 4 shows the PRONOSTIA platform. The sampling rate of the vibration signals was 25.6 kHz, and 0.1 s of data were recorded every 10 s. A total of 17 datasets under different conditions are presented in Table 2, and each data group contained signals in both the horizontal and vertical directions.
Vibration Data and TFRSubsequence
Figure 5 shows the horizontal signal and its corresponding TFRs for two samples (the first and last samples) of bearing 11. The accuracy of RUL prediction can be improved by correctly describing the nonlinear degradation features of the bearing vibration signals with different faults. In CWTbased analysis, it is crucial to use appropriate parameters. Many studies have shown that reasonable parameter selection can effectively improve the signal parsing ability of the CWT model, effectively counteract the interference of background noise, and improve the model robustness. The complex Morlet wavelet was proven to be effective in analyzing bearing degradation. According to the criterion of maximum energytoShannon entropy ratio, the complex Morlet wavelet [38] function was selected in the CWT with a bandwidth parameter of 3 and a center frequency of 3. The length of the CWT wavelet function scale was set to 256. Figures 5a and b show the bearing operating under stable conditions, and the center frequency was approximately 4000 Hz. Figures 5c and d show the bearing at the end of the liferecession phase with severe failure. The impact components are shown in Figure 5c. The high energy concentrations of these impact components in the time–frequency domain are shown in Figure 5d. Therefore, the CWTbased TFRs can effectively highlight faultrelated information in the bearing signals.
Furthermore, to improve the computational efficiency, the bilinear interpolation algorithm [39] was used to perform digital zooming on the TFRs. The dimensions of the original CWT TFRs, which were 256×2560, incurred a heavy computational burden. The inverse transform of the pixel in the bilinear interpolation was calculated as follows:
where \(A = I\left( {x_{i} ,y_{i} } \right)\), \(B = I\left( {x_{i} + 1,y_{i} } \right)\), \(C = I\left( {x_{i} ,y_{i} + 1} \right)\), and \(D = I\left( {x_{i} + 1,y_{i} + 1} \right)\). In addition \(x = x_{i} + x_{f}\) and \(y = y_{i} + y_{f}\), with integer components x_{i} and y_{i} as well as corresponding floatingpoint coordinates x_{f}, y_{f} (0 < x_{f}, y_{f}< 1). Using the raw TFR corresponding to Figure 5b as an example, which was reduced with dimension \([R_{row} ,R_{column} ]\), the unified value was set as \(R_{row} = R_{column} = R\) for convenience of calculation. Figure 6 shows the TFR visualization images for the four cases, where the reduced dimensionality R was set to 500, 250, 100, and 50. As shown, aliasing occurred in the TFR image when the reduced dimension was extremely small; this shows that bilinear interpolation results in the loss of highfrequency components in TFRs. To maintain sufficient information in the reduced TFRs, R was set to 100. Subsequently, the reduced TFRs were arranged in the chronological order for the network input. Figure 7 shows the process in which the signal is converted into TFRsubsequences. The dislocated overlapping method can ensure the validity of temporal information extraction for RUL estimation.
RUL Estimation
The key factors pertaining to the proposed model were investigated comprehensively, including the parameter optimization method selected and the effect of the overlapping number. The superiority of the GPRbased RUL prediction was analyzed.
Parameter Optimization Method
The proposed TFRsubsequence–3DCNN model was built using the Keras framework and implemented in an NVIDIA Tesla K80 GPU with a minibatch of 128 and an epoch of 100. To avoid overfitting in the training process, 20% of the training data were randomly shuffled as the validation dataset to determine the network parameters. Finally, the mean square error (MSE) was applied as the loss function in the 3DCNN model. By setting \({\varvec{y}} \in {\mathbb{R}}^{{{1} \times n}}\) as the true label and \({\hat{\varvec{y}}} \in {\mathbb{R}}^{{{1} \times n}}\) as the predicted label, the MSE loss is expressed as
For a training set \(Tr = \left[ {\left( {x_{i} ,y_{i} } \right)i = 1,2, \cdots ,n} \right]\), where \(x_{i}\) is the TFR subsequence and \(y_{i}\) is the predicted value, the structural risk minimization rule is used as the objective function, as follows:
where θ is the whole model parameter, f (x_{i}) the model output, and L (y_{i}, f (x_{i}); θ) the MSE loss function. To adaptively minimize loss during model training, the root mean square prop (RMSProp) [40] method was applied to the model. RMSProp uses differential squared weighted averages for the gradients of weights W and bias b to overcome significant swings in the update of the loss function, as well as to accelerate the convergence. During the jth iteration, the RMSProp is solved as follows:
The gradient momenta, \(S_{dw}\) and \(S_{db}\), were accumulated by the loss function during the previous (j1)th round of iteration. β is the index of gradient accumulation. When dw or db is a significant value, it is divided by the square root of the gradient accumulated during update to obtain a smaller update amplitude.
Effect of Overlapping Number L
The overlapped TFRsubsequences contain more useful degradation information and may be more advantageous than TFRs in improving the prediction performance when used as inputs of the 3DCNN model. In theory, the higher the value of L, the more timerelated information is contained in the constructed TFRsubsequences, and the more conducive is the model in modeling the time dependence of the sample. However, if L is extremely high, then the effect of a single sample will be weakened, and the distinguishability of the samples at different times will be weakened, which is not conducive to the capturing of faultrelated transient information in the signal. Owing to the effect of the overlapping number L on the prediction, experiments were performed on bearings 13 and 33 with different kernels (3DCNN and 2DCNN) and overlapping numbers (L was set as 1, 2, 3, 4, 5, and 6). As the interval time for each TFRgenerating signal was 0.1 s, the contrast signal time was set to 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6 s in the 2DCNN for comparison, respectively. It is noteworthy that when L = 1, the 3DCNN was the same as the 2DCNN. However, when \(L \ge 2\), they were different. For the 2DCNN, inputting TFRs corresponding to multiple times can increase the amount of information that can be obtained; however, the change information between adjacent TFRs at consecutive times cannot be extracted. The 3DCNN model can simultaneously extract the variation in consecutive TFRs. As criteria, the Spearman rank correlation coefficient, sign trend factor, and root mean squared error (RMSE) were compared to investigate the prediction performance, as shown in Figure 8.
For the predicted HIs \({\hat{\varvec{y}}} \in {\mathbb{R}}^{{{1} \times n}}\) of the CNN model and true HIs \({\varvec{y}} \in {\mathbb{R}}^{{{1} \times {\text{n}}}}\), the RMSEs of \({\varvec{y}}\) and \({\hat{\varvec{y}}}\) are
where y_{i} represents the measurement vector of a feature on the ith system, and \({\overline{\varvec{y}}}\) refers to the mean of y. The Spearman rank correlation coefficient [41] and sign trend factor [42] of the predicted HIs are calculated as follows:
where \({\varvec{t}} \in {\mathbb{R}}^{{{1} \times n}}\) in Eq. (17) represents the measured feature on the ith system, \({\text{rank}}\left( {{\hat{\varvec{y}}}_{i} } \right)\) is equal to the average of their positions in the ascending order of value \({\hat{\varvec{y}}}_{i}\), and \({\text{corr(}} \cdot {)}\) refers to the correlation coefficient. In Eq. (18), \({\text{sgn}} {(} \cdot {)}\) refers to the Sgn function.
As shown in Figure 8, the calculation error of the 3DCNNbased approach was less than that of the 2DCNNbased approach. The RMSE decreased first and then increased as L increased. Within a certain range, increasing L enables more information to be extracted. However, when L is extremely large, the difference in data at different times will be reduced. The TFRs during the stable operation of the bearing did not differ significantly, rendering it difficult to extract subtle faultrelated changes in the signal. Spearman’s rank correlation coefficient can be used as an indicator of the dependency between two variables, e.g., the HI and runtime. It uses a monotone equation to evaluate the correlation between two statistical variables. For bearings 13, the coefficient value obtained by the 3DCNN was significantly higher than that by the 2DCNN, and the maximum value was achieved when L was 3, 4, or 5. For bearing 33, the coefficient value obtained by the 3DCNN was higher than that by the 2DCNN, and the maximum value was achieved when L was 3 or 4. The sign trend factor represents the monotonicity of the data, whose trend was the same as that of the Spearman coefficient.
Furthermore, Figure 9 shows the HIs of bearing 13 calculated using the 3DCNN model with different overlapping numbers. As shown, when \(t \ge 10000{\text{ s}}\), the linearity of the HIs in the six states was extremely high; however, when \(t < 10000\;{\text{s}}\), the HIs fluctuated significantly. When \(L \le 4\), the fluctuation weakened. However, when L = 6, the prediction effect worsened. It was discovered that an increase in the overlapping number within a certain range resulted in a more accurate HI prediction. However, a high overlapping number resulted in more memory storage and calculation burden for the model. Therefore, the overlapping number L was set to 4.
Effect of TFR Generation Method Selection
An effective TFR generation method should be able to fully reflect the degradation information of the bearing; in fact, this is a prerequisite for the established model to fully utilize the temporal and spatial features in the bearing vibration signal. The CWT method has been proven to be effective for the time–frequency analysis of bearing vibration signals, and its parameter determination and use methods are relatively mature; therefore, we selected it as the TFR generation method in this study. An experiment was performed to compare and analyze the effects of typical time–frequency analysis methods, i.e., WVD and PWVD, on the model performance when used in the established model. Table 3 presents the RUL prediction performance of bearings 13 based on different TFR generation methods Among them, the Hamming window function with a window length of 25 was selected for the WVD and PWVD methods. Overall, the TFRs generated by the CWT method yielded the smallest prediction error when used for HI construction, which is conducive to obtaining higherprecision RUL estimation performance. The prediction performance obtained by the PWVDbased TFRs indicated a clear advantage in terms of the monotony of the constructed HIs. The model performance corresponding to the TFRs generated by the WVD method was not comparable to that of the PWVD method, as it demonstrated a certain inhibition effect on the crossinterference terms. The highquality TFR method is beneficial for accurately reflecting the degradation state of bearings and improving RUL prediction accuracy.
Table 4 shows the time consumption of different TFR generation methods based on the initial, middle, and terminus intervals of the time–frequency analysis for bearing 11. As shown, the CWT method is advantageous in terms of computational efficiency compared with the other methods. For RUL prediction tasks with a large sample size, high computational efficiency is extremely important. Considering the results show in Tables 3 and 4, we used the CWT method for the time–frequency analysis of bearing vibration signals.
Superiority of GPRBased RUL Prediction
Figure 10 shows the estimated HIs of the training bearings. The solid lines and dots correspond to the actual and estimated HIs, respectively. The Y and Xaxes correspond to the HI and time, respectively. The HIs estimated using the proposed model can be used to track the actual degradation trends of the four training bearings. The test data used in the experiment were obtained from the terminatebeforefailure bearings. The trained model was used to predict the HIs corresponding to the testing datasets.
The RUL can be obtained directly based on the definition of the HI. In Eq. (5), t refers to the current time, and T the runtofailure time of the bearings. For the testing datasets, the unknown T can be calculated using the known HI. This is the same as using the RUL as the model output directly, which has been reported previously. However, the predicted HI at every moment should be nearly accurate. When the model was built, it was regarded as a regression problem rather than a classification problem, and the overall objective was the minimum MSE of the entire MSE. The RUL estimation result will be extremely volatile if calculated directly using the discrete HI output value. As shown in Figure 9, the smoothed HI was easily affected by the individual points. Therefore, the GPR algorithm was used in this study for the RUL prediction of bearings. The algorithm considers not only a single HI, but also the mixed distribution form of all existing HIs and calculates the distribution function corresponding to an individual bearing. For the independent variable time t, we assume that HI = f (t) as an unknown distribution form. For a specified value, the HI was modeled as a normal distribution. The mean vector of this joint normal distribution was assumed to be zero. The covariance matrix was defined as the similarity measure of each element corresponding to two HIs and was determined based on the squared exponential kernel. In addition, using the Bayesian method, not only the distribution of each point can be obtained via the GPR method, but also the unknown function is inferred.
Using bearings 13 and 33 as examples, Figure 11 shows the estimation result and the corresponding confidence interval. The Gaussian maximum likelihood estimation function was used as the optimal solution method, the superposition of a linear mean value and a constant mean value was used as the mean value function, and the Materniso function was used as the covariance function. The convergence of the model was achieved in 50 iterations. The prediction of the GPR model showed a 95% confidence interval. The mean of the confidence interval bound, namely the red line in the figure, represents the predicted HI value at an unknown time. Using HI = 1 as the failure threshold, the intersection points of the middle line of the GPR prediction, and the failure threshold as the failure point, the bearing RUL can be further calculated.
The GPR method is a Bayesian approach that can be used to model streaming data. Because of the advantages of the GPR method in uncertainty prediction, it was selected as a regression method for RUL estimation in this study. The difference between GPR and other regression algorithms is that in the latter, only the output value corresponding to the input data is expected to be obtained in general regression algorithms, such as linear, logistic, multiple, and Lasso regressions. By contrast, the GPR model obtains the distribution of the function based on the input data. The estimated distribution is advantageous when it is used to predict the RUL estimation uncertainty, which is difficult to realize using general regression methods.
As shown in Figure 11, the estimation ability of GPR for model errors is well demonstrated. The GPR method is beneficial for the RUL estimation of bearings in terms of two aspects: 1) It offers better antiinterference ability for the mutation points in HIs. Because GPR obtains a 95% confidence interval by calculating the probability distribution of existing data, abnormal mutation points are naturally excluded from the confidence interval, thereby reducing the effect of mutation points on the prediction result; 2) compared with point estimation, the result obtained is more stable. Using the mean of the confidence interval as the predicted value reduces the effect of data fluctuation on the results. Using bearing 33 as an example, a significant fluctuation remained in the HIs near the end of the test data, thereby increasing the prediction error significantly. The predicted value represented by the red line was similar to the true value represented by the yellow line. In the future, HIs can be predicted based on the distribution form of the existing data, and the result be consistent with the true value. The experimental results indicate that effectiveness of using the GPR model to predict the RUL of bearings based on the HIs constructed using the 3DCNN model.
Comparison with Other Related Studies
The proposed method was compared with four recent stateoftheart methods: 1) the LSTM model proposed in Ref. [21], 2) the MSCNN model proposed in Ref. [22], 3) the DBN–DP model proposed in Ref. [18] based on the DBN and diffusion process; 4) the RNN–AM model proposed in Ref. [20] based on the RNN and attention mechanism; 5) the hybrid prognostics method based on systematically combining relevance support vector regressions, exponential degradation models, and the Fréchet distance proposed in Ref. [29]; and 6) an enhanced encoder–decoder method proposed in Ref. [17].
By defining \(R\hat{U}L_{i}\) and \(ActRUL_{i}\) as the ith estimated and actual RULs, respectively, the prediction error is expressed as
The RUL estimation score of the ith experiment is
The score of the entire model is expressed as
Table 5 shows the prediction performance of the five methods. The current time, actual RUL, predicted RUL, estimation error, standard deviation (SD) of the errors, and mean score are presented in the table. As shown, the prediction SD of the prediction model based on LSTM and the RNN was high, indicating the low robustness of the model. Both models use raw vibration signals as the input data of the network for analysis. As mentioned above, it is difficult to extract complex faultrelated information using raw signals. The CNNbased method uses TFRs as the network input and is more stable than LSTM and RNN models in prediction, as reflected by its lower SD of the error. However, the prediction accuracy of the CNNbased model is not ideal. Furthermore, the DBN–DP model considers the instability of the prediction results. After the DBN network, the DP method was added to model the HIs to improve the prediction stability. Therefore, the DBN–DP model yielded a lower SD of the prediction error compared with the three abovementioned models. The TFRsubsequence–3DCNN–GPR method achieved a significantly higher score of 0.77 than the other methods. In both operating conditions 1 and 3, the RUL estimation error was within 5%, which was significantly lower than those of the other methods. The results fully demonstrate the feasibility and superiority of the proposed prognostics approach. The highest prediction accuracy was due to the simultaneous consideration of the spatial characteristic information of the TFR and the temporal correlation of the temporal correlation of adjacent TFRs. The lowest SD of the prediction error was attributed to the GPR method since it can effectively model the HIs. The experimental results demonstrated the effectiveness of the proposed method.
Conclusions
To address the three challenges in existing datadriven bearing RUL estimation methods, namely, the management of TFRs, temporal dependency of degradation states, and prediction volatility and uncertainty, a novel TFRsubsequence–3DCNN–GPR prognostics approach was proposed herein. The proposed model uses novel TFRsubsequences as inputs for endtoend estimation, i.e., a new attempt to provide more spatiotemporal information for a more accurate RUL estimation of bearings. Subsequently, the 3DCNN was leveraged to model the HIs of different adjacent TFR subsequences as the first attempt for the RUL estimation of bearings. The proposed model demonstrated superiority in the state tracking of bearings by extracting spatial TFR information and temporal interrelated information of data synchronously. Furthermore, the GPR model was applied to estimate the failure time and uncertainty bounds. By deriving the HI value of the unknown point, it can be directly used to estimate the RUL of bearings. Using the mean value of the distribution as the predicted HI value, the proposed model was more resistant to abnormal and fluctuating points than other prediction methods. This study provides insights into the comprehensive use of bearing vibration signals, particularly degradationrelated spatiotemporal information in the data, to achieve a highly accurate RUL prediction with uncertainty quantification. A comparison with other stateoftheart methods demonstrated the superiority of the proposed approach.
References
M F Li, T Y Wang, F L Chu, et al. Component matching Chirplet transform via frequencydependent chirp rate for wind turbine planetary gearbox fault diagnostics under variable speed condition. Mechanical Systems and Signal Processing, 2021, 161: 107997.
T Han, C Liu, L J Wu, et al. An adaptive spatiotemporal feature learning approach for fault diagnosis in complex systems. Mechanical Systems and Signal Processing, 2019, 117: 170187.
F B Zhang, J F Huang, F L Chu, et al. Mechanism and method for the fullscale quantitative diagnosis of ball bearings with an inner race fault. Journal of Sound and Vibration, 2020, 488: 115641.
J Lee, F J Wu, W Y Zhao, et al. Prognostics and health management design for rotary machinery systems  Reviews, methodology and applications. Mechanical Systems and Signal Processing, 2014, 42: 314334.
Y F Li, M J Zuo, K Feng, et al. Detection of bearing faults using a novel adaptive morphological update lifting wavelet. Chinese Journal of Mechanical Engineering, 2017, 30: 13051313.
Y G Lei, N P Li, L Guo, et al. Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mechanical Systems and Signal Processing, 2018, 104: 799834.
K S Singleton, E G Strangas, S Aviyente. The use of bearing currents and vibrations in lifetime estimation of bearings. IEEE Transactions on Industrial Informatics, 2017, 13: 13011309.
S Kerst, B Shyrokau, E Holweg. A modelbased approach for the estimation of bearing forces and moments using outer ring deformation. IEEE Transactions on Industrial Electronics, 2020, 67: 461470.
M Cerrada, R V Sánchez, C Li, et al. A review on datadriven fault severity assessment in rolling bearings. Mechanical Systems and Signal Processing, 2018, 99: 169196.
T S Liu, K P Zhu, L C Zeng. Diagnosis and prognosis of degradation process via hidden semiMarkov model. IEEE/ASME Transactions on Mechatronics, 2018, 23: 14561466.
L L Cui, X Wang, Y G Xu, et al. A novel switching unscented Kalman filter method for remaining useful life prediction of rolling bearing. Measurement, 2019, 135: 678684.
X S Si, W B Wang, C H Hu, et al. A Wienerprocessbased degradation model with a recursive filter algorithm for remaining useful life estimation. Mechanical Systems and Signal Processing, 2013, 35: 219237.
Y G Lei, N P Li, S Gontarz, et al. A modelbased method for remaining useful life prediction of machinery. IEEE Transaction on Reliability, 2016, 65: 13141326.
D Wang, K L Tsui, Q Miao. Prognostics and health management: A review of vibration based bearing and gear health indicators. IEEE Access, 2017, 6: 665676.
D Z Zhao, L Y Li, W D Cheng, et al. Generalized demodulation transform for bearing fault diagnosis under nonstationary conditions and gear noise interferences. Chinese Journal of Mechanical Engineering, 2019, 32: 7.
R N Liu, B Y Yang, E Zio, et al. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mechanical Systems and Signal Processing, 2018, 108: 3347.
L Liu, X Song, K Chen, et al. An enhanced encoder–decoder framework for bearing remaining useful life prediction. Measurement, 2021, 170: 108753.
C H Hu, H Pei, X S Si, et al. A prognostic model based on DBN and diffusion process for degrading bearing. IEEE Transactions on Industrial Electronics, 2020, 67 (10): 87678777.
H Wang, M J Peng, Z Miao, et al. Remaining useful life prediction techniques for electric valves based on convolution auto encoder and long short term memory. ISA Transactions, 2021, 108: 33333342.
Y H Chen, G L Peng, Z Y Zhu, et al. A novel deep learning method based on attention mechanism for bearing remaining useful life prediction. Applied Soft Computing Journal, 2020, 86: 105919.
B Zhang, S H Zhang, W H Li. Bearing performance degradation assessment using long shortterm memory recurrent network. Computers in Industry, 2019, 106: 1429.
J Zhu, N Chen, W Peng. Estimation of bearing remaining useful life based on multiscale convolutional neural network. IEEE Transactions on Industrial Electronics, 2019, 66: 32083216.
M F Li, T Y Wang, F L Chu, et al. Scalingbasis Chirplet transform. IEEE Transactions on Industrial Electronics, 2021, doi: https://doi.org/10.1109/TIE.2020.3013537.
H Huang, N Baddour, M Liang. Bearing fault diagnosis under unknown timevarying rotational speed conditions via multiple timefrequency curve extraction. Journal of Sound and Vibration, 2018, 414: 4360.
R Klein, E Masad, E Rudyk, et al. Bearing diagnostics using image processing methods. Mechanical Systems and Signal Processing, 2014, 45: 105113.
J J Shi, M Liang, D S Necsulescu, et al. Generalized stepwise demodulation transform and synchrosqueezing for time – frequency analysis and bearing fault diagnosis. Journal of Sound and Vibration, 2016, 368: 202222.
B Wang, Y G Lei, T Yan, et al. Recurrent convolutional neural network: A new framework for remaining useful life prediction of machinery. Neurocomputing, 2020, 379: 117129.
Z S Chen, X T Tu, Y Hu, et al. Realtime bearing remaining useful life estimation based on the frozen convolutional and activated memory neural network. IEEE Access, 7, 2019: 9658396593.
B Wang, Y G Lei, N P Li, et al. A hybrid prognostics approach for estimating remaining useful life of rolling element bearings. IEEE Transactions on Reliability, 2020, 69(1): 401412.
D Wang, P W Tse. Prognostics of slurry pumps based on a movingaverage wear degradation index and a general sequential Monte Carlo method. Mechanical Systems and Signal Processing, 2015, 56: 213229.
K P Zhu, T S Liu. Online tool wear monitoring via hidden semiMarkov model with dependent durations. IEEE Transactions on Industrial Informatics, 2018, 14: 6978.
S W Ji, W Xu, M Yang, et al. 3D Convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35: 221231.
Ł Jedliński, J Jonak. Early fault detection in gearboxes based on support vector machines and multilayer perceptron with a continuous wavelet transform. Applied Soft Computing Journal, 2015, 30: 636641.
S Qian, H Liu, C Liu, et al. Adaptive activation functions in convolutional neural networks. Neurocomputing, 2018, 272: 204212.
F Kang, S X Han, S Salgado, et al. System probabilistic stability analysis of soil slopes using Gaussian process regression with Latin hypercube sampling. Computers and Geotechnics, 2015, 63: 1325.
L L Kang, R S Chen, N X Xiong, et al. Selecting hyperparameters of Gaussian process regression based on noninertial particle swarm optimization in Internet of Things. IEEE Access, 7, 2019: 5950459513.
P Nectoux, R Gouriveau, K Medjaher, et al. PRONOSTIA : An experimental platform for bearings accelerated degradation tests. PHM’12, United States: Denver, Colorado, 2012, 1–8.
T P Le. Use of the Morlet mother wavelet in the frequencyscale domain decomposition technique for the modal identification of ambient vibration responses. Mechanical Systems and Signal Processing, 2017, 95, 488505.
M Mastyło. Bilinear interpolation theorems and applications. Journal of Functional Analysis, 2013, 265: 185207.
J J Wang, J X Yan, C Li, et al. Deep heterogeneous GRU model for predictive analytics in smart manufacturing: Application to tool wear prediction. Computers in Industry, 2019, 111: 114.
K J Ashok, S Abirami. Aspectbased opinion ranking framework for product reviews using a Spearman’s rank correlation coefficient method. Information Sciences, 2018, 460461: 2341.
J Coble, J W Hines. Identifying optimal prognostic parameters from data: A genetic algorithms approach. Annual Conference of the Prognostics and Health Management Society, San Diego, CA, 2009, 1–11.
Acknowledgements
The authors acknowledge the support of the FEMTOST Institute for providing the rolling bearing data.
Funding
Supported by National Key Research and Development Project of China (Grant No. 2020YFB2007700), State Key Laboratory of Tribology Initiative Research Program (Grant No. SKLT2020D21), National Natural Science Foundation of China (Grant No. 51975309), Shaanxi Provincial Natural Science Foundation of China (Grant No. 2019JQ712), and Young Talent Fund of University Association for Science and Technology in Shaanxi (Grant No. 20170511).
Author information
Authors and Affiliations
Contributions
TW and FC were in charge of the whole trial; XW wrote the manuscript; AM, QH, WZ, and AL assisted with sampling and laboratory analyses. All authors read and approved the final manuscript.
Authors’ Information
Xu Wang, born in 1992, is currently a Ph.D. candidate of mechanical engineering at Department of Mechanical Engineering, Tsinghua University, China. He is also cocultured with aerospace science and technology at HighTech Research Institute of Xi'an, China. He received his M.S. degree in ordnance science and technology from HighTech Research Institute of Xi’an, China, in 2017. His research interests include condition monitoring and fault diagnosis of machinery.
Tianyang Wang, born in 1985, is currently an assistant professor of mechanical engineering at Department of Mechanical Engineering, Tsinghua University, China. He received the B.S. and Ph.D. degrees in mechanical engineering from Beijing Jiaotong University, China, in 2008 and 2015, respectively. His recent research interests include signal processing and fault diagnosis of rotating machinery.
Anbo Ming, born in 1986, is currently an assistant professor at HighTech Research Institute of Xi’an, China. He received the B.S. degree and Ph.D. degree in aeronautical science and technology from HighTech Research Institute of Xi’an, China, in 2007 and 2014, respectively. He was an assistant professor at HighTech Research Institute of Xi’an, China. His current research interests include condition monitoring and fault diagnosis of mechanical equipment.
Qinkai Han, born in 1982, is currently an assistant professor of mechanical engineering at Department of Mechanical Engineering, Tsinghua University, China. He received the B.S. and Ph.D. degrees in aviation propulsion engineering from Beihang University, China, in 2004 and 2010, respectively. His research interests include rotor dynamic analysis, machinery condition monitoring and fault diagnosis, and the development of selfpowered system for intelligent fault diagnosis.
Fulei Chu, born in 1959, is currently a professor of mechanical engineering at Department of Mechanical Engineering, Tsinghua University, China. He received the Ph.D. degree in mechanical engineering from Southampton University in 1993. He is the Vice President of the Chinese Society for Vibration Engineering (CSVE), the Executive Director of the Fault Diagnosis Chapter in CSVE, and a member of the Rotor Dynamics Chapter in the International Federation for the Promotion of Mechanism and Machine Science (IFToMM). His research interests include rotating machinery dynamics, machine fault diagnosis, nonlinear vibration, and vibration control.
Wei Zhang, born in 1963, is currently a professor at HighTech Research Institute of Xi’an, China. He received the B.S. degree in power machines from the HighTech Research Institute of Xi’an, China, in 1985, the M.S. degree in mechanics from Xi’an Jiaotong University, China, in 1995, and the Ph.D. degree in ordnance science and technology from the HighTech Research Institute of Xi’an, China in 2003. His research interests include nondestructive testing, condition monitoring, and fault diagnosis of machinery.
Aihua Li, born in 1966, is currently a professor at HighTech Research Institute of Xi’an, China. His research interests include condition aluminum air battery development, computer vision, and fault diagnosis of machinery.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, X., Wang, T., Ming, A. et al. Deep Spatiotemporal ConvolutionalNeuralNetworkBased Remaining Useful Life Estimation of Bearings. Chin. J. Mech. Eng. 34, 62 (2021). https://doi.org/10.1186/s10033021005761
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1186/s10033021005761
Keywords
 Bearing
 Remaining useful life
 Continuous wavelet transform
 Convolution neural network
 Gaussian process regression