A Fast Multitasking Solution: NMF-Theoretic Coclustering for Gear Fault Diagnosis under Variable Working Conditions
Chinese Journal of Mechanical Engineering volume 33, Article number: 16 (2020)
Abstract
Most gear fault diagnosis (GFD) approaches become inefficient when faced with multiple varying working conditions at the same time. In this paper, a nonnegative matrix factorization (NMF)-theoretic coclustering strategy is proposed to classify more than one task at the same time using a high-dimensional matrix, aiming to offer a fast multitasking solution. The short-time Fourier transform (STFT) is first used to obtain time-frequency features from the gear vibration signal. Then, the optimal clustering numbers are estimated using the Bayesian information criterion (BIC), which, unlike traditional validity indexes, is capable of simultaneous assessment. Subsequently, the classical and modified NMF-based coclustering methods are carried out to obtain the classification results in both row and column tasks. Finally, the parameters involved in the BIC and NMF algorithms are determined using the gradient ascent (GA) strategy in order to achieve reliable diagnostic results. The Spectra Quest’s Drivetrain Dynamics Simulator gear data sets were analyzed to verify the effectiveness of the proposed approach.
Introduction
In large-scale rotating machines, wear and tear tends to appear on the tooth surfaces of driving gears if the pressure is uneven or impurities are mixed into the lubricating oil. Health monitoring technology for mechanical components has proved effective at discovering early abrasion and reducing the failure rate [1,2,3,4,5]. As one of the major tasks in health monitoring, gear fault diagnosis (GFD) aims to assess the current gear state based on the obtained measurement data, and then to inform users so that they can take proper action [6]. A GFD procedure generally consists of three main processes: (1) Data acquisition: data are collected from sensors to monitor the health status of gears; (2) Feature extraction: feature extraction algorithms, such as wavelet transform (WT) [7] and least squares support vector machine (LSSVM) [8], are applied on the basis of prior knowledge to provide recognizable features; (3) Fault recognition: classifiers are built to identify gear faults by analyzing the extracted features.
Clustering technology, one of the unsupervised fault recognition approaches, has undergone long-term development from partition-based clustering to graph-theory-based clustering, as listed in Table 1. Most of these algorithms have been applied to fault diagnosis of rotating machinery. For instance, Yuwono et al. [9] combined particle clustering with a Hidden Markov Model (HMM) for bearing fault diagnosis; Pacheco et al. [10] classified gear fault severities using rough set theory. These studies have provided effective clustering applications for machine fault diagnosis. However, they share a non-negligible limitation: each feature vector is treated as an independent and uncorrelated unit. In fact, strong correlation exists between machine working conditions and the targeted diagnostic tasks, so clustering analysis along one dimension loses significant information hidden in the other dimension. To overcome this limitation, a coclustering strategy for GFD under variable working conditions is proposed in this paper. Coclustering was first used in the biological and medical domains after the concept was described by Mirkin [11] in 1997. The joint clustering of gene shapes and locations promoted the discovery of genetic structure sequences [12]. Subsequently, coclustering has been extended to other fields such as text analysis [13] and search engines [14].
In this study, coclustering applications for gear fault diagnosis are developed. Compared with previous GFD approaches based on joint clustering methods, this paper offers two highlights: (1) when one varying working condition (such as rotating speed or load) and one diagnostic task (such as fault severity) are joined in the same matrix, their correlations are extracted, and the classification accuracy of the latter can adjust with the range of the former; (2) when two diagnostic tasks (such as fault severity and fault type) are joined in the same matrix, they can be classified at the same time, which improves the diagnosis speed compared with an independent GFD strategy and offers a fast multitasking solution.
The remainder of this paper is organized as follows. Section 2 presents a brief summary of the applicability of coclustering. It also describes the principle and basic framework of coclustering for solving the GFD problem. Section 3 addresses the preparatory work of GFD, especially a short-time Fourier transform (STFT)-based feature extraction method. In Section 4, the coclustering numbers are estimated based on the Bayesian information criterion (BIC). Then in Section 5, the traditional and modified NMF-theoretic coclustering processes are discussed in detail. To assess these algorithms, the gradient ascent algorithm is also implemented for parameter regulation in Section 6. Section 7 concentrates on the varying working condition GFD experiments using the Drivetrain Dynamics Simulator (DDS) system, which show the superiority of coclustering over classical clustering strategies such as the X-means algorithm. Conclusions are drawn in Section 8, with a discussion of future GFD applications based on joint clustering.
Coclustering Framework of GFD
Traditional clustering can be defined as follows: a dataset \(\varvec{X}\) in a finite data space, represented by an \(n \times d\) matrix, is composed of \(n\) elements: \(\varvec{x}_{i} = \left( {x_{i1} ,x_{i2} , \cdots ,x_{id} } \right)^{\text{T}}\). The purpose of general clustering is to segment dataset \(\varvec{X}\) into \(p\) categories: \(C_{k} \left( {k = 1,2, \cdots ,p} \right)\).
where \(i \in \left\{ {1,2, \ldots ,n} \right\}.\)
In contrast to general clustering, coclustering can be defined as follows: a dataset \(\varvec{X}\) in a finite data space, represented by an \(m \times n \times d\) matrix, is composed of \(m \times n\) elements: \(\varvec{x}_{ij} = \left( {x_{ij1} ,x_{ij2} , \cdots ,x_{ijd} } \right)^{\text{T}}\). The purpose of coclustering is to segment dataset \(\varvec{X}\) into \(p\) and \(q\) categories along the horizontal and vertical axes, respectively: \(C_{k}^{h} \left( {k = 1,2, \cdots ,p} \right)\) and \(C_{k}^{v} \left( {k = 1,2, \cdots ,q} \right).\)
where \(i \in \left\{ {1,2, \ldots ,n} \right\}\), \(j \in \left\{ {1,2, \ldots ,m} \right\}.\)
Currently, one challenge for fault diagnosis of gears is that they are often operated under varying working conditions. To explain the influence of varying working conditions on the diagnosis performance, a typical example is given in Figure 1 to compare the classification results of the classical and coclustering strategies. In the extracted feature matrix, different colors represent different feature values. C1‒C5 represent different fault categories; D1‒D5 represent different working conditions. R1‒R9 represent the 1st clustering result; r1‒r5 represent the 2nd clustering result. It can be seen that the classical clustering results may be distorted because different working conditions directly influence the extracted features, which causes the final clustering categories (R1‒R9) to differ from the real fault categories (C1‒C5). In contrast, coclustering can reflect the actual diagnosis results (R1‒R5), which shows the robustness of coclustering in GFD in the presence of interference.
Note that coclustering is not limited to two dimensions. Theoretically, it can be generalized to higher dimensions (n ≥ 3), which suggests a way to solve more complex problems such as gear fault diagnosis under multiple working conditions. Figure 2 explains the mechanism of coclustering GFD models under two working condition factors, such as varying rotating speed and varying load. In Model (a), the 2D classification matrix is structured in two dimensions: row for rotating speed and column for GFD task. In Model (b), the 3D classification matrix is structured in three dimensions: length for rotating speed, width for load and height for GFD task. The structured high-dimensional matrix is classified along each dimension at the same time, which offers an approach to GFD under more than one working condition.
According to the description above, a coclustering framework is constructed for gear fault diagnosis, which is shown in Figure 3. The main process can be divided into four subframes.
Feature extraction subframe: as the input of this model, the gear vibration signals are collected using several triaxial accelerometers installed on the monitored mechanical equipment, which may be operated under varying working conditions. Then, the feature vectors \(\left\{ F \right\}\) are obtained using the short-time Fourier transform (STFT) approach, aiming to gain distinguishable time-frequency features for effective coclassifier performance;
Clustering number estimation subframe: the Bayesian information criterion (BIC) strategy is adopted to characterize the distribution of all feature vectors \(\left\{ F \right\}\) and then estimate their coclustering numbers \(k\) and \(l\) in row and column, respectively;
Coclustering subframe: given the coclustering numbers, the conventional as well as the modified NMF-based coclustering classifiers are put into practice to build the varying working condition GFD models and obtain the classification results in the various tasks;
Parameter regulation subframe: for the adjustable parameters involved in the BIC and NMF algorithms, such as the weight factor \(\lambda\) and the transitional dimension \(d\), the gradient ascent (GA) algorithm is implemented to find the optimal values that yield reliable diagnostic accuracy.
More details of these four subframes will be given from Section 3 to Section 6.
STFT-based Feature Extraction
The performance of feature extraction algorithms depends critically on the characteristics of the input gear fault signals as well as the running environment of the equipment. Because of the nonstationary property of the vibration signal, general algorithms, such as the Fourier transform (FT), are not very useful in this regard [24]. As a time-frequency analysis method, the short-time Fourier transform (STFT) executes the Fourier transform on short sequential signal segments, cut by a sliding temporal window function \(\gamma \left( t \right)\), such as a Hanning or Hamming window [25]. When analyzing the nonstationary vibration signal \(s\left( t \right),\left( {t = 1,2, \ldots ,t_{0} } \right)\), we assume that it is approximately stationary within the window function \(\gamma \left( t \right)\). Therefore, the STFT of \(s\left( t \right)\) is calculated in time-frequency units \(STFT\left( {t,w} \right)\) and is given by
where \(\tau\) means the position of the window function \(\gamma \left( t \right)\); \(w\) represents the frequency parameter of the STFT. Figure 4 illustrates an STFT example for a chipped tooth signal with the Hamming window. It can be seen from Figure 4(d) that four main peaks (at approximately 120 Hz, 500 Hz, 1160 Hz, and 1770 Hz) appear in the frequency domain of the STFT spectrogram. The peak at 1770 Hz is the most prominent, reaching up to 0.5 V.
Since the short-time Fourier spectrum reflects a distribution of energies among all frequencies and temporal intervals, we have to seek the ‘meaningful’ values from the STFT graph. Without any prior knowledge of the running equipment, an effective method is to partition the STFT spectrogram and take the maximum of each partition as the feature of that subwindow. Figure 5 gives an example of the features extracted from the chipped gear signal in Figure 4, including 10 × 10, 40 × 40 and 80 × 80 features. Generally, the numbers of row and column partitions depend on how nonstationary the signal is in the time and frequency domains, respectively. This figure indicates that more details emerge in high-dimensional features than in low-dimensional features. However, increasing the dimension increases the computational load as well as the time consumption. Therefore, a middle dimension is suitable if the resolution and dimension are both acceptable, such as the 40 × 40 features in Figure 5(c). Here the \(\varvec{STFT}\) matrix is composed of \({N}^{2}\) submatrices \(\varvec{st}_{ij} \in \varvec{R}^{\left( {n/N} \right) \times \left( {m/N} \right)}\), where \(i \in \left\{ {1,2, \ldots ,N} \right\}, j \in \left\{ {1,2, \ldots ,N} \right\}{:}\)
where \(\varvec{st}_{ij}\) represents the coefficient at location (\(i,j\)) in the submatrix of the STFT graph; the dimension \(N\) will be set in the real GFD experiments in Section 7. Finally, the feature vectors from STFT are obtained and are given by
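The partition-and-maximum idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `stft_block_max_features`, the parameters `n_blocks` and `nperseg`, and the use of `scipy.signal.stft` are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import stft


def stft_block_max_features(signal, fs, n_blocks=40, window="hamming", nperseg=256):
    """Return an n_blocks x n_blocks matrix of block-wise maxima of |STFT|."""
    # scipy returns the frequency axis, the segment times and the complex STFT
    _, _, Z = stft(signal, fs=fs, window=window, nperseg=nperseg)
    mag = np.abs(Z)  # shape: (frequency bins, time frames)
    if mag.shape[0] < n_blocks or mag.shape[1] < n_blocks:
        raise ValueError("spectrogram is smaller than the requested feature grid")
    # Split both axes into n_blocks nearly equal partitions
    f_edges = np.linspace(0, mag.shape[0], n_blocks + 1).astype(int)
    t_edges = np.linspace(0, mag.shape[1], n_blocks + 1).astype(int)
    feats = np.empty((n_blocks, n_blocks))
    for i in range(n_blocks):
        for j in range(n_blocks):
            block = mag[f_edges[i]:f_edges[i + 1], t_edges[j]:t_edges[j + 1]]
            feats[i, j] = block.max()  # one feature per partition
    return feats
```

Calling it with `n_blocks=10`, `40` or `80` would correspond to the three feature grids compared in Figure 5, provided the spectrogram has at least that many frequency bins and time frames.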
BIC-based Clustering Number Estimation
In most practical gear fault diagnosis applications, it is difficult to know the number of clusters in advance, yet the input clustering numbers always have a direct influence on the final clustering results. Strategies for estimating the optimal numbers based on cluster validity indexes can be found in the literature, including the Calinski–Harabasz index [26], the Davies–Bouldin index [27], the weighted inter-intra index [28] and the in-group proportion (IGP) index [29]. However, two drawbacks exist in the estimation strategy based on cluster validity indexes: (1) The estimation precision depends on the selected clustering algorithm and dataset as well as on the validity index. For instance, with the same IGP index, the affinity propagation (AP) method yields a more credible optimal clustering number than the k-means strategy on account of the randomness of the latter [30]. (2) It is hard to apply this estimation strategy to 2D or higher-dimensional clustering such as coclustering, because these validity indexes are mainly calculated from distances between different classes or within the same class.
To find the optimal numbers for coclustering, an estimation algorithm based on the Bayesian information criterion (BIC) is proposed. BIC is a statistical measure of the descriptive power of a model for a dataset [31], comprising: (1) the posterior likelihood of data estimation \(L\); (2) the model complexity \(\left| \varTheta \right|\). The computational formula of BIC is given by
where \(\lambda\) means the weight factor; \(N\) is the total number of samples. In clustering, the posterior likelihood of data estimation \(L\) is represented using the ratio of the mutual information after clustering, \({\text{I}}\left( {S^{*} ;F^{*} } \right)\), to that before clustering, \({\text{I}}\left( {S;F} \right)\). In Eq. (6), the meaning of \(L\) is that a good clustering must preserve as much of the original information entropy as possible,
In 2D coclustering, however, the BIC model must take row and column clustering into consideration at the same time. So we redefine the parameters as follows:
Direction  Sample length  Sample size  Clustering number 

Row  \(n\)  \(m\)  \(k\) 
Column  \(m\)  \(n\)  \(l\) 
According to these definitions, the BIC model complexity in coclustering can be reexpressed as
Substituting Eq. (7) and Eq. (8) into Eq. (6), we get Eq. (9) as follows:
Further, this Bayesian information criterion can be extended to the 3D case and is given by
where \(p\) is the clustering number of the 3rd classification; \(q\) is the size of the 3rd dimension. Based on the theory described above, the details of the BIC algorithm are given as follows:
Input:
a. For single variable working condition GFD, the sample matrix \(C\) is structured by
$$C_{40 \times n} = \left[ {\begin{array}{*{20}c} {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{1} } & \cdots & {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{i} } & \cdots & {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{n} } \\ \end{array} } \right],$$ (11)
where \(\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } & \cdots & {\varvec{x}_{40} } \\ \end{array} } \right]^{\text{T}}\) represents the STFT values over a continuous interval of approximately one minute, whose length depends on how fast the working condition changes; \(i \in \left\{ {1,2, \ldots ,n} \right\}\) indexes the random samples collected from different fault types, and \(n\) is the number of samples.
b. For double variable working condition GFD, the sample matrix \(C\) is structured by
$$C_{n \times 40 \times 40} = \left[ {\begin{array}{*{20}c} {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{11} } & \cdots & {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{i1} } & \cdots & {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{n1} } \\ \vdots & \ddots & \vdots & {\mathinner{\mkern2mu\raise1pt\hbox{.}\mkern2mu \raise4pt\hbox{.}\mkern2mu\raise7pt\hbox{.}\mkern1mu}} & \vdots \\ {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{1j} } & \cdots & {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{ij} } & \cdots & {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{nj} } \\ \vdots & {\mathinner{\mkern2mu\raise1pt\hbox{.}\mkern2mu \raise4pt\hbox{.}\mkern2mu\raise7pt\hbox{.}\mkern1mu}} & \vdots & \ddots & \vdots \\ {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{140} } & \cdots & {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{i40} } & \cdots & {\left[ {\begin{array}{*{20}c} {\varvec{x}_{1} } \\ \vdots \\ {\varvec{x}_{40} } \\ \end{array} } \right]_{n40} } \\ \end{array} } \right].$$(12)
Process:
a. Select a matrix construction method to build the \(C\) matrix according to the number of running environments;
b. Initialize the row clustering number \(k = 1\), the column clustering number \(l = 1\), and the weight factor \(\lambda = 0.5;\)
c. For \(k\) from 1 to \(m\), \(l\) from 1 to \(n\):
$${\text{BIC}}\left( {k,l} \right) = \lambda {\text{I}}\left( {S^{*} ;F^{*} } \right)/{\text{I}}\left( {S;F} \right) - \left( {nk\log m + ml\log n} \right)/2,$$ (13)
$${\text{I}}\left( {S;F} \right) = \sum\nolimits_{s} \sum\nolimits_{f} p\left( {s,f} \right)\log_{2} \left[ {p\left( {s,f} \right)/\left( {p\left( s \right)p\left( f \right)} \right)} \right],$$ (14)
where \(p\left( {s,f} \right)\) means the joint probability distribution between row and column; \(p\left( s \right)\) means the probability distribution in the row vector; \(p\left( f \right)\) means the probability distribution in the column vector.
d. Search for the maximum value of BIC over the whole \(k\) and \(l\) domain:
$$\arg \mathop {\max }\limits_{k,l} {\text{BIC}}\left( {k,l} \right).$$ (15)
Output:
a. For single variable working condition GFD, it outputs the final clustering numbers \(k\) and \(l;\)
b. For double variable working condition GFD, it outputs the final clustering numbers \(k\), \(l\), and \(p.\)
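The scoring step of the procedure above, Eqs. (13) and (14), can be sketched as follows. This is only an illustration under stated assumptions: the helper names `mutual_information`, `aggregate` and `bic_score` are hypothetical, the matrix is assumed to hold \(m\) row samples of length \(n\), and the row and column labels for each candidate \(\left( {k,l} \right)\) are assumed to come from whatever coclustering step is being evaluated.

```python
import numpy as np


def mutual_information(joint):
    """I(S;F) in bits for a nonnegative matrix treated as a joint distribution (Eq. 14)."""
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()
    ps = p.sum(axis=1, keepdims=True)   # row marginal p(s)
    pf = p.sum(axis=0, keepdims=True)   # column marginal p(f)
    mask = p > 0                        # 0 * log 0 is taken as 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (ps @ pf)[mask])))


def aggregate(C, row_labels, col_labels, k, l):
    """Sum the entries of C inside each of the k x l co-clusters."""
    R = np.eye(k)[row_labels]           # one-hot row assignment, shape (rows, k)
    Q = np.eye(l)[col_labels]           # one-hot column assignment, shape (cols, l)
    return R.T @ C @ Q


def bic_score(C, row_labels, col_labels, k, l, lam=0.5):
    """Eq. (13): lam * I(S*;F*) / I(S;F) - (n k log m + m l log n) / 2."""
    m, n = C.shape                      # assumption: m row samples of length n
    after = mutual_information(aggregate(C, row_labels, col_labels, k, l))
    before = mutual_information(C)
    return lam * after / before - (n * k * np.log(m) + m * l * np.log(n)) / 2
```

Step d then amounts to evaluating `bic_score` for every candidate pair and keeping the pair with the largest value.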
Modified NMF-based Coclustering
NMF Theory
Nonnegative matrix factorization (NMF) is an efficient data compression strategy that describes a high-dimensional data set using a few basis vectors under nonnegativity constraints [32]. Unlike vector quantization (VQ) and principal component analysis (PCA), which capture global characteristics, NMF offers a good description of local features and therefore specializes in finding small-scale information. NMF has been investigated for feature extraction and recognition of rolling element bearing faults [33]. However, its unique advantage for coclustering has not been explored in 2D or higher-dimensional applications. The basic NMF problem is stated as the following equation:
$$\varvec{C}_{n \times m} \approx \varvec{W}_{n \times d} \varvec{H}_{d \times m},$$ (16)
where \(\varvec{C}_{n \times m}\) is an \(n \times m\) nonnegative matrix; the basis matrix \(\varvec{W}\) and the gains matrix \(\varvec{H}\) are factorized from \(\varvec{C}_{n \times m}\); \(d < \left( {n \times m} \right)/\left( {n + m} \right)\) represents the reduced rank. Therefore, \(\varvec{C}_{n \times m}\) can be linearly estimated by the factor matrices \(\varvec{W}_{n \times d}\) and \(\varvec{H}_{d \times m}\). In order to obtain the matrices \(\varvec{W}\) and \(\varvec{H}\), various cost functions have been defined to quantify the degree of approximation. In our strategy, the Euclidean distance is chosen as the cost function.
The purpose of NMF is to find the \(\varvec{W}\) and \(\varvec{H}\) that minimize the cost function: \(\mathop {\hbox{min} }\limits_{{\varvec{W},\varvec{H}}} D\left( {\varvec{C}\Vert \varvec{WH}} \right)\) \({\text{s}}.{\text{t}}.\ \varvec{W},\varvec{H} \ge 0\). An iterative multiplicative algorithm is carried out based on the update rules of \(\varvec{W}\) and \(\varvec{H}\), and is given by
where \(\otimes\) is the element-wise multiplication, \({\varvec{\Theta}}\) is the element-wise division; \(r\) represents the iteration.
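As an illustration of such an iterative multiplicative scheme, the sketch below uses the widely known Lee–Seung update rules for the Euclidean cost. It is an assumption that these coincide with the update rules used in this paper; the function name `nmf_multiplicative` and the parameters `n_iter`, `eps` and `seed` are hypothetical.

```python
import numpy as np


def nmf_multiplicative(C, d, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative C (n x m) as W (n x d) @ H (d x m), Euclidean cost."""
    rng = np.random.default_rng(seed)
    n, m = C.shape
    W = rng.random((n, d)) + eps        # random nonnegative initialization
    H = rng.random((d, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Element-wise multiply and divide; eps guards against division by zero
        H *= (W.T @ C) / (W.T @ W @ H + eps)
        W *= (C @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(C - W @ H))  # Frobenius reconstruction error
    return W, H, errors
```

Because every factor in the updates is nonnegative, \(\varvec{W}\) and \(\varvec{H}\) stay nonnegative throughout without an explicit projection step.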
Classical NMF-based Coclustering
Recently, clustering applications based on NMF have attracted much attention. In particular, Kim et al. explored the effective combination of clustering and NMF [34]. This paper extends its application from single clustering to coclustering, aiming to solve the varying working condition or multi-task problem. Relying on the computed \(\varvec{W} \in \varvec{R}^{n \times r}\) and \(\varvec{H} \in \varvec{R}^{r \times m}\) above, two objective functions \(\varvec{J}_{k}\) and \(\varvec{J}_{l}\) are defined as follows:
where \(\varvec{C}^{r} = \left[ {\varvec{c}_{1} ,\varvec{c}_{2} , \ldots ,\varvec{c}_{k} } \right]^{r} \in \varvec{R}^{m \times k}\) and \(\varvec{C}^{c} = \left[ {\varvec{c}_{1} ,\varvec{c}_{2} , \ldots ,\varvec{c}_{l} } \right]^{c} \in \varvec{R}^{n \times l}\) represent the centroid matrices in row and column, respectively; the element \(\varvec{c}_{j} ,j \in \left[ {1,2, \ldots ,k} \right]\) of \(\varvec{C}^{r}\) is the centroid of the jth cluster in Task I and the element \(\varvec{c}_{j} ,j \in \left[ {1,2, \ldots ,l} \right]\) of \(\varvec{C}^{c}\) is the centroid of the jth cluster in Task II; \(\varvec{B}^{r}\) and \(\varvec{B}^{c}\) denote the clustering assignments in Task I and Task II, respectively. In Task I, \(\varvec{B}_{ij}^{r} = 1\) means that the ith sample belongs to the jth cluster, otherwise \(\varvec{B}_{ij}^{r} = 0\); the same holds for Task II.
The purpose of coclustering is to find the sparse matrices \(\varvec{B}^{r}\) and \(\varvec{B}^{c}\), each of which has a single one in each row, with all other entries being zero. Taking Task I and Task II as an example, we redefine \(\varvec{C}^{r}\) and \(\varvec{C}^{c}\) as
where \(\left( {\varvec{D}^{r} } \right)^{ - 1} = {\text{diag}}\left( {\left| {c_{1} } \right|^{ - r} ,\left| {c_{2} } \right|^{ - r} , \ldots ,\left| {c_{k} } \right|^{ - r} } \right) \in \varvec{R}^{k \times k}\), and \(\left( {\varvec{D}^{c} } \right)^{ - 1} = {\text{diag}}\left( {\left| {c_{1} } \right|^{ - c} ,\left| {c_{2} } \right|^{ - c} , \ldots ,\left| {c_{l} } \right|^{ - c} } \right) \in \varvec{R}^{l \times l}\). Meanwhile, we set \(\left( {\varvec{D}^{r} } \right)^{ - 1} = \varvec{D}_{1}^{r} \varvec{D}_{2}^{r}\), \(\left( {\varvec{D}^{c} } \right)^{ - 1} = \varvec{D}_{1}^{c} \varvec{D}_{2}^{c}\), then Eq. (19) can be expressed as follows:
where \(\varvec{M}^{r} = \left( {\varvec{D}_{1}^{r} } \right)^{\text{T}} \varvec{B}^{r}\), \(\varvec{N}^{r} = \left( {\varvec{D}_{2}^{r} } \right)^{\text{T}} \varvec{B}^{r}\), \(\varvec{M}^{c} = \left( {\varvec{D}_{1}^{c} } \right)^{\text{T}} \varvec{B}^{c}\), \(\varvec{N}^{c} = \left( {\varvec{D}_{2}^{c} } \right)^{\text{T}} \varvec{B}^{c} .\)
Finally, a second-order NMF is applied to \(\varvec{J}_{k}\) and \(\varvec{J}_{l}\), aiming to factorize \(\varvec{W}\) into \(\varvec{W}\left( {\varvec{M}^{r} } \right)^{\text{T}}\) and \(\varvec{N}^{r}\), and to factorize \(\varvec{H}^{\text{T}}\) into \(\varvec{H}^{\text{T}} \left( {\varvec{M}^{c} } \right)^{\text{T}}\) and \(\varvec{N}^{c}\). The \(\varvec{B}^{r}\) and \(\varvec{B}^{c}\) matrices are then obtained from the second-order NMF result, and the classifications in row and column are read from these two matrices.
Modified NMF-based Coclustering
As described in Section 5.1, the nonnegative matrix \(\varvec{C}\) is factorized into two submatrices \(\varvec{W}\) and \(\varvec{H}\) in conventional nonnegative matrix factorization. Although the physical meanings of \(\varvec{W}\) and \(\varvec{H}\) are clear (they represent the decomposition values in row and column, respectively, and promote the classification effect of coclustering), the relation between the two directions is still ill-defined. Hence, the typical NMF is improved and is given by
where \(\varvec{L}_{k \times l}\) is named ‘the relation matrix’; the value \(\varvec{L}_{ij}\) represents the link between the ith cluster in Task I and the jth cluster in Task II; \(k\) is the clustering number in the row vector and \(l\) is the clustering number in the column vector. In the modified NMF, the cost function and the update functions can be rewritten as
where \(\otimes\) is the element-wise multiplication, \({\varvec{\Theta}}\) is the element-wise division; \(r\) represents the iteration; \(\varvec{W}^{ + }\) is the generalized inverse of \(\varvec{W}\): \(\varvec{WW}^{ + } \varvec{W} = \varvec{W}\); \(\varvec{H}^{ + }\) is the generalized inverse of \(\varvec{H}\): \(\varvec{HH}^{ + } \varvec{H} = \varvec{H}.\)
By introducing the matrix \(\varvec{L}\), \(\varvec{W}\) and \(\varvec{H}\) are not required to be orthogonal in the modified NMF strategy. This expands the admissible range of \(\varvec{W}\) and \(\varvec{H}\) and improves the factorization performance, which will be demonstrated in Section 7.
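To make the role of the generalized inverses concrete, the sketch below computes the least-squares relation matrix for fixed \(\varvec{W}\) and \(\varvec{H}\) in a three-factor model \(\varvec{C} \approx \varvec{WLH}\). It is not the update rule of this paper: the function name `relation_matrix` is hypothetical, the Moore–Penrose pseudo-inverse is used as one particular generalized inverse, and nonnegativity of the result is not enforced.

```python
import numpy as np


def relation_matrix(C, W, H):
    """Least-squares L for C ~= W @ L @ H, using Moore-Penrose pseudo-inverses."""
    return np.linalg.pinv(W) @ C @ np.linalg.pinv(H)
```

When \(\varvec{W}\) has full column rank and \(\varvec{H}\) has full row rank, this recovers \(\varvec{L}\) exactly for a matrix that truly has the form \(\varvec{WLH}\); a nonnegative variant would need an additional projection or a multiplicative update.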
GA-based Parameter Regulator
Three parameters need to be set in the coclustering-based GFD strategy: (1) the feature dimension \(N\) in STFT; (2) the weight factor \(\lambda\) in the BIC algorithm; (3) the transitional dimension \(d\) in traditional NMF-based coclustering. They are listed in Table 2.
Among these three parameters, the feature dimension \(N\) in STFT can be decided from the GFD experiments. \(\lambda\) and \(d\) are adjusted using the gradient ascent (GA) regulatory mechanism [35, 36], whose principle is shown in Figure 6. The main idea of the gradient ascent algorithm is to follow the direction of fastest change to find the maximum of the diagnostic accuracy. In Figure 6, four different initial points of \(\left( {\lambda ,d} \right)\) are given as examples, where the approximation curves ① and ② reach the global optimum while ③ and ④ are trapped in a local optimum.
Meanwhile, three concepts are included in the gradient ascent model:
The step length \(sl\): it represents the speed along the gradient direction during the iteration. We initialize the step length as 0.02 in the parameter \(\lambda\) & \(d/{ \hbox{min} }\left( {m,n} \right)\), where \(0 \le \lambda \le 1\), \(0 < d/{ \hbox{min} }\left( {m,n} \right) \le 1;\)
The learning function: it has been designed in NMFbased coclustering classifier as:
$$\left\{ {L_{i}^{r} ,L_{i}^{c} } \right\} = NMF\left\{ {\varvec{x}_{i} ,\lambda , \left[ {d/{ \hbox{min} }\left( {m,n} \right)} \right]} \right\},$$ (29)
where \(\varvec{x}_{i}\) represents the STFT feature vector extracted from the ith sample; \(NMF\left( \cdot \right)\) means the NMF-based coclustering classifier, with three input parameters \(\left\{ {\varvec{x}_{i} ,\lambda , \left[ {d/{ \hbox{min} }\left( {m,n} \right)} \right]} \right\}\); \(L_{i}^{r}\) and \(L_{i}^{c}\) are the clustering results of the ith sample in row and column.
The validity function: it is calculated by the sum of correct classifications, and it assesses the effectiveness of classification.
$$Ac\left( {\lambda , \left[ {d/\hbox{min} \left( {m,n} \right)} \right]} \right) = \mathop \sum \limits_{i = 1}^{m} zer\left( {L_{i}^{r} - y_{i}^{r} } \right) + \mathop \sum \limits_{i = 1}^{n} zer\left( {L_{i}^{c} - y_{i}^{c} } \right),$$ (30)
$$zer\left( x \right) = \left\{ {\begin{array}{*{20}c} {0, x \ne 0,} \\ {1, x = 0,} \\ \end{array} } \right.$$ (31)
where \(y_{i}^{r}\) and \(y_{i}^{c}\) represent the labels of the ith sample; \(zer\left( \cdot \right)\) means the zero sign function. Note that the validity function can only be evaluated on training samples, whose classification labels are known. The optimal \(\lambda\) and \(d/{ \hbox{min} }\left( {m,n} \right)\) are obtained from the training samples and are then applied to the remaining samples, called testing samples.
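The validity function of Eqs. (30) and (31) simply counts exact matches between cluster results and labels. A minimal sketch, assuming the cluster indices have already been aligned with the label indices (the function name `validity` and its argument names are hypothetical):

```python
import numpy as np


def validity(pred_row, true_row, pred_col, true_col):
    """Eq. (30): number of correctly classified row samples plus column samples."""
    pred_row, true_row = np.asarray(pred_row), np.asarray(true_row)
    pred_col, true_col = np.asarray(pred_col), np.asarray(true_col)
    # zer(L - y) equals 1 exactly where the prediction matches the label
    return int(np.sum(pred_row == true_row) + np.sum(pred_col == true_col))
```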
It should be noted that, when performing the GA algorithm in real GFD applications: (1) If the step length is too large, the optimal parameter values might be skipped; if the step length is too small, the iteration will be slow and the computational load too large. (2) The GA algorithm easily becomes trapped in a local optimum rather than reaching the global optimum, depending on the initial location of \(\left\{ {\lambda , d/{ \hbox{min} }\left( {m,n} \right)} \right\}\). Therefore, it is necessary to take these two factors into consideration to balance the computational accuracy and the time consumption.
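The two cautions above, step length and dependence on the starting point, can be illustrated with a small search sketch. Note the substitution: because the validity function is a count and therefore piecewise constant, the sketch uses a steepest-ascent search over grid neighbours with step 0.02 rather than an analytic gradient, and restarts it from several initial points. The names `hill_climb` and `toy_accuracy`, the bounds, and the toy surface are all hypothetical stand-ins for the real accuracy function.

```python
import itertools


def hill_climb(score, start, step=0.02, bounds=((0.0, 1.0), (0.02, 1.0)), max_iter=1000):
    """Steepest-ascent search on a grid: move to the best neighbour until none improves."""
    point, best = tuple(start), score(*start)
    for _ in range(max_iter):
        candidates = []
        for d_lam, d_dim in itertools.product((-step, 0.0, step), repeat=2):
            # Rounding keeps the point on the grid despite floating-point drift
            cand = (round(point[0] + d_lam, 10), round(point[1] + d_dim, 10))
            inside = all(lo <= c <= hi for c, (lo, hi) in zip(cand, bounds))
            if inside and cand != point:
                candidates.append((score(*cand), cand))
        top_score, top_point = max(candidates)
        if top_score <= best:  # no neighbour improves: a local optimum is reached
            break
        point, best = top_point, top_score
    return point, best


def toy_accuracy(lam, ratio):
    """Hypothetical smooth accuracy surface with a single peak at (0.6, 0.3)."""
    return 1.0 - (lam - 0.6) ** 2 - (ratio - 0.3) ** 2


# Several starting points reduce the risk of ending in a poor local optimum
starts = [(0.1, 0.1), (0.9, 0.9), (0.5, 0.5)]
results = [hill_climb(toy_accuracy, s) for s in starts]
best_point, best_score = max(results, key=lambda r: r[1])
```

On a real accuracy surface with several peaks, different starts may stop at different points, which is why the best result over all starts is kept.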
Experiments and Performance Analysis
DDS Experimental System
The Spectra Quest’s Drivetrain Dynamics Simulator (DDS) was used in this study for experimental verification, as shown in Figure 7. This system is composed of six units: (1) the speed regulator; (2) the driving motor; (3) the planetary gearbox; (4) the reduction gearbox; (5) the brake device; (6) the brake regulator. The faults occur in the gears of the planetary and reduction gearboxes, under varying rotating speed and load conditions, which are adjusted using the speed regulator and the brake regulator, respectively. Four types of gear faults are studied: (1) root cracks; (2) missing teeth; (3) chipped teeth; (4) surface wear. The purpose of GFD is to classify these faults through 7 vibration sensors (3-axis for the planetary gearbox; 3-axis for the reduction gearbox; 1-axis for the driving motor). In addition, in order to put the coclustering methods into effect, we define different levels in the four task sets listed in Table 3: (1) the fault type task (C1‒C5); (2) the fault severity task (D1‒D4); (3) the speed regulator task (E1‒E5); (4) the load regulator task (F1‒F5). In particular, the rotating speed curve and the load curve shown in Figure 8 were also applied.
GFD Experiments and Performance Analysis
Experimental Setup
The varying rotating speed and load are designed using the regulator curves in Figure 8 for the experiment. To give the algorithms more data to analyze, 10 repeated collections were performed, increasing the number of sample points tenfold (5120 × 50 × 10) in each group; the signals are then segmented into 2.5 s sub-signals. The resulting sample numbers for fault type recognition in the row clustering task (fault type) and the column clustering tasks (rotating speed and load) are listed in Tables 4A and 4B.
In the fault type recognition experiments, several tests are compared using the following models: (1) X-means clustering; (2) Gaussian Mixture Model (GMM) clustering; (3) NMF-based coclustering; (4) modified NMF-based coclustering. For coclustering, the related parameters were set as: \(\lambda = 1\), \(d = 30.\)
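The segmentation arithmetic can be checked with a short calculation. It rests on an assumption not stated explicitly in the text: that in 5120 × 50 × 10 the first factor is the sampling rate in Hz, the second the duration of one collection in seconds, and the third the number of repeated collections.

```python
fs = 5120              # assumed sampling rate in Hz
record_seconds = 50    # assumed duration of one collection in seconds
repeats = 10           # repeated collections per group
segment_seconds = 2.5  # length of each sub-signal, as stated in the text

samples_per_segment = int(fs * segment_seconds)
segments_per_record = int(record_seconds / segment_seconds)
segments_per_group = segments_per_record * repeats
```

Under this reading each sub-signal holds 12800 points and each group yields 20 × 10 = 200 sub-signals, which agrees with the 200 samples per category used for Table 6.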
Experimental Results and Discussion
Experiments are carried out in which the column task in Table 4 is used to improve the effectiveness of the row task in the co-clustering method. To assess the quality of the models, the precision \({\text{Pr }}\left( {C_{i} } \right)\) and recall \({\text{Re }}\left( {C_{i} } \right)\) indexes of the clustering results are calculated and given by
where the \({\text{Pr }}\left( {C_{i} } \right)\) index reflects the ability of a model to identify only correct samples, while the \({\text{Re }}\left( {C_{i} } \right)\) index reflects the ability of a model to find all correct samples.
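As a reading aid, the two indexes can be illustrated with the textbook per-category definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below assumes these standard definitions rather than reproducing the paper's own equation. The function name `precision_recall` and the confusion matrix are hypothetical toy values, and the clusters are assumed to be already matched to the true categories.

```python
import numpy as np

def precision_recall(confusion):
    """Per-category precision and recall.

    confusion[i, j] = number of samples of true category i assigned to
    cluster j (clusters are assumed to be matched to categories already).
    """
    confusion = np.asarray(confusion, dtype=float)
    true_positive = np.diag(confusion)
    precision = true_positive / confusion.sum(axis=0)  # per assigned cluster
    recall = true_positive / confusion.sum(axis=1)     # per true category
    return precision, recall

# Toy two-category example (illustrative numbers, not data from the paper).
toy_confusion = [[190, 10],
                 [20, 180]]
precision, recall = precision_recall(toy_confusion)
print("precision:", precision)
print("recall:   ", recall)
```

An empty cluster (a zero column sum) would cause a division by zero here, so a practical version would need to guard against that case.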
First, we observed the fault type recognition results of the NMF-based co-clustering strategy under varying rotating speed and load, which are listed in Table 5. This table indicates that the diagnostic accuracy of the co-clustering strategy under varying load (97.8%) is better than that under varying rotating speed (96.3%). This can be explained by two possible reasons: (1) the classification boundary is clearer under varying load than under varying rotating speed, since the rotating speed changes gradually from 0 Hz to 40 Hz while the brake load jumps from 0 to 14.63 N·m; (2) changing the rotating speed interferes more strongly with the collected vibration signal than changing the load alone, which has been verified in our previous study.
Secondly, to illustrate the clustering performance in the varying rotating speed model, the NMF-based co-clustering results are given in Table 6 (k = 6; l = 6) and Figure 9, where 200 samples are tested in each category. Two details can be seen here: (1) misclassification always appears between ‘Health’ and ‘Surface’ or among ‘Root’, ‘Missing’ and ‘Chipped’, because the time domain features of the former group are similar and the frequency domain features of the latter group are alike. For example, the ‘Chipped’ type is easily classified as the ‘Missing’ type if the crack of the chipped tooth is large enough; (2) a sixth category (R2 and R3) appears within the C2 type when the number of clusters is set to 6, which indicates dispersion and inconsistency within the ‘Root’ samples. Interestingly, although the local precision and recall indexes differ between categories, the total precision and recall are very nearly the same in these two tables.
In particular, we reclassify the fault types of the testing samples in Table 4 using the modified NMF-based co-clustering strategy. Here the normalized relation matrix \(\varvec{L}_{k \times l}\) (k = l = 5) is obtained and given by
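Table 4A:
\[\varvec{L}_{k \times l} = \begin{bmatrix}
0.0801 & 0.0918 & 0.2563 & 0.0965 & 0.3236 \\
0.3285 & 3.7987 & 3.8255 & 3.2661 & 3.5314 \\
0.2155 & 6.9619 & 6.5298 & 6.6931 & 7.0714 \\
0.7031 & 6.8263 & 7.2954 & 6.7465 & 7.2176 \\
0.2250 & 0.1191 & 0.0545 & 0.2875 & 0.4563
\end{bmatrix}\]
Table 4B:
\[\varvec{L}_{k \times l} = \begin{bmatrix}
3.7887 & 3.5302 & 4.1173 & 2.1937 & 2.4488 \\
3.7157 & 0.1592 & 3.4741 & 1.9078 & 2.2279 \\
1.9611 & 1.3846 & 1.5855 & 3.8276 & 3.2316 \\
3.2774 & 0.2309 & 4.7511 & 3.9760 & 3.5468 \\
0.8559 & 0.4857 & 0.1722 & 0.9344 & 3.7734
\end{bmatrix}\]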
Note that the element \(l_{ij}\) represents the relevance between the ith category in the row task and the jth category in the column task. In the \(\varvec{L}_{k \times l}\) matrix, \(l_{ij}\) rises to approximately 7 for the ‘Missing’ and ‘Chipped’ fault types as the rotating speed increases. The effect of the speed regulator on the different fault types follows the rank \(\text{Missing} \approx \text{Chipped} > \text{Root} > \text{Health} \approx \text{Surface}\). However, the \(\varvec{L}_{k \times l}\) matrix under changing load shows a weaker link between load and fault types. To further verify the necessity of the modified NMF, Table 7 gives the evaluation indexes of this approach. It shows that the precision improves in the varying rotating speed dataset but does not change in the varying load dataset. This is because the \(\varvec{L}_{k \times l}\) matrix in the modified strategy cuts off the link between W and H in the former, which promotes the separation between the row and column tasks. As can be seen in Table 7, the middle dimension of W and H is not limited to a single value, as it is in the traditional NMF method, which improves both the flexibility of the selected dimension and the GFD precision (97.0% and 97.8%). It can also be seen that the recognition performance for varying loads (100%) is superior to that for rotating speed (95.3%) on account of the continuity of the speed regulator.
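The role of the relation matrix can be illustrated with a generic nonnegative tri-factorization. The sketch below assumes the modified NMF takes the form X ≈ W L H, with W of size m × k, L of size k × l and H of size l × n, which is how the text describes \(\varvec{L}_{k \times l}\). It uses the standard multiplicative updates for the Frobenius-norm objective and omits the weight factor \(\lambda\), so it is not the authors' exact algorithm. The function name `nmf_tri_factorize` and the random test matrix are hypothetical.

```python
import numpy as np

def nmf_tri_factorize(X, k, l, n_iter=200, eps=1e-9, seed=0):
    """Approximate nonnegative X (m x n) as W (m x k) @ L (k x l) @ H (l x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    L = rng.random((k, l)) + eps
    H = rng.random((l, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        LH = L @ H
        W *= (X @ LH.T) / (W @ (LH @ LH.T) + eps)
        L *= (W.T @ X @ H.T) / ((W.T @ W) @ L @ (H @ H.T) + eps)
        WL = W @ L
        H *= (WL.T @ X) / ((WL.T @ WL) @ H + eps)
        errors.append(np.linalg.norm(X - W @ L @ H))
    return W, L, H, errors

# Random nonnegative matrix standing in for a feature matrix (toy data).
rng = np.random.default_rng(0)
X = rng.random((20, 15))
W, L, H, errors = nmf_tri_factorize(X, k=3, l=2)

row_labels = W.argmax(axis=1)  # row cluster = dominant column of W
col_labels = H.argmax(axis=0)  # column cluster = dominant row of H
print("relation matrix L (k x l):")
print(L)
```

Because k and l are chosen independently in this form, the row and column cluster numbers need not share a single middle dimension, which is the flexibility discussed above.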
Finally, focusing on GFD under varying working conditions, the traditional clustering strategies, including the K-means and GMM methods, and the co-clustering approaches were compared using two indexes: precision and time consumption. The performance comparison results are listed in Table 8. According to this table, although the running time of a single 1D clustering algorithm is shorter than that of the joint strategy (4.923 s < 7.991 s), the accumulated running time for two tasks is longer than that of the proposed co-clustering (4.923 s + 4.856 s = 9.779 s > 7.991 s). Meanwhile, co-clustering yields a clear precision increase in GFD under varying working conditions: about 12.51% for varying rotating speed and 7.00% for varying load. Therefore, this experiment demonstrates the superiority of co-clustering in gear fault diagnosis under variable working conditions.
Parameter Regulation Experiments
STFT Dimension Adjustment Experiments
In the STFT dimension adjustment experiments, we increased the feature dimension N from 10 to 80 in steps of one, and the diagnostic precision of Task I and the time consumption of the co-clustering model were recorded and plotted in Figure 10. The diagnostic accuracy increases from 78.76% to 97.54% as the feature dimension N increases from 10 to 80, while the computational load increases exponentially from approximately 16.78 s to 48.45 s. Figure 10 suggests that N = 42 is an appropriate dimension, at which the diagnostic accuracy is satisfactory (97.02%) while the time consumption remains low (25.12 s). The precision continues to improve up to 97.54% if the feature dimension is increased further, but the computational cost also increases quickly.
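The trade-off above depends on how the spectrogram is reduced to N features. The following is a minimal sketch of one possible reduction, written with NumPy only. It assumes a Hann window, a frame length of 256 samples, a hop of 128 samples, time-averaging of the magnitude spectrogram and averaging into N frequency bands. None of these choices, nor the 5120 Hz sampling rate and the 1 kHz test tone, are taken from the paper; the function name `stft_features` is hypothetical.

```python
import numpy as np

def stft_features(signal, n_features, frame_len=256, hop=128):
    """Reduce a 1-D signal to an n_features-dimensional spectral feature vector."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude spectrogram: one row per frame, one column per frequency bin.
    spectrogram = np.abs(np.fft.rfft(frames, axis=1))
    mean_spectrum = spectrogram.mean(axis=0)
    # Average neighbouring frequency bins into n_features bands.
    bands = np.array_split(mean_spectrum, n_features)
    return np.array([band.mean() for band in bands])

fs = 5120                      # assumed sampling rate in Hz
t = np.arange(int(2.5 * fs)) / fs
signal = np.sin(2 * np.pi * 1000.0 * t)   # synthetic 1 kHz tone, 2.5 s long

features = stft_features(signal, n_features=42)
print(features.shape)
```

A larger N keeps finer frequency resolution in the feature vector, at the price of a larger matrix for the subsequent co-clustering step.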
BIC Algorithm Experiments
Based on the dataset from the Drivetrain Dynamics Simulator (DDS) system, we adjusted the number of clusters in the row and column tasks, respectively. The search range of the clustering numbers is from 2 to 10: 2 \(\le k \le\) 10; 2 \(\le l \le\) 10. Figure 11 illustrates an example of the BIC results in the varying load GFD experiments when the clustering numbers change from 2 to 10. The peak BIC value (−8124) occurs at the point (5, 5), which means that the optimal co-clustering result is obtained when the row and column clustering numbers both equal five. According to the real dataset and the standard labels in Table 3, the BIC-based estimation algorithm satisfies the requirements of practical gear fault diagnosis applications.
To study the BIC estimation algorithm further, it was compared with a traditional self-adapting clustering number estimation algorithm, X-means, which is an improved variant of K-means. The clustering numbers estimated by the BIC and X-means algorithms are listed in Table 9. On the one hand, for the BIC algorithm, as the clustering number k increases, the diagnostic precision first increases from 60.0% to 96.9% and then begins to decrease slightly when k is larger than 5. On the other hand, comparing the estimation results of the X-means and BIC algorithms, the clustering number estimated by X-means (k = 12) is much larger than the actual number and its diagnostic precision is only 84.1%, which demonstrates the superiority of the BIC-based clustering number estimation.
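The grid search over (k, l) can be sketched as follows. This is not the authors' BIC formulation, which is defined earlier in the paper. It assumes a generic Gaussian block-constant model scored as log-likelihood minus a (p/2) log n penalty, so that larger values are better, matching the "peak" reading used above. The clustering step is a deliberately crude split at the largest gaps of the row and column means. The helper names `gap_clusters` and `block_bic` and the planted 3 × 2 toy matrix are hypothetical.

```python
import numpy as np

def gap_clusters(values, n_clusters):
    """Split 1-D values into n_clusters groups at the largest gaps between sorted values."""
    order = np.argsort(values)
    gaps = np.diff(values[order])
    cuts = np.sort(np.argsort(gaps)[::-1][: n_clusters - 1])
    labels_sorted = np.zeros(len(values), dtype=int)
    for cut in cuts:
        labels_sorted[cut + 1:] += 1   # a new cluster starts after each cut
    labels = np.empty(len(values), dtype=int)
    labels[order] = labels_sorted
    return labels

def block_bic(X, k, l):
    """BIC (larger is better) of a k x l block-constant Gaussian model of X."""
    rows = gap_clusters(X.mean(axis=1), k)
    cols = gap_clusters(X.mean(axis=0), l)
    fitted = np.zeros_like(X)
    for r in range(k):
        for c in range(l):
            block = np.ix_(rows == r, cols == c)
            fitted[block] = X[block].mean()
    n = X.size
    rss = np.sum((X - fitted) ** 2)
    log_lik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    n_params = k * l + 1               # block means plus one noise variance
    return log_lik - 0.5 * n_params * np.log(n)

# Toy matrix with a planted 3 x 2 block structure plus unit Gaussian noise.
rng = np.random.default_rng(0)
block_means = np.array([[0.0, 4.0], [6.0, 10.0], [12.0, 16.0]])
row_truth = np.repeat([0, 1, 2], 10)
col_truth = np.repeat([0, 1], 10)
X = block_means[np.ix_(row_truth, col_truth)] + rng.normal(0.0, 1.0, (30, 20))

scores = {(k, l): block_bic(X, k, l) for k in range(2, 7) for l in range(2, 7)}
best_k, best_l = max(scores, key=scores.get)
print("estimated (k, l):", (best_k, best_l))
```

Both clustering numbers are scored jointly in one pass over the grid, which is the simultaneous assessment property attributed to BIC in this paper.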
GA Parameter Regulator Experiments
As shown in Figure 4, the gradient ascent algorithm was applied to the varying working condition datasets to obtain the weight factor \(\lambda\) and the transitional dimension \(d\). The regulation results of the gradient ascent algorithm are listed in Table 10. Here the final (\(\lambda\), \(d\)), the number of iterations and the precision of the row task are observed for 4 initial (\(\lambda\), \(d\)) values: (0.5, 500), (0.5, 200), (0.7, 500), (0.7, 200). After gradient ascent, all four initial values reached the global optimum in the varying load experiments. In the varying rotating speed experiments, however, only two points reached the global optimum, which achieves 99.3% diagnostic accuracy. In addition, fewer iterations are needed when the distance between the initial and final (\(\lambda\), \(d\)) points is short. Therefore, these tests show that the performance of the GA algorithm depends to a great extent on the selected initial point.
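The dependence on the initial point can be reproduced on a toy surface. The sketch below is an illustration under stated assumptions, not the authors' objective. `toy_precision` is a made-up function of two normalized parameters (standing in for \(\lambda\) and \(d\) rescaled to [0, 1]) with one global peak and one lower local peak. The step size, tolerance and finite-difference gradient are also assumptions.

```python
import numpy as np

def toy_precision(p):
    """Hypothetical precision surface: global peak (1.0) and local peak (0.6)."""
    x, y = p
    return (1.0 * np.exp(-((x - 0.3) ** 2 + (y - 0.3) ** 2) / 0.02)
            + 0.6 * np.exp(-((x - 0.8) ** 2 + (y - 0.8) ** 2) / 0.02))

def numerical_gradient(f, p, h=1e-5):
    """Central-difference estimate of the gradient of f at p."""
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (f(p + step) - f(p - step)) / (2 * h)
    return grad

def gradient_ascent(f, start, lr=0.005, tol=1e-8, max_iter=1000):
    """Climb f from start; return the final point and the iterations used."""
    p = np.array(start, dtype=float)
    for n_iter in range(1, max_iter + 1):
        update = lr * numerical_gradient(f, p)
        p += update
        if np.linalg.norm(update) < tol:
            break
    return p, n_iter

# Two starting points in different basins of attraction.
p_global, iters_global = gradient_ascent(toy_precision, (0.25, 0.35))
p_local, iters_local = gradient_ascent(toy_precision, (0.85, 0.75))

print("start near global peak:", p_global, toy_precision(p_global), iters_global)
print("start near local peak: ", p_local, toy_precision(p_local), iters_local)
```

On this surface the second start stalls at the lower peak, which mirrors the observation that only some initial (\(\lambda\), \(d\)) values reach the global optimum.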
Conclusions
An NMF-theoretic co-clustering strategy is presented in this paper to offer a fast multitasking solution to the gear fault diagnosis problem under variable working conditions. The time-frequency features are extracted from the STFT spectrogram and used to build the 2D matrix for joint clustering. Experiments indicate that 97.02% diagnostic precision can be achieved when the STFT dimension is set to 42. Meanwhile, the optimal clustering numbers estimated by the BIC-based method are close to the practical categories in both the varying rotating speed and the varying load datasets. After NMF, the row and column clustering tasks can be identified at the same time, with approximately 10% improved accuracy and less time cost compared with single-task clustering algorithms such as the X-means and GMM algorithms. There is an internal connection in most gear failure signals. The proposed co-clustering strategy performs better than the independent clustering strategy because the modified NMF provides a relation matrix, which shows a strong correlation between different rotating speeds and fault types. Therefore, the NMF-based co-clustering has good potential for application in the gear fault diagnosis of large-scale rotating machines under varying working conditions. In the future, co-clustering with higher dimensions may be applied to more complex working conditions or more diagnostic tasks to improve gear fault diagnosis performance.
Abbreviations
GFD: gear fault diagnosis
NMF: nonnegative matrix factorization
STFT: short-time Fourier transform
BIC: Bayesian information criterion
GA: gradient ascent
DDS: Drivetrain Dynamics Simulator
References
G B Wang, Z J He, X F Chen. Basic research on machinery fault diagnosis - what is the prescription. Journal of Mechanical Engineering, 2013, 49(1): 63–72. (in Chinese)
S P Yang, Z H Zhao. Improved wavelet denoising using neighboring coefficients and its application to machinery fault diagnosis. Journal of Mechanical Engineering, 2013, 49(17): 137–141. (in Chinese)
X H Jin, Y Sun, J H Shan, et al. Fault diagnosis and prognosis for wind turbines: An overview. Chinese Journal of Scientific Instrument, 2017, 38(5): 1041–1053.
M M M Islam, J Kim, S A Khan, et al. Reliable bearing fault diagnosis using Bayesian inference-based multi-class support vector machines. The Journal of the Acoustical Society of America, 2017, 141(2): EL89–EL95.
A Singh, A Parey. Gearbox fault diagnosis under fluctuating load conditions with independent angular resampling technique, continuous wavelet transform and multilayer perceptron neural network. IET Science, Measurement & Technology, 2017, 11(2): 220–225.
S H Kia, H Henao, G A Capolino. Fault index statistical study for gear fault detection using stator current space vector analysis. IEEE Transactions on Industry Applications, 2016, 52(6): 4781–4788.
R Yan, R X Gao, X Chen. Wavelets for fault diagnosis of rotary machines: A review with applications. Signal Processing, 2014, 96(PART A): 1–15.
C Chen, F Shen, R Q Yan. Enhanced least squares support vector machine-based transfer learning strategy for bearing fault diagnosis. Chinese Journal of Scientific Instrument, 2017, 38(1): 33–40.
M Yuwono, Y Qin, J Zhou, et al. Automatic bearing fault diagnosis using particle swarm clustering and Hidden Markov Model. Engineering Applications of Artificial Intelligence, 2016, 47: 88–100.
F Pacheco, M Cerrada, R V Sánchez, et al. Attribute clustering using rough set theory for feature selection in fault severity classification of rotating machinery. Expert Systems with Applications, 2017, 71: 69–86.
B Mirkin. Mathematical classification and clustering. Journal of the Operational Research Society, 1997, 48(8): 852.
A Tanay, R Sharan, M Kupiec, et al. Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genome-wide data. Proceedings of the National Academy of Sciences, 2004, 101(9): 2981–2986.
M Ailem, F Role, M Nadif. Graph modularity maximization as an effective method for co-clustering text data. Knowledge-Based Systems, 2016, 109: 160–173.
S Schmidt, S Schnitzer, C Rensing. Text classification based filters for a domain-specific search engine. Computers in Industry, 2016, 78: 70–79.
Y Xu, V Olman, D Xu. Clustering gene expression data using a graph-theoretic approach: An application of minimum spanning trees. Bioinformatics, 2002, 18(4): 536–545.
S Javadi, S M Hashemy, K Mohammadi, et al. Classification of aquifer vulnerability using K-means cluster analysis. Journal of Hydrology, 2017, 549: 27–37.
C Xu, P L Zhang, G Q Ren, et al. Engine wear fault diagnosis based on improved semi-supervised fuzzy C-means clustering. Journal of Mechanical Engineering, 2011, 47(17): 55–60.
J Q Zhang, G J Sun, L Li, et al. Study on mechanical fault diagnosis method based on LMD approximate entropy and fuzzy C-means clustering. Chinese Journal of Scientific Instrument, 2013, 34(3): 714–720.
A J Gallego, J Calvo-Zaragoza, J J Valero-Mas, et al. Clustering-based k-nearest neighbor classification for large-scale data with neural codes representation. Pattern Recognition, 2018, 74: 531–543.
T T Van, T M Le. Content-based image retrieval based on binary signatures cluster graph. Expert Systems, 2017: e12220.
D Fitrianah, A N Hidayanto, H Fahmi, et al. STAGRID: A spatio temporal grid density based clustering and its application for determining the potential fishing zones. International Journal of Software Engineering and its Applications, 2015, 9(1): 13–26.
A H Pilevar, M Sukumar. GCHL: A grid-clustering algorithm for high-dimensional very large spatial data bases. Pattern Recognition Letters, 2005, 26(7): 999–1010.
C F Tsai, C W Tsai, H C Wu, et al. ACODF: A novel data clustering approach for data mining in large databases. Journal of Systems and Software, 2004, 73(1 SPEC. ISS.): 133–145.
K Y Chen, L S Chen, M C Chen, et al. Using SVM based method for equipment fault detection in a thermal power plant. Computers in Industry, 2011, 62(1): 42–50.
R E Precup, P Angelov, B S J Costa, et al. An overview on fault diagnosis and nature-inspired optimal control of industrial process applications. Computers in Industry, 2015, 74: 1–16.
M Arumugam, J Raes, E Pelletier, et al. Enterotypes of the human gut microbiome. Nature, 2011, 473(7346): 174–180.
R Bhola, N H Krishna, K N Ramesh, et al. Detection of the power lines in UAV remote sensed images using spectral-spatial methods. Journal of Environmental Management, 2018, 206: 1233–1242.
F Gasperini, J M Forbes, E N Doornbos, et al. Wave coupling between the lower and middle thermosphere as viewed from TIMED and GOCE. Journal of Geophysical Research A: Space Physics, 2015, 120(7): 5788–5804.
L Wang, Y Zhang, S Zhong. Typical process discovery based on affinity propagation. Journal of Advanced Mechanical Design, Systems, and Manufacturing, 2016, 10(1): JAMDSM0001–JAMDSM0001.
W Zhang, X Wu, W P Zhu, et al. Unsupervised image clustering with SIFT-based soft-matching affinity propagation. IEEE Signal Processing Letters, 2017, 24(4): 461–464.
A Barcaru, H G J Mol, M Tienstra, et al. Bayesian approach to peak deconvolution and library search for high resolution gas chromatography – Mass spectrometry. Analytica Chimica Acta, 2017, 983: 76–90.
J K Liu, H M Schreyer, A Onken, et al. Inference of neuronal functional circuitry with spike-triggered nonnegative matrix factorization. Nature Communications, 2017, 8:149, https://doi.org/10.1038/s41467017001569.
Y G Niu, S L Wang, M Du. A combined Markov chain model and generalized projection nonnegative matrix factorization approach for fault diagnosis. Mathematical Problems in Engineering, 2017, 2017: 1–7.
H Kim, H Park. Sparse nonnegative matrix factorizations via alternating nonnegativity-constrained least squares for microarray data analysis. Bioinformatics, 2007, 23(12): 1495–1502.
A V Gorshkov, T Calarco, M D Lukin, et al. Photon storage in Λ-type optically dense atomic media. IV. Optimal control using gradient ascent. Physical Review A - Atomic, Molecular, and Optical Physics, 2008, 77(4).
G Rancan, T T Nguyen, S J Glaser. Gradient ascent pulse engineering for rapid exchange saturation transfer. Journal of Magnetic Resonance, 2015, 252: 1–9.
Funding
Supported by National Natural Science Foundation of China (Grant No. 51575102) and Jiangsu Postgraduate Research Innovation Program (Grant No. KYCX18_0075).
Author information
Contributions
RY designed the experiment; FS and CC analyzed the data. All authors read and approved the final manuscript.
Authors’ Information
Fei Shen received his B.Sc. and M.Sc. degrees from Southeast University, China, in 2014 and 2016, respectively. He is now pursuing his PhD degree at the School of Instrument Science and Engineering, Southeast University, China. His main research interest is machine fault diagnosis.
Chao Chen received his B.Sc. and M.Sc. degrees from Jiangsu University, China, in 2011 and 2014, respectively. He is now pursuing his PhD degree at the School of Instrument Science and Engineering, Southeast University, China. His main research interest is machine fault diagnosis.
Jiawen Xu is currently an associate researcher at the School of Instrument Science and Engineering, Southeast University, China.
Ruqiang Yan received his B.Sc. and M.E. degrees from the University of Science and Technology of China in 1997 and 2002, respectively, and received his Ph.D. degree in 2007 from the University of Massachusetts, USA. He is now a professor and a Ph.D. supervisor at Xi’an Jiaotong University, China. His main research interests include machine condition monitoring and fault diagnosis, signal processing, and wireless sensor networks.
Ethics declarations
Competing Interests
The authors declare no competing financial interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shen, F., Chen, C., Xu, J. et al. A Fast Multitasking Solution: NMFTheoretic Coclustering for Gear Fault Diagnosis under Variable Working Conditions. Chin. J. Mech. Eng. 33, 16 (2020). https://doi.org/10.1186/s10033020004373
DOI: https://doi.org/10.1186/s10033020004373
Keywords
 Gear fault diagnosis
 Nonnegative matrix factorization
 Coclustering
 Varying working conditions