Digital Twin-based Quality Management Method for the Assembly Process of Aerospace Products with the Grey-Markov Model and Apriori Algorithm

The assembly process of aerospace products such as satellites and rockets is characterized by single- or small-batch production, a long development period, high reliability requirements, and frequent disturbances. How to predict and avoid quality abnormalities, quickly locate their causes, and improve product assembly quality and efficiency are urgent engineering issues. As the core technology for realizing the integration of virtual and physical space, digital twin (DT) technology can make full use of the low-cost, high-efficiency, and predictive advantages of digital space to provide a feasible solution to such problems. Hence, a DT-based quality management method for the assembly process of aerospace products is proposed. Given that traditional quality control methods for this assembly process are mostly post-inspection, the Grey-Markov model and T-K control chart are used with small samples of assembly quality data to predict the values of quality data and the status of the assembly system. The Apriori algorithm is applied to mine the strong association rules related to quality data anomalies and uncontrolled assembly systems, addressing the issue that the causes of abnormal quality are complicated and difficult to trace. The implementation of the proposed approach is described, taking the collected centroid data of an aerospace product's cabin, one of the key quality data in the assembly process, as an example. A DT-based quality management system for the assembly process of aerospace products is developed, which can effectively improve the efficiency of quality management and reduce quality abnormalities.

rate, and direct data such as product length, weight, and assembly errors. Assembly quality data include data that can reflect product quality, such as quality cost loss, production batch, inventory backlog, and invalid operation time [3]. These data are the basis for the evaluation of assembly quality, which can be used to measure product quality and provide guidance for the continuous improvement of assembly quality.
Quality management is a process of organizing and coordinating related activities to meet certain quality requirements, i.e., to manage and control the above quality data. A complete quality management process aims to grasp the status of quality data, predict its future status, and adjust the assembly process accordingly, so as to maintain quality within a reasonable range. Assembly quality data are collected, stored, and managed through a manufacturing execution system (MES) [4]. However, current assembly quality management adopts post-inspection, i.e., checking and controlling the quality of the final product. If the data meet the standard, the product is warehoused or enters the next process; otherwise, it is returned for repair or scrapped. Quality management of the assembly process of aerospace products thus has several issues to be resolved.
(1) Quality prediction when the sample volume of data is small. A large amount of quality data is generated in the assembly of aerospace products. Their variety is wide, and the volume of a particular type of data may be insufficient for forecasting. Accuracy of data forecasting and sample data volume show some positive correlation. Hence, it can be difficult to obtain accurate predictions. (2) Quick location of abnormal causes when influence factors of quality data are complicated. The assembly quality of aerospace products is affected by the factors of man, machine, material, method, measurement, and environment (5M1E), each having several influencing factors, making it difficult to directly determine specific causes of abnormal quality data. (3) Quality control is time-consuming and has poor real-time performance. Assembly quality control is implemented based on the quality management process of post-optimization, including problem definition, investigation and measurement of relevant factors, analysis and determination of key factors, and control and improvement of influence factors. This can take a long time, precluding the avoidance of imminent quality problems.
The concept of the digital twin (DT) was first proposed by the Air Force Research Laboratory in 2011, when it was first applied to industry [5]. Its purpose is to optimize production through simulation in a virtual environment beforehand so as to avoid undesirable conditions. With the deepening of research, DT technology has come to be considered the key to realizing the interaction and fusion between the physical and information worlds [6,7]. It provides a clear path for the implementation of a cyber-physical system (CPS) and introduces the idea of making full use of digital space to mirror, predict, guide, and control the physical space [8]. Additionally, it provides a novel way to assist humans in understanding the physical world across multiple time dimensions, including the past, present, and future. Hence, applying DT technology to quality management in the assembly process of aerospace products can realize quality status monitoring, prediction, and anomaly traceability in virtual space based on collected real-time data, and can enable adjustments in physical space. It conforms to the basic ideas of status monitoring and prediction, anomaly traceability, and the timely regulation of quality data. Therefore, DT provides a feasible approach to realizing quality management of the assembly process for aerospace products.
The rest of this paper is organized as follows. Section 2 summarizes the state of the art of quality data prediction, traceability technology, and DT applications in the manufacturing phase. In Section 3, the implementation framework of DT-based quality management for the assembly process of aerospace products is established. Three key technologies are discussed in Section 4: numerical prediction of quality data based on the Grey-Markov model, assembly system status prediction based on the T-K statistical control chart, and association rule mining for quality abnormalities based on the Apriori algorithm. In Section 5, taking the centroid data of an aerospace product's cabin, one of the key quality data in the assembly process of aerospace products, as an example, the application process of the proposed method is elaborated in detail. Additionally, a DT-based assembly process quality management system for aerospace products is developed and verified. Finally, the main contributions of the paper and future work are summarized in Section 6.

Related Work
This paper focuses on the use of DT technology to achieve quality management for the assembly process of aerospace products. We start by reviewing quality management technology, including quality data prediction and traceability, and point out the future trend of taking full advantage of digital space, with its low cost, high efficiency, multiple iterations, and predictability, to promote it. Then, the state of the art of DT applications in the production phase is reviewed.

Quality Data Prediction and Traceability Technology
The key to quality management is to forecast the future development trend of quality data, and even its specific values. Data prediction includes qualitative prediction, i.e., of change trends, and quantitative prediction, i.e., of specific values. We study quantitative prediction for assembly quality management of aerospace products. Quantitative data prediction includes time series analysis and causal analysis. Time series analysis methods include the simple sequential time mean, weighted sequential time mean, moving average, weighted moving average, exponential smoothing, seasonal trend prediction, and market life cycle prediction. Taylor et al. analyzed several time series algorithms based on intraday power demand data from 10 European countries and found the prediction results of exponential smoothing to be best [9]. Guo et al. presented a chaotic time series prediction algorithm to predict wind speed, modified the model with the parallel rule algorithm, and verified the method with real wind data [10]. Time series analysis focuses on predicting large amounts of time series data, but the volume of assembly quality data for aerospace products is generally small and does not necessarily follow a strict time series. Causal analysis includes linear regression, support vector machines, neural network prediction, Grey prediction, and Markov prediction. To improve data prediction accuracy, Zhou et al. investigated a fine-tuning approach for the parameters of least-squares support vector machines to predict one-step-ahead short-term wind speed [11]. For the case of TFT-LCDs, a small-sample prediction problem, Li et al. developed a three-step approach: K-means clustering, attribute extension using a fuzzy membership function in each cluster, and feeding the data with the newly generated attributes into a backpropagation neural network (BPNN) [12].
Additionally, Grey prediction can discover the intrinsic correlations behind hazy phenomena and is suitable for predicting small datasets. Li et al. used trend price tracking to extract hidden information on the behavior of manufacturing sample data and constructed an adaptive Grey prediction model, AGM(1,1), to forecast industrial data [13]. Chang et al. presented a modified Grey forecasting model to forecast short-term manufacturing demand [14]. Markov models can also be used to predict the trends of small datasets and analyze the future trends of discrete random processes, i.e., to predict the future state of a variable from its present state and change trend [15]. However, how to improve prediction accuracy based on small volumes of sample data remains an issue. The most commonly used approach for quality data tracing, the relational data model, builds the corresponding management model and database in the assembly process to trace abnormal quality data. Materials are marked to trace quality data. Zhuang et al. realized material traceability and management in the assembly process based on workflow technology [16]. In addition, picture and video data are collected to proactively discover existing quality problems and trace the causes of quality abnormalities; on this basis, vision-based recognition methods from a feature perspective are applied to discover defects [17]. Such methods achieve traceability of quality data through correlation but do not mine the complicated associations among multi-source heterogeneous scattered data, so the efficiency of quality problem traceability is low.
However, due to the unstable process, long assembly and adjustment period, and strict quality control in the R&D stage, quality management still faces problems such as frequent rework and repair and difficulty in tracing quality issues. With the rapid development of new-generation information technologies such as the Internet of Things, big data, and artificial intelligence, the deep integration of digital and physical space has become a common theme of various countries' industrial strategies, including Industry 4.0, the Industrial Internet, and "Made in China 2025." Therefore, fully utilizing the advantages of digital space to improve quality management of the assembly process of aerospace products will be the development trend [18][19][20].

Overview of DT Applications in the Production Phase
DT technology has been widely applied in all stages of the product lifecycle [21,22], including design [23], manufacture [24], and service [25]. In the production stage, Tao et al. introduced the digital twin shop-floor (DTS), a paradigm for shop-floor instances of cyber-physical production systems (CPPSs) [26]. Leng et al. incorporated DT in the parallel control of automated manufacturing systems [27]. Park et al. built a DT-based CPPS architectural framework for personalized production [28]. Bao et al. investigated an ontology-based modeling and evolution method of DT for the assembly shop-floor to deal with issues such as the discreteness of the assembly process, the diversity of assembly resources, and the complexity of dataflow in assembly task execution [29]. Son et al. presented a DT-based CPS for abnormal scenarios involving automotive body production lines, which can forecast whether a product can be manufactured when abnormal scenarios occur [30]. Zhang et al. presented an information modeling method for a CPPS based on DT and AutomationML, which integrated various physical resources into the CPPS to support information interaction between resources [31]. Yildiz et al. discussed the demonstration and evaluation of a DT-based virtual factory [32]. Additionally, Sun et al. investigated a DT-based assembly commissioning approach for high-precision products to solve the problems of low assembly efficiency and poor quality consistency caused by traditional manual methods [33]. Zhang et al. discussed a hybrid prediction approach combining a physical model and data to achieve quality assurance for composite components [34]. DT technology is the key to realizing the virtual-physical fusion of CPS; it makes full use of the digital and physical spaces and can improve the assembly quality control of aerospace products.
In the assembly process of aerospace products, the DT model can not only truly and dynamically mirror the quality status and process of its physical counterpart but can also be applied to achieve quality anomaly traceability and quality status prediction. Hence, the DT can be used to assist shop-floor managers in physical space in carrying out quality optimization to effectively avoid some quality anomalies and reduce quality problem processing time. The above research uses DT in the production phase but does not address its application to assembly process quality management and control.

Framework of DT-based Quality Management for the Assembly Process of Aerospace Products
There are two main quality abnormalities in the assembly process of aerospace products: abnormal product quality data and uncontrolled assembly systems. The first case indicates quality problems in the assembly process. The latter is a hidden quality problem. If the assembly system is uncontrolled, quality anomalies are more likely in the subsequent assembly process. Considering these two anomalies, we propose a framework of quality management for the assembly process of aerospace products based on DT, as shown in Figure 1.
On the physical assembly shop-floor, a shop-floor Internet of Things (IoT), including RFIDs, sensors, barcodes, industrial Ethernet and wireless network, is constructed to realize real-time perception of manufacturing resources, quality data collection and transmission.
On the shop-floor data layer, real-time quality data are dynamically collected through the built shop-floor IoT and managed based on the product assembly BOM. These data include inspection data, measurement data, assembly process parameters, environmental data, equipment operation data, material usage data, process completion data, and technical problem data. To ensure real-time quality data, the inspection and measurement data, such as centroid, moment of inertia, and weight, are automatically collected and transmitted by the corresponding inspection and measurement equipment. The assembly process parameters, environmental data, and equipment operation data are obtained using the corresponding sensors, such as displacement, speed, temperature, and humidity sensors. The material usage data are acquired through RFID (Radio Frequency Identification) and by scanning the barcode corresponding to the material. The process completion data and technical problem data are collected through human-computer interaction.
On the virtual assembly shop-floor, a DT model of quality management for the assembly process of aerospace products is constructed. The DT model is composed of two parts: one is the shop-floor visualization model, which can be used for quality monitoring, and the other is the quality prediction and traceability model, which can be used for calculation and decision-making. Therefore, the virtual shop-floor contains two levels.
At the DT-based monitoring level, data and model visualization based on the shop-floor visualization model are applied to mirror and monitor the product quality status, and warnings are issued if abnormalities occur.
At the DT-based quality prediction and traceability level, a Grey-Markov model forecasts the future value of quality data according to historical and current data. An abnormal predicted value indicates quality abnormality in the current assembly process, triggering cause traceability; if the predicted value is normal, verification will continue. Then, according to the Grey-Markov forecasting results and historical quality data values, observation samples are selected from different batches, a T-K statistical control chart is established according to the sample data, changes of the mean and standard deviation of quality sample data are observed, and it is predicted whether the assembly system will be within the control range at the next moment. If an uncontrolled assembly system is forecast, quality anomalies are highly likely during subsequent assembly, triggering the cause traceability process of quality abnormality; if the assembly system is forecast to be in control, the process is repeated until no products are available. When the cause tracing process of quality abnormality is triggered, it is necessary to analyze factors that may affect the quality data on the assembly shop-floor according to 5M1E. The values of these factors, together with quality data values and assembly system status, form a project set. We then use the Apriori algorithm to mine the strong association rules related to quality data abnormalities and an uncontrolled assembly system. Through these strong association rules, the influencing factors related to quality anomalies are traced and stored in the quality management knowledge base.
The key technologies illustrated in Section 4 focus on the DT-based quality prediction and traceability level, which contains the numerical prediction of quality data using a Grey-Markov model, the status prediction of the assembly system based on a T-K control chart, and the mining of association rules for quality abnormalities based on the Apriori algorithm.

Grey Model
When only part of the information in a system is known, the unknown information can be predicted using the Grey model (GM), i.e., a Grey system theory model. While appearing random, the quality data generated during the assembly process of aerospace products are actually ordered and time-dependent, so the Grey model can be used.
For the quality data of the assembly process of aerospace products, a GM(1,1) model is established, i.e., a first-order linear differential equation.
The GM(1,1) modeling process for assembly quality data of aerospace products is as follows. Assume the original series of quality data is

$x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right).$

Calculate the scale ratios of the original series of quality data:

$\lambda(t) = \frac{x^{(0)}(t-1)}{x^{(0)}(t)}, \quad t = 2, 3, \ldots, n.$

If all scale ratios of the quality data fall within the acceptable coverage interval $\left(e^{-2/(n+1)}, e^{2/(n+1)}\right)$, the original series of quality data can be used to establish the GM(1,1) model. Otherwise, the data must be transformed, e.g., through a shift transformation $y^{(0)}(t) = x^{(0)}(t) + c$, where c is a constant chosen so that all scale ratios fall within the acceptable coverage.
A new series can be obtained by ratio validation and processing of the original data. The original quality data are accumulated to weaken the fluctuation and randomness that may exist in the random series, yielding the quality data accumulation series

$x^{(1)}(t) = \sum_{k=1}^{t} x^{(0)}(k), \quad t = 1, 2, \ldots, n.$

Because the solutions of a first-order differential equation show an exponential growth trend, similar to that of the sequence $x^{(1)}(t)$, the sequence $x^{(1)}$ is considered to satisfy the first-order differential equation

$\frac{dx^{(1)}}{dt} + a x^{(1)} = u,$
where a is the development coefficient, whose effective interval is (−2, 2), and u is the Grey action quantity; both are undetermined parameters. Once the parameters a and u are obtained, $x^{(1)}(t)$ can be obtained, as can the predicted value of $x^{(0)}$.
According to the definition of the derivative, $\frac{dx^{(1)}}{dt} \approx x^{(1)}(t+1) - x^{(1)}(t)$, the following matrices are obtained:

$B = \begin{pmatrix} -\frac{1}{2}\left(x^{(1)}(1) + x^{(1)}(2)\right) & 1 \\ \vdots & \vdots \\ -\frac{1}{2}\left(x^{(1)}(n-1) + x^{(1)}(n)\right) & 1 \end{pmatrix}, \qquad y_n = \begin{pmatrix} x^{(0)}(2) \\ \vdots \\ x^{(0)}(n) \end{pmatrix}.$

Let $\hat{a} = (a, u)^T$. Solving for the Grey parameters by the least squares method, we obtain

$\hat{a} = \left(B^T B\right)^{-1} B^T y_n.$

We substitute $\hat{a}$ into Eq. (6) to obtain

$\hat{x}^{(1)}(t+1) = \left(x^{(0)}(1) - \frac{u}{a}\right) e^{-at} + \frac{u}{a},$

which is the time response function model of GM(1,1). On this basis, the forecasting equation of the original quality data is obtained as

$\hat{x}^{(0)}(t+1) = \hat{x}^{(1)}(t+1) - \hat{x}^{(1)}(t).$

After establishing the Grey model, it is necessary to check whether it can be used to forecast the target data. Three test methods are selected: the residual test, the correlation degree test, and the posterior variance test.
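As a concrete illustration, the modeling steps above (accumulation, least-squares estimation of a and u, the time response function, and first-order differencing) can be sketched in Python with NumPy. This is a minimal sketch; the function name and the sample series are our own illustrative choices, not part of the paper's case data.

```python
import numpy as np

def gm11_fit_predict(x0, steps=1):
    """Fit a GM(1,1) model to the series x0 and forecast `steps` ahead.

    Follows the equations above: 1-AGO accumulation, background values,
    least-squares estimation of (a, u), time response function, and
    first-order differencing back to the original series."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                       # accumulated (1-AGO) series
    z1 = 0.5 * (x1[:-1] + x1[1:])            # background values
    B = np.column_stack([-z1, np.ones(n - 1)])
    yn = x0[1:]
    a, u = np.linalg.lstsq(B, yn, rcond=None)[0]   # Grey parameters (a, u)
    t = np.arange(n + steps)
    x1_hat = (x0[0] - u / a) * np.exp(-a * t) + u / a   # time response
    x0_hat = np.concatenate([[x0[0]], np.diff(x1_hat)])  # restore by differencing
    return x0_hat  # fitted values followed by the forecasts

# forecast one step ahead for a short, illustrative quality-data series
print(gm11_fit_predict([14.2, 14.8, 15.1, 15.6, 16.0], steps=1))
```

Note the division by a: GM(1,1) assumes a nonzero development coefficient, which a near-constant series can violate; in practice a shift transformation (as described above) may be needed first.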

(1) Residual Test
Residual test is relatively intuitive and only needs to compare the predicted and original values and observe whether the relative error can meet the requirements.
The residual of the original data column $x^{(0)}(t)$ and the predicted data column $\hat{x}^{(0)}(t)$ is

$e^{(0)}(t) = x^{(0)}(t) - \hat{x}^{(0)}(t).$

The relative error $\Delta(t)$ and the average relative error $\bar{\Delta}$ are calculated as

$\Delta(t) = \left|\frac{e^{(0)}(t)}{x^{(0)}(t)}\right|, \qquad \bar{\Delta} = \frac{1}{n} \sum_{t=1}^{n} \Delta(t).$

The fitting accuracy is

$p = \left(1 - \bar{\Delta}\right) \times 100\%.$

The Grey prediction model of quality data passes the residual test if p is greater than 80%.

(2) Correlation Degree Test
The correlation degree test is a geometric test to study the similarity of model curves of the original and predicted value. The more similar the geometry of the two curves, the more the values are correlated.
The correlation coefficient of the original quality data column $x^{(0)}(t)$ and the predicted quality data column $\hat{x}^{(0)}(t)$ is

$\eta(t) = \frac{\min_{t}\left|e^{(0)}(t)\right| + \rho \max_{t}\left|e^{(0)}(t)\right|}{\left|e^{(0)}(t)\right| + \rho \max_{t}\left|e^{(0)}(t)\right|},$

where ρ is the resolution coefficient, usually taking a value in (0, 1). A larger ρ indicates a smaller difference between correlation coefficients and a weaker discrimination ability.
The correlation degree between the original quality data column $x^{(0)}(t)$ and the predicted quality data column is

$r = \frac{1}{n} \sum_{t=1}^{n} \eta(t).$

The closer r is to 1, the better the forecast accuracy. If r is greater than 0.6, the Grey prediction passes the correlation degree test.

(3) Posterior Variance Test
A posterior variance test is based on the probability distribution of the residual predicted by quality data.
We calculate the variance of the original quality data, $s_1^2$, and of the prediction residuals, $s_2^2$:

$s_1^2 = \frac{1}{n} \sum_{t=1}^{n} \left(x^{(0)}(t) - \bar{x}^{(0)}\right)^2, \qquad s_2^2 = \frac{1}{n} \sum_{t=1}^{n} \left(e^{(0)}(t) - \bar{e}^{(0)}\right)^2.$

The ratio of the mean squared error is

$C = \frac{s_2}{s_1},$

and the small-error residual probability is

$P = P\left\{\left|e^{(0)}(t) - \bar{e}^{(0)}\right| < 0.6745\, s_1\right\}.$

As shown in Table 1, a smaller C and a larger P indicate a more accurate Grey model. Grade I indicates the highest prediction accuracy, and Grade IV the lowest. Generally, if a prediction model is evaluated as Grade I, II, or III, it can be considered to pass the posterior variance test.
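The three model tests can be sketched together in Python; this is an illustrative implementation with our own function and variable names, applying the thresholds stated in the text (fitting accuracy above 80%, correlation degree above 0.6, and the C/P grading of Table 1).

```python
import numpy as np

def grey_model_tests(x0, x0_hat, rho=0.5):
    """Residual, correlation-degree, and posterior-variance tests for a
    Grey forecast (a sketch of the formulas above)."""
    x0, x0_hat = np.asarray(x0, float), np.asarray(x0_hat, float)
    e = x0 - x0_hat                                   # residuals e(0)(t)
    rel_err = np.abs(e / x0)
    p = (1.0 - rel_err.mean()) * 100.0                # fitting accuracy (%)
    abs_e = np.abs(e)
    eta = (abs_e.min() + rho * abs_e.max()) / (abs_e + rho * abs_e.max())
    r = eta.mean()                                    # correlation degree
    C = e.std() / x0.std()                            # mean squared error ratio
    P = np.mean(np.abs(e - e.mean()) < 0.6745 * x0.std())  # small-error prob.
    return {"fitting_accuracy": p, "residual_pass": p > 80.0,
            "correlation_degree": r, "correlation_pass": r > 0.6,
            "C": C, "P": P}

# illustrative original vs. fitted series
x0 = [14.2, 14.8, 15.1, 15.6, 16.0]
x0_hat = [14.17, 14.78, 15.06, 15.57, 15.98]
print(grey_model_tests(x0, x0_hat))
```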

Revision of Predicted Residuals Using Markov Model
The assembly quality data of aerospace products are greatly affected by external factors such as operators and the operating environment. This external information is considered random, so the correlation between successive changes of quality data is not strong. Therefore, the Markov method is used to forecast and correct the residuals of the predicted values from the Grey model.
The residuals of the Grey predicted values are divided into different states, and a state transition matrix $P = (p_{ij})$ is established, composed of all one-step transition probabilities of the random process, where $p_{ij}$ is the one-step transition probability from state i to state j.
It is worth noting that all elements in the state transition matrix are nonnegative, and the elements of each row sum to 1.
The state transition diagram, shown in Figure 2, is convenient for observing the transition probability of each residual state of quality data. Figure 3 shows the process of correcting the residual values of the Grey forecast results using a Markov prediction model.
After confirming that the variation of residuals of assembly quality data is a Markov process, it is necessary to collect residual data and classify the residual state. The state transfer matrix is built dynamically according to the specific changes of quality residual data and is used to solve the prediction state of the assembly quality data residuals of aerospace products. On this basis, the quality data obtained by the Grey forecasting model are corrected.
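The correction procedure just described can be sketched as follows, assuming the residual-state boundaries are given: classify the historical residuals into states, build the transition matrix by counting one-step transitions, predict the most probable next state, and correct the Grey forecast with that state's interval mid-value. All names and the sample residuals here are illustrative.

```python
import numpy as np

def markov_correct(residuals, state_edges, grey_forecast):
    """Sketch of the Markov residual correction of Figure 3.

    `state_edges` are assumed boundaries splitting residuals into states;
    the returned matrix P is estimated by transition counting."""
    residuals = np.asarray(residuals, dtype=float)
    states = np.digitize(residuals, state_edges)      # state index per residual
    n_states = len(state_edges) + 1
    P = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):         # count one-step transitions
        P[a, b] += 1
    row_sums = P.sum(axis=1, keepdims=True)
    P = np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)
    next_state = int(np.argmax(P[states[-1]]))        # most probable next state
    edges = np.concatenate([[residuals.min()], state_edges, [residuals.max()]])
    mid = 0.5 * (edges[next_state] + edges[next_state + 1])
    return grey_forecast + mid, P

# illustrative residual series with two states split at zero
corrected, P = markov_correct([-0.4, 0.1, 0.5, -0.3, 0.2, 0.6, -0.1], [0.0], 16.2)
print(corrected)
```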

Assembly System Status Prediction Using T-K Statistical Control Chart
The T-K control chart does not require a large sample and is independent of the standard deviation of the parent population. It is therefore applicable for predicting the assembly system status.

T-control Chart for Monitoring the Mean of Quality Data
The T-statistic is used to monitor the fluctuation of the mean value of quality data and is suitable for small samples. In the actual assembly process of aerospace products, the mean value of certain quality data is usually uncertain and changes constantly. Therefore, when constructing the T-statistic, the mean value of the quality data is assumed to be unknown. Let X denote the quality data item. Fifteen observation samples are selected for each batch of products, with $\{X^{(r)}_{i,j,1}, \ldots, X^{(r)}_{i,j,n}\}$ denoting the group i sample, where i = 1, 2, …; j indicates the serial number of the product type corresponding to the batch sample, j = 1, 2, …, P; n is the sample size; and the superscript r indicates the serial number of a batch of the same type of product.
Quality data within a batch and between batches are independent, sample data of the same variety obey the same normal distribution, and sample data of different varieties obey different normal distributions, i.e.,

$X_{i,j,k} \sim N\left(\mu_j, \sigma_j^2\right), \quad i = 1, 2, \ldots; \; j = 1, 2, \ldots, P; \; k = 1, 2, \ldots, n.$

The control limits of the T-control chart are

$UCL = G_t^{-1}\left(1 - \tfrac{\alpha}{2} \,\middle|\, n - 1\right), \qquad CL = 0, \qquad LCL = G_t^{-1}\left(\tfrac{\alpha}{2} \,\middle|\, n - 1\right),$

where $G_t^{-1}(\cdot \mid n-1)$ is the inverse of the cumulative T-distribution function with degree of freedom n−1, and α is the significance level. According to statistical process control theory, the upper and lower control limits correspond to the positions of ±3σ, so α is 0.0027.
During the assembly of aerospace products, if an assembly system is in a controlled state and the mean value of quality data does not deviate, for different kinds of products, as long as the sample size of each group is the same, T-statistics calculated from groups of sample data with the same sample size will be independent of each other, subject to the same T-distribution, and with the same control limits. T-statistics and control limits calculated from each batch of quality data can be used to plot the T-control chart and monitor assembly quality.
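Assuming SciPy is available, the control limits at α = 0.0027 can be computed directly from the t-distribution quantiles. This sketch covers only the limits, not the group-wise T-statistic computation; the function name is ours.

```python
from scipy import stats

def t_chart_limits(n, alpha=0.0027):
    """Control limits of a T-control chart for groups of size n:
    quantiles of the t-distribution with n - 1 degrees of freedom
    at significance level alpha (alpha = 0.0027 matches +/-3 sigma)."""
    lcl = stats.t.ppf(alpha / 2.0, df=n - 1)
    ucl = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)
    return lcl, 0.0, ucl

lcl, cl, ucl = t_chart_limits(n=5)
print(lcl, cl, ucl)  # symmetric limits, wider than +/-3 for small n
```

As n grows, the limits shrink toward the normal-theory ±3 values, reflecting the heavier tails of the t-distribution at small sample sizes.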

K-control Chart for Monitoring the Standard Deviation of Quality Data
The K-statistic is used to monitor the fluctuation of the standard deviation of quality data and is suitable for small samples. Similar to the process of calculating the T-statistic, the mean of the quality data is considered unknown.
The mean of the variance of the sample quality data for the first r−1 batches of different product types is

$\bar{S}_j^2 = \frac{1}{r-1} \sum_{q=1}^{r-1} S_j^{(q)2}.$

The intermediate variable is defined as

$W_{i,j}^{(r)} = \frac{S_{i,j}^{(r)2}}{\bar{S}_j^2}.$
The K statistic is

$K_{i,j}^{(r)} = \Phi^{-1}\left(F_{\nu_1,\nu_2}\left(W_{i,j}^{(r)}\right)\right),$

where $F_{\nu_1,\nu_2}$ is the cumulative distribution function of the F distribution with first and second degrees of freedom $\nu_1$ and $\nu_2$, respectively, and $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function.
Since the intermediate variable in Eq. (33) follows the F-distribution with first degree of freedom (n−1) and second degree of freedom (n−1)(r−1), the K-statistics are independent of each other and follow a standard normal distribution, and the control limits of the K-control chart are

$UCL = 3, \qquad CL = 0, \qquad LCL = -3.$
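Under the distributional assumptions above, the K-statistic can be sketched with SciPy by mapping the variance-ratio CDF value through the standard normal inverse, so that K is approximately N(0, 1) and the ±3 limits apply. The function name and signature are our own illustrative choices.

```python
from scipy import stats

def k_statistic(s2_current, s2_mean_prev, n, r):
    """Sketch of the K statistic for monitoring the standard deviation:
    the ratio of the current sample variance to the mean variance of the
    first r - 1 batches follows F(n - 1, (n - 1)(r - 1)); its CDF value
    is mapped through the standard normal inverse."""
    w = s2_current / s2_mean_prev                      # intermediate variable W
    u = stats.f.cdf(w, n - 1, (n - 1) * (r - 1))       # F-distribution CDF
    return stats.norm.ppf(u)                           # K ~ N(0, 1) if in control

# a variance close to the historical mean gives a K value near 0 (in control)
print(k_statistic(1.0, 1.0, n=5, r=4))
```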

Analysis of Quality Influencing Factors
Quality abnormalities occurring in the assembly process of aerospace products may be caused by human, equipment, or environmental factors in the preassembly, assembly, or inspection stage, and the influencing factors of quality abnormalities differ by their type. We use the 5M1E analysis method to classify the influencing factors of quality anomalies into six categories: man, machine, material, method, measurement, and environment, each of which can affect quality. Because aerospace products are large, the equipment of each workstation is usually fixed, so machine factors relate to the assembly workstation.

Principle of Apriori Algorithm
An association rule algorithm is used to mine the hidden correlations behind complex data; the classic one is the Apriori algorithm, which works as follows. First, all itemsets in a transaction dataset whose support is greater than or equal to the minimum support are filtered to obtain the frequent itemsets. Then, association rules are generated from the frequent itemsets and filtered according to the minimum confidence to obtain strong association rules. Based on the shop-floor assembly data, Apriori was selected to determine the relationships between influencing factors and assembly quality data values and system status. An association rule is evaluated by support, confidence, and lift. Support is the probability that items X and Y occur simultaneously in a project set, i.e., the ratio of the number of transactions including both X and Y to the number of all transactions. It describes the universality of association rules and is calculated as

$Support(X \Rightarrow Y) = P(X \cup Y) = \frac{N(X \cup Y)}{N}.$

Confidence is the probability that item Y occurs when item X occurs, i.e., the ratio of the number of transactions containing both X and Y to the number of transactions containing X.

It describes the authenticity of association rules and is calculated as

$Confidence(X \Rightarrow Y) = P(Y \mid X) = \frac{Support(X \cup Y)}{Support(X)}.$
Lift is a parameter that describes the change in the probability of item Y occurring due to the occurrence of item X, i.e., the ratio of the confidence of $X \Rightarrow Y$ to the support of Y:

$Lift(X \Rightarrow Y) = \frac{Confidence(X \Rightarrow Y)}{Support(Y)}.$

If the lift is 1, there is no correlation between the two events; if it is less than 1, events X and Y are negatively correlated (incompatible); if it is greater than 1, the occurrence of X promotes the occurrence of Y.
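The three measures can be illustrated with a toy Python example. The transaction data below are hypothetical stand-ins for 5M1E factor levels paired with a quality flag, not data from the paper.

```python
def rule_metrics(transactions, x, y):
    """Support, confidence, and lift of the rule X -> Y over a list of
    transactions (each a set of items); a direct coding of the formulas."""
    n = len(transactions)
    n_xy = sum(1 for t in transactions if x <= t and y <= t)
    n_x = sum(1 for t in transactions if x <= t)
    n_y = sum(1 for t in transactions if y <= t)
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    lift = confidence / (n_y / n) if n_y else 0.0
    return support, confidence, lift

# hypothetical shop-floor records: 5M1E factor levels plus a quality flag
data = [{"operator_B", "humidity_high", "abnormal"},
        {"operator_A", "humidity_high", "abnormal"},
        {"operator_B", "humidity_low", "normal"},
        {"operator_A", "humidity_high", "abnormal"},
        {"operator_A", "humidity_low", "normal"}]
s, c, l = rule_metrics(data, {"humidity_high"}, {"abnormal"})
print(s, c, l)  # lift > 1: high humidity co-occurs with abnormal quality
```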

Data Selection
Taking the assembly process of an aerospace product as an example, the centroid offset data of cabin A of model S1 are collected to verify the proposed key technologies and method. It is known that cabin A of model S1 has completed three batches of assembly since the last production environment change. In each batch 15 products are assembled, and the assembly of the 15th product in the fourth batch is currently underway. If the assembly time of each product is the same, the allowable error of the centroid offset of cabin A is 15 mm. The collected centroid offset data of cabin A of the fourth batch are shown in Table 2.
From the above data, we use the 13 centroid offset data items A4-1 to A4-13 as the raw data, A4-14 as the validation data, and A4-15 as the data to be predicted. The original series of centroid offsets $x^{(0)}$ is obtained from these items, and the scale ratios of the original series $x^{(0)}$ are calculated. According to the Grey prediction model, with n = 13 the admissible coverage interval of the ratio is $\left(e^{-2/14}, e^{2/14}\right) \approx (0.867, 1.154)$. Each item of the ratio series falls within the admissible coverage interval, which indicates that the GM(1,1) model is applicable to the original series of centroid offsets.
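The admissibility check on the scale ratios can be sketched as follows; the series passed in is illustrative, not the Table 2 data.

```python
import numpy as np

def scale_ratio_admissible(x0):
    """Check whether all scale ratios lambda(t) = x0[t-1] / x0[t] fall
    within the GM(1,1) admissible coverage (e^(-2/(n+1)), e^(2/(n+1)))."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    lam = x0[:-1] / x0[1:]                         # scale ratios
    lo, hi = np.exp(-2.0 / (n + 1)), np.exp(2.0 / (n + 1))
    return bool(np.all((lam > lo) & (lam < hi))), (lo, hi)

# illustrative series; for n = 13 items the interval is about (0.867, 1.154)
ok, (lo, hi) = scale_ratio_admissible([14.2, 14.8, 15.1, 15.6, 16.0])
print(ok, lo, hi)
```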

Establishment of Grey Forecasting Model
The original sequence of centroid offsets is accumulated to weaken the volatility and randomness that may exist in the random sequence, and the accumulated sequence of centroid offsets is obtained. According to the Grey model, the cumulative sequence of centroid offsets is processed, and the matrix B and constant vector $y_n$ are obtained. The Grey parameter matrix $\hat{a}$ is then calculated, and substituting $\hat{a}$ into the prediction model of the centroid offset yields the prediction function of $\hat{x}^{(1)}(t)$.
The expressions $\hat{x}^{(1)}(t+1)$ and $\hat{x}^{(1)}(t)$ are differenced and restored to the original sequence of centroid offsets, yielding the prediction sequence and the predicted values of the centroid offsets.

Grey Prediction Model Test
(1) Residual Test
The predicted and original series of centroid offsets are put together, and the residual and relative error of each data item are calculated to obtain the fitting results in Table 3.
The fitting curve of centroid offsets drawn from the above fitting results is shown in Figure 4.
The calculated average relative error is 4.16%, and the fitting accuracy is 95.84%. Because the fitting accuracy is more than 80%, the forecast result passes the residual test.
(2) Correlation Degree Test

With the resolution coefficient $\rho = 0.5$, the correlation coefficients between the predicted and original series are computed according to Eq. (18). The resulting correlation degree between the two series is 0.794, which is greater than 0.6; hence, the prediction result passes the correlation degree test.

(3) Posterior Deviation Test

The residual series is extracted from the test results.
We calculate that the variance of the original series $x^{(0)}(t)$ of the centroid offset is $s_1^2 = 8.284$, the variance of the residuals is $s_2^2 = 0.165$, and the mean-variance ratio is $C = s_2/s_1 = 0.141$. According to the definition, the small-error probability is $P = 1$.
Referring to Table 1, the mean-variance ratio and small-error probability calculated above are in the range of Level I, while the average relative error is in the range of Level II; taking the lowest of the three grades, the overall accuracy of the Grey prediction model is evaluated at Level II.
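The posterior deviation test can be sketched as follows. The toy residuals are illustrative, while the final check reproduces the paper's $C = 0.141$ from the reported variances:

```python
import numpy as np

def posterior_deviation_test(x0, x0_hat):
    """Posterior deviation ratio C and small-error probability P.

    C = s2/s1 is the ratio of the residual standard deviation to that of
    the raw series; P is the fraction of residuals within 0.6745*s1 of
    the mean residual. Toy inputs only, not the paper's series.
    """
    x0, x0_hat = np.asarray(x0, float), np.asarray(x0_hat, float)
    eps = x0 - x0_hat                      # residual series
    s1 = x0.std()                          # std of the original series
    s2 = eps.std()                         # std of the residuals
    C = s2 / s1                            # mean-variance ratio
    P = np.mean(np.abs(eps - eps.mean()) < 0.6745 * s1)
    return C, P

# Cross-check against the paper's reported variances:
# C = sqrt(s2^2 / s1^2) = sqrt(0.165 / 8.284) ≈ 0.141.
print(np.sqrt(0.165 / 8.284).round(3))     # prints 0.141
```

Smaller $C$ and larger $P$ mean a better model grade, which is why $C = 0.141$ and $P = 1$ both land in Level I of Table 1.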

Case of Grey-Markov Model
Based on the Grey prediction results, the range between the minimum and maximum residual errors of the predicted and original values is divided into five states, State I to State V, as shown in Table 4.
It is assumed that the predicted result of the centroid offset transfers between States I and V with certain probabilities. The state transition diagram of the predicted results is shown in Figure 5. If the current forecasting result is State I, then the next forecasting result has probability $Z_{11}$ of becoming State I, $Z_{12}$ of becoming State II, $Z_{13}$ of becoming State III, $Z_{14}$ of becoming State IV, and $Z_{15}$ of becoming State V.
From the column of residual error values of the centroid offset, the number of occurrences of each state can be counted; for example, State I occurs three times. The state transitions of the residual error are shown in Table 5. It is assumed that the time interval between the measurements of every two cabins' centroid offsets is one assembly cycle, i.e., the step size of the Markov model, which is the time required for a state transition. According to the statistics above, the state transition matrix of the residual error is obtained as Eq. (50). According to the states of the residual error, the residual error of the predicted value for A4-13 belongs to State IV, so the initial state vector is (0, 0, 0, 1, 0). According to the state transition matrix of Eq. (50), the state distribution after a one-step transition is (2/5, 0, 1/5, 1/5, 1/5), so the residual error of the predicted value of A4-14 most probably falls in State I.
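The one-step prediction above can be reproduced with a small transition-matrix computation. Only row IV of the matrix is recoverable from the text; the remaining rows below are hypothetical placeholders, so only the State IV result is meaningful:

```python
import numpy as np

# Residual-state transition matrix Z. Row 4 (State IV) is taken from the
# case study: starting from State IV, the one-step distribution is
# (2/5, 0, 1/5, 1/5, 1/5). All other rows are illustrative placeholders.
Z = np.array([
    [1/3, 1/3, 1/3, 0,   0  ],   # hypothetical
    [1/4, 1/4, 1/4, 1/4, 0  ],   # hypothetical
    [0,   1/3, 1/3, 1/3, 0  ],   # hypothetical
    [2/5, 0,   1/5, 1/5, 1/5],   # from the case study
    [0,   0,   1/2, 1/2, 0  ],   # hypothetical
])

def next_state(current_state_index, Z):
    """One-step Markov prediction: the most probable next residual state."""
    dist = np.zeros(len(Z))
    dist[current_state_index] = 1.0          # e.g., A4-13 is in State IV
    dist = dist @ Z                          # one-step transition
    return int(np.argmax(dist)), dist

state, dist = next_state(3, Z)               # index 3 = State IV
print(state, dist)                           # most probable next state: 0 (State I)
```

With index 3 (State IV) as the initial state, the resulting distribution is exactly row IV of $\boldsymbol{Z}$, and its maximum entry (2/5) points to State I, matching the text.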
The predicted centroid offset of A4-14 from the Grey model is then corrected according to the predicted residual state (State I). The results are compared with those of the Grey prediction model, as shown in Table 6.
It is found that, unlike the Grey prediction model, the Grey-Markov prediction model takes into account the fluctuation of data, and it reduces the relative error of centroid offset prediction from 3.44% to 1.38%, which improves the accuracy of prediction.
The Grey forecast value of A4-15 is corrected in the same way. The corrected centroid offset of cabin A of the 15th product is forecast to be 16.890 mm, which exceeds the allowable error of 15 mm, indicating that a quality abnormality will occur soon.

Raw Data Preprocessing
Based on the 14 centroid offset data items of the fourth batch of cabin A of model S1 in Table 2 and the predicted data of product 15, we obtain 15 data items for the fourth batch of cabin A. Together with the centroid offset data of the first three batches of cabin A of model S1, 60 pieces of centroid offset data are obtained, as shown in Table 7, which are used as half of the sample data to forecast the status of the assembly system by means of statistical process control. Drawing on the idea of group technology, we collect the data of cabin A of model S2 in the same way as those of cabin A of model S1. The other half of the sample data, the centroid offset data of four batches of cabin A of model S2, are shown in Table 8.
All the sample data required by the statistical process are obtained so that T-K control charts can be used to monitor the mean and standard deviation of the centroid data and evaluate the stability of the assembly system.

Case of T Control Chart
In the selected case, we know the allowable error range of the centroid offset of cabin A but not its mean value, which satisfies the condition for using the calculation formula of the T statistic in Section 4.2.1. According to the original data in Tables 7 and 8, we first calculate the mean and standard deviation of the group-$i$ samples and the mean $\bar{X}_j$ of the first $r-1$ batches of samples, and then the T statistics, as shown in Table 9. According to Eq. (30), with $n$ set to 15, the upper control limit of the T control chart is 3.636, the lower control limit is −3.636, and the center line is 0. The T control chart of the centroid offset of cabin A is drawn as shown in Figure 6, from which it can be found that the mean of the centroid offset for cabin A of models S1 and S2 is within the allowable range.
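The batch-by-batch construction can be sketched as below. The standardized batch mean used here is an assumed stand-in for the exact T statistic of Eq. (30), and the batch values are toy data, not those of Tables 7 and 8; only the self-starting structure and the quoted control limits ±3.636 come from the text:

```python
import numpy as np

def t_chart_points(batches, ucl=3.636, lcl=-3.636):
    """Plot points of a self-starting T-type control chart.

    Each batch mean is standardized against the pooled mean and standard
    deviation of all earlier batches. This is an assumed form; Eq. (30)
    of the paper defines the exact statistic.
    """
    points, in_control = [], []
    for r in range(1, len(batches)):
        history = np.concatenate(batches[:r])        # all batches before the current one
        xbar = np.mean(batches[r])                   # current batch mean
        t = (xbar - history.mean()) / (history.std(ddof=1) / np.sqrt(len(batches[r])))
        points.append(t)
        in_control.append(lcl <= t <= ucl)           # within control limits?
    return points, in_control

# Toy batches standing in for Tables 7 and 8 (not the paper's measurements).
b = np.array([12.0, 13.0, 14.0, 13.0, 13.0])
points, in_control = t_chart_points([b, b, b, b])
print(points, in_control)
```

A point outside the ±3.636 limits would flag the corresponding batch mean, and hence the assembly system, as out of control.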

Case of K-control Chart
Similar to the calculation of the T statistic, we know only the allowable error range of the centroid offset and not the mean, which satisfies the condition for calculating the K statistic as in Section 4.2.2. Based on the original data in Tables 7 and 8, we calculate the mean $\bar{S}_j^2$ of the variances of the first $r-1$ batches of sample data for the different product types, and then calculate the intermediate variable and the K statistic, as shown in Table 10.
According to Table 10 and Eq. (33), the K-control chart of the centroid offset is shown in Figure 7.
Observing the K-control chart, it can be found that the standard deviation of the centroid offset is within the allowable range. Combined with the results of the T-control chart in Section 5.2.2, the assembly system can be considered to be in a controlled state according to the sample data. Since the centroid offset datum of the 15th product of the fourth batch is a predicted value, the results based on the T-K control chart indicate that the status of the assembly system at the next moment will be controlled.

Mining Association Rules of Influence Factors for Abnormal Quality Data
It is assumed that the assembly process of cabin A in this case has three processes, that all materials required in each process come from the same batch, and that the equipment of each station is fixed. According to the six categories of quality influencing factors mentioned above, the fishbone diagram of the influencing factors for the centroid offset of cabin A is shown in Figure 8, which includes 28 factors. Although 28 factors are identified from the analysis, how they combine to cause an abnormal centroid offset still must be determined. We take 150 pieces of quality data collected from the assembly process of a certain cabin A as an example, as shown in Additional file 1: Appendix 1. To keep the centroid offset within the qualified range as much as possible, the ideal error range of the centroid offset is set to within 12 mm after the influencing factors are analyzed; an offset over 12 mm is considered abnormal and requires attention. The data encoding is shown in Figure 9.
Each encoding consists of three bytes. Byte 1 represents the process to which it belongs, such as processes 1-3. Byte 2 represents the attributes of influencing factors, such as operators and inspectors. Byte 3 represents the instantiation of influencing factors, such as operator A and inspector B. It is worth noting that when bytes 2 and 3 of two encodings are the same, the corresponding objects are the same regardless of whether byte 1 is the same. For example, 1OPE and 2OPE both represent operator E, who performs process 1 in the first case and process 2 in the second. A total of 108 codes appear in Additional file 1: Appendix 1, and their meanings are given in Additional file 2: Appendix 2.
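A minimal decoder for this three-byte scheme might look as follows. The attribute abbreviations used here (OP for operator, IN for inspector, MB for material batch) are assumptions consistent with the examples in the text, not the paper's full code table in Additional file 2: Appendix 2:

```python
def decode(code):
    """Decode a three-byte quality-data code as described in the text.

    Byte 1 is the process number, byte 2 the attribute of the influencing
    factor, and byte 3 the instance. The attribute abbreviations below are
    illustrative guesses, not the paper's official code table.
    """
    attributes = {"OP": "operator", "IN": "inspector", "MB": "material batch"}
    process, attr, inst = code[0], code[1:-1], code[-1]
    return {
        "process": int(process),
        "attribute": attributes.get(attr, attr),
        "instance": inst,
    }

# 1OPE and 2OPE denote the same person (operator E) on different processes:
# bytes 2 and 3 match, so the object is the same regardless of byte 1.
print(decode("1OPE"), decode("2OPE"))
```

The identity rule from the text falls out directly: two codes refer to the same object whenever their attribute and instance bytes agree.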
According to the principle of the Apriori algorithm, association rules are mined from the 150 pieces of assembly quality data in Additional file 1: Appendix 1 using R. With the minimum support set to 0.16 and the minimum itemset length set to 2, 7250 frequent itemsets are obtained, from which 15 strong association rules with the RHS item "AN" are mined, as shown in Figure 10. The color depth indicates the lift, and the circle size indicates the support.
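The paper performs the mining in R; purely as an illustration of the same level-wise principle, the following is a self-contained Apriori sketch in Python with toy transactions and thresholds (the real study uses 150 records, support 0.16, and the codes of Appendix 1):

```python
from itertools import combinations

def apriori_rules(transactions, min_support=0.4, min_conf=0.6, rhs="AN"):
    """Tiny Apriori sketch: frequent itemsets, then rules with a fixed RHS.

    Transactions are sets of codes; data and thresholds are toy values.
    """
    n = len(transactions)
    support = lambda items: sum(items <= t for t in transactions) / n

    # Level-wise frequent-itemset generation (downward closure guarantees
    # every subset of a frequent itemset was kept at an earlier level).
    items = sorted({i for t in transactions for i in t})
    frequent, level = {}, [frozenset([i]) for i in items]
    while level:
        level = [s for s in level if support(s) >= min_support]
        frequent.update({s: support(s) for s in level})
        # Join frequent k-itemsets into (k+1)-itemset candidates.
        level = list({a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1})

    # Rules LHS -> {rhs}: confidence = supp(LHS ∪ {rhs}) / supp(LHS),
    # lift = confidence / supp({rhs}).
    rules = []
    for s, supp in frequent.items():
        if rhs in s and len(s) >= 2:
            lhs = s - {rhs}
            conf = supp / frequent[lhs]
            if conf >= min_conf:
                lift = conf / frequent[frozenset([rhs])]
                rules.append((set(lhs), rhs, supp, conf, lift))
    return sorted(rules, key=lambda r: -r[4])

# Toy transactions: "AN" = abnormal centroid offset, "NOR" = normal;
# the other codes mimic material-batch factors from the case study.
txns = [set(t) for t in (
    ["3MBE", "1MBA", "AN"], ["3MBE", "AN"], ["3MBE", "1MBA", "AN"],
    ["1MBA", "NOR"], ["3MBE", "NOR"],
)]
for lhs, r, supp, conf, lift in apriori_rules(txns):
    print(lhs, "->", r, f"supp={supp:.2f} conf={conf:.2f} lift={lift:.2f}")
```

Sorting by lift, as in Table 11, surfaces the factor combinations most strongly associated with the abnormal outcome.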
The strong association rules with the RHS item "AN" are arranged in descending order of lift in Table 11.
Observing the above rules, the lifts of the top 14 rules are greater than 3, which means they are effective. It is found that the probability of an abnormal centroid offset is relatively high when the material batch of process 3 is E, as well as when the material batch of process 1 is A, the product model is S1, the material batch of process 2 is C, or the pressure in the cabin is A. Managers can try to avoid the combinations of factors in the above rules.

Mining Association Rules of Influence Factors for Assembly System Status
Because no strong association rule with the RHS item "NCT" is found in the full data set, the 30 data items for which the assembly system is uncontrolled are filtered out of the 150 collected pieces of data.
Mining association rules from these data with the support set to 0.66 and the minimum itemset length set to 2, 7250 frequent itemsets are obtained, from which 31 association rules with the RHS item "NCT" are mined, as shown in Figure 11.

System Implementation
The proposed quality management method is achieved through data value prediction, assembly system status prediction, and association rule mining based on the constructed DT model, providing early warnings for the assembly shop-floor and avoiding future quality abnormalities. As the comparison of results in Table 6 shows, the Grey-Markov model considers the fluctuation of the data compared with the general Grey prediction model, thus improving the accuracy of data prediction. A DT-based quality management system for the assembly process of aerospace products has been developed, which achieves the monitoring of quality information and the quick tracing of quality problems, thereby reducing the occurrence frequency of quality problems on the shop-floor and improving the efficiency of handling them. The system has been applied in an aerospace enterprise, where the values and trends of more than seven types of key quality indicators on a final assembly shop-floor are monitored and predicted for different products.

The interface for monitoring assembly shop-floor quality based on the DT is shown in Figure 12. The lower left quadrant enables the monitoring of the operating status of the equipment, the bill of arrived materials, and the values of key quality indicators. The interface for monitoring all of the assembly quality data on the assembly shop-floor is shown in Figure 13, including the prediction results at the next moment based on the built Grey-Markov model. On the left side, the assembly quality data of the different stations on the shop-floor, including the loading preparation area, empty cylinder treatment area, and docking area, can be monitored.
For a specific station, the quality data are monitored on the right side of the interface. A warning prompt appears in a red font on the left-side interface for monitoring shop-floor quality data, and on a red background on the right side of the interface for monitoring station quality data. Quality abnormality records are generated automatically.
The interface for monitoring assembly shop-floor status is shown in Figure 14. The left side is an assembly quality data list for batches of aerospace product A. In the list, the values of key quality data of historical batches can be viewed, along with collected real-time data and predicted data of the current batch. Moreover, users also can monitor the mean, standard deviation, and T and K statistics of quality data in each batch. Meanwhile, users can view the T-K statistical diagram on the right side to observe whether the current status of the assembly system is controlled. If the T-K statistics of a batch exceed the upper or lower limits, then quality abnormality records will be automatically generated.
When dealing with quality problems, a craftsman or inspector can switch to the interface for managing strong association rules for quality exceptions, as shown in Figure 15. The left side shows the product BOM, through which quality data and rules are managed. The upper right is a search bar, in which users can select the product model, quality data, support, confidence, and other conditions to search for rules, which are listed in the lower right of the interface; these provide reference methods for processing quality problems. The list of strong association rules contains the combinations of shop-floor resources that lead to abnormal quality data, which are obtained through the proposed approach and imported into the knowledge base in Excel format.

Conclusions and Future Work
The assembly process of aerospace products has the characteristics of single- or small-batch production and many exceptions. How to control assembly quality and improve production efficiency has been a persistent issue in engineering applications. By introducing DT technology, the advantages of digital space, including low cost, high efficiency, and predictability, can be fully utilized to improve the management and control capability of quality in the assembly process of aerospace products. The main contributions of this article include the following:

(1) To obtain more accurate predictions from the small sample volume of aerospace product quality data, the fluctuations of the data were considered and a Grey-Markov model-based quality data prediction algorithm was presented. Moreover, group technology was used to increase the sample size of quality data, and a T-K control chart was applied to monitor the mean and standard deviation of quality data, realizing the prediction of assembly system status and avoiding quality problems caused by an uncontrolled assembly system.

(2) To improve the efficiency of quality problem tracing and handling in the assembly process of aerospace products, an Apriori algorithm-based traceability method for the influencing factors of quality anomalies was proposed. Strong association rules related to quality data anomalies and uncontrolled assembly systems were mined to trace the influencing factors of quality anomalies, which can assist related personnel in quickly locating abnormal causes and improving the efficiency of quality control.

(3) A DT-based quality management system for the assembly process of aerospace products was developed and has been applied in an aerospace enterprise, which promotes the application of DT in assembly quality management.
This paper explores the application of DT to improve the quality management and problem tracing capability for the assembly process of aerospace products. Future research will focus on two areas: ① studying the feedback link in quality control for the assembly process of aerospace products in depth, so as to make the feedback process real-time and more automatic and intelligent; ② using text mining algorithms to obtain more knowledge of quality problem processing, so as to assist the rapid processing of and decision-making on quality problems.