Advanced Data Collection and Analysis in Data-Driven Manufacturing Process

The rapidly increasing demand for, and complexity of, manufacturing processes has made manufacturing data the highest priority for achieving precise analysis and control, in place of simplified physical models and human expertise. In the era of data-driven manufacturing, the explosion in data volume has revolutionized how data is collected and analyzed. This paper reviews the advances in technologies developed for in-process manufacturing data collection and analysis. It can be concluded that groundbreaking sensing technology that facilitates direct measurement is one important leading trend in advanced data collection, owing to the complexity and uncertainty of indirect measurement. On the other hand, physical model-based data analysis contains inevitable simplifications, and sometimes ill-posed solutions, because of its limited capacity to describe complex manufacturing processes. Machine learning, especially deep learning, has great potential for making better decisions to automate the process when fed with abundant data, while emerging data-driven manufacturing approaches have succeeded in using limited data to achieve similar or even better decisions. These trends can be demonstrated by analyzing some typical applications of manufacturing processes.


Introduction
The concept of "smart manufacturing" (also known as intelligent manufacturing) is spreading explosively. According to Tan et al. [1], smart manufacturing is the fusion of intelligent technologies and manufacturing technologies. It is an umbrella term for manufacturing technologies and paradigms that aim to automate production and transactions by taking full advantage of advanced information technologies [2]. Key technologies involved in smart manufacturing include, but are not limited to, the Internet of Things (IoT), cyber-physical systems (CPS), cloud computing and big data analytics. These technologies, when integrated with manufacturing capabilities, have initiated various paradigms belonging to the smart manufacturing family, such as IoT-enabled manufacturing, digital twins and cloud manufacturing. Manufacturing data can be captured at all stages of a product's life, ranging from explicit values such as material properties, process temperature and vibration to implicit ones such as supply chain resources and customers' preferences. The volume of data generated by manufacturing systems is growing rapidly, with over 1000 EB [3] collected in 2015, and is expected to increase 20-fold in the next ten years. Data has played a crucial role since the fourth industrial revolution was initiated in Germany [4]. Data-driven decision making, in turn, distinguishes modern manufacturing from traditional manufacturing in that decisions are based on factual data rather than on theoretical physical models, opinions and guesses.
Researchers have been investigating data-driven manufacturing for decades and have published a great number of review articles on the latest achievements from different perspectives. The field emerged when large volumes of data were generated as an outcome of digital manufacturing, along with the data mining techniques developed since the 1990s [5]. Later, once data-driven methods were widely accepted, process diagnosis techniques were adopted to automate fault detection in industrial processes [6]. Wuest et al. [7] gave a comprehensive review of machine learning methods used in manufacturing tasks. Since most manufacturing data are labeled, supervised learning has played a dominant role in practical applications. Zhong et al. [2] surveyed associated topics in the context of Industry 4.0, including the Internet of Things, cloud manufacturing and cyber-physical systems, on the basis of which they provided a detailed analysis of how these key constitutive technologies together can realize Industry 4.0. As data became available everywhere, data fusion techniques were also developed to facilitate industrial prognosis [8]. Kong et al. [9] reviewed the latest multisensor measurement and data fusion technology in precision monitoring systems. Tao et al. [10] summarized the state-of-the-art development of new technologies across the lifecycle of manufacturing data, including data collection, storage, processing and visualization. Together, these technologies have initiated smart manufacturing applications such as smart design, smart maintenance and quality control.
As an increasing number of researchers have come to realize the importance of manufacturing data, data collection and analysis have been broadly studied and incorporated into modern manufacturing. Nevertheless, few of the aforementioned review articles focused on how data collection and analysis have evolved towards modern manufacturing processes. As depicted in Figure 1, a data collector, e.g., a sensor, is designated to capture useful physical values generated by a manufacturing event. The acquired data is then analyzed and interpreted into optimal decisions that enhance the performance of the manufacturing system. This closed-loop form of manufacturing constitutes the fundamental paradigm of data-driven manufacturing, as opposed to conventional model-based manufacturing. With the development of various sensing technologies and the relevant infrastructure, modern manufacturing systems are equipped with a large number of sensors that capture data at an unprecedented rate [12]. New challenges are thus raised. First, erroneous or patchy data can distort results and lead to faulty decisions [11]. Maintaining the veracity of data with respect to the target of interest is challenging, because most data captured via generic sensors cannot directly reflect the actual on-site situation. Second, transforming these data into useful knowledge and decisions is also challenging, as the volume, variety and velocity of the captured data are already beyond normal capacity [13]. Inaccurate methods that analyze only partial information from the collected data can also mislead the final decision and degrade performance.
To deal with these challenges, researchers have concentrated on this area and produced rich outcomes. The main purpose of this paper is to summarize the development and trends of data collection and analysis in the era of data-driven manufacturing through a thorough review of the state of the art. The rest of this paper is organized as follows. Section 2 elaborates the framework of data-driven manufacturing and presents some representative applications in different areas. The evolution of data collection and of data analytics is discussed separately in Sections 3 and 4. Section 5 summarizes the paper and gives an outlook on future trends in advanced monitoring systems for modern manufacturing.

[Figure 1. Diagram labels: Manufacturing event, Data collector, Data, Data analyzer, Decision]

Data-Driven Manufacturing
The major distinction between the paradigms of modern and conventional manufacturing is shown in Figure 2. Conventional manufacturing automation can be regarded as model-based manufacturing [14]. Experts gain experience by making physical observations of manufacturing systems, such as visual inspection and noise recognition. This experience, together with human intelligence, is used to derive physical models by theoretical, experimental and numerical methods in order to better understand the underlying mechanisms. Although great achievements have been made and applied in areas such as simulation [15] and performance evaluation [16], these model-based methods are nevertheless limited in their effective range and accuracy. This is mainly because many simplifications and assumptions are made when deriving the physical models, while human experts cannot be guaranteed to be consistent and impartial towards all the experience they have gained.
Modern manufacturing, on the other hand, is data-driven [10], in the sense that data generated through manufacturing activities is fully utilized to enhance manufacturing quality and thus enrich the flexibility and autonomy of the system. The framework of data-driven manufacturing, outlined in Figure 3, consists of four layers. The bottom layer is the manufacturing layer, which comprises the different types of manufacturing processes through which a product is designed, manufactured, assembled and evaluated from scratch. The data layer sits on top of the manufacturing layer and is connected to it via a sensor interface. Various types of sensors are integrated into the manufacturing system to monitor and inspect the manufacturing process. In the data layer, data is collected, stored and visualized in preparation for data processing. In the knowledge layer, raw data is transformed into insightful features and knowledge via data processing technologies. In the decision layer, through the use of intelligence, knowledge eventually becomes decisions that support accurate simulation, evaluation and prediction in order to facilitate smart manufacturing.
The major distinctions between data-driven manufacturing and conventional manufacturing are the generation, collection and utilization of data, which have been regarded as the key enablers of smart manufacturing [17]. As can be inferred from Figure 3, data eventually becomes decisions that automate the manufacturing process and enhance its performance. The accuracy of the decision therefore dominates the manufacturing outcome; for example, a false decision could jeopardize a delicate product or even the manufacturing system itself. The accuracy and fidelity of the final decision can be expected to decrease through these layers for several reasons. In the data layer, data acquisition may cause accuracy loss depending on the specification of the sensors adopted, and the correlation between the acquired data and the actual physical value involves certain assumptions and simplifications. In the knowledge layer, the extraction of knowledge from raw data induces further error, since the extracted features may not perfectly capture the overall profile of the original data. In the decision layer, improper data analysis could lead to a misunderstanding of the features and thus mislead the final decision. In total, the aggregated error can be tremendous, which poses a great challenge to the development of advanced manufacturing systems. While the detailed evolutionary trends of the data-driven manufacturing process will be presented in Sections 3 and 4, we first analyze some typical applications in Section 2.2.
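The way accuracy loss accumulates through the layers can be illustrated with a minimal sketch. It assumes, purely for illustration, that each layer retains an independent fraction of fidelity and that these fractions multiply; neither the multiplicative model nor the numbers come from the reviewed literature, and the function name `overall_fidelity` is hypothetical.

```python
from math import prod

def overall_fidelity(layer_fidelities):
    """Multiply per-layer fidelities (each in [0, 1]) into an end-to-end value."""
    values = list(layer_fidelities)
    for f in values:
        if not 0.0 <= f <= 1.0:
            raise ValueError("each fidelity must lie in [0, 1]")
    return prod(values)

# Hypothetical fidelities retained by the data, knowledge and decision layers.
layers = {"data": 0.95, "knowledge": 0.90, "decision": 0.85}

total = overall_fidelity(layers.values())
print(f"end-to-end fidelity: {total:.3f}")
print(f"aggregated loss:     {1.0 - total:.3f}")
```

Under these assumed values, three individually modest losses already combine into an aggregated loss that is considerably larger than any single one of them.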

Product Design
Product design is an iterative decision-making process that seeks optimal solutions to the needs of the target customers. The cost of product design can go up to 75% of the entire product cost, according to Li et al. [13], mainly because of the constant trial-and-error iterations during the design phase until the customers are satisfied. Until the popularization of rapid prototyping and the Internet, it was particularly difficult for customers to monitor and steer a design process so that it truly reflected their needs. As an additive manufacturing technology, rapid prototyping [18] revolutionized how a 3D product can be fabricated quickly from a virtual design, offering the most intuitive feedback to designers as well as to customers. The Internet increases direct communication between customers and the company, through which customers can post their demands, share first-hand experience and even participate in the customized design of a product. Regarding the state of the art, a new paradigm named cloud-based design (CBD) was established to let design engineers conduct market research more effectively and efficiently by gathering feedback and reviews spread through social media [19]. Cloud-based CAx software such as AutoCAD 360 was also developed to enable real-time monitoring and collaboration among geographically distributed design teams. In addition to the cloud-based infrastructure, high-performance cloud computing and big data analytics have enabled expensive computations, such as the analysis of market preferences and customer demands, at reduced cost [20].

Logistics and Supply Chain Management
The manufacturing supply chain refers to the flow of raw materials from distributed original suppliers to manufacturing sites, and finally to places of consumption. Traceability of the supply chain is an important feature that allows a modern manufacturing enterprise to reduce logistics cost and increase production efficiency in the long run. To improve supply chain visibility and tracking, radio frequency identification (RFID) and GPS work together to provide a seamless and detailed trace of the product [21]. Supply chain analytics (SCA) has been extensively investigated to assist decision makers in identifying and assessing supply chain risks and in improving supply chain flexibility and capability. According to a recent review of this subject [22], analytic techniques in SCA include statistical analysis, simulation and optimization, which take full advantage of big data to analyze supply chain performance and to make appropriate decisions.

Shop Floor Monitoring
Shop floor monitoring is essential for keeping track of the running state of each machine and for adaptive scheduling and maintenance. Modern manufacturing shop floors are equipped with smart sensors, among which RFID sensors are widely adopted to enable large-scale data acquisition [23]. Data pre-treatment and analysis using machine learning algorithms are then applied for shop floor scheduling and fault prediction [24]. Real-time monitoring of machine availability and execution status is also important for distributed process scheduling on the shop floor and in cloud-based manufacturing. Wang [25] proposed a tiered system architecture with function blocks for monitoring machine availability and execution status in real time, such that a closed-loop information flow can be established. As the variety and volume of data keep increasing, the integrated data becomes too intricate to handle and perceive. The cyber-physical system (CPS) addresses this issue by establishing a virtual shop floor that is synchronized with the actual one [26]. In this way, a series of smart operations can be realized, including smart interaction, smart control and smart management, making the networked machines perform more efficiently, responsively and collaboratively.

Manufacturing Process
The importance of the manufacturing process can never be overstated. In the last few decades, new types of manufacturing process, such as high speed machining (HSM), additive manufacturing (AM) and hybrid manufacturing, have rapidly emerged to satisfy growing product demands. The complexity of modern manufacturing processes has already gone beyond the level at which they can be manually observed and controlled. Monitoring the manufacturing process via a dedicated monitoring system has become critical for avoiding anomalies and reducing maintenance cost. For this reason, machining monitoring systems have become a research hotspot in the recent literature [27]. A machining monitoring system encompasses signal acquisition, signal processing and decision-making steps in order to identify tool conditions, chip conditions, processing conditions [28] and part surface conditions [29]. Monitoring systems for additive manufacturing, especially metal-based AM, have been extensively investigated to enhance part quality and repeatability in order to satisfy the stringent requirements of the aerospace and healthcare sectors [30]. Vision and camera-based monitoring systems are widely adopted for in-situ metrology inspection and closed-loop control of additive manufacturing [31].
With the development of advanced sensors and artificial intelligence, the data-driven manufacturing process is also evolving. There have been two distinct trends in modern manufacturing processes. The first is to devise smart sensors for the direct measurement of high-value data. This direct measurement approach can effectively bypass the tedious data processing stage and raise fidelity to a whole new level. The other is to extract valuable knowledge from low-value data using advanced machine intelligence. As mentioned earlier, the data collector and the data analyzer are the key components for achieving these two targets. Regarding the former, the manufacturing data acquired from advanced sensors is of unprecedented fidelity and accuracy compared with that from legacy data collectors. Data analyzers, in turn, use the latest data processing and machine learning technologies to make better decisions than ever before. In the remainder of this paper, a thorough investigation is made to review the state-of-the-art development of data collection and data analysis towards the data-driven manufacturing process, and to discuss the future trends of data-driven smart manufacturing.

Advanced Data Collection in Data-Driven Manufacturing Process
A modern manufacturing system is equipped with advanced sensors that collect sequential data from different physical events. These data are of low value density if treated individually, but together they are of great value for keeping track of the manufacturing process in order to make simulations, evaluations and predictions. High-quality data collection by means of various types of sensors is therefore a desirable target in modern manufacturing. As alluded to earlier, the lifecycle of manufacturing data consists of data collection and data analysis. In the data collection stage, manufacturing data is generated by and collected from equipment, human operators and products. These data can be classified into structured, semi-structured and unstructured data [32], depending on the selection of sensors and their working principles. In data analysis, the target is to extract informative knowledge and decisions from the raw data end to end. High-dimensional raw data sometimes needs a prior feature extractor to extract low-dimensional representative features in either the time domain or the frequency domain [29]. The features extracted from different data sources are then fused together to make valuable decisions for controlling the manufacturing process.
Figure 4 summarizes two typical workflows of the data-driven manufacturing process, which are based on direct and indirect data measurement. In direct measurement, sensors are specifically designed to measure the physical value, or a quantity that varies directly with it, during the process. These sensors are usually expensive and specific to a certain working environment. The data captured by direct monitoring is of high fidelity and accuracy. For example, touch-trigger probes offer a direct way to precisely measure the coordinates of the part at discrete physical contact points.
Alternatively, indirect measurement offers a more cost-effective way to collect an indirect but correlated value using generic sensors, such as current sensors and accelerometers. The major difference from direct monitoring is that the captured physical value is no longer the target value but a value correlated with it through a non-deterministic transfer function. For example, a large spindle current sometimes implies a large cutting force during metal cutting and sometimes only indicates that the spindle is accelerating, and the two cases are hard to tell apart unless more information is provided. Building the exact inverse transfer function to deduce the target value is impossible; any practical inverse involves simplifications and assumptions, which inevitably lead to accuracy loss. This has become the major issue for indirect monitoring.
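The spindle-current ambiguity described above can be made concrete with a toy model. The sketch below assumes a linear transfer function in which the current depends on both the cutting force and the spindle acceleration; the coefficients `K_FORCE`, `K_ACCEL` and `I_IDLE`, and both function names, are hypothetical and chosen only for illustration.

```python
# Hypothetical linear transfer function: every coefficient is an assumption.
K_FORCE = 0.02   # A per N of cutting force
K_ACCEL = 0.05   # A per rad/s^2 of spindle acceleration
I_IDLE = 1.5     # A drawn at constant speed with no cut

def spindle_current(force_n, accel_rad_s2):
    """Forward model: the current responds to both load and acceleration."""
    return I_IDLE + K_FORCE * force_n + K_ACCEL * accel_rad_s2

def force_from_current(current_a, accel_rad_s2):
    """Inverse model, solvable only when the acceleration is also known."""
    return (current_a - I_IDLE - K_ACCEL * accel_rad_s2) / K_FORCE

# Two different physical events that yield the same current reading.
heavy_cut = spindle_current(force_n=200.0, accel_rad_s2=0.0)
speed_up = spindle_current(force_n=0.0, accel_rad_s2=80.0)
print(heavy_cut, speed_up)

# With the acceleration supplied as extra information, the force is recovered.
print(force_from_current(heavy_cut, 0.0))
print(force_from_current(speed_up, 80.0))
```

Even in this idealized linear case, the current alone cannot distinguish a heavy cut from an accelerating spindle; a second measurement is needed, and a real transfer function would add nonlinearity and noise on top.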
Abellan-Nebot and Subiron [33] gave a comprehensive review of the machining monitoring systems developed so far, including sensors, signal processing and feature extraction. They argued that indirect measurement was more prevalent because of its cost-effectiveness and versatility. Lauro et al. [27], in their recent review, recommended taking great care in the choice of measurement method because of implementation cost and requirements. For tool condition monitoring, such as tool wear diagnosis, direct methods using, e.g., optical and radioactive sensors fell out of favor because the cutting area is inaccessible during the cutting process [34]. However, direct vision/camera-based systems have been widely used for in-situ metrology monitoring of additive manufacturing processes [31] in order to achieve closed-loop identification of material discontinuities and failure modes. As manufacturing processes become more and more complicated, generic sensors may not satisfy the increasing demand for high accuracy because of the inevitable simplifications and assumptions relating the target value to the captured value. The following section presents the evolutionary trends for various monitoring tasks in the manufacturing process.
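As a minimal sketch of the feature extraction and fusion step mentioned in this section, the following code computes a few common time-domain features and one frequency-domain feature per sensor channel, then concatenates them into a single vector. The choice of features, the synthetic signals and all function names are assumptions made for illustration and do not reproduce any specific system from the cited reviews.

```python
import cmath
import math

def time_features(signal):
    """Time-domain features: RMS, peak-to-peak and crest factor."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return {
        "rms": rms,
        "peak_to_peak": max(signal) - min(signal),
        "crest_factor": max(abs(x) for x in signal) / rms,
    }

def dominant_frequency(signal, sample_rate_hz):
    """Frequency-domain feature: frequency of the largest non-DC DFT bin."""
    n = len(signal)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        # Naive O(n^2) DFT, adequate for a short illustrative signal.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n

def extract(signal, sample_rate_hz):
    """Reduce a raw signal to a small dictionary of named features."""
    features = time_features(signal)
    features["dominant_hz"] = dominant_frequency(signal, sample_rate_hz)
    return features

def fuse(feature_sets):
    """Feature-level fusion: concatenate per-sensor features into one vector."""
    return [feature_sets[sensor][name]
            for sensor in sorted(feature_sets)
            for name in sorted(feature_sets[sensor])]

# Synthetic signals (assumed): a 50 Hz vibration and a current with 100 Hz ripple.
FS = 1000  # sampling rate in Hz
t = [i / FS for i in range(200)]
vibration = [math.sin(2 * math.pi * 50 * ti) for ti in t]
current = [2.0 + 0.1 * math.sin(2 * math.pi * 100 * ti) for ti in t]

per_sensor = {"vibration": extract(vibration, FS), "current": extract(current, FS)}
fused = fuse(per_sensor)
print(per_sensor)
print(len(fused))
```

Each 200-sample signal is reduced to four numbers, and the fused vector is what a downstream decision model would consume.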

Data Collection in Manufacturing Process
Extensive studies of manufacturing data collection have been conducted for various applications. From the perspective of the machining process, as depicted in Figure 5, the main concerns are the real-time conditions of the process, the tool and the part. Applications include cutting force monitoring, chatter detection, tool wear/breakage diagnosis, and online inspection of surface roughness and dimensional accuracy.

Cutting Force Monitoring
Cutting force monitoring was among the earliest capabilities achieved in numerically controlled machining, because of its high correlation with the status of the in-process workpiece and tool. A large cutting force is detrimental to part accuracy as well as to cutting effectiveness [35]. Initially, the cutting force was measured indirectly using the current signals of servo motors [36] or feed motors [37]. These methods are cheap and easy to implement, but their achievable accuracy is very limited. Albrecht et al. [38] proposed an innovative indirect force measurement by integrating a capacitance displacement sensor into the spindle. The sensor measured the deflection of the tool, which was then converted into a force value. At a certain frequency (650 Hz), the sensor reliably measured cutting force with around 10% error in magnitude. The major drawback of this indirect sensor was its limited bandwidth, which, even after applying a Kalman filter, could only reach 1000 Hz. These indirect methods thus suffered from either low accuracy or limited frequency bandwidth. Direct force sensors were therefore developed, equipped with sensing elements that convert an external force load into the deformation of an elastic element. Piezoelectric transducers and strain gauges are the two major branches of modern cutting force dynamometers. Strain gauge force transducers offer high frequency response and long-term stability of deformation under an external force. Yaldiz et al. [39] developed a table dynamometer using strain gauges to measure static and dynamic milling forces. An octagonal ring was manufactured to hold the strain gauges, whose orientations and locations were carefully determined to maximize the overall sensitivity. After calibration, the final accuracy can reach up to 98.5% in a real milling process.
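The calibration step of a strain gauge dynamometer can be sketched as a least-squares fit between known static loads and the bridge output. The sketch below assumes a linear single-axis response; the calibration readings are invented for illustration, the accuracy figure is defined here as one minus the worst-case error relative to full scale, and none of this reproduces the procedure or results of [39].

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ gain * x + offset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gain = sxy / sxx
    return gain, mean_y - gain * mean_x

# Hypothetical calibration run: known static loads (N) and bridge output (mV).
loads_n = [0.0, 100.0, 200.0, 300.0, 400.0]
output_mv = [0.02, 1.01, 2.03, 2.98, 4.01]

gain, offset = fit_line(output_mv, loads_n)  # units: N per mV, and N

def force_from_voltage(mv):
    """Convert a bridge reading to force using the fitted calibration."""
    return gain * mv + offset

# Accuracy expressed as worst-case error relative to the full-scale load.
worst_error_n = max(abs(force_from_voltage(v) - f)
                    for v, f in zip(output_mv, loads_n))
accuracy_pct = 100.0 * (1.0 - worst_error_n / max(loads_n))
print(f"gain = {gain:.2f} N/mV, offset = {offset:.2f} N")
print(f"accuracy relative to full scale: {accuracy_pct:.2f}%")
```

A real dynamometer would additionally need cross-talk compensation between force axes and a dynamic calibration, both of which are outside this sketch.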
Piezoelectric sensors, compared with strain gauges, are superior for dynamic force measurement because of their high dynamic range and sensitivity [40], and they were usually mounted on the spindle side for this purpose. As for the state of the art, advanced measurement apparatuses have been developed to measure micro-cutting force wirelessly, with accuracy of up to 99.8% [41]. Polyvinylidene fluoride (PVDF) sensors were embedded in each insert of the cutter to estimate the real-time working condition of each insert separately and wirelessly [42]. Some recent studies have tried to develop a so-called smart tool with a built-in piezoelectric sensor array [43,44], which could be a future trend towards smart manufacturing processes. Table 1 lists the evolution of cutting force measurement.

Machining Chatter Inspection
Machining chatter has always been an important issue in manufacturing processes because of its complex physical mechanism and its negative effects, which include poor surface finish and tool damage [45]. Real-time chatter monitoring has also been classified into indirect and direct methods. As chatter usually manifests as self-excited vibration, direct methods using a microphone [46] or an accelerometer [47] have been demonstrated to be efficient and effective solutions for chatter recognition. They suffer, however, from the common drawback that ambient sound and vibration can introduce noise into the target signal. For the microphone in particular, suppression of environmental noise is mandatory to make it truly applicable. Later, indirect methods emerged that focused on the correlated effects of chatter and used relevant signals for chatter detection, such as the cutting force signal [48], the motor current signal [49], acoustic emission [50] and the fusion of multiple signals [51]. The correlation between these signals and the occurrence of chatter needs to be meticulously analyzed to achieve feasible results. The accuracy of these indirect measurements has been greatly enhanced by adopting machine learning algorithms; for example, in [49] a support vector machine that recognizes the chatter pattern from the servo motor current signal reached an accuracy rate of over 95%. Nevertheless, the frequency bandwidth of these generic indirect sensors may not suffice for the detection of chatter, especially in high-speed machining. Consequently, direct measurement using microphones has been revived after the reliability of the sensor in monitoring milling operations was improved. Specifically, the microphone response inside the machine-tool chamber was corrected using equalization filters to ensure adequate accuracy in the chatter detection task [52].
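The idea of learning a chatter/stable decision boundary from signal features can be sketched with a very small linear classifier. Note that this is a plain perceptron used as a simple stand-in, not the support vector machine of [49]; the two features per sample (a band-energy ratio and a normalized RMS of the motor current) and all data points are synthetic and hypothetical.

```python
def train_perceptron(samples, labels, epochs=100, lr=0.1):
    """Learn weights w and bias b so that sign(w.x + b) matches labels (+1/-1)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return w, b

def predict(w, b, x):
    """Return +1 for chatter and -1 for stable cutting."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Synthetic, hand-made feature pairs: (band-energy ratio, normalized RMS).
stable = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.30), (0.12, 0.35)]
chatter = [(0.70, 0.60), (0.80, 0.75), (0.65, 0.80), (0.90, 0.70)]
samples = stable + chatter
labels = [-1] * len(stable) + [1] * len(chatter)

w, b = train_perceptron(samples, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(samples, labels)) / len(samples)
print(f"training accuracy: {accuracy:.2f}")
print("new sample (0.80, 0.70):", predict(w, b, (0.80, 0.70)))
```

A support vector machine would additionally maximize the margin between the two classes and can use kernels for boundaries that are not linear; the perceptron only finds some separating line when one exists.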
Optical measurement, such as using a laser beam and an optical position detector (OPD) to identify the vibration of the in-process tool, was also regarded as a direct method [53]. In that study, the laser beam was reflected off the rotating cutter and captured by the OPD, so that the displacement of the cutter could be recorded in real time.
The development of high-resolution vision systems also facilitated the online measurement of chatter by analyzing surface texture and marks in real time [54]. Ding et al. [55] devised an active control system to detect and suppress machining chatter. Chatter was detected by directly sensing the workpiece displacement with a displacement sensor and was then controlled via a voice coil motor. In terms of offline chatter identification, the chatter stability diagram offers a scientific reference for choosing chatter-free machining parameters [56]. Its generation relies heavily on the frequency response functions (FRFs) at the tool tip. Accelerometers are widely adopted for FRF measurement [57] based on the standard impact test using a hammer integrated with a force sensor. The impact test is regarded as a direct measurement for FRF determination, but it requires tedious setups to capture the pose-dependent tool tip dynamics of a bi-rotary milling head [58]. Table 2 lists the evolution of machining chatter inspection.
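The dependence of the stability limit on the FRF can be illustrated with the classical single-degree-of-freedom result for orthogonal cutting, in which the limiting depth of cut is b_lim = -1 / (2 Ks Re[G(w)]) wherever the real part of the FRF G is negative. This textbook simplification is used here only as an illustration and is not the method of [56]; the modal parameters and the cutting force coefficient are hypothetical.

```python
import math

def frf(omega, mass, damping, stiffness):
    """Receptance G(omega) = 1 / (k - m*omega^2 + i*c*omega) of a 1-DOF system."""
    return 1.0 / complex(stiffness - mass * omega ** 2, damping * omega)

def critical_depth(omega, mass, damping, stiffness, k_cut):
    """Limiting depth b_lim = -1 / (2*Ks*Re[G]); None where Re[G] >= 0."""
    real_part = frf(omega, mass, damping, stiffness).real
    if real_part >= 0.0:
        return None
    return -1.0 / (2.0 * k_cut * real_part)

# Hypothetical modal parameters of the tool tip and cutting force coefficient.
M = 1.0        # modal mass, kg
K = 1.0e7      # modal stiffness, N/m
ZETA = 0.03    # damping ratio
C = 2.0 * ZETA * math.sqrt(K * M)  # viscous damping, N*s/m
KS = 2.0e9     # cutting force coefficient, N/m^2

# Sweep chatter frequencies just above the natural frequency.
omega_n = math.sqrt(K / M)
sweep = [omega_n * (1.0 + 0.0005 * i) for i in range(1, 400)]
limits = [d for d in (critical_depth(w, M, C, K, KS) for w in sweep)
          if d is not None]
b_min = min(limits)

# Closed-form minimum for a 1-DOF system: 2*k*zeta*(1 + zeta) / Ks.
b_analytic = 2.0 * K * ZETA * (1.0 + ZETA) / KS
print(f"swept minimum:    {b_min * 1e3:.4f} mm")
print(f"analytic minimum: {b_analytic * 1e3:.4f} mm")
```

Any depth of cut below this minimum is stable at all spindle speeds in the model; the lobes of a full stability diagram, which allow deeper cuts at selected speeds, additionally require the phase of G and are not computed here.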

Table 1 Evolution of cutting force measurement

Tool Condition Monitoring
Tool condition monitoring (TCM) is vital for keeping track of the remaining useful life (RUL) of the tool. Late replacement of a dull or broken tool may decrease the accuracy and quality of the final part and cause machine breakdown [34]. Direct inspection of the in-process tool condition, on which much effort was spent in the past [59], was developed first and includes the use of proximity sensors, radioactive sensors and vision sensors. Proximity sensors, such as ultrasonic sensors [60], estimate the change in distance between the cutting edge and the workpiece, and their accuracy is strongly affected by thermal expansion and by deflection induced by the cutting force. Radioactive sensors [61] detect the amount of residual radioactive material implanted on the flank face of the cutting tool in order to estimate the wear percentage; this approach was regarded as harmful and was thus limited to laboratory use. Vision-based tool condition monitoring, especially using structured light [62], was also patented long ago but required ideal lighting and cutting conditions to achieve acceptable accuracy. The deficiencies of these early direct TCM methods led to the flourishing of indirect TCM methods, which used correlated signals such as cutting force [63], acoustic emission [64], vibration [65], current [66][67][68] and surface roughness [69]. These representative indirect methods advanced the development of signal processing and sensor fusion techniques to enhance prediction accuracy. Although many review articles strongly favored indirect TCM methods [29,34] as the future trend because of their increasing accuracy, their major drawback remains prominent: they are case-sensitive and require fine-tuning and calibration to achieve high fidelity. Direct TCM methods, especially vision-based ones, can bypass this issue by directly inspecting the geometric change of the tool.
Two-dimensional [70] and three-dimensional vision systems [71,72] were developed for direct TCM and achieved sub-pixel accuracy using advanced image processing techniques. These vision-based TCM systems all require a pause between two sequential operations to capture a steady image of the tool. This inconvenience has been addressed in recent studies. Ramirez-Nunez et al. [73] devised a smart sensor consisting of an infrared camera and a temperature sensor, which facilitates in-process tool breakage inspection even in the presence of coolant. The tool condition is estimated by processing the infrared thermography. Dai and Zhu [74], in a recent study, proposed an integrated vision system for micro-milling TCM. The system was designed with a telecentric lens, a light source and a 3-DOF motion platform to achieve uniform image quality and a high degree of automation. With the availability of powerful image processing algorithms, direct TCM using smart sensors and integrated systems is believed to have a promising future. Table 3 lists the evolution of tool condition monitoring.

Part Condition Monitoring
The condition of the in-process part (a.k.a. workpiece) needs to be monitored so that the process can be adjusted in time to yield a high-quality part. Surface finish and dimensional accuracy are the two dominant factors of workpiece condition that determine the final quality of the product. Surface finish metrology in particular has received overwhelming attention as a direct indicator of the capability of a modern manufacturing system. Conventional surface inspection methods [75] are usually conducted after the manufacturing process. These post-process methods can usually achieve higher accuracy using dedicated instruments, such as the stylus profilometer [76], but cannot take responsive action to prevent further accuracy loss during the process. Quality control based on in-situ monitoring therefore offers a more practical solution. However, it is difficult to incorporate sophisticated roughness scanners into a harsh operating environment with metal chips, lubricants and vibrations. Consequently, indirect methods for the in-situ inspection of surface roughness became the mainstream in the past [77]; these use accelerometers [78], dynamometers [79], acoustic sensors [80], ultrasonic sensors [81], etc. The prevalent shortcoming of these indirect methods is their lower achievable accuracy owing to their inherent uncertainty. More recently, a vision-based surface roughness evaluation system [82] has been developed for efficient and accurate in-situ surface inspection. Its essence is the use of graph theory-based image analysis to identify the surface roughness distribution in real time without interrupting the machining process. For metal additive manufacturing processes, where quality matters, vision-based systems are also the primary choice for in-situ metrology monitoring [31]. Hybrid instrumentation has also been suggested as a future direction to overcome the trade-off between spatial resolution and field of view, in which a low-resolution sensor covers the whole area while a high-resolution sensor focuses on the area of interest.

Table 2 Evolution of machining chatter inspection
Year | Indirect chatter inspection | Direct chatter inspection
Before 2000 | Acoustic emission sensor [50] | Microphone [46]; Accelerometer [47]
2001-2010 | Dynamometer [48]; Sensor fusion [51] | Optical sensor [53]
2011-2020 | Current sensor [49] | Integrated microphone transducer [52]; Vision-based system [54]; Displacement sensor [55]

Table 3 Evolution of tool condition monitoring
Year | Indirect TCM | Direct TCM
[...] | [...] | Radioactive sensor [61]; Proximity sensor [60]; Structured light [62]
2001-2010 | Current sensor [66][67][68] | 3D metrology [72]
2011-2020 | AE sensor [64]; Surface roughness inspector [69]; Dynamometer [63] | 2D vision [70]; 3D vision [71]; Infrared thermography [73]; Integrated vision system [74]
Real-time dimensional accuracy monitoring is vital for in-process quality control. Dimensional accuracy is easily violated in parts with thin-wall features, owing either to deflection under the external cutting force or to the internal release of residual stress. As the in-process part is usually securely mounted in a fixture and hard to reach with external instruments, integrating fixtures with sensing technology is a potential direction according to a state-of-the-art review [83]. For deflection caused by external force, Azouzi and Guillot [84] predicted the workpiece dimensional deviation in turning from the cutting feed, depth of cut and cutting force signal. Cutting force and vibration signals were fused to predict the deviation when turning a slender part [85]. For thin-walled parts such as a blisk, a common way to identify the in-situ deflection is simulation based on cutting force values and modeling techniques [86], which is both time-consuming and uncertain in accuracy. The deflection error caused by the release of residual stress is particularly hard to predict, because each piece of raw stock has its own stress pattern. Instead, researchers have sought to characterize the residual stress field via nondestructive methods, such as ultrasonic devices [87] and X-ray diffraction [88]. The key to these nondestructive methods is to formulate the stress gradient with respect to the center frequencies, which can achieve plausible accuracy in workpieces with simple geometries.
However, for realistic complex parts, the distribution of residual stress can be elusive especially when the workpiece profile is constantly changing during the process. In light of this concern, direct measurement would be a better choice.
To measure the deflection directly, an on-machine measurement system with a touch-trigger stylus was adopted to inspect the workpiece deformation and adaptively change the subsequent tool path for compensation [89]. Although the inspection stylus was a prevalent choice for online measurement, it required the manufacturing process to be suspended, which prolongs the overall processing time and rules out real-time monitoring. More advanced instruments were developed recently to address these issues. Luo et al. [90] devised a thin-film PVDF sensor attached to the non-machining side of a thin-walled part to monitor the deflection and vibration caused by the machining force. The change in output voltage faithfully reflected the high-frequency deflection of the workpiece during different machining stages. Real-time surface normal measurement is indispensable for maintaining accurate thickness when machining freeform thin-walled parts. Yuan et al. [91] established an online surface normal measurement using four eddy current displacement sensors installed at the front end of the spindle, reducing displacement errors from 12% to 1% after compensation. A more intractable case of deformation is caused by the release of residual stress during the removal of raw material; such deformation remains hidden as long as the workpiece is securely fixed on the machine table. Indirect prediction models of the residual stress distribution [92] are too complicated to be accurate, owing to a large set of remaining uncertainties. In light of this issue, Li et al. [93] introduced a novel responsive fixture for direct inspection of the in-process deformation of large aerospace parts. This smart fixture automatically opens to release the deformation once the built-in stress sensor reaches its threshold. In this way, the process can be adaptively adjusted as long as the final shape is still enveloped by the remaining workpiece. Inspired by this idea, Hohring and Wiederkehr [94] followed up with a similar intelligent fixture to mitigate chatter and compensate for workpiece distortion in high performance machining. Table 4 lists the evolution of workpiece condition monitoring.

Discussion and Future Trend
Data collection and analysis are two essential stages in the data-driven manufacturing process. Depending on the correlation between the captured and target values, manufacturing data can be collected via direct or indirect measurement. Although indirect measurement offers more possibilities, greater scalability across diverse applications and lower cost, it usually requires a physical transfer function that relates the measured value to the target value, which inevitably induces error whenever that correlation contains physical uncertainty. Consequently, the accuracy of indirect measurement is undermined as long as the correlation is not rigorously identified in mathematical terms. This gap has encouraged more studies to improve accuracy by developing various sensor fusion and data analysis methods [95]. From a different perspective, direct measurement using dedicated sensors can achieve high fidelity and accuracy. Although this seems to contradict the big data scenario, where the obtained data is usually trivial and individually inaccurate, the design of dedicated sensors remains an important trend in the manufacturing field for accurate process monitoring and, in turn, precise decision and control. This depicts one possible future of intelligent manufacturing.

Advanced Data Analysis in Data-Driven Manufacturing Process
The manufacturing process is decisive for the whole product life cycle. As elaborated in the previous section, various sensors are being devised and integrated into the machine to enable in-process monitoring by capturing target or correlated values. Nevertheless, the data acquired by these sensors, whether directly or indirectly, gives only a partial view of the manufacturing process. The data still needs post-analysis to be converted into usable knowledge and decisions. Making decisions from data rather than human knowledge has become the dominant trend in data-driven manufacturing. In data analysis, we believe there have been at least three paradigms so far, as depicted in Figure 6, which also illustrates the evolution of the modern manufacturing process.
In the first paradigm of data analysis, a physical model describing the mechanism is developed by an expert. Once the input data and information are imported into the physical model as prior knowledge, a mathematical solver is established to find the optimal solution, i.e., the decision. For example, finite element analysis is a typical data analysis that employs a linear solver to solve a partial differential equation, e.g., for the deformation of a part. Two limitations are inherent in this pipeline. First, physical models developed by human experts usually rest on assumptions and simplifications that deviate from the actual scenario. Second, solving the physical model with limited input data is sometimes ill-posed, which can lead to faulty results. Nevertheless, when data is scarce and expensive to acquire, this paradigm offers a plausible way to interpret the process.
The second paradigm of data analysis uses machine learning techniques to train a shallow encoder that contains an exponentially greater number of unknown parameters than the physical model. After sufficient training on paired feature-result sets, the trained model is capable of producing sensible answers for new input features. Owing to the high generalization ability of machine learning models, this paradigm bypasses the model simplification encountered in the previous one. Nevertheless, the ability of a shallow encoder to process high-dimensional raw data directly is still limited, so it requires careful feature engineering and considerable domain knowledge to reduce the input dimension.
As the density and dimension of raw manufacturing data grow rapidly, the key factor for the final accuracy is how the data is processed in the first place. Motivated by this need, the third paradigm, based on deep learning, can potentially eliminate the error-prone handcrafting of feature extraction, which is instead achieved automatically by a general learning procedure. In this way, the error induced by feature extraction can be greatly reduced. This paradigm is expected to give the best data analysis performance as long as the deep model is fed with sufficient data.
In the following sections, we first provide a comprehensive review of existing methods for feature extraction from raw manufacturing data, given that feature extraction is an essential stage in the first two paradigms. Some typical manufacturing applications of data analysis are then elaborated according to the above three paradigms.

Pre-processing of Manufacturing Data
Manufacturing raw data can be regarded as a sequence of digital bits if not further processed. Data processing is an essential stage, especially for indirect data, to convert

[80]; Dynamometer [84]
2001-2010: Dynamometer [79]; Accelerometer [78,85]
2011-2020: Simulation [86]; Ultrasonic sensor [87]; X-ray diffraction [88]; Vision based system [82,31]; Touch-trigger stylus [89]; Thin film sensor [90]; Responsive fixture [93,94]

[98] calculated the RMS value of the AEDC signal, which was observed to be the feature most sensitive to tool wear. The average RMS feature of the current signal also contributes to the estimation of tool wear [99]. Frequency-domain data processing can extract more intrinsic features from a cyclic data series, especially when the data contains background noise that is hard to distinguish in the time domain. Altintas et al. [100] analyzed the cutting force and chatter stability during dynamic cutting using the Nyquist law in the frequency domain. The analysis of tool vibrations using the fast Fourier transform (FFT) proved an effective means of predicting surface roughness [101]. The frequency spectra of the AE signal were used to evaluate the tool condition in the broaching process [102]. By analyzing the motor current in the frequency domain, sensorless automated condition monitoring was achieved for predictive maintenance of machine tools [103]. FFT was also used to filter noise out of the audible sound energy to achieve better monitoring performance [104].
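As a minimal sketch of the time-domain and frequency-domain features discussed above, the following Python snippet computes the RMS value of a sampled signal and its single-sided amplitude spectrum with a direct discrete Fourier transform. The test signal, the sampling rate and the function names are illustrative assumptions, not data or code from the cited studies; in practice an FFT routine would replace the O(n^2) loop.

```python
import cmath
import math


def rms(signal):
    """Root-mean-square value, a common time-domain feature."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def amplitude_spectrum(signal, sample_rate):
    """Single-sided amplitude spectrum from a direct DFT (demo only, O(n^2))."""
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        # DC and Nyquist bins appear once; all other bins are doubled.
        once = k == 0 or (n % 2 == 0 and k == n // 2)
        amps.append(abs(coeff) / n * (1.0 if once else 2.0))
        freqs.append(k * sample_rate / n)
    return freqs, amps


# Hypothetical vibration signal: a 50 Hz component (amplitude 2.0) plus a
# weaker 120 Hz component (amplitude 0.5), sampled at 1 kHz for 0.2 s.
SAMPLE_RATE = 1000
signal = [2.0 * math.sin(2 * math.pi * 50 * i / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 120 * i / SAMPLE_RATE)
          for i in range(200)]

freqs, amps = amplitude_spectrum(signal, SAMPLE_RATE)
dominant = freqs[max(range(len(amps)), key=amps.__getitem__)]
print(f"RMS = {rms(signal):.4f}, dominant frequency = {dominant:.1f} Hz")
```

The RMS value condenses the whole record into a single number, while the spectrum separates the two components that overlap in the time domain, which is why the two kinds of feature are often used together.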
The FFT gives the entire frequency spectrum, i.e., the frequency composition averaged over the whole record. In practice, sensory data changes dynamically over time. Time-frequency data processing therefore gives a more reasonable outcome by partitioning the time series into short intervals for frequency analysis [29]. Specifically, wavelet analysis and the short-time Fourier transform (STFT) are the two prevalent techniques for analyzing cutting force [105], vibration [106], AE [107,108], current [109] and sound signals [51]. Statistical data processing offers a better way to indicate short-term impulses and analyze the variance between different factors. In particular, Aouici et al. [110] used a statistical analysis of variance (ANOVA) to predict the surface roughness during hard turning. A similar approach using the vibration signal was reported in Ref. [111]. Kannatey-Asibu and Dornfeld found that the skew and kurtosis of the AE signal were sensitive to tool wear [112]. Lu and Wan [113] studied the high-frequency sound signal for tool wear monitoring using class mean scattering criteria. Table 5 lists the processing methods for different manufacturing signals.

[Figure labels: Input data; Feature; Decision]
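The time-frequency idea can be sketched as follows: the signal is cut into short windows, each window is tapered and transformed separately, and the dominant frequency is tracked over time. This is a simplified stand-in for a full STFT, written with the standard library only; the window length, hop size, Hann taper and test signal are assumptions chosen for illustration.

```python
import cmath
import math


def stft_dominant_frequency(signal, sample_rate, window_size, hop):
    """Return (window start time, dominant frequency) for each short window."""
    result = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = signal[start:start + window_size]
        # A Hann taper reduces spectral leakage at the frame edges.
        frame = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (window_size - 1)))
                 for i, x in enumerate(frame)]
        best_k, best_mag = 0, 0.0
        for k in range(1, window_size // 2 + 1):
            mag = abs(sum(x * cmath.exp(-2j * math.pi * k * i / window_size)
                          for i, x in enumerate(frame)))
            if mag > best_mag:
                best_k, best_mag = k, mag
        result.append((start / sample_rate, best_k * sample_rate / window_size))
    return result


# Hypothetical signal whose frequency jumps from 50 Hz to 200 Hz at t = 0.2 s.
SAMPLE_RATE = 1000
signal = [math.sin(2 * math.pi * (50 if i < 200 else 200) * i / SAMPLE_RATE)
          for i in range(400)]

frames = stft_dominant_frequency(signal, SAMPLE_RATE, window_size=100, hop=100)
for t, f in frames:
    print(f"t = {t:.1f} s: dominant frequency = {f:.0f} Hz")
```

A single transform over the whole record would show both frequencies but not when each one occurs; the windowed version recovers the timing at the price of a coarser frequency resolution (10 Hz here).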
Whatever data processing strategy is used, it is a primary stage of the entire data analysis for two reasons. First, raw sensory data is usually high-dimensional and contains stochastic noise; data processing can greatly reduce the dimension and filter out disturbances without losing much valuable information. Second, the extracted low-dimensional features are easier to interpret when developing analytical algorithms that make decisions accordingly, as discussed in the following section.

Tool Condition Analysis
Analyzing the in-process tool condition from limited data is an important issue, through which the manufacturing process can be made more precise and efficient. Tool wear is the condition of greatest concern. Identifying the tool wear mechanism is intractable because it involves physical and chemical processes such as abrasion, adhesion, diffusion and other types of wear during cutting. A few pioneering studies sought to describe the wear mechanism as the sum of brittle fracture, mechanical abrasion, physicochemical mechanisms and others [114]. A recent study [115] developed a fundamental wear model, consisting of cutting and thermal simulations, by using a dedicated tribometer. However, the formulation of all these factors is subject to certain simplifications and assumptions, and calibrating the pending coefficients of the model with limited test data introduces further statistical errors, which together make the physical model inaccurate and unstable for real, complicated machining processes.
Instead of formulating complex and error-prone physical models of the tool wear mechanism, most researchers have estimated tool wear in a more statistical manner, i.e., by fitting historical data to an empirical model to estimate the remaining useful life. Efforts to estimate tool life can be traced back to the early 20th century, when F. W. Taylor [116] proposed the well-known Taylor equation, an empirical model with two unknowns. Since then, various empirical models [117][118][119] and experimental studies [120,121] have been presented, targeting different tool-workpiece combinations. A comprehensive list of empirical tool wear models for dry machining can be found in Ref. [122]. Their procedures were similar: first, a nonlinear formula describing the tool condition was established in advance based on the observer's domain expertise; then, factorial designs of physical experiments were conducted to calibrate the unknowns of the formula; finally, experimental validations were conducted to prove feasibility. Although the prediction accuracy reported in these works can reach 95% in their experimental setups, any slight change in the actual cutting condition is perceived to severely degrade the accuracy. As the demand for accuracy and the complexity of manufacturing processes keep growing rapidly, physical and empirical models have been widely deprecated. Zhao et al. [123] argued that this was mainly due to two reasons. First, the performance of these models was highly dependent on the domain expertise of the observer, and their robustness could not be guaranteed given the uncertainty and complexity of working conditions. Second, these models were unable to evolve with the accumulation of data and were thus insensitive to changing conditions, which led to limited effectiveness and flexibility in real cases. These two deficiencies of the model-based approach introduce a considerable amount of error, not to mention the error from the feature extractor, which together make physical model-based data analysis hardly compatible with wider applications.
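The calibration procedure described above can be illustrated with the Taylor equation in its usual form, V T^n = C, where V is the cutting speed, T the tool life, and n and C the two unknowns. Taking logarithms gives log V = log C - n log T, so both unknowns follow from a straight-line fit. The sketch below is a hedged illustration: the calibration data is synthetic (generated from assumed values n = 0.25 and C = 400), and the function names are hypothetical.

```python
import math


def fit_taylor(speeds, lives):
    """Fit V * T**n = C by least squares on log V = log C - n * log T."""
    xs = [math.log(t) for t in lives]
    ys = [math.log(v) for v in speeds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    n = -slope
    c = math.exp(mean_y + n * mean_x)   # intercept of the fitted line is log C
    return n, c


def predict_life(speed, n, c):
    """Tool life T predicted by the fitted Taylor equation."""
    return (c / speed) ** (1.0 / n)


# Synthetic calibration data generated from n = 0.25, C = 400 (assumed values).
speeds = [100.0, 150.0, 200.0, 250.0]          # cutting speed V, m/min
lives = [(400.0 / v) ** 4 for v in speeds]     # tool life T, min

n, c = fit_taylor(speeds, lives)
print(f"n = {n:.3f}, C = {c:.1f}, "
      f"life at 180 m/min = {predict_life(180.0, n, c):.1f} min")
```

Because n and C are calibrated for one tool-workpiece combination, the fitted curve says nothing about a different material or coolant condition, which is the sensitivity to changing conditions noted above.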
The growth in the volume and veracity of data makes it possible to adopt various machine learning algorithms to predict the tool condition more accurately. Prevalent machine learning techniques for tool condition analysis include the support vector machine (SVM), artificial neural network (ANN), hidden Markov model (HMM) and decision tree. With sufficient training data, an ANN model trained by back propagation can be comparatively accurate for tool wear estimation [124]. Palanisamy [125] compared an ANN against a classic regression model for tool wear estimation and found the ANN to be more robust and accurate owing to its powerful fitting ability. As a statistical learning approach, SVM excels at non-linear classification by mapping the data into a higher-dimensional feature space, in which discretized states of tool wear can be classified. Support vector regression (SVR) is a variant of SVM for continuous regression of the tool wear value. Tool breakage detection [126] and tool wear estimation [127] were successfully carried out via SVM/SVR with a success rate of over 99% when the design parameters of the SVM model were fine-tuned. A more recent study also noted that a hybrid estimator combining an analytic fuzzy classifier (AFC) and SVM can reach higher accuracy in tool wear estimation [128]. Other learning techniques, such as the decision tree classifier [129] and HMM [130], were also applied to tool wear estimation and achieved plausible performance. It was, however, stated in Ref. [129] that the performance of a decision tree classifier combined with PCA was case-sensitive. There has always been a hidden trade-off between the complexity of the learning model and the training cost. To achieve high accuracy, a more complex learning model requires a larger training data set and a heavier computational load; otherwise, overfitting lowers the performance. Most of the aforementioned tool condition analyses depend largely on time series data such as cutting force and vibration. Shallow function approximators like ANN and SVM are technically incapable of dealing with such high-dimensional data and thus require a dedicated feature extractor beforehand [131], as elaborated in Section 4.1 and illustrated in Figure 6(b).
Conceivably, the quality of the extracted features directly affects the accuracy of subsequent operations. An improper choice of feature extractor may fundamentally limit the eventual performance. It would therefore be better to handle the raw data series directly and bypass the feature extraction stage. The development of deep neural networks such as the convolutional neural network (CNN) [132] and the long short-term memory (LSTM) network [133] can satisfy this requirement. In tool condition analysis specifically, Li et al. [134] adopted a CNN to detect tool breakage from the spindle current signal, achieving higher accuracy (93%) than the traditional BP neural network (around 80%). However, time-domain feature extraction was still adopted in this work, so the CNN was regarded only as a traditional machine learning method with higher achievable accuracy. Another recent study [135] monitored the tool wear level from the audio signal using a CNN, and sought to remove the need for feature extraction by using the absolute values of the Fourier transform as input. As a result, the tool wear prediction accuracy reached 96.3%. A convolutional bi-directional long short-term memory (CBLSTM) network was designed in Ref. [123] to eliminate feature engineering in tool health monitoring. In this network, the CNN served as a local feature extractor, while the LSTM handled sequences of varying length and captured long-term dependencies, since tool wear is a time-variant sequential process.
The prominent challenge in adopting deep learning for accurate analysis is the demand for a large volume of labeled data, whose acquisition is extremely costly and time-consuming in many manufacturing applications. For example, identifying the tool tip dynamics of a newly inserted tool needs hundreds of impact tests at different tool postures. In this situation, using historical data to facilitate the training of a new case becomes an appealing potential solution. Chen et al. [136] proposed a transfer learning-based prediction of pose-dependent tool tip dynamics in a five-axis machine, which greatly reduces the number of required impact tests. Sun et al. [137] used deep transfer learning to predict tool life by taking advantage of similar characteristics learnt across different objects. A recent study on tool wear prediction based on meta-learning was proposed by Li et al. [138]. Meta-learning is able to learn the hidden rules behind a variety of different but similar tasks or models. Its adoption in this study successfully predicted the tool wear status under changing cutting conditions with enhanced accuracy, while only a few training samples were needed for a new learning task. This meta-learning approach provides a new perspective for solving manufacturing problems in which the acquisition of data samples is expensive and time-consuming. Table 6 lists the evolution of tool condition analysis.
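The benefit of reusing historical data can be shown with a deliberately small toy example, which is not the method of Refs. [136], [137] or [138]. A linear wear model stands in for a deep network: the slope (wear rate) is learnt from abundant source data and kept fixed, and only the offset is re-estimated from three noisy samples of a new task. All numbers and function names are hypothetical, and the example assumes that the new task really shares the source wear rate.

```python
def fit_line(ts, ys):
    """Ordinary least-squares fit of y = a * t + b; returns (a, b)."""
    mean_t = sum(ts) / len(ts)
    mean_y = sum(ys) / len(ys)
    a = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
         / sum((t - mean_t) ** 2 for t in ts))
    return a, mean_y - a * mean_t


def adapt_offset(a_source, ts, ys):
    """Parameter transfer: keep the source slope, re-fit only the offset."""
    return sum(y - a_source * t for t, y in zip(ts, ys)) / len(ts)


# Source task: plenty of (hypothetical) wear data, wear = 0.8 * t + 0.1.
source_t = [0.1 * i for i in range(1, 51)]
source_y = [0.8 * t + 0.1 for t in source_t]
a_src, _ = fit_line(source_t, source_y)

# Target task: same wear rate, different offset (0.5), three noisy samples.
target_t = [1.0, 1.1, 1.2]
target_y = [1.27, 1.38, 1.49]   # 0.8 * t + 0.5 with noise -0.03, 0.0, +0.03

a_new, b_new = fit_line(target_t, target_y)            # trained from scratch
b_transfer = adapt_offset(a_src, target_t, target_y)   # slope transferred

t_query, true_wear = 5.0, 0.8 * 5.0 + 0.5
scratch_pred = a_new * t_query + b_new
transfer_pred = a_src * t_query + b_transfer
print(f"scratch error  = {abs(scratch_pred - true_wear):.2f}")
print(f"transfer error = {abs(transfer_pred - true_wear):.2f}")
```

With only three closely spaced samples, the model fitted from scratch misjudges the slope and extrapolates poorly, whereas the transferred slope leaves a single parameter to estimate. If the new task had a different wear rate, the frozen slope would instead introduce a bias, which is the usual caveat of parameter transfer.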

Process Condition Analysis
Process condition analysis is a typical classification task. In the machining process, the condition can be categorized into idling, stable and chatter states. Timely and precise identification of the process condition is always desired for adaptive adjustment of process control. Previous studies made important progress in identifying the mechanism of the cutting process. Budak and Altintas [139] explored the mechanism of chatter during milling and proposed a physical model to identify the chatter stability induced by the dynamic milling forces. In this study, the cutter is simplified as a two degree-of-freedom system subject to a dynamic radial force, from which the theoretical chatter stability lobe was derived. In addition, the calculation of the dynamic cutting force is simplified using a numerical method. This plausible offline solution may not satisfy real machining cases in practice [45], as it requires a complete analysis of the machine dynamics, including the spindle, tool holder, tool and workpiece, which is not only hard to identify precisely but also requires tedious calibration for different process conditions. The simplifications and unpredictable systematic bias further reduce the accuracy of offline analysis. Although researchers extended this theory to more complex situations, e.g., five-axis machining [140], the extensions were still of limited use since the fundamental gap was not completely filled.
When it comes to online identification of the process condition, the preferred option is to diagnose as early as possible in order to prevent workpiece damage. Traditional estimation algorithms, such as maximum likelihood [141], achieved a high success rate but lacked the ability for early prediction. The main reason is that subtle features are prone to be overlooked before they become prominent. Machine learning methods have been employed in this task for their superiority in classification, especially in hard-to-recognize scenarios. In particular, acceleration signals were analyzed using the wavelet transform and SVM; this combination was able to detect the transition state between the stable and chatter states, with an accuracy of over 95% [142]. In this way, chatter could be suppressed in its infancy. Later, neural network approaches were also developed for process condition classification using the vibration signal [143]. In addition to the feature generation that is mandatory for traditional machine learning approaches, this work introduced a feature selection strategy based on envelope analysis to rank the features according to their entropy, and only the high-ranking features were selected for classification. This operation reduced the error from irrelevant features and hence increased the final accuracy.
To further reduce the error induced by feature extraction, deep learning methods were also used for machining process condition analysis. Among existing deep learning algorithms, the CNN is known for its powerful image (second-order tensor) processing and classification capability. However, most data captured from the machining process takes the form of a first-order tensor (time sequence), which is not practical to process directly with a CNN. Fu et al. [144] transformed the measured signals into plotted images and employed a convolutional neural network to achieve real-time identification of the cutting vibration state. This work realized direct use of the original signal sequence for cutting state monitoring, with over 99.5% accuracy in most test cases. The deep belief network (DBN) has mainly been applied to voice and speech recognition [145]. As the in-process vibration signal is similar to a voice signal, Fu et al. [146] proposed a DBN approach for cutting state monitoring. It turned out that the DBN can steadily achieve high performance on the raw vibration signal without much data preparation.
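One simple way to turn a first-order time sequence into a second-order image is to rasterize its plot, with columns as time samples and rows as amplitude levels. The sketch below illustrates that idea only; how Ref. [144] actually generated its images is not described here, and the function name, image height and test signal are assumptions.

```python
def signal_to_image(signal, height):
    """Rasterize a 1-D signal into a height x len(signal) binary image."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0          # avoid division by zero for flat signals
    image = [[0] * len(signal) for _ in range(height)]
    for col, x in enumerate(signal):
        # Row 0 is the top of the plot, so larger amplitudes map to
        # smaller row indices.
        row = round((hi - x) / span * (height - 1))
        image[row][col] = 1
    return image


# Hypothetical five-sample signal rendered on a three-row grid.
image = signal_to_image([0.0, 1.0, 0.0, -1.0, 0.0], height=3)
for row in image:
    print("".join("#" if pixel else "." for pixel in row))
```

The resulting array has the two-dimensional layout that convolutional layers expect, so the same network architectures used for image classification can be applied to the signal.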
Since data is relatively convenient to acquire during the manufacturing process, most deep learning approaches already achieve very promising accuracy in their case studies. Still, conditions can be quite different in real machining, where various materials, tools and parameters are combined in each individual task. Transfer learning has been attracting more attention for dealing with varying conditions [147] and has proved effective for chatter detection, with accuracy up to 95%. This learning technology not only reduces the data needed to train a deep model but also increases model versatility in adapting to complex manufacturing scenarios. Table 7 lists the evolution of process condition analysis.

Part Condition Analysis
The well-being of the in-process part directly affects the quality of the final product. Surface roughness [77] and part dimensional error [148] are the two aspects of greatest concern, since they reflect the manufacturing quality at the microscopic and macroscopic scales, respectively. For the former, physical models and regression based on experimental data are the two mainstream approaches to understanding the surface roughness mechanism. Lin and Chang [149] established a surface topography simulation model incorporating the effects of tool  [101]. Regression analysis was adopted to handle the experimental data.
The dimensional error of in-process part can be categorized into plastic deformation caused by residual stress and elastic deformation caused by large cutting load. Finite element method (FEM) was a primary choice for the evaluation of these two types of deformation, due to the large uncertainty of part shape and stress distribution during the process. The distortion of thin-walled workpiece induced by machining residual force was predicted using a modified finite element model [151]. The combination of experimental results with FEM was proposed to predict the shape deviation of complex geometry [152]. Elastic deflection also induces machining error, especially for thin-walled part. Wan et al. [153] estimated the cutter deflection using a simple cantilever beam model, and the workpiece deflection using FEM simulation. The induced error was compensated accordingly [154].
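The cantilever beam idealization mentioned above can be written down in a few lines. For a point load F at the free end of a beam of length L, the tip deflection is F L^3 / (3 E I), with I = pi d^4 / 64 for a solid circular cross-section. The sketch below is an illustration under stated assumptions rather than the model of Ref. [153]: the cutter is treated as a solid cylinder (flutes ignored), and the load, overhang, diameter and Young's modulus are hypothetical values.

```python
import math


def cantilever_tip_deflection(force_n, length_m, diameter_m, youngs_modulus_pa):
    """Tip deflection (m) of a solid round cantilever under an end load."""
    second_moment = math.pi * diameter_m ** 4 / 64.0   # area moment, m^4
    return force_n * length_m ** 3 / (3.0 * youngs_modulus_pa * second_moment)


# Assumed example: 100 N radial force, 10 mm diameter cutter, E = 600 GPa.
short = cantilever_tip_deflection(100.0, 0.05, 0.01, 600e9)   # 50 mm overhang
long_ = cantilever_tip_deflection(100.0, 0.10, 0.01, 600e9)   # 100 mm overhang
print(f"50 mm overhang:  {short * 1e6:.1f} um")
print(f"100 mm overhang: {long_ * 1e6:.1f} um")
```

Since the deflection grows with the cube of the overhang, doubling the tool length multiplies the deflection by eight, which is one reason why slender tools and thin-walled parts dominate the dimensional error budget.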
Both analytical models and FEM have to make a great deal of simplifications, since accurate prediction of surface roughness and part deformation requires a tedious trial-and-error process and excessive computing power. For online analysis, the trade-off between model complexity and performance has always been a difficult one. In light of this issue, machine learning algorithms started to take over online quality analysis with higher performance. As for surface roughness prediction, although great effort has been spent investigating its mechanism, the mechanism varies with different processes and conditions. Any sophisticated physical model is effective only in a limited range of applications. In this case, ANN has been widely adopted [155,156] in both turning and milling. Using a small number of training samples, an ANN is capable of generating accurate predictions but requires a good design of the network structure. Compared with linear and exponential regression models [155], neural networks were found to give better predictions of surface roughness. The support vector regression (SVR) method was also used for roughness prediction. A comparison of three types of SVR and ANN was conducted in Ref. [157]; the results showed that SVR can achieve a prediction accuracy as high as 95%, while ANN was slightly lower (91.4%) and required more computational time.
When it comes to dimensional error prediction, online prediction with real-time compensation has always been the preferred choice. Li et al. [158] developed a soft-touch sensor, which provides proximity information when the tool approaches the workpiece, and a neuro-fuzzy network for predicting machining errors. This hybrid learning system succeeded in precisely predicting the aggregate of thermal error, force-induced deflection error and errors from other sources in turning. Another dimensional error prediction, for the milling process, was achieved using ANN [159]. In this work, a data set of process parameters that can affect dimensional errors was obtained through experiments. The large number of influencing parameters led to the choice of ANN, which after training generated more accurate models than the previous empirical models.
Conventional machine learning approaches meet the demand for real-time prediction of surface roughness and part deformation. A foreseeable trend is more precise identification of part conditions, such as the types of defect and crack, by further exploiting advanced vision-based sensors. Towards this goal, traditional shallow learning approaches require manually defined feature descriptors computed from the captured raw pixels, while deep networks are able to process the raw data directly. In particular, the CNN serves as a primary choice for surface inspection tasks. A max-pooling CNN was developed in Ref. [160] to identify steel defects with an error rate of 7%, which outperformed the best classifier trained on handcrafted feature descriptors (15%). Part et al. [161] showed that a CNN can inspect 250 times faster than manual inspection without sacrificing accuracy. Ren et al. [162] proposed a CNN-based feature extractor for pixel-wise surface inspection, which did not require large-scale training data because it used a pretrained model. A heat map showing the distribution of defects was then generated to identify seven types of defects using image processing algorithms. This work showed improved accuracy in both classification and segmentation tasks for all seven defect types. Crack identification was also realized using a deep RBM on consumer-grade camera images [163], which provides an alternative to the CNN. In terms of part deformation prediction and control, the responsive fixture made it possible to measure and accumulate online deformation data for different parts at different machining stages. Such data enables the training of a mixed deep learning model, as proposed by Zhao et al. [164], to predict the part deformation and make process adjustments at an early stage.
As can be concluded from previous studies, most deep learning-based part condition analysis takes images as raw input. When the amount of training data is limited, a deep neural network such as a CNN can easily overfit, which jeopardizes the accuracy. To reduce this data dependency, Ferguson et al. [165] trained a CNN on openly available image datasets and used transfer learning to adapt the pre-trained model to defect detection using a small X-ray dataset. Cheng et al. [166] applied parameter-based transfer learning to model shape deviations in additive manufacturing, one particular example that represents the future trend. Table 8 lists the evolution of part condition analysis.

Discussion and Future Trend
Data analysis has been comprehensively reviewed from three aspects of the manufacturing process: tool condition, process condition and part condition. The evolution of data analysis in all three aspects follows the same path, from physical modeling to machine learning and, in the state of the art, deep learning.
Due to the great complexity of manufacturing processes, physical models introduce noticeable errors. First of all, constructing a physical model requires domain expertise, which may carry cognitive bias about the actual mechanism. Manufacturing processes usually comprise intricate and unstable physical and chemical phenomena that are hard to model precisely, so a certain level of assumption and simplification is inevitable. A mathematical solution based on limited observable data is sometimes ill-posed, which leaves the final physical model too inaccurate to deliver satisfactory results. The development of machine learning techniques inaugurated a new paradigm for analyzing manufacturing processes without manually developing complicated but inaccurate physical models. After handcrafted feature extraction and training, a complex manufacturing process can be modeled in a more unified, efficient and effective way. Through the training process, hidden and obscure correlations between the input and output can be unveiled. Nevertheless, even the most powerful handcrafted feature extractor cannot guarantee that nothing is lost or distorted relative to the raw data, which directly affects the final accuracy. This dilemma is largely resolved by deep neural networks, in which features are extracted automatically rather than designed separately by hand. Consequently, when data is abundant, deep learning achieves better performance than conventional machine learning approaches.
From a different perspective, utilizing deep learning may be troublesome in the manufacturing field, since meaningful manufacturing data is not as easy to acquire as data from the internet. Advanced machine learning technologies, such as transfer learning [136] and meta learning [138], have already been applied successfully in manufacturing applications where data acquisition is expensive and slow. It is foreseeable that more advanced machine learning methods for dealing with insufficient data will emerge and be applied to manufacturing processes.
At the state of the art of machine learning, new types of algorithms for various tasks are being developed. Specifically, deep reinforcement learning using a deep Q-network was proposed by Google DeepMind [167], which opened up a new era of learning successful policies directly from high-dimensional inputs and achieving human-level performance in game play. The same group later proposed a meta-reinforcement learning system inspired by the activity of the dopamine system in the human brain [168], which expedited learning from past experience. These findings in reinforcement learning could open new possibilities for manufacturing systems to learn rules from source data and realize true automation [169]. The recently reported domain-transform manifold learning achieved great success in noise-reduced image reconstruction from raw sensory data [170], and could also be a promising tool in the manufacturing data pre-processing stage for higher fidelity.

Conclusions
Manufacturing data collection and analysis are the key enablers of data-driven manufacturing. As the two crucial components of a manufacturing monitoring system, they have been evolving to cater to increasing demands in modern manufacturing. The development of these two components has been thoroughly investigated from the literature, with the conclusions depicted in Figure 7. In terms of data collection, valuable data is measured via sensors in most manufacturing circumstances, by either direct or indirect measurement. While indirect measurement has been more widely adopted in recent manufacturing applications for its cost-effectiveness and high compatibility, it still suffers from considerable discrepancies in accuracy. Although great efforts have been made to reduce the error, indirect measurement is theoretically incapable of high-precision measurement due to the uncertainty and simplification of the correlation between the target value and the measured value. Direct measurement, on the other hand, encounters incompatibility issues in some manufacturing cases, but it is expected to be ultimately adopted for its high fidelity and achievable accuracy. There have already been sporadic developments of advanced sensors that can directly measure in-process data without disturbing the process condition.
Data analysis is another crucial phase in data-driven manufacturing, in which diagnoses, predictions and other decisions are made based on the obtained data. Three paradigms of data analysis, i.e., physical modeling, conventional machine learning and deep learning, have been investigated in this paper. Developing physical models to describe high-dimensional nonlinear manufacturing processes is challenging and yields low accuracy. These physical models are usually oversimplified and contain observation bias, and are thus theoretically incapable of accommodating the increasing demand for accurate data analysis. Machine learning based data analysis was developed to resolve this issue by combining a generic model, such as a neural network, with a training process, which can successfully overcome the model simplification issue. Given a sufficient training procedure, the learning model can achieve high nonlinearity to describe an arbitrarily complex process. While shallow learning models can only deal with low-dimensional data due to their limited capacity, deep learning based data analysis can achieve end-to-end modeling from the raw sensory data to the final decision. The deep stacks of layers, combined with a dedicated training process, automatically learn to extract useful features without human intervention. In practice, however, since the amount of manufacturing data is usually limited, advanced machine learning techniques that require fewer training samples, such as transfer learning and meta learning, have been investigated in some recent studies to achieve better results and handle varying conditions.

Outlook
Thanks to the development of advanced sensing and data analysis technologies, modern manufacturing achieves higher efficiency, accuracy and self-diagnosis capability through the extensive use of data. Direct process monitoring combined with advanced machine learning technologies has achieved remarkable effectiveness and will perhaps lead the development of data-driven manufacturing. Although deep learning has obtained huge success in a variety of fields, training a deep model in a manufacturing scenario remains challenging due to the prolonged time and cost needed for collecting sufficient labeled data. To overcome this crucial deficiency, two directions are suggested. First, although it is theoretically impractical to train a high-performance deep model with insufficient sample data, one can adopt few-shot learning to extract common rules from existing well-trained knowledge instead of training from scratch. Another potential direction is to combine physical mechanisms, such as Newton's laws and energy conservation, with machine learning models in order to take advantage of both, which would significantly reduce the amount of training data needed and enhance the generalization of the trained model. These are perhaps among the future shapes of data-driven smart manufacturing.
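The second direction can be illustrated with a minimal sketch, assuming a deliberately simple setting that is not taken from any cited work: a linear model a = w * F + b is fitted to two noisy force and acceleration measurements, and a penalty weighted by lam pulls the model towards Newton's second law, a = F / m, at unlabeled collocation forces. The function names, the mass, the data values and the weight lam are all hypothetical choices for illustration.

```python
def fit_line(points):
    """Closed-form weighted least squares for a = w * F + b.

    points: list of (F, a, weight) triples.
    """
    s_xx = s_x = s_1 = s_xy = s_y = 0.0
    for f, a, wt in points:
        s_xx += wt * f * f
        s_x += wt * f
        s_1 += wt
        s_xy += wt * f * a
        s_y += wt * a
    det = s_xx * s_1 - s_x * s_x  # determinant of the 2x2 normal equations
    w = (s_xy * s_1 - s_x * s_y) / det
    b = (s_xx * s_y - s_x * s_xy) / det
    return w, b


def physics_informed_fit(forces, accels, mass, colloc_forces, lam):
    """Minimize mean data error plus lam times the mean physics residual.

    The physics residual is (w * F + b - F / mass)**2, evaluated at
    collocation forces that need no measured acceleration.
    """
    n = len(forces)
    points = [(f, a, 1.0 / n) for f, a in zip(forces, accels)]
    if lam > 0:
        m = len(colloc_forces)
        points += [(f, f / mass, lam / m) for f in colloc_forces]
    return fit_line(points)


# Assumed toy data: true law is a = F / 2, but the two measurements are noisy.
mass = 2.0
forces = [1.0, 2.0]
accels = [0.7, 0.8]
colloc = [0.0, 1.0, 2.0, 3.0, 4.0]

w_data, b_data = physics_informed_fit(forces, accels, mass, colloc, lam=0.0)
w_phys, b_phys = physics_informed_fit(forces, accels, mass, colloc, lam=10.0)
```

With these assumed values, the data-only fit gives a slope of about 0.1, far from the true 1 / m = 0.5, while the physics-regularized fit gives a slope of about 0.495. The physical law supplies constraints at points where no measurement exists, which is why fewer labeled samples are needed. In a deep model the same penalty would be added to the training loss rather than solved in closed form.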