Internal Defects Detection Method of the Railway Track Based on Generalization Features Cluster Under Ultrasonic Images

Railway tracks may contain several kinds of internal defects with different shapes and distribution rules, and these defects affect the safety of high-speed trains. Establishing reliable detection models and methods for these internal defects remains a challenging task. To address this challenge, in this study, an intelligent detection method based on a generalization feature cluster is proposed for internal defects of railway tracks. First, the defects are classified and counted according to their shape and location features. Then, generalized features of the internal defects are extracted and formulated based on the maximum difference between different types of defects and the maximum tolerance within the same type of defect. Finally, the extracted generalized features are expressed by function constraints, and formulated as generalization feature clusters to classify and identify internal defects in the railway track. Furthermore, to improve the detection reliability and speed, a reduced-dimension method of the generalization feature clusters is presented in this paper. Based on this reduced-dimension feature and strongly constrained generalized features, the K-means clustering algorithm is developed for defect clustering, and good clustering results are achieved. Regarding the defects in the rail head region, the clustering accuracy is over 95%, and the Davies-Bouldin index (DBI) is negligible, which supports the validity of the proposed generalization features with strong constraints. Experimental results show that the accuracy of the proposed method based on generalization feature clusters is up to 97.55%, and the average detection time is 0.12 s/frame, which indicates that it performs well in adaptability, accuracy, and detection speed under complex working environments. The proposed algorithm can effectively detect internal defects in railway tracks using an established generalization feature cluster model.


Introduction
thresholding. Furthermore, Zhang et al. [11] proposed an automatic railway visual detection system (RVDS) for surface defects, and presented an algorithm for detecting and extracting the region of interest, which enables the identification and segmentation of defects on the rail surface. Compared with rail internal defects, rail surface defect images are easier to acquire, because a high-speed camera can be used and the imaging standard stays consistent. In contrast, rail internal defects cannot be visualized directly, but can be detected using ultrasonic testing, eddy current testing, and magnetic flux leakage testing [12]. Currently, ultrasonic testing is widely employed in rail internal defect detection because of its high penetration, excellent directivity, and high sensitivity [13]. However, owing to the differences in ultrasonic acquisition equipment, it is difficult to obtain a unified imaging standard for rail ultrasonic B-scan images. It is noteworthy that the feature expression of a defect is universal and can be described by a generalized feature. The B-scan image is obtained from the echo of the ultrasonic probe and displays a cross-section of the rail. Previous studies have been conducted on the internal defect detection of railway tracks based on ultrasonic images. Cygan et al. [14] analyzed the advantages of B-scan image processing compared with A-scan signal analysis for rail internal defect detection. A-scan analysis is a flaw-detection method that evaluates the size and position of defects based on the amplitude and position of the defect waves. However, the defect geometry cannot be determined directly. Huang et al. [15] and Sun et al. [16] employed a neural network model and deep learning, respectively, to analyze rail B-scan images and achieve internal defect detection. Liang et al. [17] proposed an improved imaging algorithm for rail defect identification, which is beneficial for the acquisition of high-quality B-scan images and the design of a high-accuracy detection algorithm for internal rail defect images. However, it remains challenging to achieve high-precision detection of different types of defects. Therefore, it is essential to develop an accurate and high-speed detection algorithm for internal rail defects expressed by ultrasonic B-scan images.
The main types of rail internal defects include (1) fatigue crack of the screw hole, (2) rail head defect, (3) crushing flaw of the rail bottom, (4) transverse cracking of the rail bottom, and (5) material degradation of special parts [18]. The fatigue crack of the screw hole can be located at different positions of the screw hole. The internal defect of the rail head is mainly a flaw, including black and white flaws, named after the color indicated in the real environment. The crushing flaw of the rail bottom includes transverse and longitudinal cracks of the rail bottom. Nondestructive testing is usually conducted regularly to monitor track health, so that such internal defects do not compromise traffic safety.
Currently, the mainstream method for detecting internal defects in railway tracks is to analyze and identify defect features based on ultrasonic images [19]. To detect such defects, the ultrasonic rail detection system developed by the SPERRY and TOKIMEC companies can recognize and classify defects in real time. The RTI company has developed an automatic defect identification system based on a neural network model with a learning function [20]. In the mentioned Ref. [15], the Chinese Academy of Railway Sciences has developed a B-scan image rail defect classification system based on pattern recognition, whose recognition rate is approximately 95%. However, the accuracy of the existing ultrasonic detection system does not meet actual detection requirements.
To address the challenge of low detection accuracy, an internal defect detection method for railway tracks is proposed based on the generalization feature cluster in this study. First, the defects are classified and counted by analyzing their location and geometric features. Furthermore, according to the maximum difference between different types of defects and maximum tolerance of the same type of defects, the generalization features of the defects are extracted. Finally, generalization feature clusters for various defect types are established to classify the internal defects of the railway tracks. On this basis, strongly constrained generalization feature clusters are formulated after dimension reduction, and the K-means clustering algorithm is developed to cluster defects. The experimental results demonstrate that the proposed method can be employed to detect internal defects with high accuracy and detection speed.
The remainder of this paper is organized as follows. First, the principles of ultrasonic image detection and defect classification are analyzed in Sect. 2. Section 3 presents the generalization feature clusters of different types of defects and the K-means clustering algorithm based on strongly constrained generalization features. The experimental results and analysis of the proposed method are presented in Sect. 4. Finally, the conclusions are presented in Sect. 5.

Principle of Internal Ultrasound Image Detection of Railway Track
Ultrasound has the advantages of a fast propagation speed and broad applicability [21] in nondestructive testing. According to the propagation features of ultrasonic waves in different media, the probe emits sound waves of a certain frequency into the rail. If a rail defect is present, a defect wave appears in front of the bottom wave and the peak value of the bottom wave decreases or disappears. Furthermore, the size and position of the defects can be evaluated using the reflected signal. The ultrasonic echo signal is amplified, filtered, and converted by the ultrasonic receiver, and the resulting electrical signal is drawn and displayed as a digital image, which is called a B-scan image. Therefore, the acquisition process of the rail B-scan image is the collection process of rail defect data. The acquisition mechanism of the B-scan image in the railway track is illustrated in Figure 1. The coupling liquid is sprayed on the contact surface between the probe and rail to prevent attenuation of the ultrasonic signal energy, and water is selected as the coupling agent. The shape, position, and depth of the defects are detected by the features of ultrasonic propagation, reflection, and refraction in railway tracks [22]. Usually, an ultrasonic pulse propagates inside a track to detect internal defects. If it encounters cracks and flaws with different acoustic impedances, it will produce primary and secondary reflections [23]. The defect shape in the railway track can be imaged and displayed [24] by analyzing the magnitude, quantity, and waveform of the reflected wave.
Ultrasonic probes with different angles can detect defects in different locations of the railway track and distinguish them using different colors. As illustrated in Figure 2, the 0° probe generates ultrasonic longitudinal wave beams, which are utilized to detect horizontal cracks in the screw holes, as indicated in red in the B-scan image. The 70° probe generates ultrasonic shear wave beams and detects rail head flaws by primary or secondary waves. In addition, the rail head flaws are indicated in red, green, and blue in the B-scan image. The 37° probe generates ultrasonic shear wave beams and detects other types of defects. A rail B-scan image is imaged using a reflection echo of 0°, 37°, and 70° probes [25]. The B-scan image of color ultrasound for the normal railway track and the actual B-scan image containing defects are illustrated in Figure 3.

Classification Method of Railway Track Internal Defects
A color B-scan image can provide projected-sectional features of defects, with respect to the normal incident wave. In addition, it provides the horizontal location and depth information of the internal defects in the railway track. As presented in Table 1, 12 types of internal railway track defects can be detected by ultrasound. For example, the type 1 defect is an inside flaw in the rail head. It can be detected by a 70° probe and is indicated in red in the ultrasonic B-scan image. To illustrate these defect types, some of them are shown as examples in Figure 4. Here, the inside of the rail head and screw hole are both near the center of the railway. Therefore, defects of types 1, 2, 3, and 4 can be illustrated clearly in Figure 4, and other types of defects can also be illustrated in a similar manner. However, it should be noted that inverted cracks can only be observed on one side of two screw holes close to the rail end face. Furthermore, defects are classified according to ultrasonic imaging mode, defect location, defect cause, defect features, and conventional railway track internal defect classification methods [26]. The acquired ultrasonic images may have some differences, as illustrated in Figure 3(b). The acquisition process may cause random clutter, image coverage, and image fracture. As illustrated in Figure 5, all types of defects are labeled, and are presented in Table 1. There are various clutter types in the black circular region, and the joint is in the yellow triangular region. Rail head flaws vary in length and appear in pairs or separately. The lower crack in the screw hole is connected to the screw hole. There are differences in the thickness and length of the rail bottom defect image contours. Furthermore, defects appear as paired or single image contours that vary in shape, which makes it difficult to describe defect features with an accurate model. Therefore, this study proposes a detection method based on a generalization feature cluster to address this challenge.

Defects Feature Analysis and Detection
Several types of defects and patterns arise in railway tracks. Building a precise defect feature model with existing samples to detect such defect images often results in over- or under-constraint [27]. This study addresses this challenge as follows. First, the existing samples are analyzed to extract features for each type of defect. Second, the extracted features are generalized to increase their fault tolerance and generalization, which means that the generalized features not only exhibit the general characteristics of the same type of defects, but also eliminate differences within that type, such as position and shape, as much as possible. Then, according to the correlation and non-correlation among the defects, generalization feature clusters are built. Finally, generalization feature clusters are employed to detect the B-scan image of the railway track. They capture the common features of the same type of defects and the non-correlation characteristics between different types of defects, which can effectively avoid the mentioned challenge, shorten the detection time, and improve the detection accuracy.
Practically, because of the influence of different acquisition equipment, working environment, and other factors, B-scan images usually have different features, such as shape, contour, and position. It is difficult to describe defects accurately using conventional features or a single feature because of its poor generalization and application ability. Therefore, in this study, multiple generalization features are combined to build a generalization feature cluster, which is utilized to achieve better defect detection and expand the applicable range of the defect detection model.

Ultrasonic Image Preprocessing
Image noise usually affects the detection accuracy and stability. It is necessary to eliminate the noise before extracting defect features. The image preprocessing is illustrated in Figure 6. First, according to the color difference of defects, the image is separated into different color channels. Second, the morphological method is employed to remove image noise that emerged from image acquisition processes due to artificial or environmental factors. Specifically, binary B-scan images are dilated before being eroded, and the morphological operator is set according to the contour direction. As illustrated in Figure 7, there are two morphological operators for the lower crack processing of the screw hole. Then, the ultrasonic image is standardized using the skeleton extraction algorithm [28], which removes the differences in image contour thickness caused by the sensitivity of the ultrasonic probe. As illustrated in Figure 8(a), the image acquired by the ultrasonic detector is indicated in red, green, and blue. Each color represents a defect detected by probes with different angles. First, as illustrated in Figure 8(b)-(d), the image is separated into red, green, and blue images by channel separation processing. Second, image denoising for each single-channel image can eliminate the influence of small clutter on the subsequent defect evaluation, and Figure 8(e) is eventually obtained. Then, as illustrated in Figure 8(f), to improve the detection accuracy, the single-channel image is refined to obtain a standardized monochrome ultrasonic image. Finally, the defect is located and detected using the proposed algorithm.
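The dilate-then-erode step described above can be sketched on a small binary grid. This is only an illustration, not the authors' implementation: the function names and the 1×3 horizontal operator `HORIZONTAL` are assumptions, whereas the paper sets the operator according to the contour direction (Figure 7). Pixels outside the image are treated as background.

```python
def dilate(img, offsets):
    """Binary dilation: a pixel becomes 1 if any pixel under the operator is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(any(
                0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                for dr, dc in offsets))
    return out


def erode(img, offsets):
    """Binary erosion: a pixel stays 1 only if every pixel under the operator is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(all(
                0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                for dr, dc in offsets))
    return out


def dilate_then_erode(img, offsets):
    """Dilation followed by erosion with the same (symmetric) operator."""
    return erode(dilate(img, offsets), offsets)


# Hypothetical 1x3 horizontal operator, given as (row, col) offsets.
HORIZONTAL = [(0, -1), (0, 0), (0, 1)]

# A horizontal contour with a one-pixel break in the middle.
broken = [[0, 1, 1, 0, 1, 1, 0]]
repaired = dilate_then_erode(broken, HORIZONTAL)
print(repaired)
```

With an operator aligned to the contour direction, the one-pixel break is bridged while the contour's endpoints stay where they were; an operator with a different orientation would be needed for oblique contours.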

Model Analysis of Generalization Features Cluster
The generalization feature cluster is defined as the overall evaluation of feature comparison, color recognition, and the Euclidean distance measurement. This model is expressed by Eq. (1), where $a_i$ is a feature of the type $i$ defect, such as its position region, area, length, or slope. $f_i(a_i)$ denotes the feature function corresponding to $a_i$, and $T_i^{\min}$ and $T_i^{\max}$ are the lower and upper thresholds of $f_i(a_i)$, respectively. Their values differ between defect types. $C_i$ is the color composition of the type $i$ defect. $R_i$, $G_i$, and $B_i$ represent the red, green, and blue color components of the type $i$ defect, respectively. It should be noted that not all kinds of defects have all three channel colors. $(x_i, y_i)$ and $(x_0, y_0)$ are the centroid coordinates of the type $i$ defect and of its corresponding reference, respectively. $D_i^{\min}$ and $D_i^{\max}$ are the lower and upper limits of the distance threshold, respectively, and their values also differ between defect types. Therefore, a defect can be classified as a type $i$ defect if it satisfies Eq. (1). Thus, every defect is evaluated using the same evaluation method.
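The three kinds of constraint named in this model (feature thresholds, color composition, and centroid distance to a reference) can be combined into a single membership test. The sketch below is one possible reading, not the authors' code: the class `ClusterSpec`, the function `matches`, the feature names, and every numeric threshold are hypothetical placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterSpec:
    """Hypothetical container for one defect type's constraints."""
    feature_bounds: dict   # feature name -> (T_min, T_max)
    colors: frozenset      # allowed channels, a subset of {"R", "G", "B"}
    reference: tuple       # reference centroid (x0, y0)
    dist_bounds: tuple     # (D_min, D_max)


def matches(spec, features, color, centroid):
    """Return True if a contour satisfies all three constraint groups."""
    # 1) Every feature value must lie within its threshold interval.
    for name, (t_min, t_max) in spec.feature_bounds.items():
        if not t_min <= features[name] <= t_max:
            return False
    # 2) The contour colour must belong to the type's colour composition.
    if color not in spec.colors:
        return False
    # 3) The Euclidean distance to the reference must lie within its limits.
    d = math.dist(centroid, spec.reference)
    return spec.dist_bounds[0] <= d <= spec.dist_bounds[1]


# Illustrative values only; real thresholds come from sample statistics.
inner_flaw = ClusterSpec(
    feature_bounds={"area": (5, 200), "height_ratio": (0.0, 0.3)},
    colors=frozenset({"R"}),
    reference=(100.0, 20.0),
    dist_bounds=(30.0, 500.0),
)

print(matches(inner_flaw, {"area": 40, "height_ratio": 0.1}, "R", (160.0, 20.0)))
print(matches(inner_flaw, {"area": 40, "height_ratio": 0.9}, "R", (160.0, 20.0)))
```

Keeping one specification per defect type means that every contour is evaluated by the same routine, and only the thresholds change between types.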

Generalization Features Clusters Analysis of Defects
The defects presented in Table 1 and Figure 5 are considered. First, auxiliary lines in the image are analyzed and positioned to divide the image into three parts: rail head, rail waist, and rail bottom. On this basis, different position regions are utilized as the original features for rough classification. Second, to identify the defect more accurately, the features of the defect, such as color, area, aspect ratio, and location relationship, are analyzed and extracted. To further increase the tolerance of the extracted features to the same type of defect, the extracted features must be transformed into generalization features. Finally, a suitable generalization feature cluster comprising different generalization features is employed to detect the defect. The process of building generalization feature clusters for the rail head, rail waist, and rail bottom is analyzed and explained as follows.

Rail Head
As illustrated in Figure 9, defects in the rail head include joints, inner flaws, outer flaws, middle flaws, and clutter. Five generalization features, including the position region, defect color, defect area, defect height ratio, and distance between the defect and joint, are selected and formulated to build a generalization feature cluster. The gap between the two rail joints is much larger than that of the rail head flaw. The joints are detected by probes of 0°, 37°, and 70°, and the image contour colors are red, green, and blue. The ratio of their lengths to the depth of the rail head is large. Rail-head flaws can be divided into inner, middle, and outer flaws according to their different positions, which are distinguished by red, green, and blue, respectively. In addition, the flaw area in the rail head is smaller than the joint area, and the ratio of its length to the rail head height is negligible. After image preprocessing, a negligible number of larger clutters were recognized as defects by generalization feature clusters. Usually, larger clutter is caused by the separation of the ultrasonic probe from the track surface, which leads to a longer oblique line in the rail head. These generalization features are expressed by Eqs. (2)-(5) for the joint and the three types of rail-head flaws. In Eqs. (2)-(5), $(x_c, y_c)$, $(x_{r1}, y_{r1})$, $(x_{g1}, y_{g1})$, and $(x_{b1}, y_{b1})$ are the centroid coordinates of the joint, inner flaw, middle flaw, and outer flaw in the rail head, respectively. $C_c$, $C_{r1}$, $C_{g1}$, and $C_{b1}$ denote the color compositions of the joint, inner flaw, middle flaw, and outer flaw of the rail head, respectively. $S_c$, $S_{r1}$, $S_{g1}$, and $S_{b1}$ are the areas of the joint, inner, middle, and outer flaws in the rail head, respectively. $l_c$, $l_{r1}$, $l_{g1}$, and $l_{b1}$ are the lengths of the joint, inner, middle, and outer flaws of the rail head, respectively. In addition, $S_0$ is an area constant related to the image resolution, $H$ is the height of the contour circumscribed rectangle for the joint, and $H_t$ is the height of the rail head region. $H_{0d}$ is the ratio of the height of the contour circumscribed rectangle of the joint to the height of the rail head region. $h_1$ denotes the depth of the rail head. $l$ is a constant of the length proportional coefficient, which is set according to the sample statistics and calculation. $cols$ denotes the width of the detected image. $m$ is the constant of the horizontal distance between the defect and the joint. Therefore, the generalization feature clusters expressed by Eqs. (2)-(5) can be employed to detect joints and three types of rail-head flaws. Generally, the generalized feature constraints of the rail head flaws include the longitudinal position, color composition, and area of the rail head flaw; the ratio of the length of the rail head flaw to the height of the rail head region; and the horizontal distance between the rail head flaw and the joint. The longitudinal position of the rail head flaw limits its longitudinal distribution in the rail head. As illustrated in Figure 10, the position of the rail head flaw fluctuates, but is always in the rail head. Therefore, the longitudinal position features have poor constraints but good generalization ability.
The color composition of a rail head flaw is an essential condition for distinguishing its type, and it has a clear directional constraint. The area of the flaws reflects the defect size. Although the area is not the same for each defect, it is generally distributed in a specific range. After the data statistics, its upper limit distribution can be defined by the area constant, whose value setting is flexible. The ratio between the length of the rail head flaw and height of the rail head region is an important criterion for distinguishing between joints and flaws. It has strong constraints and a good feature generalization ability. However, the threshold setting is critical. The horizontal distance constraint between the rail head flaw and joint prevents the joint from being evaluated as a rail head flaw of the same color channel after channel separation. The generalized feature constraints of the joints are similar to those of the rail head flaws.

Rail Waist
As illustrated in Figure 11, ultrasound imaging of the rail waist includes the screw hole, inner upper cracks, inner lower cracks, outer upper cracks, outer lower cracks, horizontal cracks, inner inverted cracks, and outer inverted cracks of the screw hole. In addition, it also includes clutter.
Seven generalization features, including the position region, defect color, centroid coordinate, defect slope, defect length, ratio of defect area to defect length, and defect distance, are selected and formulated to build a generalization feature cluster. Normal screw-hole image contours are defined by semi-circular images comprising red, green, and blue line segments, one of which is marked with an ellipse in Figure 11. The image contour of the upper crack in the screw hole is located on the left or right side of the normal screw-hole image contour and is related to the defect position. The image contours of the lower and horizontal cracks in the screw hole are both located below the normal image contour of the screw hole. The image contour of the inverted crack is located below the normal image contour of the screw hole, and their directions are opposite to each other. Usually, the image contour of clutter is not a line segment, so it can be filtered by setting the ratio constraint, which is the ratio of the image contour area to the image contour length. For example, the image contours inside of the screw hole include four defects: inner upper crack, inner lower crack, inner inverted crack of the screw hole, and clutter. The generalization feature clusters of defects inside the screw hole are expressed by Eqs. (6)-(8) for the inner upper crack, the inner lower crack, and the inner inverted crack of the screw hole, respectively. In Eqs. (6)-(8), $(x_g, y_g)$ is the centroid coordinate of the screw hole. $(x_{g2}, y_{g2})$, $(x_{g3}, y_{g3})$, and $(x_{g4}, y_{g4})$ are the centroid coordinates of the inner upper crack, inner lower crack, and inner inverted crack of the screw hole, respectively. $C_{g2}$, $C_{g3}$, and $C_{g4}$ denote the color compositions of the inner upper crack, inner lower crack, and inner inverted crack of the screw hole, respectively. $k_g$ denotes the contour slope of the screw hole. $k_{g2}$, $k_{g3}$, and $k_{g4}$ are the contour slopes of the inner upper crack, inner lower crack, and inner inverted crack of the screw hole, respectively. $S_{g2}$, $S_{g3}$, and $S_{g4}$ are the areas of the inner upper crack, inner lower crack, and inner inverted crack of the screw hole, respectively. $l_g$ is the length of the screw hole. $l_{g2}$, $l_{g3}$, and $l_{g4}$ are the lengths of the inner upper crack, inner lower crack, and inner inverted crack of the screw hole, respectively. In addition, $l_1$ and $l_2$ are constants related to the ratio of the defect length to the normal screw-hole length. $n$ is a constant related to the ratio of the defect area to the defect length. $h_2$ is the height of the rail waist region, expressed as a number of pixels. $L_{\min}$ and $L_{\max}$ are constants related to the distance between the image contour centroids, which are the distances between screw holes.
Eqs. (6)-(8) can detect three types of screw-hole cracks: the inner upper crack, inner lower crack, and inner inverted crack of the screw hole. The outer upper, outer lower, outer inverted, and horizontal cracks of the screw hole can be detected in the same manner.
The generalized feature constraints of screw hole cracks can be summarized as follows: centroid height, color composition, position relationship between the screw hole crack and its corresponding screw hole, inclination relationship between the screw hole crack and its corresponding screw hole, length ratio between the screw hole crack and its corresponding screw hole, ratio of the screw hole crack area and its length, and the distance relationship between the screw hole crack and its corresponding screw hole. Specifically, the centroid height of the screw-hole crack reflects its longitudinal position in the screw hole. The color composition of the screw-hole crack reflects the angle of acquisition probe. The positional relationship between the screw hole crack and its corresponding screw hole indicates that the screw hole crack appears above or below the screw hole. The inclination angle of the screw hole crack and its corresponding screw hole reflects whether they will be imaged by the same color channel. If the inclination direction is the same, they are imaged by the same color channel. Otherwise, they are imaged by different color channels. The ratio of the screw hole crack length to the corresponding screw hole length reflects the size relationship of the screw hole crack. The ratio of the crack area to its length approximately reflects the crack width, which is affected by the probe sensitivity. The distance between the screw hole crack and its corresponding screw hole is a significant constraint for evaluating the screw hole crack, and its threshold can be obtained using statistical methods.
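Two of the constraints above lend themselves to a short sketch: the area-to-length ratio, which approximates the crack width, and the length ratio between a crack and its screw hole. The function names, the direction of the width inequality, and all numeric values below are assumptions made for illustration; the paper states only that `n`, `l_1`, and `l_2` are constants obtained from the samples.

```python
def mean_width(area_px, length_px):
    """Approximate contour width as area divided by length (both in pixels)."""
    if length_px <= 0:
        raise ValueError("length must be positive")
    return area_px / length_px


def passes_shape_constraints(area_px, length_px, hole_length_px, n, l1, l2):
    """Check the width-like ratio and the crack-to-hole length ratio.

    Assumption: a line-like crack has a small area-to-length ratio (<= n),
    while blob-like clutter has a larger one and is rejected.
    """
    width_ok = mean_width(area_px, length_px) <= n
    length_ratio = length_px / hole_length_px
    return width_ok and l1 <= length_ratio <= l2


# Hypothetical constants, not taken from the paper.
N, L1, L2 = 2.0, 0.2, 1.0

print(passes_shape_constraints(30, 25, 50, N, L1, L2))    # thin, mid-length contour
print(passes_shape_constraints(200, 25, 50, N, L1, L2))   # thick blob of the same length
```

Because the images are skeletonized during preprocessing, a genuine crack contour should be close to one pixel wide, which is why a small upper bound on this ratio is a plausible way to reject clutter.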

Rail Bottom
As illustrated in Figure 12, the defects in the rail bottom contain longitudinal cracks, transverse cracks, and clutter. Five generalization features of the position region, defect color, defect slope, defect length, and defect distance are selected and formulated to build a generalization feature cluster. There are three types of image contours for transverse cracks: a single blue oblique segment, a single green oblique segment, and a blue-green pair of oblique segments. The longitudinal crack is a long, red, horizontal line segment. The transverse and longitudinal cracks of the rail bottom are represented by the generalization feature clusters in Eqs. (9) and (10), respectively. In Eqs. (9) and (10), $(x_i, y_i)$ and $(x_t, y_t)$ are the centroid coordinates of the transverse and longitudinal cracks in the rail bottom, respectively. $C_i$ and $C_t$ denote the color compositions of the transverse and longitudinal cracks at the rail bottom, respectively. $l_i$ and $l_t$ are the lengths of the transverse and longitudinal cracks at the rail bottom, respectively. $(x_{i1}, y_{i1})$ and $(x_{i2}, y_{i2})$ are the centroid coordinates of two adjacent rail bottom defects. $k_{i1}$ and $k_{i2}$ are the slopes of the two adjacent rail bottom defects. In addition, if the distance between these two points is less than the set distance $I_{\min}$, the transverse crack at the rail bottom comprises blue and green defect contours in pairs, and the contour slope directions are opposite. $k_t$ is the slope of the longitudinal crack, and the image contour of the longitudinal crack at the rail bottom is approximately horizontal. $row$ denotes a row of the ultrasonic image, and $T_{\min}$ is the length threshold. Therefore, the combination of generalization feature clusters expressed by Eqs. (9)-(10) can be employed to detect transverse and longitudinal cracks at the rail bottom.
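The pairing rule for transverse cracks (a blue and a green contour whose centroids are closer than the set distance and whose slopes have opposite directions) can be sketched as follows. The `Contour` class, the function name, and the numeric values are hypothetical; only the rule itself comes from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Contour:
    """Hypothetical summary of one rail-bottom image contour."""
    color: str        # "R", "G" or "B"
    centroid: tuple   # (x, y) in pixels
    slope: float      # slope of the fitted line segment


def is_paired_transverse_crack(a, b, i_min):
    """Blue-green pair, centroids closer than i_min, opposite slope directions."""
    colors_ok = {a.color, b.color} == {"G", "B"}
    close = math.dist(a.centroid, b.centroid) < i_min
    opposite = a.slope * b.slope < 0
    return colors_ok and close and opposite


# Illustrative contours; i_min = 20 px is an assumed value.
blue = Contour("B", (100.0, 380.0), 0.8)
green = Contour("G", (112.0, 381.0), -0.7)
far_green = Contour("G", (300.0, 381.0), -0.7)

print(is_paired_transverse_crack(blue, green, i_min=20.0))
print(is_paired_transverse_crack(blue, far_green, i_min=20.0))
```

Testing the sign of the slope product keeps the check independent of whether the image y-axis points up or down.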

Feature Clustering and Dimension Reduction
In the generalization feature cluster, the constraint conditions of each feature are different. Some features have a good constraint effect, whereas others have a poor constraint effect. In this study, features with a strong constraint effect are selected by analyzing the constraint effect of each feature. Subsequently, new constraint features are constructed to achieve dimension reduction and simplify defect identification. In the rail head, $S_c$ is the area of the contour circumscribed rectangle, and $H_{0d}$ is the ratio of the height of the contour circumscribed rectangle to the height of the rail head region; both are strong constraint generalization features for evaluating rail head flaws and joints. Usually, there are various types of screw-hole cracks in the rail waist, which are evaluated by the position relationship between the screw hole and defect. Therefore, it is difficult to constrain these defects uniformly using reduced-dimension generalized features. However, a specific type of defect can be identified using the reduced-dimension generalization feature cluster. For example, the contour centroid height and defect area are strong constraint generalization features for evaluating the horizontal crack of a screw hole. The centroid height and contour circumscribed rectangle length are strong constraint generalization features for evaluating the lower crack of the screw hole. The angle formed by the green and blue contours and the horizontal distance between two centroids are also strong constraint generalization features for evaluating rail bottom defects.
In this study, the K-means clustering method is developed to classify defects in different regions in the following ways.
(1) The number of defect types k is determined in different regions of the rail, and the strong constraint generalization features of various kinds of defects are extracted respectively. Then, the dataset is built using the strong-constraint generalization feature. Moreover, normalization and standardization processing are required for the data, with the following expression: where x ij is the j th sample point of the class i defect data, µ i and σ i are the mean and variance of the class i defect data, respectively. x imax and x imin are the maximum and minimum values of the class i defect data, respectively. (2) k clustering centers are selected randomly, and can refer to the initial setting threshold of related constraint features. Furthermore, the Euclidean distance from the remaining data points to the clustering center is calculated, and the nearest clustering point is selected as their type. (3) All data points belonging to the same cluster are processed with centroid operation and a new cluster center is calculated. (4) The process of (2) to (3) is repeated until the cluster center no longer changes.
The DBI is employed to evaluate the clustering [29,30]. It is the mean, over all clusters, of each cluster's maximum similarity to any other cluster, and it reflects both the intra-cluster distance and the spacing between clusters. The smaller the value, the better the clustering effect. Assume that m data sequences are clustered into n clusters; the m sequences form the input matrix X, and the number of clusters n is an input parameter. The DBI can be expressed as

$$\mathrm{DBI} = \frac{1}{n}\sum_{i=1}^{n} \max_{j \neq i} \frac{S_i + S_j}{M_{i,j}},$$

where $S_i$ is the average distance from the data of the class i defect to the cluster center of its class. It can be expressed as

$$S_i = \frac{1}{T_i}\sum_{j=1}^{T_i} \left\lVert x_{ij} - C_i \right\rVert_p,$$

where $T_i$ is the amount of data in class i and $C_i$ is the cluster center of class i. The norm order p is set to p = 2, which gives the Euclidean distance for the two-dimensional data. The distance between the cluster i defect and the cluster j defect is defined as

$$M_{i,j} = \left\lVert C_i - C_j \right\rVert_p,$$

and the similarity between the two clusters is defined as

$$R_{i,j} = \frac{S_i + S_j}{M_{i,j}}.$$

The maximum value of $R_{i,j}$ is then calculated for each cluster class i:

$$D_i = \max_{j \neq i} R_{i,j},$$

where $D_i$ is the maximum similarity between cluster class i and the other cluster classes. The DBI is obtained as the mean of the maximum similarity over all classes:

$$\mathrm{DBI} = \frac{1}{n}\sum_{i=1}^{n} D_i.$$
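The DBI computation can be illustrated with a short sketch. It assumes the commonly used definition, in which the similarity of two clusters is the sum of their average member-to-center distances divided by the Euclidean distance between their centers; the function name and the sample points are hypothetical.

```python
import math

def davies_bouldin(points, labels, centers):
    """Davies-Bouldin index: mean over clusters of the worst-case similarity."""
    n = len(centers)
    # S_i: average Euclidean distance from the members of cluster i to its center.
    scatter = []
    for i in range(n):
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(math.dist(p, centers[i]) for p in members) / len(members))
    # D_i: maximum of (S_i + S_j) / M_ij over all other clusters j.
    worst = []
    for i in range(n):
        worst.append(max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(n) if j != i
        ))
    return sum(worst) / n

# Two compact, well-separated clusters give a small index.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
centers = [(0, 1), (10, 1)]
print(davies_bouldin(points, labels, centers))
```

In this toy case each cluster has an average scatter of 1 and the centers are 10 apart, so the index is (1 + 1) / 10 = 0.2. Moving the clusters closer together or spreading their members out raises the value.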
The algorithm proposed in this study is illustrated in Figure 13. First, the defects in ultrasonic images are classified according to the classification information presented in Table 1. Second, the defect images are preprocessed for image enhancement, which includes color channel separation, image denoising, image standardization, and image contour location. On this basis, the generalization features of each type of defect are analyzed. For example, for the rail head, the influence of the various defect features on the detection effect is analyzed, and the five generalization features with the best detection effect are selected. They are then formulated and expressed using a theoretical model and analysis. Finally, generalization feature clusters for all types of defects are constructed from the adopted generalization features and used for defect identification and classification. In addition, based on the proposed generalization features, the K-means clustering algorithm is used to verify the validity of the dimension-reduced strong-constraint generalization features and to identify and classify defects.
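As a rough illustration of how a generalization feature cluster can act as a set of function constraints, the sketch below checks one candidate contour against the five rail head features named in the next section (position region, defect color, defect area, defect height ratio, and distance to the joint). The `Contour` fields, the dictionary name, and every threshold are hypothetical placeholders, not values from this study.

```python
from dataclasses import dataclass

@dataclass
class Contour:
    """Measurements of one candidate contour (all fields are hypothetical)."""
    y_center: float        # vertical position in the image, pixels
    channel: str           # color channel in which the contour was found
    area: float            # contour area, pixels
    height_ratio: float    # contour height divided by region height
    joint_distance: float  # horizontal distance to the nearest joint, pixels

# Each constraint is a named predicate; the cluster is their conjunction.
# All thresholds below are made-up placeholders for illustration.
RAIL_HEAD_FLAW_CLUSTER = {
    "position_region": lambda c: 0 <= c.y_center < 120,
    "defect_color": lambda c: c.channel in {"red", "blue"},
    "defect_area": lambda c: 30 <= c.area <= 600,
    "height_ratio": lambda c: 0.1 <= c.height_ratio <= 0.8,
    "joint_distance": lambda c: c.joint_distance >= 50,
}

def failed_constraints(contour, cluster):
    """Return the names of the constraints the contour violates."""
    return [name for name, ok in cluster.items() if not ok(contour)]

def matches(contour, cluster):
    """A contour belongs to the defect type only if every constraint holds."""
    return not failed_constraints(contour, cluster)

candidate = Contour(y_center=60, channel="red", area=200,
                    height_ratio=0.4, joint_distance=150)
speck = Contour(y_center=60, channel="red", area=5,
                height_ratio=0.4, joint_distance=150)
print(matches(candidate, RAIL_HEAD_FLAW_CLUSTER))
print(failed_constraints(speck, RAIL_HEAD_FLAW_CLUSTER))
```

Reporting which constraint failed, rather than only a yes or no answer, makes it easier to see whether a rejected contour was too small, in the wrong region, or too close to a joint.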

Experimental Results and Analysis of the Proposed Method
To clarify the proposed algorithm, the detection of rail head flaws is used as an example; other types of defects can be handled in the same way. First, the image is pre-processed for image enhancement and noise removal. The detailed steps include color channel separation, image denoising, image standardization, and image contour location. Second, the generalized features of rail head flaws are formulated and expressed, including the position region, defect color, defect area, defect height ratio, and distance between the defect and the joint. Finally, these generalization features are used to construct an appropriate generalization feature cluster for the identification and classification of rail head flaws. In addition, after the generalization features of all flaw types have been extracted, the dimension-reduced strong-constraint generalization features are analyzed and obtained, and the K-means clustering algorithm is applied to identify and classify the rail head flaws.

Furthermore, to evaluate the effectiveness of the proposed method, sample and testing images are processed and analyzed to verify its detection accuracy and speed. The cooperator, Guangdong Goworld Co., Ltd., provided 43 sample images with a resolution of 2000 × 400 pixels. As illustrated in Figure 14, the sample images were acquired using the ultrasonic flaw detector EGT-60, which was developed by the authors and a cooperative company. Defects in the sample images were labeled by three experienced rail inspection workers before testing. The proposed method was then employed to extract generalization feature clusters from the 43 sample images and to detect defects. In this experiment, the misjudgment and missed detection rates were also analyzed.
Accuracy rate:

$$P_1 = \frac{e_e}{s} \tag{18}$$

Misjudgment rate:

$$P_2 = \frac{e_r}{s} \tag{19}$$

Missed detection rate:

$$P_3 = \frac{r_e}{s} \tag{20}$$

In Eqs. (18)-(20), $P_1$ is the accuracy rate, $P_2$ is the misjudgment rate, and $P_3$ is the missed detection rate. $s$ denotes the number of defects counted by manual inspection, and $e_e$ denotes the number of defects for which the proposed method agrees with the manual result. $e_r$ denotes the number of defects identified manually but classified as another type of defect by the proposed method. $r_e$ denotes the number of defects identified manually but judged to be normal by the proposed method. If there are no missed detections and no misjudgments, then $P_3 = P_2 = 0$ and $P_1 = 1$. If there are no missed detections and only misjudgments, then $P_3 = 0$ and $P_2 = 1 - P_1$. Subsequently, 43 sample images and 953 testing images were processed, and the experimental results are presented in Tables 2, 3 and 4.
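The three rates can be computed directly from the counts. The sketch below assumes that every manually counted defect falls into exactly one of the three outcomes, so that s = e_e + e_r + r_e, which is consistent with the special cases stated above; the function name and the counts in the example are hypothetical.

```python
def detection_rates(e_e, e_r, r_e):
    """Return (accuracy, misjudgment, missed-detection) rates as fractions.

    e_e: defects where the method agrees with manual inspection
    e_r: defects classified as a different defect type
    r_e: defects judged to be normal (missed)
    Assumes the manual count is s = e_e + e_r + r_e.
    """
    s = e_e + e_r + r_e
    if s == 0:
        raise ValueError("no manually counted defects")
    return e_e / s, e_r / s, r_e / s

# Hypothetical counts: 95 correct, 3 misjudged, 2 missed.
p1, p2, p3 = detection_rates(95, 3, 2)
print(f"P1={p1:.2%}  P2={p2:.2%}  P3={p3:.2%}")
```

Under this assumption the three rates always sum to 1, which is a convenient sanity check when tabulating results.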
As presented in Table 2, 43 sample images were tested. The detection accuracy of the proposed method was 98.98%, the misjudgment rate was 0.64%, and the missed detection rate was 0.38%. The average detection time was 0.17 s per sample image.
In addition, 953 railway track images were processed with the proposed method, using the generalization feature cluster extracted from the sample images, and the results are presented in Table 3. The detection accuracy was 97.55%, the misjudgment rate was 1.14%, and the missed detection rate was 1.31%. The average detection time was 0.12 s per image, which is shorter than that for the sample images and supports the effectiveness of the proposed method.
Type I and Type II track ultrasonic images are illustrated in Figure 15. Three channels of the rail head are displayed in the Type II track ultrasonic image, which includes seven boundary lines. The channels of the rail head are also displayed in the Type I track ultrasonic image, which includes five boundary lines. Three hundred Type II track testing images were processed after adjusting the constraint parameters related to position and region. The test results are presented in Table 4. As presented in Tables 2, 3 and 4, the detection accuracy is higher than 97% for the different types of track defect images when the generalization feature cluster extracted from the same sample images is used, which means that the proposed method has good applicability and practicability for railway track defect detection.

However, the missed detection and misjudgment rates for the testing images are slightly higher than those for the sample images. For example, the rates of misjudgment for the inner and outer inverted screw hole lacerations were 15.38%, 10.00%, and 25.00%, respectively. The main reason is that the inverted cracks in the screw holes are located near the joints. Owing to the gap at a joint, the ultrasonic probe leaves the rail surface for a short time. The ultrasonic wave attenuates rapidly in air, which produces clutter of various patterns and sizes in this region. Moreover, some clutter contours near the joint are complex, and some clutter is mistaken for inverted cracks of screw holes because its contours conform to the features of the generalization feature cluster. Accordingly, the rail waist region should be divided into near-joint and far-joint regions, with a generalization feature cluster extracted for each, which will reduce the detection speed but improve the detection accuracy.

Comparison with Other Methods
To verify the detection performance of the proposed method, it is compared with two other methods: the back-propagation (BP) neural network method [15] and the intelligent method of rail defect identification based on deep learning [16]. In Ref. [15], different neural network structures are utilized to identify different types of defects, and initial weights are defined using empirical values. Then, the momentum BP method and sigmoid function are used to train and output the results, respectively. In Ref. [16], the method is proposed based on the framework of the AlexNet convolutional neural network, and the softmax function is adopted to output results while training is conducted in the TensorFlow framework. The results are illustrated in Figure 16. The testing data and images of the proposed method are obtained from the EGT-60 double rail flaw detector.
As illustrated in Figure 16, X-coordinates 1-12 are the defect type numbers corresponding to Table 1, and X-coordinate 13 is the average accuracy of each method. Ref. [15] detects screw hole defects, and its detection accuracy for defect types 4-10 is 95%, which gives an average detection accuracy of 95% as well. In addition, the correlation thresholds of the hidden and output layers of its neural network must be adjusted. Ref. [16] also classifies the screw holes as screw hole defects; its detection accuracy for defect types 4-10 is 96%, and its average detection rate is 94.2%. The average defect detection rate of the proposed method is up to 97.60%. Therefore, by constraining the main generalization features of the defects, the proposed method performed better than the other two methods. The experimental results further validate the effectiveness of the proposed method for detecting internal defects in railway tracks.

Evaluating the Effectiveness of Strong Constraint Generalization Features after Reducing Dimension
To validate the effectiveness of the dimension-reduced strong-constraint generalization features, the K-means clustering method is employed to classify rail head flaws and joints. The DBI and the numbers of correct and incorrect classifications are adopted to evaluate the clustering results. The results are illustrated in Figure 17 for sample sizes of 120, 300, and 480 data points. In Figure 17, the blue and red dots represent the clustering results of the rail head flaws and joints, respectively, and the blue and red squares represent the corresponding cluster centers. Dots marked with rectangles represent incorrectly clustered data. A few of these points were assigned to the joint cluster by the K-means algorithm because of their locations, and points in such locations are prone to misclassification. One possible reason is that they are noise with a large image area that was not removed during preprocessing. Another possible reason is that the contours of some flaws are altered by morphological processing, which changes their area and height noticeably and results in misclassification.
Furthermore, as illustrated in Figure 17, the distribution of the rail head flaws is relatively concentrated and stable, whereas the distribution of the joints is relatively scattered. This is because the joint is imaged by a 70° end-face wave with a large amplitude and long displacement, which scatters the joint imaging. The joint data therefore span a broad range. Table 5 presents the detailed clustering results. The cluster centers show that the center of the rail head flaw cluster is relatively stable, while the center of the joint cluster fluctuates. As the number of samples increases, the clustering accuracy gradually increases; however, the number of incorrectly clustered points also increases. The reason is that the K-means clustering algorithm only measures the similarity among data points and does not consider the correlation of the features within the same data point, such as the relative weights of the area and height parameters. This can be mitigated by keeping the B-scan image acquisition stable so that extreme data do not appear. In addition, the DBI of each clustering result is small, which indicates a good clustering effect and shows that defect identification and classification can be achieved with the strong-constraint generalization features.

Conclusions
To detect rail internal defects, an internal defect detection method based on a generalization feature cluster was proposed in this study. The main conclusions are summarized as follows.
(1) The proposed method avoided instability, over-constraint, and under-constraint of the feature expression during detection. Experimental results indicated that the proposed method performed better than two other methods, with an accuracy of 97.55% and an average detection time of 0.12 s/frame.
(2) The K-means method was applied to cluster the strongly constrained generalized features after dimension reduction. The DBI and the clustering accuracy were adopted to evaluate the results, and good clustering results were achieved, with an accuracy of over 95%.
(3) The experimental results indicated that the proposed method has higher detection accuracy and better application prospects than the compared methods, and it can improve the operational safety of high-speed rail and rail transit systems. Further studies will focus on internal defect detection based on deep learning.