Detecting Semantic Anomalies in Truck Weigh-In-Motion Traffic Data Using Data Mining
Data from event-based monitoring systems are becoming increasingly prevalent in civil engineering. One example is truck weigh-in-motion (WIM) data, which are used in the transportation domain for various analyses, such as assessing the effects of commercial truck traffic on pavement materials and designs. It is important that such analyses use good-quality data or at least account appropriately for any deficiencies in the quality of the data they use. Low-quality data may result from problems in the sensing hardware, in its calibration, or in the software that processes the raw sensor data. The vast quantities of data collected make it infeasible for a human to examine all the data. We propose a data mining approach for automatically detecting semantic anomalies (unexpected behavior) in monitoring data. Our method provides automated assistance to domain experts in setting up constraints for data behavior. We show the effectiveness of our method by reporting its successful application to data from an actual WIM system: experimental data collected by the Minnesota Department of Transportation at its Minnesota Road Research Project (Mn/ROAD) facilities. The constraints the expert set up by applying our method were useful for automatic anomaly detection over the Mn/ROAD data: they detected anomalies the expert cared about, namely unlikely vehicles and erroneously classified vehicles, and the misclassification rate was low enough for a human to handle (usually less than 3%). Moreover, the expert gained insights about the system behavior, such as realizing that a system-wide change had occurred. The constraints detected, for example, periods in which the WIM system reported roughly 20% of the vehicles classified as three-axle single-unit trucks as having one axle!