Data Quality Management in Large-Scale Cyber-Physical Systems

Ahmed Abdulhasan Alwan
2021
Cyber-Physical Systems (CPSs) are cross-domain, multi-model, advanced information systems that play a significant role in many large-scale infrastructure sectors of smart-city public services, such as traffic control, smart transportation control, and environmental and noise monitoring systems. Such systems typically involve a substantial number of sensor nodes and other devices that stream and exchange data in real time and are usually deployed in broad, uncontrolled environments. Thus,
unexpected measurements may occur due to several internal and external factors, including noise, communication errors, and hardware failures, which may compromise the quality of these systems' data and raise serious concerns related to safety, reliability, performance, and security. In all cases, these unexpected measurements need to be carefully interpreted and managed based on domain knowledge and computational models. Therefore, in this research, data quality challenges were investigated, and a comprehensive, proof-of-concept data quality management system was developed to tackle unaddressed data quality challenges in large-scale CPSs. The data quality management system was designed to address data quality challenges associated with detecting sensor node measurement errors, sensor node hardware failures, and mismatches in the spatial and temporal contextual attributes of sensor nodes. The detection of sensor node measurement errors associated with the primary data quality dimensions of accuracy, timeliness, completeness, and consistency in large-scale CPSs was investigated using predictive and anomaly analysis models built on statistical and machine-learning techniques. Time-series clustering techniques were investigated as a feasible means of detecting long-segmental outliers as an indicator of sensor nodes' continuous halting and incipient hardware failures. Furthermore, the quality of the spatial and temporal contextual attributes of sensor node observations was investigated using timestamp analysis techniques. The different com [...]
doi:10.15123/uel.8990y fatcat:dktrmiweoffkzeyv6772mblmq4