FIT to monitor feed quality

Tamraparni Dasu, Vladislav Shkapenyuk, Divesh Srivastava, Deborah F. Swayne
2015 Proceedings of the VLDB Endowment  
While there has been significant focus on collecting and managing data feeds, it is only now that attention is turning to their quality. In this paper, we propose a principled approach to online data quality monitoring in a dynamic feed environment. Our goal is to alert quickly when feed behavior deviates from expectations. We make contributions in two distinct directions. First, we propose novel enhancements to the DFMS architecture to permit a publish-subscribe approach to incorporate data quality modules into the DFMS architecture. Second, we propose novel temporal extensions to standard statistical techniques to adapt them to online feed monitoring for outlier detection and alert generation at multiple scales along three dimensions: aggregation at multiple time intervals to detect at varying levels of sensitivity; multiple lengths of data history for varying the speed at which models adapt to change; and multiple levels of monitoring delay to address lagged data arrival. FIT, or Feed Inspection Tool, is the result of a successful implementation of our approach. We present several case studies outlining the effective deployment of FIT in real applications along with user testimonials.
doi:10.14778/2824032.2824070 fatcat:5stn34alg5fizcfotd2zjl5fxy
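
The three monitoring dimensions named in the abstract (aggregation interval, history length, and monitoring delay) can be illustrated with a minimal sketch. This is not FIT's implementation: the abstract does not specify the statistical test, so a plain z-score against a trailing window is assumed here, and the function names `aggregate` and `alerts`, the parameters `width`, `history`, `delay`, and `threshold`, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev

def aggregate(counts, width):
    """Sum consecutive base-interval counts into buckets of `width` intervals.

    A trailing partial bucket is dropped so every bucket covers the same span.
    """
    usable = len(counts) - len(counts) % width
    return [sum(counts[i:i + width]) for i in range(0, usable, width)]

def alerts(series, history, delay=0, threshold=3.0):
    """Return indices of buckets that deviate from their trailing history.

    history   -- number of preceding buckets used as the baseline (>= 2);
                 a shorter history adapts to change faster.
    delay     -- number of most recent buckets to skip, on the assumption
                 that lagged data may still arrive for them.
    threshold -- assumed z-score cutoff; not taken from the paper.
    """
    flagged = []
    for t in range(history, len(series) - delay):
        window = series[t - history:t]
        mu, sigma = mean(window), stdev(window)
        if sigma == 0:
            # Constant baseline: any change at all counts as a deviation.
            if series[t] != mu:
                flagged.append(t)
        elif abs(series[t] - mu) / sigma > threshold:
            flagged.append(t)
    return flagged

# Hypothetical per-interval record counts for one feed, with a drop at index 9.
counts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 40, 101, 99]

fine = alerts(counts, history=6)                  # base interval, long history
lagged = alerts(counts, history=6, delay=3)       # newest 3 buckets not yet judged
coarse = alerts(aggregate(counts, 3), history=3)  # 3-interval buckets
```

Running the same check over several combinations of `width`, `history`, and `delay` is one way to realize the multi-scale idea: fine buckets react to short drops, coarse buckets to sustained ones, and a nonzero delay withholds judgment on intervals that may still be filling in.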