Searching time series with Hadoop in an electric power company

Alice Berard, Georges Hebrail
2013 Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining Algorithms, Systems, Programming Models and Applications - BigMine '13  
In this paper, we investigate the possibilities offered by the Hadoop ecosystem for searching time series in an electric power company (Top-K or range queries based on a similarity measure). Much work has been done on speeding up the search of time series in a large dataset, mainly by designing efficient indexing techniques preceded by reduction techniques. We do not follow these approaches but focus instead on using the brute force of distributed computations in the Hadoop environment. We propose an implementation of time series search functions in Hadoop and describe experiments on a large database of electric power consumption curves (35 million customers observed over one month at a 30-minute sampling rate). We also show that this architecture easily supports the computation of several distances for the same query with a small response time overhead: this is very useful in practice when the end user is unsure which distance to use.
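The brute-force Top-K search with several distances per query can be illustrated with a minimal sketch in plain Python. It is not the authors' Hadoop implementation: the function names (`map_partition`, `reduce_partials`), the choice of Euclidean and Manhattan distances, and the record layout `(series_id, values)` are all assumptions made for illustration. The idea shown is that each partition is scanned once, every distance is evaluated during that single scan, and only the K best candidates per distance are passed on to be merged.

```python
import heapq
import math

# Assumed distances for illustration; the abstract does not name the ones used.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

DISTANCES = {"euclidean": euclidean, "manhattan": manhattan}

def map_partition(records, query, k):
    """Scan one partition once and keep the k nearest series per distance."""
    best = {name: [] for name in DISTANCES}
    for series_id, values in records:
        for name, dist in DISTANCES.items():
            d = dist(query, values)
            heap = best[name]
            # Entries are (-d, id), so heap[0] is the farthest candidate kept.
            if len(heap) < k:
                heapq.heappush(heap, (-d, series_id))
            elif -d > heap[0][0]:
                heapq.heapreplace(heap, (-d, series_id))
    # Return (distance, id) pairs sorted by increasing distance.
    return {name: sorted((-nd, sid) for nd, sid in heap)
            for name, heap in best.items()}

def reduce_partials(partials, k):
    """Merge the per-partition candidates into a global top-k per distance."""
    merged = {}
    for name in DISTANCES:
        candidates = [item for part in partials for item in part[name]]
        merged[name] = heapq.nsmallest(k, candidates)
    return merged

# Toy data: two partitions of very short "curves".
partitions = [
    [("a", [0, 0]), ("b", [3, 4])],
    [("c", [1, 0]), ("d", [10, 10])],
]
query = [0, 0]
partials = [map_partition(p, query, k=2) for p in partitions]
result = reduce_partials(partials, k=2)
```

Because the cost of reading each series dominates a full scan, evaluating an extra distance inside the same loop adds comparatively little work, which is consistent with the small overhead reported in the abstract.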
doi:10.1145/2501221.2501224 dblp:conf/kdd/BerardH13 fatcat:w4j2entbhrbmdkkhpu7evw35ia