Temporal Dynamic Matrix Factorization for Missing Data Prediction in Large Scale Coevolving Time Series

Weiwei Shi, Yongxin Zhu, Philip S. Yu, Tian Huang, Chang Wang, Yishu Mao, Yufeng Chen
2016 IEEE Access  
Data missing in collections of time series occurs frequently in practical applications and turns out to be a major menace to precise data analysis. However, most of the existing methods either might be infeasible or could be inefficient to predict the missing values in large-scale coevolving time series. Also, the evolving of time series needs to be handled properly to adapt to the temporal characteristic. Furthermore, more massive volume of data is generated in many areas than ever before. In
more » ... an ever before. In this paper, we have taken up the challenge of missing data prediction in coevolving time series by employing temporal dynamic matrix factorization techniques. First, our approaches are optimally designed to largely utilize both the interior patterns of each time series and the information of time series across multiple sources to build an initial model. Based on the idea, we have imposed hybrid regularization terms to constrain the objective functions of matrix factorization. Then, temporal dynamic matrix factorization is proposed to effectively update the initial already trained models. In the process of dynamic matrix factorization, batch updating and fine-tuning strategies are also employed to build an effective and efficient model. Extensive experiments on real-world data sets and synthetic data set demonstrate that the proposed approaches can effectively improve the performance of missing data prediction. Even when the missing ratio reaches as high as 90%, our proposed methods still show low prediction errors. Dynamic performance demonstrates that the methods can obtain satisfactory effectiveness and efficiency. Furthermore, we have also demonstrated how to take advantage of the high processing power of Apache Spark to perform missing data prediction in large-scale coevolving time series. INDEX TERMS Matrix factorization, missing data prediction, time series, Apache Spark.
doi:10.1109/access.2016.2606242 fatcat:hvyksve7efbmtm3tgxfqi52r2e