Data Infrastructure for Machine Learning

Samridhi Jha
2019 International Journal for Research in Applied Science and Engineering Technology  
Data quality is critical for effective machine learning, which makes data a first-class citizen of machine learning, on par with algorithms, software, and infrastructure. As a result, machine-learning platforms need to support data analysis and validation in a principled manner throughout the machine learning lifecycle. This paper reviews the data infrastructure we built at Google to address these challenges in large-scale production machine learning pipelines.
doi:10.22214/ijraset.2019.4133 fatcat:b5iojbgus5ai3lbqsevsinvquu