Coded machine learning: Joint informed replication and learning for linear regression

Shahroze Kabir, Frederic Sala, Guy Van den Broeck, Lara Dolecek
2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
This paper is concerned with coded machine learning: protecting machine learning algorithms from noise in test data by an informed channel coding approach. Unlike with traditional data storage, we do not seek to ensure that all test data is correctly read from storage and used as a noiseless input to the algorithm. Rather, we seek to protect data in a way that minimizes the effect on the algorithm output (i.e., minimizes a loss compared to the hypothetical noiseless output). We focus on the case where the collected test data, derived from low-power sensors and devices, is inherently noisy. We show that a smart replication strategy is an effective choice to reduce the impact on the algorithm output for linear regression algorithms. We focus on two scenarios. The first case is where the regression model is fixed, and we must allocate a fixed budget of redundancy for our replication scheme (in order to minimize the loss on the output due to noisy test data). Analyzing this case is necessary to build our understanding for the second case, which is more novel. The second case involves a scenario where we may learn an optimized model and jointly protect it. We illustrate the advantages of our approach with practical experiments.
doi:10.1109/allerton.2017.8262880 dblp:conf/allerton/KabirSBD17 fatcat:pe5pqd5egnc45hqvjs5knv74mq