Data Shift in Legal AI Systems

Venkata Nagaraju Buddarapu, Arunprasath Shankar
2019 International Conference on Artificial Intelligence and Law  
One of the fundamental assumptions of any machine learning (ML) system is that the training data comes from the same distribution as the real-world data. In many real-world applications, including legal research, this assumption is often violated. A scenario in which training and test samples follow different input distributions is known as covariate shift. This shift is often responsible for deterioration in the predictive performance of machine learning systems. This research studies the effect of covariate shift on deep learning systems used in legal research. In this paper, we propose a unified framework to detect covariate shift impacting AI systems and formulate a strategy to adapt to this shift on a periodic basis. To our knowledge, our work is the first to apply data shift detection and adaptation techniques to deep learning systems involving high-dimensional word embeddings. Through experiments and evaluations, we demonstrate that our framework can accurately detect data (covariate) shift in legal AI systems involving deep neural architectures.
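To make the notion of covariate shift concrete, the following is a minimal sketch of one generic way to test whether two sets of embedding vectors follow different input distributions. The abstract does not say which detection technique the proposed framework uses, so this is an illustration only and not the authors' method: it applies a two-sample Kolmogorov-Smirnov statistic to each embedding dimension. The function names `ks_statistic`, `sample_vectors`, and `detect_covariate_shift`, and the `threshold` value of 0.2, are assumptions introduced for this example.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the pooled sample points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def detect_covariate_shift(train, live, threshold=0.2):
    """Compare two lists of equal-length vectors dimension by dimension.

    Returns (shifted, stats), where stats holds one KS statistic per dimension.
    The threshold is an illustrative assumption, not a calibrated value.
    """
    dims = len(train[0])
    stats = [
        ks_statistic([v[d] for v in train], [v[d] for v in live])
        for d in range(dims)
    ]
    return max(stats) > threshold, stats


def sample_vectors(rng, n, means):
    """Draw n synthetic vectors with unit-variance Gaussian components (toy data)."""
    return [[rng.gauss(m, 1.0) for m in means] for _ in range(n)]
```

For example, comparing `sample_vectors(rng, 500, [0.0, 0.0, 0.0])` against a second draw with the same means should report no shift, whereas a draw with means `[1.0, 0.0, 0.0]` should be flagged because the first dimension has moved. A per-dimension test of this kind ignores correlations between dimensions, which is one reason high-dimensional word embeddings call for more careful detection methods.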
dblp:conf/icail/BuddarapuS19 fatcat:cdjbwzuxtvbedjyhpnb3rb3vbi