Effective and Efficient Learning at Scale

Wei Yu
How to enable efficient and effective machine learning at scale has been a longstanding problem in modern artificial intelligence, and it motivates this thesis research. In particular, we aim to solve the following problems: (1) How to efficiently train a machine learning model? (2) How to speed up inference after the model is trained? (3) How to make the model generalize better? We approach these problems from two perspectives: models and algorithms. On one hand, we design novel models that are intrinsically fast to train and/or test. On the other, we develop new algorithms with rapid convergence guarantees. Not surprisingly, the above three problems are not mutually exclusive, so solving one of them may also benefit the others. For example, 1) a new model that enables parallel computation helps accelerate both training and inference; 2) a fast algorithm can save time on hyper-parameter tuning and/or make it affordable to train with more data, which in turn boosts generalization performance.

This thesis consists of two parts. The first part presents new machine learning models with a focus on sequential data, as found in natural language processing and question answering. First, we propose a model, LSTM-Jump, that can skip unimportant information in text, mimicking the skimming behavior of human reading. Trained with an efficient reinforcement learning algorithm, this model can be several times faster than a vanilla LSTM at inference time. Then we introduce a text encoding model that discards recurrent networks entirely and thus fully supports parallel training and inference. Based on this technique, a new question-answering model, QANet, is proposed. Combined with a data augmentation approach via backtranslation, this model held the No. 1 place on the competitive Stanford Question Answering Dataset (SQuAD) from March to August 2018, while being multiple times faster than the prevalent models. It was also the deepest neural network model for NLP when invented. The second part pr [...]
doi:10.1184/r1/11898309 fatcat:3h2fz5jf6zc6jjli2vp5yalb7a