Natural Language Processing with Small Feed-Forward Networks

Jan A. Botha, Emily Pitler, Ji Ma, Anton Bakalov, Alex Salcianu, David Weiss, Ryan McDonald, Slav Petrov
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)
We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.

1. Quantization: Using more dimensions and less precision (Lang-ID: §3.1).
2. Word clusters: Reducing the network size to allow for word clusters and derived features (POS tagging: §3.2).
3. Selected features: Adding explicit feature conjunctions (segmentation: §3.3).
4. Pipelines: Introducing another task in a pipeline and allocating parameters to the auxiliary task instead (preordering: §3.4).

We achieve results at or near state-of-the-art with small (< 3 MB) models on all four tasks.

Small Feed-Forward Network Models

The network architectures are designed to limit the memory and runtime of the model. Figure 1 illustrates the model architecture.
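As a rough illustration of that kind of architecture, the sketch below assembles a tiny feed-forward classifier in plain NumPy: discrete features are looked up in small embedding tables, the embeddings are concatenated, and a single ReLU hidden layer feeds a softmax over labels. All sizes, the 8-bit quantization helper, and the function names are illustrative assumptions, not the paper's exact hyperparameters or implementation.

import numpy as np

# Hypothetical sizes; the paper's per-task hyperparameters differ.
VOCAB = 1000       # hashed feature vocabulary size (assumption)
EMB_DIM = 16       # embedding dimension per feature group (assumption)
NUM_FEATS = 4      # number of feature groups (assumption)
HIDDEN = 64        # single hidden layer size (assumption)
LABELS = 12        # number of output classes (assumption)

rng = np.random.default_rng(0)
E = rng.normal(0.0, 0.1, (VOCAB, EMB_DIM)).astype(np.float32)
W1 = rng.normal(0.0, 0.1, (NUM_FEATS * EMB_DIM, HIDDEN)).astype(np.float32)
b1 = np.zeros(HIDDEN, dtype=np.float32)
W2 = rng.normal(0.0, 0.1, (HIDDEN, LABELS)).astype(np.float32)
b2 = np.zeros(LABELS, dtype=np.float32)

def quantize(m, bits=8):
    # Uniform symmetric quantization: store int8 codes plus one float scale,
    # shrinking an embedding matrix roughly 4x relative to float32.
    scale = np.abs(m).max() / (2 ** (bits - 1) - 1)
    return np.round(m / scale).astype(np.int8), np.float32(scale)

Eq, E_scale = quantize(E)   # embeddings kept in quantized form

def forward(feature_ids):
    # feature_ids: one hashed id per feature group, shape (NUM_FEATS,).
    emb = Eq[feature_ids].astype(np.float32) * E_scale   # dequantized lookup
    h = np.maximum(0.0, emb.reshape(-1) @ W1 + b1)       # concat + ReLU layer
    logits = h @ W2 + b2
    z = np.exp(logits - logits.max())                    # softmax over labels
    return z / z.sum()

probs = forward(np.array([3, 17, 256, 999]))
print(probs.shape)   # (12,)

Keeping the hidden layer shallow and the embedding tables quantized is what holds such a model to a few megabytes, in line with the < 3 MB budget quoted above.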
doi:10.18653/v1/d17-1309