HLP@UPenn at SemEval-2017 Task 4A: A simple, self-optimizing text classification system combining dense and sparse vectors

Abeed Sarker, Graciela Gonzalez
2017 Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)  
We present a simple supervised text classification system that combines sparse and dense vector representations of words, along with generalized representations of words obtained via clusters. The sparse vectors are generated from word n-gram sequences (1-3). The dense vector representations of words (embeddings) are learned by training a neural network to predict neighboring words in a large unlabeled dataset. To classify a text segment, its different vector representations are concatenated, and classification is performed using Support Vector Machines (SVMs). Our system is particularly intended for use by non-experts in natural language processing and machine learning, and, therefore, it does not require any manual tuning of parameters or weights. Given a training set, the system automatically generates the training vectors, optimizes the relevant hyper-parameters for the SVM classifier, and trains the classification model. We evaluated this system on the SemEval-2017 English sentiment analysis task. In terms of average F1-score, our system obtained 8th position out of 39 submissions (F1-score: 0.632, average recall: 0.637, accuracy: 0.646).
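A minimal sketch of the described pipeline, not the authors' code: sparse word n-gram (1-3) counts are concatenated with averaged dense word embeddings, and an SVM is trained with its cost parameter selected automatically via grid search (one plausible form of the paper's automatic hyper-parameter optimization). The embedding lookup, toy data, and helper names below are illustrative assumptions.

```python
# Sketch only: combine sparse n-gram features with dense averaged embeddings,
# then grid-search the SVM hyper-parameter C. Data and embeddings are placeholders.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

EMB_DIM = 200  # assumed embedding dimensionality


def average_embedding(text, embeddings, dim=EMB_DIM):
    """Average the dense vectors of the in-vocabulary words of a text segment."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)


def build_features(texts, vectorizer, embeddings, fit=False):
    """Concatenate sparse n-gram counts with dense averaged-embedding vectors."""
    sparse = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    dense = csr_matrix(np.vstack([average_embedding(t, embeddings) for t in texts]))
    return hstack([sparse, dense])


# Hypothetical training data and embedding table (normally loaded from files).
embeddings = {}  # e.g. word -> np.ndarray of length EMB_DIM
train_texts = ["good day", "great day", "bad day", "awful day"]
train_labels = [1, 1, 0, 0]

vectorizer = CountVectorizer(ngram_range=(1, 3), lowercase=True)
X_train = build_features(train_texts, vectorizer, embeddings, fit=True)

# Automatic hyper-parameter optimization: cross-validated grid search over C.
grid = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=2)
grid.fit(X_train, train_labels)
print("best C:", grid.best_params_["C"])
```

In this sketch the generalized cluster-based word representations mentioned in the abstract are omitted; they could be appended as an additional feature block before the concatenation step.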
doi:10.18653/v1/s17-2105 dblp:conf/semeval/SarkerG17