Identifying Banking Transaction Descriptions via Support Vector Machine Short-Text Classification Based on a Specialized Labelled Corpus

Silvia Garcia-Mendez, Milagros Fernandez-Gavilanes, Jonathan Juncal-Martinez, Francisco J. Gonzalez-Castano, Oscar Barba Seara
2020 IEEE Access  
Short texts are omnipresent in real-time news, social network commentaries, etc. Traditional text representation methods have been successfully applied to self-contained documents of medium size. However, the information in short texts is often insufficient, due, for example, to the use of mnemonics, which makes them hard to classify. Therefore, the particularities of specific domains must be exploited. In this article we describe a novel system that combines Natural Language Processing techniques with Machine Learning algorithms to classify banking transaction descriptions for personal finance management, a problem that was not previously considered in the literature. We trained and tested that system on a labelled dataset with real customer transactions that will be available to other researchers on request. Motivated by existing solutions in spam detection, we also propose a short text similarity detector, based on the Jaccard distance, to reduce training set size. Experimental results with a two-stage classifier combining this detector with an SVM indicate a high accuracy in comparison with alternative approaches, taking into account complexity and computing time. Finally, we present a use case with a personal finance application, CoinScrap, which is available at Google Play and App Store.
INDEX TERMS Machine learning, natural language processing, banking, personal finance management.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020
SILVIA GARCÍA-MÉNDEZ received the B.S. degree in telecommunication technologies engineering and the M.S. degree in telecommunication engineering from the University of Vigo, Spain, in 2014 and 2016, respectively, where she is currently pursuing the Ph.D. degree in information and communications technology. Since 2015, she has been working as a Researcher with the Information Technologies Group, University of Vigo. Her research includes the development of automatic solutions for natural language generation for Spanish and English. Ms. García-Méndez's awards and honors include the Connecting for Good Vodafone award, in 2017.
doi:10.1109/access.2020.2983584 fatcat:wnkm5ifimffzjnr6nuj4wlqnea
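The abstract mentions a short text similarity detector based on the Jaccard distance that reduces the training set before an SVM is applied. The sketch below illustrates only that general idea and is not the authors' implementation: whitespace tokenization, the 0.5 threshold, the greedy same-label filtering rule, the function names, and the sample descriptions are all assumptions made for illustration. The SVM stage is not shown.

```python
def tokens(text):
    """Lowercase a description and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def reduce_training_set(samples, threshold=0.5):
    """Greedily drop near-duplicates (assumed rule, not from the paper).

    A sample is kept unless an already-kept sample with the same label
    lies within `threshold` Jaccard distance of it.
    """
    kept = []  # list of (text, label, token_set)
    for text, label in samples:
        toks = tokens(text)
        is_duplicate = any(
            kept_label == label
            and jaccard_distance(toks, kept_toks) <= threshold
            for _, kept_label, kept_toks in kept
        )
        if not is_duplicate:
            kept.append((text, label, toks))
    return [(text, label) for text, label, _ in kept]


# Hypothetical transaction descriptions and categories.
samples = [
    ("PAYMENT CARD SUPERMARKET MADRID", "groceries"),
    ("PAYMENT CARD SUPERMARKET VIGO", "groceries"),
    ("TRANSFER RENT MARCH", "housing"),
    ("ATM WITHDRAWAL", "cash"),
]

reduced = reduce_training_set(samples)
```

With these made-up samples, the two supermarket descriptions share three of five distinct tokens, so the second falls within the threshold and is dropped. A lower threshold keeps more near-duplicates, and a higher one prunes more aggressively.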