Hybrid Tagger – An Industry-driven Solution for Extreme Multi-label Text Classification

Kristiina Vaik, Marit Asula, Raul Sirel
2020 Zenodo  
This paper presents an industry-driven solution for extreme multi-label classification with a massive label collection. The proposed approach combines a large number of binary classification models with label pre-filtering, and employs methods and technologies shown to be applicable in industrial scenarios where high-end computational hardware is limited. The system is evaluated on an Estonian newspaper article dataset that contains almost 2000 unique labels, and has been shown to run over 80 times faster than applying all the binary models of the entire label set, without negative impact on prediction scores.
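
The pre-filter-then-classify idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how labels are pre-filtered or which binary models are used, so everything below is an assumption made for illustration: the Jaccard-overlap nearest-neighbour filter, the keyword-based stand-in classifiers, the toy corpus, and the names `candidate_labels`, `make_keyword_model` and `hybrid_tag` are all hypothetical and are not taken from the paper.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately naive assumption)."""
    return text.lower().split()


def jaccard(a, b):
    """Jaccard overlap between two token lists, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_labels(doc_tokens, corpus, k=2):
    """Pre-filter: collect the labels of the k most similar labelled documents."""
    ranked = sorted(corpus, key=lambda item: jaccard(doc_tokens, item[0]), reverse=True)
    labels = set()
    for _tokens, doc_labels in ranked[:k]:
        labels.update(doc_labels)
    return labels


def make_keyword_model(keywords):
    """Toy stand-in for a trained binary classifier: fires on any keyword hit."""
    keywords = set(keywords)
    return lambda tokens: any(t in keywords for t in tokens)


def hybrid_tag(text, corpus, models, k=2):
    """Run only the binary models whose labels survive the pre-filter."""
    tokens = tokenize(text)
    candidates = candidate_labels(tokens, corpus, k)
    evaluated = [label for label in sorted(candidates) if label in models]
    predicted = [label for label in evaluated if models[label](tokens)]
    return predicted, len(evaluated)


# Hypothetical labelled corpus: (tokens, labels) pairs.
corpus = [
    (tokenize("parliament votes on new budget law"), {"politics", "economy"}),
    (tokenize("football team wins national cup"), {"sports"}),
    (tokenize("central bank raises interest rates"), {"economy"}),
    (tokenize("new opera premieres in tallinn"), {"culture"}),
]

# One hypothetical binary model per label.
models = {
    "politics": make_keyword_model(["parliament", "election", "law"]),
    "economy": make_keyword_model(["budget", "bank", "rates"]),
    "sports": make_keyword_model(["football", "cup"]),
    "culture": make_keyword_model(["opera", "theatre"]),
}

predicted, n_evaluated = hybrid_tag("parliament debates budget law", corpus, models, k=1)
print(predicted, f"({n_evaluated} of {len(models)} models evaluated)")
```

In this toy setting only two of the four binary models are evaluated for the input document. With a label set in the thousands, skipping most models in this way is the kind of saving the abstract refers to, although the actual speed-up depends on the real filter and classifiers.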
doi:10.5281/zenodo.4306169 fatcat:q5nutroftndw3fbgxt2rgdtymu