35,660 Hits in 7.9 sec

Improving Text Classification Accuracy by Training Label Cleaning

Andrea Esuli, Fabrizio Sebastiani
2013 ACM Transactions on Information Systems  
We also evaluate the degradation in classification effectiveness that these mislabelled texts bring about, and to what extent training label cleaning can prevent this degradation.  ...  In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain.  ...  Thanks also to the ICTIR 2009 and TOIS anonymous reviewers for critical work and suggestions that greatly helped to improve the quality of the article; TOIS Reviewer 1 is to be especially credited for  ... 
doi:10.1145/2516889 fatcat:alkr7t4h4jb2hj5er2uycfftti

Towards Robustness to Label Noise in Text Classification via Noise Modeling [article]

Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe
2021 arXiv   pre-print
Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy, and prevent over-fitting to the noise.  ...  We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier.  ...  CONCLUSION We have presented an approach to improve text classification when learning from noisy labels by jointly training a classifier and a noise model using a de-noising loss.  ... 
arXiv:2101.11214v3 fatcat:o2lcgljnizczrnboow2b5ljd64
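The auxiliary noise model described in this abstract can be pictured as a noise channel that maps the classifier's belief over clean labels to a distribution over the observed noisy labels. The sketch below is a minimal illustration of that general idea, not the authors' implementation; the transition matrix and probabilities are invented for the example.

```python
import numpy as np

def noisy_label_distribution(clean_probs, transition):
    """Map a classifier's clean-label distribution through a label-noise
    channel: p(noisy = j) = sum_i p(clean = i) * T[i, j]."""
    return clean_probs @ transition

# Toy 2-class example: 10% of class-0 labels flip to class 1,
# and 20% of class-1 labels flip to class 0.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
clean = np.array([0.7, 0.3])   # classifier's belief over clean labels
noisy = noisy_label_distribution(clean, T)
```

In a joint training setup of this kind, the classifier produces `clean`, the loss is computed against the noisy labels through `T`, and both the classifier and the entries of `T` are learned together.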

A Robust Method to Protect Text Classification Models against Adversarial Attacks

BALA MALLIKARJUNARAO GARLAPATI, Ajeet Kumar Singh, Srinivasa Rao Chalamala
2022 Proceedings of the ... International Florida Artificial Intelligence Research Society Conference  
The experimental results show that our approach can effectively counter adversarial attacks on text classification models while maintaining classification performance on original clean data.  ...  Text classification is one of the main tasks in natural language processing. Recently, adversarial attacks have shown a substantial negative impact on neural network-based text classification models.  ...  We added additional vocabulary in the text blob to improve defense accuracy.  ... 
doi:10.32473/flairs.v35i.130706 fatcat:pwi2z7d4ajaq7pygh6tnyz5nbu

An Improved Optimal Method for Classification Problem

Huang Wei, Dong Xiao, Shang Wenqian, Lin Weiguo, Yan Menghan
2019 International Journal of Performability Engineering  
Combined with distributed Hadoop technology, a text classification model is designed and implemented through data research, data analysis, and contrast experiments.  ...  Adding auxiliary information to the training set can mitigate under-fitting to a certain extent and improve classification performance.  ...  Acknowledgments This work is partly supported by the National Key R&D Program of China (No. 2018YFB0803700) and the Fundamental Research Funds for the Central Universities.  ... 
doi:10.23940/ijpe.19.11.p23.30313041 fatcat:rxsmmcrgbfac3krzojasf7d7be

A Simple and Efficient Ensemble Classifier Combining Multiple Neural Network Models on Social Media Datasets in Vietnamese [article]

Huy Duc Huynh, Hang Thi-Thuy Do, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
2020 arXiv   pre-print
Text classification is a popular topic in natural language processing that has attracted numerous research efforts worldwide.  ...  Therefore, this study aims to classify Vietnamese texts on social media from three different Vietnamese benchmark datasets.  ...  CNN achieves the highest classification accuracy on CLEAN at 99.42% and OFFENSIVE at 68.60%. The HATE label has the highest accuracy using the LSTM model, at 85.10%.  ... 
arXiv:2009.13060v2 fatcat:kox4gpgyirhj5ond5yej5eyiqi
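A common way to combine multiple neural network models, as this entry's ensemble does, is majority voting over the individual models' predicted labels. A minimal sketch, with hypothetical model outputs and the paper's label names reused only as example strings:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label predictions for one example by majority
    vote; ties fall to whichever label appears first among the models."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical models, each predicting a label for two examples.
model_preds = [["CLEAN", "OFFENSIVE"],
               ["CLEAN", "HATE"],
               ["OFFENSIVE", "HATE"]]
per_example = list(zip(*model_preds))  # regroup: one tuple per example
labels = [majority_vote(p) for p in per_example]
```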

Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages [article]

Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow
2022 arXiv   pre-print
Models trained with label noise may not generalize well.  ...  In this work, we experiment with a group of standard noise-handling methods on text classification tasks with noisy labels.  ...  ACKNOWLEDGMENTS This work has been partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - Project-ID 232722074 - SFB 1102 and the EU Horizon 2020 projects ROXANNE under  ... 
arXiv:2206.01476v1 fatcat:zk5qfjv4dbfqzcpgupsmzf73bq

Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification [article]

Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow
2022 arXiv   pre-print
It has been shown that complex noise-handling techniques - by modeling, cleaning, or filtering the noisy instances - are required to prevent models from fitting this label noise.  ...  However, we show in this work that, for text classification tasks with modern NLP models like BERT, over a variety of noise types, existing noise-handling methods do not always improve its performance,  ...  Acknowledgments This work has been partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - Project-ID 232722074 - SFB 1102 and the EU Horizon 2020 projects ROXANNE under  ... 
arXiv:2204.09371v1 fatcat:ovdlsfhegvg45eqxrjj33iw6eq
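Studies of this kind typically simulate label noise by flipping a fraction of training labels uniformly at random. The sketch below shows one standard recipe under that assumption; the dataset, class count, and noise rate are made up for illustration.

```python
import random

def inject_uniform_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label with probability noise_rate to a different class
    chosen uniformly at random -- a common way to simulate label noise."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy

clean = [0, 1, 2, 1, 0] * 200        # 1000 toy labels over 3 classes
noisy = inject_uniform_noise(clean, num_classes=3, noise_rate=0.3)
flipped = sum(a != b for a, b in zip(clean, noisy))
```

Because flips go only to *different* classes, the observed flip fraction tracks the requested noise rate, which makes the injected noise level easy to verify.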

An Collaborative and Early Detection of Email Spam Using Multitask Learning

Hariharan N, Kamaraj G, Ramanuja Babu R D
2021 International Journal of New Technology and Research  
result and efficiency compared to the existing system. The experimental results show that the proposed algorithm has 92.8% accuracy.  ...  an effective solution to filter possible spam e-mails. In this paper, a hybrid solution using machine learning algorithms such as Deep Neural Networks and Convolutional Neural Networks is used to produce an improved  ...  Hence the accuracy is improved. V.  ... 
doi:10.31871/ijntr.7.4.11 fatcat:xr3ms4rjfbh6jkeeqckgv3keqq

Research on automatic labeling of imbalanced texts of customer complaints based on text enhancement and layer-by-layer semantic matching

Xiaobo Tang, Hao Mou, Jiangnan Liu, Xin Du
2021 Scientific Reports  
Due to its potential impact on business efficiency, automated customer complaint labeling and classification are of great importance for management decision making and business applications.  ...  Furthermore, text enhancement is used to mitigate the problem of imbalanced samples that emerge when the number of labels is large. Finally, the word2vec model is utilized for deep text analysis.  ...  Acknowledgements The work is partially supported by the Chinese National Natural Science Foundation under project 71673209.  ... 
doi:10.1038/s41598-021-91189-0 pmid:34088946 fatcat:4fta4fqz25dmbjxgn6pdourp5i

Deep learning-based classification and structure name standardization for organ at risk and target delineations in prostate cancer radiotherapy

Christian Jamtheim Gustafsson, Michael Lempart, Johan Swärd, Emilia Persson, Tufve Nyholm, Camilla Thellenberg Karlsson, Jonas Scherman
2021 Journal of Applied Clinical Medical Physics  
A weighted classification accuracy of 99.4% was achieved during training.  ...  An image modality independent 2D InceptionResNetV2 classification network was trained with varying amounts of training data using four image input channels.  ...  Model optimization was guided by maximizing the class weighted (by class frequency) classification accuracy in the validation dataset.  ... 
doi:10.1002/acm2.13446 pmid:34623738 pmcid:PMC8664152 fatcat:ng7hfoltyvef5l77nalptealz4

Ensemble-based Semi-Supervised Learning for Hate Speech Detection

Safa Alsafari
2021 Proceedings of the ... International Florida Artificial Intelligence Research Society Conference  
We assess these strategies by re-training all the classifiers with the seed dataset augmented with the trusted pseudo-labeled data.  ...  Finally, we demonstrate that our approach improves classification performance over supervised hate speech classification methods.  ...  To improve their accuracy, we re-train the classifiers using both the seed and most trusted pseudo-labeled data.  ... 
doi:10.32473/flairs.v34i1.128427 fatcat:3s7vttew45deneiernrzzypzi4
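Selecting "trusted" pseudo-labeled data, as this entry describes, usually means keeping only unlabeled examples on which the ensemble agrees with high confidence. A minimal sketch of one such selection rule, with hypothetical model probabilities and a threshold chosen for the example:

```python
def select_trusted(probs_per_model, threshold=0.9):
    """Keep an unlabeled example only if every classifier in the ensemble
    predicts the same label, each with confidence >= threshold.
    Returns (index, label) pairs for the trusted pseudo-labeled examples."""
    trusted = []
    for i in range(len(probs_per_model[0])):
        votes = [max(range(len(p[i])), key=lambda c: p[i][c])
                 for p in probs_per_model]          # each model's argmax
        confs = [max(p[i]) for p in probs_per_model]
        if len(set(votes)) == 1 and min(confs) >= threshold:
            trusted.append((i, votes[0]))
    return trusted

# Two hypothetical models scoring three examples over two classes.
m1 = [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]]
m2 = [[0.92, 0.08], [0.30, 0.70], [0.05, 0.95]]
picked = select_trusted([m1, m2], threshold=0.9)
```

The trusted pairs would then be merged with the seed set for re-training, as the abstract outlines.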

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data [article]

Emilia Apostolova, R. Andrew Kreek
2018 arXiv   pre-print
Industry datasets used for text classification are rarely created for that purpose.  ...  In most cases, the data and target predictions are a by-product of accumulated historical data, typically fraught with noise, present in both the text-based document, as well as in the targeted labels.  ...  We simulated simultaneously both text noise and label noise and observed that, across all experiments, the accuracy on a clean dataset significantly outperforms the accuracy measured on the dirty training  ... 
arXiv:1809.04019v1 fatcat:gcflmb6uengfpgmqzwaew7zdgu

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data

R. Andrew Kreek, Emilia Apostolova
2018 Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text  
Industry datasets used for text classification are rarely created for that purpose.  ...  In most cases, the data and target predictions are a byproduct of accumulated historical data, typically fraught with noise, present in both the text-based document, as well as in the targeted labels.  ...  We simulated simultaneously both text noise and label noise and observed that, across all experiments, the accuracy on a clean dataset significantly outperforms the accuracy measured on the dirty training  ... 
doi:10.18653/v1/w18-6114 dblp:conf/aclnut/KreekA18 fatcat:5ryi66f25nddlfreuf5xauh6vq

Distant finetuning with discourse relations for stance classification [article]

Lifeng Jin, Kun Xu, Linfeng Song, Dong Yu
2022 arXiv   pre-print
In this paper, in order to train a system independent of topics, we propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.  ...  Detailed experiments show that the automatically annotated dataset as well as the 3-stage training help improve model performance in stance classification.  ...  Added noisy samples in finetuning We also examine whether adding noisy samples to the clean training set during clean finetuning helps the model improve its performance, most likely by regularizing  ... 
arXiv:2204.12693v1 fatcat:lncuykup5bhhdaffbahmy5zzlq

WeDef: Weakly Supervised Backdoor Defense for Text Classification [article]

Lesheng Jin, Zihan Wang, Jingbo Shang
2022 arXiv   pre-print
Therefore, a weakly supervised text classifier trained by only the poisoned documents without their labels will likely have no backdoor.  ...  We further improve the results through a two-phase sanitization: (1) iteratively refine the weak classifier based on the reliable samples and (2) train a binary poison classifier by distinguishing the  ...  Our work is sponsored in part by National Science Foundation Convergence Accelerator under award OIA-2040727 as well as generous gifts from Google, Adobe, and Teradata.  ... 
arXiv:2205.11803v1 fatcat:smmqq54lyngipopwmtxvigpy5a
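The core filtering step this abstract describes — treating examples whose given label disagrees with a label-free weak classifier as likely poisoned — can be sketched as a simple disagreement filter. The labels below are invented; this is the general idea only, not the WeDef pipeline.

```python
def flag_suspicious(given_labels, weak_labels):
    """Mark training examples whose given label disagrees with the label
    predicted by a weakly supervised classifier trained without using the
    (possibly poisoned) labels; disagreement suggests a flipped label."""
    return [i for i, (g, w) in enumerate(zip(given_labels, weak_labels))
            if g != w]

given = [1, 0, 1, 1, 0]   # labels shipped with the (possibly poisoned) data
weak  = [1, 0, 0, 1, 1]   # labels from the label-free weak classifier
suspects = flag_suspicious(given, weak)
```

In the two-phase sanitization the entry mentions, the non-flagged (reliable) examples would then refine the weak classifier, and a second binary classifier would separate poisoned from clean samples.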
Showing results 1 — 15 out of 35,660 results