12,360 Hits in 5.5 sec

Exploiting Class Labels to Boost Performance on Embedding-based Text Classification [article]

Arkaitz Zubiaga
2020 arXiv   pre-print
To make the most of these embeddings as features and to boost the performance of classifiers using them, we introduce a weighting scheme, Term Frequency-Category Ratio (TF-CR), which can weight high-frequency  ...  Text classification is one of the most frequent tasks for processing textual data, facilitating, among others, research from large-scale datasets.  ...  Previous work leveraging class labels to boost the performance of word embeddings on text classification has largely focused on sentiment analysis.  ... 
arXiv:2006.02104v2 fatcat:tjl2ki6vvnat5miozcmyrhowcq

An Improved Vulnerability Exploitation Prediction Model with Novel Cost Function and Custom Trained Word Vector Embedding

Mohammad Shamsul Hoque, Norziana Jamil, Nowshad Amin, Kwok-Yan Lam
2021 Sensors  
Various existing research works have developed vulnerability exploitation prediction models by addressing the existing class imbalance based on algorithmic and artificial data resampling techniques but  ...  We also have utilized the available large text corpus in the extracted dataset to develop a custom-trained word vector that can better capture the context of the local text data for utilization as an embedded  ...  The cost function is able to aid the classification algorithms by efficiently differentiating the binary classes and thereby to predict the label classes with overall great performance in accuracy, precision  ... 
doi:10.3390/s21124220 fatcat:jmud5z7yzbdd3obtmkeu5y7kie

A Multi-View Ensemble Classification Model for Clinically Actionable Genetic Mutations [article]

Xi Sheryl Zhang, Dandi Chen, Yongjun Zhu, Chao Che, Chang Su, Sendong Zhao, Xu Min, Fei Wang
2019 arXiv   pre-print
The machine learning task aims to classify genetic mutations based on text evidence from clinical literature with promising performance.  ...  We develop a novel multi-view machine learning framework with ensemble classification models to solve the problem.  ...  The authors would like to thank the support from Amazon Web Service Machine Learning for Research Award (AWS MLRA).  ... 
arXiv:1806.09737v2 fatcat:uf272dro6rai5cqjh7y3r7whma

Weakly-supervised Text Classification Based on Keyword Graph [article]

Lu Zhang, Jiandong Ding, Yi Xu, Yingyao Liu, Shuigeng Zhou
2021 arXiv   pre-print
Among them, keyword-driven methods are the mainstream where user-provided keywords are exploited to generate pseudo-labels for unlabeled texts.  ...  With the pseudo labels generated by the subgraph annotator, we then train a text classifier to classify the unlabeled texts. Finally, we re-extract keywords from the classified texts.  ...  For vertices V, the embedding of a node is initialized with vector $x_v = [v_{\text{class}}; v_{\text{index}}] \in \mathbb{R}^{C+|V|}$, where $v_{\text{class}}$ is the one-hot embedding of keyword class, $v_{\text{index}}$ is the one-hot embedding of keyword  ... 
arXiv:2110.02591v1 fatcat:jlsfcu7qafa4pdugimhbead4zq

Learning to Recognize Objects from Unseen Modalities [chapter]

C. Mario Christoudias, Raquel Urtasun, Mathieu Salzmann, Trevor Darrell
2010 Lecture Notes in Computer Science  
This allows us to predict the missing data for the labeled examples and exploit all modalities using multiple kernel learning.  ...  To leverage the previously unseen features, we make use of the unlabeled data to learn a mapping from the existing modalities to the new ones.  ...  Using unlabeled text to improve visual classification: We focus on a recognition task that exploits an additional text modality that is only present at test time (no labeled data is available) to improve  ... 
doi:10.1007/978-3-642-15549-9_49 fatcat:l4y2hqy3f5fcffbqgu53lmeozi

Guess What's on my Screen? Clustering Smartphone Screenshots with Active Learning [article]

Agnese Chiatti, Dolzodmaa Davaasuren, Nilam Ram, Prasenjit Mitra, Byron Reeves, Thomas Robinson
2019 arXiv   pre-print
We tested whether SVM-embedded or XGBoost-embedded solutions for class probability propagation provide for more well-formed cluster configurations.  ...  Thus, there is a need to examine the utility of unsupervised and semi-supervised methods for digital screenshot classification.  ...  Two classifiers are embedded in the discussed framework to test for their impact on the overall performance: Extreme Gradient Boosting (XGBoost) and Support Vector Machines (SVM).  ... 
arXiv:1901.02701v2 fatcat:uepzsys4lzdexd3bak75v5fjmq

Minimally Supervised Categorization of Text with Metadata [article]

Yu Zhang, Yu Meng, Jiaxin Huang, Frank F. Xu, Xuan Wang, Jiawei Han
2021 arXiv   pre-print
Then, based on the same generative process, we synthesize training samples to address the bottleneck of label scarcity. We conduct a thorough evaluation on a wide range of datasets.  ...  needs to be performed using only a small set of annotated data.  ...  Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. We thank anonymous reviewers for valuable and insightful feedback.  ... 
arXiv:2005.00624v3 fatcat:ixexrrey3jeapegvw7c5udzl7m

Deep Convolution Neural Network for Extreme Multi-label Text Classification

Francesco Gargiulo, Stefano Silvestri, Mario Ciampi
2018 Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies  
It is worth noting that multi-label classification is a harder problem than multi-class classification, due to the variable number of labels associated with each sample.  ...  In this paper we present an analysis of the usage of Deep Neural Networks for extreme multi-label and multi-class text classification.  ...  While DL multi-class and multi-label image classification produces state-of-the-art performance, the same problem applied to text classification still has many open issues.  ... 
doi:10.5220/0006730506410650 dblp:conf/biostec/GargiuloSC18 fatcat:yq23faemnvdclpn5uxdjopcmre

Identification of Extreme Guilt and Grave Fault in Bengali Language using Machine Learning

2020 International journal of recent technology and engineering  
We have implemented an AdaBoost algorithm and a maximum voting classification decision method based on the results of baseline classifiers.  ...  Ensemble learning has been used to improve the baseline classifiers.  ...  Firstly, the exploration of character embedding instead of word embedding exploiting CountVectorizer.  ... 
doi:10.35940/ijrte.f7691.038620 fatcat:d2y5h4alqbervfc5xqnut2kbjy

L-Boost: Identifying Offensive Texts from Social Media Post in Bengali

M. F. Mridha, Md. Anwar Hussen Wadud, Md. Abdul Hamid, Muhammad Mostafa Monowar, M. Abdullah-Al-Wadud, Atif Alamri
2021 IEEE Access  
This algorithm selects an offensive text based on text classification algorithms.  ...  Then, we applied boosting algorithms based on baseline classifiers.  ... 
doi:10.1109/access.2021.3134154 fatcat:jaaavefprne2xlukdtywtxzd6a

Shoestring: Graph-Based Semi-Supervised Learning with Severely Limited Labeled Data [article]

Wanyu Lin, Zhaolin Gao, Baochun Li
2020 arXiv   pre-print
Graph-based semi-supervised learning has been shown to be one of the most effective approaches for classification tasks from a wide range of domains, such as image classification and text classification  ...  In particular, our framework learns a metric space in which classification can be performed by computing the similarity to centroid embedding of each class.  ...  Classification, for an embedded unlabeled sample, is then performed by finding its nearest class prototype based on the learned semantic metric.  ... 
arXiv:1910.12976v2 fatcat:bc2uikyhgnb6hifwzdw2m6psny

Evaluation of Deep Learning Models for Hostility Detection in Hindi Text [article]

Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi
2021 arXiv   pre-print
We evaluate a host of deep learning approaches based on CNN, LSTM, and BERT for this multi-label classification problem.  ...  We show that the performance of BERT based models is best. Moreover, CNN and LSTM models also perform competitively with BERT based models.  ...  We would like to express our gratitude towards our mentors at L3Cube for their continuous support and encouragement.  ... 
arXiv:2101.04144v4 fatcat:2334yi6zlzflpdv7noy7n7qigq

Target Based Speech Act Classification in Political Campaign Text

Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin
2019 Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*  
We show how speech acts and target referents can be modeled as sequential classification, and evaluate several techniques, exploiting contextualized word representations, semi-supervised learning, task  ...  We study pragmatics in political campaign text, through analysis of speech acts and the target of each utterance.  ...  Conclusion and Future Work In this work we present a new dataset of election campaign texts, based on a class schema of speech acts specific to the political science domain.  ... 
doi:10.18653/v1/s19-1030 dblp:conf/starsem/SubramanianCB19 fatcat:mqnoze5cerftjihklq7bpknbxu

Target Based Speech Act Classification in Political Campaign Text [article]

Shivashankar Subramanian and Trevor Cohn and Timothy Baldwin
2019 arXiv   pre-print
We show how speech acts and target referents can be modeled as sequential classification, and evaluate several techniques, exploiting contextualized word representations, semi-supervised learning, task  ...  We study pragmatics in political campaign text, through analysis of speech acts and the target of each utterance.  ...  Conclusion and Future Work In this work we present a new dataset of election campaign texts, based on a class schema of speech acts specific to the political science domain.  ... 
arXiv:1905.07856v1 fatcat:tkhfgfrgzzae3b4hcy2eykkoeu

Hierarchical Image Classification using Entailment Cone Embeddings [article]

Ankit Dhall, Anastasia Makarova, Octavian Ganea, Dario Pavllo, Michael Greeff, Andreas Krause
2020 arXiv   pre-print
images boosts overall performance.  ...  We present a set of methods for leveraging information about the semantic hierarchy embedded in class labels.  ...  In contrast to general CNNs for image classification, the work done in [15] exploits unannotated text in addition to the images labels.  ... 
arXiv:2004.03459v2 fatcat:vfejv7ybkzb57o5gd3sadm5iva
Showing results 1-15 out of 12,360 results