1,601 Hits in 7.1 sec

Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling [article]

Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang
2021 arXiv   pre-print
the class balancedness of the learned features, as evaluated via linear classifier evaluation in full-shot and few-shot settings.  ...  In this work, we present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK), which follows three simple principles: (1) tailness, which encourages sampling of examples from  ...  In this work, we are the first to explore a novel direction for further improving feature balancing in contrastive learning via sampling additional data from the open world.  ... 
arXiv:2111.01004v2 fatcat:cmuqg2cwnrfbnhgs4bbcrvb6km
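The snippet names MAK's first principle but truncates before the rest. For orientation, here is a minimal sketch of the greedy k-center selection step the framework's name refers to, in plain NumPy; `select_k_center` is a hypothetical helper, and MAK's tailness and proximity scoring is not modeled.

```python
import numpy as np

def select_k_center(features: np.ndarray, k: int, seed_idx: int = 0):
    """Greedy k-center selection: repeatedly add the pool point farthest
    from everything chosen so far, yielding a diverse subset."""
    selected = [seed_idx]
    # distance from every point to its nearest chosen center
    dists = np.linalg.norm(features - features[seed_idx], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))                # farthest point joins the set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return selected

# toy usage: pick 10 diverse points out of 1,000 random 128-d embeddings
emb = np.random.default_rng(0).normal(size=(1000, 128))
print(select_k_center(emb, k=10))
```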

OpenCoS: Contrastive Semi-supervised Learning for Handling Open-set Unlabeled Data [article]

Jongjin Park, Sukmin Yun, Jongheon Jeong, Jinwoo Shin
2022 arXiv   pre-print
Specifically, we first observe that the out-of-class samples in the open-set unlabeled dataset can be identified effectively via self-supervised contrastive learning.  ...  However, unlabeled data may include out-of-class samples in practice; these cannot have one-hot encoded labels from a closed set of classes in the labeled data, i.e., the unlabeled data is an open set.  ...  We observe that OpenCoS still consistently improves FixMatch-ft on the ViT encoder pre-trained via DINO. For example, OpenCoS improves the test accuracy on Food to 72.00% from FixMatch-ft's 66.95%.  ... 
arXiv:2107.08943v2 fatcat:yeimauemc5dd5ezaho2qz2hgxu
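As a rough illustration of the observation quoted above (out-of-class samples being identifiable from contrastive features), the hedged sketch below flags unlabeled embeddings that sit far from every labeled-class prototype; `flag_out_of_class` and the 0.3 threshold are placeholders, not OpenCoS's actual criterion.

```python
import numpy as np

def flag_out_of_class(unlabeled, labeled, labels, thresh=0.3):
    """Flag unlabeled embeddings whose best cosine similarity to any
    labeled-class prototype falls below `thresh` (an arbitrary value)."""
    u = unlabeled / np.linalg.norm(unlabeled, axis=1, keepdims=True)
    protos = np.stack([labeled[labels == c].mean(axis=0) for c in np.unique(labels)])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    best_sim = (u @ protos.T).max(axis=1)          # similarity to nearest prototype
    return best_sim < thresh                       # True -> likely out-of-class

# toy usage with random stand-ins for contrastive features
rng = np.random.default_rng(0)
lab, y = rng.normal(size=(50, 16)), rng.integers(0, 5, size=50)
print(flag_out_of_class(rng.normal(size=(8, 16)), lab, y))
```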

Boosting-GNN: Boosting Algorithm for Graph Networks on Imbalanced Node Classification

Shuhao Shi, Kai Qiao, Shuai Yang, Linyuan Wang, Jian Chen, Bin Yan
2021 Frontiers in Neurorobotics  
synthetic imbalanced datasets, with an average performance improvement of 4.5%.  ...  Traditional techniques for imbalanced datasets, such as resampling, reweighting, and synthetic sampling, are no longer directly applicable to GNNs.  ...  Ensemble learning methods are more effective at improving the classification performance on imbalanced data than data sampling techniques (Khoshgoftaar et al., 2015).  ... 
doi:10.3389/fnbot.2021.775688 pmid:34899230 pmcid:PMC8655128 fatcat:pphvv2e3xnhwbeh4pt3xqacxbm
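The snippet contrasts ensemble methods with data sampling for imbalance. As a reference point, here is the classic AdaBoost reweighting loop that boosting-style methods such as Boosting-GNN adapt; this sketch uses a decision stump as the base learner rather than a GNN, and the helper names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, rounds=10):
    """Minimal AdaBoost loop for binary labels in {-1, +1}: each round
    upweights the samples the previous learner got wrong, which is the
    reweighting mechanism boosting brings to imbalanced classification."""
    w = np.full(len(y), 1.0 / len(y))              # start from uniform weights
    ensemble = []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5:                             # no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)             # upweight the misclassified
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    return np.sign(sum(a * s.predict(X) for a, s in ensemble))

# toy 19:1 imbalanced problem
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (95, 2)), rng.normal(3, 1, (5, 2))])
y = np.array([-1] * 95 + [1] * 5)
print((adaboost_predict(adaboost_fit(X, y), X) == y).mean())
```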

Domain-Aware Contrastive Knowledge Transfer for Multi-domain Imbalanced Data [article]

Zixuan Ke, Mohammad Kachuee, Sungjin Lee
2022 arXiv   pre-print
In many real-world machine learning applications, samples belong to a set of domains e.g., for product reviews each review belongs to a product category.  ...  We evaluated the performance of DCMI on three different datasets showing significant improvements in different MIL scenarios.  ...  In many real-world scenarios, data naturally belongs to a set of domains e.g., for an online store, a potential domain assignment for each customer review can be defined based on the corresponding store  ... 
arXiv:2204.01916v1 fatcat:ptb5vgok5zc7fd3yvbxmuxeugu

On the impact of dataset size and class imbalance in evaluating machine-learning-based windows malware detection techniques [article]

David Illes
2022 arXiv   pre-print
The former is not a true representation of reality, where benign samples significantly outnumber malware, and the latter approach is known to be problematic for imbalanced problems.  ...  The purpose of this project was to collect and analyse data about the comparability and real-life applicability of published results focusing on Microsoft Windows malware, more specifically the impact  ...  if good accuracy on a balanced dataset reliably translates to good performance in a real-world (highly imbalanced) setting.  ... 
arXiv:2206.06256v1 fatcat:rgsbohaw7za53ftylj4hlwxsi4
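The snippet's closing question (does balanced-set accuracy transfer to an imbalanced deployment?) can be made concrete with a small base-rate calculation: a detector that looks strong on a 50/50 benchmark can still produce mostly false alarms at a realistic malware prevalence. The 90%/10% rates below are made-up illustration values.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Precision of a fixed detector once the malware base rate changes."""
    tp = tpr * prevalence              # true positives per scanned sample
    fp = fpr * (1.0 - prevalence)      # false positives per scanned sample
    return tp / (tp + fp)

# 90% TPR / 10% FPR looks strong on a 50/50 benchmark ...
print(precision_at_prevalence(0.9, 0.1, 0.5))    # 0.90
# ... but at a 1% malware rate most alerts are false positives
print(precision_at_prevalence(0.9, 0.1, 0.01))   # ~0.083
```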

DASO: Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning [article]

Youngtaek Oh, Dong-Jin Kim, In So Kweon
2022 arXiv   pre-print
reliably improves SSL learners with unlabeled data especially when both (1) class imbalance and (2) distribution mismatch dominate.  ...  The capability of the traditional semi-supervised learning (SSL) methods is far from real-world application due to severely biased pseudo-labels caused by (1) class imbalance and (2) class distribution  ...  Acknowledgements This research was supported by the National Research Foundation of Korea (NRF)'s program of developing and demonstrating innovative products based on public demand funded by the Korean  ... 
arXiv:2106.05682v2 fatcat:jv4ap75qfzcurfhtrvg3jm2ujq

Proceedings of the IJCAI 2017 Workshop on Learning in the Presence of Class Imbalance and Concept Drift (LPCICD'17) [article]

Shuo Wang, Leandro L. Minku, Nitesh Chawla, Xin Yao
2017 arXiv   pre-print
With the wide application of machine learning algorithms to the real world, class imbalance and concept drift have become crucial learning issues.  ...  Class imbalance happens when the data categories are not equally represented, i.e., at least one category is a minority compared to the other categories.  ...  ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In IJCNN 2008 (IEEE World Congress on Computational Intelligence), 2008.  ... 
arXiv:1707.09425v1 fatcat:zm6endqlbzfcldpxlreu72t4ya
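Since the snippet cites ADASYN, here is a minimal usage sketch, assuming the `imbalanced-learn` package is installed; the toy dataset and parameters are placeholders.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import ADASYN   # pip install imbalanced-learn

# toy imbalanced dataset: 900 majority vs. 100 minority points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (900, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 900 + [1] * 100)

# ADASYN interpolates new minority samples, generating more of them in
# neighborhoods where the minority class is hardest to learn
X_res, y_res = ADASYN(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))     # roughly balanced after resampling
```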

Few-Shot Learning with Class Imbalance [article]

Mateusz Ochal, Massimiliano Patacchiola, Amos Storkey, Jose Vazquez, Sen Wang
2021 arXiv   pre-print
Few-Shot Learning (FSL) algorithms are commonly trained through Meta-Learning (ML), which exposes models to batches of tasks sampled from a meta-dataset to mimic tasks seen during evaluation.  ...  Our analysis compares 10 state-of-the-art meta-learning and FSL methods on different imbalance distributions and rebalancing techniques.  ...  Most methods perform worse on the imbalanced tasks, despite having the same or higher number of support samples. Fig. 4: Comparing imbalance levels via support sets of different sizes.  ... 
arXiv:2101.02523v2 fatcat:6oxq2ax6zrhtjpt7fwocedfa3y
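As a sketch of the kind of controlled-imbalance setup the figure caption describes (support sets of different sizes), the hypothetical helper below draws long-tailed per-class shot counts for a few-shot task; it is not the authors' exact protocol, and rounding makes the total only approximate.

```python
import numpy as np

def sample_imbalanced_support(rng, n_classes: int, total_shots: int,
                              imbalance: float):
    """Draw per-class shot counts following a geometric (long-tailed)
    profile. imbalance in (0, 1]: 1.0 yields a balanced task, smaller
    values a steeper tail."""
    raw = imbalance ** np.arange(n_classes)        # geometric decay per class
    shots = np.maximum(1, np.round(total_shots * raw / raw.sum())).astype(int)
    return rng.permutation(shots)                  # randomize which class is the head

rng = np.random.default_rng(0)
# five classes, roughly 25 support samples in total, long-tailed
print(sample_imbalanced_support(rng, n_classes=5, total_shots=25, imbalance=0.5))
```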

Robust Domain Adaptation for Relation Extraction via Clustering Consistency

Minh Luan Nguyen, Ivor W. Tsang, Kian Ming A. Chai, Hai Leong Chieu
2014 Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
To overcome the lack of labeled samples for rarer relations, these clusterings operate on both the labeled and unlabeled data in the target domain.  ...  Our framework leverages both labeled and unlabeled data in the target domain.  ...  In this work, we study how to learn the target prediction using only a few seed instances, while dealing with negative transfer and imbalanced relation distribution explicitly.  ... 
doi:10.3115/v1/p14-1076 dblp:conf/acl/NguyenTCC14 fatcat:qqxzaooykndhneeb6kekbluvpq

Improving Detection of False Data Injection Attacks Using Machine Learning with Feature Selection and Oversampling

Ajit Kumar, Neetesh Saxena, Souhwan Jung, Bong Jun Choi
2021 Energies  
Hence, in this paper, we investigate the effectiveness of machine-learning algorithms in detecting False Data Injection Attacks (FDIAs).  ...  In particular, the injection of false data and commands into communication is one of the most common and fatal cyberattacks in critical infrastructures.  ...  Data from field devices are available to SCADA via PLCs and transferred to the Historian for analysis.  ... 
doi:10.3390/en15010212 fatcat:7zvo767f7nexvi6ytqyv5uokcy
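A hedged sketch of the general recipe this entry describes (feature selection plus oversampling in front of a standard classifier), assuming scikit-learn and imbalanced-learn; the synthetic data, `k=20`, and the random-forest choice are placeholders, not the authors' configuration.

```python
from imblearn.pipeline import Pipeline            # pip install imbalanced-learn
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score

# stand-in for an FDIA measurement dataset: 5% positive (attack) class
X, y = make_classification(n_samples=2000, n_features=40,
                           weights=[0.95, 0.05], random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),     # keep the 20 strongest features
    ("oversample", SMOTE(random_state=0)),        # resample only inside each training fold
    ("clf", RandomForestClassifier(random_state=0)),
])
print(cross_val_score(pipe, X, y, cv=5, scoring="f1").mean())
```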

One-Round Active Learning [article]

Tianhao Wang, Si Chen, Ruoxi Jia
2021 arXiv   pre-print
Specifically, we propose DULO, a data-driven framework for one-round active learning, wherein we learn a model to predict the model performance for a given dataset and then leverage this model to guide  ...  In this work, we initiate the study of one-round active learning, which aims to select a subset of unlabeled data points that achieve the highest model performance after being labeled with only the information  ...  In addition, [29] does not evaluate the robustness of their approach on noisy or imbalanced data, which are common in real-world applications.  ... 
arXiv:2104.11843v2 fatcat:fpra3qzj75fe7gacku5dxk25l4
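The snippet describes DULO's core loop: learn a predictor of downstream model performance, then use it to pick the subset to label. The sketch below shows only the greedy selection half; `utility_fn` stands in for the learned performance predictor, and the variance-based toy utility is purely illustrative.

```python
import numpy as np

def greedy_select(pool: np.ndarray, utility_fn, budget: int):
    """Greedily grow a subset of the unlabeled pool, each step adding the
    point that most raises the utility estimate of the whole subset.
    `utility_fn(subset_features) -> float` is assumed to be learned."""
    chosen, remaining = [], set(range(len(pool)))
    for _ in range(budget):
        best = max(remaining, key=lambda i: utility_fn(pool[chosen + [i]]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# toy utility: reward spread-out subsets (variance as a crude proxy)
pool = np.random.default_rng(0).normal(size=(200, 8))
print(greedy_select(pool, lambda s: float(np.var(s)), budget=5))
```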

Intent Classification of Short-Text on Social Media

Hemant Purohit, Guozhu Dong, Valerie Shalin, Krishnaprasad Thirunarayan, Amit Sheth
2015 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity)  
Social media platforms facilitate the emergence of citizen communities that discuss real-world events.  ...  Our results show a significant absolute gain of up to 7% in F1 score relative to a baseline using bottom-up processing alone, within the popular multiclass frameworks of One-vs-One and One-vs-All.  ...  We also acknowledge our colleagues at the Kno.e.sis Center, and collaborators at OSU and QCRI, for the invaluable discussion and feedback to improve our results.  ... 
doi:10.1109/smartcity.2015.75 dblp:conf/smartcity/PurohitDSTS15 fatcat:atcyriyfenbftf3eeyjb775dzu
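For readers unfamiliar with the two multiclass schemes the abstract mentions, here is a minimal scikit-learn sketch of One-vs-All (OneVsRest) and One-vs-One; the TF-IDF features, linear SVM, and toy intents are illustrative, not the paper's pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["need water and food", "offering shelter downtown", "roads are blocked"]
intents = ["request", "offer", "report"]

for scheme in (OneVsRestClassifier, OneVsOneClassifier):   # One-vs-All / One-vs-One
    clf = make_pipeline(TfidfVectorizer(), scheme(LinearSVC()))
    clf.fit(texts, intents)
    print(scheme.__name__, clf.predict(["anyone giving out blankets?"]))
```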

Conditional Synthetic Data Generation for Robust Machine Learning Applications with Limited Pandemic Data [article]

Hari Prasanna Das, Ryan Tran, Japjot Singh, Xiangyu Yue, Geoff Tison, Alberto Sangiovanni-Vincentelli, Costas J. Spanos
2021 arXiv   pre-print
As an example of downstream use of synthetic data, we show improvement in COVID-19 detection from CT scans with conditional synthetic data augmentation.  ...  We show that our method significantly outperforms existing models both on qualitative and quantitative performance, and our semi-supervised approach can efficiently synthesize conditional samples under  ...  As a downstream use of conditional synthetic data, we improve the performance of COVID-19 detectors based on CT scan data via synthetic data augmentation.  ... 
arXiv:2109.06486v1 fatcat:tn46kzbtabblrh44hf4q64zdfu

Noisy Channel Language Model Prompting for Few-Shot Text Classification [article]

Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
2022 arXiv   pre-print
We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context demonstration or prompt tuning.  ...  are imbalanced, or generalization to unseen labels is required.  ...  Data is imbalanced or |C| is large: When the training data is imbalanced, head tuning is uncompetitive, likely because the head relies too much on unconditional distributions of labels the model is exposed  ... 
arXiv:2108.04106v3 fatcat:jawyv2b4ingnjjwyy4gjwapcdy
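The channel idea quoted above scores the input given the label rather than the label given the input. A minimal sketch with a causal LM via Hugging Face transformers follows; "gpt2" and the verbalizer prompts are placeholders, and the per-token length normalization is a simplification of the paper's scoring.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def channel_score(label_prompt: str, input_text: str) -> float:
    """Average log-likelihood of input_text conditioned on the label prompt,
    i.e. a length-normalized stand-in for log P(input | label)."""
    prompt = tok(label_prompt, return_tensors="pt").input_ids
    target = tok(" " + input_text, return_tensors="pt").input_ids
    ids = torch.cat([prompt, target], dim=1)
    labels = ids.clone()
    labels[:, : prompt.shape[1]] = -100            # score only the input tokens
    with torch.no_grad():
        return -model(ids, labels=labels).loss.item()

def channel_classify(input_text: str, verbalizers: dict) -> str:
    # pick the label whose prompt best "generates" the input
    return max(verbalizers, key=lambda y: channel_score(verbalizers[y], input_text))

print(channel_classify("the movie was breathtaking",
                       {"positive": "A great review:", "negative": "A terrible review:"}))
```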

Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap [article]

Yongwei Chen, Zihao Wang, Longkun Zou, Ke Chen, Kui Jia
2022 arXiv   pre-print
quasi-balanced self-training designed for a more balanced data distribution via sparsity-driven selection of pseudo-labeled samples for long-tailed classes.  ...  However, learning from synthetic data may not generalize to practical scenarios, where point clouds are typically incomplete, non-uniformly distributed, and noisy.  ...  Its main challenge lies in learning to construct a class-balanced self-training set with diverse samples via selection and pseudo annotation of unlabeled target samples, whose data distribution is unknown  ... 
arXiv:2203.03833v2 fatcat:icwenq3awjh25dzzaef7lylxee
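As a rough illustration of the "class-balanced self-training set" idea in the snippet, the hypothetical helper below fills per-class quotas with the most confident pseudo-labeled samples, with tail classes given proportionally larger quotas; the paper's sparsity-driven criterion is not reproduced here.

```python
import numpy as np

def quasi_balanced_select(probs: np.ndarray, per_class_quota) -> np.ndarray:
    """Keep, for each pseudo-class c, its per_class_quota[c] most confident
    unlabeled samples. Larger relative quotas for tail classes counteract
    the head-class bias of plain confidence thresholding."""
    pseudo = probs.argmax(axis=1)                  # hard pseudo-labels
    conf = probs.max(axis=1)                       # their confidence
    keep = []
    for c, quota in enumerate(per_class_quota):
        idx = np.where(pseudo == c)[0]
        keep.extend(idx[np.argsort(-conf[idx])][:quota].tolist())
    return np.array(sorted(keep))

# toy: 100 unlabeled samples, 3 classes, tail class gets the largest quota
probs = np.random.default_rng(0).dirichlet(np.ones(3), size=100)
print(len(quasi_balanced_select(probs, [20, 10, 5])))
```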
Showing results 1 — 15 out of 1,601 results