738 Hits in 9.8 sec

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [article]

Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt
2022 arXiv   pre-print
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness.  ...  Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups."  ...  Eysenbach, David Fleet, Pieter-Jan Kindermans, Mohammad Norouzi, Sarah Pratt and Vivek Ramanujan for helpful discussions and draft feedback, Lucas Beyer and Xiaohua Zhai for assistance with ViT-G/14 fine-tuning  ... 
arXiv:2203.05482v3 fatcat:hirw3ny7unfarmfmemny7wgya4
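
The uniform variant of this recipe is easy to picture in code. The sketch below is illustrative only: it assumes the fine-tuned checkpoints share one architecture and are stored as PyTorch state_dicts, and the file names are made up.

```python
import torch

def uniform_soup(checkpoint_paths):
    """Average the parameters of several fine-tuned checkpoints (a 'uniform soup')."""
    soup = None
    for path in checkpoint_paths:
        state = torch.load(path, map_location="cpu")
        if soup is None:
            soup = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in soup:
                soup[k] += state[k].float()
    return {k: v / len(checkpoint_paths) for k, v in soup.items()}

# Hypothetical usage: the souped weights load into a single model, so inference
# cost stays that of one model, unlike an output-averaging ensemble.
# model.load_state_dict(uniform_soup(["ft_lr1e-5.pt", "ft_lr3e-5.pt", "ft_lr1e-4.pt"]))
```

The paper also reports a "greedy soup" that adds a checkpoint only if held-out accuracy improves; that selection loop is omitted from the sketch.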

Superpixel Soup: Monocular Dense 3D Reconstruction of a Complex Dynamic Scene [article]

Suryansh Kumar, Yuchao Dai, Hongdong Li
2019 arXiv   pre-print
Consequently, our model of a dynamic scene reduces to a soup of planar structures and rigid motion of these local planar structures.  ...  The prevailing idea to solve this task is composed of a sequence of steps and depends on the success of several pipelines in its execution.  ...  to infer distinct motion models of multiple rigidly moving objects in the scene.  ... 
arXiv:1911.09092v1 fatcat:elfkvqnrmbbybcf7pcbymvdwwy

AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [article]

Yaqing Wang, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
2022 arXiv   pre-print
Fine-tuning large-scale pre-trained language models to downstream tasks requires updating hundreds of millions of parameters.  ...  By only tuning 0.23% of a pre-trained language model's parameters, our model outperforms full model fine-tuning and several competing methods.  ...  Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. arXiv preprint arXiv:2203.05482, 2022.  ... 
arXiv:2205.12410v1 fatcat:5ervotcw2zgyxb6erwyoichu4i
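
As a rough picture of a mixture-of-adapters layer (not the paper's exact routing or merging scheme; the class name and sizes below are invented for illustration), several small bottleneck adapters sit on a frozen backbone layer, one adapter is sampled per step during tuning, and the adapters can be averaged into a single adapter before inference:

```python
import random

import torch
import torch.nn as nn


class MixtureOfAdapters(nn.Module):
    """Illustrative mixture-of-adapters block: a few bottleneck adapters share
    one frozen backbone layer; only the adapter parameters are trained."""

    def __init__(self, hidden_dim=768, bottleneck=16, num_adapters=4):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, bottleneck),
                nn.ReLU(),
                nn.Linear(bottleneck, hidden_dim),
            )
            for _ in range(num_adapters)
        )

    @torch.no_grad()
    def merge(self):
        """Average all adapters' weights into adapter 0 for single-adapter inference."""
        for params in zip(*(a.parameters() for a in self.adapters)):
            params[0].copy_(torch.stack([p.data for p in params]).mean(dim=0))

    def forward(self, x):
        if self.training:
            adapter = random.choice(self.adapters)  # stochastic routing while tuning
        else:
            adapter = self.adapters[0]              # use the merged adapter at inference
        return x + adapter(x)                       # residual adapter update
```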

Smartening the crowds

Gang Liu, Guang Xiang, Bryan A. Pendleton, Jason I. Hong, Wenyin Liu
2011 Proceedings of the Seventh Symposium on Usable Privacy and Security - SOUPS '11  
Furthermore, we present our experiments using clustering techniques and vote weighting to improve the results of human effort in fighting phishing.  ...  Using tasks posted to the Amazon Mechanical Turk human effort market, we measure the accuracy of minimally trained humans in identifying potential phish, and consider methods for best taking advantage  ...  As  increases, the accuracy first increases a little and then drops down quickly while the average time cost increases in a small range.  ... 
doi:10.1145/2078827.2078838 dblp:conf/soups/LiuXPHL11 fatcat:zqdankxpmnfi5iuoxoxl7o5sim
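
One simple form of vote weighting, sketched here only to convey the flavour of the approach (all names are hypothetical and this is not necessarily the paper's scheme), weights each worker's phish/legitimate vote by that worker's accuracy on gold-standard tasks:

```python
def weighted_phish_vote(votes, worker_accuracy, threshold=0.5):
    """Aggregate crowd votes on whether a URL is a phish.

    votes: dict mapping worker_id -> 1 (phish) or 0 (legitimate)
    worker_accuracy: dict mapping worker_id -> accuracy on gold-standard tasks
    """
    total = sum(worker_accuracy.get(w, 0.5) for w in votes)
    phish_mass = sum(worker_accuracy.get(w, 0.5) for w, v in votes.items() if v == 1)
    return (phish_mass / total) >= threshold if total > 0 else False

# Example: two reliable workers flag the URL, one weaker worker disagrees.
# print(weighted_phish_vote({"a": 1, "b": 0, "c": 1}, {"a": 0.9, "b": 0.6, "c": 0.7}))
```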

Merging Models with Fisher-Weighted Averaging [article]

Michael Matena, Colin Raffel
2022 arXiv   pre-print
We first show that our "Fisher merging" technique provides a performance boost in settings where simple parameter averaging is currently used -- specifically, robust fine-tuning and model ensembling.  ...  Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities.  ...  [67] introduced the "Model Soup" approach where fine-tuned models with different hyperparameter settings are averaged to improve performance.  ... 
arXiv:2111.09832v2 fatcat:n6bosnqxvffx5ka2tsj7rdhee4
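
At its core, Fisher merging replaces the uniform average with a per-parameter weighted average, using each model's diagonal Fisher estimate as the weight. The sketch below assumes those Fisher estimates have already been computed as dicts of tensors aligned with the state_dicts; it is a simplified reading of the idea, not the authors' code.

```python
import torch

def fisher_merge(state_dicts, fishers, eps=1e-8):
    """Fisher-weighted parameter averaging:
    merged[k] = sum_i F_i[k] * theta_i[k] / sum_i F_i[k], element-wise."""
    merged = {}
    for name in state_dicts[0]:
        weights = torch.stack([f[name].float() for f in fishers]) + eps  # avoid divide-by-zero
        params = torch.stack([sd[name].float() for sd in state_dicts])
        merged[name] = (weights * params).sum(dim=0) / weights.sum(dim=0)
    return merged

# The diagonal Fisher for each model is typically estimated from the average
# squared gradient of that model's log-likelihood on its own data.
```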

PaLI: A Jointly-Scaled Multilingual Language-Image Model [article]

Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver (+17 others)
2022 arXiv   pre-print
PaLI (Pathways Language and Image model) extends this approach to the joint modeling of language and vision.  ...  PaLI achieves state-of-the-art in multiple vision and language tasks (such as captioning, visual question-answering, scene-text understanding), while retaining a simple, modular, and scalable design.  ...  The only other difference is that we apply learning rate cool-down twice, once with and once without inception crop augmentation, and average ("soup") the weights of the two models as in Wortsman et al  ... 
arXiv:2209.06794v2 fatcat:uztv6y57cjabvg2kzravpkswqe

ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models

Andrei Barbu, David Mayo, Julian Alverio, William Luo, Christopher Wang, Dan Gutfreund, Josh Tenenbaum, Boris Katz
2019 Neural Information Processing Systems  
Controls make ObjectNet robust to fine-tuning, showing only small performance increases.  ...  This has resulted in models that must be fine-tuned for new datasets and perform better on datasets than in real-world applications.  ...  We would like to thank the members of CBMM, particularly the postdoc group, for many wonderful and productive discussions.  ... 
dblp:conf/nips/BarbuMALWGTK19 fatcat:zzi7ecnce5dfbnkemjcrombqqe

Lottery Pools: Winning More by Interpolating Tickets without Increasing Training or Inference Cost [article]

Lu Yin, Shiwei Liu, Fang Meng, Tianjin Huang, Vlado Menkovski, Mykola Pechenizkiy
2022 arXiv   pre-print
Ensembling, in parallel, is one of the oldest time-proven tricks in machine learning for improving performance by combining the outputs of multiple independent models.  ...  In this work, we first observe that directly averaging the weights of the adjacent learned subnetworks significantly boosts the performance of LTs.  ...  Acknowledgments This work used the Dutch national einfrastructure with the support of the SURF Cooperative using grant no. NWO-2021.060.  ... 
arXiv:2208.10842v2 fatcat:p6cqdm7u3zhojoiajji5hpc7ci
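
The averaging step the abstract refers to can be pictured as plain linear interpolation between the weights of subnetworks found at adjacent pruning rounds; the helper below is a hedged sketch (the function name and coefficient are mine, and the paper's actual pooling procedure may differ):

```python
def interpolate_tickets(ticket_a, ticket_b, alpha=0.5):
    """Element-wise interpolation (1 - alpha) * theta_a + alpha * theta_b between
    two lottery-ticket state_dicts; alpha can be chosen on held-out data."""
    return {k: (1 - alpha) * ticket_a[k].float() + alpha * ticket_b[k].float()
            for k in ticket_a}
```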

Out of Context: A New Clue for Context Modeling of Aspect-based Sentiment Analysis [article]

Bowen Xing, Ivor W. Tsang
2022 arXiv   pre-print
And the weighted sum of context hidden states is used as the final representation fed to the classifier.  ...  However, the information related to the given aspect may be already discarded and adverse information may be retained in the context modeling processes of existing models.  ...
Models                | Training Time (per epoch) | Inference Time (per sample) | GPU Memory | Acc
RAM (Bi-AA) + AAGCN   | 4.8s                      | 0.7ms                       | 1.2G       | 77.3%
RAM + AAGCN + AABERT3 | 14.4s                     | 2ms                         | 7.2G       | 81.2%
Improvement of BERT   | 300%                      | 300%                        | 600%       | 5%
arXiv:2106.10816v2 fatcat:xkwguhqxxfgbzasn7kxoiifn6q
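
The "weighted sum of context hidden states" in the snippet is standard attention-style pooling conditioned on the aspect; the sketch below is a generic version with invented names, not the authors' model:

```python
import torch
import torch.nn.functional as F

def aspect_weighted_context(hidden_states, aspect_vector):
    """Score each context hidden state against the aspect representation,
    normalise the scores with softmax, and return the weighted sum.

    hidden_states: (seq_len, dim) context encoder outputs
    aspect_vector: (dim,) aspect representation
    """
    scores = hidden_states @ aspect_vector   # (seq_len,)
    weights = F.softmax(scores, dim=0)       # attention weights over the context
    return weights @ hidden_states           # (dim,) representation fed to the classifier
```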

Classification Aware Neural Topic Model and its Application on a New COVID-19 Disinformation Corpus [article]

Xingyi Song, Johann Petrak, Ye Jiang, Iknoor Singh, Diana Maynard, Kalina Bontcheva
2020 arXiv   pre-print
The explosion of disinformation related to the COVID-19 pandemic has overloaded fact-checkers and media worldwide.  ...  We demonstrate that CANTM efficiently improves classification performance with low resources, and is scalable.  ...  Our model CANTM shows an almost 5% increase in accuracy and a further F1 improvement of more than 1% over BERT.  ... 
arXiv:2006.03354v1 fatcat:tmwveuw7vng73jsua2gqhnumti

MobiVQA

Qingqing Cao, Prerna Khanna, Nicholas D. Lane, Aruna Balasubramanian
2022 Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies  
However, existing VQA applications use deep learning models that significantly improve accuracy, but are computationally heavy.  ...  We show, using extensive evaluation on two VQA testbeds and two mobile platforms, that MobiVQA significantly improves latency and energy with minimal accuracy loss compared to state-of-the-art VQA models  ...  The shorter training time during fine-tuning forces the VQA model to learn the mapping (i.e. adjusting quickly to fit the task labels) between grid and region features for the specific VQA task.  ... 
doi:10.1145/3534619 fatcat:malh4ljosjdixddfpjlofah6v4

Flamingo: a Visual Language Model for Few-Shot Learning [article]

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford (+15 others)
2022 arXiv   pre-print
On many of these benchmarks, Flamingo actually surpasses the performance of models that are fine-tuned on thousands of times more task-specific data.  ...  an event, and close-ended tasks such as multiple choice visual question-answering.  ...  Acknowledgements We would like to thank many of our colleagues for useful discussions, suggestions, feedback, and advice, including: Relja Arandjelović, Kareem Ayoub, Lorrayne Bennett, Adria Recasens Continente  ... 
arXiv:2204.14198v1 fatcat:5f4uhdmaibhm7cn3zetspjev3q

MeanSum: A Neural Model for Unsupervised Multi-document Abstractive Summarization [article]

Eric Chu, Peter J. Liu
2019 arXiv   pre-print
We show through automated metrics and human evaluation that the generated summaries are highly abstractive, fluent, relevant, and representative of the average sentiment of the input reviews.  ...  Our proposed model consists of an auto-encoder where the mean of the representations of the input reviews decodes to a reasonable summary-review while not relying on any review-specific features.  ...  ACKNOWLEDGMENTS We thank Kai Chen, Trieu Trinh, David Grangier, and Jie Ren for helpful comments on the manuscript, and Jeff Dean, Samy Bengio, Claire Cui, and Deb Roy for their support of this work.  ... 
arXiv:1810.05739v4 fatcat:l63eyf5qqfbgpgiqpcvbp2ycfa
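
The combination step the abstract describes reduces to averaging in latent space. In the sketch below, `encoder` and `decoder` are placeholders for the paper's trained autoencoder components, so this is a shape-level illustration rather than a reproduction of the model:

```python
import torch

def summarize_reviews(reviews, encoder, decoder):
    """MeanSum-style combination: encode each input review, average the latent
    codes, and decode the mean vector into a summary review."""
    codes = torch.stack([encoder(r) for r in reviews])  # (num_reviews, latent_dim)
    mean_code = codes.mean(dim=0)                       # shared representation
    return decoder(mean_code)                           # generated summary
```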

Out of Context: A New Clue for Context Modeling of Aspect-based Sentiment Analysis

Bowen Xing, Ivor W. Tsang
2022 The Journal of Artificial Intelligence Research  
And the weighted sum of context hidden states is used as the final representation fed to the classifier.  ...  However, the information related to the given aspect may be already discarded and adverse information may be retained in the context modeling processes of existing models.  ...
Models                | Training Time (per epoch) | Inference Time (per sample) | GPU Memory | Acc
RAM (Bi-AA) + AAGCN   | 4.8s                      | 0.7ms                       | 1.2G       | 77.3%
RAM + AAGCN + AABERT3 | 14.4s                     | 2ms                         | 7.2G       | 81.2%
Improvement of BERT   | 300%                      | 300%                        | 600%       | 5%
doi:10.1613/jair.1.13410 fatcat:gus27fk7mbb6fekik67cn6qoaa

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation [article]

Junnan Li, Ramprasaath R. Selvaraju, Akhilesh Deepak Gotmare, Shafiq Joty, Caiming Xiong, Steven Hoi
2021 arXiv   pre-print
On VQA and NLVR^2, ALBEF achieves absolute improvements of 2.37 while enjoying faster inference speed. Code and pre-trained models are available at https://github.com/salesforce/ALBEF/.  ...  ALBEF achieves state-of-the-art performance on multiple downstream vision-language tasks.  ...  We fine-tune the model for 10 epochs, using a batch size of 128 and an initial learning rate of 2e −5 .  ... 
arXiv:2107.07651v2 fatcat:o7pwaj3b5bhpffklhp2obbxr7m
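
The "momentum" in the title refers to a teacher copy of the model updated as an exponential moving average of the online weights. The helper below shows that EMA update in isolation; the coefficient is a typical value rather than one taken from the paper:

```python
import torch

@torch.no_grad()
def update_momentum_model(model, momentum_model, m=0.995):
    """EMA update used in momentum distillation: the momentum (teacher) parameters
    track an exponential moving average of the online model's parameters."""
    for p, p_m in zip(model.parameters(), momentum_model.parameters()):
        p_m.data.mul_(m).add_(p.data, alpha=1 - m)
```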
Showing results 1 — 15 out of 738 results