
Language Models are Few-Shot Learners [article]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss (+19 others)
2020 arXiv   pre-print
Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.  ...  Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.  ...  discussions about ways to approach and evaluate bias, Harrison Edwards and Yura Burda for discussions and experimentation with in-context learning, Geoffrey Irving and Paul Christiano for early discussions of language  ... 
arXiv:2005.14165v4 fatcat:kilb2lujxfax3kgfiuotql2iyy
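
The "few-shot setting" described in this abstract is in-context learning: a handful of labelled demonstrations are written into the prompt and the frozen model simply continues the text, with no gradient updates. Below is a minimal sketch of that prompt construction; GPT-2 is used as a local stand-in for the API-only 175B model, and the sentiment task and examples are invented for illustration.

    # Few-shot in-context learning sketch: demonstrations go in the prompt, no fine-tuning.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")  # stand-in for GPT-3

    examples = [
        ("The movie was a delight from start to finish.", "positive"),
        ("I walked out halfway through.", "negative"),
        ("A masterpiece of quiet storytelling.", "positive"),
    ]
    query = "The plot made no sense at all."

    prompt = "".join(f"Review: {text}\nSentiment: {label}\n\n" for text, label in examples)
    prompt += f"Review: {query}\nSentiment:"

    # The demonstrations in the context are the only supervision the model receives.
    completion = generator(prompt, max_new_tokens=2, do_sample=False)[0]["generated_text"]
    print(completion[len(prompt):].strip())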

Language Models are Few-shot Multilingual Learners [article]

Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Rosanne Liu, Jason Yosinski, Pascale Fung
2021 arXiv   pre-print
Finally, we find the in-context few-shot cross-lingual prediction results of language models are significantly better than random prediction, and they are competitive compared to the existing state-of-the-art  ...  We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.  ...  the few-shot learning, where the source language and target language are German and English, respectively, and the performance increases as the models are given more samples.  ... 
arXiv:2109.07684v1 fatcat:h5uejjfgebcc7ci3zfsosbmzke
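
The cross-lingual few-shot setup described above keeps the demonstrations in English and only changes the language of the test sentence; the frozen model then picks the candidate label it finds most likely. A rough sketch under assumed choices (GPT-2 rather than the multilingual models evaluated in the paper, and an invented sentiment task):

    # Score each candidate label by the log-probability the LM assigns to it
    # after an English-demonstration prompt that ends in a German test sentence.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    prompt = (
        "Sentence: I love this phone. Sentiment: positive\n"
        "Sentence: The battery died after a day. Sentiment: negative\n"
        "Sentence: Das Essen war ausgezeichnet. Sentiment:"  # non-English query
    )

    def label_score(label: str) -> float:
        full = tok(prompt + " " + label, return_tensors="pt")
        n_prompt = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
        with torch.no_grad():
            logits = model(**full).logits
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        next_ids = full["input_ids"][0, 1:]
        token_lp = log_probs[torch.arange(len(next_ids)), next_ids]
        return token_lp[n_prompt - 1:].sum().item()  # log-prob of the label tokens only

    print(max(["positive", "negative"], key=label_score))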

It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners [article]

Timo Schick, Hinrich Schütze
2021 arXiv   pre-print
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance.  ...  We show that performance similar to GPT-3 can be obtained with language models that are much "greener" in that their parameter count is several orders of magnitude smaller.  ...  Very recently, Brown et al. (2020) introduced GPT-3, a pretrained LM with an enormous 175 billion parameters, and showed that it has amazing few-shot abilities: By reformulating tasks as language modeling  ... 
arXiv:2009.07118v2 fatcat:o7bw7trdyfaipd7qrrtx6ku37u
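
The "reformulating tasks as language modeling" mentioned in the snippet is the cloze-style pattern-verbalizer setup (PET) that this paper builds on: the input is wrapped in a template containing a mask token, and a verbalizer maps label words to classes. A toy sketch with an assumed pattern and verbalizer, using an ordinary masked LM:

    # Cloze reformulation sketch (PET-style): classification becomes predicting
    # the masked word; the pattern and verbalizer here are illustrative only.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    verbalizer = {"great": "positive", "terrible": "negative"}

    def classify(review: str) -> str:
        pattern = f"{review} It was {fill.tokenizer.mask_token}."
        scores = {word: fill(pattern, targets=[word])[0]["score"] for word in verbalizer}
        return verbalizer[max(scores, key=scores.get)]

    print(classify("A gripping film with wonderful performances."))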

Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis [article]

Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
2021 arXiv   pre-print
Besides strong results on 40+ languages, the framework demonstrates capabilities to adapt to new languages under extreme low-resource and even few-shot scenarios of merely 40s transcribed recording, without  ...  Exhaustive comparative and ablation studies are performed to reveal the potential of the framework for low-resource languages.  ...  a few-shot spoken language learner.  ... 
arXiv:2103.03541v2 fatcat:z7xtjh723rey3gyrzhrrepj3ea

Language Models are Few-shot Multilingual Learners

Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Rosanne Liu, Jason Yosinski, Pascale Fung
2021 Proceedings of the 1st Workshop on Multilingual Representation Learning   unpublished
Finally, we find the in-context few-shot cross-lingual prediction results of language models are significantly better than random prediction, and they are competitive compared to the existing state-of-the-art  ...  We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.  ...  on the few-shot learning, where the source language and target language are German and English, respectively, and the performance increases as the models are given more samples.  ... 
doi:10.18653/v1/2021.mrl-1.1 fatcat:czv4znd6g5gexgduqrq6j4rgcu

It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

Timo Schick, Hinrich Schütze
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies   unpublished
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance.  ...  We show that performance similar to GPT-3 can be obtained with language models that are much "greener" in that their parameter count is several orders of magnitude smaller.  ...  Exploiting cloze questions for few shot text classification and natural language inference.  ... 
doi:10.18653/v1/2021.naacl-main.185 fatcat:su6c4nod5zc3pg24ojxhvzhv24

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners [article]

Ningyu Zhang, Luoqiu Li, Xiang Chen, Shumin Deng, Zhen Bi, Chuanqi Tan, Fei Huang, Huajun Chen
2022 arXiv   pre-print
Large-scale pre-trained language models have contributed significantly to natural language processing by demonstrating remarkable abilities as few-shot learners.  ...  This study proposes a novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART), which can convert small language models into better few-shot learners without any prompt engineering  ...  Hyperparameters are provided in Appendix A.1.  ... 
arXiv:2108.13161v7 fatcat:h5a52v35h5cvxevnzlr45qnuoi
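
The "differentiable prompt" idea is to replace hand-written prompt text with a few vectors optimized directly in embedding space while the pretrained backbone stays fixed. The sketch below shows one gradient step of that kind under assumed choices (a BERT backbone, five prompt vectors, a toy cloze pattern and label word); the paper's full DART method differs in its details.

    # Differentiable-prompt sketch: only the prompt vectors are trained; the
    # masked LM is frozen. Dimensions, pattern, and label word are assumptions.
    import torch
    import torch.nn as nn
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    backbone = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    for p in backbone.parameters():
        p.requires_grad = False                      # backbone stays fixed

    n_prompt = 5
    prompt_embeds = nn.Parameter(torch.randn(n_prompt, backbone.config.hidden_size) * 0.02)
    optimizer = torch.optim.AdamW([prompt_embeds], lr=1e-3)

    def mask_logits(text: str) -> torch.Tensor:
        enc = tok(f"{text} It was {tok.mask_token}.", return_tensors="pt")
        word_embeds = backbone.get_input_embeddings()(enc["input_ids"])
        # Prepend the trainable prompt vectors to the token embeddings.
        inputs = torch.cat([prompt_embeds.unsqueeze(0), word_embeds], dim=1)
        attn = torch.ones(inputs.shape[:2], dtype=torch.long)
        logits = backbone(inputs_embeds=inputs, attention_mask=attn).logits
        mask_pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero().item() + n_prompt
        return logits[0, mask_pos]

    # One optimization step on a single labelled example.
    target = torch.tensor([tok.convert_tokens_to_ids("great")])
    loss = nn.functional.cross_entropy(mask_logits("A gripping film.").unsqueeze(0), target)
    loss.backward()
    optimizer.step()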

Evaluating few shot and Contrastive learning Methods for Code Clone Detection [article]

Mohamad Khajezade, Fatemeh Hendijani Fard, Mohamed S. Shehata
2022 arXiv   pre-print
Method: We assess the generalizability of the state of the art models for CCD in few shot settings (i.e., only a few samples are available for fine-tuning) by setting three scenarios: i) unseen problems  ...  Objective: The main objective of this research is to assess the ability of the CCD models as well as few shot learning algorithms for unseen programming problems and new languages (i.e., the model is not  ...  Few shot learning models are widely studied in computer vision [31, 33, 36] and there are few works conducted on using few shot learning for natural language processing [4, 45].  ... 
arXiv:2204.07501v2 fatcat:z6bhugzq6rhgpl6jyj4wydwdhi

LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework [article]

Mengjie Zhao, Fei Mi, Yasheng Wang, Minglei Li, Xin Jiang, Qun Liu, Hinrich Schütze
2022 arXiv   pre-print
Vast efforts have been devoted to creating high-performance few-shot learners, i.e., large-scale pretrained language models (PLMs) that perform well with little downstream task training data.  ...  The rationale is that crowdsourcing workers are in fact few-shot learners: They are shown a few illustrative examples to learn about a task and then start annotating.  ...  Related Work Few-shot learners in NLP.  ... 
arXiv:2112.07522v2 fatcat:yn2txs3mznh53a5ipmi3mhhvy4

Few-shot Learning with Meta Metric Learners [article]

Yu Cheng, Mo Yu, Xiaoxiao Guo, Bowen Zhou
2019 arXiv   pre-print
Existing meta-learning or metric-learning based few-shot learning approaches are limited in handling diverse domains with various number of labels.  ...  Few-shot Learning aims to learn classifiers for new classes with only a few training examples per class.  ...  Finally, we would like to move forward to apply the current framework in other applications, such as language modeling [19], machine translation [20] and vision applications [21].  ... 
arXiv:1901.09890v1 fatcat:ssekfocxqzdkle6ie7ltw3r34i
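
The metric-learning side of this family is easiest to see in the prototypical setup: each class is represented by the mean ("prototype") of its few support embeddings, and a query is assigned to the nearest prototype; the meta-learner's role is then to adapt the metric or base learner across tasks. A bare-bones sketch, with random features standing in for a real encoder:

    # Prototype-based few-shot classification sketch (random features as a stand-in).
    import torch

    n_way, k_shot, dim = 3, 5, 64                  # 3 classes, 5 support examples each
    support = torch.randn(n_way, k_shot, dim)      # embedded support set
    query = torch.randn(dim)                       # embedded query example

    prototypes = support.mean(dim=1)               # one prototype per class
    distances = torch.cdist(query.unsqueeze(0), prototypes).squeeze(0)
    print(distances.argmin().item())               # predicted class index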

CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment [article]

Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei
2022 arXiv   pre-print
In this work, we empirically show that CLIP can be a strong vision-language few-shot learner by leveraging the power of language.  ...  However, after being pre-trained by language supervision from a large amount of image-caption pairs, CLIP itself should also have acquired some few-shot abilities for vision-language tasks.  ...  Acknowledgements Haoyu Song, Wei-Nan Zhang, and Ting Liu are supported by the Science and Technology Innovation 2030 Major Project of China (No.2020AAA0108605), National Natural Science Foundation of China  ... 
arXiv:2203.07190v1 fatcat:whf2ljh2mjfa5l4wsbr5dpvktq
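
"Leveraging the power of language" here means that candidate answers are written out as text prompts and scored against the image with CLIP's joint embedding, rather than training a task-specific head. A minimal sketch with an assumed checkpoint, image file, and prompts; the paper's VQA and visual-entailment setups are more elaborate.

    # Zero-/few-shot classification with CLIP by scoring text prompts against an image.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("example.jpg")              # hypothetical input image
    candidates = ["a photo of a dog", "a photo of a cat", "a photo of a bird"]

    inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(candidates[probs.argmax().item()])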

GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain [article]

Milad Moradi, Kathrin Blagec, Florian Haberl, Matthias Samwald
2021 arXiv   pre-print
However, the ability of these large language models in few-shot transfer learning has not yet been explored in the biomedical domain.  ...  However, in-domain pretraining seems not to be sufficient; novel pretraining and few-shot learning strategies are required in the biomedical NLP domain.  ...  BioBERT, in few-shot settings to figure out whether large language models are proficient few-shot learners in the biomedical domain.  ... 
arXiv:2109.02555v1 fatcat:3mmirkmyzfdmlgrij6ddro2mhu

Rapid Adaptation with Conditionally Shifted Neurons [article]

Tsendsuren Munkhdalai, Xingdi Yuan, Soroush Mehri, Adam Trischler
2018 arXiv   pre-print
On metalearning benchmarks from the vision and language domains, models augmented with conditionally shifted neurons achieve state-of-the-art results.  ...  Few-shot Language Modeling: To evaluate the effectiveness of recurrent models with conditionally shifted neurons, we ran experiments on the few-shot Penn Treebank (PTB) language modeling task introduced  ...  These results indicate that our model's few-shot language modelling capabilities far exceed those of Matching Networks (Vinyals et al., 2016).  ... 
arXiv:1712.09926v3 fatcat:6kcgr64nlzguzcv74u3icw5ghi

A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models [article]

Woojeong Jin, Yu Cheng, Yelong Shen, Weizhu Chen, Xiang Ren
2022 arXiv   pre-print
To solve this limitation, we study prompt-based low-resource learning of VL tasks with our proposed method, FewVLM, relatively smaller than recent few-shot learners.  ...  For FewVLM, we pre-train a sequence-to-sequence transformer model with prefix language modeling (PrefixLM) and masked language modeling (MaskedLM).  ...  Conclusion In this work, we present FEWVLM, a few-shot prompt-based learner on vision-language tasks.  ... 
arXiv:2110.08484v2 fatcat:dsfqfdvlhbenpfkgjgqlyulaba
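
Prefix language modeling, one of the two pre-training objectives named in the snippet, splits a sequence into a prefix the encoder reads and a continuation the decoder must generate. A text-only sketch of that objective, with T5 as an assumed stand-in for FewVLM's vision-language encoder-decoder (which applies it to image-conditioned captions):

    # PrefixLM sketch: encoder sees the prefix, decoder is trained on the continuation.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tok = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    text = "a group of people riding horses on a sandy beach at sunset"
    words = text.split()
    prefix, continuation = " ".join(words[:6]), " ".join(words[6:])

    batch = tok(prefix, return_tensors="pt")
    labels = tok(continuation, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss     # prefix-LM loss for one example
    loss.backward()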

A Hybrid Approach with Optimization and Metric-based Meta-Learner for Few-Shot Learning [article]

Duo Wang, Yu Cheng, Mo Yu, Xiaoxiao Guo, Tao Zhang
2019 arXiv   pre-print
Our meta-metric-learning approach consists of two components, a task-specific metric-based learner as a base model, and a meta-learner that learns and specifies the base model.  ...  The task-specific classifiers are required to be homogeneous-structured to ease the parameter prediction, so the meta-learning approaches could only handle few-shot learning problems where the tasks share  ...  The aforementioned deep few-shot learning models usually are applied to the so-called "k-shot, N-way" scenario, in which each few-shot learning task has the same N number of class labels and each label  ... 
arXiv:1904.03014v2 fatcat:lkdqydb5e5dyrmx5u5j7f4btoa