5,514 Hits in 8.4 sec

Distributional Modeling on a Diet: One-shot Word Learning from Text Only [article]

Su Wang, Stephen Roller, Katrin Erk
2017 arXiv   pre-print
We test whether distributional models can do one-shot learning of definitional properties from text only.  ...  Our experiments show that our model can learn properties from a single exposure when given an informative utterance.  ...  Further, to the best of our knowledge, ours is the first attempt at one-shot property learning from text only.  ...
arXiv:1704.04550v4 fatcat:25pi5iwmabckbicyyxl3zyrbxy

Distributional model on a diet: one-shot word learning from text only [article]

Su Wang
2018
We test whether distributional models can do one-shot learning of definitional properties from text only.  ...  Our experiments show that our model can learn properties from a single exposure when given an informative utterance.  ...  Further, to the best of our knowledge, ours is the first attempt at one-shot property learning from text only.  ...
doi:10.15781/t29p2wp5n fatcat:4xo3x7jnmvdr5mzqxtz64ygxfa

Few-shot Text Classification with Distributional Signatures [article]

Yujia Bao, Menghua Wu, Shiyu Chang, Regina Barzilay
2020 arXiv   pre-print
Thus, rather than learning solely from words, our model also leverages their distributional signatures, which encode pertinent word occurrence patterns.  ...  We demonstrate that our model consistently outperforms prototypical networks learned on lexical knowledge (Snell et al., 2017) in both few-shot text classification and relation classification by a significant  ...  Thus, we focus on learning the connection between word importance and distributional signatures. As a result, our model can reliably identify important features from novel classes.  ... 
arXiv:1908.06039v3 fatcat:bbddbkpop5gynaloacfxnuib3q
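A rough illustration of the general idea behind distributional signatures (word-occurrence statistics used as a proxy for word importance, rather than word identity): the toy corpus, the inverse-frequency weighting, and the scoring function below are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch: score words by an inverse-frequency "distributional signature"
# instead of treating every word equally. Corpus and weighting are toy choices.
from collections import Counter

corpus = [
    "the movie was great and the acting was great",
    "the plot was dull but the soundtrack was great",
    "terrible pacing and a dull script",
]

# Unigram statistics over the (toy) source pool.
counts = Counter(w for doc in corpus for w in doc.split())
total = sum(counts.values())

def importance(word, eps=1e-3):
    # Frequent words (e.g. "the", "was") get low weight; rarer, more
    # class-indicative words (e.g. "dull", "terrible") get high weight.
    p = counts[word] / total if word in counts else 0.0
    return eps / (eps + p)

query = "a dull movie with terrible acting"
print({w: round(importance(w), 3) for w in query.split()})
```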

Memory, Show the Way: Memory Based Few Shot Word Representation Learning

Jingyuan Sun, Shaonan Wang, Chengqing Zong
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
Distributional semantic models (DSMs) generally require sufficient examples for a word to learn a high-quality representation.  ...  This is in stark contrast with humans, who can guess the meaning of a word from only one or a few referents.  ...  Each target word corresponds to only one sentence extracted from its Wikipedia definition as context.  ...
doi:10.18653/v1/d18-1173 dblp:conf/emnlp/SunWZ18 fatcat:skgqcydy2zbbnlmvjufwels6mq
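As a loose illustration of inferring a representation for an unseen word from a single definitional sentence, the sketch below simply averages vectors of the context words; the random embedding table and the averaging rule are placeholder assumptions, not the paper's memory-based model.

```python
# Toy sketch: build a vector for an unseen word from one definition sentence
# by averaging pre-trained vectors of its context words. The embedding table
# here is random and stands in for real pre-trained vectors.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["a", "small", "domesticated", "feline", "animal"]
embeddings = {w: rng.normal(size=50) for w in vocab}  # placeholder vectors

definition = "a small domesticated feline animal"  # single context sentence
context_vecs = [embeddings[w] for w in definition.split() if w in embeddings]
one_shot_vector = np.mean(context_vecs, axis=0)
print(one_shot_vector.shape)  # (50,)
```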

Language Models are Few-Shot Learners [article]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss (+19 others)
2020 arXiv   pre-print
By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do.  ...  Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task.  ...  , Geoffrey Irving and Paul Christiano for early discussions of language model scaling, Long Ouyang for advising on the design of the human evaluation experiments, Chris Hallacy for discussions on data  ... 
arXiv:2005.14165v4 fatcat:kilb2lujxfax3kgfiuotql2iyy
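The in-context few-shot setup this entry refers to can be pictured as a plain text prompt containing a handful of demonstrations followed by the query; the task and examples below are made up for illustration and are not the paper's prompts.

```python
# Illustrative few-shot prompt: a few input/output demonstrations followed by
# the query, passed to the model as ordinary text (no gradient updates).
demonstrations = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("book", "livre"),
]
query = "tree"

prompt = "Translate English to French.\n\n"
for en, fr in demonstrations:
    prompt += f"English: {en}\nFrench: {fr}\n\n"
prompt += f"English: {query}\nFrench:"

print(prompt)  # this string would be sent to the language model as-is
```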

Write a Classifier: Predicting Visual Classifiers from Unstructured Text [article]

Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh
2016 arXiv   pre-print
We finally propose a kernel function between unstructured text descriptions that builds on distributional semantics, which shows an advantage in our setting and could be useful for other applications.  ...  In a machine learning context, these observations motivate us to ask whether this learning process could be computationally modeled to learn visual classifiers.  ...  First, one limitation of the adopted language model is that it produces only one vector per word, which causes problems when a word has multiple meanings.  ...
arXiv:1601.00025v2 fatcat:pjifqfwe6rcnpcic3jq2ags3ti
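One simple way to picture a distributional-semantics kernel between two unstructured text descriptions is cosine similarity between their averaged word vectors; the random lookup table below is a stand-in for real embeddings, and this is not the exact kernel proposed in the paper.

```python
# Sketch: similarity kernel between two text descriptions via averaged word
# vectors. Real systems would use pre-trained distributional embeddings; the
# random lookup table is used only so the snippet runs on its own.
import numpy as np

rng = np.random.default_rng(1)
_table = {}

def word_vec(w, dim=100):
    # Placeholder embedding: one fixed random vector per word.
    if w not in _table:
        _table[w] = rng.normal(size=dim)
    return _table[w]

def text_kernel(desc_a, desc_b):
    a = np.mean([word_vec(w) for w in desc_a.lower().split()], axis=0)
    b = np.mean([word_vec(w) for w in desc_b.lower().split()], axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(text_kernel("a large grey bird with long legs",
                  "a tall grey wading bird"))
```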

A Large-scale Attribute Dataset for Zero-shot Learning [article]

Bo Zhao, Yanwei Fu, Rui Liang, Jiahong Wu, Yonggang Wang, Yizhou Wang
2018 arXiv   pre-print
The experimental results reveal the challenge of implementing zero-shot learning on our dataset.  ...  We analyze our dataset by conducting both supervised learning and zero-shot learning tasks. Seven state-of-the-art ZSL algorithms are tested on this new dataset.  ...  In recent years, zero-shot generation methods synthesize images conditioned on attributes/texts using generative models.  ... 
arXiv:1804.04314v2 fatcat:7lf5sdvzc5dlrbbusquwou7z54

Make Better Choices (MBC): Study design of a randomized controlled trial testing optimal technology-supported change in multiple diet and physical activity risk behaviors

Bonnie Spring, Kristin Schneider, HG McFadden, Jocelyn Vaughn, Andrea T Kozak, Malaina Smith, Arlen C Moller, Leonard Epstein, Stephanie W Russell, Andrew DeMott, Donald Hedeker
2010 BMC Public Health  
They will use decision support feedback on the personal digital assistant and receive counseling from a coach to alter their diet and activity during a 3-week prescription period when payment is contingent  ...  Findings will fill a gap in knowledge about optimal goal prescription to facilitate simultaneous diet and activity change.  ...  Figure 1 shows a screen shot of the MBC program on a PDA.  ... 
doi:10.1186/1471-2458-10-586 pmid:20920275 pmcid:PMC2955698 fatcat:scxdsjffdrfy3adtiwk3y6lz74

oLMpics-On What Language Model Pre-training Captures

Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant
2020 Transactions of the Association for Computational Linguistics  
To address this, we propose an evaluation protocol that includes both zero-shot evaluation (no fine-tuning), as well as comparing the learning curve of a fine-tuned LM to the learning curve of multiple  ...  manner and are context-dependent, e.g., while RoBERTa can compare ages, it can do so only when the ages are in the typical range of human ages; (c) On half of our reasoning tasks all models fail completely  ...  Comparing learning curves tells us which model learns from fewer examples.  ...
doi:10.1162/tacl_a_00342 fatcat:eghd7glhlngsdmgth4jwbu2afq
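Zero-shot evaluation of a pretrained masked LM (no fine-tuning) can be approximated with the Hugging Face fill-mask pipeline as sketched below; the age-comparison probe wording and candidate answers are illustrative assumptions, not the authors' exact prompts or protocol.

```python
# Zero-shot probe of a pretrained masked LM: no fine-tuning, just scoring
# candidate fillers for a masked slot. Requires the `transformers` package.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

# Hypothetical age-comparison probe (illustrative wording only).
for result in fill("A 41 year old person is <mask> than a 24 year old person.",
                   targets=["older", "younger"]):
    print(result["token_str"], round(result["score"], 4))
```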

Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models [article]

Ethan Wilcox, Peng Qian, Richard Futrell, Ryosuke Kohita, Roger Levy, Miguel Ballesteros
2020 arXiv   pre-print
Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts.  ...  Second, we assess invariance properties of learned representations: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to  ...
arXiv:2010.05725v1 fatcat:owcdzh3ndrac3gi47phlsffqdq

oLMpics – On what Language Model Pre-training Captures [article]

Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant
2020 arXiv   pre-print
To address this, we propose an evaluation protocol that includes both zero-shot evaluation (no fine-tuning), as well as comparing the learning curve of a fine-tuned LM to the learning curve of multiple  ...  manner and are context-dependent, e.g., while RoBERTa can compare ages, it can do so only when the ages are in the typical range of human ages; (c) On half of our reasoning tasks all models fail completely  ...  a model is compared to a learning curve when words are associated with random behaviour.  ... 
arXiv:1912.13283v2 fatcat:cto4p3jcnrazhcgqwavhuur4ny

Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning [article]

Prasetya Ajie Utama, Nafise Sadat Moosavi, Victor Sanh, Iryna Gurevych
2021 arXiv   pre-print
heuristics based on lexical overlap, e.g., models incorrectly assuming a sentence pair is of the same meaning because they consist of the same set of words.  ...  Recent prompt-based approaches allow pretrained language models to achieve strong performances on few-shot finetuning by reformulating downstream tasks as a language modeling problem.  ...  Automatically identifying words that can serve as labels for few-shot text classification.  ... 
arXiv:2109.04144v1 fatcat:jsozuwbm5vf5vdc2mfytvh5npe
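The lexical-overlap failure mode mentioned in this entry can be made concrete with a toy check: two sentences that share every word but differ in meaning get a maximal overlap score. The Jaccard measure below is an illustrative stand-in for the heuristic, not the paper's diagnostic.

```python
# Toy illustration of the lexical-overlap heuristic: identical word sets,
# different meanings. A model relying on overlap alone would call these
# paraphrases.
def word_overlap(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)  # Jaccard overlap

premise = "the lawyer questioned the doctor"
hypothesis = "the doctor questioned the lawyer"

print(word_overlap(premise, hypothesis))  # 1.0, yet the meanings differ
```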

A Unified Feature Representation for Lexical Connotations [article]

Emily Allaway, Kathleen McKeown
2020 arXiv   pre-print
on the task of stance detection when data is limited.  ...  Ideological attitudes and stance are often expressed through subtle meanings of words and phrases.  ...  First, it can generate a representation for a word in a zero-shot manner from only a few dictionary definitions, rather than the thousands of examples of contextual use required by standard word-embedding  ... 
arXiv:2006.00635v1 fatcat:dahlqkgjwrcm5dxapmmksx226q

Coherence boosting: When your pretrained language model is not paying enough attention [article]

Nikolay Malkin, Zhen Wang, Nebojsa Jojic
2022 arXiv   pre-print
We demonstrate that large language models have insufficiently learned the effect of distant words on next-token prediction.  ...  We show the benefits of coherence boosting with pretrained models by distributional analyses of generated ordinary text and dialog responses.  ...  If the text generated so far is x_1 x_2 ... x_t, the distribution from which the next word x_{t+1} is sampled is p(x_{t+1} | x_1, ..., x_t): only the ensemble member using full context is used.  ...
arXiv:2110.08294v2 fatcat:ggittgqw5farnksaz7ggswlp7i
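The contrastive idea behind coherence boosting can be sketched as mixing the next-token log-probabilities obtained with the full context and with a truncated (short) context, so that distant words count for more; the particular mixing rule and weight below are assumptions written for illustration, not a verified reproduction of the paper's formula.

```python
# Sketch of a contrastive next-token distribution: up-weight the full-context
# prediction and down-weight the short-context one. Mixing rule and alpha are
# illustrative assumptions.
import numpy as np

def boosted_logprobs(logp_full, logp_short, alpha=0.5):
    # logp_full / logp_short: next-token log-probabilities under the full and
    # truncated contexts respectively (same vocabulary order).
    mixed = (1 + alpha) * logp_full - alpha * logp_short
    mixed -= np.logaddexp.reduce(mixed)  # renormalise to a log-distribution
    return mixed

# Toy 5-word vocabulary example.
logp_full = np.log(np.array([0.10, 0.50, 0.20, 0.15, 0.05]))
logp_short = np.log(np.array([0.30, 0.30, 0.20, 0.15, 0.05]))
print(np.exp(boosted_logprobs(logp_full, logp_short)).round(3))
```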

Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [article]

Aleksandra Edwards, Asahi Ushio, Jose Camacho-Collados, Hélène de Ribaupierre, Alun Preece
2021 arXiv   pre-print
However, their applicability to data augmentation for text classification tasks in few-shot settings has not been fully explored, especially for specialised domains.  ...  Data augmentation techniques are widely used for enhancing the performance of machine learning models by tackling class imbalance issues and data sparsity.  ...  In order to learn domain-specific word embedding models, we used the corresponding training set for each dataset with fastText's skipgram model.  ...
arXiv:2111.09064v1 fatcat:hyncpcigfveofjwuzhb4t7r76m
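Training domain-specific skipgram embeddings with fastText, as the last excerpt describes, can be done with the fasttext package roughly as follows; the file name and hyperparameters are placeholders, not values from the paper.

```python
# Train domain-specific word vectors with fastText's skipgram model.
# Requires the `fasttext` package; "train.txt" is a placeholder path to the
# dataset-specific training text (one document per line).
import fasttext

model = fasttext.train_unsupervised(
    "train.txt",        # placeholder corpus file
    model="skipgram",   # skipgram objective, as in the excerpt
    dim=100,            # embedding size (illustrative choice)
    minCount=2,         # ignore very rare words (illustrative choice)
)

print(model.get_word_vector("augmentation")[:5])
model.save_model("domain_skipgram.bin")
```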
Showing results 1 — 15 out of 5,514 results