Distributional Modeling on a Diet: One-shot Word Learning from Text Only
[article]
2017
arXiv
pre-print
We test whether distributional models can do one-shot learning of definitional properties from text only. ...
Our experiments show that our model can learn properties from a single exposure when given an informative utterance. ...
Further, to the best of our knowledge, our work is the first attempt at one-shot property learning from text only. ...
arXiv:1704.04550v4
fatcat:25pi5iwmabckbicyyxl3zyrbxy
Distributional model on a diet: one-shot word learning from text only
[article]
2018
We test whether distributional models can do one-shot learning of definitional properties from text only. ...
Our experiments show that our model can learn properties from a single exposure when given an informative utterance. ...
Further, to the best of our knowledge, our work is the first attempt at one-shot property learning from text only. ...
doi:10.15781/t29p2wp5n
fatcat:4xo3x7jnmvdr5mzqxtz64ygxfa
Few-shot Text Classification with Distributional Signatures
[article]
2020
arXiv
pre-print
Thus, rather than learning solely from words, our model also leverages their distributional signatures, which encode pertinent word occurrence patterns. ...
We demonstrate that our model consistently outperforms prototypical networks learned on lexical knowledge (Snell et al., 2017) in both few-shot text classification and relation classification by a significant ...
Thus, we focus on learning the connection between word importance and distributional signatures. As a result, our model can reliably identify important features from novel classes. ...
arXiv:1908.06039v3
fatcat:bbddbkpop5gynaloacfxnuib3q
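The prototypical-network baseline and the word-weighting idea mentioned in this entry can be made concrete with a minimal sketch. The inverse-frequency weighting, the `vectors`/`word_freq` inputs, and the Euclidean prototype distance below are illustrative assumptions, not the paper's actual model.

```python
# Minimal sketch (not the paper's model): prototypical-network-style
# few-shot text classification in which each word vector is weighted by
# a simple distributional statistic (inverse frequency) before averaging.
import numpy as np

def sentence_repr(tokens, vectors, word_freq):
    # Keep only words we have vectors for, weight by inverse frequency, average.
    tokens = [t for t in tokens if t in vectors]
    weights = np.array([1.0 / (1.0 + word_freq.get(t, 0)) for t in tokens])
    mat = np.stack([vectors[t] for t in tokens])
    return (weights[:, None] * mat).sum(axis=0) / weights.sum()

def classify(query, support, vectors, word_freq):
    # support: {label: [token_list, ...]} with a few examples per class.
    protos = {
        label: np.mean([sentence_repr(s, vectors, word_freq) for s in sents], axis=0)
        for label, sents in support.items()
    }
    q = sentence_repr(query, vectors, word_freq)
    return min(protos, key=lambda lab: np.linalg.norm(q - protos[lab]))
```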
Memory, Show the Way: Memory Based Few Shot Word Representation Learning
2018
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Distributional semantic models (DSMs) generally require sufficient examples for a word to learn a high quality representation. ...
This is in stark contrast with humans, who can guess the meaning of a word from one or a few referents only. ...
Each target word corresponds to only one sentence extracted from its Wikipedia definition as context. ...
doi:10.18653/v1/d18-1173
dblp:conf/emnlp/SunWZ18
fatcat:skgqcydy2zbbnlmvjufwels6mq
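The setting described here, one definitional sentence per target word, is commonly compared against a simple additive baseline. The sketch below shows only that baseline, assuming pre-trained vectors for the context words are available; it is not the paper's memory-based model.

```python
# Sketch of a simple additive baseline for one-shot word learning:
# estimate a vector for an unseen target word by averaging the vectors
# of the content words in its single definitional sentence.
import numpy as np

STOPWORDS = {"a", "an", "the", "is", "of", "to", "and", "that", "which"}

def one_shot_vector(definition_tokens, vectors):
    context = [t for t in definition_tokens if t not in STOPWORDS and t in vectors]
    if not context:
        raise ValueError("no known context words in the definition")
    return np.mean([vectors[t] for t in context], axis=0)
```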
Language Models are Few-Shot Learners
[article]
2020
arXiv
pre-print
By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. ...
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. ...
, Geoffrey Irving and Paul Christiano for early discussions of language model scaling, Long Ouyang for advising on the design of the human evaluation experiments, Chris Hallacy for discussions on data ...
arXiv:2005.14165v4
fatcat:kilb2lujxfax3kgfiuotql2iyy
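The few-shot behaviour described in this entry comes purely from the prompt: task examples are concatenated in front of the query and no weights are updated. The sketch below only builds such a prompt string; the instruction text, labels, and formatting are illustrative assumptions, and sending the prompt to an actual model is left out.

```python
# Sketch of few-shot prompting: the "training" examples are simply
# concatenated into the prompt and the model is asked to continue it.
def build_few_shot_prompt(instruction, examples, query):
    lines = [instruction, ""]
    for text, label in examples:          # a handful of (input, output) pairs
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("A wonderful, moving film.", "positive"),
     ("Dull and far too long.", "negative")],
    "The plot was clever and the acting superb.",
)
```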
Write a Classifier: Predicting Visual Classifiers from Unstructured Text
[article]
2016
arXiv
pre-print
We finally propose a kernel function between unstructured text descriptions that builds on distributional semantics, which shows an advantage in our setting and could be useful for other applications. ...
In a machine learning context, these observations motivate us to ask whether this learning process could be computationally modeled to learn visual classifiers. ...
First, one limitation of the adopted language model is that it produces only one vector per word, which causes problems when a word has multiple meanings. ...
arXiv:1601.00025v2
fatcat:pjifqfwe6rcnpcic3jq2ags3ti
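The snippet mentions a kernel between unstructured text descriptions built on distributional semantics. One simple way to realize such a kernel, shown below as an assumption rather than the paper's construction, is to average the word vectors of each description and take the cosine of the two means.

```python
# One simple distributional-semantics kernel between two text
# descriptions: cosine similarity of their mean word vectors.
import numpy as np

def description_kernel(tokens_a, tokens_b, vectors):
    def mean_vec(tokens):
        vecs = [vectors[t] for t in tokens if t in vectors]
        return np.mean(vecs, axis=0) if vecs else None
    a, b = mean_vec(tokens_a), mean_vec(tokens_b)
    if a is None or b is None:
        return 0.0
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```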
A Large-scale Attribute Dataset for Zero-shot Learning
[article]
2018
arXiv
pre-print
The experimental results reveal the challenge of implementing zero-shot learning on our dataset. ...
We analyze our dataset by conducting both supervised learning and zero-shot learning tasks. Seven state-of-the-art ZSL algorithms are tested on this new dataset. ...
In recent years, zero-shot generation methods synthesize images conditioned on attributes/texts using generative models. ...
arXiv:1804.04314v2
fatcat:7lf5sdvzc5dlrbbusquwou7z54
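A minimal illustration of the attribute-based zero-shot setup that such datasets are built for: an image, here already mapped to a predicted attribute vector, is assigned to the unseen class whose attribute signature is closest. The nearest-signature rule below is a generic sketch, not any of the seven algorithms tested in the paper.

```python
# Generic attribute-based zero-shot classification: match a predicted
# attribute vector against per-class attribute signatures.
import numpy as np

def zero_shot_classify(predicted_attributes, class_attributes):
    # class_attributes: {class_name: attribute_vector} for unseen classes.
    return min(
        class_attributes,
        key=lambda c: np.linalg.norm(predicted_attributes - class_attributes[c]),
    )
```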
Make Better Choices (MBC): Study design of a randomized controlled trial testing optimal technology-supported change in multiple diet and physical activity risk behaviors
2010
BMC Public Health
They will use decision support feedback on the personal digital assistant and receive counseling from a coach to alter their diet and activity during a 3-week prescription period when payment is contingent ...
Findings will fill a gap in knowledge about optimal goal prescription to facilitate simultaneous diet and activity change. ...
Figure 1 shows a screen shot of the MBC program on a PDA. ...
doi:10.1186/1471-2458-10-586
pmid:20920275
pmcid:PMC2955698
fatcat:scxdsjffdrfy3adtiwk3y6lz74
oLMpics-On What Language Model Pre-training Captures
2020
Transactions of the Association for Computational Linguistics
To address this, we propose an evaluation protocol that includes both zero-shot evaluation (no fine-tuning) and comparing the learning curve of a fine-tuned LM to the learning curve of multiple ...
manner and are context-dependent, e.g., while RoBERTa can compare ages, it can do so only when the ages are in the typical range of human ages; (c) On half of our reasoning tasks all models fail completely ...
Controls: Comparing learning curves tells us which model learns from fewer examples. ...
doi:10.1162/tacl_a_00342
fatcat:eghd7glhlngsdmgth4jwbu2afq
Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models
[article]
2020
arXiv
pre-print
Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts. ...
Second, we assess invariance properties of learned representation: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to ...
Distributional modeling on a diet: One-shot word learning ... No Modifier, PP Modifier, RC Modifier ...
arXiv:2010.05725v1
fatcat:owcdzh3ndrac3gi47phlsffqdq
oLMpics – On what Language Model Pre-training Captures
[article]
2020
arXiv
pre-print
To address this, we propose an evaluation protocol that includes both zero-shot evaluation (no fine-tuning) and comparing the learning curve of a fine-tuned LM to the learning curve of multiple ...
manner and are context-dependent, e.g., while RoBERTa can compare ages, it can do so only when the ages are in the typical range of human ages; (c) On half of our reasoning tasks all models fail completely ...
a model is compared to a learning curve when words are associated with random behaviour. ...
arXiv:1912.13283v2
fatcat:cto4p3jcnrazhcgqwavhuur4ny
Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
[article]
2021
arXiv
pre-print
heuristics based on lexical overlap, e.g., models incorrectly assuming a sentence pair is of the same meaning because the two sentences consist of the same set of words. ...
Recent prompt-based approaches allow pretrained language models to achieve strong performances on few-shot finetuning by reformulating downstream tasks as a language modeling problem. ...
Automatically identifying words that can serve as labels for few-shot text classification. ...
arXiv:2109.04144v1
fatcat:jsozuwbm5vf5vdc2mfytvh5npe
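The lexical-overlap failure mode described above is easy to make concrete: two sentences can share exactly the same word set while meaning different things, so word-set identity is an unreliable paraphrase signal. The check below is only a toy illustration of that heuristic.

```python
# Toy illustration of the lexical-overlap heuristic: identical word sets
# do not imply identical meaning.
def same_word_set(sent_a, sent_b):
    return set(sent_a.lower().split()) == set(sent_b.lower().split())

print(same_word_set("the dog chased the cat", "the cat chased the dog"))  # True,
# yet the two sentences describe opposite events.
```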
A Unified Feature Representation for Lexical Connotations
[article]
2020
arXiv
pre-print
on the task of stance detection when data is limited. ...
Ideological attitudes and stance are often expressed through subtle meanings of words and phrases. ...
First, it can generate a representation for a word in a zero-shot manner from only a few dictionary definitions, rather than the thousands of examples of contextual use required by standard word-embedding ...
arXiv:2006.00635v1
fatcat:dahlqkgjwrcm5dxapmmksx226q
Coherence boosting: When your pretrained language model is not paying enough attention
[article]
2022
arXiv
pre-print
We demonstrate that large language models have insufficiently learned the effect of distant words on next-token prediction. ...
We show the benefits of coherence boosting with pretrained models by distributional analyses of generated ordinary text and dialog responses. ...
If the text generated so far is x_1 x_2 . . . x_t, the distribution from which the next word x_{t+1} is sampled is p(x_{t+1} | x_1, . . . , x_t); only the ensemble member using full context is used. ...
arXiv:2110.08294v2
fatcat:ggittgqw5farnksaz7ggswlp7i
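The snippet contrasts sampling from the full-context next-token distribution with using shorter contexts. A common way to combine the two is a log-linear contrast that up-weights tokens preferred under the full context; the sketch below uses that form with an assumed weight `alpha` and is not necessarily the paper's exact definition of coherence boosting.

```python
# Log-linear contrast of a full-context next-token distribution with a
# short-context one; `alpha` and the formulation are illustrative.
import numpy as np

def boosted_distribution(logp_full, logp_short, alpha=0.5):
    # Up-weight tokens the full-context model prefers relative to the
    # short-context model, then renormalize.
    scores = (1 + alpha) * logp_full - alpha * logp_short
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()
```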
Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification
[article]
2021
arXiv
pre-print
However, their applicability to data augmentation for text classification tasks in few-shot settings has not been fully explored, especially for specialised domains. ...
Data augmentation techniques are widely used for enhancing the performance of machine learning models by tackling class imbalance issues and data sparsity. ...
In order to learn domain-specific word embedding models, we used the corresponding training sets for each dataset with fastText's skipgram model. ...
arXiv:2111.09064v1
fatcat:hyncpcigfveofjwuzhb4t7r76m
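The snippet notes that domain-specific embeddings were trained with fastText's skipgram model on each dataset's training set. A minimal version of that step with the official `fasttext` package is sketched below, assuming the training sentences have been written one per line to a text file; the file name and hyperparameters are illustrative, not the paper's.

```python
# Sketch: train a domain-specific skipgram embedding model with fastText.
import fasttext

model = fasttext.train_unsupervised(
    "train_sentences.txt",   # one preprocessed sentence per line (assumed file)
    model="skipgram",
    dim=100,
    epoch=5,
    minCount=2,
)
vec = model.get_word_vector("diagnosis")   # embedding for an in-domain word
```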
Showing results 1 — 15 out of 5,514 results