Scaling Language Models: Methods, Analysis & Insights from Training Gopher
[article]
2022
arXiv
pre-print
In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter ...
We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. ...
While we show modest success in the compression of these models, resulting in small shifts in the scaling curves, on the whole, none of the methods we explore are remarkably successful. ...
arXiv:2112.11446v2
fatcat:wtajhbesibbetikkpow2vwiwqq
PaLM: Scaling Language Modeling with Pathways
[article]
2022
arXiv
pre-print
We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. ...
A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. ...
We also thank Lucas Dixon, Ellen Jiang, and Tolga Bolukbasi for their support in model serving. ...
arXiv:2204.02311v3
fatcat:ewsbnc6tqrfffounsqlr7utdzm
On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model
[article]
2022
arXiv
pre-print
Many recent studies on large-scale language models have reported successful in-context zero- and few-shot learning ability. ...
necessarily determine the emergence of in-context learning, (2) in-context learning ability can emerge when a language model is trained on a combination of multiple corpora, even when each corpus does ...
Although we leave the validation of larger-scale models, such as those with tens of billions of parameters, to future work, our analysis offers hints for effectively training LMs with smaller corpora. ...
arXiv:2204.13509v2
fatcat:37bg5pocyfgi3m7rde54czopr4
Internet-augmented language models through few-shot prompting for open-domain question answering
[article]
2022
arXiv
pre-print
Motivated by semi-parametric language models, which ground their decisions in external retrieved evidence, we use few-shot prompting to learn to condition language models on information returned from the ...
In this work, we aim to capitalize on the unique few-shot capabilities offered by large-scale language models to overcome some of their challenges with respect to grounding to factual and up-to-date information ...
Finally, we would like to thank our colleagues in the DeepMind Language team for their insightful comments and suggestions. ...
arXiv:2203.05115v1
fatcat:rmskgvwiibd45fri2w3c52svdy
Red Teaming Language Models with Language Models
[article]
2022
arXiv
pre-print
Language Models (LMs) often cannot be deployed because of their potential to harm users in hard-to-predict ways. ...
We explore several methods, from zero-shot generation to reinforcement learning, for generating test cases with varying levels of diversity and difficulty. ...
DPG often discusses itself (I, me) and often in a self-aware way, using terms from its prompt used to describe it (Gopher, my creators, an AI, a language model). ...
arXiv:2202.03286v1
fatcat:ogptxm22d5e37bzpyv7cizarp4
Predictability and Surprise in Large Generative Models
[article]
2022
arXiv
pre-print
Large-scale pre-training has recently emerged as a technique for creating capable, general-purpose generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many others. ...
Namely, these generative models have an unusual combination of predictable loss on a broad training distribution (as embodied in their "scaling laws"), and unpredictable specific capabilities, inputs, ...
A.7 AI and Compute Analysis Details: We leverage data from existing work on estimating compute usage for training large-scale AI models [40], which was recently complemented with additional data from more ...
arXiv:2202.07785v1
fatcat:plhyyc5l2vgwripvzfem5dojzi
Teaching language models to support answers with verified quotes
[article]
2022
arXiv
pre-print
Recent large language models often answer factual questions correctly. ...
But users can't trust any given claim a model makes without fact-checking, because language models can hallucinate convincing nonsense. ...
modelling software ecosystem, and particularly Doug Fritz for developing a frontend framework with which our human evaluation apps were built. ...
arXiv:2203.11147v1
fatcat:xcyia7pag5ayxmbnhbvjkzyrc4
Scaling Law for Recommendation Models: Towards General-purpose User Representations
[article]
2022
arXiv
pre-print
Unlike vision recognition and language models, studies on general-purpose user representation at scale remain underexplored. ...
Recent advances in large-scale pretrained models such as BERT, GPT-3, CLIP, and Gopher have shown astonishing achievements across various task domains. ...
Acknowledgements The authors would like to thank the NAVER CLOVA ML X team for insightful comments and discussions. ...
arXiv:2111.11294v3
fatcat:ywcd5hvfbfa3djdj5wmapdexym
Improving language models by retrieving from trillions of tokens
[article]
2022
arXiv
pre-print
Our work opens up new avenues for improving language models through explicit memory at unprecedented scale. ...
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. ...
We show that retrieving based on a pre-trained frozen BERT model (§2.3) works at scale, removing the need for training and updating a retriever network. • We show that our method scales well with model size ...
arXiv:2112.04426v3
fatcat:gr2zeribfzh2jhpufbvga65lge
Can language models learn from explanations in context?
[article]
2022
arXiv
pre-print
However, only large models can benefit from explanations. In summary, explanations can support the in-context learning abilities of large language models on ...
Large language models can perform new tasks by adapting to a few in-context examples. For humans, rapid learning from examples can benefit from explanations that connect examples to task principles. ...
Acknowledgements We thank Dani Yogatama and Neil Rabinowitz for helpful comments and suggestions, as well as the team that trained the language models. ...
arXiv:2204.02329v1
fatcat:iwqbt3lpmnahtcspl2vmig5zom
Flamingo: a Visual Language Model for Few-Shot Learning
[article]
2022
arXiv
pre-print
Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot ...
We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. ...
In this work, we scale Flamingo models up to 80B parameters and provide some initial insights on their scaling behaviour across evaluation benchmarks, summarized in Figure 9. ...
arXiv:2204.14198v1
fatcat:5f4uhdmaibhm7cn3zetspjev3q
Fine-scale mapping of a grassland from digitized aerial photography: An approach using image segmentation and discriminant analysis
1998
International Journal of Remote Sensing
Conventional methods of classification from remotely-sensed images seldom discriminate accurately among the land cover categories that are relevant in ecological applications. ...
O., and Johnson, P., 1986, Spectral mixture modeling, a new analysis of the Viking Lander 1 site. ...
This work was supported in part through grants from the Andrew W. ...
doi:10.1080/014311698216431
fatcat:2xrimyviwref3a3lic45owuomq
Stalker, A Multilingual Text Mining Search Engine for Open Source Intelligence
2008
2008 12th International Conference Information Visualisation
STALKER provides language-independent search and dynamic classification features for a broad range of data collected from several sources in a number of culturally diverse languages. ...
The process of accessing all these raw data, heterogeneous in terms of source and language, and transforming them into information is therefore strongly linked to automatic textual analysis and synthesis ...
The Bayesian method was used as the learning method: the probabilistic classification model was built on around 1,000 documents. ...
doi:10.1109/iv.2008.9
dblp:conf/iv/NeriP08
fatcat:tjcmpifkbjg6pli3ydltdoa7qa
Predicting Optimal Sites for Ecosystem Restoration Using Stacked-Species Distribution Modeling
2019
Frontiers in Marine Science
The stacked-species distribution model provides insight for marine restoration projects in southern California specifically, but more generally this method can also be widely applied to other types of ...
The predicted richness from this linear model was associated with observed species richness when considering only the focal species on manmade reefs (linear model: slope = 0.52, 95% CI = 0.13-0.92, w = ...
the R programming language (R Core Team, 2015). ...
doi:10.3389/fmars.2019.00003
fatcat:ukmmg2ngwzacvac3de3tv2vwsa
What Are the General Principles of Cognition?
1982
Contemporary Psychology
The chapter on short-term memory discusses the decay versus interference debate, capacity limitations, and the insights that may be gained from the comparison of expert and nonexpert performance in various ...
For example, multidimensional scaling is exemplified during a discussion of categorization. ...
doi:10.1037/020905
fatcat:25vhmgmwljfjnoth7ysxshnm3q
Showing results 1 — 15 out of 597 results