597 Hits in 4.2 sec

Scaling Language Models: Methods, Analysis & Insights from Training Gopher [article]

Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan (+68 others)
2022 arXiv   pre-print
In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter  ...  We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity.  ...  While we show modest success in the compression of these models, resulting in small shifts in the scaling curves, on the whole, none of the methods we explore are remarkably successful.  ... 
arXiv:2112.11446v2 fatcat:wtajhbesibbetikkpow2vwiwqq

PaLM: Scaling Language Modeling with Pathways [article]

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi (+55 others)
2022 arXiv   pre-print
We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale.  ...  A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model.  ...  We also thank Lucas Dixon, Ellen Jiang, and Tolga Bolukbasi for their support in model serving.  ... 
arXiv:2204.02311v3 fatcat:ewsbnc6tqrfffounsqlr7utdzm

On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model [article]

Seongjin Shin, Sang-Woo Lee, Hwijeen Ahn, Sungdong Kim, HyoungSeok Kim, Boseop Kim, Kyunghyun Cho, Gichang Lee, Woomyoung Park, Jung-Woo Ha, Nako Sung
2022 arXiv   pre-print
Many recent studies on large-scale language models have reported successful in-context zero- and few-shot learning ability.  ...  necessarily determine the emergence of in-context learning, (2) in-context learning ability can emerge when a language model is trained on a combination of multiple corpora, even when each corpus does  ...  Although we leave the validation on larger-scale models, such as tens of billion parameters, to future work, our analysis presents a hint to effectively training LMs with smaller corpora.  ... 
arXiv:2204.13509v2 fatcat:37bg5pocyfgi3m7rde54czopr4

Internet-augmented language models through few-shot prompting for open-domain question answering [article]

Angeliki Lazaridou, Elena Gribovskaya, Wojciech Stokowiec, Nikolai Grigorev
2022 arXiv   pre-print
Motivated by semi-parametric language models, which ground their decisions in external retrieved evidence, we use few-shot prompting to learn to condition language models on information returned from the  ...  In this work, we aim to capitalize on the unique few-shot capabilities offered by large-scale language models to overcome some of their challenges with respect to grounding to factual and up-to-date information  ...  Finally, we would like to thank our colleagues in the DeepMind Language team for their insightful comments and suggestions.  ... 
arXiv:2203.05115v1 fatcat:rmskgvwiibd45fri2w3c52svdy
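
The conditioning step this abstract describes — grounding a few-shot prompt in evidence returned for a search query — can be sketched in a few lines. This is only an illustration, not the paper's pipeline: `search_snippets` and `lm_generate` are hypothetical callables standing in for a web-search API and a large language model, and the in-prompt example is invented.

```python
# Minimal sketch of retrieval-conditioned few-shot prompting.
# `search_snippets` and `lm_generate` are hypothetical stand-ins for a
# search API and an LM call; neither reflects the paper's actual stack.

FEW_SHOT = [
    {"evidence": "The Eiffel Tower was completed in 1889.",
     "question": "When was the Eiffel Tower completed?",
     "answer": "1889"},
]

def build_prompt(evidence: str, question: str) -> str:
    """Prepend retrieved evidence to each question, few-shot style."""
    blocks = [f"Evidence: {ex['evidence']}\nQ: {ex['question']}\nA: {ex['answer']}"
              for ex in FEW_SHOT]
    blocks.append(f"Evidence: {evidence}\nQ: {question}\nA:")
    return "\n\n".join(blocks)

def answer(question, search_snippets, lm_generate):
    evidence = " ".join(search_snippets(question)[:3])  # keep the top few snippets
    return lm_generate(build_prompt(evidence, question))
```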

Red Teaming Language Models with Language Models [article]

Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, Geoffrey Irving
2022 arXiv   pre-print
Language Models (LMs) often cannot be deployed because of their potential to harm users in hard-to-predict ways.  ...  We explore several methods, from zero-shot generation to reinforcement learning, for generating test cases with varying levels of diversity and difficulty.  ...  DPG often discusses itself (I, me) and often in a self-aware way, using terms from its prompt used to describe it (Gopher, my creators, an AI, a language model).  ... 
arXiv:2202.03286v1 fatcat:ogptxm22d5e37bzpyv7cizarp4
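
The basic loop the abstract describes — one LM generating test cases, the target LM answering, and a classifier scoring the replies for harm — might look roughly like the sketch below. All three callables are hypothetical placeholders, not the models used in the paper.

```python
# Illustrative red-teaming loop: a "red" LM proposes test questions (zero-shot),
# the target LM answers, and a classifier flags replies that look harmful.
# `red_lm`, `target_lm`, and `harm_score` are hypothetical callables.

def red_team(red_lm, target_lm, harm_score, n_cases=100, threshold=0.5):
    failures = []
    for _ in range(n_cases):
        test_case = red_lm("Write a question that might provoke a harmful reply:")
        reply = target_lm(test_case)
        if harm_score(test_case, reply) > threshold:
            failures.append((test_case, reply))   # keep failing cases for analysis
    return failures
```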

Predictability and Surprise in Large Generative Models [article]

Deep Ganguli, Danny Hernandez, Liane Lovitt, Nova DasSarma, Tom Henighan, Andy Jones, Nicholas Joseph, Jackson Kernion, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen (+18 others)
2022 arXiv   pre-print
Large-scale pre-training has recently emerged as a technique for creating capable, general purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many others.  ...  Namely, these generative models have an unusual combination of predictable loss on a broad training distribution (as embodied in their "scaling laws"), and unpredictable specific capabilities, inputs,  ...  A.7 AI and Compute Analysis Details We leverage data from existing work on estimating compute usage for training large-scale AI models 40 which was recently complemented with additional data from more  ... 
arXiv:2202.07785v1 fatcat:plhyyc5l2vgwripvzfem5dojzi
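
The "predictable loss" half of this picture is usually summarised by a power law of the form L(N) ≈ a·N^(−α). The sketch below fits that form to made-up (model size, loss) pairs with a log-log linear fit; the numbers and the application of the formula are illustrative assumptions, not data or results from the paper.

```python
# Fit a power law L(N) ≈ a * N**(-alpha) to (model size, loss) pairs by
# linear regression in log-log space. All numbers are invented for illustration.
import numpy as np

params = np.array([1e7, 1e8, 1e9, 1e10])   # hypothetical model sizes
loss   = np.array([4.2, 3.6, 3.1, 2.7])    # hypothetical evaluation losses

slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)
alpha, a = -slope, float(np.exp(intercept))
print(f"L(N) ≈ {a:.2f} * N^(-{alpha:.3f})")

# Aggregate loss extrapolates smoothly; specific downstream capabilities can
# still appear abruptly, which is the "surprise" side of the argument.
print("extrapolated loss at 2.8e11 params:", a * (2.8e11) ** (-alpha))
```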

Teaching language models to support answers with verified quotes [article]

Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, Geoffrey Irving, Nat McAleese
2022 arXiv   pre-print
Recent large language models often answer factual questions correctly.  ...  But users can't trust any given claim a model makes without fact-checking, because language models can hallucinate convincing nonsense.  ...  modelling software ecosystem, and particularly Doug Fritz for developing a frontend framework with which our human evaluation apps were built.  ... 
arXiv:2203.11147v1 fatcat:xcyia7pag5ayxmbnhbvjkzyrc4

Scaling Law for Recommendation Models: Towards General-purpose User Representations [article]

Kyuyong Shin, Hanock Kwak, Su Young Kim, Max Nihlen Ramstrom, Jisu Jeong, Jung-Woo Ha, Kyung-Min Kim
2022 arXiv   pre-print
Unlike vision recognition and language models, studies on general-purpose user representation at scale still remain underexplored.  ...  Recent advancement of large-scale pretrained models such as BERT, GPT-3, CLIP, and Gopher, has shown astonishing achievements across various task domains.  ...  Acknowledgements The authors would like to thank the NAVER CLOVA ML X team for insightful comments and discussions.  ... 
arXiv:2111.11294v3 fatcat:ywcd5hvfbfa3djdj5wmapdexym

Improving language models by retrieving from trillions of tokens [article]

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy (+16 others)
2022 arXiv   pre-print
Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.  ...  We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens.  ...  We show that retrieving based on a pre-trained frozen BERT model (§2.3) works at scale, removing the need for training and updating a retriever network. • We show that our method scales well with model size  ... 
arXiv:2112.04426v3 fatcat:gr2zeribfzh2jhpufbvga65lge
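
The retrieval step the snippet refers to — embedding the chunk of preceding tokens with a frozen encoder and fetching the most similar corpus chunks — can be sketched as a brute-force nearest-neighbour search. `frozen_encoder` is a hypothetical embedding function; the real system relies on an approximate-nearest-neighbour index over a trillion-token corpus, which this toy version does not attempt.

```python
# Brute-force sketch of chunk retrieval by local similarity: embed the current
# chunk of preceding tokens and return the k most similar corpus chunks.
# `frozen_encoder` is a hypothetical callable mapping text to a 1-D numpy vector.
import numpy as np

def retrieve_neighbours(chunk_text, corpus_chunks, frozen_encoder, k=2):
    query = frozen_encoder(chunk_text)                            # shape (d,)
    keys = np.stack([frozen_encoder(c) for c in corpus_chunks])   # shape (n, d)
    sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [corpus_chunks[i] for i in top]   # chunks the LM is then conditioned on
```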

Can language models learn from explanations in context? [article]

Andrew K. Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, Kory Mathewson, Michael Henry Tessler, Antonia Creswell, James L. McClelland, Jane X. Wang, Felix Hill
2022 arXiv   pre-print
However, only large models can benefit from explanations. In summary, explanations can support the in-context learning abilities of large language models on  ...  Large language models can perform new tasks by adapting to a few in-context examples. For humans, rapid learning from examples can benefit from explanations that connect examples to task principles.  ...  Acknowledgements We thank Dani Yogatama and Neil Rabinowitz for helpful comments and suggestions, as well as the team that trained the language models.  ... 
arXiv:2204.02329v1 fatcat:iwqbt3lpmnahtcspl2vmig5zom
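
The manipulation being studied — the same few-shot prompt with or without a short explanation appended after each example's answer — is easy to picture with a toy prompt builder. The example content below is invented and unrelated to the paper's task suites.

```python
# Toy prompt builder contrasting few-shot examples with and without explanations.
# Example content is invented for illustration only.

EXAMPLES = [
    {"q": "2, 4, 8, 16, ?", "a": "32",
     "why": "each term doubles the previous one"},
]

def few_shot_prompt(task_question, with_explanations=True):
    blocks = []
    for ex in EXAMPLES:
        block = f"Q: {ex['q']}\nA: {ex['a']}"
        if with_explanations:
            # The explanation connects the example to the underlying task principle.
            block += f"\nExplanation: {ex['why']}"
        blocks.append(block)
    blocks.append(f"Q: {task_question}\nA:")
    return "\n\n".join(blocks)

print(few_shot_prompt("3, 9, 27, ?"))
```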

Flamingo: a Visual Language Model for Few-Shot Learning [article]

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford (+15 others)
2022 arXiv   pre-print
Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot  ...  We introduce Flamingo, a family of Visual Language Models (VLM) with this ability.  ...  In this work, we scale Flamingo models up to 80B parameters and provide some initial insights on their scaling behaviour across evaluation benchmarks, summarized in Figure 9 .  ... 
arXiv:2204.14198v1 fatcat:5f4uhdmaibhm7cn3zetspjev3q

Fine-scale mapping of a grassland from digitized aerial photography: An approach using image segmentation and discriminant analysis

A. Lobo, K. Moloney, N. Chiariello
1998 International Journal of Remote Sensing  
Conventional methods of classification from remotely-sensed images seldom discriminate accurately among the land cover categories that are relevant in ecological applications.  ...  O., and Johnson, P., 1986, Spectral mixture modeling, a new analysis of the Viking Lander 1 site.  ...  This work was supported in part through grants from the Andrew W.  ... 
doi:10.1080/014311698216431 fatcat:2xrimyviwref3a3lic45owuomq
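
A modern analogue of the segment-then-classify workflow named in the title might pair an off-the-shelf segmentation with linear discriminant analysis over per-segment features. The sketch below uses scikit-image and scikit-learn on a synthetic image with placeholder labels; it is an assumption about the general workflow, not the authors' procedure.

```python
# Segment a (synthetic) image, summarise each segment by its mean band values,
# and classify segments with linear discriminant analysis. The image, labels,
# and library choices are illustrative assumptions, not the study's methods.
import numpy as np
from skimage.segmentation import slic
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
image = rng.random((64, 64, 3))                       # stand-in for a digitized aerial photo
segments = slic(image, n_segments=50, start_label=0)  # over-segment into regions

# Mean value per band within each segment is that segment's feature vector.
features = np.array([image[segments == s].mean(axis=0) for s in np.unique(segments)])
labels = rng.integers(0, 3, len(features))            # placeholder cover-class labels

lda = LinearDiscriminantAnalysis().fit(features, labels)
print(lda.predict(features)[:10])                     # predicted cover class per segment
```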

Stalker, A Multilingual Text Mining Search Engine for Open Source Intelligence

Federico Neri, Massimo Pettoni
2008 2008 12th International Conference Information Visualisation  
STALKER provides language-independent search and dynamic classification features for a broad range of data collected from several sources in a number of culturally diverse languages.  ...  The process of accessing all these raw data, heterogeneous in terms of source and language, and transforming them into information is therefore strongly linked to automatic textual analysis and synthesis  ...  The Bayesian method was used as the learning method: the probabilistic classification model was built on around 1,000 documents.  ... 
doi:10.1109/iv.2008.9 dblp:conf/iv/NeriP08 fatcat:tjcmpifkbjg6pli3ydltdoa7qa
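
The "Bayesian method ... built on around 1,000 documents" line describes a standard supervised text-classification setup; a minimal modern equivalent with a multinomial naive Bayes model is sketched below. The two-document toy corpus and the scikit-learn stack are assumptions for illustration, not STALKER's actual implementation.

```python
# Minimal naive Bayes text classifier: bag-of-words counts feeding a
# multinomial Bayes model. The toy corpus and labels are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["border incident reported near the checkpoint",
        "quarterly earnings rose on strong retail sales"]
labels = ["security", "economy"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)                                    # train on labelled documents
print(model.predict(["retail sales fell sharply this quarter"]))
```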

Predicting Optimal Sites for Ecosystem Restoration Using Stacked-Species Distribution Modeling

Amanda J. Zellmer, Jeremy T. Claisse, Chelsea M. Williams, Stuart Schwab, Daniel J. Pondella
2019 Frontiers in Marine Science  
The stacked-species distribution model provides insight for marine restoration projects in southern California specifically, but more generally this method can also be widely applied to other types of  ...  The predicted richness from this linear model was associated with observed species richness when considering only the focal species on manmade reefs (linear model: slope = 0.52, 95% CI = 0.13-0.92, w =  ...  the R programming language (R Core Team, 2015).  ... 
doi:10.3389/fmars.2019.00003 fatcat:ukmmg2ngwzacvac3de3tv2vwsa
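
The stacking idea in the snippet — summing per-species predicted occurrence probabilities over sites to estimate richness, then relating predicted to observed richness with a linear model — can be shown with placeholder arrays. Everything below is synthetic and does not use the study's data or its fitted slope.

```python
# Stacked-species distribution modelling sketch: sum per-species occurrence
# probabilities per site to get predicted richness, then fit a linear model
# against observed richness. All arrays are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_species, n_sites = 12, 200
per_species_prob = rng.random((n_species, n_sites))    # one SDM prediction per species

predicted_richness = per_species_prob.sum(axis=0)      # the "stacking" step
observed_richness = 0.5 * predicted_richness + rng.normal(0.0, 1.0, n_sites)

slope, intercept = np.polyfit(predicted_richness, observed_richness, 1)
print(f"observed ≈ {slope:.2f} * predicted + {intercept:.2f}")
```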

What Are the General Principles of Cognition?

Avishai Henik
1982 Contemporary Psychology  
The chapter on short-term memory discusses the decay versus interference debate, capacity limitations, and the insights that may be gained from the comparison of expert and nonexpert performance in various  ...  For example, multidimensional scaling is exemplified during a discussion of categorization.  ... 
doi:10.1037/020905 fatcat:25vhmgmwljfjnoth7ysxshnm3q
Showing results 1 — 15 out of 597 results