
Memory-assisted prompt editing to improve GPT-3 after deployment [article]

Aman Madaan, Niket Tandon, Peter Clark, Yiming Yang
2022 arXiv   pre-print
Large LMs such as GPT-3 are powerful, but can commit mistakes that are obvious to humans. For example, GPT-3 would mistakenly interpret "What word is similar to good?"  ...  Such a memory allows our system to produce enhanced prompts for any new query based on the user feedback for error correction on similar cases in the past.  ...  Experiments Baselines We compare our system, MEM-PROMPT (memory-assisted prompt editing) with two different baselines: • NO-MEM This is the standard GPT-3 in few-shot prompting mode, with the suggested  ...
arXiv:2201.06009v4 fatcat:e5yl7pi4sndy7lxncxov3tv6we

Adversarial Training for High-Stakes Reliability [article]

Daniel M. Ziegler, Seraphina Nix, Lawrence Chan, Tim Bauman, Peter Schmidt-Nielsen, Tao Lin, Adam Scherlis, Noa Nabeshima, Ben Weinstein-Raun, Daniel de Haas, Buck Shlegeris, Nate Thomas
2022 arXiv   pre-print
One technique for improving AI safety in high-stakes settings is adversarial training, which uses an adversary to generate examples to train on in order to achieve better worst-case performance.  ...  rule out the possibility of catastrophic deployment-time failures of powerful models.  ...  We are grateful to Shauna Kravec, Dane Sherburn, and Everett Smith for their contributions to parts of the project, and to Kelsey Piper for organizing a party to collect more manual adversarial examples  ... 
arXiv:2205.01663v2 fatcat:56y643h4czbcza4y72ecu4sqke

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model [article]

Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child (+8 others)
2022 arXiv   pre-print
In this paper, we first focus on the infrastructure as well as the 3D parallelism methodology used to train this model using DeepSpeed and Megatron.  ...  Next, we detail the training process, the design of our training corpus, and our data curation techniques, which we believe is a key ingredient to the success of the model.  ...  We achieved significant improvements compared to GPT-3 in all 3 settings, with our zero-shot performance surpassing few-shot for GPT-3.  ... 
arXiv:2201.11990v3 fatcat:bgn6ioqqlvcylhav4ulk654r34

Evaluating Large Language Models Trained on Code [article]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri (+46 others)
2021 arXiv   pre-print
On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves  ...  Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts.  ...  Finally, we thank GitHub for partnering to build GitHub Copilot and Microsoft Azure for supporting model training with infrastructure management.  ... 
arXiv:2107.03374v2 fatcat:tnan6rhwq5fsfek2jydeesgmmy

Training language models to follow instructions with human feedback [article]

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton (+8 others)
2022 arXiv   pre-print
In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters.  ...  Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-  ...  Thanks to those who contributed in various ways to the infrastructure used to train and deploy our models, including: Daniel Ziegler, William Saunders, Brooke Chan, Dave Cummings, Chris Hesse, Shantanu  ... 
arXiv:2203.02155v1 fatcat:nsjth3nazzeithrsgpggfbchci

A Review on Language Models as Knowledge Bases [article]

Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona Diab, Marjan Ghazvininejad
2022 arXiv   pre-print
In this paper, we present a set of aspects that we deem an LM should have to fully act as a KB, and review the recent literature with respect to those aspects.  ...  Acknowledgements Special thanks to Siddharth Verma for many helpful discussions and comments on the paper, and to Ahmed El-Kholy for the graphic in Figure 1.  ...  TOME as non-parametric memory to improve reasoning over various knowledge sources.  ...
arXiv:2204.06031v1 fatcat:nrixk5zcrffkdmhrlwifnga6iu

Scaling Language Models: Methods, Analysis & Insights from Training Gopher [article]

Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan (+68 others)
2022 arXiv   pre-print
Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.  ...  Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.  ...  Using an estimate of 283W drawn per chip, this leads to a total of 380 net tCO2e, compared to 552 net tCO2e for GPT-3 (Patterson et al., 2021) or roughly 300 tCO2e per passenger jet round trip  ...
arXiv:2112.11446v2 fatcat:wtajhbesibbetikkpow2vwiwqq

Learning to Repair: Repairing model output errors after deployment using a dynamic memory of feedback [article]

Niket Tandon, Aman Madaan, Peter Clark, Yiming Yang
2022 arXiv   pre-print
Our goal is for an LM to continue to improve after deployment, without retraining, using feedback from the user.  ...  general feedback into specific edits to repair the model output.  ...  We would like to thank Google for providing the TPU machines for conducting experiments.  ... 
arXiv:2112.09737v2 fatcat:6zjm2rtkhjgczmqljzme7jbrk4

Automatic Evaluation and Moderation of Open-domain Dialogue Systems [article]

Chen Zhang, João Sedoc, Luis Fernando D'Haro, Rafael Banchs, Alexander Rudnicky
2021 arXiv   pre-print
judgements across multiple dialogue evaluation aspects (with explainable features for providing constructive and explicit feedback on the quality of generative models' responses for quick development and deployment  ...  ) and 2) mechanisms that can help to control chatbot responses, while avoiding toxicity and employing intelligent ways to handle toxic user comments and keeping interaction flow and engagement.  ...  Acknowledgments We want to thank Mario Rodríguez-Cantelar and Marcos Estecha for their contribution to the annotation of data for subtask 2.  ...
arXiv:2111.02110v3 fatcat:7urg22iv3bbk7or3vqxhqjvoqa

Language Models are Few-Shot Learners [article]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss (+19 others)
2020 arXiv   pre-print
At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora  ...  We discuss broader societal impacts of this finding and of GPT-3 in general.  ...  Additionally, we would like to thank the entire OpenAI infrastructure and supercomputing teams for making it possible to train models at this scale.  ... 
arXiv:2005.14165v4 fatcat:kilb2lujxfax3kgfiuotql2iyy

Towards Understanding and Mitigating Social Biases in Language Models [article]

Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, Ruslan Salakhutdinov
2021 arXiv   pre-print
As a step towards improving the fairness of LMs, we carefully define several sources of representational biases before proposing new benchmarks and metrics to measure them.  ...  Among such real-world deployments are large-scale pretrained language models (LMs) that can be potentially dangerous in manifesting undesirable representational biases - harmful biases resulting from stereotyping  ...  We would also like to acknowledge NVIDIA's GPU support and the anonymous reviewers for their extremely helpful comments.  ... 
arXiv:2106.13219v1 fatcat:yjkjuktjyjbejjp2axyc3wprhy

On the Opportunities and Risks of Foundation Models [article]

Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch (+102 others)
2021 arXiv   pre-print
(e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.  ...  Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties  ...  In addition, we would like to especially thank Vanessa Parli for helping to organize this effort.  ...
arXiv:2108.07258v2 fatcat:yktkv4diyrgzzfzqlpvaiabc2m

Natural Language-Guided Programming [article]

Geert Heyman, Rafael Huysegems, Pascal Justen, Tom Van Cutsem
2021 arXiv   pre-print
The key idea is to adapt code autocompletion tools such that they take into account not only the developer's already-written code but also the intent of the task the developer is trying to achieve next  ...  Central to the tool is the use of language models trained on a large corpus of documented code.  ...  Acknowledgments We would like to thank our colleagues Frederik Vandeputte, Bart Theeten, Maayan Goldstein, Guillermo Rodriguez-Navas and Cecilia Gonzalez-Alvarez for discussions and their help collecting  ... 
arXiv:2108.05198v2 fatcat:3gayryvr2jb27bxpfz3h2hgdd4

A General Language Assistant as a Laboratory for Alignment [article]

Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds (+10 others)
2021 arXiv   pre-print
Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest  ...  As an initial foray in this direction we study simple baseline techniques and evaluations, such as prompting.  ...  Right: By adding two human-assistant conversations we can improve performance after finetuning on the prompt.  ... 
arXiv:2112.00861v3 fatcat:g7awtuczp5aldjdl75rwf2eaxm

Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models [article]

Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao (+8 others)
2022 arXiv   pre-print
To this end, we discuss the theoretical principles underlying the effectiveness of delta tuning and propose frameworks to interpret delta tuning from the perspective of optimization and optimal control  ...  Though initially proposed as an efficient method to steer large models, we believe that some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs  ...  Thanks to all the pioneering researchers who developed the structures, objectives, and delta tuning methods for pre-trained models. Ning Ding is supported by Baidu Scholarship.  ... 
arXiv:2203.06904v2 fatcat:yk2v44f74zbe7hfw4lw2nq7eju
Showing results 1 — 15 out of 55 results