Soloist: Building Task Bots at Scale with Transfer Learning and Machine Teaching

Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Jianfeng Gao
2021 Transactions of the Association for Computational Linguistics  
We present a new method, Soloist, that uses transfer learning and machine teaching to build task bots at scale.  ...  The pre-trained model can be efficiently adapted to accomplish new tasks with a handful of task-specific dialogs via machine teaching, where training samples are generated by human teachers interacting  ...  The result is shown in Few-Shot Evaluation It is desirable for task bots to effectively generalize to new tasks with few task-specific training samples.  ... 
doi:10.1162/tacl_a_00399 fatcat:66kfsqab5jdp3jr4btcn3btcby

Robust Conversational AI with Grounded Text Generation [article]

Jianfeng Gao, Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Heung-Yeung Shum
2020 arXiv   pre-print
responses grounded in dialog belief state and real-world knowledge for task completion.  ...  GTG is a hybrid model which uses a large-scale Transformer neural network as its backbone, combined with symbol-manipulation modules for knowledge base inference and prior knowledge encoding, to generate  ...  However, widely accepted research benchmarks for evaluating few-shot learning and machine teaching for task bots are yet to be developed to enable a more comprehensive study.  ... 
arXiv:2009.03457v1 fatcat:2462mlxn7fg3zgw5mwknoqngaa

Survey of Generative Methods for Social Media Analysis [article]

Stan Matwin, Aristides Milios, Paweł Prałat, Amilcar Soares, François Théberge
2021 arXiv   pre-print
This survey draws a broad-stroke, panoramic picture of the State of the Art (SoTA) of the research in generative methods for the analysis of social media data.  ...  Social dynamics are important for understanding the spreading of influence or diseases, formation of friendships, the productivity of teams, etc.  ...  GPT-3 has been demonstrated to be effective on a variety of few-shot tasks: due to its extensive pre-training and size, it is able to learn rapidly from very few training examples [39].  ... 
arXiv:2112.07041v1 fatcat:xgmduwctpbddfo67y6ack5s2um

Scaling Language Models: Methods, Analysis & Insights from Training Gopher [article]

Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan (+68 others)
2022 arXiv   pre-print
In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter  ...  We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity.  ...  Using an estimate of 283W drawn per chip, this leads to a total of 380 net tCO2e, compared to 552 net tCO2e for GPT-3 (Patterson et al., 2021) or roughly 300 tCO2e per passenger jet round trip  ... 
arXiv:2112.11446v2 fatcat:wtajhbesibbetikkpow2vwiwqq

STraTA: Self-Training with Task Augmentation for Better Few-shot Learning [article]

Tu Vu, Minh-Thang Luong, Quoc V. Le, Grady Simon, Mohit Iyyer
2022 arXiv   pre-print
To address this shortcoming, we propose STraTA, which stands for Self-Training with Task Augmentation, an approach that builds on two key ideas for effective leverage of unlabeled data.  ...  Remarkably, on the SST-2 sentiment dataset, STraTA, with only 8 training examples per class, achieves comparable results to standard fine-tuning with 67K training examples.  ...  We would also like to thank the anonymous reviewers, Kenton Lee, Zihang Dai, Ed H.  ... 
arXiv:2109.06270v2 fatcat:fh35wq35pvhujbyxqpptaoya7m

A Survey on Green Deep Learning [article]

Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li
2021 arXiv   pre-print
We classify these approaches into four categories: (1) compact networks, (2) energy-efficient training strategies, (3) energy-efficient inference approaches, and (4) efficient data usage.  ...  Green deep learning is an increasingly hot research field that appeals to researchers to pay attention to energy usage and carbon emission during model training and inference.  ...  2) How many parameters do we need at least for feasible training and inference? 3) How to design Green learning algorithms to enable efficient zero-shot learning or few-shot learning like human?  ... 
arXiv:2111.05193v2 fatcat:t2blz24y2jakteeeawqqogbkpy

Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey [article]

Jinjie Ni, Tom Young, Vlad Pandelea, Fuzhao Xue, Erik Cambria
2022 arXiv   pre-print
To the best of our knowledge, this survey is the most comprehensive and up-to-date one at present for deep learning based dialogue systems, extensively covering the popular techniques.  ...  Furthermore, we comprehensively review the evaluation methods and datasets for dialogue systems to pave the way for future research.  ...  Then they used model-agnostic meta-learning (MAML) (Finn et al., 2017) to train the framework to retrieve correct responses in a few-shot fashion.  ... 
arXiv:2105.04387v5 fatcat:yd3gqg45rjgzxbiwfdlcvf3pye

Recent Advances and Challenges in Task-oriented Dialog System [article]

Zheng Zhang, Ryuichi Takanobu, Qi Zhu, Minlie Huang, Xiaoyan Zhu
2020 arXiv   pre-print
policy learning to achieve better task-completion performance, and (3) integrating domain ontology knowledge into the dialog model.  ...  We also discuss three critical topics for task-oriented dialog systems: (1) improving data efficiency to facilitate dialog modeling in low-resource settings, (2) modeling multi-turn dynamics for dialog  ...  [46] proposed SC-GPT by first pre-training GPT with large-scale NLG corpus collected from existing publicly available dialog datasets, and then fine-tuning the model on target NLG tasks with few training  ... 
arXiv:2003.07490v3 fatcat:powcuixxargkbp57kpwmjict3y

Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research [article]

Ross Gruetzemacher, David Paradice
2021 arXiv   pre-print
, and to enable new IS research topics, thus creating more value for the research community.  ...  This is possible because these techniques make it easier to develop very powerful custom systems and their performance is superior to existing methods for a wide range of tasks and applications.  ...  For two-digit addition problems GPT-3 achieves 99.6% accuracy with only one example, and with no examples (i.e., zero-shot learning) GPT-3 still achieves 76.9% accuracy.  ... 
arXiv:2110.08975v2 fatcat:bw6rzrz2zvdyrgraxoxdraf4d4

A Student-Teacher Architecture for Dialog Domain Adaptation under the Meta-Learning Setting [article]

Kun Qian, Wei Wei, Zhou Yu
2021 arXiv   pre-print
The meta-teacher learns to quantify the importance of tokens under different contexts across different domains.  ...  We propose an efficient domain adaptive task-oriented dialog system model, which incorporates a meta-teacher model to emphasize the different impacts between generated tokens with respect to the context  ...  For each experiment, to reduce randomness in the few-shot learning setting, we repeat the adaptation process for 10 times and report the average result.  ... 
arXiv:2104.02689v1 fatcat:52gyybd6t5g3hhbimgbwzabw64

The backpropagation-based recollection hypothesis: Backpropagated action potentials mediate recall, imagination, language understanding and naming [article]

Zied Ben Houidi
2021 arXiv   pre-print
with the same high accuracy as a state of the art machine learning classifier.  ...  We then leverage simulations based on existing spiking neural network models with STDP learning to show the computational feasibility of using such a mechanism to map the image of an object to its name  ...  Accuracy in few-shot learning We now focus on the case where the "teacher" shows the SNN only few examples.  ... 
arXiv:2101.04137v3 fatcat:rzuj6fvutreidkflgjlsv4yevi

A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception [article]

Keenan Jones, Enes Altuncu, Virginia N. L. Franqueira, Yichao Wang, Shujun Li
2022 arXiv   pre-print
This work offers a broad overview of the field of NLG with respect to its potential for misuse, aiming to provide a high-level understanding of this rapidly developing area of research.  ...  As these systems improve, and it becomes ever harder to distinguish between human-written and machine-generated text, malicious actors could leverage these powerful NLG systems to a wide variety of ends  ...  These approaches include: Few-shot learning: Few-shot learning relies on providing only a small number of samples to the NLG model at inference time [15].  ... 
arXiv:2208.05757v1 fatcat:l7qqnfdawjhcxb7m6jmt7zukmq

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.  ...  In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies  ...  The study of GPT-3 demonstrates that scaling up models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches [20]  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI [article]

Jiangchao Yao, Shengyu Zhang, Yang Yao, Feng Wang, Jianxin Ma, Jianwei Zhang, Yunfei Chu, Luo Ji, Kunyang Jia, Tao Shen, Anpeng Wu, Fengda Zhang (+6 others)
2022 arXiv   pre-print
Specifically, we are the first to set up the collaborative learning mechanism for cloud and edge modeling with a thorough review of the architectures that enable such mechanism.  ...  In recent years, we have witnessed significant progress in developing more advanced AI models on cloud servers that surpass traditional deep learning models owing to model innovations (e.g., Transformers  ...  We thus need to seek a more efficient approach in place of fine-tuning for few-sample and few-parameter adaptation.  ... 
arXiv:2111.06061v3 fatcat:5rq6s5s4cvcidblidgahwynp34

Innovations in Neural Data-to-text Generation [article]

Mandar Sharma, Ajay Gogineni, Naren Ramakrishnan
2022 arXiv   pre-print
This survey offers a consolidated view into the neural DTG paradigm with a structured examination of the approaches, benchmark datasets, and evaluation protocols.  ...  With this holistic view, we highlight promising avenues for DTG research that not only focus on the design of linguistically capable systems but also systems that exhibit fairness and accountability.  ...  For few-shot learning, Liu et al.  ... 
arXiv:2207.12571v1 fatcat:5auwcdcpavg57dldveqohueioe