
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning [article]

Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J Pal
2018 arXiv   pre-print
In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model.  ...  Recent work has explored unsupervised as well as supervised learning techniques with different training objectives to learn general purpose fixed-length sentence representations.  ...  We present a multi-task framework for learning general-purpose fixed-length sentence representations.  ... 
arXiv:1804.00079v1 fatcat:fj4oslpghvespf7ylvrsrue22q
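
The snippet above describes training a single sentence encoder under several objectives at once. As a rough illustration of that general pattern (not Subramanian et al.'s actual architecture, tasks, or hyperparameters), the PyTorch sketch below wires a toy shared encoder to per-task classification heads and samples one task per step; the two task names, all dimensions, and the synthetic batches are hypothetical. The point is simply that every task's gradient flows through the same encoder parameters.

```python
# Minimal sketch of shared-encoder multi-task training (hypothetical tasks and sizes).
import random
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Toy bag-of-embeddings sentence encoder shared across all tasks."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):                       # token_ids: (batch, seq_len)
        return self.proj(self.emb(token_ids).mean(dim=1))  # (batch, dim)

encoder = SharedEncoder()
heads = nn.ModuleDict({
    "nli": nn.Linear(64, 3),        # e.g. 3-way entailment labels (hypothetical task)
    "sentiment": nn.Linear(64, 2),  # e.g. binary sentiment (hypothetical task)
})
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def fake_batch(num_classes):
    """Stand-in for a real task batch."""
    x = torch.randint(0, 1000, (8, 12))
    y = torch.randint(0, num_classes, (8,))
    return x, y

for step in range(100):
    task = random.choice(["nli", "sentiment"])          # sample one task per step
    x, y = fake_batch(heads[task].out_features)
    loss = loss_fn(heads[task](encoder(x)), y)
    optimizer.zero_grad()
    loss.backward()                                     # updates shared encoder + task head
    optimizer.step()
```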

Learning Robust, Transferable Sentence Representations for Text Classification [article]

Wasi Uddin Ahmad, Xueying Bai, Nanyun Peng, Kai-Wei Chang
2018 arXiv   pre-print
sentence representations that are useful for transfer learning.  ...  In this paper, we show that jointly learning sentence representations from multiple text classification tasks and combining them with pre-trained word-level and sentence-level encoders results in robust  ...  Unsupervised approaches are also proposed in the literature by utilizing a large collection of unlabeled text corpora to learn distributional sentence representations.  ... 
arXiv:1810.00681v1 fatcat:6pkwhuzjrrh7bklpd7lxunf3nq

Multi-task Learning for Universal Sentence Embeddings: A Thorough Evaluation using Transfer and Auxiliary Tasks [article]

Wasi Uddin Ahmad, Xueying Bai, Zhechao Huang, Chao Jiang, Nanyun Peng, Kai-Wei Chang
2018 arXiv   pre-print
Learning distributed sentence representations is one of the key challenges in natural language processing.  ...  The quantitative analysis using auxiliary tasks shows that multi-task learning helps to embed better semantic information in the sentence representations compared to single-task learning.  ...  General-purpose distributional sentence representations can be learned from a large collection of unlabeled text corpora.  ... 
arXiv:1804.07911v2 fatcat:z3bmcufu3jcrxktmomcdmbqlhq

Dialogue Generation on Infrequent Sentence Functions via Structured Meta-Learning [article]

Yifan Gao, Piji Li, Wei Bi, Xiaojiang Liu, Michael R. Lyu, Irwin King
2020 arXiv   pre-print
We treat dialogue generation conditioned on different sentence functions as separate tasks, and apply model-agnostic meta-learning to high-resource sentence functions data.  ...  Sentence function is an important linguistic feature indicating the communicative purpose in uttering a sentence.  ...  We conduct experiments on STC-SeFun dataset (Bi et al., 2019) which is a large-scale Chinese short text conversation dataset with manually labeled sentence functions.  ... 
arXiv:2010.01495v1 fatcat:uhnxk3uuhfennkdo2wle5didzm
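
The entry above applies model-agnostic meta-learning (MAML) to high-resource sentence functions. To make the inner/outer update structure concrete, here is a first-order MAML sketch on synthetic one-dimensional regression "tasks"; it is only a toy stand-in for the paper's structured meta-learning on dialogue data, and every function and constant in it is made up for illustration.

```python
# First-order MAML sketch on toy linear-regression "tasks" (all data synthetic).
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """A 'task' here is y = a*x + b with its own (a, b); stands in for one sentence function."""
    a, b = rng.uniform(-2, 2, size=2)
    x = rng.uniform(-1, 1, size=(20, 1))
    y = a * x + b
    return x[:10], y[:10], x[10:], y[10:]          # support / query split

def loss_and_grad(w, x, y):
    """Mean squared error of the linear model w[0]*x + w[1] and its gradient."""
    err = w[0] * x + w[1] - y
    loss = np.mean(err ** 2)
    grad = np.array([np.mean(2 * err * x), np.mean(2 * err)])
    return loss, grad

meta_w = np.zeros(2)                               # meta-initialization shared across tasks
inner_lr, outer_lr = 0.1, 0.01

for step in range(2000):
    x_s, y_s, x_q, y_q = sample_task()
    # Inner loop: adapt a copy of the meta-parameters on the support set.
    w = meta_w.copy()
    for _ in range(3):
        _, g = loss_and_grad(w, x_s, y_s)
        w -= inner_lr * g
    # Outer loop (first-order): the query-set gradient at the adapted parameters
    # is applied directly to the meta-parameters.
    _, g_q = loss_and_grad(w, x_q, y_q)
    meta_w -= outer_lr * g_q

print("meta-initialization:", meta_w)
```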

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding [article]

Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao
2020 arXiv   pre-print
A unique feature of MT-DNN is its built-in support for robust and transferable learning using the adversarial multi-task learning paradigm.  ...  To enable efficient production deployment, MT-DNN supports multi-task knowledge distillation, which can substantially compress a deep neural model without significant performance drop.  ...  model via self-supervision on a large unlabeled text corpus, followed by a fine-tuning step that starts from the pretrained contextual representations and conducts supervised learning for  ... 
arXiv:2002.07972v2 fatcat:4rrvw3owinap5f2wckdwhaftny
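
The abstract mentions multi-task knowledge distillation for model compression. The sketch below shows a generic distillation loss that blends temperature-softened teacher targets with hard labels; this is the standard Hinton-style recipe rather than MT-DNN's exact multi-task formulation, and the tensors in the usage example are random placeholders.

```python
# Sketch of a distillation loss: soft teacher targets + hard labels
# (generic recipe, not MT-DNN's exact objective).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend KL divergence on temperature-softened outputs with plain cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                 # standard T^2 scaling keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random tensors standing in for one task head's outputs.
student = torch.randn(8, 3, requires_grad=True)
teacher = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(float(loss))
```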

The NLP Cookbook: Modern Recipes for Transformer Based Deep Learning Architectures

Sushant Singh, Ausif Mahmood
2021 IEEE Access  
retrieval via Natural Language Understanding (NLU), and Natural Language Generation (NLG).  ...  Although these large-size models have achieved unprecedented performance, they come at high computational costs.  ...  (a) BERT's NSP learning via simple non-reversed pair order (b) ALBERT's SOP dual sentiment learning via sentence order reversal.  ... 
doi:10.1109/access.2021.3077350 fatcat:gchmms4m2ndvzdowgrvro3w6z4

The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures [article]

Sushant Singh, Ausif Mahmood
2021 arXiv   pre-print
retrieval via Natural Language Understanding (NLU), and Natural Language Generation (NLG).  ...  Although these large-size models have achieved unprecedented performance, they come at high computational costs.  ...  (a) BERT's NSP learning via simple non-reversed pair order (b) ALBERT's SOP dual sentiment learning via sentence order reversal.  ... 
arXiv:2104.10640v3 fatcat:ctuyddhm3baajk5uqrynwdap44
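
Both versions of this survey reference the contrast between BERT's next sentence prediction (NSP) and ALBERT's sentence order prediction (SOP). As a small illustration of how such training pairs are typically constructed (a sketch of the general recipes, not code from either model's release, with a made-up toy document):

```python
# Sketch of NSP vs. SOP pair construction from a document (illustrative only).
import random

doc = ["Sentence one.", "Sentence two.", "Sentence three.", "Sentence four."]
other_doc = ["Unrelated sentence A.", "Unrelated sentence B."]

def nsp_pair(doc, other_doc):
    """BERT-style NSP: positive = consecutive pair, negative = second sentence from another document."""
    i = random.randrange(len(doc) - 1)
    if random.random() < 0.5:
        return (doc[i], doc[i + 1]), 1              # IsNext
    return (doc[i], random.choice(other_doc)), 0    # NotNext

def sop_pair(doc):
    """ALBERT-style SOP: positive = consecutive pair in order, negative = the same pair swapped."""
    i = random.randrange(len(doc) - 1)
    if random.random() < 0.5:
        return (doc[i], doc[i + 1]), 1              # in order
    return (doc[i + 1], doc[i]), 0                  # reversed order

print(nsp_pair(doc, other_doc))
print(sop_pair(doc))
```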

Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders [article]

Edgar Schönfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata
2019 arXiv   pre-print
Moreover, our results on ImageNet with various zero-shot splits show that our latent features generalize well in large-scale settings.  ...  The key to our approach is that we align the distributions learned from images and from side-information to construct latent features that contain the essential multi-modal information associated with  ...  of the art in the truly large-scale ImageNet dataset in all splits for the generalized zero-shot learning task.  ... 
arXiv:1812.01784v4 fatcat:px7zvsnsz5a5zfxt7vbvhxnh54
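
The key idea quoted above is aligning the latent distributions learned from images and from side-information. The following sketch shows one simple way to penalize the distance between two modality-specific diagonal Gaussian latents; the encoders, feature dimensions, and the choice of a 2-Wasserstein penalty are illustrative assumptions, not the paper's full VAE-based objective.

```python
# Sketch of aligning two modality-specific latent Gaussians (image features vs. class
# attributes). All sizes and encoders are hypothetical.
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Maps one modality to the mean/log-variance of a diagonal Gaussian latent."""
    def __init__(self, in_dim, z_dim=16):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, x):
        return self.mu(x), self.logvar(x)

def wasserstein2_diag(mu1, logvar1, mu2, logvar2):
    """Squared 2-Wasserstein distance between diagonal Gaussians, averaged over the batch."""
    std1, std2 = torch.exp(0.5 * logvar1), torch.exp(0.5 * logvar2)
    return ((mu1 - mu2) ** 2 + (std1 - std2) ** 2).sum(dim=-1).mean()

img_enc = GaussianEncoder(in_dim=2048)   # e.g. CNN image features (hypothetical dim)
attr_enc = GaussianEncoder(in_dim=85)    # e.g. class-attribute vectors (hypothetical dim)

img_feats = torch.randn(4, 2048)
attrs = torch.randn(4, 85)               # side-information for the same 4 classes

align_loss = wasserstein2_diag(*img_enc(img_feats), *attr_enc(attrs))
align_loss.backward()                    # pulls the two latent distributions together
print(float(align_loss))
```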

Multi-document Summarization via Deep Learning Techniques: A Survey [article]

Congbo Ma, Wei Emma Zhang, Mingyu Guo, Hu Wang, Quan Z. Sheng
2021 arXiv   pre-print
Multi-document summarization (MDS) is an effective tool for information aggregation that generates an informative and concise summary from a cluster of topic-related documents.  ...  Our survey, the first of its kind, systematically overviews the recent deep learning based MDS models.  ...  The development of large-scale cross-task datasets will facilitate multi-task learning [34, 135] .  ... 
arXiv:2011.04843v3 fatcat:zfi52xxef5g2tjkaw6hgjpwa5i

Multi-Task Learning in Natural Language Processing: An Overview [article]

Shijie Chen, Yu Zhang, Qiang Yang
2021 arXiv   pre-print
In recent years, Multi-Task Learning (MTL), which can leverage useful information of related tasks to achieve simultaneous performance improvement on multiple related tasks, has been used to handle these  ...  Then we present optimization techniques on loss construction, data sampling, and task scheduling to properly train a multi-task model.  ...  [126] trains multi-role dialogue representations via unsupervised multi-task pre-training on reference prediction, word prediction, role prediction, and sentence generation.  ... 
arXiv:2109.09138v1 fatcat:hlgzjykuvzczzmsgnl32w5qo5q
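
Among the optimization techniques this overview lists, data sampling is the easiest to show in a few lines. Below is a temperature-scaled, size-proportional task-sampling sketch, a common scheme in multi-task training; the task names, dataset sizes, and temperature value are made-up placeholders rather than anything taken from the survey.

```python
# Temperature-scaled task sampling: probability proportional to dataset size ** (1/T).
import random

dataset_sizes = {"nli": 400_000, "paraphrase": 100_000, "sentiment": 10_000}  # made up

def task_sampling_probs(sizes, temperature=2.0):
    """T=1 reproduces size-proportional sampling; larger T flattens toward uniform."""
    weights = {t: n ** (1.0 / temperature) for t, n in sizes.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

probs = task_sampling_probs(dataset_sizes)
print(probs)

tasks, p = zip(*probs.items())
schedule = random.choices(tasks, weights=p, k=10)   # tasks to visit for the next 10 steps
print(schedule)
```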

Dynamic Multi-Level Multi-Task Learning for Sentence Simplification [article]

Han Guo, Ramakanth Pasunuru, Mohit Bansal
2018 arXiv   pre-print
In this work, we first present a strong pointer-copy mechanism based sequence-to-sequence sentence simplification model, and then improve its entailment and paraphrasing capabilities via multi-task learning  ...  We also introduce a novel multi-armed bandit based training approach that dynamically learns how to effectively switch across tasks during multi-task learning.  ...  Further, we also induce word/phrase-level paraphrasing knowledge via a paraphrase generation task, enabling parallel learning of these three tasks in a three-way multi-task learning setup.  ... 
arXiv:1806.07304v1 fatcat:tzgbb4lcabfebjqurbeuevjv6e
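
The entry above trains with a multi-armed bandit that learns when to switch tasks. As a hedged sketch of that general idea, here is an EXP3-style controller where arms are tasks and the reward is a bounded training-progress signal; the task names and the fake reward function are placeholders, and the paper's actual controller and reward definition may differ.

```python
# EXP3-style bandit over tasks: pick a task, observe a bounded reward, reweight.
import math
import random

tasks = ["simplification", "entailment", "paraphrase"]   # placeholder arm names
gamma = 0.1
weights = {t: 1.0 for t in tasks}

def task_probs():
    """Mix the weight-proportional distribution with uniform exploration."""
    total = sum(weights.values())
    k = len(tasks)
    return {t: (1 - gamma) * weights[t] / total + gamma / k for t in tasks}

def fake_reward(task):
    """Stand-in for a real reward, e.g. scaled validation improvement in [0, 1]."""
    return random.random()

for step in range(50):
    probs = task_probs()
    task = random.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]
    r = fake_reward(task)                    # train one batch on `task`, measure progress
    x_hat = r / probs[task]                  # importance-weighted reward estimate
    weights[task] *= math.exp(gamma * x_hat / len(tasks))

print(task_probs())
```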

Multi-document Summarization via Deep Learning Techniques: A Survey

Congbo Ma, Wei Emma Zhang, Mingyu Guo, Hu Wang, QUAN Z. Sheng
2022 ACM Computing Surveys  
Multi-document summarization (MDS) is an effective tool for information aggregation that generates an informative and concise summary from a cluster of topic-related documents.  ...  Our survey, the first of its kind, systematically overviews the recent deep learning based MDS models.  ...  We also summarize nine network design strategies based on our extensive studies of the current models. We discuss the open issues of deep learning based multi-document summarization and identify the  ... 
doi:10.1145/3529754 fatcat:r4lngnzrgjbfziazokpd2c5s44

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks [article]

Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum
2020 arXiv   pre-print
This paper proposes a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text.  ...  This yields as many unique meta-training tasks as the number of subsets of vocabulary terms. We meta-train a transformer model on this distribution of tasks using a recent meta-learning framework.  ...  (2019) demonstrated that better feature learning from supervised tasks helps few-shot learning. Thus, we also evaluate multi-task learning and multi-task meta-learning for few-shot generalization.  ... 
arXiv:2009.08445v2 fatcat:klscagonaveaxo67swdr56pyry
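
The construction described above turns unlabeled text into classification tasks defined by subsets of vocabulary terms. A minimal sketch of one way to build such an N-way cloze task is given below, assuming a tiny toy corpus; the masking convention and sampling details are illustrative, not the paper's exact procedure.

```python
# Sketch: build a self-supervised N-way cloze classification task from a vocabulary subset.
import random

corpus = [
    "the cat sat on the mat",
    "a dog chased the cat",
    "the bird sang in the tree",
    "my dog likes the park",
    "a bird flew over the tree",
]

def make_cloze_task(corpus, num_words=2, examples_per_word=2, mask_token="[MASK]"):
    """Pick `num_words` vocabulary words; masking each occurrence defines an N-way task."""
    vocab = sorted({w for sent in corpus for w in sent.split()})
    target_words = random.sample(vocab, num_words)
    task = []
    for label, word in enumerate(target_words):
        hits = [s for s in corpus if word in s.split()]
        for sent in random.sample(hits, min(examples_per_word, len(hits))):
            masked = " ".join(mask_token if w == word else w for w in sent.split())
            task.append((masked, label))     # label = index of the masked word
    return target_words, task

words, task = make_cloze_task(corpus)
print(words)
for example in task:
    print(example)
```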

Retrofitting Structure-aware Transformer Language Model for End Tasks [article]

Hao Fei and Yafeng Ren and Donghong Ji
2020 arXiv   pre-print
A middle-layer structural learning strategy is leveraged for structure integration, accomplished with main semantic task training under a multi-task learning scheme.  ...  By performing structure-aware fine-tuning, our model achieves significant improvements for both semantic- and syntactic-dependent tasks.  ...  Structure-aware Learning: Multi-task training for language modeling and structure induction.  ... 
arXiv:2009.07408v1 fatcat:rf7tqsrv6vcbjbjooscsv7txxq

Exploring Domain Shift in Extractive Text Summarization [article]

Danqing Wang, Pengfei Liu, Ming Zhong, Jie Fu, Xipeng Qiu, Xuanjing Huang
2019 arXiv   pre-print
Our source code including BERT-based, meta-learning methods for multi-domain summarization learning and the re-purposed dataset Multi-SUM will be available on our project: .  ...  As a result, the model is under-utilizing the nature of the training data due to ignoring the difference in the distribution of training sets and shows poor generalization on the unseen domain.  ...  However, there are few works on building the connection between large-scale pre-trained models and multi-domain learning.  ... 
arXiv:1908.11664v1 fatcat:emzaybnlbvbtncx647hk6udfoy
Showing results 1 — 15 out of 30,079 results