53 Hits in 7.8 sec

It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning [article]

Alexey Tikhonov, Max Ryabinin
2021 arXiv   pre-print
In this work, we design a simple approach to commonsense reasoning which trains a linear classifier with weights of multi-head attention as features.  ...  Also, we demonstrate that most of the performance is given by the same small subset of attention heads for all studied languages, which provides evidence of universal reasoning capabilities in multilingual  ...  This makes holistic cross-lingual evaluation of new commonsense reasoning approaches quite a difficult problem for researchers in the area.  ... 
arXiv:2106.12066v2 fatcat:3aqk7hxhdjfnlmnfvljl3augjq
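
The approach lends itself to a very small amount of code: pool one statistic per attention head, then fit a linear model on those features. A minimal sketch, assuming XLM-R as the encoder, a first-token pooling rule, and toy stand-in data (none of which are the paper's exact choices):

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base").eval()

def head_features(sentence):
    """One feature per attention head: mean attention mass on the first token."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)
    att = torch.stack(out.attentions)          # (layers, 1, heads, seq, seq)
    return att[..., 0].mean(dim=-1).flatten()  # (layers * heads,)

# Toy stand-in for a commonsense benchmark (assumed data, not the paper's).
sentences = ["The cup fell because it was fragile.",
             "The cup fell because it was sturdy."]
labels = [1, 0]

X = torch.stack([head_features(s) for s in sentences]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```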

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [article]

Pengcheng He, Jianfeng Gao, Weizhu Chen
2021 arXiv   pre-print
For example, the mDeBERTa Base achieves a 79.8% zero-shot cross-lingual accuracy on XNLI and a 3.6% improvement over XLM-R Base, creating a new SOTA on this benchmark.  ...  Furthermore, we have pre-trained a multi-lingual model mDeBERTa and observed a larger improvement over strong baselines compared to English models.  ...  Following previous multi-lingual PLMs, we report both the zero-shot cross-lingual transfer performance and the translate-train-all performance.  ... 
arXiv:2111.09543v2 fatcat:2hwqqhxr6jchtd63p4vgqodkc4
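
The zero-shot number above comes from a standard protocol: fine-tune the multilingual checkpoint on English NLI data only, then evaluate the unchanged model on the other XNLI languages. A hedged sketch of the evaluation loop, where `mdeberta-finetuned-en-mnli` is a hypothetical local checkpoint, not a published model:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "mdeberta-finetuned-en-mnli"  # hypothetical checkpoint, trained on English only
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt).eval()

for lang in ["fr", "de", "sw", "ur"]:  # no target-language training data involved
    ds = load_dataset("xnli", lang, split="test")
    correct = 0
    for ex in ds:
        enc = tokenizer(ex["premise"], ex["hypothesis"],
                        truncation=True, return_tensors="pt")
        with torch.no_grad():
            correct += int(model(**enc).logits.argmax(-1).item() == ex["label"])
    print(f"{lang}: {correct / len(ds):.3f}")
```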

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning [article]

Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang
2021 arXiv   pre-print
Recently, multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning (VCR), by jointly understanding visual objects and text tokens through layers of cross-modality attention  ...  Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability  ...  Introduction Visual Commonsense Reasoning (Zellers et al. 2019) is a new addition to Vision-and-Language (VL) research, which has drawn significant attention in the past few years.  ... 
arXiv:2112.08587v1 fatcat:7bpj6jmqb5fy5hznrry7t4tcsa
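
The "layers of cross-modality attention" mentioned here can be pictured as text tokens attending over detected object features. A toy sketch of one such layer, with all shapes and dimensions chosen arbitrarily:

```python
import torch
import torch.nn as nn

d_model = 256
text = torch.randn(1, 12, d_model)     # 12 text token embeddings
objects = torch.randn(1, 36, d_model)  # 36 detected region features

cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
fused, weights = cross_attn(query=text, key=objects, value=objects)

print(fused.shape)    # (1, 12, 256): text tokens enriched with visual context
print(weights.shape)  # (1, 12, 36): how much each token attends to each region
```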

Few-shot Learning with Multilingual Language Models [article]

Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer (+9 others)
2021 arXiv   pre-print
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning on some tasks, while there is still room for improvement on  ...  On the FLORES-101 machine translation benchmark, our model outperforms GPT-3 on 171 out of 182 translation directions with 32 training examples, while surpassing the official supervised baseline in 45  ...  It's all in the heads: Using attention heads as a baseline for  ...  Yinfei Yang, Yuan Zhang, Chris Tar, and Jason Baldridge. 2019.  ... 
arXiv:2112.10668v1 fatcat:ehexgbyr5jfetimihdd66sxdtm
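
Few-shot translation in this setting means concatenating demonstration pairs ahead of the test source sentence and letting the model continue the pattern. A minimal sketch of the prompt construction; the demonstration pairs below are invented, whereas the actual evaluation samples them from FLORES-101 data:

```python
demos = [  # invented demonstration pairs; the real setup draws them from FLORES-101
    ("The weather is nice today.", "Il fait beau aujourd'hui."),
    ("Where is the train station?", "Où est la gare ?"),
]

def build_prompt(demos, source):
    lines = [f"English: {en}\nFrench: {fr}" for en, fr in demos]
    lines.append(f"English: {source}\nFrench:")  # the model continues from here
    return "\n\n".join(lines)

print(build_prompt(demos, "I would like a cup of coffee."))
```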

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm.  ...  Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research.  ...  Understanding how cross-lingual information transfers will benefit the research in this direction.  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters [article]

Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Ivan Vulić, Roi Reichart, Anna Korhonen, Hinrich Schütze
2021 arXiv   pre-print
In this work, we highlight a fundamental risk posed by this shortcoming, illustrating that the model exhibits a high degree of sensitivity to the selection of few shots.  ...  Additionally, we show that a straightforward full model finetuning approach is quite effective for few-shot transfer, outperforming several state-of-the-art few-shot approaches.  ...  Acknowledgments This work was funded by the European Research Council: ERC NonSequeToR (#740516) and ERC LEXICAL (#648909). We thank the anonymous reviewers and Fei Mi for their helpful suggestions.  ... 
arXiv:2012.15682v2 fatcat:q737zz6mpzcfrinqkz6zwdn6i4
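
The sensitivity finding suggests a simple experiment: fix the shot budget, resample which shots are used, and watch the score move. A sketch of that loop, where `finetune_and_eval` is a placeholder for full-model finetuning plus evaluation rather than a real training routine:

```python
import random

def finetune_and_eval(shots):
    """Placeholder for full-model finetuning on `shots` plus evaluation on a
    target-language test set; returns a dummy accuracy here."""
    return 0.60 + 0.05 * random.random()

target_pool = [f"target-language example {i}" for i in range(200)]

def run(seed, k=8):
    rng = random.Random(seed)
    shots = rng.sample(target_pool, k)  # same budget, different shots
    return finetune_and_eval(shots)

scores = [run(seed) for seed in range(5)]
print(f"score spread across shot samples: {max(scores) - min(scores):.3f}")
```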

So Different Yet So Alike! Constrained Unsupervised Text Style Transfer [article]

Abhinav Ramesh Kashyap, Devamanyu Hazarika, Min-Yen Kan, Roger Zimmermann, Soujanya Poria
2022 arXiv   pre-print
Unlike the competing losses used in GANs, we introduce cooperative losses where the discriminator and the generator cooperate and reduce the same loss.  ...  We introduce a method for such constrained unsupervised text style transfer by introducing two complementary losses to the generative adversarial network (GAN) family of models.  ...  to gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan X GPU used in this research.  ... 
arXiv:2205.04093v1 fatcat:etviw56zrbh7lnqoxpucbjhwru
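
To make the cooperative-loss idea concrete: rather than the usual min-max game, both networks take gradient steps that decrease one shared objective. A toy sketch with placeholder networks and data, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

gen = nn.Linear(8, 8)   # placeholder generator
disc = nn.Linear(8, 1)  # placeholder discriminator

# One optimizer over BOTH networks: they share the objective instead of opposing.
opt = torch.optim.Adam(list(gen.parameters()) + list(disc.parameters()), lr=1e-3)

x = torch.randn(32, 8)      # source-style inputs
target = torch.ones(32, 1)  # "this should look like the target style"

fake = gen(x)
coop_loss = F.binary_cross_entropy_with_logits(disc(fake), target)

opt.zero_grad()
coop_loss.backward()  # in a standard GAN, disc would instead be pushed the other way
opt.step()
```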

Core Challenges in Embodied Vision-Language Planning [article]

Jonathan Francis, Nariaki Kitamura, Felix Labelle, Xiaopeng Lu, Ingrid Navarro, Jean Oh
2022 arXiv   pre-print
We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for  ...  current architectural methods, as opposed to also illustrating high-level challenges and opportunities for the field.  ...  This work was supported, in part, by a doctoral research fellowship from Bosch Research and by the U.S. Air Force Office of Scientific Research, under award number FA2386-17-1-4660.  ... 
arXiv:2106.13948v4 fatcat:esrtfxpun5ae5kaydjnymf3v6u

Vision-Language Navigation: A Survey and Taxonomy [article]

Wansen Wu, Tao Chang, Xinmeng Li
2022 arXiv   pre-print
For single-turn tasks, we further subdivide them into goal-oriented and route-oriented based on whether the instructions designate a single goal location or specify a sequence of multiple locations.  ...  This paper provides a comprehensive survey and an insightful taxonomy of these tasks based on the different characteristics of language instructions in these tasks.  ...  ACKNOWLEDGMENT The work described in this paper was sponsored in part by the National Natural Science Foundation of China under Grant No. 62103420 and 62103428, the Natural Science Fund of Hunan Province  ... 
arXiv:2108.11544v3 fatcat:qo5g237si5cwtewxiaeqtjwqpy

Emotionally Informed Hate Speech Detection: A Multi-target Perspective

Patricia Chiril, Endang Wahyu Pamungkas, Farah Benamara, Véronique Moriceau, Viviana Patti
2021 Cognitive Computation  
In this paper, we propose to tackle, for the first time, hate speech detection from a multi-target perspective.  ...  how to detect hate speech at a finer level of granularity and how to transfer knowledge across different topics and targets; and (3) we study the impact of affective knowledge encoded in sentic computing  ...  Pamungkas and Viviana Patti is partially funded by Progetto di Ateneo/CSP 2016 (Immigrants, Hate and Prejudice in Social Media, S1618.L2.BOSC.01) and by the project "Be Positive!"  ... 
doi:10.1007/s12559-021-09862-5 fatcat:742czn3qvnep5gt6hkaoccz75m

Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey [article]

Jinjie Ni, Tom Young, Vlad Pandelea, Fuzhao Xue, Erik Cambria
2022 arXiv   pre-print
Specifically, from the angle of model type, we discuss the principles, characteristics, and applications of different models that are widely used in dialogue systems.  ...  As a result, a multitude of novel works on this task have been carried out, and most of them are deep learning based due to its outstanding performance.  ... 
arXiv:2105.04387v5 fatcat:yd3gqg45rjgzxbiwfdlcvf3pye

Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech [article]

Yi-Ling Chung, Serra Sinem Tekiroglu, Marco Guerini
2021 arXiv   pre-print
Together with our approach, we present a series of experiments that show its ability to produce suitable and informative counter narratives in in-domain and cross-domain settings.  ...  Accordingly, a research line has emerged to automatically generate counter narratives in order to facilitate direct intervention in the hate discussion and to prevent hate content from further spreading  ...  As for cross-domain tests, GPT-2 KN still yields better performance than baselines while the performance of all models (except for XNLG) drops due to unseen events during training.  ... 
arXiv:2106.11783v1 fatcat:llne46m3hranno7hykbtmlt2k4
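
In the spirit of the "GPT-2 KN" setting named above, generation is conditioned on the hate message plus retrieved background knowledge. A hedged sketch using a stock GPT-2 and an invented prompt format; the paper's models are fine-tuned on counter-narrative data rather than used off the shelf:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

hate = "<hate message>"                     # placeholders, deliberately not filled in
knowledge = "<retrieved background facts>"
prompt = f"Hate speech: {hate}\nKnowledge: {knowledge}\nCounter narrative:"

ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=60, do_sample=True, top_p=0.9,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```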

PaLM: Scaling Language Modeling with Pathways [article]

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi (+55 others)
2022 arXiv   pre-print
On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance  ...  Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training  ...  We also thank Lucas Dixon, Ellen Jiang, and Tolga Bolukbasi for their support in model serving.  ... 
arXiv:2204.02311v3 fatcat:ewsbnc6tqrfffounsqlr7utdzm

Message from the general chair

Benjamin C. Lee
2015 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
To inject knowledge, we use a state-of-the-art system which cross-links (or "grounds") expressions in free text to Wikipedia.  ...  Our system gives a better performance than all the learning-based systems from the CoNLL-2011 shared task on the same dataset.  ...  Experiment results show that as a standalone speller, our model outperforms all the baseline systems.  ... 
doi:10.1109/ispass.2015.7095776 dblp:conf/ispass/Lee15 fatcat:ehbed6nl6barfgs6pzwcvwxria

Pretrained Transformers for Text Ranking: BERT and Beyond [article]

Jimmy Lin, Rodrigo Nogueira, Andrew Yates
2021 arXiv   pre-print
In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers  ...  The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query.  ...  In addition, we would like to thank the TPU Research Cloud for resources used to obtain new results in this work.  ... 
arXiv:2010.06467v3 fatcat:obla6reejzemvlqhvgvj77fgoy
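
The core reranking recipe this survey covers fits in a few lines: a cross-encoder scores each (query, passage) pair and the candidate list is re-sorted by score. A minimal monoBERT-style sketch, using a publicly available MS MARCO cross-encoder checkpoint as a convenient stand-in:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # any sequence-classification reranker works
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt).eval()

query = "how do transformers rank text?"
passages = ["BERT can score query-passage pairs directly.",
            "The weather in Paris is mild in spring."]

enc = tokenizer([query] * len(passages), passages,
                padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**enc).logits.squeeze(-1)  # one relevance score per pair

ranked = sorted(zip(scores.tolist(), passages), reverse=True)
print(ranked)
```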
Showing results 1 — 15 out of 53 results