2,234 Hits in 6.7 sec

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding [article]

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
2019 arXiv   pre-print
In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing  ...  For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is  ...  and Kornel Csernai for providing access to private evaluation data.  ... 
arXiv:1804.07461v3 fatcat:66ez3vqwlrgangfoqy74kep2o4

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel Bowman
2018 Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP  
...  for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models.  ...  GLUE consists of nine English sentence understanding tasks covering a broad range of domains, data quantities, and difficulties.  ...  To facilitate research in this direction, we present the General Language Understanding Evaluation (GLUE, gluebenchmark.com): a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models  ... 
doi:10.18653/v1/w18-5446 dblp:conf/emnlp/WangSMHLB18 fatcat:2dwabagjybhqfp2dro4ndfkw6q
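
Since several of the results below report scores on GLUE, a minimal sketch of pulling the benchmark's data may be useful context; it relies on the third-party HuggingFace `datasets` package, which redistributes the GLUE tasks and is an assumption here, not part of the paper itself:

    # Minimal sketch: load one of the nine GLUE tasks and inspect an example.
    # Assumes `pip install datasets`; "mrpc" is one of the nine task names.
    from datasets import load_dataset

    mrpc = load_dataset("glue", "mrpc")  # Microsoft Research Paraphrase Corpus
    ex = mrpc["train"][0]
    print(ex["sentence1"], ex["sentence2"], ex["label"])

Test-set labels are held out for the online leaderboard at gluebenchmark.com, so local evaluation is typically done on the validation split.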

Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark [article]

Nikita Nangia, Samuel R. Bowman
2019 arXiv   pre-print
The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state  ...  remains a challenge for modern neural network approaches to text understanding.  ...  We thank Alex Wang and Amanpreet Singh for their help with conducting GLUE evaluations, and we thank Jason Phang for his help with training the BERT model.  ... 
arXiv:1905.10425v3 fatcat:rrwhaa2bwndlvnibllbbs3hjey

Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark

Nikita Nangia, Samuel R. Bowman
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state  ...  a challenge for modern neural network approaches to text understanding.  ...  We thank Alex Wang and Amanpreet Singh for their help with conducting GLUE evaluations, and we thank Jason Phang for his help with training the BERT model.  ... 
doi:10.18653/v1/p19-1449 dblp:conf/acl/NangiaB19 fatcat:3inm2kscvnczrdqpvvcnqz6cbi

The Glue-Nail deductive database system: Design, implementation, and evaluation

Marcia A. Derr, Shinichi Morishita, Geoffrey Phipps
1994 The VLDB Journal  
We describe the design and implementation of the Glue-Nail deductive database system. Nail is a purely declarative query language; Glue is a procedural language used for non-query activities.  ...  We also describe the Glue-Nail benchmark suite, a set of applications developed to evaluate the Glue-Nail language and to measure the performance of the system.  ...  David Chang wrote a statistical package in Glue. Ashish Gupta and Sanjai Tiwari wrote the CIFE application. We are grateful to Jeff Ullman for his comments on earlier versions of this article.  ... 
doi:10.1007/bf01228879 fatcat:jh2cqcpo5fedphkysu5bl6gqie

KLEJ: Comprehensive Benchmark for Polish Language Understanding [article]

Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik
2020 arXiv   pre-print
To alleviate this issue, we introduce a comprehensive multi-task benchmark for Polish language understanding, accompanied by an online leaderboard.  ...  In recent years, a series of Transformer-based models unlocked major improvements in general natural language understanding (NLU) tasks.  ...  In this paper, we introduce a comprehensive multi-task benchmark for Polish language understanding -- KLEJ (eng. glue, also an abbreviation for Kompleksowa Lista Ewaluacji Językowych, eng.  ... 
arXiv:2005.00630v1 fatcat:7rdrs2elxzgpdiylq4qolkhxg4

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding [article]

Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao
2020 arXiv   pre-print
We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models.  ...  A unique feature of MT-DNN is its built-in support for robust and transferable learning using the adversarial multi-task learning paradigm.  ...  We thank Liyuan Liu, Sha Li, Mehrad Moradshahi and other contributors to the package, and the anonymous reviewers for valuable discussions and comments.  ... 
arXiv:2002.07972v2 fatcat:4rrvw3owinap5f2wckdwhaftny
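
The shared-encoder design MT-DNN builds on, a single text encoder reused across tasks with a small head per task, can be sketched in a few lines of PyTorch (an illustrative sketch of the general pattern, not the toolkit's actual API; every class and parameter name below is invented for the example):

    import torch.nn as nn

    class MultiTaskModel(nn.Module):
        """Shared encoder plus per-task classification heads (illustrative only)."""
        def __init__(self, encoder: nn.Module, hidden: int, task_classes: dict):
            super().__init__()
            self.encoder = encoder  # e.g., a BERT-style sentence encoder
            self.heads = nn.ModuleDict(
                {task: nn.Linear(hidden, n) for task, n in task_classes.items()}
            )

        def forward(self, inputs, task: str):
            rep = self.encoder(inputs)    # pooled sentence representation
            return self.heads[task](rep)  # logits for the requested task

Training then alternates mini-batches across tasks, so the encoder learns representations that transfer between them.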

ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding

Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and the Thirty-Second Innovative Applications of Artificial Intelligence Conference  
Experimental results demonstrate that the ERNIE 2.0 model outperforms BERT and XLNet on 16 tasks, including English tasks on the GLUE benchmark and several similar tasks in Chinese.  ...  Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks.  ...  This work is supported by the National Key Research and Development Project of China (No. 2018AAA0101900).  ... 
doi:10.1609/aaai.v34i05.6428 fatcat:2qq5zh5f3rbtfbk3josuwza6me

A Pragmatics-Centered Evaluation Framework for Natural Language Understanding [article]

Damien Sileo, Tim Van de Cruys, Camille Pradel, Philippe Muller
2022 arXiv   pre-print
We introduce PragmEval, a new benchmark for the evaluation of natural language understanding that unites 11 pragmatics-focused evaluation datasets for English.  ...  Using our evaluation suite, we show that natural language inference, a widely used pretraining task, does not result in genuinely universal representations, which presents a new challenge for multi-task  ... 
arXiv:1907.08672v2 fatcat:l2w4rs2c7vd3lc3gbus2k2rxcq

ERNIE 2.0: A Continual Pre-training Framework for Language Understanding [article]

Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang
2019 arXiv   pre-print
Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural  ...  Experimental results demonstrate that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks, including English tasks on the GLUE benchmark and several common tasks in Chinese.  ...  As a multi-task benchmark and analysis platform for natural language understanding, the General Language Understanding Evaluation (GLUE) is usually applied  ... 
arXiv:1907.12412v2 fatcat:h7v3wkdfa5gorc6ico3s2yxm6u

RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark [article]

Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, Andrey Chertok, Andrey Evlampiev
2020 arXiv   pre-print
For the first time, a benchmark of nine tasks, collected and organized analogously to the SuperGLUE methodology, was developed from scratch for the Russian language.  ...  In this paper, we introduce an advanced Russian general language understanding evaluation benchmark -- RussianGLUE.  ...  Ekaterina Artemova works within the framework of the HSE University Basic Research Program and is funded by the Russian Academic Excellence Project "5-100".  ... 
arXiv:2010.15925v2 fatcat:r6ix3d53ovgzpfcnwariv5fxiu

NeuronBlocks: Building Your NLP DNN Models Like Playing Lego

Ming Gong, Linjun Shou, Wutao Lin, Zhijie Sang, Quanjia Yan, Ze Yang, Feixiang Cheng, Daxin Jiang
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations  
Deep Neural Networks (DNN) have been widely employed in industry to address various Natural Language Processing (NLP) tasks.  ...  An NLP toolkit for DNN models with both generality and flexibility can greatly improve the productivity of engineers by reducing their learning cost and guiding them to find optimal solutions to their tasks  ...  We sincerely thank the anonymous reviewers for their valuable suggestions.  ... 
doi:10.18653/v1/d19-3028 dblp:conf/emnlp/GongSLSYYCJ19 fatcat:eyydy335g5ffrehrvi3tioxoya

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding [article]

Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao
2021 arXiv   pre-print
Most recent progress in natural language understanding (NLU) has been driven, in part, by benchmarks such as GLUE, SuperGLUE, SQuAD, etc.  ...  To help accelerate this line of work, we introduce CLUES (Constrained Language Understanding Evaluation Standard), a benchmark for evaluating the few-shot learning capabilities of NLU models.  ...  For classification, we focus on both sentence classification and sentence-pair classification. Sentiment Analysis (SA) and Natural Language Inference (NLI) are both popular benchmark tasks.  ... 
arXiv:2111.02570v1 fatcat:xkapvzlmtnawdn2kb22yhvvije
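
Few-shot results of the kind CLUES standardizes are usually reported as an aggregate over several random K-shot draws of the training data; a minimal sketch of that protocol follows (the `fit` and `evaluate` callables are hypothetical placeholders supplied by the caller, not anything defined by the benchmark):

    import random
    import statistics

    def few_shot_scores(train, test, fit, evaluate, k=10, seeds=(0, 1, 2, 3, 4)):
        """Mean/stdev of a metric over several random K-shot training draws."""
        scores = []
        for seed in seeds:
            rng = random.Random(seed)
            shots = rng.sample(train, k)          # K labeled examples per run
            model = fit(shots)                    # caller-supplied training routine
            scores.append(evaluate(model, test))  # caller-supplied metric
        return statistics.mean(scores), statistics.stdev(scores)

Reporting the spread across seeds matters because K-shot performance is highly sensitive to which examples happen to be drawn.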

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [article]

Pengcheng He, Jianfeng Gao, Weizhu Chen
2021 arXiv   pre-print
We have pre-trained DeBERTaV3 using the same settings as DeBERTa to demonstrate its exceptional performance on a wide range of downstream natural language understanding (NLU) tasks.  ...  Taking the GLUE benchmark with eight tasks as an example, the DeBERTaV3 Large model achieves a 91.37% average score, which is 1.37% over DeBERTa and 1.91% over ELECTRA, setting a new state-of-the-art  ... 
arXiv:2111.09543v2 fatcat:2hwqqhxr6jchtd63p4vgqodkc4
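
For context, applying such a checkpoint to a GLUE-style sentence-pair task typically looks like the following (a sketch using the third-party HuggingFace transformers library; the "microsoft/deberta-v3-large" hub identifier is an assumption, as the record above does not name one):

    # Minimal sketch: tokenize a sentence pair and get classification logits.
    # Assumes `pip install transformers torch sentencepiece`.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "microsoft/deberta-v3-large"  # assumed hub id, not from the record
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

    batch = tok("The cat sat.", "A cat was sitting.", return_tensors="pt")
    logits = model(**batch).logits  # head is randomly initialized: fine-tune first

The 91.37% GLUE average quoted above comes from fine-tuning one such model per task and averaging the per-task metrics.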

NeuronBlocks: Building Your NLP DNN Models Like Playing Lego [article]

Ming Gong, Linjun Shou, Wutao Lin, Zhijie Sang, Quanjia Yan, Ze Yang, Feixiang Cheng, Daxin Jiang
2019 arXiv   pre-print
Deep Neural Networks (DNN) have been widely employed in industry to address various Natural Language Processing (NLP) tasks.  ...  An NLP toolkit for DNN models with both generality and flexibility can greatly improve the productivity of engineers by reducing their learning cost and guiding them to find optimal solutions to their tasks  ...  We sincerely thank the anonymous reviewers for their valuable suggestions.  ... 
arXiv:1904.09535v3 fatcat:g6deehinyve43ho3hyps2qmqtm
Showing results 1 — 15 out of 2,234 results