GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
[article]
2019
arXiv
pre-print
In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing ...
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is ...
and Kornel Csernai for providing access to private evaluation data. ...
arXiv:1804.07461v3
fatcat:66ez3vqwlrgangfoqy74kep2o4
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
2018
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models. ...
The GLUE benchmark GLUE consists of nine English sentence understanding tasks covering a broad range of domains, data quantities, and difficulties. ...
To facilitate research in this direction, we present the General Language Understanding Evaluation (GLUE, gluebenchmark.com): a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models ...
doi:10.18653/v1/w18-5446
dblp:conf/emnlp/WangSMHLB18
fatcat:2dwabagjybhqfp2dro4ndfkw6q
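The two GLUE records above describe a benchmark of nine English sentence understanding tasks served from gluebenchmark.com. As a minimal sketch of working with those tasks, assuming the Hugging Face `datasets` library (a common but assumed tooling choice, not part of the original GLUE release):

```python
# Sketch: enumerate the nine GLUE tasks and load one of them via the
# Hugging Face `datasets` library (assumed tooling; the benchmark itself
# is served from gluebenchmark.com).
from datasets import load_dataset

GLUE_TASKS = [
    "cola", "sst2", "mrpc", "qqp", "stsb",
    "mnli", "qnli", "rte", "wnli",
]

# Each config ships train/validation/test splits; test labels are withheld.
mrpc = load_dataset("glue", "mrpc")
print(mrpc["train"][0])  # {'sentence1': ..., 'sentence2': ..., 'label': ..., 'idx': ...}
```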
Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark
[article]
2019
arXiv
pre-print
The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state ...
remains a challenge for modern neural network approaches to text understanding. ...
We thank Alex Wang and Amanpreet Singh for their help with conducting GLUE evaluations, and we thank Jason Phang for his help with training the BERT model. ...
arXiv:1905.10425v3
fatcat:rrwhaa2bwndlvnibllbbs3hjey
Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state ...
a challenge for modern neural network approaches to text understanding. ...
We thank Alex Wang and Amanpreet Singh for their help with conducting GLUE evaluations, and we thank Jason Phang for his help with training the BERT model. ...
doi:10.18653/v1/p19-1449
dblp:conf/acl/NangiaB19
fatcat:3inm2kscvnczrdqpvvcnqz6cbi
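The 70.0 and 83.9 figures quoted in the two records above are GLUE leaderboard scores, i.e. macro-averages over per-task metrics. A minimal sketch of that averaging, with hypothetical per-task numbers (the exact metric pairing per task follows the leaderboard's conventions):

```python
# Sketch of the GLUE macro-average: the leaderboard score is an unweighted
# mean over the tasks, with tasks that report two metrics averaged internally
# first. All numbers below are hypothetical placeholders.
task_scores = {
    "CoLA": 52.1,                 # Matthews correlation
    "SST-2": 93.5,                # accuracy
    "MRPC": (88.9 + 84.8) / 2,    # F1 and accuracy, averaged
    "STS-B": (87.1 + 85.8) / 2,   # Pearson and Spearman, averaged
    "QQP": (71.2 + 89.2) / 2,     # F1 and accuracy, averaged
    "MNLI": 84.6,                 # accuracy
    "QNLI": 90.5,                 # accuracy
    "RTE": 66.4,                  # accuracy
    "WNLI": 65.1,                 # accuracy
}

glue_score = sum(task_scores.values()) / len(task_scores)
print(f"GLUE score: {glue_score:.1f}")
```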
The Glue-Nail deductive database system: Design, implementation, and evaluation
1994
The VLDB journal
We describe the design and implementation of the Glue-Nail deductive database system. Nail is a purely declarative query language; Glue is a procedural language used for non-query activities. ...
We also describe the Glue-Nail benchmark suite, a set of applications developed to evaluate the Glue-Nail language and to measure the performance of the system. ...
David Chang wrote a statistical package in Glue. Ashish Gupta and Sanjai Tiwari wrote the CIFE application. We are grateful to Jeff Ullman for his comments on earlier versions of this article. ...
doi:10.1007/bf01228879
fatcat:jh2cqcpo5fedphkysu5bl6gqie
KLEJ: Comprehensive Benchmark for Polish Language Understanding
[article]
2020
arXiv
pre-print
To alleviate this issue, we introduce a comprehensive multi-task benchmark for Polish language understanding, accompanied by an online leaderboard. ...
In recent years, a series of Transformer-based models unlocked major improvements in general natural language understanding (NLU) tasks. ...
In this paper, we introduce a comprehensive multi-task benchmark for Polish language understanding, KLEJ (eng. GLUE, also an abbreviation for Kompleksowa Lista Ewaluacji Językowych, eng. ...
arXiv:2005.00630v1
fatcat:7rdrs2elxzgpdiylq4qolkhxg4
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
[article]
2020
arXiv
pre-print
We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models. ...
A unique feature of MT-DNN is its built-in support for robust and transferable learning using the adversarial multi-task learning paradigm. ...
Acknowledgments We thank Liyuan Liu, Sha Li, Mehrad Moradshahi and other contributors to the package, and the anonymous reviewers for valuable discussions and comments. ...
arXiv:2002.07972v2
fatcat:4rrvw3owinap5f2wckdwhaftny
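The MT-DNN record above centers on multi-task learning over a shared text encoder. A minimal PyTorch sketch of that general paradigm, a shared encoder with task-specific heads (illustrative only, not MT-DNN's actual API):

```python
# Illustrative shared-encoder / task-specific-head pattern behind multi-task
# NLU toolkits such as MT-DNN (not the toolkit's real API).
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_size: int, task_num_labels: dict):
        super().__init__()
        self.encoder = encoder  # e.g. a Transformer encoder shared across tasks
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_size, n_labels)
            for task, n_labels in task_num_labels.items()
        })

    def forward(self, task: str, inputs: torch.Tensor) -> torch.Tensor:
        pooled = self.encoder(inputs)    # shared sentence representation
        return self.heads[task](pooled)  # task-specific logits

# Training typically samples one task per step and applies that task's loss,
# e.g. logits = model("mnli", batch); loss = F.cross_entropy(logits, labels)
```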
ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
2020
PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
Experimental results demonstrate that the ERNIE 2.0 model outperforms BERT and XLNet on 16 tasks, including English tasks on the GLUE benchmark and several similar tasks in Chinese. ...
Recently pre-trained models have achieved state-of-the-art results in various language understanding tasks. ...
Acknowledgements This work is supported by the National Key Research and Development Project of China (No. 2018AAA0101900). ...
doi:10.1609/aaai.v34i05.6428
fatcat:2qq5zh5f3rbtfbk3josuwza6me
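The continual pre-training idea described in this ERNIE 2.0 record, where pre-training tasks are introduced incrementally and trained jointly with the tasks already seen, can be summarized in a short sketch; the function names below are illustrative, not from the ERNIE 2.0 codebase:

```python
# Hedged sketch of continual multi-task pre-training: each stage adds one new
# pre-training task and keeps training on all previously introduced tasks so
# that earlier knowledge is retained. Names are illustrative.
def continual_pretrain(model, task_stream, steps_per_stage, train_step):
    seen_tasks = []
    for new_task in task_stream:                       # e.g. masked LM, sentence reordering, ...
        seen_tasks.append(new_task)
        for step in range(steps_per_stage):
            task = seen_tasks[step % len(seen_tasks)]  # round-robin over accumulated tasks
            batch = task.sample_batch()                # each task provides its own data
            train_step(model, task, batch)             # task-specific loss and update
    return model
```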
A Pragmatics-Centered Evaluation Framework for Natural Language Understanding
[article]
2022
arXiv
pre-print
We introduce PragmEval, a new benchmark for the evaluation of natural language understanding, that unites 11 pragmatics-focused evaluation datasets for English. ...
Using our evaluation suite, we show that natural language inference, a widely used pretraining task, does not result in genuinely universal representations, which presents a new challenge for multi-task ...
Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., and Bowman, S. R. (2019b). GLUE: A multi-task benchmark and analysis platform for natural language understanding. ...
arXiv:1907.08672v2
fatcat:l2w4rs2c7vd3lc3gbus2k2rxcq
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
[article]
2019
arXiv
pre-print
Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural ...
Experimental results demonstrate that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks, including English tasks on the GLUE benchmark and several common tasks in Chinese. ...
Pre-training Settings
Fine-tuning Tasks (English): As a multi-task benchmark and analysis platform for natural language understanding, the General Language Understanding Evaluation (GLUE) benchmark is usually applied ...
arXiv:1907.12412v2
fatcat:h7v3wkdfa5gorc6ico3s2yxm6u
RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark
[article]
2020
arXiv
pre-print
For the first time, a benchmark of nine tasks, collected and organized analogously to the SuperGLUE methodology, was developed from scratch for the Russian language. ...
In this paper, we introduce an advanced Russian general language understanding evaluation benchmark, RussianGLUE. ...
Acknowledgements Ekaterina Artemova works within the framework of the HSE University Basic Research Program and is funded by the Russian Academic Excellence Project "5-100". ...
arXiv:2010.15925v2
fatcat:r6ix3d53ovgzpfcnwariv5fxiu
NeuronBlocks: Building Your NLP DNN Models Like Playing Lego
2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations
Deep Neural Networks (DNN) have been widely employed in industry to address various Natural Language Processing (NLP) tasks. ...
An NLP toolkit for DNN models with both generality and flexibility can greatly improve the productivity of engineers by reducing their learning cost and guiding them to find optimal solutions for their tasks ...
Acknowledgements We sincerely thank the anonymous reviewers for their valuable suggestions. ...
doi:10.18653/v1/d19-3028
dblp:conf/emnlp/GongSLSYYCJ19
fatcat:eyydy335g5ffrehrvi3tioxoya
CLUES: Few-Shot Learning Evaluation in Natural Language Understanding
[article]
2021
arXiv
pre-print
Most recent progress in natural language understanding (NLU) has been driven, in part, by benchmarks such as GLUE, SuperGLUE, SQuAD, etc. ...
To help accelerate this line of work, we introduce CLUES (Constrained Language Understanding Evaluation Standard), a benchmark for evaluating the few-shot learning capabilities of NLU models. ...
For classification, we focus on both sentence classification and sentence-pair classification. Sentiment Analysis (SA) and Natural Language Inference (NLI) are both popular benchmark tasks. ...
arXiv:2111.02570v1
fatcat:xkapvzlmtnawdn2kb22yhvvije
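The CLUES record above evaluates few-shot learning on sentence classification and sentence-pair classification. As a generic illustration of what building a K-shot training split looks like (this is not the specific CLUES sampling protocol):

```python
# Generic recipe for a K-shot training split: sample K labeled examples per
# class with a fixed seed. Not the CLUES sampling protocol itself.
import random
from collections import defaultdict

def k_shot_split(examples, k, seed=0):
    """examples: iterable of dicts with a 'label' key; returns k per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    split = []
    for items in by_label.values():
        split.extend(rng.sample(items, k))
    rng.shuffle(split)
    return split
```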
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
[article]
2021
arXiv
pre-print
We have pre-trained DeBERTaV3 using the same settings as DeBERTa to demonstrate its exceptional performance on a wide range of downstream natural language understanding (NLU) tasks. ...
Taking the GLUE benchmark with eight tasks as an example, the DeBERTaV3 Large model achieves a 91.37% average score, which is 1.37% over DeBERTa and 1.91% over ELECTRA, setting a new state-of-the-art ( ...
GLUE: A multi-task benchmark and analysis platform for natural language understanding. ...
arXiv:2111.09543v2
fatcat:2hwqqhxr6jchtd63p4vgqodkc4
NeuronBlocks: Building Your NLP DNN Models Like Playing Lego
[article]
2019
arXiv
pre-print
Deep Neural Networks (DNN) have been widely employed in industry to address various Natural Language Processing (NLP) tasks. ...
An NLP toolkit for DNN models with both generality and flexibility can greatly improve the productivity of engineers by reducing their learning cost and guiding them to find optimal solutions for their tasks ...
Acknowledgements We sincerely thank the anonymous reviewers for their valuable suggestions. ...
arXiv:1904.09535v3
fatcat:g6deehinyve43ho3hyps2qmqtm
Showing results 1 — 15 out of 2,234 results