BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla
[article]
2022
arXiv
pre-print
This work presents BanglaNLG, a comprehensive benchmark for evaluating natural language generation (NLG) models in Bangla, a widely spoken yet low-resource language in the web domain. ...
We aggregate three challenging conditional text generation tasks under the BanglaNLG benchmark. ...
To facilitate the development, evaluation, and comparison of new NLG models, we introduce a multi-task evaluation benchmark for Bangla NLG, a widely spoken yet low-resource language. ...
arXiv:2205.11081v2
fatcat:5z3xeoix5zf2zcapyqthkuegbe
CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark
[article]
2021
arXiv
pre-print
Realizing general-purpose language intelligence has been a longstanding goal for natural language processing, where standard evaluation benchmarks play a fundamental and guiding role. ...
To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected ...
GLGE: A new general language generation evaluation benchmark. ...
arXiv:2112.13610v1
fatcat:eks56wvqtbhmfkq7wvs5n46lte
ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation
[article]
2021
arXiv
pre-print
In our experiments, ProphetNet-X models achieve new state-of-the-art performance on 10 benchmarks. ...
And also, we provide a PLG (Programming Language Generation) model ProphetNet-Code to show the generation performance besides NLG (Natural Language Generation) tasks. ...
Finetuning Benchmarks For different ProphetNet-X models, we select different benchmarks to evaluate them, respectively. ...
arXiv:2104.08006v2
fatcat:d3rxjftdkvbtvozgurhamq5mv4
GEM: A General Evaluation Benchmark for Multimodal Tasks
[article]
2021
arXiv
pre-print
In this paper, we present GEM as a General Evaluation benchmark for Multimodal tasks. ...
Different from existing datasets such as GLUE, SuperGLUE, XGLUE and XTREME that mainly focus on natural language tasks, GEM is a large-scale vision-language benchmark, which consists of GEM-I for image-language ...
GLGE (Liu et al., 2020) is another comprehensive dataset for natural language generation evaluation. ...
arXiv:2106.09889v1
fatcat:zwuq4lnufnblhcdxwepeekwvru
LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation
2022
Transactions of the Association for Computational Linguistics
Therefore, we propose a story-centric benchmark named LOT for evaluating Chinese long text modeling, which aggregates two understanding tasks and two generation tasks. ...
Existing benchmarks for natural language processing (NLP) usually focus only on understanding or generating short texts. ...
And most tasks in NLG benchmarks such as GLGE and GEM (Gehrmann et al., 2021) require generating only a few words (e.g., dialogue generation). ...
doi:10.1162/tacl_a_00469
fatcat:wzxedwqfnbawvo3gbgmo4cjvpi
Indian Legal NLP Benchmarks : A Survey
[article]
2021
arXiv
pre-print
We review the existing work in this area and propose ideas to create new benchmarks for Indian Legal Natural Language Processing. ...
Availability of challenging benchmarks is the key to the advancement of AI in a specific field. Since Legal Text is significantly different from normal English text, there is a need to create separate Natural ...
(Yang et al., 2015), and GLGE. ...
arXiv:2107.06056v1
fatcat:n2vsarxaqvbz3kqrt4riz7psoq
LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation
[article]
2022
arXiv
pre-print
Therefore, we propose a story-centric benchmark named LOT for evaluating Chinese long text modeling, which aggregates two understanding tasks and two generation tasks. ...
Existing benchmarks for natural language processing (NLP) usually focus only on understanding or generating short texts. ...
We propose a new story-centric benchmark LOT for evaluating Chinese long text understanding and generation. LOT consists of four tasks for testing the fundamental abilities to model long texts. ...
arXiv:2108.12960v2
fatcat:6c4g5rhwureftcoao2d5tah6wa
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
[article]
2021
arXiv
pre-print
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. ...
Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. ...
We additionally thank all participants of INLG 2019, the Generation Birdsof-a-Feather meeting at ACL 2020, the EvalNL-GEval Workshop at INLG 2020, and members of the generation challenge mailing list of ...
arXiv:2102.01672v3
fatcat:cco3zpdwnzcrxcpaicndrk2y7i
Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets
[article]
2022
arXiv
pre-print
Precisely assessing the progress in natural language generation (NLG) tasks is challenging, and human evaluation to establish preference in a model's output over another is often necessary. ...
In this paper, we propose a new and simple automatic evaluation method for NLG called Near-Negative Distinction (NND) that repurposes prior human annotations into NND tests. ...
Following the success of benchmarks such as GLUE (Wang et al., 2018) for the evaluation of NLU models, some work has proposed benchmarks as a way to evaluate NLG models, such as GLGE with 8 NLG tasks ...
arXiv:2205.06871v1
fatcat:ld5xvzijqzbgfjmvwnzjcg2ze4
Pretrained Language Models for Text Generation: A Survey
[article]
2022
arXiv
pre-print
Text Generation aims to produce plausible and readable text in a human language from input data. ...
Text generation based on PLMs is viewed as a promising approach in both academia and industry. In this paper, we provide a survey on the utilization of PLMs in text generation. ...
[117] introduced the General Language Generation Evaluation (GLGE) benchmark, a new multi-task benchmark for evaluating the generalization capabilities of text generation. ...
arXiv:2201.05273v4
fatcat:pnffabspsnbhvo44gbaorhxc3a
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
[article]
2021
arXiv
pre-print
Next, we present a new taxonomy of T-PTLMs and then give a brief overview of various benchmarks, including both intrinsic and extrinsic. ...
Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. ...
ACKNOWLEDGMENTS Kalyan would like to thank his father Katikapalli Subramanyam for giving a) $750 to buy a new laptop, 24-inch monitor and study table, b) $180 for a one-year subscription of Medium, Overleaf ...
arXiv:2108.05542v2
fatcat:4uyj6uut65d37hfi7yss2fek6q
SIENNA D2.4: Ethical Analysis of Human Genetics and Genomics
2019
Zenodo
In particular, we focus on the ethical issues pertaining to two areas of human genomics: 1) the study of the genome as currently performed through high throughput sequencing (e.g. with tools such as next generation ...
The report is based on a description of such technologies in previous deliverable D.2.1 and intends to provide a basis for our next report D.2.7, in which we aim to discuss an ethical framework for human ...
In order to develop a viable GLGE procedure, which could be potentially used in a clinical setting, scientists would have to generate enough data to evaluate its safety and efficacy 418 . ...
doi:10.5281/zenodo.4068015
fatcat:c2xwv5h6jje4piilvj6rv6uajm
A Roadmap for Big Model
[article]
2022
arXiv
pre-print
At the end of this paper, we conclude the further development of BMs in a more general view. ...
... Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. ...
To better benchmark general-purpose language intelligence, Beijing Academy of Artificial Intelligence proposed CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with a hierarchical ...
arXiv:2203.14101v4
fatcat:rdikzudoezak5b36cf6hhne5u4
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
[article]
2022
arXiv
pre-print
In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing generation of more diverse ...
Controllable Text Generation (CTG) is an emerging area in the field of natural language generation (NLG). ...
Similarly, [Liu et al. 2020d] proposed the General Language Generation Evaluation (GLGE), a new multi-task benchmark for natural language generation. ...
arXiv:2201.05337v1
fatcat:lqr6ulndhrcjbiy7etejwtdghy
A Survey of Knowledge-Enhanced Text Generation
[article]
2022
arXiv
pre-print
The goal of text generation is to make machines express in human language. It is one of the most important yet challenging tasks in natural language processing (NLP). ...
In this survey, we present a comprehensive review of the research on knowledge enhanced text generation over the past five years. ...
Therefore, we re-screened the existing four text generation benchmarks, i.e., GLGE [74], GEM [39], KILT [95], GENIE [57], and determined ten benchmark datasets for evaluating knowledge-enhanced ...
arXiv:2010.04389v3
fatcat:vzdtlz4j65g2va7gwkbmzyxkhq