21 Hits in 2.3 sec

BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla [article]

Abhik Bhattacharjee, Tahmid Hasan, Wasi Uddin Ahmad, Rifat Shahriyar
2022 arXiv   pre-print
This work presents BanglaNLG, a comprehensive benchmark for evaluating natural language generation (NLG) models in Bangla, a widely spoken yet low-resource language in the web domain.  ...  We aggregate three challenging conditional text generation tasks under the BanglaNLG benchmark.  ...  To facilitate the development, evaluation, and comparison of new NLG models, we introduced a multi-task evaluation benchmark for Bangla NLG, a widely spoken yet low-resource language.  ... 
arXiv:2205.11081v2 fatcat:5z3xeoix5zf2zcapyqthkuegbe

CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark [article]

Yuan Yao, Qingxiu Dong, Jian Guan, Boxi Cao, Zhengyan Zhang, Chaojun Xiao, Xiaozhi Wang, Fanchao Qi, Junwei Bao, Jinran Nie, Zheni Zeng, Yuxian Gu (+23 others)
2021 arXiv   pre-print
Realizing general-purpose language intelligence has been a longstanding goal for natural language processing, where standard evaluation benchmarks play a fundamental and guiding role.  ...  To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected  ...  GLGE: A new general language generation evaluation benchmark.  ... 
arXiv:2112.13610v1 fatcat:eks56wvqtbhmfkq7wvs5n46lte

ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation [article]

Weizhen Qi, Yeyun Gong, Yu Yan, Can Xu, Bolun Yao, Bartuer Zhou, Biao Cheng, Daxin Jiang, Jiusheng Chen, Ruofei Zhang, Houqiang Li, Nan Duan
2021 arXiv   pre-print
In our experiments, ProphetNet-X models achieve new state-of-the-art performance on 10 benchmarks.  ...  And also, we provide a PLG (Programming Language Generation) model ProphetNet-Code to show the generation performance besides NLG (Natural Language Generation) tasks.  ...  Finetuning Benchmarks For different ProphetNet-X models, we select different benchmarks to evaluate them, respectively.  ... 
arXiv:2104.08006v2 fatcat:d3rxjftdkvbtvozgurhamq5mv4

GEM: A General Evaluation Benchmark for Multimodal Tasks [article]

Lin Su and Nan Duan and Edward Cui and Lei Ji and Chenfei Wu and Huaishao Luo and Yongfei Liu and Ming Zhong and Taroon Bharti and Arun Sacheti
2021 arXiv   pre-print
In this paper, we present GEM as a General Evaluation benchmark for Multimodal tasks.  ...  Different from existing datasets such as GLUE, SuperGLUE, XGLUE and XTREME that mainly focus on natural language tasks, GEM is a large-scale vision-language benchmark, which consists of GEM-I for image-language  ...  GLGE (Liu et al., 2020) is another comprehensive dataset for natural language generation evaluation.  ... 
arXiv:2106.09889v1 fatcat:zwuq4lnufnblhcdxwepeekwvru

LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation

Jian Guan, Zhuoer Feng, Yamei Chen, Ruilin He, Xiaoxi Mao, Changjie Fan, Minlie Huang
2022 Transactions of the Association for Computational Linguistics  
Therefore, we propose a story-centric benchmark named LOT for evaluating Chinese long text modeling, which aggregates two understanding tasks and two generation tasks.  ...  Existing benchmarks for natural language processing (NLP) usually focus only on understanding or generating short texts.  ...  And most tasks in NLG benchmarks such as GLGE and GEM (Gehrmann et al., 2021) require generating only several words (e.g., dialogue generation).  ... 
doi:10.1162/tacl_a_00469 fatcat:wzxedwqfnbawvo3gbgmo4cjvpi

Indian Legal NLP Benchmarks : A Survey [article]

Prathamesh Kalamkar, Janani Venugopalan Ph.D., Vivek Raghavan Ph.D.
2021 arXiv   pre-print
We review the existing work in this area and propose ideas to create new benchmarks for Indian Legal Natural Language Processing.  ...  Availability of challenging benchmarks is the key to advancement of AI in a specific field. Since Legal Text is significantly different from normal English text, there is a need to create separate Natural  ...  (Yang et al., 2015), and GLGE.  ... 
arXiv:2107.06056v1 fatcat:n2vsarxaqvbz3kqrt4riz7psoq

LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation [article]

Jian Guan, Zhuoer Feng, Yamei Chen, Ruilin He, Xiaoxi Mao, Changjie Fan, Minlie Huang
2022 arXiv   pre-print
Therefore, we propose a story-centric benchmark named LOT for evaluating Chinese long text modeling, which aggregates two understanding tasks and two generation tasks.  ...  Existing benchmarks for natural language processing (NLP) usually focus only on understanding or generating short texts.  ...  We propose a new story-centric benchmark LOT for evaluating Chinese long text understanding and generation. LOT consists of four tasks for testing the fundamental abilities to model long texts.  ... 
arXiv:2108.12960v2 fatcat:6c4g5rhwureftcoao2d5tah6wa

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics [article]

Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus (+44 others)
2021 arXiv   pre-print
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics.  ...  Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics.  ...  We additionally thank all participants of INLG 2019, the Generation Birds-of-a-Feather meeting at ACL 2020, the EvalNLGEval Workshop at INLG 2020, and members of the generation challenge mailing list of  ... 
arXiv:2102.01672v3 fatcat:cco3zpdwnzcrxcpaicndrk2y7i

Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets [article]

Philippe Laban and Chien-Sheng Wu and Wenhao Liu and Caiming Xiong
2022 arXiv   pre-print
Precisely assessing the progress in natural language generation (NLG) tasks is challenging, and human evaluation to establish preference in a model's output over another is often necessary.  ...  In this paper, we propose a new and simple automatic evaluation method for NLG called Near-Negative Distinction (NND) that repurposes prior human annotations into NND tests.  ...  Following the success of benchmarks such as GLUE (Wang et al., 2018) for the evaluation of NLU models, some work has proposed benchmarks as a way to evaluate NLG models, such as GLGE with 8 NLG tasks  ... 
arXiv:2205.06871v1 fatcat:ld5xvzijqzbgfjmvwnzjcg2ze4

Pretrained Language Models for Text Generation: A Survey [article]

Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
2022 arXiv   pre-print
Text Generation aims to produce plausible and readable text in a human language from input data.  ...  Text generation based on PLMs is viewed as a promising approach in both academia and industry. In this paper, we provide a survey on the utilization of PLMs in text generation.  ...  [117] introduced the General Language Generation Evaluation (GLGE) benchmark, a new multi-task benchmark for evaluating the generalization capabilities of text generation.  ... 
arXiv:2201.05273v4 fatcat:pnffabspsnbhvo44gbaorhxc3a

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing [article]

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
2021 arXiv   pre-print
Next, we present a new taxonomy of T-PTLMs and then give brief overview of various benchmarks including both intrinsic and extrinsic.  ...  Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT.  ...  ACKNOWLEDGMENTS Kalyan would like to thank his father Katikapalli Subramanyam for giving a) $750 to buy a new laptop, 24inch monitor and study table. b) $180 for one year subscription of Medium, Overleaf  ... 
arXiv:2108.05542v2 fatcat:4uyj6uut65d37hfi7yss2fek6q

SIENNA D2.4: Ethical Analysis of Human Genetics and Genomics

Alexandra Soulier, Emilia Niemiec, Heidi Carmen Howard
2019 Zenodo  
particular, we focus on the ethical issues pertaining to two areas of human genomics: 1) the study of the genome as currently performed through high throughput sequencing (e.g. with tools such as next generation  ...  The report is based on a description of such technologies in previous deliverable D.2.1 and intends to provide a basis for our next report D.2.7, in which we aim to discuss an ethical framework for human  ...  In order to develop a viable GLGE procedure, which could be potentially used in a clinical setting, scientists would have to generate enough data to evaluate its safety and efficacy.  ... 
doi:10.5281/zenodo.4068015 fatcat:c2xwv5h6jje4piilvj6rv6uajm

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
At the end of this paper, we conclude the further development of BMs in a more general view.  ...  , Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research.  ...  To better benchmark general-purpose language intelligence, Beijing Academy of Artificial Intelligence proposed CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with a hierarchical  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models [article]

Hanqing Zhang, Haolin Song, Shaoyu Li, Ming Zhou, Dawei Song
2022 arXiv   pre-print
In recent years, methods using large-scale pre-trained language models (PLMs), in particular the widely used transformer-based PLMs, have become a new paradigm of NLG, allowing generation of more diverse  ...  Controllable Text Generation (CTG) is an emerging area in the field of natural language generation (NLG).  ...  Similarly, [Liu et al. 2020d] proposed the General Language Generation Evaluation (GLGE), a new multi-task benchmark for natural language generation.  ... 
arXiv:2201.05337v1 fatcat:lqr6ulndhrcjbiy7etejwtdghy

A Survey of Knowledge-Enhanced Text Generation [article]

Wenhao Yu, Chenguang Zhu, Zaitang Li, Zhiting Hu, Qingyun Wang, Heng Ji, Meng Jiang
2022 arXiv   pre-print
The goal of text generation is to make machines express in human language. It is one of the most important yet challenging tasks in natural language processing (NLP).  ...  In this survey, we present a comprehensive review of the research on knowledge-enhanced text generation over the past five years.  ...  Therefore, we re-screened from the existing four text generation benchmarks, i.e., GLGE [74], GEM [39], KILT [95], GENIE [57], and determined ten benchmark datasets for evaluating knowledge-enhanced  ... 
arXiv:2010.04389v3 fatcat:vzdtlz4j65g2va7gwkbmzyxkhq
Showing results 1 — 15 out of 21 results