407 Hits in 5.8 sec

CPM-2: Large-scale Cost-effective Pre-trained Language Models [article]

Zhengyan Zhang, Yuxian Gu, Xu Han, Shengqi Chen, Chaojun Xiao, Zhenbo Sun, Yuan Yao, Fanchao Qi, Jian Guan, Pei Ke, Yanzheng Cai, Guoyang Zeng (+7 others)
2021 arXiv   pre-print
In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, the efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios.  ...  the pre-training process by exploiting existing PLMs instead of training models from scratch. (2) We explore the best practice of prompt tuning with large-scale PLMs.  ... 
arXiv:2106.10715v3 fatcat:wypmozuq65fglg7ho3sagogfzu

FPM: A Collection of Large-scale Foundation Pre-trained Language Models [article]

Dezhou Shen
2022 arXiv   pre-print
To the best of our knowledge, we provide the largest Chinese generative model and the largest Chinese encoding model.  ...  The BERT language models we trained on English datasets deliver a 14.45% higher F1 score than Turing-NLR.  ...  This paper uses three tokens, Chinese Pre-trained language Model (CPM) [22], English Pre-trained language Model (EPM) [22], and Generative Pre-Training (GPT) [11], to represent autoregressive language  ... 
arXiv:2111.04909v3 fatcat:qab2xwr6fbdwvfzv6iehfjlx6y

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [article]

Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu (+10 others)
2021 arXiv   pre-print
Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities.  ...  In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge-enhanced models.  ...  [19] released a 2.6-billion-parameter Chinese Pre-trained Language Model (CPM) with generative pre-training on large-scale Chinese training data; the model structure was inspired by [2].  ... 
arXiv:2107.02137v1 fatcat:uuocxl66cbhvtc7kkys3hpbgbu

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation [article]

Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Fei Yang, Li Zhe, Hujun Bao, Xipeng Qiu
2021 arXiv   pre-print
In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT).  ...  Two specific decoders with a shared encoder are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively.  ...  CPM-2 is a large-scale encoder-decoder model with 11 billion parameters, pre-trained in multiple stages on large-scale Chinese and bilingual data.  ... 
arXiv:2109.05729v3 fatcat:a65tgy3zvzehbprz6i3v2zq7bu

Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk [article]

Benyou Wang, Xiangbo Wu, Xiaokang Liu, Jianquan Li, Prayag Tiwari, Qianqian Xie
2022 arXiv   pre-print
However, the humor aspect of natural language is relatively under-investigated, especially in the age of pre-trained language models.  ...  We benchmark various generation approaches, including training-from-scratch Seq2seq, fine-tuned middle-scale PLMs, and large-scale PLMs (with and without fine-tuning).  ...  This paper also aims to break the stereotype that Chinese people are serious and cold. Instead, we do have a great sense of humor, with a long history of many thousands of years.  ... 
arXiv:2207.00735v1 fatcat:7hpweifxmzdwrlcfwhh4doadr4

EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training [article]

Hao Zhou, Pei Ke, Zheng Zhang, Yuxian Gu, Yinhe Zheng, Chujie Zheng, Yida Wang, Chen Henry Wu, Hao Sun, Xiaocong Yang, Bosi Wen, Xiaoyan Zhu (+2 others)
2021 arXiv   pre-print
Although pre-trained language models have remarkably enhanced the generation ability of dialogue systems, open-domain Chinese dialogue systems are still limited by the dialogue data and the model size  ...  In this paper, we propose EVA, a Chinese dialogue system that contains the largest Chinese pre-trained dialogue model with 2.8B parameters.  ...  We conduct extensive experiments on automatic and human evaluation to show the effectiveness of our model.  ... 
arXiv:2108.01547v1 fatcat:2yoizykcjbhztmw6ofkso7wp4a

COLD: A Benchmark for Chinese Offensive Language Detection [article]

Jiawen Deng, Jingyan Zhou, Hao Sun, Fei Mi, Minlie Huang
2022 arXiv   pre-print
To facilitate Chinese offensive language detection and model evaluation, we collect COLDataset, a Chinese offensive language dataset containing 37k annotated sentences.  ...  and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.  ...  CDialGPT is a 12-layer GPT pre-trained on a Chinese novel dataset and post-trained on the large-scale cleaned Chinese conversation dataset LCCC.  ... 
arXiv:2201.06025v1 fatcat:e2ahpikv2jeo5av5sydbzjyr2y

EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training [article]

Yuxian Gu, Jiaxin Wen, Hao Sun, Yi Song, Pei Ke, Chujie Zheng, Zheng Zhang, Jianzhu Yao, Xiaoyan Zhu, Jie Tang, Minlie Huang
2022 arXiv   pre-print
We propose EVA2.0, a large-scale pre-trained open-domain Chinese dialogue model with 2.8 billion parameters, and make our models and code publicly available.  ...  Large-scale pre-training has shown remarkable performance in building open-domain dialogue systems.  ...  Numerous large-scale pre-trained models have also emerged in Chinese; the CPM family (Zhang et al., 2021b,a) pioneered Chinese pre-trained models.  ... 
arXiv:2203.09313v2 fatcat:4qhf5rbjmfg3hprk3yx6p6zahi

PPT: Pre-trained Prompt Tuning for Few-shot Learning [article]

Yuxian Gu, Xu Han, Zhiyuan Liu, Minlie Huang
2022 arXiv   pre-print
Prompts for pre-trained language models (PLMs) have shown remarkable performance by bridging the gap between pre-training tasks and various downstream tasks.  ...  To ensure the generalization of PPT, we formulate similar classification tasks into a unified task form and pre-train soft prompts for this unified task.  ...  Note that for Chinese experiments, CPM-2 and mT5-XXL share the same parameter scale. Since CPM-2 outperforms mT5-XXL across all tasks, we use CPM-2 as the base model.  ... 
arXiv:2109.04332v3 fatcat:wqjtq7o7j5fkjixtnj6d5oksvi

PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation [article]

Siqi Bao, Huang He, Fan Wang, Hua Wu, Haifeng Wang, Wenquan Wu, Zhihua Wu, Zhen Guo, Hua Lu, Xinxian Huang, Xin Tian, Xinchao Xu (+2 others)
2021 arXiv   pre-print
To explore the limit of dialogue generation pre-training, we present the PLATO-XL models with up to 11 billion parameters, trained on both Chinese and English social media conversations.  ...  To train such large models, we adopt the unified transformer architecture for its high computation and parameter efficiency.  ...  Related Work Large-scale Pre-trained Language Models The pre-training paradigm has brought substantial performance improvements in natural language processing, where large-scale transformer models are  ... 
arXiv:2109.09519v1 fatcat:kma55sd5ifaszjhbfppp7dswzy

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [article]

Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, Weibao Gong, Shikun Feng, Junyuan Shang, Yanbin Zhao, Chao Pang, Jiaxiang Liu, Xuyi Chen (+17 others)
2021 arXiv   pre-print
A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge-enhanced models and was used to train a model with 10 billion parameters.  ...  GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential.  ...  ERNIE 3.0 Titan achieves strong performance compared to recently proposed large-scale Chinese language models such as CPM-1 (2.6B), PanGu-α, and Yuan 1.0 on all downstream tasks.  ... 
arXiv:2112.12731v1 fatcat:hact2hlojrdydhxcnzozmb7kee

Controllable Generation from Pre-trained Language Models via Inverse Prompting [article]

Xu Zou, Da Yin, Qingyang Zhong, Ming Ding, Zhilin Yang, Jie Tang
2021 arXiv   pre-print
Empirically, we pre-train a large-scale Chinese language model to perform a systematic study, using human evaluation, on the tasks of open-domain poem generation and open-domain long-form question answering  ...  Large-scale pre-trained language models have demonstrated strong capabilities of generating realistic text. However, it remains challenging to control the generation results.  ...  Different from Jiuge, we employ a large-scale language model pre-trained on a general-purpose corpus and leverage inverse prompting to enhance its generation quality.  ... 
arXiv:2103.10685v2 fatcat:x4fupknpybfcbjktj26derhdti

BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning [article]

Tianxiang Sun, Zhengfu He, Hong Qian, Xuanjing Huang, Xipeng Qiu
2022 arXiv   pre-print
Although BBT has achieved performance comparable to full model tuning on simple classification tasks under few-shot settings, it requires pre-trained prompt embeddings to match model tuning on hard tasks  ...  Black-Box Tuning (BBT) is a derivative-free approach to optimizing continuous prompt tokens prepended to the input of language models.  ...  In this way, we can formulate various downstream tasks into a general-purpose (masked) language modeling task and utilize the pre-trained (masked) language modeling head to solve them.  ... 
arXiv:2205.11200v1 fatcat:blvjzx52ujhatgyimvwkdo7dha

MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators [article]

Zhixing Tan, Xiangwen Zhang, Shuo Wang, Yang Liu
2022 arXiv   pre-print
Prompting has recently been shown to be a promising approach for applying pre-trained language models to downstream tasks.  ...  We present Multi-Stage Prompting (MSP), a simple and automatic approach for leveraging pre-trained language models for translation tasks.  ...  Zhang et al. (2021) investigate using prompt tuning to steer the CPM-2 model toward the WMT20 English-Chinese translation task.  ... 
arXiv:2110.06609v2 fatcat:rygisaagqbdrzllgakvo5ardd4

WeLM: A Well-Read Pre-trained Language Model for Chinese [article]

Hui Su, Xiao Zhou, Houjin Yu, Yuwen Chen, Zilin Zhu, Yang Yu, Jie Zhou
2022 arXiv   pre-print
Large Language Models pre-trained with self-supervised learning have demonstrated impressive zero-shot generalization capabilities on a wide spectrum of tasks.  ...  In this work, we present WeLM: a well-read pre-trained language model for Chinese that is able to seamlessly perform different types of tasks with zero or few-shot demonstrations.  ...  efficient training of large language models.  ... 
arXiv:2209.10372v3 fatcat:372a44357jfjzchekgp45ecky4
Showing results 1 — 15 out of 407 results