
Pre-Training with Whole Word Masking for Chinese BERT [article]

Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu
2019 arXiv   pre-print
Recently, an upgraded version of BERT has been released with Whole Word Masking (WWM), which mitigates the drawbacks of masking partial WordPiece tokens in pre-training BERT.  ...  In this technical report, we adapt whole word masking to Chinese text, masking the whole word instead of individual Chinese characters, which could bring another challenge in Masked Language Model (MLM  ...  Acknowledgments: Yiming Cui would like to thank the TensorFlow Research Cloud (TFRC) program for supporting this research.  ...
arXiv:1906.08101v2 fatcat:ikghqulquzeklbwaxbpelri3n4

Revisiting Pre-Trained Models for Chinese Natural Language Processing [article]

Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu
2020 arXiv   pre-print
In this paper, we revisit Chinese pre-trained language models to examine their effectiveness in a non-English language and release the Chinese pre-trained language model series to the community  ...  of the pre-trained language models.  ...  The first author was partially supported by the Google TensorFlow Research Cloud (TFRC) program for Cloud TPU access.  ...
arXiv:2004.13922v2 fatcat:exnyfrndhbfthgcsylugfguyui

Conceptualized Representation Learning for Chinese Biomedical Text Mining [article]

Ningyu Zhang, Qianghuai Jia, Kangping Yin, Liang Dong, Feng Gao, Nengwei Hua
2020 arXiv   pre-print
In this paper, we investigate how the recently introduced pre-trained language model BERT can be adapted for Chinese biomedical corpora and propose a novel conceptualized representation learning approach  ...  We examine the effectiveness of Chinese pre-trained models: BERT, BERT-wwm, RoBERTa, and our approach. Experimental results on the benchmark show that our approach could bring significant gain.  ...  MC-BERT is our method; w/o entity is the method without whole entity masking; w/o span is the method without whole span masking; BERT-wwm [1] is a whole word masking pretraining approach for the Chinese  ... 
arXiv:2008.10813v1 fatcat:dqt4id37gjdalcqpxkvemkb7tm

NEZHA: Neural Contextualized Representation for Chinese Language Understanding [article]

Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen, Qun Liu
2021 arXiv   pre-print
The current version of NEZHA is based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, Whole Word Masking  ...  In this technical report, we present our practice of pre-training language models named NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding) on Chinese corpora and finetuning  ...  In [8], the whole word masking (WWM) strategy is found to be more effective than random masking for training BERT.  ...
arXiv:1909.00204v3 fatcat:6sscgspukjg23at4dvsqhigsbu

PERT: Pre-training BERT with Permuted Language Model [article]

Yiming Cui, Ziqing Yang, Ting Liu
2022 arXiv   pre-print
Moreover, we also apply whole word masking and N-gram masking to improve the performance of PERT. We carried out extensive experiments on both Chinese and English NLU benchmarks.  ...  In this paper, we propose a new PLM called PERT for natural language understanding (NLU). PERT is an auto-encoding model (like BERT) trained with Permuted Language Model (PerLM).  ...  Acknowledgments: Yiming Cui would like to thank the TPU Research Cloud (TRC) program for Cloud TPU access.  ...
arXiv:2203.06906v1 fatcat:kpjmaqa7wffbjhgjkbwcmonuji

MarkBERT: Marking Word Boundaries Improves Chinese BERT [article]

Linyang Li, Yong Dai, Duyu Tang, Zhangyin Feng, Cong Zhou, Xipeng Qiu, Zenglin Xu, Shuming Shi
2022 arXiv   pre-print
We present a Chinese BERT model dubbed MarkBERT that uses word information.  ...  Besides, our model has two additional benefits: first, it is convenient to add word-level learning objectives over markers, which is complementary to traditional character and sentence-level pre-training  ...  In language understanding tasks, we compare with the RoBERTa-wwm-ext (Cui et al., 2019a) baseline, which is a Chinese pre-trained model trained with whole word masking.  ...
arXiv:2203.06378v1 fatcat:gq6y54ma4rb4xjacpa272rnhxy

SiBert: Enhanced Chinese Pre-trained Language Model with Sentence Insertion

Jiahao Chen, Chenjie Cao, Xiuyan Jiang
2020 International Conference on Language Resources and Evaluation  
Moreover, a word segmentation method called SentencePiece is utilized to further enhance Chinese BERT performance for tasks with long texts.  ...  Hence a new pre-training task called Sentence Insertion (SI) is proposed in this paper for Chinese query-passage pair NLP tasks including answer span prediction, retrieval question answering and sentence  ...  In this table, WWM means Whole Word Masking for Chinese. DA indicates data augmentation for samples with unpaired queries and passages.  ...
dblp:conf/lrec/ChenCJ20 fatcat:b2ub26e3zfbe5anwbsyc7gyili

A sentiment analysis model for car review texts based on adversarial training and whole word mask BERT [article]

Xingchen Liu, Yawen Li, Yingxia Shao, Ang Li, Jian Liang
2022 arXiv   pre-print
Based on this, we propose a car review text sentiment analysis model based on adversarial training and whole word mask BERT (ATWWM-BERT).  ...  From the perspective of word vectors, pre-training is carried out by means of whole word masking of proprietary vocabulary in the automotive field, and then training data is carried out through the strategy  ...  At the same time, we tested the model after pre-training with whole word masking in the field of car reviews.  ...
arXiv:2206.02389v1 fatcat:f6madd2syvh5zbyxy2nhgd3num

ZEN 2.0: Continue Training and Adaption for N-gram Enhanced Text Encoders [article]

Yan Song, Tong Zhang, Yonggang Wang, Kai-Fu Lee
2021 arXiv   pre-print
To further enhance the encoders, in this paper, we propose to pre-train n-gram-enhanced encoders with a large volume of data and advanced techniques for training.  ...  Pre-trained text encoders have drawn sustained attention in natural language processing (NLP) and shown their capability in obtaining promising results in different tasks.  ...  The Effect of Whole N-gram Masking: Whole word masking has proved useful in training many previous pre-trained models.  ...
arXiv:2105.01279v1 fatcat:wf72ugznbrckxkqapatmwh3bpa

Chinese named entity recognition model based on BERT

Hongshuai Liu, Ge Jun, Yuanyuan Zheng, I. Barukčić
2021 MATEC Web of Conferences  
In the model, we embedded the BERT pre-training language model that adopts the Whole Word Mask strategy, and added a document-level attention.  ...  Nowadays, most deep learning models ignore Chinese habits and global information when processing Chinese tasks. To solve this problem, we constructed the BERT-BiLSTM-Attention-CRF model.  ...  Different from the BERT with the Single Word Mask (SWM) [12], we use the Whole Word Mask (WWM) [13] in accordance with Chinese habits when pre-processing the sentence.  ...
doi:10.1051/matecconf/202133606021 fatcat:l3qvdcirbfh4ri5qu6duhubpla

Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models [article]

Yuxuan Lai, Yijia Liu, Yansong Feng, Songfang Huang, Dongyan Zhao
2021 arXiv   pre-print
In this work, we propose a novel pre-training paradigm for Chinese -- Lattice-BERT, which explicitly incorporates word representations along with characters, thus can model a sentence in a multi-granularity  ...  Chinese pre-trained language models usually process text as a sequence of characters, while ignoring more coarse granularity, e.g., words.  ...  For any correspondence, please contact Yansong Feng.  ... 
arXiv:2104.07204v2 fatcat:2xxwbavxqbc6bbodlmdhfirn5a

ERNIE: Enhanced Representation through Knowledge Integration [article]

Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu
2019 arXiv   pre-print
Entity-level strategy masks entities which are usually composed of multiple words. Phrase-level strategy masks the whole phrase which is composed of several words standing together as a conceptual unit. Experimental  ...  Inspired by the masking strategy of BERT, ERNIE is designed to learn language representation enhanced by knowledge masking strategies, which include entity-level masking and phrase-level masking.  ...  Table: different masking strategies and dataset sizes (columns: pre-train dataset size; mask strategy; dev accuracy; test accuracy). Rows: 10% of all; word-level (Chinese character); 77.7%; 76.8% | 10% of all; word-level & phrase-level  ...
arXiv:1904.09223v1 fatcat:tgbhnpobindobkzv5zwpnw7kg4

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation [article]

Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Fei Yang, Li Zhe, Hujun Bao, Xipeng Qiu
2021 arXiv   pre-print
Two specific decoders with a shared encoder are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively.  ...  In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT).  ...  The first line of work follows BERT and uses MLM with a whole word masking strategy to pre-train a Transformer encoder, such as Chinese versions of BERT and RoBERTa (Cui et al. 2019a), NEZHA, ZEN (Diao  ...
arXiv:2109.05729v3 fatcat:a65tgy3zvzehbprz6i3v2zq7bu

Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets [article]

Changchang Zeng, Shaobo Li
2021 arXiv   pre-print
With the development of training objectives, many variants of MLM have been proposed, such as whole word masking, entity masking, phrase masking, span masking, and so on.  ...  If this hypothesis is true, it can guide us on how to pre-train the MLM model with a relatively suitable mask length distribution for the MRC task.  ...  Therefore, Cui et al. (2019) applied whole word masking to Chinese [6], masking the whole word instead of masking Chinese characters. Whole Word Masking.  ...
arXiv:2110.15712v1 fatcat:owpminh76nchrocxhs6tusd27u

ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations [article]

Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang
2019 arXiv   pre-print
As a result, potential word or phrase boundaries are explicitly pre-trained and fine-tuned with the character encoder (BERT).  ...  Moreover, it is shown that reasonable performance can be obtained when ZEN is trained on a small corpus, which is important for applying pre-training techniques to scenarios with limited data.  ...  They used an existing segmenter to produce possible words in the input sentences, and then trained a standard BERT on the segmented texts by masking whole words.  ...
arXiv:1911.00720v1 fatcat:k6r7slaefzdz5hjiy5zkpbr6mm