247 Hits in 3.5 sec

Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank [article]

Ethan C. Chau, Lucy H. Lin, Noah A. Smith
2020 arXiv   pre-print
This presents a challenge for language varieties unfamiliar to these models, whose labeled and unlabeled data is too limited to train a monolingual model effectively.  ...  We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings.  ...  Acknowledgments We thank Jungo Kasai, Phoebe Mulcaire, members of UW NLP, and the anonymous reviewers for their helpful comments on preliminary versions of this paper.  ... 
arXiv:2009.14124v2 fatcat:lz7zkabbzfea3nl75gf4pwf4lq
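A minimal sketch of the vocabulary-augmentation idea described in the snippet, assuming the Hugging Face transformers library; the added tokens are illustrative placeholders, not the paper's actual augmented vocabulary, and the paper's exact procedure may differ.

```python
# Sketch of vocabulary augmentation for a multilingual model, assuming the
# Hugging Face `transformers` library. The new tokens are hypothetical
# placeholders for frequent target-language pieces missing from mBERT's vocab.
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

new_tokens = ["##ülm", "##sitan", "qaror"]  # hypothetical target-language pieces
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix so the new rows can be learned during continued,
# language-specific masked-language-model pretraining.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```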

Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing [article]

Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh, Thien Huu Nguyen
2021 arXiv   pre-print
This is achieved by our novel plug-and-play mechanism with Adapters where a multilingual pretrained transformer is shared across pipelines for different languages.  ...  It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages.  ...  Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.  ... 
arXiv:2101.03289v5 fatcat:pqisz3mc3fb2pnuyo3eo5h6jjm
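A minimal PyTorch sketch of the adapter idea behind the plug-and-play mechanism mentioned in the snippet: a shared, frozen multilingual encoder with a small trainable bottleneck per language. This is illustrative only and is not Trankit's actual implementation or API.

```python
# Illustrative adapter sketch: one shared transformer layer is frozen, and each
# language contributes a small residual bottleneck module of its own.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x):
        # Residual bottleneck: project down, apply a nonlinearity, project up.
        return x + self.up(torch.relu(self.down(x)))

hidden = 768
shared_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
for p in shared_layer.parameters():
    p.requires_grad = False  # the multilingual encoder is shared and frozen

adapters = nn.ModuleDict({"english": Adapter(hidden), "vietnamese": Adapter(hidden)})

def encode(x, language: str):
    # Only the chosen language's adapter is active for a given pipeline.
    return adapters[language](shared_layer(x))

out = encode(torch.randn(2, 10, hidden), "english")
print(out.shape)  # torch.Size([2, 10, 768])
```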

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing [article]

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
2021 arXiv   pre-print
Next, we explain various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings and downstream adaptation methods.  ...  Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT.  ...  ACKNOWLEDGMENTS Kalyan would like to thank his father Katikapalli Subramanyam for giving a) $750 to buy a new laptop, 24inch monitor and study table. b) $180 for one year subscription of Medium, Overleaf  ... 
arXiv:2108.05542v2 fatcat:4uyj6uut65d37hfi7yss2fek6q

Pre-Training with Whole Word Masking for Chinese BERT [article]

Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu
2019 arXiv   pre-print
In this technical report, we adapt whole word masking to Chinese text, masking whole words instead of individual Chinese characters, which introduces a further challenge for the Masked Language Model (MLM  ...  Recently, an upgraded version of BERT has been released with Whole Word Masking (WWM), which mitigates the drawbacks of masking partial WordPiece tokens in pre-training BERT.  ...  Acknowledgments Yiming Cui would like to thank the TensorFlow Research Cloud (TFRC) program for supporting this research.  ... 
arXiv:1906.08101v2 fatcat:ikghqulquzeklbwaxbpelri3n4
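A small sketch of whole word masking over WordPiece tokens, where pieces beginning with "##" continue the previous word and every piece of a sampled word is masked together. The English tokens are for readability only; the paper applies the same idea to Chinese via a word segmenter.

```python
# Whole word masking sketch: group "##"-continued pieces into words, then mask
# all pieces of each sampled word together rather than individual pieces.
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]"):
    # Group token indices into whole words ("##" marks a continuation piece).
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    masked = list(tokens)
    for word in words:
        if random.random() < mask_prob:
            for i in word:          # mask every piece of the chosen word
                masked[i] = mask_token
    return masked

tokens = ["the", "phil", "##ammon", "sang", "beautiful", "##ly"]
random.seed(0)
print(whole_word_mask(tokens, mask_prob=0.4))
```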

Revisiting Pre-Trained Models for Chinese Natural Language Processing [article]

Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu
2020 arXiv   pre-print
In this paper, we revisit Chinese pre-trained language models to examine their effectiveness in a non-English language and release a series of Chinese pre-trained language models to the community  ...  We carried out extensive experiments on eight Chinese NLP tasks to revisit the existing pre-trained language models as well as the proposed MacBERT.  ...  The first author was partially supported by the Google TensorFlow Research Cloud (TFRC) program for Cloud TPU access.  ... 
arXiv:2004.13922v2 fatcat:exnyfrndhbfthgcsylugfguyui

A Turkish Question Answering System Based on Deep Learning Neural Networks

Cavide Balkı GEMİRTER, Dionysis GOULARAS
2021 Zeki sistemler teori ve uygulamaları dergisi  
The proposed methodology is not only suited to the Turkish language, but can also be adapted to any other language for performing various NLP tasks.  ...  More specifically, the BERT algorithm is used to generate the language model, followed by a fine-tuning procedure for a machine reading for question answering (MRQA) task.  ...  Kemal Oflazer for sharing the News Corpus (Tr), described in Section 3.1 and used in the pre-training task, and the QA tasks for open and closed domain were generated using a BERT neural network.  ... 
doi:10.38016/jista.815823 fatcat:lwb4ysemgfhvreoyvsktfybte4
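A hedged sketch of the MRQA-style span extraction this entry describes, using a generic span-prediction head from Hugging Face transformers. The multilingual checkpoint below is a stand-in; the paper trains its own Turkish BERT before fine-tuning, and the span head here is untrained until fine-tuned.

```python
# Extractive QA sketch: a BERT span-prediction head scores every token as a
# possible answer start/end; the answer is the span between the best positions.
# Note: without fine-tuning, the freshly initialized QA head gives arbitrary spans.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "bert-base-multilingual-cased"  # placeholder for a Turkish BERT
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "Where was the conference held?"
context = "The conference was held in Istanbul in 2021."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer_ids = inputs["input_ids"][0][start : end + 1]
print(tokenizer.decode(answer_ids))
```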

AMMU : A Survey of Transformer-based Biomedical Pretrained Language Models [article]

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
2021 arXiv   pre-print
We strongly believe there is a need for a survey paper that can provide a comprehensive overview of the various transformer-based biomedical pretrained language models (BPLMs).  ...  These models combine the power of transformers, transfer learning, and self-supervised learning (SSL).  ...  BPE chooses the most frequent symbol pair, while WordPiece uses a language model to choose the symbol pair. BERT [2], DistilBERT [80], and ELECTRA use WordPiece embeddings.  ... 
arXiv:2105.00827v2 fatcat:yzsr4tg7lrexzinrn5psw5r5q4
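A toy sketch of the BPE-versus-WordPiece contrast drawn in the snippet: BPE merges the most frequent symbol pair, while WordPiece scores pairs by how much they improve a unigram language model, approximated here as pair frequency divided by the product of the parts' frequencies. This is a simplified illustration, not either library's implementation.

```python
# Toy contrast of merge criteria on a tiny space-separated symbol corpus.
from collections import Counter

corpus = ["l o w", "l o w e r", "n e w e s t", "w i d e s t"]

pair_counts, sym_counts = Counter(), Counter()
for word in corpus:
    syms = word.split()
    sym_counts.update(syms)
    pair_counts.update(zip(syms, syms[1:]))

# BPE: pick the single most frequent adjacent pair.
bpe_choice = max(pair_counts, key=pair_counts.get)

# WordPiece (approximate): pick the pair with the best likelihood-ratio score.
wp_choice = max(
    pair_counts,
    key=lambda p: pair_counts[p] / (sym_counts[p[0]] * sym_counts[p[1]]),
)
print("BPE merges:      ", bpe_choice)
print("WordPiece merges:", wp_choice)
```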

A Robust Self-Learning Framework for Cross-Lingual Text Classification

Xin Dong, Gerard de Melo
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
However, for other languages, relevant training data may be lacking, while state-of-the-art deep learning methods are known to be data-hungry.  ...  In this paper, we present an elegantly simple robust self-learning framework to include unlabeled non-English samples in the fine-tuning process of pretrained multilingual representation models.  ...  They further propose an improved variant (CLDFA-KCNN) that utilizes adversarial training for domain adaptation within a single language.  ... 
doi:10.18653/v1/d19-1658 dblp:conf/emnlp/DongM19 fatcat:h67fgxty6bftdnwdiaqufmpdxy
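A schematic of the general self-training loop behind the snippet's description: fit on labeled source-language data, pseudo-label unlabeled target-language samples the classifier is confident about, and refit on the union. This toy scikit-learn version uses random features and is not the paper's robust framework.

```python
# Self-training sketch on toy features: confident predictions on unlabeled
# ("non-English") samples are turned into pseudo-labels for further training.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(100, 8))
y_labeled = (X_labeled[:, 0] > 0).astype(int)   # toy labels
X_unlabeled = rng.normal(size=(200, 8))          # stand-in for unlabeled target-language data

clf = LogisticRegression().fit(X_labeled, y_labeled)

for _ in range(3):  # a few self-training rounds
    probs = clf.predict_proba(X_unlabeled)
    confident = probs.max(axis=1) > 0.9          # confidence threshold
    pseudo_y = probs.argmax(axis=1)[confident]
    X_aug = np.vstack([X_labeled, X_unlabeled[confident]])
    y_aug = np.concatenate([y_labeled, pseudo_y])
    clf = LogisticRegression().fit(X_aug, y_aug)

print("pseudo-labeled samples used:", int(confident.sum()))
```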

ERNIE: Enhanced Representation through Knowledge Integration [article]

Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu
2019 arXiv   pre-print
We present a novel language representation model enhanced by knowledge called ERNIE (Enhanced Representation through kNowledge IntEgration).  ...  results show that ERNIE outperforms other baseline methods, achieving new state-of-the-art results on five Chinese natural language processing tasks including natural language inference, semantic similarity  ...  In order to get reliable word representations, neural language models are designed to learn word co-occurrence and then obtain word embeddings with unsupervised learning.  ... 
arXiv:1904.09223v1 fatcat:tgbhnpobindobkzv5zwpnw7kg4

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT [article]

Shijie Wu, Mark Dredze
2019 arXiv   pre-print
A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task.  ...  This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI  ...  In mBERT, the WordPiece modeling strategy allows the model to share embeddings across languages.  ... 
arXiv:1904.09077v2 fatcat:tvxheufrerhkhphamtxnokrpdu

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

Shijie Wu, Mark Dredze
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task.  ...  This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero-shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI  ...  In mBERT, the WordPiece modeling strategy allows the model to share embeddings across languages.  ... 
doi:10.18653/v1/d19-1077 dblp:conf/emnlp/WuD19 fatcat:gbeccfe4ondmnff6ywf75gxagq
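A small sketch of the shared-vocabulary point both versions of this paper make: a single mBERT WordPiece tokenizer handles sentences from several languages, so any overlapping pieces map to the same embedding rows. The sentences are illustrative, and the snippet requires network access to download the tokenizer.

```python
# One multilingual WordPiece vocabulary across languages: shared pieces in the
# output below point at identical rows of mBERT's embedding matrix.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

sentences = {
    "en": "The president visited the university.",
    "es": "El presidente visitó la universidad.",
    "de": "Der Präsident besuchte die Universität.",
}
for lang, text in sentences.items():
    print(lang, tokenizer.tokenize(text))

common = set.intersection(*(set(tokenizer.tokenize(t)) for t in sentences.values()))
print("word pieces shared by all three sentences:", common)
```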

Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets [article]

Changchang Zeng, Shaobo Li
2021 arXiv   pre-print
MRC datasets are created for these tasks; (3) we also pre-trained four masked language models according to the answer length distributions of these datasets; (4) ablation experiments are conducted  ...  The masked language model (MLM) is a self-supervised training objective that is widely used in various PTMs.  ...  Interpretability of Masked Language Models With various advanced MLMs, many pre-trained language models have achieved state-of-the-art performance when adapted to the MRC task.  ... 
arXiv:2110.15712v1 fatcat:owpminh76nchrocxhs6tusd27u
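A toy sketch of the setup the snippet describes: draw masked-span lengths from an empirical answer-length distribution, then mask a contiguous span of that length. The length probabilities below are made up, and this is not the authors' pretraining code.

```python
# Sample a masked-span length from a (hypothetical) answer-length distribution
# and mask a contiguous character span of that length in a Chinese sentence.
import numpy as np

rng = np.random.default_rng(0)
answer_lengths = np.array([1, 2, 3, 4, 5])
length_probs = np.array([0.35, 0.30, 0.20, 0.10, 0.05])  # hypothetical values

tokens = list("这是一个用于演示整段掩码的中文句子")
span_len = int(rng.choice(answer_lengths, p=length_probs))
start = int(rng.integers(0, len(tokens) - span_len + 1))

masked = tokens[:start] + ["[MASK]"] * span_len + tokens[start + span_len:]
print("span length:", span_len)
print("".join(masked))
```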

A survey on natural language processing (nlp) and applications in insurance [article]

Antoine Ly, Benno Uthayasooriyar, Tingting Wang
2020 arXiv   pre-print
have been stored for many years.  ...  techniques. After giving a general overview of the evolution of text mining over the past few years, we describe how to conduct a full study with text mining and share some examples to serve those models  ...  Jieba for Chinese, TreeTagger for French or NLTK for English). Recently, models like BERT, detailed in section 10 page 27, can deal with multi-language data with a single model.  ... 
arXiv:2010.00462v1 fatcat:f5rcvwpcq5garfy6kw425bhpki

CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation [article]

Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting
2021 arXiv   pre-print
Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models still require an explicit tokenization step.  ...  any fixed vocabulary may limit a model's ability to adapt.  ...  We would also like to thank Martin Njoroge and Nanjala Misiko for their consultations on the Swahili examples, Diana Akrong for consulting on Twi orthography, and Waleed Ammar for consulting on Arabic  ... 
arXiv:2103.06874v3 fatcat:7c7o43utxfe7nnyvznzbl5pjem
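A sketch of the tokenization-free idea summarized here: represent text directly as Unicode codepoints and hash them into a bounded number of embedding buckets, so no fixed subword vocabulary is required. The bucket count and single hash below are illustrative and are not CANINE's exact multi-hash scheme.

```python
# Tokenization-free input sketch: raw codepoints hashed into a fixed-size
# embedding table instead of subword vocabulary IDs.
NUM_BUCKETS = 16_384

def codepoint_ids(text: str) -> list[int]:
    return [ord(ch) for ch in text]

def bucket_ids(codepoints: list[int]) -> list[int]:
    # A single cheap hash into a bounded embedding table (illustrative only).
    return [(cp * 31 + 7) % NUM_BUCKETS for cp in codepoints]

text = "tokenization-free models, e.g. for Swahili: habari!"
cps = codepoint_ids(text)
print(cps[:10])
print(bucket_ids(cps)[:10])
```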

A clinical specific BERT developed with huge size of Japanese clinical narrative [article]

Yoshimasa Kawazoe, Daisaku Shibata, Emiko Shinohara, Eiji Aramaki, Kazuhiko Ohe
2020 medRxiv   pre-print
Generalized language models pre-trained on a large corpus have achieved great performance on natural language tasks.  ...  While many pre-trained transformers for English have been published, few models are available for Japanese text, especially in clinical medicine.  ...  The novelty of BERT is that it took the idea of learning word embeddings one step further, by learning each embedding vector directly from the sequence.  ... 
doi:10.1101/2020.07.07.20148585 fatcat:d7sur7q3gbgxbcr6d5oe74xkvm
Showing results 1 — 15 out of 247 results