9 Hits in 3.2 sec

CharBERT: Character-aware Pre-trained Language Model

Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang, Guoping Hu
2020 Proceedings of the 28th International Conference on Computational Linguistics   unpublished
In this paper, we propose a character-aware pre-trained language model named CharBERT, improving on the previous methods (such as BERT, RoBERTa) to tackle these problems.  ...  Most pre-trained language models (PLMs) construct word representations at subword level with Byte-Pair Encoding (BPE) or its variations, by which OOV (out-of-vocab) words are almost avoidable.  ...  Related Work Pre-trained Language Model.  ... 
doi:10.18653/v1/2020.coling-main.4 fatcat:aoykn7se35fdpib2int2j37w6e
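
As a minimal illustration of the subword-versus-character contrast this abstract refers to (a hedged sketch, not CharBERT itself), the snippet below runs a standard WordPiece tokenizer from the Hugging Face transformers library on a well-formed and a misspelled word; the checkpoint name and example words are arbitrary choices.

```python
# Sketch (not CharBERT): subword segmentation vs. a plain character-level view.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

for word in ["representation", "repressentation"]:   # second form is a deliberate misspelling
    subwords = tokenizer.tokenize(word)   # the misspelling fragments into several subword pieces
    characters = list(word)               # a character-level view changes only locally under the typo
    print(f"{word!r:>20}  subwords={subwords}  chars={characters}")
```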

Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios? [article]

Arij Riabi, Benoît Sagot, Djamé Seddah
2021 arXiv   pre-print
pre-trained on large multilingual and monolingual models.  ...  We show that a character-based model trained on only 99k sentences of NArabizi and fine-tuned on a small treebank of this language leads to performance close to those obtained with the same architecture  ...  This dependency on large data sets for pre-training is a severe issue for low-resource languages, despite the emergence of large and successful multilingual pre-trained language models (Muller et al.,  ... 
arXiv:2110.13658v1 fatcat:uwqdqzr3yfaxphx3yocomkuurm
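
The recipe described in this abstract, a small pre-trained encoder fine-tuned on a small treebank for a downstream task, can be sketched roughly as below; the checkpoint name, tag set, and toy sentence are placeholders, not the authors' NArabizi setup.

```python
# Sketch: fine-tuning a pre-trained encoder for POS tagging on a tiny treebank-style example.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["DET", "NOUN", "VERB", "ADP"]                # hypothetical tag set
model_name = "distilbert-base-uncased"                 # placeholder encoder, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

words = ["the", "cat", "sat", "on", "mats"]
word_tags = ["DET", "NOUN", "VERB", "ADP", "NOUN"]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Copy each word's tag onto its subword pieces; special tokens get -100 so the loss ignores them.
aligned = [
    -100 if wid is None else labels.index(word_tags[wid])
    for wid in enc.word_ids(batch_index=0)
]
enc["labels"] = torch.tensor([aligned])

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**enc).loss        # cross-entropy over the tag set
loss.backward()
optimizer.step()
```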

Learning to Look Inside: Augmenting Token-Based Encoders with Character-Level Information [article]

Yuval Pinter, Amanda Stent, Mark Dredze, Jacob Eisenstein
2021 arXiv   pre-print
Commonly-used transformer language models depend on a tokenization schema which sets an unchangeable subword vocabulary prior to pre-training, destined to be applied to all downstream tasks regardless  ...  Recent work has shown that "token-free" models can be trained directly on characters or bytes, but training these models from scratch requires substantial computational resources, and this implies discarding  ...  Our method bridges the gap between the two representation modules via an additional pre-training sequence where the language modeling objective is supplemented with training a character-level encoding  ... 
arXiv:2108.00391v1 fatcat:j5j6rspp6bbw3mrcftnmfwpzri
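
A rough sketch of the general idea of augmenting token (subword) embeddings with a character-level encoder follows; the dimensions, GRU pooling, and additive fusion are illustrative assumptions, not the architecture proposed in the paper.

```python
# Sketch: fuse a subword embedding with a character-level encoding of the same token.
import torch
import torch.nn as nn

class CharAugmentedEmbedding(nn.Module):
    def __init__(self, vocab_size=30522, n_chars=128, dim=256, char_dim=64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_rnn = nn.GRU(char_dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, token_ids, char_ids):
        # token_ids: (batch, seq)   char_ids: (batch, seq, max_chars)
        tok = self.token_emb(token_ids)                    # (batch, seq, dim)
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids.view(b * s, c))     # (batch*seq, chars, char_dim)
        _, h = self.char_rnn(chars)                        # h: (2, batch*seq, dim//2)
        char_vec = torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)
        return tok + char_vec                              # additive fusion of the two views

# Toy usage with random ids, just to show the shapes.
emb = CharAugmentedEmbedding()
tokens = torch.randint(0, 30522, (2, 5))
chars = torch.randint(1, 128, (2, 5, 8))
out = emb(tokens, chars)          # (2, 5, 256)
```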

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing [article]

Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha
2021 arXiv   pre-print
These models provide good background knowledge to downstream tasks, which avoids training downstream models from scratch.  ...  Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT.  ...  Models like CharacterBERT [66], AlphaBERT [120] use character embeddings, and models like CharBERT [121] use both character and sub-word embeddings.  ... 
arXiv:2108.05542v2 fatcat:4uyj6uut65d37hfi7yss2fek6q

Adapting vs. Pre-training Language Models for Historical Languages

Enrique Manjavacas, Lauren Fonteyn
2022 Journal of Data Mining and Digital Humanities  
An appealing alternative, then, is to employ existing 'general purpose' models (pre-trained on present-day language) and subsequently adapt them to a specific domain by further pre-training.  ...  adapting a present-day language model.  ...  In the case of historical material, Baptiste et al. [2021] have shown that CharBERT [Ma et al., 2020], a BERT variant that processes input tokens character by character, produces more robust results  ... 
doi:10.46298/jdmdh.9152 fatcat:ovclsv2tafauvfvtyw4ix4yhmu
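
The "further pre-training" adaptation strategy discussed here amounts to continuing masked-language-model training of a general-purpose checkpoint on in-domain text. A hedged sketch using the Hugging Face Trainer follows; the corpus file name, checkpoint, and hyperparameters are placeholders, not the paper's setup.

```python
# Sketch: domain-adaptive (continued) MLM pre-training on in-domain text.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "bert-base-multilingual-cased"            # present-day, general-purpose checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical plain-text in-domain corpus, one document per line.
corpus = load_dataset("text", data_files={"train": "historical_corpus.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="adapted-model", num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```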

Can Character-based Language Models Improve Downstream Task Performances In Low-Resource And Noisy Language Scenarios?

Arij Riabi, Benoît Sagot, Djamé Seddah
2021 Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)   unpublished
We show that a character-based model trained on only 99k sentences of NArabizi and fine-tuned on a small treebank of this language leads to performance close to those obtained with the same architecture  ...  Confirming these results on a much larger data set of noisy French user-generated content, we argue that such character-based language models can be an asset for NLP in low-resource and high language variability  ...  even if the language is not one of the pre-training languages.  ... 
doi:10.18653/v1/2021.wnut-1.47 fatcat:a3kimj2hxzb6vbeq5vjb2q6fzy

Leveraging Discourse Rewards for Document-Level Neural Machine Translation

Inigo Jauregi Unanue, Nazanin Esmaili, Gholamreza Haffari, Massimo Piccardi
2020 Proceedings of the 28th International Conference on Computational Linguistics   unpublished
Models for Text Generation with Named Entities; Yash Agarwal, Devansh Batra and Ganesh Bagler; 15:48-15:54 CharBERT: Character-aware Pre-trained Language Model; Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu  ...  Profiling of a Neural Language Model; Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta and Giulia Venturi; 17:12-17:18 IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model  ... 
doi:10.18653/v1/2020.coling-main.395 fatcat:gjghdqnknjdp7m6p6noobs7nxa

LadRa-Net: Locally-Aware Dynamic Re-read Attention Net for Sentence Semantic Matching

Kun Zhang, Guangyi Lv, Le Wu, Enhong Chen, Qi Liu, Meng Wang
2021
Much recent progress has been made in this area, especially attention-based methods and pre-trained language model based methods.  ...  Moreover, selecting one small region in dynamic re-read attention seems insufficient for sentence semantics, and employing pre-trained language models as input encoders will introduce incomplete and fragile  ...  Language Model based methods: In order to make full use of existing large language corpora, various pre-trained language models have been proposed.  ... 
doi:10.48550/arxiv.2108.02915 fatcat:hkkcq2qj2bbw3fkzru6agwlrh4
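
Using a pre-trained language model as the input encoder for sentence-pair semantic matching, the baseline setting this abstract refers to, can be sketched as below; the checkpoint, label count, and example pair are illustrative assumptions, not the paper's model.

```python
# Sketch: a pre-trained LM encodes a sentence pair jointly for a matching decision.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

premise = "A man is playing a guitar on stage."
hypothesis = "Someone is performing music."
enc = tokenizer(premise, hypothesis, return_tensors="pt")   # joint [CLS] s1 [SEP] s2 [SEP] encoding
with torch.no_grad():
    logits = model(**enc).logits                            # match / no-match scores (untrained head)
print(logits.softmax(dim=-1))
```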

Open Proverbs: Exploring Genre and Openness in Proverbs 10:1-22:16

Suzanna Ruth Millar, Apollo-University Of Cambridge Repository, Apollo-University Of Cambridge Repository, Katharine Julia Dell
2018
As 'didactic' texts, the sayings shape the worldview, character and intellect of their students. As 'proverbs', they apply to specific situations with specific purposes.  ...  Ch. 4 considers 'character' terms (e.g. wise/foolish, righteous/wicked). I use cognitive linguistic theories to examine the terms as open categories with 'prototype structure'.  ...  It also provides stimulation for intellectual training and character development.  ... 
doi:10.17863/cam.24279 fatcat:pv4izvzyknffzf2ygpovhtopt4