
BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives [article]

Frederico Souza, João Filho
2022 arXiv   pre-print
, especially for the Brazilian Portuguese language.  ...  The experiments include BERT models trained with Brazilian Portuguese corpora and the multilingual version, covering multiple aggregation strategies and open-source datasets with predefined training  ...  To answer this question, we analyzed three different BERT variants: BERTimbau Base and Large [3], a Portuguese BERT variant trained with the Brazilian Web as Corpus (BrWaC) [22], and the Multilingual  ...
arXiv:2201.03382v1 fatcat:s2dctdj3nndszipi2dcey4ygry

NILC-Metrix: assessing the complexity of written and spoken language in Brazilian Portuguese [article]

Sidney Evaldo Leal and Magali Sanches Duran and Carolina Evaristo Scarton and Nathan Siegle Hartmann and Sandra Maria Aluísio
2021 arXiv   pre-print
School I and II (Final Years); (ii) a new predictor of textual complexity for the corpus of original and simplified texts of the PorSimples project; (iii) a complexity prediction model for school grades  ...  , to assess textual complexity in Brazilian Portuguese (BP).  ...  series of the genres Family and Animation in Brazilian Portuguese, made available by Open Subtitles in 2019.  ...
arXiv:2201.03445v1 fatcat:4bzrdqdynjgyljy6fvu2hpxohu

DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment [article]

Flávio Nakasato Cação, Marcos Menon José, André Seidel Oliveira, Stefano Spindola, Anna Helena Reali Costa, Fábio Gagliardi Cozman
2021 arXiv   pre-print
As training data, we collected questions from open-domain datasets, as well as content from the Portuguese Wikipedia and news from the press.  ...  Our QA systems focus on the Portuguese language, thus offering resources not found elsewhere in the literature.  ...  We also gratefully acknowledge support from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (grants 312180/2018-7 and 310085/2020-9) and the Center for Artificial Intelligence (C4AI-USP  ...
arXiv:2110.10015v1 fatcat:lo4nknat2bcnxdhduhryi4e3le

A cost-benefit analysis of cross-lingual transfer methods [article]

Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira
2021 arXiv   pre-print
An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner.  ...  Based on these results, we question the need for manually labeled training data in a target language. Code and translated datasets are available at  ...
arXiv:2105.06813v4 fatcat:7pn6yfc3ibfvxkf3sccuquiqya

Unsupervised Compositionality Prediction of Nominal Compounds

Silvio Cordeiro, Aline Villavicencio, Marco Idiart, Carlos Ramisch
2018 Computational Linguistics  
We extend the evaluation reported in Cordeiro et al. (2016) not only by adding Portuguese, but also by evaluating additional parameters: corpus size, composition functions, and new DSMs.  ...  General cross-lingual analyses reveal the impact of morphological variation and corpus size on the ability of the model to predict compositionality, and of a uniform combination of the components for best  ...  terms of Brazilian federal law No. 8.248/91.  ...
doi:10.1162/coli_a_00341 fatcat:6wfaohmkkfcjnmxke64wob5qxm

Evaluation of Synthetic Datasets Generation for Intent Classification Tasks in Portuguese

Robson T. Paula, Décio G. Aguiar Neto, Davi Romero, Paulo T. Guerra
2021 Anais do XIII Simpósio Brasileiro de Tecnologia da Informação e da Linguagem Humana (STIL 2021)   unpublished
Intent classification is an essential task for chatbots, as it aims to identify what the user wants in a given dialogue.  ...  We intend to simulate the task of migrating a search-based portal to an interactive dialogue-based information service by using artificial datasets for initial model training.  ...  This new model based on BERT was pre-trained on a large Brazilian Portuguese corpus named BrWaC (Brazilian Web as Corpus) using the same pre-training method as BERT.  ...
doi:10.5753/stil.2021.17806 fatcat:6fb5wecv4rd4xing7cf33wjxl4