CPLP:tuítes – The pluricentric corpus of tweets in Portuguese language
2023
Zenodo
This work presents the process of collecting, preparing, and publishing the Pluricentric Corpus of Tweets in Portuguese Language (CPLP:tuítes). CPLP:tuítes comprises 125,827 tweets totaling 2,633,507 tokens. The tweets come from 53 accounts of newspapers or news providers in Angola, Brazil, Cape Verde, Guinea-Bissau, Mozambique, Portugal, and São Tomé and Príncipe. This corpus is part of the Portuguese Database (BDP), a repository that will offer free access to corpora, as well
doi:10.5281/zenodo.7649627
fatcat:uhmtqe6u6zh2noyg56d4wfdmfm