Social Media Variety Geolocation with geoBERT

Yves Scherrer, Nikola Ljubešić
2021 Workshop on NLP for Similar Languages, Varieties and Dialects  
This paper describes the Helsinki-Ljubljana contribution to the VarDial 2021 shared task on social media variety geolocation. Following our successful participation at VarDial 2020, we again propose constrained and unconstrained systems based on the BERT architecture. In this paper, we report experiments with different tokenization settings and different pre-trained models, and we contrast our parameter-free regression approach with various classification schemes proposed by other participants at VarDial 2020. Both the code and the best-performing pre-trained models are made freely available.
dblp:conf/vardial/ScherrerL21 fatcat:5xvxwf6candrnod7vzaeigzfz4
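
The contrast the abstract draws between regression and classification can be illustrated with a minimal sketch. It is not the authors' code: the grid-cell discretization, the `cell_deg` parameter, the sample coordinates, and all function names are assumptions made for illustration, and median distance is used here simply as a convenient error measure. The sketch shows that a classification scheme needs a discretization parameter and cannot do better than the center of the predicted cell, whereas a regression output returns coordinates directly.

```python
import math
from statistics import median


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def to_cell(point, cell_deg):
    """Classification target: index of the grid cell containing the point."""
    return (math.floor(point[0] / cell_deg), math.floor(point[1] / cell_deg))


def cell_center(cell, cell_deg):
    """Decode a predicted cell back to coordinates (the cell's center)."""
    return ((cell[0] + 0.5) * cell_deg, (cell[1] + 0.5) * cell_deg)


def median_distance_km(predicted, gold):
    """Median great-circle error over paired predictions and gold points."""
    return median(haversine_km(p, g) for p, g in zip(predicted, gold))


# Hypothetical gold coordinates (roughly Zurich, Bern, Basel), illustration only.
gold = [(47.37, 8.54), (46.95, 7.45), (47.56, 7.59)]

# Assumed discretization parameter that a grid classifier would have to choose.
cell_deg = 0.5

# Even a perfect grid classifier can only return cell centers.
oracle_classifier = [cell_center(to_cell(g, cell_deg), cell_deg) for g in gold]

# A regression head outputs coordinates directly, so an oracle has no such floor.
oracle_regressor = list(gold)

quantization_floor = median_distance_km(oracle_classifier, gold)
regression_floor = median_distance_km(oracle_regressor, gold)

print(f"classification floor: {quantization_floor:.1f} km")
print(f"regression floor:     {regression_floor:.1f} km")
```

Shrinking `cell_deg` lowers the quantization floor but multiplies the number of classes, which is the kind of trade-off a regression formulation avoids by construction.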