Translation Error Detection as Rationale Extraction [article]

Marina Fomicheva, Lucia Specia, Nikolaos Aletras
2021 arXiv   pre-print
Recent Quality Estimation (QE) models based on multilingual pre-trained representations have achieved very competitive results when predicting the overall quality of translated sentences. Predicting translation errors, i.e. detecting specifically which words are incorrect, is a more challenging task, especially with limited amounts of training data. We hypothesize that, not unlike humans, successful QE models rely on translation errors to predict overall sentence quality. By exploring a set of feature attribution methods that assign relevance scores to the inputs to explain model predictions, we study the behaviour of state-of-the-art sentence-level QE models and show that explanations (i.e. rationales) extracted from these models can indeed be used to detect translation errors. We therefore (i) introduce a novel semi-supervised method for word-level QE and (ii) propose to use the QE task as a new benchmark for evaluating the plausibility of feature attribution, i.e. how interpretable model explanations are to humans.
arXiv:2108.12197v1 fatcat:d4jmmhtdfzbebft7gwdilwnybm

Complaint Identification in Social Media with Transformer Networks [article]

Mali Jin, Nikolaos Aletras
2020 arXiv   pre-print
Acknowledgements Nikolaos Aletras is supported by ESRC grant ES/T012714/1.  ... 
arXiv:2010.10910v1 fatcat:ushn7luf2vb23k66fgcfntrooa

Active Learning by Acquiring Contrastive Examples [article]

Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras
2021 arXiv   pre-print
Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space and yet the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions in four natural language understanding tasks and seven datasets. Our experiments show that CAL performs consistently better or equal than the best performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method and we further analyze all actively acquired datasets showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.
arXiv:2109.03764v1 fatcat:6o3lzxumcndg7bcn7fcmd4lpui

Point-of-Interest Type Prediction using Text and Images [article]

Danae Sánchez Villegas, Nikolaos Aletras
2021 arXiv   pre-print
Mali Jin and Nikolaos Aletras. 2020. Complaint identification in social media with transformer networks.  ...  ., 2020) and complaints (Jin and Aletras, 2020, 2021).  ... 
arXiv:2109.00602v1 fatcat:n2xo2hvqq5dhnpslv7iog3jzoy

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling [article]

Atsuki Yamaguchi, George Chrysostomou, Katerina Margatina, Nikolaos Aletras
2021 arXiv   pre-print
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder in a multi-class setting over the entire vocabulary. When pretraining, it is common to use alongside MLM other auxiliary objectives on the token or sequence level to improve downstream performance (e.g. next sentence prediction). However, no previous work so far has attempted in examining whether other simpler linguistically intuitive or not objectives can be used standalone as main pretraining objectives. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements of MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve comparable or better performance to MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining a model with 41% of the BERT-BASE's parameters, BERT-MEDIUM results in only a 1% drop in GLUE scores with our best objective.
arXiv:2109.01819v1 fatcat:wa6y6pixvngerjbkehntz6fjba

Re-Ranking Words to Improve Interpretability of Automatically Generated Topics [article]

Areej Alokaili, Nikolaos Aletras, Mark Stevenson
2019 arXiv   pre-print
; Aletras and Mittal, 2017; Sorodoc et al., 2017) and corpus pre-processing (Schofield et al., 2017).  ...  Coherence is the average coherence of the topics, computed using NPMI (Aletras and Stevenson, 2013a).  ... 
arXiv:1903.12542v1 fatcat:lo7ihldigbawfgvhqfhmm7gjzy

Knowledge Distillation for Quality Estimation [article]

Amit Gajbhiye, Marina Fomicheva, Fernando Alva-Manchego, Frédéric Blain, Abiola Obamuyide, Nikolaos Aletras, Lucia Specia
2021 arXiv   pre-print
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters.
arXiv:2107.00411v1 fatcat:bkakpl4mwrcajnwyh5ghzw2t3m

Predicting Twitter User Socioeconomic Attributes with Network and Language Information [article]

Nikolaos Aletras, Benjamin Paul Chamberlain
2018 arXiv   pre-print
Inferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic attributes on Twitter, employing information coming from users' social networks has not yet been explored for such complex user characteristics. In this paper, we describe a method for predicting the occupational class and the income of Twitter users given information extracted from their extended networks by learning a low-dimensional vector representation of users, i.e. graph embeddings. We use this representation to train predictive models for occupational class and income. Results on two publicly available datasets show that our method consistently outperforms the state-of-the-art methods in both tasks. We also obtain further significant improvements when we combine graph embeddings with textual features, demonstrating that social network and language information are complementary.
arXiv:1804.04095v1 fatcat:f75jv5b3vneobdm7yodc2fdfju

Labeling Topics with Images using Neural Networks [article]

Nikolaos Aletras, Arpit Mittal
2017 arXiv   pre-print
Aletras, N., Stevenson, M.: Representing topics using images. In: NAACL-HLT. pp. 158–167 (2013) 4. Aletras, N., Stevenson, M.: Labelling topics using unsupervised graph-based methods.  ...  Labeling Topics with Images using a Neural Network Nikolaos  ... 
arXiv:1608.00470v2 fatcat:a43y2bhaaba3tmensy3pbeyn7u

Flexible Instance-Specific Rationalization of NLP Models [article]

George Chrysostomou, Nikolaos Aletras
2021 arXiv   pre-print
References Chrysostomou, G.; and Aletras, N. 2021b.  ...  Acknowledgments Chrysostomou, G.; and Aletras, N. 2021a.  ... 
arXiv:2104.08219v2 fatcat:themed5tczhevfk5l3vxucjtqq

Automatic Identification and Classification of Bragging in Social Media [article]

Mali Jin, Daniel Preoţiuc-Pietro, A. Seza Doğruöz, Nikolaos Aletras
2022 arXiv   pre-print
In future work, we plan to study the extent to which bragging is used across various locations (Sánchez Villegas and Aletras, 2021) and languages and how it is employed by users across contexts.  ...  BERTweet with Linguistic Features We inject linguistic knowledge that could be related to bragging to the BERTweet model with a similar method proposed by Jin and Aletras (2021), that was found to be  ... 
arXiv:2203.05840v1 fatcat:psagaeb4qzczdi6jvascesfgcy

Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience [article]

George Chrysostomou, Nikolaos Aletras
2021 arXiv   pre-print
George Chrysostomou and Nikolaos Aletras. 2021. Improving the faithfulness of attention-based explanations with task-specific information for text classification.  ...  due to significant information mixing in higher layers of the model, with recent studies showing improvements in the faithfulness of attention-based explanations by addressing this (Chrysostomou and Aletras  ... 
arXiv:2108.13759v1 fatcat:k3ot3e2iijc45p6occplexg6gq

Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection [article]

Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr
2022 arXiv   pre-print
Future work includes applying our method on other cross-domain text classification tasks and exploring how explanation faithfulness can be improved in out-of-domain settings (Chrysostomou and Aletras,  ... 
arXiv:2203.12536v1 fatcat:4gbjje2bqvboxcn767vhe5blbu

LEGAL-BERT: The Muppets straight out of Law School [article]

Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos
2020 arXiv   pre-print
., 2019a) contains cases from the European Court of Human Rights (Aletras et al., 2016) and can be used for binary and multi-label text classification.  ... 
arXiv:2010.02559v1 fatcat:x4v6mqqbgje2paam3ef3ggqpbi

Combining Humor and Sarcasm for Improving Political Parody Detection [article]

Xiao Ao, Danae Sánchez Villegas, Daniel Preoţiuc-Pietro, Nikolaos Aletras
2022 arXiv   pre-print
This example also highlights the similarities between parody and real tweets, which may pose issues to misinformation classification systems (Mu and Aletras, 2020 These figurative devices have so far  ... 
arXiv:2205.03313v1 fatcat:vx5vkojv55hghnxvwqmukqqy4a