Translation Error Detection as Rationale Extraction [article]
2021 · arXiv pre-print
Recent Quality Estimation (QE) models based on multilingual pre-trained representations have achieved very competitive results when predicting the overall quality of translated sentences. Predicting translation errors, i.e. detecting specifically which words are incorrect, is a more challenging task, especially with limited amounts of training data. We hypothesize that, not unlike humans, successful QE models rely on translation errors to predict overall sentence quality. By exploring a set of feature attribution methods that assign relevance scores to the inputs to explain model predictions, we study the behaviour of state-of-the-art sentence-level QE models and show that explanations (i.e. rationales) extracted from these models can indeed be used to detect translation errors. We therefore (i) introduce a novel semi-supervised method for word-level QE and (ii) propose to use the QE task as a new benchmark for evaluating the plausibility of feature attribution, i.e. how interpretable model explanations are to humans.
arXiv:2108.12197v1 · fatcat:d4jmmhtdfzbebft7gwdilwnybm
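To make the mechanism in this abstract concrete, here is a minimal Python sketch of turning token-level relevance scores from a sentence-level QE model into word-level error tags; the function name, thresholding by top-k, and the example tokens and scores are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the paper's code): take token-level relevance scores
# produced by a feature attribution method applied to a sentence-level QE
# model, and flag the highest-scoring target words as likely errors.
from typing import List, Tuple

def tag_errors(tokens: List[str], scores: List[float], top_k: int = 1) -> List[Tuple[str, str]]:
    """Label the top_k most relevant tokens as BAD, the rest as OK."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    bad = set(ranked[:top_k])
    return [(tok, "BAD" if i in bad else "OK") for i, tok in enumerate(tokens)]

# Hypothetical attribution scores for a translation with one mistranslated word.
print(tag_errors(["Das", "Haus", "ist", "blau"], [0.05, 0.10, 0.02, 0.83]))
# -> [('Das', 'OK'), ('Haus', 'OK'), ('ist', 'OK'), ('blau', 'BAD')]
```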
Complaint Identification in Social Media with Transformer Networks [article]
2020 · arXiv pre-print
Acknowledgements: Nikolaos Aletras is supported by ESRC grant ES/T012714/1. ...
arXiv:2010.10910v1 · fatcat:ushn7luf2vb23k66fgcfntrooa
Active Learning by Acquiring Contrastive Examples [article]
2021 · arXiv pre-print
Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space yet for which the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions in four natural language understanding tasks and seven datasets. Our experiments show that CAL performs consistently better than or on par with the best performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method and further analyze all actively acquired datasets, showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.
arXiv:2109.03764v1 · fatcat:6o3lzxumcndg7bcn7fcmd4lpui
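The following Python sketch illustrates a contrastive acquisition score in the spirit of what the abstract describes; the choice of neighbor set, the direction of the KL divergence, and all function and variable names are assumptions rather than the authors' implementation.

```python
# Rough sketch (not the authors' code): rank unlabeled candidates by how much
# the model's predictive distribution diverges from those of their nearest
# neighbors in the model's feature space; acquire the highest-scoring ones.
import numpy as np
from scipy.special import rel_entr
from sklearn.neighbors import NearestNeighbors

def contrastive_scores(pool_feats, pool_probs, labeled_feats, labeled_probs, k=5):
    """pool_*/labeled_*: encoder features [N, d] and predicted class distributions [N, C]."""
    nn = NearestNeighbors(n_neighbors=k).fit(labeled_feats)
    _, idx = nn.kneighbors(pool_feats)          # k nearest labeled neighbors per candidate
    scores = []
    for i, nbrs in enumerate(idx):
        # mean KL(neighbor distribution || candidate distribution) over the k neighbors
        kls = [rel_entr(labeled_probs[j], pool_probs[i]).sum() for j in nbrs]
        scores.append(float(np.mean(kls)))
    return np.asarray(scores)
```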
Point-of-Interest Type Prediction using Text and Images [article]
2021 · arXiv pre-print
Mali Jin and Nikolaos Aletras. 2020. Complaint identification in social media with transformer networks. ...
... and complaints (Jin and Aletras, 2020, 2021). ...
arXiv:2109.00602v1 · fatcat:n2xo2hvqq5dhnpslv7iog3jzoy
Frustratingly Simple Pretraining Alternatives to Masked Language Modeling [article]
2021 · arXiv pre-print
Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder in a multi-class setting over the entire vocabulary. When pretraining, it is common to use other auxiliary objectives on the token or sequence level alongside MLM to improve downstream performance (e.g. next sentence prediction). However, no previous work has examined whether other simpler objectives, linguistically intuitive or not, can be used standalone as the main pretraining objective. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve performance comparable to or better than MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining a model with 41% of BERT-BASE's parameters (BERT-MEDIUM) results in only a 1% drop in GLUE scores with our best objective.
arXiv:2109.01819v1 · fatcat:wa6y6pixvngerjbkehntz6fjba
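As a reference point for the MLM objective the abstract contrasts against, here is a simplified Python sketch; the model interface is assumed, and the standard 80/10/10 keep/replace scheme of BERT-style masking is deliberately omitted.

```python
# Simplified sketch of the MLM objective (not the paper's code): mask a random
# sample of tokens and compute cross-entropy over the vocabulary only at the
# masked positions. Real BERT pretraining also keeps or randomly replaces a
# fraction of the selected tokens; that detail is omitted here.
import torch
import torch.nn.functional as F

def mlm_loss(model, input_ids, mask_token_id, mask_prob=0.15):
    labels = input_ids.clone()
    mask = torch.rand(input_ids.shape, device=input_ids.device) < mask_prob
    masked_inputs = input_ids.masked_fill(mask, mask_token_id)
    labels[~mask] = -100                           # ignore unmasked positions in the loss
    logits = model(masked_inputs)                  # assumed to return [batch, seq, vocab]
    return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100)
```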
Re-Ranking Words to Improve Interpretability of Automatically Generated Topics [article]
2019 · arXiv pre-print
...; Aletras and Mittal, 2017; Sorodoc et al., 2017) and corpus pre-processing (Schofield et al., 2017). ...
Coherence is the average coherence of the topics, computed using NPMI (Aletras and Stevenson, 2013a). ...
arXiv:1903.12542v1 · fatcat:lo7ihldigbawfgvhqfhmm7gjzy
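For reference, the coherence measure mentioned in this snippet is typically computed with the standard NPMI formula, averaged over all pairs of a topic's top words; this is the general definition of normalised PMI, not an excerpt from the paper.

```latex
% Normalised pointwise mutual information between two topic words w_i and w_j,
% estimated from (co-)occurrence probabilities in a reference corpus.
\mathrm{NPMI}(w_i, w_j) \;=\;
  \frac{\log \dfrac{P(w_i, w_j)}{P(w_i)\,P(w_j)}}{-\log P(w_i, w_j)}
```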
Knowledge Distillation for Quality Estimation [article]
2021 · arXiv pre-print
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters.
arXiv:2107.00411v1 · fatcat:bkakpl4mwrcajnwyh5ghzw2t3m
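A minimal Python sketch of the teacher-to-student transfer described in this abstract; the model interfaces, the MSE objective, and the function name are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): a small student regressor is trained
# to match the sentence-level quality scores predicted by a large QE teacher.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, optimizer):
    teacher.eval()
    with torch.no_grad():
        teacher_scores = teacher(batch)       # soft targets from the large teacher model
    student_scores = student(batch)           # much smaller, shallower architecture
    loss = F.mse_loss(student_scores, teacher_scores)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```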
Predicting Twitter User Socioeconomic Attributes with Network and Language Information [article]
2018 · arXiv pre-print
Inferring socioeconomic attributes of social media users such as occupation and income is an important problem in computational social science. Automated inference of such characteristics has applications in personalised recommender systems, targeted computational advertising and online political campaigning. While previous work has shown that language features can reliably predict socioeconomic attributes on Twitter, employing information coming from users' social networks has not yet been explored for such complex user characteristics. In this paper, we describe a method for predicting the occupational class and the income of Twitter users given information extracted from their extended networks by learning a low-dimensional vector representation of users, i.e. graph embeddings. We use this representation to train predictive models for occupational class and income. Results on two publicly available datasets show that our method consistently outperforms the state-of-the-art methods in both tasks. We also obtain further significant improvements when we combine graph embeddings with textual features, demonstrating that social network and language information are complementary.
arXiv:1804.04095v1 · fatcat:f75jv5b3vneobdm7yodc2fdfju
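An illustrative Python sketch of the final prediction step the abstract describes, combining graph embeddings with textual features; the inputs are assumed to be precomputed and the logistic-regression classifier is a stand-in, not the paper's actual pipeline.

```python
# Illustrative sketch (not the paper's pipeline): concatenate user graph
# embeddings with textual features and fit a classifier for occupational class.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_occupation_predictor(graph_emb, text_feats, labels):
    """graph_emb: [n_users, d] network embeddings; text_feats: [n_users, t]; labels: [n_users]."""
    X = np.hstack([graph_emb, text_feats])    # network and language features are complementary
    clf = LogisticRegression(max_iter=1000)
    return clf.fit(X, labels)
```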
Labeling Topics with Images using Neural Networks [article]
2017 · arXiv pre-print
Aletras, N., Stevenson, M.: Representing topics using images. In: NAACL-HLT. pp. 158–167 (2013). ...
Aletras, N., Stevenson, M.: Labelling topics using unsupervised graph-based methods. ...
Labeling Topics with Images using a Neural Network. Nikolaos ...
arXiv:1608.00470v2 · fatcat:a43y2bhaaba3tmensy3pbeyn7u
Flexible Instance-Specific Rationalization of NLP Models [article]
2021 · arXiv pre-print
References: Chrysostomou, G.; and Aletras, N. 2021b. ...
Acknowledgments: Chrysostomou, G.; and Aletras, N. 2021a. ...
arXiv:2104.08219v2 · fatcat:themed5tczhevfk5l3vxucjtqq
Automatic Identification and Classification of Bragging in Social Media [article]
2022 · arXiv pre-print
In future work, we plan to study the extent to which bragging is used across various locations (Sánchez Villegas and Aletras, 2021) and languages and how it is employed by users across contexts. ...
BERTweet with Linguistic Features: We inject linguistic knowledge that could be related to bragging into the BERTweet model with a method similar to that proposed by Jin and Aletras (2021), which was found to be ...
arXiv:2203.05840v1 · fatcat:psagaeb4qzczdi6jvascesfgcy
Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience [article]
2021 · arXiv pre-print
George Chrysostomou and Nikolaos Aletras. 2021. Improving the faithfulness of attention-based explanations with task-specific information for text classification. ...
... due to significant information mixing in higher layers of the model, with recent studies showing improvements in the faithfulness of attention-based explanations by addressing this (Chrysostomou and Aletras ...
arXiv:2108.13759v1 · fatcat:k3ot3e2iijc45p6occplexg6gq
Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection [article]
2022 · arXiv pre-print
Future work includes applying our method to other cross-domain text classification tasks and exploring how explanation faithfulness can be improved in out-of-domain settings (Chrysostomou and Aletras, ...
arXiv:2203.12536v1 · fatcat:4gbjje2bqvboxcn767vhe5blbu
LEGAL-BERT: The Muppets straight out of Law School [article]
2020 · arXiv pre-print
... , 2019a) contains cases from the European Court of Human Rights (Aletras et al., 2016) and can be used for binary and multi-label text classification. ...
arXiv:2010.02559v1 · fatcat:x4v6mqqbgje2paam3ef3ggqpbi
Combining Humor and Sarcasm for Improving Political Parody Detection [article]
2022 · arXiv pre-print
This example also highlights the similarities between parody and real tweets, which may pose issues to misinformation classification systems (Mu and Aletras, 2020). ... These figurative devices have so far ...
arXiv:2205.03313v1 · fatcat:vx5vkojv55hghnxvwqmukqqy4a
Showing results 1 — 15 out of 130 results