
Simulating Meaning: an AI-Complete Problem [article]

Christina Lioma
2019 Figshare  
presentation given at the Ada Lovelace Day 2019 in Copenhagen, Denmark
doi:10.6084/m9.figshare.9961793.v1 fatcat:rcxsat25fbf6hpfkb45adkjdd4

Dependencies: Formalising Semantic Catenae for Information Retrieval [article]

Christina Lioma
2017 arXiv   pre-print
Birger Larsen and Christina Lioma. On the Need for and Provision for an 'IDEAL' Scholarly Information Retrieval Test Collection. 28. Brian Brost, Yevgeny Seldin, Ingemar J. Cox, and Christina Lioma.  ...  Niels Dalum Hansen, Christina Lioma, and Kåre Mølbak. Ensemble Learned Vaccination Uptake Prediction using Web Search Queries.  ... 
arXiv:1709.03742v1 fatcat:4fdrnsmwdnb4pe37b6ritmvnme

Disputats for the Degree of Doctor Scientarum [article]

Christina Lioma
2019 Figshare  
Disputats for the Degree of Doctor Scientarum, Copenhagen, Denmark, June 2019
doi:10.6084/m9.figshare.9961829.v2 fatcat:lcoomgchirhg5dvsmo6ohdcppq

Diagnostics-Guided Explanation Generation [article]

Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
2021 arXiv   pre-print
Explanations shed light on a machine learning model's rationales and can aid in identifying deficiencies in its reasoning process. Explanation generation models are typically trained in a supervised way given human explanations. When such annotations are not available, explanations are often selected as those portions of the input that maximise a downstream task's performance, which corresponds to optimising an explanation's Faithfulness to a given model. Faithfulness is one of several diagnostic properties, which prior work has identified as useful for gauging the quality of an explanation without requiring annotations. Other diagnostic properties are Data Consistency, which measures how similar explanations are for similar input instances, and Confidence Indication, which shows whether the explanation reflects the confidence of the model. In this work, we show how to directly optimise for these diagnostic properties when training a model to generate sentence-level explanations, which markedly improves explanation quality, agreement with human rationales, and downstream task performance on three complex reasoning tasks.
arXiv:2109.03756v1 fatcat:nvf52sijuncz3dfblslw7f2ltm
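The Data Consistency property described in the abstract above can be illustrated with a small sketch: given saliency-style explanation vectors for pairs of similar inputs, the property rewards explanations that are themselves similar. This is an illustrative reading, not the paper's exact formulation; the cosine-similarity measure and the toy score vectors are assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length explanation score vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def data_consistency(similar_pairs):
    # Average explanation similarity over pairs of similar input instances:
    # higher means explanations behave consistently for near-identical inputs.
    return sum(cosine(e1, e2) for e1, e2 in similar_pairs) / len(similar_pairs)

# Toy sentence-level saliency scores for two pairs of similar instances.
pairs = [
    ([0.9, 0.1, 0.0], [0.8, 0.2, 0.0]),
    ([0.1, 0.7, 0.2], [0.2, 0.6, 0.2]),
]
print(round(data_consistency(pairs), 3))
```

A training objective could then reward higher values of this score alongside the downstream task loss.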

Generating Fact Checking Explanations [article]

Pepa Atanasova and Jakob Grue Simonsen and Christina Lioma and Isabelle Augenstein
2020 arXiv   pre-print
Most existing work on automated fact checking is concerned with predicting the veracity of claims based on metadata, social network spread, language used in claims, and, more recently, evidence supporting or denying claims. A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process -- generating justifications for verdicts on claims. This paper provides the first study of how these explanations can be generated automatically based on available claim context, and how this task can be modelled jointly with veracity prediction. Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system. The results of a manual evaluation further suggest that the informativeness, coverage and overall quality of the generated explanations are also improved in the multi-task model.
arXiv:2004.05773v1 fatcat:ubg5qw3gyvbkzblip4vqkj7sym

Rhetorical relations for information retrieval [article]

Christina Lioma and Birger Larsen and Wei Lu
2017 arXiv   pre-print
Preprint of: Christina Lioma, Birger Larsen, and Wei Lu. Rhetorical relations for information retrieval.  ... 
arXiv:1704.01599v1 fatcat:czpvgpcj2fgzvde2m7an3yfdnu

Part of Speech Based Term Weighting for Information Retrieval [article]

Christina Lioma, Roi Blanco
2017 arXiv   pre-print
Recently, Lioma & van Rijsbergen [18] proposed deriving term weights from POS n-grams, using Jespersen's Rank Theory.  ...  Practically this means that whereas in [18] Lioma & van Rijsbergen employ four different parameters, which they tune in order to optimise retrieval performance, our proposed weights are parameter-free  ... 
arXiv:1704.01617v1 fatcat:qmgokccfsbdzpjjxwrsrjaimka

Fact Checking with Insufficient Evidence [article]

Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
2022 arXiv   pre-print
Automating the fact checking (FC) process relies on information obtained from external sources. In this work, we posit that it is crucial for FC models to make veracity predictions only when there is sufficient evidence and otherwise indicate when it is not enough. To this end, we are the first to study what information FC models consider sufficient by introducing a novel task and advancing it with three main contributions. First, we conduct an in-depth empirical analysis of the task with a new fluency-preserving method for omitting information from the evidence at the constituent and sentence level. We identify when models consider the remaining evidence (in)sufficient for FC, based on three trained models with different Transformer architectures and three FC datasets. Second, we ask annotators whether the omitted evidence was important for FC, resulting in a novel diagnostic dataset, SufficientFacts, for FC with omitted evidence. We find that models are least successful in detecting missing evidence when adverbial modifiers are omitted (21% accuracy), whereas it is easiest for omitted date modifiers (63% accuracy). Finally, we propose a novel data augmentation strategy for contrastive self-learning of missing evidence by employing the proposed omission method combined with tri-training. It improves performance for Evidence Sufficiency Prediction by up to 17.8 F1 score, which in turn improves FC performance by up to 2.6 F1 score.
arXiv:2204.02007v1 fatcat:wgf7qvf2jbg4nl3zqu6q2ejcoq
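The sentence-level omission procedure from the abstract above can be sketched as follows: each variant of the evidence drops one sentence, and a model would then be asked whether the remainder is still sufficient. The naive period-based splitting below is an assumption; the paper uses a fluency-preserving method and also omits at the constituent level, neither of which is shown here.

```python
def sentence_omission_variants(evidence):
    # Split evidence into sentences (naive period split, an assumption)
    # and produce one variant per omitted sentence.
    sentences = [s.strip() for s in evidence.split(".") if s.strip()]
    variants = []
    for i in range(len(sentences)):
        kept = [s for j, s in enumerate(sentences) if j != i]
        variants.append(". ".join(kept) + ".")
    return variants

evidence = ("The bridge opened in 1937. It spans the Golden Gate strait. "
            "It is painted orange.")
for v in sentence_omission_variants(evidence):
    print(v)
```

Each printed variant is a candidate input for an Evidence Sufficiency Prediction model.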

Modelling End-of-Session Actions in Educational Systems [article]

Christian Hansen, Casper Hansen, Stephen Alstrup, Christina Lioma
2019 arXiv   pre-print
In this paper we consider the problem of modelling when students end their session in an online mathematics educational system. Being able to model this accurately will help us optimize the way content is presented and consumed. This is done by modelling the probability of an action being the last in a session, which we denote as the End-of-Session probability. We use log data from a system where students can learn mathematics through various kinds of learning materials, as well as multiple types of exercises, such that a student session can consist of many different activities. We model the End-of-Session probability by a deep recurrent neural network in order to utilize the long term temporal aspect, which we experimentally show is central for this task. Using a large scale dataset of more than 70 million student actions, we obtain an AUC of 0.81 on an unseen collection of students. Through a detailed error analysis, we observe that our model is robust across different session structures and across varying session lengths.
arXiv:1909.06856v1 fatcat:ljmabnxnnvhvbjls3ts7hgityu
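The End-of-Session probability described above can be sketched with a minimal recurrent model: a hidden state is updated per student action and a sigmoid head emits the probability that the action ends the session. The one-dimensional state and fixed weights are illustrative assumptions, not the paper's trained deep network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def end_of_session_probs(actions, w_in=0.8, w_rec=0.5, w_out=1.2, bias=-1.0):
    # Minimal 1-dimensional recurrent model: the hidden state summarises
    # the session so far; the sigmoid head gives the End-of-Session
    # probability for each action. All weights here are illustrative.
    h = 0.0
    probs = []
    for a in actions:
        h = math.tanh(w_in * a + w_rec * h)
        probs.append(sigmoid(w_out * h + bias))
    return probs

# Toy session encoding: 1.0 marks an exercise action, 0.0 a reading action.
probs = end_of_session_probs([0.0, 1.0, 1.0, 0.0])
print([round(p, 3) for p in probs])
```

The recurrence is what lets the model exploit the long-term temporal aspect the abstract highlights; a bag-of-actions model would see each step in isolation.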

Adaptive Distributional Extensions to DFR Ranking [article]

Casper Petersen and Jakob Grue Simonsen and Kalervo Jarvelin and Christina Lioma
2016 arXiv   pre-print
Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. Practically this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and be comparable in performance to a query likelihood language model (LM).
arXiv:1609.00969v1 fatcat:cxx2gyha6zcevi6rebtlsbqfeu
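The first step of ADR, detecting the best-fitting distribution of non-informative terms, can be sketched by comparing maximum-likelihood fits. The candidate set (Poisson vs. geometric) follows the abstract; the log-likelihood comparison and the toy per-document counts are assumptions.

```python
import math
from statistics import mean

def poisson_loglik(counts):
    # MLE for Poisson: lambda = sample mean; log-likelihood of the counts.
    lam = mean(counts)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def geometric_loglik(counts):
    # Geometric on {0, 1, 2, ...} with MLE p = 1 / (mean + 1).
    p = 1.0 / (mean(counts) + 1.0)
    return sum(k * math.log(1.0 - p) + math.log(p) for k in counts)

def best_fit(counts):
    # Pick the distribution with the higher log-likelihood; ADR would then
    # adapt the ranking computation to this choice.
    fits = {"poisson": poisson_loglik(counts),
            "geometric": geometric_loglik(counts)}
    return max(fits, key=fits.get)

# Toy per-document frequencies of a non-informative term (overdispersed:
# variance exceeds the mean, which the Poisson cannot capture).
counts = [0, 0, 1, 0, 2, 0, 0, 5, 0, 1]
print(best_fit(counts))
```

Here the geometric wins because the toy counts are overdispersed; on a different collection the Poisson could win, which is exactly the per-dataset adaptivity ADR is named for.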

Ensemble Learned Vaccination Uptake Prediction using Web Search Queries [article]

Niels Dalum Hansen and Christina Lioma and Kåre Mølbak
2016 arXiv   pre-print
We present a method that uses ensemble learning to combine clinical and web-mined time-series data in order to predict future vaccination uptake. The clinical data is official vaccination registries, and the web data is query frequencies collected from Google Trends. Experiments with official vaccine records show that our method predicts vaccination uptake effectively (4.7 Root Mean Squared Error). Whereas performance is best when combining clinical and web data, using solely web data yields comparative performance. To our knowledge, this is the first study to predict vaccination uptake using web data (with and without clinical data).
arXiv:1609.00689v1 fatcat:5r7hzpdlnfaotaax2ecnsnidfi
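The ensemble idea above, combining a clinical-data learner with a web-query learner and scoring with RMSE, can be sketched as a weighted average of the two forecasts. The base predictions, the 0.6 weight, and the toy uptake figures are illustrative assumptions; the paper's 4.7 RMSE comes from its own learners and data.

```python
import math

def rmse(pred, truth):
    # Root Mean Squared Error, the evaluation measure used in the abstract.
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def ensemble(clinical_pred, web_pred, w_clinical=0.6):
    # Weighted average of the two base learners' uptake forecasts;
    # the 0.6 weight is an assumption, not a tuned value from the paper.
    return [w_clinical * c + (1 - w_clinical) * w
            for c, w in zip(clinical_pred, web_pred)]

truth = [90.0, 88.0, 91.0]          # observed uptake (%), toy values
clinical_pred = [92.0, 86.0, 90.0]  # forecast from vaccination registries
web_pred = [89.0, 89.0, 93.0]       # forecast from query frequencies

combined = ensemble(clinical_pred, web_pred)
print(round(rmse(combined, truth), 3))
```

In this toy setup the ensemble's RMSE beats either base learner alone, mirroring the abstract's finding that the combination performs best.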

Evaluation Measures for Relevance and Credibility in Ranked Lists [article]

Christina Lioma and Jakob Grue Simonsen and Birger Larsen
2017 arXiv   pre-print
Recent discussions on alternative facts, fake news, and post truth politics have motivated research on creating technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology is in place for filtering information according to relevance and/or credibility, no single measure currently exists for evaluating the accuracy or precision (and more generally effectiveness) of both the relevance and the credibility of retrieved results. One obvious way of doing so is to measure relevance and credibility effectiveness separately, and then consolidate the two measures into one. There are at least two problems with such an approach: (I) it is not certain that the same criteria are applied to the evaluation of both relevance and credibility (and applying different criteria introduces bias to the evaluation); (II) many more and richer measures exist for assessing relevance effectiveness than for assessing credibility effectiveness (hence risking further bias). Motivated by the above, we present two novel types of evaluation measures that are designed to measure the effectiveness of both relevance and credibility in ranked lists of retrieval results. Experimental evaluation on a small human-annotated dataset (that we make freely available to the research community) shows that our measures are expressive and intuitive in their interpretation.
arXiv:1708.07157v1 fatcat:la33o5i2onalrggedjzhwcllp4
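The alternative the abstract argues for, one measure judging both relevance and credibility under the same criteria, can be sketched with a single combined per-rank gain discounted nDCG-style. The product gain and log discount are assumptions for illustration, not the paper's proposed measures.

```python
import math

def combined_dcg(rel, cred, k=None):
    # Discounted cumulative gain over a single combined per-rank gain
    # (relevance x credibility), so one measure judges both qualities
    # under the same criteria instead of consolidating two separate scores.
    k = k or len(rel)
    return sum((r * c) / math.log2(i + 2)
               for i, (r, c) in enumerate(zip(rel[:k], cred[:k])))

# Toy ranked list: graded relevance and binary credibility per rank.
rel  = [3, 2, 3, 0]
cred = [1, 1, 0, 1]   # the third result is relevant but not credible
print(round(combined_dcg(rel, cred), 3))
```

The non-credible third result contributes nothing, so a ranking that placed a credible relevant document there would score strictly higher.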

A Diagnostic Study of Explainability Techniques for Text Classification [article]

Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
2020 arXiv   pre-print
Pepa Atanasova, Jakob Grue Simonsen, Christina Li- oma, and Isabelle Augenstein. 2020a. Generating Fact Checking Explanations. In ACL, pages 7352- 7364. Association for Computational Linguistics.  ... 
arXiv:2009.13295v1 fatcat:jujt7taijbhgjdosg74ab6rzem

Preliminary Experiments using Subjective Logic for the Polyrepresentation of Information Needs [article]

Christina Lioma and Birger Larsen and Peter Ingwersen
2017 arXiv   pre-print
We extend the work of Lioma et al. (2010), by providing a practical application and analysis of the model.  ...  This validates the model of Lioma et al. (2010) for this dataset and retrieval scenario.  ...  This is exactly the topic we address in this work, based on the model proposed in Lioma et al. (2010) .  ... 
arXiv:1704.01603v1 fatcat:6bkldohitnf4fbvc3ziekxeq4m

Non-Compositional Term Dependence for Information Retrieval [article]

Christina Lioma and Jakob Grue Simonsen and Birger Larsen and Niels Dalum Hansen
2015 arXiv   pre-print
Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. E.g. "red tape" might be overall less frequent than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR.
arXiv:1507.08198v1 fatcat:qav2yroh5vaf7ouu5dnkj7xmr4
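The abstract's point that co-occurrence frequency is separate from dependence strength can be illustrated with pointwise mutual information: a rarer pair can still be more strongly associated than a more frequent one. PMI here stands in for the paper's semantic evidence, and the toy corpus counts are assumptions.

```python
import math

def pmi(pair_count, count_a, count_b, total):
    # Pointwise mutual information of two co-occurring terms:
    # high PMI = the terms co-occur far more often than chance predicts.
    p_ab = pair_count / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_ab / (p_a * p_b))

TOTAL = 1_000_000  # toy corpus size (an assumption)

# "red tape" co-occurs less often than "tape measure" in this toy corpus...
pmi_red_tape = pmi(pair_count=40, count_a=900, count_b=500, total=TOTAL)
pmi_tape_measure = pmi(pair_count=60, count_a=500, count_b=2000, total=TOTAL)

# ...yet its terms form the more strongly dependent pair.
print(round(pmi_red_tape, 2), round(pmi_tape_measure, 2))
```

This is exactly the "red tape" vs. "tape measure" scenario from the abstract: 40 joint occurrences beat 60 once the individual term frequencies are accounted for.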