Generating Grammars from lemon lexica for Questions Answering over Linked Data: a Preliminary Analysis

Viktoria Benz, Mohammad Fazleh Elahi, Basil Ell, Philipp Cimiano
2020 Zenodo  
Most approaches to question answering over linked data (QALD) frame the task as a machine learning problem: a model is parametrized from training data, given as pairs of natural language (NL) questions and SPARQL queries, so that it learns a mapping from NL questions to SPARQL queries. In this preliminary work we present an alternative to machine-learning-based development of a QA system, one that relies on the automatic generation of a QA grammar from a lemon ... This model-based approach comes with a number of advantages compared to a machine learning approach. First, it gives the developer of the system maximum control over the QA interface, as every entry added to the lexicon increases the coverage of the grammar, and thus of the QA system, in a predictable way. This is in contrast to machine learning approaches, where the impact of adding a single training example is difficult to predict. A further advantage is that the QA system operates on the basis of a symbolic grammar that can be used to provide guidance and auto-completion functionality to users. Indeed, our system is intended to be used with an auto-completion interface that allows users to ask only questions that the grammar can cover. We present very preliminary results showing that a large percentage of the questions in the QALD-7 training set can be rephrased as questions that our grammar can parse. We show that, with a hand-crafted lexicon, we can in principle reach a micro-F1 score of 62.5% on the QALD-7 training data when questions are manually rephrased to fit our grammar. Although these preliminary results do not constitute a proper evaluation of our approach, they suggest that the proposed approach is feasible.
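The idea that each lexicon entry extends grammar coverage in a predictable way, and that the resulting grammar can drive auto-completion, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LexicalEntry` class, the single question template, the helper names `generate_rules` and `complete`, and the example entity and property identifiers are hypothetical and are not taken from the authors' system or from the lemon model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical stand-in for a lexicon entry: a noun tied to a property."""
    written_form: str   # surface form, e.g. "capital"
    property_uri: str   # prefixed property name, e.g. "dbo:capital"


def generate_rules(lexicon, entity_labels):
    """Expand each entry into (question -> SPARQL) pairs, one per known entity.

    Uses a single illustrative template; a real grammar would have many.
    """
    rules = {}
    for entry in lexicon:
        for label, uri in entity_labels.items():
            question = f"What is the {entry.written_form} of {label}?"
            sparql = f"SELECT ?x WHERE {{ {uri} {entry.property_uri} ?x }}"
            rules[question] = sparql
    return rules


def complete(prefix, rules):
    """Return the covered questions that start with the typed prefix."""
    return sorted(q for q in rules if q.lower().startswith(prefix.lower()))


# Illustrative data only (assumed, not from the source).
entities = {"Germany": "dbr:Germany", "France": "dbr:France"}
lexicon = [LexicalEntry("capital", "dbo:capital")]
rules = generate_rules(lexicon, entities)

# Adding one entry extends coverage by exactly one question per entity.
lexicon.append(LexicalEntry("population", "dbo:populationTotal"))
extended = generate_rules(lexicon, entities)

suggestions = complete("what is the cap", extended)
```

In this toy setting the effect of a new entry is fully determined by the template and the entity list, which is the kind of predictability the abstract contrasts with adding a single training example to a learned model. The `complete` helper only ever proposes questions that the generated rules can answer.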
doi:10.5281/zenodo.6641630 fatcat:27n5ieql5zfndciv44wzyepzgq