LIDIOMS: A Multilingual Linked Idioms Data Set [article]

Diego Moussallem, Mohamed Ahmed Sherif, Diego Esteves, Marcos Zampieri, Axel-Cyrille Ngonga Ngomo
2018 arXiv   pre-print
In this paper, we describe the LIDIOMS data set, a multilingual RDF representation of idioms currently containing five languages: English, German, Italian, Portuguese, and Russian. The data set is intended to support natural language processing applications by providing links between idioms across languages. The underlying data was crawled and integrated from various sources. To ensure the quality of the crawled data, all idioms were evaluated by at least two native speakers. Herein, we present
... the model devised for structuring the data. We also provide the details of linking LIDIOMS to well-known multilingual data sets such as BabelNet. The resulting data set complies with best practices according to the Linguistic Linked Open Data Community.
arXiv:1802.08148v1 fatcat:dockltoiuvai3ao4rqycahy26u

Entity Linking in 40 Languages using MAG [article]

Diego Moussallem, Ricardo Usbeck, Michael Röder, Axel-Cyrille Ngonga Ngomo
2018 arXiv   pre-print
A plethora of Entity Linking (EL) approaches has recently been developed. While many claim to be multilingual, the MAG (Multilingual AGDISTIS) approach has been shown recently to outperform the state of the art in multilingual EL on 7 languages. With this demo, we extend MAG to support EL in 40 different languages, including especially low-resource languages such as Ukrainian, Greek, Hungarian, Croatian, Portuguese, Japanese and Korean. Our demo relies on online web services which allow for an
... easy access to our entity linking approaches and can disambiguate against DBpedia and Wikidata. During the demo, we will show how to use MAG by means of POST requests as well as using its user-friendly web interface. All data used in the demo is available at https://hobbitdata.informatik.uni-leipzig.de/agdistis/
arXiv:1805.11467v1 fatcat:o3uaeub5r5hpzowrv4lejxvdbe

Expeditious Generation of Knowledge Graph Embeddings [article]

Tommaso Soru, Stefano Ruberto, Diego Moussallem, André Valdestilhas, Alexander Bigerl, Edgard Marx, Diego Esteves
2018 arXiv   pre-print
Knowledge Graph Embedding methods aim at representing entities and relations in a knowledge base as points or vectors in a continuous vector space. Several approaches using embeddings have shown promising results on tasks such as link prediction, entity recommendation, question answering, and triplet classification. However, only a few methods can compute low-dimensional embeddings of very large knowledge bases without needing state-of-the-art computational resources. In this paper, we propose
... KG2Vec, a simple and fast approach to Knowledge Graph Embedding based on the skip-gram model. Instead of using a predefined scoring function, we learn it relying on Long Short-Term Memories. We show that our embeddings achieve results comparable with the most scalable approaches on knowledge graph completion as well as on a new metric. Yet, KG2Vec can embed large graphs in less time by processing more than 250 million triples in less than 7 hours on common hardware.
arXiv:1803.07828v2 fatcat:u5gumfjwlfckjjrn7vzidy73py

MEX vocabulary

Diego Esteves, Diego Moussallem, Ciro Baron Neto, Tommaso Soru, Ricardo Usbeck, Markus Ackermann, Jens Lehmann
2015 Proceedings of the 11th International Conference on Semantic Systems - SEMANTICS '15  
Over the last decades many machine learning experiments have been published, giving benefit to the scientific progress. In order to compare machine-learning experiment results with each other and collaborate positively, they need to be performed thoroughly on the same computing environment, using the same sample datasets and algorithm configurations. Besides this, practical experience shows that scientists and engineers tend to have large output data in their experiments, which is both
... to analyze and archive properly without provenance metadata. However, the Linked Data community still lacks a lightweight specification for interchanging machine-learning metadata over different architectures to achieve a higher level of interoperability. In this paper, we address this gap by presenting a novel vocabulary dubbed MEX. We show that MEX provides a prompt method to describe experiments with a special focus on data provenance and fulfills the requirements for long-term maintenance.
doi:10.1145/2814864.2814883 dblp:conf/i-semantics/EstevesMNSUAL15 fatcat:seq7mmhrqnhn5o4lkzf527bjdq

Neural Machine Translation for Query Construction and Composition [article]

Tommaso Soru, Edgard Marx, André Valdestilhas, Diego Esteves, Diego Moussallem, Gustavo Publio
2018 arXiv   pre-print
Research on question answering with knowledge base has recently seen an increasing use of deep architectures. In this extended abstract, we study the application of the neural machine translation paradigm for question parsing. We employ a sequence-to-sequence model to learn graph patterns in the SPARQL graph query language and their compositions. Instead of inducing the programs through question-answer pairs, we expect a semi-supervised approach, where alignments between questions and queries
... are built through templates. We argue that the coverage of language utterances can be expanded using late notable works in natural language generation.
arXiv:1806.10478v2 fatcat:46cxe5h5lba3hai66ynbg5p4o4

Augmenting Neural Machine Translation with Knowledge Graphs [article]

Diego Moussallem and Mihael Arčan and Axel-Cyrille Ngonga Ngomo and Paul Buitelaar
2019 arXiv   pre-print
Experimental Setup In our experiments, we used the multilingual EL system introduced by Moussallem et al. (2017) which is language and KB agnostic. ... According to a recent survey (Moussallem et al., 2018), the idea of using a structured KB in MT systems started with the work of Knight and Luk (1994). ...
arXiv:1902.08816v1 fatcat:xj75zls7lvap5eoydodrms4z3m

A Holistic Natural Language Generation Framework for the Semantic Web [article]

Axel-Cyrille Ngonga Ngomo and Diego Moussallem and Lorenz Bühmann
2019 arXiv   pre-print
With the ever-growing generation of data for the Semantic Web comes an increasing demand for this data to be made available to non-semantic Web experts. One way of achieving this goal is to translate the languages of the Semantic Web into natural language. We present LD2NL, a framework for verbalizing the three key languages of the Semantic Web, i.e., RDF, OWL, and SPARQL. Our framework is based on a bottom-up approach to verbalization. We evaluated LD2NL in an open survey with 86 persons. Our
... results suggest that our framework can generate verbalizations that are close to natural languages and that can be easily understood by non-experts. Therewith, it enables non-domain experts to interpret Semantic Web data with more than 91% of the accuracy of domain experts.
arXiv:1911.01248v1 fatcat:jub6inckvjdlphgswtdncdquu4

Convolutional Hypercomplex Embeddings for Link Prediction [article]

Caglar Demir, Diego Moussallem, Stefan Heindorf, Axel-Cyrille Ngonga Ngomo
2021 arXiv   pre-print
Knowledge graph embedding research has mainly focused on the two smallest normed division algebras, ℝ and ℂ. Recent results suggest that trilinear products of quaternion-valued embeddings can be a more effective means to tackle link prediction. In addition, models based on convolutions on real-valued embeddings often yield state-of-the-art results for link prediction. In this paper, we investigate a composition of convolution operations with hypercomplex multiplications. We propose the four
... approaches QMult, OMult, ConvQ and ConvO to tackle the link prediction problem. QMult and OMult can be considered as quaternion and octonion extensions of previous state-of-the-art approaches, including DistMult and ComplEx. ConvQ and ConvO build upon QMult and OMult by including convolution operations in a way inspired by the residual learning framework. We evaluated our approaches on seven link prediction datasets including WN18RR, FB15K-237 and YAGO3-10. Experimental results suggest that the benefits of learning hypercomplex-valued vector representations become more apparent as the size and complexity of the knowledge graph grows. ConvO outperforms state-of-the-art approaches on FB15K-237 in MRR, Hit@1 and Hit@3, while QMult, OMult, ConvQ and ConvO outperform state-of-the-art approaches on YAGO3-10 in all metrics. Results also suggest that link prediction performances can be further improved via prediction averaging. To foster reproducible research, we provide an open-source implementation of approaches, including training and evaluation scripts as well as pretrained models.
arXiv:2106.15230v2 fatcat:phiv2jebo5h2zb7uflt4r6l5lu

NeuralREG: An end-to-end approach to referring expression generation [article]

Thiago Castro Ferreira, Diego Moussallem, Ákos Kádár, Sander Wubben, Emiel Krahmer
2018 arXiv   pre-print
Besides the REG task, these data can be useful for many other tasks related to, for instance, the NLG process (Reiter and Dale, 2000; Gatt and Krahmer, 2018) and Wikification (Moussallem et al., 2017  ... 
arXiv:1805.08093v1 fatcat:rwzvfgjtnzdjld5rlfr6yejne4

Where is Linked Data in Question Answering over Linked Data? [article]

Tommaso Soru, Edgard Marx, André Valdestilhas, Diego Moussallem, Gustavo Publio, Muhammad Saleem
2020 arXiv   pre-print
We argue that "Question Answering with Knowledge Base" and "Question Answering over Linked Data" are currently two instances of the same problem, even though one explicitly declares that it deals with Linked Data. We point out the lack of existing methods to evaluate question answering on datasets which exploit external links to the rest of the cloud or share common schema. To this end, we propose the creation of new evaluation settings to leverage the advantages of the Semantic Web to achieve AI-complete question answering.
arXiv:2005.03640v1 fatcat:i7gxlwcrlbadxetjdjkmtzaltq

Enriching the WebNLG corpus

Thiago Castro Ferreira, Diego Moussallem, Emiel Krahmer, Sander Wubben
2018 Proceedings of the 11th International Conference on Natural Language Generation  
This paper describes the enrichment of the WebNLG corpus (Gardent et al., 2017a,b), with the aim to further extend its usefulness as a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation. We also produce a silver-standard German translation of the corpus to enable the exploitation of NLG approaches to other languages than English.
doi:10.18653/v1/w18-6521 dblp:conf/inlg/FerreiraMKW18 fatcat:ltryhdpytnbjbfsa7mvlksdm2e

RDF2PT: Generating Brazilian Portuguese Texts from RDF Data [article]

Diego Moussallem, Thiago Castro Ferreira, Marcos Zampieri, Maria Claudia Cavalcanti, Geraldo Xexéo, Mariana Neves, Axel-Cyrille Ngonga Ngomo
2018 arXiv   pre-print
The generation of natural language from Resource Description Framework (RDF) data has recently gained significant attention due to the continuous growth of Linked Data. A number of these approaches generate natural language in languages other than English, however, no work has been proposed to generate Brazilian Portuguese texts out of RDF. We address this research gap by presenting RDF2PT, an approach that verbalizes RDF data to Brazilian Portuguese language. We evaluated RDF2PT in an open
... questionnaire with 44 native speakers divided into experts and non-experts. Our results suggest that RDF2PT is able to generate text which is similar to that generated by humans and can hence be easily understood.
arXiv:1802.08150v1 fatcat:4n7ppcb7rfbgzpjj5itqdcvfwm

NABU - Multilingual Graph-based Neural RDF Verbalizer [article]

Diego Moussallem and Dwaraknath Gnaneshwar and Thiago Castro Ferreira and Axel-Cyrille Ngonga Ngomo
2020 arXiv   pre-print
The RDF-to-text task has recently gained substantial attention due to continuous growth of Linked Data. In contrast to traditional pipeline models, recent studies have focused on neural models, which are now able to convert a set of RDF triples into text in an end-to-end style with promising results. However, English is the only language widely targeted. We address this research gap by presenting NABU, a multilingual graph-based neural model that verbalizes RDF data to German, Russian, and
... English. NABU is based on an encoder-decoder architecture, uses an encoder inspired by Graph Attention Networks and a Transformer as decoder. Our approach relies on the fact that knowledge graphs are language-agnostic and they hence can be used to generate multilingual text. We evaluate NABU in monolingual and multilingual settings on standard benchmarking WebNLG datasets. Our results show that NABU outperforms state-of-the-art approaches on English with 66.21 BLEU, and achieves consistent results across all languages on the multilingual scenario with 56.04 BLEU.
arXiv:2009.07728v2 fatcat:a3aqjv6ubrf4hhdpl4tlfx5t2e

Offline Question Answering over Linked Data using Limited Resources

Paramjot Kaur, Vincent Blücher, Rricha Jalota, Diego Moussallem, Axel-Cyrille Ngonga Ngomo, Ricardo Usbeck
2019 International Conference on Semantic Systems  
Question Answering over Linked Data provides concise information to the user from a natural language request instead of flooding them with documents. However, the accessibility of Linked Data resources, e.g., SPARQL endpoints, is bound to an online connection. We present OQA, the first offline Question Answering system over Linked Data for mobile devices. We built OQA with the limited resources of an Android mobile device, such as battery power, computational power, or memory consumption in
... . Our OQA system has three main components: 1) question analysis and 2) query generation which identify the type of the question and reform it into a semantically meaningful data structure, i.e., a SPARQL query. Finally, the 3) query execution uses a novel mobile triple store, implemented with RDF4J. Our evaluation suggests that OQA is feasible for daily use in terms of battery consumption and able to answer domain-specific questions with up to 72% accuracy.
dblp:conf/i-semantics/KaurBJMNU19 fatcat:klpqs7mfebez3o5npkz427uaw4

The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task Overview and Evaluation Results (WebNLG+ 2020)

Thiago Castro Ferreira, Claire Gardent, Nikolai Ilinykh, Chris Van Der Lee, Simon Mille, Diego Moussallem, Anastasia Shimorina
2020 Zenodo  
WebNLG+ offers two challenges: (i) mapping sets of RDF triples to English or Russian text (generation) and (ii) converting English or Russian text to sets of RDF triples (semantic parsing). Compared to the eponymous WebNLG challenge, WebNLG+ provides an extended dataset that enables the training, evaluation, and comparison of microplanners and semantic parsers. In this paper, we present the results of the generation and semantic parsing task for both English and Russian and provide a brief description of the participating systems.
doi:10.5281/zenodo.6552784 fatcat:2csgl6vfhndbdhg2rafngnephy