
Tribrid: Stance Classification with Neural Inconsistency Detection [article]

Song Yang, Jacopo Urbani
2021 arXiv   pre-print
We study the problem of performing automatic stance classification on social media with neural architectures such as BERT. Although these architectures deliver impressive results, their performance is not yet comparable to that of humans and they might produce errors that have a significant impact on the downstream task (e.g., fact-checking). To improve the performance, we present a new neural architecture where the input also includes automatically generated negated perspectives over a given claim. The model is jointly trained to make multiple predictions simultaneously, which can be used either to improve the classification of the original perspective or to filter out doubtful predictions. In the first case, we propose a weakly supervised method for combining the predictions into a final one. In the second case, we show that using the confidence scores to remove doubtful predictions allows our method to achieve human-like performance over the retained information, which is still a sizable part of the original input.
arXiv:2109.06508v1 fatcat:pq446e4w3ncrlbbbnftrs3umma
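
As a concrete illustration of the filtering use described in this abstract, the sketch below keeps only predictions whose confidence score clears a threshold. It is a minimal sketch, not the paper's code: the Prediction type, the label names, and the 0.9 threshold are all hypothetical placeholders.

```python
# Illustrative sketch (not the paper's code): filtering stance predictions
# by confidence. The Prediction type, labels, and threshold are assumptions.
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str         # e.g. "support", "deny", "neutral" (hypothetical)
    confidence: float  # probability assigned to the predicted label

def filter_doubtful(predictions: list[Prediction], threshold: float = 0.9):
    """Keep only predictions whose confidence clears the threshold.

    The retained subset trades coverage for accuracy: the abstract reports
    human-like performance on the (still sizable) retained part.
    """
    retained = [p for p in predictions if p.confidence >= threshold]
    discarded = [p for p in predictions if p.confidence < threshold]
    return retained, discarded

preds = [Prediction("support", 0.97), Prediction("deny", 0.55)]
kept, dropped = filter_doubtful(preds)
print(len(kept), "kept,", len(dropped), "dropped")  # 1 kept, 1 dropped
```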

Adaptive Low-level Storage of Very Large Knowledge Graphs [article]

Jacopo Urbani, Ceriel Jacobs
2020 arXiv   pre-print
For instance, Urbani et al. [88] have shown that a careful choice of the IDs can introduce important speedups due to the improved data locality. ...
arXiv:2001.09078v1 fatcat:thbu2agwj5ae7nwtuarfrbxeai

Checking Chase Termination over Ontologies of Existential Rules with Equality [article]

David Carral, Jacopo Urbani
2019 arXiv   pre-print
rule engine for existential rules (Urbani et al. 2018). ... All the used rule sets are available online. To verify (I), we implemented the "renaming" chase variant presented in Definition 3 in VLog (Urbani, Jacobs, and Krötzsch 2016), which is an efficient ...
arXiv:1911.10981v1 fatcat:x2tspmgp4bhdpj4k4bqalh5fvy

Expressive Stream Reasoning with Laser [article]

Hamid R. Bazoobandi and Harald Beck and Jacopo Urbani
2017 arXiv   pre-print
An increasing number of use cases require a timely extraction of non-trivial knowledge from semantically annotated data streams, especially on the Web and for the Internet of Things (IoT). Often, this extraction requires expressive reasoning, which is challenging to compute on large streams. We propose Laser, a new reasoner that supports a pragmatic, non-trivial fragment of the logic LARS, which extends Answer Set Programming (ASP) for streams. At its core, Laser implements a novel evaluation procedure which annotates formulae to avoid the re-computation of duplicates at multiple time points. This procedure, combined with a judicious implementation of the LARS operators, yields significantly better runtimes than those of other state-of-the-art systems like C-SPARQL and CQELS, or an implementation of LARS which runs on the ASP solver Clingo. This enables the application of expressive logic-based reasoning to large streams and opens the door to a wider range of stream reasoning use cases.
arXiv:1707.08876v2 fatcat:ukghntgzdjh5jkp6er3un6bal4
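
The annotation idea in this abstract can be pictured with a small sketch: a conclusion derived through a time window remains valid for some horizon, so the reasoner can skip re-deriving it at intermediate time points. This is a simplified reading of the abstract, not Laser's actual data structures; the class and its interval bookkeeping are assumptions.

```python
# Minimal sketch of the annotation idea (an assumed simplification, not
# Laser's implementation): a conclusion derived through a window of width w
# at time t is annotated as valid until t + w, so the reasoner skips
# re-deriving it at the time points in between.
class AnnotatedStore:
    def __init__(self):
        self.horizon = {}  # formula -> last time point at which it holds

    def still_valid(self, formula, now):
        return self.horizon.get(formula, -1) >= now

    def derive(self, formula, now, window_width):
        if self.still_valid(formula, now):
            return False  # cached annotation: no recomputation needed
        self.horizon[formula] = now + window_width
        return True  # performed the (expensive) derivation once

store = AnnotatedStore()
print(store.derive("high_temp", now=3, window_width=5))  # True: computed
print(store.derive("high_temp", now=6, window_width=5))  # False: still valid
```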

Hybrid reasoning on OWL RL

Jacopo Urbani, Robert Piro, Frank van Harmelen, Henri Bal
2014 Semantic Web Journal  
... by applying a reasoning algorithm on the input data. ...
doi:10.3233/sw-130120 fatcat:2r4i2m7kqzb67niw7ep4d5l5o4

Scalable and Parallel Reasoning in the Semantic Web [chapter]

Jacopo Urbani
2010 Lecture Notes in Computer Science  
The current state of the art regarding scalable reasoning consists of programs that run on a single machine. When the amount of data is too large, or the logic is too complex, the computational resources of a single machine are not enough. We propose a distributed approach that overcomes these limitations and we sketch a research methodology. A distributed approach is challenging because of the skew in data distribution and the difficulty in partitioning Semantic Web data. We present initial results which are promising and suggest that the approach may be successful.
doi:10.1007/978-3-642-13489-0_49 fatcat:mkhixjupf5ewda45z4amky4cry

Scalable Distributed Reasoning Using MapReduce [chapter]

Jacopo Urbani, Spyros Kotoulas, Eyal Oren, Frank van Harmelen
2009 Lecture Notes in Computer Science  
We address the problem of scalable distributed reasoning, proposing a technique for materialising the closure of an RDF graph based on MapReduce. We have implemented our approach on top of Hadoop and deployed it on a compute cluster of up to 64 commodity machines. We show that a naive implementation on top of MapReduce is straightforward but performs badly, and we present several non-trivial optimisations. Our algorithm is scalable and allows us to compute the RDFS closure of 865M triples from the Web (producing 30B triples) in less than two hours, faster than any other published approach.
doi:10.1007/978-3-642-04930-9_40 fatcat:djqm7scjibfp5cjbebilb5ualu
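
To picture how RDFS materialisation maps onto MapReduce, the toy sketch below implements a single rule (rdfs9: subclass membership propagation) as an in-memory map and reduce phase. It illustrates the general join pattern only; it is not the paper's Hadoop code, and none of its optimisations appear here.

```python
# Toy, in-memory illustration of mapping one RDFS rule onto MapReduce.
# Rule rdfs9: if (s, rdf:type, C) and (C, rdfs:subClassOf, D), then
# (s, rdf:type, D). The map phase keys both triple kinds on the class C
# so the reduce phase can perform the join.
from collections import defaultdict

TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def map_phase(triples):
    for s, p, o in triples:
        if p == TYPE:
            yield o, ("instance", s)    # keyed on the class
        elif p == SUBCLASS:
            yield s, ("superclass", o)  # keyed on the subclass

def reduce_phase(grouped):
    for _cls, values in grouped.items():
        instances = [v for tag, v in values if tag == "instance"]
        supers = [v for tag, v in values if tag == "superclass"]
        for inst in instances:
            for sup in supers:
                yield (inst, TYPE, sup)  # derived triple

triples = [("alice", TYPE, "Student"), ("Student", SUBCLASS, "Person")]
groups = defaultdict(list)  # the framework's shuffle, simulated in memory
for key, value in map_phase(triples):
    groups[key].append(value)
print(list(reduce_phase(groups)))  # [('alice', 'rdf:type', 'Person')]
```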

KOGNAC: Efficient Encoding of Large Knowledge Graphs [article]

Jacopo Urbani, Sourav Dutta, Sairam Gurajada, Gerhard Weikum
2016 arXiv   pre-print
Sampling provides the reference technique for a fast approximation [Urbani et al., 2013]. ... A longer version of this paper, with more details and experiments, is available online at [Urbani et al., 2016]. ...
arXiv:1604.04795v2 fatcat:ejs3c7kg45dqhpxd6lypidvxoi

Scalable RDF data compression with MapReduce

Jacopo Urbani, Jason Maassen, Niels Drost, Frank Seinstra, Henri Bal
2012 Concurrency and Computation  
The Semantic Web contains many billions of statements, which are released using the resource description framework (RDF) data model. To better handle these large amounts of data, high performance RDF applications must apply a compression technique. Unfortunately, because of the large input size, even this compression is challenging. In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress a large amount of RDF data. Our approach uses a dictionary encoding technique that maintains the structure of the data. We highlight the problems of distributed data compression and describe the solutions that we propose. We have implemented a prototype using the Hadoop framework, and evaluate its performance. We show that our approach is able to efficiently compress a large amount of data and scales linearly on both input size and number of nodes.

To make dictionary encoding a feasible technique on a very large input, a distributed implementation is required. To the best of our knowledge, no distributed approach exists to solve this problem. In this paper, we propose a technique to compress and decompress RDF statements using the MapReduce programming model [6]. Our approach uses a dictionary encoding technique that maintains the original structure of the data. This technique can be used by all RDF applications that need to efficiently process a large amount of data, such as RDF storage engines, network analysis tools, and reasoners. Our compression technique was essential in our recent work on Semantic Web inference engines, as it allowed us to reason directly on the compressed statements with a consequent increase of performance. As a result, we were able to reason over tens of billions of statements [7, 8], advancing the current state of the art in the field significantly. The compression technique we present in this paper has the following properties: (i) performance that scales linearly; (ii) the ability to build a very large dictionary of hundreds of millions of entries; and (iii) the ability to handle load balancing issues with sampling and caching.

This paper is structured as follows. In Section 2, we discuss the conventional approach to dictionary encoding and highlight the problems that arise. Sections 3 and 4 describe how we have implemented the data compression and decompression in MapReduce. Section 5 evaluates our approach, and Section 6 describes related work. Finally, we conclude and discuss future work in Section 7.

DICTIONARY ENCODING

Dictionary encoding is often used because of its simplicity. In our case, dictionary encoding also has the additional advantage that the compressed data can still be manipulated by the application. Traditional techniques such as gzip or bzip2 hide the original data so that reading without decompression is impossible. Algorithm 1 shows a sequential algorithm to compress and decompress RDF statements. The compression algorithm starts by initializing the dictionary table. The table has two columns, one that contains the terms in their textual representation and one that contains the corresponding numerical identifiers.
doi:10.1002/cpe.2840 fatcat:ya6dtyxr3ndqpkkakfhill7vbi
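
The sequential core of dictionary encoding described in the text can be illustrated as follows. This is an assumed simplification written for this summary, not the paper's Algorithm 1 nor its distributed MapReduce version.

```python
# Sequential sketch of dictionary encoding for RDF (assumed simplification,
# not the paper's distributed version): each textual term is replaced by a
# numerical ID, and the triple structure is preserved so the encoded data
# can still be joined and reasoned over without decompression.
def compress(triples):
    dictionary, encoded = {}, []
    for triple in triples:
        ids = []
        for term in triple:
            if term not in dictionary:
                dictionary[term] = len(dictionary)  # assign next free ID
            ids.append(dictionary[term])
        encoded.append(tuple(ids))
    return dictionary, encoded

def decompress(dictionary, encoded):
    inverse = {i: t for t, i in dictionary.items()}
    return [tuple(inverse[i] for i in triple) for triple in encoded]

d, enc = compress([("ex:alice", "rdf:type", "ex:Person")])
print(enc)                 # [(0, 1, 2)]
print(decompress(d, enc))  # original triples restored
```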

Streaming the Web: Reasoning Over Dynamic Data

Alessandro Margara, Jacopo Urbani, Frank van Harmelen, Henri Bal
2014 Social Science Research Network  
doi:10.2139/ssrn.3199091 fatcat:5xit5vvie5aztidgyl7hxawb7a

Handling Impossible Derivations During Stream Reasoning [chapter]

Hamid R. Bazoobandi, Henri Bal, Frank van Harmelen, Jacopo Urbani
2020 Lecture Notes in Computer Science  
With the rapid expansion of the Web and the advent of the Internet of Things, there is a growing need to design tools for intelligent analytics and decision making on streams of data. Logic-based frameworks like LARS allow the execution of complex reasoning on such streams, but it is paramount that the computation is completed in a timely manner before the stream expires. To reduce the runtime, we can extend the validity of inferred conclusions to the future to avoid repeated derivations, but this is not enough to avoid all sources of redundant computation. To further alleviate this problem, this paper introduces a new technique that infers the impossibility of certain derivations in the future and blocks the reasoner from performing computation that is doomed to fail anyway. An experimental analysis on microbenchmarks shows that our technique leads to a significant reduction of the reasoning runtime.
doi:10.1007/978-3-030-49461-2_1 fatcat:calthoqtqfbdpmssfizr6pzccu
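
The blocking idea in this abstract can be sketched as bookkeeping that remembers until when a derivation is known to be impossible, so the reasoner skips evaluation attempts that are doomed to fail. The class below is a hypothetical simplification, not the paper's algorithm; how `evaluate` determines the impossibility horizon is deliberately left abstract.

```python
# Hedged sketch (assumed simplification, not the paper's algorithm):
# remember the time point before which a rule's body cannot possibly be
# satisfied, and block the reasoner from re-evaluating it until then.
class BlockingReasoner:
    def __init__(self):
        self.blocked_until = {}  # rule id -> first time point worth retrying

    def try_rule(self, rule_id, now, evaluate):
        if now < self.blocked_until.get(rule_id, 0):
            return None  # derivation known to be impossible: skip the work
        result, impossible_until = evaluate(now)
        if result is None:
            # evaluation failed and reported how long failure is guaranteed
            self.blocked_until[rule_id] = impossible_until
        return result

r = BlockingReasoner()
# First attempt fails and reports that it cannot succeed before time 10.
print(r.try_rule("r1", now=7, evaluate=lambda t: (None, 10)))  # None
# Second attempt at time 8 is skipped entirely: no evaluation happens.
print(r.try_rule("r1", now=8, evaluate=lambda t: (1, 0)))      # None
```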

Adaptive Low-level Storage of Very Large Knowledge Graphs

Jacopo Urbani, Ceriel Jacobs
2020 Proceedings of The Web Conference 2020  
The increasing availability and usage of Knowledge Graphs (KGs) on the Web calls for scalable and general-purpose solutions to store this type of data structures. We propose Trident, a novel storage architecture for very large KGs on centralized systems. Trident uses several interlinked data structures to provide fast access to nodes and edges, with the physical storage changing depending on the topology of the graph to reduce the memory footprint. In contrast to architectures designed for single tasks, our approach offers an interface with few low-level and general-purpose primitives that can be used to implement tasks like SPARQL query answering, reasoning, or graph analytics. Our experiments show that Trident can handle graphs with 10^11 edges using inexpensive hardware, delivering competitive performance on multiple workloads.
doi:10.1145/3366423.3380246 dblp:conf/www/UrbaniJ20 fatcat:pryxyqw2ovhetbgjfa5hmdgq5u
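
To make "an interface with few low-level and general-purpose primitives" concrete, here is a hypothetical sketch of what such an interface could look like. The names and signatures are illustrative guesses for this summary, not Trident's actual API.

```python
# Hypothetical sketch of a small low-level graph-storage interface
# (illustrative only; not Trident's API). Higher-level tasks like SPARQL
# answering or graph analytics would be built by composing such calls.
from abc import ABC, abstractmethod
from typing import Iterator, Optional

class GraphStore(ABC):
    @abstractmethod
    def lookup_id(self, term: str) -> Optional[int]:
        """Map a textual term to its internal numerical node ID."""

    @abstractmethod
    def edges(self, src: Optional[int] = None,
              label: Optional[int] = None,
              dst: Optional[int] = None) -> Iterator[tuple[int, int, int]]:
        """Scan edges matching a partial (src, label, dst) pattern."""

    @abstractmethod
    def degree(self, node: int) -> int:
        """Number of edges incident to a node (useful for analytics)."""
```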

DynamiTE: Parallel Materialization of Dynamic RDF Data [chapter]

Jacopo Urbani, Alessandro Margara, Ceriel Jacobs, Frank van Harmelen, Henri Bal
2013 Lecture Notes in Computer Science  
One of the main advantages of using semantically annotated data is that machines can reason on it, deriving implicit knowledge from explicit information. In this context, materializing every possible implicit derivation from a given input can be computationally expensive, especially when considering large data volumes. Most of the solutions that address this problem rely on the assumption that the information is static, i.e., that it does not change, or changes very infrequently. However, the Web is extremely dynamic: online newspapers, blogs, social networks, etc., are frequently changed so that outdated information is removed and replaced with fresh data. This demands a materialization that is not only scalable, but also reactive to changes. In this paper, we consider the problem of incremental materialization, that is, how to update the materialized derivations when new data is added or removed. To this purpose, we consider the ρdf RDFS fragment [11], and present a parallel system that implements a number of algorithms to quickly recalculate the derivation. In case new data is added, our system uses a parallel version of the well-known semi-naive evaluation of Datalog. In case of removals, we have implemented two algorithms, one based on previous theoretical work, and another one that is more efficient since it does not require a complete scan of the input for every update. We have evaluated the performance using a prototype system called DynamiTE, which organizes the knowledge bases with a number of indices to facilitate the query process and exploits parallelism to improve the performance. The results show that our methods are indeed capable of recalculating the derivation in a short time, opening the door to reasoning on much more dynamic data than is currently possible.
doi:10.1007/978-3-642-41335-3_41 fatcat:ns2v6wqukbfcbbgsw36fl7ifpe
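
The semi-naive evaluation mentioned in this abstract avoids recomputing derivations by joining only the newest facts against the full set at each round. Below is a compact sequential sketch for transitive closure; the paper's version is parallel and covers the full ρdf fragment, and the function name and relation here are illustrative.

```python
# Sequential sketch of classic semi-naive evaluation (the paper uses a
# parallel variant): each round joins only the facts derived in the
# previous round (the "delta") against the full set, so no derivation
# is ever recomputed from scratch.
def seminaive_transitive(edges):
    """Compute the transitive closure of a binary relation."""
    total = set(edges)
    delta = set(edges)
    while delta:
        new = {(a, c)
               for (a, b) in delta           # only fresh facts on the left
               for (b2, c) in total if b == b2}
        delta = new - total                  # keep only genuinely new facts
        total |= delta
    return total

print(sorted(seminaive_transitive({("a", "b"), ("b", "c"), ("c", "d")})))
# [('a','b'), ('a','c'), ('a','d'), ('b','c'), ('b','d'), ('c','d')]
```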

Extracting Novel Facts from Tables for Knowledge Graph Completion [chapter]

Benno Kruit, Peter Boncz, Jacopo Urbani
2019 Lecture Notes in Computer Science  
We propose a new end-to-end method for extending a Knowledge Graph (KG) from tables. Existing techniques tend to interpret tables by focusing on information that is already in the KG, and therefore tend to extract many redundant facts. Our method aims to find more novel facts. We introduce a new technique for table interpretation based on a scalable graphical model using entity similarities. Our method further disambiguates cell values using KG embeddings as an additional ranking method. Other distinctive features are the lack of assumptions about the underlying KG and the enabling of fine-grained tuning of the precision/recall trade-off of extracted facts. Our experiments show that our approach has a higher recall during the interpretation process than the state of the art, and is more resistant to the bias observed in extracting mostly redundant facts, since it produces more novel extractions.
doi:10.1007/978-3-030-30793-6_21 fatcat:ocfjoeu6nzfrnplfy7ibddhuv4

Streaming the Web: Reasoning over dynamic data

Alessandro Margara, Jacopo Urbani, Frank van Harmelen, Henri Bal
2014 Journal of Web Semantics  
doi:10.1016/j.websem.2014.02.001 fatcat:lxuiqe2s2vetdoais5maohwp7q
Showing results 1-15 of 583.