14 Hits in 1.4 sec

Entity Linking in Queries: Efficiency vs. Effectiveness [chapter]

Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg
2017 Lecture Notes in Computer Science  
Identifying and disambiguating entity references in queries is one of the core enabling components for semantic search. While there is a large body of work on entity linking in documents, entity linking in queries poses new challenges due to the limited context the query provides coupled with the efficiency requirements of an online setting. Our goal is to gain a deeper understanding of how to approach entity linking in queries, with a special focus on how to strike a balance between
more » ... ss and efficiency. We divide the task of entity linking in queries into two main steps: candidate entity ranking and disambiguation, and explore both unsupervised and supervised alternatives for each step. Our main finding is that the best overall performance (in terms of efficiency and effectiveness) can be achieved by employing supervised learning for the entity ranking step, while tackling disambiguation with a simple unsupervised algorithm. Using the Entity Recognition and Disambiguation Challenge platform, we further demonstrate that our recommended method achieves state-of-the-art performance.
doi:10.1007/978-3-319-56608-5_4 fatcat:u3wntwzxtzc4dm5nk34zb6cp6y

Improving the Performance of Pipelined Query Processing with Skipping [chapter]

Simon Jonassen, Svein Erik Bratsberg
2012 Lecture Notes in Computer Science  
Web search engines need to provide high throughput and short query latency. Recent results show that pipelined query processing over a term-wise partitioned inverted index may have superior throughput. However, the query processing latency and scalability with respect to the collection size are the main challenges associated with this method. In this paper, we evaluate the effect of inverted index skipping on the performance of pipelined query processing. Further, we introduce a novel idea of
more » ... sing Max-Score pruning within pipelined query processing and a new term assignment heuristic, partitioning by Max-Score. Our current results indicate a significant improvement over the state-of-the-art approach and lead to several further optimizations, which include dynamic load balancing, intra-query concurrent processing and a hybrid combination between pipelined and non-pipelined execution.
doi:10.1007/978-3-642-35063-4_1 fatcat:uc7ngktrenb2do5ualxmpyftfm

Dynamic optimization of queries in pivot-based indexing

Svein Erik Bratsberg, Magnus Lie Hetland
2010 Multimedia tools and applications  
This paper evaluates the use of standard database indexes and query processing as a way to do metric indexing in the LAESA approach. By utilizing B-trees and R-trees as pivot-based indexes, we may use well-known optimization techniques from the database field within metric indexing and search. The novelty of this paper is that we use a cost-based approach to dynamically evaluate which and how many pivots to use in the evaluation of each query. By a series of measurements using our database
more » ... type we are able to evaluate the performance of this approach. Compared to using all available pivots for filtering, the optimized approach gives half the response times for main memory data, but much more varied results for disk resident data. However, by use of the cost model we are able to dynamically determine when to bypass the indexes and simply perform a sequential scan of the base data. The conclusion of this evaluation is that it is beneficial to create many pivots, but to use only the most selective ones during evaluation of each query. R-trees give better performance than B-trees when utilizing all pivots, but when being able to dynamically select the best pivots, B-trees often provide better performance.
doi:10.1007/s11042-010-0614-z fatcat:xypjserx6zfmpntmiapfd3nnsi

On the Reproducibility of the TAGME Entity Linking System [chapter]

Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg
2016 Lecture Notes in Computer Science  
Reproducibility is a fundamental requirement of scientific research. In this paper, we examine the repeatability, reproducibility, and generalizability of TAGME, one of the most popular entity linking systems. By comparing results obtained from its public API with (re)implementations from scratch, we obtain the following findings. The results reported in the TAGME paper cannot be repeated due to the unavailability of data sources. Part of the results are reproducible through the provided API,
more » ... ile the rest are not reproducible. We further show that the TAGME approach is generalizable to the task of entity linking in queries. Finally, we provide insights gained during this process and formulate lessons learned to inform future reproducibility efforts. The recent SIGIR 2015 workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR) defined these properties as follows: - Repeatability: "Repeating a previous result under the original conditions (e.g., same dataset and system configuration)." - Reproducibility: "Reproducing a previous result under different, but comparable conditions (e.g., different, but comparable dataset)." - Generalizability: "Applying an existing, empirically validated technique to a different IR task/domain than the original." We address each of these aspects in our study, as explained below. Repeatability. Although TAGME facilitates comparison by providing a publicly available API, it is not sufficient for the purpose of repeatability. The main reason is that the API works much like a black box; it is impossible to check whether it corresponds to the system described in [8]. In fact, it is acknowledged that the API deviates from the original publication, but the differences are not documented anywhere. Another limiting factor is that the API cannot be used for efficiency comparisons due to the network overhead. We report on the challenges around repeating the experiments in [8] and discuss why the results are not repeatable. Reproducibility. TAGME has been re-implemented in several research papers, see, e.g., [2, 3, 11]; these, however, do not report on the reproducibility of results. In addition, there are some technical challenges involved in the TAGME approach that have not always been dealt with properly in the original paper and accordingly in these reimplementations (as confirmed by some of the respective authors).
We examine the reproducibility of TAGME, as introduced in [8], and show that some of the results are not reproducible, while others are reproducible only through the TAGME API. Generalizability. We test generalizability by applying TAGME to a different task: entity linking in queries (ELQ). This task has been devised by the Entity Recognition and Disambiguation (ERD) workshop [1], and has been further elaborated on in [11]. The main difference between conventional entity linking and ELQ is that the latter accepts that a query might have multiple interpretations, i.e., the output is not a single annotation, but (possibly multiple) sets of entities that are semantically related to each other. Even though TAGME has been developed for a different problem (where only a single interpretation is returned), we show that it is generalizable to the ELQ task. Before we proceed let us make a disclaimer. In the course of this study, we made a best effort to reproduce the results presented in [8] based on the information available to us: the TAGME papers [8, 9] and the source code kindly provided by the authors. Our main goal with this work is to learn about reproducibility; it is in no way intended to be a criticism of TAGME. The communication with the TAGME authors is summarized in Sect. 6. The resources developed within this paper as well as detailed responses from the TAGME authors (and any possible future updates) are made publicly available at
doi:10.1007/978-3-319-30671-1_32 fatcat:wodkhjek55dqzbxlpwsmxclzgy

Entity Linking in Queries

Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg
2015 Proceedings of the 2015 International Conference on Theory of Information Retrieval - ICTIR '15  
Annotating queries with entities is one of the core problem areas in query understanding. While seeming similar, the task of entity linking in queries is different from entity linking in documents and requires a methodological departure due to the inherent ambiguity of queries. We differentiate between two specific tasks, semantic mapping and interpretation finding, discuss current evaluation methodology, and propose refinements. We examine publicly available datasets for these tasks and
more » ... ce a new manually curated dataset for interpretation finding. To further deepen the understanding of task differences, we present a set of approaches for effectively addressing these tasks and report on experimental results.
doi:10.1145/2808194.2809473 dblp:conf/ictir/HasibiBB15 fatcat:pjmzixgqwzbz7axllnwrahdxbm

Non-hierarchical Structures: How to Model and Index Overlaps? [article]

Faegheh Hasibi, Svein Erik Bratsberg
2016 arXiv   pre-print
Overlap is a common phenomenon seen when structural components of a digital object are neither disjoint nor nested inside each other. Overlapping components resist reduction to a structural hierarchy, and tree-based indexing and query processing techniques cannot be used for them. Our solution to this data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a novel extension of the XML data model for non-hierarchical structures. We introduce an algorithm for constructing TGSA
more » ... from annotated documents; the algorithm can efficiently process non-hierarchical structures and is associated with formal proofs, ensuring that transformation of the document to the data model is valid. To enable high performance query analysis in large data repositories, we further introduce an extension of XML pre-post indexing for non-hierarchical structures, which can process both reachability and overlapping relationships.
arXiv:1408.1011v3 fatcat:2kr5cuw6dfhl3cjh7vysjctrvu

Improving the performance of pipelined query processing with skipping—and its comparison to document-wise partitioning

Simon Jonassen, Svein Erik Bratsberg
2013 World wide web (Bussum)  
Web search engines need to provide high throughput and short query latency. Recent results show that pipelined query processing over a term-wise partitioned inverted index may have superior throughput. However, the query processing latency and scalability with respect to the collection size are the main challenges associated with this method. In this paper, we evaluate the effect of inverted index skipping on the performance of pipelined query processing. Further, we introduce a novel idea of
more » ... sing Max-Score pruning within pipelined query processing and a new term assignment heuristic, partitioning by Max-Score. Our current results indicate a significant improvement over the state-of-the-art approach and lead to several further optimizations, which include dynamic load balancing, intra-query concurrent processing and a hybrid combination between pipelined and non-pipelined execution.
doi:10.1007/s11280-013-0260-2 fatcat:czrimud3ifd2tkm75bkt2u3tby

Dynamic Factual Summaries for Entity Cards

Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg
2017 Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '17  
Entity cards are being used frequently in modern web search engines to offer a concise overview of an entity directly on the results page. These cards are composed of various elements, one of them being the entity summary: a selection of facts describing the entity from an underlying knowledge base. These summaries, while presenting a synopsis of the entity, can also directly address users' information needs. In this paper, we make the first effort towards generating and evaluating such factual
more » ... s. We introduce and address the novel problem of dynamic entity summarization for entity cards, and break it down to two specific subtasks: fact ranking and summary generation. We perform an extensive evaluation of our method using crowdsourcing. Our results show the effectiveness of our fact ranking approach and validate that users prefer dynamic summaries over static ones.
doi:10.1145/3077136.3080810 dblp:conf/sigir/HasibiBB17 fatcat:zs4xxfolfzeajn4g2ul243uevm

A greedy algorithm for finding sets of entity linking interpretations in queries

Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg
2014 Proceedings of the first international workshop on Entity recognition & disambiguation - ERD '14  
We describe our participation in the short text track of the Entity Recognition and Disambiguation (ERD) challenge, where the task is to find all interpretations of entity-related queries and link them to entities in a knowledge base. We approached this task using a multi-stage framework. First, we recognize entity mentions based on known surface forms. Next, we score candidate entities using a learning-to-rank method. Finally, we use a greedy algorithm to find all valid interpretation sets for
more » ... the query. We report on evaluation results using the official ERD challenge platform.
doi:10.1145/2633211.2634356 dblp:conf/sigir/HasibiBB14 fatcat:zceq2korjbgkplvby2g3qkohoe

UBAS Nordisk

Universitetet Bergen, Arkeologiske Skrifter
unpublished
Photo: Svein Skare, Bergen Museum, Universitetet i Bergen. Figure 5. Relief brooch from Sørheim, Sogn. No. 16. Length 10.0 cm. Photo: Svein Skare, Bergen Museum, Universitetet i Bergen.  ...  Photo: Svein Skare, Bergen Museum, Universitetet i Bergen. Figure 3. Relief brooch from Sandal, Sunnfjord. No. 17. Length 18.4 cm. Photo: Svein Skare, Bergen Museum, Universitetet i Bergen.  ... 
fatcat:xconj3cxijgmvmvn7fnakgwzmy

Category classes: Flexible classification and evolution in object-oriented databases [chapter]

Erik Odberg
1994 Lecture Notes in Computer Science  
Acknowledgments Svein Erik Bratsberg and Reidar Conradi are acknowledged for comments and discussions.  ... 
doi:10.1007/3-540-58113-8_186 fatcat:vksnffemf5bepppakal2c4euxa

Efficient query processing in distributed search engines

Simon Jonassen
2012 SIGIR Forum  
Svein Erik Bratsberg and Dr. Øystein Acknowledgment: This work was supported by the iAd Project funded by the Research Council of Norway and the Norwegian University of Science and Technology.  ...  Paper A.III: Improving the Performance of Pipelined Query Processing with Skipping Simon Jonassen and Svein Erik Bratsberg.  ...  Paper B.I: Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries Simon Jonassen and Svein Erik Bratsberg.  ... 
doi:10.1145/2492189.2492201 fatcat:uwasxhngrfgntemkhawyv3te64

Shrinking data balls in metric indexes

Bilegsaikhan Naidan, Magnus Lie Hetland
unpublished
ACKNOWLEDGEMENTS We wish to thank Øystein Torbjørnsen and Svein Erik Bratsberg for helpful discussions.  ... 
fatcat:e3pleaunjndujadynciucyducm

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models [article]

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych
2021 arXiv   pre-print
End-to-End Retrieval in Continuous Space. 3 [19] [21] Faegheh Hasibi, Fedor Nikolaev, Chenyan Xiong, Krisztian Balog, Svein Erik Bratsberg, Alexander Kotov, and Jamie Callan. 2017.  ... 
arXiv:2104.08663v4 fatcat:fow5uqghbzggjclobtv7bpjtaa