The compressed permuterm index

Paolo Ferragina, Rossano Venturini
2010 ACM Transactions on Algorithms  
In this article we propose the Compressed Permuterm Index which solves the Tolerant Retrieval problem in time proportional to the length of the searched pattern, and space close to the kth order empirical  ...  Unfortunately the Permuterm index is space inefficient because it quadruples the dictionary size.  ...  The authors would like to thank the anonymous referees and Gonzalo Navarro for their valuable technical comments and their help in improving the presentation of the article.  ... 
doi:10.1145/1868237.1868248 fatcat:ghfsoazcw5bzdlhmxc5p6wmt6y

Compressed permuterm index

Paolo Ferragina, Rossano Venturini
2007 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07  
In this paper we propose the Compressed Permuterm Index which solves the Tolerant Retrieval problem in optimal query time, i.e. time proportional to the length of the searched pattern, and space close  ...  Unfortunately the Permuterm index is space inefficient because it quadruples the dictionary size.  ...  Our compressed permuterm index avoids the materialization of these two sets by working only on the compressed index built on the string SD.  ... 
doi:10.1145/1277741.1277833 dblp:conf/sigir/FerraginaV07 fatcat:kvjkt2lmhreurba5befqbowtji

Index structures for efficiently searching natural language text

Pirooz Chubak, Davood Rafiei
2010 Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10  
We then present Word Permuterm Index (WPI) which is an adaptation of the permuterm index for natural language text applications and show that this index supports a wide range of wild card queries, is quick  ...  Many existing indexes on text work at the document granularity and are not effective in answering the class of queries where the desired answer is only a term or a phrase.  ...  ACKNOWLEDGEMENTS This research was supported by the Natural Sciences and Engineering Research Council and the BIN network.  ... 
doi:10.1145/1871437.1871527 dblp:conf/cikm/ChubakR10 fatcat:tttoakw3czf4xphh7dsx5j77ei

A practical index for approximate dictionary matching with few mismatches [article]

Aleksander Cisłak, Szymon Grabowski
2016 arXiv   pre-print
We also demonstrate that a basic compression technique consisting in q-gram substitution can significantly reduce the index size (up to 50%  ...  the query time relatively low.  ...  We present a surprisingly simple solution called a split index, which is based on the Dirichlet principle, for matching a keyword with few mismatches, and experimentally show that it offers competitive  ...  Ferragina and Venturini [15] proposed a compressed permuterm index in order to overcome the limitations of the original structure with respect to space.  ... 
arXiv:1501.04948v4 fatcat:xjnt5y4lfvggvo4p6nmcjvoqm4

Compressed String Dictionary Look-Up with Edit Distance One [chapter]

Djamal Belazzougui, Rossano Venturini
2012 Lecture Notes in Computer Science  
In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space.  ...  The space complexity of this solution is bounded in terms of the k-th order entropy of the indexed dictionary.  ...  We notice that the number of distinct lengths and, thus, compressed permuterm indexes is O(√n).  ... 
doi:10.1007/978-3-642-31265-6_23 fatcat:4p7efh656rfkpp4qplghourgee

Compressed String Dictionary Search with Edit Distance One

Djamal Belazzougui, Rossano Venturini
2015 Algorithmica  
In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space.  ...  The space complexity of this solution is bounded in terms of the k-th order entropy of the indexed dictionary.  ...  The compressed permuterm index [22] is a compressed index for dictionaries of strings based on the Burrows-Wheeler Transform (BWT).  ... 
doi:10.1007/s00453-015-9990-0 fatcat:ps65mplaprflljgqgge7wrzfay

Computed tomography in the investigation of dementia

J. R Bradshaw, J L G Thomson, M J Campbell
1983 BMJ (Clinical Research Edition)  
These last three were eventually found by searching the Science Citation Index permuterm subject index in conjunction with the source index based on the permuterm "interscalene."  ...  Other than these, there were no references under this term in the permuterm subject index that were not already found by the citation index search.  ... 
doi:10.1136/bmj.286.6368.891-c fatcat:srcqpdoorzg6flwmslttr7g6ji

Lightweight merging of compressed indices based on BWT variants [article]

Lavinia Egidi, Giovanni Manzini
2019 arXiv   pre-print
We then expand our technique for merging compressed tries and circular/permuterm compressed indices, two compressed data structures for which there were hitherto no known merging algorithms.  ...  In this paper we propose a flexible and lightweight technique for merging compressed indices based on variants of Burrows-Wheeler transform (BWT), thus addressing the need for algorithms that compute compressed  ...  Compressed permuterm indices. Finally, we consider the case in which cbwt_01 is to be used as the core of a compressed permuterm index [11].  ... 
arXiv:1903.01465v1 fatcat:rkymsliqzjcqvjh2b4ynayuj2i

Wheeler graphs: A framework for BWT-based data structures

Travis Gagie, Giovanni Manzini, Jouni Sirén
2017 Theoretical Computer Science  
We show that if the state diagram of a finite-state automaton is a Wheeler graph then, by its path coherence, we can order the nodes such that, for any string, the nodes reachable from the initial state  ...  We then rederive several variations of the BWT by designing straightforward finite-state automata for the relevant problems and showing that their state diagrams are Wheeler graphs.  ...  Instead, the Compressed Permuterm Index is built computing the single-string BWT of the concatenation s_1$s_2···$s_d that does not support naturally the search for circular patterns.  ... 
doi:10.1016/j.tcs.2017.06.016 pmid:29276331 pmcid:PMC5727778 fatcat:lon5o2wmwravdnyifdf3uvuate

Full-text and Keyword Indexes for String Searching [article]

Aleksander Cisłak
2015 arXiv   pre-print
The first contribution is the FM-bloated index, which is a modification of the well-known FM-index (a compressed, full-text index) that trades space for speed.  ...  In the category of keyword indexes we present the so-called split index, which can efficiently solve the k-mismatches problem, especially for 1 error.  ...  Ferragina and Venturini [FV10] proposed a compressed permuterm index in order to overcome the limitations of the original structure with respect to space.  ... 
arXiv:1508.06610v1 fatcat:5pmce2d72veuxpw3s5u6hbidim

A Compact RDF Store Using Suffix Arrays [chapter]

Nieves R. Brisaboa, Ana Cerdeira-Pena, Antonio Fariña, Gonzalo Navarro
2015 Lecture Notes in Computer Science  
Our storage format, RDFCSA, builds on compressed suffix arrays.  ...  On the other hand, supporting efficient SPARQL queries on RDF datasets requires index data structures to accompany the data, which hampers compactness.  ...  However, the permuterm index is built on an FM-index [15], which on large alphabets like our [1, n_s + n_p + n_o] is implemented on a wavelet tree [16].  ... 
doi:10.1007/978-3-319-23826-5_11 fatcat:2ry5npsynrf7bcjrbi3fczhkrq

Compressed Indexes for String Searching in Labeled Graphs

Paolo Ferragina, Francesco Piccinno, Rossano Venturini
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15  
But, as far as we know, all these results are limited to design compressed graph indexes which support basic access operations onto the link structure of the input graph, such as: given a node u, return  ...  This paper takes inspiration from the Facebook Unicorn's platform and proposes some compressed-indexing schemes for large graphs whose nodes are labeled with strings of variable length, i.e., node's attributes  ...  Acknowledgments We wish to warmly thank Domenico Dato and Daniele Vitale (Istella, Tiscali) for exposing us to some of the problems we addressed in this paper and for fruitful discussions.  ... 
doi:10.1145/2736277.2741140 dblp:conf/www/FerraginaPV15 fatcat:asrd4n5mkbevxd6w5576pybw3q

Introduction to information retrieval

2009 ChoiceReviews  
Tomasic and Garcia-Molina (1993) and Jeong and Omiecinski (1995) are key early papers evaluating term partitioning versus document partitioning for distributed indexes.  ...  But the outcome depends on the details of the distributed system; at least one thread of work has reached the opposite conclusion (Ribeiro-Neto and Barbosa 1998, Badue et al. 2001).  ...  Permuterm indexes. Our first special index for general wildcard queries is the permuterm index, a form of inverted index.  ... 
doi:10.5860/choice.46-2715 fatcat:ruwoe46pgzcupjygnwbnit4z3u
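
The classic (uncompressed) permuterm index named in the excerpt above can be sketched in a few lines. This is only an illustrative sketch, not code from any of the listed works: the names `build_permuterm` and `wildcard_lookup` are hypothetical, the sketch assumes that words never contain `$` or `*`, and it handles only patterns with exactly one `*`. Every rotation of `word$` is stored, so a query `X*Y` becomes a prefix search for `Y$X`.

```python
from bisect import bisect_left


def build_permuterm(words):
    """Return a sorted list of (rotation, word) pairs, one per rotation of word + '$'."""
    entries = []
    for w in words:
        t = w + "$"  # '$' marks the end of the word (assumed absent from the words)
        for i in range(len(t)):
            entries.append((t[i:] + t[:i], w))
    entries.sort()
    return entries


def wildcard_lookup(index, pattern):
    """Answer a pattern with exactly one '*' by a prefix search over the rotations."""
    if pattern.count("*") != 1:
        raise ValueError("this sketch supports exactly one '*' per pattern")
    prefix, suffix = pattern.split("*")
    key = suffix + "$" + prefix  # rotate the pattern so the wildcard falls at the end
    start = bisect_left(index, (key, ""))  # first rotation that is >= key
    matches = set()
    for rotation, word in index[start:]:
        if not rotation.startswith(key):
            break  # sorted order: no later rotation can share the prefix
        matches.add(word)
    return sorted(matches)


if __name__ == "__main__":
    idx = build_permuterm(["hello", "help", "halo", "hollow"])
    print(wildcard_lookup(idx, "hel*"))
    print(wildcard_lookup(idx, "h*o"))
```

Storing every rotation is what makes the plain structure space-hungry, which is the limitation the compressed variants listed here set out to address.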

Compression of RDF dictionaries

Miguel A. Martínez-Prieto, Javier D. Fernández, Rodrigo Cánovas
2012 Proceedings of the 27th Annual ACM Symposium on Applied Computing - SAC '12  
This paper focuses on this scenario by adapting compression techniques for string dictionaries to the case of RDF.  ...  We propose a novel technique: Dcomp, which can be tuned to represent the dictionary in compressed space (22−64%) and to perform in a few microseconds (1−50µs).  ...  Brisaboa, Francisco Claude and Gonzalo Navarro by their advices about compressed string dictionaries, and to Claudio Gutierrez by his continued support and his magistral lessons about the Web of Data.  ... 
doi:10.1145/2245276.2245343 dblp:conf/sac/Martinez-PrietoFC12 fatcat:eh2oicovznel5pnvtvte3auy7m

LZ-Compressed String Dictionaries [article]

Julian Arz, Johannes Fischer
2013 arXiv   pre-print
We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text.  ...  We achieve compression ratios often outperforming the existing alternatives, especially on dictionaries containing many repeated substrings. Our query times remain competitive.  ...  Acknowledgments We thank Giuseppe Ottaviano for providing his data sets, and Francisco Claude and Miguel Ángel Martínez for the source codes of their implementations.  ... 
arXiv:1305.0674v1 fatcat:yzsaiuw27nb35ngw3xtc5mqz3i