The compressed permuterm index
2010
ACM Transactions on Algorithms
In this article we propose the Compressed Permuterm Index which solves the Tolerant Retrieval problem in time proportional to the length of the searched pattern, and space close to the kth order empirical ...
Unfortunately the Permuterm index is space inefficient because it quadruples the dictionary size. ...
The authors would like to thank the anonymous referees and Gonzalo Navarro for their valuable technical comments and their help in improving the presentation of the article. ...
doi:10.1145/1868237.1868248
fatcat:ghfsoazcw5bzdlhmxc5p6wmt6y
Compressed permuterm index
2007
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07
In this paper we propose the Compressed Permuterm Index which solves the Tolerant Retrieval problem in optimal query time, i.e. time proportional to the length of the searched pattern, and space close ...
Unfortunately the Permuterm index is space inefficient because it quadruples the dictionary size. ...
Our compressed permuterm index allows to avoid the materialization of these two sets by working only on the compressed index built on the string SD. ...
doi:10.1145/1277741.1277833
dblp:conf/sigir/FerraginaV07
fatcat:kvjkt2lmhreurba5befqbowtji
Index structures for efficiently searching natural language text
2010
Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10
We then present Word Permuterm Index (WPI) which is an adaptation of the permuterm index for natural language text applications and show that this index supports a wide range of wild card queries, is quick ...
Many existing indexes on text work at the document granularity and are not effective in answering the class of queries where the desired answer is only a term or a phrase. ...
ACKNOWLEDGEMENTS This research was supported by the Natural Sciences and Engineering Research Council and the BIN network. ...
doi:10.1145/1871437.1871527
dblp:conf/cikm/ChubakR10
fatcat:tttoakw3czf4xphh7dsx5j77ei
A practical index for approximate dictionary matching with few mismatches
[article]
2016
arXiv
pre-print
We also demonstrate that a basic compression technique consisting in q-gram substitution can significantly reduce the index size (up to 50%) while keeping the query time relatively low. ...
We present a surprisingly simple solution called a split index, which is based on the Dirichlet principle, for matching a keyword with few mismatches, and experimentally show that it offers competitive ...
Ferragina and Venturini [15] proposed a compressed permuterm index in order to overcome the limitations of the original structure with respect to space. ...
arXiv:1501.04948v4
fatcat:xjnt5y4lfvggvo4p6nmcjvoqm4
Compressed String Dictionary Look-Up with Edit Distance One
[chapter]
2012
Lecture Notes in Computer Science
In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space. ...
The space complexity of this solution is bounded in terms of the k-th order entropy of the indexed dictionary. ...
We notice that the number of distinct lengths and, thus, compressed permuterm indexes is O(√n). ...
doi:10.1007/978-3-642-31265-6_23
fatcat:4p7efh656rfkpp4qplghourgee
Compressed String Dictionary Search with Edit Distance One
2015
Algorithmica
In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space. ...
The space complexity of this solution is bounded in terms of the k-th order entropy of the indexed dictionary. ...
The compressed permuterm index [22] is a compressed index for dictionaries of strings based on the Burrows-Wheeler Transform (Bwt). ...
doi:10.1007/s00453-015-9990-0
fatcat:ps65mplaprflljgqgge7wrzfay
Computed tomography in the investigation of dementia
1983
BMJ (Clinical Research Edition)
These last three were eventually found by searching the Science Citation Index permuterm subject index in conjunction with the source index based on the permuterm "interscalene." ...
Other than these, there were no references under this term in the permuterm subject index that were not already found by the citation index search. ...
doi:10.1136/bmj.286.6368.891-c
fatcat:srcqpdoorzg6flwmslttr7g6ji
Lightweight merging of compressed indices based on BWT variants
[article]
2019
arXiv
pre-print
We then expand our technique for merging compressed tries and circular/permuterm compressed indices, two compressed data structures for which there were hitherto no known merging algorithms. ...
In this paper we propose a flexible and lightweight technique for merging compressed indices based on variants of Burrows-Wheeler transform (BWT), thus addressing the need for algorithms that compute compressed ...
Compressed permuterm indices. Finally, we consider the case in which cbwt_01 is to be used as the core of a compressed permuterm index [11]. ...
arXiv:1903.01465v1
fatcat:rkymsliqzjcqvjh2b4ynayuj2i
Wheeler graphs: A framework for BWT-based data structures
2017
Theoretical Computer Science
We show that if the state diagram of a finite-state automaton is a Wheeler graph then, by its path coherence, we can order the nodes such that, for any string, the nodes reachable from the initial state ...
We then rederive several variations of the BWT by designing straightforward finite-state automata for the relevant problems and showing that their state diagrams are Wheeler graphs. ...
Instead, the Compressed Permuterm Index is built computing the single-string BWT of the concatenation s_1$s_2···$s_d that does not support naturally the search for circular patterns. ...
doi:10.1016/j.tcs.2017.06.016
pmid:29276331
pmcid:PMC5727778
fatcat:lon5o2wmwravdnyifdf3uvuate
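The excerpt above mentions taking the single-string BWT of a dictionary concatenated with `$` separators. A minimal sketch of that idea follows, assuming a naive sorted-rotations construction and a hypothetical `\x00` end-of-text sentinel; the function names are illustrative and this is not the construction used in any of the listed papers, which rely on compressed data structures.

```python
def concat_dictionary(strings, separator="$"):
    """Join dictionary strings with a separator (assumed absent from the strings)."""
    return separator.join(strings)


def naive_bwt(text, sentinel="\x00"):
    """Burrows-Wheeler transform via sorted rotations.

    Quadratic space and time, for illustration only. The sentinel is an
    assumption of this sketch: it must not occur in the text and it sorts
    before every other character.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    t = text + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    concatenation = concat_dictionary(["hello", "help", "shell"])
    print(repr(naive_bwt(concatenation)))
```

The output is a permutation of the concatenation plus the sentinel; practical indexes build it with suffix sorting rather than materializing every rotation.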
Full-text and Keyword Indexes for String Searching
[article]
2015
arXiv
pre-print
The first contribution is the FM-bloated index, which is a modification of the well-known FM-index (a compressed, full-text index) that trades space for speed. ...
In the category of keyword indexes we present the so-called split index, which can efficiently solve the k-mismatches problem, especially for 1 error. ...
Ferragina and Venturini [FV10] proposed a compressed permuterm index in order to overcome the limitations of the original structure with respect to space. ...
arXiv:1508.06610v1
fatcat:5pmce2d72veuxpw3s5u6hbidim
A Compact RDF Store Using Suffix Arrays
[chapter]
2015
Lecture Notes in Computer Science
Our storage format, RDFCSA, builds on compressed suffix arrays. ...
On the other hand, supporting efficient SPARQL queries on RDF datasets requires index data structures to accompany the data, which hampers compactness. ...
However, the permuterm index is built on an FM-index [15], which on large alphabets like our [1, n_s + n_p + n_o] is implemented on a wavelet tree [16]. ...
doi:10.1007/978-3-319-23826-5_11
fatcat:2ry5npsynrf7bcjrbi3fczhkrq
Compressed Indexes for String Searching in Labeled Graphs
2015
Proceedings of the 24th International Conference on World Wide Web - WWW '15
But, as far as we know, all these results are limited to design compressed graph indexes which support basic access operations onto the link structure of the input graph, such as: given a node u, return ...
This paper takes inspiration from the Facebook Unicorn's platform and proposes some compressed-indexing schemes for large graphs whose nodes are labeled with strings of variable length, i.e., node's attributes ...
Acknowledgments We wish to warmly thank Domenico Dato and Daniele Vitale (Istella, Tiscali) for exposing us to some of the problems we addressed in this paper and for fruitful discussions. ...
doi:10.1145/2736277.2741140
dblp:conf/www/FerraginaPV15
fatcat:asrd4n5mkbevxd6w5576pybw3q
Introduction to information retrieval
2009
ChoiceReviews
Tomasic and Garcia-Molina (1993) and Jeong and Omiecinski (1995) are key early papers evaluating term partitioning versus document partitioning for distributed indexes. ...
But the outcome depends on the details of the distributed system; at least one thread of work has reached the opposite conclusion (Ribeiro-Neto and Barbosa 1998, Badue et al. 2001). ...
Permuterm indexes. Our first special index for general wildcard queries is the permuterm index, a form of inverted index. ...
doi:10.5860/choice.46-2715
fatcat:ruwoe46pgzcupjygnwbnit4z3u
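The permuterm index named in the excerpt above can be illustrated with a small uncompressed sketch. It assumes the textbook formulation: every rotation of each term followed by an end marker `$` is stored, and a query `X*Y` is rotated to `Y$X` and answered as a prefix search. The function names and the sorted-list representation are choices made for this sketch, not taken from any listed work.

```python
from bisect import bisect_left


def build_permuterm(words, end="$"):
    """Return a sorted list of (rotation, word) pairs over all rotations of word + end."""
    entries = []
    for word in words:
        t = word + end
        for i in range(len(t)):
            entries.append((t[i:] + t[:i], word))
    entries.sort()
    return entries


def wildcard_query(entries, pattern, end="$"):
    """Answer a pattern containing exactly one '*' by a prefix search on rotations."""
    if pattern.count("*") != 1:
        raise ValueError("this sketch supports exactly one '*'")
    head, tail = pattern.split("*")
    key = tail + end + head          # X*Y is rotated to Y$X
    i = bisect_left(entries, (key,)) # first rotation >= key
    matches = set()
    while i < len(entries) and entries[i][0].startswith(key):
        matches.add(entries[i][1])
        i += 1
    return sorted(matches)


if __name__ == "__main__":
    index = build_permuterm(["hello", "help", "shell", "held"])
    print(wildcard_query(index, "hel*"))
    print(wildcard_query(index, "*ll"))
```

Storing every rotation explicitly is what inflates the space of the plain permuterm index; the compressed variants listed above are aimed at avoiding that materialization.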
Compression of RDF dictionaries
2012
Proceedings of the 27th Annual ACM Symposium on Applied Computing - SAC '12
This paper focuses on this scenario by adapting compression techniques for string dictionaries to the case of RDF. ...
We propose a novel technique: Dcomp, which can be tuned to represent the dictionary in compressed space (22−64%) and to perform in a few microseconds (1−50µs). ...
Brisaboa, Francisco Claude and Gonzalo Navarro by their advices about compressed string dictionaries, and to Claudio Gutierrez by his continued support and his magistral lessons about the Web of Data. ...
doi:10.1145/2245276.2245343
dblp:conf/sac/Martinez-PrietoFC12
fatcat:eh2oicovznel5pnvtvte3auy7m
LZ-Compressed String Dictionaries
[article]
2013
arXiv
pre-print
We show how to compress string dictionaries using the Lempel-Ziv (LZ78) data compression algorithm. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. ...
We achieve compression ratios often outperforming the existing alternatives, especially on dictionaries containing many repeated substrings. Our query times remain competitive. ...
Acknowledgments We thank Giuseppe Ottaviano for providing his data sets, and Francisco Claude and Miguel Ángel Martínez for the source codes of their implementations. ...
arXiv:1305.0674v1
fatcat:yzsaiuw27nb35ngw3xtc5mqz3i