Efficient approximate search on string collections
2009
Proceedings of the VLDB Endowment
This tutorial provides a comprehensive overview of recent research progress on the important problem of approximate search in string collections. ...
We identify existing indexes, search algorithms, filtering strategies, selectivity-estimation techniques and other work, and comment on their respective merits and limitations. ...
A closely related problem is that of selectivity estimation for approximate-string-matching queries. ...
doi:10.14778/1687553.1687623
fatcat:hsji7ybwr5brzfdkduwntzvpl4
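The tutorial surveys filtering strategies for approximate search. The sketch below shows one widely used filter, the q-gram count filter. It is given as a general illustration, not as the tutorial's own code. If ed(s, t) ≤ k, then s and t share at least max(|s|, |t|) − q + 1 − k·q unpadded q-grams, counted with multiplicity, because each edit operation can destroy at most q q-grams. Candidates that fail this bound are pruned before the exact edit-distance check. The function names and the default q = 2 are assumptions.

```python
from collections import Counter

def qgrams(s: str, q: int = 2) -> Counter:
    """Multiset of the (unpadded) q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic O(|a|*|b|) dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]

def passes_count_filter(s: str, t: str, k: int, q: int = 2) -> bool:
    """Necessary condition for ed(s, t) <= k; False means t can be pruned."""
    common = sum((qgrams(s, q) & qgrams(t, q)).values())
    return common >= max(len(s), len(t)) - q + 1 - k * q

def approximate_search(query: str, collection: list[str], k: int, q: int = 2) -> list[str]:
    """Filter-and-verify: cheap q-gram filter first, exact edit distance second."""
    candidates = [t for t in collection if passes_count_filter(query, t, k, q)]
    return [t for t in candidates if edit_distance(query, t) <= k]
```

For example, `approximate_search("kitten", ["sitting", "kitchen", "mitten", "written"], k=2)` keeps "kitchen", "mitten" and "written". "sitting" passes the filter but is rejected by the exact check, because the filter is only a necessary condition.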
Approximate string search in spatial databases
2010
2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
MHR-tree supports a wide range of query predicates efficiently, including range and nearest neighbor queries. We also discuss how to estimate range query selectivity accurately. ...
This work presents a novel index structure, MHR-tree, for efficiently answering approximate string match queries in large spatial databases. ...
Our main contributions are summarized as follows: • We formalize the notion of spatial approximate string queries and selectivity estimation for spatial approximate string range queries in Section II. ...
doi:10.1109/icde.2010.5447836
dblp:conf/icde/YaoLHH10
fatcat:cqbngi3gpja3lfoe25b7kbiolq
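The snippets above do not describe MHR-tree's internals. The sketch below shows min-wise hashing, a general technique for estimating the Jaccard similarity of two strings' q-gram sets. We assume here that an index of this kind stores such signatures in its nodes. The useful property is that the signature of a union is the element-wise minimum of the parts' signatures, so a parent node can summarize its children. The salted crc32 hashing and all names are illustrative assumptions.

```python
import zlib

def qgram_set(s: str, q: int = 2) -> set[str]:
    """Set of (unpadded) q-grams of s."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def minhash_signature(grams: set[str], num_hashes: int = 64) -> list[int]:
    """Minimum hash value of the set under each of num_hashes salted hashes.

    Assumption: crc32 over "salt:gram" stands in for a family of independent
    hash functions; a production index would use a proper min-wise family.
    """
    if not grams:
        raise ValueError("signature of an empty set is undefined")
    return [min(zlib.crc32(f"{i}:{g}".encode()) for g in grams)
            for i in range(num_hashes)]

def merge_signatures(sig_a: list[int], sig_b: list[int]) -> list[int]:
    """Signature of A ∪ B is the element-wise minimum of the two signatures."""
    return [min(a, b) for a, b in zip(sig_a, sig_b)]

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing minima estimates |A ∩ B| / |A ∪ B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```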
Spatial Approximate String Search
2013
IEEE Transactions on Knowledge and Data Engineering
We also discuss how to estimate the selectivity of a SAS query in Euclidean space, for which we present a novel adaptive algorithm to find balanced partitions using both the spatial and string information ...
Specifically, we investigate range queries augmented with a string similarity search predicate in both Euclidean space and road networks. We dub this query the spatial approximate string (SAS) query. ...
Selectivity estimation for ESAS queries. Another interesting topic for approximate string queries in spatial databases is selectivity estimation. ...
doi:10.1109/tkde.2012.48
fatcat:43ee5uizfvbczjmspj66j4pbq4
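The SAS query combines a range predicate with a string-similarity predicate. A linear scan over both predicates in Euclidean space gives a baseline and a ground truth for checking an index. It is not the paper's index-based algorithm. The sketch assumes the similarity predicate is edit distance at most tau. The `Place` record layout and the rectangle encoding are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Place:
    x: float
    y: float
    name: str

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def sas_range_query(places: list[Place], rect: tuple, query: str, tau: int) -> list[Place]:
    """Places inside rect = (xmin, ymin, xmax, ymax) whose name is within
    edit distance tau of query."""
    xmin, ymin, xmax, ymax = rect
    return [p for p in places
            if xmin <= p.x <= xmax and ymin <= p.y <= ymax  # cheap spatial test first
            and edit_distance(p.name, query) <= tau]       # then the string predicate
```

The cheap spatial test runs first, so the quadratic edit-distance computation only runs on points already inside the rectangle.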
Supporting Similarity Operations Based on Approximate String Matching on the Web
[chapter]
2004
Lecture Notes in Computer Science
To minimize the local processing costs and the required network traffic, the mapping uses materialized information on the selectivity of string samples such as ¤ -samples, substrings, and keywords. ...
Based on the predicate mapping, similarity selections and joins are described, and the quality and required effort of the operations are evaluated experimentally. ...
... if we set the selectivity threshold to 5%, we have to reject approximately 3% of the queries using s -samples and -samples and approximately 14% of queries using -samples. ...
doi:10.1007/978-3-540-30468-5_16
fatcat:6ubzldbpzjfm3kafbqcygdim3u
A Fast Spatial String Search with Service Composition Method
2014
IOSR Journal of Engineering
space. These are called spatial approximate string (SAS) queries. ...
The experiments were done using C#.NET, with a data set created using SQL Server. General Terms: approximate string search, range query, road network, spatial databases. ...
PROBLEM DESCRIPTION. The problem is to search a collection (unordered set) of strings for those similar to a single query string ("selection query"). Selectivity estimation of range queries on ...
doi:10.9790/3021-04364954
fatcat:wd36i3kb6rgebghyegavnirfrm
Generalized substring selectivity estimation
2003
Journal of Computer and System Sciences (Print)
Existing methods for the case of multidimensional conjunctive queries approximate selectivities by explicitly storing cross-counts of frequently co-occurring combinations of substrings; estimates are obtained ...
We present a novel approach to selectivity estimation for generalized Boolean substring queries with a focus on the two cases of (1) conjunctive multidimensional and (2) Boolean queries. ...
Estimating the selectivity of Boolean queries. In this section, we show how to compute the selectivity of any general Boolean query q: Let T be the full suffix tree constructed on the collection of strings ...
doi:10.1016/s0022-0000(02)00031-4
fatcat:dsrih3esffflbgnxs4sskyzi7y
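The estimator in this paper is built on a suffix tree and is not reproduced here. The sketch below computes two simpler quantities. The first is the exact selectivity of a Boolean substring query, found by scanning the collection; it is the ground truth an estimator approximates. The second is the naive independence estimate for a conjunction, which multiplies single-substring selectivities. That assumption is exactly what cross-count methods try to correct. Encoding a query as nested tuples such as `("and", "sea", ("not", "side"))` is a hypothetical choice.

```python
import math

def matches(s: str, q) -> bool:
    """q is a substring, or a tuple ("and" | "or" | "not", subquery, ...)."""
    if isinstance(q, str):
        return q in s
    op, *args = q
    if op == "and":
        return all(matches(s, a) for a in args)
    if op == "or":
        return any(matches(s, a) for a in args)
    if op == "not":
        return not matches(s, args[0])
    raise ValueError(f"unknown operator {op!r}")

def exact_selectivity(strings: list[str], q) -> float:
    """Fraction of strings satisfying the Boolean substring query q."""
    return sum(matches(s, q) for s in strings) / len(strings)

def independence_estimate(strings: list[str], substrings: list[str]) -> float:
    """Estimate for an AND of substrings, assuming their occurrences are independent."""
    return math.prod(exact_selectivity(strings, sub) for sub in substrings)
```

On `["seattle", "seaside", "boston", "austin"]`, the conjunction of "sea" and "tt" has exact selectivity 0.25. The independence estimate is 0.5 × 0.25 = 0.125, because the two substrings are positively correlated.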
Approximate substring selectivity estimation
2009
Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09
We study the problem of estimating selectivity of approximate substring queries. ...
To begin with, we consider edit distance for the similarity between a pair of strings. ...
Full string selectivity estimation: Given a query string sq and a bag of strings DB, estimate the number of strings s ∈ DB satisfying ed(sq, s) ≤ ∆, where ∆ is the edit distance threshold. Example 1. ...
doi:10.1145/1516360.1516455
dblp:conf/edbt/LeeNS09
fatcat:dkleqy5zejhdxfnqfwfl6cevfm
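The full-string selectivity defined above is the number of strings s in DB with ed(sq, s) ≤ ∆. The sketch below computes it exactly. It applies a length filter first, because the edit distance is at least the difference in lengths. It also gives a uniform-sampling estimate, which is a generic technique substituted here for illustration and is not the method proposed in this paper. The sample size and seed parameters are assumptions.

```python
import random

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def full_string_selectivity(sq: str, db: list[str], delta: int) -> int:
    """Exact count of s in the bag db with ed(sq, s) <= delta."""
    return sum(1 for s in db
               if abs(len(s) - len(sq)) <= delta    # length filter: ed >= length gap
               and edit_distance(sq, s) <= delta)

def sampled_selectivity(sq: str, db: list[str], delta: int,
                        sample_size: int, seed: int = 0) -> float:
    """Scale the hit count in a uniform sample (without replacement) up to |db|."""
    rng = random.Random(seed)
    sample = rng.sample(db, sample_size)
    return full_string_selectivity(sq, sample, delta) * len(db) / sample_size
```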
Estimating the selectivity of tf-idf based cosine similarity predicates
2007
SIGMOD Record
To the best of our knowledge, there are no known methods for this problem. In this paper, we present the first approach for estimating the selectivity of tf.idf based cosine similarity predicates. ...
An increasing number of database applications today require sophisticated approximate string matching capabilities. Examples of such application areas include data integration and data cleaning. ...
This simple approximation leads to some very good estimates. ...
doi:10.1145/1361348.1361351
fatcat:m2cizhlzz5cxnjsdjdfm5nprt4
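The snippets do not state the exact tf.idf weighting, so the sketch below uses one common variant: raw term frequency times log(N / df), over lowercase word tokens. Both choices are assumptions. It computes the exact selectivity of a cosine-similarity predicate by scanning the collection, which is the quantity such an estimator predicts. It does not implement the paper's estimation method.

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[str]):
    """Sparse tf-idf vectors (dicts) plus the idf table built from docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log(n / c) for t, c in df.items()}  # terms in every doc get weight 0
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u: dict, v: dict) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cosine_selectivity(query: str, docs: list[str], threshold: float) -> float:
    """Fraction of docs whose tf-idf cosine similarity to query is >= threshold."""
    vecs, idf = tfidf_vectors(docs)
    qtf = Counter(query.lower().split())
    qvec = {t: tf * idf[t] for t, tf in qtf.items() if t in idf}  # unseen tokens dropped
    return sum(cosine(qvec, v) >= threshold for v in vecs) / len(docs)
```

With the documents "main street cafe", "main st cafe", "oak avenue diner" and "elm road bakery", the query "main street cafe" has cosine similarity 1 with the first document and 1/3 with the second. The selectivity is therefore 0.25 at threshold 0.5 and 0.5 at threshold 0.3.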
Estimating the selectivity of tf-idf based cosine similarity predicates
2007
SIGMOD Record
To the best of our knowledge, there are no known methods for this problem. In this paper, we present the first approach for estimating the selectivity of tf.idf based cosine similarity predicates. ...
An increasing number of database applications today require sophisticated approximate string matching capabilities. Examples of such application areas include data integration and data cleaning. ...
This simple approximation leads to some very good estimates. ...
doi:10.1145/1328854.1328855
fatcat:g2tk6ni4pvdtxngbxfybvxyosu
Sublinear Algorithms for Approximating String Compressibility
2012
Algorithmica
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. ...
In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a distribution. ...
We would like to thank Amir Shpilka, who was involved in a related paper on distribution support testing [18] and whose comments greatly improved drafts of this article. ...
doi:10.1007/s00453-012-9618-6
fatcat:qfbpob63y5be3h7l2tukivyvta
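A sublinear algorithm only approximates compressibility, so it helps to see the exact quantity. The sketch below counts the phrases in a greedy LZ77-style parse with no sliding window. At each position, the next phrase is the longest prefix of the remaining text that also starts at an earlier position, and overlap is allowed. If no such prefix exists, the phrase is a single fresh character. This parse variant is an assumption and may differ in detail from the one analyzed in the paper. The naive search is deliberately simple and far from sublinear.

```python
def lz77_phrase_count(s: str) -> int:
    """Number of phrases in a greedy, self-referential LZ77-style parse of s."""
    n, i, phrases = len(s), 0, 0
    while i < n:
        longest = 0
        for j in range(i):                       # every earlier starting position
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1                      # copies may overlap position i
            longest = max(longest, length)
        i += max(longest, 1)                     # fresh character: phrase of length 1
        phrases += 1
    return phrases
```

Fewer phrases relative to the string length means the string is more compressible. For example, "aaaa" parses into 2 phrases ("a", "aaa"), while "abcd" needs 4.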
Sublinear Algorithms for Approximating String Compressibility
[chapter]
2007
Lecture Notes in Computer Science
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. ...
In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a distribution. ...
We would like to thank Amir Shpilka, who was involved in a related paper on distribution support testing [18] and whose comments greatly improved drafts of this article. ...
doi:10.1007/978-3-540-74208-1_44
fatcat:f4gnwf6wpvhj5lvoczmhkhp7zq
Sublinear Algorithms for Approximating String Compressibility
[article]
2007
arXiv
pre-print
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. ...
In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a distribution. ...
We would like to thank Amir Shpilka, who was involved in a related paper on distribution support testing [17] and whose comments greatly improved drafts of this article. ...
arXiv:0706.1084v1
fatcat:xbtuqt4rvzdwtp2anxhoceiriu
SEPIA: estimating selectivities of approximate string predicates in large Databases
2008
The VLDB Journal
In this paper, we study the problem of estimating selectivities of fuzzy string predicates. We develop a novel technique, called Sepia, to solve the problem. ...
Query optimization needs the selectivity of such a fuzzy predicate, i.e., the fraction of records in the database that satisfy the condition. ...
To estimate the selectivity of a wildcard predicate, these techniques divide the query string into disjoint or overlapping substrings, and estimate the selectivity of each substring using the summary structure ...
doi:10.1007/s00778-007-0061-2
fatcat:axxg7w7tkjdbhjkus4vzeu5acu
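The snippet describes the approach earlier techniques take for wildcard predicates. They split the query string into overlapping substrings, look each one up in a summary, and combine the results. The sketch below illustrates that approach with a Markov-style overlap rule: sel(abcd) ≈ sel(abc) · sel(bcd) / sel(bc). The summary is a table counting, for every substring of length at most q, how many strings contain it. This is not Sepia itself, which targets fuzzy (edit-distance) predicates. The full-table summary and the function names are assumptions made for a small demo.

```python
from collections import Counter

def build_summary(strings: list[str], q: int) -> Counter:
    """For each substring of length 1..q, the number of strings containing it."""
    counts = Counter()
    for s in strings:
        counts.update({s[i:i + l] for l in range(1, q + 1)
                       for i in range(len(s) - l + 1)})
    return counts

def estimate_substring_selectivity(pattern: str, summary: Counter, n: int, q: int) -> float:
    """Chain overlapping q-grams; each step conditions on the shared (q-1)-gram."""
    if len(pattern) <= q:
        return summary[pattern] / n             # answered directly by the summary
    est = summary[pattern[:q]] / n
    for i in range(1, len(pattern) - q + 1):
        overlap = summary[pattern[i:i + q - 1]]
        if overlap == 0:
            return 0.0
        est *= summary[pattern[i:i + q]] / overlap
    return est
```

On `["abcd", "abce", "xbcd", "zzzz"]` with q = 3, the estimate for "abcd" is 2/4 × 2/3 = 1/3, while the true fraction is 1/4. The gap comes from the conditional-independence assumption.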
One-dimensional and multi-dimensional substring selectivity estimation
2000
The VLDB Journal
Effective query optimization in this context requires good selectivity estimates. ...
With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching. ...
The research effort of H.V. Jagadish was supported in part by the NSF under grant IIS-9986030. We would like to thank Flip Korn and Zhiyuan Chen for their comments on an earlier version of the paper. ...
doi:10.1007/s007780000029
fatcat:sy35a43oovef3czwa6iiho763a
Selectivity estimation for hybrid queries over text-rich data graphs
2013
Proceedings of the 16th International Conference on Extending Database Technology - EDBT '13
Existing work on selectivity estimation focuses either on string or on structured query predicates alone. ...
In our experiments on real-world data, we show that capturing dependencies between structured and textual data in this way greatly improves the accuracy of selectivity estimates without compromising the ...
However, previous works estimate the selectivity of single string predicates. ...
doi:10.1145/2452376.2452421
dblp:conf/edbt/WagnerBT13
fatcat:e2g7j4e2wvfa7nqagc7wdem6wy
Showing results 1 — 15 out of 32,854 results