Efficient approximate search on string collections

Marios Hadjieleftheriou, Chen Li
2009 Proceedings of the VLDB Endowment  
This tutorial provides a comprehensive overview of recent research progress on the important problem of approximate search in string collections.  ...  We identify existing indexes, search algorithms, filtering strategies, selectivity-estimation techniques and other work, and comment on their respective merits and limitations.  ...  A closely related problem is that of selectivity estimation for approximate-string-matching queries.  ... 
doi:10.14778/1687553.1687623 fatcat:hsji7ybwr5brzfdkduwntzvpl4

Approximate string search in spatial databases

Bin Yao, Feifei Li, Marios Hadjieleftheriou, Kun Hou
2010 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)  
MHR-tree supports a wide range of query predicates efficiently, including range and nearest neighbor queries. We also discuss how to estimate range query selectivity accurately.  ...  This work presents a novel index structure, MHR-tree, for efficiently answering approximate string match queries in large spatial databases.  ...  Our main contributions are summarized as follows: • We formalize the notion of spatial approximate string queries and selectivity estimation for spatial approximate string range queries in Section II.  ... 
doi:10.1109/icde.2010.5447836 dblp:conf/icde/YaoLHH10 fatcat:cqbngi3gpja3lfoe25b7kbiolq

Spatial Approximate String Search

Feifei Li, Bin Yao, Mingwang Tang, Marios Hadjieleftheriou
2013 IEEE Transactions on Knowledge and Data Engineering  
We also discuss how to estimate the selectivity of a SAS query in Euclidean space, for which we present a novel adaptive algorithm to find balanced partitions using both the spatial and string information  ...  Specifically, we investigate range queries augmented with a string similarity search predicate in both Euclidean space and road networks. We dub this query the spatial approximate string (SAS) query.  ...  Selectivity estimation for ESAS queries Another interesting topic for approximate string queries in spatial databases is selectivity estimation.  ... 
doi:10.1109/tkde.2012.48 fatcat:43ee5uizfvbczjmspj66j4pbq4

Supporting Similarity Operations Based on Approximate String Matching on the Web [chapter]

Eike Schallehn, Ingolf Geist, Kai-Uwe Sattler
2004 Lecture Notes in Computer Science  
To minimize the local processing costs and the required network traffic, the mapping uses materialized information on the selectivity of string samples such as q-samples, substrings, and keywords.  ...  Based on the predicate mapping similarity selections and joins are described and the quality and required effort of the operations is evaluated experimentally.  ...  , if we set the selectivity threshold to 5%, we have to reject approximately 3% of the queries using s -samples and -samples and approximately 14% of queries using -samples.  ... 
doi:10.1007/978-3-540-30468-5_16 fatcat:6ubzldbpzjfm3kafbqcygdim3u

A Fast Spatial String Search with Service Composition Method

S. Udhayakumar
2014 IOSR Journal of Engineering  
space. These are called spatial approximate string (SAS) queries.  ...  The experiments were done using C#.NET, with a data set created using SQL Server. General Terms: Approximate string search, range query, road network, spatial databases.  ...  PROBLEM DESCRIPTION The problem is to search a collection (unordered set) of strings to find those similar to a single query string ("selection query"). Selectivity estimation of range queries on  ... 
doi:10.9790/3021-04364954 fatcat:wd36i3kb6rgebghyegavnirfrm

Generalized substring selectivity estimation

Zhiyuan Chen, Flip Korn, Nick Koudas, S. Muthukrishnan
2003 Journal of Computer and System Sciences (Print)
Existing methods for the case of multidimensional conjunctive queries approximate selectivities by explicitly storing cross-counts of frequently co-occurring combinations of substrings; estimates are obtained  ...  We present a novel approach to selectivity estimation for generalized Boolean substring queries with a focus on the two cases of (1) conjunctive multidimensional and (2) Boolean queries.  ...  Estimating the selectivity of Boolean queries In this section, we show how to compute the selectivity of any general Boolean query q: Let T be the full suffix tree constructed on the collection of strings  ... 
doi:10.1016/s0022-0000(02)00031-4 fatcat:dsrih3esffflbgnxs4sskyzi7y

Approximate substring selectivity estimation

Hongrae Lee, Raymond T. Ng, Kyuseok Shim
2009 Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09  
We study the problem of estimating selectivity of approximate substring queries.  ...  To begin with, we consider edit distance for the similarity between a pair of strings.  ...  Full string selectivity estimation: Given a query string sq and a bag of strings DB, estimate the number of strings s ∈ DB satisfying ed(sq, s) ≤ ∆, where ∆ is the edit distance threshold. Example 1.  ... 
doi:10.1145/1516360.1516455 dblp:conf/edbt/LeeNS09 fatcat:dkleqy5zejhdxfnqfwfl6cevfm

Estimating the selectivity of tf-idf based cosine similarity predicates

Sandeep Tata, Jignesh M. Patel
2007 SIGMOD Record
To the best of our knowledge, there are no known methods for this problem. In this paper, we present the first approach for estimating the selectivity of tf.idf based cosine similarity predicates.  ...  An increasing number of database applications today require sophisticated approximate string matching capabilities. Examples of such application areas include data integration and data cleaning.  ...  This simple approximation leads to some very good estimates.  ... 
doi:10.1145/1361348.1361351 fatcat:m2cizhlzz5cxnjsdjdfm5nprt4

Estimating the selectivity of tf-idf based cosine similarity predicates

Sandeep Tata, Jignesh M. Patel
2007 SIGMOD Record
To the best of our knowledge, there are no known methods for this problem. In this paper, we present the first approach for estimating the selectivity of tf.idf based cosine similarity predicates.  ...  An increasing number of database applications today require sophisticated approximate string matching capabilities. Examples of such application areas include data integration and data cleaning.  ...  This simple approximation leads to some very good estimates.  ... 
doi:10.1145/1328854.1328855 fatcat:g2tk6ni4pvdtxngbxfybvxyosu

Sublinear Algorithms for Approximating String Compressibility

Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Adam Smith
2012 Algorithmica  
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time.  ...  In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a distribution.  ...  We would like to thank Amir Shpilka, who was involved in a related paper on distribution support testing [18] and whose comments greatly improved drafts of this article.  ... 
doi:10.1007/s00453-012-9618-6 fatcat:qfbpob63y5be3h7l2tukivyvta

Sublinear Algorithms for Approximating String Compressibility [chapter]

Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Adam Smith
2007 Lecture Notes in Computer Science  
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time.  ...  In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a distribution.  ...  We would like to thank Amir Shpilka, who was involved in a related paper on distribution support testing [18] and whose comments greatly improved drafts of this article.  ... 
doi:10.1007/978-3-540-74208-1_44 fatcat:f4gnwf6wpvhj5lvoczmhkhp7zq

Sublinear Algorithms for Approximating String Compressibility [article]

Sofya Raskhodnikova and Dana Ron and Ronitt Rubinfeld and Adam Smith
2007 arXiv   pre-print
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time.  ...  In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a distribution.  ...  We would like to thank Amir Shpilka, who was involved in a related paper on distribution support testing [17] and whose comments greatly improved drafts of this article.  ... 
arXiv:0706.1084v1 fatcat:xbtuqt4rvzdwtp2anxhoceiriu

SEPIA: estimating selectivities of approximate string predicates in large Databases

Liang Jin, Chen Li, Rares Vernica
2008 The VLDB Journal
In this paper, we study the problem of estimating selectivities of fuzzy string predicates. We develop a novel technique, called Sepia, to solve the problem.  ...  Query optimization needs the selectivity of such a fuzzy predicate, i.e., the fraction of records in the database that satisfy the condition.  ...  To estimate the selectivity of a wildcard predicate, these techniques divide the query string into disjoint or overlapping substrings, and estimate the selectivity of each substring using the summary structure  ... 
doi:10.1007/s00778-007-0061-2 fatcat:axxg7w7tkjdbhjkus4vzeu5acu

One-dimensional and multi-dimensional substring selectivity estimation

H.V. Jagadish, Olga Kapitskaia, Raymond T. Ng, Divesh Srivastava
2000 The VLDB Journal
Effective query optimization in this context requires good selectivity estimates.  ...  With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching.  ...  The research effort of H.V. Jagadish was supported in part by the NSF under grant IIS-9986030. We would like to thank Flip Korn and Zhiyuan Chen for their comments on an earlier version of the paper.  ... 
doi:10.1007/s007780000029 fatcat:sy35a43oovef3czwa6iiho763a

Selectivity estimation for hybrid queries over text-rich data graphs

Andreas Wagner, Veli Bicer, Thanh D. Tran
2013 Proceedings of the 16th International Conference on Extending Database Technology - EDBT '13  
Existing work on selectivity estimation focuses either on string or on structured query predicates alone.  ...  In our experiments on real-world data, we show that capturing dependencies between structured and textual data in this way greatly improves the accuracy of selectivity estimates without compromising the  ...  However, previous works estimate the selectivity of single string predicates.  ... 
doi:10.1145/2452376.2452421 dblp:conf/edbt/WagnerBT13 fatcat:e2g7j4e2wvfa7nqagc7wdem6wy