Generalized substring selectivity estimation
2003
Journal of computer and system sciences (Print)
In providing such selectivity estimates, the correlation between different occurrences of substrings is crucial. ...
The cross-counts generated by our methods are not exact, but they are adequate for selectivity estimation. ...
work on substring selectivity estimation, the more general problem of selectivity estimation on Boolean substring predicates has not been studied. ...
doi:10.1016/s0022-0000(02)00031-4
fatcat:dsrih3esffflbgnxs4sskyzi7y
Approximate substring selectivity estimation
2009
Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09
We study the problem of estimating selectivity of approximate substring queries. ...
The experimental results show that MOF is a light-weight algorithm that gives fairly accurate estimations. ...
These concepts are not applicable to substring selectivity estimation. ...
doi:10.1145/1516360.1516455
dblp:conf/edbt/LeeNS09
fatcat:dkleqy5zejhdxfnqfwfl6cevfm
Supporting Similarity Operations Based on Approximate String Matching on the Web
[chapter]
2004
Lecture Notes in Computer Science
To minimize the local processing costs and the required network traffic, the mapping uses materialized information on the selectivity of string samples such as q-samples, substrings, and keywords. ...
Based on the predicate mapping, similarity selections and joins are described, and the quality and required effort of the operations are evaluated experimentally. ...
The key criteria considered during evaluation are the selectivity of generated pre-selections, the quality of our selectivity estimation, and the applicability to actual data values. ...
doi:10.1007/978-3-540-30468-5_16
fatcat:6ubzldbpzjfm3kafbqcygdim3u
One-dimensional and multi-dimensional substring selectivity estimation
2000
The VLDB journal
In this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. For the 1-D problem, we present a novel technique called MO (Maximal Overlap). ...
Effective query optimization in this context requires good selectivity estimates. ...
To estimate substring selectivity in multiple dimensions, we need to generalize the PST to multiple dimensions. ...
doi:10.1007/s007780000029
fatcat:sy35a43oovef3czwa6iiho763a
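As a rough illustration of the idea behind a pruned count structure for substring selectivity, the sketch below counts, for every short substring, how many strings contain it, and then discards substrings below a count threshold. It is a minimal sketch under stated assumptions: a plain dictionary stands in for the count-suffix tree, and the names `build_pruned_counts`, `estimate_selectivity`, `min_count`, and `max_len` are hypothetical and do not come from the paper. It does not implement the MO (Maximal Overlap) technique; a query whose count was pruned simply returns `None`, which is the case an estimator such as MO would have to handle.

```python
from collections import Counter


def build_pruned_counts(strings, min_count=2, max_len=4):
    """Count how many strings contain each substring of length <= max_len,
    then drop substrings contained in fewer than min_count strings."""
    counts = Counter()
    for s in strings:
        seen = set()
        for i in range(len(s)):
            for j in range(i + 1, min(len(s), i + max_len) + 1):
                seen.add(s[i:j])
        # Presence count: each string contributes at most once per substring.
        counts.update(seen)
    return {sub: c for sub, c in counts.items() if c >= min_count}


def estimate_selectivity(query, pruned, n_strings):
    """Return the stored fraction if the query survived pruning, else None."""
    count = pruned.get(query)
    return None if count is None else count / n_strings


strings = ["database", "databank", "dataset", "basement"]
pruned = build_pruned_counts(strings, min_count=2, max_len=4)
print(estimate_selectivity("data", pruned, len(strings)))
print(estimate_selectivity("bank", pruned, len(strings)))
```

In this toy data set "data" occurs in three of four strings, while "bank" occurs in only one and is pruned, so its selectivity would have to be estimated from the substrings that remain.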
CXHist: An On-line Classification-Based Histogram for XML String Selectivity Estimation
2005
Very Large Data Bases Conference
Hence, XML string selectivity estimation is a harder problem than relational substring selectivity estimation, because the correlation between path and substring statistics needs to be captured as well ...
The main difference between the XML string selectivity estimation problem and the relational substring selectivity estimation problem is that a correlated path (whether implicitly encoded as a path ID ...
dblp:conf/vldb/LimWV05
fatcat:2o222gg2cfcvlju2xirro4mh24
When is an estimation of distribution algorithm better than an evolutionary algorithm?
2009
2009 IEEE Congress on Evolutionary Computation
Despite the wide-spread popularity of estimation of distribution algorithms (EDAs), there has been no theoretical proof that there exist optimisation problems where EDAs perform significantly better than ...
Here, it is proved rigorously that on a problem called SUBSTRING, a simple EDA called univariate marginal distribution algorithm (UMDA) is efficient, whereas the (1+1) EA is highly inefficient. ...
be the populations before and after the selection at the $t$-th generation ($t \in \mathbb{N}^+$) respectively, $p_{t,i}(1)$ ($p_{t,i}(0)$) be the estimated marginal probability of the $i$-th bit of an individual to be 1 ( ...
doi:10.1109/cec.2009.4983116
dblp:conf/cec/ChenLTY09
fatcat:qgqog6v3djf4tnvllzwkl32oti
Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
2014
Journal of Advanced Research
We propose a fully generative model for the probability distribution of benign (white listed) domain names which can be used in an anomaly detection setting for identifying putative algorithmically generated ...
On the other hand, in the present day scenario, algorithmically generated domain names typically have distributions that are quite different from that of human-created domain names. ...
(iii) If the substring is to be selected from $A_{l_i} \setminus V_{l_i}$, then generate a character sequence according to the joint distribution $P_{\mathrm{int}}(w \mid l_i)$. ...
doi:10.1016/j.jare.2014.01.001
pmid:25685511
pmcid:PMC4294760
fatcat:lpxqtbssefgljiqexfphlaouj4
Page 3295 of Mathematical Reviews Vol. , Issue 2004d
[page]
2004
Mathematical Reviews
substring selectivity estimation. ...
The cross-counts generated by our methods are not exact, but they are adequate for selectivity estimation. ...
A partition-based method for string similarity joins with edit-distance constraints
2013
ACM Transactions on Database Systems
Finally, we verify the candidates to generate the final answer. We devise efficient techniques to select substrings and prove that our method can minimize the number of selected substrings. ...
Then for each string, we select some of its substrings, identify the selected substrings from the inverted indices, and take strings on the inverted lists of the found substrings as candidates of this ...
The substring set $W_m(s, l)$ generated by the multi-match-aware selection method has the minimum size among all the substring sets generated by the substring selection methods that satisfy completeness. ...
doi:10.1145/2487259.2487261
fatcat:k3tft2ydnvc53ptmunsoqb3fxu
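The partition, select, and verify pipeline described in this abstract can be sketched as follows. The sketch relies on the pigeonhole argument that if a string is split into tau + 1 segments and another string lies within edit distance tau of it, at least one segment must appear unchanged as a substring of the other string. It is a minimal sketch under stated assumptions: the function names are hypothetical, the inverted index is keyed on segment text only, and substring selection is exhaustive (every substring of the probe string is looked up), which is not the paper's multi-match-aware selection method and makes no attempt to minimize the number of selected substrings.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def partition(s, tau):
    """Split s into tau + 1 contiguous segments of near-equal length."""
    k = tau + 1
    base, extra = divmod(len(s), k)
    segments, pos = [], 0
    for i in range(k):
        size = base + (1 if i >= k - extra else 0)
        segments.append(s[pos:pos + size])
        pos += size
    return segments


def similarity_join(strings, tau):
    """Return sorted id pairs (i, j), i < j, with edit distance <= tau."""
    # Inverted index: segment text -> ids of strings that own that segment.
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for seg in partition(s, tau):
            index[seg].add(sid)

    pairs = set()
    for sid, s in enumerate(strings):
        # Exhaustive substring selection (assumption: simple, not minimal).
        # The empty string covers empty segments of very short strings.
        subs = {""} | {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        candidates = set()
        for sub in subs:
            candidates |= index.get(sub, set())
        candidates.discard(sid)
        # Verification step: keep only candidates within the threshold.
        for rid in candidates:
            if edit_distance(strings[rid], s) <= tau:
                pairs.add((min(rid, sid), max(rid, sid)))
    return sorted(pairs)


print(similarity_join(["kitten", "sitten", "sitting", "mitten", "banana"], 1))
```

Exhaustive selection keeps the filter complete but probes many more substrings than necessary; restricting the probed substrings by length and position is where a method like the one in the paper would reduce the work.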
Sequences Dimensionality-Reduction by K-mer Substring Space Sampling Enables Effective Resemblance- and Containment-Analysis for Large-Scale omics-data
[article]
2019
bioRxiv
pre-print
We proposed a new sequence sketching technique named k-mer substring space decomposition (kssd), which sketches sequences via k-mer substring space sampling instead of local-sensitive hashing. ...
Kssd is more accurate and faster for resemblance estimation than other sketching methods developed so far. Notably, kssd is robust even when two sequences are of very different sizes. ...
To address this, we generalized k-mer space sampling/shuffling to k-mer substring space sampling/shuffling (kssd), where a substring of the k-mer is selected according to a predefined pattern, so that we ...
doi:10.1101/729665
fatcat:e3rjbqym25anpoa4li3bq6wwfu
Interpolated Spectral NGram Language Models
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
First, in order to capture long-range dependencies of the data, the method must use statistics from long substrings, which results in very large matrices that are difficult to decompose. ...
The spectral method is based on computing a Hankel matrix that contains statistics of expectations over substrings generated by the target language. ...
The ability of the spectral method for PNFA to estimate substring expectations can be exploited in other contexts. ...
doi:10.18653/v1/p19-1594
dblp:conf/acl/QuattoniC19
fatcat:umjd54ixobcalfga6he45bypmu
Text classification stream-based R-measure approach using frequency of substring repetition
2015
Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitel'naya tekhnika i informatika
The accuracy of text classification is estimated by Van Rijsbergen's effectiveness measure, known as the F-measure. ...
A stream-based R-measure approach that uses the frequency of substring repetition in text classification is proposed. ...
To estimate the achievable performance of using frequencies in classification, an approach based on a modification of the R-measure that can use frequencies of substring repetition is proposed. ...
doi:10.17223/19988605/33/1
fatcat:zystzkvgqvcrvnhmps36bzijke
PASS-JOIN: A Partition-based Method for Similarity Joins
[article]
2011
arXiv
pre-print
We devise efficient techniques to select the substrings and prove that our method can minimize the number of selected substrings. ...
Then for each string, Pass-Join selects some of its substrings and uses the selected substrings to find candidate pairs using the inverted indices. ...
The substring set $W_m(s, l)$ generated by the multi-match-aware selection method has the minimum size among all the substring sets generated by the substring selection methods that satisfy completeness. ...
arXiv:1111.7171v1
fatcat:ygefsyrcuzc2rap42sf25enmei
The reference string indexing method
[chapter]
1978
Lecture Notes in Computer Science
Generally, $f(s)$ denotes the frequency of a substring $s$ in $S$, and $RS_j$ denotes the set of refstrings with length $j$. ...
Exploiting this assumption, a (small) set of "reference strings" is generated by a statistical analysis of collected queries or, if not available, by usage estimation with the original data. ...
doi:10.1007/3-540-08934-9_92
fatcat:dettile55bda7lqowqd4qthluq
Kssd: sequence dimensionality reduction by k-mer substring space sampling enables real-time large-scale datasets analysis
2021
Genome Biology
Here, we develop k-mer substring space decomposition (Kssd), a sketching technique which is significantly faster and more accurate than current sketching methods. ...
First, the k-mers with p-selected-substring (green substring in 1st column) belonging to the red subspace s are selected, where the p-selected-substrings are recoded by the lexically ordered dimension ...
Sketching: for a given sequence, k-mers whose p-selected-substring is present in the chosen k-mer substring subspace are selected and recoded into a sketch (Fig. 2b). ...
doi:10.1186/s13059-021-02303-4
pmid:33726811
pmcid:PMC7962209
fatcat:ylta5ntqqjflno5wen5r22665u