A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017.
The file type is application/pdf.
Improved count suffix trees for natural language data
2008
Proceedings of the 2008 international symposium on Database engineering & applications - IDEAS '08
With more and more natural language text stored in databases, handling respective query predicates becomes very important. Optimizing queries with predicates includes (sub)string estimation, i.e., estimating the selectivity of query terms based on small summary statistics before query execution. Count Suffix Trees (CST) are commonly used to this end. While CST yield good estimates, they are expensive to build and require a large amount of memory to be stored. To fit in the data dictionary of
doi:10.1145/1451940.1451972
dblp:conf/ideas/SautterAB08
fatcat:qwfey72jkrd5peykukoznrud2i