A New Compressed Suffix Tree Supporting Fast Search and Its Construction Algorithm Using Optimal Working Space [chapter]

Dong Kyue Kim, Heejin Park
2005 Lecture Notes in Computer Science  
The compressed suffix array and the compressed suffix tree for a given string S are full-text index data structures occupying O(n log |Σ|) bits where n is the length of S and Σ is the alphabet from which  ...  However, compressed suffix trees supporting the pattern search in O(m log |Σ|) time are not constructed by these methods.  ...  Recently, Kim et al. [20, 21] developed a new child table supporting O(m log |Σ|)-time pattern search. They called the enhanced suffix array with the new child table linearized suffix tree.  ... 
doi:10.1007/11496656_4 fatcat:zdtzslecdvftfndeezfy3osefa

Filtering Multi-set Tree: Data Structure for Flexible Matching Using Multi-track Data

Kazuyuki NARISAWA, Takashi KATSURA, Hiroyuki OTA, Ayumi SHINOHARA
2015 Interdisciplinary Information Sciences  
The FILM tree is a complete binary tree based on a spectral Bloom filter (SBF) with hash functions.  ...  The permuted pattern matching problem aims to determine the occurrences of multi-track patterns in multi-track text by allowing the order of the pattern tracks to be permuted.  ...  We performed a comparison with MTST for the full-permuted pattern matching problem and demonstrated that FILM tree can search patterns faster than MTST.  ... 
doi:10.4036/iis.2015.37 fatcat:sbuftk2uorelpemoxz55i5rb7i

Suffix arrays with a twist [article]

Tomasz Kowalski, Szymon Grabowski, Kimmo Fredriksson, Marcin Raniszewski
2016 arXiv   pre-print
The suffix array is a classic full-text index, combining effectiveness with simplicity.  ...  In short, we show that (i) how we search for the right interval boundary impacts significantly the overall search speed, (ii) a B-tree data layout easily wins over the standard one, (iii) the well-known  ...  Introduction: Everybody knows the suffix array (SA) [4], a simple full-text index data structure capable of finding the occ occurrences of a pattern P of length m in O(m log n + occ) time, where n is  ... 
arXiv:1607.08176v1 fatcat:6glbjvzlf5hftatnbzpszaxifi

The Continued Saga of DB-IR Integration [chapter]

Ricardo Baeza-Yates, Mariano Consens
2004 Proceedings 2004 VLDB Conference  
Conditions on Text Equality: //section[title="Procedure"] Full-text: //section[contains(title, "Procedure")] Full-text Requirements I • Full-text predicates • Search query  ...  A collection of data trees (C) A collection of scored data trees A scored pattern tree (p) SCORED SELECTION A collection of data trees (C)  ... 
doi:10.1016/b978-012088469-8.50118-2 dblp:conf/vldb/Baeza-YatesC04 fatcat:2lzk6qlgurgbdoj6do2qtxy2za


Cédric du Mouza, Witold Litwin, Philippe Rigaux, Thomas Schwarz
2009 Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09  
AS-Index is a new index structure for exact string search in disk resident databases. It uses hashing, unlike known alternatives, whether based on trees or tries.  ...  Use of hashing provides for constant index access time for arbitrarily long patterns, unlike other structures whose search cost is at best logarithmic.  ...  This data is at least partly unstructured, which creates the need for full text searches (or pattern matching) [17].  ... 
doi:10.1145/1645953.1645993 dblp:conf/cikm/MouzaLRS09 fatcat:bficsylhvrbbjlctnnwwuripxm

A New Indexing Method for Approximate String Matching [chapter]

Gonzalo Navarro, Ricardo Baeza-Yates
1999 Lecture Notes in Computer Science  
The method is based on a suffix tree combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the retrieval time is O(n^λ), for 0 < λ < 1, whenever α < 1 −  ...  We experimentally show that this index outperforms by far all other algorithms for indexed approximate searching, also being the first experiments that compare the different existing schemes.  ...  We also thank Erkki Sutinen for his code to build the suffix tree, and Gene Myers and Archie Cobbs for sending us their implemented indices.  ... 
doi:10.1007/3-540-48452-3_13 fatcat:miajo6js6nc7jdnhp2pnvnlj64

String algorithms and data structures [article]

Paolo Ferragina
2008 arXiv   pre-print
The engineered version of String B-trees (Section 3.4) has been devised in collaboration with Roberto Grossi; the randomized algorithm for string sorting in external memory (Section 3.6) is a joint result  ...  with Mikkel Thorup; finally, the WFMindex (Section 4.3) is a recent advancement achieved together with Giovanni Manzini.  ...  are paths into the hierarchical tree structure of an XML document), and vocabulary implementations to support exact or complex pattern searches (even the inverted indexes might benefit of full-text indexes  ... 
arXiv:0801.2378v1 fatcat:2subtyqbm5hkpefrsio33ioktm

Page 9215 of Mathematical Reviews Vol. , Issue 2004k [page]

2004 Mathematical Reviews  
Summary: “The suffix array is a widely used full-text index that allows fast searches on the text.  ...  Summary: “We present new search algorithms to detect the occurrences of any pattern from a given pattern set in a text, allowing in the occurrences a limited number of spurious text characters among  ... 

CPS-tree: A Compact Partitioned Suffix Tree for Disk-based Indexing on Large Genome Sequences

Swee-Seong Wong, Wing-Kin Sung, Limsoon Wong
2007 2007 IEEE 23rd International Conference on Data Engineering  
Suffix tree allows for efficient pattern search with time independent of the sequence length.  ...  Second, our storage scheme improves the access pattern and reduces the number of page fault resulting in efficient search retrieval and efficient tree traversal operations.  ...  Introduction Suffix tree is an important data structure for indexing text string since it can answer pattern searching query efficiently independent of the text string size.  ... 
doi:10.1109/icde.2007.369009 dblp:conf/icde/WongSW07 fatcat:6uqq6wfjtje5rkdunxbwkjk3zq

A Hardware-Efficient Pattern Matching Architecture Using Process Element Tree for Deep Packet Inspection

Seongyong AHN, Hyejeong HONG, HyunJin KIM, Jin-Ho AHN, Dongmyong BAEK, Sungho KANG
2010 IEICE transactions on communications  
The proposed pattern matching architecture detects the start point of pattern matching from multi-character input using input text alignment.  ...  This paper proposes a new pattern matching architecture with multi-character processing for deep packet inspection.  ...  When the first substring of a pattern is s_1 = {c_1, c_2, ···, c_n} with process width n, the substring is searched for in the input text T = {t_1, t_2, ···, t_x, x → ∞}. c_x and t_x represent a character  ... 
doi:10.1587/transcom.e93.b.2440 fatcat:unoo26b6qnakrm6p65cjhsy2ge

I/O-Efficient Compressed Text Indexes: From Theory to Practice

Sheng-Yuan Chiu, Wing-Kai Hon, Rahul Shah, Jeffrey Scott Vitter
2010 2010 Data Compression Conference  
Databases supporting full-text indexing functionality on text data are now widely used by biologists.  ...  Pattern matching on text data has been a fundamental field of Computer Science for nearly 40 years.  ...  For the pattern queries, apart from testing patterns with different lengths, we also test patterns selected from the following sets: 1) Set-0: Patterns with no occurrences in the text. 2) Set-1000: Patterns  ... 
doi:10.1109/dcc.2010.45 dblp:conf/dcc/ChiuHSV10 fatcat:xwg4nqpehrbfbjcsv44fe7sbru

Large-Scale Pattern Search Using Reduced-Space On-Disk Suffix Arrays [article]

Simon Gog, Alistair Moffat, J. Shane Culpepper, Andrew Turpin, Anthony Wirth
2013 arXiv   pre-print
The suffix array is an efficient data structure for in-memory pattern search.  ...  Experiments using 64 GB of English web text and a laptop computer with just 4 GB of main memory demonstrate the speed and versatility of the new approach.  ...  tree to the (full) first suffix string of the block.  ... 
arXiv:1303.6481v1 fatcat:soh5dytslfezjffzutf6lwfxbi

Efficient indexing algorithms for approximate pattern matching in text

Matthias Petri, J. Shane Culpepper
2012 Proceedings of the Seventeenth Australasian Document Computing Symposium on - ADCS '12  
Efficient solutions to approximate pattern matching can be applied to natural language keyword queries with spelling mistakes, OCR scanned text incorporated into indexes, language model ranking algorithms  ...  Approximate pattern matching is an important computational problem with a wide variety of applications in Information Retrieval.  ...  The approximate pattern matching problem can be defined as follows: LOCATE, COUNT, or EXTRACT all occurrences of pattern P of length m in a text T of size n with at most k errors.  ... 
doi:10.1145/2407085.2407087 dblp:conf/adcs/PetriC12 fatcat:n5wzkfhvxrhwdmaxr7brmc4xxi

FedSearch: Efficiently Combining Structured Queries and Full-Text Search in a SPARQL Federation [chapter]

Andriy Nikolov, Andreas Schwarte, Christian Hütter
2013 Lecture Notes in Computer Science  
Combining structured queries with full-text search provides a powerful means to access distributed linked data.  ...  However, executing hybrid search queries in a federation of multiple data sources presents a number of challenges due to data source heterogeneity and lack of statistical data about keyword selectivity  ...  -Queries include different proportion of full-text vs graph clauses: 2 queries are full-text only, 2 queries are hybrid with 1 full-text search clause, and 2 queries are hybrid with 2 full-text search  ... 
doi:10.1007/978-3-642-41335-3_27 fatcat:tt7gztkmsjg5zbypo2rbgf45ta