
Incremental Text Indexing for Fast Disk-Based Search

Giorgos Margaritis, Stergios V. Anastasiadis
2014 ACM Transactions on the Web  
Incremental text indexing for fast disk-based search.  ...  For the support of fast search over disk-based storage, we take a fresh look at incremental text indexing in the context of current architectural features.  ...  Permission to use the ClueWeb09 Text Research Collections has been granted by Organization Agreement with Carnegie Mellon University.  ... 
doi:10.1145/2560800 fatcat:bdwpapill5g6lawoih67ojljaa

Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems

Andrew W. Leung, Minglong Shao, Timothy Bisson, Shankar Pasupathy, Ethan L. Miller
2009 USENIX Conference on File and Storage Technologies  
A novel index versioning mechanism provides both fast index updates and "back-in-time" search of metadata.  ...  Signature files are used to significantly reduce a query's search space, improving performance and scalability. Snapshot-based metadata collection allows incremental crawling of only modified files.  ...  We thank the industrial affiliates of the SSRC for their support.  ... 
dblp:conf/fast/LeungSBPM09 fatcat:yjsikzhp3ja6xp3yunpcv6sqcu

A Very Fast Algorithm for Detecting Partially Plagiarized Documents Using FM-Index

Chang SeokOck, JongKyu Seo, Sung-Hwan Kim, Hwan-Gue Cho
2013 International Journal of Computer and Communication Engineering  
The method is based on the Burrows-Wheeler Transform (BWT) and the FM-index for BWT search.  ...  We use disk-based techniques and Genome assembly used in Next Generation Sequencing (NGS) to overcome this disadvantage.  ...  disk-based BWT.  Minimizing the time and space complexity of the search using FM-index.  Detecting similar sections using the incremental density method.  ... 
doi:10.7763/ijcce.2013.v2.194 fatcat:lvt5msu6gnbz5jnwrlmfr3m4aq

String algorithms and data structures [article]

Paolo Ferragina
2008 arXiv   pre-print
This survey is aimed at illustrating the key ideas which should constitute, in our opinion, the current background of every index designer.  ...  We also discuss the positive features and drawbacks of known indexing schemes and algorithms, and devote much attention to detail research issues and open problems both on the theoretical and the experimental  ...  I finally thank Valentina Ciriani and Giovanni Manzini for carefully reading and commenting on the preliminary versions of this survey.  ... 
arXiv:0801.2378v1 fatcat:2subtyqbm5hkpefrsio33ioktm

Indexing time vs. query time

Stefan Büttcher, Charles L. A. Clarke
2005 Proceedings of the 14th ACM international conference on Information and knowledge management - CIKM '05  
Two aspects of the retrieval system (fast, incremental updates and garbage collection for delayed document deletions) are discussed in detail, with a focus on the respective trade-offs.  ...  Special attention is given to a particular case of dynamic search systems: desktop and file system search.  ...  It is similar to the incremental indexing scheme used by Lucene.  ... 
doi:10.1145/1099554.1099645 dblp:conf/cikm/ButtcherC05 fatcat:bz45mu7renbmziryddjexbqxxi

An Intelligent Backend System for Text Processing Applications

Hans Diel, Horst Schukat
1988 Open research Areas in Information Retrieval  
This paper describes concepts and design of a backend system suitable for text processing applications.  ...  It should enable the formulation of powerful high level query requests which are well suited, for example, to full-text databases and office systems.  ...  ACKNOWLEDGEMENTS The design and prototyping of the intelligent disk controller have been done in a joint project between IBM Germany and the Technical University of Braunschweig, Germany.  ... 
dblp:conf/riao/DielS88 fatcat:roxno6b6jzccpf7hej3bbpr2eq

On the Feasibility of Peer-to-Peer Web Indexing and Search [chapter]

Jinyang Li, Boon Thau Loo, Joseph M. Hellerstein, M. Frans Kaashoek, David R. Karger, Robert Morris
2003 Lecture Notes in Computer Science  
This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web.  ...  The paper presents a number of existing and novel optimizations for P2P search based on distributed hash tables, estimates their effects on performance, and concludes that in combination these optimizations  ...  DHTs are well-suited for exact match lookups using unique identifiers, but do not directly support text search. There have been recent proposals for P2P text search [17, 20, 11, 10] over DHTs.  ... 
doi:10.1007/978-3-540-45172-3_19 fatcat:s5eomcwnrbeolefw4de7wavoxq

Synthetic Workload Performance Analysis of Incremental Updates [chapter]

Kurt Shoens, Anthony Tomasic, Hector Garcia-Molina
1994 SIGIR '94  
The index structure is shown to support rapid insertion of documents, fast queries, and to scale well to large document collections and many disks.  ...  Declining disk and CPU costs have kindled a renewed interest in efficient document indexing techniques.  ...  Acknowledgments: Thanks to Mendel Rosenblum for discussions on file system mechanisms related to this paper.  ... 
doi:10.1007/978-1-4471-2099-5_34 dblp:conf/sigir/ShoensTG94 fatcat:bny2dt7gqzeendmeoo4nlahvvu

Surrogate subsets: a free space management strategy for the index of a text retrieval system

F. J. Burkowski
1990 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '90  
This paper presents a new data structure and an associated strategy to be utilized by indexing facilities for text retrieval systems. The paper starts by reviewing some of the goals that may be considered  ...  when designing such an index and continues with a small survey of various current strategies.  ...  Dennis Ablett of Infomart in Toronto for his help in arranging the acquisition of this text.  ... 
doi:10.1145/96749.98226 dblp:conf/sigir/Burkowski90 fatcat:atvuybsv3vcqlasik3eih2hq6y

q-gram based database searching using a suffix array (QUASAR)

Stefan Burkhardt, Andreas Crauser, Paolo Ferragina, Hans-Peter Lenhof, Eric Rivals, Martin Vingron
1999 Proceedings of the third annual international conference on Computational molecular biology - RECOMB '99  
Two versions were developed, one for a RAM resident suffix array and one for access to the suffix array on disk.  ...  Here we present a new data base searching algorithm dubbed QUASAR (Q-gram Alignment based on Suffix ARrays) which was designed to quickly detect sequences with strong similarity to the query in a  ...  By using an index data structure for all q-grams in D we hope to direct the search for Q towards small portions of D and thus to avoid a scan of the whole data base.  ... 
doi:10.1145/299432.299460 dblp:conf/recomb/BurkhardtCFLRV99 fatcat:bximeplvbbcldal7fj35zg3kiu

Emerging technologies to speed information access

Del Satterthwaite, Marjorie M.K. Hlava
2011 Information Services and Use  
Indexing and search technology has remained essentially unchanged since the 1980s. The industry still relies on these old technologies to access information.  ...  New and emerging technology is available that dramatically increases the speed, precision, and scalability of the index and search solutions we have today.  ...  It also provides scalability by using disk based indexes that allow for markedly larger datasets to be searched than traditional methods and incremental indexing methodologies that allow for quick additions  ... 
doi:10.3233/isu-2010-0616 fatcat:u4xw7fmzzzh7fena64qegc7ffq


DejaView: a personal virtual computer recorder

Oren Laadan, Ricardo A. Baratto, Dan B. Phung, Shaya Potter, Jason Nieh
2007 Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles - SOSP '07  
DejaView records visual output, checkpoints corresponding application and file system state, and captures displayed text with contextual information to index the record.  ...  for interactive use.  ...  DejaView captures displayed text and associates it with visual output to index the display record for searching.  ... 
doi:10.1145/1294261.1294289 dblp:conf/sosp/LaadanBPPN07 fatcat:dqefaa24uja45lctwatmc5avua


DejaView: a personal virtual computer recorder

Oren Laadan, Ricardo A. Baratto, Dan B. Phung, Shaya Potter, Jason Nieh
2007 ACM SIGOPS Operating Systems Review  
DejaView records visual output, checkpoints corresponding application and file system state, and captures displayed text with contextual information to index the record.  ...  for interactive use.  ...  DejaView captures displayed text and associates it with visual output to index the display record for searching.  ... 
doi:10.1145/1323293.1294289 fatcat:6f6p2t63efcldml3bmrcy5d5v4

Modern B-Tree Techniques

Goetz Graefe
2010 Foundations and Trends in Databases  
Flash memory might also remain hidden, perhaps as large and fast virtual memory or as fast disk storage.  ...  With more index-to-index navigation, tuning the set of indexes including automatic incremental index creation, growth, optimization, etc. will come more into focus in future database engines.  ... 
doi:10.1561/1900000028 fatcat:uisqivwiqre4jg2yug4vf6u3ve

A Database Index to Large Biological Sequences

Ela Hunt, Malcolm P. Atkinson, Robert W. Irving
2001 Very Large Data Bases Conference  
Our implementation technique is novel, in that it allows us to build suffix trees on disk for arbitrarily large sequences, for instance for the longest human chromosome consisting of 263 million letters.  ...  We propose to use such indexes as an alternative to the current practice of serial scanning.  ...  It enables arbitrarily large sequences to be indexed and the suffix tree built incrementally on disk.  ... 
dblp:conf/vldb/HuntAI01 fatcat:zx3urj3o45hcfpakr2spefbocy
Showing results 1 to 15 out of 8,139 results