
Block addressing indices for approximate text retrieval

Ricardo Baeza-Yates, Gonzalo Navarro
1997 Proceedings of the sixth international conference on Information and knowledge management - CIKM '97  
Despite their existence, little is known about the expected behavior of these "block addressing" indices, and even less is known when it comes to cope with approximate search.  ...  Our main contribution is an analytical study of the space-time trade-offs for indexed text searching. We study the space overhead and retrieval times as functions of the block size.  ...  Conclusions and Future Work We focused on the problem of block addressing for approximate word retrieving indices.  ... 
doi:10.1145/266714.266719 dblp:conf/cikm/Baeza-YatesN97 fatcat:f5kzuqtcmrfjtnyjt2ey6lntbm

Block addressing indices for approximate text retrieval

Ricardo Baeza-Yates, Gonzalo Navarro
2000 Journal of the American Society for Information Science  
Despite their existence, little is known about the expected behavior of these "block addressing" indices, and even less is known when it comes to cope with approximate search.  ...  Our main contribution is an analytical study of the space-time trade-offs for indexed text searching. We study the space overhead and retrieval times as functions of the block size.  ...  Conclusions and Future Work We focused on the problem of block addressing for approximate word retrieving indices.  ... 
doi:10.1002/(sici)1097-4571(2000)51:1<69::aid-asi10>3.0.co;2-c fatcat:cvswmhcyenalblmmt2i65e6yea

The Sequoia 2000 Electronic Repository

Ray R. Larson, Christian Plaunt, Allison Woodruff, Marti A. Hearst
1995 Digital technical journal of Digital Equipment Corporation  
The highest resulting sums indicate which documents should be retrieved.  ...  We chose this approach rather than adopting a separate retrieval system for full-text indexing and retrieval for the following reasons: 1.  ... 
dblp:journals/dtj/LarsonPWH95 fatcat:oe5jsg3ywfckbb6tcy6emv2kle

Approximate textual retrieval [article]

Pere Constans
2007 arXiv   pre-print
An approximate textual retrieval algorithm for searching sources with high levels of defects is presented.  ...  This procedure reduces the probability of missed occurrences due to source defects, yet diminishes the retrieval of irrelevant, non-contextual occurrences.  ...  APPROXIMATE TEXTUAL RETRIEVAL ALGORITHM Let T be a text document constituted by a sequence t 1 t 2 ... of words, which, in turn, are sequences of characters over an alphabet Σ.  ... 
arXiv:0705.0751v1 fatcat:joef4npj3vc6hnto5rfbpewofa

A Generic Genetic Algorithm to Automate an Attack on Classical Ciphers

Anukriti Dureha, Arashdeep Kaur
2013 International Journal of Computer Applications  
The algorithm proposed in this paper aspires to address such issues.  ...  While numerous algorithms have been proposed to automate this process for variegated ciphers, these approaches are yet isolated from each other.  ...  Whereas, for cipher-text length=197, 64.97% of correct letters were retrieved with key-length=6, while 80.71% of correct letters were retrieved with key-length=19. These results indicate that if the  ... 
doi:10.5120/10687-5588 fatcat:pngoquvegfgqnjoogmin4aeyha

Subtopic structuring for full-length document access

Marti A. Hearst, Christian Plaunt
1993 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '93  
that comprise the text.  ...  describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval  ...  Acknowledgments The authors would like to thank Bill Cooper and Ray Larson for their help and encouragement, and David Lewis and two anonymous reviewers for their suggestions of improvements to this paper  ... 
doi:10.1145/160688.160695 dblp:conf/sigir/HearstP93 fatcat:4skenl335jdrfgdvgpousbguta

Real-time information retrieval from Identity cards [article]

Niloofar Tavakolian, Azadeh Nazemi, Donal Fitzpatrick
2020 arXiv   pre-print
Information is frequently retrieved from valid personal ID cards by the authorised organisation to address different purposes.  ...  The experimental results of this research prove that utilising the methods based on deep learning, such as Efficient and Accurate Scene Text (EAST) detector and Deep Neural Network (DNN) for face detection  ...  The following block diagram (Figure 1) illustrates the overview of ID information retrieval. Fig. 1: The overview block diagram of information retrieval from an identity passport page II.  ... 
arXiv:2003.12103v1 fatcat:vdtuvju7nzgqvj3ihyyunsrx44

Block-LDA: Jointly modeling entity-annotated text and entity-entity links [chapter]

Ramnath Balasubramanyan, William W. Cohen
2011 Proceedings of the 2011 SIAM International Conference on Data Mining  
For the article retrieval task, the model trained with the text + MIPS resulted in the higher mean precision@10 whereas for the protein retrieval task, the text + Wetlab PPI dataset returned a higher mean  ...  This indicates that the latent block structure in the links is beneficial while shaping latent topics from text. Table 13.1.  ... 
doi:10.1137/1.9781611972818.39 dblp:conf/sdm/BalasubramanyanC11 fatcat:3punzsvsbbcdfnpwd6m6exvnim

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound [article]

Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius
2022 arXiv   pre-print
We introduce an audiovisual method for long-range text-to-video retrieval.  ...  Unlike previous approaches designed for short video retrieval (e.g., 5-15 seconds in duration), our approach aims to retrieve minute-long videos that capture complex human actions.  ...  Our results indicate that video retrieval performance decreases when we use fewer audiovisual attention blocks.  ... 
arXiv:2204.02874v3 fatcat:rxkjg5r22zgxbp24kuvqq2lvfi

Web2Text: Deep Structured Boilerplate Removal [article]

Thijs Vogels, Octavian-Eugen Ganea, Carsten Eickhoff
2018 arXiv   pre-print
To address this issue, we introduce a novel model that performs sequence labeling to collectively classify all text blocks in an HTML page as either boilerplate or main content.  ...  Web pages are a valuable source of information for many natural language processing and information retrieval tasks.  ...  In total, we collect 128 features for each text block, e.g.  ... 
arXiv:1801.02607v3 fatcat:wha5oi5hubcurnpddtggzegziy

Decipherment of Substitution Cipher using Enhanced Probability Distribution

Apparao Naidu G, Bhadri MSVS Raju, Vishnu Vardhan B, Pratap Reddy L
2010 International Journal of Computer Applications  
The retrieved efficiency of cipher text only attack on samples of English, Hindi, Telugu, Kannada is presented in this paper.  ...  However the amount of confusion and diffusion in terms of statistical distribution parameters between message and cipher text is a point of interest for cryptanalyst.  ...  A decipherment model is proposed for retrieving the plain text from cipher text using the above knowledge.  ... 
doi:10.5120/958-1335 fatcat:2cslaiz62nekfewa24dgtahyp4

Research on Search Method Based on Data Segmentation of Related Attributes

Zhong-wen QIAN, Jian-son ZHANG, Xiang WU, Xiao-ming JU
2018 DEStech Transactions on Computer Science and Engineering  
Combining the text associated attribute word set and the text word vector space model, the keyword sets which are required for constructing the text index is selected.  ...  Experimental analysis shows that the search method constructed in this paper can achieve effective text search and restoration, and is suitable for text storage and search in complex cloud environments  ...  The flag of MF indicates block status, MF=0, indicating that there are still blocks afterwards, MF=1 indicates tail block, and merge integrity constraint [15].  ... 
doi:10.12783/dtcse/ccnt2018/24678 fatcat:5xw4gopgczdu3kybxwgkqhoie4

Title extraction from bodies of HTML documents and its application to web page retrieval

Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Shuming Shi, Yunbo Cao, Hang Li
2005 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05  
As application, we consider web page retrieval. We use the TREC Web Track data for evaluation. We propose a new method for HTML documents retrieval using extracted titles.  ...  In this paper, we take a supervised machine learning approach to address the problem. We propose a specification on HTML titles.  ...  ACKNOWLEDGMENTS We thank Dmitriy Meyerzon, Ming Zhou, and Wei-Ying Ma for their encouragements and supports.  ... 
doi:10.1145/1076034.1076079 dblp:conf/sigir/HuXSHSCL05 fatcat:d5o32sdrlfcgln3d23hhozw2ca

Approximate String Matching [chapter]

Gonzalo Navarro
2014 Encyclopedia of Algorithms  
Two-level Text Retrieval: Block addressed inverted files. Idea used in PIRS (Personal Information Retrieval System) [Wu and Manber 1993]. The  ...  text is divided in 256 blocks of the same size. An inverted file of all the different words of the text is built. Each entry indicates only the blocks where the word appears (1 byte per block). First  ... 
doi:10.1007/978-3-642-27848-8_363-2 fatcat:ygysi27jpvga5bk2ju6cgbgqfa
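
The block-addressed inverted file described in the snippet above can be illustrated with a minimal Python sketch. It is not the implementation from the cited work: the function names `build_block_index` and `search`, the use of word positions rather than byte offsets to delimit blocks, and the sample sentence are all assumptions made for illustration. The index stores only block numbers for each distinct word (with at most 256 blocks, each number fits in one byte), so a query first looks up the candidate blocks and then scans only those blocks sequentially.

```python
def build_block_index(words, num_blocks=256):
    """Map each distinct word to the set of block numbers where it appears.

    Blocks are equal-sized runs of word positions (an assumption of this
    sketch); with num_blocks <= 256 each block number fits in one byte.
    """
    block_size = max(1, -(-len(words) // num_blocks))  # ceiling division
    index = {}
    for pos, word in enumerate(words):
        index.setdefault(word, set()).add(pos // block_size)
    return index, block_size


def search(words, index, block_size, query):
    """Return the word positions of `query`, scanning only candidate blocks."""
    hits = []
    for block in sorted(index.get(query, ())):
        start = block * block_size
        end = min(start + block_size, len(words))
        # The index only says "somewhere in this block", so scan the block.
        for pos in range(start, end):
            if words[pos] == query:
                hits.append(pos)
    return hits


# Hypothetical sample text, split into words.
sample = "the quick brown fox jumps over the lazy dog the end".split()
index, block_size = build_block_index(sample, num_blocks=4)
positions = search(sample, index, block_size, "the")
```

Larger blocks shrink the index but lengthen the sequential scan of each candidate block, which is the space-time trade-off that block addressing exposes.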

Document warehousing based on a multimedia database system

H. Ishikawa, K. Kubota, Y. Noguchi, K. Kato, M. Ono, N. Yoshizawa, Y. Kanemasa
1999 Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)  
So we need a document warehouse as a software framework where multimedia documents are analyzed and managed for corporate-wide information sharing and reuse like a data warehouse for structured data.  ...  Further, unstructured data such as emails, html texts, images, videos, and office documents are increasingly accumulated in personal computer storage due to spread of mailing, WWW, and word processing.  ...  We prefer the recall ratio to the precision ratio of keyword-based retrieval. Relatively-addressed links (i.e., URL) are transformed to absolutely-addressed links.  ... 
doi:10.1109/icde.1999.754921 dblp:conf/icde/IshikawaKNKOYK99 fatcat:zfqpfjc65zadtnlacvz7ivp5pm