Data Fusion for Japanese Term and Character N-gram Search
2015
Proceedings of the 20th Australasian Document Computing Symposium - ADCS '15
In this study, we explore data fusion techniques to answer the following question: if there are multiple ranked lists of documents from both word and n-gram indexes, can we improve overall effectiveness ...
The alternative approach to indexing a segmented collection is n-gram search, where every n-length sequence of symbols is indexed. ...
The minimum and maximum length of index terms is 1 and 20 for both cases. The number of unique terms in the 1-gram, 2-gram and 3-gram indexes for NTCIR7 are 5,431, 860,181 and 8,225,898. ...
doi:10.1145/2838931.2838939
dblp:conf/adcs/YasukawaCS15
fatcat:a3bdwzns2rh45brzojnzvoiysu
RICOH at NTCIR-2
2001
NTCIR Conference on Evaluation of Information Access Technologies
The system features (1) hybrid retrieval using a combination of n-gram indexing and word-based document ranking, (2) word-based and n-gram-based query expansion, (3) a modified version of the Okapi's probabilistic ...
Of the eight runs, four runs use the title field only and the other four use the description field only. ...
We tried both word-based expansion and n-gram-based expansion. ...
dblp:conf/ntcir/OgawaM01
fatcat:hxk4sdggizhalc5zuhykhv5auu
Character n-Gram Spotting in Document Images
2011
2011 International Conference on Document Analysis and Recognition
In the retrieval phase, the query word is expanded to its constituent n-grams, which are used to query the previously built index. ...
The character n-grams are represented in a visual-feature space and indexed for quick retrieval. ...
Thus, character n-gram spotting encompasses both OCR and word-spotting approaches, and augments them by evidence from matching n-grams. ...
doi:10.1109/icdar.2011.191
dblp:conf/icdar/PraveenSJ11
fatcat:h546noqntzbwpl3fjncsoierym
Experiments in the Retrieval of Unsegmented Japanese Text at the NTCIR-2 Workshop
2001
NTCIR Conference on Evaluation of Information Access Technologies
Our work with the Hopkins Automated Information Retriever for Combing Unstructured Text (HAIRCUT) system has made use of overlapping character n-grams in the indexing and retrieval of text. ...
We found that 6-grams performed comparably with English words and that 2-grams and 3-grams perform equally well in Japanese text. ...
Routinely, overlapping character n-grams and simple words are used as indexing terms. ...
dblp:conf/ntcir/McNamee01
fatcat:c6qaboh64ncdfgc7phhvh4v3dm
Effective Translation, Tokenization and Combination for Cross-Lingual Retrieval
[chapter]
2005
Lecture Notes in Computer Science
Second, effective combination methods allow us to combine the best of different strategies. ...
Currently at Archives and Information Studies, Faculty of Humanities, University of Amsterdam. ...
For Finnish, Split+stem indicates that compounds are split, where we stem the words and compound parts. n-Grams: both topic and document words are n-grammed, using the settings discussed in Section 2. ...
doi:10.1007/11519645_12
fatcat:ufe26nid4fg6fax52qim2fpc6a
The HAIRCUT System at TREC-9
2000
Text Retrieval Conference
The result is a stream of blank-separated words. When using n-grams we construct indexing terms from the same sequence of words. ...
We did not segment the text, and instead elected to index the documents using both 2- and 3-grams. ...
We remain open to the possibility that other techniques may be better still, for example, using both 2-grams and 3-grams, or 2-grams and segmented words. ...
dblp:conf/trec/McNameeMP00
fatcat:mkpayhdhknhgjejbq7ljkiv5tq
CoLesIR at CLEF 2007: from English to French via Character N-Grams
2007
Conference and Labs of the Evaluation Forum
As in their original proposal, our work is based on the direct translation of character n-grams, avoiding in this way the need for word normalization during indexing or translation, and also dealing with ...
Nevertheless, in contrast with the original approach, our proposal is much faster and transparent, making extensive use of freely available resources. ...
Acknowledgments This research has been partially funded by the European Union (FP6-045389), Ministerio de Educación y Ciencia and FEDER (TIN2004-07246-C03 and HUM2007-66607-C04), and Xunta de Galicia ( ...
dblp:conf/clef/VilaresOF07
fatcat:4y3hbk6iurbfng3czg5itspxie
Combination Approaches in Korean Information Retrieval: Words vs. n-grams, and Query Translation vs. Document Translation
2006
International Journal of Computer Processing Of Languages
In combining words and n-grams, we concentrate on generating several ranked lists showing different retrieval characteristics on word and n-gram indexes by incorporating feedback schemes. ...
For monolingual information retrieval, we use a combination strategy that integrates words and n-grams at the ranked list level. ...
For example, both words and n-grams are collected from documents to create a single index. ...
doi:10.1142/s0219427906001463
fatcat:5h5ql5e6xbdw5ax7oukc4fy5g4
Ternary Tree Optimalization for n-gram Indexing
2014
Databases, Texts, Specifications, Objects
N-gram indexing is used in many practical applications, such as spam detection, plagiarism detection, or comparison of DNA reads. ...
Efficiency of ternary forest is tested and compared to ternary search tree and two-level indexing ternary search tree. ...
The stored root index of the n-gram tree is used to find the node with index 3. The search is done again in the word tree with index 2, and the last node in the n-gram tree is found. ...
dblp:conf/dateso/RobenekPS14
fatcat:fqxmdkga3fetlp652bufdqrl3e
Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files
1993
Very Large Data Bases Conference
In this paper we describe how to use a compressed inverted file index to search such a lexicon for entries that match a pattern or partially specified term. ...
The pattern search method is based on text indexing techniques and is a successful adaptation of inverted files to main memory databases. ...
Acknowledgments We would like to thank Abe Bookstein, Andrew ~IIIIII~, Alan Kent, and Ihmi Klrin for their advice and helpful discussion. ...
dblp:conf/vldb/ZobelMS93
fatcat:efaevb6opfaufmb3drzhhw335q
Scalable Multilingual Information Access
[chapter]
2003
Lecture Notes in Computer Science
In particular, we investigate the use of character n-grams for monolingual retrieval, pre-translation expansion as a technique to mitigate errors due to limited translation resources, and translation of ...
The third Cross-Language Evaluation Forum workshop (CLEF-2002) provides the unprecedented opportunity to evaluate retrieval in eight different languages using a uniform set of topics and assessment methodology ...
Methodology For the monolingual tasks we created sixteen indexes, a word and an n-gram (n=6) index for each of the eight languages. ...
doi:10.1007/978-3-540-45237-9_17
fatcat:6wihlivr3jf4zkcx4u3puphjey
Comparison of Word and Subword Indexing Techniques for Mandarin Chinese Spoken Document Retrieval
[chapter]
2001
Lecture Notes in Computer Science
In this paper, we investigate the use of words and subwords (including both characters and syllables) in audio indexing for Mandarin Chinese spoken document retrieval. ...
Two retrieval approaches, including the well-known vector space model approach and the newly proposed HMM/N-gram-based approach, are used in the present work. ...
Subword-level Indexing Using The HMM/N-gram-based Model The retrieval results obtained when the HMM/N-gram-based retrieval approach was applied are shown in Table 3. ...
doi:10.1007/3-540-45453-5_78
fatcat:kzd5sdmzzbfxdc5s5ifpeztnqu
On the use of words and n-grams for Chinese information retrieval
2000
Proceedings of the fifth international workshop on Information retrieval with Asian languages - IRAL '00
Words and n-grams have been used as indexes in several previous studies, which showed that both kinds of indexes lead to comparable IR performances. ...
In this study, we carry out more experiments on different ways to segment documents and queries, and to combine words with n-grams. ...
Instead of using words, n-grams may also be used as indexes. One may use only bi-grams. ...
doi:10.1145/355214.355235
dblp:conf/iral/NieGZZ00
fatcat:ln25xi4ngbdopo4iqsuvpqxf5u
Single n-gram stemming
2003
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03
We demonstrate that selection of a single n-gram as a pseudo-stem for a word can be an effective and efficient language-neutral approach for some languages. ...
Character n-gram tokenization achieves many of the benefits of stemming in a language independent way, but its use incurs a performance penalty. ...
Automatic selection of n-gram length and the use of two or more n-grams for some words are other potentially fruitful directions. ...
doi:10.1145/860500.860528
fatcat:nyqybzoftrevpmadj3s23hd53a
Single n-gram stemming
2003
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03
We demonstrate that selection of a single n-gram as a pseudo-stem for a word can be an effective and efficient language-neutral approach for some languages. ...
Character n-gram tokenization achieves many of the benefits of stemming in a language independent way, but its use incurs a performance penalty. ...
Automatic selection of n-gram length and the use of two or more n-grams for some words are other potentially fruitful directions. ...
doi:10.1145/860435.860528
dblp:conf/sigir/MayfieldM03
fatcat:kimj6rjgwbajhl5gk4aovy54aq