Character N-Grams Translation in Cross-Language Information Retrieval [chapter]

Jesús Vilares, Michael P. Oakes, Manuel Vilares
Lecture Notes in Computer Science  
This paper describes a new technique for the direct translation of character n-grams for use in Cross-Language Information Retrieval systems.  ...  Our proposal also tries to achieve a higher speed during the n-gram alignment process with respect to previous approaches.  ...  Conclusions and Future Work This paper describes a system for character n-gram-level alignment in a parallel corpus and its use for direct translation of character n-grams in Cross-Language Information  ... 
doi:10.1007/978-3-540-73351-5_19 fatcat:oa2o6ezt5fdurftydog7vmcmmu

Translating pieces of words

Paul McNamee, James Mayfield
2005 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05  
Translation for cross-language information retrieval need not be word-based. We show that character n-grams in one language can be 'translated' into character n-grams of another language.  ...  We demonstrate that such translations produce retrieval results on par with, and often exceeding, those of word-based and stem-based translation.  ...  In this study we use character n-grams.  ... 
doi:10.1145/1076034.1076169 dblp:conf/sigir/McNameeM05 fatcat:znoqeng6jrfitgnhduva4uzaey

Language-Dependent and Language-Independent Approaches to Cross-Lingual Text Retrieval [chapter]

Jaap Kamps, Christof Monz, Maarten de Rijke, Börkur Sigurbjörnsson
2004 Lecture Notes in Computer Science  
n-gramming.  ...  We look at four different cross-lingual information retrieval tasks: monolingual, bilingual, multilingual, and domain-specific retrieval.  ...  Character n-grams are an old technique for improving retrieval effectiveness. An excellent overview of n-gramming techniques for cross-lingual information retrieval is given in [13].  ... 
doi:10.1007/978-3-540-30222-3_14 fatcat:cte52lxclndylkv7hk7g2mtvai

Effective Translation, Tokenization and Combination for Cross-Lingual Retrieval [chapter]

Jaap Kamps, Sisay Fissaha Adafre, Maarten de Rijke
2005 Lecture Notes in Computer Science  
Our approach to cross-lingual document retrieval starts from the assumption that effective monolingual retrieval is at the core of any cross-language retrieval system.  ...  Finally, effective translation methods for translating queries or documents turn a monolingual retrieval system into a cross-lingual retrieval system proper.  ...  We retain the original compound words, and add their parts to the documents; queries are processed similarly. n-Gramming For all languages, we used character n-gramming to index all character-sequences  ... 
doi:10.1007/11519645_12 fatcat:ufe26nid4fg6fax52qim2fpc6a

Combination Approaches in Korean Information Retrieval: Words vs. n-grams, and Query Translation vs. Document Translation

In-Su Kang, Seung-Hoon Na, Jong-Hyeok Lee
2006 International Journal of Computer Processing of Languages  
For cross-language information retrieval, we attempt a dictionary-based bi-directional combination of query translation and document translation.  ...  In combining words and n-grams, we concentrate on generating several ranked lists showing different retrieval characteristics on word and n-gram indexes by incorporating feedback schemes.  ...  Monolingual Information Retrieval Coupling Words and N-grams Table 1 shows various stages of coupling words and n-grams in a retrieval system.  ... 
doi:10.1142/s0219427906001463 fatcat:5h5ql5e6xbdw5ax7oukc4fy5g4

The University of Amsterdam at the CLEF Cross Language Speech Retrieval Track 2007

Bouke Huurnink
2007 Conference and Labs of the Evaluation Forum  
In this paper we present the contents of the University of Amsterdam submission in the CLEF Cross Language Speech Retrieval 2007 English task.  ...  We describe the effects of using character n-grams and field combinations on both monolingual English retrieval, and crosslingual Dutch to English retrieval.  ...  Character n-Gram Experiments Character n-gram tokenisation has been shown to boost retrieval in certain situations [2], such as retrieval from English newspapers [1].  ... 
dblp:conf/clef/Huurnink07 fatcat:squwhc46b5c3bf3knfbgd7txme

Japanese-Chinese Cross-Language Information Retrieval: An Interlingua Approach

Md Maruf Hasan, Yuji Matsumoto
2000 International Journal of Computational Linguistics and Chinese Language Processing  
In this paper, we propose a Han Character (Kanji) oriented Interlingua model of indexing and retrieving Japanese and Chinese information.  ...  We report the results of mono- and cross-language information retrieval on a Kanji space where documents and queries are represented in terms of Kanji oriented vectors.  ...  Comparison of experimental results in monolingual IR using single character indexing, n-gram character indexing and (segmented) word indexing in Chinese information retrieval is reported in [19, 30, 31  ... 
dblp:journals/ijclclp/HasanM00 fatcat:vjj645dn3bdcvicuayagxv2ybm

Clairvoyance CLEF-2003 Experiments [chapter]

Yan Qu, Gregory Grefenstette, David A. Evans
2004 Lecture Notes in Computer Science  
In CLEF 2003, Clairvoyance participated in the bilingual retrieval track with the German and Italian language pair.  ...  The translated Italian topics and the document collections were indexed using three different kinds of units: (1) linguistically meaningful units, (2) character 6-grams, and (3) a combination of 1 and  ...  CLARIT Cross-Language Information Retrieval In CLEF 2003, we adopted query translation as the means for bridging the language gap between the query language and the document language for cross-language  ... 
doi:10.1007/978-3-540-30222-3_22 fatcat:lvc6ror3wbexjjayzn5gmkhf7a

Statistical transliteration for English-Arabic cross language information retrieval

Nasreen AbdulJaleel, Leah S. Larkey
2003 Proceedings of the twelfth international conference on Information and knowledge management - CIKM '03  
Out of vocabulary (OOV) words are problematic for cross language information retrieval.  ...  We call this a selected n-gram model because a two-stage training procedure first learns which n-gram segments should be added to the unigram inventory for the source language, and then a second stage  ...  INTRODUCTION Out of vocabulary (OOV) words are a common source of errors in cross language information retrieval (CLIR).  ... 
doi:10.1145/956863.956890 dblp:conf/cikm/AbdulJaleelL03 fatcat:apleye7mdvglrni3ttxwldcbgy

Experiments in the Retrieval of Unsegmented Japanese Text at the NTCIR-2 Workshop

Paul McNamee
2001 NTCIR Conference on Evaluation of Information Access Technologies  
Our work with the Hopkins Automated Information Retriever for Combing Unstructured Text (HAIRCUT) system has made use of overlapping character n-grams in the indexing and retrieval of text.  ...  This paper describes results in monolingual Japanese and English retrieval and in cross-language retrieval using each language as a source language for the other.  ...  In later work using the BMIR-J2 collection [11] , they investigated 'character-class' n-grams, where certain n-grams are ignored, in particular, n-grams containing hiragana were discarded.  ... 
dblp:conf/ntcir/McNamee01 fatcat:c6qaboh64ncdfgc7phhvh4v3dm

Integrated Information Access Technology for Digital Libraries: Access across Languages, Periods, and Cultures [chapter]

Biligsaikhan Batjargal, Garmaabazar Khaltarkhuu, Fuminori Kimura, Akira Mae
2011 Digital Libraries - Methods and Applications  
We divide the archaic sentences into N-grams and treat those N-grams as archaic words. An N-gram is a sequence of N characters from a given string.  ...  We repeat this shifting-and-extracting process until the Nth character in the N-gram is the last character of the target string.  ... 
doi:10.5772/14432 fatcat:7rmdya33x5dahea7xshudpggs4
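
The shifting-and-extracting process described in the entry above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `char_ngrams` and the sample string are hypothetical and were introduced here only for the example.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping character n-grams of `text`.

    A window of width n is shifted one character at a time; extraction
    stops once the window's last character is the last character of the
    string. Strings shorter than n yield an empty list.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # The last valid start index is len(text) - n, hence the + 1 in range().
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical sample input, chosen only for illustration.
print(char_ngrams("retrieval", 3))
# ['ret', 'etr', 'tri', 'rie', 'iev', 'eva', 'val']
```

A string of length m yields m - n + 1 overlapping n-grams, so the 9-character sample above produces seven 3-grams.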

Mandarin–English Information (MEI): investigating translingual speech retrieval

Helen M. Meng, Berlin Chen, Sanjeev Khudanpur, Gina-Anne Levow, Wai-Kit Lo, Douglas Oard, Patrick Schone, Karen Tang, Hsin-min Wang, Jianqiang Wang
2004 Computer Speech and Language  
Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to our problem, which unifies the use of phrases, words and subwords in retrieval.  ...  This paper describes the Mandarin-English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR), and developed one of the first English-Chinese  ...  Acknowledgements The MEI project was conducted during the Johns Hopkins University Summer Workshop 2000 (an NSF Workshop). We thank Erika Grams from Advanced Analytic Tools for her active participation  ... 
doi:10.1016/j.csl.2003.09.003 fatcat:ixr4gkjaqrhs7b7gkrabq7p3ca

Exploring New Languages with HAIRCUT at CLEF 2005 [chapter]

Paul McNamee
2006 Lecture Notes in Computer Science  
JHU/APL has long espoused the use of language-neutral methods for cross-language information retrieval.  ...  We found that character n-grams remain an attractive option for representing documents and queries in these new languages.  ...  In addition to the use of character n-gram tokenization we make use of a statistical language model of retrieval and combination of evidence from multiple retrievals.  ... 
doi:10.1007/11878773_17 fatcat:ugsqm4uofzbo3kixvk2pswsszy

Scalable Multilingual Information Access [chapter]

Paul McNamee, James Mayfield
2003 Lecture Notes in Computer Science  
In particular, we investigate the use of character n-grams for monolingual retrieval, pre-translation expansion as a technique to mitigate errors due to limited translation resources, and translation of  ...  The third Cross-Language Evaluation Forum workshop (CLEF-2002) provides the unprecedented opportunity to evaluate retrieval in eight different languages using a uniform set of topics and assessment methodology  ...  Figure 1. Comparing words and character n-grams (n=6) by language.  ... 
doi:10.1007/978-3-540-45237-9_17 fatcat:6wihlivr3jf4zkcx4u3puphjey