Using Unigram and Bigram Language Models for Monolingual and Cross-Language IR

Lixin Shi, Jian-Yun Nie
2007 NTCIR Conference on Evaluation of Information Access Technologies  
For Japanese and Korean IR, bigrams or a combination of bigrams and unigrams produce the highest effectiveness.  ...  In this paper, we compare the utilization of words and n-grams for both monolingual and cross-lingual IR in these languages.  ...  A similar approach has been used in [5] for CLIR between European languages, in which s_j and t_i are words.  ... 
dblp:conf/ntcir/ShiN07 fatcat:zma4thgsyvg5tesyvvywxmxuxm

Comparing Different Units for Query Translation in Chinese Cross-Language Information Retrieval

Lixin Shi, Jian-Yun Nie, Jing Bai
2007 Proceedings of the 2nd International ICST Conference on Scalable Information Systems  
For cross-language IR with Chinese, word translation has been used in all previous studies. In this paper, we re-examine the use of n-grams and words for monolingual Chinese IR.  ...  For CLIR with Chinese, we investigate the possibility of using bigrams and unigrams as translation units.  ...  A similar approach has been used in [14] for CLIR between European languages, in which s_j and t_i are words. For CLIR with Chinese (as the target language), t_i can either be words or n-grams.  ... 
doi:10.4108/infoscale.2007.932 dblp:conf/infoscale/ShiNB07 fatcat:54uhgulbc5f77azeeyxt73ejq4

Statistical and Comparative Evaluation of Various Indexing and Search Models [chapter]

Samir Abdou, Jacques Savoy
2006 Lecture Notes in Computer Science  
for traditional Chinese.  ...  While no clear conclusion was reached for the Japanese language, the bigram-based indexing strategy seems to be the best choice for Korean, and the combined "unigram & bigram" indexing strategy is best  ...  For the Japanese language, we defined a stopword list of 30 words and another of 20 bigrams, and for Korean our stoplist was composed of 91 bigrams and 85 words.  ... 
doi:10.1007/11880592_28 fatcat:luem5wtnazhbhnxnyzarvx5lbi

Comparative study of monolingual and multilingual search models for use with Asian languages

Jacques Savoy
2005 ACM Transactions on Asian Language Information Processing  
the Chinese, Japanese, Korean, and English languages.  ...  Our second goal is to analyze the relative merits of the various automated and freely available tools to translate the English-language topics into Chinese, Japanese, or Korean, and then submit the resultant  ...  For the Chinese language, for example, we defined and removed a list of the 215 most frequent bigrams; for Japanese, 105 bigrams; and for Korean, 80 bigrams.  ... 
doi:10.1145/1105696.1105701 fatcat:zpiscx4znzgyvfyzqqsxxckniq

Automatic thesaurus for enhanced Chinese text retrieval

Schubert Foo, Siu Cheung Hui, Hong Koon Lim, Li Hui
2000 Library Review  
Asian languages like Japanese, Korean and in particular Chinese, are beginning to gain popularity in the information retrieval (IR) domain.  ...  This paper proposes and describes a process for generating an automatic Chinese thesaurus that can be used to provide related terms to a user's queries to enhance retrieval effectiveness.  ...  Thus, unlike English and other European languages, identifying words out of the Chinese text becomes a difficult task.  ... 
doi:10.1108/00242530010331754 fatcat:6ssoqsuyibe5rmfoc6bmaddzbq

Report on CLIR Task for the NTCIR-5 Evaluation Campaign

Samir Abdou, Jacques Savoy
2005 NTCIR Conference on Evaluation of Information Access Technologies  
Our participation is motivated by four objectives: 1) study the retrieval performances of various IR models for these languages; 2) compare the relative retrieval effectiveness of bigram and automatic  ...  word-segmenting approaches for Chinese and Japanese languages; 3) propose a new blind-query expansion hopefully capable of improving mean average precision; and 4) evaluate the relative performance of the  ...  For the Chinese and Japanese languages we used both the bigram and an automatic word segmentation approach.  ... 
dblp:conf/ntcir/AbdouS05 fatcat:deeskicgkrcpbl7gbnsa7il5om

On the use of words and n-grams for Chinese information retrieval

Jian-Yun Nie, Jiangfeng Gao, Jian Zhang, Ming Zhou
2000 Proceedings of the fifth international workshop on Information retrieval with Asian languages - IRAL '00  
In the processing of Chinese documents and queries in information retrieval (IR), one has to identify the units that are used as indexes.  ...  Words and n-grams have been used as indexes in several previous studies, which showed that both kinds of indexes lead to comparable IR performances.  ...  A closer comparison between Chinese IR and IR in European languages is possible.  ... 
doi:10.1145/355214.355235 dblp:conf/iral/NieGZZ00 fatcat:ln25xi4ngbdopo4iqsuvpqxf5u

Report on CLIR Task for the NTCIR-4 Evaluation Campaign

Jacques Savoy
2004 NTCIR Conference on Evaluation of Information Access Technologies  
freely available translation tools used to translate English-language topics into Chinese, Japanese or Korean; and 3) to evaluate the relative performance of the various merging strategies used to combine  ...  Our project has three objectives: 1) to compare the retrieval performances of eleven IR models used to carry out monolingual retrievals with these languages; 2) to analyze the relative merit of various  ...  Acknowledgments This research was supported by the Swiss NSF (Grants #21-66 742.01 and #20-103420/1).  ... 
dblp:conf/ntcir/Savoy04 fatcat:hbbre7bk6fd25d6unvylyapq6a

Uniform Indexing and Retrieval Scheme for Chinese, Japanese, and Korean

Da-Wei Juang, Yuen-Hsien Tseng
2002 NTCIR Conference on Evaluation of Information Access Technologies  
Based on the n-gram indexing model and a phrase formulation method to extract longer key terms for indexing, no language-dependent modifications were made to apply the system to Japanese and Korean IR.  ...  A Chinese IR system is applied to all document sets in these three languages.  ...  Conclusions An IR system designed for Chinese text retrieval is applied to Japanese and Korean IR task without modifications.  ... 
dblp:conf/ntcir/JuangT02 fatcat:l2gx3chuanclnc5cmiqcwiz4d4

RALI Experiments in IR4QA at NTCIR-7

Lixin Shi, Jian-Yun Nie, Guihong Cao
2008 NTCIR Conference on Evaluation of Information Access Technologies  
In particular, Wikipedia will be exploited for identifying personal names and their translation, as well as biography-related keywords.  ...  To build an IR system for Chinese, the next step is to choose the index unit. Different from most European languages, there is no natural word boundary in Chinese texts.  ...  In Indri retrieval system, we combine the original query term and bigrams as follows: where t_1, …, t_n are the segmented words; 1.0 and w_bi are the respective weight set for words and bigrams.  ... 
dblp:conf/ntcir/ShiNC08 fatcat:26im3fpqtvh5reia63hkfrhygu

Vietnamese Text Retrieval: Test Collection and First Experimentations

Bao-Quoc Ho
2007 NTCIR Conference on Evaluation of Information Access Technologies  
Our experiments have shown how different types of Vietnamese index terms: "tiếng", words, compound words, combination of word and compound word contribute to Vietnamese text processing and retrieval.  ...  We also introduce our Vietnamese test collection on which experimentations have been done and report the method used to construct this test collection.  ...  We are continuing to construct our Vietnamese test collection by adding more topics and modifying the relevance assessments.  ... 
dblp:conf/ntcir/Ho07 fatcat:lo7z5i3jzzdvjbxmzfxjs5cr3m

TREC-9 CLIR Experiments at MSRCN

Jianfeng Gao, Jian-Yun Nie, Jian Zhang, Endong Xun, Yi Su, Ming Zhou, Changning Huang
2000 Text Retrieval Conference  
Our work involved two aspects: finding good methods for Chinese IR, and finding effective translation means between English and Chinese.  ...  Our method incorporates three improvements over the simple lexicon-based translation: (1) word/term disambiguation using co-occurrence, (2) phrase detecting and translation using a statistical language  ...  Kwok for his helpful suggestions, and Aitao Chen for his comments on the paper.  ... 
dblp:conf/trec/GaoNZXSZH00 fatcat:c6msri7weffexby5zwobwrw4tm

Multi-Scale Spoken Document Retrieval for Cantonese Broadcast News

Wai-Kit Lo, Helen M. Meng, P.C. Ching
2004 International Journal of Speech Technology  
Words are basic units in a language that carry lexical meaning and subword units (such as phonemes, syllables or characters) are building components for words.  ...  Multi-scale refers to the use of both words and subwords for retrieval.  ...  Hui and Y.C. Li for their assistance.  ... 
doi:10.1023/b:ijst.0000017020.53797.a0 fatcat:nlzdy2rjlbgkfakec6sy65sy2m

Cross-language spoken document retrieval using HMM-based retrieval model with multi-scale fusion

Wai-Kit Lo, Helen Meng, P. C. Ching
2003 ACM Transactions on Asian Language Information Processing  
Experimental results demonstrate that improvement in CL-SDR retrieval performance can be achieved by fusion of word and subword scales.  ...  In this work the extended HMM-based retrieval model has been applied to an English-Mandarin CL-SDR task, which is to search the Mandarin spoken document collection with English queries at word and subword  ...  These include the early work in European languages (such as English, German, Spanish etc.)  ... 
doi:10.1145/964161.964162 fatcat:meyid3zxrjglvpf3zm4nn3u2wu

Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche

Christophe Coupé, Yoon Oh, Dan Dediu, François Pellegrino
2019 Science Advances  
These findings highlight the intimate feedback loops between languages' structural properties and their speakers' neurocognition and biology under communicative pressures.  ...  We show here, using quantitative methods on a large cross-linguistic corpus of 17 languages, that the coupling between language-level (information per syllable) and speaker-level (speech rate) properties  ...  Blasi for suggestions and feedback on the statistical analysis and on previous versions of this paper. We also thank E. Castelli for help with collecting the Vietnamese data.  ... 
doi:10.1126/sciadv.aaw2594 pmid:32047854 pmcid:PMC6984970 fatcat:mempum22l5h6lkl7jl436wt6om