Chinese word segmentation and its effect on information retrieval

Schubert Foo, Hui Li
2004 Information Processing & Management  
A set of IR experiments was carried out to study the impact of Chinese word segmentation and its effect on information retrieval (IR) at the Division of Information Studies, Nanyang Technological University  ...  Character-based (Single Character) Word-based (Longest Match) Word-based Character-based (Single Character, Bigram) Word-based (Longest Match / Statistic-based)  ...  Introduction Research interest in Chinese information retrieval (CIR) has increased as a result of the large growth rate of online Chinese literature.  ... 
doi:10.1016/s0306-4573(02)00079-1 fatcat:wyt5ii2wcfgobcbhnc3sygfxmu

On the use of words and n-grams for Chinese information retrieval

Jian-Yun Nie, Jiangfeng Gao, Jian Zhang, Ming Zhou
2000 Proceedings of the fifth international workshop on Information retrieval with Asian languages - IRAL '00  
In the processing of Chinese documents and queries in information retrieval (IR), one has to identify the units that are used as indexes.  ...  Words and n-grams have been used as indexes in several previous studies, which showed that both kinds of indexes lead to comparable IR performances.  ...  Introduction It is now well known that the major difference between Chinese information retrieval (IR) and IR in European languages lies in the absence of word boundaries in sentences.  ... 
doi:10.1145/355214.355235 dblp:conf/iral/NieGZZ00 fatcat:ln25xi4ngbdopo4iqsuvpqxf5u

Chinese information extraction and retrieval

Sean Boisen, Mary Ellen Okurowski, Michael Crystal, Erik Peterson, Ralph Weischedel, John Broglio, Jamie Callan, Bruce Croft, Theresa Hand, Thomas Keenan
1996 Proceedings of a workshop held at Vienna, Virginia, May 6-8, 1996  
This paper provides a summary of the following topics: 1. what was learned from porting the INQUERY information retrieval engine and the INFINDER term finder to Chinese, 2. experiments at the University  ...  of Massachusetts evaluating INQUERY performance on Chinese newswire (Xinhua), 3. what was learned from porting selected components of PLUM to Chinese, 4. experiments evaluating the POST part of speech  ...  transliterated into the same character set as those used for common words.  ... 
doi:10.3115/1119018.1119047 dblp:conf/tipster/BoisenCPWBCCHKO96 fatcat:krshjrabcfgqfp6own6rew2vue

Comparing representations in Chinese information retrieval

K. L. Kwok
1997 SIGIR Forum  
Three representation methods are empirically investigated for Chinese information retrieval: 1-gram (single character), bigram (two contiguous overlapping characters), and short-word indexing based on  ...  The retrieval collection is the approximately 170 MB TREC-5 Chinese corpus of news articles, and 28 queries that are long and rich in wordings.  ...  Introduction While information retrieval (IR) in English has over thirty years of history, IR in Chinese is relatively recent.  ... 
doi:10.1145/278459.258531 fatcat:nn4ykcejbnbnxcl42mqubwqzfu
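
The 1-gram and bigram representations named in the abstract above can be illustrated with a minimal sketch. It assumes plain Python strings as input; the function name `char_ngrams` and the sample string are hypothetical and are not taken from the paper or the TREC-5 collection.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return the overlapping character n-grams of text, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n one character at a time, so that
    # consecutive n-grams overlap by n - 1 characters.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Illustrative input only (the phrase "information retrieval").
sample = "信息检索"
unigrams = char_ngrams(sample, 1)  # single-character index terms
bigrams = char_ngrams(sample, 2)   # two contiguous, overlapping characters
```

Neither representation needs a lexicon or a segmenter, which is why such indexes are commonly compared against word-based ones.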

Comparing representations in Chinese information retrieval

K. L. Kwok
1997 Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '97  
Three representation methods are empirically investigated for Chinese information retrieval: 1-gram (single character), bigram (two contiguous overlapping characters), and short-word indexing based on  ...  The retrieval collection is the approximately 170 MB TREC-5 Chinese corpus of news articles, and 28 queries that are long and rich in wordings.  ...  Introduction While information retrieval (IR) in English has over thirty years of history, IR in Chinese is relatively recent.  ... 
doi:10.1145/258525.258531 dblp:conf/sigir/Kwok97 fatcat:g44yl7qpz5hphjpwalrnthquxm

A Hybrid Chinese Information Retrieval Model [chapter]

Zhihan Li, Yue Xu, Shlomo Geva
2010 Lecture Notes in Computer Science  
A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words.  ...  This feature makes Chinese information retrieval more difficult since a retrieved document which contains the query term as a sequence of Chinese characters may not be really relevant to the query since  ...  For Chinese information retrieval, the query is usually a set of Chinese words rather than a sequence of Chinese characters.  ... 
doi:10.1007/978-3-642-15470-6_28 fatcat:secubjvn6vaa3kkup5koezppb4

Lexicon Effects on Chinese Information Retrieval

K. L. Kwok
1997 Conference on Empirical Methods in Natural Language Processing  
We investigate the effects of lexicon size and stopwords on Chinese information retrieval using our method of short-word segmentation based on simple language usage rules and statistics.  ...  These rules allow us to employ a small lexicon of only 2,175 entries and provide quite admirable retrieval results. It is noticed that accurate segmentation is not essential for good retrieval.  ...  Introduction It is well known that a sentence in Chinese (or several other oriental languages) consists of a continuous string of 'characters' without delimiting white spaces to identify words.  ... 
dblp:conf/emnlp/Kwok97 fatcat:7celyfp63fcxrgqewnud7yn6bq

Discovering Chinese words from unsegmented text (poster abstract)

Xianping Ge, Wanda Pratt, Padhraic Smyth
1999 Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '99  
Thus, effective information retrieval of Chinese text first requires good word segmentation.  ...  Using the probabilities of the words, word segmentation is done according to the maximum likelihood principle.  ...  [1] reviews previous works on Chinese word segmentation and studies the effect on information retrieval.  ... 
doi:10.1145/312624.313472 dblp:conf/sigir/GePS99 fatcat:bffzfr3lezgfpczakpdshmkwba

Suffix Tree Based Approach for Chinese Information Retrieval

Jin Hu Huang, David Powers
2008 2008 Eighth International Conference on Intelligent Systems Design and Applications  
These studies show that using either words or n-grams leads to comparable performances. Higher word segmentation accuracy does not necessarily result in better retrieval performance.  ...  The absence of word boundaries in Chinese language makes Chinese information retrieval (IR) different from European IR.  ...  Conclusions This paper proposes a suffix tree based approach for Chinese information retrieval. It uses n-grams as indexes without word segmentation.  ... 
doi:10.1109/isda.2008.365 dblp:conf/isda/HuangP08a fatcat:ufkyvyl7cbffzbw3isxnf2nhlu

PAT-tree-based keyword extraction for Chinese information retrieval

Lee-Feng Chien
1997 Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '97  
Many Chinese language processing applications therefore step ahead from character level to word/phrase level,  ...  urgent need to promote Chinese in this paper we will raise the significance of keyword extraction using a new PAT-tree-based approach, which is efficient in automatic keyword extraction from a set of relevant  ...  With the first kind of method, it ignores the concept of words and uses character-level information to replace word-level information in the construction of IR systems.  ... 
doi:10.1145/258525.258534 dblp:conf/sigir/ChienHC97 fatcat:xximvzv7n5ba3nnxh444m6bati

Chinese Information Retrieval Based on Terms and Ontology

Lingpeng Yang, Donghong Ji, Li Tang
2004 NTCIR Conference on Evaluation of Information Access Technologies  
In this paper, we describe our approach for single language information retrieval (SLIR) on Chinese language of NTCIR4 tasks.  ...  Finally, we use long terms to reorder the top N retrieved documents. Experiments show that the method achieves good results for both T-run and D-run SLIR tasks of Chinese language.  ...  Chinese character, bi-gram, n-gram (n>2) and word are the most widely used indexing units. The effectiveness of single Chinese characters as indexing units has been reported in [7].  ... 
dblp:conf/ntcir/YangJT04 fatcat:46nbj6zj3badnfpqbdfak4i32q

Multilingual Information Retrieval Using English and Chinese Queries [chapter]

Aitao Chen
2002 Lecture Notes in Computer Science  
Our interests in these tasks are to test the utility of applying Chinese word segmentation algorithms to German decompounding, to experiment with techniques for combining translations from diverse resources  ...  , and to experiment with different approaches to multilingual retrieval.  ...  was supported by DARPA (Department of Defense Advanced Research Projects Agency) under research contract N66001-97-8541; AO# F477: Search Support for Unfamiliar Metadata Vocabularies within the DARPA Information  ... 
doi:10.1007/3-540-45691-0_4 fatcat:e5bkqvm3knec3eiceoex45bw5e

Modeling Variable Dependencies between Characters in Chinese Information Retrieval [chapter]

Lixin Shi, Jian-Yun Nie
2010 Lecture Notes in Computer Science  
Chinese IR can work on words and/or character n-grams. In previous studies, when several types of index are used, independence is usually assumed between them, which obviously is not true in reality.  ...  The results confirm the necessity to integrate dependent pairs of characters in Chinese IR and to use them according to their possible contribution to IR.  ...  Introduction A crucial problem in Chinese Information Retrieval (IR) is to determine the appropriate elements to serve as index.  ... 
doi:10.1007/978-3-642-17187-1_51 fatcat:3vvywzvdmne7tcqibbu7lrlkym

PAT-tree-based keyword extraction for Chinese information retrieval

Lee-Feng Chien
1997 SIGIR Forum  
Many Chinese language processing applications therefore step ahead from character level to word/phrase level,  ...  urgent need to promote Chinese in this paper we will raise the significance of keyword extraction using a new PAT-tree-based approach, which is efficient in automatic keyword extraction from a set of relevant  ...  With the first kind of method, it ignores the concept of words and uses character-level information to replace word-level information in the construction of IR systems.  ... 
doi:10.1145/278459.258534 fatcat:5yu7cho2jjhvdmqt3sramwly6a

Japanese-Chinese Cross-Language Information Retrieval: An Interlingua Approach

Md Maruf Hasan, Yuji Matsumoto
2000 International Journal of Computational Linguistics and Chinese Language Processing  
In this paper, we propose a Han Character (Kanji) oriented Interlingua model of indexing and retrieving Japanese and Chinese information.  ...  Due to the ideographic nature of Japanese and Chinese, complicated with the existence of several encoding standards in use, efficient processing (representation, indexing, retrieval, etc.) of such information  ...  Their tools helped us to speed up our research. Thanks to Dr. Akira Maeda for allowing us to use his correlation calculation tool and Dr. Michael Berry for the LSI++ and SVDPACK packages.  ... 
dblp:journals/ijclclp/HasanM00 fatcat:vjj645dn3bdcvicuayagxv2ybm