Chinese text retrieval without using a dictionary

Aitao Chen, Jianzhang He, Liangjie Xu, Fredric C. Gey, Jason Meggs
1997 Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '97  
It is generally believed that words, rather than characters, should be the smallest indexing unit for Chinese text retrieval systems, and that it is essential to have a comprehensive Chinese dictionary  ...  Chinese text has no delimiters to mark word boundaries. As a result, any text retrieval systems that build word-based indexes need to segment text into words.  ...  Acknowledgments A portion of this work was supported by grant NSF IRI-9630765 from the Database and Expert Systems program of the Computer and Information Science and Engineering Directorate of the National  ... 
doi:10.1145/258525.258532 dblp:conf/sigir/ChenHXGM97 fatcat:6qyd6kwixvcuhfoqvvkxcacdpe

Using the web for automated translation extraction in cross-language information retrieval

Ying Zhang, Phil Vines
2004 Proceedings of the 27th annual international conference on Research and development in information retrieval - SIGIR '04  
The method can be applied to both Chinese-English and English-Chinese CLIR, correctly extracting translations of OOV terms from the Web automatically, and thus is a significant improvement on earlier work  ...  We use a method that extends earlier work in this area by augmenting this with statistical analysis, and corpus-based translation disambiguation to dynamically discover translations of OOV terms.  ...  In our first run we used the given Chinese queries without any of the Chinese equivalents of the English OOV terms (C-C), and used this to compare the performance of the translated English queries without  ... 
doi:10.1145/1008992.1009022 dblp:conf/sigir/ZhangV04 fatcat:zf7ngcjuibgkvalaxy72pifpcm

On Chinese text retrieval

Jian-Yun Nie, Martin Brisebois, Xiaobo Ren
1996 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '96  
Acknowledgment: We would like to thank Chris Buckley who gave us useful hints for the adaptation of SMART to Chinese.  ...  This approach has been used in several experimental systems for both Chinese [6] and Japanese text retrieval [8, 17, 18].  ...  Finally, we suggest that Chinese text retrieval should move further to include a thesaurus in order to cope with the rich vocabulary of Chinese.  ... 
doi:10.1145/243199.243270 dblp:conf/sigir/NieBR96 fatcat:ufhbmb33nvfihcptqestd2bxxq

Research on Lucene-based English-Chinese Cross-Language Information Retrieval

Yuejie Zhang, Tao Zhang, Shijie Chen
2005 International Journal of Asian Language Processing  
For Chinese monolingual retrieval, we investigated the use of different entities as indexes and implemented our retrieval system based on the Lucene toolkit.  ...  For English-Chinese CLIR, we adopt query translation as the dominant strategy, and utilize an English-Chinese bilingual dictionary as an important knowledge resource to acquire correct translations.  ...  The dictionary-based approach is a popular word-based approach to text segmentation. In this approach, segmented texts are matched against a dictionary prior to being indexed.  ... 
dblp:journals/jclc/ZhangZC05 fatcat:362ieayuj5aqhdj5ivxepzrxy4

Automatic thesaurus for enhanced Chinese text retrieval

Schubert Foo, Siu Cheung Hui, Hong Koon Lim, Li Hui
2000 Library Review  
This paper proposes and describes a process for generating an automatic Chinese thesaurus that can be used to provide related terms to a user's queries to enhance retrieval effectiveness.  ...  In the absence of existing automatic Chinese thesauri, techniques used in English thesaurus generation have been evaluated and adapted to generate a Chinese equivalent.  ...  This segmentation process will be used for all Chinese text processing related fields such as machine translation, natural language processing and information retrieval.  ... 
doi:10.1108/00242530010331754 fatcat:6ssoqsuyibe5rmfoc6bmaddzbq

Detection and translation of OOV terms prior to query time

Ying Zhang, Phil Vines
2004 Proceedings of the 27th annual international conference on Research and development in information retrieval - SIGIR '04  
We have successfully developed new techniques to extract and translate out of vocabulary terms using the Web and add them into a translation dictionary prior to query time.  ...  Several new techniques to improve the translation of out of vocabulary terms in English-Chinese cross-language information retrieval have been developed.  ...  Our first approach was to collect English text from the Web and exclude all terms that can be found in a translation dictionary.  ... 
doi:10.1145/1008992.1009102 dblp:conf/sigir/ZhangV04a fatcat:zxxsptznffhz3mwpgelmjmd77a

Discovering Chinese words from unsegmented text (poster abstract)

Xianping Ge, Wanda Pratt, Padhraic Smyth
1999 Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '99  
In this paper, we investigate an efficient algorithm to discover the words and their occurrence probabilities from a corpus of unsegmented text without using a dictionary.  ...  Thus, effective information retrieval of Chinese text first requires good word segmentation.  ...  In the following sections, we investigate how to discover the words and their probabilities from a corpus of unsegmented text without using a dictionary.  ... 
doi:10.1145/312624.313472 dblp:conf/sigir/GePS99 fatcat:bffzfr3lezgfpczakpdshmkwba

Chinese Information Retrieval Using Lemur: NTCIR-5 CIR Experiments at UNT

Jiangping Chen, Rowena Li, Fei Li
2005 NTCIR Conference on Evaluation of Information Access Technologies  
This paper describes our participation in the NTCIR-5 Chinese Information Retrieval (IR) evaluation. The main purpose is to evaluate Lemur, a freely available information retrieval toolkit.  ...  We also compared manual queries vs. automatic queries for Chinese IR. The results show that manually generated queries did not have much effect on IR performance.  ...  We applied a dictionary-based approach to segment the text, using forward maximum matching between a Chinese sentence and the dictionary, because it was fast and easy to implement.  ... 
dblp:conf/ntcir/ChenLL05 fatcat:e5vqkrrjb5d7bhzzfjrn7paxb4

Combining multiple sources for short query translation in Chinese-English cross-language information retrieval

Aitao Chen, Hailing Jiang, Fredric Gey
2000 Proceedings of the fifth international workshop on Information retrieval with Asian languages - IRAL '00  
We used two transfer dictionaries and a Chinese search engine to translate short Chinese queries into English.  ...  In this paper, we examine various factors that affect the retrieval performance of Chinese-English cross-language retrieval.  ...  They used the parallel text to construct a Chinese-English bilingual dictionary that was used to translate queries. The parallel text complements existing bilingual dictionaries.  ... 
doi:10.1145/355214.355217 dblp:conf/iral/ChenJG00 fatcat:bppqa6jwhvafdeevfcnsaq4bka

Error correction in a Chinese OCR test collection

Yuen-Hsien Tseng
2002 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02  
This article proposes a technique for correcting Chinese OCR errors to support retrieval of scanned documents.  ...  Improved retrieval effectiveness on a single term query experiment is demonstrated.  ...  CONCLUSIONS A fully automatic error correction method is proposed for use in Chinese OCR text retrieval.  ... 
doi:10.1145/564376.564478 dblp:conf/sigir/Tseng02 fatcat:hip6qumb4fgzzewddwffv4yavm

Multilingual Information Retrieval Using English and Chinese Queries [chapter]

Aitao Chen
2002 Lecture Notes in Computer Science  
The coefficients were determined by fitting training data to the logistic regression model using a statistical software package. We refer readers to reference [3] for more details.  ...  This paper describes our retrieval experiments.  ...  The documents are ranked in decreasing order by their relevance probability P(R | Q) with respect to a query.  ...  First, we examine all the possible ways to segment a Chinese text into words found in a Chinese dictionary.  ... 
doi:10.1007/3-540-45691-0_4 fatcat:e5bkqvm3knec3eiceoex45bw5e

CMU in Cross-Language Information Retrieval at NTCIR-3

Yiming Yang, Nianli Ma
2002 NTCIR Conference on Evaluation of Information Access Technologies  
online dictionary.  ...  We participated in the Cross-Language Information Retrieval evaluation at NTCIR-3 for the English-Chinese and English-Japanese tasks.  ...  of parallel text; by MRD-based we mean to use an online-readable dictionary.  ... 
dblp:conf/ntcir/YangM02 fatcat:3pgl4cszxndwpkehyme65kqr6i

From Text to Image: Generating Visual Query for Image Retrieval [chapter]

Wen-Cheng Lin, Yih-Chen Chang, Hsin-Hsi Chen
2005 Lecture Notes in Computer Science  
The retrieval results using textual and visual queries are combined to generate the final ranked list. We conducted English monolingual and Chinese-English cross-language retrieval experiments.  ...  The relationships between text and images are modeled. Visual queries are constructed from textual queries using the relationships.  ...  The bilingual dictionary is integrated from four resources, including the LDC Chinese-English dictionary, Denisowski's CEDICT, BDC Chinese-English dictionary v2.2 and a dictionary used in query translation  ... 
doi:10.1007/11519645_65 fatcat:vigdrkpeufdubbegcufud4kzcq

Trans-EZ at NTCIR-2 : Synset Co-occurrence Method for English-Chinese Cross-Lingual Information Retrieval

Guo-Wei Bian, Chi-Ching Lin
2001 NTCIR Conference on Evaluation of Information Access Technologies  
In this paper, a new method for English-Chinese cross-lingual information retrieval is proposed and evaluated in the NTCIR-II project.  ...  An English-Chinese WordNet and a synset co-occurrence model are adopted to solve the problem of word sense ambiguity.  ...  The resources that we use are a bilingual dictionary, an English-Chinese WordNet, and a target language corpus.  ... 
dblp:conf/ntcir/BianL01 fatcat:itzz3axyjvgzzlp624z2eldigu

Search Between Chinese and Japanese Text Collections

Fredric C. Gey
2007 NTCIR Conference on Evaluation of Information Access Technologies  
We also utilized Machine Translation (MT) software between Japanese and Chinese, with English as a pivot language.  ...  While Chinese search without translation against Japanese documents performed credibly well for title only runs, the reverse (Japanese topic search of Chinese documents without translation) was poor.  ...  We have again found that when a Japanese version of an NTCIR topic consists of primarily Kanji text, then use of the Chinese topic directly (after character code conversion) against Japanese documents  ... 
dblp:conf/ntcir/Gey07 fatcat:gh4bixhhznggtjqp7wfixyfkqy