Effective Arabic-English cross-language information retrieval via machine-readable dictionaries and machine translation

Mohammed Aljlayl, Ophir Frieder
2001 Proceedings of the tenth international conference on Information and knowledge management - CIKM'01  
In Cross-Language Information Retrieval (CLIR), queries in one language retrieve relevant documents in other languages. Machine-Readable Dictionaries (MRD) and Machine Translation (MT) are important resources for query translation in CLIR. We investigate the application of MT and MRD to Arabic-English CLIR. The translation ambiguity associated with these resources is the key problem. We present three methods of query translation using a bilingual dictionary for Arabic-English CLIR. First, we present the Every-Match (EM) method. This method yields ambiguous translations since many extraneous terms are added to the original query. To disambiguate the query translation, we present the First-Match (FM) method, which considers the first match in the dictionary as the candidate term. Finally, we present the Two-Phase (TP) method. We show that good retrieval effectiveness can be achieved without complex resources using the Two-Phase method for Arabic-English CLIR. We also empirically evaluate the effectiveness of the MT-based method using short, medium, and long queries from TREC. The effects of the query length on the quality of the MT-based CLIR are investigated.
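As a rough illustration of the difference between the Every-Match and First-Match methods described in the abstract, the following minimal Python sketch translates a query with a toy bilingual dictionary. The dictionary entries, the function names `every_match` and `first_match`, and the choice to keep out-of-vocabulary terms untranslated are all assumptions made for illustration; they are not taken from the paper. The Two-Phase method is not sketched because the abstract does not describe how it works.

```python
from typing import Dict, List

# Hypothetical toy dictionary: transliterated Arabic terms mapped to English
# translations in dictionary order. The entries are illustrative only.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "kitab": ["book", "letter", "scripture"],
    "bayt": ["house", "home", "verse"],
}


def every_match(query_terms: List[str], dictionary: Dict[str, List[str]]) -> List[str]:
    """Every-Match sketch: replace each term with all of its dictionary translations."""
    translated: List[str] = []
    for term in query_terms:
        # Assumption: a term missing from the dictionary is kept as-is.
        candidates = dictionary.get(term) or [term]
        translated.extend(candidates)
    return translated


def first_match(query_terms: List[str], dictionary: Dict[str, List[str]]) -> List[str]:
    """First-Match sketch: keep only the first dictionary translation of each term."""
    translated: List[str] = []
    for term in query_terms:
        # Assumption: a term missing from the dictionary is kept as-is.
        candidates = dictionary.get(term) or [term]
        translated.append(candidates[0])
    return translated


if __name__ == "__main__":
    query = ["kitab", "bayt"]
    print("EM:", every_match(query, TOY_DICTIONARY))
    print("FM:", first_match(query, TOY_DICTIONARY))
```

In this sketch the Every-Match query grows to six terms, several of them unrelated to the intended sense, while the First-Match query keeps one term per source word, which mirrors the ambiguity trade-off the abstract describes.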
doi:10.1145/502585.502635 dblp:conf/cikm/AljlaylF01 fatcat:eu2qayl4jje5zfiu6rsfvh3rsm