1,111 Hits in 5.3 sec

Exploiting the Web as the multilingual corpus for unknown query translation

Jenq-Haur Wang, Jei-Wen Teng, Wen-Hsiang Lu, Lee-Feng Chien
2006 Journal of the American Society for Information Science and Technology  
In this article, the authors investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries  ...  They propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine.  ...  We intend to exploit the Web as the corpus to find effective translations automatically for query terms not included in a dictionary (unknown terms).  ... 
doi:10.1002/asi.20328 fatcat:xmr2ekbzorfjffegzfvbxy6zoq

Introduction to the special topic section on multilingual information systems

Christopher C. Yang, Wai Lam
2006 Journal of the American Society for Information Science and Technology  
Parallel and comparable corpora are important for generating a statistical translation model to overcome the limitations of a manually generated dictionary.  ...  All of this reveals the importance of research in multilingual information systems. There are several essential components in multilingual information systems as depicted in Figure 1.  ...  the Web as the multilingual corpus source for translating unknown query terms.  ... 
doi:10.1002/asi.20325 fatcat:rjg7qleo7fh3zmdmhclksbg53i

Anchor text mining for translation of Web queries

Wen-Hsiang Lu, Lee-Feng Chien, Hsi-Jian Lee
2004 ACM Transactions on Information Systems  
To discover translation knowledge in diverse data resources on the Web, this article proposes an effective approach to finding translation equivalents of query terms and constructing multilingual lexicons  ...  Although Web anchor texts are wide-scoped hypertext resources, not every particular pair of languages contains sufficient anchor texts for effective extraction of translations for Web queries.  ...  ACKNOWLEDGMENTS The authors would like to thank Prof. Mark Sanderson and the anonymous reviewers for their valuable comments and suggestions. Many thanks are given to Mr.  ... 
doi:10.1145/984321.984324 fatcat:75mnaq3qmza6vdduluhn3yhm5m

Towards Web Mining of Query Translations for Cross-Language Information Retrieval in Digital Libraries [chapter]

Wen-Hsiang Lu, Jenq-Haur Wang, Lee-Feng Chien
2003 Lecture Notes in Computer Science  
Web mining methods that can exploit huge amounts of multilingual and wide-scoped Web resources as live bilingual corpora have received great attentions to alleviate the translation difficulties of query  ...  methods, which exploit huge amounts of multilingual and wide-scoped Web resources as live bilingual corpora to alleviate translation difficulties, and have been proven particularly effective for extracting  ...  Therefore, we present search-result-based approaches to fully exploiting Web resources where search result pages of queries submitted to real search engines are used as the corpus for extracting translations  ... 
doi:10.1007/978-3-540-24594-0_8 fatcat:zdak4nwubndkzksnk2mt5xmmxi

CROSS LANGUAGE INFORMATION RETRIEVAL: IN INDIAN LANGUAGE PERSPECTIVE

Pratibha Bajpai
2014 International Journal of Research in Engineering and Technology  
This makes cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR) for Web applications a valuable need of the day.  ...  It will also discuss the issues related to the English to Hindi language translation. We had tested 30 queries manually using suggested prototype and found that the precision level is quite good.  ...  Wang et al. (2004) exploit the bilingual search result pages obtained from a real search engine as a corpus for automatic translation of unknown query terms not included in the dictionary.  ... 
doi:10.15623/ijret.2014.0322010 fatcat:zcjebdivyfcivnzjfvxpk2ec5u

TNO at CLEF-2001: Comparing Translation Resources [chapter]

Wessel Kraaij
2002 Lecture Notes in Computer Science  
The main contribution of this paper is a systematic comparison of three types of translation resources for bilingual retrieval based on query translation.  ...  This paper describes the official runs of TNO TPD for CLEF-2001. We participated in the monolingual, bilingual and multilingual tasks.  ...  We also thank George Foster and Jian-Yun Nie (also RALI) for general discussions about the application of statistical translation models for CLIR.  ... 
doi:10.1007/3-540-45691-0_6 fatcat:iacbryudt5fbbj2477uiyrtkdq

Precision at K in Multilingual Information Retrieval

Pothula Sujatha, P. Dhavachelvan
2011 International Journal of Computer Applications  
Multilingual Information Retrieval (MLIR) system helps the users to pose the query in one language and retrieve the documents in more than one language.  ...  Information Retrieval (IR) is used to store and represent the knowledge and the retrieval of information relevant for a special user query.  ...  The MLIR techniques are: An approach for exploiting the Web as the multilingual corpus source for translating unknown query terms have been proposed by [2].  ... 
doi:10.5120/2990-3929 fatcat:dwhgticdujaffjyeuif53fqq5u

Translation Resources, Merging Strategies, and Relevance Feedback for Cross-Language Information Retrieval [chapter]

Djoerd Hiemstra, Wessel Kraaij, Renée Pohlmann, Thijs Westerveld
2001 Lecture Notes in Computer Science  
Finally, we performed preliminary experiments to exploit the web to generate translation probabilities and bilingual dictionaries, notably for English-Italian and English-Dutch.  ...  This paper describes the official runs of the Twenty-One group for the first CLEF workshop. The Twenty-One group participated in the monolingual, bilingual and multilingual tasks.  ...  Acknowledgements We would like to thank the Druid project for sponsoring the translation of the topic set into Dutch. We thank Xerox XRCE for making the Xelda morphological toolkit available to us.  ... 
doi:10.1007/3-540-44645-1_10 fatcat:corp7lp6uvae5bb4pms3dlbv4m

Translating unknown cross-lingual queries in digital libraries using a web-based approach

Jenq-Haur Wang, Jei-Wen Teng, Pu-Jen Cheng, Wen-Hsiang Lu, Lee-Feng Chien
2004 Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries - JCDL '04  
In this paper, we investigate the feasibility of exploiting the Web as the corpus source to translate unknown query terms for cross-language information retrieval (CLIR) in digital libraries.  ...  We propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine.  ...  We intend to exploit the Web as the corpus to find effective translations automatically for query terms not included in a dictionary (unknown terms).  ... 
doi:10.1145/996350.996378 dblp:conf/jcdl/WangTCLC04 fatcat:4z7xxoqpxzgzncncyl2ubuvbee

Translation of web queries using anchor text mining

Wen-Hsiang Lu, Lee-Feng Chien, Hsi-Jian Lee
2002 ACM Transactions on Asian Language Information Processing  
The proposed approach successfully exploits the anchor-text resources and reduces the existing difficulties of query term translation.  ...  This article presents an approach to automatically extracting translations of Web query terms through mining of Web anchor texts  ...  The authors would like to thank Kam-Fai Wong and Noriko Kando, and also the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.1145/568954.568958 fatcat:2mrbolllebgclayqicdmkxjrn4

Creating multilingual translation lexicons with regional variations using web corpora

Pu-Jen Cheng, Yi-Cheng Pan, Wen-Hsiang Lu, Lee-Feng Chien
2004 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics - ACL '04  
The purpose of this paper is to automatically create multilingual translation lexicons with regional variations.  ...  We propose a transitive translation approach to determine translation variations across languages that have insufficient corpora for translation via the mining of bilingual search-result pages and clues  ...  In addition, Simard (2000) exploited the transitive properties of translations to improve the quality of multilingual text alignment.  ... 
doi:10.3115/1218955.1219023 dblp:conf/acl/ChengLTC04 fatcat:fhyi4bmy3jhgfd7yq2ausw7ysm

ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus [article]

Ayyoob Imani, Masoud Jalili Sabet, Philipp Dufter, Michael Cysouw, Hinrich Schütze
2021 arXiv   pre-print
ParCourE can be set up for any parallel corpus and can thus be used for typological research on other corpora as well as for exploring their quality and properties.  ...  Researching typological properties of languages is fundamental for progress in multilingual NLP.  ...  We exploit the generated word alignments to induce lexicons for all 889,111 language pairs. To this end, we consider aligned words as translations of each other.  ... 
arXiv:2107.06632v2 fatcat:sq37ij4dfvgptca5bk4omizf44

Compilation and Exploitation of Parallel Corpora

Tomaž Erjavec
2003 Journal of Computing and Information Technology  
Parallel corpora can be used as a translation aid for second-language learners, for translators and lexicographers, or as a data-source for various language technology tools.  ...  Two exploitation results over our annotated corpora are also presented, namely a Web concordancer and the extraction of bi-lingual lexica.  ...  Acknowledgements The author would like to thank the company Amebis, d.o.o., for lexically annotating the Slovene part of the IJS-ELAN corpus and Jin-Dong Kim for useful comments on a previous version of  ... 
doi:10.2498/cit.2003.02.02 fatcat:ddwdai2mhnfy3eh72wwesx33dm

Cross-Language Information Retrieval

Jian-Yun Nie
2010 Synthesis Lectures on Human Language Technologies  
A method that exploits parallel texts for query translation is proposed. This method is shown to allow for retrieval effectiveness comparable to the state-of-the-art effectiveness.  ...  In order to increase the translation accuracy, compound terms are extracted and incorporated into the translation models, so that compounds can be translated as a unit, rather than as separate words.  ...  This problem is more and more acute for IR on the Web due to the fact that the Web is a truly multilingual environment.  ... 
doi:10.2200/s00266ed1v01y201005hlt008 fatcat:a7ncb6fhkfcu5njlwsdllx45nu

Integration Of Machine Translation In On-Line Multilingual Applications: Domain Adaptation [chapter]

Mirela-Ştefania Duma, Cristina Vertan
2018 Zenodo  
Large amounts of bilingual corpora are used in the training process of statistical machine translation systems. Usually a general domain is used as the training corpus.  ...  In this paper, we used language model interpolation as a domain adaptation method and proved that it is a fast state of the art method that can be used in building adapted translation systems even when  ...  We want to thank the anonymous reviewers for their comments and constructive suggestions.  ... 
doi:10.5281/zenodo.1291936 fatcat:aw5afygi5vh7dm3u6odmp4vbr4
Showing results 1 to 15 out of 1,111 results