Dimension Projection among Languages based on Pseudo-relevant Documents for Query Translation [article]

Javid Dadashkarimi, Mahsa S. Shahshahani, Amirhossein Tebbifakhr, Heshaam Faili, Azadeh Shakery
2016 arXiv   pre-print
In this paper, we propose a new method for dictionary-based query translation based on dimension projection of embedded vectors from the pseudo-relevant documents in the source language to their equivalents  ...  To this end, first we learn low-dimensional vectors of the words in the pseudo-relevant collections separately and then aim to find a query-dependent transformation matrix between the vectors of translation  ...  Linear Projection between Languages based on Pseudo-relevant Documents In this section we introduce the proposed method in more details.  ... 
arXiv:1605.07844v2 fatcat:vzrmsfkcgjbbxcmulz633pesoy

Continuous space models for CLIR

Parth Gupta, Rafael E. Banchs, Paolo Rosso
2017 Information Processing & Management  
This property is very helpful for resource-poor languages, therefore, we carry out experiments on the English-Hindi language pair.  ...  Different from most existing models, which rely only on available parallel data for training, our learning framework provides a natural way to exploit monolingual data and its associated relevance metadata  ...  Acknowledgements We thank Germán Sanchis Trilles for helping in conducting experiments with machine translation.  ... 
doi:10.1016/j.ipm.2016.11.002 fatcat:vgiclzfllnb67fkd6omnttrdfm

Translingual information retrieval: learning from bilingual corpora

Yiming Yang, Jaime G. Carbonell, Ralf D. Brown, Robert E. Frederking
1998 Artificial Intelligence  
Translingual information retrieval (TLIR) consists of providing a query in one language and searching document collections in one or more different languages.  ...  Query translation based on a general machine-readable bilingual dictionary, heretofore the most popular method, did not match the performance of other, more sophisticated methods.  ...  Acknowledgments We thank Christie Watson and Dorcas Wallace for their efforts in corpus annotation.  ... 
doi:10.1016/s0004-3702(98)00063-0 fatcat:madtpj3ndze3tapxlqvjafnjne

Advanced learning algorithms for cross-language patent retrieval and classification

Yaoyong Li, John Shawe-Taylor
2007 Information Processing & Management  
We also investigate learning algorithms for cross-language document classification. The learning algorithms are based on KCCA and Support Vector Machines (SVM).  ...  In comparison with most of other studies involving machine learning for cross-language information retrieval, which basically used learning techniques for monolingual sub-tasks, our learning algorithms  ...  Thank Sandor Szedmak for providing us the Matlab code solving SVM_2k. Thank Mitsuharu Makita for help in preprocessing Japanese document.  ... 
doi:10.1016/j.ipm.2006.11.005 fatcat:l2i4icimofclxhw646hghmf56i

Using KCCA for Japanese–English cross-language information retrieval and document classification

Yaoyong Li, John Shawe-Taylor
2006 Journal of Intelligent Information Systems  
A machine learning algorithm based on KCCA is studied for cross-language information retrieval. We apply the algorithm in Japanese-English cross-language information retrieval.  ...  Our results show that it is feasible to use a classifier learned in one language to classify the documents in other languages.  ...  We would also thank Mitsuharu Makita for help in preprocessing Japanese document. We thank anonymous reviewers for detailed comments and valuable suggestions.  ... 
doi:10.1007/s10844-006-1627-y fatcat:fnvcma2w7fes5eun2ihtt4myni

Learning Neural Representation for CLIR with Adversarial Framework

Bo Li, Ping Cheng
2018 Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing  
In this paper, we follow the success of neural representation in natural language processing (NLP) and develop a novel text representation model based on adversarial learning, which seeks a task-specific  ...  embedding space for CLIR.  ...  Acknowledgments We thank the anonymous reviewers for their valuable comments. This work was supported by the Fundamental Research Funds for Central Universities of CCNU (No. CCNU15A05062).  ... 
doi:10.18653/v1/d18-1212 dblp:conf/emnlp/LiC18 fatcat:rtlbaco6m5gobedpbi6rci5xim

TEKMA at CLEF-2021: BM-25 based rankings for scientific publication retrieval and data set recommendation

Jüri Keller, Leon Paul Mondrian Munz
2021 Conference and Labs of the Evaluation Forum  
We made one submission for each of the two tasks. For both submissions we focused on data enrichment and Solr's implementation of the probabilistic BM25 ranking function.  ...  In this paper we report the results of our participation in the Living Labs for Academic Search (LiLAS) CLEF Challenge, which is aimed at strengthening the concept of user-centered living labs for the  ...  Based on the assumption that the best-ranked documents are somehow relevant, information on them is used to rewrite and extend the query [7]. Using the base query, a ranking is generated.  ... 
dblp:conf/clef/KellerM21 fatcat:es2ge7e2znhhpfy7lx63xk4bty

Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models [article]

Suraj Nair, Eugene Yang, Dawn Lawrie, Kevin Duh, Paul McNamee, Kenton Murray, James Mayfield, Douglas W. Oard
2022 arXiv   pre-print
In translate-train, the system is trained on the MS MARCO English queries coupled with machine translations of the associated MS MARCO passages.  ...  In zero-shot training, the system is trained on the English MS MARCO collection, relying on the XLM-R encoder for cross-language mappings.  ...  Acknowledgments This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract  ... 
arXiv:2201.08471v1 fatcat:qotjmi4dmner3cqxym6ad3ol3q

Information Flow Analysis with Chinese Text [chapter]

Paulo Cheong, Dawei Song, Peter Bruza, Kam-Fai Wong
2005 Lecture Notes in Computer Science  
To evaluate the Chinese-based information flow model, it is applied to query expansion, in which a set of test queries are expanded automatically via information flow computations and documents are retrieved  ...  The information inference derives implicit associations via computation of information flow on a high dimensional conceptual space, which is approximated by a cognitively motivated lexical semantic space  ...  The authors would like to thank Zi Huang for her great work and assistance on the Chinese word segmentation system.  ... 
doi:10.1007/978-3-540-30211-7_11 fatcat:54e4tng3vrgx3cakekxl5twrla

Using Word Embeddings for Query Translation for Hindi to English Cross Language Information Retrieval [article]

Paheli Bhattacharya, Pawan Goyal, Sudeshna Sarkar
2016 arXiv   pre-print
One of the standard methods is to use query translation from source to target language.  ...  In this paper, we propose an approach based on word embeddings, a method that captures contextual clues for a particular word in the source language and gives those words as translations that occur in  ...  After reducing the rank, the queries and the documents are projected to a lower dimensional space.  ... 
arXiv:1608.01561v1 fatcat:cjgynmaawzdzzhx2h3ozqlz2by

Leveraging Entities in Document Retrieval [chapter]

Krisztian Balog
2018 Advanced Topics in Information Retrieval  
The relevance between a query and a document is then estimated based on their projections to this latent entity space.  ...  Document-Based Query Expansion To give an idea of how traditional (term-based) pseudo relevance feedback works, we present one of the most popular approaches, the relevance model by Lavrenko and Croft  ... 
doi:10.1007/978-3-319-93935-3_8 fatcat:aeg7t42jhzeebelu4nht6q4iqu

Using Word Embeddings for Query Translation for Hindi to English Cross Language Information Retrieval

Paheli Bhattacharya, Pawan Goyal, Sudeshna Sarkar
2016 Journal of Computacion y Sistemas  
One of the standard methods is to use query translation from source to target language.  ...  In this paper, we propose an approach based on word embeddings, a method that captures contextual clues for a particular word in the source language and gives those words as translations that occur in  ...  Acknowledgments We would like to thank the anonymous reviewers for their valuable comments.  ... 
doi:10.13053/cys-20-3-2462 fatcat:zs44l332ivd77gglzixrnde3ay

Query representation for cross-temporal information retrieval

Miles Efron
2013 Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '13  
With this challenge in mind, we ask: given a query written in contemporary English, how can we retrieve relevant documents that were written in early English?  ...  We focus on ways to combine evidence to improve CTIR effectiveness, proposing and testing several ways to handle language change during book search.  ...  In the remainder of this section, we assume that based on our initial, dictionary-built query, we have retrieved k = 20 pseudo-relevant documents.  ... 
doi:10.1145/2484028.2484054 dblp:conf/sigir/Efron13 fatcat:thehxpjijvc3tc2n4x54rfserq

Using Language Models For Information Retrieval

Djoerd Hiemstra, Franciska De Jong
2001 Zenodo  
The approach uses simple document-based unigram models to compute for each document the probability that it generates the query. This probability is used to rank the documents.  ...  This book describes a mathematical model of information retrieval based on the use of statistical language models.  ...  I am most grateful to Wessel Kraaij of TNO-TPD for our cooperation in these projects, for our cooperation in four years of joined TREC-participations, and for implementing the language model algorithms  ... 
doi:10.5281/zenodo.570441 fatcat:mfju6ok4t5bzjp2pvp6bktfdn4

Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization

Jiho Noh, Ramakanth Kavuluru
2020 Findings of the Association for Computational Linguistics: EMNLP 2020  
The full architecture benefits from the complementary potential of document-query matching and the novel document transformation approach based on summarization along PM facets.  ...  Component (a) directly generates a matching score of a candidate document for a query.  ...  Model Source Target REL doc+query sentences doc relevance EXT doc token relevances ABS doc+facet signal a pseudo-query Implementation Details For all three models, we begin with the pretrained bert-base-uncased  ... 
doi:10.18653/v1/2020.findings-emnlp.304 pmid:34541588 pmcid:PMC8444997 fatcat:elsv2diavfe7vawrnwoohyagqm