25 Hits in 3.4 sec

Experiments in CLIR using fuzzy string search based on surface similarity

Sethuramalingam Subramaniam, Anil Kumar Singh, Pradeep Dasigi, Vasudeva Varma
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
We found significant improvements for all the six language pairs using a method for fuzzy text search based on Surface Similarity.  ...  In this paper we report these results and compare them with a baseline CLIR system and a CLIR system that uses Scaled Edit Distance (SED) for fuzzy string matching.  ...  search based on Surface Similarity.  ... 
doi:10.1145/1571941.1572076 dblp:conf/sigir/SubramaniamSDV09 fatcat:lrn7qvqb4fgynkgubuepbc7q5e

Building Structured Query in Target Language for Vietnamese – English Cross Language Information Retrieval Systems

Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap
2015 International Journal of Engineering Research and  
Query translation is the most important component in Cross Language Information Retrieval systems using dictionary-based approach.  ...  The method is based on constructing bi-lingual dictionaries, keyword extraction from source query, getting translation candidates for each keyword using mutual information and finally building structured  ...  Value of C is based on the text corpus size. In our experiment, we define C = log2(12000). The second formula is based on the monolingual English IR system.  ... 
doi:10.17577/ijertv4is040317 fatcat:q7ut2zpt2bdpffqgti5iqhjzvq
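The snippet does not state the constant's numeric value. The following worked value assumes that "log 2" denotes the base-2 logarithm, which is how the extracted text most plausibly reads:

```latex
C = \log_2(12000) = \log_2 12 + \log_2 1000 \approx 3.585 + 9.966 \approx 13.55
```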

The effect of bilingual term list size on dictionary-based cross-language information retrieval

D. Demner-Fushman, D.W. Oard
2003 36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the  
Bilingual term lists are extensively used as a resource for dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on  ...  The contribution of named entity translation was evaluated in a cross-language experiment involving English and Chinese.  ...  Acknowledgments The authors would like to thank Terry Zhao for tagging named entities in Chinese queries. This work has been supported in part by DARPA cooperative agreement N660010028910.  ... 
doi:10.1109/hicss.2003.1174250 dblp:conf/hicss/Demner-FushmanO03 fatcat:qdkngvr7ljf2lecaop4olfaspe

Variations on language modeling for information retrieval

Wessel Kraaij
2005 SIGIR Forum  
Variations on Language Modeling for Information Retrieval / W. Kraaij. Enschede: Neslia Paniculata. Thesis Enschede. With ref. With summary. ISBN 90-75296-09-6  ...  In the following sections we will describe the tools we used for conflation based on approximate string matching and the experimental results. Fuzzy conflation architecture.  ...  in the indexing dictionary using the fuzzy index (on-line).  ... 
doi:10.1145/1067268.1067291 fatcat:h23lp5aqfvfu5iecwnihfme244

Domain Adaptation for Statistical Machine Translation [article]

Longyue Wang
2018 arXiv pre-print
The third one is the out-of-vocabulary words (OOVs) problem. In-domain training data are often scarce with low terminology coverage.  ...  The second one is language style due to the fact that texts from different genres are always presented in different syntax, length and structural organization.  ...  One of string-difference metrics, edit distance is a widely used similarity measure, known as Levenshtein distance (Levenshtein, 1966).  ... 
arXiv:1804.01760v1 fatcat:ik6wji6vlbayddavb4wiun7df4
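The edit distance mentioned in the snippet above can be illustrated with a short sketch. The following is a minimal, illustrative Python implementation of the standard dynamic-programming Levenshtein distance (unit cost for insertion, deletion and substitution). It is not code from the cited thesis, and the function name `levenshtein` is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,          # delete ca from a
                curr[j - 1] + 1,      # insert cb into a
                prev[j - 1] + cost,   # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows of the table reduces memory to O(len(b)) while still giving the same result as the full matrix formulation.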

Word-to-Word Models of Translational Equivalence [article]

I. Dan Melamed
1998 arXiv pre-print
Analysis of the expected behavior of these biases in the presence of sparse data predicts that they will result in more accurate models.  ...  First, most words translate to only one other word. Second, bitext correspondence is noisy. This article presents methods for biasing statistical translation models to reflect these properties.  ...  Acknowledgements Many of the ideas in this paper came from enlightening correspondence with Ken Church, Mike Collins, Ido Dagan, Jason Eisner, Steven Finch, George Foster, Djoerd Hiemstra, Adwait Ratnaparkhi  ... 
arXiv:cmp-lg/9805006v1 fatcat:66ldvfitx5akrhzkywaz4gqioq

Statistical source expansion for question answering

Nico Schlaefer, Jennifer Chu-Carroll, Eric Nyberg, James Fan, Wlodek Zadrozny, David Ferrucci
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
The statistical models use a comprehensive set of features to predict the topicality and quality of text nuggets based on topic models built from seed content, search engine rankings and surface characteristics  ...  These differences are statistically significant and result in noticeable gains in search performance in a task-based evaluation on QA datasets.  ...  Often different surface strings are used to refer to the same topic (e.g.  ... 
doi:10.1145/2063576.2063632 dblp:conf/cikm/SchlaeferCNFZF11 fatcat:whoy62klazctbdo4p57wbevkdu

Natural Language Processing as a Foundation of the Semantic Web

Yorick Wilks, Christopher Brewster
2007 Foundations and Trends® in Web Science  
Acknowledgments CB would like to thank José Iria for help in discussing and formulating the Abraxas model, and Ziqi Zhang for undertaking parts of the implementation and evaluation.  ...  CB has been supported in this work by the EPSRC project Abraxas under grant number GR/T22902/01 and the EC funded IP Companions IST-034434  ...  mere strings of key words, and also because the classic IR paradigm using very long queries is being replaced by the Google paradigm of search based on two or three highly ambiguous key terms.  ... 
doi:10.1561/1800000002 fatcat:n2xfw3qdhverrokidb2globwyq

Quantifying Cross-lingual Semantic Similarity for Natural Language Processing Applications

Katharina Wäschle
We model cross-lingual similarity both based on alignment scores and full translations and combine the models in a statistical learning framework.  ...  We present an analysis of the kind of noise present in our data, a method to filter the noise and experiments on its influence on MT quality.  ...  Usually, string similarity is used to find good fuzzy matches. We strive to go beyond the surface and enable the use of monolingual resources for retrieving fuzzy matches at the same time.  ... 
doi:10.11588/heidok.00019046 fatcat:lonvu2wzujcplhqhpyjdzxidqm

Cost-to-Go Function Approximation [chapter]

2017 Encyclopedia of Machine Learning and Data Mining  
Mitchell's (1982) candidate-elimination algorithm performs a bidirectional search in the hypothesis space.  ...  These two sets form two boundaries on the version space.  ...  In: Furukawa K, Cross-References  ... 
doi:10.1007/978-1-4899-7687-1_100093 fatcat:vse7ncdqs5atlosjhz7fhlj3im

LANGUAGE IN INDIA Strength for Today and Bright Hope for Tomorrow Application of NLP in Indian Languages in Information Retrieval

M Thirumalai, B Mallikarjun, Sam, B A Sharada, A R Fatihi, Lakhan, Marie Jennifer, S M Bayer, G Ravichandran, L Baskaran, Ramamoorthy, B Sharada
Experiments in CLIR using fuzzy string search based on surface similarity in 2009 discusses Cross Language Information Retrieval (CLIR) between languages of the same origin.  ...  Bhasa, is a corpus-based search engine and summarizer that performs document indexing and retrieves information based on key words using vector space retrieval method.  ...  of natural languages using a rule based approach was discussed in 2002.  ... 

Building Arabic Corpora: Concepts, Methodologies, Tools, and Experiments

Imad Zeroual, Abdelhak Lakhouaja
2019 Zenodo  
The survey presents a summarisation of data sources and different compilation methods used in relation to corpus characteristics like size and time consumed during the compilation process.  ...  The prime motivation for carrying out the research in this thesis comes from the limited research on Arabic corpus linguistics and the lack of available resources, standards, and efficient tools that can  ...  As mentioned, Sinclair (2005) formulates the overall instructions proposed by the previous authors in ten fundamental criteria to follow in the design and the compilation of a general corpus:  ... 
doi:10.5281/zenodo.4441159 fatcat:nwix7lrzrbaxpgasing7mgdtwq

Parton_columbia_0054D_11044.pdf [article]

We use the TLIR corpus to carry out a task-embedded MT evaluation, which shows that our CLIR models address lost in retrieval errors, resulting in higher TLIR recall; and that the APEs successfully correct  ...  Furthermore, we develop an analysis framework for isolating the impact of MT errors on CLIR and on result understanding, as well as evaluating the whole TLIR task.  ...  The features are based on surface strings as well as bilingual POS tags.  ... 
doi:10.7916/d81260sm fatcat:xfjkg2m4gnc6fpvrsvee77eh5a

Final Report --- Always Already Computational: Collections as Data [article]

Thomas Padilla, Laurie Allen, Hannah Frost, Sarah Potvin, Elizabeth Russey Roke, Stewart Varner
2019 Zenodo  
From 2016-2018 Always Already Computational: Collections as Data documented, iterated on, and shared current and potential approaches to developing cultural heritage collections that support computationally-driven  ...  With funding from the Institute of Museum and Library Services, Always Already Computational held two national forums, organized multiple workshops, shared project outcomes in disciplinary and professional  ...  In thinking about ways to facilitate use and reuse, I hope to draw on my current research as a CLIR/DLF Software Curation Postdoctoral Fellow.  ... 
doi:10.5281/zenodo.3152934 fatcat:4plw2tw3tzha3bt6qwvhrqcyrq

Position Statements --- Always Already Computational: Collections as Data [article]

Jefferson Bailey, Alexandra Chassanoff, Tanya Clement, Gabrielle P. Foreman, Labanya Mookerjee, Dan Fowler, Harriett Green, Jennifer Guiliano, Juliet L. Hardesty, Christina Harlow, Greg Jansen, Richard Marciano (+16 others)
2019 Zenodo  
The statements certainly informed the work of the forum, and consequently the iterative community based development of project outcomes.  ...  salient to the scope of work described in Always Already Computational.  ...  In thinking about ways to facilitate use and reuse, I hope to draw on my current research as a CLIR/DLF Software Curation Postdoctoral Fellow.  ... 
doi:10.5281/zenodo.3066160 fatcat:loxxdkbqwfcqjcgjyymkb73uqy
Showing results 1–15 out of 25 results