Experiments in CLIR using fuzzy string search based on surface similarity
2009
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09
We found significant improvements for all the six language pairs using a method for fuzzy text search based on Surface Similarity. ...
In this paper we report these results and compare them with a baseline CLIR system and a CLIR system that uses Scaled Edit Distance (SED) for fuzzy string matching. ...
doi:10.1145/1571941.1572076
dblp:conf/sigir/SubramaniamSDV09
fatcat:lrn7qvqb4fgynkgubuepbc7q5e
Building Structured Query in Target Language for Vietnamese – English Cross Language Information Retrieval Systems
2015
International Journal of Engineering Research and Technology
Query translation is the most important component in Cross Language Information Retrieval systems using dictionary-based approach. ...
The method is based on constructing bi-lingual dictionaries, keyword extraction from source query, getting translation candidates for each keyword using mutual information and finally building structured ...
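The snippet does not give the paper's scoring formula. The sketch below shows one common way mutual information can rank translation candidates. It scores each pair of candidate translations by pointwise mutual information (PMI), estimated from document co-occurrence in a target-language corpus, and keeps the best pair. The function names, the toy corpus and the choice of document-level PMI are all illustrative assumptions, not the authors' method.

```python
import math

def pmi(x, y, docs):
    """Pointwise mutual information of terms x and y, estimated from
    document-level co-occurrence (assumption: each doc is a list of terms)."""
    n = len(docs)
    sets = [set(d) for d in docs]
    cx = sum(x in s for s in sets)              # docs containing x
    cy = sum(y in s for s in sets)              # docs containing y
    cxy = sum(x in s and y in s for s in sets)  # docs containing both
    if cxy == 0:
        return float("-inf")  # never co-occur: worst possible score
    return math.log2((cxy / n) / ((cx / n) * (cy / n)))

def best_pair(cands_a, cands_b, docs):
    """Hypothetical helper: choose the translation pair (one candidate per
    source keyword) whose members are most strongly associated."""
    return max(((a, b) for a in cands_a for b in cands_b),
               key=lambda p: pmi(p[0], p[1], docs))

# Hypothetical toy target-language corpus.
docs = [
    ["bank", "loan", "money"],
    ["bank", "loan", "interest"],
    ["river", "shore", "fish"],
    ["river", "water", "fish"],
    ["lake", "fish"],
]

print(best_pair(["bank", "shore"], ["loan", "fish"], docs))  # ('bank', 'loan')
```

Extending this from pairs to all keywords in a query, for example by summing pairwise PMI over a full candidate combination, is a natural generalisation. It is still an assumption about how such a system might be built.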
The value of C depends on the text corpus size. In our experiment, we define C = log2(12000). The second formula is based on the monolingual English IR system. ...
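For reference, the constant quoted in this snippet works out numerically as shown below. This is plain arithmetic only; the formula in which C is used is not shown here.

```latex
C = \log_2(12000) = \frac{\ln 12000}{\ln 2} \approx \frac{9.3927}{0.6931} \approx 13.55
```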
doi:10.17577/ijertv4is040317
fatcat:q7ut2zpt2bdpffqgti5iqhjzvq
The effect of bilingual term list size on dictionary-based cross-language information retrieval
2003
Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003
Bilingual term lists are extensively used as a resource for dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on ...
The contribution of named entity translation was evaluated in a cross-language experiment involving English and Chinese. ...
Acknowledgments The authors would like to thank Terry Zhao for tagging named entities in Chinese queries. This work has been supported in part by DARPA cooperative agreement N660010028910. ...
doi:10.1109/hicss.2003.1174250
dblp:conf/hicss/Demner-FushmanO03
fatcat:qdkngvr7ljf2lecaop4olfaspe
Variations on language modeling for information retrieval
2005
SIGIR Forum
Variations on Language Modeling for Information Retrieval / W. Kraaij. Enschede: Neslia Paniculata. Thesis, Enschede. With references and a summary. ISBN 90-75296-09-6 ...
In the following sections we will describe the tools we used for conflation based on approximate string matching and the experimental results. Fuzzy conflation architecture. ...
in the indexing dictionary using the fuzzy index (on-line). ...
doi:10.1145/1067268.1067291
fatcat:h23lp5aqfvfu5iecwnihfme244
Domain Adaptation for Statistical Machine Translation
[article]
2018
arXiv
pre-print
The third one is the out-of-vocabulary words (OOVs) problem. In-domain training data are often scarce with low terminology coverage. ...
The second one is language style due to the fact that texts from different genres are always presented in different syntax, length and structural organization. ...
One string-difference metric, edit distance, also known as Levenshtein distance (Levenshtein, 1966), is a widely used similarity measure. ...
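The edit (Levenshtein) distance mentioned here, which also underlies the fuzzy string matching in several entries above, can be computed with the standard dynamic-programming recurrence. The sketch below uses illustrative function names. The length-normalised similarity is a common normalisation added as an assumption. It is not necessarily the Scaled Edit Distance (SED) used in the SIGIR '09 paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(a, b) / longest

print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept, so memory is O(len(b)) while time stays O(len(a) * len(b)).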
arXiv:1804.01760v1
fatcat:ik6wji6vlbayddavb4wiun7df4
Word-to-Word Models of Translational Equivalence
[article]
1998
arXiv
pre-print
Analysis of the expected behavior of these biases in the presence of sparse data predicts that they will result in more accurate models. ...
First, most words translate to only one other word. Second, bitext correspondence is noisy. This article presents methods for biasing statistical translation models to reflect these properties. ...
Acknowledgements Many of the ideas in this paper came from enlightening correspondence with Ken Church, Mike Collins, Ido Dagan, Jason Eisner, Steven Finch, George Foster, Djoerd Hiemstra, Adwait Ratnaparkhi ...
arXiv:cmp-lg/9805006v1
fatcat:66ldvfitx5akrhzkywaz4gqioq
Statistical source expansion for question answering
2011
Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11
The statistical models use a comprehensive set of features to predict the topicality and quality of text nuggets based on topic models built from seed content, search engine rankings and surface characteristics ...
These differences are statistically significant and result in noticeable gains in search performance in a task-based evaluation on QA datasets. ...
Often different surface strings are used to refer to the same topic (e.g. ...
doi:10.1145/2063576.2063632
dblp:conf/cikm/SchlaeferCNFZF11
fatcat:whoy62klazctbdo4p57wbevkdu
Natural Language Processing as a Foundation of the Semantic Web
2007
Foundations and Trends® in Web Science
Acknowledgments CB would like to thank José Iria for help in discussing and formulating the Abraxas model, and Ziqi Zhang for undertaking parts of the implementation and evaluation. ...
CB has been supported in this work by the EPSRC project Abraxas (http://nlp.shef.ac.uk/abraxas/) under grant number GR/T22902/01 and the EC funded IP Companions IST-034434 (www.companionsproject.org) ...
mere strings of key words, and also because the classic IR paradigm using very long queries is being replaced by the Google paradigm of search based on two or three highly ambiguous key terms. ...
doi:10.1561/1800000002
fatcat:n2xfw3qdhverrokidb2globwyq
Quantifying Cross-lingual Semantic Similarity for Natural Language Processing Applications
2015
We model cross-lingual similarity both based on alignment scores and full translations and combine the models in a statistical learning framework. ...
We present an analysis of the kind of noise present in our data, a method to filter the noise and experiments on its influence on MT quality. ...
Usually, string similarity is used to find good fuzzy matches. We strive to go beyond the surface and enable the use of monolingual resources for retrieving fuzzy matches at the same time. ...
doi:10.11588/heidok.00019046
fatcat:lonvu2wzujcplhqhpyjdzxidqm
Cost-to-Go Function Approximation
[chapter]
2017
Encyclopedia of Machine Learning and Data Mining
Mitchell's (1982) candidate-elimination algorithm performs a bidirectional search in the hypothesis space. ...
These two sets form two boundaries on the version space. ...
In: Furukawa K,
Cross-References ...
doi:10.1007/978-1-4899-7687-1_100093
fatcat:vse7ncdqs5atlosjhz7fhlj3im
Application of NLP in Indian Languages in Information Retrieval (Language in India: Strength for Today and Bright Hope for Tomorrow)
unpublished
"Experiments in CLIR using fuzzy string search based on surface similarity" (2009) discusses Cross Language Information Retrieval (CLIR) between languages of the same origin. ...
Bhasa is a corpus-based search engine and summarizer that indexes documents and retrieves information by keyword using the vector space retrieval method. ...
of natural languages using a rule-based approach was discussed in 2002. ...
fatcat:2n7r3zdesbghpoqkyywawev5em
Building Arabic Corpora: Concepts, Methodologies, Tools, and Experiments
2019
Zenodo
The survey presents a summarisation of data sources and different compilation methods used in relation to corpus characteristics like size and time consumed during the compilation process. ...
The prime motivation for carrying out the research in this thesis comes from the limited research on Arabic corpus linguistics and the lack of available resources, standards, and efficient tools that can ...
As mentioned, Sinclair (2005) formulates the overall instructions proposed by the previous authors in ten fundamental criteria to follow in the design and the compilation of a general corpus: ...
doi:10.5281/zenodo.4441159
fatcat:nwix7lrzrbaxpgasing7mgdtwq
Parton_columbia_0054D_11044.pdf
[article]
2017
We use the TLIR corpus to carry out a task-embedded MT evaluation, which shows that our CLIR models address lost in retrieval errors, resulting in higher TLIR recall; and that the APEs successfully correct ...
Furthermore, we develop an analysis framework for isolating the impact of MT errors on CLIR and on result understanding, as well as evaluating the whole TLIR task. ...
The features are based on surface strings as well as bilingual POS tags. ...
doi:10.7916/d81260sm
fatcat:xfjkg2m4gnc6fpvrsvee77eh5a
Final Report --- Always Already Computational: Collections as Data
[article]
2019
Zenodo
From 2016 to 2018, Always Already Computational: Collections as Data documented, iterated on, and shared current and potential approaches to developing cultural heritage collections that support computationally-driven ...
With funding from the Institute of Museum and Library Services, Always Already Computational held two national forums, organized multiple workshops, shared project outcomes in disciplinary and professional ...
In thinking about ways to facilitate use and reuse, I hope to draw on my current research as a CLIR/DLF Software Curation Postdoctoral Fellow. ...
doi:10.5281/zenodo.3152934
fatcat:4plw2tw3tzha3bt6qwvhrqcyrq
Position Statements --- Always Already Computational: Collections as Data
[article]
2019
Zenodo
The statements certainly informed the work of the forum, and consequently the iterative community-based development of project outcomes. ...
salient to the scope of work described in Always Already Computational. ...
In thinking about ways to facilitate use and reuse, I hope to draw on my current research as a CLIR/DLF Software Curation Postdoctoral Fellow. ...
doi:10.5281/zenodo.3066160
fatcat:loxxdkbqwfcqjcgjyymkb73uqy