Creating a Dutch Information Retrieval Test Corpus [chapter]

Djoerd Hiemstra, David van Leeuwen
Computational Linguistics in the Netherlands 2001  
Careful examination of the test collection shows that it serves as a reliable tool for the evaluation of information retrieval systems in the future.  ...  We describe in detail the characteristics of the Dutch test data, which is part of the official CLEF multilingual test corpus, and give an overview of the experimental results of companies and research  ...  We are most grateful to PCM Landelijke Dagbladen / Het Parool for providing the Dutch data.  ... 
doi:10.1163/9789004334038_012 fatcat:y3onk2grtbeifozohvdicv5eoy

A Cross-Language Approach to Historic Document Retrieval [chapter]

Marijn Koolen, Frans Adriaans, Jaap Kamps, Maarten de Rijke
2006 Lecture Notes in Computer Science  
using cross-language information retrieval techniques.  ...  Our experimental evidence is based on a collection of 17th century Dutch documents and a set of 25 known-item topics in modern Dutch.  ...  In total, the corpus contains 17,794 distinct word tokens. We created a topic set consisting of 25 modern Dutch known-item topics.  ... 
doi:10.1007/11735106_36 fatcat:yp7ckxapwnhilcb6mb6ih44h5m

Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource [article]

Stéphan Tulkens, Chris Emmery, Walter Daelemans
2016 arXiv   pre-print
In this paper we demonstrate the performance of multiple types of embeddings, created with both count and prediction-based architectures on a variety of corpora, in two language-specific tasks: relation  ...  Word embeddings have recently seen a strong increase in interest as a result of strong performance gains on a variety of tasks.  ...  Mirroring this, we created a similar evaluation set for Dutch.  ... 
arXiv:1607.00225v1 fatcat:kvzlpmarbjcf3luam4nxw25n5a

Focused retrieval and result aggregation with political data

Rianne Kaptein, Maarten Marx
2010 Information retrieval (Boston)  
This paper presents a case-study in which we use a large semi-structured data set consisting of official transcripts of meetings of the Dutch parliament for focused retrieval and result aggregation.  ...  Users reported that, compared to a standard document retrieval system, our search engine gives a better overview of the data.  ...  Conclusions and future work We have provided a worked out example of an information retrieval system for a corpus of truly semi-structured documents: the proceedings of the Dutch parliament.  ... 
doi:10.1007/s10791-010-9130-z fatcat:yc62gobhz5apnhqze3gn54arwa

Page 151 of Computational Linguistics Vol. 29, Issue 1 [page]

2003 Computational Linguistics  
and Frank Schilder “Creating a Dutch information retrieval test corpus” by Djoerd Hiemstra and David van Leeuwen “Performance grammar: A declarative definition” by Gerard Kempen and Karin Harbusch “Multi-feature  ...  PAROLE corpus” by Jesse de Does and John van der Voort van der Kleij “A named entity recognition system for Dutch” by Fien De Meulder, Walter Daelemans, and Véronique Hoste “Reference resolution in context  ... 

Cross-Language Information Retrieval with Latent Topic Models Trained on a Comparable Corpus [chapter]

Ivan Vulić, Wim De Smet, Marie-Francine Moens
2011 Lecture Notes in Computer Science  
The probabilistic interlingual representation is incorporated in a statistical language model for information retrieval.  ...  In this paper we study cross-language information retrieval using a bilingual topic model trained on comparable corpora such as Wikipedia articles.  ...  None of these works apply the bilingual LDA model in a cross-lingual information retrieval setting. Cross-language information retrieval is a well-studied research topic (e.g., [8, 19, 24, 18] ).  ... 
doi:10.1007/978-3-642-25631-8_4 fatcat:exnq7ybqdjanpjp7iesfs6xltm

The Wikipedia XML corpus

Ludovic Denoyer, Patrick Gallinari
2006 SIGIR Forum  
Content-oriented XML retrieval is an area of Information Retrieval (IR) research that is receiving an increasing interest.  ...  The article provides a description of the corpus.  ...  Entity corpus We provide an Entity Corpus where each article of the Main English Corpus has been tagged using a set of possible entity types extracted using the different categories of wikipedia.  ... 
doi:10.1145/1147197.1147210 fatcat:yawgcuzx6rgl5csrav57ldosle

Natural Language Generation from Pictographs

Leen Sevens, Vincent Vandeghinste, Ineke Schuurman, Frank Van Eynde
2015 Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)  
We present a Pictograph-to-Text translation system for people with Intellectual or Developmental Disabilities (IDD).  ...  The system translates pictograph messages, consisting of one or more pictographs, into Dutch text using WordNet links and an ngram language model.  ...  For Dutch, we created a reverse lemmatizer based on the SoNaR corpus. Each of these surface forms is a hypothesis for the language model, as described in section 5.4.  ... 
doi:10.18653/v1/w15-4711 dblp:conf/enlg/SevensVSE15 fatcat:byfv5eqnjbhhrabv43ntd6yqe4

Creating the DISEQuA Corpus: A Test Set for Multilingual Question Answering [chapter]

Bernardo Magnini, Simone Romagnoli, Alessandro Vallin, Jesús Herrera, Anselmo Peñas, Víctor Peinado, Felisa Verdejo, Maarten de Rijke
2004 Lecture Notes in Computer Science  
Finally, the result of the joint efforts was the creation of the DISEQuA (Dutch Italian Spanish English Questions and Answers) corpus, a useful and reusable resource that is freely available for the research  ...  Despite the little resources available, the three groups collaborated and managed to formulate and verify a large pool of original questions posed in three different languages: Dutch, Italian and Spanish  ...  We would like to thank all the people at ITC-irst (TCC division) who posed the necessary questions for the monolingual Italian test set.  ... 
doi:10.1007/978-3-540-30222-3_47 fatcat:nfaayaegkfhgpmfywcmwolbw3u

WikiTranslate: Query Translation for Cross-Lingual Information Retrieval Using Only Wikipedia [chapter]

Dong Nguyen, Arnold Overwijk, Claudia Hauff, Dolf R. B. Trieschnigg, Djoerd Hiemstra, Franciska de Jong
2009 Lecture Notes in Computer Science  
This paper presents WikiTranslate, a system which performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia to obtain translations.  ...  WikiTranslate is evaluated by searching with topics in Dutch, French and Spanish in an English data collection. The systems achieved a performance of 67% compared to the monolingual baseline.  ...  Introduction Cross-lingual information retrieval (CLIR) has become more important in recent years. CLIR enables users to retrieve documents in a different language than their original query.  ... 
doi:10.1007/978-3-642-04447-2_6 fatcat:7u6z45uywffdxakv7whsumxlaa

Creating Research Environments with BlackLab [chapter]

2017 CLARIN in the Low Countries  
The BlackLab search engine for linguistically annotated corpora is a recurring element in several CLARIN and other recent search and retrieval projects.  ...  This chapter describes the motivation for developing the BlackLab platform, how it has been used in actual research, and sketches future developments which will make it a more powerful tool for the creation  ...  Corpus Hedendaags Nederlands (Corpus of contemporary Dutch) The Corpus Hedendaags Nederlands (CHN) is a first step towards a monitor corpus for contemporary Dutch, integrating corpora gathered by INL in  ... 
doi:10.5334/bbi.20 fatcat:o33nd7ss7ndpncwlqzz4h3af7e

Sheffield University CLEF 2000 Submission — Bilingual Track: German to English [chapter]

Tim Gollins, Mark Sanderson
2001 Lecture Notes in Computer Science  
We compared the results of retrieval experiments using these queries, with other versions created by combining the transitive translations or created by direct translation.  ...  We investigated dictionary based cross language information retrieval using lexical triangulation. Lexical triangulation combines the results of different transitive translations.  ...  Introduction and Background Cross Language Information Retrieval (CLIR) addresses the situation where the query that a user presents to an IR system, is not in the same language as the corpus of documents  ... 
doi:10.1007/3-540-44645-1_24 fatcat:z2aeaiqbtnd5vixmymlytrvizq

Speech technology for unwritten languages

Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier (+7 others)
2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
Among others, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping.  ...  In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard combination of pre-trained speech-to-text and text-to-speech  ...  The CGN is a corpus of almost 9M words of Dutch spoken in the Netherlands and in Flanders (Belgium) in over 14 different speech styles, ranging from formal to informal.  ... 
doi:10.1109/taslp.2020.2973896 fatcat:mjhxfnrnq5g73jis6stemoogem

Relation Extraction for Open and Closed Domain Question Answering [chapter]

Gosse Bouma, Ismail Fahmi, Jori Mur
2011 Interactive Multi-modal Question-Answering  
Both methods improve the performance of the Dutch question answering system Joost.  ...  The IMIX corpus is relatively small and relation instances may contain complex noun phrases that do not occur frequently in the exact same form in the corpus.  ...  To test the effect of including these tables in our QA system Joost, we expanded the number of relevant questions in the CLEF QA test sets with a number of questions that we have created ourselves.  ... 
doi:10.1007/978-3-642-17525-1_8 dblp:series/tanlp/BoumaFM11 fatcat:jkoly4jsljaxfgfky3xs547j3i

Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora

Ivan Vulić, Wim De Smet, Marie-Francine Moens
2012 Information retrieval (Boston)  
The first focus lies on the task of cross-language information retrieval (CLIR).  ...  We confirm these findings in an alternative evaluation, where we automatically generate queries and perform the known-item search on a test subset of Wikipedia articles.  ...  for the tasks of English-Dutch and Dutch-English cross-language information retrieval.  ... 
doi:10.1007/s10791-012-9200-5 fatcat:ednqhlfih5dcphmyagg4hzz37i