Indexing and stemming approaches for the Czech language

Ljiljana Dolamic, Jacques Savoy
2009 Information Processing & Management  
This paper describes and evaluates various stemming and indexing strategies for the Czech language.  ...  Based on Czech test-collection, we have designed and evaluated two stemming approaches, a light and a more aggressive one.  ...  Acknowledgment This research was supported in part by the Swiss NSF under Grant #200021-113273. Appendix A. Description of our Czech light stemmer See Fig. A1 .  ... 
doi:10.1016/j.ipm.2009.06.001 fatcat:by4xmv2ktrdhzdbxfdmuvgaxfu

Retrieval Experiments at Morpho Challenge 2008

Paul McNamee
2008 Conference and Labs of the Evaluation Forum  
stemming are helpful; and, (3) full character n-gram indexing is the most effective form of tokenization in more morphologically complex languages.  ...  We found that: (1) rule-based stemming is effective in less morphologically complicated languages; (2) alternative methods for stemming such as unsupervised learning of morphemes and least common n-gram  ...  The differences in French and Spanish were Least Common N-gram Stems Another language-neutral approach to stemming is to select for each word, its least common ngram.  ... 
dblp:conf/clef/McNamee08b fatcat:u7rzgs2z7fd7pfdmh4x62mk55u

JHU Ad Hoc Experiments at CLEF 2008 [chapter]

Paul McNamee
2009 Lecture Notes in Computer Science  
The approach we adopted for TEL was to strip out non-content sections of records and to treat the task as ordinary full-text search using character n-grams and stemmed words.  ...  Using the provided training topics we compared character n-grams, n-gram stems, ordinary words, words automatically segmented into morphemes, and a novel form of n-gram indexing based on n-grams with character  ...  Least Common N-gram Stems Another language-neutral approach to stemming is to select for each word, its least common ngram.  ... 
doi:10.1007/978-3-642-04447-2_21 fatcat:3sqlbbailjhwpdjquucpdsvpzm

University of Chicago at the CLEF 2007 Cross Language Speech Retrieval Track

Gina-Anne Levow
2007 Conference and Labs of the Evaluation Forum  
Czech experiments explored the effect of different stemming approaches on retrieval for this morphologically rich language.  ...  The University of Chicago participated in the CLEF 2007 CL-SR track, performing monolingual retrieval for both English and Czech and cross-language French-English retrieval.  ...  Monolingual Czech Retrieval: Stemming Strategies The Czech language poses special challenges for information retrieval.  ... 
dblp:conf/clef/Levow07 fatcat:yrjqyosmwnbupgiw5puzmfc4ke

Benefit of Proper Language Processing for Czech Speech Retrieval in the CL-SR Task at CLEF 2006 [chapter]

Pavel Ircing, Luděk Müller
2007 Lecture Notes in Computer Science  
We have employed the Czech morphological analyser and tagger for that purpose.  ...  We have decided to concentrate only on the monolingual searching in the Czech test collection and investigate the effect of proper language processing on the retrieval performance.  ...  language modeling approach.  ... 
doi:10.1007/978-3-540-74999-8_95 fatcat:nblk56remnbhdjn2ondnwa4nnm

A New Czech Morphological Analyser ajka [chapter]

Radek Sedláček, Pavel Smrž
2001 Lecture Notes in Computer Science  
A brief description of the data structures used for storing morphological information as well as a discussion of the efficient storage of lexical items (stem bases of Czech words) is included too.  ...  First, we present two most important word-forming processes in Czech: inflection and derivation.  ...  On the other hand, the highly inflectional languages like Czech or Finnish present a difficulty for such simple approaches as the expansion of the dictionary is at least an order of magnitude greater  ... 
doi:10.1007/3-540-44805-5_13 fatcat:iilbu56yxnfc7kbwzh6hyjcyxu

CUNI team: CLEF eHealth Consumer Health Search Task 2018

Shadi Saleh, Pavel Pecina
2018 Conference and Labs of the Evaluation Forum  
In IRTask4, we submitted 4 runs for each language of Czech, French and German.  ...  We use this list for two systems: the first one uses 1-best-list translation to construct queries, and the second one uses a hypotheses reranker to select the best translation (in terms of retrieval performance  ...  Acknowledgments This research was supported by the Czech Science Foundation (grant n. P103/12/G084).  ... 
dblp:conf/clef/SalehP18 fatcat:u6cbjs52o5bgfm5sxgybxinu74

HPS: High precision stemmer

Tomáš Brychcín, Miloslav Konopík
2015 Information Processing & Management  
We used corpora in the Czech, Slovak, Polish, Hungarian, Spanish and English languages.  ...  In the first stage, a stemming algorithm based upon clustering, which exploits the lexical and semantic information of words, is used to prepare large-scale training data for the second-stage algorithm  ...  We also thank the Czech News Agency for providing a huge number of texts in Czech.  ... 
doi:10.1016/j.ipm.2014.08.006 fatcat:giqrh6znpfh6zduiq3p64g23ny

Report of MIRACLE team for the Ad-Hoc Track in CLEF 2007

José Miguel Goñi-Menoyo, José Carlos González Cristóbal, Julio Villena-Román, Sara Lana-Serrano
2007 Conference and Labs of the Evaluation Forum  
For this campaign, runs were submitted for the following languages and tracks: monolingual (Bulgarian, Hungarian, and Czech) and robust monolingual (French, English and Portuguese).  ...  The work carried out for this campaign has been reduced to monolingual experiments, in the standard and in the robust tracks.  ...  -07588-C03-01; and by the Madrid's R+D Regional Plan, by means of the project MAVIR (Enhancing the Access and the Visibility of Networked Multilingual Information for Madrid Community), S-0505/TIC/000267  ... 
dblp:conf/clef/Goni-MenoyoCVL07 fatcat:2tl2h5ipmbemvnsbulnxya5grm

Charles University at CLEF 2007 CL-SR Track

Pavel Ceska, Pavel Pecina
2007 Conference and Labs of the Evaluation Forum  
We employed our own morphological tagger and lemmatized the collection before indexing to deal with the rich morphology in Czech, which significantly improved our results.  ...  This paper describes a system built at Charles University in Prague for participation in the CLEF 2007 Cross-Language Speech Retrieval track.  ...  Acknowledgments This work has been supported by the Ministry of Education of the Czech Republic, projects MSM 0021620838 and #1P05ME786.  ... 
dblp:conf/clef/CeskaP07a fatcat:mtktwpe37rclvpa62olzhcj734

Czech Monolingual Information Retrieval Using Off-The-Shelf Components - the University of West Bohemia at CLEF 2007 Ad-Hoc track

Pavel Ircing, Luděk Müller
2007 Conference and Labs of the Evaluation Forum  
The effect of the blind relevance feedback was also explored. Czech morphological analyser and tagger were used for lemmatization and stop word removal.  ...  We have performed only monolingual experiments (Czech documents, Czech queries) using two incarnations of the tf.idf model: one with raw term frequency and the other with the BM25 term frequency weighting  ...  Acknowledgments This work was supported by the Grant Agency of the Czech Academy of Sciences project No. 1ET101470416 and the Ministry of Education of the Czech Republic project No. LC536.  ... 
dblp:conf/clef/IrcingM07a fatcat:wvtuc5peznegvbay3r63kigxnm

Challenges in Speech Processing of Slavic Languages (Case Studies in Speech Recognition of Czech and Slovak) [chapter]

Jan Nouza, Jindrich Zdansky, Petr Cerva, Jan Silovsky
2010 Lecture Notes in Computer Science  
We present our solutions we applied in the design of voice dictation and broadcast speech transcription systems developed for Czech.  ...  Slavic languages pose a big challenge for researchers dealing with speech technology.  ...  Acknowledgement This work was supported by grants no. 102/08/0707 by the Grant Agency of the Czech Republic and no.  ... 
doi:10.1007/978-3-642-12397-9_19 fatcat:h3tegqw37zhnrm6vk53nnc6ewa

SAPKOS: Experimental Czech Multi-label Document Classification and Analysis System [chapter]

Ladislav Lenc, Pavel Král
2015 IFIP Advances in Information and Communication Technology  
The system, which integrates the state-of-the-art machine learning and natural language processing approaches, is intended to be used by the Czech News Agency (ČTK).  ...  An interesting contribution is that, to the best of our knowledge, no other automatic Czech document classification system exists. It is also worth mentioning that the system accuracy is very high.  ...  We would also like to thank the Czech News Agency (ČTK) for support and for providing the data.  ... 
doi:10.1007/978-3-319-23868-5_24 fatcat:zigmf5hv6zebjplup2hwnfn3gy

Page 2645 of Linguistics and Language Behavior Abstracts: LLBA Vol. 28, Issue 5 [page]

1994 Linguistics and Language Behavior Abstracts: LLBA  
comparison; 9411137 Czech podle ‘by, according to’ semantics/pragmatics, Russian equivalents; 9411212 Czech přece ‘yet, all the same’/vždyť ‘indeed, really’, communicative function, Russian equivalents  ...  subject index Russia language discourse, Russian conception/changes, collective memory erasure; 9411771 Russian as a second language learning, beginner's trip to Russia account; 9409797 Russian alternative  ... 

Indexing and searching strategies for the Russian language

Ljiljana Dolamic, Jacques Savoy
2009 Journal of the American Society for Information Science and Technology  
This paper describes and evaluates various stemming and indexing strategies for the Russian language.  ...  We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language-independent approach (n-gram  ...  Acknowledgment This research was supported in part by the Swiss NSF under Grant #200021-113273.  ... 
doi:10.1002/asi.21191 fatcat:x2626h3ecfdudhnhitqbe5tlhq