187 Hits in 5.7 sec

Engineering and utilizing a stopword list in Greek Web retrieval

Fotis Lazarinis
2007 Journal of the American Society for Information Science and Technology  
The article presents the phases of engineering a stopword list for the Greek language as well as the problems faced and the inferences deduced from this procedure.  ...  A set of 32 authentic queries are proposed by users and are run in Google with and without the stopwords.  ...  Two further studies discussed the effect of stopword elimination in Greek Web retrieval and in utilizing search engines of e-shops (Lazarinis, 2005, 2007a).  ... 
doi:10.1002/asi.20648 fatcat:7tydss34sjgmtoxfy5rrliwa5i

Current research issues and trends in non-English Web searching

Fotis Lazarinis, Jesús Vilares, John Tait, Efthimis N. Efthimiadis
2009 Information retrieval (Boston)  
A significant number of papers are reviewed and the research issues investigated in these studies are categorized in order to identify the research questions and solutions proposed in these papers.  ...  With increasingly higher numbers of non-English language web searchers the problems of efficient handling of non-English Web documents and user queries are becoming major issues for search engines.  ...  The authors also acknowledge the assistance of Jennifer Rohan in compiling part of the bibliography and the University of Washington Information School for resources. Prof.  ... 
doi:10.1007/s10791-009-9093-0 fatcat:ueniqckvaffe5gftjwkkrlki7y

Bootstrapping the Albanian Information Retrieval

Nikitas N. Karanikolas
2009 2009 Fourth Balkan Conference in Informatics  
Keywords: information retrieval; stemming algorithm; stopword list.  ...  As a consequence of our study (investigation) we provide a naive-single-step (rudimentary) stemming algorithm for the Albanian language. A stopword list is also created.  ...  Without their help, this work would still be in my mind.  ... 
doi:10.1109/bci.2009.16 dblp:conf/bci/Karanikolas09 fatcat:cdtpjs4enjeadjitf43sqi6bue

An Evaluation of Greek-English Cross Language Retrieval within the CLEF Ad-Hoc Bilingual Task

Polyxeni Katsiouli, Theodore Kalamboukis
2009 Conference and Labs of the Evaluation Forum  
In particular we use our disambiguation experiments with statistical query translation on a Greek-English cross language retrieval system using Google's n-grams.  ...  This article describes an experimental investigation on the use of resources from the web on a common Natural Language Processing (NLP) problem, that of Word Sense Disambiguation (WSD).  ...  It contains two subsystems: a multilingual subsystem, for retrieving bilingual documents (a collection of scientific articles in medicine available in the Greek web) and a cross language subsystem, which  ... 
dblp:conf/clef/KatsiouliK09 fatcat:y5xak4w7avfg5eu25groenarxe

UGLEO: A WEB BASED INTELLIGENCE CHATBOT FOR STUDENT ADMISSION PORTAL USING MEGAHAL STYLE

Anneke Annassia Putri Siswadi, Avinanta Tarigan
2018 Jurnal Ilmiah Informatika Komputer  
It needs a service that can serve them anytime and anywhere.  ...  Therefore, this research is developing the UGLeo as a web based QA intelligence chatbot application for Gunadarma University's student admission portal.  ...  Stopwords used in this research are the stopwords written in [17] , lists of determiners, conjunctions, prepositions in Indonesian language, and the common words.  ... 
doi:10.35760/ik.2018.v23i3.2373 fatcat:ducpro3nbngyzfpo7xot243jgm

An Efficient Mechanism for Stemming and Tagging: The Case of Greek Language [chapter]

Giorgos Adam, Konstantinos Asimakis, Christos Bouras, Vassilis Poulopoulos
2010 Lecture Notes in Computer Science  
Our system is constructed in such a way that can be easily adapted to any existing system and support it with recognition and analysis of Greek words.  ...  In an era that, searching the WWW for information becomes a tedious task, it is obvious that mainly search engines and other data mining mechanisms need to be enhanced with characteristics such as NLP  ...  A Greek search engine that was constructed recently does utilize a stemmer; though, no information is publicly available except for a reference to the complete work that was done for the specific search  ... 
doi:10.1007/978-3-642-15393-8_44 fatcat:hfighxemlfasln6bg66yrfq76a

Boosting Venue Page Rankings for Contextual Retrieval-Georgetown at TREC 2013 Contextual Suggestion Track

Jiyun Luo, Hui Yang
2013 Text Retrieval Conference  
Since the Open Web is not used in our submissions, the task is essentially a retrieval task instead of a result merging task.  ...  Ideal relevant documents for this task should be a list of Web pages each of which is a venue's homepage, which we call a "venue page".  ...  It was essentially a result merging task that utilizes a personalized learning to rank framework to merge top returned results from existing commercial search engines such as Google Place and Yelp.  ... 
dblp:conf/trec/LuoY13 fatcat:xnw5y7oiufc7lm7isjt3n2t3ay

Melange: Components for Cross-Lingual Retrieval

Max Pfingsthorn, Koen E. A. van de Sande, Vladimir Nedovic
2005 Conference and Labs of the Evaluation Forum  
We present the finalized version of our cross-lingual search engine Melange, and results obtained by running it on WebCLEF topics in an attempt to solve Mixed Monolingual and Multilingual tasks.  ...  These are our data extraction and indexing methods, our language detection module (with an accuracy of 88% on WebCLEF query strings), PageRank ranking scheme and query translation.  ...  We would like to thank our instructors, Maarten de Rijke and Gilad Mishne, for their information-rich course on search engines. Our workload had never been that high before.  ... 
dblp:conf/clef/PfingsthornSN05 fatcat:4kmwaxuivna37dnkmnhza2w54i

@Note: A workbench for Biomedical Text Mining

Anália Lourenço, Rafael Carreira, Sónia Carneiro, Paulo Maia, Daniel Glez-Peña, Florentino Fdez-Riverola, Eugénio C. Ferreira, Isabel Rocha, Miguel Rocha
2009 Journal of Biomedical Informatics  
conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows to correct annotations and a Text Mining Module supporting  ...  Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text  ...  Acknowledgments We thank Alberto Simões and José João Almeida for helping deploy the text rewriting system and their expert suggestions in Natural Language Processing issues.  ... 
doi:10.1016/j.jbi.2009.04.002 pmid:19393341 fatcat:zqgrd47mg5dovjnzd2fjhvsoda

Experiments in Terabyte Searching, Genomic Retrieval and Novelty Detection for TREC 2004

Stephen Blott, Fabrice Camous, Paul Ferguson, Georgina Gaughan, Cathal Gurrin, Gareth J. F. Jones, Noel Murphy, Noel E. O'Connor, Alan F. Smeaton, Peter Wilkins, Oisín Boydell, Barry Smyth
2004 Text Retrieval Conference  
In addition, we present a general description of a text retrieval engine that we have developed in the last year to support our experiments into large scale, distributed information retrieval, which underlies  ...  In TREC2004, Dublin City University took part in three tracks, Terabyte (in collaboration with University College Dublin), Genomic and Novelty.  ...  Part of this material is based on work supported by Science Foundation Ireland under Grant Nos. 03/IN.3/I361.  ... 
dblp:conf/trec/BlottCFGGJMOSWBS04 fatcat:jotvkjbkc5cihjhh3d4rmzpzbq

Software tools and test data for research and testing of page-reading OCR systems

Thomas A. Nartker, Stephen V. Rice, Steven E. Lumos, Elisa H. Barney Smith, Kazem Taghva
2005 Document Recognition and Retrieval XII  
These performance comparisons were published in previous ISRI Test Reports and are also provided.  ...  The paper concludes with a summary of the programs, test data, and documentation that is available and gives the URL where they can be located.  ...  This program accepts an ordered list of N stopwords in stopwordfile and a wordacc_report and produces the non-stopword accuracy of this report for each subset of stopwords beginning with 0 through all  ... 
doi:10.1117/12.587293 dblp:conf/drr/NartkerRL05 fatcat:il4ehpf7sjdlvnebzttekknf6e

Plagiarism Detection for Indonesian Texts

Lucia D. Krisnawati, Klaus U. Schulz
2013 Proceedings of International Conference on Information Integration and Web-based Applications & Services - IIWAS '13  
I thank my sisters in women small group, MICC, for their prayers, spiritual support and simply being there in time of troubles.  ...  This research was conducted with the support of a DAAD-Indonesian German Scholarship Program (DAAD-IGSP).  ...  Given a document, stopword list and a segment length, all stopwords defined in the stopword set are extracted.  ... 
doi:10.1145/2539150.2539213 dblp:conf/iiwas/KrisnawatiS13 fatcat:r6p2h4oiq5fi3mhlazokatknrq

GeoMantis: Inferring the Geographic Focus of Text using Knowledge Bases

Christos T. Rodosthenous, Loizos Michael
2018 Proceedings of the 10th International Conference on Agents and Artificial Intelligence  
We consider the problem of identifying the geographic focus of a document.  ...  Our results give evidence that using general-purpose knowledge bases and ontologies can, in certain cases, outperform even specialized tools.  ...  THE GeoMantis SYSTEM GeoMantis (from the Greek words Geo that means earth and Mantis, that means oracle or guesser) is a web application designed for inferring the geographic focus of documents and web  ... 
doi:10.5220/0006588501110121 dblp:conf/icaart/RodosthenousM18 fatcat:k4ymqie3unh7tgck2b3qctmi3u

Finding translations in scanned book collections

Ismet Zeki Yalniz, R. Manmatha
2012 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12  
show that the proposed method retrieves translations of books with an average MAP score of 1.0 and a speed of 10K book pair comparisons per second on a single core.  ...  Both sequences are now in German and can, therefore, be aligned using a Longest Common Subsequence (LCS) algorithm.  ...  Stopwords for each language (English, French, German, Greek, Italian, Latin and Spanish) are learned from 20 noise free ebooks downloaded from the Gutenberg archive.  ... 
doi:10.1145/2348283.2348347 dblp:conf/sigir/YalnizM12 fatcat:cre3szxurzbllkxecqxhu4bpki

An exploration of the principles underlying redundancy-based factoid question answering

Jimmy Lin
2007 ACM Transactions on Information Systems  
Through contrastive and ablation experiments with Aranea, a system that has performed well in several TREC QA evaluations, this work examines the underlying assumptions and principles behind redundancy-based  ...  redundancy-based methods encode a substantial amount of knowledge in the form of heuristics.  ...  Web search engines frequently return tens if not hundreds of thousands of "potentially relevant" pages in response to a query, leaving users to manually sift through long hit lists.  ... 
doi:10.1145/1229179.1229180 fatcat:l2pwnam7qvh6xbp6a3krpoidpq