An Architecture for Efficient Document Clustering and Retrieval on a Dynamic Collection of Newspaper Texts

Alan F. Smeaton, Mark Burnett, Francis Crimmins, Gerard Quinn
1998 unpublished
An implementation of the clustering on an archive of the Irish Times newspaper is reported here.  ...  In this paper we describe a technique for clustering a collection of documents such as a collection of online newspapers which uses a number of short-cuts to make the process computable for large collections  ...  and by Dublin City University, and the provision of data by the Irish Times newspaper.  ... 
doi:10.14236/ewic/irsg1998.10 fatcat:bni5cgrf2bhyjj5sj4c2od3r6i

Improving Weak Queries using Local Cluster Analysis as a Preliminary Framework

Amir H. Jadidinejad, Hossein Sadr
2015 Indian Journal of Science and Technology  
Consequently, various experiments are conducted to evaluate the impact of the proposed architecture and different clustering variants in a large Persian text collection created based on TREC specifications  ...  In a web retrieval task, the query is usually short and the users expect to find the relevant documents in the first several result pages.  ...  Document clustering can be performed on the collection as a whole (static clustering) [1,10,26,32,37], but post-retrieval document clustering (dynamic clustering) has been shown to produce superior results  ... 
doi:10.17485/ijst/2015/v8i15/46754 fatcat:u2qpykba5bgcfo3pxip2ljd27u

The Multi-model DBMS Architecture and XML Information Retrieval [chapter]

Arjen P. de Vries, Johan A. List, Henk Ernst Blok
2003 Lecture Notes in Computer Science  
Consider for example an XML collection representing a newspaper archive, and the information need 'recent English newspaper articles about Willem-Alexander dating Maxima'. 3 This can be expressed as the  ...  goal to identify documents relevant for satisfying a user's information need) while data retrieval involves exact match, that is, checking a data collection for presence or absence of (precisely specified  ...  An instantiation of the retrieval model requires the collection of component text and computation of term statistics, as well as calculating a score for the component under consideration.  ... 
doi:10.1007/978-3-540-45194-5_12 fatcat:qtjesgap3zh3dbivuzrzbgn4ee


Dongdong Shan, Wayne Xin Zhao, Rishan Chen, Baihan Shu, Ziqi Wang, Junjie Yao, Hongfei Yan, Xiaoming Li
2012 Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '12  
We present EventSearch, a system for event extraction and retrieval on four types of news-related historical data, i.e., Web news articles, newspapers, TV news program, and micro-blog short messages.  ...  The newspaper and TV news video clips also span from 2001 to 2011. The system, upon a user query, returns a list of event snippets from multiple data sources.  ...  ACKNOWLEDGEMENTS This work is supported by NSFC Grant (60933004, 70903008, 61073082) and HGJ Grant 2011ZX01042-001-001.  ... 
doi:10.1145/2339530.2339781 dblp:conf/kdd/ShanZCSWYYL12 fatcat:ad6m5tjqsjgw5g5qv2ecmlqalu

Intelligent Extended Clustering Genetic Algorithm for Information Retrieval Using BPEL

N. El-Bathy, C. Gloster, I. Kateeb, G. Stein
2012 American Journal of Intelligent Systems  
It improves the efficiency and performance for retrieving a proper information results that satisfy user's needs.  ...  Finally, IECGA for data clustering gives the user needed documents based on similarity between query matching and relevant document mechanism.  ...  The future work will focus on creating such a solution in the form of an intelligent information retrieval lifecycle architecture based adapted DNA algorithm using Service-Oriented Architecture (SOA).  ... 
doi:10.5923/j.ajis.20110101.02 fatcat:6s64jq4vgbgchgqc7xntsdckgm

Managing complexity in a distributed digital library

I.H. Witten, R.J. McNab, S. Jones, M. Apperley, D. Bainbridge, S.J. Cunningham
1999 Computer  
Acknowledgments We gratefully acknowledge the help of Carl Gutwin, who developed one of the experimental user interfaces.  ...  Many thanks are also due to Stefan Boddie, Te Taka Keegan, Craig Nevill-Manning, and Lloyd Smith, who contributed to this work in various ways.  ...  Presently the system includes two search and retrieval methods-one for text and the other for music. Our uniform architecture accommodates them and their vastly different document types.  ... 
doi:10.1109/2.745723 fatcat:oadbs4e63zd3jbvm4u76xbeo6m

The New Zealand Digital Library

2008 ChoiceReviews  
Acknowledgments We gratefully acknowledge the help of Carl Gutwin, who developed one of the experimental user interfaces.  ...  Many thanks are also due to Stefan Boddie, Te Taka Keegan, Craig Nevill-Manning, and Lloyd Smith, who contributed to this work in various ways.  ...  Presently the system includes two search and retrieval methods-one for text and the other for music. Our uniform architecture accommodates them and their vastly different document types.  ... 
doi:10.5860/choice.46-0007 fatcat:ewatsd2jsng5le3nopvu5ybrvq

Intelligent surveillance lifecycle architecture for epidemiological data clustering using Twitter and novel genetic algorithm

Naser El-Bathy, Clay Gloster, Ghassan Azar, Mohammed El-Bathy, Gordon Stein, Ricky Stevenson
2014 IEEE International Conference on Electro/Information Technology  
The four (4) major components of the system are: a search engine system, an information extraction sub-system, an information retrieval sub-system, and genetic algorithm for data clustering.  ...  This algorithm eliminates irrelevant information and redundancy for data clustering as it improves search performance.  ...  An Intelligent Internet Search Technology (I²ST) using our modified genetic algorithm and service-oriented architecture (SOA) improves the efficiency of retrieving and clustering data.  ... 
doi:10.1109/eit.2014.6871753 dblp:conf/eit/El-BathyGAESS14 fatcat:tkmj64k4ebhapmay5zopyhhore

Clustered Distributed Index for Efficient Text Retrieval Using Threads

M Basavaraju, R Prabhakar
2010 International Journal of Grid Computing & Applications  
In this research paper, a novel method of improving the clustered distributed indices for efficient text retrieval using threads is presented.  ...  The indexing stage scans for text of all the documents and builds a list of search terms, often called an index.  ...  Various access methods have been developed to support efficient search and retrieval over text document collections.  ... 
doi:10.5121/ijgca.2010.1201 fatcat:tx5tounsuzdh7eax6fmu42bnui

Search engines and Web dynamics

Knut Magne Risvik, Rolf Michelsen
2002 Computer Networks  
Furthermore, we use the FAST Search Engine architecture as a case study for showing some possible solutions for web dynamics and search engines.  ...  The service is running live at and major portals worldwide with more than 30 million queries a day, about 700 million full-text documents, a crawl base of 1.8 billion documents, updated  ...  Document content represents the bulk of the data, and we have optimized our storage system for efficiently writing or updating individual documents and for streaming the entire document collection to the  ... 
doi:10.1016/s1389-1286(02)00213-x fatcat:qbopdtdrqndppihti5kutnszvy

Language Resources for Historical Newspapers: the Impresso Collection

Maud Ehrmann, Matteo Romanello, Simon Clematide, Philipp Ströbel, Raphaël Barman
2020 Zenodo  
In this context, this paper presents a collection of historical newspaper data sets composed of text and image resources, curated and published within the context of the 'impresso - Media Monitoring of  ...  Yet, the application of text processing tools on historical documents in general, and historical newspapers in particular, poses new challenges, and crucially requires appropriate language resources.  ...  Authors also gratefully acknowledge the financial support of the Swiss National Science Foundation (SNSF) for the project impresso -Media Monitoring of the Past under grant number CR-SII5 173719.  ... 
doi:10.5281/zenodo.4641901 fatcat:glusmzr2nfg3zbc7lxe2mkvxzq

Research Domain Selection using Naive Bayes Classification

Selvani Deepthi Kavila, Radhika Y
2016 International Journal of Mathematical Sciences and Computing  
Research Domain Selection plays an important role for researchers to identify a particular document based on their discipline or research areas.  ...  Primary area and Sub area of the documents are identified by applying pre-processing and text classification techniques. Naive Bayes classifier is used to find the probability of various areas.  ...  An effective pre-processor represents the document efficiently in terms of both space (for storing the document) and time (for processing retrieval requests) requirements and maintain good retrieval performance  ... 
doi:10.5815/ijmsc.2016.02.02 fatcat:oshrhsqmdbgkzc4l7phxdhas2i

Guest Editorial: Automated Big Data Analysis for Social Multimedia Network Environments

Changhoon Lee
2016 Multimedia tools and applications  
Most previous works have focused on analyzing long documents such as blogs and hence they are not effective for very short and possibly informal SNS documents.  ...  The paper entitled "Multi-Scale Local Structure Patterns Histogram for Describing Visual Contents in Social Image Retrieval Systems," by Baik et al. [1] proposes a local descriptor for personalized social  ...  The last paper entitled "Image retrieval technique using the clustering based on rearranged radon transform," by An et al.  ... 
doi:10.1007/s11042-016-3838-8 fatcat:ior46rwjhbhb3evvu6pygemgje


Benjamin E. Teitler, Michael D. Lieberman, Daniele Panozzo, Jagan Sankaranarayanan, Hanan Samet, Jon Sperling
2008 Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems - GIS '08  
A new system named NewsStand is presented that collects, analyzes, and displays news stories in a map interface, thus leveraging on their implicit geographic content.  ...  NewsStand monitors RSS feeds from thousands of online news sources and retrieves articles within minutes of publication.  ...  We use the vector space model [34] of documents, often used in text mining and information retrieval.  ... 
doi:10.1145/1463434.1463458 dblp:conf/gis/TeitlerLPSSS08 fatcat:6kaxi4dlqjbmvb7dlxeaxwhm44

Intelligent Search Lifecycle Architecture for Mass Media Using SOA

Naser El-Bathy, Clay Gloster, Ghassan Azar, Mohammed El-Bathy
2012 Architecture Research  
The phases also include the procedures needed for eliminating the gaps between the state of "to be" and the IT processes state, and the plan approval.  ...  In this study, the IT strategic plan follows a timetable via a road map and is exposed to alteration based on the business process changes.  ...  Information retrieval text-based processes An intelligent Web is a web mining that its essential benefit is satisfying the users' needs, ranking the resources based on the user's concerns, relating resources  ... 
doi:10.5923/j.arch.20120204.01 fatcat:xlnqzu5dfbe3zcpczmnruqe2wi