Ranking large document collections by a state space search

Michael D. Gordon
1991 Information Processing & Management  
An algorithm is described for ordering by probability of relevance overlapping document subsets from which a searcher should choose the next document.  ...  INTRODUCTION A collection of documents described by index terms defines a search space of document subsets.  ...  SEARCHING A DOCUMENT STATE SPACE With this terminology in hand, we describe an algorithm for searching a document retrieval state space. The algorithm seeks multiple goal nodes.  ... 
doi:10.1016/0306-4573(91)90029-l fatcat:45gfx42ksjgwnkst65rzlgw7ki

Instability of Relevance-Ranked Results Using Latent Semantic Indexing for Web Search

Houssain Kettani, Gregory B. Newby
2010 2010 43rd Hawaii International Conference on System Sciences  
a fuzzy match of a topic to the original term by document matrix.  ...  This methodology is used to rank text documents, such as Web pages or abstracts, based on their relevance to a topic.  ...  Introduction Searching large collections of documents is challenging for computational, practical and humanistic reasons.  ... 
doi:10.1109/hicss.2010.235 dblp:conf/hicss/KettaniN10 fatcat:2xtfodoxgvffdgf4tsnrgfndgy

Improving News Ranking by Community Tweets [article]

Xin Shuai and Xiaozhong Liu and Johan Bollen
2012 arXiv   pre-print
Here, we propose a Community Tweets Voting Model (CTVM) to re-rank Google and Yahoo news search results on the basis of open, large-scale Twitter community data.  ...  Users frequently express their information needs by means of short and general queries that are difficult for ranking algorithms to interpret correctly.  ...  Content-based methods rank documents according to how their content matches a given search query, and may rely on vector space models [17] and language models [13] .  ... 
arXiv:1202.3185v2 fatcat:bmj5c3n6srgpdivenovv2rptbi

Improving news ranking by community tweets

Xin Shuai, Xiaozhong Liu, Johan Bollen
2012 Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion  
Here, we propose a Community Tweets Voting Model (CTVM) to re-rank Google and Yahoo news search results on the basis of open, large-scale Twitter community data.  ...  Users frequently express their information needs by means of short and general queries that are difficult for ranking algorithms to interpret correctly.  ...  Content-based methods rank documents according to how their content matches a given search query, and may rely on vector space models [17] and language models [13] .  ... 
doi:10.1145/2187980.2188265 dblp:conf/www/ShuaiLB12 fatcat:tzj4vnasovcn3lzdgmenfmwmv4

Text-Based Content Search and Retrieval in Ad-hoc P2P Communities [chapter]

Francisco Matias Cuenca-Acuna, Thu D. Nguyen
2002 Lecture Notes in Computer Science  
Our algorithm is based on a state-of-the-art text-based document ranking algorithm: the vector-space model, instantiated with the TFxIDF ranking rule.  ...  Furthermore, our algorithm preserves the main flavor of TFxIDF by retrieving close to the same set of documents for any given query.  ...  We chose to adapt a well-known state-of-the-art text-based document ranking algorithm, the vector-space model, instantiated with the TFxIDF ranking rule.  ... 
doi:10.1007/3-540-45745-3_20 fatcat:ts755sytqratfbhhvjc4cgpdvq

Word embedding for French natural language in healthcare: a comparative study (Preprint)

Emeric Dynomant, Romain Lelong, Badisse Dahamna, Clément Massonaud, Gaétan Kerdelhué, Julien Grosjean, Stéphane Canu, Stefan J Darmoni
2018 JMIR Medical Informatics  
The aim of this study was to compare embedding methods trained on a corpus of French health-related documents produced in a professional context.  ...  These data are not structured and cover a wide range of documents produced in a clinical setting (discharge summary, procedure reports, and prescriptions).  ...  They also assume that almost all documents in a collection are non-relevant to a query (which is very close to truth given that collections are large)  ... 
doi:10.2196/12310 pmid:31359873 pmcid:PMC6690161 fatcat:jntc3ylh3rdwzkzx4ivhernsha

C-DLSI: An Extended LSI Tailored for Federated Text Retrieval [article]

Qijun Zhu, Dandan Li, Dik Lun Lee
2018 arXiv   pre-print
Most of the existing methods only apply traditional IR techniques that treat each text collection as a single large document and utilize term matching to rank the collections.  ...  complete search engine by itself.  ...  LSI decompose A to a lower dimensional vector space k by retaining only the largest k singular values, where 1 ≤ k < rank(A).  ... 
arXiv:1810.02579v1 fatcat:4jbiglx5yrhcdo2bydt37m6adm

The History of Information Retrieval Research

M. Sanderson, W. B. Croft
2012 Proceedings of the IEEE  
computers to search for items that are relevant to a user's query.  ...  This paper describes a brief history of the research and development of information retrieval systems starting with the creation of electro-mechanical searching devices, through to the early adoption of  ...  Nowadays, the ranking formulas proposed by Salton are rarely used, however, viewing documents and queries as vectors in a large dimensional space is still common.  ... 
doi:10.1109/jproc.2012.2189916 fatcat:edydiz3g6fbjvgldote2k7u56y

To index or not to index

Diego Arroyuelo, Senén González, Mauricio Marin, Mauricio Oyarzún, Torsten Suel
2012 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12  
Positional ranking functions, widely used in web search engines, improve result quality by exploiting the positions of the query terms within documents.  ...  However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index.  ...  Since the query results are usually large, the result set must be ranked by relevance. For large document collections, the data stored in inverted indexes requires considerable amounts of space.  ... 
doi:10.1145/2348283.2348320 dblp:conf/sigir/ArroyueloGMOS12 fatcat:zhlsxinz35g4xo6avn2wdxrrby

An Unsupervised Technical Readability Ranking Model by Building a Conceptual Terrain in LSI

Shoaib Jameel, Xiaojun Qian
2012 2012 Eighth International Conference on Semantics, Knowledge and Grids  
Our model has achieved significant improvement in ranking documents by technical readability.  ...  We address this problem in domain-specific search using a conceptual model where the sequence of the terms in a document is modeled as a connected conceptual terrain.  ...  If a sequence terrain is comprised of series of domain-specific terms which are separated by large semantic distance, then a typical reader will probably leave the document and search for a more technically  ... 
doi:10.1109/skg.2012.20 dblp:conf/skg/JameelQ12 fatcat:vvkqihxx4vfkrept4nyeacr7dm

Framework for Document Retrieval using Latent Semantic Indexing

Neelam Phadnis, Jayant Gadge
2014 International Journal of Computer Applications  
The state of the art for traditional IR techniques is to find relevant documents depending on matching words in users' query with individual words in text collections.  ...  With the availability of large scale inexpensive storage the amount of information stored by organizations will increase.  ...  The simplest form of document retrieval is linear scan through documents. But it is not efficient when we need to search large document collections quickly.  ... 
doi:10.5120/16414-6065 fatcat:n4zx75guezf5ngzsful5t2j7l4

Document Ranking using Customizes Vector Method

Priyanka Mesariya, Nidhi Madia
2017 International Journal of Trend in Scientific Research and Development  
Archive positioning is fundamentally looking the pertinent record as per their rank. Document ranking is basically search the relevant document according to their rank.  ...  Vector space model is traditional and widely applied information retrieval models to rank the web page based on similarity values.  ...  Majority of internet users rely on search engines for extracting information by providing a query from any large dataset.  ... 
doi:10.31142/ijtsrd125 fatcat:wybbvrh3kff3jc7mwl7lvidyra

Dual-Sorted Inverted Lists in Practice [chapter]

Roberto Konow, Gonzalo Navarro
2012 Lecture Notes in Computer Science  
of the art for conjunctive queries, while it offers an attractive space/time tradeoff when both kinds of queries are of interest.  ...  We implement a recent theoretical proposal to represent inverted lists in memory, in a way that docid-sorted and weight-sorted lists are simultaneously represented in a single wavelet tree data structure  ...  Given a text collection containing a set of D documents, where each document has a unique document identifier (docid), an inverted index is an array of lists or postings.  ... 
doi:10.1007/978-3-642-34109-0_31 fatcat:whwhixoleveb5nvaz4tylf6uvi

Formal definitions of web information search

Su Yan, C. Lee Giles, Bernard J. Jansen
2007 Proceedings of the American Society for Information Science and Technology  
Grounded on previous work, we then propose a new Web information retrieval model based on both objective and subjective criteria.  ...  Appropriate Web models and theories for search engines will make web search and information retrieval problems easier to formulate and comprehend.  ...  It means that the retrieval set is a collection for the pairs of a document and its rank value.  ... 
doi:10.1002/meet.1450430152 fatcat:rw75gitribaoxjnimbchshrwxe

A Review On Important Aspects Of Information Retrieval

Yogesh Gupta, Ashish Saini, A.K. Saxena
2014 Zenodo  
such as document representation, similarity measure and query expansion.  ...  This paper presents a comprehensive study, which discusses not only emergence and evolution of information retrieval but also includes different information retrieval models and some important aspects  ...  This ranked retrieval approach to search was taken up by IR researchers, who over the following decades refined and revised the means by which documents were sorted in relation to a query.  ... 
doi:10.5281/zenodo.1336508 fatcat:gmyzki3nwng7bfmzs7pzvp4fse