Page 58 of Library & Information Science Abstracts Vol. , Issue 10 [page]

1990 Library & Information Science Abstracts  
Compares the Two-Poisson (2P), Inverse Document Frequency (IDF) and Discrimination Value (DV) models of document representation.  ...  Zp/x — AUTOMATIC SUBJECT INDEXING Zs — Statistical techniques 90/6954 A comparison of Two-Poisson, Inverse Document Frequency, and Discrimination Value models of document representation.  ... 

Relevance information

Arjen P. de Vries, Thomas Roelleke
2005 Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05  
The main result is a formal framework uncovering the close relationship of a generalised idf and the BIR model.  ...  for the within-document term frequencies.  ...  This representation of the term weights of the BIR model based on a sum of idf -values gives a clear justification of the idf through the BIR model.  ... 
doi:10.1145/1076034.1076084 dblp:conf/sigir/VriesR05 fatcat:a6y63owo7rbdnctdxhx7dnysm4

Interpretable Low-Rank Document Representations with Label-Dependent Sparsity Patterns [article]

Ivan Ivek
2014 arXiv   pre-print
of discriminative nonsubtractive latent semantic components occurring in TF-IDF document representations.  ...  In context of document classification, where in a corpus of documents their label tags are readily known, an opportunity lies in utilizing label information to learn document representation spaces with  ...  Acknowledgments This work was supported by the Croatian Ministry of Science, Education and Sports through the project "Computational Intelligence Methods in Measurement Systems", No. 098-0982560-2565.  ... 
arXiv:1407.6872v1 fatcat:zohkkavfzjbpvctchblir5mpwy

Discriminative models for information retrieval

Ramesh Nallapati
2004 Proceedings of the 27th annual international conference on Research and development in information retrieval - SIGIR '04  
We have compared the performance of two popular discriminative models, namely the maximum entropy model and support vector machines with that of language modeling, the state-of-the-art generative model  ...  Discriminative models have been preferred over generative models in many machine learning problems in the recent past owing to some of their attractive theoretical properties.  ...  Any opinions, findings and conclusions or recommendations expressed in this material are the author's and do not necessarily reflect those of the sponsor.  ... 
doi:10.1145/1008992.1009006 dblp:conf/sigir/Nallapati04 fatcat:6x2uptmtefekdlw3metzf6nntq

A Novel Feature Selection Based Gravitation for Text Categorization

Jieming Yang, Zhiying Liu, Zhaoyang Qu
2016 International Journal of Database Theory and Application  
four well-known feature selection algorithms (information gain, document frequency, orthogonal centroid feature selection and Poisson distribution).  ...  with four feature selection algorithms (information gain, document frequency, the orthogonal centroid feature selection and Poisson distribution). The experiments show that GFS performs significantly better  ...  Acknowledgment This research is supported by the project development plan of science and technology of Jilin Province under Grant no. 20140204071GX.  ... 
doi:10.14257/ijdta.2016.9.3.21 fatcat:2iv7ffxv2rcjlnck26by7sydqm

Deep Poisson Factor Modeling

Ricardo Henao, Zhe Gan, James Lu, Lawrence Carin
2015 Neural Information Processing Systems  
The model is composed of a Poisson distribution to model observed vectors of counts, as well as a deep hierarchy of hidden binary units.  ...  We propose a new deep architecture for topic modeling, based on Poisson Factor Analysis (PFA) modules.  ...  Acknowledgements This research was supported in part by ARO, DARPA, DOE, NGA and ONR.  ... 
dblp:conf/nips/HenaoGLC15 fatcat:m2vd76h2o5fg7id72payokejou

Entropy-biased models for query representation on the click graph

Hongbo Deng, Irwin King, Michael R. Lyu
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
Based on this intuition, we utilize the entropy information of the URLs and introduce a new concept, namely the inverse query frequency (IQF), to weigh the importance (discriminative ability) of a click  ...  We not only formally define and quantify this scheme, but also incorporate it with the click frequency and user frequency information on the click graph for an effective query representation.  ...  It is argued that the discriminative ability of a URL should be inversely proportional to the entropy, hence a (heavily-clicked) URL with a high query frequency is less discriminative overall.  ... 
doi:10.1145/1571941.1572001 dblp:conf/sigir/DengKL09 fatcat:6tyopnkjybderkdjatmvydcknq

Combining Speech Retrieval Results with Generalized Additive Models

J. Scott Olsson, Douglas W. Oard
2008 Annual Meeting of the Association for Computational Linguistics  
Combining retrieval results from systems built on various errorful representations of the same collection offers some potential to address these challenges.  ...  Rapid and inexpensive techniques for automatic transcription of speech have the potential to dramatically expand the types of content to which information retrieval techniques can be productively applied  ...  Critically, by plotting this proportion against the term's inverse document frequency, we observe that the most discriminative query terms are often not available in both document representations  ... 
dblp:conf/acl/OlssonO08 fatcat:bokwnknruzaijfcalaieuql5iq

PubMed related articles: a probabilistic topic-based model for content similarity

Jimmy Lin, W. John Wilbur
2007 BMC Bioinformatics  
Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions.  ...  Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance-but rather our focus is "relatedness", the probability that a user would want to examine a particular document given  ...  Acknowledgements For this work, JL was funded in part by the National Library of Medicine, where he was a visiting research scientist during the summer of 2006.  ... 
doi:10.1186/1471-2105-8-423 pmid:17971238 pmcid:PMC2212667 fatcat:rnwzjiaofjdmniqqw2edjjrqfu

"Is this document relevant?…probably": a survey of probabilistic models in information retrieval

Fabio Crestani, Mounia Lalmas, Cornelis J. Van Rijsbergen, Iain Campbell
1998 ACM Computing Surveys  
The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described.  ...  The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented.  ...  ACKNOWLEDGMENTS We would like to thank the anonymous reviewers for their interesting and helpful comments.  ... 
doi:10.1145/299917.299920 fatcat:saq74jbtzzbgvorqsrq7tminjq

A comparative study of TF*IDF, LSI and multi-words for text classification

Wen Zhang, Taketoshi Yoshida, Xijin Tang
2011 Expert systems with applications  
Generally, text representation includes two tasks: indexing and weighting. This paper has comparatively studied TF*IDF, LSI and multi-word for text representation.  ...  We used a Chinese and an English document collection to respectively evaluate the three methods in information retrieval and text categorization.  ...  For instance, IDF (inverse document frequency) assumes that the importance of a term relative to a document is inversely proportional to the frequency of occurrence of this term in all the documents, while  ... 
doi:10.1016/j.eswa.2010.08.066 fatcat:oqggdpgkh5h7hbjl6cumcszyg4

Benchmark Performance of Machine And Deep Learning Based Methodologies for Urdu Text Document Classification [article]

Muhammad Nabeel Asim, Muhammad Usman Ghani, Muhammad Ali Ibrahim, Sheraz Ahmad, Waqar Mahmood, Andreas Dengel
2020 arXiv   pre-print
datasets CLE Urdu Digest 1000k, and CLE Urdu Digest 1Million with a significant margin of 32%, and 13% respectively.  ...  Fifth, it evaluates the integrity of a hybrid approach which combines traditional machine learning based feature engineering and deep learning based automated feature engineering.  ...  Two different versions of the model namely generative and discriminative LSTM model were proposed.  ... 
arXiv:2003.01345v1 fatcat:sxmaksohlvaodctpxeqjaenbvq

Toward Optimal Feature Selection in Naive Bayes for Text Categorization

Bo Tang, Steven Kay, Haibo He
2016 IEEE Transactions on Knowledge and Data Engineering  
of a Bayesian classifier.  ...  Based on the JMH-divergence, we develop two efficient feature selection methods, termed maximum discrimination (MD) and MD-χ^2 methods, for text categorization.  ...  ACKNOWLEDGMENT This research was partially supported by National Science Foundation (NSF) under grant ECCS 1053717 and CCF 1439011, and the Army Research Office under grant W911NF-12-1-0378.  ... 
doi:10.1109/tkde.2016.2563436 fatcat:v7a64udu3nf7hkwch5wql6ldfm

Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words

Eduardo G. Altmann, Janet B. Pierrehumbert, Adilson E. Motter, Enrico Scalas
2009 PLoS ONE  
The extent of this deviation depends strongly on semantic type -- a measure of the logicality of each word -- and less strongly on frequency.  ...  of distances between successive occurrences of the same word display bursty deviations from a Poisson process and are well characterized by a stretched exponential (Weibull) scaling.  ...  This figure may be compared to the TF-IDF (term frequency-inverse document frequency) method used for keyword identification [14], but it is computed from a single document (see also Refs.  ... 
doi:10.1371/journal.pone.0007678 pmid:19907645 pmcid:PMC2770836 fatcat:xivcnkjikbfh3ojvdqdgy7m5hm

An information-theoretic perspective of tf–idf measures

Akiko Aizawa
2003 Information Processing & Management  
The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and corresponds well with the conventional term frequency-inverse document frequency  ...  This paper presents a mathematical definition of the "probability-weighted amount of information" (PWI), a measure of specificity of terms in documents that is based on an information-theoretic view of  ...  The author is also grateful to Noriko Kando at NII for encouraging and assisting us in this study as one of the organizers of NTCIR.  ... 
doi:10.1016/s0306-4573(02)00021-3 fatcat:xvthuxtqn5fixcdx7i6coxyu5u