Entropy of search logs

Qiaozhu Mei, Kenneth Church
2008 Proceedings of the international conference on Web search and web data mining - WSDM '08  
Entropy is a powerful tool for sizing challenges and opportunities. How hard is search? How hard are query suggestion mechanisms like auto-complete? How much does personalization help?  ...  All these difficult questions can be answered by estimation of entropy from search logs. What is the potential opportunity for personalization?  ...  We thank Mike Schultz for his help on preparing the search log data.  ... 
doi:10.1145/1341531.1341540 dblp:conf/wsdm/MeiC08 fatcat:cq5djrw3s5bkvara6nobi2vfim

Personalizing Search on Shared Devices

Ryen W. White, Ahmed Hassan Awadallah
2015 Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '15  
We utilize a large Web search log dataset containing both person identifiers and machine identifiers to quantify the gain in personalization performance from ABP, identify the circumstances under which  ...  ABP allows search providers to personalize experiences for individuals rather than targeting all users of a device collectively.  ...  The use of these logs is an important distinction from many log-based studies reported in the research literature, which rely on proprietary search logs, analyzed only by employees of commercial search  ... 
doi:10.1145/2766462.2767736 dblp:conf/sigir/WhiteA15 fatcat:rae23ats2fbmzmvmgfzx3p5sgm

Precomputing search features for fast and accurate query classification

Venkatesh Ganti, Arnd Christian König, Xiao Li
2010 Proceedings of the third ACM international conference on Web search and data mining - WSDM '10  
At the same time, the vocabulary used in search queries is vast: thus, classifiers based on word-occurrence have to deal with a very sparse feature space, and often require large amounts of training data  ...  Query intent classification is crucial for web search and advertising.  ...  The issue of feature sparseness is problematic as classifiers relying on these features may require very large amounts  ... 
doi:10.1145/1718487.1718496 dblp:conf/wsdm/GantiKL10 fatcat:xtzvliadtnb5jpx6erjphqujqu

Syntactic complexity of Web search queries through the lenses of language models, networks and users

Rishiraj Saha Roy, Smith Agarwal, Niloy Ganguly, Monojit Choudhury
2016 Information Processing & Management  
Across the world, millions of users interact with search engines every day to satisfy their information needs.  ...  The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than NL.  ...  For n-grams and n-terms where n > 1, the weighted average of entropies over all (n − 1)-grams is considered as the entropy of the model.  ... 
doi:10.1016/j.ipm.2016.04.002 fatcat:eqskitwxkjgxjkcewatoxuzb4i

Searching spontaneous conversational speech

Franciska de Jong, Douglas W. Oard, Roeland Ordelman, Stephan Raaijmakers
2007 SIGIR Forum  
. • The redundancy present in human language meant that search effectiveness held up well over a reasonable range of transcription accuracy. • Sufficiently accurate Large-Vocabulary Continuous Speech Recognition  ...  (LVCSR) systems could be built for the planned speech of news announcers.  ...  ACKNOWLEDGEMENTS This paper is dedicated to the memory of my doctoral supervisor Karen Spärck Jones for her generosity and indefatigable support of her students.  ... 
doi:10.1145/1328964.1328982 fatcat:wwpzqq7ndrfedh4imhoznvccl4

Natural Language Understanding with Distributed Representation [article]

Kyunghyun Cho
2015 arXiv   pre-print
In order to make it as self-contained as possible, I spend much time on describing basics of machine learning and neural networks, only after which how they are used for natural languages is introduced  ...  On the language front, I almost solely focus on language modelling and machine translation, two of which I personally find most fascinating and most fundamental to natural language understanding.  ...  You can continue with other languages by googling very hard, but eventually you run into a hard wall. This hard wall is not only the lack of any resource, but also lack of enough resource.  ... 
arXiv:1511.07916v1 fatcat:jfa2ab5byfavheibaxfohaf2ui

Effects of Language Modeling and its Personalization on Touchscreen Typing Performance

Andrew Fowler, Kurt Partridge, Ciprian Chelba, Xiaojun Bi, Tom Ouyang, Shumin Zhai
2015 Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI '15  
One reason is that typing accuracy is difficult to measure empirically on a large scale.  ...  Using the Enron email corpus as a personalization test set, we show for the first time at this scale that a combined spatial/language model reduces word error rate from a pre-model baseline of 38.4% down  ...  ACKNOWLEDGMENTS This paper is based on work supported by the National Institutes of Health under grant R01 DC009834.  ... 
doi:10.1145/2702123.2702503 dblp:conf/chi/FowlerPCBOZ15 fatcat:vbgvnllw6rehtlgttd5hv6wn7q

Scalable Text Mining with Sparse Generative Models [article]

Antti Puurula
2016 arXiv   pre-print
This reduces the computational complexity of the common text mining operations according to sparsity, yielding probabilistic models with the scalability of modern search engines.  ...  The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers.  ...  Acknowledgements We'd like to thank Kaggle and the LSHTC organizers for their work in making the competition a success, and the machine learning group at the University of Waikato for the computers we  ... 
arXiv:1602.02332v1 fatcat:2urzib3btveslj5ggie55irxwq

Syntactic language modeling with formal grammars

Tobias Kaufmann, Beat Pfister
2012 Speech Communication  
We provide an extensive discussion on various aspects of the approach, including the contribution of different kinds of information, the development of a precise formal grammar and the acquisition of lexical  ...  In this paper, we investigate the use of a formal grammar as a source of syntactic information.  ...  We cordially thank Jean-Luc Gauvain for providing us with word lattices from the LIMSI German broadcast news transcription system.  ... 
doi:10.1016/j.specom.2012.01.001 fatcat:is4hicwi6nexveiyxhawu2z52y

Detection and prevention of MAC layer misbehavior in ad hoc networks

Alvaro A. Cárdenas, Svetlana Radosavac, John S. Baras
2004 Proceedings of the 2nd ACM workshop on Security of ad hoc and sensor networks - SASN '04  
Then we discuss detection algorithms to deal with the problem of colluding selfish nodes.  ...  We first propose an algorithm to ensure honest backoffs when at least one, either the receiver or the sender is honest.  ...  This material is based upon work supported by the U.S. Army Research Office under Award No. DAAD19-01-1-0494 to the University of Maryland College Park.  ... 
doi:10.1145/1029102.1029107 dblp:conf/sasn/CardenasRB04 fatcat:vjsvrh2muvgsbjp7ei2yufhnsa

String Transduction with Target Language Models and Insertion Handling [article]

Garrett Nicolai, Saeed Najafi, Grzegorz Kondrak
2018 arXiv   pre-print
We show that leveraging target language models derived from unannotated target corpora, combined with a precise alignment of the training data, yields state-of-the-art results on cognate projection, inflection  ...  Many character-level tasks can be framed as sequence-to-sequence transduction, where the target is a word from a natural language.  ...  We thank the members of the University of Alberta teams who collaborated with us in the context of the 2018 shared tasks on transliteration and morphological reinflection: Bradley Hauer, Rashed Rubby Riyadh  ... 
arXiv:1809.07182v1 fatcat:2rngjmeuhjhcra5rkyilhn42vm

End-to-end contextual speech recognition using class language models and a token passing decoder [article]

Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen
2018 arXiv   pre-print
To enable this approach to scale to a large number of class members and minimize search errors, we propose a token passing decoder with efficient token recombination for E2E systems for the first time.  ...  Although it simplifies training and decoding pipelines, the unified model is hard to adapt when mismatch exists between training and test data.  ...  A typical example of contextual information is the personal information such as a user's contacts.  ... 
arXiv:1812.02142v1 fatcat:hs6wg67yk5hvxmu42rk5tqrkfu

A maximum entropy approach to adaptive statistical language modelling

Ronald Rosenfeld
1996 Computer Speech and Language  
The function with the highest entropy within that set is the ME solution.  ...  The intersection of these constraints is the set of probability functions which are consistent with all the information sources.  ...  Without this "source", entropy is log V, where V is the vocabulary size.  ... 
doi:10.1006/csla.1996.0011 fatcat:p2aueevyjvf3tjttw5gfxywzgi

Probabilistic Distributional Semantics with Latent Variable Models

Diarmuid Ó Séaghdha, Anna Korhonen
2014 Computational Linguistics  
We consider LDA and a number of extensions to the model and evaluate them on a variety of semantic prediction tasks, demonstrating that our approach attains state-of-the-art performance.  ...  We describe a probabilistic framework for acquiring selectional preferences of linguistic predicates and for using the acquired representations to model the effects of context on word meaning.  ...  information and advice; and to the anonymous Computational Linguistics reviewers, whose suggestions have substantially improved the quality of this article.  ... 
doi:10.1162/coli_a_00194 fatcat:nk735ka77vgqte4oxhkudwcmv4

Eddi

Michael S. Bernstein, Bongwon Suh, Lichan Hong, Jilin Chen, Sanjay Kairam, Ed H. Chi
2010 Proceedings of the 23nd annual ACM symposium on User interface software and technology - UIST '10  
An algorithm evaluation reveals that search engine callouts outperform other approaches when they employ simple syntactic transformation and backoff strategies.  ...  Eddi is a topic-oriented browsing interface for Twitter. Clockwise from upper right is the tag cloud, timeline (hidden in another tab), the topic dashboard, and the navigational list.  ... 
doi:10.1145/1866029.1866077 dblp:conf/uist/BernsteinSHCKC10 fatcat:67cqvraphrbxlpibixwmbijqqi