On Extending NLP Techniques from the Categorical to the Latent Space: KL Divergence, Zipf's Law, and Similarity Search [article]

Adam Hare, Yu Chen, Yinan Liu, Zhenming Liu, Christopher G. Brinton
2020 arXiv   pre-print
Next, we recast the heavy-tailed distribution known as Zipf's law that is frequently observed in the categorical space to the latent space.  ...  In this paper, we aim to modernize these older methods while retaining their advantages by extending approaches from categorical or bag-of-words representations to word embeddings representations in the  ...  Gutenberg is more of an outlier here in that in the categorical space it fits Zipf's law about as well as Reuters and RACE but in the latent space is closer to Gatsby and Brown.  ... 
arXiv:2012.01941v1 fatcat:qois3uoarre3tjtm4tsgbi6r7a

Introduction to information retrieval

2009 ChoiceReviews  
In the future, the convention for addresses (collectively known as the internet address space) is likely to use a new standard known as IPv6 (  ...  The Mercator crawler is due to Najork and Heydon (Najork and Heydon 2001; ; the treatment in this chapter follows their work.  ...  We first derive the term frequency of the ith term from Zipf's law.  ... 
doi:10.5860/choice.46-2715 fatcat:ruwoe46pgzcupjygnwbnit4z3u

Beyond Discourse: Computational Text Analysis and Material Historical Processes

Jose Tomas Atria
of an hypothetical semantic space.  ...  The co-occurrence matrices obtained from the POB corpus are used to demonstrate two different projections: semantic n [...]  ...  Measures based on the Kullback-Leibler divergence $D_{KL}(P \| Q)$, $\mathrm{JSD}(P \| Q) = \frac{D_{KL}(P \| M)}{2} + \frac{D_{KL}(Q \| M)}{2}$, where $M = \frac{P + Q}{2}$, are undefined on vectors containing 0 valued entries (because of the product  ... 
doi:10.7916/d8n88tfg fatcat:ky5p7ujokjh4rakeb26nwu2zna

Modeling and analyzing bias in recommender systems from multi-views: context, topic and evaluation [article]

Jing Yuan, Technische Universität Berlin, Sahin Albayrak
In order to automatically generate such "guess what you like" results and serve matching recommendations, advanced machine learning and data mining techniques are applied in recommender systems.  ...  In this thesis, we research the bias problem in recommender systems from multi-views, including contextual bias, content-level understanding of bias, and the evaluation bias.  ...  With this dimension reduction technique, the space complexity of a corpus' index is reduced from m × n to (m + n) × r.  ... 
doi:10.14279/depositonce-11998 fatcat:dw6wm2ftsrbttmvkhrjpunuxxu

Towards Data-Efficient Machine Learning

Qizhe Xie
To offer practical suggestions to researchers and practitioners, we analyze the effectiveness, the applicability and the engineering difficulty of each algorithm.  ...  from another domain to the domain of interest; Last, with prior knowledge, we can inject targeted inductive biases into the models and make use of external knowledge bases. With three possible directions  ...  The overall distribution of relation frequencies resembles that of word frequencies, subject to Zipf's law.  ... 
doi:10.1184/r1/14395898.v1 fatcat:zatmmd5qsffqtjsakbdd4j5bcq