108,402 Hits in 8.9 sec

On the Power Laws of Language

Flavio Chierichetti, Ravi Kumar, Bo Pang
2017 Zenodo  
About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot.  ...  For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight  ...  We set N = n 1−α β 1−α = |U|, i.e., N is the number of words in the language. The distribution on the language will be the power law P (α ) N .  ... 
doi:10.5281/zenodo.4697663 fatcat:ny66a4rvjzb6jmexnrpvobfgo4

On the Power Laws of Language

Flavio Chierichetti, Ravi Kumar, Bo Pang
2017 Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '17  
About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot.  ...  For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight  ...  We set N = n 1−α β 1−α = |U|, i.e., N is the number of words in the language. The distribution on the language will be the power law N .  ... 
doi:10.1145/3077136.3080821 dblp:conf/sigir/Chierichetti0P17 fatcat:s7intwjjtncxfh6orbra6ld7ma

The empirical structure of word frequency distributions [article]

Michael Ramscar
2020 arXiv   pre-print
The frequencies at which individual words occur across languages follow power law distributions, a pattern of findings known as Zipf's law.  ...  of languages are both geometric and, historically, remarkably similar, with power law distributions only emerging when empirical distributions are aggregated.  ...  First, linguistic fits to power laws are often poor, and better described by other distributions [20]. Second, power law distributions can simply represent mixtures of other distributions [9, 10]. Third  ... 
arXiv:2001.05292v1 fatcat:me5amymdcnflldrpv5q63n4vvq

Randomness versus specifics for word-frequency distributions

Xiaoyong Yan, Petter Minnhagen
2016 Physica A: Statistical Mechanics and its Applications  
The text-length-dependence of real word-frequency distributions can be connected to the general properties of a random book.  ...  It is pointed out that this finding has strong implications, when deciding between two conceptually different views on word-frequency distributions, i.e. the specific 'Zipf's-view' and the non-specific  ...  The question is then what special principle or property of a language causes this power law distribution of word-frequencies and this is still an ongoing research [6] [7] [8] [9] [10] . Ref.  ... 
doi:10.1016/j.physa.2015.10.082 fatcat:w7gwxq4akvbtpanjehrukgemmm

On the power-law distribution of language family sizes

Søren Wichmann
2005 Journal of Linguistics  
It is suggested that the apparent power-law distribution of language family sizes is of relevance when evaluating overall classifications of the world's languages, for the analysis of taxonomic structures  ...  Such 'power-law' distributions are known to characterize a wide range of social, biological, and physical phenomena and are essentially of a stochastic nature.  ...  Thus - citing a major, early publication for each individual field - power laws have been found in urban conglomerations (Auerbach 1913), the abundance of biological taxa (Yule 1924), word frequencies  ... 
doi:10.1017/s002222670400307x fatcat:caj4rmyv5je4vj2crexiulgcu4

Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words

Eduardo G. Altmann, Janet B. Pierrehumbert, Adilson E. Motter, Enrico Scalas
2009 PLoS ONE  
Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective  ...  The extent of this deviation depends strongly on semantic type -- a measure of the logicality of each word -- and less strongly on frequency.  ...  a word decays as a power law since the last use of that word.  ... 
doi:10.1371/journal.pone.0007678 pmid:19907645 pmcid:PMC2770836 fatcat:xivcnkjikbfh3ojvdqdgy7m5hm

Retrieval constraints and word frequency distributions

Stéphane Clinchant, Eric Gaussier
2009 Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09  
We then review empirical findings on word frequency distributions and the central role played by burstiness in this context.  ...  The experiments we conduct on several collections illustrate the good behavior of the log-logistic IR model: It significantly outperforms the Jelinek-Mercer and Dirichlet prior language models on most  ...  We thank the anonymous reviewers for their comments on the first version of this paper.  ... 
doi:10.1145/1645953.1646280 dblp:conf/cikm/ClinchantG09 fatcat:2w65uicj6jaajgsgabupprjxhq

Random texts exhibit Zipf's-law-like word frequency distribution

W. Li
1992 IEEE Transactions on Information Theory  
It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as the English.  ...  The facts that the frequency of occurrence of a word is almost an inverse power law function of its rank and the exponent of this inverse power law is very close to 1 are largely due to the transformation  ...  A much smoother power law distributions show up in Fig.2 . In conclusion, Zipf's law is not a deep law in natural language as one might first have thought.  ... 
doi:10.1109/18.165464 fatcat:k2a73k573jdytjeqfxmqex4zty

Solvable null model for the distribution of word frequencies

J. F. Fontanari, L. I. Perlovsky
2004 Physical Review E  
Zipf's law asserts that in all natural languages the frequency of a word is inversely proportional to its rank. The significance, if any, of this result for language remains a mystery.  ...  Here we examine a null hypothesis for the distribution of word frequencies, a so-called discourse-triggered word choice model, which is based on the assumption that the more a word is used, the more likely  ...  a word on its rank is very well described by the power-law distribution P ∝ 1/, regardless of the language or speaker [1].  ... 
doi:10.1103/physreve.70.042901 pmid:15600443 fatcat:j4wjuvamn5ejfmiwmjgag7fdgm

A cross-model study on the effect of power-laws on language evolution

Tao Gong, Lan Shuai
2012 2012 IEEE Congress on Evolutionary Computation  
Based on three evolutionary computational models that respectively simulate lexical, categorical and syntactic evolutions, we explore the effect of power-law distributed social popularity on language origin  ...  Simulation results reveal a critical scaling degree (λ ≈ 1.0) in power-law distributions that helps accelerate the diffusion of linguistic conventions and preserve high linguistic understandability in  ...  Vittorio Loreto from the Sapienza University of Rome and Prof. Umberto Ansaldo from the University of Hong Kong for their comments on this work.  ... 
doi:10.1109/cec.2012.6252965 dblp:conf/cec/GongS12 fatcat:xf2vpix7czccfhrb632lxobcsu

Defining thermodynamic parameters for texts from word rank-frequency distributions

Andrij Rovenchak, Solomija Buk
2011 Journal of Physical Studies  
We report the results regarding the calculation of a new parameter set obtained from the rank-frequency distribution of texts.  ...  The parameters are defined using the analogy between the rank-frequency distribution and the quantum Bose-distribution.  ...  We are grateful to Haruko Sanada for advices regarding the automatic word division in Japanese and to Valentin Vydrin for providing a copy of the Bamana translation of the novella.  ... 
doi:10.30970/jps.15.1005 fatcat:xn5pnmvv55datafr6xovshupvi

Dependence of exponents on text length versus finite-size scaling for word-frequency distributions [article]

Alvaro Corral, Francesc Font-Clos
2018 arXiv   pre-print
Some authors have recently argued that a finite-size scaling law for the text-length dependence of word-frequency distributions cannot be conceptually valid.  ...  We also find that the picture of word-frequency distributions with power-law exponents that decrease with text length [Yan and Minnhagen, Physica A 444, 828 (2016)] does not stand with rigorous statistical  ...  APPENDIX I We explain here the difference between a power law and a scaling law, and how scaling laws in statistical physics usually only hold asymptotically.  ... 
arXiv:1804.03718v1 fatcat:gcptuls3hjglfottcqdlsbz5t4

The Small-World of 'Le Petit Prince': Revisiting the Word Frequency Distribution

Daniel Gamermann, Carmen Moret-Tatay, Esperanza Navarro-Pardo, Pedro Fernandez de Córdoba Castellá
2016 Digital Scholarship in the Humanities  
One of these features is the so called scale-free distribution for its node's connectivity, which means that the degree distribution for the network's nodes follows a power law.  ...  Here we present a mathematical analysis on linguistics: the word frequency effect for different translations of the "Le Petit Prince" in different languages.  ...  In this model a (random) graph is constructed from a set of N nodes by connecting or not each one of the N (N −1) Acknowledgment We would like to thank Thomas Irvin for his invaluable help and comments  ... 
doi:10.1093/llc/fqw005 dblp:journals/lalc/GamermannMNC17 fatcat:37vvbzz4xbfyndoim6nogdwlaq

Dependence of exponents on text length versus finite-size scaling for word-frequency distributions

Álvaro Corral, Francesc Font-Clos
2017 Physical Review E  
We also find that the picture of word-frequency distributions with power-law exponents that decrease with text length [X. Yan and P.  ...  moments of the distribution (and obtaining a novel derivation of Heaps' law as a by-product).  ...  of the validity of a finite-size scaling law in word-frequency distributions.  ... 
doi:10.1103/physreve.96.022318 pmid:28950565 fatcat:a6ngbuvfxndahc6npg7rm3vyby

The dependence of frequency distributions on multiple meanings of words, codes and signs

Xiaoyong Yan, Petter Minnhagen
2018 Physica A: Statistical Mechanics and its Applications  
The dependence of the frequency distributions due to multiple meanings of words in a text is investigated by deleting letters.  ...  This further implies that the difference of the shape for word-frequencies for an English text written by letters and a Chinese text written by Chinese characters is due to the coding and not to the language  ...  A central question in this context is what special principle or property of a language causes the ubiquitous observed "fat tailed' power-law like distribution of word-frequencies [5] [6] [7] [8] [9] [  ... 
doi:10.1016/j.physa.2017.08.133 fatcat:sovcojrdt5cjjni6zosirpciki