### On the Power Laws of Language

Flavio Chierichetti, Ravi Kumar, Bo Pang
2017 Zenodo
About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot.  ...  For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight  ...  We set N = n 1−α β 1−α = |U|, i.e., N is the number of words in the language. The distribution on the language will be the power law P (α ) N.  ...
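
The "straight line on a log-log plot" reading of a power law can be illustrated with a minimal sketch. It assumes synthetic rank-frequency data of the form f(r) = C / r^α and estimates the slope by ordinary least squares on the logarithms; the function name `loglog_slope` and the sample values are hypothetical and are not taken from the paper.

```python
import math

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    freqs must be positive and ordered by rank, with rank 1 first.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic, illustrative frequencies f(r) = 1000 / r (exponent 1); not corpus data.
synthetic = [1000.0 / r for r in range(1, 201)]
slope = loglog_slope(synthetic)  # close to -1 for an exact power law with exponent 1
```

For an exact power law the fitted slope equals the negated exponent. For the concave, two-regime distributions the abstract describes, a single slope would presumably fit poorly, so this is a rough diagnostic rather than a rigorous estimator.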

### The empirical structure of word frequency distributions [article]

Michael Ramscar
2020 arXiv pre-print
The frequencies at which individual words occur across languages follow power law distributions, a pattern of findings known as Zipf's law.  ...  of languages are both geometric and, historically, remarkably similar, with power law distributions only emerging when empirical distributions are aggregated.  ...  First, linguistic fits to power laws are often poor, and better described by other distributions [20]. Second, power law distributions can simply represent mixtures of other distributions [9, 10]. Third  ...

### Randomness versus specifics for word-frequency distributions

Xiaoyong Yan, Petter Minnhagen
2016 Physica A: Statistical Mechanics and its Applications
The text-length-dependence of real word-frequency distributions can be connected to the general properties of a random book.  ...  It is pointed out that this finding has strong implications, when deciding between two conceptually different views on word-frequency distributions, i.e. the specific 'Zipf's-view' and the non-specific  ...  The question is then what special principle or property of a language causes this power law distribution of word-frequencies and this is still an ongoing research [6] [7] [8] [9] [10]. Ref.  ...

### On the power-law distribution of language family sizes

Søren Wichmann
2005 Journal of Linguistics
It is suggested that the apparent power-law distribution of language family sizes is of relevance when evaluating overall classifications of the world's languages, for the analysis of taxonomic structures  ...  Such 'power-law' distributions are known to characterize a wide range of social, biological, and physical phenomena and are essentially of a stochastic nature.  ...  Thus, citing a major, early publication for each individual field, power laws have been found in urban conglomerations (Auerbach 1913), the abundance of biological taxa (Yule 1924), word frequencies  ...

### Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words

Eduardo G. Altmann, Janet B. Pierrehumbert, Adilson E. Motter, Enrico Scalas
2009 PLoS ONE
Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective  ...  The extent of this deviation depends strongly on semantic type -- a measure of the logicality of each word -- and less strongly on frequency.  ...  a word decays as a power law since the last use of that word.  ...

### Retrieval constraints and word frequency distributions

Stéphane Clinchant, Eric Gaussier
2009 Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09
We then review empirical findings on word frequency distributions and the central role played by burstiness in this context.  ...  The experiments we conduct on several collections illustrate the good behavior of the log-logistic IR model: It significantly outperforms the Jelinek-Mercer and Dirichlet prior language models on most  ...  We thank the anonymous reviewers for their comments on the first version of this paper.  ...

### Random texts exhibit Zipf's-law-like word frequency distribution

W. Li
1992 IEEE Transactions on Information Theory
It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as the English.  ...  The facts that the frequency of occurrence of a word is almost an inverse power law function of its rank and the exponent of this inverse power law is very close to 1 are largely due to the transformation  ...  A much smoother power law distributions show up in Fig. 2. In conclusion, Zipf's law is not a deep law in natural language as one might first have thought.  ...

### Solvable null model for the distribution of word frequencies

J. F. Fontanari, L. I. Perlovsky
2004 Physical Review E
Zipf's law asserts that in all natural languages the frequency of a word is inversely proportional to its rank. The significance, if any, of this result for language remains a mystery.  ...  Here we examine a null hypothesis for the distribution of word frequencies, a so-called discourse-triggered word choice model, which is based on the assumption that the more a word is used, the more likely  ...  a word on its rank is very well described by the power-law distribution P ∝ 1/r, regardless of the language or speaker [1].  ...

### A cross-model study on the effect of power-laws on language evolution

Tao Gong, Lan Shuai
2012 2012 IEEE Congress on Evolutionary Computation
Based on three evolutionary computational models that respectively simulate lexical, categorical and syntactic evolutions, we explore the effect of power-law distributed social popularity on language origin  ...  Simulation results reveal a critical scaling degree (λ ≈ 1.0) in power-law distributions that helps accelerate the diffusion of linguistic conventions and preserve high linguistic understandability in  ...  Vittorio Loreto from the Sapienza University of Rome and Prof. Umberto Ansaldo from the University of Hong Kong for their comments on this work.  ...

### Defining thermodynamic parameters for texts from word rank-frequency distributions

Andrij Rovenchak, Solomija Buk
2011 Journal of Physical Studies
We report the results regarding the calculation of a new parameter set obtained from the rank-frequency distribution of texts.  ...  The parameters are defined using the analogy between the rank-frequency distribution and the quantum Bose-distribution.  ...  We are grateful to Haruko Sanada for advices regarding the automatic word division in Japanese and to Valentin Vydrin for providing a copy of the Bamana translation of the novella.  ...

### Dependence of exponents on text length versus finite-size scaling for word-frequency distributions [article]

Alvaro Corral, Francesc Font-Clos
2018 arXiv pre-print
Some authors have recently argued that a finite-size scaling law for the text-length dependence of word-frequency distributions cannot be conceptually valid.  ...  We also find that the picture of word-frequency distributions with power-law exponents that decrease with text length [Yan and Minnhagen, Physica A 444, 828 (2016)] does not stand with rigorous statistical  ...  APPENDIX I We explain here the difference between a power law and a scaling law, and how scaling laws in statistical physics usually only hold asymptotically.  ...

### The Small-World of 'Le Petit Prince': Revisiting the Word Frequency Distribution

Daniel Gamermann, Carmen Moret-Tatay, Esperanza Navarro-Pardo, Pedro Fernandez de Córdoba Castellá
2016 Digital Scholarship in the Humanities
One of these features is the so-called scale-free distribution for its node's connectivity, which means that the degree distribution for the network's nodes follows a power law.  ...  Here we present a mathematical analysis on linguistics: the word frequency effect for different translations of the "Le Petit Prince" in different languages.  ...  In this model a (random) graph is constructed from a set of N nodes by connecting or not each one of the N(N−1) Acknowledgment We would like to thank Thomas Irvin for his invaluable help and comments  ...

### Dependence of exponents on text length versus finite-size scaling for word-frequency distributions

Álvaro Corral, Francesc Font-Clos
2017 Physical Review E
We also find that the picture of word-frequency distributions with power-law exponents that decrease with text length [X. Yan and P.  ...  moments of the distribution (and obtaining a novel derivation of Heaps' law as a by-product).  ...  of the validity of a finite-size scaling law in word-frequency distributions.  ...

### The dependence of frequency distributions on multiple meanings of words, codes and signs

Xiaoyong Yan, Petter Minnhagen
2018 Physica A: Statistical Mechanics and its Applications
The dependence of the frequency distributions due to multiple meanings of words in a text is investigated by deleting letters.  ...  This further implies that the difference of the shape for word-frequencies for an English text written by letters and a Chinese text written by Chinese characters is due to the coding and not to the language  ...  A central question in this context is what special principle or property of a language causes the ubiquitous observed "fat tailed" power-law like distribution of word-frequencies [5] [6] [7] [8] [9] [  ...
