On the Power Laws of Language
2017
Zenodo
About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. ...
For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight ...
We set N = n 1−α β 1−α = |U |, i.e., N is the number of words in the language. The distribution on the language will be the power law P (α ) N . ...
doi:10.5281/zenodo.4697663
fatcat:ny66a4rvjzb6jmexnrpvobfgo4
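The snippet above describes a power law as a straight line on a log-log plot. A minimal sketch of that check, assuming idealized frequencies of the form f(r) = C / r^s (the function name `loglog_slope` and the sample data are hypothetical, not taken from the paper): fit a least-squares line to log-frequency against log-rank and read off the slope.

```python
import math

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    ranked = sorted(freqs, reverse=True)            # rank 1 = most frequent
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical ideal Zipfian frequencies with exponent s = 1.
ideal = [1000.0 / r for r in range(1, 501)]
slope = loglog_slope(ideal)   # close to -1 for this synthetic input
```

On real corpora this fit is only a rough diagnostic: a concave curve, as the snippet reports for many corpora, produces a slope that depends on which range of ranks is fitted.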
On the Power Laws of Language
2017
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '17
About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. ...
For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight ...
We set N = n 1−α β 1−α = |U |, i.e., N is the number of words in the language. The distribution on the language will be the power law N . ...
doi:10.1145/3077136.3080821
dblp:conf/sigir/Chierichetti0P17
fatcat:s7intwjjtncxfh6orbra6ld7ma
The empirical structure of word frequency distributions
[article]
2020
arXiv
pre-print
The frequencies at which individual words occur across languages follow power law distributions, a pattern of findings known as Zipf's law. ...
of languages are both geometric and, historically, remarkably similar, with power law distributions only emerging when empirical distributions are aggregated. ...
First, linguistic fits to power laws are often poor, and better described by other distributions. 20 Second, power law distributions can simply represent mixtures of other distributions. 9 10 Third ...
arXiv:2001.05292v1
fatcat:me5amymdcnflldrpv5q63n4vvq
Randomness versus specifics for word-frequency distributions
2016
Physica A: Statistical Mechanics and its Applications
The text-length-dependence of real word-frequency distributions can be connected to the general properties of a random book. ...
It is pointed out that this finding has strong implications, when deciding between two conceptually different views on word-frequency distributions, i.e. the specific 'Zipf's-view' and the non-specific ...
The question is then what special principle or property of a language causes this power law distribution of word-frequencies and this is still an ongoing research [6] [7] [8] [9] [10]. Ref. ...
doi:10.1016/j.physa.2015.10.082
fatcat:w7gwxq4akvbtpanjehrukgemmm
On the power-law distribution of language family sizes
2005
Journal of Linguistics
It is suggested that the apparent power-law distribution of language family sizes is of relevance when evaluating overall classifications of the world's languages, for the analysis of taxonomic structures ...
Such 'power-law' distributions are known to characterize a wide range of social, biological, and physical phenomena and are essentially of a stochastic nature. ...
Thus, citing a major, early publication for each individual field, power laws have been found in urban conglomerations (Auerbach 1913), the abundance of biological taxa (Yule 1924), word frequencies ...
doi:10.1017/s002222670400307x
fatcat:caj4rmyv5je4vj2crexiulgcu4
Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words
2009
PLoS ONE
Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective ...
The extent of this deviation depends strongly on semantic type -- a measure of the logicality of each word -- and less strongly on frequency. ...
a word decays as a power law since the last use of that word. ...
doi:10.1371/journal.pone.0007678
pmid:19907645
pmcid:PMC2770836
fatcat:xivcnkjikbfh3ojvdqdgy7m5hm
Retrieval constraints and word frequency distributions
2009
Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09
We then review empirical findings on word frequency distributions and the central role played by burstiness in this context. ...
The experiments we conduct on several collections illustrate the good behavior of the log-logistic IR model: It significantly outperforms the Jelinek-Mercer and Dirichlet prior language models on most ...
We thank the anonymous reviewers for their comments on the first version of this paper. ...
doi:10.1145/1645953.1646280
dblp:conf/cikm/ClinchantG09
fatcat:2w65uicj6jaajgsgabupprjxhq
Random texts exhibit Zipf's-law-like word frequency distribution
1992
IEEE Transactions on Information Theory
It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as the English. ...
The facts that the frequency of occurrence of a word is almost an inverse power law function of its rank and the exponent of this inverse power law is very close to 1 are largely due to the transformation ...
Much smoother power law distributions show up in Fig. 2. In conclusion, Zipf's law is not a deep law in natural language as one might first have thought. ...
doi:10.1109/18.165464
fatcat:k2a73k573jdytjeqfxmqex4zty
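The random-text result in the record above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's procedure: characters are drawn uniformly from a hypothetical two-letter alphabet plus a space, spaces delimit words, and the function name, alphabet, seed, and text length are all illustrative choices.

```python
import random
from collections import Counter

def random_text_counts(n_chars, alphabet="ab", seed=0):
    """Count the words in a text of uniformly random letters and spaces."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    symbols = alphabet + " "                  # space acts as the word delimiter
    text = "".join(rng.choice(symbols) for _ in range(n_chars))
    return Counter(text.split())              # split() drops empty words

counts = random_text_counts(200_000)

# Group the counts of distinct words by word length and average them.
by_length = {}
for word, c in counts.items():
    by_length.setdefault(len(word), []).append(c)
mean_by_length = {k: sum(v) / len(v) for k, v in sorted(by_length.items())}
```

In this setup each extra letter makes a specific word less likely by a constant factor, while the number of distinct words of a given length grows by a constant factor. Sorting words by count therefore gives a stepwise rank-frequency curve that is roughly straight on a log-log plot, even though no linguistic structure was built in.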
Solvable null model for the distribution of word frequencies
2004
Physical Review E
Zipf's law asserts that in all natural languages the frequency of a word is inversely proportional to its rank. The significance, if any, of this result for language remains a mystery. ...
Here we examine a null hypothesis for the distribution of word frequencies, a so-called discourse-triggered word choice model, which is based on the assumption that the more a word is used, the more likely ...
a word on its rank r is very well described by the power-law distribution P ∝ 1/r, regardless of the language or speaker [1]. ...
doi:10.1103/physreve.70.042901
pmid:15600443
fatcat:j4wjuvamn5ejfmiwmjgag7fdgm
A cross-model study on the effect of power-laws on language evolution
2012
2012 IEEE Congress on Evolutionary Computation
Based on three evolutionary computational models that respectively simulate lexical, categorical and syntactic evolutions, we explore the effect of power-law distributed social popularity on language origin ...
Simulation results reveal a critical scaling degree (λ ≈ 1.0) in power-law distributions that helps accelerate the diffusion of linguistic conventions and preserve high linguistic understandability in ...
Vittorio Loreto from the Sapienza University of Rome and Prof. Umberto Ansaldo from the University of Hong Kong for their comments on this work. ...
doi:10.1109/cec.2012.6252965
dblp:conf/cec/GongS12
fatcat:xf2vpix7czccfhrb632lxobcsu
Defining thermodynamic parameters for texts from word rank-frequency distributions
2011
Journal of Physical Studies
We report the results regarding the calculation of a new parameter set obtained from the rank-frequency distribution of texts. ...
The parameters are defined using the analogy between the rank-frequency distribution and the quantum Bose-distribution. ...
We are grateful to Haruko Sanada for advice regarding the automatic word division in Japanese and to Valentin Vydrin for providing a copy of the Bamana translation of the novella. ...
doi:10.30970/jps.15.1005
fatcat:xn5pnmvv55datafr6xovshupvi
Dependence of exponents on text length versus finite-size scaling for word-frequency distributions
[article]
2018
arXiv
pre-print
Some authors have recently argued that a finite-size scaling law for the text-length dependence of word-frequency distributions cannot be conceptually valid. ...
We also find that the picture of word-frequency distributions with power-law exponents that decrease with text length [Yan and Minnhagen, Physica A 444, 828 (2016)] does not stand with rigorous statistical ...
APPENDIX I We explain here the difference between a power law and a scaling law, and how scaling laws in statistical physics usually only hold asymptotically. ...
arXiv:1804.03718v1
fatcat:gcptuls3hjglfottcqdlsbz5t4
The Small-World of 'Le Petit Prince': Revisiting the Word Frequency Distribution
2016
Digital Scholarship in the Humanities
One of these features is the so called scale-free distribution for its node's connectivity, which means that the degree distribution for the network's nodes follows a power law. ...
Here we present a mathematical analysis on linguistics: the word frequency effect for different translations of the "Le Petit Prince" in different languages. ...
In this model a (random) graph is constructed from a set of N nodes by connecting or not each one of the N (N −1)
Acknowledgment We would like to thank Thomas Irvin for his invaluable help and comments ...
doi:10.1093/llc/fqw005
dblp:journals/lalc/GamermannMNC17
fatcat:37vvbzz4xbfyndoim6nogdwlaq
Dependence of exponents on text length versus finite-size scaling for word-frequency distributions
2017
Physical Review E
We also find that the picture of word-frequency distributions with power-law exponents that decrease with text length [X. Yan and P. ...
moments of the distribution (and obtaining a novel derivation of Heaps' law as a by-product). ...
of the validity of a finite-size scaling law in word-frequency distributions. ...
doi:10.1103/physreve.96.022318
pmid:28950565
fatcat:a6ngbuvfxndahc6npg7rm3vyby
The dependence of frequency distributions on multiple meanings of words, codes and signs
2018
Physica A: Statistical Mechanics and its Applications
The dependence of the frequency distributions due to multiple meanings of words in a text is investigated by deleting letters. ...
This further implies that the difference of the shape for word-frequencies for an English text written by letters and a Chinese text written by Chinese characters is due to the coding and not to the language ...
A central question in this context is what special principle or property of a language causes the ubiquitous observed 'fat tailed' power-law like distribution of word-frequencies [5] [6] [7] [8] [9] [ ...
doi:10.1016/j.physa.2017.08.133
fatcat:sovcojrdt5cjjni6zosirpciki
Showing results 1 — 15 out of 108,402 results