A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions
One of the striking commonalities between languages is the way word frequencies are distributed. Across languages, word frequencies follow a Zipfian distribution, showing a power law relation between a word's frequency and its rank (Zipf, 1949). Intuitively, this means that languages have relatively few high-frequency words and many low-frequency ones. While studied extensively, little work has explored the learnability consequences of the greater predictability of words in such distributions.doi:10.23668/psycharchives.3079 fatcat:dq27e74lpbgqpishnff6scdxva