Word-length entropies and correlations of natural language written texts
[article]
2014
arXiv
pre-print
We study the frequency distributions and correlations of the word lengths of ten European languages. ...
measured by the comparison of the real entropies with those of the shuffled texts are found to be smaller in the case of Germanic and Finnish languages. ...
arXiv:1401.6224v1
fatcat:iuyzae36xzafhjaroialfsya5y
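Both versions of this paper quantify word-length correlations by comparing the entropies of the real word-length series with those of a shuffled version of the same text. A minimal sketch of that comparison, assuming a hypothetical input file `sample.txt` (the authors' exact estimators and preprocessing may differ):

```python
import math
import random
from collections import Counter

def block_entropy(seq, n=2):
    # Shannon entropy (bits) of overlapping n-grams of a sequence.
    blocks = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

text = open("sample.txt", encoding="utf-8").read()  # hypothetical corpus file
lengths = [len(w) for w in text.split()]            # word-length series
shuffled = lengths[:]
random.shuffle(shuffled)  # destroys correlations, preserves the length distribution
print("real:    ", block_entropy(lengths, n=2))
print("shuffled:", block_entropy(shuffled, n=2))
```

A smaller real-versus-shuffled gap, as reported for the Germanic and Finnish texts, indicates weaker correlations between consecutive word lengths.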
Word-length Entropies and Correlations of Natural Language Written Texts
2015
Journal of Quantitative Linguistics
We study the frequency distributions and correlations of the word lengths of ten European languages. ...
measured by the comparison of the real entropies with those of the shuffled texts are found to be smaller in the case of Germanic and Finnish languages. ...
Acknowledgements We are grateful to George Mikros for the fruitful discussions and support. K.K. would like to express his gratitude to the Library in the National Hellenic Research Foundation. ...
doi:10.1080/09296174.2014.1001636
fatcat:qdyrjbam4vadzfrxdif4kkpbti
ENTROPY ANALYSIS OF WORD-LENGTH SERIES OF NATURAL LANGUAGE TEXTS: EFFECTS OF TEXT LANGUAGE AND GENRE
2012
International Journal of Bifurcation and Chaos in Applied Sciences and Engineering
We estimate the n-gram entropies of natural language texts in word-length representation and find that these are sensitive to text language and genre. ...
Furthermore, comparison with the entropies of shuffled data reveals the impact of word length correlations on the estimated n-gram entropies. ...
and correlations of word lengths in the effects of text language and genre. ...
doi:10.1142/s0218127412502239
fatcat:irt2cm4brnanfjb4ypldvmwwvq
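In the standard formulation (details in the paper may differ), the n-gram entropies of a word-length series are block entropies; writing l_i for the length of the i-th word:

```latex
H_n = -\sum_{l_1,\dots,l_n} p(l_1,\dots,l_n)\,\log_2 p(l_1,\dots,l_n),
\qquad h_n = H_{n+1} - H_n
```

For a shuffled (uncorrelated) series, H_n = n·H_1, so the gap between n·H_1 and the measured H_n quantifies the impact of the word-length correlations mentioned in the abstract.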
Complexity-entropy analysis at different levels of organisation in written language
2019
PLoS ONE
A written text can be considered an attempt to convey a meaningful message which ends up being constrained by language rules and context dependence, and highly redundant in its use of resources. ...
Despite all these constraints, unpredictability is an essential element of natural language. ...
Acknowledgments The authors would like to thank the referees, who carefully read the manuscript and made valuable suggestions.
Author Contributions Conceptualization: Ernesto Estevez-Rams. ...
doi:10.1371/journal.pone.0214863
pmid:31067221
pmcid:PMC6505741
fatcat:fryrk3w5bbe7veq3qke4awubya
Comparative Study of Complexity, Entropy and Correlations of Natural Written Texts Produced by Human Brain and DNA "Texts" that Create Human Being
2015
Journal of Theoretical and Computational Science
This length R_c is much greater than the correlation length R_c ≈ 10 observed for natural written texts (Figure 2). ...
Usually the DNA and natural language texts are considered as random sequences with finite number of states. ...
doi:10.4172/2376-130x.1000139
fatcat:ctln6xuk7bartbabxlxrbsm36q
Quantifying Evolution of Short and Long-Range Correlations in Chinese Narrative Texts across 2000 Years
2018
Complexity
We speculate that the increase of word length and sentence length in written Chinese may account for this phenomenon, in terms of both the social-cultural aspects and the self-adapting properties of language ...
We investigate how short and long-range word length correlations evolve in Chinese narrative texts. ...
Evolution of n-Gram Word Length Entropies and Short-Range Word Length Correlations. ...
doi:10.1155/2018/9362468
fatcat:sxnqhbf7fncp3jrbuxt46dzlje
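Short-range word-length correlations of the kind tracked in this study are commonly measured with the autocorrelation function of the length series. A minimal sketch using a simple biased estimator (the study's actual estimators, particularly for long-range behaviour, may differ; `narrative.txt` is a hypothetical input):

```python
def autocorrelation(seq, lag):
    # Autocorrelation of a numeric sequence at the given lag (biased estimator).
    n = len(seq) - lag
    mean = sum(seq) / len(seq)
    var = sum((x - mean) ** 2 for x in seq) / len(seq)
    cov = sum((seq[i] - mean) * (seq[i + lag] - mean) for i in range(n)) / n
    return cov / var

lengths = [len(w) for w in open("narrative.txt", encoding="utf-8").read().split()]
for lag in (1, 2, 5, 10, 100):
    print(lag, round(autocorrelation(lengths, lag), 4))
```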
THE PREDICTABILITY OF LETTERS IN WRITTEN ENGLISH
1996
Fractals
This agrees with the intuitive notion that words are well defined subunits in written languages, with much weaker correlations across these units than within them. ...
We show that the predictability of letters in written English texts depends strongly on their position in the word. The first letters are usually the least easy to predict. ...
Finally, we show in Figs. 3 and 4 how the estimated overall entropy depends on the length of the text, with and without scrambling. ...
doi:10.1142/s0218348x96000029
fatcat:fmm6rm2oa5gofkilu2lz3g5i4a
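A crude proxy for the positional effect this paper reports is the unconditioned letter entropy at each within-word position (the paper itself estimates predictability from preceding context, which requires conditional entropies). A minimal sketch, with `english.txt` as a hypothetical corpus:

```python
import math
from collections import Counter, defaultdict

def positional_entropies(words):
    # Shannon entropy (bits) of the letter distribution at each within-word position.
    by_pos = defaultdict(Counter)
    for w in words:
        for i, ch in enumerate(w):
            by_pos[i][ch] += 1
    result = {}
    for pos, counts in sorted(by_pos.items()):
        total = sum(counts.values())
        result[pos] = -sum(c / total * math.log2(c / total) for c in counts.values())
    return result

words = open("english.txt", encoding="utf-8").read().lower().split()  # hypothetical corpus
for pos, h in positional_entropies(words).items():
    print(f"position {pos}: {h:.2f} bits")
```

High entropy at position 0 and lower entropy toward word ends would match the intuition quoted above that correlations are stronger within words than across word boundaries.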
Universal Entropy of Word Ordering Across Linguistic Families
2011
PLoS ONE
In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. ...
Conclusions/Significance: Our results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering in the structure of language is a statistical ...
correlation length and the entropy of Zipf's distribution. ...
doi:10.1371/journal.pone.0019875
pmid:21603637
pmcid:PMC3094390
fatcat:eqm3a6kdtbbz3pduyk4prb2veq
Complexity measurement of natural and artificial languages
2014
Complexity
We compared entropy for texts written in natural languages (English, Spanish) and artificial languages (computer software) based on a simple expression for the entropy as a function of message length and ...
Code text written in artificial languages showed higher entropy than text of similar length expressed in natural languages. Spanish texts exhibit more symbolic diversity than English ones. ...
Carlos Gershenson was partially supported by SNI membership 47907 of CONACyT, Mexico. ...
doi:10.1002/cplx.21529
fatcat:v6kxymdekre7ngji5oueiz3eui
The Entropy of Words—Learnability and Expressivity across More than 1000 Languages
2017
Entropy
The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics and language sciences more generally. ...
Here, we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence ...
Christian Bentz was funded by the German Research Foundation (DFG FOR 2237: Project "Words, Bones, Genes, Tools: Tracking Linguistic, Cultural, and Biological Trajectories of the Human Past") and by the ...
doi:10.3390/e19060275
fatcat:rjbauypw25ffhi4lx334ocg63a
Information Theory and Language
2020
Entropy
Human language is a system of communication. Communication, in turn, consists primarily of information transmission [...] ...
Acknowledgments: We express our thanks to the authors of the above contributions, the reviewers for their feedback on the manuscripts, and to the journal Entropy and MDPI for their support during this ...
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/e22040435
pmid:33286209
fatcat:me5ui7eginbsfl4663jyrprzle
Average Word Length from the Diachronic Perspective: The Case of Arabic
2019
Linguistic Frontiers
The dynamics of the average word length correlates with the dynamics of the average word distribution entropy, which encourages an explanation of the phenomenon based on the Shannonian theory of communication ...
Previous studies based on English, Russian and Chinese corpora show that the average word length in texts grows steadily across centuries. ...
I also thank Petr Zemánek for various important comments and Denisa Šebestová for proofreading and valuable remarks. ...
doi:10.2478/lf-2018-0007
fatcat:qxk5i5y2gnhoxpn7iaegj5iohm
Evaluating the Irregularity of Natural Languages
2017
Entropy
We modified a well-known method to calculate the approximate and sample entropy of written texts. ...
We find differences in the degree of irregularity between the families. Our method, which is based on the search for regularities in a sequence of symbols, consistently distinguishes between natural ...
Conflicts of Interest: The authors declare no conflict of interest. ...
doi:10.3390/e19100521
fatcat:m55npadgzrh2tp2r7tiye6rypi
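For a symbolic sequence, sample entropy reduces to exact template matching (tolerance r = 0). A minimal sketch of a simplified variant (canonical SampEn restricts both counts to the same number of templates, and the paper's modifications are not reproduced here):

```python
import math

def sample_entropy(seq, m=2):
    # Simplified sample entropy of a symbolic sequence with exact matching.
    def matching_pairs(length):
        templates = [tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)]
        return sum(
            1
            for i in range(len(templates))
            for j in range(i + 1, len(templates))
            if templates[i] == templates[j]
        )  # O(N^2) pairwise comparison; adequate for short texts

    b, a = matching_pairs(m), matching_pairs(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")

print(sample_entropy(list("the quick brown fox jumps over the lazy dog"), m=2))
```

Lower values indicate more regularity; this kind of regularity search is what the paper uses to distinguish natural languages from other symbol sequences.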
The word entropy of natural languages
[article]
2016
arXiv
pre-print
We here use parallel texts of 21 languages to establish the number of tokens at which word entropies converge to stable values. ...
These convergence points are then used to select texts from a massively parallel corpus, and to estimate word entropies across more than 1000 languages. ...
Since then, Shannon [28] and others have undertaken great efforts to estimate precisely the entropy of written English [4, 12, 8, 27], and other languages [1, 17, 18]. ...
arXiv:1606.06996v1
fatcat:beyuikfxkfcl3je263hbtngkyi
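The convergence check described here can be illustrated by tracking a plug-in entropy estimate as the token count grows (the paper uses bias-corrected estimators, which matter for small samples; `parallel_text.txt` is a hypothetical corpus):

```python
import math
from collections import Counter

def unigram_entropy(tokens):
    # Maximum-likelihood (plug-in) estimate of the unigram word entropy, in bits.
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

tokens = open("parallel_text.txt", encoding="utf-8").read().split()  # hypothetical corpus
for size in (1_000, 10_000, 100_000, len(tokens)):
    sub = tokens[:size]
    print(len(sub), round(unigram_entropy(sub), 3))
```

The token count at which successive estimates stop changing appreciably is the convergence point used to select comparable texts across languages.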
Natural Language Statistical Features of LSTM-Generated Texts
2019
IEEE Transactions on Neural Networks and Learning Systems
In particular, we characterized the statistical structure of language by assessing word-frequency statistics, long-range correlations, and entropy measures. ...
We compared the statistical structure of LSTM-generated language to that of written natural language, and to those produced by Markov models of various orders. ...
ACKNOWLEDGMENT The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. ...
doi:10.1109/tnnls.2019.2890970
pmid:30951479
fatcat:arxskczkcvgadn67wqksxe3tjq
Showing results 1 — 15 out of 11,497 results