11,497 Hits in 5.2 sec

Word-length entropies and correlations of natural language written texts [article]

Maria Kalimeri, Vassilios Constantoudis, Constantinos Papadimitriou, Konstantinos Karamanos, Fotis K. Diakonos, Harris Papageorgiou
2014 arXiv   pre-print
We study the frequency distributions and correlations of the word lengths of ten European languages.  ...  measured by the comparison of the real entropies with those of the shuffled texts are found to be smaller in the case of Germanic and Finnish languages.  ...  Word-length entropies and correlations of natural language written texts  ... 
arXiv:1401.6224v1 fatcat:iuyzae36xzafhjaroialfsya5y

Word-length Entropies and Correlations of Natural Language Written Texts

Maria Kalimeri, Vassilios Constantoudis, Constantinos Papadimitriou, Konstantinos Karamanos, Fotis K. Diakonos, Harris Papageorgiou
2015 Journal of Quantitative Linguistics  
We study the frequency distributions and correlations of the word lengths of ten European languages.  ...  measured by the comparison of the real entropies with those of the shuffled texts are found to be smaller in the case of Germanic and Finnish languages.  ...  Acknowledgements We are grateful to George Mikros for the fruitful discussions and support. K.K. would like to express his gratitude to the Library in the National Hellenic Research Foundation.  ... 
doi:10.1080/09296174.2014.1001636 fatcat:qdyrjbam4vadzfrxdif4kkpbti


2012 International Journal of Bifurcation and Chaos in Applied Sciences and Engineering  
We estimate the n-gram entropies of natural language texts in word-length representation and find that these are sensitive to text language and genre.  ...  Furthermore, comparison with the entropies of shuffled data reveals the impact of word length correlations on the estimated n-gram entropies.  ...  and correlations of word lengths in the effects of text language and genre.  ... 
doi:10.1142/s0218127412502239 fatcat:irt2cm4brnanfjb4ypldvmwwvq
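Several of the entries above share one technique: estimate the n-gram entropy of a text's word-length sequence, then compare it against shuffled copies of the same sequence, which preserve the length distribution but destroy the correlations. A minimal sketch of that comparison (illustrative only; the function names and the toy "corpus" are mine, and the papers' exact estimators are not reproduced here):

```python
import random
from collections import Counter
from math import log2

def ngram_entropy(seq, n):
    """Shannon entropy (bits) of the empirical n-gram distribution of seq."""
    grams = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return -sum(c / total * log2(c / total) for c in counts.values())

def shuffled_baseline(seq, n, trials=20, seed=0):
    """Mean n-gram entropy over shuffled copies of seq.

    Shuffling keeps the word-length distribution but removes ordering
    correlations, so the gap to the real entropy reflects correlations.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        s = list(seq)
        rng.shuffle(s)
        acc += ngram_entropy(s, n)
    return acc / trials

# Word lengths of a toy sentence stand in for a corpus.
lengths = [len(w) for w in
           "we study the frequency distributions and correlations of word lengths".split()]
h2_real = ngram_entropy(lengths, 2)
h2_shuf = shuffled_baseline(lengths, 2)
```

Note that shuffling leaves the unigram (n = 1) entropy unchanged by construction; only for n ≥ 2 can a gap appear, and meaningful gaps require far longer sequences than this toy example.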

Complexity-entropy analysis at different levels of organisation in written language

Ernesto Estevez-Rams, Ania Mesa-Rodriguez, Daniel Estevez-Moya, Diego Raphael Amancio
2019 PLoS ONE  
A written text can be considered an attempt to convey a meaningful message which ends up being constrained by language rules, context dependence and highly redundant in its use of resources.  ...  Despite all these constraints, unpredictability is an essential element of natural language.  ...  Acknowledgments The authors would like to thank the referees, who carefully read the manuscript and made valuable suggestions. Author Contributions Conceptualization: Ernesto Estevez-Rams.  ... 
doi:10.1371/journal.pone.0214863 pmid:31067221 pmcid:PMC6505741 fatcat:fryrk3w5bbe7veq3qke4awubya

Comparative Study of Complexity, Entropy and Correlations of Natural Written Texts Produced by Human Brain and DNA "Texts" that Create Human Being

Melnik SS, Usatenko OV
2015 Journal of Theoretical and Computational Science  
This length R_c is much greater than the correlation length R_c ≈ 10 observed for natural written texts (Figure 2).  ...  Usually the DNA and natural language texts are considered as random sequences with a finite number of states.  ... 
doi:10.4172/2376-130x.1000139 fatcat:ctln6xuk7bartbabxlxrbsm36q

Quantifying Evolution of Short and Long-Range Correlations in Chinese Narrative Texts across 2000 Years

Heng Chen, Haitao Liu
2018 Complexity  
We speculate that the increase of word length and sentence length in written Chinese may account for this phenomenon, in terms of both the social-cultural aspects and the self-adapting properties of language  ...  We investigate how short and long-range word length correlations evolve in Chinese narrative texts.  ...  Evolution of n-Gram Word Length Entropies and Short-Range Word Length Correlations.  ... 
doi:10.1155/2018/9362468 fatcat:sxnqhbf7fncp3jrbuxt46dzlje


1996 Fractals  
This agrees with the intuitive notion that words are well defined subunits in written languages, with much weaker correlations across these units than within them.  ...  We show that the predictability of letters in written English texts depends strongly on their position in the word. The first letters are usually the least easy to predict.  ...  Finally, we show in Figs. 3 and 4 how the estimated overall entropy depends on the length of the text, with and without scrambling.  ... 
doi:10.1142/s0218348x96000029 fatcat:fmm6rm2oa5gofkilu2lz3g5i4a
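The 1996 finding — that a letter's predictability depends strongly on its position within the word, with first letters the hardest to predict — can be illustrated by tabulating letter entropy separately at each within-word position. A toy sketch under that reading (the helper name is mine, not the paper's estimator):

```python
from collections import Counter, defaultdict
from math import log2

def positional_letter_entropy(words):
    """Shannon entropy (bits) of the letter distribution at each word position."""
    by_pos = defaultdict(Counter)
    for w in words:
        for i, ch in enumerate(w):
            by_pos[i][ch] += 1
    out = {}
    for i, counts in sorted(by_pos.items()):
        total = sum(counts.values())
        out[i] = -sum(c / total * log2(c / total) for c in counts.values())
    return out

words = "the first letters are usually the least easy to predict".split()
H = positional_letter_entropy(words)  # H[0] is the first-letter entropy
```

On a large corpus, the paper's claim corresponds to H[0] exceeding the entropies at later positions; a ten-word sample like this only demonstrates the mechanics.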

Universal Entropy of Word Ordering Across Linguistic Families

Marcelo A. Montemurro, Damián H. Zanette, Michael Breakspear
2011 PLoS ONE  
In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints.  ...  Conclusions/Significance: Our results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering in the structure of language is a statistical  ...  correlation length and the entropy of Zipf's distribution.  ... 
doi:10.1371/journal.pone.0019875 pmid:21603637 pmcid:PMC3094390 fatcat:eqm3a6kdtbbz3pduyk4prb2veq

Complexity measurement of natural and artificial languages

Gerardo Febres, Klaus Jaffé, Carlos Gershenson
2014 Complexity  
We compared entropy for texts written in natural languages (English, Spanish) and artificial languages (computer software) based on a simple expression for the entropy as a function of message length and  ...  Code text written in artificial languages showed higher entropy than text of similar length expressed in natural languages. Spanish texts exhibit more symbolic diversity than English ones.  ...  Carlos Gershenson was partially supported by SNI membership 47907 of CONACyT, Mexico.  ... 
doi:10.1002/cplx.21529 fatcat:v6kxymdekre7ngji5oueiz3eui

The Entropy of Words—Learnability and Expressivity across More than 1000 Languages

Christian Bentz, Dimitrios Alikaniotis, Michael Cysouw, Ramon Ferrer-i-Cancho
2017 Entropy  
The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics and language sciences more generally.  ...  Here, we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence  ...  Christian Bentz was funded by the German Research Foundation (DFG FOR 2237: Project "Words, Bones, Genes, Tools: Tracking Linguistic, Cultural, and Biological Trajectories of the Human Past") and by the  ... 
doi:10.3390/e19060275 fatcat:rjbauypw25ffhi4lx334ocg63a

Information Theory and Language

Łukasz Dębowski, Christian Bentz
2020 Entropy  
Human language is a system of communication. Communication, in turn, consists primarily of information transmission [...]  ...  Acknowledgments: We express our thanks to the authors of the above contributions, the reviewers for their feedback on the manuscripts, and to the journal Entropy and MDPI for their support during this  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/e22040435 pmid:33286209 fatcat:me5ui7eginbsfl4663jyrprzle

Average Word Length from the Diachronic Perspective: The Case of Arabic

Jiří Milička
2019 Linguistic Frontiers  
The dynamics of the average word length correlates with the dynamics of the average word distribution entropy, which encourages an explanation of the phenomenon based on the Shannonian theory of communication  ...  Previous studies based on English, Russian and Chinese corpora show that the average word length in texts grows steadily across centuries.  ...  I also thank Petr Zemánek for various important comments and Denisa Šebestová for proofreading and valuable remarks.  ... 
doi:10.2478/lf-2018-0007 fatcat:qxk5i5y2gnhoxpn7iaegj5iohm

Evaluating the Irregularity of Natural Languages

Candelario Hernández-Gómez, Rogelio Basurto-Flores, Bibiana Obregón-Quintana, Lev Guzmán-Vargas
2017 Entropy  
We modified a well-known method to calculate the approximate and sample entropy of written texts.  ...  We find differences in the degree of irregularity between the language families.  ...  Our method, which is based on the search for regularities in a sequence of symbols, consistently distinguishes between natural  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/e19100521 fatcat:m55npadgzrh2tp2r7tiye6rypi
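The irregularity measure in the entry above builds on sample entropy, which compares how often length-m templates recur against how often their length-(m+1) extensions recur. A simplified sketch for symbol sequences, assuming exact matches (tolerance zero) instead of the tolerance-based matching used for continuous signals; the function name and toy sequence are mine:

```python
from math import log

def sample_entropy(seq, m=2):
    """Sample entropy of a symbol sequence with exact template matching.

    Counts pairs of matching templates of length m (B) and m+1 (A);
    returns -ln(A/B). Lower values mean a more regular sequence.
    """
    def matching_pairs(k):
        templates = [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]
        n = len(templates)
        return sum(templates[i] == templates[j]
                   for i in range(n) for j in range(i + 1, n))

    a, b = matching_pairs(m + 1), matching_pairs(m)
    return log(b / a) if a and b else float("inf")

symbols = list("abababababababab")  # highly regular toy sequence
se = sample_entropy(symbols, m=2)
```

For this perfectly periodic toy string the value is close to zero, as expected for a regular sequence; natural-language letter sequences sit between such periodic extremes and fully random ones.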

The word entropy of natural languages [article]

Christian Bentz, Dimitrios Alikaniotis
2016 arXiv   pre-print
We here use parallel texts of 21 languages to establish the number of tokens at which word entropies converge to stable values.  ...  These convergence points are then used to select texts from a massively parallel corpus, and to estimate word entropies across more than 1000 languages.  ...  Since then, Shannon [28] and others have undertaken great efforts to estimate precisely the entropy of written English [4, 12, 8, 27] , and other languages [1, 17, 18] .  ... 
arXiv:1606.06996v1 fatcat:beyuikfxkfcl3je263hbtngkyi
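The convergence question in the entry above — at how many tokens word-entropy estimates stabilize — amounts to computing the entropy on growing prefixes of a text and watching the curve flatten. A minimal sketch with the naive plug-in estimator (function names and the repeated toy text are mine; the paper additionally uses bias-corrected estimators, which matter at small samples):

```python
from collections import Counter
from math import log2

def word_entropy(tokens):
    """Plug-in (maximum-likelihood) estimate of unigram word entropy in bits."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * log2(c / n) for c in counts.values())

def entropy_curve(tokens, step):
    """Entropy estimates on growing prefixes, to locate the convergence point."""
    return [(k, word_entropy(tokens[:k]))
            for k in range(step, len(tokens) + 1, step)]

text = ("the quick brown fox jumps over the lazy dog " * 50).split()
curve = entropy_curve(text, 90)  # list of (token count, entropy) pairs
```

On real corpora the plug-in curve rises with sample size as rare words keep appearing, then levels off; the token count at which it levels off is the convergence point used to make cross-language comparisons fair.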

Natural Language Statistical Features of LSTM-Generated Texts

Marco Lippi, Marcelo A. Montemurro, Mirko Degli Esposti, Giampaolo Cristadoro
2019 IEEE Transactions on Neural Networks and Learning Systems  
In particular, we characterized the statistical structure of language by assessing word-frequency statistics, long-range correlations, and entropy measures.  ...  We compared the statistical structure of LSTM-generated language to that of written natural language, and to those produced by Markov models of various orders.  ...  ACKNOWLEDGMENT The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.  ... 
doi:10.1109/tnnls.2019.2890970 pmid:30951479 fatcat:arxskczkcvgadn67wqksxe3tjq