1,954 Hits in 7.5 sec

ENTROPY ANALYSIS OF WORD-LENGTH SERIES OF NATURAL LANGUAGE TEXTS: EFFECTS OF TEXT LANGUAGE AND GENRE

MARIA KALIMERI, VASSILIOS CONSTANTOUDIS, CONSTANTINOS PAPADIMITRIOU, KONSTANTINOS KARAMANOS, FOTIS K. DIAKONOS, HARIS PAPAGEORGIOU
2012 International Journal of Bifurcation and Chaos in Applied Sciences and Engineering  
We estimate the n-gram entropies of natural language texts in word-length representation and find that these are sensitive to text language and genre.  ...  Furthermore, comparison with the entropies of shuffled data reveals the impact of word length correlations on the estimated n-gram entropies.  ...  and correlations of word lengths in the effects of text language and genre.  ... 
doi:10.1142/s0218127412502239 fatcat:irt2cm4brnanfjb4ypldvmwwvq
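The n-gram (block) entropies of a word-length series can be illustrated with a minimal Python sketch. It assumes whitespace tokenization with punctuation stripped and the plug-in (maximum-likelihood) estimator over overlapping blocks. The function names are hypothetical, and the authors' actual estimator and preprocessing may differ. Comparing `block_entropy` with `shuffled_block_entropy` mirrors the shuffled-data comparison. For n = 1 the two coincide, because shuffling preserves the word-length distribution. For n > 1, any gap reflects the ordering (correlations) destroyed by the shuffle, up to sampling noise.

```python
import math
import random
from collections import Counter


def word_lengths(text):
    """Map a text to its word-length series (letters only; punctuation is dropped)."""
    words = ("".join(ch for ch in token if ch.isalpha()) for token in text.split())
    return [len(w) for w in words if w]


def block_entropy(series, n):
    """Plug-in Shannon entropy, in bits, of the overlapping n-blocks of a series."""
    blocks = Counter(tuple(series[i:i + n]) for i in range(len(series) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())


def shuffled_block_entropy(series, n, seed=0):
    """Block entropy after a random permutation, which keeps the length
    distribution but destroys the ordering (and hence the correlations)."""
    shuffled = list(series)
    random.Random(seed).shuffle(shuffled)
    return block_entropy(shuffled, n)


if __name__ == "__main__":
    lengths = word_lengths("The quick brown fox jumps over the lazy dog.")
    for n in (1, 2, 3):
        print(n, block_entropy(lengths, n), shuffled_block_entropy(lengths, n))
```

On a real text, the series would be much longer, and several shuffles would be averaged before comparing entropies.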

AVERAGE WORD LENGTH AND TEXT REDUNDANCY VARIABILITY: FRENCH TEXTS CASE STUDY

Malvina Marinashvili
2020 Polonia University Scientific Journal  
This correlation has been evaluated on the basis of analysis of entropy, redundancy and average word length for literary, scientific, and publicistic texts.  ...  In this regard it is proposed to distinguish between two average word lengths of text: the average length of a word belonging to the exponentially decaying tail of entropy and the average length of a word  ...  Acknowledgements The author is grateful to Andrey Olinchuk for providing the useful C# program for determination of French text statistical characteristics. Also special thanks to Dr Dzhema V.  ... 
doi:10.23856/3849 fatcat:uahwcrgg7ray3i74t34aq3yrju
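The quantities named in this abstract can be sketched in Python. This assumes redundancy takes its standard Shannon form, R = 1 - H/H_max, with H_max = log2(alphabet size) and H the single-letter entropy. The letter-level definition and the hypothetical `alphabet_size` argument are illustrative assumptions. So is the tokenization: apostrophes are dropped, so "l'homme" counts as one six-letter word. None of these choices is the paper's stated method.

```python
import math
from collections import Counter


def letters(text):
    """Lower-cased alphabetic characters of the text (accented letters included)."""
    return [ch for ch in text.lower() if ch.isalpha()]


def letter_entropy(text):
    """Plug-in single-letter Shannon entropy H, in bits per letter."""
    seq = letters(text)
    if not seq:
        return 0.0
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in Counter(seq).values())


def redundancy(text, alphabet_size):
    """Shannon redundancy R = 1 - H / log2(alphabet_size)."""
    return 1.0 - letter_entropy(text) / math.log2(alphabet_size)


def average_word_length(text):
    """Mean number of letters per word; non-letters such as apostrophes are dropped."""
    words = [w for w in ("".join(ch for ch in tok if ch.isalpha()) for tok in text.split()) if w]
    return sum(len(w) for w in words) / len(words) if words else 0.0
```

For French, `alphabet_size` could be 26 or larger if accented letters are counted as separate symbols, and that choice directly changes R.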

Word-length entropies and correlations of natural language written texts [article]

Maria Kalimeri, Vassilios Constantoudis, Constantinos Papadimitriou, Konstantinos Karamanos, Fotis K. Diakonos, Harris Papageorgiou
2014 arXiv   pre-print
We study the frequency distributions and correlations of the word lengths of ten European languages.  ...  measured by the comparison of the real entropies with those of the shuffled texts are found to be smaller in the case of Germanic and Finnish languages.  ...  Papageorgiou, “Entropy analysis of word-length series of natural language texts: Effects of text language and genre,” International Journal of Bifurcation and Chaos, vol. 22, no. 9, 2012  ... 
arXiv:1401.6224v1 fatcat:iuyzae36xzafhjaroialfsya5y

Word-length Entropies and Correlations of Natural Language Written Texts

Maria Kalimeri, Vassilios Constantoudis, Constantinos Papadimitriou, Konstantinos Karamanos, Fotis K. Diakonos, Harris Papageorgiou
2015 Journal of Quantitative Linguistics  
We study the frequency distributions and correlations of the word lengths of ten European languages.  ...  measured by the comparison of the real entropies with those of the shuffled texts are found to be smaller in the case of Germanic and Finnish languages.  ...  Acknowledgements We are grateful to George Mikros for the fruitful discussions and support. K.K. would like to express his gratitude to the Library in the National Hellenic Research Foundation.  ... 
doi:10.1080/09296174.2014.1001636 fatcat:qdyrjbam4vadzfrxdif4kkpbti

Natural Language Statistical Features of LSTM-Generated Texts

Marco Lippi, Marcelo A. Montemurro, Mirko Degli Esposti, Giampaolo Cristadoro
2019 IEEE Transactions on Neural Networks and Learning Systems  
In particular, we characterized the statistical structure of language by assessing word-frequency statistics, long-range correlations, and entropy measures.  ...  We compared the statistical structure of LSTM-generated language to that of written natural language, and to those produced by Markov models of various orders.  ...  ACKNOWLEDGMENT The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.  ... 
doi:10.1109/tnnls.2019.2890970 pmid:30951479 fatcat:arxskczkcvgadn67wqksxe3tjq
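A Markov-model baseline of the kind mentioned above can be sketched as a word-level order-k chain. Training records the successors of each k-word state, and generation samples from them. This is a generic illustration with hypothetical function names. The snippet does not describe the paper's actual models (smoothing, tokenization, training corpus).

```python
import random
from collections import defaultdict


def train_markov(tokens, order):
    """Map each state (tuple of `order` consecutive tokens) to the list of tokens
    that follow it; duplicates are kept so sampling is frequency-weighted."""
    if order < 1:
        raise ValueError("order must be at least 1")
    model = defaultdict(list)
    for i in range(len(tokens) - order):
        model[tuple(tokens[i:i + order])].append(tokens[i + order])
    return dict(model)


def generate(model, start, length, seed=0):
    """Sample up to `length` tokens, beginning with the state `start`."""
    if not start:
        raise ValueError("start must contain at least one token")
    rng = random.Random(seed)
    order = len(start)
    out = list(start)
    while len(out) < length:
        followers = model.get(tuple(out[-order:]))
        if not followers:  # state never seen with a successor: stop early
            break
        out.append(rng.choice(followers))
    return out
```

Raising `order` makes generated text copy longer stretches of the training corpus. That is one reason such baselines are compared across several orders.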

Shared common ground influences information density in microblog texts

Gabriel Doyle, Michael Frank
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies  
We use microblog texts from Twitter, tied to a single shared event (the baseball World Series), to quantify both linguistic and non-linguistic context.  ...  These findings lend further support to the UID hypothesis and highlight the importance of nonlinguistic common ground for language production and processing.  ...  For this analysis, we created a mixed-effects model with WPA, log(rate) and log(time) as predictors of tweet length.  ... 
doi:10.3115/v1/n15-1182 dblp:conf/naacl/DoyleF15 fatcat:yvfkiyablbfipa4eadn7jv7sgq

Compression of parallel texts

Craig Nevill, Timothy Bell
1992 Information Processing & Management  
To minimise the cost of storing and transmitting multiple translations of a text, one could store the text in just one language, from which other translations can be created.  ...  The world-wide use of digital storage and communications devices is increasing the need to make texts available in multiple languages.  ...  This facilitates the efficient and flexible retrieval and analysis of information.  ... 
doi:10.1016/0306-4573(92)90068-b fatcat:ch4yvpkg65gyhgmbntfcsogwxa

Automatic Detection of Text Genre [article]

Brett Kessler, Geoffrey Nunberg, Hinrich Schuetze (Xerox PARC and Stanford University)
1997 arXiv   pre-print
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of  ...  We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection based on deeper structural  ...  For example, instead of estimating separate weights α, β, and γ for the ratios words per sentence (average sentence length), characters per word (average word length) and words per type (token/type ratio  ... 
arXiv:cmp-lg/9707002v1 fatcat:5o6lbf2tlrfezd6dijeyawmqba

Automatic detection of text genre

Brett Kessler, Geoffrey Nunberg, Hinrich Schütze
1997 Proceedings of the 35th annual meeting on Association for Computational Linguistics -  
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of  ...  We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection based on deeper structural  ...  For example, instead of estimating separate weights α, β, and γ for the ratios words per sentence (average sentence length), characters per word (average word length) and words per type (token/type ratio  ... 
doi:10.3115/976909.979622 dblp:conf/acl/KesslerNS97 fatcat:4vkb3ci6zfa6zdjxeeq5mdw2p4
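The three ratio cues named in these snippets are average sentence length, average word length and token/type ratio. A Python sketch for computing them follows. It assumes naive sentence splitting on ".", "!" and "?" and treats runs of letters as words, so "don't" becomes two words. It computes the cues only and does not reproduce the authors' classifier or their handling of the weights.

```python
import re


def surface_cues(text):
    """Return three ratio cues: words per sentence, characters per word and
    words per type (token/type ratio)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "chars_per_word": 0.0, "words_per_type": 0.0}
    types = {w.lower() for w in words}
    return {
        "words_per_sentence": len(words) / len(sentences),
        "chars_per_word": sum(len(w) for w in words) / len(words),
        "words_per_type": len(words) / len(types),
    }
```

For example, "The cat sat. The cat ran." yields 3.0 words per sentence, 3.0 characters per word and 1.5 words per type.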

Filtering artificial texts with statistical machine learning techniques

Thomas Lavergne, Tanguy Urvoy, François Yvon
2010 Language Resources and Evaluation  
models and the third one is based on a relative entropy measure which captures short range dependencies between words.  ...  Our experiments show that lexicometric features and language models are efficient to detect most generated texts, but fail to detect texts that are generated with high order Markov models.  ...  words length; the mean and standard deviation of sentences length; the ratio of grammatical words; the ratio of words that are found in an English dictionary; the ratio between number of tokens (ie. the  ... 
doi:10.1007/s10579-009-9113-0 fatcat:jb5n3xdkv5ax3dlvgbexd3ddpq

Quantifying Evolution of Short and Long-Range Correlations in Chinese Narrative Texts across 2000 Years

Heng Chen, Haitao Liu
2018 Complexity  
We investigate how short and long-range word length correlations evolve in Chinese narrative texts.  ...  We speculate that the increase of word length and sentence length in written Chinese may account for this phenomenon, in terms of both the social-cultural aspects and the self-adapting properties of language  ...  These correlations can be found in word length series, word frequency series, unicode (of word/character) series, and so on. Using the natural visibility graph method, Guzmán-Vargas et al.  ... 
doi:10.1155/2018/9362468 fatcat:sxnqhbf7fncp3jrbuxt46dzlje
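One common way to quantify short-range correlation in a word-length series is the sample autocorrelation at lag k. The Python sketch below is a generic estimator, not necessarily the measure Chen and Liu used. Long-range correlations are usually assessed with other tools, such as fluctuation analysis, which are not shown here.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation C(lag) of a numeric series, normalised by the
    overall variance so that C(0) = 1."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)) / (n - lag)
    return cov / var
```

Applied to a word-length list, values near zero at small lags indicate little short-range dependence. A strictly alternating series such as [1, 2, 1, 2, 1, 2] gives -1 at lag 1.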

Comparative Computational Analysis of Global Structure in Canonical, Non-Canonical and Non-Literary Texts [article]

Mahdi Mohseni, Volker Gast, Christoph Redies
2020 arXiv   pre-print
The basic observations are transformed into time series, and these time series are subject to multifractal detrended fluctuation analysis (MFDFA).  ...  We use four types of basic observations, (i) the frequency of POS-tags per sentence, (ii) sentence length, (iii) lexical diversity in chunks of text, and (iv) the distribution of topic probabilities in  ...  The entropy of word distributions can also be informative for comparing different types of languages in terms of word ordering.  ... 
arXiv:2008.10906v1 fatcat:is3orr7525czhlem6dc7k5prj4

Probing the Topological Properties of Complex Networks Modeling Short Written Texts

Diego R. Amancio, Satoru Hayasaka
2015 PLoS ONE  
Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks.  ...  More specifically, the so-called word adjacency model has been proven useful for tackling several practical problems, especially those relying on textual stylistic analysis.  ...  In the latter, graph-based techniques have been applied to the analysis and construction of software architecture [17] , supervised classifiers [18] , spam filters [19] and natural language processing  ... 
doi:10.1371/journal.pone.0118394 pmid:25719799 pmcid:PMC4342245 fatcat:6ddzocgtwjdjfixlr3wamfneoi
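The word adjacency model can be sketched in Python: each distinct word is a node, and two words are linked when they appear next to each other in the text. This minimal version is undirected and unweighted and skips preprocessing. Studies of this model often add steps such as stopword removal or lemmatization, but the snippet does not state the paper's exact choices. The function names are hypothetical.

```python
def word_adjacency_network(tokens):
    """Undirected, unweighted graph: one node per distinct word, and an edge
    between two words whenever they occur next to each other in the text."""
    graph = {token: set() for token in tokens}  # isolated words still appear as nodes
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from immediately repeated words
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degrees(graph):
    """Number of distinct neighbours of each word."""
    return {word: len(neighbours) for word, neighbours in graph.items()}
```

Topological measures such as degree, clustering or path length are then computed on this graph. Degree is the simplest of these.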

A Comprehensive Review of Arabic Text Summarization

Asmaa Elsaid, Ammar Mohammed, Lamiaa Fattouh, Mohamed Sakre
2022 IEEE Access  
and architectures, and the complexity of the Arabic language.  ...  Although Arabic is a widely spoken language that is frequently used for content sharing on the web, Arabic text summarization of Arabic content is limited and still immature because of several problems  ...  Applications of natural language processing include information retrieval, machine translation, questions and answers, and text summarization.  ... 
doi:10.1109/access.2022.3163292 fatcat:spdw4rrvmfgm3anlujx3ndvg2y

Entropic evolution of lexical richness of homogeneous texts over time: A dynamic complexity perspective

Yanhui Zhang
2016 Journal of Language Modelling  
The entropies of these texts are calculated and treated as time series data.  ...  as text length increases.  ... 
doi:10.15398/jlm.v3i2.111 fatcat:buzekmsnazapbajcvk2rrkxugq