ENTROPY ANALYSIS OF WORD-LENGTH SERIES OF NATURAL LANGUAGE TEXTS: EFFECTS OF TEXT LANGUAGE AND GENRE
2012
International Journal of Bifurcation and Chaos in Applied Sciences and Engineering
We estimate the n-gram entropies of natural language texts in word-length representation and find that these are sensitive to text language and genre. ...
Furthermore, comparison with the entropies of shuffled data reveals the impact of word length correlations on the estimated n-gram entropies. ...
and correlations of word lengths in the effects of text language and genre. ...
doi:10.1142/s0218127412502239
fatcat:irt2cm4brnanfjb4ypldvmwwvq
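The snippet above describes estimating n-gram entropies of a text in word-length representation and comparing them with the entropies of shuffled data. A minimal sketch of that idea follows, assuming a naive regex tokenizer and the simple plug-in (frequency-count) entropy estimator; neither is confirmed to be what the paper uses, and the function names `word_lengths`, `block_entropy`, and `shuffled_entropy` are hypothetical.

```python
import math
import random
import re
from collections import Counter


def word_lengths(text):
    """Map a text to its word-length series (one integer per word).

    Assumption: a word is a maximal run of Unicode letters, so
    apostrophes and hyphens split words. Real studies may tokenize
    differently.
    """
    return [len(w) for w in re.findall(r"[^\W\d_]+", text)]


def block_entropy(series, n):
    """Plug-in Shannon entropy, in bits, of the overlapping n-grams."""
    grams = [tuple(series[i:i + n]) for i in range(len(series) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    total = len(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def shuffled_entropy(series, n, seed=0):
    """Entropy of the same series after a random shuffle.

    Shuffling keeps the word-length frequencies but destroys their
    ordering, so the gap to block_entropy(series, n) for n > 1 hints
    at the contribution of correlations between word lengths.
    """
    rng = random.Random(seed)
    shuffled = list(series)
    rng.shuffle(shuffled)
    return block_entropy(shuffled, n)
```

For a word-length series `s`, comparing `block_entropy(s, 2)` with `shuffled_entropy(s, 2)` gives a rough view of how much the ordering matters. The plug-in estimator is biased downward for large `n` on short texts, so any such comparison is only indicative.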
AVERAGE WORD LENGTH AND TEXT REDUNDANCY VARIABILITY: FRENCH TEXTS CASE STUDY
2020
Polonia University Scientific Journal
This correlation has been evaluated on the basis of analysis of entropy, redundancy and average word length for literary, scientific, and publicistic texts. ...
In this regard it is proposed to distinguish between two average word lengths of text: the average length of a word belonging to the exponentially decaying tail of entropy and the average length of a word ...
Acknowledgements The author is grateful to Andrey Olinchuk for providing the useful program in C# for determination of French text statistical characteristics. Also special thanks to Dr Dzhema V. ...
doi:10.23856/3849
fatcat:uahwcrgg7ray3i74t34aq3yrju
Word-length entropies and correlations of natural language written texts
[article]
2014
arXiv
pre-print
We study the frequency distributions and correlations of the word lengths of ten European languages. ...
measured by the comparison of the real entropies with those of the shuffled texts are found to be smaller in the case of Germanic and Finnish languages. ...
Papageorgiou, “Entropy analysis of word-length series of natural language texts: Effects of text language and genre,” International Journal of Bifurcation and Chaos, vol. 22, no. 9, 2012 ...
arXiv:1401.6224v1
fatcat:iuyzae36xzafhjaroialfsya5y
Word-length Entropies and Correlations of Natural Language Written Texts
2015
Journal of Quantitative Linguistics
We study the frequency distributions and correlations of the word lengths of ten European languages. ...
measured by the comparison of the real entropies with those of the shuffled texts are found to be smaller in the case of Germanic and Finnish languages. ...
Acknowledgements We are grateful to George Mikros for the fruitful discussions and support. K.K. would like to express his gratitude to the Library in the National Hellenic Research Foundation. ...
doi:10.1080/09296174.2014.1001636
fatcat:qdyrjbam4vadzfrxdif4kkpbti
Natural Language Statistical Features of LSTM-Generated Texts
2019
IEEE Transactions on Neural Networks and Learning Systems
In particular, we characterized the statistical structure of language by assessing word-frequency statistics, long-range correlations, and entropy measures. ...
We compared the statistical structure of LSTM-generated language to that of written natural language, and to those produced by Markov models of various orders. ...
ACKNOWLEDGMENT The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. ...
doi:10.1109/tnnls.2019.2890970
pmid:30951479
fatcat:arxskczkcvgadn67wqksxe3tjq
Shared common ground influences information density in microblog texts
2015
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
We use microblog texts from Twitter, tied to a single shared event (the baseball World Series), to quantify both linguistic and non-linguistic context. ...
These findings lend further support to the UID hypothesis and highlight the importance of nonlinguistic common ground for language production and processing. ...
For this analysis, we created a mixed-effects model with WPA, log(rate) and log(time) as predictors of tweet length. ...
doi:10.3115/v1/n15-1182
dblp:conf/naacl/DoyleF15
fatcat:yvfkiyablbfipa4eadn7jv7sgq
Compression of parallel texts
1992
Information Processing & Management
To minimise the cost of storing and transmitting multiple translations of a text, one could store the text in just one language, from which other translations can be created. ...
The world-wide use of digital storage and communications devices is increasing the need to make texts available in multiple languages. ...
This facilitates the efficient and flexible retrieval and analysis of information. ...
doi:10.1016/0306-4573(92)90068-b
fatcat:ch4yvpkg65gyhgmbntfcsogwxa
Automatic Detection of Text Genre
[article]
1997
arXiv
pre-print
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of ...
We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection based on deeper structural ...
For example, instead of estimating separate weights α, β, and γ for the ratios words per sentence (average sentence length), characters per word (average word length) and words per type (token/type ratio ...
arXiv:cmp-lg/9707002v1
fatcat:5o6lbf2tlrfezd6dijeyawmqba
Automatic detection of text genre
1997
Proceedings of the 35th annual meeting on Association for Computational Linguistics
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of ...
We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection based on deeper structural ...
For example, instead of estimating separate weights α, β, and γ for the ratios words per sentence (average sentence length), characters per word (average word length) and words per type (token/type ratio ...
doi:10.3115/976909.979622
dblp:conf/acl/KesslerNS97
fatcat:4vkb3ci6zfa6zdjxeeq5mdw2p4
Filtering artificial texts with statistical machine learning techniques
2010
Language Resources and Evaluation
models and the third one is based on a relative entropy measure which captures short range dependencies between words. ...
Our experiments show that lexicometric features and language models are efficient to detect most generated texts, but fail to detect texts that are generated with high order Markov models. ...
words length; the mean and standard deviation of sentences length; the ratio of grammatical words; the ratio of words that are found in an English dictionary; the ratio between number of tokens (i.e. the ...
doi:10.1007/s10579-009-9113-0
fatcat:jb5n3xdkv5ax3dlvgbexd3ddpq
Quantifying Evolution of Short and Long-Range Correlations in Chinese Narrative Texts across 2000 Years
2018
Complexity
We investigate how short and long-range word length correlations evolve in Chinese narrative texts. ...
We speculate that the increase of word length and sentence length in written Chinese may account for this phenomenon, in terms of both the social-cultural aspects and the self-adapting properties of language ...
These correlations can be found in word length series, word frequency series, unicode (of word/character) series, and so on. Using the natural visibility graph method, Guzmán-Vargas et al. ...
doi:10.1155/2018/9362468
fatcat:sxnqhbf7fncp3jrbuxt46dzlje
Comparative Computational Analysis of Global Structure in Canonical, Non-Canonical and Non-Literary Texts
[article]
2020
arXiv
pre-print
The basic observations are transformed into time series, and these time series are subject to multifractal detrended fluctuation analysis (MFDFA). ...
We use four types of basic observations, (i) the frequency of POS-tags per sentence, (ii) sentence length, (iii) lexical diversity in chunks of text, and (iv) the distribution of topic probabilities in ...
The entropy of word distributions can also be informative for comparing different types of languages in terms of word ordering. ...
arXiv:2008.10906v1
fatcat:is3orr7525czhlem6dc7k5prj4
Probing the Topological Properties of Complex Networks Modeling Short Written Texts
2015
PLoS ONE
Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks. ...
More specifically, the so-called word adjacency model has been proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. ...
In the latter, graph-based techniques have been applied to the analysis and construction of software architecture [17] , supervised classifiers [18] , spam filters [19] and natural language processing ...
doi:10.1371/journal.pone.0118394
pmid:25719799
pmcid:PMC4342245
fatcat:6ddzocgtwjdjfixlr3wamfneoi
A Comprehensive Review of Arabic Text summarization
2022
IEEE Access
and architectures, and the complexity of the Arabic language. ...
Although Arabic is a widely spoken language that is frequently used for content sharing on the web, Arabic text summarization of Arabic content is limited and still immature because of several problems ...
Applications of natural language processing include information retrieval, machine translation, questions and answers, and text summarization. ...
doi:10.1109/access.2022.3163292
fatcat:spdw4rrvmfgm3anlujx3ndvg2y
Entropic evolution of lexical richness of homogeneous texts over time: A dynamic complexity perspective
2016
Journal of Language Modelling
The entropies of these texts are calculated and treated as a time series data. ...
as text length increases. ...
doi:10.15398/jlm.v3i2.111
fatcat:buzekmsnazapbajcvk2rrkxugq