
The Entropy of Words—Learnability and Expressivity across More than 1000 Languages

Christian Bentz, Dimitrios Alikaniotis, Michael Cysouw, Ramon Ferrer-i-Cancho
2017 Entropy  
on text size, register, style and estimation method, as well as non-independence of words in co-text.  ...  The entropy difference between words with and without co-textual information is narrowly distributed around ca. three bits/word.  ...  Word Entropy Estimation A crucial pre-requisite for entropy estimation is the approximation of the probabilities of word types. In a text, each word type $w_i$ has a token frequency $f_i = \mathrm{freq}(w_i)$.  ... 
doi:10.3390/e19060275 fatcat:rjbauypw25ffhi4lx334ocg63a
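
The snippet above describes approximating word-type probabilities from token frequencies. A minimal sketch of that step, assuming the simplest maximum-likelihood (plug-in) approximation p_i = f_i / N and whitespace tokenization, is given here. The function name `plugin_word_entropy` is hypothetical and this is not necessarily the estimator the paper uses.

```python
from collections import Counter
from math import log2


def plugin_word_entropy(tokens):
    """Plug-in (maximum-likelihood) entropy in bits per word.

    Assumption: each word type's probability is approximated by its
    relative token frequency f_i / N. This naive estimate is known to be
    biased on small samples.
    """
    counts = Counter(tokens)      # token frequency f_i per word type w_i
    total = sum(counts.values())  # N, the number of tokens
    return -sum((f / total) * log2(f / total) for f in counts.values())


if __name__ == "__main__":
    # Whitespace tokenization is an illustrative simplification.
    text = "the cat saw the dog and the dog saw the cat"
    print(plugin_word_entropy(text.split()))
```

Relative frequencies are only a rough approximation of the underlying probabilities; the snippet itself notes that results depend on text size and estimation method.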

Authorship Detection with PPM Notebook for PAN at CLEF 2013

Victoria Bobicev
2013 Conference and Labs of the Evaluation Forum  
This paper reports on our work in the PAN 2013 author identification task. The task is to automatically detect the author of the given text having small training sets with known authors.  ...  It is natural that for different types of categorization different methods are pertinent.  ...  Treating a text as a string of characters, a character-based PPM avoids defining word boundaries; it deals with different types of documents in a uniform way.  ... 
dblp:conf/clef/Bobicev13 fatcat:bkk2cjmcxzanpiouok5sb6lvla

Markov Matrix and Entropy based Tamper Detection Technique for Text Images

Balkar Singh (Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, Punjab, India), M. K. Sharma (School of Mathematics, Thapar Institute of Engineering and Technology, Patiala, Punjab, India)
2021 Maǧallaẗ al-abḥāṯ al-handasiyyaẗ  
In this paper, a novel watermarking technique for the tamper detection of text images is proposed.  ...  The ZWCs of entropy of each sentence is embedded at the end of every sentence after terminator. ZWCs of the character patterns are embedded in the end of the text of the image.  ...  The proposed technique can detect different types of activities like alter or add any word in the text.  ... 
doi:10.36909/jer.10947 fatcat:mrijauxqfvaxhezy2vyn6dlpla

Symmetry, Entropy and Computer Science

Ángel Garrido
2018 Proceedings (MDPI)  
However, the approach used by Shannon differs from that of Wiener in the nature of the transmitted signal and in the type of decision made at the receiver.  ...  From these ideas, he analyzes the entropy of literary texts.  ... 
doi:10.3390/proceedings2010080 fatcat:f5kaku75uvbllcraldoupmby5y

A Stylometric Analysis of Iranian Poets

Sohrab Rezaei, Nasim Kashanian
2017 Theory and Practice in Language Studies  
Over the past 3 centuries many types of textual tools has been introduced to discriminate different authors objectively that developing in computer programing has played the important role for using these  ...  Result shows differences between their styles in terms of these parameters. This way of analyzing writing of different authors has some implications in different field of sociolinguistic and TOEFL.  ...  Type Token Ratio As the Type token is sensitive to the length of the text, the type token is limited to the first n-number of words in the shortest writing sample.  ... 
doi:10.17507/tpls.0701.07 fatcat:myfbfzj4cve3ddwjabmw2jvvga

Fast calculation of entropy with Zhang's estimator [article]

Antoni Lozano, Bernardino Casas, Chris Bentz, Ramon Ferrer-i-Cancho
2017 arXiv   pre-print
The algorithm takes advantage of the fact that the number of different frequencies in a text is in general much smaller than the number of types.  ...  Entropy is a fundamental property of a repertoire. Here, we present an efficient algorithm to estimate the entropy of types with the help of Zhang's estimator.  ...  Theoretically, entropy is a better measure of vocabulary size than the raw number of different types: the entropy of types measures the effective size of the vocabulary, which is related to the concept  ... 
arXiv:1707.08290v1 fatcat:jk2opxs66ndzzec633lkqlr3ii
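
The observation in the entry above, that a text has far fewer distinct frequencies than types, can be illustrated with a short sketch. It applies the grouping idea to the plain plug-in entropy, using the identity H = log2(N) - (1/N) * sum over distinct frequencies f of n_f * f * log2(f), where n_f is the number of types occurring exactly f times. It does not implement Zhang's estimator, and the function name is hypothetical.

```python
from collections import Counter
from math import log2


def entropy_by_frequency_spectrum(tokens):
    """Plug-in entropy (bits) computed over distinct frequencies.

    Assumption: this only illustrates grouping types by frequency; it is
    the plug-in estimator, not Zhang's estimator.
    """
    type_counts = Counter(tokens)             # frequency f of each type
    spectrum = Counter(type_counts.values())  # n_f: number of types with frequency f
    total = sum(f * n for f, n in spectrum.items())  # N, the number of tokens
    if total == 0:
        return 0.0
    # One term per distinct frequency instead of one term per type.
    weighted = sum(n * f * log2(f) for f, n in spectrum.items())
    return log2(total) - weighted / total


if __name__ == "__main__":
    tokens = "a a a b b c d e".split()
    print(entropy_by_frequency_spectrum(tokens))
```

The loop runs once per distinct frequency, so the saving grows with the number of types that share a frequency, such as the many types that occur only once.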

SVD Entropy Reveals the High Complexity of Ecological Networks

Tanya Strydom, Giulio V. Dalla Riva, Timothée Poisot
2021 Frontiers in Ecology and Evolution  
Pollination networks, in particular, are more complex when compared to other types of interactions.  ...  In addition, we find that SVD entropy relates to other structural measures of complexity (nestedness, connectance, and spectral radius), but does not inform about the resilience of a network when using  ...  , we do find that different types of interaction networks have differing SVD entropies.  ... 
doi:10.3389/fevo.2021.623141 fatcat:y2ycbsxh5fc5bmff7iubdsvy2y

Entropy of English text: Experiments with humans and a machine learning system based on rough sets

Hamid Moradi, Jerzy W. Grzymala-Busse, James A. Roberts
1998 Information Sciences  
The goal of this paper is to show the dependency of measured entropy of English text on subject of the experiment, the type of English text, and the methodology used to estimate the entropy.  ...  In this experiment, a different type of text was used.  ...  It would only make sense that the type of text being used would make a difference in the entropy calculation.  ... 
doi:10.1016/s0020-0255(97)00074-1 fatcat:4oyrn2waxzd75iungkmuuyvsxq

Text Classification Using Word-Based PPM Models

Victoria Bobicev
2006 Computer Science Journal of Moldova  
Although in some cases the entropy difference which influenced the choice was rather small (several hundredths), most of the documents (up to 97 %) were classified correctly.  ...  In this paper the application of word-based PPM (Prediction by Partial Matching) model for automatic content-based text classification is described.  ...  It is natural that for different types of categorization different methods are pertinent.  ... 
doaj:b17949eaf9154f2c9e831f5ba88143ed fatcat:nwlz6onsbrfezmlj6p5i6tybue

SSE Lossless Compression Method for the Text of the Insignificance of the Lines Order [article]

Juncai Xu, Weidong Zhang, Qingwen Ren, Xin Xie, Zhengyu Yang
2017 arXiv   pre-print
There is a special type of text which the order of the rows makes no difference (e.g., a word list).  ...  To compress these special texts, the traditional lossless compression method is not the ideal choice. A new method that can achieve better compression results for this type of texts is proposed.  ...  The symbol should be different from any symbol in the text. For example, in a word list, a space can be used as an Empty Symbol.  ... 
arXiv:1709.04035v2 fatcat:y6zwicsy4jfvxnqviy5fkhs6ly

The word entropy of natural languages [article]

Christian Bentz, Dimitrios Alikaniotis
2016 arXiv   pre-print
We here use parallel texts of 21 languages to establish the number of tokens at which word entropies converge to stable values.  ...  These convergence points are then used to select texts from a massively parallel corpus, and to estimate word entropies across more than 1000 languages.  ...  Block entropies In a text, each word type $w_i$ has a token frequency $f_i = \mathrm{freq}(w_i)$.  ... 
arXiv:1606.06996v1 fatcat:beyuikfxkfcl3je263hbtngkyi

Pictish symbols revealed as a written language through application of Shannon entropy

R. Lee, P. Jonathan, P. Ziman
2010 Proceedings of the Royal Society A  
This paper reports on a two-parameter decision-tree technique that distinguishes between the different character sets of human communication systems when sample sizes are small, thus enabling the type  ...  $S_d$: the number of different character pairs that appear only once in a text.  ...  Tait, Clive McDonald, Richard Price and John Love for critical discussions and reading of the manuscript; Nigel Tait for technical help with the coding of the macros; and the referees for their help in  ... 
doi:10.1098/rspa.2010.0041 fatcat:jgqiv6mbxndghhg3jorddczkja

Revisiting the Predictability of Language: Response Completion in Social Media

Bo Pang, Sujith Ravi
2012 Conference on Empirical Methods in Natural Language Processing  
While prior work has focused on formal English typically used in news articles, we turn to texts generated by users in online settings that are more informal in nature.  ...  We also perform an information-theoretic study in this setting and examine the entropy of user-generated content, especially in conversational scenarios, to better understand predictability of user generated  ...  Previous work in predictive text input had very different focus from our study.  ... 
dblp:conf/emnlp/PangR12 fatcat:f5levac6prc7xi2xift5lphqvq

Morphosyntactic predictability of translationese

Dmitry Nikolaev, Taelin Karidi, Neta Kenneth, Veronika Mitnik, Lilja Saeboe, Omri Abend
2020 Linguistics Vanguard  
We test these competing observations by measuring morphosyntactic entropies of original and translated texts in several languages and show that there may exist a categorical distinction between translations  ...  It is often assumed that translated texts are easier to process than original ones.  ...  A measure of morphosyntactic distance between languages We were interested in how differences in morphosyntactic entropy between original texts in different languages and texts in the same languages translated  ... 
doi:10.1515/lingvan-2019-0077 fatcat:uzkmtgaiqvdpxccemaqymwoxie

Music viewed by its entropy content: A novel window for comparative analysis

Gerardo Febres, Klaus Jaffe, Konradin Metze
2017 PLoS ONE  
type, style, composer and genre.  ...  These classification techniques promise to be useful in other disciplines for pattern recognition and machine learning.  ...  Acknowledgments We want to thank those musicians and enthusiasts who produced several web sites where MIDI sequences of several kinds of music are available, accompanied with additional information presented in  ... 
doi:10.1371/journal.pone.0185757 pmid:29040288 pmcid:PMC5645004 fatcat:oxbriofnv5hefd2e3vw2uyzane