Identifying the Significant Features in Illegal Texts
Выявление значимых признаков противоправных текстов

Nina Avanesyan, Higher School of Economics, Fedor Solovev, Elizaveta Tikhomirova, Andrey Chepovskiy, Federal Research Center "Informatics and Management", Bauman Moscow State Technical University, Peoples' Friendship University of Russia, Federal State Budget Educational Institution of Higher Education "MIREA-Russian Technological University", Higher School of Economics
2020 Voprosy kiberbezopasnosti  
Purpose of the study: to develop a technique for determining lexical characteristics and psycholinguistic factors that serve as discriminative features for identifying the topics of illegal texts by frequency methods, for information security purposes. Method: automatic morphological and syntactic analysis, frequency methods, and comparison of automatically generated dictionaries by correlation analysis. Results: a technique for frequency analysis of the vocabulary of illegal texts has been developed, which makes it possible to compare different sets of texts using frequency dictionaries and to identify discriminative features; a technique for calculating the pairwise rank correlation coefficient to compare frequency dictionaries of various lexical characteristics has been presented; a comparative analysis of different collections of illegal texts has been carried out; it has been shown that frequency lexical characteristics can be used to study the properties of texts in order to detect illegal resources and messages, that morphological characteristics of words and word combinations, as well as letter combinations, can be used as discriminative features, and that psycholinguistic indicators of illegal texts can be calculated from automatic linguistic text analysis; the psycholinguistic characteristics of texts on various topics have been identified.
doi:10.21681/2311-3456-2020-04-76-84 fatcat:ngyac25kwbatllvhfpg757xllu
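
The abstract describes comparing frequency dictionaries of text collections through a pairwise rank correlation coefficient, without naming the coefficient or the tokenization used. The following is a minimal sketch of that idea, not the authors' implementation. It assumes Spearman's coefficient, a plain regular-expression tokenizer in place of morphological analysis, and hypothetical function names (`frequency_dictionary`, `average_ranks`, `spearman`).

```python
import re
from collections import Counter


def frequency_dictionary(texts):
    """Count lowercase word tokens across a collection of texts.

    Assumption: a simple regex tokenizer stands in for the automatic
    morphological analysis mentioned in the abstract.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def average_ranks(values):
    """Rank values in descending order (1 = largest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(freq_a, freq_b):
    """Spearman rank correlation of two frequency dictionaries.

    Words missing from one dictionary are treated as having frequency 0,
    so both rankings cover the union of the two vocabularies.
    """
    vocab = sorted(set(freq_a) | set(freq_b))
    ranks_a = average_ranks([freq_a.get(w, 0) for w in vocab])
    ranks_b = average_ranks([freq_b.get(w, 0) for w in vocab])

    n = len(vocab)
    mean_a = sum(ranks_a) / n
    mean_b = sum(ranks_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ranks_a, ranks_b))
    var_a = sum((a - mean_a) ** 2 for a in ranks_a)
    var_b = sum((b - mean_b) ** 2 for b in ranks_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("rank correlation is undefined for constant ranks")
    return cov / (var_a * var_b) ** 0.5
```

Computing `spearman` for every pair of collections gives a matrix of coefficients. Under these assumptions, low values point to collections whose vocabularies are ranked differently, which is where candidate discriminative features would be sought.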