Social Media as an Innovation - The Case of Twitter

Andrew B. Whinston, Huaxia Rui
2010 Social Science Research Network  
Criminals use online social networks for various activities, including communication, planning, and execution of criminal acts. They often employ ciphered posts using slang expressions that are restricted to specific groups. Although the literature shows advances in the analysis of natural-language posts, such as hate speech, threats, and, most notably, sentiment analysis, research enabling intention analysis of posts that use slang expressions is still underexplored. We propose a framework and construct software prototypes for selecting social network posts with criminal slang expressions and automatically classifying these posts according to illocutionary classes. The developed framework explores computational ontologies and machine learning (ML) techniques. Our Ontology of Criminal Expressions represents crime concepts in a formal and flexible model and associates them with criminal slang expressions. This ontology is used to select suspicious posts and to decipher them. In our solution, the criminal intention in written posts is automatically classified using models learned from existing posts. This work carries out a case study evaluating the framework with 8,835,290 tweets. The results show its viability, demonstrating the benefits of deciphering posts and the effectiveness of ML-based detection of users' intentions in written criminal posts.
Information 2020, 11, 154
... of social networks [4], such as the analysis and detection of intentions related to crimes [5]. It is desirable, for example, to distinguish between someone inducing a person to commit a crime and someone commenting on a crime reported by the media. Automated analysis of users' intentions in social network posts can therefore help investigators understand the goals of the posts. Automated analysis of users' intentions described in natural-language posts remains an open challenge. According to Wu et al. [6], the language used in social networks is short and informal, which poses a computational challenge for automated mechanisms. Slang, such as that used by criminals, adds further complexity to the design of effective computational algorithms. In this investigation, we refer to the dialect of (cyber)criminals as Criminal Slang Expression (CSE). The identification and classification of crimes mediated by the Web are hard tasks.
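The selection and deciphering step described above can be illustrated with a minimal sketch. It assumes a toy lexicon (the hypothetical `SLANG_TO_CONCEPT` dictionary) standing in for the ontology: each slang expression maps to a formal crime concept, posts containing no known expression are discarded, and matched expressions are rewritten as their concepts. The entries and the function name `select_and_decipher` are illustrative only and are not taken from the paper's ontology or prototype.

```python
import re

# Hypothetical toy lexicon standing in for the ontology: slang expression -> crime concept.
# These entries are illustrative assumptions, not OntoCexp content.
SLANG_TO_CONCEPT = {
    "white powder": "Cocaine",
    "piece": "Firearm",
    "snow": "Cocaine",
}

# Precompile whole-word, case-insensitive patterns; longer expressions first
# so that multi-word phrases are matched before their shorter substrings.
PATTERNS = [
    (expr, re.compile(r"\b" + re.escape(expr) + r"\b", re.IGNORECASE))
    for expr in sorted(SLANG_TO_CONCEPT, key=len, reverse=True)
]


def select_and_decipher(posts):
    """Keep only posts containing a known slang expression and return
    (original post, deciphered post, list of matched concepts) for each."""
    results = []
    for post in posts:
        deciphered = post
        concepts = []
        for expr, pattern in PATTERNS:
            if pattern.search(deciphered):
                concept = SLANG_TO_CONCEPT[expr]
                concepts.append(concept)
                # Replace the slang expression with its formal concept label.
                deciphered = pattern.sub(f"[{concept}]", deciphered)
        if concepts:
            results.append((post, deciphered, concepts))
    return results
```

For example, `select_and_decipher(["Got some snow for tonight", "Nice weather today"])` keeps only the first post and rewrites it as `"Got some [Cocaine] for tonight"`. A purely lexical match like this would also flag a post about actual snow, which is one reason a lookup step alone is not sufficient for intention detection.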
There is a huge amount of unstructured Web content, which makes manual analysis arduous [7]. The complexity is even higher when people use CSE, which is not formally defined and is used to prevent outsiders from understanding. This produces a "secret" language, which can be used in conjunction with data encryption (out of the scope of this paper) [8]. Apart from complex cryptography techniques [9,10], which are not always viable, criminals use marginal or group-specific CSE to restrict communication to a specific group. In addition, the volume and velocity of crimes committed with the support of the Web (e.g., drug trafficking and terrorism) [11] make this problem harder to deal with. The literature presents lexical dictionaries aimed at representing CSE. For instance, Mota [12] presented an extensive work on CSE used in Rio de Janeiro, Brazil. These dictionaries are mostly built to support humans (criminal investigators) in interpreting CSE. Nevertheless, formal models that represent CSE in a computer-interpretable way are still needed. Such models should provide flexible and extensible constructions because CSEs are constantly evolving. According to Agarwal and Sureka [13], lexical dictionaries alone are not enough for building automatic intention detection mechanisms. Automated detection using an exclusively lexical approach has proved flawed [14], requiring the adoption of multiple techniques, such as Machine Learning (ML). Techniques and improvements in categorizing emotions and feelings (which address intentions indirectly) have been explored (cf. Section 3). However, the literature shows that studies evaluating and representing criminal intentions in natural-language texts are rare. In this context, there is a lack of studies exploring the linguistic foundations of intention analysis in written texts, such as Semiotics [15] and Speech Act Theory (SAT) [16,17].
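To make the ML complement to lexical matching concrete, the sketch below trains a multinomial Naive Bayes classifier with add-one (Laplace) smoothing that assigns deciphered posts to illocutionary classes. The class names (assertive, directive, commissive) are three of Searle's speech-act categories, chosen here as an assumption; the paper's intention classification framework and its actual learning algorithms may differ. The training posts in `TRAIN` and the class `NaiveBayesIntent` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, already-deciphered training posts labeled with illocutionary classes.
TRAIN = [
    ("go buy [Cocaine] at the square tonight", "directive"),
    ("bring the [Firearm] to me now", "directive"),
    ("i will deliver the [Cocaine] tomorrow", "commissive"),
    ("i promise to sell the [Firearm] next week", "commissive"),
    ("police seized [Cocaine] in the port yesterday", "assertive"),
    ("news says a [Firearm] was found downtown", "assertive"),
]


def tokenize(text):
    # Lowercase word tokens; "[Cocaine]" becomes "cocaine".
    return re.findall(r"\w+", text.lower())


class NaiveBayesIntent:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log P(class) + sum of log P(token | class), computed in log space.
            score = math.log(n_c / self.n_docs)
            for tok in tokenize(text):
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


model = NaiveBayesIntent().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("i will bring the cocaine tomorrow"))  # → commissive
```

Naive Bayes is used here only because it is short and transparent; any supervised text classifier trained on labeled posts would fill the same role.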
In this paper, we present a framework for selecting and classifying social network posts with CSE, combining Semantic Web technologies and ML algorithms. The Semantic Web provides computer-interpretable models [6], which are useful for representing semantic relations between domain concepts. Law and security concepts [3,18–20], for instance, can be better interpreted (by humans and computers) through formal representation using ontologies. We chose an ontology to describe aspects of the language used to commit criminal acts. Ontologies allow us to describe relationships and make inferences to determine weights related to suspicious messages, which cannot be represented by simple vocabularies or other less structured knowledge representation systems. In addition, ontologies are extensible and interoperable models on the Web, which can be (re)used by other Web systems. Our work proposes the Ontology-Based Framework for Criminal Intention Classification (OFCIC). The solution explores automatic classification models and algorithms applied to short textual messages to help detect digital criminal acts. We propose the Ontology of Criminal Expressions (OntoCexp) [21] to provide a formal and extensible model for representing CSE on social networks. OFCIC uses OntoCexp to select potentially crime-related posts and to automatically decipher them. ML techniques are used to automatically classify the posts according to an intention classification framework [22,23]. Our solution provides computational mechanisms and a software prototype to support investigators in selecting potential criminal posts and filtering them according to a predefined
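The idea of inferring weights from ontology relationships can be sketched as follows. The sketch assumes a hypothetical `subClassOf`-style hierarchy (`SUBCLASS_OF`) and hypothetical weights on top-level crime categories (`CRIME_WEIGHT`). Matched concepts are expanded to all their ancestors, a simple transitive inference, and the weights of the inferred categories are combined with a noisy-OR. The noisy-OR rule is an assumption made for illustration; OntoCexp's actual relations and weighting scheme are not described in this section.

```python
# Hypothetical concept hierarchy (child -> parent), in the spirit of rdfs:subClassOf.
# Assumed to be acyclic; not taken from OntoCexp.
SUBCLASS_OF = {
    "Cocaine": "IllicitDrug",
    "IllicitDrug": "DrugTrafficking",
    "Firearm": "Weapon",
    "Weapon": "ArmedCrime",
}

# Hypothetical suspicion weights attached to top-level crime categories.
CRIME_WEIGHT = {"DrugTrafficking": 0.6, "ArmedCrime": 0.8}


def ancestors(concept):
    """Transitively follow subclass links upward (a simple inference step)."""
    found = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        found.append(concept)
    return found


def suspicion_score(concepts):
    """Combine the weights of every crime category inferred from the matched
    concepts using a noisy-OR: 1 - prod(1 - w). Each category counts once."""
    categories = set()
    for c in concepts:
        for a in [c, *ancestors(c)]:
            if a in CRIME_WEIGHT:
                categories.add(a)
    prob_not_suspicious = 1.0
    for cat in categories:
        prob_not_suspicious *= 1.0 - CRIME_WEIGHT[cat]
    return 1.0 - prob_not_suspicious
```

With these assumed values, a post mapped to `["Cocaine"]` scores about 0.6, while one mapped to `["Cocaine", "Firearm"]` scores about 0.92, because cues from two independent crime categories reinforce each other. A flat vocabulary cannot express this, since it would not know that `Cocaine` falls under `DrugTrafficking`.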
doi:10.2139/ssrn.1564205 fatcat:72je6vi64fdolctmatffxgtlce