inTIME: A Machine Learning-Based Framework for Gathering and Leveraging Web Data to Cyber-Threat Intelligence
In today's world, technology has become deep-rooted and more accessible than ever over a plethora of different devices and platforms, ranging from company servers and commodity PCs to mobile phones and wearables, interconnecting a wide range of stakeholders such as households, organizations and critical infrastructures. The sheer volume and variety of the different operating systems, the device particularities, the various usage domains and the accessibility-ready nature of the platforms
... a vast and complex threat landscape that is difficult to contain. Staying on top of these evolving cyber-threats has become an increasingly difficult task that presently relies heavily on collecting and utilising cyber-threat intelligence before an attack (or at least shortly after, to minimize the damage) and entails the collection, analysis, leveraging and sharing of huge volumes of data. In this work, we put forward inTIME, a machine learning-based integrated framework that provides an holistic view in the cyber-threat intelligence process and allows security analysts to easily identify, collect, analyse, extract, integrate, and share cyber-threat intelligence from a wide variety of online sources including clear/deep/dark web sites, forums and marketplaces, popular social networks, trusted structured sources (e.g., known security databases), or other datastore types (e.g., pastebins). inTIME is a zero-administration, open-source, integrated framework that enables security analysts and security stakeholders to (i) easily deploy a wide variety of data acquisition services (such as focused web crawlers, site scrapers, domain downloaders, social media monitors), (ii) automatically rank the collected content according to its potential to contain useful intelligence, (iii) identify and extract cyber-threat intelligence and security artifacts via automated natural language understanding processes, (iv) leverage the identified intelligence to actionable items by semi-automatic entity disambiguation, linkage and correlation, and (v) manage, share or collaborate on the stored intelligence via open standards and intuitive tools. To the best of our knowledge, this is the first solution in the literature to provide an end-to-end cyber-threat intelligence management platform that is able to support the complete threat lifecycle via an integrated, simple-to-use, yet extensible framework.