One-pass-throw-away learning for cybersecurity in streaming non-stationary environments by dynamic stratum network

Mongkhon Thakong, Suphakant Phimoltares, Saichon Jaiyen, Chidchanok Lursinsap, Hua Wang
2018 PLoS ONE  
In recent years, cybersecurity problems have occurred in various business applications. Although previous researchers have proposed methods to cope with cybersecurity issues, those methods repeat the training process several times to classify datasets of these problems in streaming non-stationary environments. In dynamic environments, such conventional methods may degrade the adaptive solution needed to prevent these issues. This research proposes a […] throw-away learning using the dynamical structure of the network to solve these problems in dynamic environments. Furthermore, to speed up computation and to maintain minimum space complexity for streaming data, new concepts of learning in the form of recursive functions were introduced. Information gain-based feature selection was also applied to reduce the learning time during the training process. The experimental results showed that the proposed algorithm outperformed the other incremental-like and online ensemble learning algorithms in terms of classification accuracy, space complexity, and computational time.
OPEN ACCESS
Citation: Thakong M, Phimoltares S, Jaiyen S, Lursinsap C (2018) One-pass-throw-away learning for cybersecurity in streaming non-stationary environments by dynamic stratum network. PLoS ONE 13(9): e0202937. https://doi.org/10. […]
[…] intrusion detection, phishing detection, ransomware detection, and malware classification [8–13]. As the number of attacks and their severity are expected to increase continuously over the next several years, the main concern of these cyber applications is how the system can be protected from such attacks. Learning from information acquired in the cybersecurity domain has been applied as a data-driven solution, owing to the large amount of raw data available and the cyber-attacks occurring throughout the world [14]. Human expertise alone struggles to cope with attacks that occur in such a variety of scenarios. Learning processes have been developed by merging knowledge learned from previously seen data with an analysis of human expertise to provide a scalable solution. Adaptive learning solutions have been widely designed for several security applications. For example, supervised classification techniques are used for spam filtering [10, 15, 16].
Alternatively, graph-based learning can be applied to find relationships between reviews and their corresponding authors [17]. In addition, various machine learning techniques, e.g., decision tree algorithms [18], ensemble and hybrid classifiers [19], support vector machines [20], Bayesian networks [21], and genetic algorithms [22], have been applied to enhance intrusion detection systems. Furthermore, many statistical detection techniques [21] were used for anomaly detection in network traffic. Hybrid approaches combining supervised and unsupervised machine learning techniques were also used for the detection of network attacks [23]. Other techniques introduced a self-structuring neural network [24] or associative classification [25] for predicting phishing websites. In a non-stationary environment, changes in the data classes can cause the data distribution to shift over time. The types of change in non-stationary environments include gradual changes, recurrent concepts, and sudden drift [26, 27]. Learning in non-stationary environments is summarized in [1]. Recently, learning methods have mainly aimed to solve clustering and classification problems. For instance, the streaming ensemble algorithm (SEA) [28] is the first ensemble of classifiers to learn in non-stationary environments from consecutive time windows of the training set. The concept-adapting very fast decision tree (CVFDT) [29] is one of the best-known streaming data mining methods for coping with concept drift; it uses a fixed-size window of instances. Ensemble learning under non-stationary environments was proposed using a weighted majority vote (WMV) [30], based on a loss function, to analyze the probability of multiple expert systems.
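To make the idea of a weighted majority vote concrete, the following is a minimal sketch of the classic scheme in which each expert's weight is multiplied by a penalty factor whenever it errs. It is an illustration only and not the loss-based variant cited as [30]; the class name `WeightedMajority`, the penalty `beta`, and the toy experts are assumptions introduced here.

```python
from typing import Callable, Hashable, List, Sequence

# An "expert" is any function mapping a feature vector to a class label.
Expert = Callable[[Sequence[float]], Hashable]


class WeightedMajority:
    """Illustrative weighted majority vote over a fixed pool of experts."""

    def __init__(self, experts: Sequence[Expert], beta: float = 0.5) -> None:
        if not 0.0 < beta < 1.0:
            raise ValueError("beta must lie strictly between 0 and 1")
        self.experts: List[Expert] = list(experts)
        self.beta = beta  # assumed multiplicative penalty for a wrong vote
        self.weights: List[float] = [1.0] * len(self.experts)

    def predict(self, x: Sequence[float]) -> Hashable:
        # Sum the weights of the experts voting for each label.
        votes = {}
        for expert, weight in zip(self.experts, self.weights):
            label = expert(x)
            votes[label] = votes.get(label, 0.0) + weight
        # Return the label with the largest total weight.
        return max(votes, key=votes.get)

    def update(self, x: Sequence[float], y: Hashable) -> None:
        # Shrink the weight of every expert that misclassified (x, y).
        for i, expert in enumerate(self.experts):
            if expert(x) != y:
                self.weights[i] *= self.beta


# Toy stream: label 1 = attack, 0 = benign (hypothetical data).
experts = [
    lambda x: 1,                  # always predicts "attack"
    lambda x: 0,                  # always predicts "benign"
    lambda x: int(x[0] > 0.5),    # thresholds the first feature
]
wm = WeightedMajority(experts, beta=0.5)
for x, y in [([0.9], 1), ([0.1], 0), ([0.8], 1), ([0.2], 0)]:
    wm.update(x, y)  # each instance is seen once and then discarded
# wm.weights is now [0.25, 0.25, 1.0]
```

Each instance is used for a single weight update and is not stored, so the memory footprint depends only on the number of experts, not on the length of the stream.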
The dynamic weighted majority [31] was proposed to learn in online settings by adding or removing classifiers to track concept drift. The two arrangements for processing examples described in [32], chunk-based and online ensembles, were intended for applications with strict time and memory constraints. The accuracy updated ensemble (AUE) [26], based on the chunk-based learning mode, was proposed; in this ensemble, all component classifiers are incrementally updated with a chunk of instances. An online learning ensemble, namely the online accuracy updated ensemble (OAUE) [33], was introduced to improve AUE in terms of classification and training time. The anticipative dynamic adaptation to concept changes (ADACC) [34] ensemble was proposed to optimize control over the online classifiers by recognizing concepts in incoming instances. Adaptive random forest (ARF) [35], introduced by Gomes et al., was used in the classification of evolving data streams. Learning under changing environments aims to preserve all acquired knowledge. This is accomplished by aggregating new knowledge while retaining existing knowledge, a trade-off known as the stability-plasticity dilemma [27]. Although several learning models have been proposed to deal with cybersecurity problems, other issues remain. The process of several iterations by using ensemble and hybrid classifiers, and particularly storing training data in a sliding window, has been […]
doi:10.1371/journal.pone.0202937 pmid:30188908 pmcid:PMC6126810 fatcat:ygy4osz5czesxljxd6zkksdtqi