A Real-Time Intrusion Detection System based on Machine Learning and Big Data Techniques

Farah Jemili

Visit for more related articles at American Journal of Computer Science and Information Technology

Abstract

Cybersecurity Ventures expects the damage costs of cyber-attacks to rise to $11.5 billion in 2019 and predicts that a business will fall victim to a cyber-attack every 14 seconds. Note that the time frame for such an event is measured in seconds. With petabytes of data generated each day, detecting attacks at this pace is a challenging task for traditional intrusion detection systems (IDSs). Protecting sensitive information is a major concern for both businesses and governments, so a real-time, large-scale and effective IDS is needed. In this work we present a cloud-based, fault-tolerant, scalable and distributed IDS that uses Apache Spark Structured Streaming (through PySpark) and Spark's machine learning library (MLlib) to detect intrusions in real time. To demonstrate the efficacy and effectiveness of this system, we implement it on the Microsoft Azure cloud, which provides both processing power and storage capacity. A decision tree algorithm is used to classify incoming data. The MAWILab dataset serves as the data source, giving better insight into the system's capabilities against cyber-attacks. The experimental results show an accuracy of 99.95% and a throughput of more than 55,175 events per second on a small cluster.
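The two headline figures in the abstract, accuracy and events per second, are simple ratios. The minimal sketch below shows one way such figures could be computed over a labeled event stream. It is an illustration under stated assumptions, not the authors' evaluation code: the `StreamMetrics` class, the label strings, and the sample pairs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    """Running totals for a labeled event stream (hypothetical helper)."""
    correct: int = 0
    total: int = 0

    def update(self, predicted: str, actual: str) -> None:
        # Count every event, and count it as correct when the labels match.
        self.total += 1
        if predicted == actual:
            self.correct += 1

    def accuracy(self) -> float:
        # Fraction of events whose predicted label matched the true label.
        return self.correct / self.total if self.total else 0.0

    def throughput(self, elapsed_seconds: float) -> float:
        # Events processed per second over the measured interval.
        if elapsed_seconds <= 0:
            raise ValueError("elapsed_seconds must be positive")
        return self.total / elapsed_seconds


# Made-up (predicted, actual) pairs; these are not data from the paper.
sample = [
    ("benign", "benign"),
    ("anomalous", "anomalous"),
    ("benign", "anomalous"),
    ("benign", "benign"),
]

metrics = StreamMetrics()
for predicted, actual in sample:
    metrics.update(predicted, actual)

print(f"accuracy: {metrics.accuracy():.2%}")
print(f"events/s over 2 s: {metrics.throughput(2.0):.1f}")
```

In a streaming deployment the same two ratios would be taken over far larger event counts and over the wall-clock time of each processing interval.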