I need a well-constructed problem statement and a well-written thesis proposal based on the paper below.
Negative Selection Algorithm Based Intrusion Detection Model
978-1-7281-5200-4/20/$31.00 ©2020 IEEE
Salau-Ibrahim Taofeekat Tosin
Department of Computer Science, Al-Hikmah University, Ilorin, Nigeria
Jimoh Rasheed Gbenga
Faculty of Communication and Information Science, University of Ilorin, Ilorin, Nigeria
Abstract— The ever-growing security challenges posed by multifaceted network intrusions have hindered the success of Information Technology innovations. Hence, it is imperative to provide tools that can address these intrusions without compromising the integrity, confidentiality and availability of network resources. This paper presents a model for detecting intrusions in a network using the Negative Selection Algorithm. Negative Selection, which is inspired by the Human Immune System (HIS), has been used for anomaly detection because of its ability to discriminate between self and non-self. However, it suffers from a high false positive rate and from scalability issues. This paper addresses these issues by using feature selection to reduce the dimensionality of the dataset. The intrusion detection model is evaluated on the NSL-KDD dataset. The results obtained on this benchmark dataset showed that the scalability issue was reduced in the proposed approach.
Keywords— Intrusion Detection System, Artificial Immune System, Negative Selection Algorithm, Feature Selection.
I. INTRODUCTION
The rapid growth of technology and increasingly sophisticated cyber threats have made network intrusion detection an open-ended research field, as new intrusions are introduced every day. Consequently, it is vital to deploy better ways of detecting intrusions on today’s Information Systems (IS), which depend on networked environments to deliver their services. New Intrusion Detection Systems (IDS) must therefore be able to identify unknown threats and to cope with the large volumes of data produced by networked systems. IDS are hardware or software systems that automate the process of detecting intrusions in a computer or network. IDS is an old concept that has been in existence since the 1980s; it was first introduced to the research community in James Anderson’s influential paper [1].
Since then, research on intrusion detection systems has received substantial attention, because technological advances have increased the vulnerability of information system assets to various attacks. Over the years, however, deployed IDS have lacked the ability to detect previously unknown intrusions [2], [3]. This is a serious concern because the nature of intrusions keeps evolving, and it has prompted research into the defense mechanisms of the human body. These mechanisms have inspired the development of nature-inspired algorithms such as Artificial Immune Systems (AIS) and Artificial Neural Networks (ANN), which have gained popularity because of their ability to solve real-world problems efficiently [4]. AIS algorithms include the negative selection algorithm (NSALG), the clonal selection algorithm (CSALG), the artificial immune network algorithm (AINALG) and the dendritic cell algorithm (DCALG) [5]. Recently, many IDS models have been implemented using AIS algorithms [6]–[10].
In this paper, an NSALG-based anomaly IDS model is proposed that uses wrapper-based feature selection to tackle the scalability issue. The model has three modules: a data preprocessing module, an NSALG module and an alert generation module.
The rest of the paper is organized as follows. Section 2 reviews IDS, AIS and NSALG concepts. Section 3 explains the methodology. Section 4 presents the experiment conducted, the results obtained and a comparison between NSALG with and without feature selection (FS). Finally, Section 5 presents the conclusion and future directions of the research.
II. RELATED WORK
A. Intrusion Detection System
Intrusion detection is the process of detecting activities on a computer or on networked computers that attempt to compromise the confidentiality, integrity and availability of resources. In general, an intrusion detection system has three components: data collection, detection and response.
The data collection component comprises the target system, the event generator, the log data storage and the data collection configuration. The detection component comprises the analysis engine, the state information and the detection policy. Finally, the response component comprises the response unit and the response policy. The components of IDS are depicted in Fig. 1.
Fig. 1. Components of IDS [11]
This research work is partially funded by National Office for Technology Acquisition and Promotion (NOTAP)-Industrial Technology Transfer Fellowship.
The target system is usually the system under surveillance; it collects data such as network traffic, system logs and application logs. The event generator controls log information, and data transformation and cleaning also take place during event generation. The transformed data reside in the log data storage in preparation for analysis. The data collection configuration specifies how the collected data are handled. The analysis engine implements the detection algorithm. The detection policy contains information about representation, threshold values and affinity measures. The state information contains details about the current state of the system. The response unit receives information about the nature of the event that occurred, i.e. whether it is normal or intrusive, and, using the rules stored in the response policy database, triggers the appropriate countermeasure.
IDS can be categorized either by analysis (detection) approach or by placement (location). By detection method, IDS perform either misuse detection or anomaly detection: misuse detection uses a database of known intrusions, while anomaly detection uses a database of normal processes on the computer or network.
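The flow through these components, and the difference between the two detection methods, can be illustrated with a toy pipeline. Everything below is a hypothetical simplification: the class and function names, the string-valued events and the exact-match lookups are assumptions, whereas a real analysis engine would use the representation, thresholds and affinity measures set in its detection policy.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisEngine:
    """Detection component (hypothetical): applies a detection policy to stored events."""
    mode: str                                            # "misuse" or "anomaly"
    known_intrusions: set = field(default_factory=set)   # misuse: database of known intrusions
    normal_profile: set = field(default_factory=set)     # anomaly: database of normal activity

    def is_intrusive(self, event: str) -> bool:
        if self.mode == "misuse":
            # Misuse detection flags only events that match a known intrusion.
            return event in self.known_intrusions
        # Anomaly detection flags anything absent from the normal profile.
        return event not in self.normal_profile


def event_generator(raw_records):
    """Data collection: clean and transform raw records before they reach log storage."""
    return [r.strip().lower() for r in raw_records if r.strip()]


def response_unit(intrusive: bool, response_policy: dict) -> str:
    """Response component: look up the countermeasure in the response policy."""
    return response_policy["intrusive" if intrusive else "normal"]


log_storage = event_generator(["LOGIN ok", "  ", "port_scan", "new_exploit"])
policy = {"intrusive": "raise alert", "normal": "no action"}
misuse = AnalysisEngine("misuse", known_intrusions={"port_scan"})
anomaly = AnalysisEngine("anomaly", normal_profile={"login ok"})

for event in log_storage:
    print(event, "| misuse:", response_unit(misuse.is_intrusive(event), policy),
          "| anomaly:", response_unit(anomaly.is_intrusive(event), policy))
```

The unseen `new_exploit` event is missed by misuse detection but flagged by anomaly detection, which is the gap noted in the Introduction. Conversely, anomaly detection would also flag any legitimate activity missing from the normal profile, which is a source of false positives.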
By placement, IDS can be host-based (resident on the host computer) or network-based (resident on the network). For a detailed taxonomy of IDS, refer to [12], [13]. Several techniques have been used to design intrusion detection systems, such as statistical methods, data mining techniques, knowledge-based techniques, mobile-agent-based techniques and machine learning approaches [2], [3], [13]–[17]. As promising as these methods seem, challenges remain in differentiating malicious actions from normal ones, in selecting parameters and in modelling behavior with stochastic methods. Furthermore, the difficulty of obtaining high-quality training data, high resource consumption, the storage space needed for the large amounts of data required for analysis, processing time, high false positive rates and the inability to detect new intrusions are issues that require attention [2], [18], [19].
B. Negative Selection Algorithm
NSALG is based on the principles of T-cell maturation and on the ability of T-cells to distinguish between self and non-self cells in the human body. Maturation and discrimination take place in the thymus, where T-cells that bind to self-proteins (self-cells) are destroyed via apoptosis. Only the T-cells that do not bind to self-proteins, and can therefore recognize antigens, remain. These are the mature T-cells that eventually leave the thymus and circulate through the body to protect against foreign antigens [7], [20], [21]. In computer security, NSALG was first introduced by Forrest in 1994 [22], where it was applied to determining the effects of computer viruses. The algorithm has two phases: detector generation and detection. In the detector generation phase, randomly generated elements are compared with the self-elements of the system to be monitored. Self-elements are the normal elements of a system; for instance, the self-elements of a host system are its normal system call sequences.
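The two phases named above can be sketched in the binary-string setting of Forrest's original work, where a candidate is commonly said to match a string when the two agree in at least r contiguous positions. The representation, the r-contiguous rule, the parameter values and all function names below are assumptions for illustration; the model in this paper works on NSL-KDD features, and its representation and affinity measure are set by the detection policy.

```python
import random

# Assumption: elements are fixed-length bit strings compared with the r-contiguous
# matching rule used in Forrest-style negative selection.


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if equal-length bit strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, n_detectors, length, r, rng, max_tries=100_000):
    """Detector generation ("training"): discard random candidates that match any self-element."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)   # matches no self-element: a valid detector
    return detectors


def is_nonself(element, detectors, r):
    """Detection ("testing"): tag an element non-self if any valid detector matches it."""
    return any(r_contiguous_match(element, d, r) for d in detectors)


rng = random.Random(0)                            # fixed seed so the run is repeatable
self_set = {"00000000", "00000001", "00000011"}   # toy "normal" elements
detectors = generate_detectors(self_set, n_detectors=20, length=8, r=4, rng=rng)
print(len(detectors), is_nonself("11111111", detectors, r=4))
```

By construction no detector matches a self-element, so self-elements are never flagged. A non-self element is missed only when no detector happens to cover it, so covering more of the non-self space requires more detectors, which is where the scaling cost of NSALG comes from.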
Any randomly generated element that matches a self-element is removed, while those that do not match become valid detectors. This first phase can be likened to training the system. In the detection phase, which can likewise be likened to testing the system, a non-self element entering the system is recognized when any of the valid detectors matches it; the element is then tagged as non-self, i.e. an anomaly. NSALG is distinctive and desirable for anomaly intrusion detection because it requires no prior knowledge of known attacks and because it is a one-class classifier: only self-elements are used to train the system [23]. However, past studies have revealed the drawbacks of NSALG: poor scaling due to the large number of detectors required for detection, the curse of dimensionality and high false positive rates [5], [6], [23], [24]. The present study therefore seeks to tackle the scalability issue through dimensionality reduction.
C. Feature Selection
Feature Selection (FS) is a dimensionality reduction technique that reduces the number of attributes (dimensions) of a dataset in order to alleviate the curse of dimensionality. It usually involves removing irrelevant and redundant features to improve performance and to make the model easier to understand. According to Tan, Li, Lin and Lin (2015), it is a data preprocessing stage. Wrapper-based FS was adopted for this study because it retains the core semantics of the dataset, unlike feature extraction, which changes them [26]–[28]. Wrapper-based FS determines feature relevance by applying a learning algorithm to a selected feature subset; the subset is then judged by the accuracy obtained when it is wrapped around a classifier. In this paper, ANN was used to perform the wrapper-based FS. ANN is a machine-learning algorithm with high-speed processing power.
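Wrapper selection, as described above, scores candidate feature subsets by the accuracy of a learner trained on them. A minimal sketch of one common search strategy, greedy forward selection, follows. The search strategy, the toy data, the function names and the nearest-centroid learner are all assumptions made to keep the example dependency-free; the paper wraps an ANN and does not state its search strategy here.

```python
# Hypothetical sketch: greedy forward wrapper selection. The nearest-centroid
# learner is a dependency-free stand-in for the ANN the paper actually wraps.


def nearest_centroid_accuracy(train_rows, holdout_rows, features):
    """Train a nearest-centroid classifier on `features` only; return holdout accuracy."""
    centroids = {}
    for label in sorted({y for _, y in train_rows}):   # sorted for deterministic tie-breaking
        rows = [x for x, y in train_rows if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def predict(x):
        # Squared Euclidean distance to each class centroid over the selected features.
        return min(centroids, key=lambda c: sum((x[f] - m) ** 2
                                                for f, m in zip(features, centroids[c])))

    correct = sum(predict(x) == y for x, y in holdout_rows)
    return correct / len(holdout_rows)


def forward_wrapper_selection(train_rows, holdout_rows, n_features, evaluate):
    """Repeatedly add the feature whose inclusion gives the wrapped learner the best accuracy."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        # Score every subset "selected + one more feature" with the wrapped learner.
        score, feature = max((evaluate(train_rows, holdout_rows, selected + [f]), f)
                             for f in remaining)
        if score <= best_score:          # stop once no remaining feature improves accuracy
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data: feature 0 separates the classes, feature 1 is noise.
train_rows = [((0.0, 5.0), "normal"), ((0.1, 1.0), "normal"),
              ((1.0, 4.0), "attack"), ((0.9, 2.0), "attack")]
holdout_rows = [((0.05, 1.0), "normal"), ((0.95, 5.0), "attack")]

selected, score = forward_wrapper_selection(train_rows, holdout_rows, 2,
                                            nearest_centroid_accuracy)
print(selected, score)   # → [0] 1.0
```

Selecting features and reporting accuracy on the same holdout set is optimistic, so a separate test split would be needed for evaluation. Because the learner is passed in as `evaluate`, a different classifier can be wrapped without changing the search; in the paper's setting, that function would train and score an ANN.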
It is an information-processing model that borrows ideas from the nervous system of the human brain [29]. It makes models that handle enormous numbers of data instances accurate and easy to use. ANN has several application areas, such as optimization, intrusion detection and classification. The main reason it was chosen for wrapper FS, however, is its ability to learn trends and patterns in data [29], [30].
III. METHODOLOGY
To address the scalability issue of NSALG when used for network intrusion detection, this study introduces wrapper-based FS to reduce the number of features as well as the number of detectors required for detection. The proposed model has three modules, as shown in Fig. 2: the data preprocessing module, the NSALG module and the alert generation module. The details are explained as follows.
A. Phase I: Data Preprocessing Stage
Data encoding, data normalization and feature selection take place in this phase. Changing symbolic data to numeric values was the main activity