(1) Understanding Dataset: UNSW-NB15. The raw network packets of the UNSW-NB15 dataset were created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security...






(1) Understanding Dataset: UNSW-NB15

The raw network packets of the UNSW-NB15 dataset were created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours. The Tcpdump tool was used to capture 100 GB of raw traffic (e.g., Pcap files). The dataset has nine types of attacks: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used, and twelve algorithms were developed, to generate a total of 49 features with the class label.

a) The features are described here.
b) The number of attacks and their sub-categories is described here.
c) In this coursework, we use a total of 10 million records stored in a CSV file (download). The total size is about 600 MB, which is big enough to justify big data methodologies for analytics. As a big data specialist, you should first read and understand the features, then apply modeling techniques. To see a few records of this dataset, you can import it into Hadoop HDFS, then run a Hive query that prints the first 5-10 records.

(2) Big Data Query & Analysis by Apache Hive [30 marks]

This task uses Apache Hive to convert big raw data into useful information for end users. First, understand the dataset carefully. Then write at least 4 Hive queries (refer to the marking scheme). Apply appropriate visualization tools to present your findings numerically and graphically. Briefly interpret your findings. Finally, include screenshots of your outcomes (e.g., tables and plots) together with the scripts/queries in the report.
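The preview step in part (1) and the kind of query asked for in part (2) might look roughly like the sketch below. It is written in Python only to assemble the HiveQL text, which could then be pasted into the Hive CLI or Beeline; it does not connect to Hive. The table name `unsw_nb15`, the HDFS directory `/user/hive/unsw_nb15`, and the five column names are assumptions for illustration and are not given in the coursework text. Check them against the feature description before use. Only the leading columns are declared here, so Hive would ignore the remaining fields of each row; grouping by the attack category would require declaring all 49 columns in file order.

```python
# Sketch only: builds HiveQL statements as strings; nothing is executed on a cluster.
# Table name, HDFS path and column names below are assumptions, not from the coursework.


def create_table_sql(table, columns, hdfs_dir):
    """Return a CREATE EXTERNAL TABLE statement over comma-separated text files."""
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'"
    )


def preview_sql(table, n=10):
    """Return a query that prints the first n records of the table."""
    return f"SELECT * FROM {table} LIMIT {int(n)}"


def count_by_sql(table, column):
    """Return a query that counts records per distinct value of one column."""
    return (
        f"SELECT `{column}`, COUNT(*) AS n FROM {table} "
        f"GROUP BY `{column}` ORDER BY n DESC"
    )


# Hypothetical leading columns, all read as STRING to avoid parse failures.
LEADING_COLUMNS = [
    ("srcip", "STRING"),
    ("sport", "STRING"),
    ("dstip", "STRING"),
    ("dsport", "STRING"),
    ("proto", "STRING"),
]

if __name__ == "__main__":
    print(create_table_sql("unsw_nb15", LEADING_COLUMNS, "/user/hive/unsw_nb15") + ";")
    print(preview_sql("unsw_nb15", 10) + ";")
    print(count_by_sql("unsw_nb15", "proto") + ";")
```

The external table leaves the CSV file where it was uploaded in HDFS, so dropping the table does not delete the data. The result of a grouped count like the last query is small enough to export and plot as a bar chart for the report.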
Answered 157 days after Jun 05, 2022

Answer To: (1) Understanding Dataset: UNSW-NB15. The raw network packets of the UNSW-NB15 dataset were created...

Banasree answered on Nov 10 2022
Ans.
List of attacks in UNSW-NB15
    Category    Training set    Testing...