Amar answered on May 28 2020
Big Data and Its Infrastructures
ABSTRACT
Big Data is an emerging domain concerned with managing datasets whose size exceeds the ability of commonly used software tools to capture, manage, and analyse them in a timely manner. The data is most often unstructured and drawn from varied sources such as sensors, social media, surveillance, scientific applications, image and video archives, medical records, internet search indexing, system logs, and business transactions. In this context, the aims and objectives of this study were to determine the security-level requirements for developing a transmission network for data exchange between IoT and Big Data, and to determine the features and policies the network requires to ensure adequate safety against data-theft threats. The research approach involved developing a critical set of search criteria and using it to identify suitable materials and literature, which were then analysed using qualitative reasoning to reach outcomes for the research aims and objectives. The outcome of the study is presented in this report.
Contents
ABSTRACT
INTRODUCTION
MATERIALS & METHODS
LITERATURE REVIEW
RESULTS
DISCUSSION
CONCLUSION
REFERENCES
INTRODUCTION
Big Data is an emerging domain concerned with managing datasets whose size exceeds the ability of commonly used software tools to capture, manage, and analyse them in a timely manner. Various projections suggest that the quantity of data needing analysis will double every two years (Li et al., 2016; Kreutz et al., 2015; Botta et al., 2016). The data is most often unstructured and drawn from varied sources such as sensors, social media, surveillance, scientific applications, image and video archives, medical records, internet search indexing, system logs, and business transactions (Li et al., 2016; Kreutz et al., 2015; Botta et al., 2016). Big Data continues to attract attention as the number of devices connected to the Internet of Things (“IoT”) grows to unforeseen levels, producing large amounts of data that must be transformed into valuable information. It has also become common to buy additional computing power and storage on demand from public cloud providers in order to perform intensive data-parallel processing. With this approach, privacy and security issues can be amplified by the variety, volume, and wide-area deployment of the system infrastructure that supports Big Data applications (Li et al., 2016; Kreutz et al., 2015; Botta et al., 2016).
As Big Data expands into public clouds, conventional security solutions tailored to private computing infrastructures with a well-defined security perimeter, such as firewalls and demilitarized zones (“DMZ”), are no longer very effective (Li et al., 2016; Kreutz et al., 2015; Botta et al., 2016). With Big Data, security functions must work across a heterogeneous composition of diverse hardware, operating systems, and network domains. In such a puzzle-like computing environment, the abstraction capability of the Software Defined Network (“SDN”) is a significant characteristic that can enable efficient deployment of secure Big Data services on top of heterogeneous infrastructure. SDN introduces abstraction by separating the higher-level control plane from the underlying system infrastructure that is being controlled and supervised (Li et al., 2016; Kreutz et al., 2015; Botta et al., 2016). Separating the network's control logic from the underlying physical routers and switches that forward traffic allows system administrators to write high-level control programs that specify the behaviour of the whole network. In conventional networks, by contrast, administrators must codify functionality in low-level device configuration, and only where the device manufacturer allows it (Yang et al., 2015; Lake et al., 2014; Li et al., 2015).
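The separation of control logic from forwarding devices described above can be illustrated with a small sketch. It is a toy model under stated assumptions, not the API of any real SDN controller: the class names (`FlowRule`, `Switch`, `Controller`), the host names, and the default-deny behaviour are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowRule:
    """A low-level match/action entry as it might sit in a switch's flow table."""
    src: str
    dst: str
    action: str  # "forward" or "drop"


@dataclass
class Switch:
    """Data plane: only matches traffic against rules pushed by the controller."""
    name: str
    flow_table: list = field(default_factory=list)

    def handle(self, src: str, dst: str) -> str:
        for rule in self.flow_table:
            if rule.src == src and rule.dst == dst:
                return rule.action
        return "drop"  # assumed default-deny when no rule matches


class Controller:
    """Control plane: holds the network-wide policy and compiles it to flow rules."""

    def __init__(self, switches):
        self.switches = list(switches)

    def apply_policy(self, allowed_pairs):
        # One high-level statement ("these pairs may talk") becomes
        # identical low-level rules on every managed switch.
        rules = [FlowRule(src, dst, "forward") for src, dst in sorted(allowed_pairs)]
        for switch in self.switches:
            switch.flow_table = list(rules)


# Hypothetical topology: two edge switches managed by one controller.
switches = [Switch("edge-1"), Switch("edge-2")]
controller = Controller(switches)
controller.apply_policy({("iot-gateway", "hdfs-ingest")})
```

The administrator states the policy once at the controller; no switch is configured individually, which is the contrast with conventional device-by-device configuration drawn in the paragraph above.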
With SDN, intelligent management of security functions can be implemented in a logically centralized controller, which simplifies the implementation of security rules, system reconfiguration, and system evolution. The robustness drawback of centralized SDN solutions can be mitigated by using a hierarchy of controllers and/or redundant controllers, at least for the most important system functions being controlled (Yang et al., 2015; Lake et al., 2014; Li et al., 2015).
With the proliferation of devices connected to the Internet and to one another, the volume of data collected, stored, and processed increases daily, which brings new challenges for information security (Yang et al., 2015; Lake et al., 2014; Li et al., 2015). Moreover, currently used security mechanisms such as firewalls and DMZs cannot be relied on for Big Data infrastructure, because security mechanisms must extend beyond the perimeter of the organization's network to meet user and data mobility requirements and Bring Your Own Device (“BYOD”) policies (Yang et al., 2015; Lake et al., 2014; Li et al., 2015). Given these new scenarios, the pertinent question is which privacy and security technologies and policies are adequate to meet the current and most relevant Big Data privacy and security demands. These challenges can be organized into four aspects of Big Data: infrastructure security (for example, secure distributed computation using MapReduce), data privacy (for example, privacy-preserving data mining and granular access control), data management (for example, secure data provenance and storage), and integrity and reactive security (for example, real-time monitoring of anomalies and attacks) (Yang et al., 2015; Lake et al., 2014; Li et al., 2015).
In this context, the aims and objectives of this study are as follows:
· Determine the security-level requirements for developing a transmission network for data exchange between IoT and Big Data
· Determine the features and policies that the network requires to ensure adequate safety against data-theft threats
MATERIALS & METHODS
The research approach involves developing a critical set of search criteria and using it to identify suitable materials and literature. These are then subjected to data analysis using qualitative reasoning to determine suitable outcomes for the research aims and objectives.
The search criteria are presented as follows:
Attribute Name  | Data Type | Description
IoT             | Nominal   | Search Term
Big Data        | Nominal   | Search Term
Data Security   | Nominal   | Search Term
Privacy         | Nominal   | Search Term
Device Security | Nominal   | Search Term
2014            | Numeric   | Papers published on or after the year 2014
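The search criteria above can be read as a simple screening rule, sketched below. The report does not say whether the search terms were combined with AND or OR, so the sketch assumes a paper qualifies if it carries at least one of the terms as a keyword. The `Paper` record and the sample entries are hypothetical placeholders, not items from the reviewed literature.

```python
from dataclasses import dataclass

# Search terms and cut-off year taken from the criteria table.
SEARCH_TERMS = {"iot", "big data", "data security", "privacy", "device security"}
MIN_YEAR = 2014


@dataclass(frozen=True)
class Paper:
    """Hypothetical bibliographic record used only for this illustration."""
    title: str
    year: int
    keywords: tuple


def matches_criteria(paper: Paper) -> bool:
    """True when the paper is from 2014 or later and has at least one search term
    among its keywords (OR-combination is an assumption, see the lead-in)."""
    if paper.year < MIN_YEAR:
        return False
    return any(keyword.lower() in SEARCH_TERMS for keyword in paper.keywords)


# Placeholder records, not real publications.
candidates = [
    Paper("Example paper A", 2016, ("IoT", "Privacy")),
    Paper("Example paper B", 2012, ("Big Data",)),
    Paper("Example paper C", 2018, ("Compilers",)),
]
selected = [paper.title for paper in candidates if matches_criteria(paper)]
```

Keywords are compared as whole terms rather than substrings, so a keyword such as "Biotechnology" does not accidentally match "IoT".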
LITERATURE REVIEW
Singh & Reddy (2015), in the study titled "A survey on platforms for big data analytics", state that the primary purpose of their study is to provide an in-depth analysis of the platforms available for performing big data analytics. They survey the hardware platforms available for big data analytics and assess the advantages and drawbacks of each on the basis of metrics such as scalability, data I/O rate, fault tolerance, real-time processing, supported data size, and iterative task support. Beyond the hardware, they also describe in detail the software frameworks used on each platform, together with their relative strengths and drawbacks. The critical characteristics described can help readers make informed decisions about the best choice of platform for their computational requirements. Using a star-rating matrix, Singh & Reddy (2015) present a rigorous qualitative comparison of the platforms and discuss each of the six characteristics that are crucial for big data analytics algorithms. To give further insight into the effectiveness of each platform for big data analytics, implementation-level details of the widely used k-means clustering algorithm on the various platforms are also presented as pseudo-code.
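For readers unfamiliar with k-means clustering, the following is a minimal single-machine sketch of the standard Lloyd iteration. It is not the pseudo-code from Singh & Reddy (2015) and says nothing about how any particular platform distributes the work; the function names, the random initialisation, and the sample points are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Group each point with its nearest centroid (ties go to the lower index)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k random points
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # centroids stopped moving, so assignments are final
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical sample data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = k_means(points, k=2)
```

The assignment step is independent for each point, which is the part a data-parallel platform would typically distribute; the centroid update then needs the partial results brought back together.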
Win et al. (2018), in the study titled "Big data based security analytics for protecting virtualized infrastructures in cloud computing", state that virtualized infrastructure in cloud computing has become an attractive target for cyber attackers launching advanced and sophisticated attacks. Win et al. (2018) propose a new big-data-based security analytics approach for detecting advanced attacks on virtualized infrastructures. Network logs and user application logs are collected periodically from the guest virtual machines (“VM”) and stored in the Hadoop Distributed File System (“HDFS”). Attack features are then extracted through graph-based event correlation and MapReduce parser-based identification of potential attack paths. Further, determination of the presence of an attack shall be undertaken by way of...