ASSIGNMENT 1 CSC3502 Principles of Big Data Management Student name and number: Marks Selection of publicly available data set(s): Provided the file names of at least 6 selected data sets from this...

Please find the attached document.




ASSIGNMENT 1
CSC3502 Principles of Big Data Management
Student name and number:

Marks

Selection of publicly available data set(s) (/3)
Provided the file names of at least 6 selected data sets from this web site: https://data.qld.gov.au/dataset/visitor-statistics-publications-qld-gov-au.

Understanding the data itself (/37)
The report conveys critical analysis of the data contained in the Queensland Government Open Data Portal, which contains public data sets (visitor statistics) ranging from 2013 to 2019. It shows an understanding of the data contained in these data sets, and provides additional consideration for potential grouping or categorising of this data. The critical analysis highlights particular patterns identified by considering the number of times a web site has been accessed at particular times within the years. The explanation of using different file types (CSV, XML, PDF, JSON) compared to the current one used for these data sets shows a clear understanding of importing data of the different file types into a database.

Propose a new design (/35)
Data modelling conveys the details of the new design, including use of existing and proposed new data for the data sets, that would allow a software developer to redesign the way this data is captured. The tables created in the database are part of communicating this new design. The report takes the reader through the design of the data set, explaining the corresponding data modelling and new database tables. It provides an insight into the new design and how this may impact and improve querying the database compared to the current data set. This is further enhanced by the description of the relationship diagram and corresponding screen captures.

Privacy law and ethics (/5)
The discussion of the applicable privacy law shows an understanding of the privacy law and how this has been applied by the Queensland Government to their current data sets.

Writing style and report format (/20)
· Succinct throughout the report, demonstrating good writing skills (5 marks)
· Correctly applied report format according to the link provided to students in the assignment specification (5 marks)
· Used EndNote for in-text citation and referencing throughout the report (5 marks)
· Adhered to the limitations set in the assignment specification (5 marks)

Total /100

Assignment 1 - Assignment specification
For your assignment, you are required to critically analyse and interpret the knowledge from large datasets and propose a new design, use data modelling, create database tables and a relationship diagram, provide a report on your analysis and design, and discuss privacy law, writing to specialist and non-specialist audiences. For details see further below.

The data sets to use for this assignment
The Queensland Government Open Data Portal contains public data sets (visitor statistics) ranging from 2013 to 2020.

Critically analyse the data sets
You are tasked to critically analyse the data in the visitor statistics data sets (years 2013 to 2019). Communicate your understanding of the data in the data sets and provide additional consideration around grouping or categorising data. For example, is there a pattern between the highest number of people accessing a particular web site (URL in the data set) at a given time (month and year) that could potentially be linked with something else, such as flu season? Can the data therefore be grouped into health, sport, etc.? Explain your consideration of different file formats such as CSV, XML, PDF, and JSON data files compared to the current file types. In writing, provide evidence that you understand the impact of different file types when importing into a database or eyeballing the data.

Propose a new design
Your aim is to propose a better way to capture all of this data to improve the public data sets and usability. You can include suggestions of additional data to be captured (without compromising privacy). You are focusing on what data should be captured so that this can be implemented by software developers. For your proposed changes to improve the data sets, you need to apply data modelling, create tables in a database (a database that we use in this course), and use the database feature to create the relationships between the tables. In writing, provide the audience with your insight into this design and how this new design is going to improve working with the data sets to find patterns. Provide evidence of data modelling, all tables, and the relationship diagram through screen captures, and discuss the screen captures in your writing. Considering that people would want to use the data sets that are already there, provide suggestions on how these current data sets can potentially be converted to fit into your new proposed model.

Privacy law
For your report, review the web site that makes these data sets available, and in your report discuss what kind of information about privacy is available and whether the data sets have been anonymized. Describe your process of ensuring that the data does not contain any sensitive information that could potentially identify an individual person or company. Explain what you would do if you did find sensitive information, considering the privacy law.

The report
· The "Introduction" sub chapter "Links you will need" in this book provides you with links on how to write a report and how to use EndNote.
· All assignments in this course must use EndNote for your in-text citation and reference list.
· You need to submit the Word document (not PDF), as this provides evidence of you having used EndNote and the database.
· You do not have to provide a letter of transmittal, but must cater to all aspects of a report and use headings in your body that clearly identify the requirements.
· You must apply the correct Harvard referencing and in-text citation.
· The maximum number of words is 2,000. The word limit applies to the body of the report.

What to submit
· Use your student surname and student number as the file name for the report and the database.
· On the study desk, in the corresponding submission link, submit two files: the Word document and the database.
Answered Same Day, Jul 29, 2021, CSC3502


Kshitij answered on Aug 07 2021
Principles of Big Data Management
Assignment 1
Student name
Student ID
Introduction
This report presents a detailed analysis of datasets published by the Queensland Government. These are public datasets containing information about the general public. Six datasets were selected and analysed to obtain the information needed to meet the objectives of the report.
A dataset is a collection of data about a particular location, audience, or sector, and it may include sensitive or personal information. This report focuses on tabular datasets: each selected dataset consists of one or more tables made up of rows and columns, where each column, commonly called an attribute, holds one variable. The report has been prepared to understand this information, to consider the need for a new design and for changes to the datasets, and to understand the privacy laws and ethics that apply to them.
Understanding of the dataset
As instructed, six datasets were selected and analysed in detail to extract the relevant information (Lundberg, et al., 2019). The selected datasets range from 2013 to 2020.
This paragraph covers the dataset for August 2013. The dataset has two columns: one descriptive attribute and one numeric attribute giving the corresponding audience count. Each row records one such count.
The dataset for July 2014 has two attributes, page and page views: the page that was accessed and the number of views recorded for it. It is a Queensland dataset, with a view count for each page in the collection.
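The two-attribute layout described above (page and page views) can be read and aggregated with a few lines of Python. The following is a minimal sketch, assuming a CSV export whose header names are Page and Page views; the header names, sample URLs, and counts are made up for illustration and are not taken from the actual data sets.

```python
import csv
import io

# Hypothetical sample in the two-column layout (page, page views).
# These values are invented for illustration only.
SAMPLE_CSV = """Page,Page views
/health/flu,1200
/sport/events,450
/health/flu,300
"""


def total_views_by_page(csv_text):
    """Sum the 'Page views' column for each distinct 'Page' value."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        page = row["Page"].strip()
        # Tolerate thousands separators such as "1,200" in quoted fields.
        views = int(row["Page views"].replace(",", ""))
        totals[page] = totals.get(page, 0) + views
    return totals


totals = total_views_by_page(SAMPLE_CSV)

# List pages from most viewed to least viewed.
for page, views in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(page, views)
```

Reading the header row by name, rather than by column position, keeps the script working if a later file adds or reorders columns.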
The dataset for June 2015 covers publications released in Queensland, according to government records. It has two attributes: the publication and the number of views it received. This gives an indication of the audience's preferences.
The dataset for October 2016 covers the publications and information recorded in that month. Its attributes are the Queensland Government publication page and the number of views of that publication (Greytak, et al., 2018).
The dataset for September 2017 has the attributes page and views. It also contains the publications and the count of the...
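Because the monthly files share the same page and views attributes, they could be combined into a single table with the year and month added as columns, so that one query can follow a page over time. The sketch below assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the course database, which the source does not name; the table name page_views, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows in the form (year, month, page, views).
# The values are invented for illustration only.
ROWS = [
    (2014, 7, "/health/flu", 1200),
    (2014, 7, "/sport/events", 450),
    (2017, 9, "/health/flu", 900),
]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE page_views (
           year  INTEGER NOT NULL,
           month INTEGER NOT NULL,
           page  TEXT    NOT NULL,
           views INTEGER NOT NULL,
           PRIMARY KEY (year, month, page)
       )"""
)
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?, ?)", ROWS)


def views_per_period(connection, page):
    """Return (year, month, views) for one page, oldest period first."""
    cursor = connection.execute(
        "SELECT year, month, views FROM page_views "
        "WHERE page = ? ORDER BY year, month",
        (page,),
    )
    return cursor.fetchall()


flu_history = views_per_period(conn, "/health/flu")
print(flu_history)
```

The composite primary key on year, month, and page means a page can appear only once per monthly file, which guards against loading the same file twice.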