Please find the attached documents.
ASSIGNMENT 1
CSC3502 Principles of Big Data Management
Student name and number:

Marks

Selection of publicly available data set(s) /3
Provides the file names of at least 6 selected data sets from this web site: https://data.qld.gov.au/dataset/visitor-statistics-publications-qld-gov-au.

Understanding the data itself /37
The report conveys critical analysis of the data contained in the Queensland Government Open Data Portal, which contains public data sets (visitor statistics) ranging from 2013 to 2019. It shows an understanding of the data contained in these data sets, and provides additional consideration for potential grouping or categorising of this data. The critical analysis highlights particular patterns identified by considering the number of times a web site has been accessed at particular times within the years. The explanation of using different file types (CSV, XML, PDF, JSON) compared to the current one used for these data sets shows a clear understanding of importing data of the different file types into a database.

Propose a new design /35
Data modelling conveys the details of the new design for the data sets, including use of existing and proposed new data, that would allow a software developer to redesign the way this data is captured. The tables created in the database are part of communicating this new design. The report takes the reader through the design of the data set, explaining the corresponding data modelling and new database tables. It provides an insight into the new design and how this may impact and improve querying the database compared to the current data set. This is further enhanced by the description of the relationship diagram and corresponding screen captures.

Privacy law and ethics /5
The discussion of the privacy law shows an understanding of the law and how it has been applied by the Queensland Government to their current data sets.

Writing style and report format /20
· Succinct throughout the report, demonstrating good writing skills (5 marks)
· Correctly applied report format according to the link provided to students in the assignment specification (5 marks)
· Used EndNote for in-text citation and referencing throughout the report (5 marks)
· Adhered to the limitations set in the assignment specification (5 marks)

Total /100

Assignment 1 - Assignment specification

For your assignment, you are required to critically analyse and interpret the knowledge from large data sets, propose a new design using data modelling, database tables, and a relationship diagram, provide a report on your analysis and design, and discuss privacy law, writing to specialist and non-specialist audiences. For details see further below.

The data sets to use for this assignment
The Queensland Government Open Data Portal contains public data sets (visitor statistics) ranging from 2013 to 2020.

Critically analyse the data sets
You are tasked to critically analyse the data in the visitor statistics data sets (years 2013 to 2019). Communicate your understanding of the data in the data sets and provide additional consideration around grouping or categorising data. For example, is there a pattern between the highest number of people accessing a particular web site (URL in the data set) at a given time (month and year) that could potentially be linked with something else, such as flu season? If so, can the data be grouped into categories such as health, sport, etc.? Explain your consideration of different file formats such as CSV, XML, PDF, and JSON data files compared to the current file types. In writing, provide evidence that you understand the impact of different file types when importing into a database or eyeballing the data.

Propose a new design
Your aim is to propose a better way to capture all of this data to improve the public data sets and usability.
You can include suggestions of additional data to be captured (without compromising privacy). You are focusing on what data should be captured so that this can be implemented by software developers. For your proposed changes to improve the data sets, you need to apply data modelling, create tables in a database (a database that we use in this course), and use the database feature to create the relationships between the tables. In writing, provide the audience with your insight into this design and how this new design is going to improve working with the data sets to find patterns. Provide evidence of data modelling, all tables, and the relationship diagram through screen captures, and discuss the screen captures in your writing. Considering that people would want to use the data sets that are already there, provide suggestions on how these current data sets can potentially be converted to fit into your new proposed model.

Privacy law
For your report, review the web site that makes these data sets available and discuss what kind of information about privacy is available and whether the data sets have been anonymised. Describe your process of ensuring that the data does not contain any sensitive information that could potentially identify an individual person or company. Explain what you would do if you did find sensitive information, considering the privacy law.

The report
· The "Introduction" sub chapter "Links you will need" in this book provides you with links on how to write a report and how to use EndNote.
· All assignments in this course must use EndNote for your in-text citation and reference list.
· You need to submit the Word document (not PDF), as this provides evidence of you having used EndNote and the database.
· You do not have to provide a letter of transmittal, but you must cater to all aspects of a report and use headings in your body that clearly identify the requirements.
· You must apply the correct Harvard referencing and in-text citation.
· The maximum number of words is 2,000. The word limit applies to the body of the report.

What to submit
· Use your student surname and student number as the file name for the report and the database.
· On the study desk, in the corresponding submission link, submit two files: the Word document and the database.
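
To make the "Propose a new design" requirement more concrete, here is a minimal sketch of what a normalised design and a CSV import could look like. Everything in it is an assumption for illustration: the table and column names (category, website, monthly_visits), the sample rows, and the example URLs are made up and are not taken from the portal, and SQLite is used only as a stand-in because the specification does not name the course database.

```python
import csv
import io
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE website (
    website_id  INTEGER PRIMARY KEY,
    url         TEXT NOT NULL UNIQUE,
    category_id INTEGER NOT NULL REFERENCES category(category_id)
);
CREATE TABLE monthly_visits (
    website_id INTEGER NOT NULL REFERENCES website(website_id),
    year       INTEGER NOT NULL,
    month      INTEGER NOT NULL CHECK (month BETWEEN 1 AND 12),
    visits     INTEGER NOT NULL CHECK (visits >= 0),
    PRIMARY KEY (website_id, year, month)
);
"""

# Made-up rows standing in for an exported file; NOT real portal data.
SAMPLE_CSV = """url,category,year,month,visits
health.example.gov.au,health,2018,7,1200
health.example.gov.au,health,2018,1,400
sport.example.gov.au,sport,2018,7,300
"""


def build_database(csv_text):
    """Create the three tables in memory and load the flat CSV rows into them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the table relationships
    conn.executescript(SCHEMA)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Store each category and URL once, then reference them by key.
        conn.execute("INSERT OR IGNORE INTO category (name) VALUES (?)",
                     (row["category"],))
        category_id = conn.execute(
            "SELECT category_id FROM category WHERE name = ?",
            (row["category"],)).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO website (url, category_id) VALUES (?, ?)",
            (row["url"], category_id))
        website_id = conn.execute(
            "SELECT website_id FROM website WHERE url = ?",
            (row["url"],)).fetchone()[0]
        conn.execute(
            "INSERT INTO monthly_visits (website_id, year, month, visits) "
            "VALUES (?, ?, ?, ?)",
            (website_id, int(row["year"]), int(row["month"]), int(row["visits"])))
    conn.commit()
    return conn


def peak_month_per_category(conn):
    """Map each category name to (year, month, total visits) of its busiest month."""
    query = """
        SELECT c.name, v.year, v.month, SUM(v.visits) AS total
        FROM monthly_visits AS v
        JOIN website AS w ON w.website_id = v.website_id
        JOIN category AS c ON c.category_id = w.category_id
        GROUP BY c.name, v.year, v.month
        ORDER BY c.name, total DESC
    """
    peaks = {}
    for name, year, month, total in conn.execute(query):
        # Rows arrive busiest-first within each category; keep the first one.
        peaks.setdefault(name, (year, month, total))
    return peaks


if __name__ == "__main__":
    connection = build_database(SAMPLE_CSV)
    for name, peak in peak_month_per_category(connection).items():
        print(name, peak)
```

Separating categories and web sites from the monthly counts means a question such as "which month is busiest for health sites?" becomes a single join and group, rather than a manual scan across one flat file per period. The same loading loop suggests one way existing files could be converted into a new model, provided their columns can be mapped onto the proposed tables.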