Data Science 311 Lab 4 (10 points) Due at 10am on Oct. 31, 2022. Read all of the instructions. Late work will not be accepted. Overview: In this lab you will create a dataset for an intended...

Using the attached PDF, please follow the instructions in the document to complete the lab. Any dataset is allowed. The lab needs to be completed in Google Colab using Python 3. Please finish by the due date and complete all tasks.



Data Science 311
Lab 4 (10 points)
Due at 10am on Oct. 31, 2022

Read all of the instructions. Late work will not be accepted.

Overview

In this lab you will create a dataset for an intended analysis which could be conducted by future data analysts. The lab will be an exercise in data curation as well as time management and tolerance of ambiguity. You have complete freedom over the original data used to create your dataset as well as the description of the intended analysis. Students should find a data source that is workable with the methods taught in class, i.e., downloadable as a csv or json that can be readily loaded into a pandas dataframe. Please take care to make a data curation plan that can be accomplished in a suitable time frame for the lab due date.

Collaboration

For this lab, you can brainstorm with any classmates about ideas for datasets and data curation methodology. However, your dataset and corresponding explanation of the intended analysis should be unique. Your submission must acknowledge ideas or suggestions you received from other classmates in an acknowledgement section at the end of your readme.txt.

Details

Data

You can use any data source of your choosing for this lab. This includes any data previously discussed in lecture and accompanying notes as well as other data that you might be interested in getting acquainted with.

Tasks

In this lab you are dipping your feet in the deep end of data science by collecting, organizing, and annotating data. The lab is structured into 4 parts which you will document in a plain text file readme.txt. Carefully follow the steps outlined below to complete the lab assignment.

1. Create a file in a text editor called readme.txt. Stub out the required sections listed below.
   (a) Title: Lab 4 DS 311 Dataset Curation
   (b) Author: Your name
   (c) Dataset name
   (d) Dataset description
   (e) Data provenance
   (f) Intended usage
   (g) Data curation
   (h) Data faults
   (i) Acknowledgements

   Note: As it turns out, the order of the sections in a well-structured readme is often not the order in which you end up elaborating them. For instance, the interested reader of a readme will want to read the dataset description first, but you will know the dataset provenance long before you have assembled the final dataset. The dataset description will usually be the last thing you write.

2. Decide on a data source. Search around for some accessible data that you are interested in checking out and that can easily be loaded into a pandas dataframe. Create a file, lab4.ipynb, and write the code to load the data from a url or google drive in the notebook. If the data was accessed from a graphical user interface and then downloaded to disk, the original data must be included in the submission zip file with the title original data.csv so the TA can run your code without going through the data collection process.

3. Complete the Data provenance section of readme.txt containing complete and detailed instructions for acquiring the data from the original source, e.g., url, search parameters for GUI or API, reference to relevant code for accessing the data in lab4.ipynb.

4. Complete the Intended usage section of readme.txt. This should describe the analysis intended for the dataset you are going to curate.

5. Fill out the Data curation section of readme.txt by describing your plan to curate the data for the intended analysis. The ultimate dataset after data curation should be in a standard pandas dataframe format with rows corresponding to datapoints and columns corresponding to data features. If personal data is submitted, anonymization must be part of the data curation plan (e.g. abstract from locations in location data by using relative distances).

6. Enact the data curation plan in lab4.ipynb. It's okay at this point to adapt your plan if insurmountable difficulties arise due to unanticipated data quirks or time constraints. Document any difficulties your plan enactment encountered in readme.txt in the Data faults section, and then document the completed data curation plan in the Data curation section if it has changed in any way from the original.

7. Fill out the Dataset description section of readme.txt. This can be as simple as describing what the datapoints are and describing in words each feature of the dataset.

8. In lab4.ipynb, save your curated dataset in csv format to a file called dataset.csv.

9. Fill out the Acknowledgements section of readme.txt. This should include any collaboration, tips, or inspiration you received from classmates or instructors as well as any reference articles related to the original data source if the data source requested a citation.

10. Think of a cool name for your dataset. Record this in the Dataset name section of readme.txt.

Submitting Your Work

You will submit a single file, Firstname Lastname lab4.zip, containing your 3 (or 4 if applicable) files:

1. lab4.ipynb
2. readme.txt
3. dataset.csv
4. original data.csv (if applicable)

(where spelling, spacing and capitalization matter) and upload the zip via Canvas.

Grading

Dataset creation will be graded on the following:

• 25% of total grade will be on data curation.
• 25% of total grade will be on whether the dataset.csv in the zip file matches dataset.csv derived from running lab4.ipynb.
• 25% of the total grade will be on completeness of readme.txt.
• 25% on clarity of exposition in readme.txt.
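The load, curate, and save steps of the lab could be sketched roughly as follows. This is only an illustration under stated assumptions, not the expected solution: the inline sample data, the column names (person, lat, lon, rating), and the helper functions curate and save_dataset are all hypothetical, and a real notebook would call pd.read_csv on the chosen source URL or downloaded file instead. The sketch drops rows with a missing rating, replaces names with integer ids, and replaces raw coordinates with the distance to the centroid of the remaining rows, as one possible reading of "relative distances". Whether that is enough to protect personal data depends on the actual dataset.

```python
import io
import math

import pandas as pd

# Hypothetical stand-in for the original data (assumption, not from the lab).
# A real notebook would load from the source URL or the downloaded file.
RAW_CSV = """person,lat,lon,rating
alice,48.7519,-122.4787,4
bob,48.7340,-122.4865,
carol,48.7690,-122.4850,5
"""


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def curate(raw):
    """Return one row per datapoint with no names or absolute coordinates."""
    # Drop rows with a missing rating (a data fault worth noting in readme.txt).
    df = raw.dropna(subset=["rating"]).reset_index(drop=True)
    # Use the centroid of the remaining rows as the reference point.
    ref_lat, ref_lon = df["lat"].mean(), df["lon"].mean()
    dist = [
        round(haversine_km(ref_lat, ref_lon, lat, lon), 3)
        for lat, lon in zip(df["lat"], df["lon"])
    ]
    return pd.DataFrame(
        {
            "person_id": range(len(df)),  # integer id replaces the name
            "dist_km": dist,  # relative distance replaces lat/lon
            "rating": df["rating"].astype("int64"),
        }
    )


def save_dataset(df, target="dataset.csv"):
    """Write the curated dataframe as csv without the index column."""
    df.to_csv(target, index=False)


raw = pd.read_csv(io.StringIO(RAW_CSV))
dataset = curate(raw)
print(dataset)
```

Calling save_dataset(dataset) at the end of the notebook would write dataset.csv. Reloading that file with pd.read_csv and comparing it to the in-memory dataframe with pd.testing.assert_frame_equal is one way to check that the submitted file matches what the notebook produces.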
Answered 1 day after Oct 30, 2022


Amar Kumar answered on Oct 31 2022