Hi, please check the files attached below. I also need a PPT with 10 slides in total, without voice notes.


Student Guidelines
Assessment 1 Research Study & Presentation
Due: 22 December 2019 - 11:59 pm
Total Weightage: 20%
Individual assignment

Python is one of the most frequently used programming languages in many fields, particularly in data science. It is also one of the best data science tools for the big data job.

Task
The assignment has two phases: 1) writing a report and 2) presentation of findings using Python codes.

1. Report (Weightage: 10%)
Choose data: Choose a data from Kaggle website, https://www.kaggle.com/datasets , or a government open source data. You can also use Twitter data, which you can download using Python Tweepy package.
Analytics: Find out what you can do with that data or what kind of decision making you can do with it. First (Step 1), do an exploratory data analysis on the data that you have gathered. Exploratory data analysis is an approach for analysing data sets to summarize their main characteristics, often with visual methods. Then (Step 2), build a machine learning model on top of your data and make necessary recommendations.
Python implementation: To be consistent with all students, implementation must be done in Google Colab: https://colab.research.google.com/notebooks/welcome.ipynb Colab is a free notebook environment that requires no setup and runs entirely in the cloud. You need to login to Google Colab and write your Python code for analysing the data. Add your Google Colab account showing your name on it into your report, by clicking orange button on top-right corner and taking screenshot.
Your report should have 1500-2000 words addressing the following: information on the data and why it is important, literature review on the data and methodology you are going to work, what you are going to solve and how, plots and recommendations. The report should have at least 4-6 plots (screenshots) from your findings with explanations.

2. Presentation (Weightage: 10%)
The presentation should be a maximum of 10 minutes. It must cover the research report, research findings and visualisation and step by step discussion on how you’ve done this project.

Submission Guidelines
All submissions are to be submitted through turn-it-in. Drop-boxes linked to turn-it-in will be set up in the Unit of Study Moodle account. Assignments not submitted through these drop-boxes will not be considered. Submissions must be made by the due date and time (which will be in the session detailed above) and determined by your Unit coordinator. Submissions made after the due date and time will be penalized at the rate of 10% per day (including weekend days).
The turn-it-in similarity score will be used in determining the level if any of plagiarism. Turn-it-in will check conference web-sites, Journal articles, the Web and your own class member submissions for plagiarism. You can see your turn-it-in similarity score when you submit your assignment to the appropriate drop-box. If this is a concern you will have a chance to change your assignment and re-submit. However, re-submission is only allowed prior to the submission due date and time. After the due date and time have elapsed you cannot make re-submissions and you will have to live with the similarity score as there will be no chance for changing. Thus, plan early and submit early to take advantage of this feature. You can make multiple submissions, but please remember we only see the last submission, and the date and time you submitted will be taken from that submission.
Your report should be a single word or pdf document containing your report. Your presentation file should have a standard video format and it should not exceed 200 MB. Slides and your face should be clear in the video file. You need to submit the presentation file (not link to your video) in the provided video submission link. Please do not submit the link for your video, which will not be considered for marking.
Answered Same Day, Dec 16, 2021

Neha answered on Dec 20 2021
Title of your report
Your name, your id
Course name, Assignment …
VIT address
Supervisor/Lecturer name
Abstract
This report demonstrates my work and knowledge of exploratory data analysis (EDA). For this task I used the Python language. Exploratory data analysis is an approach that helps to determine the behavior of a dataset and draw findings from it. I used Google Colaboratory to write the code, and performing EDA with Python was straightforward. I chose heart patient data from the Kaggle online data portal. This dataset contains details about heart patients. I performed different calculations and generated graphs and charts from them, which made it much easier to analyse the data by heart rate, sex and age.
I. Introduction
Exploratory data analysis can be defined as an approach for analyzing data sets and summarizing their characteristics using charts and graphs. It helps us to understand the data beyond formal modelling. It is widely used in statistics to explore data and formulate hypotheses that can lead to new experiments. EDA is not difficult for anyone with even basic knowledge of it. Several software tools can be used for EDA, such as JMP, KNIME, R and Weka, but I used Python for this analysis of heart disease patients.
Objectives
1) Describe a dataset quickly. EDA helps to find missing data, the types of data, the number of rows and columns, and gives a preview of the data.
2) Clean corrupted data by handling missing and incorrect values.
3) Visualize data distributions using bar charts, box plots and histograms.
4) Examine the relationships between variables using a heat map.
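As a minimal sketch of objectives 1 and 4, the snippet below summarises a small data frame and computes the correlation matrix that a heat map would display. The rows are made up for illustration and are not taken from the Kaggle file; the variable names `patients`, `summary` and `correlations` are assumptions of this sketch.

```python
import pandas as pd

# Made-up values for illustration only; column names follow the attribute list.
patients = pd.DataFrame({
    "age": [29, 41, 54, 63, 70],
    "thalach": [190, 172, 160, 150, 130],
    "chol": [204, 250, 233, 236, 270],
})

# Objective 1: count, mean, min, max and quartiles for every numeric column.
summary = patients.describe()

# Objective 4: pairwise Pearson correlations between the numeric columns.
correlations = patients.corr()

print(summary)
print(correlations)

# In a notebook with seaborn installed, seaborn.heatmap(correlations, annot=True)
# is one way to draw this matrix as a heat map.
```

In this toy sample the maximum heart rate falls as age rises, so the correlation between `age` and `thalach` is negative. The real dataset may show a weaker relationship.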
I have selected a dataset of heart patients. It contains multiple columns that give information about each patient's heart condition.
Attribute Information:
1. age = age of the patient
2. sex = sex of the patient (1 = male, 0 = female)
3. cp = chest pain type (values 0, 1, 2 and 3)
4. trestbps = resting blood pressure of the patient
5. chol = serum cholesterol of the patient in mg/dl
6. fbs = whether fasting blood sugar > 120 mg/dl
7. restecg = resting electrocardiographic results (values 0, 1, 2)
8. thalach = maximum heart rate achieved by the patient
9. exang = exercise-induced angina
10. oldpeak = ST depression induced by exercise relative to rest
11. slope = the slope of the peak exercise ST segment
12. ca = number of major vessels (0-3) colored by fluoroscopy
13. thal = 3: normal; 6: fixed defect; 7: reversible defect
14. target = 0 or 1
The dataset contains 303 rows and 14 columns.
II. Discussion and Analytics
To perform EDA using Python we need to import libraries such as NumPy, pandas, seaborn and Matplotlib. We should assign each an alias to reduce repetition in the code. The pandas library lets us store the dataset in a data frame. Before performing any operation, we should have a clear understanding of the data we are using and all its attributes. Storing the data in a data frame makes this task very easy. The data frame provides an attribute called .shape, which returns the number of rows and columns in the dataset.
Fig1. To get the shape of the data frame
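The step in Fig. 1 can be sketched as follows. The rows are made up so that the snippet runs without the Kaggle file; with the real data one would normally build the data frame with `pd.read_csv` instead, and the file name would depend on the download. `np` and `pd` are the conventional aliases for NumPy and pandas.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; the real dataset has 303 rows and 14 columns.
sample = pd.DataFrame({
    "age": np.array([63, 37, 41, 56]),
    "sex": np.array([1, 1, 0, 1]),
    "chol": np.array([233, 250, 204, 236]),
    "target": np.array([1, 1, 1, 0]),
})

# .shape is an attribute (no parentheses) holding (number of rows, number of columns).
n_rows, n_cols = sample.shape
print(f"rows={n_rows}, columns={n_cols}")
```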
To see what kind of data the data frame contains, we can use the .head function of pandas, which returns the first five rows of the dataset.
Fig2. To get the first five rows using .head
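A short sketch of the preview step in Fig. 2, again on made-up rows rather than the real file. It also prints `dtypes`, which is one way to see the type of each column; the variable names are assumptions of this sketch.

```python
import pandas as pd

# Six made-up rows, so that head() visibly returns fewer rows than the frame holds.
df = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 44],
    "sex": [1, 1, 0, 1, 0, 1],
    "oldpeak": [2.3, 3.5, 1.4, 0.8, 0.6, 0.4],
})

first_rows = df.head()   # returns the first five rows by default
column_types = df.dtypes  # one data type per column

print(first_rows)
print(column_types)
```

Passing a number, for example `df.head(3)`, returns that many rows instead of five.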
The dataset I am using is already clean, so I did not need to perform any cleaning operations. Otherwise we can use functions like compare to compare the dataset and retrieve only the...
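One way to confirm that a dataset is clean is to count missing values and duplicate rows before analysing it. The sketch below is an illustration under assumptions, not the procedure used in this report: it runs on made-up rows that deliberately contain one missing value and one duplicate.

```python
import numpy as np
import pandas as pd

# Made-up rows: the second row has a missing chol value, the fourth repeats the third.
raw = pd.DataFrame({
    "age": [63, 37, 41, 41],
    "chol": [233.0, np.nan, 204.0, 204.0],
    "target": [1, 1, 0, 0],
})

missing_per_column = raw.isnull().sum()       # number of missing values in each column
duplicate_rows = int(raw.duplicated().sum())  # rows identical to an earlier row

# Drop rows with missing values, then drop repeated rows.
cleaned = raw.dropna().drop_duplicates()

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print("rows after cleaning:", len(cleaned))
```

If both counts are zero on the real data, no cleaning step is needed.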