To help you prepare for working with data on your own, this assignment asks you to create your own Jupyter notebook and write Python code to prepare and explore a given dataset.
For this assignment, you will use the steps and methods discussed in class to process and clean this dataset of college admissions information. Create a Jupyter notebook, and make sure to comment your code to indicate what you are doing and why.
Before you begin processing the data, note that this CSV file contains a lot of data. Instead of keeping all of it, start by selecting only the columns we want to look at: the college name, so we know what each row represents, and 11 of the numerical columns.
The first column in this dataframe is:
Name - the name of the college or university
The other 11 columns are:
Applicants total - the number of applications received
Admissions total - the number of applicants admitted
Enrolled total - the number of admitted students who chose to enroll at the college or university
ACT Composite 75th percentile score - the 75th percentile of ACT composite scores
Estimated undergraduate enrollment, total
Total price for in-state students living on campus 2013-14
Total price for out-of-state students living on campus 2013-14
Percent of total enrollment that are White
Percent of undergraduate enrollment that are women
Graduation rate - Bachelor degree within 5 years, total
Percent of freshmen receiving any financial aid
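A minimal pandas sketch of this selection step follows. It assumes the CSV headers match the names listed above exactly (including the dash in the graduation-rate column); check `df.columns` if they do not. The helper name `select_columns` is hypothetical.

```python
import pandas as pd

# Column names as listed in the assignment. They are ASSUMED to match the CSV
# headers exactly; compare against df.columns and adjust if they differ.
NAME_COLUMN = "Name"
NUMERIC_COLUMNS = [
    "Applicants total",
    "Admissions total",
    "Enrolled total",
    "ACT Composite 75th percentile score",
    "Estimated undergraduate enrollment, total",
    "Total price for in-state students living on campus 2013-14",
    "Total price for out-of-state students living on campus 2013-14",
    "Percent of total enrollment that are White",
    "Percent of undergraduate enrollment that are women",
    "Graduation rate - Bachelor degree within 5 years, total",
    "Percent of freshmen receiving any financial aid",
]


def select_columns(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep only the college name and the 11 numerical columns (hypothetical helper)."""
    wanted = [NAME_COLUMN, *NUMERIC_COLUMNS]
    missing = [c for c in wanted if c not in raw.columns]
    if missing:
        # Fail loudly instead of silently analyzing fewer variables.
        raise KeyError(f"Columns not found in the CSV: {missing}")
    # .copy() gives an independent frame, so later cleaning steps that assign
    # to columns do not raise SettingWithCopyWarning.
    return raw[wanted].copy()
```

In the notebook this might look like `df = select_columns(pd.read_csv("college_admissions.csv"))`, where the file name is a placeholder for the CSV you were given.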
When cleaning and processing this data, make sure you address any data quality issues and discuss your findings in comments.
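The sketch below shows one way to start on the quality checks, assuming the column names from the list above. The helpers `clean_admissions` and `quality_report` are hypothetical, and the checks they run (non-numeric entries, exact duplicate rows, repeated college names, more admissions than applications, more enrollments than admissions, percentages outside 0 to 100) are examples of issues worth looking for, not a complete list. Whether to drop, impute, or keep a row that fails a check is a judgment call you should justify in your comments.

```python
import pandas as pd


def clean_admissions(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Return a cleaned copy: numeric types enforced, exact duplicate rows removed."""
    out = df.copy()
    for col in numeric_cols:
        # Entries that cannot be parsed as numbers become NaN, so the column
        # gets a numeric dtype instead of staying as text (dtype object).
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # A row repeated exactly would count the same college twice.
    return out.drop_duplicates()


def quality_report(df: pd.DataFrame, numeric_cols: list[str], pct_cols: list[str]) -> dict:
    """Count common data-quality problems so they can be discussed in comments."""
    # Comparisons involving NaN evaluate to False, so rows with missing values
    # are counted under missing_per_column, not under the consistency checks.
    return {
        "missing_per_column": df[numeric_cols].isna().sum().to_dict(),
        "repeated_names": int(df["Name"].duplicated().sum()),
        "admitted_more_than_applied": int((df["Admissions total"] > df["Applicants total"]).sum()),
        "enrolled_more_than_admitted": int((df["Enrolled total"] > df["Admissions total"]).sum()),
        # ASSUMPTION: percentages are stored on a 0-100 scale, not 0-1.
        "pct_out_of_range": {c: int(((df[c] < 0) | (df[c] > 100)).sum()) for c in pct_cols},
    }
```

In the notebook you would pass the 11 numerical column names as `numeric_cols` and the percentage-style columns as `pct_cols` (the three "Percent of ..." columns, plus the graduation rate if it is also stored as a percentage).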
After cleaning the data, choose at least three variables to explore using descriptive statistics and visualizations. Identify any outliers in these variables, and make the case for why these rows of data should be included in or excluded from future analysis.
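One common way to flag outliers is Tukey's rule: a value is flagged if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below uses that rule as an assumption, since the assignment does not prescribe a method, and a flag only marks a row for closer inspection. The function names `iqr_outlier_mask` and `summarize_variables` are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # quantile() skips NaN
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # NaN compares False on both sides, so missing values are never flagged.
    return (s < lower) | (s > upper)


def summarize_variables(df: pd.DataFrame, cols: list[str], k: float = 1.5) -> pd.DataFrame:
    """Descriptive statistics plus skewness and an outlier count per chosen column."""
    # describe() gives count, mean, std, min, quartiles and max; .T puts one
    # variable per row so extra statistics can be added as columns.
    stats = df[cols].describe().T
    stats["skew"] = df[cols].skew()
    stats["n_outliers"] = [int(iqr_outlier_mask(df[c], k).sum()) for c in cols]
    return stats
```

To see which colleges were flagged, filter on the mask, for example `df.loc[iqr_outlier_mask(df["Applicants total"]), ["Name", "Applicants total"]]`. A flagged value that is plausible for a very large or very selective school may be worth keeping, while an impossible value points to a data error.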
Use visualizations to identify relationships between your chosen variables. Summarize your findings in commented code. What questions do these findings suggest for further analysis?
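To turn the plots into a short written summary, it can help to rank the pairwise correlations between your chosen variables. The sketch below is one way to do that, assuming the chosen variables are numeric columns of the cleaned DataFrame; the function names are hypothetical. Spearman rank correlation (`method="spearman"`) may suit skewed counts such as applicant totals better than the default Pearson. The plotting helper uses `pandas.plotting.scatter_matrix`, which requires matplotlib.

```python
import pandas as pd


def ranked_correlations(df: pd.DataFrame, cols: list[str], method: str = "pearson") -> pd.Series:
    """Correlation of every pair of chosen columns, strongest (by absolute value) first."""
    corr = df[cols].corr(method=method)
    # Keep each unordered pair once, skipping the diagonal (a column with itself).
    pairs = {(a, b): corr.loc[a, b] for i, a in enumerate(cols) for b in cols[i + 1:]}
    # Sort by magnitude so strong negative relationships are not buried at the bottom.
    return pd.Series(pairs).sort_values(key=lambda v: v.abs(), ascending=False)


def plot_relationships(df: pd.DataFrame, cols: list[str]) -> None:
    """Scatter plot for each pair of chosen columns, with histograms on the diagonal."""
    # Imported here so ranked_correlations works even where matplotlib is absent.
    import matplotlib.pyplot as plt
    from pandas.plotting import scatter_matrix

    scatter_matrix(df[cols], figsize=(10, 10), diagonal="hist")
    plt.tight_layout()
    plt.show()
```

A strong correlation only describes an association between two columns; it is a starting point for the follow-up questions the assignment asks for, not an explanation of why the variables move together.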