The dataset Education - Post 12th Standard.csv contains information on various colleges. You are expected to do a Principal Component Analysis for this case study according to the instructions given. The data dictionary of the 'Education - Post 12th Standard.csv' can be found in the following file: Data Dictionary.xlsx.



  • Perform Exploratory Data Analysis [both univariate and multivariate analysis to be performed]. What insight do you draw from the EDA?

  • Is scaling necessary for PCA in this case? Give justification and perform scaling.

  • Comment on the comparison between the covariance and the correlation matrices from this data [on scaled data].

  • Check the dataset for outliers before and after scaling. What insight do you derive here? [Please do not treat Outliers unless specifically asked to do so]

  • Extract the eigenvalues and eigenvectors [print both].

  • Perform PCA and export the data of the Principal Component (eigenvectors) into a data frame with the original features

  • Write down the explicit form of the first PC (in terms of the eigenvectors; use values with two decimal places only).

  • Consider the cumulative values of the eigenvalues. How does it help you to decide on the optimum number of principal components? What do the eigenvectors indicate?

  • Explain the business implication of using the Principal Component Analysis for this case study. How may PCs help in the further analysis? [Hint: Write interpretations of the Principal Components obtained]
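The numerical core of the scaling, covariance-versus-correlation, eigenvalue/eigenvector, loadings and cumulative-variance tasks can be sketched with NumPy and pandas alone. This is a minimal sketch on made-up data, not the college dataset: the feature names f1 to f4 are hypothetical, and the text column is assumed to have been dropped already. Standardizing with ddof=1 makes the covariance matrix of the scaled data exactly equal to the correlation matrix of the raw data; with scikit-learn's StandardScaler (which uses ddof=0) the covariance matrix is the correlation matrix times n/(n-1), so the two are proportional and share the same eigenvectors.

```python
import numpy as np
import pandas as pd


def pca_eig(X: np.ndarray):
    """PCA via eigen-decomposition of the covariance matrix of standardized data.

    X: 2-D array, rows = observations, columns = numeric features.
    Returns (Z, cov, eigenvalues, eigenvectors, cumulative explained-variance ratio),
    sorted by decreasing eigenvalue; eigenvector i is column i of the matrix.
    """
    # z-score each column; ddof=1 matches np.cov's default, so cov(Z) == corr(X).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(Z, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    cum_ratio = np.cumsum(vals) / vals.sum()
    return Z, cov, vals, vecs, cum_ratio


rng = np.random.default_rng(0)
# Made-up data: 200 rows, 4 features on very different scales (NOT the real dataset).
# f2 is built from f1, so those two are strongly correlated.
f1 = rng.normal(1000, 300, 200)
f2 = 0.5 * f1 + rng.normal(0, 50, 200)
f3 = rng.normal(50, 5, 200)
f4 = rng.normal(0.3, 0.05, 200)
X = np.column_stack([f1, f2, f3, f4])
names = ["f1", "f2", "f3", "f4"]  # hypothetical feature names

Z, cov, vals, vecs, cum = pca_eig(X)
print(np.allclose(cov, np.corrcoef(X, rowvar=False)))  # cov of scaled data == corr of raw data
print("eigenvalues:", vals.round(3))
print("eigenvectors (columns):\n", vecs.round(3))
print("cumulative explained variance:", cum.round(3))

# Eigenvectors (loadings) in a data frame indexed by the original features.
loadings = pd.DataFrame(vecs, index=names, columns=[f"PC{i + 1}" for i in range(vecs.shape[1])])
print(loadings.round(2))

# Explicit form of PC1 with two-decimal weights (the overall sign of an eigenvector is arbitrary).
pc1 = " + ".join(f"{w:.2f}*{n}" for w, n in zip(vecs[:, 0], names))
print("PC1 =", pc1)

# Principal component scores: project the scaled data onto the eigenvectors.
scores = Z @ vecs
```

The cumulative ratio shows how many components are needed to reach a chosen share of the total variance (thresholds such as 80 to 90% are a common rule of thumb, not a fixed rule), and the magnitude of each weight in an eigenvector shows how strongly the corresponding feature contributes to that component. scikit-learn's `PCA` should give the same components up to sign.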


Please reflect on all that you have learned while working on this project. This step is critical in cementing all your concepts and closing the loop. Please write down your thoughts here.
Answered 59 days after May 25, 2022

Vishali answered on Jul 23 2022
First, load the dataset and check its shape and the data types of the variables.
Here, we have 777 rows and 18 columns.
The next step is to view a summary of the data using the describe() function.
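These first steps might look like the following minimal sketch, assuming pandas is installed. With the real file the data would be read with `pd.read_csv("Education - Post 12th Standard.csv")` and the shape should come out as (777, 18); here a tiny made-up CSV held in memory (hypothetical columns `college`, `x1`, `x2`) stands in for it so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up stand-in for "Education - Post 12th Standard.csv" (NOT the real data):
# one text column and two numeric columns with hypothetical names.
TOY_CSV = """college,x1,x2
A,120,3.5
B,4500,2.1
C,980,4.0
D,15000,1.2
"""


def overview(df: pd.DataFrame) -> None:
    """Print the shape, the column data types and summary statistics."""
    print("shape:", df.shape)  # (rows, columns)
    print(df.dtypes)           # text columns show up as object (or string) dtype
    print(df.describe())       # count, mean, std, min, quartiles, max of numeric columns


# With the real file this would be: df = pd.read_csv("Education - Post 12th Standard.csv")
df = pd.read_csv(io.StringIO(TOY_CSV))
overview(df)
```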
To check visually whether each variable is normally distributed, we use distplot (deprecated since seaborn 0.11, where histplot or displot replaces it).
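1. Now, we will check the skewness of each variable.
2. A skewness of 0 means the distribution is symmetric (a normal distribution has zero skewness, but zero skewness alone does not prove normality). If it is > 0 the data is right-skewed (long tail to the right), and if it is < 0 the data is left-skewed (long tail to the left).
df.skew(axis=0, skipna=True, numeric_only=True)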
3. For multivariate analysis, we plot a heatmap of the correlation matrix computed with df.corr() on the numeric columns:
sns.heatmap(df.corr(numeric_only=True), annot=True)
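A small check of how the sign of the skewness relates to the direction of the tail, and of the correlation matrix that feeds the heatmap, can be sketched on made-up columns (the column names and values are hypothetical, not from the dataset). `numeric_only=True` restricts both calculations to numeric columns; in pandas 2.x its default is False, so a text column can make `skew()` or `corr()` fail.

```python
import pandas as pd

# Made-up columns for illustration only (not from the dataset).
toy = pd.DataFrame({
    "label": ["a", "b", "c", "d", "e", "f"],       # non-numeric column
    "right_tail": [1, 1, 2, 2, 3, 20],             # one large value -> long right tail
    "left_tail": [-20, -3, -2, -2, -1, -1],        # mirror image -> long left tail
    "symmetric": [1, 2, 3, 4, 5, 6],               # evenly spread -> zero skewness
})

# Skewness of the numeric columns only.
skew = toy.skew(numeric_only=True)

# Positive skew -> right-skewed (tail to the right); negative -> left-skewed.
for col, s in skew.items():
    side = "right" if s > 0 else "left" if s < 0 else "none (symmetric)"
    print(f"{col}: skew={s:.2f}, tail: {side}")

# Pearson correlation matrix of the numeric columns (the input to sns.heatmap).
corr = toy.corr(numeric_only=True)
print(corr.round(2))
```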
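Scaling does not require removing outliers first (the assignment also asks us not to treat them). Scaling puts all variables on one common scale, which matters for PCA because the components are built from variances: a variable with a large numeric range would otherwise dominate variables with small ranges. It is a data pre-processing step applied to the independent variables (features) of the data. Scaling can also speed up some calculations.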
We have one column with the object data type, so we need to drop that...
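Dropping the text column, standardizing, and checking outliers before and after scaling can be sketched as below. This is a minimal sketch on made-up data (hypothetical column names), using z-score standardization with ddof=0, as scikit-learn's StandardScaler does, and the 1.5 × IQR rule that box-plot whiskers use by default. Because z-scoring only shifts and stretches each column, the quartiles and fences move with the data and the same points are flagged as outliers before and after scaling.

```python
import pandas as pd


def zscore(num: pd.DataFrame) -> pd.DataFrame:
    """Standardize each column to mean 0 and standard deviation 1 (ddof=0)."""
    return (num - num.mean()) / num.std(ddof=0)


def iqr_outlier_counts(num: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each column."""
    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    mask = (num < q1 - k * iqr) | (num > q3 + k * iqr)
    return mask.sum()


# Made-up data (NOT the real dataset): one text column and two numeric columns
# on very different scales, with one extreme value in 'big'.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "big": [100, 120, 130, 150, 160, 170, 180, 5000],
    "small": [0.5, 0.6, 0.55, 0.7, 0.65, 0.6, 0.58, 0.62],
})

num = df.select_dtypes(include="number")  # keeps only the numeric columns
scaled = zscore(num)

print(num.std(ddof=0).round(2))       # very different spreads before scaling
print(scaled.std(ddof=0).round(2))    # 1.0 for every column after scaling
print(iqr_outlier_counts(num))        # outliers before scaling
print(iqr_outlier_counts(scaled))     # same counts after scaling
```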