This has to be done in a Jupyter Notebook.
This is a project using the Python programming language to analyse a specific problem in an area such as Pharmacy, Library, Holiday Booking System, Medical Practice, Concert hall, Motor mechanic, Sales, Customer behaviour, Primary School, Role-playing game or Manufacturing. Your group may instead choose data from any other category that interests you, from Kaggle (https://www.kaggle.com/), UCI (https://archive.ics.uci.edu/ml/index.php) or any other repository. After cleaning, the dataset should have at least 8000 rows and 10 columns (the variables may be, for example, categorical, continuous and discrete); there is no maximum limit on the number of records. Your group may use any data set and should complete the tasks below:
6) Develop an Exploratory Data Analysis (EDA) report showing which numeric variables in the data set exhibit no obvious association.
7) Based on your EDA, identify interesting sub-groups of records within the data set that would be worth further investigation.
8) Apply one-hot encoding to the categorical variables (use at least one variable from the data set) and discuss how one-hot encoding helps in understanding categorical data.
9) Apply PCA to your data set using any number of components (at least 2 components must be chosen). Write a short profile of the first few extracted components, based on your understanding.
10) What is the objective of dimensionality reduction? Suppose that you perform the PCA using three components. Considering the following information, which variable or variables might it be advisable to omit from the PCA, and why?
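Tasks 6 to 10 could be approached in the notebook roughly as follows. This is a minimal sketch, not a reference solution, and several things in it are assumptions: the data frame is hypothetical random toy data (the columns `num_1` to `num_4` and `category` are invented for illustration), the 0.3 correlation cut-off for "no obvious association" and the 0.5 communality cut-off are arbitrary choices, and PCA is computed directly with NumPy from the correlation matrix (scikit-learn's `PCA` on standardised data is the more usual notebook choice). For task 10, one common heuristic is to look at each variable's communality, the sum of its squared loadings on the retained components: a variable with low communality shares little variance with the others and is a candidate for omission.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the cleaned data set (replace with your own).
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "num_1": rng.normal(50, 10, n),
    "num_2": rng.uniform(0, 100, n),
    "num_3": rng.exponential(5, n),
    "num_4": rng.normal(0, 1, n),
    "category": rng.choice(["A", "B", "C"], n),
})

# Task 6: summary statistics and pairwise correlations of the numeric variables.
numeric = df.select_dtypes(include="number")
print(numeric.describe())
corr = numeric.corr()
weak_pairs = [
    (a, b, round(corr.loc[a, b], 3))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) < 0.3  # assumed cut-off for "no obvious association"
]
print(weak_pairs)

# Task 7 (starting point): compare numeric summaries across sub-groups.
print(df.groupby("category")[list(numeric.columns)].mean())

# Task 8: one-hot encode the categorical column (one 0/1 column per level).
encoded = pd.get_dummies(df, columns=["category"], dtype=int)

# Task 9: PCA on the standardised numeric variables via their correlation matrix.
Z = (numeric - numeric.mean()) / numeric.std(ddof=1)
R = np.corrcoef(Z.to_numpy(), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]           # re-sort components from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 3                                       # number of components kept
explained = eigvals / eigvals.sum()         # proportion of variance per component
loadings = pd.DataFrame(
    eigvecs[:, :k] * np.sqrt(eigvals[:k]),  # correlation of each variable with each PC
    index=numeric.columns,
    columns=[f"PC{j + 1}" for j in range(k)],
)
scores = Z.to_numpy() @ eigvecs[:, :k]      # component scores for each record
print(explained.round(3))
print(loadings.round(3))

# Task 10: communality = share of each variable's variance captured by the k components.
communality = (loadings ** 2).sum(axis=1)
print(communality[communality < 0.5])       # assumed cut-off: candidates to omit
```

With real data, swap `df` for the cleaned Kaggle/UCI table and check `df.shape` against the 8000-row, 10-column minimum. Standardising first matters because PCA on raw values would be dominated by whichever variable has the largest scale; the loadings table is what the "short profile" of each component would be written from.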
The data and visualizations also need to answer the following questions, as I have to write a report on them afterwards.
Which countries have the best-ranked wines on average, based on the points ranking system? (Top 5 countries)
Which countries offer a better price/quality ratio, based on price and the points system?
Is there a correlation between price and the quality of the wines?
Which variety of wine has the best ranking on average?
Which region offers the best-ranked wines on average, and which offers the worst-ranked wines on average?
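A sketch along these lines could produce the tables behind each of these questions. It rests on several assumptions: the column names `country`, `variety`, `region_1`, `points` and `price` follow the layout of the Kaggle Wine Reviews data and should be checked against your actual file; the 12-row frame is made-up toy data; `MIN_REVIEWS` is an arbitrary minimum group size so that a variety or region with a single review cannot top a ranking; and "price/quality ratio" is approximated here as points per dollar, which is only one possible definition.

```python
import numpy as np
import pandas as pd

MIN_REVIEWS = 2  # assumed minimum group size; raise it (e.g. to 50) on the full data

# Hypothetical toy rows; column names assume the Kaggle Wine Reviews layout.
wine = pd.DataFrame({
    "country": ["France", "France", "Italy", "Italy", "US", "US",
                "Chile", "Chile", "Spain", "Spain", "Portugal", "Portugal"],
    "variety": ["Pinot Noir", "Merlot", "Sangiovese", "Nebbiolo", "Pinot Noir", "Merlot",
                "Merlot", "Carmenere", "Tempranillo", "Tempranillo",
                "Touriga Nacional", "Touriga Nacional"],
    "region_1": ["Burgundy", "Bordeaux", "Tuscany", "Piedmont", "Napa Valley", "Napa Valley",
                 "Maipo Valley", "Maipo Valley", "Rioja", "Rioja", "Douro", "Douro"],
    "points": [94, 90, 92, 93, 91, 87, 86, 88, 90, 89, 91, 92],
    "price": [60, 40, 30, 50, 45, 20, 12, 15, 18, 16, 20, np.nan],
})


def ranked_mean(df, group_col, value_col, min_reviews=MIN_REVIEWS):
    """Mean of value_col per group, keeping groups with >= min_reviews values, best first."""
    stats = df.groupby(group_col)[value_col].agg(["mean", "count"])
    return stats.loc[stats["count"] >= min_reviews, "mean"].sort_values(ascending=False)


# Q1: top 5 countries by mean points.
top5 = ranked_mean(wine, "country", "points").head(5)

# Q2: value for money, approximated as points per dollar (rows without a price dropped).
priced = wine.dropna(subset=["price"]).assign(
    points_per_dollar=lambda d: d["points"] / d["price"]
)
value = ranked_mean(priced, "country", "points_per_dollar")

# Q3: linear (Pearson) and rank-based (Spearman) correlation between price and points.
pearson = priced["price"].corr(priced["points"])
spearman = priced[["price", "points"]].corr(method="spearman").loc["price", "points"]

# Q4: best variety by mean points.
varieties = ranked_mean(wine, "variety", "points")
best_variety = varieties.index[0]

# Q5: best and worst region by mean points (rows with a missing region are ignored by groupby).
regions = ranked_mean(wine, "region_1", "points")
best_region, worst_region = regions.index[0], regions.index[-1]

print(top5, value, pearson, spearman, best_variety, best_region, worst_region, sep="\n")
```

In this toy data a single Nebbiolo review has the highest score, but the minimum-count filter keeps it out of the variety ranking, which is the reason for the filter. For the report, a scatter plot of price against points (for example `priced.plot.scatter(x="price", y="points")`, which needs matplotlib) would pair naturally with the two correlation coefficients; Spearman is included because prices are usually heavily skewed.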