This is a Python computer science assignment due at midnight. The notebook contains all the details and questions, and the data to use is in the .csv file. I attached assignment two as a reference for what it's supposed to look like!
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This is an INDIVIDUAL assignment. You are NOT supposed to work together or share code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each cell includes a comment telling you what to do. After you code, run your code to see the output.\n", "After you are done, save the Jupyter Notebook, then upload it to the designated area on Canvas." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 1\n", "# use pandas and import the housing.csv dataset (2 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 2\n", "# make three new attributes, rooms_per_household, bedrooms_per_room, and population_per_household\n", "# add them to your dataset\n", "# exactly like what we did in class (2 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 3\n", "# use the head() method to show your dataset (2 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 4\n", "# use the scikit-learn library and split your dataset into train_set and test_set (3 points)\n", "# IMPORTANT --> consider 25% of your dataset as the test_set (random_state=42)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 5\n", "# split inputs and output (median_house_value) (2 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 6\n", "# split numerical and categorical columns (2 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 7\n", "# generate numerical pipeline to take care of missing values and scale the dataset (2 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [],
"source": [ "# Question 8\n", "# generate full pipeline to take care of numerical and categorical data (use OneHotEncoder) (2 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 9\n", "# apply full pipeline to training set and prepare the data for training ML model (2 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 10\n", "# use prepared data from question 8 and output from question 5 and train a linear regression model (3 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 11\n", "# use prepared data from question 8 and output from question 5 and train a second degree polynomial regression model (3 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 12\n", "# use prepared data from question 8 and output from question 5 and train a third degree polynomial regression model (3 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 13\n", "# prepare the test set to test the trained model in question 10 (4 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 14\n", "# use trained linear regression model in question 10 and perform prediction on the prepared test set in question 13 (3 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 15\n", "# calculate RMSE for tested model in question 14 (3 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 16\n", "# prepare the test set to test the trained model in question 11 (4 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 17\n",
"# use trained second degree regression model in question 11 and perform prediction on the prepared test set in question 16 (3 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 18\n", "# calculate RMSE for tested model in question 17 (3 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 19\n", "# prepare the test set to test the trained model in question 12 (4 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 20\n", "# use trained third degree regression model in question 12 and perform prediction on the prepared test set in question 19 (3 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 21\n", "# calculate RMSE for tested model in question 20 (3 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 22\n", "# train a second degree ridge regression on prepared data in question 11 (10 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 23\n", "# use second degree ridge regression in question 22 and do prediction on prepared test set in question 16 (6 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 24\n", "# calculate RMSE\n", "# if the RMSE is not satisfactory, go back to question 22, use a different alpha (between 0 and 1) and try to find the best alpha (the alpha which results in the smallest RMSE) (10 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Question 25\n", "# train a second degree elastic net on prepared data in question 11 (10 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Question 26\n", "# use second degree elastic net in question 25 and do prediction on prepared test set in question 16 (6 points)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [