Theory (30 points) 1. This is a table for maintenance cost with operation hours. You cannot use python package. a. Find a linear equation by least-squares method. Show all your work! (15 points) b....

Please see the instructions for this assignment in the Word document, and submit your solutions in an .ipynb file.


Theory (30 points)
1. This is a table of maintenance cost against operation hours. You may not use any Python packages.
   a. Find a linear equation by the least-squares method. Show all your work! (15 points)
   b. Calculate the MSE (mean squared error). (5 points)
   c. Calculate the R² score. (5 points)
   d. If operation hours increase by two hours, how much will the maintenance cost increase or decrease based on your regression result? (5 points)

Practice (70 points)
1. Using sklearn.linear_model, make a linear regression model. (40 points)
   a. Find the “housing.csv” file and read it as a data frame.
   b. Choose “LSTAT” as x (explanatory variable) and “MEDV” as y (response variable). (5 points)
   c. Do a linear regression and find the intercept and slope. (5 points)
   d. Draw a scatter plot of all the data and draw the line (scatter plot) you obtained in 1.c on the same graph. (5 points)
   e. Calculate the R² score. (5 points)
   f. Do RANSAC with the same data set and find the intercept and slope. Use “max_trials=100, min_samples=50”. (10 points)
   g. Draw a scatter plot of all the data (please distinguish between inliers and outliers) and draw the line you obtained in 1.f on the same graph. (5 points)
   h. Calculate the R² score for the inliers only. (5 points)
2. Using sklearn.linear_model, make a linear regression model. (30 points)
   a. Find the “housing_hw.csv” file and read it as a data frame.
   b. Choose “LSTAT” and “RM” as x (explanatory variables) and “MEDV” as y (response variable). (5 points)
   c. Do a (multivariate) linear regression and find the optimal coefficients. (5 points)
   d. Calculate the R² score. (5 points)
   e. Now do a polynomial regression with order=2 and find the optimal coefficients. (5 points)
   f. Calculate the R² score. (5 points)
   g. Which variables are the most important for increasing or decreasing “MEDV”?
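Practice part 1.f asks for RANSAC with max_trials=100 and min_samples=50. As a rough illustration of what those two parameters control, here is a dependency-free sketch of the RANSAC loop for a single-variable line. It is not scikit-learn's implementation: `fit_line`, `ransac_line`, the `threshold` and `seed` arguments, and the synthetic points are all assumptions made for this example. The residual threshold is passed in explicitly here, whereas scikit-learn's `RANSACRegressor` by default derives it from the median absolute deviation of y.

```python
import random


def fit_line(points):
    """Ordinary least-squares line through (x, y) pairs.

    Returns (slope, intercept), or None when all x values coincide.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # vertical configuration: slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, min_samples=2, max_trials=100, threshold=1.0, seed=0):
    """Hypothetical helper: returns (slope, intercept, inlier_mask)."""
    rng = random.Random(seed)
    best_mask, best_count = None, 0
    for _ in range(max_trials):
        # 1. Fit a candidate line on a random subset of min_samples points.
        model = fit_line(rng.sample(points, min_samples))
        if model is None:
            continue
        slope, intercept = model
        # 2. Points whose residual is within the threshold count as inliers.
        mask = [abs(y - (slope * x + intercept)) <= threshold for x, y in points]
        # 3. Keep the candidate that explains the most points.
        if sum(mask) > best_count:
            best_mask, best_count = mask, sum(mask)
    if best_mask is None:
        raise ValueError("no valid candidate line was found")
    # 4. Refit on every inlier of the best candidate.
    final = fit_line([p for p, keep in zip(points, best_mask) if keep])
    if final is None:
        raise ValueError("inliers do not determine a line")
    return final[0], final[1], best_mask


# Synthetic data (an assumption for this sketch): 30 points on y = 2x + 1
# plus 5 gross outliers.
inlier_points = [(float(x), 2.0 * x + 1.0) for x in range(30)]
outlier_points = [(5.0, 60.0), (10.0, -30.0), (15.0, 90.0),
                  (20.0, -10.0), (25.0, 120.0)]
points = inlier_points + outlier_points

slope, intercept, mask = ransac_line(points, min_samples=2, max_trials=100,
                                     threshold=0.5, seed=0)
print("slope:", slope, "intercept:", intercept, "inliers:", sum(mask))
```

In the assignment itself the same roles are played by the `min_samples` and `max_trials` arguments of `RANSACRegressor`, and the fitted estimator exposes the inlier flags as `inlier_mask_`, which is what parts 1.g and 1.h need.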
Answered Same Day, Nov 17, 2021

Suraj answered on Nov 18 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Slope is 0.9523809523809523\n",
"intercept term is 14.571428571428573\n",
"Regression equation is 14.571428571428573 + 0.9523809523809523 x\n",
"Mean square error is 36.05714285714286\n",
"R-Square value is 0.8094231350045303\n",
"Increase in maintenance cost for 2 extra operation hours is 1.9048\n"
]
}
],
"source": [
"# Theory question 1: least-squares line, computed without any packages\n",
"op_hr=[18,6,30,48,6,36,18,18,30,36]\n",
"main_cost=[25,17,48,58,23,40,30,39,40,60]\n",
"mean_op_hr=sum(op_hr)/len(op_hr)\n",
"mean_main_cost=sum(main_cost)/len(main_cost)\n",
"# slope = sum((x-mean_x)*(y-mean_y)) / sum((x-mean_x)**2)\n",
"sum1=0\n",
"for (i,j) in zip(op_hr,main_cost):\n",
"    val=(i-mean_op_hr)*(j-mean_main_cost)\n",
"    sum1=sum1+val\n",
"sum2=0\n",
"for i in op_hr:\n",
"    val1=(i-mean_op_hr)**2\n",
"    sum2=sum2+val1\n",
"slope=sum1/sum2\n",
"print(\"Slope is\",slope)\n",
"# the fitted line passes through the point of means\n",
"intercept=mean_main_cost-slope*mean_op_hr\n",
"print(\"intercept term is\",intercept)\n",
"print(\"Regression equation is\",intercept,\"+\",slope,\"x\")\n",
"# Mean squared error: average squared residual\n",
"pred=[]\n",
"for i in op_hr:\n",
"    predict=intercept+slope*i\n",
"    pred.append(predict)\n",
"error=0\n",
"for (i,j) in zip(main_cost,pred):\n",
"    val2=(i-j)**2\n",
"    error=error+val2\n",
"mse=error/len(main_cost)\n",
"print(\"Mean square error is\",mse)\n",
"# R-square = 1 - SS_residual/SS_total\n",
"ss_total=0\n",
"for i in main_cost:\n",
"    val3=(i-mean_main_cost)**2\n",
"    ss_total=ss_total+val3\n",
"r_square=1-(error/ss_total)\n",
"print(\"R-Square value is\",r_square)\n",
"# Part d: two extra operation hours change the predicted cost by slope*2\n",
"# (the intercept cancels when two predictions are subtracted)\n",
"increase=slope*2\n",
"print(\"Increase in maintenance cost for 2 extra operation hours is\",round(increase,4))"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"      CRIM    ZM  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \\\n",
"0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296.0   \n",
"1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242.0   \n",
"2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242.0   \n",
"3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222.0   \n",
"4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222.0   \n",
"\n",
"   PTRATIO       B  LSTAT  MEDV  \n",
"0     15.3  396.90   4.98  24.0  \n",
"1     17.8  396.90   9.14  21.6  \n",
"2     17.8  392.83   4.03  34.7  \n",
"3     18.7  394.63   2.94  33.4  \n",
"4     18.7  396.90   5.33  36.2  "
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"data=pd.read_csv(\"C:/Users/Hp/Downloads/housing.csv\")\n",
"data.head()"
]
},
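{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [],
"source": [
"# LSTAT (column 12) is the explanatory variable, MEDV (column 13) the response;\n",
"# selecting by name keeps both as single-column data frames\n",
"x=data[[\"LSTAT\"]]\n",
"y=data[[\"MEDV\"]]"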
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [],
"source": [
"#modeling\n",
"from sklearn.linear_model import LinearRegression\n",
"lin_reg=LinearRegression()\n",
"model=lin_reg.fit(x,y)\n",
"pred=model.predict(x)"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.5441462975864799\n"
]
}
],
"source": [
"#r-square\n",
"from sklearn.metrics import r2_score\n",
"r2=r2_score(y,pred)\n",
"print(r2)"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png":...
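The figures printed by the first notebook cell can also be checked by hand, which is what theory part (a) asks for with "show all your work". The following is a worked sketch using the ten (operation hours, maintenance cost) pairs from the `op_hr` and `main_cost` lists in that cell. The symbols $S_{xx}$, $S_{xy}$, $SS_{res}$, $SS_{tot}$, $b_0$ and $b_1$ are notation introduced here, not taken from the assignment, and the values are rounded.

```latex
\begin{aligned}
n &= 10, \qquad \bar{x} = \tfrac{246}{10} = 24.6, \qquad \bar{y} = \tfrac{380}{10} = 38 \\
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 1608 \\
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2 = 1688.4 \\
b_1 &= \frac{S_{xy}}{S_{xx}} = \frac{1608}{1688.4} \approx 0.9524 \\
b_0 &= \bar{y} - b_1 \bar{x} \approx 38 - 0.9524 \times 24.6 \approx 14.571 \\
SS_{tot} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 = 1892 \\
SS_{res} &= SS_{tot} - b_1 S_{xy} \approx 1892 - 1531.43 = 360.57 \\
\mathrm{MSE} &= \frac{SS_{res}}{n} \approx 36.057 \\
R^2 &= 1 - \frac{SS_{res}}{SS_{tot}} \approx 0.8094 \\
\Delta\hat{y} &= b_1 \times 2 \approx 1.905 \quad \text{(cost change for two extra operation hours)}
\end{aligned}
```

The last line answers theory part (d): the intercept cancels when two predictions are subtracted, so only the slope matters for the change in cost.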