Question

Please see the instructions for this assignment in the Word document, and submit your solutions in an .ipynb file.

Theory (30 points)
1. This is a table for maintenance cost with operation hours. You cannot use a Python package.
   a. Find a linear equation by the least-squares method. Show all your work! (15 points)
   b. Calculate the MSE (Mean Squared Error). (5 points)
   c. Calculate the R2 score. (5 points)
   d. If operation hours increase by two hours, how much will the maintenance cost increase/decrease based on your regression result? (5 points)

Practice (70 points)
1. Using sklearn.linear_model, please make a linear regression model. (40 points)
   a. Find the “housing.csv” file and read it as a data frame.
   b. Choose “LSTAT” as x (explanatory variable) and “MEDV” as y (response variable). (5 points)
   c. Do a linear regression and find the intercept and slope. (5 points)
   d. Draw a scatter plot of all the data and draw the line (scatter plot) you obtained in 1.c. on the same graph. (5 points)
   e. Calculate the R2 score. (5 points)
   f. Do RANSAC with the same data set and find the intercept and slope. Use “max_trials=100, min_samples=50”. (10 points)
   g. Draw a scatter plot of all the data (please distinguish between inliers and outliers) and draw the line you obtained in 1.f. on the same graph. (5 points)
   h. Calculate the R2 score for only the inliers. (5 points)
2. Using sklearn.linear_model, please make a linear regression model. (30 points)
   a. Find the “housing_hw.csv” file and read it as a data frame.
   b. Choose “LSTAT” and “RM” as x (explanatory variables) and “MEDV” as y (response variable). (5 points)
   c. Do a (multivariate) linear regression and find the optimal coefficients. (5 points)
   d. Calculate the R2 score. (5 points)
   e. Now, please do a polynomial regression with order=2 and find the optimal coefficients. (5 points)
   f. Calculate the R2 score. (5 points)
   g. Which variables are the most important to increase/decrease “MEDV”?
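
For reference, the quantities asked for in the Theory part can be written out with the standard simple-linear-regression formulas. The assignment's table is not reproduced on this page, so the numbers below assume the ten (operation hours, maintenance cost) pairs that appear as `op_hr` and `main_cost` in the accepted answer. The symbols b_0, b_1, S_xy, S_xx, SSE and SST are notation chosen here, not taken from the assignment.

```latex
\begin{aligned}
n &= 10, \qquad \bar{x} = \tfrac{1}{n}\sum_{i=1}^{n} x_i = 24.6, \qquad \bar{y} = \tfrac{1}{n}\sum_{i=1}^{n} y_i = 38 \\
S_{xy} &= \sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y}) = 1608, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i-\bar{x})^2 = 1688.4 \\
b_1 &= \frac{S_{xy}}{S_{xx}} = \frac{1608}{1688.4} = \frac{20}{21} \approx 0.9524, \qquad
b_0 = \bar{y} - b_1\bar{x} \approx 14.5714 \\
\hat{y} &= b_0 + b_1 x \approx 14.5714 + 0.9524\,x \\
\mathrm{SST} &= \sum_{i=1}^{n} (y_i-\bar{y})^2 = 1892, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} (y_i-\hat{y}_i)^2 = \mathrm{SST} - b_1 S_{xy} \approx 360.57 \\
\mathrm{MSE} &= \frac{\mathrm{SSE}}{n} \approx 36.06, \qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} \approx 0.8094 \\
\Delta\hat{y} &= b_1\,\Delta x = 2\,b_1 \approx 1.9048 \quad \text{for } \Delta x = 2 \text{ hours}
\end{aligned}
```

The last line is the answer to part d: the intercept cancels when two predictions are subtracted, so only the slope matters for a change in operation hours.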

hw7-u40ogi5d.docx housing-wj2crf5b.csv

Suraj · Accepted Answer

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Slope is 0.9523809523809523\n",
      "intercept term is 14.571428571428573\n",
      "Regression equation is 14.571428571428573 + 0.9523809523809523 x\n",
      "Mean square error is 36.05714285714286\n",
      "R-Square value is 0.8094231350045303\n",
      "The maintenance cost increases by 1.9048\n"
     ]
    }
   ],
   "source": [
    "#1\n",
    "#plain Python only: the theory part does not allow packages\n",
    "op_hr=[18,6,30,48,6,36,18,18,30,36]\n",
    "main_cost=[25,17,48,58,23,40,30,39,40,60]\n",
    "mean_op_hr=sum(op_hr)/len(op_hr)\n",
    "mean_main_cost=sum(main_cost)/len(main_cost)\n",
    "#slope = sum((x-mean_x)*(y-mean_y)) / sum((x-mean_x)**2)\n",
    "sum1=0\n",
    "for (i,j) in zip(op_hr,main_cost):\n",
    "    val=(i-mean_op_hr)*(j-mean_main_cost)\n",
    "    sum1=sum1+val\n",
    "sum2=0\n",
    "for i in op_hr:\n",
    "    val1=(i-mean_op_hr)**2\n",
    "    sum2=sum2+val1\n",
    "slope=sum1/sum2\n",
    "print(\"Slope is\",slope)\n",
    "intercept=mean_main_cost-slope*mean_op_hr\n",
    "print(\"intercept term is\",intercept)\n",
    "print(\"Regression equation is\",intercept,\"+\",slope,\"x\")\n",
    "#calculating mean square error\n",
    "pred=[]\n",
    "for i in op_hr:\n",
    "    predict=intercept+slope*i\n",
    "    pred.append(predict)\n",
    "error=0\n",
    "for (i,j) in zip(main_cost,pred):\n",
    "    val2=(i-j)**2\n",
    "    error=error+val2\n",
    "mse=error/len(main_cost)\n",
    "print(\"Mean square error is\",mse)\n",
    "#R-square calculation\n",
    "ss_total=0\n",
    "for i in main_cost:\n",
    "    val3=(i-mean_main_cost)**2\n",
    "    ss_total=ss_total+val3\n",
    "r_square=1-(error/ss_total)\n",
    "print(\"R-Square value is\",r_square)\n",
    "#Change in cost when operation hours increase by 2 hours:\n",
    "#the intercept cancels, so the change is slope*2\n",
    "increase=slope*2\n",
    "print(\"The maintenance cost increases by\",round(increase,4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "      CRIM    ZM  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \\\n",
       "0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296.0   \n",
       "1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242.0   \n",
       "2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242.0   \n",
       "3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222.0   \n",
       "4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222.0   \n",
       "\n",
       "   PTRATIO       B  LSTAT  MEDV  \n",
       "0     15.3  396.90   4.98  24.0  \n",
       "1     17.8  396.90   9.14  21.6  \n",
       "2     17.8  392.83   4.03  34.7  \n",
       "3     18.7  394.63   2.94  33.4  \n",
       "4     18.7  396.90   5.33  36.2  "
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "data=pd.read_csv(\"C:/Users/Hp/Downloads/housing.csv\")\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [],
   "source": [
    "#column 12 is LSTAT (explanatory variable), column 13 is MEDV (response variable)\n",
    "x=data.iloc[:,12:13]\n",
    "y=data.iloc[:,13:14]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [],
   "source": [
    "#modeling\n",
    "from sklearn.linear_model import LinearRegression\n",
    "lin_reg=LinearRegression()\n",
    "model=lin_reg.fit(x,y)\n",
    "pred=model.predict(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.5441462975864799\n"
     ]
    }
   ],
   "source": [
    "#r-square\n",
    "from sklearn.metrics import r2_score\n",
    "r2=r2_score(y,pred)\n",
    "print(r2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       ""
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png":

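The notebook export above is cut off at the plot cell, so parts 1.f to 1.h (RANSAC) are not visible. The following is only a minimal sketch of how they might be approached with `sklearn.linear_model.RANSACRegressor`, not the accepted answer's code. Because the CSV is not available here, it runs on synthetic stand-in data: the generating line (34.5 - 0.95x), the noise level and the injected outliers are all made-up assumptions, and the names `ransac`, `inlier_mask` and `r2_inliers` are chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor
from sklearn.metrics import r2_score

# Synthetic stand-in for LSTAT (x) and MEDV (y); NOT the real housing data.
rng = np.random.default_rng(0)
x = rng.uniform(2, 35, size=200).reshape(-1, 1)
y = 34.5 - 0.95 * x.ravel() + rng.normal(0, 2, size=200)
y[:20] += 25  # inject a few artificial outliers

# Settings requested in part 1.f: max_trials=100, min_samples=50.
# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions.
ransac = RANSACRegressor(LinearRegression(), max_trials=100,
                         min_samples=50, random_state=0)
ransac.fit(x, y)

# The final model is refit on the consensus (inlier) set.
slope = ransac.estimator_.coef_[0]
intercept = ransac.estimator_.intercept_

# Boolean masks separating inliers from outliers (useful for plotting, 1.g).
inlier_mask = ransac.inlier_mask_
outlier_mask = ~inlier_mask

# R2 computed on the inliers only (1.h).
r2_inliers = r2_score(y[inlier_mask], ransac.predict(x[inlier_mask]))

print("slope:", slope, "intercept:", intercept)
print("inliers:", inlier_mask.sum(), "outliers:", outlier_mask.sum())
print("R2 (inliers only):", r2_inliers)
```

With the real data, `x` and `y` would presumably be the LSTAT and MEDV columns selected in the notebook, and the two masks could be passed to separate scatter calls so that inliers and outliers get different colors.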

	CRIM	ZM	INDUS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT	MEDV
0	0.00632	18.0	2.31	0.538	6.575	65.2	4.0900	1	296.0	15.3	396.90	4.98	24.0
1	0.02731	0.0	7.07	0.469	6.421	78.9	4.9671	2	242.0	17.8	396.90	9.14	21.6
2	0.02729	0.0	7.07	0.469	7.185	61.1	4.9671	2	242.0	17.8	392.83	4.03	34.7
3	0.03237	0.0	2.18	0.458	6.998	45.8	6.0622	3	222.0	18.7	394.63	2.94	33.4
4	0.06905	0.0	2.18	0.458	7.147	54.2	6.0622	3	222.0	18.7	396.90	5.33	36.2
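
Parts 2.c to 2.f of the question (multivariate and degree-2 polynomial regression on LSTAT and RM) are not covered by the visible part of the answer. A hedged sketch of one common approach is shown here, using `LinearRegression` together with `PolynomialFeatures`. It uses synthetic stand-in arrays named after the columns in the preview above; the generating formula and its coefficients are invented purely so the example runs, and they say nothing about the real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

# Synthetic stand-ins for the LSTAT, RM and MEDV columns (NOT real data).
rng = np.random.default_rng(0)
n = 300
lstat = rng.uniform(2, 35, n)
rm = rng.uniform(4, 8.5, n)
# Hypothetical curved relationship, chosen only for illustration.
medv = 20 - 1.2 * lstat + 0.02 * lstat**2 + 4.0 * rm + rng.normal(0, 1.5, n)

X = np.column_stack([lstat, rm])  # two explanatory variables

# 2.c / 2.d: multivariate linear regression and its R2.
lin = LinearRegression().fit(X, medv)
r2_lin = r2_score(medv, lin.predict(X))

# 2.e / 2.f: degree-2 polynomial regression.
# With include_bias=False the expanded columns are, in order:
# LSTAT, RM, LSTAT^2, LSTAT*RM, RM^2.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
quad = LinearRegression().fit(X_poly, medv)
r2_quad = r2_score(medv, quad.predict(X_poly))

print("linear coefficients:", lin.coef_, "intercept:", lin.intercept_)
print("R2 (linear):", r2_lin)
print("quadratic coefficients:", quad.coef_, "intercept:", quad.intercept_)
print("R2 (degree 2):", r2_quad)
```

For part 2.g, raw coefficients are in different units for LSTAT and RM, so comparing their sizes is only meaningful once the features are on comparable scales; standardizing the inputs before fitting is one common way to do that.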