Q2: Simple linear regression with R² calculation (5 points)
Recall that the simple linear regression equation for an observation i is:
$$y_i = \beta_0 + \beta_1 x_{i1} + \epsilon_i, \quad \text{where } \epsilon_i \sim \mathcal{N}(0, \sigma^2).$$
In matrix form: $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$
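After some linear algebra (minimizing the sum of squared residuals), the least-squares estimate is:
$$\hat{\boldsymbol{\beta}} = (X^{\top}X)^{-1}X^{\top}\mathbf{y}$$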
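Filling in the "linear algebra" step: assuming $X^{\top}X$ is invertible (the columns of $X$ are linearly independent), minimize the sum of squared residuals and set its gradient with respect to $\boldsymbol{\beta}$ to zero:

$$
\begin{aligned}
S(\boldsymbol{\beta}) &= (\mathbf{y} - X\boldsymbol{\beta})^{\top}(\mathbf{y} - X\boldsymbol{\beta}) \\
\nabla_{\boldsymbol{\beta}} S &= -2X^{\top}(\mathbf{y} - X\boldsymbol{\beta}) = \mathbf{0} \\
X^{\top}X\boldsymbol{\beta} &= X^{\top}\mathbf{y} \\
\hat{\boldsymbol{\beta}} &= (X^{\top}X)^{-1}X^{\top}\mathbf{y}
\end{aligned}
$$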
Recall that X must include a leading column of ones (for the intercept $\beta_0$) for the algebra to work out:
$$X = \begin{bmatrix} 1 & x_{11} \\ 1 & x_{21} \\ \vdots & \vdots \\ 1 & x_{N1} \end{bmatrix}$$
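As a check on the formula, here is a minimal NumPy sketch on a tiny made-up dataset whose answer is known in advance: the four points lie exactly on $y = 1 + 2x$, so the estimate should be $[1, 2]$. The data and the names `beta_hat` and `beta_inv` are illustrative only. `np.linalg.solve` solves $X^{\top}X\boldsymbol{\beta} = X^{\top}\mathbf{y}$ directly, which is usually preferred numerically over forming the inverse; `np.linalg.inv` gives the same result here.

```python
import numpy as np

# Illustrative data lying exactly on y = 1 + 2x (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix: a column of ones (intercept) next to the x values, shape (4, 2)
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations X^T X beta = X^T y without an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate via the explicit inverse, as in the closed-form formula
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

print(X.shape)   # (4, 2)
print(beta_hat)  # close to [1, 2]
```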
Part 1: Create data and fit the model
The synthetic data should be in the following form:
$$y_i = -4 + 2.5\,x_{i1} + \epsilon_i, \quad \text{where } \epsilon_i \sim \mathcal{N}(0, \sigma^2).$$
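One way to generate data of this form is sketched below. The text does not fix the distribution of X or the noise level, so the choices here (X drawn uniformly from [0, 10), σ = 1, and a fixed seed via `np.random.default_rng`) are assumptions for illustration only. As a quick sanity check, `np.polyfit` (a least-squares polynomial fit, not the matrix method this exercise asks for) should recover a slope near 2.5 and an intercept near -4.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (assumption) for reproducibility
n = 1000
sigma = 1.0                      # noise standard deviation (assumed; not given in the text)

# Assumed design: x values drawn uniformly from [0, 10), shape (n, 1)
X_raw = rng.uniform(0.0, 10.0, size=(n, 1))

# y = -4 + 2.5 x + eps, with eps ~ N(0, sigma^2), also shape (n, 1)
eps = rng.normal(0.0, sigma, size=(n, 1))
y = -4.0 + 2.5 * X_raw + eps

# Sanity check with a degree-1 polynomial fit: returns [slope, intercept]
slope, intercept = np.polyfit(X_raw.ravel(), y.ravel(), 1)
print(slope, intercept)  # close to 2.5 and -4
```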
##START CODE HERE
## Import packages. You will need numpy and matplotlib.
import numpy as np
import matplotlib.pyplot as plt
##END CODE HERE
## Step 1: Create and visualize the dataset
##START CODE HERE
## Create n = 1000 random samples; X_raw should have shape (1000, 1)
n =
X_raw =
y =
##END CODE HERE
## Plot the generated data
##START CODE HERE
plt.figure()
plt.xticks()
plt.yticks()
plt.plot()
plt.show()
##END CODE HERE
## Step 2: Fit the line
##Step 2.1
##START CODE HERE
## Reconstruct X
X =  ## prepend a column of ones to X_raw, so X has shape (n, 2)
X.shape  ## check its shape to confirm that you have reconstructed X
##END CODE HERE
## Step 2.2: Calculate beta
##START CODE HERE
## Calculate beta using the formula above. You can use np.linalg.inv, np.dot and the .T attribute.
beta =
print(beta)
##END CODE HERE
## Step 2.3: Make predictions and plot the fitted line
##START CODE HERE
y_hat =
##END CODE HERE
##START CODE HERE
plt.figure(figsize=(10,5))
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.plot()  ## raw data (pass label=... so that plt.legend() has entries)
plt.plot()  ## fitted line
plt.legend()
plt.show()
##END CODE HERE
Part 2: Calculate R² (2.5 points)
The formulas needed for R² are given below. For more details, see:
https://en.wikipedia.org/wiki/Coefficient_of_determination
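$$SS_{\text{res}} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$
$$SS_{\text{tot}} = \sum_{i=1}^{N} (y_i - \bar{y})^2$$
$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$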
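For checking the loop-based function that follows, here is a vectorized NumPy sketch of the same three formulas; the name `r2_vectorized` is hypothetical. On the small made-up example, $\bar{y} = 2.5$, $SS_{\text{tot}} = 5$ and $SS_{\text{res}} = 1$, so $R^2 = 1 - 1/5 = 0.8$.

```python
import numpy as np

def r2_vectorized(y, y_fit):
    """R^2 = 1 - SS_res / SS_tot, computed with array operations instead of loops."""
    # Flatten so (n, 1) column vectors from Part 1 are accepted too
    y = np.asarray(y, dtype=float).ravel()
    y_fit = np.asarray(y_fit, dtype=float).ravel()
    assert len(y) == len(y_fit), 'y and y_fit should be of the same length'
    y_bar = y.mean()                       # (1/N) * sum of y_i
    ss_tot = np.sum((y - y_bar) ** 2)      # total sum of squares
    ss_res = np.sum((y - y_fit) ** 2)      # residual sum of squares
    return 1.0 - ss_res / ss_tot

# Worked example: y_bar = 2.5, SS_tot = 5, SS_res = 1
print(r2_vectorized([1, 2, 3, 4], [1, 2, 3, 5]))  # 0.8
```

Note that $SS_{\text{tot}}$ is zero when every $y_i$ is identical, so R² is undefined for constant data.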
##Create a function to calculate R^2
def r2(y, y_fit):
    ##START CODE HERE
    assert   , 'y and y_fit should be of the same length'  ## check that the lengths match
    N =  ## calculate the length of y
    y_bar =  ## calculate y_bar as required by the formula
    ##END CODE HERE

    ## Step 1: Calculate the total sum of squares SS_tot
    ##START CODE HERE
    ss = 0
    ## Create a loop over all observations that accumulates
    ## the squared difference between y and y_bar
    ##END CODE HERE

    ## Step 2: Calculate the residual sum of squares SS_res
    ##START CODE HERE
    sse = 0
    ## Create a loop over all observations that accumulates
    ## the squared difference between y and y_fit
    ##END CODE HERE

    ## Step 3: Calculate the R² score and return it
    ##START CODE HERE
    r_squared =  ## use the formula 1 - SS_res/SS_tot
    return r_squared
    ##END CODE HERE