Please review the attachment. It is a machine learning programming assignment on kernel regression, to be completed in a Jupyter notebook.
Kernel Regression

Given a training dataset $\{(x_i, y_i)\}_{i=1}^n$, kernel regression approximates the unknown nonlinear relation between $x$ and $y$ with a function of the form

$$y \approx f(x; w) = \sum_{i=1}^n w_i \, k(x, x_i),$$

where $k(x, x')$ is a positive definite kernel specified by the user, and $w = [w_i]_{i=1}^n$ is a set of weights. We will use the simple Gaussian radial basis function (RBF) kernel

$$k(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2h^2}\right),$$

where $h$ is a bandwidth parameter.

Step 1. Simulate a 1-dimensional dataset

In [3]:

```python
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

np.random.seed(100)

### Step 1: Simulate a simple 1D dataset ###
xTrain = np.expand_dims(np.linspace(-5, 5, 100), 1)  # 100*1
yTrain = np.sin(xTrain) + 0.5*np.random.uniform(-1, 1, size=xTrain.shape)  # 100*1

print('xTrain shape', xTrain.shape, 'yTrain shape', yTrain.shape)
plt.plot(xTrain, yTrain, '*')
plt.show()
```

Output: ('xTrain shape', (100, 1), 'yTrain shape', (100, 1)), followed by a scatter plot of the noisy sine data.

Now we have a dataset with 100 training data points. Let us calculate the kernel function.

Step 2. Kernel function

Your task is to complete the following rbf_kernel function, which takes two sets of points $X$ (of size $n$) and $X'$ (of size $m$) and the bandwidth $h$, and outputs their pairwise kernel matrix $K = [k(x_i, x'_j)]_{ij}$, which is of size $n \times m$. We represent input data as matrices, with $X = [x_i]_{i=1}^n \in \mathbb{R}^{n \times 1}$ denoting the input features and $Y = [y_i]_{i=1}^n \in \mathbb{R}^{n \times 1}$ the input labels.

In [4]:

```python
def rbf_kernel(X, Xp, h):
    """Calculate the kernel matrix between X and Xp."""
    # X: n*1 matrix
    # Xp: m*1 matrix
    # h: scalar value
    ## TODO: please calculate the kernel matrix below
    # (hint: you can write your own pairwise distance function, or use scipy.spatial.distance)
    K = ...
    return K  # n*m

### evaluation: if your implementation is correct, you should expect the output to be the 2x3 matrix
# [[0.60653066 1.         0.60653066]
#  [0.13533528 0.60653066 1.        ]]
k_test = rbf_kernel(np.array([[2],[1]]), np.array([[3],[2],[1]]), 1)
print(k_test)
```
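For reference, here is one way the TODO above could be filled in. This is a minimal sketch, assuming SciPy is available; it follows the hint by using scipy.spatial.distance.cdist rather than a hand-written pairwise distance function, and is not necessarily the intended solution.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(X, Xp, h):
    # X: n*1 matrix, Xp: m*1 matrix, h: scalar bandwidth
    sq_dists = cdist(X, Xp, 'sqeuclidean')  # n*m matrix of ||x_i - x'_j||^2
    K = np.exp(-sq_dists / (2 * h**2))      # Gaussian RBF kernel, applied entrywise
    return K  # n*m

# sanity check against the expected 2x3 output above
print(rbf_kernel(np.array([[2],[1]]), np.array([[3],[2],[1]]), 1))
```

On the test inputs this reproduces the expected matrix, since $\exp(-1/2) \approx 0.6065$ and $\exp(-2) \approx 0.1353$.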
", line 11 K = ... ^ SyntaxError: invalid syntax """ calcuating kernel matrix between X and Xp """ def rbf_kernel(X, Xp, h): # X: n*1 matrix # Xp: m*1 matrix # h: scalar value ## TODO: please calculate the kernel matrix in the following: # (hint: you can write your own pairwise distance function, or cipy.spatial.distance K = ... return K #n*m ### evaluation: if your implementation is correct, you should expect the output is a 2X3 # [[0.60653066 1. 0.60653066] # [0.13533528 0.60653066 1. ]] k_test = rbf_kernel(np.array([[2],[1]]), np.array([[3],[2],[1]]), 1) print(k_test) In [18]: Step 4. Kernel regression The weights are estimated by minimizing a regularized mean square error: where is the column vector formed by and K is the kernel matrix. Please derive the optimal solution of using matrix inverseion (no need to show the work) Complete the following function to implement the calculation of ?? ( ( − ?( ;?) ) + ? ??,min ? ∑ ?=1 ? ?? ?? ) 2 ?⊤ ? ? = [??] ? ?=1 ? ? In [19]: Step 5. Evaluation and Cross Validation We now need to evaluate the algorithm on the testing data and select the hyperparameters (bandwidth and regularization coefficient) using cross validation 2.0 (100, 1) from scipy.spatial import distance def median_distance(X): # X: n*1 matrix #TODO: Calculate the median of the pairwise distance of $X$ below #(hint: use '[dist[i, j] for i in range(len(X)) for j in range(len(X)) if i != j]' t h = ... return h ### Test your functions #evaluation: if your implementation is correct, your answer should be [2.0] h_test = median_distance(np.array([[1],[2],[4]])) print(h_test) def kernel_regression_fitting(xTrain, yTrain, h, beta=1): # X: input data, numpy array, n*1 # Y: input labels, numpy array, n*1 # TODO: calculate W below (it is a n*1 matrix) W = ... return W ### evaluating your code, the shape should be (100, 1) (check the values yourself) h = median_distance(xTrain) W_test = kernel_regression_fitting(xTrain, yTrain, h) print(W_test.shape) In [6]: Step 5.1. Impact of bandwith Run the kernel regression with regularization coefficient and bandwidth . Task: Show the curve learned by different . Comment on how influences the smoothness of . ? = 1 ℎ ∈ {0.1 , , 10 }ℎ??? ℎ??? ℎ??? ℎ ℎ ℎ --------------------------------------------------------------------------- NameError Traceback (most recent call last) in () 19 beta = 1. 20 # calculating bandwith ---> 21 h_med = median_distance(xTrain) 22 yHatk = kernel_regression_fit_and_predict(xTrain, yTrain, xTest, h_med, beta) 23 NameError: name 'median_distance' is not defined # Please run and read the following base code def kernel_regression_fit_and_predict(xTrain, yTrain, xTest, h, beta): #fitting on the training data W = kernel_regression_fitting(xTrain, yTrain, h, beta) # computing the kernel matrix between xTrain and xTest K_xTrain_xTest = rbf_kernel(xTrain, xTest, h) # predict the label of xTest yPred = np.dot( K_xTrain_xTest.T, W) return yPred # generate random testing data xTest = np.expand_dims(np.linspace(-6, 6, 200), 1) ## 200*1 beta = 1. # calculating bandwith h_med = median_distance(xTrain) yHatk = kernel_regression_fit_and_predict(xTrain, yTrain, xTest, h_med, beta) # we also add linear regression for comparision from sklearn.linear_model import LinearRegression lr = LinearRegression() lr.fit(xTrain, yTrain) yHat = lr.predict(xTest) # prediction # visulization plt.plot(xTrain, yTrain, '*') plt.plot(xTest, yHat, '*') plt.plot(xTest, yHatk, '-k') plt.show() In [29]: Step 5.2. 
Step 5. Evaluation and Cross Validation

We now need to evaluate the algorithm on the testing data and select the hyperparameters (bandwidth and regularization coefficient) using cross validation.

In [6]:

```python
# Please run and read the following base code
def kernel_regression_fit_and_predict(xTrain, yTrain, xTest, h, beta):
    # fitting on the training data
    W = kernel_regression_fitting(xTrain, yTrain, h, beta)
    # computing the kernel matrix between xTrain and xTest
    K_xTrain_xTest = rbf_kernel(xTrain, xTest, h)
    # predicting the labels of xTest
    yPred = np.dot(K_xTrain_xTest.T, W)
    return yPred

# generate testing data
xTest = np.expand_dims(np.linspace(-6, 6, 200), 1)  # 200*1

beta = 1.
# calculating the bandwidth
h_med = median_distance(xTrain)
yHatk = kernel_regression_fit_and_predict(xTrain, yTrain, xTest, h_med, beta)

# we also add linear regression for comparison
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(xTrain, yTrain)
yHat = lr.predict(xTest)  # prediction

# visualization
plt.plot(xTrain, yTrain, '*')
plt.plot(xTest, yHat, '*')
plt.plot(xTest, yHatk, '-k')
plt.show()
```

(The NameError traceback in the original run simply means the median_distance cell had not been executed yet; run the cells in order.)

Step 5.1. Impact of bandwidth

Run the kernel regression with regularization coefficient $\beta = 1$ and bandwidth $h \in \{0.1\,h_{med},\ h_{med},\ 10\,h_{med}\}$.

Task: Show the curves learned with the different $h$. Comment on how $h$ influences the smoothness of $f$. (A possible completion of the TODO appears in the first sketch after Step 5.2 below.)

In [29]:

```python
### fitting on the training data ###
beta = 1.
plt.figure(figsize=(12, 4))
for i, coff in enumerate([0.1, 1., 10]):
    plt.subplot(1, 3, i+1)
    ### TODO: run kernel regression with bandwidth h = coff * h_med
    yHatk_i = ...

    # visualization
    plt.plot(xTrain, yTrain, '*')
    plt.plot(xTest, yHat, '*')
    plt.plot(xTest, yHatk_i, '-k')
    plt.title('bandwidth {} x h_med'.format(coff))
plt.show()
```

Step 5.2. Cross Validation (CV)

Use 5-fold cross validation to find the optimal combination of $h$ and $\beta$ within $h \in \{0.1\,h_{med},\ h_{med},\ 10\,h_{med}\}$ and $\beta \in \{0.1, 1\}$.

Task: Complete the code of the cross validation and find the best $h$ and $\beta$. Plot the curve fit with the optimal hyperparameters.

In [41]:

```python
best_beta, best_coff = 1., 1.
best_mse = 1e8
for beta in [0.1, 1]:
    for coff in [0.1, 1., 10.]:
        # 5-fold cross validation
        max_fold = 5
        mse = []
        for i in range(max_fold):
            ## TODO: calculate the indices of the training/validation partition within the 5 folds
            # (hint: set trnIdx to the indices with idx % max_fold != i, and testIdx to the rest)
            trnIdx = ...
            testIdx = ...
            i_xTrain, i_yTrain = xTrain[trnIdx], yTrain[trnIdx]
            i_xValid, i_yValid = xTrain[testIdx], yTrain[testIdx]

            ## TODO: run kernel regression on (i_xTrain, i_yTrain) and calculate the mean
            ## squared error on the held-out fold
            h = ...
            i_yPred = ...
            mse.append((i_yValid - i_yPred)**2)
        mse = np.mean(mse)
        # keep track of the combination with the best MSE
        if mse < best_mse:
            best_beta, best_coff = beta, coff
            best_mse = mse
```

Reported output: ('Best beta', 1, 'Best bandwidth', '0.1*h_med', 'mse', 0.11166229355896191)
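For the Step 5.1 TODO, the missing line is just a call to the base code from Step 5; a sketch, assuming that is all the assignment intends:

```python
# inside the loop over coff, using bandwidth h = coff * h_med:
yHatk_i = kernel_regression_fit_and_predict(xTrain, yTrain, xTest, coff * h_med, beta)
```

Typically a small bandwidth (0.1 h_med) yields a wiggly curve that chases the noise, while a large one (10 h_med) oversmooths toward a nearly flat fit; comment on what you actually observe in your plots.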
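And one way the Step 5.2 TODOs could be completed. This sketch follows the hint's idx % max_fold partition and, as an assumption on my part, reuses the full-training h_med inside each fold so that the selected bandwidth reports as a multiple of h_med:

```python
import numpy as np

best_beta, best_coff = 1., 1.
best_mse = 1e8
for beta in [0.1, 1]:
    for coff in [0.1, 1., 10.]:
        max_fold = 5
        mse = []
        for i in range(max_fold):
            idx = np.arange(len(xTrain))
            trnIdx = idx[idx % max_fold != i]   # training folds
            testIdx = idx[idx % max_fold == i]  # held-out fold
            i_xTrain, i_yTrain = xTrain[trnIdx], yTrain[trnIdx]
            i_xValid, i_yValid = xTrain[testIdx], yTrain[testIdx]

            h = coff * h_med  # bandwidth as a multiple of the median distance
            i_yPred = kernel_regression_fit_and_predict(i_xTrain, i_yTrain, i_xValid, h, beta)
            mse.append((i_yValid - i_yPred)**2)
        mse = np.mean(mse)
        # keep track of the combination with the best MSE
        if mse < best_mse:
            best_beta, best_coff = beta, coff
            best_mse = mse

print('Best beta', best_beta, 'Best bandwidth', '{}*h_med'.format(best_coff), 'mse', best_mse)
```

After selecting best_beta and best_coff, refit on the full training set with h = best_coff * h_med and reuse the plotting code from Step 5.1 to show the final curve.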