only need to create functions, not a very lengthy process.
Now, on to the programming:

In class, we discussed classification using Max Likelihood and using Max Posterior. For this assignment, you will create the Max Posterior/Bayes classifier to label online social network users based on their posting history. Specifically, you will determine if each user is based in Cairo, Frankfurt, Philadelphia, or Seoul. You will make this determination based on a single feature x: the time the user most frequently posts online.

Accessing our data

The file hw1data.mat is available on our website (and on erdos, using cp ~dleeds/MLpublic/hw1data.mat .). Load this file into your Python session to get access to the trainData and testData numpy arrays. In each array, each row is one example data point. The first column represents the user class (0 for Cairo, 1 for Frankfurt, 2 for Philadelphia, and 3 for Seoul) and the second column represents the corresponding postingTime (most common posting time) for the example data point (user). Note that postingTime is determined based on the current time in New York City. Also, time is recorded on the 24-hour clock, where 0 is midnight, 430 is 4:30am, and 1750 is 5:50pm.

Programming assignments:

1. Inspect the distribution of the postingTime feature for each class and determine if it follows a Gaussian or a Uniform distribution. (Note, uniform was shown earlier in Lecture 1.) Record this result as a comment in hw1.py. You can inspect the distribution of values in a list/vector of numbers, vector, through a histogram:

    import matplotlib.pyplot as plt
    plt.hist(vector)
    plt.show()

Regardless of our results from question 1, we will assume all distributions really are Gaussian for the rest of this assignment.

2. Write a function called learnParams that takes in a data set and returns the learned mean and standard deviation for each class.
Specifically, the function will be called as:

    params = learnParams(Data)

where Data is a numpy array with shape (N,2), N is the number of data points, and params is a numpy array with shape (M,2) for M classes; params[i,0] is the mean for class i and params[i,1] is the standard deviation for class i. For example,

    learnParams(np.array([[0,200],[1,1500],[0,300],[1,1700],[0,400],[1,1300]]))

would return np.array([[300,100],[1500,200]]). (These values match the sample standard deviation, i.e. np.std with ddof=1.)

3. Write a function called learnPriors that takes in a data set and returns the prior probability of each class. Specifically, the function will be called as:

    priors = learnPriors(Data)

where Data is a numpy array with shape (N,2), N is the number of data points, and priors is a numpy array with shape (M) for M classes; priors[i] is the estimated prior probability for class i. For example,

    learnPriors(np.array([[0,200],[1,1500],[0,300],[1,1700],[0,400],[1,1300]]))

would return np.array([0.5,0.5]).

4. Write a function called labelBayes that takes in posting times for multiple users as well as the learned parameters for the likelihoods and priors, and returns the most probable class for each user. Specifically, the function will be called as:

    labelsOut = labelBayes(postTimes,paramsL,priors)

where postTimes is a numpy array of shape (K) containing post times for K users, paramsL is a numpy array with shape (M,2) matching the description of the output for learnParams, and priors is a numpy array with shape (M) matching the description of the output for learnPriors; labelsOut is a numpy array with shape (K) containing the most probable label for each user, where labelsOut[j] corresponds to postTimes[j]. Labels are computed using the Gaussian Bayes classifier! For example,

    labelBayes(np.array([430,2110,845]), np.array([[300,100],[1500,250]]), np.array([0.2,0.8]))

would return np.array([0,1,1]).

5.
Write a function called evaluateBayes that takes in classifier parameters for likelihoods and priors, and a set of labels and feature values, and returns the fraction of input data correctly classified. Specifically, the function will be called as:

    accuracy = evaluateBayes(paramsL,priors,testData)

where paramsL is a numpy array with shape (M,2) matching the description of the output for learnParams, priors is a numpy array with shape (M) matching the description of the output for learnPriors, and testData is a numpy array with shape (J,2) where testData[j,0] contains the label of data point j and testData[j,1] contains the feature value (posting time) for data point j; accuracy is a number between 0 and 1 indicating the accuracy of the Gaussian Bayes classifier using the specified parameters on the specified input data set. For example,

    evaluateBayes(np.array([[300,100],[1500,200]]), np.array([0.2,0.8]), np.array([[0,430],[1,2110],[0,845]]))

would return approximately 0.6667 (2 of the 3 data points are labeled correctly).

6. Our definition for time-of-day for user posting is not truly linear or continuous. 859 (8:59am) is followed by 900 (9:00am), skipping the integers 860, 861, through 899. 2359 (11:59pm) is followed by 0 (midnight): it is much closer in time to midnight than it is to 2030 (8:30pm), while the integer 2359 is much closer to 2030 than it is to 0. Rewrite either learnParams (from question 2) or labelBayes (from question 4) to more naturally reflect the circular nature of the clock, and to account for skips in integers for each hour. Call this function learnParamsClock or labelBayesClock. Explain the reasoning of your approach in a comment in your function. There are many reasonable ways to answer this question!

testData2:[1200x2 double array]
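
The clock issue raised in question 6 can be made concrete with a little arithmetic. The sketch below is only an illustration, not a required or complete answer: the helper names hhmm_to_minutes and circular_gap are hypothetical, and it assumes posting times are integers in the HHMM form described under "Accessing our data". It converts HHMM values to minutes after midnight (which removes the skipped integers) and measures the gap between two times around a 24-hour circle.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in one trip around the clock


def hhmm_to_minutes(hhmm):
    """Convert a 24-hour HHMM integer (e.g. 1750) to minutes after midnight.

    Hypothetical helper: assumes hhmm is a valid time such as 0, 430, or 2359.
    """
    hours, minutes = divmod(int(hhmm), 100)
    return hours * 60 + minutes


def circular_gap(hhmm_a, hhmm_b):
    """Shortest gap in minutes between two HHMM times on a circular clock."""
    diff = abs(hhmm_to_minutes(hhmm_a) - hhmm_to_minutes(hhmm_b))
    # Going the other way around the clock may be shorter.
    return min(diff, MINUTES_PER_DAY - diff)


print(hhmm_to_minutes(859), hhmm_to_minutes(900))  # 539 540
print(circular_gap(2359, 0))     # 1
print(circular_gap(2359, 2030))  # 209
```

On the raw integers the same pairs differ by 41, 2359, and 329, so a Gaussian fit directly to the HHMM values treats 11:59pm and midnight as far apart. How (or whether) to fold a conversion like this into learnParamsClock or labelBayesClock is left to your own design.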