Python Exam
ECE 20875: Python for Data Science
Fall 2020 Exam #3
Chris Brinton and Qiang Qiu

You have 90 minutes to complete the 9 questions on this exam (the first question is the honor pledge, so it is really only 8 questions). The exam was written for about 60 minutes, so you should have more than enough time, but still be sure to pace yourself. The exam is worth 100 points total, and the points for each question are reflected next to the question.

You can either print out this exam and write your answers on it, or you can write your answers on blank sheets of paper. When you are finished, you will use the Gradescope app or a scanner to upload your answers. The printing and uploading times are not part of this 80 minutes.

Also keep in mind the following rules:
1. For each answer, you must show all of your work to receive credit.
2. Under NO circumstances are you permitted to communicate with ANYONE in ECE 20875 on the day of the exam.
3. You are NOT allowed to use any programming tools (e.g., calculator) or run any Python/bash scripts (e.g., Python, Jupyter notebook, PyCharm, bash, etc.) while taking the test.
4. The exam is open notes, so you may use the materials on the course webpage http://www.cbrinton.net/ECE20875-2020-Fall.html or your own notes.

Question 1: Honor Pledge and Acknowledgment
Please sign the honor pledge below by uploading a signature of your full legal name.
I understand and acknowledge the above instructions and notes. I also affirm that the answers given on this test are mine and mine alone. I did not receive help from any person or material (other than those explicitly allowed).
X___________________________________________

Question 2: k-Means Clustering [13 pts]
Below is a graph of two-dimensional datapoints which have been clustered into three groups (blue/square, yellow/triangle, red/circle) by running the k-Means algorithm.

A) Compute and plot the centroids of each cluster on the graph above.
How can you tell the k-Means algorithm has completed its execution? Explain in 1-2 sentences. (5 pts)
- +1 Centroid of blue squares: (2, 10)
- +1 Centroid of red circles: (10, 12)
- +1 Centroid of yellow triangles: (11, 5)
- +2 You can tell the algorithm has run to completion because the distance from every point to its assigned centroid has been minimized, and no point in one group is nearer to the centroid of a different group.

B) Consider a new datapoint $(x, y) = (9, 8)$. Which of the existing clusters will this datapoint be assigned to, according to the k-Means assignment step? (4 pts)
- +2 belongs to yellow triangles
- +2 correct solving method
To make this determination, calculate the Euclidean distance between the new point and each centroid. The point is closest to the yellow triangles centroid, at a distance of $\sqrt{13} \approx 3.6$, compared with $\sqrt{17} \approx 4.1$ for the red circles and $\sqrt{53} \approx 7.3$ for the blue squares.

C) In the plot above, $k = 3$ was chosen based on the elbow/knee method. Knowing this, give a rough sketch below of what the sum of squared distances from datapoints to their assigned cluster would be as a function of $k$. (4 pts)
- +2 for monotonic decreasing
- +2 for k = 3 near elbow/knee
[Sketch: decreasing curve with the elbow/knee labeled K = 3]

Question 3: Gaussian Mixture Models [15 pts]
We collect several one-dimensional data points that we are interested in clustering. Using expectation maximization, we estimated the Gaussian Mixture Model (GMM) to be

$$p(x) = \pi_1 \cdot \mathcal{N}(x \mid \mu_1, \sigma_1^2) + \pi_2 \cdot \mathcal{N}(x \mid \mu_2, \sigma_2^2) + \pi_3 \cdot \mathcal{N}(x \mid \mu_3, \sigma_3^2)$$

where
$\pi_1 = 0.25$, $\mu_1 = -2$, $\sigma_1^2 = 1$
$\pi_2 = 0.30$, $\mu_2 = 0$, $\sigma_2^2 = 1$
$\pi_3 = 0.45$, $\mu_3 = 2$, $\sigma_3^2 = 1$

A) The plot below contains the graph of $p(x)$. On this sample plot, give a rough sketch of each weighted component that comprises $p(x)$, i.e., $\pi_1 \cdot \mathcal{N}(x \mid \mu_1, \sigma_1^2)$, $\pi_2 \cdot \mathcal{N}(x \mid \mu_2, \sigma_2^2)$, and $\pi_3 \cdot \mathcal{N}(x \mid \mu_3, \sigma_3^2)$. Be sure to emphasize the centers, relative heights, and shapes of each cluster.
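As a numerical companion to the sketch requested in part A, the weighted components can be evaluated directly. The following is a minimal sketch, assuming only Python's standard library; the names `COMPONENTS`, `normal_pdf`, `weighted_components`, and `mixture_density` are hypothetical helpers introduced here for illustration and are not part of the exam. It uses the parameters stated above (weights 0.25, 0.30, 0.45; means -2, 0, 2; unit variances).

```python
import math

# GMM parameters as stated in the question: (weight, mean, variance).
# The tuple layout is an illustrative choice, not something the exam prescribes.
COMPONENTS = [
    (0.25, -2.0, 1.0),
    (0.30, 0.0, 1.0),
    (0.45, 2.0, 1.0),
]


def normal_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian N(x | mean, variance)."""
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)


def weighted_components(x):
    """Return the three weighted terms weight_k * N(x | mean_k, variance_k) at x."""
    return [w * normal_pdf(x, m, v) for (w, m, v) in COMPONENTS]


def mixture_density(x):
    """Mixture density p(x): the sum of the weighted components at x."""
    return sum(weighted_components(x))


if __name__ == "__main__":
    # Height of each weighted component at its own center, next to p(x) there.
    for k, (w, m, v) in enumerate(COMPONENTS, start=1):
        peak = w * normal_pdf(m, m, v)
        print(f"component {k}: center {m:+.0f}, peak {peak:.3f}, "
              f"p(center) {mixture_density(m):.3f}")
```

Because all three variances are equal, the component peaks scale with the weights (about 0.100, 0.120, and 0.180), and no weighted component rises above p(x) at any point.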
(4 pts)
-2 pts for incorrect center or height
-1 pt for components higher than p(x)
-1 pt for inconsistent variances (all three components have the same variance)
-1 pt for not labeling centers

B) Suppose we plan to collect a new datapoint, but do not have any information about its value yet. If we have to predict which cluster the new datapoint will belong to, what is our best guess? Explain your answer in 1-2 sentences. (4 pts)
Cluster 3: with no information about x, we predict from the highest component weight ($\pi_3 = 0.45$).
-2 pts for incorrect prediction
-2 pts for incorrect explanation

C) We collect the new datapoint and find its value is $x = -1$. Without further updating the parameters of the GMM, which cluster will the datapoint be assigned to? Explain in terms of the likelihoods $\gamma_{kj}$ of this datapoint $j$ in each cluster $k$. (4 pts)
Cluster 2. $\gamma_{2j} > \gamma_{1j}$ because $\pi_2 > \pi_1$ and $\mathcal{N}(-1 \mid 0, 1) = \mathcal{N}(-1 \mid -2, 1)$. $\gamma_{2j} > \gamma_{3j}$ because $\mathcal{N}(-1 \mid 0, 1) \gg \mathcal{N}(-1 \mid 2, 1)$. The answer can also be read from the heights of the three curves at $x = -1$.
-2 pts for incorrect prediction
-2 pts for incorrect explanation

D) If we add the datapoint from C to our dataset and execute one more iteration of EM to optimize the GMM parameters, how would the weight of each Gaussian component change? (3 pts)
(a) $\pi_1'$ and $\pi_2'$ would increase, $\pi_3'$ would decrease
(b) $\pi_1'$ and $\pi_3'$ would decrease, $\pi_2'$ would increase
(c) $\pi_1'$ and $\pi_2'$ would decrease, $\pi_3'$ would increase
(d) $\pi_2'$ and $\pi_3'$ would increase, $\pi_1'$ would decrease
(e) $\pi_1'$ and $\pi_3'$ would increase, $\pi_2'$ would decrease
Answer: (a). Weights 1 and 2 would increase, and weight 3 would decrease.
-1.5 pts for choosing (b)
-3 pts for choosing (c), (d), (e)

Question 4: Inheritance [12 pts]
The Indianapolis Zoo wants to make an interactive tool where visitors can learn about different types of snakes in their exhibit virtually. They have decided to use Python to write their program and need to catalog snakes in their exhibit before they can release it.
They have created two classes: (1) the Snake class, and (2) the Python class, which inherits some characteristics of Snake and adds a new attribute, morph.

A) Write the code for the initialization method of the Python class below. Your solution must use the super() function.
B) Using what you know from part A, write the initialization function for the Snake class below. You must fill in the input arguments as well as the body of the function.

    class Python(Snake):
        def __init__(self, length, weight, name, morph):
            #your code here
            super().__init__(length, weight, name)
            self.morph = morph

        def getName(self):
            print("Python name: {}".format(self.name))
            return self.name

+2 points correct use of super()
+2 points the code works

    class Snake:
        def __init__(self, length, weight, name):
            #your code here
            self.length = length
            self.weight = weight
            self.name = name

        def getName(self):
            print("Snake name: {}".format(self.name))
            return self.name

+1 point correct parameters
+1 point for each assignment statement

C) What will the lines of code below print? Write your answer underneath the associated call. (4 pts)

    wirt = Python(30, 2, "Wirt", "bel")
    wirt.getName()

(2 pt) "Python name: Wirt"

    greg = Snake(70, 10, "Greg")
    greg.getName()

(2 pt) "Snake name: Greg"

Question 5: Iterators and Generators [13 pts]
A) Recall from Exam #1 how to calculate the Fibonacci series: the first two numbers of the series are always equal to 1, and each consecutive number is the sum of the last two numbers. For example, the first six Fibonacci numbers are 1, 1, 2, 3, 5, 8. Below is skeleton code to generate and print the Fibonacci numbers.

    import numpy as np

    def fib_series():
        a = 1
        b = 1
        while 1:
            #Your