1) Machine Learning is:
A. Optimizing a piece of code to recognize patterns in data
B. Useful in data-driven analysis
C. Grouping data to reveal clusters, if any
D. Applied in web searches
2) Which one of the following is not a category of machine learning algorithms?
A. Supervised
B. Unsupervised
C. Semi-Supervised
D. Descriptive
E. Reinforcement
3) (T/F) In Unsupervised Machine Learning, labeled data is useful in training a model so that unknown values can be given a label (category)
4) Using past students’ study times and grades to predict an incoming student’s grade is a problem that can be solved with one or more of these ML techniques:
A. Regression
B. Neural Networks
C. Naïve Bayes
D. K-Means
E. KNN
5) (T/F) Unsupervised Learning is sometimes also referred to as Clustering
6) Test data is used for:
A. Generating new data
B. Identifying issues with the data
C. Validating the model
D. Writing a hypothesis
7) (T/F) An estimator is a function from sample data to some estimand, such as a value of a parameter
8) (T/F) Prediction is the common theme between the disciplines of Statistics and Machine Learning
9) Probability Distribution provides:
A. Knowledge of the properties of distribution families
B. Usefulness in analyzing data
C. Assignment of probability values to data points in an event
D. A way to assume anything about the data
10) A classifier algorithm will help to:
A. Correlate input data to a class category
B. Predict the continuous values of a variable
C. Define the underlying relationship between only two variables
D. All of the above
11) Naïve Bayes classifier is based on:
A. Markov model
B. KNN method
C. Neural Network
D. Bayes Theorem
12) The assumption of conditional independence refers to:
A. Independently working on solving problems
B. Creating a Data Model with categorical variables
C. Creating a Data Model with only Binary variables
D. Non-dependence or mutual exclusivity of two events
13) (T/F) In the k-means clustering algorithm, a subjective choice is made for the value of K
14) (T/F) K-means and KNN are both examples of clustering algorithms
15) ANN is a type of:
A. Data Model
B. Data Structure
C. Machine Learning Algorithm
D. Unsupervised machine learning algorithm
16) Which Python library/package did you use to encode categorical variables as numeric values?
A. Matplotlib
B. Sklearn
C. Numpy
D. LabelEncoder
17-19) Write Python code or pseudocode that performs the following tasks:
a) Import the relevant libraries, including the machine learning package sklearn
b) Create an array of two variables (features: Size & Color) and one label (Sales):
Size: Small, Medium, Large
Color: Red, Green, Blue, Orange
Sales: Low, Medium, High
c) Populate the training data with the values seen below
Size    Color   Sales
Small   Red     Medium
Small   Blue    Low
Medium  Orange  High
Large   Blue    High
Large   Red     Low
Small   Blue    High
Medium  Red     High
Small   Blue    Medium
Large   Orange  Low
Small   Blue    High
d) Invoke a KNN classifier and predict the sales level when size is Small and color is Blue
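One possible way to approach tasks a) through d) is sketched below. It is an illustration under stated assumptions, not a model answer: the variable names are hypothetical, each feature column is encoded with sklearn's LabelEncoder fitted on the training values only, and n_neighbors=5 is an arbitrary choice, since the task does not specify K.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Training data copied from the table in part (c), one list per column.
sizes = ["Small", "Small", "Medium", "Large", "Large",
         "Small", "Medium", "Small", "Large", "Small"]
colors = ["Red", "Blue", "Orange", "Blue", "Red",
          "Blue", "Red", "Blue", "Orange", "Blue"]
sales = ["Medium", "Low", "High", "High", "Low",
         "High", "High", "Medium", "Low", "High"]

# Encode each categorical feature as integers (assumption: one encoder per column).
size_encoder = LabelEncoder().fit(sizes)
color_encoder = LabelEncoder().fit(colors)
X_train = np.column_stack([
    size_encoder.transform(sizes),
    color_encoder.transform(colors),
])

# Fit a KNN classifier; K = 5 is an assumed value, not given in the task.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, sales)

# Encode the query (Size = Small, Color = Blue) the same way and predict.
query = np.column_stack([
    size_encoder.transform(["Small"]),
    color_encoder.transform(["Blue"]),
])
predicted_sales = knn.predict(query)[0]
print(predicted_sales)
```

Integer encoding imposes an arbitrary ordering on nominal categories such as Color, so the distances KNN computes between different colors have no real meaning. One-hot encoding is a common alternative when that matters.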
20) Briefly describe an application in which you would use a machine learning algorithm:
- Write a problem statement
- Method recommended
- Steps involved and analysis