Need assignment solution and R file
CS688 Assignment 2: Machine Learning & Data Vis

Please follow the submission requirements at the end of the assignment! Make sure you complete both pages of questions.

Part A: (80 points)

** Important: Enable your facilitator to reproduce your results. Because some classifiers break ties by random selection, you should set the random seed at appropriate places in your code. Be sure to test your code to ensure that it gives the same result EVERY TIME you re-run it. If your facilitator cannot replicate your results, it is highly unlikely that you will be awarded full credit.

1. (5 points) Load the provided file “creditcardspend.csv” into R (read.csv) or Python (csv.reader).
2. (15 points) Select a supervised learning classifier (you can use one demonstrated in the live class or any other classifier of your choosing) to determine whether features in the data can be used to predict which Buyer used the credit card. Run (train) it on the training data (rows with 0 in the column named “Validation”). Then measure the classifier’s recall, precision, F-score, and accuracy on the test data (rows with 1 in the column named “Validation”).
   ** You will need to pick one of the Buyers to be the ‘positive’ class and the other Buyer to be the ‘negative’ class. In this case the choice is not very important, but be sure to document your selection.
3. (25 points) Consider whether feature generation and/or choosing different values for hyperparameters may affect one of the measures of classifier performance. Take steps that you believe might improve the classifier’s performance. Explain why you took these steps and what you learned. Full credit requires you to either (a) explore several different values for two or more hyperparameters or (b) explore feature generation AND several different values for one or more hyperparameters.
4. (5 points) Which configuration of the classifier are you happiest with? Explain why.
5. (15 points) Determine how to write a file to disk that contains your preferred hyperparameter settings and the features you selected and/or generated. Clear or reset these variables in your R or Python session. Then reload the settings and features from disk, reset your random seed, and re-run your classifier. Show that this produces the same solution as the one you arrived at earlier.
6. (15 points) Suppose you receive the following transaction: Card 3, Category 1, Amount $135.86, but the Buyer is unknown. Use your classifier to predict the Buyer (is it Buyer 0 or Buyer 1?). What type of learning is this: supervised, unsupervised, or semi-supervised? Explain.

Part B: (20 points)

Obtain a data set from the web on a topic that interests you and that you believe contains insights you could share in a data visualization. You might choose a table of player statistics in your favorite sport (games started, minutes played, fouls, goals, etc.). Or you might consider the spread of a disease over time. Perhaps you might look at the value of an investment, or where the average person spends their money.

1. (3 points) Load the data into R or Python. (Also be sure to provide a valid URL or upload a copy of the data with your submission!)
2. (12 points) Create five data visualizations of your choice that highlight interesting elements in the data you chose. You must use at least two different types of plots (examples: scatter plots, bar plots, line graphs, etc.).
3. (5 points) Discuss the principles of data visualization and any actions you took to ensure your data visualizations followed those principles. (If you did not take any actions to follow data visualization principles, defend this choice.)

SUBMISSION REQUIREMENTS:
· Create a Word, PDF, or Rmd document. If you use Rmd, make sure to save the output as a PDF.
· For each question, state the question you are answering. Then answer it by explaining in sentences (in English, not in R or other languages) what you did to get to the answer. You may include screenshots and/or copy-paste key lines of code and the corresponding output in your answer. (If you are using Rmd, this means you must generally use echo=FALSE and/or include=FALSE for the body of the document.)
· For Part A, full code should be included as an Appendix to your Word or PDF document. Part A may be coded in R or Python. Do NOT include full code in the main part of your document.
· For Part B, coding must be in R, and because of the nature of this question, please DO provide the full code in the main part of your document.
· Please ensure that a Word or PDF file is the first file in your submission.
· You may also separately upload your R and/or Rmd code to Blackboard.
· If your facilitator tells you to submit the files differently than the above guidelines, you are expected to respect your facilitator’s wishes starting with the next assignment.
· Facilitators can deduct up to 20% if you fail to follow these requirements (more if the questions are not actually answered).
· Facilitators can deduct 5% for each day the assignment is late. You may submit one (and only one) of the six assignments up to three days late with no penalty, but all other late assignments will be penalized.
· Unless your facilitator or the professor agrees, your assignment will not be graded if it is more than 3 days late (e.g., no credit will be given after Friday at 6 AM Boston time). The professor will usually ask the facilitator to make the decision, but in rare cases (<1% of the time) has overridden a facilitator. Do not expect the professor to override a facilitator’s decision.
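Part A item 2 asks for recall, precision, F-score, and accuracy once one Buyer has been designated the positive class. The following is a minimal sketch that computes all four from paired true and predicted labels. It uses the standard confusion-matrix definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F-score = 2PR / (P + R), and accuracy = (TP + TN) / N. It assumes Buyer 1 is the positive class. The function name `classification_metrics` is hypothetical, and in practice a library routine in R or Python would usually do this job.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F-score and accuracy for binary labels.

    `positive` is the Buyer treated as the positive class (assumed to be 1);
    any other label counts as negative.
    """
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly flagged the positive Buyer
        elif predicted == positive:
            fp += 1  # flagged positive, actually negative
        elif actual == positive:
            fn += 1  # missed a positive
        else:
            tn += 1  # correctly left as negative
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}


if __name__ == "__main__":
    # Toy labels for illustration only, not results from creditcardspend.csv.
    print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

For the toy labels, TP = 2, FP = 1, FN = 1, and TN = 1. Precision, recall, and F-score are therefore all 2/3, and accuracy is 0.6. Passing `positive=0` shows how the first three numbers change when the other Buyer is chosen as positive, which is why the assignment asks you to document that choice.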
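Part A item 5 asks you to write the chosen hyperparameters and features to disk, clear them, reload them, and reseed. The following is a minimal sketch of that round trip using Python's standard `json` module. The helper names, the file name `cs688_settings.json`, and the values `k = 5` and `seed = 688` are hypothetical placeholders, not settings prescribed by the assignment. In R, `saveRDS()` and `readRDS()` play a similar role.

```python
import json
import os
import random
import tempfile


def save_settings(path, hyperparams, features):
    """Write hyperparameters (dict) and feature names (list) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"hyperparams": hyperparams, "features": features},
                  fh, indent=2, sort_keys=True)


def load_settings(path):
    """Read the file written by save_settings; returns (hyperparams, features)."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return data["hyperparams"], data["features"]


if __name__ == "__main__":
    # Hypothetical settings for illustration.
    hyperparams = {"k": 5, "seed": 688}
    features = ["credit card", "category", "amount"]
    path = os.path.join(tempfile.gettempdir(), "cs688_settings.json")
    save_settings(path, hyperparams, features)

    del hyperparams, features          # clear the in-memory copies
    hyperparams, features = load_settings(path)
    random.seed(hyperparams["seed"])   # reset the seed before re-running
    print(hyperparams, features)
```

JSON stores tuples as lists. Keeping the settings as plain dicts, lists, strings, and numbers makes the reloaded values compare equal to the originals.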
credit card,category,amount,buyer,validation,,cardkey,categorykey
1,1,16.53,0,0,,1 = amex,1 = grocery
1,2,33.01,0,0,,2 = store,2 = gas & car maintenance
1,3,22.62,1,0,,3 = visa,3 = dining
1,1,16.04,1,0,,4 = master,4 = misc
1,4,4.99,0,0,,5 = discover,5 = health
1,1,66.39,0,0,,,6 = education
1,1,16.65,1,0,,,7 = travel
1,2,42.02,0,0,,,8 = home improvement & furnishings
1,2,28.85,1,0,,,"9 - cell phones, cable & utilities"
1,1,157.12,0,0,,,10 = books
1,1,170.03,0,0,,,11 = clothing
1,1,45.48,0,0,,,
1,2,38.91,0,0,,,
1,1,33.95,1,0,,,
1,5,18.14,1,0,,,
1,1,3.78,1,0,,,
1,2,32.75,0,0,,,
1,2,34.75,1,0,,,
1,1,20.6,1,0,,,
1,1,50.88,0,0,,,
1,4,5.3,0,0,,,
1,1,86.99,0,0,,,
1,1,57.21,0,0,,,
1,1,42.35,0,0,,,
1,1,42.8,0,0,,,
1,2,39.22,0,0,,,
1,1,38.15,1,0,,,
1,1,9.07,1,0,,,
1,1,15.17,1,0,,,
1,1,255.77,0,0,,,
1,1,34.65,0,0,,,
1,1,25,1,0,,,
1,1,143.65,0,0,,,
1,1,293.26,0,0,,,
1,1,1.49,1,0,,,
1,1,69.68,1,0,,,
1,2,45.58,0,0,,,
1,2,38.6,1,1,,,
1,1,12.16,1,1,,,
1,4,6.36,0,1,,,
1,2,45.31,0,1,,,
1,1,165.19,0,1,,,
1,1,133.68,0,1,,,
1,1,49.85,1,1,,,
1,2,36.6,1,1,,,
1,1,10.45,1,1,,,
1,1,49.67,0,1,,,
1,1,57.95,1,1,,,
1,2,25.09,1,1,,,
1,1,16.84,1,1,,,
1,6,4.27,0,1,,,
1,1,173.12,0,1,,,
1,1,30.84,0,1,,,
1,2,26.92,1,1,,,
1,1,92.49,0,1,,,
1,7,665.2,1,0,,,
1,7,4.8,1,1,,,
2,11,30.1,0,0,,,
2,11,21.44,0,0,,,
2,11,47.7,0,1,,,
3,4,32.11,1,0,,,
3,1,262.2,1,0,,,
3,8,50.52,1,0,,,
3,8,48.13,1,0,,,
3,8,176.63,0,0,,,
3,8,20.11,1,0,,,
3,8,50.75,1,0,,,
3,8,13.41,1,0,,,
3,8,-33.99,0,0,,,
3,8,212.43,0,0,,,
3,8,1.93,1,0,,,
3,8,3.94,1,0,,,
3,9,167.34,1,0,,,
3,2,36.9,0,0,,,
3,9,167.34,1,0,,,
3,5,20.97,0,1,,,
3,4,60,1,1,,,
3,2,38,0,1,,,
3,1,55.27,0,1,,,
3,2,23.38,1,1,,,
3,9,167.34,1,1,,,
4,3,19.32,1,0,,,
4,5,139.23,0,0,,,
4,3,8,0,0,,,
4,10,50.29,0,0,,,
4,11,132.91,0,0,,,
4,4,9,0,0,,,
4,3,27.02,0,0,,,
4,10,3.62,0,0,,,
4,3,28.9,0,0,,,
4,8,234.27,0,0,,,
4,3,68.44,1,0,,,
4,4,8.92,0,0,,,
4,3,31.95,1,0,,,
4,8,163.48,1,0,,,
4,8,-13.6,1,0,,,
4,3,40,1,0,,,
4,9,144.22,1,0,,,
4,3,17.05,1,0
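Part A item 1 suggests `csv.reader`. The following minimal sketch instead uses `csv.DictReader` from the same standard module, so columns can be addressed by the header names shown above. It assumes the header is exactly `credit card,category,amount,buyer,validation,,cardkey,categorykey`. It also treats the `cardkey`/`categorykey` columns as lookup tables rather than transaction features. The function name `load_transactions` is hypothetical.

```python
import csv
import io


def load_transactions(lines):
    """Split creditcardspend.csv-style rows into (train, test) record lists.

    Assumes the header shown above. Only the first five columns describe a
    transaction; cardkey/categorykey are lookup tables and are ignored.
    validation == 0 marks training rows and validation == 1 marks test rows.
    """
    train, test = [], []
    for row in csv.DictReader(lines):
        record = {
            "card": int(row["credit card"]),
            "category": int(row["category"]),
            "amount": float(row["amount"]),
            "buyer": int(row["buyer"]),
        }
        target = test if row["validation"].strip() == "1" else train
        target.append(record)
    return train, test


if __name__ == "__main__":
    # Three rows copied from the data above. With the real file, use:
    #   with open("creditcardspend.csv", newline="") as fh:
    #       train, test = load_transactions(fh)
    sample = io.StringIO(
        "credit card,category,amount,buyer,validation,,cardkey,categorykey\n"
        "1,1,16.53,0,0,,1 = amex,1 = grocery\n"
        "1,2,38.6,1,1,,,\n"
        "3,8,-33.99,0,0,,,\n"
    )
    train, test = load_transactions(sample)
    print(len(train), len(test))  # 2 1
```

Rows with fewer than eight fields, such as the final `4,3,17.05,1,0`, still load. `DictReader` fills the missing trailing columns with `None`, and those columns are never read.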
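The assignment leaves the choice of classifier open. One possible choice is k-nearest neighbours, shown in the following minimal sketch. The distance is an assumption made for illustration. A mismatch in card or category costs 1 each, because they are codes rather than quantities, and the amount contributes |difference| / `amount_scale`. Vote ties are broken with a seeded `random.Random`, which addresses the reproducibility requirement. The function `sweep` varies two hyperparameters, `k` and `amount_scale`, in the spirit of item 3 option (a). All names are hypothetical. The demo uses only a small subset of rows copied from the data, so its output says nothing about the full data set.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3, seed=688, amount_scale=100.0):
    """Predict the buyer of `query` by majority vote of its k nearest rows.

    Assumed distance: a mismatch in card or category costs 1 each, plus
    |amount difference| / amount_scale. Vote ties are broken by a seeded RNG,
    so reruns give the same answer.
    """
    def distance(row):
        return ((row["card"] != query["card"])
                + (row["category"] != query["category"])
                + abs(row["amount"] - query["amount"]) / amount_scale)

    # sorted() is stable, so equal distances keep their input order.
    nearest = sorted(train, key=distance)[:k]
    votes = Counter(row["buyer"] for row in nearest)
    best = max(votes.values())
    tied = sorted(label for label, count in votes.items() if count == best)
    return random.Random(seed).choice(tied)


def sweep(train, test, ks, scales, seed=688):
    """Test-set accuracy for every (k, amount_scale) combination."""
    results = {}
    for k in ks:
        for scale in scales:
            hits = sum(knn_predict(train, row, k, seed, scale) == row["buyer"]
                       for row in test)
            results[(k, scale)] = hits / len(test)
    return results


def make_rows(*tuples):
    """Build records from (card, category, amount, buyer) tuples."""
    return [{"card": c, "category": g, "amount": a, "buyer": b}
            for c, g, a, b in tuples]


if __name__ == "__main__":
    # A small subset of rows copied from the data above (validation 0 / 1).
    train = make_rows((3, 1, 262.2, 1), (3, 8, 176.63, 0), (3, 8, 212.43, 0),
                      (3, 9, 167.34, 1), (4, 5, 139.23, 0), (1, 1, 143.65, 0),
                      (1, 1, 157.12, 0), (3, 4, 32.11, 1))
    test = make_rows((3, 1, 55.27, 0), (1, 1, 133.68, 0), (3, 9, 167.34, 1),
                     (1, 1, 165.19, 0))
    print(sweep(train, test, ks=[1, 3, 5], scales=[10.0, 100.0]))
    new_txn = {"card": 3, "category": 1, "amount": 135.86}
    print("predicted buyer:", knn_predict(train, new_txn, k=3))
```

Choosing `k` and `amount_scale` by looking at the same test rows used for the final report can overstate performance. Holding out part of the training rows for tuning would be the more cautious design.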