
instructions in word doc


BIA 6301 APPLIED DATA MINING
HOMEWORK ASSIGNMENT #2

General Instructions: The homework assignment is due on the assigned date at 5:45 PM. Assignments turned in after the due date and time will lose 2 points for every day late. No assignment will be accepted one week after it is assigned. Here is a breakdown of the point distribution.

Task                                        Points Possible                            Preferred File Name
Part A: 3 questions                         3 x 3/each = 9 points                      LastName_text.docx
Part B: 3 questions                         3 x 4/each = 12 points
R markdown documentation for Parts A & B    2 parts x 2 points/each part = 4 points    LastName.html
Total                                       25 points

To help facilitate the grading process, please use the file naming convention listed in the table above. Please upload the required files onto Blackboard for grading. The html outputs of your markdown files will not be graded, but they will be checked if necessary to verify your findings and recommendations. Point deductions may occur if there are major discrepancies between your written answers and memorandums and the knitted markdown files.

Please upload your Word documents individually onto Blackboard. Please put your html files in a zipped folder and upload it, because Blackboard’s upload feature does not accept html files.

Underwriting at General Casualty Kansas City Insurance

You are a newly hired data scientist at General Casualty Kansas City (GCKC) insurance. The CEO of GCKC would like to improve the company's pricing and product offerings for automobile insurance. GCKC has traditionally operated only in the Midwest (IL, IN, MI, OH, WI, IA, KS, MN, MO, NE, ND, SD) but would like to go nationwide.
She gives you data on fatal automobile accidents from 2011 and asks you to “see what the data tell you.” The file crash.csv contains nearly 5000 observations on the following variables:

· State
· Atmospheric condition – weather
· Crash date (mm/dd/yy)
· Fatalities in crash
· Roadway – type of location where crash occurred
· Age of driver
· Alcohol results – blood alcohol level for the driver
· Person type – who was killed
· Drug involved
· Race of the driver
· Gender of the driver
· Severity of the injury
· Year of the crash (2011)
· Month of the crash (integer)
· Day of the month of the crash (integer)
· Day of the week of the crash (integer)

GCKC is still seeking approval for its expansion from the regulators and may only be allowed to add one additional region at a time. The other regions are defined as follows:

Northeast: CT, ME, MA, NH, RI, VT
South: DE, FL, GA, MD, NC, SC, VA, DC, WV
West: AZ, CO, ID, MT, NV, NM, UT, WY, AK, CA, HI, OR, WA

Because the CEO hasn’t given you a target variable, you decide that an unsupervised learning method is appropriate to start. You choose to perform k-means cluster analysis on the provided data set.

Part A: Define the question(s) and prepare the data

Because these data are from law enforcement files, they are not in the best shape for insurance analysis. A fair amount of cleaning and preprocessing is necessary. You will be working with other data scientists in the department, so your code should clearly show what you have done. Documentation is key here. You will be asked to report to the senior data scientist what you have found and how you intend to proceed. In no more than 3 or 4 paragraphs (not including tables and/or charts if you choose to use them), answer these questions:

1. Perform some preliminary EDA on the data and describe the sample. What do you learn about fatal automobile accidents from this exercise? How can you use that information?
2. What steps will you take to prepare the data? Why?
3. What questions will you be able to answer with the cleaned data? What limitations will you face?

(Note: because the audience here is other data scientists, technical terminology is fine.)

Part B: Unsupervised learning analysis

The senior data scientist agrees with your approach and tells you to continue the analysis and brief the CEO. Perform a k-means analysis with at least 3 different k’s. In no more than 3 or 4 paragraphs, answer the following questions for the CEO.

1. How many types or groups of crashes are prevalent in the provided data set after you have conducted the analysis? How did you determine that? Describe the characteristics of each group. (Hint: You should generate aggregate profiles for the clusters and then sort the centroid values along with the item names in descending order. You may also want to filter out the items with very low centroid values. These tasks will make it easier to identify the characteristics of the groups.)
2. Are there geographic differences in the crash types or fatalities? If yes, what are they? How might that affect the CEO’s decision?
3. Can you make at least two or three recommendations for policy or pricing options to the CEO based on your analysis? These recommendations can range from suggestions on whether or not to offer automobile insurance to certain areas or drivers to suggestions for pricing differentials for different customers. Explain your reasoning.

Because you are briefing the CEO, technical language is NOT appropriate. She doesn’t understand terms like “k-means” and “centroid,” so do not use them in your response. Your paragraphs should contain clear, concise, and correct language with no technical jargon.
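The Part B hint (run k-means for several values of k, build a profile per cluster, sort the centroid values with their item names in descending order, and drop very low values) can be sketched as follows. This is a minimal illustration in Python rather than the R Markdown the assignment asks for. The `kmeans` and `profile` functions, the indicator column names, and the toy 0/1 rows are all made up for the example and are not taken from crash.csv.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(row, centroids):
    """Index of the centroid closest to row."""
    return min(range(len(centroids)), key=lambda j: sqdist(row, centroids[j]))

def kmeans(rows, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels, within-cluster SS)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k distinct starting rows
    for _ in range(n_iter):
        labels = [nearest(r, centroids) for r in rows]  # assignment step
        updated = []
        for j in range(k):  # update step: each centroid becomes its members' mean
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # leave an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(r, centroids) for r in rows]
    wss = sum(sqdist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return centroids, labels, wss

def profile(centroid, names, min_value=0.2):
    """Pair centroid values with item names, sort descending, drop low values."""
    pairs = sorted(zip(names, centroid), key=lambda p: p[1], reverse=True)
    return [(name, round(value, 2)) for name, value in pairs if value >= min_value]

# Hypothetical 0/1 indicator columns and toy rows, for illustration only.
NAMES = ["rural_road", "alcohol_positive", "bad_weather", "driver_under_25"]
TOY_ROWS = [
    [1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1],
    [0, 0, 1, 0], [0, 0, 1, 1], [0, 1, 1, 0],
    [1, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 1],
]

for k in (2, 3, 4):
    centroids, labels, wss = kmeans(TOY_ROWS, k)
    print(f"k={k}: within-cluster sum of squares = {wss:.2f}")
    for j, c in enumerate(centroids):
        print(f"  group {j} (n={labels.count(j)}):", profile(c, NAMES))
```

The within-cluster sum of squares usually shrinks as k grows, so it is more informative to look at how much it drops between successive k's than to pick the smallest value. In R, the `kmeans()` function in the stats package returns comparable pieces as `centers`, `cluster`, and `tot.withinss`.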
State,Atmospheric Condition,Crash Date,Fatalities in crash,Roadway,Age,Alcohol Results,Person Type,Drug Involvement,Race,Gender,Injury Severity,Crash Date.year,Crash Date.month,Crash Date.day-of-month,Crash Date.day-of-week
Alaska,Clear,1/5/2011,1,Rural-Principal Arterial-Interstate,27,0,Driver of a Motor Vehicle In-Transport,Yes,,Male,Non-incapacitating Evident Injury (B),2011,1,5,3
Alaska,Clear,1/5/2011,1,Rural-Principal Arterial-Interstate,60,0,Driver of a Motor Vehicle In-Transport,No,White,Female,Fatal Injury (K),2011,1,5,3
Arizona,Clear,1/1/2011,1,Urban-Other Principal Arterial,24,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Female,No Injury (O),2011,1,1,6
Arizona,Clear,1/1/2011,1,Urban-Other Principal Arterial,27,0,Driver of a Motor Vehicle In-Transport,No,,Male,No Injury (O),2011,1,1,6
Arizona,Clear,1/1/2011,1,Urban-Other Principal Arterial,82,0,Pedestrian,No,Unknown,Female,Fatal Injury (K),2011,1,1,6
Arkansas,Clear,1/2/2011,1,Rural-Minor Arterial,40,0,Driver of a Motor Vehicle In-Transport,Not Reported,White,Male,Fatal Injury (K),2011,1,2,7
Colorado,Clear,1/2/2011,1,Rural-Local Road or Street,17,0,Driver of a Motor Vehicle In-Transport,No,White,Female,Fatal Injury (K),2011,1,2,7
Connecticut,Clear,1/1/2011,1,Urban-Local Road or Street,22,0.21,Driver of a Motor Vehicle In-Transport,Not Reported,White,Male,Fatal Injury (K),2011,1,1,6
Delaware,Cloudy,1/4/2011,1,Urban-Other Principal Arterial,4,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Female,Non-incapacitating Evident Injury (B),2011,1,4,2
Delaware,Cloudy,1/4/2011,1,Urban-Other Principal Arterial,20,0.21,Driver of a Motor Vehicle In-Transport,Yes,White,Male,Fatal Injury (K),2011,1,4,2
District of Columbia,Clear,1/14/2011,2,Urban-Local Road or Street,20,,Driver of a Motor Vehicle In-Transport,Not Reported,,Male,Non-incapacitating Evident Injury (B),2011,1,14,5
District of Columbia,Clear,1/14/2011,2,Urban-Local Road or Street,31,0.1,Driver of a Motor Vehicle In-Transport,No,White,Male,Fatal Injury (K),2011,1,14,5
Florida,Rain,1/1/2011,1,Urban-Minor Arterial,5,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Male,Non-incapacitating Evident Injury (B),2011,1,1,6
Florida,Rain,1/1/2011,1,Urban-Minor Arterial,70,,Driver of a Motor Vehicle In-Transport,Not Reported,White,Female,Fatal Injury (K),2011,1,1,6
Georgia,"Fog, Smog, Smoke",1/1/2011,1,Urban-Local Road or Street,23,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Male,Incapacitating Injury (A),2011,1,1,6
Georgia,"Fog, Smog, Smoke",1/1/2011,1,Urban-Local Road or Street,24,0.03,Driver of a Motor Vehicle In-Transport,Unknown,Black,Male,Fatal Injury (K),2011,1,1,6
Hawaii,Clear,1/1/2011,2,Rural-Minor Arterial,0,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Female,No Injury (O),2011,1,1,6
Hawaii,Clear,1/1/2011,2,Rural-Minor Arterial,19,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Female,Non-incapacitating Evident Injury (B),2011,1,1,6
Hawaii,Clear,1/1/2011,2,Rural-Minor Arterial,26,,Driver of a Motor Vehicle In-Transport,Not Reported,,Male,Non-incapacitating Evident Injury (B),2011,1,1,6
Idaho,Clear,1/2/2011,1,Rural-Principal Arterial-Interstate,22,0,Driver of a Motor Vehicle In-Transport,No,,Female,Non-incapacitating Evident Injury (B),2011,1,2,7
Idaho,Clear,1/2/2011,1,Rural-Principal Arterial-Interstate,23,,Passenger of a Motor Vehicle In-Transport,Not Reported,Black,Female,Fatal Injury (K),2011,1,2,7
Idaho,Clear,1/2/2011,1,Rural-Principal Arterial-Interstate,25,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Male,Non-incapacitating Evident Injury (B),2011,1,2,7
Illinois,Clear,1/1/2011,1,Urban-Minor Arterial,27,0,Driver of a Motor Vehicle In-Transport,Not Reported,White,Male,Fatal Injury (K),2011,1,1,6
Indiana,Rain,1/1/2011,1,Rural-Local Road or Street,25,0.09,Driver of a Motor Vehicle In-Transport,Yes,Black,Male,Fatal Injury (K),2011,1,1,6
Kansas,Clear,1/2/2011,2,Rural-Principal Arterial-Other,25,,Driver of a Motor Vehicle In-Transport,No,,Female,Non-incapacitating Evident Injury (B),2011,1,2,7
Kansas,Clear,1/2/2011,2,Rural-Principal Arterial-Other,61,0,Driver of a Motor Vehicle In-Transport,No,White,Male,Fatal Injury (K),2011,1,2,7
Kentucky,Clear,1/1/2011,2,Urban-Other Principal Arterial,7,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Female,Non-incapacitating Evident Injury (B),2011,1,1,6
Kentucky,Clear,1/1/2011,2,Urban-Other Principal Arterial,18,0,Passenger of a Motor Vehicle In-Transport,Not Reported,White,Male,Fatal Injury (K),2011,1,1,6
Kentucky,Clear,1/1/2011,2,Urban-Other Principal Arterial,20,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Male,Non-incapacitating Evident Injury (B),2011,1,1,6
Kentucky,Clear,1/1/2011,2,Urban-Other Principal Arterial,28,0.12,Driver of a Motor Vehicle In-Transport,Not Reported,White,Male,Fatal Injury (K),2011,1,1,6
Kentucky,Clear,1/1/2011,2,Urban-Other Principal Arterial,37,,Driver of a Motor Vehicle In-Transport,Not Reported,,Male,Non-incapacitating Evident Injury (B),2011,1,1,6
Louisiana,Rain,1/1/2011,1,Rural-Principal Arterial-Interstate,41,0,Driver of a Motor Vehicle In-Transport,Unknown,White,Male,Fatal Injury (K),2011,1,1,6
Maine,Clear,1/3/2011,2,Rural-Major Collector,82,0,Driver of a Motor Vehicle In-Transport,Not Reported,White,Male,Fatal Injury (K),2011,1,3,1
Massachusetts,Cloudy,1/25/2011,1,Urban-Other Principal Arterial,63,0,Driver of a Motor Vehicle In-Transport,Not Reported,Black,Female,Fatal Injury (K),2011,1,25,2
Michigan,Rain,1/1/2011,1,Rural-Major Collector,75,,Driver of a Motor Vehicle In-Transport,No,,Female,No Injury (O),2011,1,1,6
Minnesota,Snow,1/3/2011,1,Urban-Minor Arterial,57,0,Driver of a Motor Vehicle In-Transport,Unknown,White,Male,Fatal Injury (K),2011,1,3,1
Mississippi,Clear,1/1/2011,1,Urban-Principal Arterial-Other
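Rows like those in the sample above could be read and tagged with a region along these lines. This is a rough sketch in Python, not the R Markdown the assignment calls for, and it rests on several assumptions: the region lists are written out as full state names because the data stores names rather than postal codes; the Northeast list is read as including Maine and the West list as including Montana and Nevada; states that appear in no list are labeled "Unassigned"; and blank alcohol results are treated as missing rather than zero. The names `REGIONS`, `STATE_TO_REGION`, and `to_float` are made up for the example.

```python
import csv
import io

# Region lists from the assignment, spelled out as full state names.
# Assumption: Maine (Northeast) and Montana/Nevada (West) are intended members.
REGIONS = {
    "Midwest": ["Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin", "Iowa",
                "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota",
                "South Dakota"],
    "Northeast": ["Connecticut", "Maine", "Massachusetts", "New Hampshire",
                  "Rhode Island", "Vermont"],
    "South": ["Delaware", "Florida", "Georgia", "Maryland", "North Carolina",
              "South Carolina", "Virginia", "District of Columbia",
              "West Virginia"],
    "West": ["Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico",
             "Utah", "Wyoming", "Alaska", "California", "Hawaii", "Oregon",
             "Washington"],
}
STATE_TO_REGION = {s: region for region, states in REGIONS.items() for s in states}

# Header plus four rows copied from the data sample.
SAMPLE = """State,Atmospheric Condition,Crash Date,Fatalities in crash,Roadway,Age,Alcohol Results,Person Type,Drug Involvement,Race,Gender,Injury Severity,Crash Date.year,Crash Date.month,Crash Date.day-of-month,Crash Date.day-of-week
Alaska,Clear,1/5/2011,1,Rural-Principal Arterial-Interstate,27,0,Driver of a Motor Vehicle In-Transport,Yes,,Male,Non-incapacitating Evident Injury (B),2011,1,5,3
Georgia,"Fog, Smog, Smoke",1/1/2011,1,Urban-Local Road or Street,24,0.03,Driver of a Motor Vehicle In-Transport,Unknown,Black,Male,Fatal Injury (K),2011,1,1,6
Kentucky,Clear,1/1/2011,2,Urban-Other Principal Arterial,7,,Passenger of a Motor Vehicle In-Transport,Not Reported,,Female,Non-incapacitating Evident Injury (B),2011,1,1,6
Connecticut,Clear,1/1/2011,1,Urban-Local Road or Street,22,0.21,Driver of a Motor Vehicle In-Transport,Not Reported,White,Male,Fatal Injury (K),2011,1,1,6
"""

def to_float(text):
    """Convert a numeric field to float; blank fields become None (missing)."""
    text = text.strip()
    return float(text) if text else None

rows = []
for rec in csv.DictReader(io.StringIO(SAMPLE)):
    # The csv module handles the quoted "Fog, Smog, Smoke" field correctly.
    rec["Region"] = STATE_TO_REGION.get(rec["State"], "Unassigned")
    rec["Alcohol Results"] = to_float(rec["Alcohol Results"])
    rec["Age"] = int(rec["Age"])
    rows.append(rec)

for rec in rows:
    print(rec["State"], rec["Region"], rec["Age"], rec["Alcohol Results"])
```

Keeping missing alcohol results as `None` instead of 0 avoids treating an untested person as a sober one, which matters because many passenger rows in the sample have no test result.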
Nov 20, 2019