ISE 5103 Intelligent Data Analytics
Homework #2
Instructor: Charles Nicholson
See course website for due date

Learning objective: Explore and visualize data.

Submission notes:
1. You will submit a PDF file with your solutions. Additionally, you will provide the R code you created to address the problems. The PDF is primarily what will be graded. The grader may view your R code, but should never have to in order to find your solutions.
2. In the PDF, clearly identify each problem (e.g., Problem 1a, Problem 2b, etc.). Also, note that only relevant and informative computer output should be provided.
3. Make sure to provide comments on what your R code is doing. Keep it clean and clear!
4. You will submit your complete R script. Note: include library commands to load all packages that are used in the completion of the assignment. Place these statements at the top of your script.
5. Do not zip your files for submission. Submit exactly two files. Name the files "LastName-HW1" with the appropriate file extension (that is, .pdf for the write-up and .R for the script).

1 Learning ggplot2 (50 points)

For this problem you will read through and work some of the exercises from Chapter 3 of the online book "R for Data Science". The book can be found here: http://r4ds.had.co.nz/. These questions are relatively easy, but the material in the book is great for learning ggplot2. Please provide any related code and graphs along with your answers for each problem.

(a) (30 points) Please address the following questions from Chapter 3 of "R for Data Science":
• 3.2.4 Exercises #4, #5
• 3.3.1 Exercises #3, #4, #6
• 3.5.1 Exercise #4

(b) (20 points) After reading this chapter, you should be ready to reproduce the plot in Figure 1 using the same mpg data from above. Please do so. Make sure you notice the jitter and alpha levels, that there is both a loess smoother and a linear smoother (in black), and that the x and y axes are labeled.
Figure 1: Please reproduce this visualization for the mpg data. [Image not reproduced: three panels labeled 4, f, and r; x-axis "Displacement"; y-axis "Highway MPG".]

2 House prices data: Exploratory Data Analysis and Visualization (50 points)

The housingData.csv file on the course website contains real data on 1,000 residential homes sold in Ames, Iowa, between 2006 and 2010. The data set includes over 70 explanatory variables, many of which are factors with several levels. The file housingVariables.pdf provides a concise explanation of the variables and the factor levels in the data.

We will use this data set again in class. In preparation for that, perform some basic exploratory data analysis and visualization of the data to get an idea of what is here. Specifically, using ggplot2, create at least 5 different, non-trivial visualizations of the data that you believe are informative. You do not have to analyze every variable! However, I encourage you to play around with different possibilities and present the best ones. For each visualization, you must comment briefly (1-3 sentences) on what is useful or informative in the visualization.

Possible visualizations include (but are not limited to) scatter plots with trend lines, sploms (scatterplot matrices), parallel histograms, ridgeline plots, overlaid density plots, stacked bar charts, parallel plots, heatmaps of correlations, missing value visualizations, tree maps, etc. You might want to check out https://www.r-graph-gallery.com/index.html for some ideas.

Question: What does the professor mean by "non-trivial" visualizations?

Answer: What I mean is push yourself to find something interesting in the data. Do not simply produce 5 scatterplots and call it a day. Use different visualizations. Use color, alpha, jitter, other layers, and comparisons to help find and tell a story. Grading on "non-trivial" is entirely subjective, and I do not like lazy when it comes to visualizations...
Just do something great and you'll be fine.

Excerpt of housingData.csv (first records):

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Alley,LotShape,LandContour,LotConfig,LandSlope,Neighborhood,Condition1,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,X1stFlrSF,X2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EncPorchSF,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SalePrice
1,20,RL,NA,11000,NA,IR1,Lvl,CulDSac,Gtl,NAmes,Norm,1Fam,1Story,5,6,1966,1966,Gable,Plywood,Plywood,BrkFace,200,Avg,Avg,CBlock,Avg,Avg,Mn,BLQ,740,Rec,230,184,1154,GasA,AboveAvg,Y,SBrkr,1154,0,0,1154,0,0,1,1,3,1,Avg,6,Typ,1,BelowAvg,Attchd,1966,RFn,2,480,Avg,Avg,Y,0,58,0,0,NA,MnPrv,NA,0,11,2009,WD,154000
2,20,RL,NA,36500,NA,IR1,Low,Inside,Mod,ClearCr,Norm,1Fam,1Story,5,5,1964,1964,Gable,Wd Sdng,Wd Sdng,BrkCmn,621,Avg,AboveAvg,CBlock,Avg,Avg,Av,Rec,812,Unf,0,812,1624,GasA,BelowAvg,Y,SBrkr,1582,0,0,1582,0,1,2,0,4,1,Avg,7,Typ,0,NA,Attchd,1964,Unf,2,390,Avg,Avg,N,168,198,0,0,NA,NA,NA,0,6,2006,WD,190000
3,20,RL,57,9764,NA,IR1,Lvl,other,Gtl,Sawyer,Feedr,1Fam,1Story,5,7,1967,2003,Gable,VinylSd,VinylSd,None,0,Avg,Avg,CBlock,Avg,Avg,No,BLQ,702,Unf,0,192,894,GasA,AboveAvg,Y,SBrkr,894,0,0,894,1,0,1,0,3,1,AboveAvg,5,Typ,0,NA,Attchd,1967,RFn,2,450,Avg,Avg,Y,0,0,0,0,NA,NA,NA,0,5,2008,WD,130000
4,70,RL,NA,7500,NA,IR1,Bnk,Inside,Gtl,Crawfor,Norm,1Fam,2Story,6,7,1942,1950,Gable,Wd Sdng,Wd Sdng,None,0,Avg,Avg,CBlock,Avg,Avg,No,BLQ,547,Unf,0,224,771,GasA,BelowAvg,Y,SBrkr,753,741,0,1494,0,0,1,0,3,1,AboveAvg,7,Typ,2,AboveAvg,Attchd,1942,Unf,1,213,Avg,Avg,P,0,0,224,0,NA,NA,NA,0,11,2009,WD,177500
5,20,RL,80,9200,NA,Reg,Lvl,Inside,Gtl,NAmes,Norm,1Fam,1Story,6,6,1965,1965,Gable,HdBoard,HdBoard,None,0,Avg,Avg,CBlock,Avg,Avg,No,Rec,892,Unf,0,244,1136,GasA,Avg,Y,SBrkr,1136,0,0,1136,1,0,1,0,3,1,Avg,5,Typ,1,AboveAvg,Attchd,1965,RFn,1,384,Avg,Avg,Y,426,0,0,0,NA,NA,NA,0,7,2008,WD,140000
6,60,RL,72,11317,NA,Reg,Lvl,Inside,Gtl,CollgCr,Norm,1Fam,2Story,7,5,2003,2003,Gable,VinylSd,VinylSd,BrkFace,101,AboveAvg,Avg,PConc,AboveAvg,Avg,No,Unf,0,Unf,0,840,840,GasA,AboveAvg,Y,SBrkr,840,828,0,1668,0,0,2,1,3,1,AboveAvg,8,Typ,0,NA,Attchd,2003,RFn,2,500,Avg,Avg,Y,144,68,0,0,NA,NA,NA,0,9,2007,WD,180000
7,20,RL,80,8480,NA,Reg,Lvl,Corner,Gtl,Sawyer,Norm,1Fam,1Story,5,6,1963,1963,Hip,HdBoard,HdBoard,None,0,Avg,Avg,CBlock,Avg,Avg,No,GLQ,630,Unf,0,340,970,GasA,Avg,Y,SBrkr,970,0,0,970,1,0,1,0,2,1,Avg,5,Typ,0,NA,Detchd,1996,Unf,2,624,Avg,Avg,Y,0,24,192,0,NA,NA,NA,0,7,2007,WD,132500
8,70,RM,65,11700,Pave,IR1,Lvl,Corner,Gtl,OldTown,Norm,1Fam,2Story,7,7,1880,2003,other,other,other,None,0,AboveAvg,Avg,other,Avg,BelowAvg,No,Unf,0,Unf,0,1240,1240,other,Avg,N,SBrkr,1320,1320,0,2640,0,0,1,1,4,1,AboveAvg,8,Typ,1,AboveAvg,Detchd,1950,Unf,4,864,Avg,Avg,N,181,0,386,0,NA,NA,NA,0,5,2009,WD,265979
9,60,RL,80,9760,NA,Reg,Lvl,Inside,Mod,NAmes,Norm,1Fam,2Story,6,6,1964,1964,Gable,HdBoard,HdBoard,BrkFace,360,Avg,Avg,CBlock,Avg,Avg,Gd,GLQ,674,LwQ,106,0,780,GasA,Avg,Y,SBrkr,798,813,0,1611,1,0,1,1,4,1,Avg,7,Typ,0,NA,Attchd,1964,RFn,2,442,Avg,Avg,Y,328,128,189,0,NA,NA,NA,0,6,2008,WD,167900
10,60,RL,93,10261,NA,IR1,Lvl,Inside,Gtl,Gilbert,Norm,1Fam,2Story,6,5,2000,2000,Gable,VinylSd,VinylSd,BrkFace,318,Avg,Avg,PConc,AboveAvg,Avg,No,Unf,0,Unf,0,936,936,GasA,AboveAvg,Y,SBrkr,962,830,0,1792,1,0,2,1,3,1,Avg,8,Typ,1,Avg,Attchd,2000,Fin,2,451,Avg,Avg,Y,0,0,0,0,NA,NA,NA,0,5,2008,WD,186500
11,20,RL,100,10175,NA,IR1,Lvl,Inside,Gtl,NAmes,Norm,1Fam,1Story,6,5,1964,1964,Gable,HdBoard,Plywood,BrkFace,272,Avg,Avg,CBlock,Avg,Avg,No,BLQ,490,Unf,0,935,1425,GasA,AboveAvg,Y,SBrkr,1425,0,0,1425,0,0,2,0,3,1,Avg,7,Typ,1,AboveAvg,Attchd,1964,RFn,2,576,Avg,Avg,Y,0,0,407,0,NA,NA,NA,0,7,2008,WD,180500
12,120,RL,43,3182,NA,Reg,Lvl,Inside,Gtl,other,Norm,TwnhsE,1Story,7,5,2005,2006,Gable,VinylSd,VinylSd,BrkFace,16,AboveAvg,Avg,PConc,AboveAvg,Avg,Av,GLQ,16,Unf,0,1357,1373,GasA,AboveAvg,Y,SBrkr,1555,0,0,1555,0,0,2,0,2,1,AboveAvg,7,Typ,1,Avg,Attchd,2005,Fin,2,430,Avg,Avg,Y,143,20,0,0,NA,NA,NA,0,5,2009,WD,192500
13,60,RL,75,7950,NA,IR1,Bnk,Corner,Gtl,Edwards,Norm,1Fam,2Story,6,6,1977,1977,Hip,HdBoard,Plywood,BrkFace,140,Avg,Avg,CBlock,Avg,Avg,No,BLQ,535,Unf,0,155,690,GasA,Avg,Y,SBrkr,698,728,0,1426,0,0,1,1,3,1,Avg,6,Typ,0,NA,Attchd,1977,Fin,2,440,Avg,Avg,Y,252,0,0,0,NA,MnPrv,NA,0,7,2009,WD,159500
14,20,RL,80,9600,NA,Reg,Lvl,other,Gtl,other,Feedr,1Fam,1Story,6,8,1976,1976,Gable,MetalSd,MetalSd,None,0,Avg,Avg,CBlock,AboveAvg,Avg,Gd,ALQ,978,Unf,0,284,1262,GasA,AboveAvg,Y,SBrkr,1262,0,0,1262,0,1,2,0,3,1,Avg,6,Typ,1,Avg,Attchd,1976,RFn,2,460,Avg,Avg,Y,298,0,0,0,NA,NA,NA,0,5,2007,WD,181500
15,70,RL,66,6858,NA,Reg,Bnk,Corner,Gtl,other,Norm,1Fam,2Story,6,4,1915,1950,Gable,Wd Sdng,Wd Sdng,None,0,Avg,Avg,PConc,AboveAvg,Avg,No,Unf,0,Unf,0,806,806,GasA,Avg,N,FuseF,841,806,0,1647,1,0,1,1,4,1,BelowAvg,6,Typ,0,NA,Detchd,1920,Unf,1,216,Avg,Avg,Y,0,66,136,0,NA,NA,NA,0,5,2010,WD,128000
16,70,RM,60,9600,Grvl,Reg,Lvl,Inside,Gtl,OldTown,Norm,1Fam,2Story,4,2,1900,1950,Gable,other,other,None,0