You should use RStudio (probably with the ggplot2, tidyr, and dplyr packages) for this.
We will use a dataset from the UC Irvine (UCI) Machine Learning Repository.
It’s just a place to keep cool datasets. You might want to check it out sometime.
Wine quality dataset description:
http://archive.ics.uci.edu/ml/datasets/Wine+Quality
12 variables, 1599 rows of Red Wine:
http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
Be sure to do these things for the big dataset (replace smithj with your own last name and first initial; keep smithj only if your name really is J. Smith):
Save the data into your own Y: drive or Google Drive space, using:
write.csv(winequality_red, file="RedWine.csv")
(optional) Make a new script file for your homework called smithj-220hw5.R
Make an RMarkdown file called smithj-220hw5.rmd
The final column, “quality,” is a 1-10 score, where 10 means a very high-quality wine (1 is lousy).
This “quality” variable will be your “y” response variable for this assignment.
Import the dataset into RStudio using readr or the Import Dataset tool.
(Notice that the UCI file uses semicolons instead of commas as the delimiter).
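As a minimal sketch of the readr route, the snippet below parses a tiny made-up sample laid out like the UCI file (semicolons between fields, periods as decimal marks). The three rows and the name wine_sample are invented for illustration; for the real file you would pass the URL or your saved path in place of the inline text.

```r
library(readr)

# Made-up sample in the same layout as the UCI file: semicolons between
# fields, periods as decimal marks. These values are NOT from the dataset.
sample_text <- "alcohol;pH;quality\n9.5;3.4;5\n10.2;3.3;6\n11.8;3.2;7\n"

# read_delim() lets you name the delimiter explicitly; I() marks the string
# as literal data rather than a file path.
wine_sample <- read_delim(I(sample_text), delim = ";")

print(wine_sample)
```

read_csv2() also expects semicolons, but it assumes commas as the decimal mark, so read_delim() with delim = ";" is probably the safer choice for this file.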
Using the “pairs” command, look at all the variables. Eek.
Since we really only care about quality, let’s just look at that one against the others:
library(dplyr)    # provides %>%
library(tidyr)    # provides gather()
library(ggplot2)
winequality_red %>%
  gather(-quality, key = "var", value = "value") %>%  # long format: one row per wine and variable
  ggplot(aes(x = value, y = quality)) +
  geom_point() +                # Would geom_jitter() be a better choice?
  stat_smooth(method = "lm") +  # Might loess work better here?
  facet_wrap(~ var, scales = "free")
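The reshaping step above uses gather(), which still works but is marked as superseded in current tidyr. A rough equivalent with pivot_longer() might look like the sketch below. The data frame fake_wine and its values are made up so the example runs on its own; with the real data you would start from winequality_red.

```r
library(tidyr)

# Made-up stand-in for the wine data (values are invented, not from UCI).
fake_wine <- data.frame(
  alcohol = c(9.5, 10.2, 11.8),
  pH      = c(3.4, 3.3, 3.2),
  quality = c(5, 6, 7)
)

# One row per (wine, variable) pair; quality stays as its own column.
long_wine <- pivot_longer(fake_wine, cols = -quality,
                          names_to = "var", values_to = "value")

print(long_wine)
```

The long table can be piped into the same ggplot() call, since it has the same var and value columns.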
Perhaps “alcohol” is the best candidate. Make a scatterplot of quality against alcohol.
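One way to lay out such a scatterplot is sketched here. To avoid giving away the answer, it uses simulated data (fake_wine is an invented data frame, not the wine data). Swap in winequality_red for your own plot.

```r
library(ggplot2)

# Simulated stand-in data (NOT the wine data): an integer 1-10 score that
# loosely rises with alcohol.
set.seed(220)
fake_wine <- data.frame(alcohol = runif(200, min = 8, max = 15))
raw_score <- 0.4 * fake_wine$alcohol + rnorm(200, mean = 1.5)
fake_wine$quality <- round(pmin(pmax(raw_score, 1), 10))

p <- ggplot(fake_wine, aes(x = alcohol, y = quality)) +
  geom_jitter(height = 0.2, width = 0, alpha = 0.5) +  # spread out stacked integer scores
  stat_smooth(method = "lm", formula = y ~ x)          # straight-line fit with its band

print(p)
```

Because quality only takes whole-number values, many points land on top of each other; a little vertical jitter and some transparency make the density of points easier to see.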
Make a simple regression predicting quality from density. Spoiler: lm(y~x)
From the simple display, what is your slope and intercept?
Using “summary,” what about r^2? Which variable is best?
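For the mechanics of pulling out the slope, intercept, and r^2, here is a small sketch on simulated data. The variables x and y and the data frame fake_data are invented; they stand in for whichever predictor and response you choose.

```r
# Simulated data (NOT the wine data): y depends linearly on x, plus noise.
set.seed(220)
fake_data <- data.frame(x = runif(100, min = 0, max = 10))
fake_data$y <- 2 + 0.5 * fake_data$x + rnorm(100, sd = 1)

fit <- lm(y ~ x, data = fake_data)

print(fit)                           # the simple display: intercept and slope
coefs <- coef(fit)                   # named vector: "(Intercept)" and "x"
r_squared <- summary(fit)$r.squared  # proportion of variance explained

print(coefs)
print(r_squared)
```

Fitting one model per candidate predictor and comparing the r^2 values is one straightforward way to decide which single variable does best.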
Repeat your model using both pH and density as explanatory variables for quality.
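A model with two explanatory variables uses + in the formula. The sketch below shows the pattern on simulated data (x1, x2, y, and fake_data are invented names, not columns of the wine data).

```r
# Simulated data (NOT the wine data): y depends on two predictors.
set.seed(220)
n <- 150
fake_data <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
fake_data$y <- 5 + 0.8 * fake_data$x1 - 0.3 * fake_data$x2 + rnorm(n, sd = 0.5)

fit_one <- lm(y ~ x1, data = fake_data)       # one explanatory variable
fit_two <- lm(y ~ x1 + x2, data = fake_data)  # two explanatory variables

r2_one <- summary(fit_one)$r.squared
r2_two <- summary(fit_two)$r.squared
adj_r2_two <- summary(fit_two)$adj.r.squared  # penalizes extra predictors

print(coef(fit_two))
print(c(r2_one = r2_one, r2_two = r2_two, adj_r2_two = adj_r2_two))
```

Plain r^2 never goes down when a predictor is added, so the adjusted r^2 from summary() is usually the fairer number when comparing models of different sizes.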
Explore with a few more promising candidates, using lm and graphs
In RMarkdown, write some text around your analysis to make this like a report to someone who was trying to pick a great wine for a large party, perhaps a wedding.
Knit it into a .pdf file (probably as a .docx or .html first) and submit just the .pdf.