We are particularly interested in what percentage of COVID-19 cases are clustered in densely populated, lower-income areas compared with less dense areas and higher-income groups. This initial data set gives us a glimpse of the overall percentages while we dig deeper into how symptoms vary across socioeconomic groups.
Questions we are curious about:
● Knowing that population density and poverty positively correlate with COVID-19 cases, what steps can society or the government take to prevent these disparities?
● If population density does not aggravate the spread of COVID-19, how does early intervention in a crisis flatten the curve?
● How accurate can reporting be in lower-income communities if they have less access to medical care and are less likely to be tested for the novel coronavirus?
● Do income and population density correlate with whether COVID-19-positive patients are symptomatic or asymptomatic, and with the symptoms they experience?
GP.G: Bivariate Graphs (10 points)
Instructions
1. Construct a graph that shows the association between your explanatory/independent and outcome/dependent variables (bivariate graph).
2. Write a few sentences describing what your graphs reveal.
• How do you think the variables are related, if at all?
• How does this correspond with your predictions?
• Does the graph reveal anything unexpected or interesting about your relationship of interest?
3. Construct a 2nd graph that shows the association between another explanatory variable and your dependent variable. Again, write a few sentences describing what your graph reveals in terms of the relationships among the variables.
4. Submit your assignment as a PDF document on Canvas.

GP.F: Univariate Graphs (10 points)
Instructions
If you need a reminder about graphing one variable at a time in R, watch this video: https://youtu.be/mwlyxhfcpde
There are a variety of conventional ways to visualize data: tables, histograms, bar graphs, etc. Now that your data have been managed, it is time to graph your variables one at a time and examine both center and spread. Include univariate graphs of your two main constructs. These main constructs, or variables, should be managed (from GP.D).
● Each graph should contain clearly labeled x-axes, y-axes, and legends, if applicable.
Write a few sentences describing what your graphs reveal in terms of shape, spread, and center (if the variable is quantitative) and the most/least frequent categories (if the variable is categorical).

We weren't sure whether we were supposed to include the code, so we have included it.
Steps we took:
1. Loaded the packages and read in the data:
    library(DescTools)  # provides Freq()
    library(ggplot2)    # provides ggplot() and geom_bar()
    COVID_data <- read.csv("covid_data.csv", header = TRUE)
2. Selected our variables from the data:
    COVID_data <- subset(COVID_data, select = c("PHYS1A", "PHYS1B", "PHYS1C", "PHYS1D", "PHYS1F"))
3. Converted the responses to numeric codes with the ifelse() function, because not all of the data were in numeric form:
    # 1 = yes, 2 = no, 77 = not sure, 98 = anything else (skipped)
    COVID_data$fever <- ifelse(COVID_data$PHYS1A == "(2) No", 2, ifelse(COVID_data$PHYS1A == "(1) Yes", 1, ifelse(COVID_data$PHYS1A == "(77) Not sure", 77, 98)))
    COVID_data$chills <- ifelse(COVID_data$PHYS1B == "(2) No", 2, ifelse(COVID_data$PHYS1B == "(1) Yes", 1, ifelse(COVID_data$PHYS1B == "(77) Not sure", 77, 98)))
    COVID_data$runny_nose <- ifelse(COVID_data$PHYS1C == "(2) No", 2, ifelse(COVID_data$PHYS1C == "(1) Yes", 1, ifelse(COVID_data$PHYS1C == "(77) Not sure", 77, 98)))
    COVID_data$congestion <- ifelse(COVID_data$PHYS1D == "(2) No", 2, ifelse(COVID_data$PHYS1D == "(1) Yes", 1, ifelse(COVID_data$PHYS1D == "(77) Not sure", 77, 98)))
    COVID_data$cough <- ifelse(COVID_data$PHYS1F == "(2) No", 2, ifelse(COVID_data$PHYS1F == "(1) Yes", 1, ifelse(COVID_data$PHYS1F == "(77) Not sure", 77, 98)))
4. Removed the unnecessary text (the numeric codes in parentheses) from the original response labels:
    COVID_data$PHYS1A <- ifelse(COVID_data$PHYS1A == "(2) No", "No", ifelse(COVID_data$PHYS1A == "(1) Yes", "Yes", ifelse(COVID_data$PHYS1A == "(77) Not sure", "Not sure", "Skipped on web")))
    COVID_data$PHYS1B <- ifelse(COVID_data$PHYS1B == "(2) No", "No", ifelse(COVID_data$PHYS1B == "(1) Yes", "Yes", ifelse(COVID_data$PHYS1B == "(77) Not sure", "Not sure", "Skipped on web")))
    COVID_data$PHYS1C <- ifelse(COVID_data$PHYS1C == "(2) No", "No", ifelse(COVID_data$PHYS1C == "(1) Yes", "Yes", ifelse(COVID_data$PHYS1C == "(77) Not sure", "Not sure", "Skipped on web")))
    COVID_data$PHYS1D <- ifelse(COVID_data$PHYS1D == "(2) No", "No", ifelse(COVID_data$PHYS1D == "(1) Yes", "Yes", ifelse(COVID_data$PHYS1D == "(77) Not sure", "Not sure", "Skipped on web")))
    COVID_data$PHYS1F <- ifelse(COVID_data$PHYS1F == "(2) No", "No", ifelse(COVID_data$PHYS1F == "(1) Yes", "Yes", ifelse(COVID_data$PHYS1F == "(77) Not sure", "Not sure", "Skipped on web")))
5. Fever variable (PHYS1A): frequency table and bar plot
    # frequency table for fever
    Freq(COVID_data$PHYS1A)
    # bar plot for fever
    ggplot(COVID_data, aes(x = PHYS1A, fill = PHYS1A)) + geom_bar()
12.2% of patients experienced a fever.
6. Chills variable (PHYS1B): frequency table and bar plot
    # frequency table for chills
    Freq(COVID_data$PHYS1B)
    # bar plot for chills
    ggplot(COVID_data, aes(x = PHYS1B, fill = PHYS1B)) + geom_bar()
From the frequency table and bar plot above, only 11.9% of patients had chills; the majority did not.
7. Runny nose variable (PHYS1C): frequency table and bar plot
    # frequency table for runny nose
    Freq(COVID_data$PHYS1C)
    # bar plot for runny nose
    ggplot(COVID_data, aes(x = PHYS1C, fill = PHYS1C)) + geom_bar()
From the frequency table and bar plot above, only 12.1% of patients had a runny nose; the majority did not.
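Steps 3 and 4 repeat the same nested ifelse() mapping once per symptom column. A minimal sketch of doing both recodings through one lookup table is shown here, assuming the raw labels are exactly "(1) Yes", "(2) No" and "(77) Not sure" and that every other answer counts as skipped (code 98). The helper `recode_symptom` and the `toy_data` data frame are hypothetical, and the toy values are made up for illustration; they are not from covid_data.csv. The last lines compute the kind of percentage quoted in steps 5 to 7 with base R.

```r
# Toy stand-in for covid_data.csv; these values are made up for illustration.
toy_data <- data.frame(
  PHYS1A = c("(1) Yes", "(2) No", "(2) No", "(77) Not sure", "(98) SKIPPED ON WEB"),
  PHYS1B = c("(2) No", "(2) No", "(1) Yes", "(2) No", "(2) No"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the numeric code (step 3) and the cleaned
# text label (step 4) for each raw survey response.
recode_symptom <- function(x) {
  x <- as.character(x)  # guards against factor columns from older read.csv()
  codes  <- c("(1) Yes" = 1, "(2) No" = 2, "(77) Not sure" = 77)
  labels <- c("(1) Yes" = "Yes", "(2) No" = "No", "(77) Not sure" = "Not sure")
  code  <- unname(codes[x])   # NA where the response is not in the lookup
  label <- unname(labels[x])
  # Any other response is treated as skipped, like the final "else" branch
  # of the nested ifelse() calls. Unlike ifelse(), a missing (NA) response
  # is also coded as skipped here.
  code[is.na(code)]   <- 98
  label[is.na(label)] <- "Skipped on web"
  list(code = code, label = label)
}

# New numeric column name -> original survey column
symptoms <- c(fever = "PHYS1A", chills = "PHYS1B")

for (new_name in names(symptoms)) {
  col <- symptoms[[new_name]]
  recoded <- recode_symptom(toy_data[[col]])
  toy_data[[new_name]] <- recoded$code   # numeric codes, as in step 3
  toy_data[[col]]      <- recoded$label  # readable labels, as in step 4
}

# Percentage of respondents giving each answer, rounded to one decimal
fever_pct <- round(100 * prop.table(table(toy_data$PHYS1A)), 1)
print(fever_pct)
```

Adding a symptom then only means adding one entry to `symptoms` (for example `cough = "PHYS1F"`), instead of copying two more ifelse() lines.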
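GP.G asks for a bivariate graph of an explanatory variable against the outcome. One possible sketch, assuming a categorical explanatory variable, is shown below. The column `income_group` is hypothetical: the subset selected in step 2 contains only the symptom columns, so an income or density variable would have to be kept in that subset first. All values are made up for illustration, and the sketch assumes ggplot2 is installed, as in step 1.

```r
library(ggplot2)

# Toy data: `income_group` is a hypothetical explanatory variable and PHYS1A
# holds cleaned fever labels like those from step 4 (made-up values).
toy <- data.frame(
  income_group = c("Lower", "Lower", "Lower", "Lower",
                   "Higher", "Higher", "Higher", "Higher"),
  PHYS1A       = c("Yes", "No", "Yes", "No",
                   "No", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Row proportions: share of each fever answer within each income group
fever_by_income <- prop.table(table(toy$income_group, toy$PHYS1A), margin = 1)
print(round(100 * fever_by_income, 1))

# Stacked bars scaled to 100% so groups of different sizes compare fairly
p <- ggplot(toy, aes(x = income_group, fill = PHYS1A)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = function(y) paste0(100 * y, "%")) +
  labs(x = "Income group (explanatory variable)",
       y = "Percentage of respondents",
       fill = "Reported fever")
print(p)
```

Using `position = "fill"` puts every group on the same 0 to 100% scale, which matches the question of whether lower-income groups report a symptom at a higher rate, rather than simply having more respondents.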