This module: Last module we explored data visualization in R using the ggplot2 package. This module continues to use ggplot, together with a companion package called ggmap. This package enhances the...

I need help with R programming


This module: Last module we explored data visualization in R using the ggplot2 package. This module continues to use ggplot, together with a companion package called ggmap. This package enhances the capabilities of ggplot by adding the capability to draw geographic outlines (polygons), shading, labeling, and other map markings. In addition, we will merge datasets using the built-in merge( ) function, which provides a similar capability to a JOIN in SQL. Many analytical strategies require joining data from different sources based on a “key” – a field that two datasets have in common. Please make sure you have included an attribution statement (see syllabus if you have questions).

Step 1: Load the population data
A. The following lines of code will help you read a json file into an R dataframe. Examine the resulting pop dataframe with View() and add comments explaining what each column contains.
    library(jsonlite)
    url="https://ist387.s3.us-east-2.amazonaws.com/data/cities.json"
    pop <- jsonlite::fromJSON(url)
B. Calculate the average population in the dataframe. Why is using mean() directly not working? Find a way to correct the data type of this variable so you can calculate the average.
C. What is the population of the smallest city in the dataframe? Which state is it in?

Step 2: Merge the population data with the state name data
D) Read in the state name .csv file from the URL below into a dataframe named abbr – make sure to use the read_csv() function from the tidyverse package: https://ist387.s3.us-east-2.amazonaws.com/data/states.csv
E) To successfully merge the dataframe pop with the abbr dataframe, we need to identify a column they have in common which will serve as the “key” to merge on. One column both dataframes have is the state column. The only problem is the slight column name discrepancy – in pop, the column is called “state” and in abbr – “State.” These names need to be reconciled for the merge() function to work. Find a way to rename abbr’s “State” to match the state column in pop.
F) Merge the two dataframes (using the ‘state’ attribute from both dataframes), storing the resulting dataframe in dfNew.
G) Review the structure of dfNew and explain the attributes in that dataframe.

Step 3: Visualize the data
H) Plot points (on top of a map of the US) for each city (don’t forget to library ggplot2 and ggmap). Have the color represent the population.
I) Add a block comment that criticizes the resulting map. It’s not very good.

Step 4: Use aggregate() to make a dataframe of state-by-state population
Run the following lines of code to create a new data frame:
    dfSimple = aggregate(dfNew$population, by=list(dfNew$state_name), FUN=sum)
    dfSimple$name <- dfSimple$Group.1
    dfSimple$Group.1 <- NULL
    dfSimple$statepop <- dfSimple$x
    dfSimple$x <- NULL
J) Add a comment describing what each line of code does. Make sure to describe how many rows there are in dfSimple (and why there are that many rows).
K) Name the most and least populous states in dfSimple and show the code you used to determine them.

Step 5: Use ggplot and ggmap to shade a map of the U.S. with state population
L) Copy the ggplot code from Step 3. In the initial ggplot statement, you will need to use your new dataframe, so substitute dfSimple in place of dfNew. Additionally, instead of using geom_point to plot points, use this aesthetic to fill the polygons with a color for each state. Make sure to expand the limits correctly and that you have used coord_map appropriately.

The chapter on linear models (“Lining Up Our Models”) introduces linear predictive modeling using the tool known as multiple regression. The term “multiple regression” has an odd history, dating back to an early scientific observation of a phenomenon called “regression to the mean.” These days, multiple regression is just an interesting name for using linear modeling to assess the connection between one or more predictor variables and an outcome variable. In this exercise, you will predict Ozone air levels from three predictors. Please make sure you have included an attribution statement (see syllabus if you have questions).

A. We will be using the airquality data set available in R. Copy it into a dataframe called air and use the appropriate functions to summarize the data.
B. In the analysis that follows, Ozone will be considered as the outcome variable, and Solar.R, Wind, and Temp as the predictors. Add a comment to briefly explain the outcome and predictor variables in the dataframe using airquality.
C. Inspect the outcome and predictor variables – are there any missing values? Show the code you used to check for that.
D. Use the na_interpolation() function from the imputeTS package from HW 6 to fill in the missing values in each of the 4 columns. Make sure there are no more missing values using the commands from Step C.
E. Create 3 bivariate scatterplots (X-Y) plots for each of the predictors with the outcome. Hint: In each case, put Ozone on the Y-axis, and a predictor on the X-axis. Add a comment to each, describing the plot and explaining whether there appears to be a linear relationship between the outcome variable and the respective predictor.
F. Next, create a simple regression model predicting Ozone based on Wind. Refer to page 202 in the text for syntax and explanations of the lm( ) command. In a comment, report the coefficient (aka slope or beta weight) of Wind in the regression output and, if it is statistically significant, interpret it with respect to Ozone. Report the adjusted R-squared of the model and try to explain what it means.
G. Create a multiple regression model predicting Ozone based on Solar.R, Wind, and Temp. Make sure to include all three predictors in one model – not three different models each with one predictor.
H. Report the adjusted R-squared in a comment – how does it compare to the adjusted R-squared from Step F? Is this better or worse? Which of the predictors are statistically significant in the model? In a comment, report the coefficient of each predictor that is statistically significant. Do not report the coefficients for predictors that are not significant.
I. Create a one-row data frame like this:
    predDF <- data.frame(Solar.R=290, Wind=13, Temp=61)
and use it with the predict( ) function to predict the expected value of Ozone.
J. Create an additional multiple regression model, with Temp as the outcome variable, and the other 3 variables as the predictors. Review the quality of the model by commenting on its adjusted R-squared.

Association mining can be applied to many data problems beyond the well-known example of finding relationships between different products in customer shopping data. In this homework assignment, we will explore real data from the banking sector and look for patterns associated with the likelihood of responding positively to a direct marketing campaign and signing up for a term deposit with the bank (stored in the variable “y”). You can find out more about the variables in this dataset here: https://archive.ics.uci.edu/ml/datasets/bank+marketing
Please make sure you have included an attribution statement (see syllabus if you have questions).

Part 1: Explore Data Set
A. Copy the contents of the following URL to a dataframe called bank: https://ist387.s3.us-east-2.amazonaws.com/data/bank-full.csv
Hint: Even though this is a .csv file, chances are R won’t be able to read it in correctly using the read_csv() function. If you take a closer look at the contents of the URL file, you may notice each field is separated by a semicolon (;) rather than a comma. In situations like this, consider using something like this:
    bank <- read.table(url, sep=";", header=TRUE)
Make sure there are 41,188 rows and 21 columns in your bank df.
B. Next, we will focus on some key factor variables from the dataset, and convert a few numeric ones to factor variables. Execute the following commands and write a comment describing how the conversion for each numeric variable works and what the variables in the resulting dataframe are.
    bank_new <- data.frame(job=bank$job, marital=bank$marital, housing_loan=bank$housing, young=…>1)), contacted_before_this_campaign=as.factor((bank$previous<0)), success=(bank$y))
C. Count the number of successful term deposit sign-ups, using the table( ) command on the success variable.
D. Express the results of problem C as percentages by sending the results of the table( ) command into the prop.table( ) command.
E. Using the same techniques, show the percentages for the marital and housing_loan variables as well.

Part 2: Coerce the data frame into transactions
F. Install and library two packages: arules and arulesViz.
G. Coerce the bank_new data frame into a sparse transactions matrix called bankx.
H. Use the itemFrequency( ) and itemFrequencyPlot( ) commands to explore the contents of bankx. What do you see?
I. This is a fairly large dataset, so we will explore only the first 10 observations in the bankx transaction matrix:
    inspect(bankx[1:10])
Explain the difference between bank_new and bankx in a block comment.

Part 3: Use arules to discover patterns
Support is the proportion of times that a particular set of items occurs relative to the whole dataset. Confidence is the proportion of times that the consequent occurs when the antecedent is present.
J. Use apriori to generate a set of rules with support over 0.005 and confidence over 0.3, and trying to predict who successfully signed up for a term deposit. Hint: You need to define the right-hand side rule (rhs).
K. Use inspect() to review the ruleset.
L. Use the output of inspect( ) or inspectDT( ) and describe any 2 rules the algorithm found.
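
The definitions of support and confidence given in Part 3 can be checked by hand on a tiny example. The sketch below uses base R only and a made-up five-row table (the married and success columns are hypothetical stand-ins, not the bank data) to compute both measures for the rule {married} => {success}:

```r
# Toy transactions: one row per customer, TRUE means the item is present.
# These values are invented for illustration only.
tx <- data.frame(
  married = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  success = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Rows where both the antecedent (married) and the consequent (success) occur.
both <- tx$married & tx$success

# Support: share of ALL rows that contain both items.
support <- mean(both)

# Confidence: share of rows with the antecedent that also have the consequent.
confidence <- sum(both) / sum(tx$married)

print(c(support = support, confidence = confidence))
```

A rule miner such as apriori applies the same two measures to every candidate rule and keeps only the rules that clear the chosen thresholds.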
Answered Same Day, Apr 20, 2021

Subhanbasha answered on Apr 21 2021
Assignment 7
Step 1:
A).
Here we read the data into R using the URL. The columns are as follows. The first column is the city, and the second column is the population growth of the corresponding city between 2000 and 2013. The next two attributes are the longitude and latitude of the cities in the US. The fifth column is the population of the corresponding city, and rank is the rank assigned based on the population growth rate. The last attribute is the state of the city.
B).
We cannot compute the mean of the population column directly because the column is stored as character data, and mean() in R does not work on character data. We therefore converted the population column to numeric, after which mean() works.
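
A minimal sketch of this conversion, using a made-up data frame in place of the real pop data (the city names and values are hypothetical) and assuming the population column arrives as character strings:

```r
# Toy stand-in for the pop data frame; values are invented for illustration.
pop <- data.frame(
  city = c("Alpha", "Beta", "Gamma"),
  population = c("1000", "2500", "4000"),
  stringsAsFactors = FALSE
)

# mean() on a character vector returns NA (with a warning).
direct_mean <- suppressWarnings(mean(pop$population))

# Convert the column to numeric; after that the mean can be computed.
pop$population <- as.numeric(pop$population)
avg_population <- mean(pop$population)
print(avg_population)
```

Checking class(pop$population) before and after the conversion is a quick way to confirm that the type really changed.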
C).
The lowest population among the US cities in the data is 36877, which belongs to Panama City, so Panama City is the smallest city in the dataframe.
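
One way to look up the smallest city together with its state is which.min(), which returns the row index of the minimum. The sketch below runs on a made-up data frame (hypothetical names and values), assuming population has already been converted to numeric:

```r
# Toy stand-in for pop; names and values are invented for illustration.
pop <- data.frame(
  city = c("Alpha", "Beta", "Gamma"),
  state = c("State A", "State B", "State A"),
  population = c(4000, 1000, 2500),
  stringsAsFactors = FALSE
)

# which.min() gives the index of the smallest population;
# indexing by that row returns the city and its state together.
smallest <- pop[which.min(pop$population), ]
print(smallest[, c("city", "state", "population")])
```

Selecting the whole row rather than calling min() alone answers both parts of the question (the population and the state) in one step.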
Step 2:
D).
Here we used the read_csv() function to read the data into R.
E).
The states data has a column named State, but in the population data the column is named state, so the two data frames cannot be merged on this column while the names differ. We therefore renamed the column to state using colnames() in R, after which the merge worked.
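
A minimal sketch of the renaming step with colnames(), on a made-up abbr data frame. The Abbreviation column and all values are assumptions for illustration; the real states.csv may be laid out differently:

```r
# Toy stand-in for abbr; the second column name and all values are hypothetical.
abbr <- data.frame(
  State = c("State A", "State B"),
  Abbreviation = c("SA", "SB"),
  stringsAsFactors = FALSE
)

# Rename only the column called "State", leaving other columns untouched.
colnames(abbr)[colnames(abbr) == "State"] <- "state"
print(colnames(abbr))
```

Matching on the name instead of a fixed position keeps the rename correct even if the column order in the file changes.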
F).
We merged the two data frames and saved the resulting data frame as dfNew.
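
The merge can be sketched on two made-up data frames that share a state key. All names and values below are hypothetical, and state_abbr is an assumed column name:

```r
# Toy stand-ins for pop and abbr; values are invented for illustration.
pop <- data.frame(
  city = c("Alpha", "Beta", "Gamma"),
  state = c("State A", "State B", "State A"),
  population = c(4000, 1000, 2500),
  stringsAsFactors = FALSE
)
abbr <- data.frame(
  state = c("State A", "State B"),
  state_abbr = c("SA", "SB"),
  stringsAsFactors = FALSE
)

# merge() joins rows whose key values match; by default only
# rows with a match in both data frames are kept.
dfNew <- merge(pop, abbr, by = "state")
str(dfNew)
```

In the result the key column comes first, followed by the remaining columns of the first data frame and then those of the second.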
G).
dfNew is a data frame. Its first column is the state, the second column is the city, the third attribute is the percentage population growth of the corresponding city between 2000 and 2013, the fourth and fifth attributes are the longitude and latitude of the cities, sixth...