I need help with R programming
This module: Last module we explored data visualization in R using the ggplot2 package. This module continues to use ggplot, together with a companion package called ggmap. This package enhances the capabilities of ggplot by adding the capability to draw geographic outlines (polygons), shading, labeling, and other map markings. In addition, we will merge datasets using the built-in merge() function, which provides a similar capability to a JOIN in SQL. Many analytical strategies require joining data from different sources based on a “key”: a field that two datasets have in common. Please make sure you have included an attribution statement (see syllabus if you have questions).

Step 1: Load the population data

A. The following lines of code will help you read a JSON file into an R dataframe. Examine the resulting pop dataframe with View() and add comments explaining what each column contains.

library(jsonlite)
url="https://ist387.s3.us-east-2.amazonaws.com/data/cities.json"
pop <- jsonlite::fromJSON(url)

B. Calculate the average population in the dataframe. Why is using mean() directly not working? Find a way to correct the data type of this variable so you can calculate the average.

C. What is the population of the smallest city in the dataframe? Which state is it in?

Step 2: Merge the population data with the state name data

D) Read in the state name .csv file from the URL below into a dataframe named abbr. Make sure to use the read_csv() function from the tidyverse package: https://ist387.s3.us-east-2.amazonaws.com/data/states.csv

E) To successfully merge the dataframe pop with the abbr dataframe, we need to identify a column they have
in common which will serve as the “key” to merge on. One column both dataframes have is the state column. The only problem is the slight column name discrepancy: in pop, the column is called “state” and in abbr, “State.” These names need to be reconciled for the merge() function to work. Find a way to rename abbr’s “State” to match the state column in pop.

F) Merge the two dataframes (using the ‘state’ attribute from both dataframes), storing the resulting dataframe in dfNew.

G) Review the structure of dfNew and explain the attributes in that dataframe.

Step 3: Visualize the data

H) Plot points (on top of a map of the US) for each city (don’t forget to library ggplot2 and ggmap). Have the color represent the population.

I) Add a block comment that criticizes the resulting map. It’s not very good.

Step 4: Use aggregate() to make a dataframe of state-by-state population

Run the following lines of code to create a new data frame:

dfsimple = aggregate(dfNew$population, by=list(dfNew$state_name), FUN=sum)
dfsimple$name <- dfsimple$Group.1
dfsimple$Group.1 <- NULL
dfsimple$statepop <- dfsimple$x
dfsimple$x <- NULL

J) Add a comment describing what each line of code does. Make sure to describe how many rows there are in dfsimple (and why there are that many rows).

K) Name the most and least populous states in dfsimple and
show the code you used to determine them.

Step 5: Use ggplot and ggmap to shade a map of the U.S. with state population

L) Copy the ggplot code from Step 3. In the initial ggplot statement, you will need to use your new dataframe, so substitute dfsimple in place of dfNew. Additionally, instead of using geom_point to plot points, use this aesthetic to fill the polygons with a color for each state. Make sure to expand the limits correctly and that you have used coord_map appropriately.

The chapter on linear models (“Lining Up Our Models”) introduces linear predictive modeling using the tool known as multiple regression. The term “multiple regression” has an odd history, dating back to an early scientific observation of a phenomenon called “regression to the mean.” These days, multiple regression is just an interesting name for using linear modeling to assess the connection between one or more predictor variables and an outcome variable. In this exercise, you will predict Ozone air levels from three predictors. Please make sure you have included an attribution statement (see syllabus if you have questions).

A. We will be using the airquality data set available in R. Copy it into a dataframe called air and use the appropriate functions to summarize the data.

B. In the analysis that follows, Ozone will be considered
as the outcome variable, and Solar.R, Wind, and Temp as the predictors. Add a comment to briefly explain the outcome and predictor variables in the dataframe using airquality.

C. Inspect the outcome and predictor variables. Are there any missing values? Show the code you used to check for that.

D. Use the na_interpolation() function from the imputeTS package from HW 6 to fill in the missing values in each of the 4 columns. Make sure there are no more missing values using the commands from Step C.

E. Create 3 bivariate scatterplots (X-Y plots), one for each of the predictors with the outcome. Hint: In each case, put Ozone on the y-axis, and a predictor on the x-axis. Add a comment to each, describing the plot and explaining whether there appears to be a linear relationship between the outcome variable and the respective predictor.

F. Next, create a simple regression model predicting Ozone based on Wind. Refer to page 202 in the text for syntax and explanations of the lm() command. In a comment, report the coefficient (aka slope or beta weight) of Wind in the regression output and, if it is statistically significant, interpret it with respect to Ozone. Report the adjusted R-squared of the model and try to explain what it means.

G. Create a multiple regression model predicting Ozone based on Solar.R,
Wind, and Temp. Make sure to include all three predictors in one model, not three different models each with one predictor.

H. Report the adjusted R-squared in a comment. How does it compare to the adjusted R-squared from Step F? Is this better or worse? Which of the predictors are statistically significant in the model? In a comment, report the coefficient of each predictor that is statistically significant. Do not report the coefficients for predictors that are not significant.

I. Create a one-row data frame like this:

preddf <- data.frame(Solar.R=290, Wind=13, Temp=61)

and use it with the predict() function to predict the expected value of Ozone.

J. Create an additional multiple regression model, with Temp as the outcome variable, and the other 3 variables as the predictors. Review the quality of the model by commenting on its adjusted R-squared.

Association mining can be applied to many data problems beyond the well-known example of finding relationships between different products in customer shopping data. In this homework assignment, we will explore real data from the banking sector and look for patterns associated with the likelihood of responding positively to a direct marketing campaign and signing up for a term deposit with the bank (stored in the variable “y”). You can find out more about the variables in this
dataset here: https://archive.ics.uci.edu/ml/datasets/bank+marketing Please make sure you have included an attribution statement (see syllabus if you have questions).

Part 1: Explore data set

A. Copy the contents of the following URL to a dataframe called bank: https://ist387.s3.us-east-2.amazonaws.com/data/bank-full.csv

Hint: Even though this is a .csv file, chances are R won’t be able to read it in correctly using the read_csv() function. If you take a closer look at the contents of the URL file, you may notice each field is separated by a semicolon (;) rather than a comma. In situations like this, consider using something like this:

bank <- read.table(url, sep=";", header=TRUE)

Make sure there are 41,188 rows and 21 columns in your bank df.

B. Next, we will focus on some key factor variables from the dataset, and convert a few numeric ones to factor variables. Execute the following commands and write a comment describing how the conversion for each numeric variable works and what the variables in the resulting dataframe are.

bank_new <- data.frame(job=bank$job, marital=bank$marital, housing_loan=bank$housing, young=
1)), contacted_before_this_campaign=as.factor((bank$previous<0)), success=(bank$y))

C. Count the number of successful term deposit sign-ups, using the table() command on the success variable.

D. Express the results of problem C as percentages by sending the results of the table() command into the prop.table() command.

E. Using the same techniques, show the percentages for the marital and housing_loan variables as well.

Part 2: Coerce the data frame into transactions

F. Install and library two packages: arules and arulesViz.

G. Coerce the bank_new data frame into a sparse transactions matrix called bankx.

H. Use the itemFrequency() and itemFrequencyPlot() commands to explore the contents of bankx. What do you see?

I. This is a fairly large dataset, so we will explore only the first 10 observations in the bankx transaction matrix:

inspect(bankx[1:10])

Explain the difference between bank_new and bankx in a block comment.

Part 3: Use arules to discover patterns

Support is the proportion of times that a particular set of items occurs relative to the whole dataset. Confidence is the proportion of times that the consequent occurs when the antecedent is present.

J. Use apriori to generate a set of rules with support over 0.005 and confidence over 0.3, trying to predict who successfully signed up for a term deposit. Hint: You need to define the right-hand side of the rule (rhs).

K. Use inspect() to review the ruleset.

L. Use the output of inspect() or inspectDT() and describe any 2 rules the algorithm found.
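
To make the definitions of support and confidence in Part 3 concrete, here is a minimal base-R sketch that computes both by hand for a single rule. The data frame `toy` and its eight rows are invented for illustration and are not taken from the bank dataset; only the column names `housing_loan` and `success` echo the assignment. In practice apriori() from arules would report these numbers, so this is only meant to show what they measure.

```r
# Toy data, invented for illustration (NOT the real bank data).
toy <- data.frame(
  housing_loan = c("yes", "yes", "no", "no", "yes", "no", "no", "yes"),
  success      = c("no",  "no",  "yes", "yes", "no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical rule under study: {housing_loan=no} => {success=yes}
antecedent <- toy$housing_loan == "no"   # left-hand side holds
consequent <- toy$success == "yes"       # right-hand side holds

# Support: share of ALL rows where antecedent and consequent occur together.
rule_support <- mean(antecedent & consequent)

# Confidence: among rows where the antecedent holds,
# the share where the consequent also holds.
rule_confidence <- sum(antecedent & consequent) / sum(antecedent)

print(rule_support)
print(rule_confidence)
```

With thresholds like those in step J (support over 0.005, confidence over 0.3), a rule is kept only if both of these quantities exceed the cutoffs.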
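
Going back to the first assignment (Steps 1B and 2E-F), the type conversion and key-based merge can be sketched on toy data in base R. Everything here is an assumption made for illustration: the data frames `pop_toy` and `abbr_toy`, their values, and the column names `city` and `State` are hypothetical stand-ins, and the idea that population arrives as text is only one common reason mean() fails. Check str(pop) and str(abbr) on the real data before borrowing any of this.

```r
# Hypothetical stand-ins for pop and abbr; all values are invented.
pop_toy <- data.frame(
  city       = c("Alpha", "Beta", "Gamma"),
  state      = c("AA", "BB", "AA"),
  population = c("1000", "250", "4000"),  # stored as text on purpose
  stringsAsFactors = FALSE
)
abbr_toy <- data.frame(
  State      = c("AA", "BB"),             # key column with a different name
  state_name = c("Aland", "Bland"),
  stringsAsFactors = FALSE
)

# mean() on a character column gives NA with a warning,
# so convert the column to numeric first.
pop_toy$population <- as.numeric(pop_toy$population)
avg_pop <- mean(pop_toy$population)

# Rename the key so both data frames share the column name "state".
names(abbr_toy)[names(abbr_toy) == "State"] <- "state"

# merge() on a shared key behaves like an inner JOIN in SQL by default.
merged_toy <- merge(pop_toy, abbr_toy, by = "state")

print(avg_pop)
print(merged_toy)
```

An alternative that avoids renaming is merge(pop_toy, abbr_toy, by.x = "state", by.y = "State"), which names the key column separately for each side.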