STAT 4410/8416 Homework 4

One attachment is the HW4 .Rmd file and the other is a PDF; please see the attachments.


lastName firstName
Due on Nov 8, 2019

1. Exploring XML data; In this problem we will read the xml data. For this we will obtain a xml data called olive oils from the link http://www.ggobi.org/book/data/olive.xml. Please follow the directions in each step and provide your codes and output.
a. Parse the xml data from the above link and store in a object called olive. Obtain the root of the xml file and display its name.
b. Examine the actual file by going to the link above and identify the path of categorical variables in the xml tree. Use that path to obtain the categorical variable names. Please keep the names, not nick names and store them in cvNames. Display cvNames.
c. Now examine the file by going to the link and identify the path of real variables in the xml tree. Use that path to obtain the real variable names. Please keep the names, not nick names and store them in rvNames. Display rvNames.
d. Notice the path for the data in xml file. Use that path to obtain the data and store the data in a data frame called oliveDat. Change the column names as you have obtained the column names. Display some data.
e. Generate a plot of your choice to display any feature of oliveDat data. Notice that the column names are different fatty acids. The values are % of fatty acids found in the Italian olive oils coming from different regions and areas.
f. Explain what these two lines of codes are doing.
r <- xmlRoot(olive)
xmlSApply(r[[1]][[2]], xmlGetAttr, "name")

2. Working with date-time data; The object myDate contains the date and time when this question was provided to you. Based on this object answer the following questions.
myDate <- "2019-10-30 19:50:21"
a. Convert myDate into a date-time object with Chicago time zone. Display the result.
b. Write your codes so that it displays the week day of myDate.
c. What weekday is it after exactly 100 years from myDate? Show your codes and the answer.
d. Add one month with myDate and display the resulting date time. Explain why the time zone has changed even though you did not ask for time zone change.
e. Suppose this homework is due on November 8, 2019 by 11.59pm. Compute and display how many minutes you got to complete this homework?

3. Data wrangling and dates; In this problem, we will be using the mdsr and Lahman packages.
a. Using the presidential dataset, show a simple table that displays the number of leap years that occurred during each president’s time in office. Please label the second “Bush” as “Bush2”.
b. Consider the Teams dataset from the Lahman package that provides a series of baseball statistics over a number of years. Note that the “H” column refers to number of home runs. The following outlines a procedure to follow to determine the number of home runs that occurred during each presidents’ (adjusted) time in office.
i. First, filter the Teams dataset to only include years between 1953 and 2016.
ii. Next, we will partition the rows of the presidential dataset by only considering the year of each president’s start and end dates with the conditions that 1) if a president’s term did not start in January, then we will not include that year in their time in office, and 2) if a president’s term ended in January, then that ending year will also not be included. For example, Johnson will be considered as having a starting year of 1964 and an ending year of 1968.
iii. Answer the question: which president had the most number of home runs occur during their term? Report this number.

4. Creating HTML page; In this problem we would like to create a basic HTML page. Please follow each of the steps below and finally submit your HTML file on Canvas. Please note that you don’t need to answer these questions here in the .Rmd file.
a. Open a notepad or any plain text editor. Write down some basic HTML codes as shown in online (year 2014) lecture 15, slide 6 and modify according to the following questions. Save the file as hw4.html and upload on Canvas as a separate file.
b. Write “What is data science?” in the first header tag <h1>.
c. Hw1 solution contains the answer of what is data science. The answer has three paragraphs. Write the three paragraphs of text about data science in three different paragraph tags <p>. You can copy the text from hw1 solution.
d. Write “What we learnt from hw1” in second heading under tag <h2>.
e. Copy all the points we learnt in hw1 solution. List all the points under ordered list tag <ol>. Notice that each item of the list should be inside list item tag <li>.
f. Now we want to make the text beautiful. For this we would write some CSS codes in between tag <style> under <head>. For this please refer to online (year 2014) lecture 15 slide 8. First change the fonts of the body tag to Helvetica Neue.
g. For the paragraph that contains the definition of data science, give an attribute id='dfn' and in CSS change the color of ‘dfn’ to white, background-color to olive and font to be bold.
h. For other paragraphs, give an attribute class='cls' and in CSS change the color of ‘cls’ to green.
i. Write CSS so that color of h1, h2 becomes orange.
j. Write JavaScript codes so that onClick on h1 header, it shows a message ‘Its about data science’.

5. Boston hubway data; This question will explore Boston hubway data. Please carefully answer each question below including your codes and results.
a. Obtain the compressed data, bicycle-rents.csv.zip, from Canvas and display few data rows.
b. For each day, count the number of bikes rented for that date and show the data in a time series plot.
c. Based on the rent date column, create two new columns weekDay and hourDay which represent week day name and hour of the day respectively. Store the data in myDat and display few records of the data. Hint: For weekday use function wday().
d. Summarize myDat by weekDay based on the number of rents for each weekDay and store the data in weekDat. Display some data.
e. Create a suitable plot of the data you stored in weekDat so that it displays number of bike rents for each week day.
f. Now we want to investigate what happens in each day. Summarize myDat again but this time by weekDay and hourDay and obtain the number of rents. Store the data in hourDat and display some data.
g. The dataframe hourDat is now ready for plotting. Generate line plots showing number of bike rents vs hour of the day and colored by weekDay.

6. Bonus for undergraduate (3 points) mandatory for graduate students: The following link contains the complete texts of Romeo and Juliet written by Shakespeare. Read the complete text and generate a plot similar to Romeo and Juliet case study in online (year 2014) lecture 13 (last plot). http://shakespeare.mit.edu/romeo_juliet/full.html

7. Bonus (2 points) question for all: In the United States, a Consumer Expenditure Survey (CE) is conducted each year to collect data on expenditures, income, and demographics. These data are available as public-use microdata (PUMD) files in the following link. Download the data for the year 2016 and explore. Provide some plots and numerical summary that creates some interest about this data. https://www.bls.gov/cex/pumd.htm

    Kshitij answered on Nov 13 2021
    STAT 4410/8416 Homework 4
    lastName firstName
    Due on Nov 8, 2019
    1. Exploring XML data; In this problem we will read the xml data. For this we will obtain
    a xml data called olive oils from the link http://www.ggobi.org/book/data/olive.xml.
    Please follow the directions in each step and provide your codes and output.
    a. Parse the xml data from the above link and store in a object called olive. Obtain the
    root of the xml file and display its name.
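    # XML supplies xmlParse()/xmlRoot(); xml2 supplies read_xml() and the xml_*() helpers
    library("XML")
    library("xml2")
    library("dplyr")
    library("ggplot2")
    olive <- xmlParse("http://www.ggobi.org/book/data/olive.xml")
    root <- xmlRoot(olive)
    xmlName(root)
    ## [1] "ggobidata"
    # Re-read the same file with xml2 so that the xml_find_all() calls in parts b to d work.
    # Note: this overwrites olive, so XML-package functions such as xmlRoot() no longer apply to it.
    olive <- read_xml("http://www.ggobi.org/book/data/olive.xml")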
    b. Examine the actual file by going to the link above and identify the path of categorical
    variables in the xml tree. Use that path to obtain the categorical variable names. Please
    keep the names, not nick names and store them in cvNames. Display cvNames.
    categoricalPath<-"//ggobidata/data/variables/categoricalvariable"

    colsc <- xml_find_all(olive, categoricalPath)
    cvNames<-xml_attr(colsc,"name")
    cvNames
    ## [1] "region" "area"
    c. Now examine the file by going to the link and identify the path of real variables in the
    xml tree. Use that path to obtain the real variable names. Please keep the names, not
    nick names and store them in rvNames. Display rvNames.
    realPath<-"//ggobidata/data/variables/realvariable"
    colsr <- xml_find_all(olive, realPath)
    rvNames<-xml_attr(colsr,"name")
    rvNames
    ## [1] "palmitic" "palmitoleic" "stearic" "oleic" "linoleic"
    ## [6] "linolenic" "arachidic" "eicosenoic"
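The XPath lookups in parts b and c can be tried without network access on a small stand-in document. The following is only a sketch: the inline XML is a hypothetical miniature that mimics the ggobidata/data/variables layout, not the real olive.xml, and the names toy and toyReal are made up for illustration.

```r
library(xml2)

# Hypothetical miniature of the olive.xml layout (assumption, not the real file)
toy <- read_xml(paste0(
  "<ggobidata><data><variables>",
  "<categoricalvariable name=\"region\" nickname=\"reg\"/>",
  "<realvariable name=\"oleic\" nickname=\"ole\"/>",
  "<realvariable name=\"stearic\" nickname=\"ste\"/>",
  "</variables></data></ggobidata>"
))

# Select every realvariable node by its absolute path, then read the name attribute
toyReal <- xml_attr(xml_find_all(toy, "/ggobidata/data/variables/realvariable"), "name")
print(toyReal)
```

Asking for the "name" attribute rather than "nickname" is what keeps the full variable names, as the question requires.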
    d. Notice the path for the data in xml file. Use that path to obtain the data and store the
    data in a data frame called oliveDat. Change the column names as you have obtained
    the column names. Display some data.
    records <- xml_find_all(olive, "//record")
    # Split the text of each record on runs of whitespace
    values <- strsplit(trimws(xml_text(records)), "\\s+")
    # One numeric row per record; entries that are not numbers become NA
    oliveDat <- as.data.frame(do.call(rbind, lapply(values, as.numeric)))
    colnames(oliveDat) <- c(cvNames, rvNames)
    head(oliveDat)
    ## region area palmitic palmitoleic stearic oleic linoleic linolenic
    ## 1 1 1 1075 75 226 7823 672 NA
    ## 2 1 1 1088 73 224 7709 781 31
    ## 3 1 1 911 54 246 8113 549 31
    ## 4 1 1 966 57 240 7952 619 50
    ## 5 1 1 1051 67 259 7771 672 50
    ## 6 1 1 911 49 268 7924 678 51
    ## arachidic eicosenoic
    ## 1 60 29
    ## 2 61 29
    ## 3 63 29
    ## 4 78 35
    ## 5 80 46
    ## 6 70 44
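The record-parsing step in part d can be illustrated on two made-up record strings. This is a minimal sketch in base R under stated assumptions: the strings, the four column names and the "na" missing-value marker are hypothetical stand-ins, not values taken from olive.xml.

```r
# Hypothetical record texts with ragged whitespace (not taken from the real file)
recs <- c(" 1 1 1075 75\n", "1 1 1088 na ")
cols <- c("region", "area", "palmitic", "palmitoleic")

# Trim the ends, then split on one or more whitespace characters
parts <- strsplit(trimws(recs), "\\s+")

# Convert each record to numbers; the non-numeric "na" becomes NA
rows <- lapply(parts, function(x) suppressWarnings(as.numeric(x)))

# Bind the rows into a matrix and attach the column names
toyDat <- setNames(as.data.frame(do.call(rbind, rows)), cols)
print(toyDat)
```

Binding all rows in one rbind() call avoids building a separate one-row data frame per record.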
    e. Generate a plot of your choice to display any feature of oliveDat data. Notice that the
    column names are different fatty acids. The values are % of fatty acids found in the
    Italian olive oils coming from different regions and areas.
    # Column 1 is region and column 2 is area; the titles refer to regions, so filter on column 1
    rg1 <- oliveDat[which(oliveDat[, 1] == 1), 3:10]
    # Mean of each fatty acid, stacked into long form (columns: values, ind)
    data1 <- stack(summarise_all(rg1, mean, na.rm = TRUE))
    bp <- ggplot(data1, aes(x = "", y = values / sum(values) * 100, fill = ind)) +
      geom_bar(width = 0.5, stat = "identity") +
      labs(y = "percentage", fill = "fats") +
      ggtitle("average percentage of various acids found in the Italian olive oils coming from region 1") +
      theme(plot.title = element_text(hjust = 0.25))
    bp
    rg2 <- oliveDat[which(oliveDat[, 1] == 2), 3:10]
    data2 <- stack(summarise_all(rg2, mean, na.rm = TRUE))
    bp2 <- ggplot(data2, aes(x = "", y = values / sum(values) * 100, fill = ind)) +
      geom_bar(width = 0.5, stat = "identity") +
      labs(y = "percentage", fill = "fats") +
      ggtitle("average percentage of various acids found in the Italian olive oils coming from region 2") +
      theme(plot.title = element_text(hjust = 0.25))
    bp2
    rg3 <- oliveDat[which(oliveDat[, 1] == 3), 3:10]
    data3 <- stack(summarise_all(rg3, mean, na.rm = TRUE))
    bp3 <- ggplot(data3, aes(x = "", y = values / sum(values) * 100, fill = ind)) +
      geom_bar(width = 0.5, stat = "identity") +
      labs(y = "percentage", fill = "fats") +
      ggtitle("average percentage of various acids found in the Italian olive oils coming from region 3") +
      theme(plot.title = element_text(hjust = 0.25))
    bp3
    f. Explain what these two lines of codes are doing.
    r <- xmlRoot(olive)
    xmlSApply(r[[1]][[2]], xmlGetAttr, "name")
    Answer: xmlRoot() returns the top-level node of the parsed document (here ggobidata) and
    stores it in r. r[[1]] is the first child of the root and r[[1]][[2]] is the second child of
    that node; given the paths used above (ggobidata/data/variables), this is presumably the
    variables node, whose children are the categorical and real variable definitions.
    xmlSApply() applies a function to each child of the given node, so calling it with
    xmlGetAttr and "name" returns the "name" attribute of every variable.
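The behaviour of these two lines can be checked on a small document whose structure is known. The sketch below assumes a hypothetical miniature layout in which a description node comes before variables, so that r[[1]][[2]] lands on variables; whether the real olive.xml is laid out the same way should be confirmed by opening the file.

```r
library(XML)

# Hypothetical miniature document (assumption: description precedes variables)
toyText <- paste0(
  "<ggobidata><data>",
  "<description>toy example</description>",
  "<variables>",
  "<categoricalvariable name=\"region\" nickname=\"reg\"/>",
  "<realvariable name=\"palmitic\" nickname=\"pal\"/>",
  "<realvariable name=\"oleic\" nickname=\"ole\"/>",
  "</variables>",
  "</data></ggobidata>"
)

toyDoc <- xmlParse(toyText, asText = TRUE)
r <- xmlRoot(toyDoc)             # top-level node: ggobidata
print(xmlName(r[[1]]))           # first child of the root
print(xmlName(r[[1]][[2]]))      # second child of that node

# Apply xmlGetAttr(node, "name") to every child of the selected node
toyNames <- xmlSApply(r[[1]][[2]], xmlGetAttr, "name")
print(toyNames)
```

The result is named by node type, so the values cover both categorical and real variables.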
    2. Working with date-time data; The object myDate contains the date and time when
    this question was provided to you. Based on this object answer the following
    questions.
    myDate <- "2019-10-30 19:50:21"
    a. Convert myDate into a date-time object with Chicago time zone. Display the result.
    library("lubridate")
    myDate<-ymd_hms(myDate,tz="America/Chicago")
    myDate
    ## [1] "2019-10-30 19:50:21 CDT"
    b. Write your codes so that it displays the week day of myDate.
    weekdays(myDate)
    ## [1] "Wednesday"
    wday(myDate)
    ## [1] 4
    c. What weekday is it after exactly 100 years from myDate? Show your codes and the
    answer.
    tempDate<-myDate
    year(tempDate)=year(myDate)+100
    weekdays(tempDate)
    ## [1] "Monday"
    wday(tempDate)
    ## [1] 2
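The Monday result can be cross-checked by counting days with base R dates. This sketch uses calendar dates only and ignores the time of day and time zone, which do not affect the weekday here. The variable names are illustrative.

```r
start <- as.Date("2019-10-30")
end <- as.Date("2119-10-30")

# Days between the two dates: 100 * 365 plus 24 leap days (2100 is not a leap year)
nDays <- as.integer(end - start)
print(nDays)

# The weekday moves forward by the remainder after dividing by 7
shift <- nDays %% 7
print(shift)

# POSIXlt weekdays run from 0 (Sunday) to 6 (Saturday)
startWday <- as.POSIXlt(start)$wday
endWday <- as.POSIXlt(end)$wday
print((startWday + shift) %% 7 == endWday)
```

Wednesday shifted forward by five days is Monday, which agrees with the weekdays() output above.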
    d. Add one month with myDate and display the resulting date time. Explain why the time
    zone has changed even though you did not ask for time zone change.
    tempDate<-myDate
    tempDate
    ## [1] "2019-10-30 19:50:21 CDT"...