This is an RStudio assignment with four different exercises. Thank you.
Exam 1
INFO 2950 - Spring 2023
https://info2950.infosci.cornell.edu/exam/exam-01.html

Important
This exam is due Monday, March 6 at 11:59pm ET. There will be a fifteen-minute grace period for students who wait until the last minute to submit on Gradescope. Any submissions received after 12:14am ET will receive an automatic 20% deduction. Submissions will not be accepted after 11:59pm ET on March 7.

Overview
The exam covers all techniques taught thus far in the class. It consists of four (4) data analysis exercises in R/RStudio and is similar in structure to a homework assignment.

Important
• See the course site for all the requirements in terms of academic integrity.
• You must submit a rendered PDF of your submission. The code must work in order to render. If you are unable to fully complete an exercise, partially complete submissions are highly encouraged. Better to earn some credit rather than none.
• If you have clarification questions as you complete the exam, email info2950@cornell.edu.

Getting started
• Go to the info2950-s23 organization on GitHub (https://github.coecis.cornell.edu/info2950-s23). Click on the repo with the prefix exam-01. It contains the starter documents you need to complete the exam.
• Clone the repo and start a new project in RStudio. See the Lab 0 instructions (https://info2950.infosci.cornell.edu/labs/lab-00.html#clone-the-repo-start-new-rstudio-project) for details on cloning a repo and starting a new R project.

Workflow + formatting
Make sure to
• Update author name on your document.
• Label all code chunks informatively and concisely.
• Follow the Tidyverse code style guidelines.
• Make at least 3 commits.
• It is assumed that all visualizations will follow best practices as learned in the class. This includes (but is not limited to):
  – Resize figures where needed, avoid tiny or huge plots.
  – Use informative labels for plot axes, titles, etc.
  – Adopt optimal color palettes when a variable is mapped to the color or fill aesthetic.
• Turn in an organized, well formatted document.

Packages
We'll use the tidyverse package for much of the data wrangling and visualization, the scales package for better formatting of labels on visualizations, and lubridate for working with date and time columns.

Part 1: Tidying messy data
data/wb contains a set of CSV files with data from World Bank Open Data (https://data.worldbank.org/). Each file contains data on a different country that is a current member of the UN Security Council (https://www.un.org/securitycouncil/content/current-members). All of the files have the exact same structure.

Exercise 1
For current UN Security Council members, how has infant mortality changed over time? Construct a faceted line chart to visualize the relevant data, then provide a brief (no more than one paragraph) answer to the question. Utilize your graph to support your answer.

Tip
• The data is split across 15 CSV files. Fortunately, as long as every file has the same structure (which they do), you can specify the file argument of read_csv() as a character vector containing the relative filepaths for each of the CSV files. Provide that character vector as the file argument to import all 15 files simultaneously.
For this exercise, that would be something like

    wb_files <- list.files(
      path = "data/wb",
      pattern = "*.csv",
      full.names = TRUE
    )
    wb_files

     [1] "data/wb/api_alb_ds2_en_csv_v2_4784591.csv"
     [2] "data/wb/api_are_ds2_en_csv_v2_4799288.csv"
     [3] "data/wb/api_bra_ds2_en_csv_v2_4782858.csv"
     [4] "data/wb/api_che_ds2_en_csv_v2_4771312.csv"
     [5] "data/wb/api_chn_ds2_en_csv_v2_4773379.csv"
     [6] "data/wb/api_ecu_ds2_en_csv_v2_4783022.csv"
     [7] "data/wb/api_fra_ds2_en_csv_v2_4782878.csv"
     [8] "data/wb/api_gab_ds2_en_csv_v2_4801532.csv"
     [9] "data/wb/api_gbr_ds2_en_csv_v2_4784641.csv"
    [10] "data/wb/api_gha_ds2_en_csv_v2_4783027.csv"
    [11] "data/wb/api_jpn_ds2_en_csv_v2_4775881.csv"
    [12] "data/wb/api_mlt_ds2_en_csv_v2_4782026.csv"
    [13] "data/wb/api_moz_ds2_en_csv_v2_4782930.csv"
    [14] "data/wb/api_rus_ds2_en_csv_v2_4783339.csv"
    [15] "data/wb/api_usa_ds2_en_csv_v2_4782210.csv"

• The source data files have an unusual structure. You may need to adjust your read_csv() parameters to successfully import them.
• The original data files are untidy. After successfully importing them, you will need to tidy them before you can construct a plot. Your tidied data frame should contain 930 rows.
• The specific variable you need to use is "Mortality rate, infant (per 1,000 live births)". Its code in the dataset is "SP.DYN.IMRT.IN".
• Ideally facets are arranged in a meaningful order. In the example below, the facets are ordered based on each country's first observed infant mortality rate (highest to lowest).
• The chart below is color-coded to distinguish permanent members from non-permanent members of the Security Council.
• Follow best practices in constructing your visualization.
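The import and tidy steps in the tips above can be sketched on toy data. This is a minimal illustration, not the exam solution: the toy file names, the values, and the assumption that exactly four metadata lines precede the header row are all hypothetical, so inspect one real file before choosing `skip`.

```r
library(tidyverse)

# Toy stand-ins for the real files (hypothetical names and values).
# Assumed layout: four metadata lines, then a header row with one
# column per year.
toy_dir <- file.path(tempdir(), "wb_toy")
dir.create(toy_dir, showWarnings = FALSE)

write_toy_file <- function(country, iso3, rates) {
  path <- file.path(toy_dir, paste0("api_", iso3, ".csv"))
  writeLines(
    c(
      '"Data Source","World Development Indicators"',
      '"Last Updated Date","2023-01-01"',
      '"Placeholder metadata line","x"',
      '"Placeholder metadata line","y"',
      '"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961"',
      sprintf(
        '"%s","%s","Mortality rate, infant (per 1,000 live births)","SP.DYN.IMRT.IN",%s,%s',
        country, toupper(iso3), rates[1], rates[2]
      ),
      sprintf(
        '"%s","%s","Population, total","SP.POP.TOTL",1000,1100',
        country, toupper(iso3)
      )
    ),
    path
  )
  path
}

toy_files <- c(
  write_toy_file("Albania", "alb", c(120.5, 115.2)),
  write_toy_file("Brazil", "bra", c(130.1, 128.4))
)

# A character vector of paths imports every file in one call;
# skip drops the assumed metadata lines above the header.
wb_raw <- read_csv(toy_files, skip = 4, show_col_types = FALSE)

# Keep the one indicator of interest, then move the year columns
# into rows so each row is one country-year observation.
infant_mortality <- wb_raw |>
  filter(`Indicator Code` == "SP.DYN.IMRT.IN") |>
  pivot_longer(
    cols = matches("^[0-9]{4}$"),
    names_to = "year",
    values_to = "mortality_rate",
    names_transform = list(year = as.integer)
  )
```

Comparing `nrow()` of the tidied result against the row count stated in the tips is a quick check that the filter and pivot kept the intended observations.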
The indicator is documented at https://data.worldbank.org/indicator/SP.DYN.IMRT.IN

Your visualization might look like this:

[Figure: "Infant mortality rates over time", subtitle "For current UN Security Council members". Faceted line chart with one panel per country (France, United Kingdom, Switzerland, Japan, Russian Federation, United States, China, Albania, Malta, Ghana, Ecuador, Gabon, Mozambique, United Arab Emirates, Brazil). x-axis: Year (1960 to 2020); y-axis: Mortality rate, infant (per 1,000 live births); color: Status (Non-permanent member, Permanent member). Source: The World Bank.]

Tip
Now is a good time to render, commit (with a descriptive and concise commit message), and push again. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

Part 2: Wrangling and visualizing messy(ish) data
The Supreme Court Database (http://scdb.wustl.edu/) contains detailed information on every published decision of the U.S. Supreme Court since its creation in 1791. It is perhaps the most utilized database in the study of judicial politics.

In the repository's data folder, you will find two data files:
1. scdb-case.csv
2. scdb-vote.csv

These contain the exact same data you would obtain if you downloaded the files from the original website, but reformatted to be stored as relational data files. That is, scdb-case.csv contains all case-level variables, whereas scdb-vote.csv contains all vote-level variables. The data is structured in a tidy fashion.
• scdb-case.csv contains one row for every case and one column for every variable
• scdb-vote.csv contains one row for every vote by a justice in every case and one column for every variable

The current dataset contains information on every case decided from the 1791-2021 terms.[1] There are several ID variables which can be used to join the data frames, specifically caseId, docketId, caseIssuesId, and term.
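As a rough sketch of how the ID variables connect the two files, the toy example below joins a vote-level frame to a case-level frame. The rows and values are hypothetical, and joining on `caseId` alone is an assumption made for the illustration; the real files may call for a different key or combination of keys.

```r
library(tidyverse)

# Toy stand-ins for scdb-case.csv and scdb-vote.csv (hypothetical values).
cases <- tribble(
  ~caseId,    ~term, ~majVotes, ~minVotes,
  "2020-001",  2020,         5,         4,
  "2020-002",  2020,         9,         0
)

votes <- tribble(
  ~caseId,    ~justiceName, ~direction,
  "2020-001", "JGRoberts",           1,
  "2020-001", "EKagan",              2,
  "2020-002", "JGRoberts",           2,
  "2020-002", "EKagan",              2
)

# Each vote picks up the variables of the case it belongs to, so the
# result keeps the vote-level unit of analysis (one row per vote).
votes_with_case <- left_join(votes, cases, by = "caseId")

# Votes with no matching case would signal a key problem; expect zero rows.
unmatched_votes <- anti_join(votes, cases, by = "caseId")
```

Checking that the joined frame has the same number of rows as the vote-level frame is a cheap way to confirm the join did not duplicate or drop observations.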
Variables you will want to familiarize yourself with include:
• dateDecision (http://scdb.wustl.edu/documentation.php?var=dateDecision)
• decisionType (http://scdb.wustl.edu/documentation.php?var=decisionType)
• direction (http://scdb.wustl.edu/documentation.php?var=direction)
• issueArea (http://scdb.wustl.edu/documentation.php?var=issueArea)
• justice (http://scdb.wustl.edu/documentation.php?var=justice)
• justiceName (http://scdb.wustl.edu/documentation.php?var=justiceName)
• majVotes (http://scdb.wustl.edu/documentation.php?var=majVotes)
• minVotes (http://scdb.wustl.edu/documentation.php?var=minVotes)
• term (http://scdb.wustl.edu/documentation.php?var=term)

Note
Each variable above is linked to the relevant documentation page in the online code book.

Once you import the data files, use your data wrangling and visualization skills to answer the following exercises.

[1] Terms run from October through June, so the 2021 term contains cases decided from October 2021 - September 2022.

Tip
Pay careful attention to the unit of analysis required to answer each question. Some questions only require case-level variables, others only require vote-level variables, and some may require combining the two data frames together. Be sure to choose an appropriate relational join function as necessary.

Exercise 2
How does the percentage of cases in each term that are decided by a one-vote margin (i.e. 5-4, 4-3, etc.) change over time? Generate an appropriate visualization, then provide a brief (no more than one paragraph) answer to the question. Utilize your graph to support your answer.

Your visualization could look like this:

[Figure: "Percent of U.S. Supreme Court cases decided by 1-vote margin". Line chart. x-axis: Term (1800 to 2000); y-axis: Percent of total cases decided (0% to 50%). Source: The Supreme Court Database.]

Tip
Once again, render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
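For the one-vote-margin question in Exercise 2, a minimal sketch on toy case-level data might look like the following. The five cases are made up, and it assumes a one-vote margin can be identified as `majVotes - minVotes == 1`; real data may also need missing vote counts handled first.

```r
library(tidyverse)

# Toy case-level data (hypothetical cases and vote counts).
cases <- tribble(
  ~caseId, ~term, ~majVotes, ~minVotes,
  "a",      1950,         5,         4,
  "b",      1950,         9,         0,
  "c",      1950,         6,         3,
  "d",      1951,         4,         3,
  "e",      1951,         5,         4
)

# The mean of a logical vector is the share of TRUE values, so this
# yields the proportion of each term's cases decided by one vote.
one_vote <- cases |>
  group_by(term) |>
  summarize(
    pct_one_vote = mean(majVotes - minVotes == 1),
    .groups = "drop"
  )

# Proportions stay on a 0-1 scale in the data; the axis labels
# format them as percentages.
one_vote_plot <- ggplot(one_vote, aes(x = term, y = pct_one_vote)) +
  geom_line() +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(
    x = "Term",
    y = "Percent of total cases decided",
    caption = "Toy data for illustration only"
  )
```

Keeping the values as proportions and formatting only the axis labels avoids multiplying by 100 in the data, which keeps later calculations on the same scale.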
Exercise 3
For justices currently serving on the Supreme Court, how often have they voted in the conservative direction in cases involving criminal procedure, civil rights, economic activity, and federal taxation? Generate an appropriate visualization, then provide a brief (no more than one paragraph) answer to the question. Utilize your graph to support your answer.

Tip
• The Supreme Court's website maintains a list of active members of the Court (https://www.supremecourt.gov/about/biographies.aspx). Retired justices should not be included in your analysis.
• Make sure to organize the resulting graph by justice in descending order of seniority. Seniority is based on when a justice is appointed to the Court, so the justice who has served the longest is the most "senior" justice.
• Note that the Chief Justice is always considered the most senior member of the Court, regardless of appointment date.

Your visualization might look like one of these:

[Figure, option 1: "Percent of cases decided in a conservative direction", subtitle "U.S. Supreme Court". One panel per justice (BMKavanaugh, ACBarrett, SSotomayor, EKagan, NMGorsuch, JGRoberts, CThomas, SAAlito). y-axis: issue area (Federal Taxation, Economic Activity, Civil Rights, Criminal Procedure); x-axis: Percent of votes cast (0% to 100%). Source: The Supreme Court Database.]

[Figure, option 2: same title, subtitle, and source. One panel per issue area (Economic Activity, Federal Taxation, Criminal Procedure, Civil Rights). y-axis: justice (ACBarrett, BMKavanaugh, NMGorsuch, EKagan, SSotomayor, SAAlito, CThomas, JGRoberts); x-axis: Percent of votes cast (0% to 100%).]

Tip
Once again, render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
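The seniority-ordering tip in Exercise 3 comes down to storing the justice as a factor with hand-set levels. The sketch below uses toy vote-level data that is assumed to be already joined to the case-level issue area. The coding `direction == 1` for a conservative vote and the text labels for `issueArea` are assumptions for the illustration; confirm both against the code book and the actual files.

```r
library(tidyverse)

# Toy vote-level data (hypothetical votes).
# Assumed coding: direction 1 = conservative, 2 = liberal.
votes <- tribble(
  ~justiceName, ~issueArea,           ~direction,
  "JGRoberts",  "Criminal Procedure",          1,
  "JGRoberts",  "Criminal Procedure",          2,
  "CThomas",    "Criminal Procedure",          1,
  "CThomas",    "Criminal Procedure",          1,
  "EKagan",     "Criminal Procedure",          2,
  "EKagan",     "Criminal Procedure",          1,
  "EKagan",     "Criminal Procedure",          2,
  "EKagan",     "Criminal Procedure",          2
)

# Hand-written order for the three toy justices: the Chief Justice
# first, then by date of appointment.
seniority <- c("JGRoberts", "CThomas", "EKagan")

pct_conservative <- votes |>
  filter(justiceName %in% seniority, !is.na(direction)) |>
  # Factor levels, not alphabetical order, control how ggplot2
  # arranges axes and facets.
  mutate(justiceName = factor(justiceName, levels = seniority)) |>
  group_by(justiceName, issueArea) |>
  summarize(pct = mean(direction == 1), .groups = "drop")
```

Because the ordering lives in the factor levels, the same summary can feed either layout: facets by justice or facets by issue area.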
Exercise 4
In each term, how many of the term's decisions (decided after oral arguments) were announced in a given month? Generate an appropriate visualization, then provide a brief (no more than one paragraph) answer to the question. Utilize your graph to support your answer.

Tip
• Most, but not all, of the Court's decisions are published following a set of oral arguments. One of the variables in the dataset indicates how the Court arrived at its decision. Any case which is explicitly labeled as "orally argued" should be included on this graph.
• The Supreme Court's calendar runs on the federal government's fiscal year (https://en.wikipedia.org/wiki/fiscal_year#federal_government). That means the first month of the Court's term is October, running through September of the following calendar year.

A plot similar to the one below would be ideal:

[Figure: "Number of decisions announced post-oral arguments per month, by term", subtitle "U.S. Supreme Court". y-axis: month, with categories running from October through September; x-axis: Number of decisions announced in a term-month (0 to 80). Source: The Supreme Court Database.]

Tip
Render, commit, and push one last time. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

Wrap up

Submission
• Go to http://www.gradescope.com and click Log in in the top right corner.
• Click School Credentials → Cornell University NetID and log in using your NetID credentials.
• Click on your INFO 2950 course.
• Click on the assignment, and you'll be prompted to submit it.
• Mark all the pages associated with each exercise. All the pages of your homework should be associated with at least one question (i.e., should be "checked").
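For Exercise 4's fiscal-year tip, one way to put October first is to build the month as a factor whose levels start at October. This sketch uses toy data: the ISO date strings and the text labels in `decisionType` are assumptions, so check how both columns are actually stored (the real file may use numeric codes and a different date format, which would need a different lubridate parser).

```r
library(tidyverse)
library(lubridate)

# Toy case-level data (hypothetical dates and labels).
cases <- tribble(
  ~term, ~dateDecision, ~decisionType,
   2020, "2020-10-13",  "orally argued",
   2020, "2021-06-21",  "orally argued",
   2020, "2021-06-28",  "orally argued",
   2020, "2021-01-11",  "not orally argued",
   2021, "2022-06-24",  "orally argued"
)

# Month names in term order: October through September.
term_months <- c(month.name[10:12], month.name[1:9])

decisions_by_month <- cases |>
  filter(decisionType == "orally argued") |>
  mutate(
    # month() returns 1-12; indexing month.name avoids locale-dependent labels.
    month = factor(month.name[month(ymd(dateDecision))], levels = term_months)
  ) |>
  count(term, month)
```

With the factor levels in term order, a plot of `decisions_by_month` lists October first without any manual axis reordering.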
Grading
• Exercise 1: 15 points
• Exercise 2: 10 points
• Exercise 3: 10 points
• Exercise 4: 15 points
• Total: 50 points
order.="" in="" the="" example="" below,="" the="" facets="" are="" ordered="" based="" on="" each="" country’s="" first="" observed="" child="" mortality="" rate="" (highest="" to="" lowest).="" •="" the="" chart="" below="" is="" color-coded="" to="" distinguish="" permanent="" members="" from="" non-="" permanent="" members="" of="" the="" security="" council.="" •="" follow="" best="" practices="" in="" constructing="" your="" visualization.="" 3="" https://data.worldbank.org/indicator/sp.dyn.imrt.in="" https://data.worldbank.org/indicator/sp.dyn.imrt.in="" your="" visualization="" might="" look="" like="" this:="" france="" united="" kingdom="" switzerland="" japan="" russian="" federation="" united="" states="" china="" albania="" malta="" ghana="" ecuador="" gabon="" mozambique="" united="" arab="" emirates="" brazil="" 1960="" 1980="" 2000="" 20201960="" 1980="" 2000="" 20201960="" 1980="" 2000="" 2020="" 0="" 50="" 100="" 150="" 0="" 50="" 100="" 150="" 0="" 50="" 100="" 150="" 0="" 50="" 100="" 150="" 0="" 50="" 100="" 150="" year="" m="" or="" ta="" lit="" y="" ra="" te="" ,="" i="" nf="" an="" t="" (="" pe="" r="" 1,="" 00="" 0="" liv="" e="" bi="" rt="" hs="" )="" status="" non−permanent="" member="" permanent="" member="" for="" current="" un="" security="" council="" members="" infant="" mortality="" rates="" over="" time="" source:="" the="" world="" bank="" tip="" now="" is="" a="" good="" time="" to="" render,="" commit="" (with="" a="" descriptive="" and="" concise="" commit="" message),="" and="" push="" again.="" make="" sure="" that="" you="" commit="" and="" push="" all="" changed="" documents="" and="" your="" git="" pane="" is="" completely="" empty="" before="" proceeding.="" part="" 2:="" wrangling="" and="" visualizing="" messy(ish)="" data="" the="" supreme="" court="" database="" contains="" detailed="" information="" of="" every="" published="" decision="" of="" the="" u.s.="" supreme="" court="" since="" its="" creation="" in="" 
1791.="" it="" is="" perhaps="" the="" most="" utilized="" database="" in="" the="" 4="" http://scdb.wustl.edu/="" study="" of="" judicial="" politics.="" in="" the="" repository’s="" data="" folder,="" you="" will="" find="" two="" data="" files:="" 1.="" scdb-case.csv="" 2.="" scdb-vote.csv="" these="" contain="" the="" exact="" same="" data="" you="" would="" obtain="" if="" you="" downloaded="" the="" files="" from="" the="" original="" website,="" but="" reformatted="" to="" be="" stored="" as="" relational="" data="" files.="" that="" is,="" scdb-case.csv="" contains="" all="" case-level="" variables,="" whereas="" scdb-vote.csv="" contains="" all="" vote-level="" variables.="" the="" data="" is="" structured="" in="" a="" tidy="" fashion.="" •="" scdb-case.csv="" contains="" one="" row="" for="" every="" case="" and="" one="" column="" for="" every="" variable="" •="" scdb-vote.csv="" contains="" one="" row="" for="" every="" vote="" by="" a="" justice="" in="" every="" case="" and="" one="" column="" for="" every="" variable="" the="" current="" dataset="" contains="" information="" on="" every="" case="" decided="" from="" the="" 1791-2021="" terms.1="" there="" are="" several="" id="" variables="" which="" can="" be="" used="" to="" join="" the="" data="" frames,="" specifically="" caseid,="" docketid,="" caseissuesid,="" and="" term.="" variables="" you="" will="" want="" to="" familiarize="" yourself="" with="" include:="" •="" datedecision="" •="" decisiontype="" •="" direction="" •="" issuearea="" •="" justice="" •="" justicename="" •="" majvotes="" •="" minvotes="" •="" term="" note="" each="" variable="" above="" is="" linked="" to="" the="" relevant="" documentation="" page="" in="" the="" online="" code="" book.="" once="" you="" import="" the="" data="" files,="" use="" your="" data="" wrangling="" and="" visualization="" skills="" to="" answer="" the="" following="" exercises.="" 1terms="" run="" from="" october="" through="" june,="" so="" 
the="" 2021="" term="" contains="" cases="" decided="" from="" october="" 2021="" -="" september="" 2022.="" 5="" http://scdb.wustl.edu/documentation.php?var="dateDecision" http://scdb.wustl.edu/documentation.php?var="decisionType" http://scdb.wustl.edu/documentation.php?var="direction" http://scdb.wustl.edu/documentation.php?var="issueArea" http://scdb.wustl.edu/documentation.php?var="justice" http://scdb.wustl.edu/documentation.php?var="justiceName" http://scdb.wustl.edu/documentation.php?var="majVotes" http://scdb.wustl.edu/documentation.php?var="minVotes" http://scdb.wustl.edu/documentation.php?var="term" tip="" pay="" careful="" attention="" to="" the="" unit="" of="" analysis="" required="" to="" answer="" each="" question.="" some="" ques-="" tions="" only="" require="" case-level="" variables,="" others="" only="" require="" vote-level="" variables,="" and="" some="" may="" require="" combining="" the="" two="" data="" frames="" together.="" be="" sure="" to="" choose="" an="" appropriate="" relational="" join="" function="" as="" necessary.="" exercise="" 2="" how="" does="" the="" percentage="" of="" cases="" in="" each="" term="" are="" decided="" by="" a="" one-vote="" margin="" (i.e.="" 5-4,="" 4-3,="" etc.)="" change="" over="" time?="" generate="" an="" appropriate="" visualization,="" then="" provide="" a="" brief="" (no="" more="" than="" one="" paragraph)="" answer="" to="" the="" question.="" utilize="" your="" graph="" to="" support="" your="" answer.="" your="" visualization="" could="" look="" like="" this:="" 0%="" 10%="" 20%="" 30%="" 40%="" 50%="" 1800="" 1850="" 1900="" 1950="" 2000="" term="" p="" er="" ce="" nt="" o="" f="" t="" ot="" al="" c="" as="" es="" d="" ec="" id="" ed="" percent="" of="" u.s.="" supreme="" court="" cases="" decided="" by="" 1−vote="" margin="" source:="" the="" supreme="" court="" database="" 6="" tip="" once="" again,="" render,="" commit,="" and="" push.="" make="" sure="" that="" you="" commit="" and="" 
push="" all="" changed="" documents="" and="" your="" git="" pane="" is="" completely="" empty="" before="" proceeding.="" exercise="" 3="" for="" justices="" currently="" serving="" on="" the="" supreme="" court,="" how="" often="" have="" they="" voted="" in="" the="" conservative="" direction="" in="" cases="" involving="" criminal="" procedure,="" civil="" rights,="" economic="" activity,="" and="" federal="" taxation?="" generate="" an="" appropriate="" visualization,="" then="" provide="" a="" brief="" (no="" more="" than="" one="" paragraph)="" answer="" to="" the="" question.="" utilize="" your="" graph="" to="" support="" your="" answer.="" tip="" •="" the="" supreme="" court’s="" website="" maintains="" a="" list="" of="" active="" members="" of="" the="" court.="" retired="" justices="" should="" not="" be="" included="" in="" your="" analysis.="" •="" make="" sure="" to="" organize="" the="" resulting="" graph="" by="" justice="" in="" descending="" order="" of="" seniority.="" seniority="" is="" based="" on="" when="" a="" justice="" is="" appointed="" to="" the="" court,="" so="" the="" justice="" who="" has="" served="" the="" longest="" is="" the="" most="" “senior”="" justice.="" •="" note="" that="" the="" chief="" justice="" is="" always="" considered="" the="" most="" senior="" member="" of="" the="" court,="" regardless="" of="" appointment="" date.="" your="" visualization="" might="" look="" like="" one="" of="" these:="" 7="" https://www.supremecourt.gov/about/biographies.aspx="" bmkavanaugh="" acbarrett="" ssotomayor="" ekagan="" nmgorsuch="" jgroberts="" cthomas="" saalito="" 0%="" 25%="" 50%="" 75%="" 100%="" 0%="" 25%="" 50%="" 75%="" 100%="" 0%="" 25%="" 50%="" 75%="" 100%="" federal="" taxation="" economic="" activity="" civil="" rights="" criminal="" procedure="" federal="" taxation="" economic="" activity="" civil="" rights="" criminal="" procedure="" federal="" taxation="" economic="" activity="" civil="" rights="" criminal="" 
procedure="" percent="" of="" votes="" cast="" percent="" of="" cases="" decided="" in="" a="" conservative="" direction="" u.s.="" supreme="" court="" source:="" the="" supreme="" court="" database="" economic="" activity="" federal="" taxation="" criminal="" procedure="" civil="" rights="" 0%="" 25%="" 50%="" 75%="" 100%="" 0%="" 25%="" 50%="" 75%="" 100%="" acbarrett="" bmkavanaugh="" nmgorsuch="" ekagan="" ssotomayor="" saalito="" cthomas="" jgroberts="" acbarrett="" bmkavanaugh="" nmgorsuch="" ekagan="" ssotomayor="" saalito="" cthomas="" jgroberts="" percent="" of="" votes="" cast="" percent="" of="" cases="" decided="" in="" a="" conservative="" direction="" u.s.="" supreme="" court="" source:="" the="" supreme="" court="" database="" 8="" tip="" once="" again,="" render,="" commit,="" and="" push.="" make="" sure="" that="" you="" commit="" and="" push="" all="" changed="" documents="" and="" your="" git="" pane="" is="" completely="" empty="" before="" proceeding.="" exercise="" 4="" in="" each="" term,="" how="" many="" of="" the="" term’s="" decisions="" (decided="" after="" oral="" arguments)="" were="" announced="" in="" a="" given="" month?="" generate="" an="" appropriate="" visualization,="" then="" provide="" a="" brief="" (no="" more="" than="" one="" paragraph)="" answer="" to="" the="" question.="" utilize="" your="" graph="" to="" support="" your="" answer.="" tip="" •="" most,="" but="" not="" all,="" of="" the="" court’s="" decisions="" are="" published="" following="" a="" set="" of="" oral="" arguments.="" one="" of="" the="" variables="" in="" the="" dataset="" indicates="" how="" the="" court="" arrived="" at="" its="" decision.="" any="" case="" which="" is="" explicitly="" labeled="" as="" “orally="" argued”="" should="" be="" included="" on="" this="" graph.="" •="" the="" supreme="" court’s="" calendar="" runs="" on="" the="" federal="" government’s="" fiscal="" year.="" that="" means="" the="" first="" month="" of="" the="" 
court’s="" term="" is="" october,="" running="" through="" september="" of="" the="" following="" calendar="" year.="" a="" plot="" similar="" to="" the="" one="" below="" would="" be="" ideal:="" 9="" https://en.wikipedia.org/wiki/fiscal_year#federal_government="" september="" august="" july="" june="" may="" april="" march="" february="" january="" december="" november="" october="" 0="" 20="" 40="" 60="" 80="" number="" of="" decisions="" announced="" in="" a="" term−month="" number="" of="" decisions="" announced="" post−oral="" arguments="" per="" month,="" by="" term="" u.s.="" supreme="" court="" source:="" the="" supreme="" court="" database="" tip="" render,="" commit,="" and="" push="" one="" last="" time.="" make="" sure="" that="" you="" commit="" and="" push="" all="" changed="" documents="" and="" your="" git="" pane="" is="" completely="" empty="" before="" proceeding.="" wrap="" up="" submission="" •="" go="" to="" http://www.gradescope.com="" and="" click="" log="" in="" in="" the="" top="" right="" corner.="" •="" click="" school="" credentials="" →="" cornell="" university="" netid="" and="" log="" in="" using="" your="" netid="" cre-="" dentials.="" •="" click="" on="" your="" info="" 2950="" course.="" •="" click="" on="" the="" assignment,="" and="" you’ll="" be="" prompted="" to="" submit="" it.="" 10="" http://www.gradescope.com="" joy="" alemu="" •="" mark="" all="" the="" pages="" associated="" with="" exercise.="" all="" the="" pages="" of="" your="" homework="" should="" be="" associated="" with="" at="" least="" one="" question="" (i.e.,="" should="" be="" “checked”).="" grading="" •="" exercise="" 1:="" 15="" points="" •="" exercise="" 2:="" 10="" points="" •="" exercise="" 3:="" 10="" points="" •="" exercise="" 4:="" 15="" points="" •="" total:="" 50="" points="" 11="" joy="" alemu="" overview="" getting="" started="" workflow="" +="" formatting="" packages="" part="" 1:="" tidying="" messy="" data="" exercise="" 1="" part="" 2:="" 
wrangling="" and="" visualizing="" messy(ish)="" data="" exercise="" 2="" exercise="" 3="" exercise="" 4="" wrap="" up="" submission="">- list.files( path = "data/wb", pattern = "*.csv", full.names = true ) wb_files [1] "data/wb/api_alb_ds2_en_csv_v2_4784591.csv" [2] "data/wb/api_are_ds2_en_csv_v2_4799288.csv" [3] "data/wb/api_bra_ds2_en_csv_v2_4782858.csv" [4] "data/wb/api_che_ds2_en_csv_v2_4771312.csv" [5] "data/wb/api_chn_ds2_en_csv_v2_4773379.csv" [6] "data/wb/api_ecu_ds2_en_csv_v2_4783022.csv" [7] "data/wb/api_fra_ds2_en_csv_v2_4782878.csv" [8] "data/wb/api_gab_ds2_en_csv_v2_4801532.csv" [9] "data/wb/api_gbr_ds2_en_csv_v2_4784641.csv" [10] "data/wb/api_gha_ds2_en_csv_v2_4783027.csv" [11] "data/wb/api_jpn_ds2_en_csv_v2_4775881.csv" [12] "data/wb/api_mlt_ds2_en_csv_v2_4782026.csv" [13] "data/wb/api_moz_ds2_en_csv_v2_4782930.csv" [14] "data/wb/api_rus_ds2_en_csv_v2_4783339.csv" [15] "data/wb/api_usa_ds2_en_csv_v2_4782210.csv" • the source data files are an unusual structure. you may need to adjust your read_csv() parameters to successfully import them. • the original data files are untidy. after successfully importing them, you will need to tidy them before you can construct a plot. your tidied data frame should contain 930 rows. • the specific variable you need to use is “mortality rate, infant (per 1,000 live births)”. it’s code in the dataset is "sp.dyn.imrt.in". • ideally facets are arranged in a meaningful order. in the example below, the facets are ordered based on each country’s first observed child mortality rate (highest to lowest). • the chart below is color-coded to distinguish permanent members from non- permanent members of the security council. • follow best practices in constructing your visualization. 
3 https://data.worldbank.org/indicator/sp.dyn.imrt.in https://data.worldbank.org/indicator/sp.dyn.imrt.in your visualization might look like this: france united kingdom switzerland japan russian federation united states china albania malta ghana ecuador gabon mozambique united arab emirates brazil 1960 1980 2000 20201960 1980 2000 20201960 1980 2000 2020 0 50 100 150 0 50 100 150 0 50 100 150 0 50 100 150 0 50 100 150 year m or ta lit y ra te , i nf an t ( pe r 1, 00 0 liv e bi rt hs ) status non−permanent member permanent member for current un security council members infant mortality rates over time source: the world bank tip now is a good time to render, commit (with a descriptive and concise commit message), and push again. make sure that you commit and push all changed documents and your git pane is completely empty before proceeding. part 2: wrangling and visualizing messy(ish) data the supreme court database contains detailed information of every published decision of the u.s. supreme court since its creation in 1791. it is perhaps the most utilized database in the 4 http://scdb.wustl.edu/ study of judicial politics. in the repository’s data folder, you will find two data files: 1. scdb-case.csv 2. scdb-vote.csv these contain the exact same data you would obtain if you downloaded the files from the original website, but reformatted to be stored as relational data files. that is, scdb-case.csv contains all case-level variables, whereas scdb-vote.csv contains all vote-level variables. the data is structured in a tidy fashion. • scdb-case.csv contains one row for every case and one column for every variable • scdb-vote.csv contains one row for every vote by a justice in every case and one column for every variable the current dataset contains information on every case decided from the 1791-2021 terms.1 there are several id variables which can be used to join the data frames, specifically caseid, docketid, caseissuesid, and term. 
variables you will want to familiarize yourself with include: • datedecision • decisiontype • direction • issuearea • justice • justicename • majvotes • minvotes • term note each variable above is linked to the relevant documentation page in the online code book. once you import the data files, use your data wrangling and visualization skills to answer the following exercises. 1terms run from october through june, so the 2021 term contains cases decided from october 2021 - september 2022. 5 http://scdb.wustl.edu/documentation.php?var=datedecision http://scdb.wustl.edu/documentation.php?var=decisiontype http://scdb.wustl.edu/documentation.php?var=direction http://scdb.wustl.edu/documentation.php?var=issuearea http://scdb.wustl.edu/documentation.php?var=justice http://scdb.wustl.edu/documentation.php?var=justicename http://scdb.wustl.edu/documentation.php?var=majvotes http://scdb.wustl.edu/documentation.php?var=minvotes http://scdb.wustl.edu/documentation.php?var=term tip pay careful attention to the unit of analysis required to answer each question. some ques- tions only require case-level variables, others only require vote-level variables, and some may require combining the two data frames together. be sure to choose an appropriate relational join function as necessary. exercise 2 how does the percentage of cases in each term are decided by a one-vote margin (i.e. 5-4, 4-3, etc.) change over time? generate an appropriate visualization, then provide a brief (no more than one paragraph) answer to the question. utilize your graph to support your answer. your visualization could look like this: 0% 10% 20% 30% 40% 50% 1800 1850 1900 1950 2000 term p er ce nt o f t ot al c as es d ec id ed percent of u.s. supreme court cases decided by 1−vote margin source: the supreme court database 6 tip once again, render, commit, and push. make sure that you commit and push all changed documents and your git pane is completely empty before proceeding. 
Exercise 3

For justices currently serving on the Supreme Court, how often have they voted in the conservative direction in cases involving criminal procedure, civil rights, economic activity, and federal taxation? Generate an appropriate visualization, then provide a brief (no more than one paragraph) answer to the question. Utilize your graph to support your answer.

Tip
• The Supreme Court’s website (https://www.supremecourt.gov/about/biographies.aspx) maintains a list of active members of the Court. Retired justices should not be included in your analysis.
• Make sure to organize the resulting graph by justice in descending order of seniority. Seniority is based on when a justice is appointed to the Court, so the justice who has served the longest is the most “senior” justice.
• Note that the Chief Justice is always considered the most senior member of the Court, regardless of appointment date.

Your visualization might look like one of these:

[Figure: "Percent of cases decided in a conservative direction", subtitled "U.S. Supreme Court". One panel per justice (JGRoberts, CThomas, SAAlito, SSotomayor, EKagan, NMGorsuch, BMKavanaugh, ACBarrett), with the four issue areas (Criminal procedure, Civil rights, Economic activity, Federal taxation) on one axis and Percent of votes cast (0% to 100%) on the other. Source: The Supreme Court Database.]

[Figure: "Percent of cases decided in a conservative direction", subtitled "U.S. Supreme Court". One panel per issue area (Economic activity, Federal taxation, Criminal procedure, Civil rights), with the justices on one axis and Percent of votes cast (0% to 100%) on the other. Source: The Supreme Court Database.]

Tip
Once again, render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.
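One possible way to control the order in which justices appear is to store their names as a factor whose levels follow seniority. The sketch below is illustrative only: the justice labels, the seniority vector, and the text values in the issue and direction columns are all hypothetical, and the real data may record direction and issue area differently, so check the online code book before filtering.

```r
library(dplyr)
library(tibble)

# Toy vote-level data with readable labels (values are invented)
votes <- tibble(
  justiceName = c("J1", "J1", "J2", "J2", "J2", "CJ"),
  issue = rep("Civil rights", 6),
  direction = c("conservative", "liberal", "conservative",
                "conservative", "liberal", "conservative")
)

# Hypothetical seniority order: chief justice first, then by appointment date
seniority <- c("CJ", "J1", "J2")

pct_conservative <- votes %>%
  # Factor levels fix the order used when summarizing and plotting
  mutate(justiceName = factor(justiceName, levels = seniority)) %>%
  group_by(justiceName, issue) %>%
  # Share of each justice's votes in the issue area cast conservatively
  summarize(pct = mean(direction == "conservative"), .groups = "drop")
```

Because justiceName is a factor, the summarized rows come out in level order, and ggplot2 would use the same order for axes and facets.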
Exercise 4

In each term, how many of the term’s decisions (decided after oral arguments) were announced in a given month? Generate an appropriate visualization, then provide a brief (no more than one paragraph) answer to the question. Utilize your graph to support your answer.

Tip
• Most, but not all, of the Court’s decisions are published following a set of oral arguments. One of the variables in the dataset indicates how the Court arrived at its decision. Any case which is explicitly labeled as “orally argued” should be included on this graph.
• The Supreme Court’s calendar runs on the federal government’s fiscal year (https://en.wikipedia.org/wiki/fiscal_year#federal_government). That means the first month of the Court’s term is October, running through September of the following calendar year.

A plot similar to the one below would be ideal:

[Figure: "Number of decisions announced post-oral arguments per month, by term", subtitled "U.S. Supreme Court". Months from October through September on one axis; Number of decisions announced in a term-month (0 to 80) on the other. Source: The Supreme Court Database.]

Tip
Render, commit, and push one last time. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

Wrap up

Submission
• Go to http://www.gradescope.com and click Log In in the top right corner.
• Click School Credentials → Cornell University NetID and log in using your NetID credentials.
• Click on your INFO 2950 course.
• Click on the assignment, and you’ll be prompted to submit it.
• Mark all the pages associated with each exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”).
Grading
• Exercise 1: 15 points
• Exercise 2: 10 points
• Exercise 3: 10 points
• Exercise 4: 15 points
• Total: 50 points