Final Project
PS 3780 Data Literacy & Visualization, Summer 2021
Due Date: Monday, August 2, 2021 at 11:59 p.m.

Final Project Description

This is an individual assignment. Your final project for this class involves answering an interesting question or testing an interesting theory using data visualization. In the final paper, you will do the following:

1. State the question or theory explicitly in the introduction and explain why you find it interesting.
2. State one to three hypotheses, derived from your theory, that you can test empirically.
3. Explain why the data you examine will help you test your hypothesis/-es. Describe the data and where you obtained them, what (if anything) you did to reformat or transform them, how you analyzed them, and what they told you.
4. Create and include at least two (2) unique visualizations (maximum of 4). For maximum credit, at least two (2) visualizations should be made using R. All visualizations should be made using programs or websites that we learned in this class. (A list is included below; it does not include Excel.)
5. What do you know that you did not know before? Does the answer raise further questions that might be worth investigating? If so, describe them briefly.

I anticipate the text portion of papers to be 3 to 4 pages long (before adding visualizations), double-spaced, in no smaller than 11-point type, with 1-inch margins and in-text visualizations no larger than 1/4 page each, though succinct writers may take less space and those with more complex problems or answers may take more. Papers should be professional in quality: page numbers, formatting, and paper organization all count. Give citations either in text or in footnotes, with a works cited page at the end. (Works cited pages do not count toward the total page count.) Unless you collected the data yourself, be sure to cite your data sources!
Due to the University's strict timeline for final grades, no extensions can be offered except in case of genuine emergency. We look forward to receiving your best effort by 11:59 p.m. on August 2. You will submit the final paper to Carmen, along with the .csv file(s) of your data and any R code that you used to generate the visualizations in your paper.

Tips for Moving Forward

Imagine the Would-be World
Given the hypotheses that you have proposed, imagine the state of the world that would exist if the hypotheses were true, as well as the state of the world if the opposite (your alternative hypotheses, so to speak) were true. What evidence would you expect to see in each case? Having such expectations will not only help you find appropriate data to test your hypotheses but also give you hints about whether the empirical evidence in your visualizations favors your original hypotheses.

Collect Data
Use your hypotheses from above to begin searching for data. For this part of the project, focus in particular on specifying how you will measure the different variables named in your hypothesis/-es. For example, if your argument is that democracies do not fight one another, you will need to figure out how you will measure both democracy and international conflict. Once you have done this, you can begin searching for and collecting data on these variables. Toward this end, it may help you to do the following:

1. Write a paragraph explaining what the relevant variables for your question are, being as specific as possible (including the relevant time frame for your question, the relevant states/countries, etc.).
2. Find and download data measuring all of the variables needed to answer your research question. Save the data as one or more .csv files.
3. Explain why the data you found will help you answer the question.
   Here you should describe the data in detail and defend your decision to use them by explaining why they are relevant to the question and why you trust them to be credible information. Make sure you answer these questions: Where do the data come from? What do they tell us generally? What is and is not measured? How is it measured?

Analyze the Data & Create Visualizations
Now that you have your data, you can begin cleaning and analyzing them. To learn more about your data, I suggest using R to do any or all of the following:

1. Reformat or transform the data if necessary.
2. Compute basic descriptive statistics in R, including mean(), median(), summary(), length(), and table(), as appropriate for your specific dataset and variables of interest.

Approved tools for creating visualizations:
- R
- World Bank Databank
- Google Ngram
- DataWrapper
- Gapminder

Write Up Your Results
Work all of the above into a final narrative that includes your question/theory, the reasons you find it interesting, your hypothesis/-es, your data, your analysis, your visualization(s), and the results. Be succinct. Too often, college students learn to pad papers in order to reach high page limits. The suggested page length is meant to help you unlearn that habit and get right to the point.

3780finalpaperrubric-3qfy3hoz.docx

PS 3780 Final Paper Rubric

Basic Requirements: /6 pts
- Paper is submitted as one PDF, with visualizations included in the text and R code (if applicable) provided in the Appendix. Data file(s) in .csv format are included in the submission. /2 pts
- Paper uses R to make the visualizations. /4 pts

Paper Content: /34 pts
- Paper proposes a theory, one or more hypotheses, and the mechanisms by which the hypotheses can be supported or disproved. The theory and hypotheses concern an interesting political topic. /10 pts
- Paper describes the data used: where it came from, what organization collected it, and why it is a good dataset. /5 pts
- Paper provides at least two visualizations that fit all the properties of good graphical design, created with a program learned in the class. /10 pts
- Paper rigorously analyzes the visualizations and has a clear conclusion in which the hypothesis is either supported or refuted. /5 pts
- Paper is well written, with solid structure, flow, and grammar. /4 pts

TOTAL: /40 pts

assignment-92021su-mhrbmaxp.pdf

Assignment 9
PS 3780 Data Literacy & Visualization, Spring 2021
Due Date: Friday, July 23, 2021 at 11:59 p.m.

Please save your visualizations and answers to these questions as one .pdf file (use the "save as" function available in most word processors). Be sure to include your name, your teammate's name (if any), and the assignment number. Submit the file to Carmen by the due date. Remember that we are looking for professional visualizations, so please include a meaningful title as well as axis labels and a legend.

Make Money, Feel Good
Use R to load the assignment9.csv dataset from Carmen. These data were obtained through OSU's subscription to Gallup. There are 5 variables: State, Year, Getting.better, Economic.confidence, and Region. Getting.better and Economic.confidence are survey responses that indicate whether the respondent feels their local situation is "getting better" and whether they have confidence in the state of the national economy.

Choose one item from each section below. Create and analyze an appropriate visualization (or 2) for the chosen questions. When analyzing relationships or trends, use a line plot or add summary lines to the plot (e.g., geom_smooth()). Write down the commands you use for each. (4 pts each)

1 Maps
- Choose a year: Was there any geographic clustering in the feeling that things are getting better and in confidence in the economy? How similar is the geographic variation for those two questions?
  (2 maps)
- Choose two years: Was there any geographic clustering in the change in either economic confidence or the feeling that the situation is getting better (choose one) between the chosen years?
  (1 map)

2 Plots
- What was the general relationship between economic confidence and the feeling that things are getting better across the years? How varied is the relationship in different years?
  (1 plot)
- Is there any variation in the relationship between economic confidence and the feeling that things are getting better across different regions of the United States?
  (1 plot)
- Are the time trends of economic confidence or of the feeling that the situation is getting better (choose one) different by region?
  (1 plot)
- Pick 2 regions: How similar are the time trends of economic confidence or of the feeling that the situation is getting better (choose one) among the states within the chosen regions?
  (2 plots)

assignment-9-fin-avwmh21w.docx

Alexandra G Albanese
July 20, 2021
Assignment 9

# Load the packages used in this assignment
library(magrittr)
library(dplyr)
library(ggplot2)
library(GGally)
library(skimr)
library(readr)

# Read the Gallup data, treating state and region as factors
# (levels = c() is NULL, so readr takes the factor levels from the data)
assignment9 <- read_csv("assignment9.csv",
                        col_types = cols(state = col_factor(levels = c()),
                                         region = col_factor(levels = c())))

# Overview of every variable: missingness, summary statistics, histograms
skim(assignment9)

Data summary
Name                      assignment9
Number of rows            306
Number of columns         5
Column type frequency:
  factor                  2
  numeric                 3
Group variables           None

Variable type: factor
skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
state          0          1              FALSE    51        ala: 6, ala: 6, ari: 6, ark: 6
region         0          1              FALSE    9         sou: 54, mou: 48, wes: 42, new: 36

Variable type: numeric
skim_variable        n_missing  complete_rate  mean     sd    p0       p25      p50      p75      p100    hist
year                 0          1              2010.50  1.71  2008.00  2009.00  2010.50  2012.00  2013.0  ▇▃▃▃▃
getting better       0          1              0.58     0.06  0.41     0.53     0.58     0.61     0.8     ▁▆▇▂▁
economic confidence  0
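Returning to the final project's Collect Data and Analyze the Data steps (save the data as .csv, reformat or transform if needed, then run mean(), median(), summary(), length(), and table()), a minimal base R sketch might look like the following. The data frame `toy`, its columns, and its values are invented placeholders, not real survey data, and the file is written to a temporary path; in your own project you would read the .csv file you downloaded instead.

```r
# Hypothetical toy data standing in for a project's dataset;
# every value here is invented for illustration only.
toy <- data.frame(
  year       = c(2008, 2008, 2009, 2009, 2010, 2010),
  region     = c("South", "West", "South", "West", "South", "West"),
  confidence = c(0.40, 0.50, 0.45, 0.55, 0.50, 0.60)
)

# Save the data as a .csv file (a temporary path here) and read it back;
# replace csv_path with the name of your own file.
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)
dat <- read.csv(csv_path)

# Example transformation: express the share as a percentage
dat$confidence_pct <- dat$confidence * 100

# Descriptive statistics for a numeric variable
mean(dat$confidence)     # arithmetic mean
median(dat$confidence)   # middle value
summary(dat$confidence)  # min, quartiles, mean, max
length(dat$confidence)   # number of observations

# Frequency counts for a categorical variable
table(dat$region)
```

For a categorical variable such as a region, table() is usually more informative than mean(); for a numeric share, summary() gives the quartiles in one call.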
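For the Assignment 9 question about the relationship between economic confidence and the feeling that things are getting better across years, one possible approach is a scatter plot with one geom_smooth() summary line per year, plus the fitted slopes for a numeric comparison. This sketch assumes ggplot2 is installed and uses an invented toy data frame; the column names economic_confidence and getting_better are placeholders, so check names() on the real assignment9 data before adapting it. A straight-line fit (method = "lm") is one choice; the default loess smoother is an alternative.

```r
library(ggplot2)

# Hypothetical toy data: invented state-level shares for three years.
# Column names are placeholders, not necessarily those in assignment9.csv.
toy <- data.frame(
  year = rep(2008:2010, each = 4),
  economic_confidence = c(0.20, 0.25, 0.30, 0.35,
                          0.30, 0.35, 0.40, 0.45,
                          0.40, 0.45, 0.50, 0.55),
  getting_better      = c(0.50, 0.52, 0.54, 0.56,
                          0.55, 0.56, 0.57, 0.58,
                          0.60, 0.64, 0.68, 0.72)
)

# Colour by year so geom_smooth() fits a separate line for each year
p <- ggplot(toy, aes(x = economic_confidence, y = getting_better,
                     colour = factor(year))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Economic confidence vs. feeling things are getting better",
       x = "Share confident in the national economy",
       y = "Share saying local situation is getting better",
       colour = "Year")
print(p)

# Slope of the fitted line in each year, for comparing the relationship
slopes <- sapply(split(toy, toy$year), function(d) {
  unname(coef(lm(getting_better ~ economic_confidence, data = d))[2])
})
round(slopes, 3)
```

Comparing the per-year slopes alongside the plot helps answer "how varied is the relationship in different years" with a number rather than by eye alone.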
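For the Assignment 9 questions on regional time trends, one option is to average the state values within each region and year with dplyr and draw one line per region, then use a faceted plot with one line per state to compare states within each region. The states, regions, and values below are hypothetical placeholders, and the sketch assumes dplyr and ggplot2 are installed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two made-up regions with two made-up states each,
# observed over three years. All values are invented for illustration.
toy <- data.frame(
  state  = rep(c("A", "B", "C", "D"), times = 3),
  region = rep(c("Region 1", "Region 1", "Region 2", "Region 2"), times = 3),
  year   = rep(2008:2010, each = 4),
  economic_confidence = c(0.30, 0.34, 0.40, 0.44,
                          0.36, 0.40, 0.40, 0.42,
                          0.42, 0.46, 0.41, 0.43)
)

# Average the state values within each region and year
region_trends <- toy %>%
  group_by(region, year) %>%
  summarise(mean_confidence = mean(economic_confidence), .groups = "drop")

# One line per region: are the time trends different by region?
p_trend <- ggplot(region_trends,
                  aes(x = year, y = mean_confidence, colour = region)) +
  geom_line() +
  geom_point() +
  labs(title = "Mean economic confidence by region (toy data)",
       x = "Year", y = "Mean economic confidence", colour = "Region")
print(p_trend)

# One line per state, one panel per region: how similar are states within a region?
p_states <- ggplot(toy, aes(x = year, y = economic_confidence, colour = state)) +
  geom_line() +
  facet_wrap(~ region) +
  labs(title = "Economic confidence by state within region (toy data)",
       x = "Year", y = "Economic confidence", colour = "State")
print(p_states)
```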