---
title: "Assignment 4"
author: "Your name and ID here"
date: "01/02/2020"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

# packages
library(AER)
library(tidyverse)
library(wooldridge)

# load model summary and set options
library(modelsummary)
gm <- modelsummary::gof_map
gm$omit <- TRUE
gm$omit[1] <- FALSE
gm$omit[6] <- FALSE
gm$omit[5] <- FALSE
gm$omit[17] <- FALSE
```

## Instructions

Please read each question and answer where appropriate. The assignment is graded on a scale from 1 to 5. I grade effort as well as content. That means that to obtain a 5, every question must be attempted, and I am a kind grader if the effort was high but the result was not quite right.

After you answer the questions, `knit` the document to HTML and submit on Moodle. I will **only grade** HTML. If you submit the `Rmd` file instead, you will receive a zero. You have been warned, so there will be no exceptions. Groups of up to four are allowed, but every student must submit their own assignment.

**If an interpretation of output is asked for, but only output or code is given, the question will get zero points.**

# Question 1: Polynomials and interactions

This question uses the Wooldridge data set `wage1`. I have loaded it below to a data frame called `wage1`. The purpose of this question is to get used to interpreting coefficients with different functional form assumptions.

```{r}
wage1 <- wooldridge::wage1 %>% filter(complete.cases(.))
```

Consider the generic regression:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 (x_2\cdot x_3) + u
$$

Here, the variable $x_1$ enters the regression twice: once as a 'main' effect and once as a squared term. Consider the partial effect of increasing $x_1$ by 1 unit.

$$
\frac{\partial y}{\partial x_1} = \beta_1 + 2 \beta_2 x_1
$$

This says that the marginal impact on $y$ of a one-unit increase in $x_1$ is not a constant, but a function: the impact depends on the value of $x_1$. For example, suppose that $\beta_1$ is positive and $\beta_2$ is negative. Then the relationship between $y$ and $x_1$ exhibits decreasing marginal returns. That is, $\frac{\partial y}{\partial x_1}$ is higher when $x_1$ increases from 0 to 1 than when it increases from 2 to 3, and so on. If we graph this, it looks like an inverted 'U'. For example, it might look like:

```{r}
fig.data <- data.frame(x = 1:10) %>%
  mutate(y = 100 + 20 * x - 1.5 * x^2)

ggplot(fig.data, aes(y = y, x = x)) +
  geom_line() +
  labs(title = "Typical diminishing marginal returns profile")
```

We can find the point at which the effect of $x_1$ turns from positive to negative (the turning point) by setting $\frac{\partial y}{\partial x_1} = 0$ and solving for $x_1$. This yields

$$
x^*_1 = \left|\frac{\beta_1}{2\cdot \beta_2}\right|.
$$

The variables $x_2$ and $x_3$ also appear twice, each as a main effect and then in an interaction. Consider the partial effect of $x_2$ on $y$:

$$
\frac{\partial y}{\partial x_2} = \beta_3 + \beta_5 x_3
$$

Again, this says that the impact of $x_2$ on $y$ is not a constant: it depends on the value of $x_3$. The treatment of $x_3$ is symmetric. We most often use these types of interactions when one term is a dummy variable. Suppose $x_3$ only takes on two values, 1 and 0.
Then

$$
\frac{\partial y}{\partial x_2} = \beta_3 + \beta_5 \text{ when } x_3 = 1 \text{ and } \frac{\partial y}{\partial x_2} = \beta_3 \text{ when } x_3 = 0
$$

Since dummy variables denote groups (here, two groups), this allows each group to have its own intercept (shifted by $\beta_4$) and its own slope. Graphically, it looks like:

```{r}
ggplot(wage1 %>% filter(educ > 5), aes(y = lwage, x = educ, color = factor(west))) +
  geom_smooth(method = 'lm', se = F)
```

where each line is a regression of log wages on education, with an interaction for living in the west. In this data, the return to education for workers in the western United States is higher than the return for those in the rest of the country.

In `R`, we can create variables "on the fly" to use in regressions. We use this mostly to create interaction terms and low-order polynomial terms. The code below would estimate the following equation:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \beta_5 (x_2\cdot x_3) + u
$$

In the code below, each regression is exactly the same; they are just different ways of expressing it:

```{r, eval = F}
# `data.frame` stands in for the name of your own data frame
mod <- lm(y ~ x_1 + I(x_1^2) + x_2 * x_3, data = data.frame)
mod <- lm(y ~ poly(x_1, 2, raw = T) + x_2 * x_3, data = data.frame)
mod <- lm(y ~ x_1 + I(x_1^2) + x_2 + x_3 + x_2:x_3, data = data.frame)
mod <- lm(y ~ x_1 + I(x_1^2) + x_2 + x_3 + I(x_2*x_3), data = data.frame)
```

The term `I()` is an "insulator function". It tells `R` to evaluate the expression inside as ordinary arithmetic first, then run the regression. The notation `x_2*x_3` says to include main effects for each variable, plus an interaction. The notation `x_2:x_3` includes only the interaction. Finally, `poly()` constructs low-order polynomials.
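
As a rough check of the notation above, the sketch below simulates data from the generic equation and fits it four ways. Everything in it is an assumption made for illustration: the data frame `sim`, the coefficient values, the sample size, and the seed are invented and are not part of the assignment data. It also reads the turning point and the group-specific slopes off the fitted coefficients.

```r
# Simulated data: every number here is an illustrative assumption
set.seed(42)
n   <- 500
sim <- data.frame(
  x_1 = runif(n, 0, 10),
  x_2 = rnorm(n),
  x_3 = rbinom(n, 1, 0.5)   # dummy variable (two groups)
)
sim$y <- 100 + 20 * sim$x_1 - 1.5 * sim$x_1^2 +
  2 * sim$x_2 + 3 * sim$x_3 + 1 * sim$x_2 * sim$x_3 + rnorm(n)

# Four ways of writing the same regression
m1 <- lm(y ~ x_1 + I(x_1^2) + x_2 * x_3, data = sim)
m2 <- lm(y ~ poly(x_1, 2, raw = TRUE) + x_2 * x_3, data = sim)
m3 <- lm(y ~ x_1 + I(x_1^2) + x_2 + x_3 + x_2:x_3, data = sim)
m4 <- lm(y ~ x_1 + I(x_1^2) + x_2 + x_3 + I(x_2 * x_3), data = sim)

# Coefficient names differ across the four fits, but the estimates agree
same_fit <- sapply(list(m2, m3, m4), function(m) {
  isTRUE(all.equal(unname(coef(m)), unname(coef(m1))))
})

# Without raw = TRUE, poly() uses orthogonal polynomials: the fitted values
# are unchanged, but the polynomial coefficients are no longer beta_1, beta_2
m5 <- lm(y ~ poly(x_1, 2) + x_2 * x_3, data = sim)
same_fitted_values <- isTRUE(all.equal(fitted(m5), fitted(m1)))

# Turning point of the quadratic: set beta_1 + 2 * beta_2 * x_1 = 0
b <- coef(m1)
turning_point <- -b[["x_1"]] / (2 * b[["I(x_1^2)"]])

# Slope on x_2 for each group defined by the dummy x_3
slope_x2_group0 <- b[["x_2"]]                    # x_3 = 0: beta_3
slope_x2_group1 <- b[["x_2"]] + b[["x_2:x_3"]]   # x_3 = 1: beta_3 + beta_5

print(same_fit)
print(same_fitted_values)
print(c(turning_point = turning_point,
        slope_group0  = slope_x2_group0,
        slope_group1  = slope_x2_group1))
```

Because the data are simulated, the estimates should land near the values used to generate them (a turning point near $20/3$, slopes near 2 and 3), but they will not match exactly.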
The `raw = T` option is important: without it, `poly()` builds orthogonal polynomials, and the coefficients no longer correspond to $\beta_1$ and $\beta_2$ in the equation above.

```{r}
wage1 <- wooldridge::wage1 %>% filter(complete.cases(.))

# fit models
models <- list(
  lm(lwage ~ educ + exper + I(exper^2) + nonwhite + female, data = wage1),
  lm(lwage ~ educ*female + exper + I(exper^2) + nonwhite, data = wage1)
)

# table with robust (sandwich) standard errors
# note: newer modelsummary releases name the `statistics_override` argument `vcov`
modelsummary(models, fmt = 5, statistics_override = sandwich, stars = T, gof_map = gm)
```

1. In the first column, interpret the return to experience (`exper`). After how many years of experience does the relationship turn negative?

2. In column two, what is the return to education for men and for women? Are the returns to education significantly different for men and women?

# Question 2: Teaching evaluations

Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related characteristics, such as the physical appearance of the instructor. The article titled “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity” (Hamermesh and Parker, 2005) found that instructors who are viewed to be better looking receive higher instructional ratings.

> Daniel S.
Hamermesh, Amy Parker, Beauty in the classroom: instructors' pulchritude and putative pedagogical productivity, Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369-376, ISSN 0272-7757, 10.1016/j.econedurev.2004.07.013. [Paper link - not required to read](http://www.sciencedirect.com/science/article/pii/S0272775704001165)

```{r}
data("TeachingRatings") # load ratings data
df <- TeachingRatings # re-name as df for convenience
```

1. The data set `df` constructed in the above code chunk contains different types of variables. Use the command `str()` on the data frame `df` to answer below:

    (a) What type of variable is `credits`? What fraction of the data are single credit courses?

    (b) What type of variable is `allstudents`? What is the largest class in the data set?

    (c) Construct a variable called `frac` that is the proportion of students in the class that filled out the evaluation. What is the average participation rate?

2. You can see the variable definitions by typing "?TeachingRatings" in the console. Suppose we are interested in estimating a causal effect of `beauty` on `eval`. That is,

$$
eval_i = \beta_0 + \beta_1 beauty_i + \eta_i
$$

Using the strategy discussed in class and in Chapter 7.6, construct a regression table evaluating the causal effect of beauty on teaching evaluations. Your regression table should consider several specifications, starting with the bivariate regression above and then adding more controls, possibly in groups. For each specification, state why you think it is important to include the controls you add. Your answer should relate to the CIA (conditional independence assumption). Interpret your results: do you think that beauty has a causal impact on evaluations? If yes, defend your answer. If not, state why not.

```{r}
# regression table here
```

3. Run a regression of `eval` on `beauty`, `gender`, `minority`, `credits`, `division`, `tenure`, `native`.
Consider my data: I am male, non-minority, a native English speaker, teaching multiple-credit, upper-division courses, and I have tenure. While I don't have a `beauty` rating, according to [ratemyprofessor.com](https://www.ratemyprofessors.com/showratings.jsp?tid=2033571), I have an evaluation of 2.3. Use your regression and my information to infer what my `beauty` rating would be if I were in this data set.

```{r}
# regression here
```

4. In the regression you ran in part (3), the coefficient on gender shows that women have, on average and after controlling for other characteristics, lower evaluations than men. This has led to additional research on the topic, since evaluations are important for promotion and tenure decisions. Add an interaction term between `beauty` and `gender`. Interpret your results: is the marginal impact of beauty the same for men and women? Are good-looking men treated differently from good-looking women by students in terms of their evaluations?

```{r}
# regression here
```

# Question 3: Birth weight

Smoking during pregnancy has been shown to have significant adverse health effects for newborn babies. Smoking is thought to be a preventable cause of low birth weight in infants, who in turn need more resources at delivery and are more likely to have related health problems in infancy and beyond. Despite these concerns, many women still smoke during pregnancy. In this section, we analyze the relationship between birth weight and smoking behavior, with the emphasis on identifying a _causal_ impact of smoking on the birth weight of newborns. The relationship we examine is:

$$
\log(\texttt{birth weight})_i = \beta_0 + \beta_1 \texttt{smoking}_i + \eta_i
$$

where $\texttt{smoking}_i$ will be measured by average cigarettes per day. The term $\eta_i$ captures all of the other things that determine birth weight aside from smoking.

### Baseline analysis
Investigate the birth weight-smoking relationship and present your results in a table