Answer To: Economics 104: Project 1, Fall 2022, UCLA. Due Date: Oct 12, 2022 by 11:59 PM (PST). For this...
Mohd answered on Oct 12 2022
2022-10-12
1. Provide a descriptive analysis of your variables. This should include histograms and fitted distributions, correlation plot, boxplots, scatterplots, and statistical summaries (e.g., the five-number summary). All figures must include comments.
library(readr)
exams <- read_csv("exams.csv")
## Rows: 1000 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): gender, race/ethnicity, parent_education_level, lunch, test_prep_co...
## dbl (1): math
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(exams)
A first look at the data
# descriptive measures
skimr::skim(exams)
Data summary

Name                     exams
Number of rows           1000
Number of columns        6
Column type frequency:
  character              5
  numeric                1
Group variables          None

Variable type: character

skim_variable           n_missing  complete_rate  min  max  empty  n_unique  whitespace
gender                  0          1              4    6    0      2         0
race/ethnicity          0          1              7    7    0      5         0
parent_education_level  0          1              11   18   0      6         0
lunch                   0          1              8    12   0      2         0
test_prep_course        0          1              4    9    0      2         0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean   sd     p0  p25  p50  p75  p100  hist
math           0          1              66.09  15.16  0   57   66   77   100   ▁▁▅▇▃
#histogram of math score
hist(exams$math, main="Histogram of math score")
#Boxplot of math score
boxplot(exams$math, main="Boxplot of Math score")
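The question also asks for a five-number summary and a fitted distribution. A minimal sketch of both follows. It assumes a normal fit is adequate and uses a simulated stand-in vector named `math` rather than the real `exams$math` column, so the printed numbers are illustrative only.

```r
# Simulated stand-in for exams$math (assumption: the real scores come from exams.csv)
set.seed(104)
math <- pmin(pmax(round(rnorm(1000, mean = 66, sd = 15)), 0), 100)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(math)
names(five_num) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five_num)

# Histogram on the density scale with a fitted normal curve overlaid
mu <- mean(math)
sigma <- sd(math)
hist(math, freq = FALSE,
     main = "Math score with fitted normal density",
     xlab = "Math score")
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lwd = 2)
```

On the real data, replacing `math` with `exams$math` should give the same kind of output. Whether a normal curve is a good fit is something to judge from the plot, not to assume.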
# Removing Outliers
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
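# The 1.5*IQR rule (IQR = Q3 - Q1 = 77 - 57 = 20) puts the lower fence at
# 57 - 1.5*20 = 27; the cutoff of 30 used below is slightly stricter.
exams <- exams %>%
  filter(math >= 30)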
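# After removing outliers
boxplot(exams$math, main = "Boxplot of math score (outliers removed)")
hist(exams$math, main = "Histogram of math score (outliers removed)")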
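The 1.5*IQR rule mentioned in the comment can be worked through with the quartiles reported by `skim()` above (p25 = 57, p75 = 77). The sketch below is only an illustration: `is_outlier` is a hypothetical helper and the `scores` vector is made up, not taken from the data.

```r
# Quartiles reported by skim() for math in the summary above
q1 <- 57
q3 <- 77
iqr <- q3 - q1                   # interquartile range: 20
lower_fence <- q1 - 1.5 * iqr    # 27
upper_fence <- q3 + 1.5 * iqr    # 107, above the maximum observed score of 100

# Hypothetical helper: flag values strictly outside the fences
is_outlier <- function(x, lower, upper) x < lower | x > upper

scores <- c(0, 25, 27, 29, 30, 66, 100)   # illustrative values, not the real data
flagged <- is_outlier(scores, lower_fence, upper_fence)
print(data.frame(scores, flagged))
```

Under these fences only scores below 27 are flagged, so a filter of `math >= 30` also drops any scores from 27 to 29 that the rule itself would keep. Because the upper fence exceeds the highest observed score, no high scores are flagged.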
Distribution of the categorical variables
library(ggplot2)
# gender distribution
ggplot(data=exams, aes(x=gender)) +
geom_bar() +
labs (title = "Gender Distribution", x = "Gender", y = "Total Count")+ theme_classic()
# Race/ethnicity distribution
# (refer to the column directly inside aes() rather than through exams$)
ggplot(data = exams, aes(x = `race/ethnicity`)) +
  geom_bar() +
  labs(title = "race/ethnicity Distribution", x = "race/ethnicity", y = "Total Count") + theme_classic()
#parent Education level distribution
ggplot(data=exams, aes(x=parent_education_level)) +
geom_bar() +
labs (title = "parent_education_level Distribution", x = "parent_education_level", y = "Total Count")+ theme_classic()
#Lunch distribution
ggplot(data=exams, aes(x=lunch)) +
geom_bar() +
labs (title = "lunch Distribution", x = "lunch", y = "Total Count")+ theme_classic()
#test preparation course
ggplot(data=exams, aes(x=test_prep_course)) +
geom_bar() +
labs (title = "test_prep_course Distribution", x = "test_prep_course", y = "Total Count")+ theme_classic()
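The bar charts show how many students fall in each category, but not how math scores differ across categories. One possible way to look at that is a grouped boxplot with group means. The sketch below uses base R and a simulated data frame `sim`; the level names "female" and "male" and the 4-point gap are assumptions made for illustration.

```r
# Simulated stand-in for the exams data (assumption: not the real file)
set.seed(104)
n <- 200
sim <- data.frame(
  gender = sample(c("female", "male"), n, replace = TRUE)
)
sim$math <- round(60 + 4 * (sim$gender == "male") + rnorm(n, sd = 15))

# Boxplot of math score by gender: one box per category
boxplot(math ~ gender, data = sim,
        main = "Math score by gender (simulated)",
        xlab = "Gender", ylab = "Math score")

# Mean math score per category, as a numeric companion to the plot
group_means <- aggregate(math ~ gender, data = sim, FUN = mean)
print(group_means)
```

The same two calls should work on `exams` with any of the five categorical columns on the right-hand side of the formula. A column name containing a slash, such as `race/ethnicity`, needs backticks.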
2. Estimate a multiple linear regression model that includes all the main effects only (i.e., no interactions or higher-order terms). We will use this model as a baseline. Comment on the statistical and economic significance of your estimates. Also, make sure to provide an interpretation of your estimates.
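Since every regressor here is categorical, `lm()` expands each one into dummy variables against a baseline level, and each coefficient is read as a difference from that baseline. A minimal sketch of this on simulated data, with a single factor, follows. The data frame `sim` and the 4-point gap are assumptions, not estimates from the exams data.

```r
# Simulated data with one two-level factor (assumption: illustrative only)
set.seed(104)
n <- 300
sim <- data.frame(
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
sim$math <- 62 + 4 * (sim$gender == "male") + rnorm(n, sd = 15)

# "female" is the baseline level because it sorts first alphabetically
fit <- lm(math ~ gender, data = sim)
coef_male <- unname(coef(fit)["gendermale"])

# With a single factor, the dummy coefficient equals the gap in group means
mean_gap <- mean(sim$math[sim$gender == "male"]) -
  mean(sim$math[sim$gender == "female"])
print(c(coefficient = coef_male, mean_gap = mean_gap))
```

With several regressors, as in the baseline model, the coefficient is no longer the raw gap in means. It is the estimated difference from the baseline level with the other regressors held fixed.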
baseline_mod<-lm(math~.,data=exams)
stargazer::stargazer(baseline_mod,type = "text")
##
## ===================================================================
## Dependent variable:
## ---------------------------
## math
## -------------------------------------------------------------------
## gendermale 4.322***
## (0.808)
##
## `race/ethnicity`group B 2.685
## (1.639)
##
## `race/ethnicity`group C 2.746*
## (1.532)
##
## `race/ethnicity`group D 5.413***
## (1.563)
##
## `race/ethnicity`group E 10.033***
## (1.729)
##
## parent_education_levelbachelor's degree 2.074
## ...