DEPARTMENT OF ECONOMICS, ECON 4041H – RESEARCH METHODOLOGY, Winter 2023, Peterborough: Assignment #1

Please answer all questions, provide a PDF version, and explain properly.



DEPARTMENT OF ECONOMICS
ECON 4041H – RESEARCH METHODOLOGY
Winter 2023, Peterborough
Assignment #1
Due date: January 31, 2023

Instructions: You must provide your own unique solution. You may work with others, but each of you is responsible for submitting your own problem set solution. Question values are listed for each question. Submit your solution through SafeAssign. Ideally you will submit your RMarkdown file, preferably in pdf format. Blackboard won’t accept html files, so if submitting an html file, first zip it and submit the zipped version. But if you don’t like using RMarkdown, you may submit two files: your command file and a word processor file containing results, comments and answers to questions, as well as graphs. Please bind all output together in one document file rather than submitting separate files for each question, or for each graph. Your command file will be a separate file.

For questions 1–5 use the labour force survey file lfs7797.rds. For question 6, use the 2016 Census PUMF cen16.rds.

1. Some basic data descriptions of datafile lfs7797.rds [15 marks]
   a. number of observations in the dataframe
   b. number of observations for variable cowmain (class of worker)
   c. number of missing observations for variable cowmain
   d. mean wage (hrlyearn) of workers of variable cowmain category:
      i. “Public employee”
      ii. “Private employee”
   e. mean wage (hrlyearn) of workers of variable union category:
      i. “Union + agreement”
      ii. “Agreement,no union”

2. Distribution of hrlyearn (wage rate), and uhrsmain (usual weekly hours) [15 marks]
   a. summary statistics: find mean, median, maximum, minimum, standard deviation of wage rate and weekly hours
   b. plot the densities of
      i. wage rate
      ii. log of wage rate
      iii. usual weekly hours
      iv. log of usual weekly hours

3. Generate some 2x2 tables of several variables [15 marks]
   a. first recode the variables for educational attainment: ed76to89 and educ90. The first is for years prior to 1990, and the second is for 1990 on. Recode to create one variable for both periods and call it educ
      i. ed76to89
         • “0 to 8 years” and “9-10 yrs schooling”: code as “less than high school”
         • “11-13 years schooling” and “Some post secondary”: “high school”
         • “Post secondary certificate of diploma”: “college” (note: keep spelling error)
         • “University degree”: “university”
      ii. educ90
         • “0 to 8 years” and “Some secondary”: “less than high school”
         • “Grade 11 to 13,grad” and “Some post secondary”: “high school”
         • “College diploma”: “college”
         • “Bachelors degree” or “Graduate degree”: “university”
   b. now calculate the following conditional means
      i. mean hourly earnings by sex
      ii. mean hourly earnings by educational attainment
      iii. mean weekly hours by sex
      iv. mean weekly hours by educational attainment

4. Composition of labour force by year: 1977 and 1997 [15 marks]
   a. by sex (sex)
   b. by educational attainment (use variable created in previous question)
   c. by age (use variable age_12)
   Use the variable lfsstat (labour force status) to subset the labour force. Remember from macro that the labour force is composed of those employed plus those unemployed.

5. Test the central limit theorem, as we did in our demo example. [20 marks]
   You will draw repeated samples of two variables, hrlyearn (wages) and uhrsmain (usual weekly hours worked), saving the mean value of each sample. Then compare the means, standard deviations and distribution of the three samples to the “population” statistics. Note, the data are in a dataframe, so you must either extract each variable as a vector, or make sure you set your command for a dataframe.
   In order to replicate results, you will need to set a seed value. The seed value determines a starting point for the random number generator. To set your seed value, take your sid, drop the leading 0, then take the sum of the next three digits and the last three. For example, if my sid is 0123456, I would calculate my seed value as 123 + 456 = 579. Then draw the random sample following the example in the Sampling Distribution exercise.
   a. Draw a sample of 1,000 observations of wages (hrlyearn). Save the mean value. Repeat this for 2,000 repetitions. This yields 2,000 sample means. Then repeat for 5,000 observations, and again for 10,000 observations. This will give you three sets of 2,000 means. Report the mean, standard deviation, and graph the kernel density for each of these three sets.
   b. What do you see as you increase the sample size? Compare your results (mean, standard deviation, density plot) with those of the aggregate sample.
   c. Repeat parts a. and b. above, but use the weekly hours variable (uhrsmain).

6. Use the Census 2016 PUMF (cen16.rds) to test whether the relationship between age (factor variable agegrp) and employment income (variable empin) is linear. Restrict your analysis to those in the age range from 20 to 84 years old. The variable agegrp for this range consists of 5-year age groups. Generate a numeric version of this variable and use the numeric variable rather than the factor variable where appropriate. [20 marks]
   a. generate a scatter plot with employment income on the y-axis and (the numeric version of) age on the x-axis. Use a subset of the census file including only 50,000 observations. The generated plot will otherwise take up a lot of space in your output file.
   b. generate a loess plot of employment income as a function of (the numeric version of) age. Use a subset of the census file including only 50,000 observations. This command is otherwise very slow. In specifying the loess plot command, make sure to include the option “se = FALSE”, otherwise the estimation is very slow, even on the subset.
   c. Run a regression of employment income on the numeric version of age. Report the results and interpret. What do they mean?
   d. Run a regression of employment income on the original factor variable version of age.
      i. Report the results and interpret. What do they mean? Do they tell you anything about whether the relationship is linear?
      ii. Using the output from the regression above, test the significance of power terms of the age variable using the contrast() command.
      iii. Generate a plot of the predicted values of employment income for each level of the age factor variable. Interpret.
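The seed rule and the repeated-sampling procedure in question 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's demo code: the sid "0123456" is the example given in the question, the helper names seed_from_sid and sample_means are hypothetical, and wages is simulated log-normal data standing in for hrlyearn.

```r
# Seed rule from question 5: drop the leading 0, then add the next three
# digits to the last three. The sid is kept as a string so the leading
# zero is not lost.
seed_from_sid <- function(sid) {
  digits <- substring(sid, 2)                      # "123456"
  as.integer(substr(digits, 1, 3)) + as.integer(substr(digits, 4, 6))
}

seed <- seed_from_sid("0123456")                   # 123 + 456
set.seed(seed)

# Stand-in "population": simulated right-skewed wages (an assumption,
# NOT the lfs7797 data).
wages <- rlnorm(100000, meanlog = 2.5, sdlog = 0.5)

# Draw `reps` samples of size `n` and keep the mean of each sample.
sample_means <- function(x, n, reps = 2000) {
  replicate(reps, mean(sample(x, n)))
}

sizes <- c(1000L, 5000L, 10000L)
means_by_size <- lapply(sizes, function(n) sample_means(wages, n))
names(means_by_size) <- sizes

# Mean and standard deviation of each set of 2,000 sample means
results <- data.frame(
  n    = sizes,
  mean = sapply(means_by_size, mean),
  sd   = sapply(means_by_size, sd)
)
print(results)
print(c(pop_mean = mean(wages), pop_sd = sd(wages)))

# Kernel density of the sample means for the smallest sample size
dens <- density(means_by_size[["1000"]])
if (interactive()) plot(dens, main = "Sample means, n = 1,000")
```

In this sketch the means of all three sets sit close to the population mean, while their standard deviations shrink as the sample size grows, which is the pattern part b asks about.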
Mukesh answered on Feb 07 2023
# Choose cen16.rds in a file dialog, then load it
cen_path <- file.choose()
cen <- readRDS(cen_path)
summary(cen$agegrp)
## 0 to 4 years 5 to 6 years 7 to 9 years 10 to 11 years
## 51025 21349 32783 20674
## 12 to 14 years 15 to 17 years 18 to 19 years 20 to 24 years
## 30833 31576 21830 59601
## 25 to 29 years 30 to 34 years 35 to 39 years 40 to 44 years
## 60644 62180 60799 59706
## 45 to 49 years 50 to 54 years 55 to 59 years 60 to 64 years
## 62484 71589 69829 59991
## 65 to 69 years 70 to 74 years 75 to 79 years 80 to 84 years
## 51500 36379 25653 17329
## 85 years and over NA's
## 13528 9139
table(cen$agegrp)
##
## 0 to 4 years 5 to 6 years 7 to 9 years 10 to 11 years
## 51025 21349 32783 20674
## 12 to 14 years 15 to 17 years 18 to 19 years 20 to 24 years
## 30833 31576 21830 59601
## 25 to 29 years 30 to 34 years 35 to 39 years 40 to 44 years
## 60644 62180 60799 59706
## 45 to 49 years 50 to 54 years 55 to 59 years 60 to 64 years
## 62484 71589 69829 59991
## 65 to 69 years 70 to 74 years 75 to 79 years 80 to 84 years
## 51500 36379 25653 17329
## 85 years and over
## 13528
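The two outputs above differ only in the NA's entry: summary() on a factor reports missing values, while table() drops them unless asked to keep them. A small sketch on a made-up factor (the vector x is hypothetical, not census data):

```r
# Toy factor with two missing values (hypothetical data)
x <- factor(c("20 to 24 years", "25 to 29 years", NA, "20 to 24 years", NA))

summary(x)                    # counts per level plus an NA's entry
table(x)                      # NA values are dropped by default
table(x, useNA = "ifany")     # adds an NA column when any are present

# Direct count of missing values
n_missing <- sum(is.na(x))
n_missing
```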
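The following code converts the factor variable agegrp to a numeric variable.
The result holds the integer level codes (1 to 21, where 8 is "20 to 24 years"
and 20 is "80 to 84 years"), not ages in years: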
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
cen$age_num <- as.integer(cen$agegrp)  # integer level codes, not ages in years
unique(cen$age_num)
## [1] 11 5 2 12 15 19 18 14 8 1 16 4 6 9 NA 20 10 3 13 7 17 21
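The codes printed above are level positions, so 8 stands for "20 to 24 years" rather than for an age in years. If a numeric age in years is wanted instead, one option is to read the bounds out of the level labels and use the midpoint of each group. The sketch below assumes labels of the form "20 to 24 years", as in the output above; the helper name age_midpoint and the toy factor are hypothetical.

```r
# Hypothetical helper: midpoint of age-group labels such as "20 to 24 years".
# Levels that do not match the pattern (e.g. "85 years and over") map to NA.
age_midpoint <- function(f) {
  lev <- levels(f)
  pattern <- "^([0-9]+) to ([0-9]+) years$"
  ok <- grepl(pattern, lev)
  mid <- rep(NA_real_, length(lev))
  lower <- as.numeric(sub(pattern, "\\1", lev[ok]))
  upper <- as.numeric(sub(pattern, "\\2", lev[ok]))
  mid[ok] <- (lower + upper) / 2
  mid[as.integer(f)]          # look up each observation's level; NA stays NA
}

# Toy factor (made-up values, not census data)
agegrp <- factor(c("20 to 24 years", "80 to 84 years", "85 years and over", NA))

as.integer(agegrp)            # level codes (positions in levels(agegrp))
age_midpoint(agegrp)          # midpoints in years: 22, 82, NA, NA
```

Either version gives equally spaced values for the 5-year groups from 20 to 84, so a linear regression on the codes and one on the midpoints differ only in the scale of the slope and the intercept.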
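# Keep level codes 8 to 20 ("20 to 24 years" through "80 to 84 years"),
# then draw 50,000 rows at random
subset <- cen %>% filter(age_num > 7, age_num <= 20) %>% sample_n(50000)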
summary(subset$agegrp)
## 0 to 4 years 5 to 6 years 7 to 9 years 10 to 11 years
## 0 0 0 0
## 12 to 14 years 15 to 17 years 18 to 19 years 20 to 24 years
## 0 0 0 4189
## 25 to 29 years 30 to 34 years 35 to 39 years 40 to 44 years
## 4376 4470 4376 4293
## 45 to 49 years 50 to 54 years 55 to 59 years 60 to 64 years
## 4468 5137 5055 4322
## 65 to 69 years 70 to 74 years 75 to 79 years 80 to 84 years
## 3649 2580 1895 1190
## 85 years and over
## 0
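Because the 50,000 rows are drawn at random, the counts above will change from run to run unless a seed is set first. A minimal base-R sketch of a reproducible draw follows. The data frame toy, the helper draw_rows, and the seed 579 (borrowed from the assignment's worked example) are all illustrative assumptions.

```r
# Toy data frame standing in for the census file (hypothetical values)
toy <- data.frame(age_num = rep(1:21, each = 100))

# Hypothetical helper: keep level codes 8 to 20, then draw n rows at random
draw_rows <- function(df, n, seed) {
  set.seed(seed)                                   # fix the RNG state
  keep <- !is.na(df$age_num) & df$age_num > 7 & df$age_num <= 20
  eligible <- df[keep, , drop = FALSE]
  eligible[sample(nrow(eligible), n), , drop = FALSE]
}

first  <- draw_rows(toy, 500, seed = 579)
second <- draw_rows(toy, 500, seed = 579)

identical(first, second)    # same seed, same rows
range(first$age_num)        # every code falls within 8 to 20
```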
dim(subset)
## [1] 50000 90
dim(cen)
## [1] 930421 90
summary(subset$empin)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -50000 15000 37000 49236 65000...