Please get me a quote.
PROBLEM SET 4 - Empirical Asset Pricing

Data Overview

The data "CRSP_data.csv" contains monthly returns for all publicly traded stocks in the United States for the period January 2010 to December 2020. It is taken from the Center for Research in Security Prices (CRSP) monthly stock file. CRSP is the standard source for stock return information in academic finance. It can be accessed through the Wharton Research Data Services (WRDS) online portal, which you should have access to as JHU students.

There are 10 variables in the data. The variables "cusip", "permno", "permco", and "ticker" all identify the stock. They mostly accomplish the same thing, but we will use the "permco" variable because it handles mergers and acquisitions appropriately. The variable "connam" is the company name. The variable "ret" is the return for that month, reported in decimal. The variable "prc" is the stock price, and "shrout" is the total number of shares outstanding (in 1,000s). Finally, "sprtrn" is the return on the S&P 500 index.

Part A - Basic Data Cleaning

Before doing our empirical work, let's first clean the data a little.

1. The variable "ret" is the monthly return (in decimal) for each stock. However, CRSP denotes certain returns as "B" and "C" when the data is invalid for various reasons. So the first step is to drop those observations, and convert the variable "ret" to a numeric variable if it isn't already.

2. Next, summarize the return data. Note there are some massive outliers in the upper end of the distribution. Let's keep only observations where the monthly return is <= 100% (i.e. ret <= 1.00).

3. From the "date" variable, create variables for year, month, and day for each date.

4. In order to have more easily interpretable coefficients, multiply ret by 100 so a value of 5 means 5% (not 500%).

5. Now, let's keep only sufficiently large stocks (this is not necessary, but it reduces data burden and data errors). Generate a variable called mkt_cap equal to shrout x prc x 1,000. Next, for each date, keep only stocks in the top 1,000 by market capitalization.

6. Some stocks have multiple values for a given year and month. Keep only one observation for each permco in each year and month (which one you keep is not important for this problem set).

7. We want to have some level of reliability in our estimates of factor betas. To do so, we should impose a minimum number of monthly observations. This is more art than science. Let's keep observations (at the permco level) with at least two years of data; that is, drop any stocks with fewer than 24 monthly return observations in the data.

8. Finally, some permco values have more than 132 months of data (corresponding to the 11 years of our sample); this has to do with different share classes. Let's not bother with them. Just drop any stock with more than 132 observations in the sample.

By my calculation, these cleaning steps leave us with a total of 115,953 observations.

Part B - Estimating Betas

1. Load the Fama-French 5-factor data from the file posted on Blackboard.

2. Create year, month, and day variables for the FF data.

3. Question: What are the means and variances of the FF factors?

4. Now, merge the FF data with the stock return data we created in Part A.
   (a) To be clear, in my data I have 1,270 unique firms (permco) with a total of 115,953 observations.

5. Now, for each stock, regress the stock return on the market factor (mktrf), and store the market beta from this regression ($\hat{\beta}_{mkt,i}$) and the alpha ($\alpha_i$).
   (a) Plot a histogram of market beta and alpha.
   (b) What is the average market beta? What is the average alpha?

6. Next, regress returns on the 5 Fama-French factors, and store the estimated betas (call them $\hat{\beta}_{mkt,i}$, $\hat{\beta}_{smb,i}$, $\hat{\beta}_{hml,i}$, $\hat{\beta}_{cma,i}$, $\hat{\beta}_{rmw,i}$; note that CMA stands for "conservative minus aggressive" investment, and RMW is "robust minus weak" profitability).
   (a) How does the distribution of FF5 alphas compare to the distribution of CAPM alphas? What about the (adjusted) $R^2$? (Notes: to compare distributions you can plot histograms, or report means/medians/standard deviations; also, most regression packages store the adjusted $R^2$ from a regression.) How would you evaluate the success of the two models?

Part C - Fama-MacBeth

1. Now we're going to estimate Fama-MacBeth regressions using the CAPM model. First, regress monthly returns on mktrf, but only for years 2010-2016.

2. Now, based on these estimated market betas (only use one observation for each firm), sort firms into 20 groups based on market beta. That is, the bottom 5% of betas should be in one group, the next 5% smallest betas in the second group, etc.

3. For each of these groups, calculate the average return within the group for each date (so for each date, you should have 20 average returns, one for each group).

4. Now, for the sample period 2016-2020, estimate the CAPM beta for each of the 20 groups. That is, for each group, run a regression of average group return on mktrf. This will result in 20 estimates of market beta, one for each group. (Tip: once you have calculated average returns for each group, you need to (and should) keep only one observation per group per date.)

5. Now, we want to see how well these estimates of market beta explain returns. For each date (year and month), run a regression of average group return on group market beta. That is, for each date $t$ you estimate:

   $\bar{r}_{g,t} = \gamma_{0,t} + \gamma_{1,t} \beta_{g,mkt} + \varepsilon_{g,t}$

   A few things to note. There are 60 months, so you are running 60 regressions (one for each month). The subscript $g$ denotes the "beta group", and $t$ denotes the date. Since you have run 60 regressions, you have 60 estimates of $\hat{\gamma}_{0,t}$ and $\hat{\gamma}_{1,t}$.

6. If the CAPM is correct, what should be the average value of $\hat{\gamma}_{0,t}$ and $\hat{\gamma}_{1,t}$ across the 60 regressions? How do your estimates compare to the theory?
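The problem set does not name a software package, so the following is only a rough sketch, in Python with pandas, of how the eight Part A cleaning steps might be chained together. The function name `clean_crsp` and its parameters are hypothetical. The column names come from the data overview. The use of `abs()` on `prc` is an added assumption (CRSP can report a negative price when it records a bid/ask midpoint) and is not part of the stated formula, so the resulting row count need not match the 115,953 quoted above.

```python
import pandas as pd


def clean_crsp(df: pd.DataFrame, top_n: int = 1000,
               min_obs: int = 24, max_obs: int = 132) -> pd.DataFrame:
    """Apply the Part A cleaning steps to a CRSP-style monthly frame."""
    df = df.copy()

    # Step 1: non-numeric codes such as "B" and "C" become NaN and are dropped.
    df["ret"] = pd.to_numeric(df["ret"], errors="coerce")
    df = df.dropna(subset=["ret"])

    # Step 2: keep only monthly returns of at most 100% (ret <= 1.00).
    df = df[df["ret"] <= 1.00].copy()

    # Step 3: split the date into year, month, and day.
    # Assumption: "date" is parseable by pandas; pass format= if it is not.
    df["date"] = pd.to_datetime(df["date"])
    df["year"] = df["date"].dt.year
    df["month"] = df["date"].dt.month
    df["day"] = df["date"].dt.day

    # Step 4: express returns in percent, so 5 means 5%.
    df["ret"] = df["ret"] * 100

    # Step 5: market cap, then keep the top_n largest stocks on each date.
    # Assumption: abs() guards against negative CRSP prices; drop it to
    # follow the stated formula shrout x prc x 1,000 literally.
    df["mkt_cap"] = df["shrout"] * df["prc"].abs() * 1000
    rank = df.groupby("date")["mkt_cap"].rank(method="first", ascending=False)
    df = df[rank <= top_n]

    # Step 6: one observation per permco per year-month (keeps the first).
    df = df.drop_duplicates(subset=["permco", "year", "month"])

    # Steps 7-8: keep firms with between min_obs and max_obs monthly rows.
    # After step 6 the upper bound may never bind in a 132-month sample.
    n_obs = df.groupby("permco")["ret"].transform("count")
    df = df[(n_obs >= min_obs) & (n_obs <= max_obs)]

    return df.reset_index(drop=True)
```

A call such as `clean_crsp(pd.read_csv("CRSP_data.csv"))` would be the intended usage, assuming the file loads with the column names listed in the data overview.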
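The Part C procedure (firm betas, beta-sorted groups, group betas, then one cross-sectional regression per month) can be sketched in the same way. This is a minimal illustration rather than a reference solution. The names `ols_fit` and `fama_macbeth_capm` are hypothetical, and the input is assumed to be a long-format frame with columns `permco`, `date`, `year`, `ret`, and `mktrf`, as would result from the Part B merge. The default windows follow the problem text literally, so 2016 appears in both the sorting window and the testing window.

```python
import numpy as np
import pandas as pd


def ols_fit(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1]


def fama_macbeth_capm(df, n_groups=20, sort_end=2016, test_start=2016):
    """Return one (gamma0, gamma1) pair per date in the testing window."""
    # Step 1: one market beta per firm, using the sorting window only.
    early = df[df["year"] <= sort_end]
    firm_beta = pd.Series({
        permco: ols_fit(g["ret"], g["mktrf"])[1]
        for permco, g in early.groupby("permco")
    })

    # Step 2: sort firms into n_groups equal-sized beta groups.
    # Ranking first avoids duplicate bin edges when betas tie.
    group = pd.qcut(firm_beta.rank(method="first"), n_groups, labels=False)

    # Step 3: average return per group per date (one row per group-date).
    df = df.assign(group=df["permco"].map(group)).dropna(subset=["group"])
    late = df[df["year"] >= test_start]
    grp = late.groupby(["group", "date"], as_index=False).agg(
        ret=("ret", "mean"), mktrf=("mktrf", "first"))

    # Step 4: one market beta per group, using the testing window.
    grp_beta = pd.Series({
        g: ols_fit(d["ret"], d["mktrf"])[1] for g, d in grp.groupby("group")
    })
    grp["beta"] = grp["group"].map(grp_beta)

    # Step 5: for each date, regress group returns on group betas.
    rows = {date: ols_fit(d["ret"], d["beta"])
            for date, d in grp.groupby("date")}
    return pd.DataFrame.from_dict(
        rows, orient="index", columns=["gamma0", "gamma1"])
```

For step 6, `fama_macbeth_capm(panel).mean()` gives the time-series averages of the two coefficients, where `panel` stands for the merged data set.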