The role of test scores in explaining race and gender differences in wages

McKinley L. Blackburn
Department of Economics, Darla Moore School of Business, University of South Carolina, Columbia, SC 29208, USA

Abstract

Previous research has suggested that skills reflected in test-score performance on tests such as the Armed Forces Qualification Test (AFQT) can account for some of the racial differences in average wages. I use a more complete set of test scores available with the National Longitudinal Survey of Youth 1979 Cohort to reconsider this evidence, and the results suggest a conclusion similar to earlier research. I also examine the ability of test scores to account for gender differences in wages. Women do not perform as well as men on two math-oriented tests, but they perform better on two speed-oriented tests that appear to have a strong relationship with wages. On net, the test-score difference can help account for only a small part of the gender difference in wages (for any race). Further results suggest that unexplained race and gender differences in wages have been growing over time for the 1979 cohort.
© 2004 Elsevier Ltd. All rights reserved.

JEL classification: J31; J71
Keywords: Salary wage differentials; Human capital

1. Introduction

Individual scores on aptitude and achievement tests have frequently been used by labor economists as measures of "ability" in empirical wage equations. The general approach has been to consider how inclusion of the test scores as independent variables affects other coefficient estimates in the equation. The associated impact of two different wage determinants has received the most attention: schooling coefficients, and racial differences in wages. Griliches and Mason (1972) were among the first to consider the schooling issue, using a sample of military veterans for which scores on the Armed Forces Qualification Test (AFQT) were available and finding a small upward bias in the usual schooling coefficient estimate calculated ignoring test scores. The other major concern is the extent to which these measures can explain differences in wages among individuals of different races. Gwartney (1970) considered whether or not achievement differences between white and black men might explain part of the racial wage gap, although it was not until Kiker and Liles (1974) that the issue was addressed with data (also from the military) that allowed for the simultaneous control of education and AFQT effects on wages. Both studies suggested that a large proportion of the racial wage gap could be accounted for by education and AFQT differences.

The use of test scores in wage equations has received renewed attention in recent years, as AFQT data have been made available in the National Longitudinal Survey of Youth 1979 Cohort (NLSY). Unlike the military samples used in previous research, the NLSY data provide AFQT results for a nationally representative sample of young workers in the 1980s. O'Neill (1990) was the first to study how AFQT controls in the NLSY affected estimated black–white wage differences, and concluded that essentially all of the racial wage gap could be explained with a wage equation that incorporated those controls.
Maxwell (1994) and Neal and Johnson (1996) used alternative approaches with the NLSY and also found that much (though not all) of the gap could be accounted for. The best known application of these data to explaining racial differences in economic outcomes is the analysis of Herrnstein and Murray (1994). While their findings are similar to other research using these data, their conclusions became more controversial because of their argument that AFQT scores reflected an immutable genetic difference in abilities, an argument not adopted by most other users of the data.

While these recent studies have examined the impact of AFQT scores on wages, they ignore additional information on test-score performance available in the data. The AFQT is a function of only four of the 10 subtest scores that make up the Armed Services Vocational Aptitude Battery (ASVAB). Under the assumption maintained by some researchers (for example, Herrnstein & Murray, 1994) that these subtests all reflect a single factor ("intelligence"), using a function of only four of the subtests is not much different than using all 10. Of course, the subtests are not intended to all measure a single factor, and some are clearly more conceptual or abstract in nature than others. If they do measure separate factors, a less restrictive incorporation of the test scores in the wage equation may be appropriate.

In the following, I use the NLSY to reanalyze the impact of the ASVAB test scores on wages, and their influence in explaining education-related and race-related differences in wages. My results suggest that there are differences in how each of the subtests is correlated with wages, but that this fact does not alter the importance of "ability" explanations of racial wage differences. I also examine whether the subtests can help in explaining differences in wages between men and women in the NLSY. While there are very slight gender differences in average AFQT scores, more important differences are apparent when the individual subtests are examined. On net, however, the factors reflected in these test scores appear to play very little role in accounting for gender differences in wages.

2. Omitted-ability concerns for wage differentials based on education and race

The usefulness of conventional estimates of wage equations is diminished by the potential presence of omitted factors related to the productivity of workers—usually termed unobserved ability, or just "ability." The problem is one of omitted-variable bias, and economic theories generally suggest that such a bias is likely to arise for many commonly considered wage determinants. A typical wage equation might be written:

log(w_i) = b'x_i + cA_i + e_i    (1)

where w is the wage rate, x is a vector of observable wage determinants, A is the recognized productivity of the worker for reasons not captured in x, and e is an error term reflecting purely random influences on the wage measure (perhaps due to luck, or due to measurement error in the wage, for example). Conventional estimates are not able to control for all productivity dimensions of the worker, and so some part of this source of wage variation remains as part of the empirical model's error. If A is correlated with elements of x, least-squares estimates of b in these circumstances will be biased.
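To make the omitted-variable argument concrete, the short simulation below generates data in which ability is positively correlated with schooling and compares the schooling coefficient from a regression that omits ability with one that includes it. It is purely illustrative: the parameter values and variable names are hypothetical and are not estimates from the NLSY.

```python
# Illustrative simulation of omitted-ability bias in a log-wage equation.
# All parameter values are hypothetical, chosen only to show the mechanism.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

ability = rng.normal(size=n)                                     # unobserved A
school = 12 + 2 * ability + rng.normal(scale=2, size=n)          # schooling correlated with A
log_wage = 1.0 + 0.08 * school + 0.15 * ability + rng.normal(scale=0.3, size=n)

X_short = np.column_stack([np.ones(n), school])                  # omits ability
X_long = np.column_stack([np.ones(n), school, ability])          # includes ability

b_short = np.linalg.lstsq(X_short, log_wage, rcond=None)[0]
b_long = np.linalg.lstsq(X_long, log_wage, rcond=None)[0]

print(f"schooling coefficient, ability omitted : {b_short[1]:.3f}")   # biased upward
print(f"schooling coefficient, ability included: {b_long[1]:.3f}")    # close to the true 0.08
```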
Two areas of research in which these concerns are of paramount importance are the estimation of the impact of educational attainment on wages, and the accounting for wage differentials potentially associated with labor market discrimination on the basis of race and/or gender. In estimating schooling returns, the usual expectation is that ability and schooling will be positively correlated (although Blackburn and Neumark (1995) offer a theoretical model in which the correlation is not clear), which should cause the schooling coefficient estimate in a standard wage equation to be upward biased. In recent research, the most common approach to handling this potential bias has been to use an instrumental variable that identifies a source of variation in schooling that is unrelated to ability. The typical finding in this literature is that ignoring ability does not appear to contribute to an upward bias in the schooling coefficient estimate, with some results suggesting that a downward bias may even hold. Card (1999) provides a survey of these results, along with an argument that the instrumental variable estimates themselves may provide upward-biased estimates of the "average" impact of education.

An alternative approach to this problem is to include explicit measures of ability in the wage equation.1 Griliches and Mason (1972) use AFQT test scores to proxy for ability in an errors-in-variables estimation that instrumented for AFQT. Several later studies—including Griliches (1977) and Blackburn and Neumark (1995)—followed a similar strategy in attempting to control for ability effects in estimating schooling returns. The typical finding was that the conventional schooling coefficient estimates are biased upwards, though by a small amount.

1 A third approach is to use data on twins, regressing the difference in wages between twins on their difference in schooling and thereby differencing out any genetic-specific ability attributes. With Ashenfelter and Krueger (1994) and Ashenfelter and Rouse (1998) as notable exceptions, the typical finding in this line of research is that the schooling return is biased upwards in usual wage equations.

Unmeasured ability factors have also played a major role in the interpretation of results in the wage discrimination literature. The primary approach has been to incorporate as many wage determinants as possible to attempt to capture as much of the ability difference related to race/gender as possible. Again, scores on cognitive-type tests are potentially helpful in this attempt, and have proven to be quite influential in studies that concentrate on potential discrimination on the basis of race (see Kiker & Liles, 1974; O'Neill, 1990). The most controversial study to make use of the AFQT data in the NLSY is Herrnstein and Murray (1994). One of their primary arguments is that genetically based intelligence is a more important determinant of economic success than socioeconomic background. As in O'Neill, they find AFQT differences explain much of the racial difference in economic outcomes. The study has been widely criticized by a number of authors.2 There are two ways in which I view the use of the ASVAB data differently from Herrnstein and Murray. One, I do not see these test scores as only reflecting innate intelligence (or "g" as it is referred to by psychologists), as many of the tests may require background knowledge or motivation to score well.
Second, the skills reflected in these test scores are not likely to be solely genetically determined but rather are also affected by life experiences, and recognition of this fact is important in comparing test scores across different individuals.

2 See Heckman (1995) and Goldberger and Manski (1995) for reviews of the basic argument. Korenman and Winship (2000) reanalyze the NLSY data and argue that Herrnstein and Murray understate the importance of education relative to intelligence.

Neal and Johnson (1996) use the NLSY to perform an empirical analysis similar to Herrnstein and Murray. They estimate parsimonious wage regressions that include only race, age, and AFQT as independent controls. They argue that this type of specification helps identify "premarket" discrimination, while the more usual type of specification controls for factors that may themselves be affected by labor market discrimination. Like Herrnstein and Murray, they find that most of the black–white and Hispanic–white differentials for men are explained by AFQT scores, and that all of these differentials for women are explained.3 Unlike Herrnstein and Murray, they argue the skills reflected in the AFQT score may be determined by socioeconomic background, and in fact find that much of the racial difference in these scores is explained by family background and school resources.

3 In fact, they find a positive coefficient estimate for Hispanic women (relative to white women) once AFQT is included.

One common criticism of the use of test scores like the AFQT to account for racial differences in outcomes is that the tests are racially biased, understating the true skills of minorities relative to whites (for example, see Rodgers & Spriggs, 1996). Neal and Johnson (1996) address this issue in two ways. First, they cite an earlier study sponsored by the Department of Defense to conclude that the AFQT is an equally useful predictor of performance in the military for all races. Second, they appeal to the finding that the impact of AFQT on wages is the same, if not larger, for minorities compared to whites (which would seem unlikely if the test was not capturing ability amongst blacks). Rodgers and Spriggs counter with an analysis of the individual test components of the AFQT, showing that the larger test-score effect for minorities resides in a greater importance of the verbal-type scores for minorities.4 If these differences are real, it may suggest that the labor market rewards different skills for different races (which is itself a form of discrimination), though the conclusion that the test is culturally biased does not immediately follow.

4 Rodgers and Spriggs (1996) also reanalyze the wage data using an adjusted AFQT score that removes any racial differences in test scores that cannot be explained by family background or school quality data. This adjusted score explains very little of the raw wage difference between races.

In the remainder of the paper, I reanalyze the wage and test score data from the NLSY 1979 cohort. My analysis differs from earlier research in a number of ways. First, I use adjusted test scores that remove effects from age and education at the time the test was taken. Second, I make use of all 10 components of the ASVAB, not just the four that lie behind the AFQT, and I allow each to potentially reflect a different skill. Third, I use data from all available waves of the NLSY, rather than restrict my attention to data from one year (as is commonly done in the studies mentioned previously). Most importantly, I use the same type of analysis to examine male/female wage differences to see if skills represented in the ASVAB test scores may play a role.5 I also examine the extent to which the estimated return to education is affected by controlling for test-score effects.

5 I am not aware of any previous research that has examined male/female differences in the ASVAB test scores.

3. Test score data

The analysis uses a sample of individuals originally interviewed for the NLSY.
The original cohort was itself a sample of individuals residing in the US in 1979 who were born between 1957 and 1964 (making them between the ages of 14 and 22, inclusive, at the time of the original interview). The NLSY included a representative sample of the civilian population in that age range, as well as supplemental oversamples of blacks, Hispanics, "economically disadvantaged" whites, and the military. So as to allow greater precision in the comparisons across races, I use the oversamples for blacks and Hispanics, but to avoid non-representativeness within the race samples I omit the oversamples of poor whites and the military.

A primary advantage of the NLSY data is the availability of ASVAB test results. As these play such a major role in the analysis, it is perhaps useful to provide some background on the nature of these tests. The ASVAB is used by the various branches of the US military to help in making recruitment decisions and vocational assignments after recruitment. At the time the NLSY 1979 cohort was initiated, the Department of Defense was concerned that its comparison of recruits' performance on the ASVAB to national norms for test performance was weakened by the necessity of using norms that dated back to World War II (and that were for men only). The Department of Defense recognized the opportunity to use the NLSY as a national sample of individuals of ages similar to their potential recruits, and in cooperation with the Department of Labor contracted with the National Opinion Research Center (NORC) to administer the battery of tests under standard procedures to every NLSY respondent. As the respondents were geographically dispersed, this required administering the test in small groups (generally 5–10 respondents) throughout the US. One attractive aspect of this effort is that the NLSY was able to obtain a very high response rate for these tests—around 94% of the original sample completed the tests. This is partly due to the fact that respondents were paid $50 for taking the test, and that they were promised (and later received) a mailing showing their test results and providing an associated vocational assessment.6 This latter fact provided some motivation for the respondents to take the test seriously, given that they were largely at a time in their careers when an accurate vocational assessment might be useful.

6 This background information is taken from the NLSY79 User's Guide (1997).

The ASVAB comprises 10 separate subtests, each taken separately. Eight of the 10 subtests are not time compressed, in the sense that the test design does not anticipate that the amount of time accorded those subtests should play a major role in the number of items a respondent answers correctly.7 Among these eight non-time-compressed subtests, some attempt to test general conceptual and reasoning ability, while others test the respondent's knowledge of terms and facts in specific subject areas.

7 There is a time limit on each of the subtests, but it is thought to be set at a level where it does not greatly affect performance at the margin.
Two of the subtests are timed tests, in the sense that it is the ability to perform the requested actions quickly that is being tested, the actions themselves being fairly easy to perform (each respondent should be able to perform every action were sufficient time provided). The entire ASVAB takes about 3 h to complete.

Three of the ASVAB subtests can be thought of as testing conceptual and reasoning ability. These subtests (and their nature) are: (1) Paragraph comprehension: the respondent is provided a brief written description of a situation, and is expected to obtain requested information from the description; (2) Arithmetic reasoning: the respondent is asked to solve simple contextual problems for which simple algebra would usually be helpful; (3) Mechanical comprehension: the respondent is asked to answer questions for which an understanding of mechanics is helpful, generally from the assessment of a diagrammatic presentation. The latter is not a "knowledge" test in the sense that no testing of terminology is incorporated (as is also true of the first two subtests). The knowledge-based subtests include the following five: (4) Word knowledge: the respondent's vocabulary is tested through analogy questions; (5) Mathematics knowledge: the respondent's knowledge of algebraic and geometric principles is tested, often requiring an explicit numerical solution, but not framed in a contextual problem as in the arithmetic reasoning subtest; (6) General science: specific questions from physics, chemistry, and biology; (7) Electronics information: specific questions dealing with electricity and electronics; (8) Auto and shop information: specific questions dealing with automobile operation, and shop practices and tools. The two timed subtests are: (9) Numerical operations: simple addition, subtraction, multiplication, and division computations; (10) Coding speed: a series of word/code-number pairs are provided at the top of the sheet, and the respondent must match the code to the word at the bottom of the sheet. In both of the timed subtests, the test-taker is advised that the speed at which they perform the task correctly is what is being tested, in the sense that they are asked to work as fast as they can without making mistakes.

The AFQT is a function of a subset of the ASVAB subtests. The AFQT abbreviation is more widely known, both to potential military recruits and to researchers using the NLSY. It is better known to recruits because it is the AFQT percentile score that is quite important in determining eligibility for the military. The other subtests may be used by the military for vocational assignment once the individual is recruited, but are not used in determining eligibility. The NLSY sample itself underlies the determination of the percentile assignments for the AFQT scores. The AFQT is widely known by NLSY researchers because it is the score (presumably in percentile form) that is primarily used as a measure of "ability" by researchers who wish to control for test scores in regression analyses. There are actually two different AFQT definitions that have been used by the military in recent years.
The AFQT measure that was in use in 1980 (which I will call AFQT1) is based on raw scores on four different subtests of the ASVAB. In particular, it is calculated as the sum of the word knowledge, paragraph comprehension, and arithmetic reasoning raw scores, plus one-half of the raw score on the numerical operations subtest.8 As this is a sum of raw scores, the importance of each subtest in the variability of AFQT1 depends on the underlying variation of the raw score. The most variable raw score is numerical operations (with a standard deviation of 11.0 using the test-score weights), but the formula places a one-half weight on this score, reducing its importance. Both the word knowledge and arithmetic reasoning scores have a similar standard deviation (around 7.5), while the paragraph comprehension score is the least variable (standard deviation of 3.5). The AFQT1 seems to be similar in construction to standardized tests such as the Scholastic Aptitude Test (SAT), although the presence of the numerical operations subtest in forming the score is quite different from anything in the SAT.

8 There is a slight adjustment to the numerical operations raw score that is applied before this sum is calculated, details of which are provided in "NLSY Attachment 106."

Starting 1 January 1989, the Department of Defense changed the definition of the measure it calls the AFQT. The most important change is that the numerical operations subtest is no longer part of the formula, while the math knowledge subtest is included. There is also some standardization of scores before they are summed together. In particular, the revised AFQT (I will call it AFQT2) is the sum of three scaled scores: the arithmetic reasoning, the math knowledge, and two times a "verbal" subtest. The verbal subtest is formed as a scaled version of the sum of the raw scores of the word knowledge and paragraph comprehension subtests (thereby preserving—intentionally or not—the greater importance of the word knowledge subtest given its higher variability). The scaled scores all have the same standard deviation, so multiplying the verbal subtest by 2 leads to a fairly equal weighting (in terms of underlying variation) of verbal and mathematical skills in computing the AFQT2.9

9 The standard deviation of the verbal test (times two) is actually slightly higher than the standard deviation of the sum of the arithmetic reasoning and math knowledge subtests.

For both the AFQT1 and the AFQT2 totals, the scores were used to estimate a population distribution function, so that potential military recruits could be compared to the general population in terms of their performance on the test. With this estimated distribution function, it was also possible to provide a percentile ranking of performance for each member of the sample, and this percentile ranking (for both AFQT definitions) is provided in the NLSY public use data. In fact, this percentile ranking is the only direct measure of AFQT performance provided by the NLSY. The raw and/or scaled scores for each subtest are provided, however, so it is possible to construct the AFQT1 and AFQT2 measures following the instructions discussed above.

My sense is that most recent analyses using the AFQT as a control have made use of the AFQT2 construction of the variable. Unfortunately, the precise measure is not always discussed (nor is the use of "Attachment 106" to the NLSY documentation, which is necessary to perform some of the rescaling). Herrnstein and Murray (1994) do discuss the difference in AFQT measures and appear to have redone their analyses with both measures, concluding that the choice makes little difference.10

10 They prefer the 1989 revision of the AFQT, and subsequent analyses that have used their data (for example, Korenman & Winship, 2000) are likely to have made use of this version.
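As a rough illustration of the two constructions just described, the sketch below computes AFQT1 and AFQT2 from a hypothetical DataFrame of subtest scores (column names such as "word_raw" are invented for the example). The official rescaling in NLSY "Attachment 106" is not reproduced here; a simple standardization stands in for it.

```python
# Sketch of the AFQT1 and AFQT2 constructions described in the text, assuming a
# pandas DataFrame of ASVAB subtest raw scores with hypothetical column names.
import pandas as pd

def afqt1(df: pd.DataFrame) -> pd.Series:
    # 1980 definition: sum of raw scores, with a one-half weight on numerical operations
    # (the small adjustment to numerical operations in Attachment 106 is omitted).
    return df["word_raw"] + df["para_raw"] + df["arith_raw"] + 0.5 * df["numops_raw"]

def afqt2(df: pd.DataFrame) -> pd.Series:
    # Post-1989 definition: scaled arithmetic reasoning + scaled math knowledge
    # + 2 x a scaled "verbal" composite (word knowledge + paragraph comprehension).
    def scale(s: pd.Series) -> pd.Series:
        return (s - s.mean()) / s.std()          # stand-in for the official scaling
    verbal = scale(df["word_raw"] + df["para_raw"])
    return scale(df["arith_raw"]) + scale(df["math_raw"]) + 2 * verbal
```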
Many studies claim to use AFQT as a control without much additional clarification, and it may be the percentile ranking that is used in these cases (as this is the only variable listed as "AFQT" in the public use data). While this variable will be positively correlated with the underlying AFQT score, a percentile ranking seems like a peculiar choice for an "ability measure," given that a percentile will by definition follow a uniform distribution, while ability is generally thought to follow a distribution closer to the normal.

A final function of the ASVAB scores has been used in previous analyses, and I will also use this measure below. As reviewed in Heckman (1995), psychometricians have historically postulated a single ability factor that underlies scores on achievement tests, referring to this factor as g. This single factor is thought of as measuring intelligence, or IQ. Herrnstein and Murray (1994) note the concept of g, but use AFQT scores instead. In Cawley, Conneely, Heckman, and Vytlacil (1997), g is measured as the first principal component of the 10 ASVAB subtests, after each subtest has been standardized. As they note, each subtest has a positive weight in the first principal component, though some subtests do receive a higher weight than others.
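A sketch of a Cawley, Conneely, Heckman, and Vytlacil (1997) style measure of g, computed as the first principal component of the 10 standardized subtests, might look as follows. The column names are hypothetical, and this is an illustration of the general technique rather than a replication of their exact procedure.

```python
# Sketch of a first-principal-component "g" measure from 10 standardized subtests.
# The DataFrame and its column set are hypothetical stand-ins for the ASVAB scores.
import numpy as np
import pandas as pd

def first_principal_component(subtests: pd.DataFrame) -> pd.Series:
    z = (subtests - subtests.mean()) / subtests.std()      # standardize each subtest
    corr = np.cov(z.values, rowvar=False)                  # correlation matrix of the standardized scores
    eigvals, eigvecs = np.linalg.eigh(corr)
    weights = eigvecs[:, -1]                               # eigenvector of the largest eigenvalue
    if weights.sum() < 0:                                  # sign convention: positive loadings
        weights = -weights
    return pd.Series(z.values @ weights, index=subtests.index, name="g")
```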
4. Race and gender differences in test scores

A fundamental part of the argument that unexplained racial differences in economic outcomes are partly due to some unique measure of "ability"—such as the test scores in the ASVAB—is that there is an underlying racial difference in this measure of ability. In this section, I make this comparison among white, black, and Hispanic individuals in the NLSY. The comparison is among residual scores that take out the impact of age and education at the time the test is taken. I find significant evidence of racial differences, with both blacks and Hispanics scoring lower than whites on all test measures from the ASVAB. I also explore gender differences in the test-score measures to see if there is potentially a role in explaining wage differences between men and women. In this case, exploring the subtest breakdown of the ASVAB is important to considering this potential role, as women score better on average on some tests than on others.

The ASVAB tests were taken by the NLSY sample in the summer and fall of 1980. To control for effects of age and education at the time the test is taken, I use the age at the midpoint of 1980 (July 1), and education as of the 1980 interview date. I use the primary race/ethnicity classification provided by the NLSY. This is derived from a screening interview with household members in 1978 in which questions on race and ethnicity were asked. In this designation, "black" refers to individuals who coded their race as black but with an ethnicity response that is non-Hispanic. "Hispanic" individuals were so designated if they coded their ethnicity as (1) Mexican-American; (2) Cuban; (3) Puerto Rican; (4) Latino; (5) Filipino; or (6) Portuguese. An individual was also designated as Hispanic if they reported speaking Spanish at home as a child, or if their family surname is listed as Spanish on a Census list. This is a fairly general definition of Hispanic, but I will make use of it given its common use in previous research with the NLSY. The third racial group is "whites," though it is more accurate to consider it non-black, non-Hispanic (which would include most Asians, for instance). The racial classification is mutually exclusive and exhaustive.

The sample for the test analysis includes all NLSY individuals with test score information, and with a measure of education at the time of the interview in 1980.11 The education variable is the highest grade completed as of May 1 of the survey year. The data are used in an OLS estimation of regressions of the form:

TS_i = b_1 + b_2 RS_i + b_3 AGE_i + b_4 EDUC_i + u_i

where RS is a race/gender indicator that identifies one of six race/gender groups (the reference category is white men throughout), AGE is a function of the age at the midpoint of 1980, and EDUC is a function of the completed years of education at the time of the interview in 1980.

11 As mentioned previously, I exclude individuals in the supplemental samples of poor whites, as well as individuals in the military subsample. Some individuals with missing education in 1980 were still included in the analysis if they had valid reports for education in the 1979 and 1981 interviews, and the education level did not change between those two interviews.

Estimates of a simple specification assuming a linear relationship of the test scores with age and education are presented in Table 1 for both AFQT measures. Specification (1) omits education, so as to provide a baseline for race/gender differences without controls (age is not really a control, as it is essentially orthogonal to race and gender given the construction of the sample). Both black men and black women score lower than white men on average, by an amount roughly equal to three-quarters of a standard deviation in the AFQT. Hispanic men and women score even lower, with their averages more than a standard deviation below white men. By contrast, men and women of the same race have scores that are quite similar, with white women having a positive coefficient estimate (though statistically insignificant) for AFQT1.

Specification (2) of Table 1 controls for education at the time of the test. It shows that part of the racial difference in test scores is related to lower schooling at the time of the test, with this factor more important for blacks than for Hispanics. However, the majority of the difference in raw test scores remains after controlling for education. Specification (2) also provides some suggestion that AFQT2 is lower for women than for men, as all three racial groups have lower expected scores for women (and the difference is statistically significant for whites and blacks). However, the size of these differences is quite small—on the order of one-tenth of a standard deviation for AFQT2. There is less evidence of a gender effect for AFQT1, and as we will see this is a reflection of the different subtests that go into these alternative measures. Interestingly, once education is included, the effect of age changes sign, suggesting that older individuals have lower test scores at any educational level.
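The test-score regressions of this section could be implemented along the following lines. The DataFrame, column names, and grouping variable are hypothetical stand-ins for the NLSY extract, and the dummy-variable option corresponds to the Table 2 style specification discussed next.

```python
# Sketch of the Table 1 / Table 2 type regressions: a test score on race/gender
# group dummies (white men as the reference) plus age and education controls.
# Data and column names are hypothetical.
import statsmodels.formula.api as smf

def test_score_regression(df, score="afqt2", dummies=False):
    # Either linear age/education controls (Table 1) or full sets of dummies (Table 2).
    age_ed = "C(age) + C(educ) + C(yrs_out_of_school)" if dummies else "age + educ"
    formula = f"{score} ~ C(group, Treatment(reference='white_male')) + {age_ed}"
    return smf.ols(formula, data=df).fit(cov_type="HC1")   # heteroskedasticity-robust SEs

# Example usage (with a hypothetical DataFrame named nlsy):
# print(test_score_regression(nlsy, score="afqt1", dummies=True).summary())
```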
The negative age effect conditional on education is an unexpected result, and may reflect that the subtests measure some skills (including test-taking skills) that atrophy fairly quickly. It may also be the case that the linearity assumption for the age/education effects masks a more complicated relationship. The equations estimated for Table 2 incorporate dummy variables for each possible value of age (in years) and education. They also include dummy variables for "potential experience," defined as age minus education minus 6. The first two columns of Table 2 present results from these alternative specifications, using either AFQT1 or AFQT2 as the dependent variable. Rather than present each of the individual age and education dummy variable effects, I estimate an average implied effect of increasing age (or education) by one year (a simple average across all of the estimated dummy variable effects). The "out of school" effects do suggest that test scores fall significantly after the individual has been out of school at least 2 years. However, there is still a negative and statistically significant age effect even after including these controls. These alternative controls for age and education have only minor impacts on the race/gender differences.

Table 2 also presents results using the first principal component—calculated from all 10 subtests—as the dependent variable. These results provide evidence of substantially lower test scores among women than among men, for all three racial groups. The estimate for white women, for example, suggests a lower average test score by one-third of a standard deviation, compared to white men.

The Table 2 results imply that there are apparent differences between men and women in ASVAB subtest performance. To examine this possibility more closely, I estimated similar regressions using each of the individual subtest scores as the dependent variable. These regressions make use of the scaled scores, so each dependent variable has a standard deviation of about 10, facilitating comparison of coefficient estimates across regressions. The independent variables in each regression are the same as in Table 2 (using age, education, and potential experience dummies). Table 3 presents results for the five subtests that play a role in the calculation of either AFQT1 or AFQT2. The results are in some cases unexpected. Black and Hispanic men and women perform worse than white men on average, with the largest difference holding for the word knowledge subtest for Hispanic men and women. The smallest difference between blacks and whites is in the numerical operations score. The coefficient estimates also suggest that women perform better on the paragraph comprehension subtest than men of the same racial group, but perform worse on both mathematical subtests (especially the arithmetic reasoning subtest). Women also appear to do better on the numerical operations subtest than men of the same racial group. As the definition of the AFQT changed so as to use the math knowledge rather than the numerical operations subtest, average scores for women relative to men naturally declined. These differences in subtest performance are obscured when only the aggregate AFQT measures are considered, though they may be important to wage determination.

Estimated regressions for the other five scaled subtest scores in the ASVAB are provided in Table 4. On all of these tests, there is a difference between both minority groups and whites of the same gender.
However, the suggestion of a gender effect is much more obvious in the results for these tests. On all three applied vocational skill subtests (mechanics, electronics, and auto and shop) white women have much lower average scores than white men, and in two cases lower average scores than black men. A similar sizeable gender effect within races is evident for blacks and Hispanics. The coding speed subtest (the other timed subtest) has a very different pattern, with women scoring around 0.4 standard deviations higher for every race. The results from Tables 3 and 4 provide a definite indication that using the AFQT aggregates may not appropriately capture the differences in test performance between racial and gender groups.

Table 1. OLS estimates of AFQT equations
Columns: Mean (standard deviation); AFQT1, specifications (1) and (2); AFQT2, specifications (1) and (2)
White female 0.26 0.50 (0.54) 0.39 (0.47) 1.40 (0.97) 2.91 (0.84)
Black male 0.09 17.61 (0.86) 12.89 (0.77) 30.35 (1.47) 22.40 (1.23)
Black female 0.10 18.68 (0.79) 14.06 (0.65) 34.64 (1.34) 26.84 (1.13)
Hispanic male 0.15 25.50 (0.67) 22.68 (0.59) 45.05 (1.15) 38.41 (1.00)
Hispanic female 0.15 22.28 (0.63) 22.49 (0.54) 43.17 (1.15) 39.98 (0.93)
Age in 1980 18.8 (2.2) 1.16 (0.09) 2.58 (0.11) 1.85 (0.16) 4.43 (0.18)
Highest grade completed in 1980 11.0 (1.9) 7.04 (0.13) 11.87 (0.22)
R2 0.273 0.476 0.265 0.461
Dependent variable means (standard deviation): AFQT1—65.0 (22.4); AFQT2—185.0 (38.4)
Note: All regressions also include a constant. Numbers in parentheses are standard errors robust to the presence of heteroskedasticity. These regressions are estimated with 9119 observations.

Table 2. OLS estimates of AFQT and principal component equations with age and education dummies
Columns: AFQT1; AFQT2; 1st PC; 1st PC
White female 0.59 (0.45) 3.29 (0.81) 0.33 (0.02) 0.34 (0.02)
Black male 12.32 (0.69) 21.57 (1.19) 0.69 (0.03) 0.67 (0.03)
Black female 14.17 (0.63) 27.22 (1.09) 0.97 (0.03) 0.97 (0.03)
Hispanic male 21.69 (0.58) 36.61 (0.98) 1.17 (0.03) 1.13 (0.03)
Hispanic female 22.27 (0.53) 39.54 (0.91) 1.32 (0.02) 1.30 (0.02)
Out of school 1 year 0.26 (0.57) 1.73 (0.99) 0.01 (0.02)
Out of school 2 years 7.51 (0.87) 15.71 (1.41) 0.34 (0.04)
Out of school 3 years 9.45 (1.17) 18.39 (1.86) 0.40 (0.05)
Out of school 4 or more years 8.06 (1.74) 16.67 (2.74) 0.34 (0.07)
Average age effect 1.79 (0.43) 2.31 (0.69) 0.10 (0.01) 0.06 (0.02)
Average education effect 9.25 (0.48) 13.75 (0.73) 0.30 (0.01) 0.40 (0.02)
R2 0.505 0.495 0.513 0.55
Dependent variable means (standard deviation): AFQT1—65.0 (22.4); AFQT2—185.0 (38.4); 1st principal component—0.0 (1.0)
Note: All regressions also include a constant. Eight age dummies and nine education dummies are also included (in all but the third column); the average age (and education) effect is the implied average increase in the dependent variable associated with a one-unit increase in the control. The third column includes age and education measured as continuous variables. Numbers in parentheses are robust standard errors.
Table 3. OLS estimates of equations for component test scores of the AFQT
Columns: Word knowledge; Paragraph comp.; Arithmetic reasoning; Math knowledge; Numerical operations
White female 0.18 (0.22) 1.71 (0.24) 2.91 (0.24) 1.20 (0.25) 2.08 (0.24)
Black male 5.54 (0.35) 5.24 (0.38) 5.71 (0.32) 4.58 (0.33) 3.83 (0.36)
Black female 6.66 (0.33) 4.42 (0.35) 8.56 (0.29) 6.28 (0.31) 2.16 (0.36)
Hispanic male 10.25 (0.31) 8.66 (0.33) 9.31 (0.26) 6.97 (0.26) 7.46 (0.32)
Hispanic female 10.59 (0.28) 7.67 (0.30) 11.45 (0.24) 7.92 (0.25) 5.75 (0.31)
Average age effect 0.95 (0.22) 0.95 (0.24) 0.16 (0.18) 0.17 (0.17) 1.02 (0.25)
Average education effect 4.43 (0.25) 4.26 (0.27) 2.53 (0.20) 2.16 (0.17) 4.19 (0.29)
R2 0.46 0.40 0.41 0.37 0.34
Dependent variable means (standard deviation): Word knowledge—45.6 (11.1); Paragraph comprehension—46.2 (11.2); Arithmetic reasoning—46.4 (9.8); Math knowledge—47.3 (9.7); Numerical operations—47.0 (10.8)
Note: All regressions include a constant, eight age dummies, nine education dummies, and four dummies for years since last enrolled. Numbers in parentheses are standard errors robust to the presence of heteroskedasticity.

12 Both Neal and Johnson (1996) and Rodgers and Spriggs (1996) explore these connections.

Earlier research has suggested that much of the racial difference in test-score outcomes may be attributable to differences in family background, as cognitive development may be affected by resources available during childhood.12 Tables 5 and 6 add a fairly standard set of family background controls to the specifications in Tables 3 and 4. Most of the family-background variables are obvious, but a few deserve discussion. First, being raised in a mother-only family is measured based on family structure when the respondent was age 14 (this is the only such information available in the NLSY). The NLSY provides indicators of whether or not magazines or newspapers were present in the home, and whether or not a library card was available in the home. The presence of other siblings may affect the availability of resources within the household, so the number of siblings, the number of older siblings, and whether or not the respondent was the first born are included. Father's information (education or professional occupational status) was missing for many respondents, so rather than exclude these observations, the values of these variables were coded to zero and a dummy variable indicating that the information was missing was included.

The family-background variables do appear to account for a substantial amount of the black/white differences in many of the test scores. This is especially true of the two verbal scores, and somewhat less true of the applied vocational subtests in Table 6 (though it is still the case that much of the black/white difference among men is explained). Family-background variables appear to explain much less of the Hispanic/white differences. Not surprisingly, very little of the male/female differences are explained by family background, as the average family-background measures do not tend to vary between men and women of the same race.

It is also of interest which of the family-background variables appear to be most correlated with good test performance. Family structure (living in a mother-only situation) does not have much of an impact on test-score outcomes. A US-born individual does tend to have slightly higher test scores, though mainly in the science and applied knowledge subtests. In general, having a mother with a professional occupation increases most subtest scores, but having a father with a professional occupation tends to increase subtest scores by more than the mother's professional status.
More educated parents are also beneficial, though here there is some suggestion that mother's educational level has a larger impact than father's.13 The presence of magazines, newspapers, and a library card also all tend to increase test scores. Individuals from larger families tend to have lower test scores, while there is also an evident first-born effect leading to higher test scores on most subtests.

13 The professional occupation variables may be capturing the correlation of family income with test scores, as family income is not included as a control. The father's occupation is then likely more important to family income than the mother's occupation. Parents' education may also be capturing these income effects. That the mother's education appears relatively more important (than the father's) may reflect that a higher educational level for the mother may be more important in passing on human capital to the child, if it is more common that the mother spends more time with the child than the father.

Table 4. OLS estimates of equations for additional test scores in the ASVAB
Columns: General science; Mechanical comp.; Electronics information; Auto and shop info.; Coding speed
White female 4.12 (0.23) 8.08 (0.24) 7.62 (0.23) 10.92 (0.21) 4.31 (0.24)
Black male 7.04 (0.36) 6.61 (0.36) 7.34 (0.36) 7.10 (0.36) 2.54 (0.33)
Black female 10.25 (0.32) 12.94 (0.28) 13.53 (0.30) 15.41 (0.26) 0.41 (0.34)
Hispanic male 10.75 (0.29) 11.51 (0.27) 10.86 (0.29) 12.51 (0.26) 7.11 (0.28)
Hispanic female 12.90 (0.26) 15.24 (0.23) 15.52 (0.25) 17.65 (0.22) 3.83 (0.29)
Average age effect 0.65 (0.21) 0.21 (0.19) 0.03 (0.20) 0.08 (0.18) 0.93 (0.22)
Average education effect 3.99 (0.25) 2.63 (0.23) 2.80 (0.23) 2.79 (0.23) 4.27 (0.25)
R2 0.44 0.42 0.44 0.51 0.35
Dependent variable means (standard deviation): General science—45.9 (10.6); Mechanical comprehension—46.4 (9.8); Electronics information—45.7 (10.2); Auto and shop information—45.9 (9.8); Coding speed—47.0 (10.4)
Note: All regressions include a constant, eight age dummies, nine education dummies, and four dummies for years since last enrolled. Numbers in parentheses are standard errors robust to the presence of heteroskedasticity.

The results in Tables 3–6 suggest that there are some differences in the underlying characteristics reflected in the various subtest scores. Many previous researchers—most notably, Herrnstein and Murray (1994)—argue that the subtest scores basically reflect a single intelligence factor, and that the high degree of correlation among the subtests is a reflection of this.14 The upper-diagonal part of Table 7 presents the simple correlation coefficients between the 10 subtests, and in many cases these are quite high (especially between math knowledge and arithmetic reasoning, and between word knowledge and paragraph comprehension, where it is 0.82). However, other subtest-score pairs have lower correlations, and the suggestion seems to be (at the least) that the two timed subtests are not as highly correlated with the other eight subtests.

14 The two AFQT measures are quite highly correlated (correlation coefficient of 0.97), which is not surprising given the overlap in their construction. Perhaps more surprising is that each of the AFQT measures has a very high correlation with a sum of the 10 scaled scores (0.96). This would seem to be consistent with the idea that there is a single intelligence factor, but the aggregation of the test scores masks the fact that the individual correlations between scores are not nearly so high.

In the next section, I examine the ability of the separate subtest scores to explain wage variation, and to account for gender/race differences in wages. Given the high simple correlation coefficients in Table 7, the reader may wonder whether collinearity problems will make any such estimation highly imprecise. In fact, the intercorrelation between the subtest scores does not appear to be in a problem range.
First, the subtest-score indicators I include will be residual-based, taking out the impacts of age and education. Simple correlations between these residual scores are presented in the lower diagonal of Table 7, and these are in general much lower. In any case, it is the multiple R2 between the independent variables that is relevant to the imprecision of the coefficient estimates. These are presented at the bottom of Table 7, and are generally in the range 0.50–0.70—much less than the "troublesome" range of above 0.90 where the problem becomes quite important.15 In any case, it is the coefficient estimates and their standard errors in the actual wage equations that will allow an assessment of whether or not it is possible to pinpoint important effects from the individual subtest scores.

15 The R2's come from a regression of each residual subtest score on the other nine residual subtest scores.

Table 5. OLS estimates of equations for component test scores of the AFQT with family background controls
Columns: Mean (standard deviation); Word know.; Paragraph comp.; Arith. reason.; Math know.; Numer. oper.
White female 0.26 0.20 (0.22) 1.95 (0.24) 2.74 (0.25) 0.97 (0.25) 2.31 (0.25)
Black male 0.09 1.22 (0.47) 1.19 (0.50) 3.40 (0.45) 2.18 (0.45) 1.89 (0.50)
Black female 0.10 2.54 (0.47) 0.28 (0.49) 6.09 (0.43) 3.73 (0.44) 0.04 (0.49)
Hispanic male 0.14 7.70 (0.34) 6.54 (0.37) 7.81 (0.30) 5.04 (0.30) 6.00 (0.36)
Hispanic female 0.15 7.69 (0.33) 5.09 (0.35) 9.77 (0.29) 5.68 (0.29) 3.93 (0.37)
US born 0.93 0.28 (0.42) 1.30 (0.46) 0.20 (0.38) 0.59 (0.39) 0.88 (0.47)
Foreign language at home 0.23 0.42 (0.36) 0.41 (0.37) 0.18 (0.35) 0.38 (0.36) 0.30 (0.38)
Mother-only family 0.17 0.07 (0.28) 0.03 (0.30) 0.13 (0.24) 0.25 (0.25) 0.01 (0.31)
Mother in professional occ. 0.08 0.81 (0.30) 0.45 (0.33) 0.35 (0.32) 0.59 (0.34) 0.13 (0.36)
Father in professional occ. 0.18 0.81 (0.24) 0.50 (0.26) 1.57 (0.27) 2.27 (0.28) 0.51 (0.27)
Magazines in home 0.57 1.93 (0.22) 1.77 (0.24) 1.42 (0.20) 1.52 (0.20) 1.26 (0.24)
Newspapers in home 0.76 1.17 (0.25) 1.41 (0.28) 0.83 (0.21) 0.30 (0.22) 1.20 (0.28)
Library card at home 0.71 0.91 (0.21) 0.71 (0.24) 0.70 (0.19) 0.68 (0.20) 0.63 (0.24)
Mother's education 10.81 (3.27) 0.38 (0.04) 0.33 (0.05) 0.23 (0.04) 0.28 (0.04) 0.24 (0.05)
Father's education 9.64 (5.17) 0.24 (0.03) 0.23 (0.04) 0.19 (0.03) 0.25 (0.03) 0.18 (0.04)
Number of siblings 3.96 (2.61) 0.39 (0.06) 0.34 (0.06) 0.01 (0.05) 0.04 (0.05) 0.08 (0.07)
Number of older siblings 2.22 (2.19) 0.24 (0.07) 0.21 (0.08) 0.06 (0.06) 0.02 (0.06) 0.10 (0.08)
First born 0.22 0.85 (0.25) 0.39 (0.28) 0.58 (0.24) 0.66 (0.25) 0.44 (0.28)
R2 0.520 0.444 0.446 0.425 0.355
Note: All regressions include a constant, eight age dummies, nine education dummies, four dummies for years since last enrolled, dummies for one-parent (non-mother-only) family and for "other" type family, and dummies for missing information on father's occupational or educational level. Numbers in parentheses are standard errors robust to the presence of heteroskedasticity. The sample size is 7709.

Table 7. Correlations between test scores and test score residuals
Columns: Word; Para.; Arith.; Math; Numer.; Scien.; Mech.; Elec.; Auto; Coding
Word 0.82 0.73 0.70 0.65 0.82 0.64 0.72 0.61 0.60
Para. 0.75 0.70 0.68 0.65 0.73 0.58 0.63 0.51 0.61
Arith. 0.65 0.61 0.82 0.63 0.74 0.70 0.68 0.60 0.55
Math 0.61 0.58 0.77 0.62 0.71 0.62 0.61 0.49 0.54
Numer. 0.53 0.53 0.53 0.51 0.56 0.45 0.47 0.39 0.72
Scien. 0.76 0.64 0.66 0.62 0.44 0.71 0.77 0.68 0.51
Mech. 0.57 0.49 0.65 0.56 0.35 0.66 0.74 0.75 0.41
Elec. 0.65 0.55 0.62 0.55 0.35 0.72 0.71 0.75 0.41
Auto 0.55 0.44 0.55 0.43 0.29 0.64 0.72 0.72 0.33
Coding 0.47 0.49 0.42 0.42 0.64 0.37 0.29 0.28 0.22
Multiple R2 0.72 0.62 0.70 0.64 0.52 0.71 0.65 0.67 0.62 0.45
Note: Numbers above the diagonal are correlation coefficients between the raw scores. Numbers below the diagonal (in italics) are correlations of residuals, from estimations of the specifications in Tables 3 and 4 excluding the race/gender variables. The multiple R2 is from a regression of each residual test score on the other nine residual test scores.
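The residual-based correlations and the multiple R2 values reported in Table 7 could be produced with a sketch like the one below. Column names and the exact control set are hypothetical; the point is simply to show the residualize-then-regress logic.

```python
# Sketch of the Table 7 diagnostics: residualize each subtest on age/education
# controls, then compute each residual's multiple R-squared on the other nine.
# DataFrame and column names are hypothetical stand-ins for the NLSY extract.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

SUBTESTS = ["word", "para", "arith", "math", "numer",
            "scien", "mech", "elec", "auto", "coding"]

def residualize(df: pd.DataFrame) -> pd.DataFrame:
    controls = "C(age) + C(educ) + C(yrs_out_of_school)"
    return pd.DataFrame({s: smf.ols(f"{s} ~ {controls}", data=df).fit().resid
                         for s in SUBTESTS})

def multiple_r2(resid: pd.DataFrame) -> pd.Series:
    out = {}
    for s in SUBTESTS:
        others = sm.add_constant(resid.drop(columns=s))
        out[s] = sm.OLS(resid[s], others).fit().rsquared
    return pd.Series(out)

# Example usage (with a hypothetical DataFrame named nlsy):
# resid = residualize(nlsy); print(resid.corr()); print(multiple_r2(resid))
```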
5. Race and gender differentials in log wages

5.1. Wage models

The previous section suggested that there are both race and gender differentials in test-score performance. In this section, I consider whether or not these differences can contribute to explaining wage differentials between the various race/gender groups. The approach is to begin with a standard log-wage function with personal and labor market characteristics (RS is the race/gender dummy vector, X is the vector of other characteristics) as independent variables (estimated jointly for all race/gender groups):

log(w_it) = a_1 + a_2'RS_it + a_3 X_it + e_it

with the coefficient vector for the race/gender dummies (a_2) interpreted as "unexplained" differences in the usual model. I then add the test-score variable(s) to this specification, and examine how much of the "usual" unexplained difference might be attributable to the skills reflected in the test scores. This also allows me to test whether or not the race/gender differences in log wages are zero once the full set of controls is included.

A number of caveats should be discussed concerning this estimation approach. While this approach is often seen in the literature attempting to measure the degree of wage discrimination in the labor market, I am not trying to argue that any "unexplained" differences are necessarily due to discrimination. To the extent that there are productivity factors that lie behind the remaining differences, this approach will tend to overstate the size of any labor market discrimination. On the other hand, differences in wage determinants across groups may themselves be the result of discrimination. For example, blacks may have less labor market experience on average because of hiring discrimination. This lower level of experience may explain part of the average difference in log wages, but my approach would not treat that as part of an unexplained discriminatory difference. What I am attempting to ascertain is whether or not the current assignment of wages to workers is associated with their race or gender, or whether it is the case that similarly situated workers (in terms of characteristics and working situation) tend to earn the same no matter their race and gender.
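The comparison described above, estimating the race/gender coefficients with and without a test-score control, can be sketched as follows. All data and column names are hypothetical, and pooled OLS with person-clustered standard errors is used purely for illustration; the estimates reported later in the paper use a random-effects estimator.

```python
# Sketch of the with/without test-score comparison for the race/gender dummies.
# The DataFrame, column names, and control set are hypothetical.
import statsmodels.formula.api as smf

def race_gender_gaps(df, test_score=None):
    """Return the race/gender dummy coefficients from a pooled log-wage equation,
    optionally adding a (residualized, standardized) test score as a control."""
    controls = ("educ + exper + I(exper**2) + tenure + C(region) + unemp_rate "
                "+ union + full_time + C(year)")
    rhs = f"C(group, Treatment(reference='white_male')) + {controls}"
    if test_score is not None:
        rhs += f" + {test_score}"
    fit = smf.ols(f"log_wage ~ {rhs}", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["person_id"]})
    return fit.params.filter(like="C(group")

# The change in each group coefficient between race_gender_gaps(df) and
# race_gender_gaps(df, "afqt_resid") is the part of the "unexplained" gap
# attributable to the skills reflected in that test score.
```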
The empirical model is related to the familiar Oaxaca decomposition of wage differences. The Oaxaca decomposition entails estimating separate log-wage equations for each group, while a single-equation approach constrains coefficients to be the same across groups. One argument for this latter restriction was suggested by Neumark (1988), in that a pooled model may better reflect how characteristics would affect wages in the absence of discrimination.16 Nonetheless, I also consider how application of the Oaxaca decomposition affects conclusions.

16 The exact Neumark decomposition could be obtained by estimating the log-wage equation without the race/gender dummies, and then regressing the residuals on the race/gender dummies to obtain the unexplained component. This approach uses across-group variation in the independent variables to help in estimating a_3, while my approach uses only within-group variation in estimating these coefficients.

There are also issues related to the appropriate choice of independent variables to include in the model. As noted by Altonji and Blank (1999), many economists object to the inclusion of industry and occupation controls when accounting for gender differences, so I present some results showing how important this consideration might be. Otherwise, I primarily use a fairly extensive set of controls, including schooling, experience, tenure, region, local labor market conditions, union status, and full-time status.17 Many of these variables have raised concerns about endogeneity in previous research, most notably schooling. While there are exceptions, my own sense is that most results do not suggest a sizeable bias for the OLS schooling coefficient in either direction, so that use of this coefficient estimate to account for race/gender differences is reasonable. Also, results in Blackburn and Neumark (1995) do not support an endogeneity bias in schooling or experience once test scores are included (although this result does depend on the validity of family-background variables as instruments). Neal and Johnson (1996) have argued for a parsimonious specification to measure "premarket discrimination", and I will also explore the use of this approach.

17 I exclude marital status, as it seems to clearly play a different role for men and women, making it difficult to see what role it could play in accounting for wage differences between men and women.

Table 6. OLS estimates of equations for additional test scores in the ASVAB with family background controls
Columns: General science; Mechanical comp.; Electronics information; Auto and shop info.; Coding speed
White female 3.81 (0.24) 7.74 (0.25) 7.28 (0.24) 10.68 (0.22) 4.56 (0.26)
Black male 3.19 (0.48) 3.61 (0.48) 4.10 (0.48) 4.26 (0.48) 0.17 (0.47)
Black female 6.47 (0.46) 10.12 (0.42) 10.49 (0.44) 12.58 (0.41) 2.77 (0.49)
Hispanic male 8.54 (0.32) 10.00 (0.31) 8.97 (0.33) 11.21 (0.30) 5.79 (0.32)
Hispanic female 10.23 (0.31) 13.57 (0.28) 13.35 (0.29) 16.25 (0.26) 2.26 (0.34)
US born 1.50 (0.42) 0.95 (0.40) 0.86 (0.40) 2.47 (0.39) 1.88 (0.45)
Foreign language at home 0.17 (0.37) 0.37 (0.35) 0.32 (0.35) 0.42 (0.34) 0.06 (0.38)
Mother-only family 0.57 (0.26) 0.00 (0.25) 0.37 (0.26) 0.41 (0.23) 0.27 (0.29)
Mother in professional occ. 1.02 (0.33) 0.08 (0.32) 0.26 (0.32) 0.12 (0.30) 0.17 (0.36)
Father in professional occ. 1.06 (0.26) 1.17 (0.27) 0.62 (0.26) 0.20 (0.24) 0.05 (0.28)
Magazines in home 1.60 (0.21) 1.15 (0.19) 1.22 (0.20) 0.78 (0.18) 1.04 (0.22)
Newspapers in home 0.59 (0.23) 0.54 (0.21) 0.66 (0.23) 0.57 (0.20) 0.97 (0.25)
Library card at home 0.53 (0.20) 0.21 (0.20) 0.75 (0.20) 0.22 (0.18) 0.87 (0.22)
Mother's education 0.35 (0.04) 0.26 (0.04) 0.27 (0.04) 0.22 (0.04) 0.22 (0.04)
Father's education 0.22 (0.03) 0.14 (0.03) 0.14 (0.03) 0.06 (0.03) 0.10 (0.04)
Number of siblings 0.30 (0.05) 0.21 (0.05) 0.30 (0.05) 0.27 (0.05) 0.12 (0.06)
Number of older siblings 0.19 (0.07) 0.19 (0.06) 0.18 (0.07) 0.16 (0.06) 0.02 (0.08)
First born 1.27 (0.25) 0.17 (0.24) 1.10 (0.25) 0.36 (0.23) 0.17 (0.27)
R2 0.494 0.454 0.474 0.532 0.373
Note: All regressions include a constant, eight age dummies, nine education dummies, four dummies for years since last enrolled, dummies for one-parent (non-mother-only) family and for "other" type family, and dummies for missing information on father's occupational or educational level. Numbers in parentheses are standard errors robust to the presence of heteroskedasticity. The sample size is 7709.

5.2. Wage-related data

The NLSY is a longitudinal survey, and so wage data are potentially available for each individual in the cohort sample at each interview.
Beginning with the initial interview in 1979, respondents were asked various questions about their current labor market status, and similar questions were asked at each subsequent re-interview. Re-interviews were attempted on an annual basis until 1994, but only occurred every other year after that (I have data through the 2000 round). I attempt to make use of as many wage observations as possible, with some restrictions. In particular, I follow current practice in excluding individuals enrolled in school at the time of the interview. I also exclude wage observations if the individual is currently employed as a farmer or farm manager, or as a private household service worker, or if the current industry is "agriculture, forestry, and fisheries," or if the individual is self-employed. Observations with missing values for any of the independent variables are also excluded. The NLSY did drop some respondents from the panel in 1990, but these were only in the military and poor non-black, non-Hispanic subsamples, neither of which I use in my analysis.

The dependent variable is based on a per-hour wage measure derived from reports of pay on the current or most recent job (the job for which the most hours are currently worked, for multiple-job holders).18 The data also contain information on the number of weeks worked between interviews (as well as weeks worked in the 4 years prior to the initial 1979 interview), from which an annual experience measure is calculated. Tenure with the current employer is also available, as well as information on the unemployment rate in the local labor market. The latter variable is provided only in coded form, so I use the midpoints of the intervals to impute a rough estimate of the rate.

18 There are a few unreasonably low and high wage rates in the NLSY data. Following standard practice, I omit a wage observation if the reported wage is less than $1 or greater than $100.
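A sketch of the sample restrictions and variable construction just described is given below. The column names, indicator coding, and unemployment-rate interval codes are hypothetical stand-ins for the actual NLSY extract.

```python
# Sketch of the wage-sample restrictions and the unemployment-rate imputation
# described above. Column names and interval codes are hypothetical; indicator
# columns are assumed to be booleans.
import numpy as np
import pandas as pd

def build_wage_sample(df: pd.DataFrame) -> pd.DataFrame:
    keep = (
        (~df["enrolled"])                       # drop respondents enrolled in school
        & (~df["self_employed"])
        & (~df["occ"].isin(["farmer", "farm_manager", "private_household"]))
        & (df["industry"] != "agriculture_forestry_fisheries")
        & df["hourly_wage"].between(1, 100)     # trim wages below $1 or above $100
    )
    out = df.loc[keep].copy()
    # The local unemployment rate is reported only in coded intervals; use midpoints.
    midpoints = {1: 1.5, 2: 4.0, 3: 7.0, 4: 10.0, 5: 13.5}   # hypothetical codes
    out["unemp_rate"] = out["unemp_code"].map(midpoints)
    out["log_wage"] = np.log(out["hourly_wage"])
    return out.dropna(subset=["log_wage", "unemp_rate"])
```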
Table 8 presents sample means and standard deviations for many of the independent variables, for the full wage-data sample to be used later. Results are presented for each race/gender group. Whites have higher average education and experience than blacks and Hispanics, with the difference in education largest for Hispanics and the difference in experience largest for blacks. On the other hand, unionization rates are higher for both minority groups compared to whites. There is an obvious difference in geographical distribution, with more blacks in the South and more Hispanics in the West. Both minority groups are also more likely to live in an SMSA than whites. As with the gender effects in the test scores, differences between the groups are present, but it is not clear that they will on net contribute to explaining group differences in log wages.

Table 8. Means (standard deviations) of independent variables for the wage analysis, by gender/race group
Variable               White men    White women   Black men    Black women   Hispanic men  Hispanic women
Years of education     12.8 (2.3)   13.1 (2.1)    12.2 (1.9)   12.9 (1.8)    11.8 (2.3)    12.2 (2.3)
Experience (in years)  8.4 (4.8)    7.6 (4.6)     7.2 (4.6)    6.7 (4.5)     7.9 (4.8)     7.0 (4.6)
Tenure (in years)      3.1 (3.5)    2.8 (3.2)     2.4 (2.9)    2.7 (3.2)     2.9 (3.3)     2.7 (3.1)
Age                    27.2 (5.2)   27.2 (5.4)    27.4 (5.2)   28.0 (5.3)    27.0 (5.3)    27.3 (5.4)
Northeast              0.18         0.18          0.16         0.14          0.16          0.13
North Central          0.35         0.33          0.17         0.16          0.07          0.09
West                   0.17         0.17          0.08         0.06          0.49          0.46
Unionized              0.19         0.13          0.27         0.22          0.24          0.17
Unem. rate             7.4 (3.2)    7.4 (3.2)     6.7 (2.7)    6.7 (2.8)     7.8 (3.4)     8.0 (3.5)
SMSA                   0.75         0.76          0.81         0.83          0.89          0.89
Full-time status       0.93         0.77          0.90         0.81          0.92          0.80
Sample size            19,719       19,420        11,576       10,922        7459          6528
Note: The pooled sample (with multiple observations per individual) is used for the calculations.

As there are a number of wage observations for most individuals in the sample, the implied correlation in the regression equation errors makes the usual OLS approach to estimation inappropriate. I assume a nonvarying, person-specific component to the error term, and take this into account in the estimation by using a standard one-way random effects estimator. Fixed-effects estimation is not used because many of the coefficients of interest are for variables that do not vary over time (for example, race/gender group). Year effects are handled by including dummies for the year of observation (this also obviates the need to make inflation corrections). The standard error formulas are consistent in the presence of heteroskedasticity of unspecified form.
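A minimal sketch of this estimation approach is given below: a one-way random-effects estimator implemented by quasi-demeaning, with a heteroskedasticity-robust (sandwich) covariance computed on the transformed data. The variance-component formulas follow a textbook Swamy-Arora-style construction and will not match any particular canned routine exactly; the data at the bottom are synthetic, and the code is meant only to make the mechanics concrete.

```python
import numpy as np

def random_effects_fit(y, X, ids):
    """One-way random-effects estimate via quasi-demeaning (illustrative sketch)."""
    y, X, ids = np.asarray(y, float), np.asarray(X, float), np.asarray(ids)
    groups = np.unique(ids)
    n, k = X.shape

    # Within (fixed-effects) step: variance of the idiosyncratic error.
    Xw, yw = X.copy(), y.copy()
    for g in groups:
        m = ids == g
        Xw[m] -= X[m].mean(axis=0)
        yw[m] -= y[m].mean()
    bw, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    sigma_e2 = ((yw - Xw @ bw) ** 2).sum() / (n - len(groups) - k)

    # Between step: variance of the person-specific component from group means.
    Xb = np.vstack([X[ids == g].mean(axis=0) for g in groups])
    yb = np.array([y[ids == g].mean() for g in groups])
    bb, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
    Tbar = n / len(groups)
    sigma_a2 = max(((yb - Xb @ bb) ** 2).sum() / (len(groups) - k)
                   - sigma_e2 / Tbar, 0.0)

    # GLS step: quasi-demean each person's data by theta_i and run OLS.
    Xs, ys = X.copy(), y.copy()
    for g in groups:
        m = ids == g
        theta = 1.0 - np.sqrt(sigma_e2 / (m.sum() * sigma_a2 + sigma_e2))
        Xs[m] -= theta * X[m].mean(axis=0)
        ys[m] -= theta * y[m].mean()
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

    # Heteroskedasticity-robust (sandwich) covariance on the transformed data.
    u = ys - Xs @ beta
    bread = np.linalg.inv(Xs.T @ Xs)
    V = bread @ (Xs * u[:, None] ** 2).T @ Xs @ bread
    return beta, np.sqrt(np.diag(V))

# Synthetic usage: 50 people observed 4 times each, a constant and two regressors.
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(50), 4)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=50)[ids] + rng.normal(size=200)
beta, se = random_effects_fit(y, X, ids)
print(np.round(beta, 3), np.round(se, 3))
```

Year dummies and the time-invariant race/gender dummies would simply enter as additional columns of X; a fixed-effects transformation would sweep the latter out, which is why the random-effects form is the relevant one here.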
5.3. Basic wage equation estimates

The initial log-wage equation simply includes the race/gender dummies (along with the year dummies) to estimate the raw differences in average log wages between the various groups. These results are reported in specification (1) of Table 9. White women have an average log wage that is roughly 0.23 log points lower than that for white men.[19] A slightly larger difference exists between black men and white men (0.25 log points), and a much smaller one between Hispanic and white men (0.12). There is an additional negative impact associated with being female for black and Hispanic women, though this additional impact (that is, compared to men of the same race) is slightly smaller than the female/male difference among whites.
[19] In Blackburn (2002) I argue that heteroskedasticity in log-wage equations can lead to a bias in interpreting regression coefficients as percentage changes, so I use the log-point interpretation instead. Lagrange multiplier tests easily reject homoskedasticity in every log-wage equation estimated in this paper. However, the error variance does not appear to be strongly correlated with the primary variables of interest in this paper (the race/gender dummies, and the test scores), so the percentage change interpretation is likely to be roughly accurate in this case.

The second specification in Table 9 estimates a more standard log-wage equation that includes the controls discussed earlier, but excludes industry and occupation. In this specification, about half of the original black/white and Hispanic/white log wage differences are accounted for, but only a small portion (0.03–0.04 log points) of the gender difference for any race is explained. The estimated coefficients for the other independent variables are pretty much as expected, with the biggest surprise being the U-shaped relationship between log wages and age.[20] The third specification adds 10 industry and eight occupation dummies, accounting for a small proportion of the female/male difference (for all three races), but almost none of the racial differences among men. The inclusion of these controls also reduces slightly the estimated coefficient for years of education.
[20] The estimated profile peaks around age 27 (the mean age in the sample).

The final two specifications in Table 9 incorporate one of the two AFQT measures as an independent variable. In both cases, the AFQT score is a residual from a regression of the AFQT scores on age, education, and potential experience dummies (as discussed in Section 4), with the residual standardized to a standard deviation of one (in the general sample of test-takers).
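This adjustment of the test scores can be sketched as follows: regress the raw score on full sets of age, education, and potential-experience dummies (as measured in 1980), keep the residual, and rescale it to unit standard deviation. The data and variable names below are synthetic placeholders, not NLSY variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({                        # synthetic stand-in for the test-taker sample
    "afqt1": rng.normal(50, 10, n),
    "age_1980": rng.integers(15, 24, n),
    "educ_1980": rng.integers(8, 17, n),
    "potexp_1980": rng.integers(0, 6, n),
})

# Residualize on dummy sets for age, education, and potential experience in 1980,
# then standardize the residual to have a standard deviation of one.
fit = smf.ols("afqt1 ~ C(age_1980) + C(educ_1980) + C(potexp_1980)", data=df).fit()
df["afqt1_std"] = fit.resid / fit.resid.std()
print(df["afqt1_std"].std().round(3))
```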
Inclusion of AFQT1 reduces the black male coefficient estimate substantially (such that, in total, 80% of the original difference with white men is explained). The Hispanic male coefficient estimate is also reduced, such that it is now only marginally statistically significant. While the two minority female coefficient estimates also decline as AFQT1 is added, this is by a magnitude similar to the decline in the male coefficient estimates, suggesting the test scores lower the race impact for both genders by a similar magnitude. The white female coefficient estimate is essentially unaffected by the inclusion of AFQT1. Using AFQT2 instead of AFQT1 leads to a somewhat greater fall in the white female coefficient estimate, but still the suggestion is that the test-score differences explain little of the gender effect on wages. Both test-score measures have sizeable coefficient estimates, suggesting a one standard deviation increase in test performance is associated with a log-wage increase of about 6–7 log points. The estimated coefficient for AFQT1 is actually larger than for AFQT2, which is somewhat surprising if the revision of the measure was intended to provide a better measure of skills.[21] The inclusion of the test score has only a very small impact on the estimated education coefficient, reducing it by less than 10%, a much smaller reduction than generally found from incorporating test-score measures.[22]
[21] In fact, if both AFQT scores are entered in the same regression (in results not reported here), the AFQT1 coefficient estimate is largely unaffected while the AFQT2 one becomes negative and statistically insignificant.
[22] See Blackburn and Neumark (1995), for example. Most previous research has focused on men, or just white men, but a small reduction is also evident if either of these sample restrictions is imposed here. The size of the reduction continues to be small if the industry and occupation dummies are omitted.

Table 9. Random effects estimates of log-wage equations, with and without AFQT scores
Independent variable               (1)             (2)             (3)             (4)             (5)
White female                       0.228 (0.011)   0.190 (0.008)   0.152 (0.008)   0.150 (0.008)   0.145 (0.008)
Black male                         0.254 (0.013)   0.133 (0.010)   0.116 (0.009)   0.048 (0.009)   0.054 (0.010)
Black female                       0.412 (0.013)   0.264 (0.010)   0.216 (0.009)   0.144 (0.010)   0.145 (0.010)
Hispanic male                      0.120 (0.015)   0.059 (0.011)   0.053 (0.011)   0.019 (0.010)   0.021 (0.011)
Hispanic female                    0.309 (0.015)   0.185 (0.011)   0.147 (0.011)   0.107 (0.011)   0.104 (0.011)
AFQT1                              –               –               –               0.065 (0.003)   –
AFQT2                              –               –               –               –               0.061 (0.003)
Education                          –               0.062 (0.001)   0.056 (0.001)   0.053 (0.001)   0.052 (0.001)
Experience                         –               0.043 (0.002)   0.040 (0.002)   0.038 (0.002)   0.039 (0.002)
Experience squared (in hundreds)   –               0.010 (0.007)   0.009 (0.007)   0.006 (0.006)   0.007 (0.006)
Tenure                             –               0.044 (0.001)   0.043 (0.001)   0.043 (0.001)   0.043 (0.001)
Tenure squared (in hundreds)       –               0.237 (0.008)   0.230 (0.008)   0.230 (0.008)   0.230 (0.008)
Age                                –               0.064 (0.004)   0.055 (0.004)   0.056 (0.004)   0.055 (0.004)
Age squared (in hundreds)          –               0.116 (0.006)   0.100 (0.006)   0.100 (0.006)   0.100 (0.006)
Northeast                          –               0.097 (0.007)   0.103 (0.007)   0.098 (0.007)   0.097 (0.007)
North Central                      –               0.003 (0.006)   0.009 (0.006)   0.003 (0.006)   0.004 (0.006)
West                               –               0.096 (0.007)   0.103 (0.007)   0.099 (0.007)   0.101 (0.006)
Unionized                          –               0.130 (0.003)   0.118 (0.003)   0.120 (0.003)   0.120 (0.003)
Unemployment rate                  –               0.005 (0.001)   0.005 (0.001)   0.005 (0.001)   0.005 (0.001)
SMSA                               –               0.056 (0.005)   0.059 (0.005)   0.060 (0.005)   0.060 (0.005)
Full-time status                   –               0.041 (0.004)   0.009 (0.004)   0.010 (0.004)   0.010 (0.004)
Industry and occupation controls   No              No              Yes             Yes             Yes
R2                                 0.340           0.542           0.580           0.588           0.587
Notes: All regressions also include 18 year dummies and a constant. Numbers in parentheses are standard errors robust to the presence of heteroskedasticity. Industry and occupation controls include 10 industry dummies and eight occupation dummies. The sample consists of 8796 individuals and 75,624 wage observations.

As mentioned above, the AFQT does not make full use of the information on test-score performance in the NLSY, as it does not incorporate many of the subtests in the ASVAB.
One measure that does use all of the subtests in a single performance measure is the first principal component of the scores (as suggested by Cawley et al., 1997).[23] As with the AFQT measures, I form the first principal component and then regress it on age, education, and potential experience dummies in 1980 in order to obtain a standardized residual. The first specification of Table 10 reports results if this measure is used in place of AFQT. The coefficient estimate for the first principal component is higher than for either of the two AFQT scores. More importantly, it accounts for more of the racial differences than the AFQT measures, and also has a substantial impact on the estimated gender effects.
[23] The first principal component is the weighted sum of subtest scores that varies the most across individuals (under the normalization that the squared weights sum to one).

The second specification of Table 10 adds the second principal component as an additional regressor. The second principal component assigns positive weights to some subtests and negative weights to others, so it is not clearly a measure of ability.[24] Nonetheless, it does have a positive coefficient estimate that is statistically significant, suggesting it does measure something ability-related. Adding the second principal component leaves the race/gender coefficient estimates with magnitudes similar to when the AFQT measures are used as controls.
[24] The second principal component is by definition uncorrelated with the first principal component, so it must assign some positive and some negative weights to the various subtests, given that the first principal component has positive weights for all subtests. In particular, the second principal component in this case has large positive weights for the numerical operations and coding speed subtests, and small but positive weights for the components of AFQT2. Large negative weights are assigned to the mechanics, electronics, and auto and shop subtests.
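A sketch of how the two principal components might be formed is given below. The subtest matrix is synthetic, and standardizing the subtests before extracting the components is an assumption of the sketch (the text does not spell that detail out); the weights are normalized so that their squares sum to one, as in footnote 23.

```python
import numpy as np

rng = np.random.default_rng(2)
common = rng.normal(size=(500, 1))                          # a shared "ability" factor
S = 50 + 10 * (common + 0.7 * rng.normal(size=(500, 10)))   # ten correlated "subtests"

Z = (S - S.mean(axis=0)) / S.std(axis=0)                    # standardized subtest scores
eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigval)[::-1]
w1, w2 = eigvec[:, order[0]], eigvec[:, order[1]]           # unit-length weight vectors
# Eigenvector signs are arbitrary; flip so the first component loads positively.
w1 = w1 if w1.sum() > 0 else -w1

pc1, pc2 = Z @ w1, Z @ w2                                   # first and second components
# As with the AFQT measures, each component would then be residualized on the 1980
# age/education/experience dummies and rescaled to unit standard deviation.
print(np.round(w1, 2), np.round(w2, 2))
```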
Instead of combining the 10 subtest scores in the somewhat arbitrary fashion inherent in principal components, I estimate a log-wage model in specification (3) of Table 10 that simply incorporates all of the (standardized) subtest score values as separate controls. The results suggest that the different subtest scores do not have the same impact on wages (a test of the restriction that the coefficients on all the subtests are the same is rejected at any conventional significance level), and the final specification of Table 10 shows that the particular linear combination inherent in the first principal component is also rejected.[25] Both math-oriented subtests have positive and statistically significant coefficient estimates, while the two verbal scores are insignificant. Most of the knowledge-based subtests have small and insignificant coefficients, the exception being the auto and shop subtest. While the coding speed coefficient is significant, the most surprising result is the strong impact of the numerical operations subtest. This subtest has the largest impact on log wages: a one standard deviation increase in the numerical operations score increases log wages by 0.028 log points. It is not immediately clear what aspect of productivity this subtest may be capturing (quickness of mind, or perhaps motivation to perform well), as the actual cognitive abilities required to perform are, by design, minimal. Nonetheless, it does appear to measure some personal aspect that affects success in the labor market.
[25] A p-value for a test that the nine included test scores have zero coefficients when the first principal component is included rejects at any conventional significance level.

Inclusion of the separate subtest scores in the log-wage equation does have an impact on the race/gender group coefficient estimates. A little more of the white female difference is accounted for if the separate subtest scores are used instead of the AFQT measures, due to the poorer performance of white women on the math-oriented subtests and the auto and shop subtest. Counteracting this impact is the higher average score of white women on the numerical operations subtest. The separate subtest scores do not allow for as much of an explanation of the gender gap as the first principal component suggests, which is likely due to the apparent overstatement of the relevance of many subtest scores (mechanical, electronics) by the first principal component. The coefficient estimate for education, however, is largely unaffected by the manner in which the test scores are incorporated in the equation.

In sum, inclusion of test scores in the log-wage equation has a substantial impact on the estimated race effect on wages (for both genders), but appears to play only a small role in accounting for the gender differences, as about 10% of the otherwise unexplained gender difference is accounted for by these scores. This latter impact primarily arises when the subtest scores are included separately; the summary AFQT measures in particular do not reveal any role for test scores in explaining gender differences in log wages.

Table 10. Log-wage equations with alternative test-score controls
Independent variable          (1)             (2)             (3)             (4)
White female                  0.124 (0.008)   0.150 (0.009)   0.135 (0.009)   0.135 (0.009)
Black male                    0.033 (0.010)   0.043 (0.010)   0.039 (0.010)   0.039 (0.010)
Black female                  0.116 (0.011)   0.141 (0.011)   0.128 (0.012)   0.128 (0.012)
Hispanic male                 0.009 (0.011)   0.018 (0.011)   0.014 (0.011)   0.014 (0.011)
Hispanic female               0.078 (0.011)   0.107 (0.012)   0.093 (0.013)   0.093 (0.013)
Years of education            0.053 (0.001)   0.052 (0.001)   0.052 (0.001)   0.052 (0.001)
First principal component     0.068 (0.003)   0.065 (0.003)   –               0.005 (0.033)
Second principal component    –               0.019 (0.003)   –               –
Arithmetic reasoning          –               –               0.018 (0.005)   0.019 (0.007)
Mathematics knowledge         –               –               0.015 (0.004)   0.015 (0.006)
Word knowledge                –               –               0.004 (0.005)   0.005 (0.008)
Paragraph comprehension       –               –               0.001 (0.004)   –
General science               –               –               0.001 (0.005)   0.002 (0.007)
Mechanical comprehension      –               –               0.002 (0.004)   0.001 (0.006)
Electronics information       –               –               0.008 (0.005)   0.009 (0.007)
Auto and shop information     –               –               0.011 (0.005)   0.012 (0.006)
Numerical operations          –               –               0.028 (0.004)   0.028 (0.005)
Coding speed                  –               –               0.008 (0.004)   0.009 (0.005)
R2                            0.588           0.589           0.590           0.590
Note: All regressions include the same additional controls as in specification (3) of Table 9. Coefficient estimates are from a random effects estimator, and the numbers in parentheses are standard errors robust to the presence of heteroskedasticity.
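The restriction tests reported for Table 10 (that all subtest coefficients are equal, and that the nine subtests add nothing once the first principal component is included) are standard Wald tests of linear restrictions. The generic computation is sketched below with synthetic inputs; the coefficient positions, coefficient vector, and covariance matrix are placeholders, not the estimates underlying the table.

```python
import numpy as np
from scipy import stats

def wald_test(R, beta_hat, V):
    """Wald statistic and p-value for H0: R beta = 0 (chi-square, rows(R) d.o.f.)."""
    Rb = R @ beta_hat
    W = float(Rb @ np.linalg.solve(R @ V @ R.T, Rb))
    return W, stats.chi2.sf(W, df=R.shape[0])

# Synthetic stand-ins for an estimated coefficient vector and its robust covariance.
rng = np.random.default_rng(3)
k = 25
beta_hat = rng.normal(size=k)
A = rng.normal(size=(k, k))
V = A @ A.T / 500.0

# (i) Equality of ten subtest coefficients, assumed to occupy positions 10..19.
idx = list(range(10, 20))
R_eq = np.zeros((len(idx) - 1, k))
for row, j in enumerate(idx[1:]):
    R_eq[row, idx[0]], R_eq[row, j] = 1.0, -1.0     # beta_first - beta_j = 0
print(wald_test(R_eq, beta_hat, V))

# (ii) Zero coefficients on nine subtests when the first principal component is included.
R_zero = np.zeros((9, k))
for row, j in enumerate(idx[1:]):
    R_zero[row, j] = 1.0
print(wald_test(R_zero, beta_hat, V))
```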
5.4. Alternative approaches and extensions

The log-wage equations estimated for Table 10 incorporate a large number of independent factors, many of which could themselves be the result of racial or gender discrimination. Controlling for the differences in these independent factors could lead to an understatement of the true discriminatory impact. This logic led Neal and Johnson (1996) to estimate parsimonious specifications that include only clearly exogenous factors along with the test scores. I follow a similar strategy by reporting in Table 11 estimates of log-wage equations that include only the age of the individual and the year of the observation as additional independent factors. As expected, the unexplained race/gender differences relative to white men are larger than in the Table 10 specifications, even after the test-score controls are included. As before, much of the race-related differences are accounted for by the test scores. Again, this suggested impact is strongest when the first principal component of the subtests is used as the independent variable, but the restrictions underlying the first principal component appear unwarranted. Interestingly, the math knowledge coefficient estimate is much larger in these results (when the separate subtests are entered). One difference from the fuller specifications is that there is less evidence that the female/male difference in log wages is accounted for by the test scores.

Table 11. Log-wage equation estimates with reduced specifications
Independent variable          (1)             (2)             (3)             (4)
White female                  0.232 (0.011)   0.220 (0.010)   0.182 (0.011)   0.226 (0.013)
Black male                    0.253 (0.012)   0.123 (0.013)   0.091 (0.013)   0.118 (0.014)
Black female                  0.416 (0.012)   0.275 (0.013)   0.227 (0.014)   0.278 (0.016)
Hispanic male                 0.117 (0.015)   0.044 (0.014)   0.025 (0.014)   0.045 (0.015)
Hispanic female               0.306 (0.015)   0.214 (0.014)   0.171 (0.015)   0.222 (0.017)
Age                           0.116 (0.004)   0.116 (0.003)   0.116 (0.003)   0.116 (0.003)
Age squared (in hundreds)     0.157 (0.006)   0.155 (0.006)   0.155 (0.006)   0.155 (0.006)
AFQT2                         –               0.115 (0.004)   –               –
First principal component     –               –               0.123 (0.005)   –
Arithmetic reasoning          –               –               –               0.019 (0.007)
Mathematics knowledge         –               –               –               0.060 (0.006)
Word knowledge                –               –               –               0.012 (0.007)
Paragraph comprehension       –               –               –               0.005 (0.006)
General science               –               –               –               0.003 (0.007)
Mechanical comprehension      –               –               –               0.002 (0.006)
Electronics information       –               –               –               0.005 (0.006)
Auto and shop information     –               –               –               0.013 (0.006)
Numerical operations          –               –               –               0.042 (0.005)
Coding speed                  –               –               –               0.015 (0.005)
R2                            0.362           0.387           0.388           0.394
Note: All regressions include 18 year dummies and a constant. Coefficient estimates are from a random effects estimator, and the numbers in parentheses are standard errors robust to the presence of heteroskedasticity.

To this point, all estimated models impose the constraint that the effects of the independent variables on log wages are the same for all race/gender groups. The literature explaining race/gender differences commonly uses a decomposition approach that is based on separate estimated regressions for each demographic group. I have also estimated separate regressions for each of the race/gender groups, and selected coefficient estimates are reported in Table 12. The schooling coefficient estimate varies across groups, with black women and both Hispanic groups receiving a lower return to education than the other three groups. There are some differences in the test-score effects as well, though for the most part the pattern is similar across groups. The numerical operations subtest is always important (interestingly, most important for white men) and the math subtests are also generally of importance. Rodgers and Spriggs (1996) argued that the ASVAB may be culturally biased, because the verbal subtests appeared to be important to wages for black men but not for white men. When all the subtests are included (they only used the AFQT subtests), the estimated coefficients on the verbal subtests are larger for black men than for white men, but for neither group are they statistically significant. The group for which these two subtests do seem to be important is black women, but if there is a cultural bias it is difficult to understand why that would be present for one gender and not the other.
The race/gender dummy coefficients in the pooled models presented earlier are interpreted as measures of the difference in expected log wages between the particular group (white women in the example) and the reference group (white men) at any value of X, that is,

  Δ_wf = E(log(w) | wf = 1, X) − E(log(w) | wm = 1, X) = b_wf

where wf is a dummy equal to one for white women, wm is a dummy equal to one for white men, and b_wf is the coefficient corresponding to the white female dummy. If separate regressions are estimated, this same expectation becomes

  Δ_wf = E(log(w) | wf = 1, X) − E(log(w) | wm = 1, X) = (a_wf − a_wm)′X

where a_wf and a_wm are the separate coefficient vectors for X for the two groups. This implies that the difference depends on the particular X values, and to evaluate the expectation it is necessary to specify a value for the X vector. I use three choices: one, the mean values of the variables for the overall sample; two, the mean values for the particular race/gender group; and three, the mean values for white men. The first approach could be thought of as evaluating the difference for the average sample member irrespective of race and gender. The latter two are perhaps more familiar, in that they lie behind the two alternative definitions of the "unexplained" component of the Oaxaca decomposition.[26]
[26] The exact decomposition does not hold here, because (unlike OLS) random effects estimators do not force the regression line through the sample means. Standard errors are calculated using the delta method, assuming the two coefficient vector estimates are independent and treating the sample mean of X as nonstochastic.
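Evaluating the unexplained difference at alternative mean vectors amounts to the inner product in the expression above; a sketch with synthetic inputs is given below. The standard-error line follows the simplification in footnote 26 (independent coefficient estimates, nonstochastic means), and all arrays are placeholders rather than the Table 12 estimates.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 12
a_wf, a_wm = rng.normal(size=k), rng.normal(size=k)        # group coefficient vectors
V_wf, V_wm = np.eye(k) * 0.01, np.eye(k) * 0.01            # their covariance matrices
xbar_total, xbar_wf, xbar_wm = (rng.normal(size=k) for _ in range(3))  # mean vectors

def unexplained_gap(a_group, a_ref, xbar, V_group, V_ref):
    """(a_group - a_ref)'xbar and its delta-method SE under independence."""
    gap = float((a_group - a_ref) @ xbar)
    se = float(np.sqrt(xbar @ (V_group + V_ref) @ xbar))
    return gap, se

for label, xbar in [("total means", xbar_total),
                    ("own-group means", xbar_wf),
                    ("white-male means", xbar_wm)]:
    gap, se = unexplained_gap(a_wf, a_wm, xbar, V_wf, V_wm)
    print(f"{label}: {gap:.3f} ({se:.3f})")
```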
The unexplained part of the wage difference is presented in the top three rows of Table 12. In general, the level of the unexplained difference is quite similar to that in the pooled regressions with similar controls (the third specification in Table 10). This is especially true when the means for the total sample are used to evaluate the unexplained difference, with only black women having an unexplained difference that varies much from the Table 10 result. The use of other mean vectors does affect the predicted magnitude for some of the groups. White women have a somewhat larger unexplained difference when evaluated at the average white male characteristics than at their own average characteristics. The black male unexplained difference is also sensitive to the choice of the mean vector, with the estimate essentially zero if evaluated at the white male means (the same is true for Hispanic men). These sensitivities often arise in Oaxaca decompositions, though in this case the basic nature of the results is largely unaffected by allowing separate regression coefficients.

Table 12. Log-wage equation estimates for individual race/gender groups (parameter estimates by race/gender group)
Independent variable                        White male      White female    Black male      Black female    Hispanic male   Hispanic female
Unexplained difference at total means       –               0.144 (0.012)   0.039 (0.013)   0.145 (0.017)   0.013 (0.015)   0.096 (0.019)
Unexplained difference at own group means   –               0.134 (0.013)   0.053 (0.013)   0.138 (0.017)   0.015 (0.013)   0.104 (0.017)
Unexplained difference at white male means  –               0.154 (0.015)   0.000 (0.016)   0.167 (0.028)   0.001 (0.016)   0.098 (0.032)
Years of education                          0.057 (0.002)   0.058 (0.002)   0.062 (0.003)   0.045 (0.003)   0.036 (0.004)   0.045 (0.003)
Arithmetic reasoning                        0.010 (0.010)   0.027 (0.009)   0.019 (0.012)   0.015 (0.012)   0.018 (0.017)   0.008 (0.016)
Mathematics knowledge                       0.017 (0.009)   0.021 (0.008)   0.013 (0.011)   0.023 (0.011)   0.029 (0.015)   0.003 (0.015)
Word knowledge                              0.007 (0.011)   0.001 (0.011)   0.008 (0.011)   0.017 (0.011)   0.005 (0.017)   0.006 (0.015)
Paragraph comprehension                     0.011 (0.009)   0.009 (0.009)   0.001 (0.010)   0.020 (0.009)   0.011 (0.015)   0.013 (0.013)
General science                             0.001 (0.010)   0.003 (0.009)   0.005 (0.011)   0.005 (0.011)   0.001 (0.016)   0.026 (0.015)
Mechanical comprehension                    0.005 (0.009)   0.007 (0.009)   0.010 (0.011)   0.006 (0.012)   0.010 (0.014)   0.026 (0.016)
Electronics information                     0.010 (0.010)   0.005 (0.009)   0.025 (0.011)   0.008 (0.011)   0.019 (0.016)   0.009 (0.015)
Auto and shop information                   0.015 (0.009)   0.006 (0.010)   0.039 (0.011)   0.005 (0.013)   0.014 (0.015)   0.019 (0.016)
Numerical operations                        0.044 (0.009)   0.019 (0.008)   0.021 (0.009)   0.019 (0.008)   0.029 (0.013)   0.025 (0.011)
Coding speed                                0.013 (0.008)   0.022 (0.007)   0.005 (0.009)   0.002 (0.008)   0.028 (0.014)   0.010 (0.011)
R2                                          0.594           0.592           0.558           0.587           0.543           0.569
Notes: These are random effects estimates. The regressions also include the same controls as in specification (3) of Table 9. The numbers in parentheses are standard errors robust to the presence of heteroskedasticity. The "unexplained differences" are the predicted difference in log wages for the cited group relative to white males, evaluated at the specified mean values.

The NLSY has a number of family-background variables that might also be related to the development of unobserved ability of the individual. Blackburn and Neumark (1995) assumed that the family-background variables should be omitted from a log-wage regression, with the argument that the impact of these variables on the cognitive development of the child should be reflected through the test scores.[27] However, others have argued they should be included as separate controls in the regression. For example, Kiker and Heath (1985) use family-background measures in the Panel Study of Income Dynamics as determinants in wage equations and find that they help in explaining wage differences between black and white men.
[27] The family-background variables were used as instruments for schooling, experience, and the ASVAB results.

To study if family-background variables can account for the race/gender wage differences, I include the family-background measures discussed earlier as controls in a regression estimated with the pooled data. Results from four specifications are reported in Table 13. If the test scores are excluded, inclusion of family-background variables can help to account for some of the unexplained differences for all groups except white women.[28] Even if the test scores are included, addition
of the family-background variables still provides some additional explanatory power for minority group differences with white men. In fact, these results strongly suggest that the group with the largest unexplained difference is white women, with the minority race effect actually leading to higher wages among women. The most important family-background variables appear to be the parents' occupational status and education, as well as having magazines and library cards available at home.
[28] A Wald test for the coefficients on the 16 family-background variables all being equal to zero provides a χ2 statistic of 185, which has a p-value less than 0.0001. A χ2 of 121 is obtained for this test when the test-score variables are also included as controls (column 4 of Table 13), again rejecting the null of no family-background effects.

All of the previous analyses restricted the coefficient estimates to be the same over the entire 21-year history of the NLSY. The underlying models are not likely to have been stable over this time period. The increase in the coefficient on education in the 1980s for more general samples is well known, and the suggestion has been that the race/gender gaps have not been stable either (see Altonji & Blank, 1999, for example). It may also be the case that there are "age effects" for some of the variables; that is, some variables may naturally become more or less important as a cohort ages.[29] To explore whether either of these issues is relevant, I estimate separate models for four different time periods. Table 14 presents the results of these estimations (without family-background variables) for pooled-sample models, both with and without test-score controls. The results do suggest a growing education coefficient (which is never much affected by the test scores), although whether or not this is associated with a period effect or an age effect is not clear. Including the test scores has a similar impact on the unexplained group differences as in the single model; in fact, the separate models by time period actually suggest that more of the white female difference may be due to test scores than in the earlier tables. These results also suggest that the unexplained differences are either growing over time, or following a natural tendency to increase with age. For example, clear evidence of lower wages for black men (relative to white men) after test-score controls is only evident in the 1990s, and for Hispanic men only in the late 1990s. While the test-score coefficient estimates are not reported in Table 14, the results are fairly similar across the time periods, with the only clear pattern being that the importance of the math-related subtests falls over time. The numerical operations subtest continues to be strongly significant through the 1990s.
[29] To separate period effects from age effects in the structure of the log-wage model would be a difficult task with a single cohort, so I must consider either as a possible explanation for changes in this structure.
Table 13. Log-wage equation estimates with family-background controls
Independent variable             (1)             (2)             (3)             (4)
White female                     0.150 (0.008)   0.149 (0.008)   0.134 (0.011)   0.135 (0.010)
Black male                       0.113 (0.010)   0.085 (0.010)   0.036 (0.011)   0.021 (0.011)
Black female                     0.214 (0.010)   0.185 (0.011)   0.128 (0.013)   0.114 (0.013)
Hispanic male                    0.058 (0.012)   0.030 (0.015)   0.020 (0.012)   0.007 (0.015)
Hispanic female                  0.150 (0.012)   0.118 (0.015)   0.096 (0.014)   0.082 (0.016)
Years of education               0.058 (0.001)   0.052 (0.001)   0.054 (0.001)   0.049 (0.001)
US born                          –               0.052 (0.013)   –               0.054 (0.013)
Foreign language at home         –               0.004 (0.011)   –               0.004 (0.011)
Mother-only family               –               0.012 (0.009)   –               0.012 (0.009)
Mother in professional occ.      –               0.011 (0.011)   –               0.010 (0.011)
Father in professional occ.      –               0.038 (0.009)   –               0.034 (0.009)
Magazines in home                –               0.027 (0.007)   –               0.020 (0.007)
Newspapers in home               –               0.009 (0.008)   –               0.004 (0.008)
Library card at home             –               0.029 (0.007)   –               0.027 (0.007)
Mother's education               –               0.003 (0.001)   –               0.002 (0.001)
Father's education               –               0.004 (0.001)   –               0.003 (0.001)
Number of siblings               –               0.002 (0.002)   –               0.002 (0.002)
Number of older siblings         –               0.003 (0.002)   –               0.003 (0.002)
First born                       –               0.005 (0.008)   –               0.001 (0.008)
Test scores included?            No              No              Yes             Yes
R2                               0.584           0.588           0.593           0.596
Note: All regressions include the same additional controls as in specification (3) of Table 9, as well as dummy variables for one-parent family upbringing and "other" upbringing (the reference is a father–mother family), and dummies for missing information on father's education or profession. Coefficient estimates are from a random effects estimator, and the numbers in parentheses are standard errors robust to the presence of heteroskedasticity. The sample size for all regressions is 7437 individuals and 63,927 wage observations.

Table 14. Log-wage equation estimates, by time periods
Each cell shows the estimate without test scores / with test scores.
Independent variable   1979–1984                        1985–1989                        1990–1994                        1996–2000
White female           0.119 (0.010) / 0.090 (0.012)    0.153 (0.010) / 0.126 (0.012)    0.179 (0.011) / 0.166 (0.014)    0.193 (0.016) / 0.168 (0.019)
Black male             0.088 (0.011) / 0.023 (0.013)    0.103 (0.011) / 0.022 (0.013)    0.143 (0.013) / 0.070 (0.015)    0.135 (0.017) / 0.059 (0.019)
Black female           0.171 (0.013) / 0.093 (0.015)    0.225 (0.012) / 0.130 (0.015)    0.246 (0.014) / 0.166 (0.017)    0.231 (0.018) / 0.138 (0.023)
Hispanic male          0.036 (0.013) / 0.003 (0.013)    0.036 (0.013) / 0.001 (0.013)    0.051 (0.015) / 0.018 (0.016)    0.089 (0.020) / 0.053 (0.021)
Hispanic female        0.140 (0.014) / 0.084 (0.016)    0.153 (0.014) / 0.091 (0.016)    0.165 (0.016) / 0.120 (0.019)    0.149 (0.022) / 0.091 (0.025)
Years of education     0.033 (0.002) / 0.033 (0.002)    0.055 (0.002) / 0.051 (0.002)    0.068 (0.002) / 0.063 (0.002)    0.069 (0.003) / 0.064 (0.003)
Sample size            18,567                           25,818                           23,031                           8208
R2                     0.383 / 0.395                    0.449 / 0.462                    0.450 / 0.461                    0.490 / 0.501
Note: These are random effects estimates. The regressions also include the same controls as in specification (3) of Table 10. The numbers in parentheses are standard errors robust to the presence of heteroskedasticity.
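Period-specific estimates of the kind reported in Table 14 only require re-fitting the same specification on period subsamples. The loop below is a sketch: it uses pooled OLS on a synthetic data frame rather than the random-effects estimator used for the actual estimates, and every variable name is a placeholder.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 2000
df = pd.DataFrame({                              # synthetic person-year observations
    "log_wage": rng.normal(2.0, 0.4, n),
    "group": rng.choice(["white_male", "white_female", "black_male"], n),
    "educ": rng.integers(10, 17, n).astype(float),
    "year": rng.integers(1979, 2001, n),
})

periods = {"1979-1984": (1979, 1984), "1985-1989": (1985, 1989),
           "1990-1994": (1990, 1994), "1996-2000": (1996, 2000)}
for label, (lo, hi) in periods.items():
    sub = df[df["year"].between(lo, hi)]
    fit = smf.ols("log_wage ~ C(group, Treatment(reference='white_male'))"
                  " + educ + C(year)", data=sub).fit(cov_type="HC1")
    print(label, fit.params.filter(like="C(group").round(3).to_dict())
```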
6. Conclusions

Previous research has used test scores as measures of ability in wage equations in order to study the impact of these controls on estimates of education-related and race-related wage differentials. This paper has made use of the ASVAB test scores in the NLSY to study these relationships among workers early in their careers, and also considers whether or not test scores can help in explaining female/male wage differentials. It adds to previous research by considering the impact of the individual components of the ASVAB, and by using the full survey (rather than a single year). Many of the findings are similar to earlier research with these data, most notably that differences in average wages between blacks and whites, and between Hispanics and whites, are partly explained by factors reflected in test scores. Also, I find that the impact of education on wages may be overstated in regressions without test scores, though the degree of this overstatement is quite slight (and less than that suggested in earlier research).

The role that test scores might play in explaining gender differences in wages has not been considered previously. The possibility that there may be a connection primarily lies in an examination of the individual ASVAB components, rather than the aggregated AFQT scores primarily used in earlier research. The results do suggest that women perform worse than men on the math knowledge and arithmetic reasoning subtests, both of which tend to be positively correlated with wages. Women perform better on the paragraph comprehension subtest, but this skill does not appear to have a reward in the labor market (other than for black women). Yet, overall, test scores explain only a small portion of the female/male difference (especially for whites), because women perform better on a timed "numerical operations" subtest that appears to have an important correlation with wages. Results for separate time periods suggest that both race and gender differences grow over time for this cohort of workers, even when test scores are included as controls.

Appendix A. Squared residual regression estimates
Independent variable        Estimate (standard error)
White female                0.007 (0.003)
Black male                  0.001 (0.003)
Black female                0.011 (0.004)
Hispanic male               0.008 (0.003)
Hispanic female             0.002 (0.004)
Years of education          0.0040 (0.0004)
Arithmetic reasoning        0.001 (0.001)
Mathematics knowledge       0.002 (0.001)
Word knowledge              0.000 (0.002)
Paragraph comprehension     0.000 (0.001)
General science             0.001 (0.002)
Mechanical comprehension    0.001 (0.001)
Electronics information     0.002 (0.001)
Auto and shop information   0.001 (0.001)
Numerical operations        0.001 (0.001)
Coding speed                0.001 (0.001)
Note: These are random effects estimates, using the squared residuals (divided by 2) from specification (3) of Table 10 as the dependent variable. The squared residual regression also includes the other independent variables in that specification. The standard errors are robust to the presence of heteroskedasticity.
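The auxiliary regression described in the appendix note (squared residuals, divided by two, regressed on the wage-equation covariates) is a Breusch-Pagan-style check of which variables are associated with the error variance. The sketch below illustrates the mechanics with plain OLS on synthetic data; it is not the random-effects version used for the appendix estimates, and the variables are placeholders.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, k = 1000, 4
X = sm.add_constant(rng.normal(size=(n, k)))                      # stand-in covariates
beta = np.array([1.0, 0.5, -0.3, 0.2, 0.0])
y = X @ beta + rng.normal(size=n) * (1 + 0.5 * np.abs(X[:, 1]))   # heteroskedastic errors

resid = sm.OLS(y, X).fit().resid                                  # wage-equation residuals
aux = sm.OLS(resid ** 2 / 2.0, X).fit(cov_type="HC1")             # squared-residual regression
print(aux.params.round(3))
```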