Income Data for Policy Analysis: A Comparative Assessment of Eight Surveys. Comparable Universe


Even though our estimates of income focus on calendar year 2002, no two surveys among the eight provide this information for populations at the same point in time, so no two sets of estimates refer even nominally to the same universe. We did not attempt to correct for universe differences that were due to survey timing or to the ACS’s exclusion of college students living in dormitories. However, we did adjust for universe differences that arose from differential treatment of six specific subpopulations: (1) decedents, (2) persons living abroad, (3) residents of institutions, (4) active duty armed forces, (5) unrelated children under 15, and (6) students temporarily away from home, whom the PSID excludes. Specific procedures and their impact are described below, followed by a discussion of sample selection issues that we encountered in developing the estimates for MEPS.

Universe Adjustments

Our income estimates for each survey are restricted to persons who were alive and residing in the U.S. at the time the survey was conducted and not living in an institution. For MEPS this meant that we restricted the sample to persons who were in-scope on 12/31/02. Original sample members who died or entered an institution during the year have sample weights, so it was not sufficient to restrict the MEPS estimates to persons with weights. For SIPP, we restricted our estimates to persons with December 2002 cross-sectional weights. No specific restrictions were required for CPS, ACS, or NHIS, but for PSID and MCBS we had to exclude sample members residing in institutions at the time the survey was fielded, and we also had to remove persons living in Puerto Rico (MCBS) or more generally abroad (HRS and PSID). In addition, for PSID we had to add back students who were away at school.

While four of the five general population surveys are described as representing the civilian non-institutional population, and the ACS recently added residents of institutional and non-institutional group quarters, including military barracks and college dormitories, all five surveys include some members of the armed forces on active duty living in housing units on or off base, as detailed in Chapter II. Coverage of this subpopulation differs among the surveys, however. Furthermore, neither the NHIS nor the MEPS assigns weights to sample members on active duty in the armed forces. For these reasons we have removed members of the active duty armed forces from our comparative estimates. We have also removed all members of their families, largely because of the differential coverage of armed forces members across surveys but also because removing the armed forces members often took away their families’ principal source of income. Rather than misrepresent their families’ economic circumstances or attempt to add back their contributions to family income while excluding the members themselves from our estimates, we opted for this simpler solution.

The official definition of poverty excludes unrelated children under 15 because the CPS does not collect income data from such individuals. We have followed suit and exclude unrelated children under 15 from all of our estimates. In addition to conforming to the official definition of poverty, this decision also reflects the fact that two of the surveys, the NHIS and MEPS, exclude unrelated minors from their sample frames.
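The universe restrictions described in this section amount to a record-level filter applied before any weighted estimate is computed. The following is a minimal sketch in Python, not the procedure actually used for any of the surveys. The `Person` record and all of its field names are hypothetical, since each survey codes these characteristics differently, and the sample records are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # All field names are hypothetical; each survey codes these differently.
    person_id: int
    family_id: int
    weight: float
    alive: bool
    in_us: bool
    institutionalized: bool
    active_duty: bool
    age: int
    related_to_reference_person: bool


def common_universe(persons):
    """Return the records that remain after the universe adjustments."""
    # Families with any active duty member are dropped in their entirety.
    military_families = {p.family_id for p in persons if p.active_duty}
    kept = []
    for p in persons:
        if not (p.alive and p.in_us) or p.institutionalized:
            continue  # decedents, persons abroad, residents of institutions
        if p.family_id in military_families:
            continue  # active duty members and all of their family members
        if p.age < 15 and not p.related_to_reference_person:
            continue  # unrelated children under 15
        kept.append(p)
    return kept


def weighted_population(persons):
    return sum(p.weight for p in persons)


# Invented records, for illustration only.
sample = [
    Person(1, 10, 1500.0, True, True, False, False, 42, True),
    Person(2, 10, 1500.0, True, True, False, False, 12, True),
    Person(3, 20, 1200.0, True, True, False, True, 30, True),   # active duty
    Person(4, 20, 1200.0, True, True, False, False, 29, True),  # spouse, also dropped
    Person(5, 31, 900.0, True, True, False, False, 9, False),   # unrelated child
    Person(6, 30, 900.0, True, True, False, False, 50, True),
    Person(7, 40, 800.0, True, True, True, False, 81, True),    # in an institution
]

universe = common_universe(sample)
print([p.person_id for p in universe], weighted_population(universe))
```

Dropping whole families by `family_id`, rather than only the active duty member, mirrors the choice described above to avoid misstating the income of families that have lost their principal earner.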

As we enumerated in Chapter II, the surveys differ with respect to whether college students who are temporarily away at school are counted where they usually reside (generally at home with their families) or where they are living at the time of the interview.  While this will affect estimates of family income and poverty, it does not affect the comparability of survey universes except for the PSID and ACS.

The PSID excludes from the interviewed family any students who were away at school but, unlike NHIS or ACS, does not attempt to interview them separately.  They are not counted in the family size used to determine an annual poverty threshold, and their incomes during the reference year are excluded from family income.  Nevertheless, records for students are included with those for other family members, so it is possible to add back these students into their respective families and the total population.  We did so and increased the size of the population by about 3 million.

The ACS counts students where they live but did not begin to include college dormitories in its sample frame until 2006. For 2002, then, college students who were living in dormitories at the time their families were interviewed are excluded from the ACS universe. Because the ACS uses a rolling sample, the number of students who are excluded from the ACS universe will vary with the survey month. Few students will be excluded in the summer months while many will be excluded during the school year. Over the full calendar year, perhaps three-quarters of the students who attend college and live in dormitories will be excluded from our estimates from the ACS.[7]
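To see how a rolling sample can produce a fraction near three-quarters, consider a rough sketch in Python. It assumes, purely for illustration, that dormitory residents are in residence for nine months of the year and that the ACS sample is spread evenly across the twelve months. Neither assumption is taken from the ACS documentation, and the calendar below is hypothetical.

```python
# Assumed for illustration only: months in which students live in dormitories.
IN_DORM_MONTHS = {1, 2, 3, 4, 5, 9, 10, 11, 12}  # hypothetical academic calendar


def share_excluded(in_dorm_months, monthly_sample_shares=None):
    """Share of dormitory students missed by a survey that skips dormitories.

    A student is missed whenever the interview month falls while the student
    is living in a dormitory, so the excluded share is the portion of the
    annual sample fielded in those months.
    """
    if monthly_sample_shares is None:
        # Assumption: the annual sample is spread evenly over twelve months.
        monthly_sample_shares = {month: 1 / 12 for month in range(1, 13)}
    return sum(
        share
        for month, share in monthly_sample_shares.items()
        if month in in_dorm_months
    )


excluded = share_excluded(IN_DORM_MONTHS)
print(f"{excluded:.2f}")
```

Under these assumptions the excluded share is nine-twelfths. A different academic calendar or an uneven monthly sample would shift the figure.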

The impact of the ACS exclusion of students in college dormitories, along with other residents of non-institutional group quarters, is evident in Table III.1, which reports survey population estimates, before and after the adjustment to a common universe, for the five general population surveys arrayed in chronological order by the calendar date(s) of their respective population controls. Prior to adjustment, the ACS falls short of the next highest population estimate (for SIPP, five months later) by 3.4 million persons. This difference is unchanged by the exclusion of active duty armed forces members and their families and unrelated children under 15.[8] Population growth between July and December would account for about 1.2 million of the difference, based on Census Bureau estimates of the civilian non-institutional population. This leaves 2.2 million to be attributed to the ACS group quarters exclusion.
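The decomposition of the ACS shortfall can be checked with a few lines of arithmetic. The sketch below uses the ACS and SIPP figures reported in Table III.1 (in thousands of persons) and the rounded 1.2 million growth figure cited in the text. The variable names are illustrative only.

```python
# Survey totals in thousands of persons, as reported in Table III.1.
acs_total, acs_residual = 280_717, 277_692      # controls dated 07/01/02
sipp_total, sipp_residual = 284_101, 281_080    # controls dated 12/01/02

# The ACS shortfall relative to SIPP, before and after the universe adjustment.
gap_before = sipp_total - acs_total
gap_after = sipp_residual - acs_residual

# Population growth from July to December (rounded figure cited in the text).
growth_july_to_december = 1_200

# What remains is attributed to the ACS group quarters exclusion.
group_quarters_residual = gap_before - growth_july_to_december

print(gap_before, gap_after, group_quarters_residual)
```

Both gaps round to 3.4 million, and the remainder rounds to 2.2 million.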


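Table III.1. Survey population estimates before and after adjustment to a common universe (thousands of persons)

| | ACS | SIPP | MEPS | CPS | NHIS |
| --- | --- | --- | --- | --- | --- |
| Population Control Date(s) | 07/01/02 | 12/01/02 | 12/01/02 [1] | 03/01/03 | Quarterly [2] |
| Survey Total Population | 280,717 | 284,101 | 284,569 [3] | 285,933 | 286,010 |
| Active duty armed forces and families | 1,881 | 2,595 | 1,043 | 2,766 | 2,055 |
| Unrelated children under 15 | 1,145 | 426 | 230 | 616 | 244 |
| Residual Population for Comparisons | 277,692 | 281,080 | 283,296 | 282,551 | 283,711 |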

Source: Mathematica Policy Research, from tabulations of the 2002 ACS, the 2001 SIPP panel, the 2002 Full-year Consolidated MEPS-HC, the 2003 CPS ASEC supplement, and the 2003 NHIS.

1. For post-stratification to population totals, the MEPS sample and family composition were defined as of December 31, 2002. MEPS documentation indicates that the sample weights were controlled to population totals "derived by scaling back the population distribution obtained from the March 2003 CPS to reflect the December 2002 estimated population distribution, employing age and sex data available from the December [...] December 1, 2002.

2. Population controls by calendar quarter refer to February 1, May 1, August 1, and November 1 of 2003. The midpoint of these dates is June 15, 2003.

3. The population listed for MEPS corresponds to sample persons identified as in-scope on 12/31/02 and with a person weight. Armed forces members with weights add an additional 45 thousand to the population total but are defined as out-of-scope, so their weights would not have been post-stratified to the population controls. The 45 thousand weighted armed forces members are excluded from the count of excluded active duty armed forces and families.

In addition to what it tells us about the ACS, Table III.1 also shows that adjusting the surveys for comparable universes actually increased rather than reduced the variation in population estimates among the remaining four surveys. More specifically, while the CPS and SIPP estimates became more similar to each other, and the MEPS and NHIS estimates did the same, the disparity between the first two and the second two grew larger. Initially, the MEPS population total was 0.47 million greater than SIPP, and the NHIS population was just 0.08 million greater than the CPS. As a result of adjustment, the difference between the MEPS and SIPP populations grew to 2.2 million while the difference between the NHIS and CPS populations grew to 1.2 million.
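The comparisons in this paragraph follow directly from Table III.1. As a check, the short Python sketch below recomputes the residual populations and the pairwise gaps before and after adjustment. The figures are those reported in the table, in thousands of persons. The function names are illustrative only.

```python
# (total population, active duty armed forces and families,
#  unrelated children under 15), in thousands, from Table III.1.
TABLE_III_1 = {
    "ACS": (280_717, 1_881, 1_145),
    "SIPP": (284_101, 2_595, 426),
    "MEPS": (284_569, 1_043, 230),
    "CPS": (285_933, 2_766, 616),
    "NHIS": (286_010, 2_055, 244),
}


def residual(survey):
    """Population remaining after both exclusions."""
    total, armed_forces, children = TABLE_III_1[survey]
    # For the ACS this yields 277,691; the table reports 277,692,
    # presumably because the published components are rounded.
    return total - armed_forces - children


def gap(higher, lower, adjusted):
    """Difference between two surveys, before or after adjustment."""
    if adjusted:
        return residual(higher) - residual(lower)
    return TABLE_III_1[higher][0] - TABLE_III_1[lower][0]


pairs = [("MEPS", "SIPP"), ("NHIS", "CPS"), ("CPS", "SIPP"), ("NHIS", "MEPS")]
for higher, lower in pairs:
    before = gap(higher, lower, adjusted=False)
    after = gap(higher, lower, adjusted=True)
    print(f"{higher} minus {lower}: {before:,} before, {after:,} after")
```

The gaps between MEPS and SIPP and between NHIS and CPS widen after adjustment, while the gaps between CPS and SIPP and between NHIS and MEPS narrow, which is the pattern described above.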

There is a big difference between the CPS and SIPP, on the one hand, and MEPS and NHIS, on the other, in the number of active duty armed forces members and their families who were removed from their respective populations:  2.6 and 2.8 million for SIPP and CPS versus 1.0 and 2.1 million for MEPS and NHIS. This accounts for the bigger difference in population sizes after rather than before adjustment. Active duty armed forces members do not receive weights in MEPS or NHIS whereas they do receive weights in SIPP and the CPS. The MEPS sample was post-stratified to totals constructed from the CPS. If active duty armed forces members had not been removed from the constructed totals, then the initial MEPS population would have been too high, and removing the families of active duty armed forces members would not have removed enough persons. This could explain the MEPS results relative to the two Census Bureau surveys, and the same phenomenon may be at work in the NHIS as well, but we cannot confirm this in either case without more detailed information on each survey’s post-stratification than is readily available. 
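The conjecture about post-stratification can be made concrete with a stylized example. The Python sketch below uses invented population figures and a single proportional adjustment. It is not a description of how MEPS or NHIS weights are actually constructed, which, as noted above, could not be confirmed. It shows only that if civilians alone carry weights but the control total still includes active duty members, the weighted population starts out too high and removing military families removes too few persons.

```python
def poststratify(weights, control_total):
    """Scale weights proportionally so that they sum to control_total."""
    factor = control_total / sum(weights.values())
    return {group: weight * factor for group, weight in weights.items()}


# Hypothetical true population, in millions of persons.
OTHER_CIVILIANS = 270.0
MILITARY_FAMILY_CIVILIANS = 1.5   # civilians living with an active duty member
ACTIVE_DUTY = 1.0                 # active duty members living in housing units

# Only civilians carry weights; active duty members are out of scope.
base = {"other": OTHER_CIVILIANS, "military_family": MILITARY_FAMILY_CIVILIANS}

# Case 1: the control total excludes active duty members.
correct = poststratify(base, OTHER_CIVILIANS + MILITARY_FAMILY_CIVILIANS)

# Case 2: active duty members are left in the control total.
inflated = poststratify(
    base, OTHER_CIVILIANS + MILITARY_FAMILY_CIVILIANS + ACTIVE_DUTY
)

for label, weights in [("correct controls", correct), ("inflated controls", inflated)]:
    removed = weights["military_family"]
    remaining = weights["other"]
    print(f"{label}: removed {removed:.2f}, residual {remaining:.2f}")
```

In the second case the persons removed amount to about 1.5 million rather than the 2.5 million who would be removed from a survey that weights active duty members, and the residual population is overstated by roughly the number of active duty members.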

Because of these population differences, particularly those between the ACS and the other four general population surveys, we must be aware that when we compare estimates of population subgroups or total dollars, a portion of the difference will be attributable to differences in population size.

Population estimates from the PSID are substantially lower than those reported in Table III.1 for the other surveys. Preliminary PSID cross-sectional weights for 2003 made available for use by the study yield a population estimate of 261.45 million after the exclusion of persons living abroad, in institutions, or in families with active duty armed forces members, and the addition of 3 million students temporarily away at school. This is 21.1 million lower than the CPS estimate even though the PSID was post-stratified to controls obtained from this same CPS file.

The shortfall can be attributed to several aspects of how the PSID weights were post-stratified. First, post-stratifying to CPS families rather than persons introduces a downward bias from the outset because CPS family weights underestimate the population by several million persons. Second, CPS unrelated subfamilies and secondary individuals were excluded from the family level controls to which the PSID was post-stratified. Given that PSID families include unmarried partners, who are counted as secondary individuals or unrelated subfamilies in the CPS, it would be appropriate to exclude a portion of these families and individuals from the controls, but no more than 38 percent.[9] Third, because they include unmarried partners, PSID families are somewhat larger than CPS families, so post-stratifying to CPS families by size, with no correction for this size difference, introduces a further downward bias. Fourth, families of size three or greater were combined for post-stratification, so larger families, which are more numerous in the CPS than the PSID, are underestimated in the latter. Fifth, PSID sample members who were outside the CPS universe (specifically, living abroad, in institutions, or in military barracks) were not excluded from post-stratification. When we dropped them from the PSID sample, we reduced the estimated population even further below the CPS. It is possible, too, that the shortfall would be even greater if persons excluded from the PSID universe (students temporarily away at school) had been removed from CPS families when constructing the controls (possibly shifting some CPS families to smaller size categories). Given that PSID staff will be aware of these shortcomings as they work on revisions to the sample weights, it is likely that the 21.1 million person shortfall in the PSID will be reduced when final weights are released in early 2009.
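The fourth point, that combining families of size three or greater understates persons in larger families, can be illustrated with a stylized example. The Python sketch below uses invented family counts and caps family size at five for simplicity. It is not the PSID weighting procedure. It shows only that matching the number of families in a collapsed cell does not match the number of persons when the size distribution within the cell differs.

```python
from collections import defaultdict

# Hypothetical weighted family counts (millions) by family size.
cps_families = {1: 30.0, 2: 35.0, 3: 20.0, 4: 15.0, 5: 8.0}
panel_families = {1: 30.0, 2: 35.0, 3: 24.0, 4: 13.0, 5: 4.0}


def collapsed_cell(size):
    """Combine all families of size three or greater into one cell."""
    return min(size, 3)


def poststratify(sample, controls, cell):
    """Scale sample counts so each cell total matches the control total."""
    control_cells = defaultdict(float)
    sample_cells = defaultdict(float)
    for size, count in controls.items():
        control_cells[cell(size)] += count
    for size, count in sample.items():
        sample_cells[cell(size)] += count
    return {
        size: count * control_cells[cell(size)] / sample_cells[cell(size)]
        for size, count in sample.items()
    }


def persons(families):
    """Number of persons implied by family counts by size."""
    return sum(size * count for size, count in families.items())


adjusted = poststratify(panel_families, cps_families, collapsed_cell)

print(f"families: control {sum(cps_families.values()):.1f}, "
      f"adjusted {sum(adjusted.values()):.1f}")
print(f"persons:  control {persons(cps_families):.1f}, "
      f"adjusted {persons(adjusted):.1f}")
```

The adjusted panel reproduces the control count of families exactly but implies fewer persons, because its families within the combined cell are smaller on average than those in the controls.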

Selection of MEPS Records

Of the 37,015 MEPS sample members who are identified as in-scope on December 31, 2002 and have nonzero person weights, 882 were in families from which one or more members had no records in the public use file. For 382 of these sample members, the missing family members included the family head.[10] Despite the missing family members (and their incomes), family incomes and poverty were calculated for the family members who were present, and the resulting ratios of income to poverty thresholds were used to post-stratify the person weights to the distribution of persons by poverty class observed in the March 2003 CPS. Not surprisingly, the members of these “partial families,” as we shall term them, show exceptionally high poverty rates, which we attribute in large part to their incomplete family and income data. Weighted, the sample members from these partial families represent 6.1 million persons or 2.15 percent of the December 31, 2002 MEPS population.

We considered alternative ways to deal with the partial family members in constructing MEPS estimates for comparison with the other surveys. One strategy was to exclude the most troublesome partial families, those with missing reference persons. Another approach was to exclude all partial families. Yet another approach was to use family weights instead of person weights. With the MEPS family weights, missing sample members are not an issue. All of the families with family weights have data on all of their members, and all family members are assigned family weights, regardless of how or when they entered the sample.[11]

Estimates based on these alternative strategies are presented in Table III.2. By retaining all sample members from partial families we end up with a sample of 36,820 persons after dropping unrelated individuals under age 15 and families with armed forces members on active duty. Weighted, this sample represents 283.3 million persons with an aggregate income of $6,257.7 billion. Excluding the partial family members from families with missing reference persons reduces the estimated population by 2.4 million persons and the estimated aggregate income by $25 billion. Excluding all persons from partial families reduces the sample count by another 499 persons, lowers the population estimate by an additional 3.7 million, and removes $74 billion from aggregate income.

Even more striking is the incidence of poverty among members of partial families. With all sample members with nonzero person weights included, the overall poverty rate is 12.48 percent. Persons in partial families have a poverty rate of 34.45 percent, however. Dropping those individuals in partial families with missing reference persons lowers the overall poverty rate to 12.12 percent. From these changes we can calculate that the poverty rate among the excluded subset of persons in families with missing reference persons is 54 percent. Dropping the remaining persons in partial families—those with a reference person—reduces the overall poverty rate to 11.99 percent.


Table III.2. MEPS estimates under alternative strategies for handling partial families

| Weight and Subsample [1] | Sample Persons | Weighted Persons (Millions) | Aggregate Income ($Billions) | Number Poor (Millions) | Percent Poor | Poverty Rate Among Members of Partial Families | Percent of All Poor Who Are in Partial Families |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Person Weight | | | | | | | |
| All sample members with nonzero person weights | 36,820 | 283.30 | 6,257.7 | 35.35 | 12.48 | 34.45 | 5.95 |
| Excluding members of partial families with no data on the family reference person | 36,465 | 280.87 | 6,232.9 | 34.04 | 12.12 | 21.61 [2] | 2.34 |
| Excluding all members of partial families | 35,966 | 277.19 | 6,158.9 | 33.24 | 11.99 | NA | NA |
| Family Weight | | | | | | | |
| All sample members with nonzero family weights | 37,347 | 278.81 | 6,000.0 | 35.16 | 12.61 | NA | NA |

Source: Mathematica Policy Research, from the 2002 Full-Year Consolidated MEPS HC

1. All estimates are restricted to persons who were in scope on 12/31/02. Estimates exclude unrelated individuals under age 15 and persons in families with members of the armed forces on active duty.

2. This is the poverty rate for partial families after those with missing reference persons are excluded. The poverty rate among members of partial families with missing reference persons is 53.98 percent.
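The poverty rates for the excluded groups can be recovered from the person-weight rows of Table III.2 by differencing the weighted counts. The Python sketch below works from the published, rounded values, so its results differ slightly from the figures in the table (for example, about 53.9 percent rather than 53.98 percent). The dictionary keys and function names are illustrative only.

```python
# Weighted persons and number poor (millions), person-weight rows of Table III.2.
ROWS = {
    "all": {"persons": 283.30, "poor": 35.35},
    "no_missing_reference": {"persons": 280.87, "poor": 34.04},
    "no_partial": {"persons": 277.19, "poor": 33.24},
}


def rate(poor, persons):
    """Poverty rate in percent."""
    return 100 * poor / persons


def excluded_group_rate(before, after):
    """Poverty rate among the persons dropped between two rows."""
    return rate(before["poor"] - after["poor"],
                before["persons"] - after["persons"])


overall = {name: rate(row["poor"], row["persons"]) for name, row in ROWS.items()}

# Members of partial families whose reference person is missing.
missing_reference = excluded_group_rate(ROWS["all"], ROWS["no_missing_reference"])
# All members of partial families.
all_partial = excluded_group_rate(ROWS["all"], ROWS["no_partial"])
# Members of partial families that do have a reference person.
with_reference = excluded_group_rate(ROWS["no_missing_reference"], ROWS["no_partial"])

print({name: round(value, 2) for name, value in overall.items()})
print(round(missing_reference, 1), round(all_partial, 1), round(with_reference, 1))
```

Differencing rounded totals amplifies rounding error, which is why the recomputed rates agree with the published ones only to within a few tenths of a percentage point.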

An alternative way of dealing with the partial families is to use family weights instead of person weights. MEPS family weights are assigned only to families with complete data. Unlike the person weights, they are assigned to both original sample members and persons who joined MEPS families after the start of the panel and, for that reason, did not qualify for person weights. Family weights in general are problematic for person-level analysis. None of the surveys with which we are familiar reconciles its family and person weights, which means that population estimates obtained using family weights are not consistent with the population estimates obtained from person weights. As a rule, it appears that applying family weights to individual family members yields too few total persons. The shortfall varies substantially by survey, but in our experience the direction is always the same. This holds true even though the methods used to develop family weights vary across the surveys. The MEPS results are consistent with this experience. With the family weight the population estimate is 278.81 million or 4.5 million below the person weight total. Furthermore, aggregate income drops to $6,000.0 billion or $258 billion below its maximum value while the poverty rate rises to its highest level, 12.6 percent.
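The comparison between the family-weight and person-weight estimates can be verified from Table III.2. The brief Python sketch below uses the published values. The variable names are illustrative only.

```python
# Values from Table III.2 (persons and poor in millions, income in $ billions).
person_weight = {"persons": 283.30, "income": 6_257.7, "poor": 35.35}
family_weight = {"persons": 278.81, "income": 6_000.0, "poor": 35.16}

# Shortfall of the family-weight estimates relative to the person-weight
# estimates for all sample members, which is the maximum in the table.
population_gap = person_weight["persons"] - family_weight["persons"]
income_gap = person_weight["income"] - family_weight["income"]

# Poverty rate under the family weight, in percent.
family_poverty_rate = 100 * family_weight["poor"] / family_weight["persons"]

print(f"{population_gap:.2f} million persons, ${income_gap:.1f} billion, "
      f"{family_poverty_rate:.2f} percent poor")
```

The number of poor is nearly the same under the two weights, so the higher poverty rate under the family weight reflects mainly the smaller population total in the denominator.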
