Income Data for Policy Analysis: A Comparative Assessment of Eight Surveys. Quality and usability of income and poverty data


As a survey that was designed to support policy analysis over a wide range of topics, SIPP would appear to have a number of advantages over the other general population surveys in measuring income and, especially, its distribution. Consistent with this expectation, SIPP performs much better than the other surveys in identifying program participants and capturing their income. SIPP also captures more income from the bottom of the income distribution than the other general population surveys, obtaining the most total dollars from the bottom quintile and finding the fewest persons in poverty. Yet this advantage quickly fades as we move up the income ladder or broaden our examination of poverty to subpopulations. Despite finding the fewest poor, SIPP finds more persons among the near-poor (between 100 and 200 percent of poverty) than any of the other four general population surveys. SIPP also finds more poor children than CPS, ACS, or MEPS. Most importantly, the biggest difference among the five surveys with respect to aggregate income is that SIPP captures just 89 percent as much income as the CPS while NHIS, MEPS, and ACS capture 95 to 98 percent. SIPP fares no better on unearned versus earned income, capturing just 90 percent as much unearned income and 89 percent as much earned income as the CPS.
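The poor and near-poor categories used in these comparisons can be illustrated with a minimal sketch. It assumes each person record carries a family income, the family's poverty threshold, and a survey weight; the function names, the record layout, and the treatment of the exact 100 and 200 percent boundaries are illustrative assumptions, not taken from any of the surveys' documentation.

```python
def poverty_band(family_income, poverty_threshold):
    """Classify a person by the ratio of family income to the poverty threshold.

    Assumption: "poor" is a ratio below 1.0 and "near-poor" is a ratio from
    1.0 up to (but not including) 2.0.
    """
    if poverty_threshold <= 0:
        raise ValueError("poverty_threshold must be positive")
    ratio = family_income / poverty_threshold
    if ratio < 1.0:
        return "poor"
    if ratio < 2.0:
        return "near-poor"
    return "other"


def weighted_counts(persons):
    """Sum person weights by poverty band.

    persons: iterable of (family_income, poverty_threshold, weight) tuples.
    """
    totals = {"poor": 0.0, "near-poor": 0.0, "other": 0.0}
    for income, threshold, weight in persons:
        totals[poverty_band(income, threshold)] += weight
    return totals


# Hypothetical records, for illustration only.
sample = [(9000, 10000, 2.0), (15000, 10000, 3.0), (40000, 10000, 1.5)]
print(weighted_counts(sample))
```

Because a survey that captures more income near the bottom moves some persons from the first band into the second, a low count of poor and a high count of near-poor can appear together, as they do for SIPP.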

SIPP’s performance raises a number of methodological issues, which are discussed in the next section.

Of the five general population surveys, the CPS remains the most widely used for policy analysis. Yet several limitations of these data are apparent—some of them well known, others not. While the CPS captures the most total income among the five surveys, its greatest advantage is in the top quintile, which is the least relevant for policy analysis. SIPP captures more income from the bottom quintile and finds fewer poor. The ACS captures as much income as CPS from the bottom three quintiles and finds fewer near-poor. MEPS collects more income than CPS from the bottom four quintiles, although the MEPS numbers must be qualified because they are not independent of the CPS estimates.

ACS, SIPP, and MEPS all find more persons and a higher percentage of the population with earnings than does the CPS. The higher per capita earnings in the CPS suggest that the shortage of earners in the CPS may be among lower-income workers, who attract more policy interest than higher-income workers. Overall, the ACS captures more unearned income than the CPS, although the CPS estimates exceed SIPP and MEPS by more than 10 percent. Estimates of persons receiving welfare or Food Stamps, covered by SSI, or enrolled in Medicaid during 2002 are about a third lower in the CPS than SIPP. ACS estimates of welfare or Food Stamp recipients are also markedly higher than the CPS estimates, and both MEPS and NHIS estimates of SSI recipients are higher than the CPS as well. The CPS estimates of persons ever enrolled in Medicaid during 2002 are exceeded by SIPP estimates of Medicaid enrollees in a single month (December). This latter observation ties into a well-known problem with CPS estimates of the uninsured, which represent persons who reported no coverage during the prior calendar year yet are matched or exceeded by SIPP, MEPS, and NHIS estimates of persons uninsured at a point in time.

Overall, ACS income data compare favorably to CPS data in a number of respects and appear to capture more income from selected subpopulations. They have low allocation rates, and the survey itself has a very high overall response rate. In short, the ACS income data look remarkably good given that they are collected in large part through a mailback questionnaire, without the benefit of interviewers, and with a small set of questions administered to a massive sample. In view of the expectations that have been set for ACS as a source of household and family income data for states and local areas, our findings with respect to this survey should be considered very good news.

Yet there are several important limitations of ACS data for policy analysis. The rolling reference period implies that in a time of significant change in the economy, as we are experiencing currently, estimates of employment and income obtained early and late in the survey year may differ significantly. The suppression of survey month on the public use file limits the analyst’s ability to contend with this type of problem as well as other instances of change over the year. The small number of additional variables also restricts the range of policy analyses that can be conducted with ACS data. Counting students where they attend college rather than with their families (where they usually live) will create millions of pseudo-poor. This is not yet evident in our study because college dormitories were not added to the ACS sample frame until a later year. The fact that students will be counted in their parents’ homes during the summer months but not the school year is another reason why survey month would be a useful addition to the public use file.

Post-stratification of the MEPS to the CPS poverty distribution leaves us unable to assess how much income MEPS is actually capturing and how it is distributed. With the post-stratification, MEPS has more income than the CPS between the 20th and 80th percentiles of the income distribution but less in the bottom fifth and, especially, the top fifth. But how would it look without the post-stratification? Comparisons of MEPS and CPS poverty rates are even less informative, as they are affected directly by the post-stratification.

MEPS users must determine how to work with certain inconsistencies between reported employment and reported income that derive from the collection of these data in separate parts of the survey instrument coupled with a policy of not imposing consistency edits on these items. Users whose analyses require person weights must also determine how to handle a subset of sample members, weighting up to more than six million persons, who have missing data on family members and, because of this, exceedingly high measured poverty rates. Different users will choose to handle these cases in different ways, injecting additional variation into their analytic findings beyond what can be attributed to alternative modeling decisions.

The collection of income data in NHIS has been a low priority for NCHS historically, and restricting the amounts of total family income and personal earnings to an internal file effectively precludes the use of these income data in time-sensitive analysis. Using a single question to collect total family income (albeit not the family used in official poverty measures), NHIS obtains an aggregate amount that approaches 95 percent of the CPS total and displays a broadly similar distribution but does worst in the bottom quintile, which is the most important from a policy-analytic standpoint. The fact that a significant number of respondents report person-level earnings that sum to more than the reported total family income and that total earnings are even more likely to exceed total family income when one or both were imputed suggests that a different strategy might be more effective. Collecting unearned income for each person, to complement earned income, would yield person-level total income for every person and perhaps a more complete accounting of total family income.

Despite a weighted population that falls short of the CPS by 21 million persons, the PSID captures 4 percent more aggregate income. PSID income exceeds the CPS in every quintile, with the biggest margin, nearly 6 percent, occurring in the top quintile, where the CPS holds the greatest advantage over the other four general population surveys. PSID per capita income, which adjusts for differences in population size, exceeds CPS per capita income by 10 to 14 percent in all five quintiles. PSID also finds a higher percentage of the population with earnings than CPS, SIPP, or ACS. Were it not for the uncertainty regarding the representativeness of the PSID after 40 years, we would see these as evidence of better capture of income in the panel survey. Instead the PSID may simply over-represent higher-income families. While this does not detract from the survey’s value for longitudinal analysis, national generalizations from the data are problematic.
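The relationship between the aggregate and per capita comparisons above is simple arithmetic, sketched below. The function name is hypothetical, and the CPS population used in the usage lines is a round placeholder rather than a figure from this report; only the 21 million shortfall and the 4 percent aggregate margin come from the text.

```python
def per_capita_ratio(aggregate_ratio, population_ratio):
    """Ratio of per capita incomes implied by two surveys' totals.

    aggregate_ratio:  survey A aggregate income / survey B aggregate income
    population_ratio: survey A weighted population / survey B weighted population
    """
    if population_ratio <= 0:
        raise ValueError("population_ratio must be positive")
    return aggregate_ratio / population_ratio


# Placeholder CPS population (an assumption for illustration only).
cps_population = 280e6
psid_population = cps_population - 21e6   # PSID falls short by 21 million
ratio = per_capita_ratio(1.04, psid_population / cps_population)
print(f"PSID per capita income relative to CPS: {ratio:.3f}")
```

A smaller weighted population combined with a larger aggregate necessarily yields a per capita margin larger than the aggregate margin, which is why the per capita differences exceed the 4 percent aggregate difference.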

Surveys of restricted populations face special challenges in developing representative estimates, owing to the independent selection probabilities of spouses and partners. This was evident for estimates of aggregate income in both the HRS and MCBS, and it would affect the use of these data to develop cost estimates of legislative proposals.

The income data collected from Medicare beneficiaries in the MCBS are limited to a single dollar amount that includes a spouse’s income. For single persons the distribution is consistent with other surveys, but if aggregated, the total income effectively double-counts the incomes of spouses who are also beneficiaries. Limiting the income question to the beneficiary’s income would eliminate this double-counting. Asking separately for the incomes of other family members and obtaining family size would enable users to estimate the poverty status of beneficiaries, which is not currently possible for much of the sample. While MCBS data are not released in a public use file, potential users may apply to obtain access to the data for specified uses at their own computing facilities. Whether such uses could encompass time-sensitive policy analysis, as opposed to analyses requiring advance approval, is not clear.

Comparisons of average family income for persons 51 and older in the HRS and the CPS, ACS, and SIPP reveal substantially higher incomes in the HRS. While RAND’s construction of family income may play a role in findings for persons living with relatives other than a spouse, we found that average incomes for singles were 22 percent higher than the CPS while average incomes for persons with spouses or partners were 28 percent higher than CPS incomes for persons with spouses. Differences are very consistent across most of the income distribution but grow substantially in the top quintile. These findings would require much more study to determine whether HRS is truly capturing substantially more income than the other surveys or whether there is another explanation.

One general finding on income measurement is that the identification of self-employment income is a particularly weak area, which is reflected in widely varying estimates, with MEPS having both the lowest and highest estimates, depending on whether the estimate is based on reported income by source or type of employment. Given that self-employed persons may be the focus of policy initiatives related to health insurance and other areas, this is a glaring weakness of income data collection.

A more general area of weakness in survey income data is the comparatively high level of item non-response to income questions. A useful measure of the overall impact of item non-response is the proportion of total income that was allocated. About one-third of total income in the CPS, SIPP, and NHIS was allocated, making the quality of these data dependent on the quality of the allocation methods used to fill in the missing data. The ACS fared markedly better with only 18 percent of total income allocated while 43 percent of total income in MEPS was allocated. Allocation rates show no trend by quintile of family income in the CPS, SIPP and NHIS, but they trend downward in ACS and upward in MEPS. The similarity of allocation patterns in SIPP and NHIS, which ask the most and fewest income questions, respectively, suggests that the level of income detail requested of respondents may have little if any impact on how much income must be “made up” to compensate for non-response. Lastly, SIPP and MEPS are unique among the five surveys in their use of partial information to allocate missing earnings, which dominate total income. SIPP makes extensive use of data collected in prior waves while MEPS predicts earnings from reported wage rates and hours worked or allocates dollar amounts from reported ranges. In both surveys, allocations without partial information account for about 7 percent of total income.

We examined the prevalence of rounding in selected income items in the six surveys that differentiated reported and allocated amounts. Significant rounding was evident in reported earnings at the person level in the CPS, ACS, MEPS, NHIS, and PSID. Between 19 and 40 percent of the amounts below $52,500 were multiples of $5,000. Social Security income exhibited substantially less rounding than earnings in every survey. Yet even total family income, which combines amounts over persons and sources in all but NHIS, had rounded amounts for 11 to 16 percent of families in the CPS, ACS, and MEPS while NHIS had rounded amounts for 36 percent of families. Only SIPP showed no significant degree of rounding on any of the items. All annual amounts in SIPP are sums of monthly values.
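The two diagnostics described in the preceding paragraphs, the share of total income that was allocated and the share of reported amounts that are round numbers, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the record layout and function names are hypothetical, the allocation share is weighted, and the rounding share is computed unweighted over positive reported amounts below the ceiling.

```python
def allocated_income_share(records):
    """Weighted share of total income that was allocated (imputed).

    records: list of (amount, weight, was_allocated) tuples.
    """
    total = sum(amount * weight for amount, weight, _ in records)
    if total == 0:
        raise ValueError("total weighted income is zero")
    allocated = sum(amount * weight
                    for amount, weight, flag in records if flag)
    return allocated / total


def rounded_share(reported_amounts, multiple=5000, ceiling=52500):
    """Share of positive reported amounts below `ceiling` that are exact
    multiples of `multiple` (assumption: unweighted, whole-dollar amounts)."""
    eligible = [a for a in reported_amounts if 0 < a < ceiling]
    if not eligible:
        raise ValueError("no amounts below the ceiling")
    rounded = sum(1 for a in eligible if a % multiple == 0)
    return rounded / len(eligible)


# Hypothetical data, for illustration only.
records = [(100, 1.0, True), (300, 1.0, False)]
print(allocated_income_share(records))               # 0.25
print(rounded_share([5000, 12345, 60000, 20000]))    # 2 of the 3 eligible amounts
```

Restricting the rounding check to reported amounts matters, since allocated values are drawn from donors or models and would blur the measure of respondent rounding.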
