Income Data for Policy Analysis: A Comparative Assessment of Eight Surveys

Replicating Deficiencies in Reported Data


Allocation methods that are based on substituting missing items from other, similar records will tend to replicate any reporting patterns. For example, rounding will be repeated in the allocated values if imputation is done by a hot deck procedure, but it will not be repeated if the imputation procedure is model-based, unless it is explicitly added afterwards. Thus we see in Table VI.16 that the patterns of rounding that were evident for reported values in the previous section are repeated in each of the five general population surveys except the NHIS, which uses model-based imputation. In the NHIS, there is no rounding in the allocated values.

These findings underscore the fact that, when choosing an allocation method, data producers need to consider whether it is desirable or undesirable to replicate specific weaknesses in the reported data.
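The contrast between hot deck and model-based allocation described above can be illustrated with a small simulation. The sketch below is not the procedure used by any of the surveys: the wage distribution, the 30 percent rounding rate, the age-band donor cells, and the function names (`share_divisible`, `fit_line`, `simulate`) are all assumptions chosen for illustration. It copies donor values within a cell for the hot deck and uses a simple regression prediction plus a random residual for the model-based method, then reports the share of allocated values divisible by $5,000 under each.

```python
import math
import random

def share_divisible(values, step):
    """Return the fraction of values that are exact multiples of step."""
    return sum(1 for v in values if v % step == 0) / len(values)

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid_ss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    resid_sd = math.sqrt(resid_ss / (n - 2))
    return slope, intercept, resid_sd

def simulate(n_donors=5000, n_missing=2000, seed=0):
    rng = random.Random(seed)
    # Hypothetical respondents: wages rise with age, and roughly 30 percent
    # of respondents report a figure rounded to the nearest $5,000.
    donors = []
    for _ in range(n_donors):
        age = rng.randint(25, 60)
        wage = max(1000, round(rng.gauss(800 * age, 8000)))
        if rng.random() < 0.30:
            wage = 5000 * round(wage / 5000)
        donors.append((age, wage))

    # Records with missing wages; only age is known.
    missing_ages = [rng.randint(25, 60) for _ in range(n_missing)]

    # Hot deck: copy the wage of a randomly chosen donor in the same age band.
    cells = {}
    for age, wage in donors:
        cells.setdefault(age // 10, []).append(wage)
    hot_deck = [rng.choice(cells[age // 10]) for age in missing_ages]

    # Model-based: regression prediction plus a random residual.
    slope, intercept, resid_sd = fit_line(
        [a for a, _ in donors], [w for _, w in donors]
    )
    model_based = [
        round(intercept + slope * age + rng.gauss(0, resid_sd))
        for age in missing_ages
    ]

    return share_divisible(hot_deck, 5000), share_divisible(model_based, 5000)

hot_share, model_share = simulate()
print(f"hot deck: {hot_share:.1%} divisible by $5,000")
print(f"model-based: {model_share:.1%} divisible by $5,000")
```

Under these assumptions the hot deck values inherit the donors' rounding at about the rate it occurs in the reported data, while the model-based values are round only by chance, which mirrors the NHIS column of the table.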

The PSID raises an additional issue with respect to the selection of an allocation method. The PSID, which allocates missing data only for selected items, uses neither hot deck nor sophisticated model-based methods but relies on simpler approaches, which seem to produce substantial rounding. When the family head’s wage and salary income is allocated, 45 percent of the values are divisible by $5,000, and 44 percent are divisible by $10,000 (data not shown), but the round allocated values are not distributed across the range of allocated values. Instead, nearly half of the allocated records are assigned the same value of wages and salaries: $30,000. The only allocated value below $30,000 that is divisible by $5,000 is $15,000; there are no other round values among the allocated amounts below $30,000. Rounded values do appear above $35,000, but they are infrequent. From this distribution of allocated values it would appear that the PSID may employ two different methods of allocation, one of them being a conditional mean imputation of some sort and the other quite possibly a simple regression model. The substantial heaping at a single value suggests that the latter method is much better suited to the PSID application.
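The kind of tabulation behind these percentages can be sketched as follows. The list of allocated values is hypothetical, not PSID data: it is constructed only so that a single heap at $30,000 yields shares close to those quoted above, and the function name `heaping_summary` is likewise an assumption. The point it illustrates is that when most round values sit at one amount divisible by both $5,000 and $10,000, the two percentages are nearly equal.

```python
from collections import Counter

def heaping_summary(values):
    """Summarize rounding and heaping in a list of allocated dollar amounts."""
    n = len(values)
    modal_value, modal_count = Counter(values).most_common(1)[0]
    return {
        "pct_div_5000": 100 * sum(v % 5000 == 0 for v in values) / n,
        "pct_div_10000": 100 * sum(v % 10000 == 0 for v in values) / n,
        "modal_value": modal_value,
        "pct_at_mode": 100 * modal_count / n,
    }

# Hypothetical allocated values: a large heap at $30,000, one value at
# $15,000, one round value above $35,000, and 55 distinct non-round amounts.
allocated = [30000] * 43 + [15000, 40000] + [10001 + 613 * i for i in range(55)]

summary = heaping_summary(allocated)
print(summary)
```

A modal share this close to the share of round values is the signature of a single imputed constant, such as a conditional mean, rather than of respondent-style rounding spread across the income range.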


| Income Source and Level of Rounding | CPS | ACS | SIPP | MEPS | NHIS |
|---|---|---|---|---|---|
| Percent divisible by $5,000 | 29.5 | 19.4 | 1.4 | 12.3 | 0.0 |
| Percent divisible by $10,000 | 17.1 | 11.3 | 0.9 | 6.8 | 0.0 |
| Percent of income in range | 83.0 | 86.3 | 94.1 | 84.1 | 78.3 |
| Wages and Salaries | | | | | |
| Percent divisible by $5,000 | 28.3 | 19.3 | 1.0 | NA | NA |
| Percent divisible by $10,000 | 16.4 | 11.2 | 0.6 | NA | NA |
| Percent of income in range | 83.2 | 86.5 | 95.1 | NA | NA |
| Social Security | | | | | |
| Percent divisible by $5,000 | 0.6 | 4.2 | 0.3 | 6.0 | NA |
| Percent divisible by $10,000 | 0.4 | 1.7 | 0.1 | 2.9 | NA |
| Percent of income in range | 100.0 | 100.0 | 100.0 | 100.0 | NA |
| Retirement Income | | | | | |
| Percent divisible by $5,000 | 3.7 | 6.7 | 1.1 | 7.8 | NA |
| Percent divisible by $10,000 | 2.1 | 3.5 | 0.8 | 3.1 | NA |
| Percent of income in range | 96.3 | 96.3 | 99.6 | 100.0 | NA |
| Total Personal Income | | | | | |
| Percent divisible by $5,000 | 7.4 | 13.7 | 0.2 | 5.0 | NA |
| Percent divisible by $10,000 | 4.1 | 7.8 | 0.1 | 2.6 | NA |
| Percent of income in range | 90.3 | 88.3 | 96.7 | 88.3 | NA |
| Total Family Income | | | | | |
| Percent divisible by $5,000 | 6.0 | 11.2 | 0.2 | 5.7 | 0.0 |
| Percent divisible by $10,000 | 3.3 | 6.4 | 0.1 | 3.1 | 0.0 |
| Percent of income in range | 79.7 | 78.6 | 91.3 | 73.0 | 62.9 |

Source: Mathematica Policy Research, from tabulations of the 2003 CPS ASEC supplement, the 2002 ACS, the 2001 SIPP panel, the 2002 Full-year Consolidated MEPS-HC, and the 2003 NHIS.

Note: Amounts reported by respondents are excluded from each source. Family income for the NHIS is based on the NHIS family, which is the level at which such income was allocated.
