Deriving State-Level Estimates from Three National Surveys: A Statistical Assessment and State Tabulations

A. Accurate Direct Estimates for Every State


Ideally, the CPS, SIPP, and NHIS would be able to provide direct estimates of adequate precision for every state. Direct estimates are the standard design-based survey estimates, such as the sample mean, traditionally produced by government agencies. They rely only on the sample design and the data collected in the state, in contrast to indirect estimates, which depend on statistical models (Schaible et al., 1993). As discussed below, these surveys are not large enough to produce accurate direct estimates for every state.
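To make the idea of a direct estimate concrete, the following minimal sketch computes a weighted sample proportion separately for each state, using only that state's own respondents. The records, weights, and the function name `direct_estimates` are hypothetical and chosen purely for illustration; the surveys' actual estimation procedures may involve additional weighting adjustments that are not shown here.

```python
from collections import defaultdict


def direct_estimates(records):
    """Return a weighted mean per state from (state, weight, value) records.

    Each state's estimate uses only respondents sampled in that state,
    which is what makes it a direct (design-based) estimate.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for state, weight, value in records:
        weighted_sum[state] += weight * value
        weight_total[state] += weight
    return {s: weighted_sum[s] / weight_total[s] for s in weight_total}


# Hypothetical respondents: (state, sampling weight, 1 if in poverty else 0).
sample = [
    ("CA", 1500.0, 1),
    ("CA", 2500.0, 0),
    ("DC", 400.0, 1),
    ("DC", 600.0, 1),
    ("DC", 1000.0, 0),
]

estimates = direct_estimates(sample)
for state, rate in sorted(estimates.items()):
    print(f"{state}: estimated poverty rate = {rate:.3f}")
```

With so few respondents per state, such an estimate would be highly variable, which is the precision problem this section describes.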

A key factor in producing direct estimates for states is the need to select the sample from strata that respect state boundaries. When strata cross state boundaries, state estimators must either use respondents from other states to represent part of the state of interest or rely on assumptions about the relationships across strata within the state. Both procedures are problematic. CPS and NHIS use state boundaries in defining sampling strata; SIPP, however, does not use state stratification. A project is currently underway at Westat to develop a methodology that will allow the Bureau of the Census to make estimates for all states from all waves of SIPP and from the SPD. However, except for the largest states, these estimates will be subject to potentially large variances. Moreover, because the methodology rests on a set of assumptions about the strata within each state, it may introduce substantial bias into the estimates for any state, even a large one.

The precision of state estimates (as measured by their standard errors) obtained from these surveys will vary considerably from state to state. This is because the standard error of an estimate is inversely proportional to the square root of the sample size in the state, so precision improves with the square root of the sample size. Thus, estimates will be twice as precise (the standard error is halved) for a state with four times the sample size, assuming the same underlying distribution in both states. While the CPS and NHIS use state stratification, the states are not all allocated the same sample size. Rather, the sample is allocated to the states with the aim of balancing the precision requirements of both state and national estimates. As a result, there are great disparities in sample size by state. The March 1996 CPS interviewed almost 13,000 persons in California, but fewer than 1,200 in the District of Columbia. The 1993 SIPP panel has over 6,000 persons in California and barely 100 in the District of Columbia. While the 1996 SIPP panel is appreciably larger, it has similar differences. Thus, the precision of CPS estimates for California is roughly 3.5 times that for DC, and for SIPP it is roughly 7.5 times.
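The square-root relationship can be checked with a short calculation. The sketch below assumes simple random sampling with the same underlying distribution in both states and ignores design effects. The function name `relative_precision` is hypothetical, and the inputs are the rounded sample counts quoted above, so the results (about 3.3 for CPS and 7.7 for SIPP) only approximate the ratios cited in the text, which were presumably computed from exact counts.

```python
import math


def relative_precision(n_large: int, n_small: int) -> float:
    """Ratio of standard errors (small-sample state over large-sample state).

    Assumes the standard error is proportional to 1 / sqrt(n), so the
    ratio reduces to sqrt(n_large / n_small).
    """
    if n_large <= 0 or n_small <= 0:
        raise ValueError("sample sizes must be positive")
    return math.sqrt(n_large / n_small)


# Four times the sample size gives twice the precision.
quadruple = relative_precision(4_000, 1_000)

# Rounded counts from the text: California versus District of Columbia.
cps_ratio = relative_precision(13_000, 1_200)   # March 1996 CPS
sipp_ratio = relative_precision(6_000, 100)     # 1993 SIPP panel

print(f"4x sample size: {quadruple:.2f}")
print(f"CPS  CA vs DC:  {cps_ratio:.2f}")
print(f"SIPP CA vs DC:  {sipp_ratio:.2f}")
```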

In considering the use of the CPS, NHIS, and SIPP to produce the desired estimates, it is important to recognize that the estimates produced by the three surveys will differ. These differences are due partly to the different ways in which underlying concepts such as poverty and disability are measured and partly to differing data collection procedures. For example, estimates of the percentage of households in poverty differ between SIPP and CPS because of the difference in the methods of data collection (SIPP by repeated interviews, CPS by annual recall), particularly for the income data (Ruggles, 1990). Kalton and Mohadjer (1994) examined the differences in disability rates under the distinct definitions used by the three surveys.
