
A. Accurate Direct Estimates for Every State

Ideally, the CPS, SIPP, and NHIS would be able to provide direct estimates of adequate precision for every state. Direct estimates are the standard survey design-based estimates, such as the sample mean, traditionally produced by government agencies. They are design-based, as opposed to indirect estimates, which depend on statistical models (Schaible et al., 1993). As discussed below, however, these surveys are not large enough to produce accurate direct estimates for every state.
A key factor in producing direct estimates for states is the need to select the sample from strata that respect state boundaries. When strata cross state boundaries, state estimators must either use respondents from other states to represent part of the desired state, or must make assumptions about the relationships across strata within the state. Both of these procedures are problematic. CPS and NHIS use state boundaries in defining sampling strata; however, SIPP does not use state stratification. A project is currently underway at Westat to produce a methodology that will allow the Bureau of the Census to make state estimates from all waves of SIPP and from the SPD for all states. However, except for the largest states, these estimates will be subject to potentially large variances. The methodology is based on a set of assumptions about the strata within each state, and therefore may produce significant bias in the estimates for any state, even large ones.
It should be noted that the precision of state estimates (i.e., standard errors) obtained for these surveys will vary considerably from state to state. This is because precision is directly proportional to the square root of the sample size in the state. Thus, estimates will be twice as precise for a state with four times the sample size (assuming the same underlying distribution in both states). While the CPS and NHIS use state stratification, the states are not all allocated the same sample size. Rather, the allocation of sample size to the states is made with the aim of balancing the precision requirements of both state and national estimates. As a result, there are great disparities in sample size by state. The March 1996 CPS interviewed almost 13,000 persons in California, but fewer than 1,200 in the District of Columbia. The 1993 SIPP panel has over 6,000 persons in California but barely 100 in the District of Columbia. While the 1996 SIPP panel is appreciably larger, it has similar differences. Thus, the precision of CPS estimates for California is 3.5 times greater than for DC, and for SIPP it is 7.5 times greater.
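The square-root relationship above can be sketched in a few lines. This is a minimal illustration, not part of the original report; the sample sizes are the approximate figures cited in the text for the March 1996 CPS and the 1993 SIPP panel.

```python
import math

# Approximate sample sizes cited in the text (California vs. District of Columbia)
cps = {"California": 13_000, "District of Columbia": 1_200}
sipp = {"California": 6_000, "District of Columbia": 100}

def precision_ratio(n_large, n_small):
    """Relative precision of two direct estimates with the same underlying
    distribution and design effect: proportional to sqrt(sample size)."""
    return math.sqrt(n_large / n_small)

print(f"CPS:  {precision_ratio(cps['California'], cps['District of Columbia']):.1f}x")
print(f"SIPP: {precision_ratio(sipp['California'], sipp['District of Columbia']):.1f}x")
```

With these rounded sample sizes the ratios come out near the 3.5 and 7.5 figures quoted in the text; the exact published ratios reflect the actual (unrounded) state sample sizes.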
In considering the use of the CPS, NHIS, and SIPP to produce the desired estimates, it needs to be recognized that the estimates produced by the three surveys will differ. These differences are partly due to the different ways the underlying concepts, such as poverty and disability, are measured, and partly due to the differing data collection procedures. For example, the estimates of the percent of households in poverty differ between SIPP and CPS because of the difference in the methods of data collection (SIPP by repeated interviews, CPS by annual recall), particularly for the income data (Ruggles, 1990). Kalton and Mohadjer (1994) examined the differences in disability rates under the distinct definitions used by the three surveys.


B. Precision of Estimates

It is impossible to define a single level of precision that is necessary for all estimates. The level of precision that is necessary depends on the use of the estimates. Different Federal agencies have different standards for their data. Some have standards that only determine the level of precision for estimates to be used in analyses, while others have standards for precision for publication. For example, the National Center for Health Statistics has a requirement that coefficients of variation (the standard error of an estimate divided by the mean) not exceed 30 percent. The Center reports and interprets the estimates that have at least this level of precision. Less precise estimates may be reported but are not interpreted.
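The NCHS reporting rule described above can be expressed directly. The sketch below is illustrative only; the estimate and standard error are hypothetical numbers, not figures from any of the three surveys.

```python
def coefficient_of_variation(std_error, estimate):
    """Coefficient of variation: the standard error of an estimate
    divided by the estimate itself, expressed as a percentage."""
    return 100 * std_error / estimate

# Hypothetical example: a poverty-rate estimate of 12 percent
# with a standard error of 4.2 percentage points.
cv = coefficient_of_variation(4.2, 12.0)          # 35 percent
meets_nchs_standard = cv <= 30                    # NCHS interprets only if CV <= 30
print(cv, meets_nchs_standard)
```

Under the NCHS rule, an estimate with a 35 percent CV could be reported but would not be interpreted.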
The precision of a direct estimator is a function of two parameters: the standard deviation of the population distribution and the effective sample size. The precision of an estimate for a characteristic that is highly variable in the population will be lower than that for a characteristic that is fairly consistent across the population. The variability of the characteristic is measured by the standard deviation. Similarly, a larger effective sample size will provide more precise estimates than a smaller one.
When estimating percentages (as for all four variables examined in Section III of this report), the characteristic is dichotomous, a binomial variable (e.g., in poverty, not in poverty). In this case the standard deviation is a simple function of the percentage with the characteristic. The standard deviation is

    S = sqrt( P (100 - P) ) / 100,

where P is the percentage with the characteristic in the population. The closer the true percentage (e.g., percent in poverty) is to 50 percent, the larger the standard deviation. The closer the percentage is to either 0 or 100 percent, the smaller the standard deviation. For example, the standard deviation when P = 50 percent is 0.50, while the standard deviation when P = 1 percent is 0.10.
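The binomial standard deviation can be checked against the two worked values in the text:

```python
import math

def binomial_sd(p_percent):
    """Standard deviation of a 0/1 (binomial) characteristic,
    with P given as a percentage of the population."""
    p = p_percent / 100
    return math.sqrt(p * (1 - p))

print(binomial_sd(50))   # largest at P = 50 percent: 0.50
print(binomial_sd(1))    # near the extremes it shrinks: about 0.10
```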
The effective sample size is the actual sample size divided by the design effect. The design effect is a factor that reflects the effect on the precision of a survey estimate due to the difference between the sample design actually used to collect the data and a simple random sample of respondents. National in-person household surveys, such as the three considered here, are conducted as stratified, multistage, clustered, area-probability surveys. By clustering the sampled households in a limited number of geographic areas, the cost of data collection is significantly reduced. However, respondents in the same cluster are likely to be somewhat similar to one another. As a result, a clustered sample will generally not reflect the entire population as "effectively." Before selecting the sample of clusters, the country is stratified based on characteristics believed to be correlated with the survey variables of greatest interest. This stratification produces more precise survey estimates for targeted domains than an unstratified design. The design effect reflects all aspects of the complex sample design. While the design effect is different for each variable, experience with these surveys indicates that the variables under study will have reasonably similar design effects.
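The pieces above combine into a standard error: the standard deviation divided by the square root of the effective sample size. The sketch below is a minimal illustration; the state sample size, design effect, and poverty percentage are hypothetical values chosen for the example, not figures from the surveys.

```python
import math

def effective_sample_size(n, design_effect):
    """Actual sample size deflated by the design effect."""
    return n / design_effect

def standard_error(sd, n, design_effect):
    """SE of a direct estimate: sd over sqrt of the effective sample size."""
    return sd / math.sqrt(effective_sample_size(n, design_effect))

# Hypothetical state: P = 20 percent in poverty, n = 2,000, design effect 1.5
sd = math.sqrt(0.20 * 0.80)                  # binomial standard deviation, 0.40
se = standard_error(sd, 2_000, 1.5)          # about 1.1 percentage points
print(se)
```

A design effect above 1 (typical of clustered area-probability samples) inflates the standard error relative to a simple random sample of the same size.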
