The Census Bureau assigned a sample weight to each of the approximately 50,000 households that responded to the March 1998 CPS. This weight indicates the number of households in the population that each sample household represents. Besides a sample household's probability of being selected into the sample, the weight incorporates several other factors. These include an adjustment for nonresponding households and a series of corrections designed to bring the sample into closer agreement with independent estimates of the size, age and sex structure, and racial/ethnic composition of the population. The population controls used in weighting the CPS include only one total that is specific to each state: the number of persons age 16 and older. There are no state controls for the size or composition of the child population or for the composition of the adult population. Because the other controls are applied only at the national level, CPS state estimates of many characteristics are less accurate than they would be if those controls were also applied at the state level.
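How these factors combine can be illustrated in a deliberately simplified form. The sketch below assumes a single adjustment cell, a single population control (persons age 16 and older), and made-up selection probabilities and counts; the function names are hypothetical, and the Census Bureau's actual CPS procedure uses many adjustment cells and several stages of ratio adjustment that are not shown. As in standard survey weighting, each factor multiplies the household's weight.

```python
def base_weight(selection_prob: float) -> float:
    """Inverse of the household's probability of selection into the sample."""
    return 1.0 / selection_prob


def nonresponse_factor(base_weights: list[float], responded: list[bool]) -> float:
    """Shift the weight of nonrespondents onto respondents within one adjustment cell."""
    total = sum(base_weights)
    respondents = sum(w for w, r in zip(base_weights, responded) if r)
    return total / respondents


def ratio_factor(control_total: float, weighted_estimate: float) -> float:
    """Scale weights so the weighted sample count matches an independent population control."""
    return control_total / weighted_estimate


# Hypothetical cell: four eligible households, each selected with probability 1/2,000;
# the fourth household does not respond.
probs = [1 / 2000] * 4
responded = [True, True, True, False]
persons_16_plus = [2, 3, 1]  # persons age 16+ in the three responding households
control_16_plus = 17600.0    # made-up independent population estimate

w0 = [base_weight(p) for p in probs]
# Respondents absorb the nonrespondent's base weight.
w1 = [w * nonresponse_factor(w0, responded) for w, r in zip(w0, responded) if r]
estimate = sum(w * n for w, n in zip(w1, persons_16_plus))
# Ratio-adjust so the weighted count of persons 16+ equals the control.
final = [w * ratio_factor(control_16_plus, estimate) for w in w1]
```

In this toy cell, each responding household ends up representing about 2,933 households, and the weighted count of persons age 16 and older reproduces the control exactly.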
Each observation in the CPS sample was selected to represent households in only one state. For example, a sample household from Maryland with a weight of 4,000 represents 4,000 households in Maryland. When we borrow strength across states by the method of reweighting used here, we create 51 new weights for this Maryland household, and we distribute the household weight of 4,000 across the 51 states. The sample household continues to represent 4,000 households nationally, but it may now represent only 400 households in Maryland and 3,600 households spread across the other states. The other 3,600 Maryland households that this sample household previously represented are now represented by similar households from other states. The sample weights of all 50,000 CPS households are allocated across the 51 states in such a way that a set of state-specific control totals defined and constructed for this application is reproduced in each state.^{1}
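Once every household carries one weight per state, a state tabulation draws on all households, not just those sampled in that state, while national totals are unchanged because each household's state weights still sum to its original weight. The minimal sketch below shows this with two households and only three states; the weights, household values, and function names are hypothetical.

```python
def state_total(state_weights: list[dict[str, float]], values: list[float], state: str) -> float:
    """Weighted total for one state, using every household's weight for that state."""
    return sum(w.get(state, 0.0) * y for w, y in zip(state_weights, values))


def national_total(state_weights: list[dict[str, float]], values: list[float]) -> float:
    """Weighted national total; each household contributes the sum of its state weights."""
    return sum(sum(w.values()) * y for w, y in zip(state_weights, values))


# Hypothetical reweighted records (only three states shown).
state_weights = [
    {"MD": 400.0, "VA": 2000.0, "PA": 1600.0},  # sampled in Maryland, original weight 4,000
    {"MD": 300.0, "VA": 2700.0, "PA": 0.0},     # sampled in Virginia, original weight 3,000
]
uninsured_children = [1.0, 2.0]  # made-up household values

print(state_total(state_weights, uninsured_children, "MD"))  # 1000.0
print(national_total(state_weights, uninsured_children))      # 10000.0
```

Here the Maryland household contributes only 400 of its 4,000 to the Maryland total, the Virginia household contributes 300, and the national total is the same as it would be under the original weights.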
The step that creates the 51 state weights for each CPS household is in fact the last of 10 steps. While the assignment of 51 weights per sample household may sound complex, the nine steps that precede it constitute the bulk of the work in producing the reweighted database, and most of the detailed description that follows pertains to those nine steps. First, however, we review the basic design decisions that precede these estimation steps.
The specification of control totals is of great importance in determining how well a particular reweighting of a CPS database accomplishes its objective of supporting accurate state estimates. The process of specifying these control totals involves the interaction between what we would like to control, given the tabulations that we intend to produce, and what we are best able to control, given the data that are available.^{2} Control totals developed from external sources with little or no sampling error yield the greatest improvement in the accuracy of state estimates, provided that they are relevant to the tabulations that we wish to produce. Such controls allow us to borrow strength across data sources. Clearly, if we had access to error-free counts of uninsured children in each state, we could greatly improve the accuracy of our state tabulations of uninsured children by poverty level and age. That we lack such controls, however, is one reason why we must use other methods of estimation.
The state control totals and the corresponding household-level characteristics to which the controls are applied are displayed in Table A.1. The variables are grouped according to the source of the control totals. For the variables that capture the age, race, and ethnic structure of a state's child population, we use population totals derived from administrative (mainly vital records) and decennial census data. The totals for the class A variables have essentially no sampling error, which, as we said, is a highly desirable property. Unfortunately, there are no such totals for the numbers of children in various poverty categories or the numbers of uninsured children. For those totals, we must rely on sample-based estimates. But rather than using direct sample estimates from the CPS, we improve their precision by using empirical Bayes shrinkage methods to produce totals for the class E variables. These methods average direct sample estimates with predictions from regression models. The dependent variable in such a regression is the direct sample estimate, and the predictors are state characteristics measured by decennial census and administrative records data (e.g., the poverty rate according to the census; the infant mortality rate, obtained from vital records; or the ratio of children enrolled in Medicaid, according to Medicaid administrative data, to the total population of children, derived from a combination of census data and administrative data).
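The averaging step can be illustrated with one common empirical Bayes formulation, an area-level (Fay-Herriot-type) model; this section does not spell out the exact model, so that choice is an assumption. Each state's direct estimate y_s is combined with a regression prediction, with weight gamma_s = sigma_v^2 / (sigma_v^2 + D_s) on the direct estimate, where D_s is the sampling variance of y_s and sigma_v^2 is the variance of the model's state effects. To keep the sketch short, it uses a single predictor, treats sigma_v^2 as known rather than estimating it, omits any variance smoothing, and uses made-up numbers; the function name `eb_shrink` is hypothetical.

```python
def eb_shrink(direct, sampling_var, predictor, model_var):
    """Fay-Herriot-style shrinkage with one state-level predictor (illustrative only).

    direct       -- direct CPS estimate y_s for each state
    sampling_var -- estimated sampling variance D_s of each direct estimate
    predictor    -- state covariate x_s from census or administrative data
    model_var    -- variance sigma_v^2 of the state effects, assumed known here
    Returns (shrunken estimates, regression predictions).
    """
    # Weighted least squares fit of y_s = b0 + b1 * x_s, weighting each state by
    # the inverse of its total variance (model variance plus sampling variance).
    wts = [1.0 / (model_var + d) for d in sampling_var]
    sw = sum(wts)
    xbar = sum(w * x for w, x in zip(wts, predictor)) / sw
    ybar = sum(w * y for w, y in zip(wts, direct)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(wts, predictor))
    if sxx == 0:
        raise ValueError("predictor must vary across states")
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(wts, predictor, direct))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    estimates, predictions = [], []
    for y, d, x in zip(direct, sampling_var, predictor):
        pred = b0 + b1 * x
        gamma = model_var / (model_var + d)  # weight on the direct estimate
        estimates.append(gamma * y + (1 - gamma) * pred)
        predictions.append(pred)
    return estimates, predictions


# Made-up example: four states, one covariate.
direct = [0.10, 0.20, 0.15, 0.30]
sampling_var = [0.0001, 0.001, 0.004, 0.0002]
predictor = [0.12, 0.18, 0.14, 0.25]
est, pred = eb_shrink(direct, sampling_var, predictor, model_var=0.0005)
```

States with precise direct estimates (small D_s) keep most of their direct estimate, while states with noisy direct estimates are pulled toward the regression prediction.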
For the last two control variables listed in Table A.1, the class D variables, we use direct sample estimates of their totals. The main purpose of including these two variables is to limit somewhat the borrowing of strength that reweighting allows. Specifically, for the one state (the District of Columbia) with no households outside large central cities with substantial black or Hispanic populations, the control total for such households is zero, so no District of Columbia weight is given to a household that is not from such a central city.^{3} Likewise, for the 21 states that have no large central cities with substantial black or Hispanic populations, the control total for central-city households is zero, so no weight is given to a household from such a central city in another state. For example, no Wyoming weight is given to a household from New York City.
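One standard way to spread each household's weight across states so that state control totals are reproduced is iterative proportional fitting (raking), which alternates between matching each state's controls and restoring each household's original weight. This section does not say which algorithm the reweighting uses, so raking here is an assumption. The sketch below uses only a class D style indicator (central city or not) and made-up data; with count variables such as numbers of children, the state step would itself need to iterate. It shows why a zero control total excludes households: a zero adjustment factor leaves a weight of zero that no later rescaling can undo. The function name `rake_across_states` is hypothetical.

```python
def rake_across_states(weights, category, controls, n_iter=100, tol=1e-9):
    """Split each household's weight across states by raking (illustrative only).

    weights  -- original (national) weight W_h of each household
    category -- control category of each household, e.g. "cc" or "other"
    controls -- {state: {category: control total}}
    Returns one {state: weight} dict per household.
    """
    states = list(controls)
    # Start by spreading each household's weight evenly over the states.
    w = [{s: W / len(states) for s in states} for W in weights]
    for _ in range(n_iter):
        # State step: scale each state's weights so every category total hits its control.
        # The categories partition the households, so each can be scaled independently.
        for s in states:
            for c, target in controls[s].items():
                members = [h for h, cat in enumerate(category) if cat == c]
                current = sum(w[h][s] for h in members)
                factor = target / current if current > 0 else 0.0  # nothing to scale if 0
                for h in members:
                    w[h][s] *= factor  # a zero control sets these weights to zero
        # Household step: rescale so each household's state weights sum to its original weight.
        for h, W in enumerate(weights):
            row_total = sum(w[h].values())
            if row_total == 0:
                raise ValueError(f"household {h} has zero weight in every state")
            for s in states:
                w[h][s] *= W / row_total
        # Stop once every state control is reproduced after the household step.
        gap = max(
            abs(sum(w[h][s] for h, cat in enumerate(category) if cat == c) - target)
            for s in states
            for c, target in controls[s].items()
        )
        if gap < tol:
            break
    return w


# Made-up data; the controls sum across states to each category's total weight (150).
weights = [100.0, 50.0, 80.0, 70.0]
category = ["cc", "cc", "other", "other"]
controls = {
    "DC": {"cc": 60.0, "other": 0.0},  # no households outside such central cities
    "WY": {"cc": 0.0, "other": 90.0},  # no such central city
    "NY": {"cc": 90.0, "other": 60.0},
}
w = rake_across_states(weights, category, controls)
print(w[0]["WY"], w[2]["DC"])  # 0.0 0.0
```

The central-city households receive no Wyoming weight and the other households receive no District of Columbia weight, yet every household still represents its full original weight across the three states.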
CONTROL VARIABLES/TOTALS USED IN REWEIGHTING

Household Control Variable | State Control Total |
---|---|
Class A: Variables for which we use Administrative estimates of totals | |
Number of children age 0 | Population age 0 |
Number of children ages 1-5 | Population ages 1-5 |
Number of children ages 6-13 | Population ages 6-13 |
Number of children ages 14-18 | Population ages 14-18 |
Number of Hispanic children ages 0-18 | Hispanic population ages 0-18 |
Number of non-Hispanic black children ages 0-18 | Non-Hispanic black population ages 0-18 |
Class E: Variables for which we use Empirical Bayes shrinkage estimates of totals | |
Number of children < 50% FPL | Number of children < 50% FPL |
Number of children 50 to < 100% FPL | Number of children 50 to < 100% FPL |
Number of children 100 to < 200% FPL | Number of children 100 to < 200% FPL |
Number of children 200 to < 350% FPL | Number of children 200 to < 350% FPL |
Number of uninsured children < 100% FPL | Number of uninsured children < 100% FPL |
Number of uninsured children 100 to < 200% FPL | Number of uninsured children 100 to < 200% FPL |
Number of uninsured children 200% FPL or greater | Number of uninsured children 200% FPL or greater |
Class D: Variables for which we use Direct sample estimates of totals | |
Indicator that household is in a large central city with a substantial black or Hispanic population | Number of households in large central cities with substantial black or Hispanic populations |
Indicator that household is not in a large central city with a substantial black or Hispanic population | Number of households not in large central cities with substantial black or Hispanic populations |

Note: FPL = federal poverty level.
The steps needed to derive control totals can become complex for at least two reasons. First, empirical Bayes estimation is itself complex and may also include steps that entail elaborate operations--such as smoothing estimated variances. Second, if a state control is obtained from an external source or by using empirical Bayes estimation, its introduction as a control is likely to change--and generally improve--the estimates of other totals to which it is related. For example, the CPS does not control the size of the Hispanic population at the state level, and Hispanic children tend to have higher uninsured rates than non-Hispanic children. By introducing estimates of the state Hispanic population as controls, we may improve the precision of the state estimates of uninsured children that we also want to use as controls. Rather than introducing all of the controls simultaneously in one step, it is desirable to introduce them sequentially so that the controls introduced at one step can allow us to obtain better estimates of controls that can be introduced--along with the earlier controls--at a later step.
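The benefit of sequencing can be seen in a toy, single-state example. In the sketch below (made-up data; the function names and the grouping are hypothetical), households are simply split into those whose children are Hispanic and all others, and weights are scaled within each group to an assumed control total for children in that group; the actual steps control many totals jointly. Introducing the Hispanic control raises the weight on a household with an uninsured child, so the direct estimate of uninsured children that a later empirical Bayes step would use changes as well.

```python
def poststratify(weights, group, children, child_controls):
    """Scale weights within each group so weighted child counts match the group's control."""
    current = {}
    for w, g, n in zip(weights, group, children):
        current[g] = current.get(g, 0.0) + w * n
    return [w * child_controls[g] / current[g] for w, g in zip(weights, group)]


def weighted_total(weights, values):
    """Weighted sum of a household-level variable."""
    return sum(w * v for w, v in zip(weights, values))


# Made-up single-state data: four households with two children each.
weights = [1000.0, 1000.0, 1000.0, 1000.0]
group = ["hispanic", "hispanic", "other", "other"]
children = [2, 2, 2, 2]
uninsured = [1, 0, 0, 0]  # uninsured children in each household

before = weighted_total(weights, uninsured)  # 1000.0
# Earlier step: introduce (hypothetical) controls for Hispanic and other children.
step1 = poststratify(weights, group, children, {"hispanic": 6000.0, "other": 4000.0})
after = weighted_total(step1, uninsured)     # 1500.0
# `after`, not `before`, is the direct estimate a later shrinkage step would start from.
```

Because the earlier control reweights the sample toward the group with higher uninsured rates, estimating the uninsured controls after it is introduced, rather than simultaneously, lets those estimates reflect the improved weights.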