In this Technical Appendix we document the methodology employed in creating the state estimates presented in this report. We begin with a brief overview and then present a step-by-step description of the procedures that we followed.

A. Overview

The Census Bureau assigned a sample weight to each of the approximately 50,000 households that responded to the March 1998 CPS. This weight indicates the number of households in the population that each sample household represents. The weight incorporates a number of factors in addition to a sample household's probability of being selected into the sample. These factors include an adjustment for nonresponding households and a series of corrections designed to bring the sample into closer agreement with independent estimates of the size, age and sex structure, and racial/ethnic composition of the population. The population controls used in weighting the CPS include only one total that is specific to each state: the number of persons age 16 and older. There are no state controls for the size or composition of the child population or the composition of the adult population. While several controls are applied at the national level, CPS state estimates of many characteristics are less accurate than they would be if controls were applied at the state level.
Each observation in the CPS sample was selected to represent households in only one state. For example, a sample household from Maryland with a weight of 4,000 represents 4,000 households in Maryland. When we borrow strength across states by the method of reweighting used here, we create 51 new weights for this Maryland household, and we distribute the household weight of 4,000 across the 51 states. The sample household continues to represent 4,000 households nationally, but it may now represent only 400 households in Maryland and 3,600 households spread across the other states. The other 3,600 Maryland households that this sample household previously represented are now represented by similar households from other states. The sample weights of all 50,000 CPS households are allocated across the 51 states in such a way that a set of state-specific control totals defined and constructed for this application is reproduced in each state.^{1}
The step that creates the 51 state weights for each CPS household is in fact the last of 10 steps. While the assignment of 51 weights per sample household may sound complex, the nine steps that precede it constitute the bulk of the work in producing the reweighted database, and most of the detailed description that follows pertains to those nine steps. First, however, we review the basic design decisions that precede these estimation steps.
The specification of control totals is of great importance in determining how well a particular reweighting of a CPS database accomplishes its objective of supporting accurate state estimates. The process of specifying these control totals involves the interaction between what we would like to control, given the tabulations that we intend to produce, and what we are best able to control, given the data that are available.^{2} Control totals developed from external sources with little or no sampling error yield the greatest improvement in the accuracy of state estimates, provided that they are relevant to the tabulations that we wish to produce. Such controls allow us to borrow strength across data sources. Clearly, if we had access to error-free counts of uninsured children in each state, we could greatly improve the accuracy of our state tabulations of uninsured children by poverty level and age. That we lack such controls, however, is one reason why we must use other methods of estimation.
The state control totals and the corresponding household-level characteristics to which the controls are applied are displayed in Table A.1. The variables are grouped according to the source of the control totals. For the variables that capture the age, race, and ethnic structure of a state's child population, we use population totals derived from administrative (mainly vital records) and decennial census data. The totals for the class A variables have essentially no sampling error, which, as we said, is a highly desirable property. Unfortunately, there are no such totals for the numbers of children in various poverty categories or the numbers of uninsured children. For those totals, we must rely on sample-based estimates. But rather than using direct sample estimates from the CPS, we improve their precision by using empirical Bayes shrinkage methods to produce totals for the class E variables. These methods average direct sample estimates with predictions from regression models. The dependent variable in such a regression is the direct sample estimate, and the predictors are state characteristics measured by decennial census and administrative records data (e.g., the poverty rate according to the census; the infant mortality rate, obtained from vital records; or the ratio of children enrolled in Medicaid, according to Medicaid administrative data, to the total population of children, derived from a combination of census data and administrative data).
For the last two control variables listed in Table A.1, the class D variables, we use direct sample estimates of their totals. The main purpose for including these two variables is to restrict somewhat the borrowing of strength from reweighting. Specifically, for the one state (the District of Columbia) with no households outside large central cities with substantial black or Hispanic populations, no weight is given to a household if it is not from such a central city.^{3} Likewise, for the 21 states that have no large central cities with substantial black or Hispanic populations, no weight is given to a household from such a central city in another state. For example, no Wyoming weight is given to a household from New York City.
TABLE A.1
CONTROL VARIABLES/TOTALS USED IN REWEIGHTING

Household Control Variable | State Control Total

Class A: Variables for which we use Administrative estimates of totals
Number of children age 0 | Population age 0
Number of children ages 1-5 | Population ages 1-5
Number of children ages 6-13 | Population ages 6-13
Number of children ages 14-18 | Population ages 14-18
Number of Hispanic children ages 0-18 | Hispanic population ages 0-18
Number of non-Hispanic black children ages 0-18 | Non-Hispanic black population ages 0-18

Class E: Variables for which we use Empirical Bayes shrinkage estimates of totals
Number of children < 50% FPL | Number of children < 50% FPL
Number of children 50 to < 100% FPL | Number of children 50 to < 100% FPL
Number of children 100 to < 200% FPL | Number of children 100 to < 200% FPL
Number of children 200 to < 350% FPL | Number of children 200 to < 350% FPL
Number of uninsured children < 100% FPL | Number of uninsured children < 100% FPL
Number of uninsured children 100 to < 200% FPL | Number of uninsured children 100 to < 200% FPL
Number of uninsured children 200% FPL or greater | Number of uninsured children 200% FPL or greater

Class D: Variables for which we use Direct sample estimates of totals
Indicator that household is in a large central city with a substantial black or Hispanic population | Number of households in large central cities with substantial black or Hispanic populations
Indicator that household is not in a large central city with a substantial black or Hispanic population | Number of households not in large central cities with substantial black or Hispanic populations

The steps needed to derive control totals can become complex for at least two reasons. First, empirical Bayes estimation is itself complex and may also include steps that entail elaborate operations, such as smoothing estimated variances.
Second, if a state control is obtained from an external source or by using empirical Bayes estimation, its introduction as a control is likely to change (and generally improve) the estimates of other totals to which it is related. For example, the CPS does not control the size of the Hispanic population at the state level, and Hispanic children tend to have higher uninsured rates than non-Hispanic children. By introducing estimates of the state Hispanic population as controls, we may improve the precision of the state estimates of uninsured children that we also want to use as controls. Rather than introducing all of the controls simultaneously in one step, it is desirable to introduce them sequentially so that the controls introduced at one step can allow us to obtain better estimates of controls that can be introduced, along with the earlier controls, at a later step.


B. Step-by-Step Procedure

Development of the reweighted database required 10 steps:
 1. Derive estimates of Class A totals
 2. Adjust the weights within each state to reproduce the totals derived in Step 1
 3. Derive direct sample estimates of Class E totals using the weights from Step 2
 4. Select regression models to predict the Class E totals
 5. Derive empirical Bayes shrinkage estimates of Class E totals
 6. Adjust the weights within each state to reproduce the totals from Steps 1 and 5
 7. Derive direct sample estimates of Class D totals using the weights from Step 6
 8. Obtain adjusted totals for the Class A variables pertaining to numbers of Hispanic children and non-Hispanic black children
 9. Adjust the weights within each state to reproduce the totals from Step 1 (for the first four Class A variables), Step 8 (for the last two Class A variables), Step 5 (for the Class E variables), and Step 7 (for the Class D variables)
 10. Reweight the March 1998 CPS database from Step 9 to borrow strength across states, using the control totals from Step 1 for Class A variables, Step 5 for Class E variables, and Step 7 for Class D variables
We describe the 10 steps in detail below.
1. Derive Estimates of Class A Totals
The source of Class A totals was the Census Bureau's state population estimates by age, race, sex, and Hispanic origin. These estimates are based on the most recent decennial census and carried forward by a combination of vital statistics and other administrative data.
The population estimates published by the Census Bureau are sometimes described as "census-level" estimates because they are intended to represent the population counts that would be obtained if a decennial census were conducted. As is well known, there is a net undercount of the population by the census when it is actually conducted, and there would surely be a net undercount if a census were conducted sometime between 1990 and 2000. The Census Bureau's estimate of what the net undercount would have been had a census been conducted in, say, 1997 is the estimated undercount in the 1990 census. Accordingly, the Bureau has developed and published a "net population adjustment matrix" that contains for each state the estimated undercount by single year of age, sex, race, and Hispanic origin. When the Bureau publishes population estimates, the net undercounts are subtracted from the Bureau's best estimates of the actual population totals to obtain the published totals. To develop adjusted population estimates, we "undid" this last step, adding the net undercounts to the published population totals.
The published estimates refer to July 1 of each year. We averaged successive July 1 estimates to obtain estimates for January 1, the end of the reference period for much of the data collected in the March CPS.^{4}
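The two adjustments in this step are simple arithmetic, and a minimal sketch may make them concrete. The function names and all figures below are hypothetical and are not taken from the report; the sketch assumes that a simple mean of the two neighboring July 1 values is what "averaged" means here.

```python
def adjusted_population(published, net_undercount):
    """Add the estimated net undercount back to a published (census-level) total."""
    return published + net_undercount


def january_1_estimate(july_1_prior, july_1_next):
    """Average two successive July 1 estimates to approximate the January 1 between them."""
    return (july_1_prior + july_1_next) / 2.0


# Hypothetical figures for one state and one age group (illustration only).
july_1997 = adjusted_population(published=100_000, net_undercount=2_000)
july_1998 = adjusted_population(published=103_000, net_undercount=2_100)
jan_1998 = january_1_estimate(july_1997, july_1998)
print(jan_1998)
```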
2. Control the Weights to the Totals Derived in Step 1
Applying the controls from Step 1 may alter individual state estimates of uninsured children and low-income children. Before developing empirical Bayes estimates of uninsured children and low-income children, therefore, we "raked" the CPS weights to the Class A totals derived in Step 1. Raking is a widely used procedure for adjusting sample weights. For a specified set of characteristics of the sampled population, it brings weighted sums obtained from the sample into agreement with totals obtained from external sources. The raking was done within each state; that is, there was no borrowing of strength across states at this point.
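As an illustration of raking in general, the following sketch scales weights one control dimension at a time until the weighted sums match every control total. It is a simplification, not the procedure used for this report: it assumes each record falls in exactly one category per dimension, whereas the controls in Table A.1 are household-level counts of children. The function name, record layout, and figures are hypothetical.

```python
def rake(records, controls, max_iter=100, tol=1e-9):
    """Iteratively scale weights so weighted sums match each control total.

    records  : list of dicts with a "weight" and one category per dimension
    controls : {dimension: {category: target total}}
    Returns a new list of weights; the input records are not modified.
    """
    weights = [r["weight"] for r in records]
    for _ in range(max_iter):
        worst = 0.0
        for dim, targets in controls.items():
            # Current weighted sum in each category of this dimension.
            sums = {cat: 0.0 for cat in targets}
            for r, w in zip(records, weights):
                sums[r[dim]] += w
            # Scale each record by target / current for its own category.
            # A category with no sample records would divide by zero here.
            for i, r in enumerate(records):
                weights[i] *= targets[r[dim]] / sums[r[dim]]
            worst = max(worst, max(abs(targets[c] - sums[c]) for c in targets))
        if worst < tol:  # every margin already matched before adjustment
            break
    return weights


# Hypothetical records and control totals for one state.
sample = [
    {"weight": 100.0, "age": "0-5", "ethnicity": "hispanic"},
    {"weight": 100.0, "age": "0-5", "ethnicity": "other"},
    {"weight": 100.0, "age": "6-18", "ethnicity": "hispanic"},
    {"weight": 100.0, "age": "6-18", "ethnicity": "other"},
]
totals = {
    "age": {"0-5": 300.0, "6-18": 700.0},
    "ethnicity": {"hispanic": 200.0, "other": 800.0},
}
raked = rake(sample, totals)
```

Because each pass multiplies weights by a ratio of target to current sum, a category with very few sample records can receive a very large upward adjustment.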
In an effort to avoid extremely large upward adjustments to weights in states with small numbers of Hispanics or blacks, we used four different raking models: (1) rake to all totals except the totals for Hispanics and non-Hispanic blacks, (2) rake to all totals except the total for Hispanics, (3) rake to all totals except the total for non-Hispanic blacks, and (4) rake to all totals. In general, we used the first model if a state has few Hispanics and non-Hispanic blacks. We used the second and third models if a state has relatively few Hispanics or relatively few non-Hispanic blacks, respectively. We used the fourth model for the remaining states.^{5}
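The choice among the four raking models amounts to deciding which Class A controls to drop for a given state. A minimal sketch follows; the control names are hypothetical labels for the Class A variables in Table A.1, and the two flags stand in for whatever criterion was used to judge that a state has "few" members of a group, which is not specified here.

```python
# Hypothetical labels for the six Class A controls in Table A.1.
ALL_CONTROLS = [
    "age_0", "age_1_5", "age_6_13", "age_14_18",
    "hispanic", "nonhispanic_black",
]


def controls_for_state(few_hispanics, few_nonhispanic_blacks):
    """Return the Class A controls to rake to under the four-model scheme."""
    excluded = set()
    if few_hispanics:
        excluded.add("hispanic")           # models 1 and 2
    if few_nonhispanic_blacks:
        excluded.add("nonhispanic_black")  # models 1 and 3
    return [c for c in ALL_CONTROLS if c not in excluded]


model_1 = controls_for_state(True, True)    # drop both group totals
model_4 = controls_for_state(False, False)  # rake to all totals
```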
3. Derive Direct Sample Estimates of Class E Totals
Using the weights from Step 2, we calculated direct sample estimates of the Class E totals, which are needed in Steps 4 and 5.
We estimated percentages rather than counts (that is, the percentage uninsured rather than the total number) in order to standardize for state population size, which is necessary for the next two steps. For each of the four income variables, the denominator of the percentage is the total number of children. For the three uninsured variables, the denominator is the total number of children in the indicated poverty category.
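The two kinds of denominators can be illustrated with a short sketch. The record layout (a child-level weight, income as a percentage of the FPL, and an uninsured flag) and the figures are hypothetical.

```python
def weighted_percentage(records, numerator, denominator):
    """100 * (weighted count meeting both tests) / (weighted count meeting `denominator`)."""
    num = sum(r["weight"] for r in records if denominator(r) and numerator(r))
    den = sum(r["weight"] for r in records if denominator(r))
    return 100.0 * num / den


# Hypothetical weighted child records for one state.
children = [
    {"weight": 1000, "fpl": 40, "uninsured": True},
    {"weight": 2000, "fpl": 80, "uninsured": False},
    {"weight": 3000, "fpl": 150, "uninsured": True},
    {"weight": 4000, "fpl": 400, "uninsured": False},
]

# Income variable: the denominator is all children.
pct_below_50 = weighted_percentage(
    children, lambda r: r["fpl"] < 50, lambda r: True)

# Uninsured variable: the denominator is children in the poverty category.
pct_uninsured_below_100 = weighted_percentage(
    children, lambda r: r["uninsured"], lambda r: r["fpl"] < 100)
```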
We estimated variances and covariances for the direct sample estimates using a jackknife estimator, treating the CPS rotation groups as replicate samples. These estimates are required for the calculation of empirical Bayes estimates in Step 5.
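The exact jackknife formula is not given here. As a rough sketch, one common form is the delete-one-group jackknife, in which the estimate is recomputed with each replicate group left out in turn. The code below assumes that form, covers only a variance (not the covariances), and uses hypothetical field names and data.

```python
def jackknife_variance(records, estimator, group_key="rotation_group"):
    """Delete-one-group jackknife variance of `estimator` over replicate groups."""
    groups = sorted({r[group_key] for r in records})
    g = len(groups)
    # Recompute the estimate with each replicate group left out in turn.
    replicates = [
        estimator([r for r in records if r[group_key] != grp]) for grp in groups
    ]
    mean = sum(replicates) / g
    return (g - 1) / g * sum((rep - mean) ** 2 for rep in replicates)


def pct_uninsured(records):
    """Weighted percentage of records flagged as uninsured."""
    total = sum(r["weight"] for r in records)
    uninsured = sum(r["weight"] for r in records if r["uninsured"])
    return 100.0 * uninsured / total


# Hypothetical data with two replicate groups (illustration only).
sample = [
    {"rotation_group": 1, "weight": 100, "uninsured": True},
    {"rotation_group": 1, "weight": 100, "uninsured": False},
    {"rotation_group": 2, "weight": 100, "uninsured": False},
    {"rotation_group": 2, "weight": 100, "uninsured": False},
]
variance = jackknife_variance(sample, pct_uninsured)
```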
4. Select Regression Models to Predict the Class E Totals
In developing regression models to predict state income distributions and uninsured rates, we considered a wide range of potential predictors, summarized in Table A.2. We selected models based on their predictive abilities. In addition, we checked for and did not find strong evidence of correctable, persistent bias in the predictions for groups of states defined by diverse characteristics, such as population size, percent Hispanic, and the other variables considered as predictors.
Regression models predicting the Class E totals were estimated for the March CPS samples for 1995, 1996, 1997, and 1998 so that we could evaluate the performance of alternative models in different years and select final model specifications based on their fit across the four years.
TABLE A.2
POTENTIAL PREDICTORS EVALUATED IN REGRESSION MODELS FOR POVERTY LEVEL AND UNINSURED RATE

Characteristics of and Participation in Social Welfare Programs
  Participation in:
    Food Stamp Program
    National School Lunch Program
    Supplemental Security Income
    Medicaid
    Unemployment Insurance
  Fraction of children eligible for Medicaid by poverty category
  Age distribution of children enrolled in Medicaid

Income and Poverty
  Poverty rate among federal tax return filers and the nonfiler rate
  Per capita income (from National Income and Product Accounts)
  Median household income (from census)
  Percentage of population by poverty category (census)
  Percentage of child population by poverty category (census)

Demographic Characteristics of Population
  Population total and population growth
  Racial/ethnic distribution: percentage black, percentage Hispanic
  Migration: percentage noncitizen (census), net international migration rate
  Urban/rural distribution (census)

Health and Vital Statistics
  Immunization rate
  Infant mortality rate and low birth weight rate
  Child death rate and teen violent death rate
  Teen birth rate

Employment and Education
  Proportion of jobs by sector (e.g., agriculture, manufacturing, services, government)
  Proportion of jobs in small establishments
  Proportion of adults who are self-employed (census)
  Educational attainment of adults (no HS diploma, at least a BA) (census)

Living Arrangements
  Percentage of children by number and employment status of parents in household (census)
  Percentage of children institutionalized (census)
  Percentage of nonelderly persons in nonfamily households (census)
  Percentage of households with no children or nonfamily (census)

We selected a single best model for all four of the Class E poverty variables (that is, the percentage of children in each of the poverty classes listed in Table A.1). This model included the following predictors:
 the child poverty rate according to individual income tax data, that is, the percent of child exemptions that are claimed on tax returns with income below the poverty level
 the percentage of the population receiving food stamps
 the percentage of children ages 1 to 18 in families with incomes less than or equal to 75% of the federal poverty level (FPL) according to 1990 census data
 the percentage of children ages 1 to 18 > 75% to 100% FPL (from 1990 census)
 the percentage of children ages 1 to 18 > 100% to 185% FPL (from 1990 census)
 median household income (from 1990 census)
 percentage of children ages 0 to 17 who are noncitizens (from 1990 census)
For the three Class E insurance coverage variables (the percentage uninsured in each of three poverty classes, which are listed in Table A.1) we identified separate best models; however, all three models included the following predictors:
 the percentage of the population that is Hispanic
 the nonelderly poverty rate according to individual income tax data, that is, the percentage of nonelderly exemptions that are claimed on tax returns with income below the poverty level
 percentage of children ages 1 to 20 enrolled in Medicaid
The best model for the first insurance coverage variable, the percentage uninsured among children below 100 percent of poverty, also included the following predictors:
 the percentage of children ages 0 to 17 who are living with two parents, only one of whom is in the labor force (from 1990 census)
 the percentage of children ages 0 to 17 who are living with one parent who is in the labor force (from 1990 census)
 the proportion of jobs that are in agricultural services, forestry, or fishing (as of 1996)
The best model for the second insurance coverage variable, the percentage uninsured among children between 100 and 200 percent of poverty, included the following predictors (in addition to the three that were included in all three models):
 the percentage of children between 100 and 200 percent of poverty who are income-eligible for Medicaid
 the percentage of children ages 1 to 18 > 100% to 185% FPL (from 1990 census)
 the proportion of jobs that are in agricultural services, forestry, or fishing (as of 1996)
 the proportion of jobs that are in retail trade (as of 1996)
The best model for the third insurance coverage variable, the percentage uninsured among children at or above 200 percent of poverty, included the following predictor (in addition to the three that were included in all three models):
 the percentage of children ages 0 to 17 who are living with one parent who is in the labor force (from 1990 census)
The models have reasonable face validity; that is, the predictors have plausible relationships, generally, to the variables being predicted.
5. Derive Empirical Bayes Shrinkage Estimates of Class E Totals
While regression models were estimated for each of four years, empirical Bayes estimates were ultimately needed for just March 1998. We estimated the four poverty variables and three uninsured variables as weighted averages of the direct sample estimates calculated in Step 3 and regression predictions from the models selected in Step 4. The relative weighting of the direct sample estimates and regression predictions varied by state, depending on the state-specific variances of the direct sample estimates and the overall fit of the regression models (which does not vary by state).
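The weighted average described above can be sketched in its simplest, single-variable form. This is an assumption for illustration only: the report's estimator also uses covariances among the variables, and its exact formulas are not reproduced here. The parameter `model_variance` is a hypothetical stand-in for the lack of fit of the regression model, which is common to all states; all figures are invented.

```python
def shrinkage_estimate(direct, prediction, sampling_variance, model_variance):
    """Weighted average of a direct estimate and a regression prediction.

    The weight on the direct estimate approaches 1 when the direct estimate
    is precise (small sampling variance) and 0 when it is noisy.
    """
    b = model_variance / (model_variance + sampling_variance)
    return b * direct + (1.0 - b) * prediction


# Hypothetical percentages for two states with the same direct estimate and
# prediction but different sampling variances.
noisy_state = shrinkage_estimate(
    direct=20.0, prediction=16.0, sampling_variance=9.0, model_variance=3.0)
precise_state = shrinkage_estimate(
    direct=20.0, prediction=16.0, sampling_variance=1.0, model_variance=3.0)
```

In this sketch the state with the larger sampling variance is pulled further toward the regression prediction, which is the sense in which the relative weighting varies by state.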
We obtained estimated counts from the estimated percentages. We used estimates of the population ages 0 to 18 from Step 1 to convert the empirical Bayes estimates of poverty percentages to poverty counts. We then ratio-adjusted the state counts so that they would sum to direct sample estimates of national totals (from Step 3) for each of the four poverty variables. A ratio adjustment of this kind is standard practice in small area estimation; it is necessary because the counts derived from the empirical Bayes estimates do not necessarily sum to the national totals. We used the adjusted state poverty counts to convert the empirical Bayes estimates of uninsured percentages to uninsured counts, and we ratio-adjusted the state estimates of these three variables to the direct sample estimates of national totals.
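The conversion from percentages to counts and the ratio adjustment can be sketched as follows. The state labels, percentages, populations, and national total are all hypothetical.

```python
def ratio_adjust(state_counts, national_total):
    """Scale state counts by a common factor so they sum to the national total."""
    factor = national_total / sum(state_counts.values())
    return {state: count * factor for state, count in state_counts.items()}


# Hypothetical empirical Bayes percentages and child populations for two states.
eb_percent = {"A": 10.0, "B": 20.0}
population = {"A": 1000, "B": 3000}

# Convert percentages to counts, then force agreement with a national total.
counts = {s: eb_percent[s] / 100.0 * population[s] for s in eb_percent}
adjusted = ratio_adjust(counts, national_total=770.0)
```

Because every state is multiplied by the same factor, the adjustment preserves each state's share of the national total.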
6. Rake Weights to Totals from Steps 1 and 5
Using the four raking models that we applied in Step 2, but modified to include the Class E variables, we raked the weights within each state to the totals obtained in Steps 1 and 5. This step was necessary so that the Class D controls calculated as direct sample estimates in Step 7 would be consistent with the Class A and Class E controls.
7. Derive Direct Sample Estimates of Class D Totals
Using the weights obtained from Step 6, we calculated direct sample estimates of the two Class D totals in each state.
8. Obtain Adjusted Totals for the Class A Variables
The Class A controls pertaining to the numbers of Hispanic children and non-Hispanic black children could not be applied fully in Steps 2 and 6 because some states had too few sample observations in one or both of these two groups. At this point, then, the weighted sums of Hispanic children and non-Hispanic black children do not agree with the Step 1 controls at the national level. To correct this problem, we grouped the states in which the Hispanic control could not be applied earlier, and we ratio-adjusted the direct sample estimates of Hispanic children for this group of states as a whole so that the adjusted totals sum to the Step 1 totals for the group. We then repeated the process to obtain adjusted totals of non-Hispanic black children.
9. Rake Weights within Each State
Within each state we raked the weights to totals derived from Step 1 (for the first four Class A variables), Step 8 (for the last two Class A variables), Step 5 (for the Class E variables), and Step 7 (for the Class D variables). In this step, there is just one raking model. We do not have to treat states with low percentages of Hispanics or blacks differently because we are raking to the state totals created in Step 8 rather than the Step 1 totals for Hispanic children and non-Hispanic black children.
10. Reweight the March 1998 CPS Database to Borrow Strength Across States
Using the control totals from Step 1 for Class A variables, Step 5 for Class E variables, and Step 7 for Class D variables, we applied the reweighting procedure to obtain 51 state weights for each sample household. With this procedure there are two constraints on the state weights: (1) all control totals must be satisfied for all states and (2) for each household, the national weight given to the household after reweighting (that is, the sum of the household's state weights) must equal the weight given to the household at the conclusion of Step 9. These constraints and the maximum likelihood estimation algorithm are described in detail in Schirm and Zaslavsky (1997).
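The two constraints can be expressed as a simple check on a candidate set of state weights. The sketch below does not implement the estimation algorithm itself, only a verification of its constraints; the function name, data layout, second state, and all figures other than the 4,000/400/3,600 split from the overview are hypothetical.

```python
def satisfies_constraints(state_weights, step9_weights, household_x,
                          control_totals, tol=1e-6):
    """Check the two reweighting constraints.

    state_weights[h][s] : weight household h carries in state s
    step9_weights[h]    : household h's weight at the end of Step 9
    household_x[h][k]   : value of control variable k for household h
    control_totals[s][k]: control total k in state s
    """
    # Constraint 2: each household's state weights sum to its Step 9 weight.
    for h, by_state in state_weights.items():
        if abs(sum(by_state.values()) - step9_weights[h]) > tol:
            return False
    # Constraint 1: every control total is reproduced in every state.
    for s, totals in control_totals.items():
        for k, target in totals.items():
            achieved = sum(
                state_weights[h].get(s, 0.0) * household_x[h][k]
                for h in state_weights
            )
            if abs(achieved - target) > tol:
                return False
    return True


# Two hypothetical households split across two states.
state_weights = {"h1": {"MD": 400.0, "VA": 3600.0},
                 "h2": {"MD": 1500.0, "VA": 500.0}}
step9_weights = {"h1": 4000.0, "h2": 2000.0}
household_x = {"h1": {"children": 1}, "h2": {"children": 2}}
control_totals = {"MD": {"children": 3400.0}, "VA": {"children": 4600.0}}
ok = satisfies_constraints(state_weights, step9_weights, household_x, control_totals)
```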
