# State Estimates of Uninsured Children, January 1998. Final Report. B. Step-by-Step Procedure

Development of the reweighted database required 10 steps:

1. Derive estimates of Class A totals
2. Adjust the weights within each state to reproduce the totals derived in Step 1
3. Derive direct sample estimates of Class E totals using the weights from Step 2
4. Select regression models to predict the Class E totals
5. Derive empirical Bayes shrinkage estimates of Class E totals
6. Adjust the weights within each state to reproduce the totals from Steps 1 and 5
7. Derive direct sample estimates of Class D totals using the weights from Step 6
8. Obtain adjusted totals for the Class A variables pertaining to numbers of Hispanic children and non-Hispanic black children
9. Adjust the weights within each state to reproduce the totals from Step 1 (for the first four Class A variables), Step 8 (for the last two Class A variables), Step 5 (for the Class E variables), and Step 7 (for the Class D variables)
10. Reweight the March 1998 CPS database from Step 9 to borrow strength across states, using the control totals from Step 1 for Class A variables, Step 5 for Class E variables, and Step 7 for Class D variables

We describe the 10 steps in detail below.

1. Derive Estimates of Class A Totals

The source of Class A totals was the Census Bureau's state population estimates by age, race, sex, and Hispanic origin. These estimates are based on the most recent decennial census and carried forward by a combination of vital statistics and other administrative data.

The population estimates published by the Census Bureau are sometimes described as "census-level" estimates because they are intended to represent the population counts that would be obtained if a decennial census were conducted. As is well known, there is a net undercount of the population by the census when it is actually conducted, and there would surely be a net undercount if a census were conducted sometime between 1990 and 2000. The Census Bureau's estimate of what the net undercount would have been had a census been conducted in, say, 1997 is the estimated undercount in the 1990 census. Accordingly, the Bureau has developed and published a "net population adjustment matrix" that contains for each state the estimated undercount by single year of age, sex, race, and Hispanic origin. When the Bureau publishes population estimates, the net undercounts are subtracted from the Bureau's best estimates of the actual population totals to obtain the published totals. To develop adjusted population estimates, we "undid" this last step, adding the net undercounts to the published population totals.
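The adjustment described above amounts to a cell-by-cell addition. The following is a minimal sketch of that arithmetic; the function name, cell labels, and all figures are hypothetical and are not taken from the Census Bureau's adjustment matrix.

```python
def adjust_for_undercount(published, net_undercount):
    """Add the estimated net undercount back to each published (census-level) total."""
    return {cell: published[cell] + net_undercount[cell] for cell in published}

# Hypothetical cells for one state, keyed by (single year of age, sex).
published = {(5, "female"): 10_000, (5, "male"): 10_400}
net_undercount = {(5, "female"): 250, (5, "male"): 310}

adjusted = adjust_for_undercount(published, net_undercount)
print(adjusted[(5, "female")])  # 10250
```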

The published estimates refer to July 1 of each year. We averaged successive July 1 estimates to obtain estimates for January 1, the end of the reference period for much of the data collected in the March CPS.4
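As an illustration of the averaging, the sketch below computes a January 1 estimate as the simple mean of the two surrounding July 1 estimates. The function name and the population figures are hypothetical.

```python
def january_1_estimate(july_1_before, july_1_after):
    """Average two successive July 1 estimates to approximate the intervening January 1."""
    return (july_1_before + july_1_after) / 2.0

# Hypothetical adjusted state totals of children on two successive July 1 dates.
july_1_1997 = 1_000_000
july_1_1998 = 1_020_000

january_1_1998 = january_1_estimate(july_1_1997, july_1_1998)
print(january_1_1998)  # 1010000.0
```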

2. Control the Weights to the Totals Derived in Step 1

Applying the controls from Step 1 may alter individual state estimates of uninsured children and low-income children. Before developing empirical Bayes estimates of uninsured children and low-income children, therefore, we "raked" the CPS weights to the Class A totals derived in Step 1. Raking is a widely used procedure for adjusting sample weights. For a specified set of characteristics of the sampled population, it brings weighted sums obtained from the sample into agreement with totals obtained from external sources. The raking was done within each state--that is, there was no borrowing of strength across states at this point.
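For readers unfamiliar with raking, the following sketch shows one common form of it (iterative proportional fitting) applied to person-level weights within a single state. It is an illustration under stated assumptions, not the program used for this report: the function name, the control categories, and all numbers are hypothetical.

```python
def rake(weights, memberships, controls, max_iter=100, tol=1e-9):
    """Scale weights repeatedly until each weighted sum matches its control total.

    weights:     starting weight for each sample person
    memberships: control name -> list of 0/1 flags, one per sample person
    controls:    control name -> external total to be reproduced
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for name, target in controls.items():
            flags = memberships[name]
            current = sum(wi for wi, flag in zip(w, flags) if flag)
            if current == 0:
                # A control cannot be met if no sample member belongs to the group.
                raise ValueError(f"no sample members for control {name!r}")
            factor = target / current
            w = [wi * factor if flag else wi for wi, flag in zip(w, flags)]
            largest_change = max(largest_change, abs(factor - 1.0))
        if largest_change < tol:
            break  # every control is already reproduced
    return w

# Hypothetical state sample of four persons and four control totals.
start_weights = [10.0, 10.0, 10.0, 10.0]
memberships = {
    "male":   [1, 1, 0, 0],
    "female": [0, 0, 1, 1],
    "young":  [1, 0, 1, 0],
    "old":    [0, 1, 0, 1],
}
controls = {"male": 50.0, "female": 50.0, "young": 60.0, "old": 40.0}

raked = rake(start_weights, memberships, controls)
print(raked)  # approximately [30.0, 20.0, 30.0, 20.0]
```

The guard against an empty group reflects why controls for groups with few or no sample members in a state can produce very large adjustments or cannot be applied at all.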

In an effort to avoid extremely large upward adjustments to weights in states with small numbers of Hispanics or blacks, we used four different raking models: (1) rake to all totals except the totals for Hispanics and non-Hispanic blacks, (2) rake to all totals except the total for Hispanics, (3) rake to all totals except the total for non-Hispanic blacks, and (4) rake to all totals. In general, we used the first model if a state had few Hispanics and few non-Hispanic blacks. We used the second and third models if a state had relatively few Hispanics or relatively few non-Hispanic blacks, respectively. We used the fourth model for the remaining states.5
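The assignment of states to raking models can be summarized as a simple decision rule. The sketch below is illustrative only: the function name is hypothetical, and the two flags are taken as inputs because the criteria for judging that a state has "few" members of a group are not given in this section.

```python
def choose_raking_model(few_hispanics, few_non_hispanic_blacks):
    """Return the number (1-4) of the raking model described in the text."""
    if few_hispanics and few_non_hispanic_blacks:
        return 1  # omit both the Hispanic and non-Hispanic black totals
    if few_hispanics:
        return 2  # omit only the Hispanic total
    if few_non_hispanic_blacks:
        return 3  # omit only the non-Hispanic black total
    return 4      # rake to all totals

# All four combinations of the two flags.
cases = [(True, True), (True, False), (False, True), (False, False)]
models = [choose_raking_model(h, b) for h, b in cases]
print(models)  # [1, 2, 3, 4]
```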

3. Derive Direct Sample Estimates of Class E Totals

Using the weights from Step 2, we calculated direct sample estimates of the Class E totals, which are needed in Steps 4 and 5.

We estimated percentages rather than counts (that is, the percentage uninsured rather than the total number) in order to standardize for state population size, which is necessary for the next two steps. For each of the four income variables, the denominator of the percentage is the total number of children. For the three uninsured variables, the denominator is the total number of children in the indicated poverty category.
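The choice of denominator can be illustrated with a short sketch. It assumes hypothetical child records carrying a Step 2 weight, a poverty flag, and an uninsured flag; the function name, field names, and values are for illustration only.

```python
def weighted_percent(records, in_numerator, in_denominator):
    """Weighted percentage of denominator members who also meet the numerator condition."""
    den = sum(r["weight"] for r in records if in_denominator(r))
    num = sum(r["weight"] for r in records
              if in_denominator(r) and in_numerator(r))
    return 100.0 * num / den

# Hypothetical sample children in one state.
children = [
    {"weight": 100.0, "below_poverty": True,  "uninsured": True},
    {"weight": 300.0, "below_poverty": True,  "uninsured": False},
    {"weight": 600.0, "below_poverty": False, "uninsured": False},
]

# Income variable: the denominator is all children.
pct_below_poverty = weighted_percent(
    children, lambda r: r["below_poverty"], lambda r: True)

# Uninsured variable: the denominator is children in the poverty category.
pct_uninsured_below_poverty = weighted_percent(
    children, lambda r: r["uninsured"], lambda r: r["below_poverty"])

print(pct_below_poverty, pct_uninsured_below_poverty)  # 40.0 25.0
```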

We estimated variances and covariances for the direct sample estimates using a jackknife estimator, treating the CPS rotation groups as replicate samples. These estimates are required for the calculation of empirical Bayes estimates in Step 5.
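A delete-one-group jackknife is one way to use rotation groups as replicates. The sketch below shows that general technique for a single percentage; the exact jackknife formula used for this report is not stated in this section, so the scaling factor, the function names, and the toy data (three groups rather than the full set of CPS rotation groups) should be read as assumptions.

```python
def percent_uninsured(records):
    """Weighted percentage uninsured among the supplied records."""
    total = sum(r["weight"] for r in records)
    uninsured = sum(r["weight"] for r in records if r["uninsured"])
    return 100.0 * uninsured / total

def jackknife_variance(records, estimator):
    """Delete-one-group jackknife variance, treating rotation groups as replicates."""
    groups = sorted({r["rotation_group"] for r in records})
    n_groups = len(groups)
    # Re-estimate with each rotation group left out in turn.
    replicates = [
        estimator([r for r in records if r["rotation_group"] != g])
        for g in groups
    ]
    mean_replicate = sum(replicates) / n_groups
    return (n_groups - 1) / n_groups * sum(
        (t - mean_replicate) ** 2 for t in replicates)

# Hypothetical records: one child per rotation group, equal weights.
sample = [
    {"rotation_group": 1, "weight": 1.0, "uninsured": True},
    {"rotation_group": 2, "weight": 1.0, "uninsured": False},
    {"rotation_group": 3, "weight": 1.0, "uninsured": False},
]

variance = jackknife_variance(sample, percent_uninsured)
print(round(variance, 2))  # 1111.11
```

Because the estimate is a ratio of weighted sums, dropping a group requires no rescaling of the remaining weights; estimating a total rather than a percentage would.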

4. Select Regression Models to Predict the Class E Totals

In developing regression models to predict state income distributions and uninsured rates, we considered a wide range of potential predictors, summarized in Table A.2. We selected models based on their predictive abilities. In addition, we checked for and did not find strong evidence of correctable, persistent bias in the predictions for groups of states defined by diverse characteristics, such as population size, percent Hispanic, and the other variables considered as predictors.

Regression models predicting the Class E totals were estimated for the March CPS samples for 1995, 1996, 1997, and 1998 so that we could evaluate the performance of alternative models in different years and select final model specifications based on their fit across the four years.

TABLE A.2

POTENTIAL PREDICTORS EVALUATED IN REGRESSION MODELS FOR POVERTY LEVEL AND UNINSURED RATE

Characteristics of and Participation in Social Welfare Programs

Participation in:

Food Stamp Program
National School Lunch Program
Supplemental Security Income
Medicaid
Unemployment Insurance

Fraction of children eligible for Medicaid by poverty category
Age distribution of children enrolled in Medicaid

Income and Poverty

Poverty rate among federal tax return filers and the nonfiler rate
Per capita income (from National Income and Product Accounts)
Median household income (from census)
Percentage of population by poverty category (census)
Percentage of child population by poverty category (census)

Demographic Characteristics of Population

Population total and population growth
Racial/ethnic distribution--percentage black, percentage Hispanic
Migration--percentage noncitizen (census), net international migration rate
Urban/rural distribution (census)

Health and Vital Statistics

Immunization rate
Infant mortality rate and low birth weight rate
Child death rate and teen violent death rate
Teen birth rate

Employment and Education

Proportion of jobs by sector (e.g., agriculture, manufacturing, services, government)
Proportion of jobs in small establishments
Proportion of adults who are self-employed (census)
Educational attainment of adults (no HS diploma, at least a BA) (census)

Living Arrangements

Percentage of children by number and employment status of parents in household (census)
Percentage of children institutionalized (census)
Percentage of nonelderly persons in nonfamily households (census)
Percentage of households with no children or nonfamily (census)

We selected a single best model for all four of the Class E poverty variables (that is, the percentage of children in each of the poverty classes listed in Table A.1). This model included the following predictors:

• the child poverty rate according to individual income tax data, that is, the percent of child exemptions that are claimed on tax returns with income below the poverty level
• the percentage of the population receiving food stamps
• the percentage of children ages 1 to 18 in families with incomes less than or equal to 75% of the federal poverty level (FPL) according to 1990 census data
• the percentage of children ages 1 to 18 in families with incomes above 75% and up to 100% of the FPL (from 1990 census)
• the percentage of children ages 1 to 18 in families with incomes above 100% and up to 185% of the FPL (from 1990 census)
• median household income (from 1990 census)
• percentage of children ages 0 to 17 who are noncitizens (from 1990 census)

For the three Class E insurance coverage variables (the percentage uninsured in each of three poverty classes, which are listed in Table A.1) we identified separate best models; however, all three models included the following predictors:

• the percentage of the population that is Hispanic
• the nonelderly poverty rate according to individual income tax data, that is, the percentage of nonelderly exemptions that are claimed on tax returns with income below the poverty level
• percentage of children ages 1 to 20 enrolled in Medicaid

The best model for the first insurance coverage variable, the percentage uninsured among children below 100 percent of poverty, also included the following predictors:

• the percentage of children ages 0 to 17 who are living with two parents, only one of whom is in the labor force (from 1990 census)
• the percentage of children ages 0 to 17 who are living with one parent who is in the labor force (from 1990 census)
• the proportion of jobs that are in agricultural services, forestry, or fishing (as of 1996)

The best model for the second insurance coverage variable, the percentage uninsured among children between 100 and 200 percent of poverty, included the following predictors (in addition to the three listed above):

• the percentage of children between 100 and 200 percent of poverty who are income-eligible for Medicaid
• the percentage of children ages 1 to 18 in families with incomes above 100% and up to 185% of the FPL (from 1990 census)
• the proportion of jobs that are in agricultural services, forestry, or fishing (as of 1996)
• the proportion of jobs that are in retail trade (as of 1996)

The best model for the third insurance coverage variable, the percentage uninsured among children at or above 200 percent of poverty, included the following predictor (in addition to the three that were included in all three models):

• the percentage of children ages 0 to 17 who are living with one parent who is in the labor force (from 1990 census)

The models have reasonable face validity; that is, the predictors have plausible relationships, generally, to the variables being predicted.

5. Derive Empirical Bayes Shrinkage Estimates of Class E Totals

While regression models were estimated for each of four years, empirical Bayes estimates were ultimately needed for just March 1998. We estimated the four poverty variables and three uninsured variables as weighted averages of the direct sample estimates calculated in Step 3 and regression predictions from the models selected in Step 4. The relative weighting of the direct sample estimates and regression predictions varied by state, depending on the state-specific variances of the direct sample estimates and the overall fit of the regression models (which does not vary by state).
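The weighted-average idea can be sketched for a single variable. This is a simplified univariate illustration of shrinkage, not the estimator used for this report, which also drew on the covariances estimated in Step 3. The function name and all numbers are hypothetical; `model_variance` stands for the lack of fit of the regression model and is the same for every state.

```python
def shrinkage_estimate(direct, predicted, sampling_variance, model_variance):
    """Weighted average of a direct estimate and a regression prediction.

    The direct estimate gets more weight when its sampling variance is small
    relative to the model variance.
    """
    weight_on_direct = model_variance / (model_variance + sampling_variance)
    return weight_on_direct * direct + (1.0 - weight_on_direct) * predicted

# Hypothetical state: direct estimate 20 percent, regression prediction 16 percent.
estimate = shrinkage_estimate(
    direct=20.0, predicted=16.0, sampling_variance=9.0, model_variance=3.0)
print(estimate)  # 17.0

# A state with a more precise direct estimate is shrunk less toward the prediction.
precise_state = shrinkage_estimate(
    direct=20.0, predicted=16.0, sampling_variance=1.0, model_variance=3.0)
print(precise_state)  # 19.0
```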

We obtained estimated counts from the estimated percentages. We used estimates of the population ages 0 to 18 from Step 1 to convert the empirical Bayes estimates of poverty percentages to poverty counts. We then ratio-adjusted the state counts so that they would sum to direct sample estimates of national totals (from Step 3) for each of the four poverty variables. A ratio adjustment of this kind is standard practice in small area estimation; it is necessary because the counts derived from the empirical Bayes estimates do not necessarily sum to the national totals. We used the adjusted state poverty counts to convert the empirical Bayes estimates of uninsured percentages to uninsured counts, and we ratio-adjusted the state estimates of these three variables to the direct sample estimates of national totals.
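The conversion and ratio adjustment can be sketched as follows for one poverty variable. The state labels, percentages, populations, and national total are hypothetical, and the function name is introduced only for illustration.

```python
def ratio_adjust(state_counts, national_total):
    """Scale state counts by a common factor so that they sum to the national total."""
    factor = national_total / sum(state_counts.values())
    return {state: count * factor for state, count in state_counts.items()}

# Hypothetical empirical Bayes poverty percentages and child populations.
eb_poverty_percent = {"State A": 20.0, "State B": 10.0}
child_population = {"State A": 1000.0, "State B": 2000.0}

# Convert percentages to counts.
poverty_counts = {
    state: eb_poverty_percent[state] / 100.0 * child_population[state]
    for state in eb_poverty_percent
}

# Hypothetical direct sample estimate of the national total.
national_poverty_total = 440.0
adjusted_counts = ratio_adjust(poverty_counts, national_poverty_total)
print(adjusted_counts)  # each state is scaled by 440 / 400 = 1.1
```

The same two operations would then be repeated for the uninsured variables, using the adjusted poverty counts as the base for the uninsured percentages.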

6. Rake Weights to Totals from Steps 1 and 5

Using the four raking models that we applied in Step 2, but modified to include the Class E variables, we raked the weights within each state to the totals obtained in Steps 1 and 5. This step was necessary so that the Class D controls calculated as direct sample estimates in Step 7 would be consistent with the Class A and Class E controls.

7. Derive Direct Sample Estimates of Class D Totals

Using the weights obtained from Step 6, we calculated direct sample estimates of the two Class D totals in each state.

8. Obtain Adjusted Totals for the Class A Variables

The Class A controls pertaining to the numbers of Hispanic children and non-Hispanic black children could not be applied fully in the raking of Steps 2 and 6 because some states had too few sample observations in one or both of these two groups. At this point, then, the weighted sums of Hispanic children and non-Hispanic black children do not agree with the Step 1 controls at the national level. To correct this problem, we grouped the states in which the Hispanic control could not be applied earlier, and we ratio-adjusted the direct sample estimates of Hispanic children for this group of states as a whole so that the adjusted totals sum to the Step 1 totals for the group. We then repeated the process to obtain adjusted totals of non-Hispanic black children.
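The group-level ratio adjustment can be sketched as below. The state labels and figures are hypothetical, and the function name is introduced only for illustration.

```python
def adjust_group_totals(direct_estimates, step1_totals, group):
    """Scale direct estimates for a group of states to the group's Step 1 total."""
    group_control = sum(step1_totals[state] for state in group)
    group_direct = sum(direct_estimates[state] for state in group)
    factor = group_control / group_direct
    return {state: direct_estimates[state] * factor for state in group}

# Hypothetical states in which the Hispanic control could not be applied earlier.
group = ["State X", "State Y"]
direct_hispanic = {"State X": 90.0, "State Y": 30.0}   # weighted sample sums
step1_hispanic = {"State X": 100.0, "State Y": 50.0}   # Step 1 control totals

adjusted_hispanic = adjust_group_totals(direct_hispanic, step1_hispanic, group)
print(adjusted_hispanic)  # {'State X': 112.5, 'State Y': 37.5}
```

The adjusted state totals sum to the group's Step 1 total (150 in this example) without forcing any single state to match its own Step 1 control.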

9. Rake Weights within Each State

Within each state, we raked the weights to totals derived from Step 1 (for the first four Class A variables), Step 8 (for the last two Class A variables), Step 5 (for the Class E variables), and Step 7 (for the Class D variables). In this step, there was just one raking model. We did not have to treat states with low percentages of Hispanics or blacks differently because we raked to the state totals created in Step 8 rather than the Step 1 totals for Hispanic children and non-Hispanic black children.

10. Reweight the March 1998 CPS Database to Borrow Strength Across States

Using the control totals from Step 1 for Class A variables, Step 5 for Class E variables, and Step 7 for Class D variables, we applied the reweighting procedure to obtain 51 state weights for each sample household. With this procedure there are two constraints on the state weights: (1) all control totals must be satisfied for all states and (2) for each household, the national weight given to the household after reweighting--that is, the sum of the household's state weights--must equal the weight given to the household at the conclusion of Step 9. These constraints and the maximum likelihood estimation algorithm are described in detail in Schirm and Zaslavsky (1997).
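The two constraints can be stated concretely as a check on a candidate set of state weights. The sketch below only verifies the constraints; it does not implement the estimation algorithm of Schirm and Zaslavsky (1997). The function name, the use of two states and one characteristic, and all numbers are hypothetical.

```python
def satisfies_constraints(state_weights, step9_weights, characteristics,
                          controls, tol=1e-6):
    """Check both constraints on a set of household-by-state weights.

    state_weights[h][s]:   weight of household h when representing state s
    step9_weights[h]:      weight of household h at the end of Step 9
    characteristics[h][k]: value of control variable k for household h
    controls[s][k]:        control total for variable k in state s
    """
    # Constraint 2: each household's state weights sum to its Step 9 weight.
    for household_weights, w9 in zip(state_weights, step9_weights):
        if abs(sum(household_weights) - w9) > tol:
            return False
    # Constraint 1: every control total is reproduced in every state.
    for s, state_controls in enumerate(controls):
        for k, target in enumerate(state_controls):
            total = sum(hw[s] * x[k]
                        for hw, x in zip(state_weights, characteristics))
            if abs(total - target) > tol:
                return False
    return True

# Hypothetical example: two households, two states, one control variable
# (number of children in the household).
state_weights = [[6.0, 4.0], [5.0, 15.0]]
step9_weights = [10.0, 20.0]
characteristics = [[2.0], [1.0]]
controls = [[17.0], [23.0]]  # 6*2 + 5*1 = 17 and 4*2 + 15*1 = 23

ok = satisfies_constraints(state_weights, step9_weights, characteristics, controls)
print(ok)  # True
```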