# Studies of Welfare Populations: Data Collection and Research Issues. Population-Based Adjustments

The term "calibration" is used in the literature to cover a variety of techniques for benchmarking the weights to known external totals. In this paper, we focus on the two procedures most commonly used in general surveys: poststratification and raking.

## Poststratification

Poststratification is a popular estimation procedure in which the weights of the respondents are adjusted further so that the sums of the adjusted weights are equal to known population totals for certain subgroups of the population. For example, take the case where the population totals of subgroups (referred to as poststrata) defined by age, gender, and race/ethnicity are known from the sampling frame (or other external sources), and they also can be estimated from the survey. Poststratification adjusts the survey weights so that the distribution by subgroups (when weighted by the poststratified weights) is the same as the population distribution from the survey frame or external sources.

Let $N_g$ denote the population count in the poststratum denoted by $g$ as obtained from the sampling frame or an external source, and let $\hat{N}_g$ be the corresponding survey estimate obtained by using the nonresponse-adjusted weights. Then the ratio $N_g / \hat{N}_g$ is the poststratification adjustment factor for subgroup $g$.
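
The adjustment can also be written out as a short worked equation. The notation is assumed here for illustration: $N_g$ is the known count for poststratum $g$, $\hat{N}_g$ is its survey estimate, $w_i$ is the nonresponse-adjusted weight of respondent $i$, $f_g$ is the adjustment factor, and $w_i^{\mathrm{ps}}$ is the poststratified weight.

$$
\hat{N}_g = \sum_{i \in g} w_i, \qquad
f_g = \frac{N_g}{\hat{N}_g}, \qquad
w_i^{\mathrm{ps}} = f_g \, w_i \quad \text{for } i \in g .
$$

Summing the poststratified weights within the poststratum then gives $\sum_{i \in g} w_i^{\mathrm{ps}} = f_g \hat{N}_g = N_g$, which is the benchmarking property described above.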

The main advantage of poststratification is that the procedure reduces the bias from some types of noncoverage and nonresponse. An additional advantage is the improvement in the reliability of the survey estimates for variables that are highly correlated with the variables used for poststratification. Generally, the poststratified weights are the final survey weights, and these would be used to tabulate the survey results. Occasionally, an additional weighting factor, called a "trimming factor," is used to protect against extremely high variances. A brief description of trimming procedures used in practice is provided in a later section. If a trimming factor is calculated for a survey data file, it should be incorporated into the final weight as another multiplication factor.

Earlier, we illustrated the nonresponse adjustment procedure by assuming that the number of families in the population was 41,000 and that there was no noncoverage. We continue the FIS example, assuming that the number of families in the population was actually 46,000 and that the sampling frame contained only 41,000 families because information necessary for locating respondents was missing for 5,000 families. However, some limited demographic and other socioeconomic information was available in the data files for all 46,000 families. Suppose further that the noncoverage rate varies within the four cells defined by the cross-classification of employment status (employed/not employed) and education (high school diploma/no high school diploma) of the head of the family. Poststratification adjustment can be applied to reduce the bias arising from noncoverage.

The poststratification adjustment factor for a poststratification cell is the ratio of the known family count within the cell to the corresponding estimate of the family count from the survey. The estimate of the family count within a poststratification cell is obtained by summing the nonresponse-adjusted weights of the families (as shown in Table 5-5) in that cell. Because the base weights were adjusted to account for nonresponse (as given in Table 5-5), the adjusted weights vary by nonresponse adjustment class. Therefore, Table 5-6 gives the count and the adjusted weight for the 16 cells defined by the cross-classification of nonresponse adjustment classes (4 classes) and poststrata (4 cells).

Column 1 of Table 5-6 is the number of responding families, and column 2 is the nonresponse-adjusted weight for each family in the gender/race/employment/education class. The initial estimate of the total number of families in each class (taking nonresponse into account) is the product of columns 1 and 2 and is given in column 3. The total of the nonresponse-adjusted weights (column 3) can be used to estimate the number of families by poststrata defined by employment status and education of the head of the family. Table 5-7 provides the estimates of the family count and the corresponding known family count from external sources by poststrata. The table also gives the poststratification adjustment factors, defined as the ratio of the known family count to the survey estimate.

TABLE 5-6
Distribution of Nonresponse-Adjusted Weights by Gender, Race, Employment, and Education for the FIS* Example

| Gender | Race | Employment | Education | Respondents (1) | Nonresponse-Adjusted Weight (2)** | Initial Estimated No. of Families (3) |
|---|---|---|---|---|---|---|
| Male | White | Employed | HS*** | 38 | 104 | 3,948 |
| Male | Nonwhite | Employed | HS | 15 | 73 | 1,093 |
| Male | White | Employed | No HS | 11 | 104 | 1,143 |
| Male | Nonwhite | Employed | No HS | 6 | 73 | 437 |
| Male | White | Unemployed | HS | 12 | 104 | 1,247 |
| Male | Nonwhite | Unemployed | HS | 5 | 73 | 364 |
| Male | White | Unemployed | No HS | 16 | 104 | 1,662 |
| Male | Nonwhite | Unemployed | No HS | 9 | 73 | 656 |
| Female | White | Employed | HS | 101 | 60 | 6,065 |
| Female | Nonwhite | Employed | HS | 158 | 55 | 8,649 |
| Female | White | Employed | No HS | 30 | 60 | 1,801 |
| Female | Nonwhite | Employed | No HS | 47 | 55 | 2,573 |
| Female | White | Unemployed | HS | 33 | 60 | 1,982 |
| Female | Nonwhite | Unemployed | HS | 51 | 55 | 2,792 |
| Female | White | Unemployed | No HS | 45 | 60 | 2,702 |
| Female | Nonwhite | Unemployed | No HS | 71 | 55 | 3,887 |
| Total | | | | 648 | | 41,000 |

NOTES:
\* Family Income Study
\*\* For presentation purposes, adjusted weights are rounded to whole numbers. The calculations, however, carry all the decimals.
\*\*\* HS = High school diploma

TABLE 5-7
Poststratification Adjustment Factors for the FIS* Example

| Employment | Education | Initial Survey Estimate* | Known Auxiliary Total | Adjustment Factor** |
|---|---|---|---|---|
| Employed | HS*** | 19,757 | 22,125 | 1.12 |
| Employed | No HS | 5,955 | 6,313 | 1.06 |
| Unemployed | HS | 6,385 | 6,966 | 1.09 |
| Unemployed | No HS | 8,908 | 10,596 | 1.19 |

NOTES:
\* Family Income Study
\*\* For presentation purposes, we have rounded the adjustment factors (to two decimals) and the adjusted weights (to whole numbers). The calculations, however, carry all the decimals.
\*\*\* HS = High school diploma
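
The calculation behind Table 5-7 can be sketched in Python. This is a minimal illustration, not the software used for the FIS: it sums the rounded column 3 values of Table 5-6 within each poststratum, so the cell estimates differ by a family or two from the unrounded figures printed in Table 5-7. The names `INITIAL_ESTIMATES`, `KNOWN_TOTALS`, and `poststratification_factors` are hypothetical.

```python
# Column 3 of Table 5-6 (initial estimated number of families), grouped by
# poststratum. Within each list the order is: male white, male nonwhite,
# female white, female nonwhite. Values are the rounded table entries.
INITIAL_ESTIMATES = {
    ("Employed", "HS"): [3948, 1093, 6065, 8649],
    ("Employed", "No HS"): [1143, 437, 1801, 2573],
    ("Unemployed", "HS"): [1247, 364, 1982, 2792],
    ("Unemployed", "No HS"): [1662, 656, 2702, 3887],
}

# Known auxiliary totals from Table 5-7.
KNOWN_TOTALS = {
    ("Employed", "HS"): 22125,
    ("Employed", "No HS"): 6313,
    ("Unemployed", "HS"): 6966,
    ("Unemployed", "No HS"): 10596,
}


def poststratification_factors(initial_estimates, known_totals):
    """Return known total / survey estimate for each poststratum."""
    factors = {}
    for cell, class_estimates in initial_estimates.items():
        survey_estimate = sum(class_estimates)  # sum of nonresponse-adjusted weights
        factors[cell] = known_totals[cell] / survey_estimate
    return factors


factors = poststratification_factors(INITIAL_ESTIMATES, KNOWN_TOTALS)
for cell, factor in factors.items():
    print(cell, round(factor, 2))
```

Rounded to two decimals, the factors computed this way agree with the adjustment factors shown in Table 5-7.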

The final survey weights are defined as the product of the base weight and the adjustment factors for nonresponse and poststratification. Table 5-8 includes the final weights for the FIS example. The final weight in column 5 is the product of the base weight in column 1, the nonresponse adjustment factor in column 3, and the poststratification factor in column 4.
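
As a rough check on this product, the following Python sketch recomputes the final weight from the rounded factors printed in Table 5-8. The dictionary and function names are hypothetical, and because the published calculations carry all decimals, only the rounded weights (not the family estimates in column 6) can be expected to match.

```python
BASE_WEIGHT = 50  # column 1 of Table 5-8, the same for every family

# Column 3: nonresponse adjustment factor by gender and race (rounded values).
NONRESPONSE_FACTOR = {
    ("Male", "White"): 2.08,
    ("Male", "Nonwhite"): 1.46,
    ("Female", "White"): 1.20,
    ("Female", "Nonwhite"): 1.10,
}

# Column 4: poststratification factor by employment and education (rounded values).
POSTSTRAT_FACTOR = {
    ("Employed", "HS"): 1.12,
    ("Employed", "No HS"): 1.06,
    ("Unemployed", "HS"): 1.09,
    ("Unemployed", "No HS"): 1.19,
}


def final_weight(gender, race, employment, education):
    """Base weight times the nonresponse and poststratification factors."""
    return (
        BASE_WEIGHT
        * NONRESPONSE_FACTOR[(gender, race)]
        * POSTSTRAT_FACTOR[(employment, education)]
    )


# Compare with column 5 of Table 5-8 after rounding to whole numbers.
print(round(final_weight("Male", "White", "Employed", "HS")))
print(round(final_weight("Female", "Nonwhite", "Unemployed", "No HS")))
```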

It is not always possible to use poststratification because it requires data on the cross-classification of the categorical variables that are used to define poststrata. Either the cell-level population counts may not be available or the sample sizes for some of the cells in the poststrata may not be adequate (for a discussion of adequate cell sample sizes, refer to the following section entitled "Balancing Bias and Variance When Adjusting for Nonresponse"). In such situations, survey practitioners frequently use a more complex poststratification method, referred to as a raking procedure, which adjusts the survey estimates to the known marginal totals of several categorical variables.

## Raking Procedure

This methodology is referred to as raking ratio estimation because an iterative procedure is used to produce adjustment factors that provide consistency with known marginal population totals. Typically, raking is used in situations where the interior cell counts of a cross-tabulation are unknown or the sample sizes in some cells are too small for efficient estimation (refer to the following section for more information about sufficient cell sample size).

Raking ratio estimation is based on an iterative proportional fitting procedure developed by Deming and Stephan (1940). It involves simultaneous ratio adjustments of sample data to two or more marginal distributions of the population counts. With this approach, the weights are calculated such that the marginal distribution of the weighted totals conforms to the marginal distribution of the targeted population; some, or all, of the interior cells may differ.

The raking procedure is carried out in a sequence of adjustments. The base weights (or nonresponse-adjusted weights) are first adjusted to produce one marginal distribution, the adjusted weights are used to produce a second marginal distribution, and so on, up to the number of raking dimensions. One sequence of adjustments to the marginal distributions is known as a cycle or iteration. The sequence of adjustments is repeated until convergence is achieved, meaning that the weights no longer change with each iteration. In practice, the raking procedure usually converges, but the number of iterations may be large when there are many marginal distributions involved in raking.
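
The cycle of adjustments described above can be sketched for two raking dimensions. The code below is a minimal, pure-Python illustration and not the production software referred to in this paper. It rakes cell totals rather than individual weights, which is equivalent here because every unit in a cell receives the same factor. The starting cells are the poststratum sums of the rounded Table 5-6 values, and the marginal targets are obtained by adding up the known totals of Table 5-7; that choice is an assumption made only so the example uses numbers already in the text. The function name `rake` and its `tol` and `max_iter` parameters are hypothetical.

```python
def rake(table, row_targets, col_targets, tol=1e-10, max_iter=100):
    """Rake a two-way table of weighted totals to row and column margins.

    Returns the raked table and the number of cycles used.
    """
    cells = [list(row) for row in table]  # copy so the input is left unchanged
    for cycle in range(1, max_iter + 1):
        largest_shift = 0.0
        # First dimension: scale each row to its known marginal total.
        for i, target in enumerate(row_targets):
            factor = target / sum(cells[i])
            largest_shift = max(largest_shift, abs(factor - 1.0))
            cells[i] = [value * factor for value in cells[i]]
        # Second dimension: scale each column to its known marginal total.
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in cells)
            largest_shift = max(largest_shift, abs(factor - 1.0))
            for row in cells:
                row[j] *= factor
        # Converged when a full cycle no longer changes the weights.
        if largest_shift < tol:
            return cells, cycle
    raise RuntimeError("raking did not converge within max_iter cycles")


# Rows: Employed, Unemployed. Columns: HS, No HS.
survey_cells = [[19755, 5954], [6385, 8907]]
employment_totals = [22125 + 6313, 6966 + 10596]  # Employed, Unemployed
education_totals = [22125 + 6966, 6313 + 10596]   # HS, No HS

raked, cycles = rake(survey_cells, employment_totals, education_totals)
print(cycles)
for row in raked:
    print([round(value) for value in row])
```

The raked table reproduces both sets of marginal totals, but its interior cells need not equal the known cell counts in Table 5-7, since raking uses only the margins.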

TABLE 5-8
Final Poststratified Weights for the FIS* Example

| Gender | Race | Employment | Education | Base Weight (1) | Respondents (2) | Nonresponse Adjustment Factor (3) | Poststratification Factor (4) | Final Weight (5)** | Final Estimate of No. of Families (6) |
|---|---|---|---|---|---|---|---|---|---|
| Male | White | Employed | HS*** | 50 | 38 | 2.08 | 1.12 | 116 | 4,422 |
| Male | Nonwhite | Employed | HS | 50 | 15 | 1.46 | 1.12 | 82 | 1,224 |
| Male | White | Employed | No HS | 50 | 11 | 2.08 | 1.06 | 110 | 1,212 |
| Male | Nonwhite | Employed | No HS | 50 | 6 | 1.46 | 1.06 | 77 | 463 |
| Male | White | Unemployed | HS | 50 | 12 | 2.08 | 1.09 | 113 | 1,360 |
| Male | Nonwhite | Unemployed | HS | 50 | 5 | 1.46 | 1.09 | 80 | 397 |
| Male | White | Unemployed | No HS | 50 | 16 | 2.08 | 1.19 | 124 | 1,978 |
| Male | Nonwhite | Unemployed | No HS | 50 | 9 | 1.46 | 1.19 | 87 | 780 |
| Female | White | Employed | HS | 50 | 101 | 1.20 | 1.12 | 67 | 6,793 |
| Female | Nonwhite | Employed | HS | 50 | 158 | 1.10 | 1.12 | 62 | 9,687 |
| Female | White | Employed | No HS | 50 | 30 | 1.20 | 1.06 | 64 | 1,910 |
| Female | Nonwhite | Employed | No HS | 50 | 47 | 1.10 | 1.06 | 58 | 2,728 |
| Female | White | Unemployed | HS | 50 | 33 | 1.20 | 1.09 | 65 | 2,162 |
| Female | Nonwhite | Unemployed | HS | 50 | 51 | 1.10 | 1.09 | 60 | 3,046 |
| Female | White | Unemployed | No HS | 50 | 45 | 1.20 | 1.19 | 71 | 3,215 |
| Female | Nonwhite | Unemployed | No HS | 50 | 71 | 1.10 | 1.19 | 65 | 4,624 |
| Total | | | | | 648 | | | | 46,000 |

NOTES:
\* Family Income Study
\*\* For presentation purposes, we have rounded the adjustment factors (to two decimals) and the adjusted weights (to whole numbers). The calculations, however, carry all the decimals.
\*\*\* HS = High school diploma

The final weights are produced automatically by the software that implements raking. The raking procedure only benchmarks the sample to known marginal distributions of the population; it should not be assumed that the resulting solution is "closer to truth" at the cross-classification cell level as well. The final solution from a raking procedure may not reflect the correlation structure among different variables. For a more complete discussion of raking, refer to Kalton and Kasprzyk (1986).

As noted earlier, raking is one of a range of related methods known as calibration methods. One specific calibration method is GREG (Generalized REGression). GREG is not as commonly used as poststratification and raking because of its rather complex application and some of its limitations. Refer to Särndal et al. (1992) and Valliant et al. (2000) for a description of GREG. For information about calibration techniques, refer to Deville and Särndal (1992) and Theberge (2000).

The weighting system is implemented by assigning weights to each person (or family) in the sample, inserting the weight into the computer record for each person, and incorporating the weights in the estimation process using software created for survey data analysis.
