Assessment of Major Federal Data Sets for Analyses of Hispanic and Asian or Pacific Islander Subgroups and Native Americans: Extending the Utility of Federal Data Bases

6.  Survey Supplementation

Contents

  1. Surveys Requiring Sample Supplementation
  2. Designs for Sample Supplementation
  3. ACS as a Sampling Frame

6.1  Surveys Requiring Sample Supplementation

Sample supplementation involves planning the sample design, screening households, and collecting data. It requires substantial effort and cost and should be used only as a last resort, that is, when combining years of data is inadequate. We use the results from Section 4 to identify surveys needing supplementation. However, we emphasize that such decisions cannot be fully made without careful consideration of the precision needed for specific policy purposes, which is beyond the scope of this report. Consequently, the surveys listed below, and the subgroups for which supplementation is indicated, should be regarded as suggestive rather than as actual recommendations.

Most of the assessments below draw on the information in Section 4 of this report. For periodic, rather than annual, surveys, Section 3 describes the ability of each survey to provide useful data on minority subgroups.

Supplementation is not required for the National Vital Statistics data sets, Census 2000, or the ACS. Averaging over up to 5 years in CPS and NHIS would provide reasonable statistics for most of the minority subgroups. However, the samples for a few of the smaller subgroups (e.g., Cubans and Hawaiians) would still be quite small, and if those populations are of particular interest, then supplementation is needed. The NHIS Hispanic sample is even larger than that of CPS, so that, with the exception of Cuban-Americans, multi-year averages would provide data of adequate precision. The NHIS samples of American Indians or Alaska Natives and APIs are about the same size as those of CPS, and multi-year averages will be sufficient for most analytic purposes.

Five-year averages in NIS, NHSDA, and MEPS are sufficient for all Hispanic subgroups, with the possible exception of Cubans, but even 5 years will satisfy only minimal standards for the API subgroups. The American Indian or Alaska Native sample will permit moderately detailed analysis in NIS but not in NHSDA or MEPS.

The surveys requiring supplementation for all, or almost all, minority subgroups are: NSFG, NHANES, MCBS, NHES, ECLS-B, and ECLS-K.

6.2 Designs for Sample Supplementation

Sample supplementation for small subgroups of the population is generally very expensive. The expense is due not only to the additional interviewing and data processing costs but, even more, to the effort and cost of identifying a probability sample of each subgroup of interest. For example, Cuban-Americans constitute one-half of one percent of the U.S. population, so that with simple random selection about 200 households have to be screened to locate a single Cuban household. Most of the API subgroups are even smaller and will require even greater screening. This implies screening hundreds of thousands of households to locate samples of 1,000 or so supplemental cases. Such an effort could cost several million dollars for each survey, depending on the amount of supplementation and the desired level of reliability for the smallest subgroups, such as Hawaiians and Vietnamese.
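The screening arithmetic in the preceding paragraph can be sketched as follows. This is only an illustration, assuming simple random selection of households and full response; the function name `households_to_screen` and the 0.2 percent prevalence used for a smaller subgroup are hypothetical and are not taken from the report.

```python
import math
from fractions import Fraction


def households_to_screen(target_cases: int, prevalence: float) -> int:
    """Expected number of households to screen to locate `target_cases`
    eligible households, assuming simple random selection and that every
    screened household responds (both are simplifying assumptions)."""
    if target_cases < 0 or not 0 < prevalence <= 1:
        raise ValueError("need target_cases >= 0 and 0 < prevalence <= 1")
    # Convert via the decimal string so that 0.005 is treated as exactly 1/200.
    share = Fraction(str(prevalence))
    return math.ceil(target_cases / share)


# Cuban-Americans: one-half of one percent of the population (from the text).
per_case = households_to_screen(1, 0.005)
supplement = households_to_screen(1000, 0.005)

# Hypothetical smaller subgroup at 0.2 percent (an assumed value).
smaller_subgroup = households_to_screen(1000, 0.002)

print(per_case, supplement, smaller_subgroup)
```

Under these assumptions a supplement of 1,000 cases needs 200,000 screenings at a prevalence of one-half of one percent, and more for any smaller subgroup, which is consistent with the "hundreds of thousands" cited above. Nonresponse to the screener would raise the figures further.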

Under some circumstances, it is possible to avoid, or reduce, the very great screening effort. The conditions that permit such reductions are described below.

The samples for two surveys are drawn from sampling frames that show race/ethnicity for each person on the frame. ECLS-B is selected from birth records. The vital statistics records contain detailed race/ethnicity for almost all births. A few of the smaller API subgroups are identified only as "other API" in states that contain only a small percentage of these subgroups, but Chinese, Japanese, Hawaiians, and Filipinos are reported everywhere, as are all of the Hispanic subgroups. Thus, there would be relatively little additional cost to identify a supplemental sample for ECLS-B, although interviewing and data processing costs might still be substantial, depending on the size of the sample supplementation.

A little more effort would be required to supplement MCBS, but it could be done reasonably efficiently. The MCBS sampling frame consists of Medicare beneficiaries in HCFA files. Race and ethnicity are recorded on these files, but not in the detail required: there is a single code for Hispanics and a single code for APIs. Sample supplementation would require selecting a sample of Hispanics and APIs, screening the sample (possibly by telephone when listed numbers are available), and subsampling persons within each subgroup. More work is involved than for ECLS-B, but it can be carried out without excessive cost. American Indians or Alaska Natives are also identified on the MCBS frame, so supplementation for this population would be similar to that for ECLS-B.

The sampling frames for the other surveys are mostly area segments, although CPS and SIPP are based on census address lists and NHES and NIS use random digit dialing. In these surveys, the race/ethnicity of the sample households is not known in advance of the household contact, and a screening operation is necessary to identify the units eligible for the supplemental sample. Research on possible methods of reducing screening for samples of relatively rare population subgroups was carried out as part of the development of NHANES III procedures. No single procedure appeared to be universally applicable, but substantial gains in efficiency in sampling for Hispanics were possible by oversampling areas with heavy concentrations of Hispanics reported in the most recent census.1 Further research carried out jointly by Westat and NCHS statisticians confirmed these results and indicated the oversampling rates that would provide the lowest sampling errors.2 Unfortunately, the research indicated that only trivial improvements were possible through geographic oversampling for APIs or American Indians or Alaska Natives, since relatively high proportions of these populations reside in homes that are scattered throughout the general population. The research described above dealt with the broad race/ethnic groups (Hispanics, APIs, and American Indians or Alaska Natives) and did not explore the detailed subgroups. It is likely that geographic oversampling will be almost as effective for most Hispanic subgroups as for total Hispanics. It is possible that a few of the API subgroups are sufficiently clustered for this kind of sample to be effective, but a more detailed examination would be necessary to determine whether this is so. In any case, important gains are not possible for most of the API subgroups, or for American Indians or Alaska Natives. For the Hispanic subgroups, even with the gains in efficiency, a sizeable amount of screening would still be necessary.
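The trade-off behind geographic oversampling can be illustrated with a small calculation. The sketch below is not the method used in the Westat and NCHS research; it assumes a two-stratum design and uses Kish's standard approximation for the loss of precision from unequal weights. All stratum shares, prevalences, and sampling rates are hypothetical values chosen only to contrast a concentrated subgroup with a scattered one.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    household_share: float  # share of all households in the stratum
    prevalence: float       # share of stratum households in the subgroup
    relative_rate: float    # sampling rate relative to the base rate


def screenings_per_effective_case(strata):
    """Households screened per *effective* eligible case.

    Eligible cases found in an oversampled stratum carry smaller weights,
    so the raw yield is deflated by Kish's unequal-weighting design effect:
    deff = n * sum(n_h * w_h^2) / (sum(n_h * w_h))^2.
    """
    screened = sum(s.household_share * s.relative_rate for s in strata)
    found = [s.household_share * s.relative_rate * s.prevalence for s in strata]
    weights = [1.0 / s.relative_rate for s in strata]
    n = sum(found)
    sum_w = sum(f * w for f, w in zip(found, weights))
    sum_w2 = sum(f * w * w for f, w in zip(found, weights))
    deff = n * sum_w2 / sum_w ** 2
    return (screened / n) * deff


# Hypothetical concentrated subgroup: 10% of households hold most members.
proportional = screenings_per_effective_case(
    [Stratum(0.10, 0.30, 1.0), Stratum(0.90, 0.02, 1.0)])
oversampled = screenings_per_effective_case(
    [Stratum(0.10, 0.30, 4.0), Stratum(0.90, 0.02, 1.0)])

# Hypothetical scattered subgroup: prevalence barely differs by area.
scattered_proportional = screenings_per_effective_case(
    [Stratum(0.10, 0.04, 1.0), Stratum(0.90, 0.03, 1.0)])
scattered_oversampled = screenings_per_effective_case(
    [Stratum(0.10, 0.04, 4.0), Stratum(0.90, 0.03, 1.0)])

print(round(proportional, 1), round(oversampled, 1))
print(round(scattered_proportional, 1), round(scattered_oversampled, 1))
```

With these assumed inputs, oversampling the high-concentration stratum lowers the screening needed per effective case for the concentrated subgroup, while the same oversampling makes matters worse for the scattered subgroup because the weighting penalty outweighs the small gain in yield. This mirrors the qualitative finding described above, but the magnitudes are illustrative only.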

Members of subpopulations identified through the NIS screener could be asked question modules addressing topics of interest to ASPE. This is the plan formulated by NCHS for the proposed State and Local Area Integrated Telephone Survey (SLAITS). The NIS annual screening sample is so large that sufficient samples of each subpopulation can be identified yearly; screening costs would be minimal for such data collection efforts. The respondents, of course, would be limited to households with telephones.

The sample design and estimation method used in the Hispanic Health and Nutrition Examination Survey (HHANES) are a useful precedent to consider for sample supplementation. HHANES did not attempt to sample the entire target population, which consisted of Mexican-Americans, Cubans, and Puerto Ricans. The HHANES sample was restricted to geographic areas (counties and blocks) containing high concentrations of these subgroups. The sampling frame used to select PSUs for the Mexican-American sample was restricted to counties with moderate or large numbers of Mexican-Americans or where they constituted reasonably large percentages of the total population. Similarly, the within-PSU sample excluded census block groups or enumeration districts with small numbers of Mexican-Americans. Similar exclusions applied to the Cuban and Puerto Rican samples. The areas in the sampling frames contained well over 80 percent of each subgroup. A model was used to extrapolate the results of the surveys to the total region the data were intended to represent (the Southwest for Mexican-Americans, Dade County for Cuban-Americans, and New York City and selected surrounding counties for Puerto Ricans). The model assumed similar health characteristics for persons inside and outside the areas of heavy concentration of minorities, within specific economic and demographic classes.3

The HHANES estimates appeared plausible, and users did not report any problems with the data. Of course, the modeling accounted for less than 20 percent of the total so that it was unlikely that even important problems with the model would introduce serious errors in the results. Use of models would be much more uncertain for API subgroups or American Indians or Alaska Natives. In 1990, 37 percent of APIs and 47 percent of American Indians or Alaska Natives lived in areas that were under 10 percent minority. Some years after a census, these percentages will be even greater. A procedure similar to HHANES that avoided excessive screening would probably be restricted to no more than 50 percent of APIs and about 40 percent of American Indians or Alaska Natives. The validity of data from models that account for the remaining 50 or 60 percent of the total is open to question.
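The reasoning about how much of an estimate rests on the model can be made concrete with a rough calculation. The sketch below assumes that the subgroup means inside and outside the sampled areas are of similar size, so that a relative error in the modeled portion carries through to the overall estimate in proportion to the share of the population the model covers. The 10 percent model error is a hypothetical value, not a figure from the report; the coverage rates are the approximate ones cited above.

```python
def overall_relative_error(frame_coverage: float, model_relative_error: float) -> float:
    """Approximate relative error in the overall estimate when only the
    portion outside the sampling frame is modeled.

    Assumes means of similar size inside and outside the frame, so the
    error scales with the modeled share (1 - frame_coverage).
    """
    if not 0 <= frame_coverage <= 1:
        raise ValueError("frame_coverage must be between 0 and 1")
    return (1.0 - frame_coverage) * model_relative_error


# Hypothetical: the model misstates the outside-frame mean by 10 percent.
ASSUMED_MODEL_ERROR = 0.10

# Approximate frame coverage rates taken from the text.
coverage = {"HHANES": 0.80, "API": 0.50, "AIAN": 0.40}

errors = {
    group: overall_relative_error(share, ASSUMED_MODEL_ERROR)
    for group, share in coverage.items()
}

for group, err in errors.items():
    print(f"{group}: about {err:.1%} error in the overall estimate")
```

Under these assumptions the same model error has two and a half to three times the effect on the overall estimate when the frame covers only 40 to 50 percent of the population as it does when the frame covers 80 percent.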

The sampling research for NHANES III mentioned earlier also explored the use of other kinds of sampling frames, in particular telephone listings of households with Spanish surnames (or with names distinctive of other minority groups) and lists of subscribers to foreign language newspapers or magazines. None had high enough coverage to be useful.

6.3 ACS as a Sampling Frame

At this point, it seems appropriate to note the importance of the ACS as a potential source of information for any or all of the individual population groups of interest. Each is identified, recorded, and entered on the ACS data file. If, as expected, the ACS becomes operational in 2003, it will completely, and virtually immediately, obviate any need for "other" sources of information for those characteristics regularly included in the ACS as core items. To the extent that the ACS also includes periodic, supplementary modules covering the full ACS sample, those data will provide ample sample sizes for each of the population subgroups of interest. Finally, if the need for specialized data exists and cannot be met by any of the approaches described in this report, the ACS lends itself as an efficient and timely source of sample for the subpopulations to be included in a new inquiry, either through supplementary questions added to the core or through the inclusion of a full module in one or more months of interviewing. If those approaches prove infeasible, a separate inquiry can be initiated, using a recent sample previously included in the ACS.

The U.S. Census Bureau has strict confidentiality rules, and it will be difficult, if not impossible, for other statistical agencies to gain access to names and addresses in the ACS. The U.S. Census Bureau has stated that, under its authorizing legislation (Title 13, U.S.C.), it cannot legally make available any personal information collected under Census authority, including names or addresses. In effect, this means that only the U.S. Census Bureau can conduct the interviews (or carry out the measurements) for which the ACS provides a sampling frame. Thus, in attempting to supplement samples for existing inquiries, the statistical agencies responsible for the various surveys will have to decide whether joint responsibility (one contractor conducting most of a survey and the U.S. Census Bureau carrying out the same functions for the sample supplement) is operationally feasible.


Footnotes:

1 "Evaluation of Design Options in HHANES '97," report prepared by Westat, May 31, 1994.

2 "Geographic Oversampling in Demographic Surveys of the U.S.," report prepared by Westat, May 31, 1994.

3 "Estimation in the Southwest Component of the Hispanic Health and Nutrition Examination Survey," 1982-84, by Gonzalez, Ezzati, Lago, and Waksberg, Proceedings of the SRMS of the American Statistical Association, 1985.


Last updated 9/14/00