Assessment of Major Federal Data Sets for Analyses of Hispanic and Asian or Pacific Islander Subgroups and Native Americans: Inventory of Selected Existing Federal Databases. Particular Issues Relating to Content of the Inventory


Several issues will affect the ability of the surveys to provide statistical data on the minority subgroups and, in some cases, the ability to combine data from several surveys for improved reliability. These issues will be discussed in detail in the Task 3 report, but it seems useful to call attention to them now.

  • Some of the surveys do not ask respondents whether they are Hispanic or Asian, and others do not record the specific subgroups. In other cases, question wording varies among surveys, although within a single survey the questions are usually asked in a consistent way over long periods of time. It is unfortunate that Government statistical agencies have not agreed on a common set of questions. The proposed revisions to the standards for the federal collection of race and ethnicity data, developed by the Office of Management and Budget (OMB), should help resolve this issue. It is our understanding that the revised race/ethnic standards will be in use in virtually all federal surveys by mid-decade. The revised standards, as currently proposed, are set forth in Appendix C.

    Alternative wordings probably identify race/ethnicity reasonably consistently for most members of the subpopulations, but there are important exceptions. For example, in the 1980 Census many Hispanics in New Mexico, Arizona, and, to a lesser extent, Texas and California reported themselves as "Hispanic – Other" rather than Mexican-American, presumably because their ancestors resided in these areas when they were annexed by the United States in the mid-19th century, and they do not consider themselves of Mexican ancestry. (Further, until recently the New Mexico birth certificate used the category "U.S. Southwest," which referred to very long-time residents who had immigrated from Mexico.) We assume similar reporting in the CPS and SIPP, since their questions are essentially the same as those on the Census questionnaire. However, the Hispanic HANES and NHANES III probed more intensively and classified such persons as Mexican-Americans. Another example of differences in identification is found in the race/ethnicity questions of the National Vital Statistics System–Natality, as contrasted with those asked in the Census and most current surveys. Natality data are classified according to the race of the mother; no information is obtained on the race of the child, whereas the surveys collect race/ethnicity for each individual.

  • Surveys in which the subgroups are not identified cannot be used for subgroup analysis (e.g., Asian and Pacific Islander subgroups are not identified in SIPP). If the subject matter of a particular survey could shed light on important social, economic, or health conditions of minority groups, it may be very useful to add questions identifying the subpopulations to the survey instrument and, thus, to record full detail for the subgroups. Such a change would add only trivially to the length of the interview or the complexity of the survey, although even slight changes in survey content can complicate processing software and take time to implement.
    For purposes of this study, we have developed approximations of the number of sample persons in the race/ethnic subgroups that are not identified on the data tapes and included them in Tables A-1 to A-3. These approximations should also be informative in any consideration of whether to ask the sponsoring agency to add such identification.
  • As mentioned earlier, the sample size of a survey is a dominant factor in determining its sampling errors, but it is not the only one. Some of the household surveys now oversample Hispanics to reduce sampling errors for this minority; the methods of oversampling differ, and they reduce sampling errors to varying degrees. The NHANES is designed to provide specified sample sizes for a set of age-sex-race/ethnic groups, which sharply reduces the effective sample size for the total population. Where applicable, such features of the sample are described in the accompanying survey description. With the exception of the two Early Childhood Longitudinal Studies, none of the surveys oversamples Asians and Pacific Islanders.
  • An important influence on sampling error is the survey design effect: the ratio of the variance of an estimate under the actual sample design to the variance that a simple random sample of the same size would yield. Design effects reflect increases in variance arising from clustering and from departures from equal-probability sampling, and decreases from the use of stratification and estimation procedures. (The increases usually dominate.) Varying sampling rates, one source of departure from equal-probability sampling, are sometimes used among geographic areas, age groups, or income classes. The discussion in the Task 3 report of the precision of the survey estimates, and of actions that could be taken to obtain statistically reliable results for the designated subgroups, will include detailed information on design effects, as well as on certain other features of the various designs that also influence the sampling errors.
  • When the sample size of a survey does not provide reasonable reliability for subgroup analyses, and combining several years of data (or cycles, if it is not an annual survey) does not improve the statistics sufficiently, one solution is a large increase in the survey sample size, probably at least doubling or tripling the sample, together with a massive field screening operation to locate a sufficient number of members of the subgroup. A model of how this can be done efficiently is the Hispanic Health and Nutrition Examination Survey (HHANES), carried out by Westat for the National Center for Health Statistics in 1982–84. It should be noted that a number of the identified surveys oversample blacks as well as Hispanics and other groups, so the supplementation for the race/ethnic subpopulations would have to be superimposed on the sample design already in use. There are no technical difficulties in such an expansion of the sample, but it is not likely that any part of the supplementation could be offset by a reduction in the black or white samples.
  • There has been considerable discussion in the press, in Congress, and among statisticians and other social scientists about the Census Bureau's proposed plans to conduct some of the follow-up of nonrespondents in the Year 2000 Census on a sample basis and also to adjust the Census data for expected undercounts on the basis of a sample survey. The proposal for a sample follow-up of nonrespondents has been dropped as a consequence of a Supreme Court ruling that sampling could not be used to produce the population counts used to apportion Congressional seats among the states. The Bureau proposes to conduct a sample survey to measure undercoverage, but the results will not be used in the official population counts used for apportionment. However, adjusted figures may be prepared, and both adjusted and unadjusted counts may be made available for use by state officials and, more generally, by social scientists. It is unlikely that such an adjustment procedure will take undercounts for the specific subpopulations into account; in the past, adjustments were made for all Hispanics, or for all Asians and Pacific Islanders, as a group. Undercounts are only one source of quality problems in the Census. Many of the current surveys use census figures adjusted for undercoverage as the population controls for poststratification, but this practice has only a minor effect on the quality of the data for subpopulations.
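The design-effect and effective-sample-size ideas in the bullets above can be illustrated with a short Python sketch. It is not the method used by any of the surveys described here. It applies Kish's standard approximations: 1 + CV² of the weights for unequal weighting, and 1 + (m - 1)ρ for clustering, where m is the average cluster size and ρ the intraclass correlation. It multiplies the two as a rough planning approximation and ignores the variance reductions from stratification and estimation. The function names and the subgroup figures in the example (600 sample persons, weights of 1 and 4, clusters of 10, ρ = 0.05) are hypothetical.

```python
import math
from typing import Sequence


def weighting_deff(weights: Sequence[float]) -> float:
    """Kish's approximate design effect from unequal weights (1 + CV^2 of the weights)."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / total ** 2


def clustering_deff(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect from clustering: 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n: int, deff: float) -> float:
    """Number of simple-random-sample persons that would give the same variance."""
    return n / deff


def se_proportion(p: float, n: int, deff: float = 1.0) -> float:
    """Standard error of an estimated proportion p from n sample persons, inflated by deff."""
    return math.sqrt(deff * p * (1.0 - p) / n)


if __name__ == "__main__":
    # Hypothetical subgroup: 400 persons with weight 1 and 200 with weight 4.
    weights = [1.0] * 400 + [4.0] * 200
    # Rough combined design effect (weighting x clustering); stratification gains ignored.
    deff = weighting_deff(weights) * clustering_deff(10, 0.05)
    print(round(deff, 3))                                     # -> 2.175
    print(round(effective_sample_size(len(weights), deff)))   # -> 276
    print(round(se_proportion(0.5, len(weights), deff), 4))   # -> 0.0301
```

In this hypothetical case the 600 sample persons carry about as much information as 276 persons from a simple random sample, which is why subgroup reliability depends on design effects as well as on the raw number of sample persons.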
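The trade-off described above, between combining several years (or cycles) of data and enlarging the sample, can be sketched in Python under simplifying assumptions. The sketch assumes that every year has the same standard error for the subgroup estimate, that every pair of years has the same correlation (zero for independent samples, positive when samples overlap, as in rotating panels), and that the characteristic being measured is stable over the pooled period. The function names and the numbers in the usage lines are hypothetical illustrations, not figures from any of the surveys.

```python
import math
from typing import Optional


def pooled_se(annual_se: float, k: int, corr: float = 0.0) -> float:
    """SE of the average of k annual (or cycle) estimates.

    Assumes equal annual SEs and a common pairwise correlation `corr`
    between years (0 for independent samples, > 0 for overlapping ones).
    """
    variance = (annual_se ** 2 / k) * (1.0 + (k - 1) * corr)
    return math.sqrt(variance)


def years_needed(annual_se: float, target_se: float, corr: float = 0.0,
                 max_years: int = 50) -> Optional[int]:
    """Smallest number of pooled years meeting target_se, or None if none up to max_years does."""
    for k in range(1, max_years + 1):
        if pooled_se(annual_se, k, corr) <= target_se:
            return k
    return None  # pooling alone cannot reach the target; the sample itself must grow


def required_sample(p: float, target_se: float, deff: float = 1.0) -> int:
    """Sample size needed so that the SE of an estimated proportion p is at most target_se."""
    return math.ceil(deff * p * (1.0 - p) / target_se ** 2)


if __name__ == "__main__":
    # Hypothetical: one year gives SE = 0.06; the analysis needs SE <= 0.035.
    print(years_needed(0.06, 0.035))            # -> 3 (independent years)
    print(years_needed(0.06, 0.035, corr=0.3))  # -> 18 (overlapping samples)
    print(years_needed(0.06, 0.035, corr=0.5))  # -> None (pooling cannot reach target)
    # Sample needed for SE <= 0.03 at p = 0.5, without and with a design effect of 2.
    print(required_sample(0.5, 0.03), required_sample(0.5, 0.03, deff=2.0))  # -> 278 556
```

Under these assumptions, pooling independent years reduces the standard error in proportion to the square root of the number of years, but with correlated samples the pooled standard error levels off at annual_se × √corr; beyond that point only a larger sample, as the text above describes, can deliver the needed reliability.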