Assessment of Major Federal Data Sets for Analyses of Hispanic and Asian or Pacific Islander Subgroups and Native Americans: Inventory of Selected Existing Federal Databases

Content of Report


This Task 2 report is the first of two substantive reports in a study of a number of federal surveys. The study aims (1) to assess the surveys' capability to provide data on the major Hispanic and Asian or Pacific Islander (API) subgroups and on American Indians and Alaska Natives, for analyses of the health, education status, and social and economic well-being of these groups; (2) to identify barriers to developing such data; and (3) to identify options for improving the capacity to obtain statistically reliable data about these populations. The report contains information on the applicable sample sizes and an inventory of existing federal databases covering most of the major demographic, social, economic, and health-related surveys carried out by or for U.S. Government agencies. Most of the databases consist of surveys that are carried out annually or at other regular intervals, so they provide reasonably current statistical information. Two of the databases, however, are not, strictly speaking, surveys: the decennial census and the National Vital Statistics System, which contains data from the birth and death registration systems. These two databases are such important sources of information on demographic characteristics, economic status, and selected health items that it seemed appropriate to include them.

The ability of a survey to provide data on population subgroups with reasonable precision depends on two factors:

  1. The questionnaire or other data collection instruments must identify the subgroups and record the information, and that detail must also appear in the microdata file. Appendix B describes both the specific questions on race and ethnicity used in each survey and the detailed race/ethnicity codes that are recorded; and
  2. The sampling errors on estimates of the characteristics of the subgroups need to be low enough for the statistics to be reasonably reliable. The sampling errors depend mostly, but not exclusively, on the sample size in each survey. Some of the surveys oversample Hispanics, which reduces sampling errors for the Hispanic subgroups. However, since the surveys operate with fixed budgets, the increased Hispanic samples reduce the sample size for other population groups, which increases the sampling errors for Asians and Pacific Islanders and for American Indians. In addition, the survey designs need to be taken into account in considering the appropriate sample sizes. For example, although labor force information is obtained monthly in the Current Population Survey (CPS) conducted by the Bureau of the Census, supplemental items designed to collect a wide variety of other social and economic information are added in individual months during the year. Consequently, only the monthly sample size applies to such information as income, family status, migration, and school enrollment; for the labor force data, on the other hand, information for different months can be combined to increase the sample size and produce quarterly, semiannual, or annual labor force estimates with improved reliability. In another example, the sample design for the National Health and Nutrition Examination Survey (NHANES) is focused on the need to analyze health conditions for rather narrow age-sex groups for Mexican-Americans, blacks, and all other groups. The precision of estimates for the total population is considered of secondary importance. The requirement for approximately equal sample sizes in the various age-sex domains influences the sampling errors for the statistics on the total population, and on data for all Hispanics and all Asians and Pacific Islanders.
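The link between sample size and sampling error described in the second factor can be illustrated with a minimal sketch. It assumes the textbook approximation for the standard error of a proportion, with an optional design effect to stand in for a complex sample design. The function name proportion_se, the design effect parameter, and all of the sample sizes are hypothetical and are not taken from any survey in the inventory.

```python
import math


def proportion_se(p: float, n: int, deff: float = 1.0) -> float:
    """Approximate standard error of an estimated proportion.

    p    : estimated proportion (between 0 and 1)
    n    : number of sample persons in the subgroup
    deff : assumed design effect (1.0 corresponds to simple random sampling)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(deff * p * (1.0 - p) / n)


# Hypothetical inputs, chosen only for illustration.
p = 0.10          # 10 percent of a subgroup has some characteristic
monthly_n = 400   # subgroup sample persons in a single month

# A supplement asked in one month only is limited to the monthly sample.
se_supplement = proportion_se(p, monthly_n)

# Pooling 4 months, treated here as independent samples. This is an
# assumption: overlapping monthly samples would yield a smaller gain.
se_pooled = proportion_se(p, 4 * monthly_n)

# Separate illustration of reallocation under a fixed budget:
# one group's sample is doubled, another group's sample is halved.
se_oversampled = proportion_se(p, 2 * monthly_n)
se_reduced = proportion_se(p, monthly_n // 2)

print(f"single month:       {se_supplement:.4f}")
print(f"four months pooled: {se_pooled:.4f}")
print(f"oversampled group:  {se_oversampled:.4f}")
print(f"reduced group:      {se_reduced:.4f}")
```

Under these assumptions the standard error falls with the square root of the sample size, so a fourfold increase in sample halves the error. Actual sampling errors should be taken from each survey's own documentation, since clustering, weighting, and overlap between monthly samples all affect them.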

The attached tables (Appendix Tables A-1 through A-3) provide detailed information on the sample sizes. The inventory (Appendix B) contains a concise description of the purpose of each survey, the kinds of data obtained, interview methods, and publication policy, as well as the agency website address for those desiring additional detail. Note that the inventory description is limited to what is collected and what is available on the microdata file, since these are most relevant to the assessment. We have also included information on whether and how the subpopulations are identified, and whether bilingual interviewers are used. Table A-4 describes the availability of information on citizenship, year of immigration, and whether foreign born.

Subgroups and Databases Examined

The subgroups of interest are:

  1. Hispanic:
    • Mexican-American
    • Puerto Rican
    • Cuban
    • Central or South American
    • Other Hispanic
  2. Asian or Pacific Islander: (Note that the new OMB standards (Appendix C) split this category into "Asian" and "Native Hawaiian or Other Pacific Islander.")
    • Chinese
    • Filipino
    • Japanese
    • Asian Indian
    • Korean
    • Vietnamese
    • Hawaiian
    • Other Asian or Pacific Islander
  3. American Indian or Alaska Native


The databases examined and the appropriate reference dates are:
Data set                                                   Reference date
Census 2000                                                April 1, 2000
American Community Survey                                  2003, proposed
Current Population Survey-March                            March 1998
Current Population Survey-Monthly                          Average month, 1998
Survey of Income and Program Participation                 Wave 1, 1996 Panel
National Health Interview Survey                           1998
National Vital Statistics System-Natality                  1997
National Vital Statistics System-Mortality                 1997
National Survey of Family Growth                           1995
National Immunization Survey                               1999
National Health and Nutrition Examination Survey           1999
Medical Expenditure Panel Survey                           1999
Medicare Current Beneficiary Survey                        Early 1998, 4 panels
National Household Survey on Drug Abuse
National Household Education Survey
Early Childhood Longitudinal Survey - Birth Cohort         Year 1, 2000
Early Childhood Longitudinal Survey - Kindergarten Cohort  Year 1, Fall 1998

Since both sample sizes and designs are subject to changes over time as a result of budget actions, congressional or programmatic initiatives, or baseline revisions, it is important that users or interested parties refer to current documentation or inquire of the appropriate agency whether any important changes in sample size or design have been made.

These reports are intended as a general reference for analysts and policy makers seeking information on the possible uses of these databases as a source of data on the race/ethnic groups of interest, not as technical handbooks. We urge users to seek appropriate professional assistance or expertise, either from the relevant agency or from other sources, to deal with specific technical issues.

1 See page 2 of the NCHS report Sample Design: Third National Health and Nutrition Examination Survey, Series 2, No. 13, for a more detailed discussion.