Assessment of Major Federal Data Sets for Analyses of Hispanic and Asian
or Pacific Islander Subgroups and Native Americans:
Extending the Utility of Federal Data Bases
1. Introduction
- Content of Report
- Subgroups and Databases
- Task Findings
- Limitations of Report
This Task 3 report has two objectives:
- To examine the ability of the selected statistical databases to provide data
  on detailed Asian and Hispanic subgroups and on American Indians and Alaska
  Natives with adequate precision for most practical uses; and
- To suggest and evaluate methods that could be used to enhance this ability
  for surveys with insufficient sample size for the provision of reasonably
  reliable statistics on these minorities.
The report begins with a brief summary of the Task 2 findings; Task 2 inventoried
the major characteristics of the surveys and other databases covered in this
report that relate to their ability to provide data for the subpopulations.
It continues in Section 2 with a discussion of the methodological and
statistical issues that need to be taken into account in determining the
applicable standards for accuracy, and of possible courses of action to achieve
this accuracy when current samples are too small to provide estimates with the
desired reliability.
Section 3 indicates which surveys meet standards
of reliability for at least some of the race/ethnic subgroups. To assist
ASPE in determining the conditions for which the data would be useful, we
have used several alternate levels of reliability. (In
Section 2.1, we suggest some guidelines for
choosing what could be considered adequate precision for specific kinds of
analyses.) Most of the discussion of adequacy or inadequacy of the data refers
to the size of the sampling errors. The sampling errors in a survey depend
on both the sample size and the survey's design effects; average design
effects are reported in Section 3.3. However,
we also point out surveys for which only some of the subgroups are identified
on the data file, and consequently, complete analyses of all subgroups are
not possible. Idiosyncratic features of some of the surveys that complicate
the discussion also are noted.
Sections 4, 5, and 6 describe methods of overcoming the small sample
sizes for the detailed race/ethnic groups in most of the surveys. Sections
4 and 5 cover relatively inexpensive methods: combining several years of data
for surveys conducted annually (Section 4), and combining results of
different surveys for items collected in common (Section 5). When these two
procedures are not sufficient, sample supplementation is required.
Section 6 provides information on the sample designs
that are efficient for minority supplementation and other statistical procedures
that could be used.
Section 7 briefly summarizes the material in this
report.
For ease of reference, we repeat key items from the Task 2 report, specifically
the subgroups of interest, the databases and the appropriate reference dates
for the databases. The subgroups of interest are:
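| Hispanics                  | Asian or Pacific Islanders      | American Indians or Alaska Natives |
|----------------------------|---------------------------------|------------------------------------|
| Mexican Americans          | Chinese                         | American Indians or Alaska Natives |
| Puerto Ricans              | Filipinos                       |                                    |
| Cubans                     | Japanese                        |                                    |
| Central or South Americans | Asian Indian                    |                                    |
| Other Hispanics            | Korean                          |                                    |
|                            | Vietnamese                      |                                    |
|                            | Hawaiian                        |                                    |
|                            | Other Asian or Pacific Islander |                                    |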
The databases and the appropriate reference dates are:
| Sponsor  | Database                                         | Reference dates      |
|----------|--------------------------------------------------|----------------------|
| Census   | Census 2000                                      | April 2000           |
|          | American Community Survey                        | 2003, proposed       |
|          | Current Population Survey                        |                      |
|          |   March                                          | March 1998           |
|          |   Monthly                                        | Average month 1998   |
|          | Survey of Income and Program Participation       | Wave 1, 1996 Panel   |
| NCHS/CDC | National Health Interview Survey                 | 1998                 |
|          | National Vital Statistics System                 |                      |
|          |   Natality                                       | 1997                 |
|          |   Mortality                                      | 1997                 |
|          | National Survey of Family Growth                 | 1995                 |
|          | National Immunization Survey                     | 1996                 |
|          | National Health and Nutrition Examination Survey | 1999                 |
| AHRQ     | Medical Expenditure Panel Survey                 | 1999                 |
| HCFA     | Medicare Current Beneficiary Survey              | Early 1998, 4 Panels |
| SAMHSA   | National Household Survey of Drug Abuse          | 1997-1998            |
| NCES     | National Household Education Survey              | 1996                 |
|          | Early Childhood Longitudinal Survey              |                      |
|          |   Birth Cohort                                   | Year 1, 2000         |
|          |   Kindergarten Cohort                            | Year 1, Fall 1998    |
The Task 2 report contains an inventory of selected major Federal databases,
with particular emphasis on information relating to the ability of each database
to provide reasonably reliable statistics on the race/ethnic subgroups of
interest. The crucial information consists of:
- The population coverage for each survey, the main focus of the content of
  the questionnaire, and the publication policy.
- Whether the survey or other data collection system currently obtains each
  respondent's race/ethnic background in the detail required, and the question
  wording used. (The question wording and classification detail are expected
  to change in the next few years to reflect OMB's revised race/ethnic
  reporting system.)
- The approximate sample sizes for the race/ethnic subgroups of interest.
- Interview policy for the survey.(1)
Appendix B of the Task 2 report contains
a detailed description of the way race/ethnicity is currently obtained in
each database. The information is summarized below:
- American Indian or Alaska Native. All of the databases identify this
  group. In addition, the census, the U.S. Census Bureau surveys, and many
  (though not all) of the surveys sponsored by NCHS distinguish between
  American Indians and Alaska Natives; the vital statistics systems and the
  surveys sponsored by NCES do not. (We note that although the question
  wording on this racial group has not changed significantly in the last few
  decades, there are problems of historical comparability. The 1990 Census
  reported a sharp increase in the number of American Indians or Alaska Natives
  over the number in the 1980 Census, much greater than can be accounted for
  by natural increase. Most demographers attribute this change to a heightened
  interest among persons of American Indian ancestry in acknowledging an
  affiliation with this racial category.)
- Hispanic subgroups. All databases except MCBS and NHES identify each
  Hispanic person as Mexican-American, Puerto Rican, Cuban, or other Hispanic.
  In the census, ACS, CPS, and SIPP the "other Hispanics" are further classified
  as Central American, South American, or other Hispanic. A combined Central
  and South American classification is used in NVS.
- Asian and Pacific Islanders. Considerable variation exists in the
  way APIs are asked to describe themselves and in the detailed groups that
  are identified. The decennial census, ACS, NHIS, NHANES, MEPS, NHSDA, and
  ECLS-K obtain the full level of detail. In the National Vital Statistics
  System (both natality and mortality), all states classify Chinese, Japanese,
  Hawaiians, and Filipinos separately. Vietnamese, Asian Indians, Koreans,
  Samoans, and Guamanians are also separately identified in states that contain
  about two-thirds of the population in these groups; in the remainder of the
  U.S. they are combined into an "all other API" category. Since the ECLS-B
  sample is based on birth registrations, the same classifications are
  available. CPS, SIPP, NSFG, NIS, and NHES simply identify APIs as a single
  group, without any further detail. MCBS separates Native Hawaiians or Pacific
  Islanders from Asians, but does not obtain any further detail. The breakdown
  used by MCBS is consistent with the recent OMB Guidelines (see Appendix C of
  the Task 2 Report) that will be adopted by all surveys over the next few
  years. For simplicity we continue to use the term API in this report.
This Task 3 report is essentially limited to the effect of sampling errors
on the reliability of the various databases, and possible methods of improving
precision when sample sizes are inadequate. There are, of course, other factors
that affect the quality of surveys and data files. A complete discussion
of these factors is beyond the scope of this report, but we wish to call
particular attention to several specific issues:
- The reports are intended as a general reference for a potential audience
  of analysts and policy makers seeking information on the possible use of
  these databases as a source of data on the race/ethnic groups of interest,
  rather than as a technical handbook. We suggest that users who are not
  thoroughly familiar with the content of the database being considered, and
  with the procedures involved in data collection and data processing, seek
  appropriate technical assistance from the staff of the relevant agency or
  from documentation of the survey methods. Two items should be examined in
  particular:
  - Are the sample sizes shown in Tables 3-3, 3-4, and 3-5 still applicable,
    or have important modifications been made in the survey's sample design?
    We note that small changes in sample size, of the order of 10 or 15
    percent, will have only a negligible effect on the conclusions drawn in
    this report, and they can be ignored. Important changes in the sample,
    however, should be taken into account.
  - What is known about sources of error in the data, including those arising
    from possible problems in identifying the race/ethnic groups, respondents'
    lack of information on some of the subject matter items or misunderstanding
    of various questions, and potential effects of nonresponse? For example,
    NCHS studies indicate there may be important issues in death rates for
    Hispanics, Asian and Pacific Islanders, and American Indians and Alaska
    Natives due to misunderstanding of the race question on death certificates
    or in the censuses and surveys used as the denominators of the death rates.
    Similar reporting errors and differential nonresponse could affect other
    statistics.
- It is possible to think of sampling errors in a somewhat broader sense than
  the term is used in this report. Statisticians distinguish between descriptive
  and analytic uses of survey data. Descriptive uses provide a profile of a
  finite population, the population that existed during the period of data
  collection. Analytic uses occur when survey results are used to examine a
  process, frequently a "cause and effect" relationship, with the population at
  the time of data collection considered as a sample from an infinite
  population. The particular year or time period of the study can be considered
  a single observation from a stochastic process, with neighboring years
  providing additional observations (for a few years, before long-term trends
  disrupt this model of behavior). NCHS views birth rates as subject to
  stochastic variation. Similarly, analytic uses would include examination of
  the effect of educational attainment on income, the relationship of obesity
  to various health conditions, etc.

  Stochastic processes are subject to sampling errors arising from the erratic
  variations over time of the statistics studied. In most of the databases
  examined for this report, the effect of this source of variation will be
  trivial compared to the sampling errors due to the sample sizes for data
  collection. Although the NVS and the Census short forms do not have any
  sampling errors, analyses based on them are subject to a small amount of
  stochastic variation. NCHS has carried out studies of the effects of this
  variation on birth and death rates, and more detailed information can be
  obtained from the agency. We note that this Task 3 report is restricted to
  limitations of the data due to sampling error.
- Information from several sources, each of which is subject to sampling errors
  and/or other limitations, often is combined for analysis. For example, although
  the numerators of birth and death rates come from vital statistics records
  that are not subject to sampling errors, the denominators are derived from
  census reports; some of the census data are based on sample surveys, and
  others on extrapolation of census data to intercensal time periods. This
  report does not deal with such special situations, but users who anticipate
  such analyses should take the more complex sampling into account and, if
  necessary, seek advice from the agency technical staff.
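The earlier remark that sample-size changes of the order of 10 or 15 percent have only a negligible effect can be checked with a short calculation. The sketch below assumes, as a simplification that this report does not state explicitly, that the standard error of an estimate is proportional to one over the square root of the sample size, with the design effect held fixed. The function name is illustrative only.

```python
import math


def relative_se_change(sample_size_ratio):
    """Proportional change in standard error when the sample size is
    multiplied by sample_size_ratio.

    Assumes the standard error is proportional to 1/sqrt(n) and that the
    design effect is unchanged (an illustrative simplification).
    """
    if sample_size_ratio <= 0:
        raise ValueError("sample_size_ratio must be positive")
    return 1.0 / math.sqrt(sample_size_ratio) - 1.0


# Sample sizes 15 and 10 percent smaller, then 10 and 15 percent larger.
for ratio in (0.85, 0.90, 1.10, 1.15):
    change = relative_se_change(ratio)
    print(f"sample size x {ratio:.2f}: standard error changes by {change:+.1%}")
```

Under this assumption, a 15 percent reduction in sample size raises the standard error by less than 9 percent, and a 15 percent increase lowers it by less than 7 percent, which is why changes of this size rarely alter a judgment about whether a survey is adequate for a given subgroup.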
(1) The sample sizes are used to estimate the sampling errors that are
applicable to subgroup analysis, and thus to determine whether subgroup data
for each survey can be obtained with a reasonable degree of precision. The
effective sample size, in which the actual sample sizes shown in Tables 3-3
to 3-5 are deflated by the design effect, is a better guide to the sampling
error. Section 2.3 of this report contains a discussion of design effects, and
Tables 3-6 to 3-8 show average effective sample sizes.
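As a minimal sketch of the deflation described in this note, the effective sample size can be computed as the actual sample size divided by the design effect, and then used in the simple-random-sampling formula for the standard error of a proportion. The function names and the input values below are hypothetical; they are not taken from Tables 3-3 to 3-8.

```python
import math


def effective_sample_size(actual_n, design_effect):
    """Deflate the actual sample size by the design effect."""
    if actual_n <= 0 or design_effect <= 0:
        raise ValueError("actual_n and design_effect must be positive")
    return actual_n / design_effect


def se_of_proportion(p, actual_n, design_effect):
    """Approximate standard error of an estimated proportion p, using the
    simple-random-sampling formula with the effective sample size."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    n_eff = effective_sample_size(actual_n, design_effect)
    return math.sqrt(p * (1.0 - p) / n_eff)


# Hypothetical subgroup: 1,200 sample persons, average design effect 1.5.
n_eff = effective_sample_size(1200, 1.5)
se = se_of_proportion(0.20, 1200, 1.5)
print(f"effective sample size: {n_eff:.0f}")
print(f"standard error of a 20 percent estimate: {se:.4f}")
```

The same actual sample size therefore supports less precise estimates in a survey with a large design effect than in one with a design effect close to 1.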
Last updated 9/14/00