Assessment of Major Federal Data Sets for Analyses of Hispanic and Asian
or Pacific Islander Subgroups and Native Americans: Extending the Utility
of Federal Data Bases
7. Summary of Findings
-
Most of the databases show the race/ethnic identification of each person
in sufficient detail to permit subgroup analysis, but the full detail is
missing in a few surveys. All statistical agencies are expected to convert
to the new race/ethnic classifications within the next few years. This
conversion period would therefore be the appropriate time to seek uniformity in
the detailed race/ethnicity codes to be entered in the data records, if ASPE
believes this would permit useful improvements in Federal statistics.
-
The National Vital Statistics data sets and the 100 percent data from the
decennial censuses are not subject to sampling errors for descriptive analyses,
and there are therefore no impediments to subgroup analysis. The long form
data in Census 2000 and the ACS are based on such large samples that analyses
could be carried out on even very small subgroups with the results subject
to only trivial sampling errors.
-
None of the other surveys provide sufficient precision to permit sophisticated
analysis of all subgroups. The larger data sets (CPS-March, NHIS,
and NHES) contain adequate samples of Mexican-Americans, and analyses
based on cross-classification are possible. However, only simple distributions
could be produced reliably for most of the other race/ethnic subgroups.
CPS-Monthly, SIPP, NIS, and MEPS also provide satisfactory data for
Mexican-Americans, but even simple distributions for most of the other subgroups
would have poor reliability. In the other surveys, only limited analysis
of some of the larger subgroups could be carried out with any confidence.
-
Multi-year averages, of course, would improve the precision. Five-year averages
will provide samples large enough to satisfy analytic needs for most Hispanic
subgroups for the larger data sets, i.e., CPS and NHIS. Three-year averages
in the current NHANES would provide reasonably precise data for
Mexican-Americans, and 5- or 6-year averages would permit analyses of detailed
age-sex classes. However, in all surveys the Cuban sample and the samples
of most of the API subgroups would still be too small for anything but simple
analyses. The other data sets would also be improved by averaging over time,
but the effective sample sizes of many subgroups would still be small.
-
It is probably not practical to obtain multi-year averages for the periodic
(as distinct from annual) surveys. These comprise NSFG, SIPP, ECLS-B, and
ECLS-K. We also include NHES in this category since, although it is annual,
the main data content varies from year to year.
-
Annual averages of unemployment rates for each subgroup in CPS would have
reasonable precision and could be obtained with relatively little effort.
Annual averages for other labor force items would be only a little better
than monthly statistics.
-
There are a few items that appear on more than one survey, and combining
the results would improve precision. However, this is a fairly rare occurrence
and can satisfy only limited data needs.
-
If the U.S. Census Bureau goes ahead with its plans for the ACS (currently
scheduled to start in the year 2003), it could be a major resource for subgroup
analysis. First, the ACS will be able to supply annual statistics on a variety
of demographic, social, and economic characteristics for each subgroup. Second,
it could become the vehicle for obtaining much needed information for these
groups, either through the addition of questions to the ACS, or through a
special effort which used the ACS as a source of sample. Finally, it could
become the sampling frame for the selection of supplemental samples for other
surveys, substantially reducing the cost of sample supplementation. However,
in such cases, a number of bureaucratic hurdles would have to be overcome.
Whether this could be done to the satisfaction of both the U.S. Census Bureau
and the sponsoring agencies is uncertain.
-
Sample supplementation for most surveys will be quite expensive if use of
the ACS is not practical. Statisticians have developed devices for reducing
the sampling and screening costs for small population groups, but a considerable
amount of screening would still be required. Also, it is unlikely that the
devices would be effective for all subgroups.
-
We would like to repeat the caveats mentioned earlier in this report:
-
The sample sizes provided in Tables 3-3 to 3-5, which were used to estimate
effective sample sizes and to ascertain whether surveys achieved reasonable
standards of precision, refer to specific time periods (reported in
Section 1.2). The samples in most Federal
surveys are fairly stable, but changes are made from time to time. Although
small changes in sample size on the order of 10 or 15 percent will have only
a negligible effect on the conclusions drawn in this report, much larger
revisions occasionally occur. Before going ahead with a study of a subgroup
in a particular survey, the analyst should refer to the documentation for
the survey to see whether the sample sizes in Tables 3-3 to 3-5 are still
applicable. Any important changes in the sample should be taken into account.
-
The sample sizes in this report refer to each survey's total sample
for the race/ethnic subgroup. When the analysis is restricted to a subclass
of the total (e.g., all males, all females, or persons in a specific age
group), the sample size should be adjusted accordingly.
-
In a few surveys, a subsample is used for some variables. For example, NHIS
frequently collects selected information from only one person in each sample
household. Similarly, NHANES uses random subsamples of the full sample for
some items. An analyst should ascertain whether or not the full sample is
used for the variables of interest, and determine whether the sample sizes
in Tables 3-3 to 3-5 are appropriate.
-
The design effects reported in Table 3-2, which are necessary for the estimation
of effective sample sizes, are averages over a broad set of items, and reflect
variables for which correlations among household members are not excessive.
There are some items for which almost all household members have the same
value, e.g., presence of health insurance, poverty status, urban-rural residence,
region of residence. The design effects are much larger for such items.
Section 3.5 discusses methods of dealing with
such situations.
-
Finally, it is important to recognize that considerable "noise" is to be
found in the statistics. For example, small differences in reporting of
race/ethnicity among some of the databases, minor variations in sample size
from year to year even when there are no changes in sample design, and the
use of average design effects, which do not reflect the variation among items,
are all sources of "noise." As a result, the conclusions drawn in this report
should be considered approximations, but they are sufficiently accurate to
serve as a useful guide to the kinds of analyses of race/ethnic subgroups that
are possible with the various databases.
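
The adjustments described in the caveats above (design effects, subclass restrictions, and averaging over years) can be illustrated with a rough sketch. It assumes the common approximation that effective sample size equals the nominal sample size divided by the design effect, and it treats pooled years as independent samples of equal size, which overstates the gain for surveys whose samples overlap from year to year. The function names and all numbers below are hypothetical; they are not taken from Tables 3-2 to 3-5.

```python
import math


def effective_sample_size(n, deff, subclass_share=1.0, years=1):
    """Approximate effective sample size for a subgroup analysis.

    n              -- nominal subgroup sample size for one year (hypothetical input)
    deff           -- design effect for the item being analyzed
    subclass_share -- fraction of the subgroup sample in the subclass of interest
    years          -- number of years pooled, ASSUMED independent and equal in size
    """
    if n < 0 or deff <= 0 or years < 1 or not (0 < subclass_share <= 1):
        raise ValueError("inputs out of range")
    return n * subclass_share * years / deff


def se_proportion(p, n_eff):
    """Approximate standard error of an estimated proportion p."""
    return math.sqrt(p * (1 - p) / n_eff)


# Hypothetical subgroup: 1,200 sample persons, average design effect of 1.5.
full = effective_sample_size(1200, 1.5)
# Restricting to a subclass that is half of the subgroup halves the sample.
half = effective_sample_size(1200, 1.5, subclass_share=0.5)
# Pooling five years (under the independence assumption) multiplies it by five.
pooled = effective_sample_size(1200, 1.5, years=5)
# A household-level item with a larger design effect (3.0 assumed here).
household_item = effective_sample_size(1200, 3.0)

print(full, half, pooled, household_item)
print(round(se_proportion(0.2, full), 4), round(se_proportion(0.2, pooled), 4))
```

In this sketch the standard error falls only with the square root of the effective sample size, so pooling five years reduces it by a factor of about 2.2 rather than 5. An analyst would substitute the sample sizes and design effects documented for the survey and item in question.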
Last updated 9/14/00