It is clear that, with the exception of the National Vital Statistics (NVS) data sets, Census 2000, and the ACS, the surveys can provide only limited information on race/ethnic subpopulations. The Mexican-American samples are adequate in most of the surveys, but cross-classifications will rarely be possible for the other groups. Sections 4, 5, and 6 describe ways of enhancing the samples. In this section we discuss what is probably the simplest and least costly way of doing this: combining several years of data. The discussion omits the NVS, Census 2000, and the ACS, since their existing sample sizes are fully adequate.
Combining years of data is practical only for surveys that are carried out one or more times per year. Some of the surveys are conducted at periodic intervals. Although it would be possible to combine several cycles of such surveys, the length of time covered (probably 10 years or more) would make the results of doubtful utility. Also, SIPP uses the same households over a number of years, so combinations of years do not provide much additional information.
The annual surveys for which combinations of years are practical are the CPS (March and monthly), NHIS, NHANES, NIS, MEPS, MCBS, and NHSDA. NHES has been omitted since there is a different emphasis in subject matter each year, so that it falls closer to periodic than annual surveys.
The plans for current NHANES implicitly assume that the detailed analyses of the survey data will be based on averages over a number of years. Each year of current NHANES is based on a representative sample of about 5,000 persons in total, far too few to provide acceptable data for the many age-sex-race/ethnicity domains NCHS considers important to study. Combinations of years will be used for analyses of these domains, probably up to 6 years for the most detailed groups. In some ways, this can be considered a model for multi-year averages for other surveys.
Section 2.6 of this report pointed out that the maximum number of years for which combined data would be meaningful depends on the specific item. Most health-related items and fertility patterns change rather slowly over time, and the most recent 3- to 5-year averages will generally reflect current conditions reasonably well. In fact, the NHIS has published 3-year average data for Asian and Pacific Islanders (as a combined group), so a precedent exists. Economic statistics, however, are likely to be much more volatile; thus the time period should be considerably shorter. (However, in the absence of any other data, even somewhat outdated information, such as a 3-year average, will be better than relying on the decennial census as the source of information for the full intercensal period. It is interesting to note that the ACS is planning to combine up to 5 years of data in order to produce reliable small-area data.)
To provide the greatest flexibility for users of this report, we will examine the improvement in precision for three combinations of years: 2, 3, and 5 years.
The effective sample sizes for combined years are shown in Tables 4-1 to 4-3. It can be seen that, except for NHANES, the effective sample sizes for 2 years are a little less than twice the sample for a single year; similarly, the 3- and 5-year effective samples are not quite 3 or 5 times the annual sample sizes. All of the surveys use clustered sample designs, and the samples for a sequence of several years are mostly in the same clusters or in neighboring ones. The lack of independence among the samples for different years tends to reduce the effective sample size. We have estimated that the reduction in effective sample size over a 2-year interval is about 17 percent; the reduction for a 3-year period is 25 percent; and the reduction for 5 years is about 35 percent. These come from estimated correlations between years in the sample: correlations are expected to average .20 for adjacent years, .10 for years 2 years apart, .07 for 3 years apart, and .05 for 4 years apart. The current NHANES samples are independent across years, and, therefore, there is no reduction in effective sample size.
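
The report does not state the formula behind these reductions. The following is a minimal sketch, assuming the standard variance of an average of equally sized, correlated annual estimates; the function names are illustrative and not taken from the report. With the lag correlations quoted above, it gives reductions of roughly 17, 25, and 34 percent for 2, 3, and 5 years, close to the figures cited.

```python
# Correlations between annual samples by lag (years apart), as quoted in the text.
LAG_CORRELATIONS = {1: 0.20, 2: 0.10, 3: 0.07, 4: 0.05}


def variance_inflation(years, correlations=LAG_CORRELATIONS):
    """Variance of a k-year average relative to k independent years.

    Assumes equal annual sample sizes and a correlation that depends
    only on how many years apart two samples are (an assumption of
    this sketch, not a statement of the report's method).
    """
    pair_sum = sum((years - lag) * correlations[lag] for lag in range(1, years))
    return 1.0 + 2.0 * pair_sum / years


def effective_sample_size(annual_n, years, independent=False):
    """Effective sample size when `years` annual samples of size annual_n are combined."""
    if independent:
        # Independent yearly samples (the case described for current NHANES).
        return annual_n * years
    return annual_n * years / variance_inflation(years)


for k in (2, 3, 5):
    reduction = 1.0 - 1.0 / variance_inflation(k)
    print(f"{k} years: reduction of about {reduction:.0%}")
```

Under these assumptions the multi-year multipliers are about 1.67, 2.25, and 3.30 times the annual effective sample, rather than 2, 3, and 5.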
The effective sample sizes in Tables 4-1 to 4-3 are approximations based on even more assumptions and averages than the numbers in Tables 3-6 to 3-8. The sample sizes in each year are subject to sampling errors and to the vagaries of erratic response rates. This is especially true for the minority subgroups with very small samples; the samples for Cubans or Hawaiians could differ in neighboring years by 10 or 20 percent from the year reflected in our tables. Also, the year-to-year correlations, resulting from the similarities in characteristics in neighboring households, are average values expected over a set of items, similar to the use of average design effects. Nevertheless, the numbers shown in Tables 4-1 to 4-3 indicate the order of magnitude of effective sample sizes and reveal whether useful analyses are possible from each of the data sets.
One feature of the monthly CPS sample should be noted. The monthly CPS includes two kinds of data sets: (1) labor force information and critical demographic items (e.g., age, sex, and household relationship) obtained each month; and (2) supplemental items covered in months other than March. The supplemental items (based on the monthly CPS sample sizes) that are likely to be of greatest interest are number of children ever born, related fertility information, and school enrollment. Voting registration and behavior in the most recent election are obtained every second year, but it is doubtful that combining pairs of years would be meaningful: voting patterns in presidential and non-presidential years are very different, and such combinations would probably not be analytically revealing. The entries for CPS in Tables 4-1 to 4-3 are restricted to the supplemental items. The annual sample sizes for labor force information, of course, are much larger than the numbers shown, since they are composed of 12 monthly samples (see Section 4.5). The supplemental items included in the March interview are based on the same sample as the other monthly supplements, except for Hispanics, for whom the sample is doubled.
Since the MEPS and NSFG samples are taken from NHIS respondents, it would be possible to supplement their samples with additional names and addresses from the NHIS. These names and addresses would be a few years old, and thus it may be more convenient to simply combine multiple years of MEPS respondents. The exact timing of these surveys, and the associated costs, would have to be examined before a decision is made on which approach would be preferable.
Section 3.6 discussed the ability of the surveys to produce reasonable precision in the analysis of the subpopulations or for cross-classifications within these subpopulations. ("Reasonable precision" is based on a subjective judgment of the importance of meeting the various standards described earlier, i.e., CVs of 30 percent, 20 percent, and 10 percent for prevalence rates of .01, .05, .10, .15, and .20.) We will use the same criteria to evaluate the analytic ability of combinations of several years of survey data. As in the case of data for a single year, some studies may need greater precision and others less, and analysts should consider whether they need to modify the summary below.
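
To make the CV standards concrete, the sketch below computes the coefficient of variation of an estimated prevalence rate from an effective sample size, and the effective sample size needed to reach a target CV. It assumes the usual binomial approximation, CV = sqrt((1 - p) / (n p)), applied to effective (design-adjusted) sample sizes; the report's own Section 3.6 calculations may differ in detail, and the function names are illustrative.

```python
import math


def cv_of_prevalence(p, effective_n):
    """Coefficient of variation of an estimated prevalence rate p."""
    return math.sqrt((1.0 - p) / (p * effective_n))


def required_effective_n(p, target_cv):
    """Smallest effective sample size whose CV does not exceed target_cv."""
    exact = (1.0 - p) / (p * target_cv ** 2)
    # Subtract a tiny tolerance so floating-point noise does not round up
    # a value that is mathematically a whole number.
    return math.ceil(exact - 1e-9)


# Effective sample sizes needed for each combination of standards in the text.
for p in (0.01, 0.05, 0.10, 0.15, 0.20):
    needs = {cv: required_effective_n(p, cv) for cv in (0.30, 0.20, 0.10)}
    print(p, needs)
```

For example, under this approximation a prevalence rate of .10 needs an effective sample of about 100 for a CV of 30 percent, while a rate of .01 needs about 9,900 for a CV of 10 percent. Comparing such thresholds with the entries in Tables 4-1 to 4-3 indicates which subgroups can support simple prevalence estimates and which can support cross-classifications.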
Three- or 5-year averages for CPS supplemental items collected in a single month of a given year would provide sample sizes large enough to satisfy analytic needs for most Hispanic subgroups, although only limited cross-classifications would be possible for Cuban-Americans. Fairly detailed analyses would be possible for American Indians or Alaska Natives, and for Chinese and Filipinos. Less detailed cross-classifications would be available for most of the other API subgroups, and only simple distributions for Hawaiians would have reasonable reliability.
The NHIS Hispanic sample is quite large, and a 2 or 3-year combination will provide quite reliable data, including cross-classifications, for all Hispanic subgroups, and moderately detailed cross-classifications for Cuban-Americans. A 5-year average will permit quite detailed analysis. A 5-year average of the American Indian or Alaska Native data set will satisfy almost all the requirements. A 3-year average could be used for Chinese and Filipinos, but 5 years are probably necessary for the other API subgroups.
NHANES has a very large sample of Mexican-Americans, and averaging over time will permit fairly detailed cross-classification analyses. The sample was deliberately set up with multi-year averages in mind. None of the other minority subgroups would be helped enough for even simple prevalence rates to have adequate precision.
| Data set | Total | Mexican-American | Puerto Rican | Cuban | Central or South American | Other Hispanic |
|---|---|---|---|---|---|---|
| CPS March | ||||||
| 2 years | 12,537 | 7,727 | 1,324 | 523 | 1,875 | 1,086 |
| 3 years | 16,891 | 10,411 | 1,784 | 704 | 2,527 | 1,463 |
| 5 years | 24,773 | 15,269 | 2,617 | 1,033 | 3,706 | 2,145 |
| CPS Monthly | ||||||
| 2 years | 6,274 | 3,863 | 663 | 262 | 940 | 546 |
| 3 years | 8,453 | 5,204 | 893 | 353 | 1,267 | 736 |
| 5 years | 12,398 | 7,633 | 1,310 | 518 | 1,858 | 1,079 |
| NHIS | ||||||
| 2 years | 24,654 | 15,441 | 2,620 | 1,298 | 3,495 | 1,802 |
| 3 years | 33,217 | 20,804 | 3,530 | 1,748 | 4,709 | 2,428 |
| 5 years | 48,718 | 30,512 | 5,178 | 2,564 | 6,907 | 3,561 |
| NIS | ||||||
| 2 years | 6,232 | 4,534 | 511 | 127 | 676 | 386 |
| 3 years | 8,397 | 6,109 | 689 | 171 | 911 | 520 |
| 5 years | 12,316 | 8,960 | 1,010 | 251 | 1,337 | 762 |
| NHANES | ||||||
| 2 years | 3,164 | 3,000 | 48 | 20 | 64 | 32 |
| 3 years | 4,746 | 4,500 | 72 | 30 | 96 | 48 |
| 5 years | 7,910 | 7,500 | 120 | 50 | 160 | 80 |
| MEPS | ||||||
| 2 years | 7,480 | 5,080 | 835 | 314 | 1,064 | 187 |
| 3 years | 10,078 | 6,844 | 1,125 | 423 | 1,433 | 252 |
| 5 years | 14,781 | 10,039 | 1,650 | 620 | 2,103 | 370 |
| MCBS | ||||||
| 2 years | 705 | 386 | 63 | 102 | 78 | 75 |
| 3 years | 950 | 520 | 86 | 137 | 106 | 101 |
| 5 years | 1,393 | 762 | 125 | 201 | 155 | 149 |
| NHSDA | ||||||
| 2 years | 3,796 | 2,406 | 401 | 160 | 548 | 282 |
| 3 years | 5,114 | 3,242 | 540 | 216 | 738 | 380 |
| 5 years | 7,501 | 4,755 | 792 | 317 | 1,082 | 558 |

NOTE: The sample cases for each data set reflect the population coverage of the respective surveys. For example, CPS-March covers all persons in the civilian noninstitutional population, whereas NSFG covers women 15 to 44 years of age. The descriptions of the respective data sets note the appropriate population coverage.

| Data set | Total API | Chinese | Filipino | Japanese | Asian Indian | Korean | Vietnamese | Hawaiian | Other |
|---|---|---|---|---|---|---|---|---|---|
| CPS March | |||||||||
| 2 years | 5,072 | 1,107 | 947 | 573 | 551 | 539 | 418 | 139 | 630 |
| 3 years | 6,833 | 1,492 | 1,276 | 772 | 743 | 727 | 563 | 187 | 848 |
| 5 years | 10,022 | 2,188 | 1,871 | 1,132 | 1,089 | 1,066 | 825 | 274 | 1,244 |
| CPS Monthly | |||||||||
| 2 years | 5,072 | 1,107 | 947 | 573 | 551 | 539 | 418 | 139 | 630 |
| 3 years | 6,833 | 1,492 | 1,276 | 772 | 743 | 727 | 563 | 187 | 848 |
| 5 years | 10,022 | 2,188 | 1,871 | 1,132 | 1,089 | 1,066 | 825 | 274 | 1,244 |
| NHIS | |||||||||
| 2 years | 4,063 | 934 | 800 | 441 | 396 | 423 | 441 | 139 | 489 |
| 3 years | 5,474 | 1,258 | 1,078 | 594 | 533 | 569 | 594 | 187 | 659 |
| 5 years | 8,029 | 1,848 | 1,581 | 871 | 782 | 835 | 871 | 274 | 967 |
| NIS | |||||||||
| 2 years | 1,506 | 341 | 292 | 175 | 169 | 165 | 129 | 43 | 192 |
| 3 years | 2,030 | 459 | 394 | 236 | 227 | 222 | 173 | 59 | 259 |
| 5 years | 2,977 | 673 | 578 | 347 | 333 | 327 | 254 | 86 | 380 |
| NHANES | |||||||||
| 2 years | 226 | 54 | 43 | 26 | 25 | 25 | 19 | 6 | 28 |
| 3 years | 340 | 81 | 65 | 39 | 38 | 38 | 29 | 10 | 42 |
| 5 years | 566 | 134 | 108 | 64 | 63 | 63 | 48 | 16 | 69 |
| MEPS | |||||||||
| 2 years | 596 | 120 | 135 | 50 | 89 | 77 | 35 | 13 | 79 |
| 3 years | 804 | 162 | 182 | 67 | 107 | 92 | 43 | 18 | 107 |
| 5 years | 1,178 | 237 | 267 | 99 | 176 | 152 | 69 | 26 | 156 |
| MCBS | |||||||||
| 2 years | 229 | 52 | 43 | 27 | 25 | 25 | 20 | 7 | 28 |
| 3 years | 308 | 70 | 59 | 36 | 34 | 34 | 27 | 9 | 38 |
| 5 years | 452 | 102 | 86 | 53 | 50 | 50 | 40 | 13 | 56 |
| NHSDA | |||||||||
| 2 years | 531 | 120 | 102 | 62 | 58 | 58 | 45 | 15 | 68 |
| 3 years | 716 | 162 | 137 | 83 | 79 | 79 | 61 | 20 | 92 |
| 5 years | 1,049 | 238 | 201 | 122 | 116 | 116 | 89 | 30 | 135 |

NOTE: The sample cases for each data set reflect the population coverage of the respective surveys. For example, CPS-March covers all persons in the civilian noninstitutional population, whereas NSFG covers women 15 to 44 years of age. The descriptions of the respective data sets note the appropriate population coverage.

| Data set | American Indian or Alaska Native |
|---|---|
| CPS March | |
| 2 years | 1,782 |
| 3 years | 2,401 |
| 5 years | 3,521 |
| CPS Monthly | |
| 2 years | 1,782 |
| 3 years | 2,401 |
| 5 years | 3,521 |
| NHIS | |
| 2 years | 1,089 |
| 3 years | 1,467 |
| 5 years | 2,152 |
| NIS | |
| 2 years | 591 |
| 3 years | 797 |
| 5 years | 1,168 |
| NHANES | |
| 2 years | 47 |
| 3 years | 71 |
| 5 years | 118 |
| MEPS | |
| 2 years | 299 |
| 3 years | 403 |
| 5 years | 591 |
| MCBS | |
| 2 years | 38 |
| 3 years | 52 |
| 5 years | 76 |
| NHSDA | |
| 2 years | 125 |
| 3 years | 169 |
| 5 years | 248 |

NOTE: The sample cases for each data set reflect the population coverage of the respective surveys. For example, CPS covers persons in the civilian noninstitutional population, whereas NSFG covers women 15 to 44 years of age. The descriptions of the respective data sets note the appropriate population coverage.

The Mexican-American samples in the NIS, MEPS, and NHSDA are fairly large, and even 2-year combinations will permit fairly detailed cross-classifications. Five-year combinations are necessary for most of the other Hispanic subgroups. Five years will permit simple analyses of NIS data for most of the API subgroups and for American Indians or Alaska Natives. However, even 5 years is not sufficient for the API subgroups and American Indians or Alaska Natives in MEPS and NHSDA. The MCBS sample of minorities is so small that 5 years fails to satisfy most of the precision requirements, except for Mexican-Americans, for whom simple distributions are possible, but not detailed cross-classifications.
The sample sizes shown for CPS in Tables 4-1 through 4-3, both March and monthly, apply to data obtained in a single month of the year. They include the March supplements (income, mobility, work experience, and several other items) and the supplemental information covered in other months, particularly school enrollment and fertility, and voting and registration, which is included every other year. However, CPS also collects labor force status each month, with the sample size shown for CPS Monthly.
Estimates of annual averages of such items as employment, unemployment, occupation, industry, and related labor force items can be produced by combining data for the 12 months of each year. There is a precedent for such annual averages; for many years CPS has produced annual unemployment rates for the larger states.
The number of observations for annual averages is 12 times the numbers for CPS monthly shown in Tables 3-3 to 3-5, but the effective sample size is lower. The CPS rotation pattern retains households in the sample for a sequence of 4 months, drops them for the next 8 months, and then reinstates them for another 4-month period. As a result, over the course of a year there are multiple observations on most of the sample persons. Furthermore, in the months when a group of sample persons is dropped, most of the sample replacements are neighboring households whose characteristics are usually correlated with those of the households they replace.
The correlations vary greatly among the labor force items. They are very high for items that tend to persist for most persons over the course of a year, e.g., whether or not a person is in the labor force or employed, and occupation. They are more moderate for unemployment. The U.S. Census Bureau has estimated both the correlations and the effective sample sizes for CPS annual averages (footnote 1). The results indicate that the effective sample size for annual estimates of the unemployment rate is five times the monthly sample. For most of the other labor force items, the effective sample size is only twice the monthly sample. Estimates of average annual unemployment rates, thus, will be based on effective sample sizes five times as large as the numbers in Tables 3-6 to 3-8, and will satisfy reasonable precision requirements for almost all the minority subgroups. The cost of obtaining annual averages will be quite low, since public use files are available.
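
As a rough illustration of what these multipliers imply, the sketch below scales a monthly effective sample size to an annual one and backs out the average month-to-month correlation consistent with each multiplier. It assumes, purely for simplicity, that every pair of the 12 monthly estimates has the same correlation; this is not the Census Bureau's method, and the monthly sample size used is hypothetical.

```python
MONTHS = 12

# Effective-sample-size multipliers for CPS annual averages, as reported in the text.
ANNUAL_MULTIPLIER = {"unemployment_rate": 5.0, "other_labor_force_items": 2.0}


def annual_effective_n(monthly_effective_n, item):
    """Effective sample size of a 12-month average for the given item."""
    return ANNUAL_MULTIPLIER[item] * monthly_effective_n


def implied_average_correlation(multiplier, months=MONTHS):
    """Average correlation between monthly estimates implied by a multiplier.

    Assumes equal correlation between every pair of months, so that the
    variance inflation of the average is 1 + (months - 1) * rho.
    """
    inflation = months / multiplier
    return (inflation - 1.0) / (months - 1)


hypothetical_monthly_n = 1000  # illustrative value, not taken from the tables
for item, multiplier in ANNUAL_MULTIPLIER.items():
    n_annual = annual_effective_n(hypothetical_monthly_n, item)
    rho = implied_average_correlation(multiplier)
    print(f"{item}: annual effective n = {n_annual:.0f}, implied rho = {rho:.2f}")
```

Under this simplification, the multiplier of five for unemployment corresponds to an average correlation of about .13 between monthly estimates, and the multiplier of two for the more persistent items corresponds to about .45.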
Home Pages:
Human Services Policy (HSP)
Assistant Secretary for Planning and Evaluation
(ASPE)
U.S. Department of Health and Human Services
(HHS)
Last updated 9/14/00