Sections 2.1 and 2.2 discussed ways of focusing on the precision levels required for various analytic uses. Since most U.S. Government surveys cover a broad array of data items, it is clear that no single standard of precision is likely to satisfy all potential uses of the data and that some compromises are necessary. It is particularly difficult to create standards of precision for a group of unrelated surveys whose specific analyses are yet to be developed at some future time. Under these circumstances, it seems sensible to use the standards for "generic" prevalence rates that were established for the study of the feasibility of producing state data from the NHIS. However, we reiterate our caveat that the standards may not satisfy all requirements, and if some new and critically important needs for statistical information on the subpopulations arise, the standards should be reviewed.
The precision levels that were examined for NHIS state level generic estimates were described in Section 2.2. We repeat them below:
To make these prevalence rates less abstract, we show some examples reported in recent U.S. Government-sponsored surveys:
| Statistic | Prevalence rate |
|---|---|
| Percent of U.S. population age 15 and over with earnings under $10,000 in 1996 | 24.9% (1) |
| U.S. poverty rate in 1998 | 12.7% (2) |
| Percent of U.S. population without health insurance coverage during all of 1998 | 16.3% (3) |
| Percent of persons of Hispanic origin without health insurance in 1998 | 35.3% (3) |
| Cocaine use by adults employed full time | 0.7% (4) |
| Cocaine use by adults employed part time | 0.9% (4) |
| Cocaine use by unemployed adults | 2.4% (4) |

(1) Source: March 1997 CPS; U.S. Census Bureau Report P60, No. 206.
(2) Source: March 1999 CPS; U.S. Census Bureau Report P60, No. 207.
(3) Source: March 1999 CPS; U.S. Census Bureau Report P60, No. 208.
(4) Source: National Household Survey of Drug Abuse.

It may be useful to convert the generic coefficients of variation to standard errors and confidence intervals for a clearer view of the effects of the sampling errors on the statistics. They are shown below.
| CV and prevalence rate | Standard error | 66% confidence interval | 95% confidence interval |
|---|---|---|---|
| 30% CV | |||
| .01 | .003 | .007-.013 | .004-.016 |
| .05 | .015 | .035-.065 | .020-.080 |
| .10 | .030 | .070-.130 | .040-.160 |
| .15 | .045 | .105-.195 | .060-.240 |
| .20 | .060 | .140-.260 | .080-.320 |
| 20% CV | |||
| .01 | .002 | .008-.012 | .006-.014 |
| .05 | .010 | .040-.060 | .030-.070 |
| .10 | .020 | .080-.120 | .060-.140 |
| .15 | .030 | .120-.180 | .090-.210 |
| .20 | .040 | .160-.240 | .120-.280 |
| 10% CV | |||
| .01 | .001 | .009-.011 | .008-.012 |
| .05 | .005 | .045-.055 | .040-.060 |
| .10 | .010 | .090-.110 | .080-.120 |
| .15 | .015 | .135-.165 | .120-.180 |
| .20 | .020 | .180-.220 | .160-.240 |
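
The conversion behind the table above can be sketched in a few lines. This is an illustrative sketch, not code from the report: the function name `precision_summary` is hypothetical, and it assumes the tabulated intervals are the prevalence rate plus or minus one and two standard errors, which is what the published figures are consistent with.

```python
def precision_summary(p, cv):
    """Return the standard error and the one- and two-standard-error intervals."""
    se = cv * p  # the CV is the standard error divided by the estimate
    one_se = (p - se, p + se)          # narrower interval in the table
    two_se = (p - 2 * se, p + 2 * se)  # 95% interval in the table
    return se, one_se, two_se


for cv in (0.30, 0.20, 0.10):
    print(f"{cv:.0%} CV")
    for p in (0.01, 0.05, 0.10, 0.15, 0.20):
        se, one_se, two_se = precision_summary(p, cv)
        print(
            f"  p={p:.2f}  SE={se:.3f}  "
            f"{one_se[0]:.3f}-{one_se[1]:.3f}  "
            f"{two_se[0]:.3f}-{two_se[1]:.3f}"
        )
```
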
Section 2.2 also notes the types of variables that are likely to have design effects beyond the average values used in this report. We have left them out of our discussion because we use a single average design effect (or, in a few cases, two) for each survey covered in this report.
Our examination assumes the prevalence rates are applied to the total of all persons in the subpopulation. In practice, analyses are frequently desired for subsets, e.g., adults or children, each sex separately, families rather than persons, persons below the poverty level, etc. Examining all possible uses of data would lead to such a wide variety of possibilities that no clear-cut decision could be made, and it seems sensible to restrict the alternatives. Basically, if subset analysis is considered of crucial importance for a survey, the sample size implied by each precision level can be thought of as applying to the subset, and the implications for the total sample for the survey can be calculated. For example, if a 30 percent CV requires a sample of 200 persons, then a sample of 400 persons is necessary if the same CV is desired for males and females separately.
Some examples of subsets and their associated prevalence rates are shown below.
| Subset and statistic | Prevalence rate |
|---|---|
| Males 15 years and over: percent with income under $10,000 | 17.2% (1) |
| Hispanic males 15 years and over: percent with income under $10,000 | 21.4% (1) |
| Males, 15-24 years of age: percent with income under $10,000 | 40.8% (1) |
| Hispanic low-income persons (under 125% of poverty): percent without health insurance | 44.0% (2) |
| Hispanic children under 18 years of age: percent without health insurance | 30.0% (2) |
| Persons 65 years and over: percent without health insurance | 1.1% (2) |
| Hispanic males, 65 years and over: percent with income below 50 percent of poverty | 4.7% (3) |

(1) Source: U.S. Census Bureau Report P60, No. 206.
(2) Source: U.S. Census Bureau Report P60, No. 208.
(3) Source: U.S. Census Bureau Report P60, No. 207.

The effective sample sizes needed to provide the precision levels for the various prevalence rates are shown in Table 3-1. They were derived from the simple formula for simple random sampling,
$$n = \frac{1 - p}{p \, (\mathrm{CV})^2},$$
where p is the prevalence rate.
Table 3-1. Effective sample sizes needed, by prevalence rate and precision level (CV)

| Prevalence rate | CV = .30 | CV = .20 | CV = .10 |
|---|---|---|---|
| 0.01 | 1,100 | 2,475 | 9,900 |
| 0.05 | 211 | 475 | 1,900 |
| 0.10 | 100 | 225 | 900 |
| 0.15 | 63 | 142 | 567 |
| 0.20 | 44 | 100 | 400 |
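
As a check on the figures in the table, the simple random sampling formula can be evaluated directly. The sketch below is illustrative only; the function name `required_effective_n` is hypothetical, and the results are rounded to the nearest whole person, which appears to be the rounding used in the table.

```python
def required_effective_n(p, cv):
    """Effective sample size needed for a target CV under simple random sampling."""
    return (1 - p) / (p * cv ** 2)


prevalence_rates = (0.01, 0.05, 0.10, 0.15, 0.20)
cv_targets = (0.30, 0.20, 0.10)

for p in prevalence_rates:
    sizes = [round(required_effective_n(p, cv)) for cv in cv_targets]
    print(f"p={p:.2f}: " + "  ".join(f"{n:,}" for n in sizes))
```

Because the required size is proportional to 1/CV², halving the CV quadruples the sample, and rare characteristics (small p) are far more demanding than common ones.
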
Section 2.4 of this report briefly described features of sample designs that affect the sampling errors and thus contribute to design effects. It was also noted that design effects can differ among statistics gathered in a survey, sometimes dramatically. As a basis for decision-making, we have chosen to use an average design effect for each survey, one that is approximately midway between the high and low values. In a few cases, we have indicated an additional design effect that applies to specific race/ethnic groups. However, an analyst who is concerned with a specific subject in a survey might prefer to use a different design effect that is more appropriate to the items to be studied. This report cannot take into account all possible analyses that could be carried out. We have tried to include enough information to permit modifications of the results for special subpopulations or items.
The design effects shown in Table 3-2 come from a number of sources (see Appendix A). Wherever possible, we have used published reports. These reports usually do not show design effects, as such, but the information on sampling errors makes it possible to calculate average design effects. When published reports with the required information were not available, the design effects were estimated from the descriptions of the sample design or through discussions with statisticians at the agencies sponsoring the surveys.
Tables 3-3, 3-4, and 3-5 show the sample sizes for all of the race/ethnic subgroups and are the same numbers reported in Tables A-1 to A-3 of the Task 2 report. As noted, these data represent approximations of the number of sample cases for each subpopulation, and were obtained either from published reports of the Federal agencies sponsoring the survey, provided by the agencies, or derived by Westat. We refer to these numbers as the "nominal sample sizes" to distinguish them from the effective sample sizes. We note that we have included all race/ethnic subgroups, including those that are not currently identified in the data set. The sources used to provide estimates of design effects are shown in Appendix A.
Table 3-2. Average design effects, by survey

| Survey | Average design effect (1) |
|---|---|
| Census | |
| Census 2000 | 1.0 |
| ACS | 1.0 |
| CPS (2) | 1.5 |
| SIPP Hispanic | 2.4 |
| SIPP API and American Indians or Alaska Natives | 1.6 |
| NCHS/CDC | |
| NHIS Hispanics and American Indians or Alaska Natives | 1.5 |
| NHIS API | 1.3 |
| NSFG Hispanics and American Indians or Alaska Natives | 1.7 |
| NSFG API | 1.4 |
| NIS | 1.3 |
| NHANES (3) Mexican-American | 2.2 |
| NHANES (3) Other minorities | 1.8 |
| AHRQ | |
| MEPS Hispanic | 1.2 |
| MEPS API and American Indians or Alaska Natives | 2.1 |
| HCFA | |
| MCBS | 1.1 |
| SAMHSA | |
| NHSDA | 2.2 |
| NCES | |
| NHES (4) | 1.4 |
| ECLS-B | 1.2 |
| ECLS-K (5) | 2.5 |

(1) Most of the surveys are based on household samples. The design effects apply to statistics that do not cluster strongly within households, e.g., health conditions, educational attainment, and labor force status. Items like poverty status, availability of health insurance, urban-rural residence, etc. generally are identical for all members of a household, and the design effects for such items are much larger, usually two to three times the ones shown in the table.
(2) The design effects are approximately the same for the March CPS and for other months.
(3) The design effects are those of statistics on data for the total of each race/ethnic group. Design effects for individual age-sex groups are lower.
(4) The design effects shown apply to statistics on children, who constitute the main focus of NHES. Data for adults are sometimes included in the survey, and they are subject to higher design effects.
(5) The design effect shown, 2.5, applies to most social, economic, and related items. The design effect for test scores is about 5.

Many of the U.S. Government surveys are repetitive, that is either carried out every year, conducted several times a year, or as in the case of CPS, conducted every month. In most cases, the sample sizes shown in this report describe the annual sample as it was in the time period noted. The reader should be aware that sample sizes are sometimes changed because of budgetary restrictions or other causes. For analysis of a data set, it would be useful to ascertain whether there is an important difference in the sample design between the time period analyzed and the reference date shown in Section 1.2. If so, the sample sizes should be modified accordingly. There are a few cases in which there may be some ambiguity in the sample size. A brief discussion of these cases follows:
The sample size in each of the 11 months (excluding March) is referred to as the CPS-Monthly sample. The March sample is similarly referred to as CPS-March. In analyzing a CPS data set for Hispanic subgroups, it therefore is important to identify the month in which the information was obtained. The March sample sizes for Asian and Pacific Islanders and American Indians or Alaska Natives are the same as in other months, so that month of data collection does not affect the sample size.
Since labor force data are collected each month in CPS, it is possible to obtain yearly averages by pooling the data sets for all 12 months of a year. However, the CPS sample retains the same sample units in a 4-month cycle, and there is about a 75 percent overlap in the sample from one month to the next. The effectiveness of the annual sample is thus very much less than 12 times the monthly sample. Section 4.5 of this report discusses the sampling errors of annual averages in CPS.
Currently, a single panel is used. The current panel was introduced in 1996 and will continue through 1999. The sample sizes for SIPP shown in this report are those of the current panel. In earlier years, a rotating panel structure was used, with several panels operating in each year. Before proceeding with a study based on SIPP, an analyst should check the number and size of panels used in the time period of interest. It should also be noted that with the current lack of rotation of the panel, there is very little to be gained by combining years (except, of course, for longitudinal analyses of changes over time.)
The sample sizes shown in this report refer to a single year's sample. Section 4.4 discusses the effect of combining years of data.
Currently, each year's MEPS sample consists of two panels: one introduced for the first time that year, and a second carried over from the preceding year. For this reason, it is important that sample sizes be verified before attempting to use these data. The MEPS sample sizes shown in this report refer both to the new panel introduced in 1999 and the panel carried over from 1998. The MCBS also has a panel structure, and we report the sample size for the four panels included in early 1998.
Table 3-3. Nominal sample sizes: Hispanic subgroups

| Data set | Total Hispanic | Mexican-Americans | Puerto Ricans | Cubans | Central or South American | Other Hispanic |
|---|---|---|---|---|---|---|
| Census | ||||||
| Census 2000 (1) | 4,508,000 | 2,850,000 | 475,000 | 190,000 | 650,000 | 335,000 |
| ACS | 900,000 | 570,000 | 95,000 | 38,000 | 130,000 | 67,000 |
| CPS-March | 11,260 | 6,940 | 1,190 | 470 | 1,685 | 975 |
| CPS-Monthly | 5,635 | 3,470 | 595 | 235 | 845 | 490 |
| SIPP | 10,845 | 7,181 | 1,172 | 372 | 1,306 | 814 |
| NCHS/CDC | ||||||
| NHIS | 22,145 | 13,869 | 2,353 | 1,165 | 3,140 | 1,618 |
| NSFG | 2,097 | 1,330 | 221 | 88 | 302 | 156 |
| NIS | 4,852 | 3,529 | 398 | 99 | 526 | 300 |
| NHANES | 1,582 | 1,500 | 24 | 10 | 32 | 16 |
| AHRQ | ||||||
| MEPS | 5,375 | 3,650 | 600 | 225 | 766 | 134 |
| HCFA | ||||||
| MCBS | 464 | 254 | 42 | 67 | 52 | 50 |
| SAMHSA | ||||||
| NHSDA | 5,000 | 3,170 | 527 | 211 | 721 | 372 |
| NCES | ||||||
| NHES | 18,804 | 13,675 | 1,541 | 385 | 2,040 | 1,162 |
| ECLS-B | 1,979 | 1,367 | 160 | 35 | 137 | 280 |
| ECLS-K | 2,957 | 2,150 | 242 | 61 | 321 | 183 |

(1) Long form data.

Table 3-4. Nominal sample sizes: Asian and Pacific Islander subgroups

| Data set | Total Asian and PI | Chinese | Filipinos | Japanese | Asian Indian | Korean | Vietnamese | Hawaiian | Other |
|---|---|---|---|---|---|---|---|---|---|
| Census | |||||||||
| Census 2000 (1) | 1,580,000 | 375,000 | 300,000 | 180,000 | 175,000 | 175,000 | 135,000 | 45,000 | 195,000 |
| ACS | 316,000 | 75,000 | 60,000 | 36,000 | 35,000 | 35,000 | 27,000 | 9,000 | 39,000 |
| CPS-March | 4,555 | 995 | 850 | 515 | 495 | 485 | 375 | 125 | 565 |
| CPS-Monthly | 4,555 | 995 | 850 | 515 | 495 | 485 | 375 | 125 | 565 |
| SIPP | 3,293 | 745 | 637 | 386 | 370 | 362 | 280 | 95 | 421 |
| NCHS/CDC | |||||||||
| NHIS | 3,284 | 755 | 647 | 356 | 320 | 342 | 356 | 112 | 396 |
| NSFG | 327 | 74 | 63 | 38 | 37 | 36 | 28 | 9 | 42 |
| NIS | 1,172 | 265 | 227 | 137 | 131 | 129 | 100 | 33 | 150 |
| NHANES | 113 | 27 | 22 | 13 | 12 | 12 | 10 | 3 | 14 |
| AHRQ | |||||||||
| MEPS | 750 | 152 | 170 | 62 | 111 | 96 | 45 | 17 | 97 |
| HCFA | |||||||||
| MCBS | 151 | 34 | 29 | 18 | 17 | 17 | 13 | 4 | 19 |
| SAMHSA | |||||||||
| NHSDA | 700 | 158 | 135 | 82 | 78 | 77 | 59 | 20 | 90 |
| NCES | |||||||||
| NHES | 4,420 | 999 | 855 | 517 | 495 | 486 | 376 | 128 | 566 |
| ECLS-B | 2,483 | 705 | 467 | 134 | 282 | 278 | 217 | 74 | 325 |
| ECLS-K | 1,870 | 423 | 362 | 219 | 209 | 206 | 159 | 54 | 239 |

(1) Long form data.

Table 3-5. Nominal sample sizes: American Indians and Alaska Natives

| Data set | American Indian and Alaska Native |
|---|---|
| Census | |
| Census 2000 (1) | 330,000 |
| ACS | 67,000 |
| CPS-March | 1,600 |
| CPS-Monthly | 1,600 |
| SIPP | 1,200 |
| NCHS/CDC | |
| NHIS | 978 |
| NSFG | 77 |
| NIS | 460 |
| NHANES | 24 |
| AHRQ | |
| MEPS | 375 |
| HCFA | |
| MCBS | 25 |
| SAMHSA | |
| NHSDA | 166 |
| NCES | |
| NHES | 1,675 |
| ECLS-B | 50 |
| ECLS-K | 364 |

(1) Long form data.

The effective sample sizes are simply the nominal sample sizes divided by the design effects. They are shown in Tables 3-6 to 3-8. The effective sample sizes will be used to identify data sets that satisfy minimum standards of reliability.
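
The division described above is straightforward to reproduce. The following sketch is illustrative only: the function name `effective_sample_size` is hypothetical, and the three examples pair nominal sample sizes for the total Hispanic population with the average design effects given in Table 3-2.

```python
def effective_sample_size(nominal_n, design_effect):
    """Effective sample size: the nominal size deflated by the design effect."""
    return nominal_n / design_effect


# (nominal sample size, average design effect) for the total Hispanic sample
examples = {
    "CPS-March": (11260, 1.5),
    "SIPP": (10845, 2.4),
    "NHSDA": (5000, 2.2),
}

for survey, (nominal_n, deff) in examples.items():
    print(f"{survey}: {round(effective_sample_size(nominal_n, deff)):,}")
```

An analyst who prefers an item-specific design effect to the survey average can substitute it for the second argument.
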
Table 3-6. Effective sample sizes: Hispanic subgroups

| Data set | Total Hispanic | Mexican-American | Puerto Rican | Cuban | Central or South American | Other Hispanic |
|---|---|---|---|---|---|---|
| Census | ||||||
| Census 2000 (1) | 4,508,000 | 2,850,000 | 475,000 | 190,000 | 650,000 | 335,000 |
| ACS | 900,000 | 570,000 | 95,000 | 38,000 | 130,000 | 67,000 |
| CPS-March | 7,507 | 4,627 | 793 | 313 | 1,123 | 650 |
| CPS-monthly | 3,757 | 2,313 | 397 | 157 | 563 | 327 |
| SIPP | 4,519 | 2,992 | 488 | 155 | 544 | 339 |
| NCHS/CDC | ||||||
| NHIS | 14,763 | 9,246 | 1,569 | 777 | 2,093 | 1,079 |
| NSFG | 1,234 | 782 | 130 | 52 | 178 | 92 |
| NIS | 3,732 | 2,715 | 306 | 76 | 405 | 231 |
| NHANES | 727 | 682 | 12 | 6 | 18 | 9 |
| AHRQ | ||||||
| MEPS | 4,479 | 3,042 | 500 | 188 | 637 | 112 |
| HCFA | ||||||
| MCBS | 422 | 231 | 38 | 61 | 47 | 45 |
| SAMHSA | ||||||
| NHSDA | 2,273 | 1,441 | 240 | 96 | 328 | 169 |
| NCES | ||||||
| NHES | 13,431 | 9,768 | 1,101 | 275 | 1,457 | 830 |
| ECLS-B | 1,649 | 1,139 | 133 | 29 | 114 | 233 |
| ECLS-K | 1,183 | 860 | 97 | 24 | 128 | 73 |

(1) Long form data.

Table 3-7. Effective sample sizes: Asian and Pacific Islander subgroups

| Data set | Total Asian and PI | Chinese | Filipinos | Japanese | Asian Indian | Korean | Vietnamese | Hawaiian | Other |
|---|---|---|---|---|---|---|---|---|---|
| Census | |||||||||
| Census 2000 (1) | 1,580,000 | 375,000 | 300,000 | 180,000 | 175,000 | 175,000 | 135,000 | 45,000 | 195,000 |
| ACS | 316,000 | 75,000 | 60,000 | 36,000 | 35,000 | 35,000 | 27,000 | 9,000 | 39,000 |
| CPS-March | 3,037 | 663 | 567 | 343 | 330 | 323 | 250 | 83 | 377 |
| CPS-Monthly | 3,037 | 663 | 567 | 343 | 330 | 323 | 250 | 83 | 377 |
| SIPP | 2,058 | 466 | 398 | 241 | 231 | 226 | 175 | 59 | 263 |
| NCHS/CDC | |||||||||
| NHIS | 2,433 | 559 | 479 | 264 | 237 | 253 | 264 | 83 | 293 |
| NSFG | 234 | 53 | 45 | 27 | 26 | 26 | 20 | 6 | 30 |
| NIS | 902 | 204 | 175 | 105 | 101 | 99 | 77 | 26 | 115 |
| NHANES | 63 | 15 | 12 | 7 | 7 | 7 | 5 | 2 | 8 |
| AHRQ | |||||||||
| MEPS | 357 | 72 | 81 | 30 | 53 | 46 | 21 | 8 | 46 |
| HCFA | |||||||||
| MCBS | 137 | 31 | 26 | 16 | 15 | 15 | 12 | 4 | 17 |
| SAMHSA | |||||||||
| NHSDA | 318 | 72 | 61 | 37 | 35 | 35 | 27 | 9 | 41 |
| NCES | |||||||||
| NHES | 3,157 | 714 | 611 | 369 | 354 | 347 | 269 | 91 | 404 |
| ECLS-B | 2,069 | 588 | 389 | 112 | 235 | 232 | 181 | 62 | 271 |
| ECLS-K | 748 | 169 | 145 | 88 | 84 | 82 | 64 | 22 | 96 |

(1) Long form data.

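Table 3-8. Effective sample sizes: American Indians and Alaska Natives

| Data set | American Indian and Alaska Native effective sample size |
|---|---|
| Census | |
| Census 2000 (1) | 330,000 |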
| ACS | 67,000 |
| CPS-March | 1,067 |
| CPS-Monthly | 1,067 |
| SIPP | 1,000 |
| NCHS/CDC | |
| NHIS | 652 |
| NSFG | 45 |
| NIS | 354 |
| NHANES | 12 |
| AHRQ | |
| MEPS | 179 |
| HCFA | |
| MCBS | 23 |
| SAMHSA | |
| NHSDA | 75 |
| NCES | |
| NHES | 1,196 |
| ECLS-B | 148 |
| ECLS-K | 146 |

(1) Long form data.

A comparison of the effective sample sizes in Tables 3-6 to 3-8 with the numbers needed to meet alternate levels of precision shown in Table 3-1 indicates which race/ethnic subgroups meet these standards for each of the surveys.
We should like to reiterate the caveats mentioned earlier in the discussion of these standards. The sample sizes in Table 3-1 will provide the coefficient of variation for the indicated estimate of prevalence of the total population in the race/ethnic subgroup (or of the total target population of the survey, e.g., females 15-44 for NSFG, persons 65 or older for MCBS, etc.). If the contemplated analysis includes examining subsets of the total, such as individual age groups, urban-rural residence, or low-income vs. higher-income persons, much larger sample sizes are needed; essentially, each subset would require approximately the sample sizes shown in Table 3-1. Since the specific studies to be carried out have not yet been developed, this report does not contain a provision for subset analysis, but the possibility of the need for such statistical breakdowns and their implications should be kept in mind.
Most of the surveys use an identical sampling rate for all persons in each race/ethnic group. In these surveys, the sample size for any subset can be estimated by taking the proportion of the sample equal to the proportion of the relevant population in that subset. For example, for analysis of data by gender, the male (and female) sample will be equal to about one-half the total sample. Similarly, for an age group containing about 20 percent of the relevant population, the sample will be 20 percent of the total sample in the race/ethnic subgroup. Similar relationships hold for other subsets, such as regional breakdowns, income classes, etc. For subset analyses, the nominal and effective sample sizes in the tables, which follow, should be adjusted to reflect the portion of the subgroup to be analyzed.
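
The proportional adjustment described above can be sketched as follows. This is an illustrative sketch that assumes a common sampling rate within the race/ethnic group; the function names are hypothetical, and the figure of 4,627 is the CPS-March effective sample size for Mexican-Americans.

```python
def subset_sample_size(total_n, population_share):
    """Expected sample size for a subset under a common sampling rate."""
    return total_n * population_share


def total_needed_for_subset(required_n, population_share):
    """Total sample needed so that a subset alone reaches required_n."""
    return required_n / population_share


cps_march_mexican_american = 4627
print(subset_sample_size(cps_march_mexican_american, 0.5))  # one sex
print(subset_sample_size(cps_march_mexican_american, 0.2))  # an age group with 20% of the population

# Reverse direction: a target of 200 persons in each sex implies a total of 400.
print(total_needed_for_subset(200, 0.5))
```

The second function restates the report's own example: a requirement of 200 persons applied to males and females separately implies a total sample of 400.
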
There are a few exceptions to the use of a common sampling rate for all members of a subgroup. NHANES focuses on 52 age-sex-race/ethnicity subsets, and uses approximately the same sample sizes for each. The 52 groups are described in several reports on the methodology of NHANES, and analysts concerned with subsets of the race/ethnicity subgroups should refer to the NHANES publications for appropriate methods of estimating the sample sizes. SIPP oversamples persons in poverty. For subset analyses comprising persons in poverty (or items correlated with poverty), the analyst should obtain a description of the current SIPP sample and use it to estimate the sample size.
Secondly, the design effects in Table 3-2 that were inputs to the calculation of the effective sample sizes basically apply to data that are not heavily clustered within households. Examples of statistics that are not clustered, or only moderately clustered are: smoking status, presence of specific chronic illnesses such as hypertension or arthritis, occupation, and very large expenditures for medical care during the year. For such items, members of a household are unlikely to have the same characteristics. On the other hand, as is indicated in footnote 1 of Table 3-2, items such as poverty status, health insurance, urban-rural residence, etc. tend to be identical for all members in a household, and the design effects are usually two to three times as large as those in Table 3-2. Other examples of items with high clustering effects are: mobility status, whether or not foreign born, and income class. Such items will tend to be identically reported within a household so that obtaining the statistics from all members of a household is no more useful than an interview with only one household member. In such instances, the design effect is increased by a factor equal to the average household size, that is by a factor of about 3.5 for Asian and Pacific Islanders, 4.3 for American Indians and Alaska Natives and 3.6 for Hispanics. The average household size (and consequently the design effects) can differ among the subgroups that are the focus of this report. For example, the average household size for Hispanic subgroups varies from a low of 2.6 for Cubans to 3.9 for Mexicans. An analyst should check the household sizes of the subgroups to be studied if highly clustered items are important variables, and modify the design effects accordingly. An alternate way of accomplishing the same goal for highly clustered items is to treat the sample size as the number of households in the sample rather than the number of persons. 
The nominal and effective sample sizes in the various tables should then be divided by the average household size.
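
For highly clustered items, the two adjustments described above (multiplying the design effect by the average household size, or dividing the sample size by it) give the same result. The sketch below is illustrative only: the function name and dictionary are hypothetical, and the household sizes are the approximate averages quoted in the text.

```python
AVERAGE_HOUSEHOLD_SIZE = {
    "Asian and Pacific Islander": 3.5,
    "American Indian and Alaska Native": 4.3,
    "Hispanic": 3.6,
}


def clustered_effective_n(nominal_n, design_effect, household_size):
    """Effective sample size for items that are identical within a household."""
    return nominal_n / (design_effect * household_size)


# CPS-March total Hispanic sample: nominal size 11,260, design effect 1.5
n_eff = clustered_effective_n(11260, 1.5, AVERAGE_HOUSEHOLD_SIZE["Hispanic"])
print(round(n_eff))
```

For subgroups whose household size differs from the group average (for example, 2.6 for Cubans and 3.9 for Mexicans), the subgroup value would be passed instead.
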
Results of the comparisons of Tables 3-6 to 3-8 with Table 3-1 are summarized below. Table 3-1 indicates the sample size cut-offs for various levels of confidence in the data. Thus, an effective sample size of 500 satisfies requirements for a 20 percent CV for all prevalence rates except very rare ones (i.e., p = .01); an effective sample of 1,000 will provide a CV of .10 on prevalence rates greater than or equal to .10, as well as satisfying the criteria mentioned for a sample of 500; and a sample size of about 2,000 to 2,500 will produce CVs of .20 or better for prevalence rates as low as .01.
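
The cut-offs quoted above follow from inverting the Table 3-1 formula to obtain the CV achieved by a given effective sample size, CV = sqrt((1 - p) / (n p)). The sketch below is illustrative only; the function names are hypothetical, and a small tolerance is added so that floating-point rounding does not flip borderline cases.

```python
import math

PREVALENCE_RATES = (0.01, 0.05, 0.10, 0.15, 0.20)
CV_TARGETS = (0.30, 0.20, 0.10)


def achieved_cv(effective_n, p):
    """CV of an estimated prevalence rate p under simple random sampling."""
    return math.sqrt((1 - p) / (effective_n * p))


def targets_met(effective_n):
    """Map each (prevalence rate, CV target) pair to whether it is satisfied."""
    return {
        (p, cv): achieved_cv(effective_n, p) <= cv + 1e-12
        for p in PREVALENCE_RATES
        for cv in CV_TARGETS
    }


for n in (500, 1000, 2500):
    met = [pair for pair, ok in targets_met(n).items() if ok]
    print(n, met)
```
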
For prevalence levels of .05 or greater, the March sample of Central or South Americans satisfies almost all of the requirements. The March samples of Puerto Ricans and "other" Hispanics are satisfactory for prevalence rates of .05 or greater when CVs of .20 or more are required, and for rates of .15 or greater when a CV of .10 is needed.
Other than for March, the monthly CPS samples for all Hispanic groups except Mexican-Americans and Central or South Americans are fairly small and only provide the sample needed for CVs of .20 or greater with rates of .05 or more. This is also true of the March Cuban sample; the monthly Cuban sample produces even less reliability.
The CPS American Indian or Alaska Native sample is sufficient for CVs of .20 or greater with prevalence rates of .05 or greater. It is also large enough to provide a CV of .10 when the prevalence rates are .10 or greater.
The Chinese and Filipino samples are inadequate when the rate is as low as .01, but will provide CVs of .20 for rates .05 or greater, and CVs of .10 for rates above .15. The Japanese, Asian-Indian, Korean and "other" API samples are quite similar and are mostly sufficient to provide CVs of .20 for rates of .10 or greater and a CV of .30 for a .05 rate. The Hawaiian sample is quite small and satisfies hardly any of the requirements.
The Central and South American annual sample satisfies the criteria for precision for prevalence rates of .05 or greater. The Puerto Rican and "other" Hispanic annual samples meet all the requirements when the prevalence rate is .05 or greater, except the goal of a CV of .10 for a prevalence rate of .05. The smaller Cuban sample is still large enough to obtain a CV of .20 or better for prevalence rates of .05 or greater, and to provide a CV of .10 for prevalence rate of .15 or more.
The annual American Indian or Alaska Native sample is close to that of Cubans, and will achieve approximately the same levels of precision.
The Chinese and Filipino samples are a little smaller than the American Indian or Alaska Native sample, but they still will satisfy similar goals, that is, they will provide CVs of .20 or better for prevalence rates of .05 or greater. The other Asian and Pacific Islander groups will only meet the most modest criteria, a .30 CV for rates of .05 or greater and a .20 CV when the rate is .10 or more.
The analysis above can be summarized as follows. The vital statistics records, Census 2000, and the ACS will permit detailed and complex analyses of all race/ethnic subpopulations. The March CPS, the NHIS, and NHES can produce quite accurate statistics for Mexican-Americans, moderately good data for Puerto Ricans and Central or South Americans, and acceptable data for the other Hispanic subgroups, with the possible exception of Cubans. Data for Chinese, Filipinos, and American Indians or Alaska Natives would be fairly reliable. Only limited analysis could be made of data for the remaining API subgroups. The monthly CPS and SIPP would be weaker for Hispanics, but mostly still acceptable. For the other surveys, acceptable precision is only possible for Mexican-Americans, and MCBS would not even be acceptable for that subgroup.
It is important to remember that the above analyses apply to the ability of the surveys to provide acceptable accuracy on prevalence rates (or percentage distributions) of total persons in each subpopulation. Many surveys require examination of important subsets of the population, as well as the total. For example, NHANES concentrates on age-sex-race/ethnicity subgroups, MEPS examines low-income persons as well as the total population, and an analytic group in the NSFG is teenagers, by race/ethnicity. For such analyses, each subset needs to have the sample sizes shown in Table 3-1. Thus, a simple four-way breakdown of the population, such as persons under or over 25 years by sex, would require a sample four times as great as the numbers in Table 3-1.
Table 3-9 contains guidance on the ability of the various databases to provide acceptable precision levels, as follows:
The classifications are subjective, and it is possible to reach different conclusions on the levels of precision that are reasonable. An analyst should determine how much error can be tolerated before reaching a conclusion on the detailed analysis to be carried out. Once again, given the possible changes in sample size or design, as well as the use of overlapping samples, we urge that, prior to using a particular data file, the current sample sizes and design effects be verified.
Table 3-9. Ability of the databases to provide acceptable precision levels: Hispanic subgroups and American Indians or Alaska Natives

| Database | Mexican-American | Puerto Rican | Cuban | Central & South American | Other Hispanic | American Indian or Alaska Native |
|---|---|---|---|---|---|---|
| Census | | | | | | |
| Census 2000 | A | A | A | A | A | A |
| ACS | A | A | A | A | A | A |
| CPS-March | A | C | C | B | C | B |
| CPS-Monthly | B | C | D | C | C | B |
| SIPP | B | C | D | C | C | B |
| NCHS/CDC | | | | | | |
| NHIS | A | B | C | B | B | C |
| NSFG | C | D | D | D | D | D |
| NIS | B | C | D | C | C | C |
| NHANES | C | D | D | D | D | D |
| AHRQ | | | | | | |
| MEPS | B | C | D | C | C | D |
| HCFA | | | | | | |
| MCBS | C | D | D | D | D | D |
| SAMHSA | | | | | | |
| NHSDA | B | C | D | C | D | D |
| NCES | | | | | | |
| NHES | A | B | C | B | C | B |
| ECLS-B | B | D | D | D | C | D |
| ECLS-K | C | D | D | D | D | D |

Table 3-9 (continued). Asian and Pacific Islander subgroups

| Database | Chinese | Filipino | Japanese | Asian Indian | Korean | Vietnamese | Hawaiian | Other |
|---|---|---|---|---|---|---|---|---|
| Census | | | | | | | | |
| Census 2000 | A | A | A | A | A | A | A | A |
| ACS | A | A | A | A | A | A | A | A |
| CPS-March | C | C | C | C | C | C | D | C |
| CPS-Monthly | C | C | C | C | C | C | D | C |
| SIPP | C | C | C | C | C | D | D | C |
| NCHS/CDC | | | | | | | | |
| NHIS | C | C | C | C | C | C | D | C |
| NSFG | D | D | D | D | D | D | D | D |
| NIS | C | D | D | D | D | D | D | D |
| NHANES | D | D | D | D | D | D | D | D |
| AHRQ | | | | | | | | |
| MEPS | D | D | D | D | D | D | D | D |
| HCFA | | | | | | | | |
| MCBS | D | D | D | D | D | D | D | D |
| SAMHSA | | | | | | | | |
| NHSDA | D | D | D | D | D | D | D | D |
| NCES | | | | | | | | |
| NHES | C | C | C | C | C | C | D | C |
| ECLS-B | C | C | D | C | C | D | D | C |
| ECLS-K | D | D | D | D | D | D | D | D |

The ability to produce acceptable data also depends on whether the survey collects the detailed race/ethnicity description of each sample person and enters the code in the data set. The Task 2 report indicated a few cases in which not all subpopulations were identified. Many of the surveys simply ask whether the sample person is an Asian or Pacific Islander without obtaining additional detail. The NVS, both natality and mortality, record the identification of Chinese, Japanese, Hawaiian, and Filipinos in all 50 states, but identify the other ethnic groups -- Vietnamese, Asian-Indian, Korean, Samoans, and Guamanians -- in only nine states which contain about two-thirds of the U.S. population in each of these groups. Obviously, the identifications and coding in the surveys and the NVS would need to be expanded to make tabulations possible.
Last updated 9/14/00