Extending the Utility of Federal Data Bases. Limitations of Report


This Task 3 report is essentially limited to the effect of sampling errors on the reliability of the various data bases and to possible methods of improving precision when sample sizes are inadequate. There are, of course, other factors that affect the quality of surveys and data files. A complete discussion of these factors is beyond the scope of this report, but we wish to call particular attention to several specific issues:

  • The reports are intended as a general reference for a potential audience of analysts and policy makers seeking information on the possible use of these data bases as a source of data on the race/ethnic group of interest, rather than as a technical handbook. We suggest that users who are not thoroughly familiar with the content of the data base being considered, and with the procedures involved in data collection and data processing, seek appropriate technical assistance from the staff of the relevant agency or from documentation of the survey methods. Two items should be examined in particular:
    1. Are the sample sizes shown in Tables 3-3, 3-4, and 3-5 still applicable, or have important modifications been made to the surveys' sample designs? We note that small changes in sample size, on the order of 10 or 15 percent, will have only a negligible effect on the conclusions drawn in this report and can be ignored. Important changes in the sample, however, should be taken into account.
    2. What is known about sources of error in the data, including those arising from possible problems in identifying the race/ethnic groups, respondents' lack of information on some of the subject matter items or misunderstanding of various questions, and potential effects of nonresponse? For example, NCHS studies indicate that there may be important problems with death rates for Hispanics, Asian and Pacific Islanders, and American Indians and Alaskan Natives due to misunderstanding of the race question on death certificates or in the censuses and surveys used as the denominators of the death rates. Similar reporting errors and differential nonresponse could affect other statistics.
  • It is possible to think of sampling errors in a somewhat broader sense than that in which the term is used in this report. Statisticians distinguish between descriptive and analytic uses of survey data. Descriptive uses provide a profile of a finite population, namely the population that existed during the period of data collection. Analytic uses occur when survey results are used to examine a process, frequently a "cause and effect" relationship, with the population at the time of data collection considered as a sample from an infinite population. The particular year or time period of the study can be considered a single observation from a stochastic process, with neighboring years providing additional observations (for a few years, before long-term trends disrupt this model of behavior). NCHS views birth rates as subject to stochastic variation. Similarly, analytic uses would include examination of the effect of educational attainment on income, the relationship of obesity to various health conditions, etc.

    Stochastic processes are subject to sampling errors arising from the erratic variations over time of the statistics studied. In most of the data bases examined for this report, the effect of this source of variation will be trivial compared to the sampling errors due to the sample sizes for data collection. The NVS and the Census short forms, however, have no sampling errors, although analyses based on them are still subject to a small amount of stochastic variation. NCHS has studied the effects of stochastic variation on birth and death rates, and more detailed information can be obtained from the agency. We note that this Task 3 report is restricted to limitations of the data due to sampling error.

  • Information from several sources, each of which is subject to sampling errors and/or other limitations, is often combined for analysis. For example, although the numerators of birth and death rates come from vital statistics records that are not subject to sampling errors, the denominators are derived from census reports; some of the census data are based on sample surveys, and others on extrapolation of census data to intercensal time periods. This report does not deal with such special situations, but users who anticipate such analyses should take the more complex sampling into account and, if necessary, seek advice from the agency technical staff.


1.  The sample sizes are used to estimate the sampling errors applicable to subgroup analysis, and thus to determine whether subgroup data for each survey can be obtained with a reasonable degree of precision. The effective sample size, in which the actual sample sizes shown in Tables 3-3 to 3-5 are deflated by the design effect, is a better guide to the sampling error. Section 2.3 of this report contains a discussion of design effects, and Tables 3-6 to 3-8 show average effective sample sizes.
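
The relationship between actual and effective sample size, and the reason a 10 or 15 percent change in sample size matters little, can be illustrated with a minimal sketch. It assumes the conventional reading of "deflated by the design effect" (actual sample size divided by the design effect) and uses the simple-random-sampling formula for the standard error of a proportion as an approximation. The function names and all numeric inputs are hypothetical and are not taken from the tables in this report.

```python
import math


def effective_sample_size(n, deff):
    """Actual sample size n deflated (divided) by the design effect deff."""
    if n <= 0 or deff <= 0:
        raise ValueError("n and deff must be positive")
    return n / deff


def se_proportion(p, n, deff=1.0):
    """Approximate standard error of an estimated proportion p.

    Assumption: sqrt(p * (1 - p) / n_eff), i.e. the simple random
    sampling formula applied to the effective sample size.
    """
    n_eff = effective_sample_size(n, deff)
    return math.sqrt(p * (1.0 - p) / n_eff)


def relative_se_change(fractional_change_in_n):
    """Relative change in the standard error when n changes by the given
    fraction, using the fact that the standard error is proportional
    to 1 / sqrt(n)."""
    return 1.0 / math.sqrt(1.0 + fractional_change_in_n) - 1.0


if __name__ == "__main__":
    # Hypothetical subgroup: 1,200 sample cases with a design effect of 1.5.
    n_eff = effective_sample_size(1200, 1.5)
    print(f"effective sample size: {n_eff:.0f}")
    print(f"SE of a 20% estimate:  {se_proportion(0.20, 1200, 1.5):.4f}")

    # Effect of modest changes in sample size on the standard error.
    for change in (-0.15, -0.10, 0.10, 0.15):
        print(f"n changes by {change:+.0%}: "
              f"SE changes by {relative_se_change(change):+.1%}")
```

Because the standard error varies with the inverse square root of the sample size, a 10 or 15 percent change in sample size moves the standard error by less than 10 percent in this sketch, which is consistent with treating such changes as negligible.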