Deriving State-Level Estimates from Three National Surveys: A Statistical Assessment and State Tabulations. VII. GLOSSARY

Bias -- The difference between the sample statistic and the population statistic caused by factors other than random error. If a sample statistic is biased, then repeating the survey many times would produce a distribution of sample statistics that would be centered around something other than the population value for the statistic. Thus, a biased sample statistic would have a tendency to be either too small or too large as an estimate of the population statistic. One common source of bias in all surveys occurs when the nonrespondents have different characteristics from the respondents.

Cluster -- A naturally occurring unit like a school (which has many classrooms, students, and teachers). Other clusters include universities, hospitals, cities, states, Census blocks, and living quarters. The clusters are randomly selected, and all members, or a random sample, of the selected cluster are included in the sample.

Coefficient of variation -- The standard error of an estimate divided by the estimate itself (for example, an estimated mean), often expressed as a percentage.
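
As a rough illustration, the ratio can be computed as in the sketch below. The function name and the figures are hypothetical, and the sketch assumes the standard error is divided by the estimate it accompanies (here an estimated mean or rate).

```python
def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """Return the standard error as a fraction of the estimate."""
    if estimate == 0:
        raise ValueError("the coefficient of variation is undefined for a zero estimate")
    return standard_error / abs(estimate)


# Hypothetical figures: an estimated rate of 12.5 percent with a standard error of 1.0.
cv = coefficient_of_variation(12.5, 1.0)
print(f"CV = {cv:.1%}")
```

A smaller coefficient of variation indicates a more precise estimate relative to its size, which makes the measure convenient for comparing estimates on different scales.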

Composite estimation -- Use of an estimator that is a weighted average of two other estimators. Frequently a composite is constructed from a direct sample-based estimator and a model-based estimator.
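
The weighted average can be sketched as follows. The function names and numbers are hypothetical. The weighting rule shown (weight the direct estimator by the model-based estimator's share of the combined error) is one common choice that assumes the two estimators' errors are independent; it is not necessarily the rule used in this report.

```python
def composite_estimate(direct: float, model_based: float, weight: float) -> float:
    """Weighted average of a direct and a model-based estimate.

    `weight` is the share given to the direct estimate and must lie in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * direct + (1.0 - weight) * model_based


def variance_based_weight(var_direct: float, mse_model: float) -> float:
    """Assumed weighting rule: favor whichever estimator has the smaller error."""
    return mse_model / (var_direct + mse_model)


# Hypothetical state-level inputs.
w = variance_based_weight(var_direct=4.0, mse_model=1.0)
estimate = composite_estimate(direct=10.0, model_based=14.0, weight=w)
print(f"weight on direct estimator = {w:.2f}, composite = {estimate:.2f}")
```

With a small state sample the direct estimator is noisy, so the weight shifts toward the model-based estimator; with a large sample it shifts back toward the direct estimator.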

Confidence interval -- A range of values, computed from the sample, that is expected to contain the true population parameter. The confidence level (for example, 95 percent) specifies the proportion of such intervals, over repeated samples, that would contain the true parameter value.
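
A minimal sketch of one common construction follows. It assumes the estimate is approximately normally distributed and unbiased, which the glossary entry does not state; the function name and figures are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, standard_error: float, level: float = 0.95):
    """Return (lower, upper) bounds of a normal-approximation confidence interval."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    # Two-sided critical value, about 1.96 for a 95 percent level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical figures: an estimate of 12.5 with a standard error of 1.0.
low, high = confidence_interval(12.5, 1.0)
print(f"95% interval: {low:.2f} to {high:.2f}")
```

For complex survey designs, the standard error passed in should already reflect the design (for example, through the design effect), otherwise the interval will be too narrow.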

Design effect -- The sampling variance of an estimate under the actual complex design used to select a sample, divided by the sampling variance the estimate would have under a simple random sample of the same size. This measure reflects the effect on the precision of a survey estimate due to the difference between the sample design actually used to collect data and a simple random sample.

Effective sample size -- The actual sample size divided by the design effect that reflects the effect of the deviations from simple random sampling.
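
The two ratios defined above (design effect and effective sample size) can be sketched together. The function names and the variance figures are hypothetical.

```python
def design_effect(var_complex: float, var_srs: float) -> float:
    """Sampling variance under the complex design divided by the variance
    under a simple random sample of the same size."""
    if var_srs <= 0:
        raise ValueError("the simple-random-sample variance must be positive")
    return var_complex / var_srs


def effective_sample_size(actual_n: int, deff: float) -> float:
    """Actual sample size divided by the design effect."""
    if deff <= 0:
        raise ValueError("the design effect must be positive")
    return actual_n / deff


# Hypothetical figures for a clustered sample of 3,000 respondents.
deff = design_effect(var_complex=0.0006, var_srs=0.0004)
n_eff = effective_sample_size(3000, deff)
print(f"design effect = {deff:.2f}, effective sample size = {n_eff:.0f}")
```

A design effect above 1 means the complex design is less precise than a simple random sample of the same size, so the effective sample size is smaller than the actual one.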

Estimator (biased, unbiased) -- A random variable used to estimate the value of a population parameter from sample data. Its value depends on the particular sample involved. If the expected value of the estimator over all possible samples is equal to the quantity it estimates, the estimator is unbiased. If it is not, the estimator is biased.

Mean square error -- Measure of accuracy computed by squaring the individual errors (an error is the difference between an estimate and the true value it is meant to estimate) and taking the mean of these squared values. For an estimator, the mean square error equals its sampling variance plus the square of its bias.
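
The sketch below works through a small made-up case. It assumes one common reading for a survey estimator, in which the error is the gap between an estimate and the true population value, and checks that the mean square error then splits into variance plus squared bias. The estimates and the true value are hypothetical.

```python
from statistics import fmean, pvariance


def mean_square_error(estimates: list, true_value: float) -> float:
    """Mean of the squared differences between each estimate and the true value."""
    return fmean([(e - true_value) ** 2 for e in estimates])


# Hypothetical estimates from four repeated samples; the true value is 10.
estimates = [9.0, 11.0, 12.0, 12.0]
true_value = 10.0

mse = mean_square_error(estimates, true_value)
bias = fmean(estimates) - true_value   # systematic error of the estimator
variance = pvariance(estimates)        # spread of the estimates around their own mean

print(f"MSE = {mse}, variance + bias^2 = {variance + bias ** 2}")
```

Because the decomposition holds exactly for any set of estimates, an estimator with a small bias can still have a lower mean square error than an unbiased one if its variance is much smaller.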

Multi-stage probability sample -- A sample drawn in successive stages. The population is first divided into primary groups (called primary sampling units or PSUs), some of which are selected (for example, with a probability proportional to their population size). Selected PSUs are then divided into clusters (e.g., of blocks), from which a sample (e.g., of households) is drawn.

Nonsampling error -- The discrepancy between a sample statistic and the true population parameter that results from factors other than the sampling process. Common sources of nonsampling errors include noncoverage of certain subpopulations, questionnaire wording, and recall errors.

Panel survey -- A survey that follows a given sample of individuals over time, thus providing multiple observations on each individual in the sample.

Precision -- The precision is the inverse of the amount of random error in an estimate. It indicates how close an estimate is likely to be to the true population value (see standard error).

Primary sampling unit (PSU) -- Groups selected as the first stage of a multi-stage sample. For example, for the CPS sample, the United States is divided into approximately 1,900 geographic areas, or PSUs, of which 729 are selected for the sample.

Ratio adjustment -- Potentially biased indirect state-level estimates can be ratio adjusted to regional totals so that the sum across states matches regional estimates. This eliminates bias at the regional level and attempts to remove bias from the state-level indirect estimator.
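
A minimal sketch of the adjustment follows, using hypothetical state labels and totals. Each indirect state estimate is scaled by the same factor so that the states in a region sum to the regional estimate.

```python
def ratio_adjust(state_estimates: dict, regional_total: float) -> dict:
    """Scale state estimates proportionally so they sum to the regional total."""
    raw_sum = sum(state_estimates.values())
    if raw_sum == 0:
        raise ValueError("cannot ratio adjust when the state estimates sum to zero")
    factor = regional_total / raw_sum
    return {state: value * factor for state, value in state_estimates.items()}


# Hypothetical indirect estimates (in thousands) for three states in one region,
# and a direct regional estimate of 220 thousand.
indirect = {"State A": 40.0, "State B": 60.0, "State C": 100.0}
adjusted = ratio_adjust(indirect, regional_total=220.0)
for state, value in adjusted.items():
    print(f"{state}: {value:.1f}")
```

The adjustment preserves each state's share of the regional total, so it corrects the overall level without changing the relative ranking of states within the region.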

Sampling error -- The discrepancy between a sample statistic and the true population parameter that results from the sampling process. Sampling error can have a random component (sampling variance) and a fixed component (bias).

Sampling variance -- Random error (discrepancy between a sample statistic and the true population parameter) that arises because a random process is used to select the survey sample. If the sampling process is repeated several times, a different group of respondents would be selected each time and the sample distributions of answers to the survey questions would be somewhat different in each sample.

Standard deviation -- Common measure of dispersion or spread of data about the mean.

Standard error -- The most commonly used measure of the precision of an estimate. A gauge of how close an estimate is likely to be to the population value in the absence of any bias.

Strata, State stratification -- Stratification is a sampling method whereby the population is divided into subgroups (or "strata"), based on characteristics believed to be correlated with the survey variables of greatest interest, and a sample is then selected from each subgroup. Stratification produces survey estimates of a desired precision within the chosen subgroups, which cannot be assured with an unstratified design. State-stratified samples allow for unbiased state-level estimates and estimates of precision.

Synthetic estimates -- A class of model-dependent estimates generally formed by dividing the population into subgroups (e.g., by age/race/sex) and assuming that national estimates for each subgroup can be applied to the local populations.
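
The idea can be sketched as a population-weighted average. The subgroup labels, national rates, and state population counts below are hypothetical, and the sketch assumes subgroups are defined by age only.

```python
def synthetic_estimate(national_rates: dict, local_population: dict) -> float:
    """Apply national subgroup rates to local subgroup population counts and
    return the implied overall local rate."""
    total = sum(local_population.values())
    if total == 0:
        raise ValueError("the local population must be nonzero")
    expected_count = sum(
        national_rates[group] * count for group, count in local_population.items()
    )
    return expected_count / total


# Hypothetical national rates by age group and one state's population by age group.
national_rates = {"under_18": 0.10, "18_to_64": 0.20, "65_plus": 0.05}
state_population = {"under_18": 2000, "18_to_64": 6000, "65_plus": 2000}

rate = synthetic_estimate(national_rates, state_population)
print(f"synthetic state rate = {rate:.3f}")
```

The estimate differs across states only through their demographic mix, which is why synthetic estimates can be biased when a state's subgroup rates depart from the national ones.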
