Because the statistics from NSRCF are based on a sample, they differ from the data that would have been obtained if a complete census had been taken using the same definitions, instructions, and procedures. However, the probability design of the NSRCF sample permits the calculation of estimates and sampling errors. The standard error of a statistic is primarily a measure of the sampling variability that occurs by chance because only a sample, rather than the entire population, is surveyed. The standard error also reflects part of the variation that arises in the measurement process but does not include any systematic bias that may be in the data, or any other nonsampling error. The chances are about 95 in 100 that an estimate from the sample differs by less than twice the standard error from the value that would be obtained from a complete census.
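The "twice the standard error" rule can be illustrated with a minimal Python sketch. The function name and the numeric values are hypothetical and are not taken from NSRCF; the sketch only shows how an approximate 95% interval follows from an estimate and its standard error.

```python
def approx_95_interval(estimate, standard_error):
    """Return (lower, upper) bounds: estimate minus/plus 2 standard errors."""
    half_width = 2.0 * standard_error
    return (estimate - half_width, estimate + half_width)


# Hypothetical values chosen only for illustration.
lower, upper = approx_95_interval(estimate=1000.0, standard_error=50.0)
print(f"Approximate 95% interval: {lower:.1f} to {upper:.1f}")
```

The multiplier 2 follows the wording of the text; a multiplier of 1.96 from the normal distribution is also commonly used.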
Standard errors can be calculated for facility and resident estimates by using any statistical software package, as long as clustering within facilities and other aspects of the complex sampling design are taken into account. Software products such as SAS (15), Stata (16), and SPSS (17) have these capabilities. Statistics presented in NCHS publications are computed using the linearized Taylor series method of approximation as applied in SUDAAN software (18), which produces standard error estimates for statistics from complex sample surveys. Both of the NSRCF public-use files (facility and resident) include design variables that designate each record’s stratum marker and the first-stage unit (or cluster) to which the record belongs.
In the facility public-use file, the variable STRATUM indicates the sampling stratum for bed size group and region, and the facility indicated by the variable FACILID is the primary sampling unit. POPFAC represents the total number of facilities for calculating the finite population correction in a stratum. The survey weight is indicated by FACFNWT. The data dictionary for the facility public-use file has a “Technical Notes” section that provides an example of the syntax for using these design variables to describe the sampling design in SUDAAN. The NSRCF data dictionary for the facility public-use file is available from the NSRCF website: http://www.cdc.gov/nchs/nsrcf/nsrcf_questionnaires.htm.
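To show how the facility design variables enter a variance calculation, the following Python sketch applies the textbook linearization estimator for a weighted total under a stratified, single-stage design with a finite population correction. It is an illustration under stated assumptions, not SUDAAN's implementation: the function name, the analysis variable BEDS, and the toy records are hypothetical, and strata with a single sampled facility are simply skipped.

```python
from collections import defaultdict
from math import sqrt


def total_and_se(records, y_key):
    """Weighted total and its standard error for a stratified one-stage sample."""
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec["STRATUM"]].append(rec)

    total = 0.0
    variance = 0.0
    for recs in by_stratum.values():
        n_h = len(recs)              # sampled facilities in the stratum
        pop_h = recs[0]["POPFAC"]    # total facilities in the stratum
        # Weighted contribution of each facility (the primary sampling unit).
        z = [r["FACFNWT"] * r[y_key] for r in recs]
        total += sum(z)
        if n_h < 2:
            continue  # assumption: a single-unit stratum adds no variance here
        z_bar = sum(z) / n_h
        sum_sq = sum((v - z_bar) ** 2 for v in z)
        fpc = 1.0 - n_h / pop_h      # finite population correction
        variance += fpc * n_h / (n_h - 1) * sum_sq
    return total, sqrt(variance)


# Made-up records; BEDS is a hypothetical analysis variable.
toy = [
    {"FACILID": 1, "STRATUM": 1, "POPFAC": 10, "FACFNWT": 5.0, "BEDS": 10},
    {"FACILID": 2, "STRATUM": 1, "POPFAC": 10, "FACFNWT": 5.0, "BEDS": 20},
    {"FACILID": 3, "STRATUM": 2, "POPFAC": 4, "FACFNWT": 2.0, "BEDS": 30},
    {"FACILID": 4, "STRATUM": 2, "POPFAC": 4, "FACFNWT": 2.0, "BEDS": 30},
]
total, se = total_and_se(toy, "BEDS")
print(total, se)
```

For published estimates, a survey package that supports complex designs remains the appropriate tool; the sketch only makes the roles of STRATUM, POPFAC, and FACFNWT concrete.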
The resident public-use file reflects a two-stage sampling design. The stratum in the first stage is indicated by the variable RSTRATUM, and the primary sampling unit is the facility indicated by the variable FACID. The variable for total facilities needed to calculate the finite population correction at the first stage is RPOPFAC. In the second stage, the sampling unit is the resident indicated by the variable RESNUM. In the resident public-use file, the second stage is treated as if sampling were done with replacement. In SUDAAN, this treatment is specified through the variable POPRES, which has a value of -1. Many other statistical packages assume sampling with replacement if no variable for the total population at the second stage is provided. The variable for the survey weight is RESFNWT. The data dictionary for the resident public-use file has a “Technical Notes” section that provides an example of the syntax for using these design variables to describe the sampling design in SUDAAN. The NSRCF data dictionary for the resident public-use file is available from the NSRCF website: http://www.cdc.gov/nchs/nsrcf/nsrcf_questionnaires.htm. The resident sample represents residents living in residential care communities on any given day between March and November 2010.
Because NSRCF is a sample survey, data analyses must include survey weights to inflate the sample numbers to national estimates. The weight associated with each sampled facility and each sampled resident is constructed to account for the multistage sampling design. An estimator for any given population total X can be expressed as a weighted sum over all sampled units, defined as
$$\hat{X} = \sum_{u} x(u)\, W(u)$$
where u represents a sampled unit, x(u) is the characteristic or response of interest for unit u, and W(u) is the final survey weight for sampled unit u. The final weight W(u) for each sampled unit is the product of two components:
- Inverse of the probability of selection.
- Nonresponse adjustment.
The first component of the weight for each sampled unit (facility or resident) is the inverse of the unit’s selection probability. For a current resident, the selection probability is the product of two selection probabilities: the probability of selecting the facility for the NSRCF sample and the probability of selecting the current resident within the sampled NSRCF facility. The inverse of the product of these probabilities is used for weighting.
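The weighting arithmetic described so far can be sketched in Python. The function names, selection probabilities, adjustment factors, and responses below are all hypothetical; the sketch only shows a resident's base weight as the inverse of the product of two selection probabilities, the final weight as that base weight times a nonresponse adjustment, and the estimated total as a weighted sum.

```python
def base_weight(p_facility, p_resident_given_facility):
    """Inverse of the overall selection probability of a sampled resident."""
    return 1.0 / (p_facility * p_resident_given_facility)


def weighted_total(values, final_weights):
    """Weighted sum of x(u) * W(u) over all sampled units u."""
    return sum(x * w for x, w in zip(values, final_weights))


# Hypothetical sampled residents:
# (P(facility selected), P(resident selected | facility),
#  nonresponse adjustment factor, x(u) = 1 if the resident has the trait)
sampled = [
    (0.125, 0.5, 1.0, 1),
    (0.125, 0.5, 1.25, 0),
    (0.25, 0.25, 1.0, 1),
]

final_weights = [base_weight(pf, pr) * adj for pf, pr, adj, _ in sampled]
values = [x for *_, x in sampled]
estimate = weighted_total(values, final_weights)
print(final_weights, estimate)
```

With a 0/1 indicator as the response, the weighted sum estimates the number of residents in the population who have the trait.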
The first component was corrected to account for duplicate listings of sampled facilities in the sampling frame when duplicates were identified after the start of field work. To the extent that all duplicates of sampled facilities were identified, the corrected weights produce unbiased estimates (i.e., estimates that would be obtained if no facilities were duplicated in the sampling frame).
The second component of the weight is an adjustment for nonresponse. This adjustment is made for three types of nonresponse. The first two types are at the facility level, and the third is at the resident level. The first type occurs when an in-scope facility does not respond to NSRCF. The second type occurs when an in-scope facility does not provide its number of current residents. The third type occurs when the facility does not provide the information requested in the survey about the sampled resident.
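The text does not state how the nonresponse adjustment factors were computed. As an illustration only, the Python sketch below uses a common weighting-class approach, which is an assumption and not a confirmed NSRCF procedure: within a class of in-scope sampled units, the factor is the sum of base weights of all units divided by the sum of base weights of the respondents. All names and values are hypothetical.

```python
def nonresponse_factor(base_weights, responded):
    """Ratio of the weighted in-scope sample to the weighted respondents."""
    in_scope_total = sum(base_weights)
    respondent_total = sum(w for w, r in zip(base_weights, responded) if r)
    return in_scope_total / respondent_total


# Hypothetical adjustment class with four in-scope units, one nonrespondent.
base_weights = [10.0, 10.0, 20.0, 20.0]
responded = [True, True, True, False]

factor = nonresponse_factor(base_weights, responded)
adjusted = [w * factor for w, r in zip(base_weights, responded) if r]
print(factor, adjusted)
```

Under this approach the adjusted respondent weights sum to the weighted total of all in-scope units in the class, so respondents stand in for the nonrespondents.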
Finally, the weights described above were smoothed within groups defined by census region, size, and MSA status if there were outlier sampling units whose survey weights were somewhat larger than those for the remaining sample in the same group. In smoothing, total estimates for each group were preserved.
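The smoothing algorithm itself is not described here. The Python sketch below shows one generic way to reduce outlier weights while preserving a group's total: cap each weight at a threshold, then rescale all weights in the group so their sum is unchanged. The capping rule, the threshold, and the weights are assumptions made for illustration and should not be read as the NSRCF method.

```python
def smooth_weights(weights, cap):
    """Cap large weights, then rescale so the group total is preserved."""
    original_total = sum(weights)
    capped = [min(w, cap) for w in weights]
    scale = original_total / sum(capped)
    return [w * scale for w in capped]


# Hypothetical group with one outlier weight.
group = [10.0, 10.0, 10.0, 70.0]
smoothed = smooth_weights(group, cap=30.0)
print(smoothed, sum(smoothed))
```

Because of the final rescaling, a smoothed weight can end up above the cap; what the sketch guarantees is a smaller spread of weights with the same group total.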