National Invitational Conference on Long-Term Care Data Bases: Conference Package. V. DATA PROCESSING


Each of the public use files was derived from one or more data base masterfiles, which, in turn, were created and maintained according to a standard set of procedures. These procedures transformed source data from data entry files into a structured data set, edited the data, and created a set of constructed variables.

A. Interview Data Procedures

Because these data were collected over a period ranging from one to two-and-a-half years, depending upon the data source, data were regularly added to the data base. Each cycle of data processing included the following steps: quality control checks of hard-copy interview forms; data entry; transmission of data to the research data base; quality control checks of computerized data; and the updating of both the status file and the existing masterfiles. Figure 1 summarizes these components.

FIGURE 1. Standard Data Manipulation Procedures. [Figure not available.]

Quality Control and Data Entry. Completed instruments were manually edited and coded by trained quality control staff. This included checking the legibility of contact information, assigning codes to open-ended and "other, specify" responses, and reviewing key questions to ensure they were properly recorded. If necessary, project staff or respondents were contacted to resolve problems.

After the documents had been reviewed and coded, they were keyed into the computer. Automated skip logic, range, and consistency checks were performed as part of the key entry program. When an error was found, a trained data cleaner reviewed the instrument (telephoning the respondent or interviewer if necessary) and corrected the error both on the instrument and on the file. Finally, once a batch of interviews had passed the skip logic, range, and consistency checks, the batch was verified by reentry.
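The key entry program itself is not reproduced in this package; the sketch below only illustrates, in SAS, the kinds of range and consistency checks described above. The data set and variable names (KEYENTRY, CASEID, AGE, MARITAL, SPOUSEHLP) and the specific edit rules are hypothetical.

    * Illustrative range and consistency checks (hypothetical rules and names). ;
    data errors;
       set keyentry;                        /* one record per keyed instrument */
       length problem $ 40;
       /* Range check: age should fall within the sampled range. */
       if age ne . and (age < 65 or age > 110) then do;
          problem = 'AGE out of range';
          output;
       end;
       /* Consistency check: spouse-helper items apply only to married cases. */
       if marital ne 1 and spousehlp ne . then do;
          problem = 'SPOUSEHLP coded for an unmarried case';
          output;
       end;
    run;

    proc print data=errors;
       id caseid;
       var problem age marital spousehlp;
    run;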

Data Transmission and Initial Processing. Every month, all data-entered instruments were transmitted to the mainframe computer in Extended Binary-Coded-Decimal Interchange Code (EBCDIC) format. During the initial processing of the newly transmitted data, this file was transformed into a structured intermediate SAS data set. The project status file was also updated at this time with new status information. In addition, the intermediate file picked up the data base ID and randomization information from the status file for each record. Finally, confidential variables (such as Medicare and Medicaid numbers) were added to a separate Medicare/Medicaid status monitoring file.
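As an illustration of this step, the sketch below builds an intermediate SAS data set from a transmitted fixed-column file and picks up the data base ID and randomization code from the status file. The file name, column positions, and variable names (CASEID, DBID, RANDOM, and the Q items) are hypothetical, not the actual record layout.

    * Hypothetical layout: build the intermediate file and attach status data. ;
    data interim;
       infile 'transmit.dat';               /* transmitted instrument records (hypothetical file) */
       input caseid $ 1-8  instrmnt $ 9-10  q01 11-12  q02 13-14;
    run;

    proc sort data=interim;  by caseid;  run;
    proc sort data=status;   by caseid;  run;

    data interim;
       merge interim(in=new) status(keep=caseid dbid random);
       by caseid;
       if new;                              /* keep only the newly transmitted cases */
    run;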

Frequency distributions and other descriptive statistics were generated for each variable. In addition, range checks and consistency checks that were beyond the capacity of the data entry program were performed. For example, in processing each file, we printed selected variables for cases that appeared to have more than one interview (such as both a complete and an incomplete interview).
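These review runs can be sketched in SAS as follows; the data set and variable names are hypothetical. The first step produces a frequency distribution for each item, and the second lists cases that appear to have more than one interview.

    proc freq data=interim;
       tables intstat q01 q02 / missing;    /* distribution of each item, including missing values */
    run;

    * List cases that occur more than once (e.g., a complete and an incomplete interview). ;
    proc sort data=interim;  by caseid;  run;
    data dupes;
       set interim;
       by caseid;
       if not (first.caseid and last.caseid);
    run;

    proc print data=dupes;
       id caseid;
       var intstat intdate;
    run;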

Data Cleaning. Potential errors identified through a review of descriptive statistics were resolved by reviewing the hard-copy questionnaire and/or consulting with the quality control staff (who recontacted the interviewer or respondent when necessary). Some "errors" (for example, some out-of-range responses) proved to be correct, and the values were retained in the data base. The frequency and nature of each type of error were documented, as was its resolution. In this way, resolution decisions could be kept consistent and based on precedent, where applicable.

Masterfile Maintenance and Updating. After inconsistencies in the intermediate file were resolved, the current masterfile was updated with the new observations. For each of the new observations added to the masterfile, certain descriptive variable values were converted into standard binary codes. Once the masterfile had been updated with the new observations, frequencies of the new masterfile were produced, reviewed, and distributed periodically to the research staff.
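A minimal sketch of this update step follows, again with hypothetical names: a descriptive yes/no item on the new observations is converted to a standard 0/1 code, the observations are appended to the masterfile, and frequencies of the updated file are produced for review.

    * Recode a hypothetical yes/no item (1=yes, 2=no) to a standard binary code. ;
    data newobs;
       set interim;
       if helpyn = 1 then anyhelp = 1;
       else if helpyn = 2 then anyhelp = 0;
    run;

    * Append the new observations to the existing masterfile. ;
    data master;
       set master newobs;
    run;

    proc freq data=master;
       tables anyhelp / missing;
    run;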

Once-Only Procedures. After all the completed research sample interviews had been processed and added to a masterfile via the process outlined above, a final review of the complete masterfile was undertaken. The same range and consistency checks used in initial processing were applied to the complete masterfile. In addition, descriptive statistics of all variables in the final masterfile were closely reviewed and distributed to research staff.

Other Data. Comparable procedures were followed in processing secondary data into masterfiles.

B. Developing Analysis Files

One of the products generated from the data base masterfiles was a set of analysis files--files which contain only the sample and variables of interest for a particular analysis. Because analysis files are smaller than masterfiles, their contents can easily be accessed for use in statistical analysis procedures. Some of the public use files were generated from analysis files. This section describes three steps in the creation of the research data base analysis files.

Selecting the Samples. A set of standard samples was defined in order to facilitate consistency across analyses. Once a standard sample was defined, a binary variable, or "sample flag," was created and permanently stored in the status file, facilitating the selection of the standard sample for use with any masterfile. In addition, since some analyses used subsets of the standard samples, many individual analysis files contained sample flags and data for several samples, so that a single file could support analyses of more than one sample.
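The sketch below illustrates, with hypothetical names, how such a flag might be defined on the status file and then used to pull the standard sample from any masterfile.

    * Define a hypothetical standard-sample flag (1 = in the standard sample). ;
    data status;
       set status;
       smpflag1 = (random = 1 and baseline = 1);
    run;

    * Use the flag to select the standard sample from a masterfile. ;
    proc sort data=status;  by caseid;  run;
    proc sort data=master;  by caseid;  run;

    data sample1;
       merge master(in=inmast) status(keep=caseid smpflag1);
       by caseid;
       if inmast and smpflag1 = 1;
    run;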

Specifying and Programming Constructed Variables. Initial specifications of constructed analysis variables (both dependent and independent) were prepared by the analysts. These preliminary specifications were reviewed and modified by the research data base staff, in consultation with the analysts. Constructed variables were programmed, variable labels defined, and descriptive statistics produced and reviewed for each variable.
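The following sketch shows what this step might look like for a single, purely hypothetical constructed variable: a count of "yes" responses to three items, labeled and then summarized for review.

    data analvars;
       set sample1;
       /* Hypothetical constructed variable: count of "yes" responses to three items. */
       ncvhelp = (q01 = 1) + (q02 = 1) + (q03 = 1);
       label ncvhelp = 'Number of help items reported (constructed)';
    run;

    proc means data=analvars n mean min max;
       var ncvhelp;
    run;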

Extracting and Merging Data from Masterfiles. Analysis files were generally "extracts" (i.e., subsets) of masterfiles. These extracts were based on defined samples that were selected using the standard sample flags. However, some analysis files required data from more than one masterfile (for example, the client tracking and status change masterfiles, or the caregiver and sample member masterfiles). In these cases, extracts of each file were merged so that a single case contained the correct information from each file.
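A minimal sketch of such a merge, assuming two hypothetical extracts (MEMEXTR from the sample member masterfile and CGEXTR from the caregiver masterfile) keyed by a common case identifier:

    proc sort data=memextr;  by caseid;  run;
    proc sort data=cgextr;   by caseid;  run;

    data analysis;
       merge memextr(in=inmem) cgextr;      /* one record per case from each extract */
       by caseid;
       if inmem;                            /* keep cases present in the sample member extract */
    run;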