The Census Bureau constructed a full-panel longitudinal research file by linking the data collected for each sample person over the life of the panel. Unlike the individual core wave files, which contain one record per person-month, the longitudinal file contains one record per person. The longitudinal sample that this research file represents consists of all primary sample members who have complete data (either reported or imputed) for every month of the panel, excluding months in which they were ineligible. This longitudinal sample contains 55,484 people and is the main sample used in our analysis.
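The selection rule for the longitudinal sample can be sketched as a simple filter. This is a minimal illustration, not the Census Bureau's code. It assumes each person's months have already been coded with hypothetical status labels (`reported`, `imputed`, `missing`, `ineligible`), and it assumes that a person with no eligible month at all is excluded; the function name `in_longitudinal_sample` is likewise hypothetical.

```python
# Hypothetical status labels for one person-month; the real SIPP files
# encode response and imputation status in their own flag variables.
REPORTED = "reported"
IMPUTED = "imputed"
MISSING = "missing"
INELIGIBLE = "ineligible"


def in_longitudinal_sample(is_primary_member, monthly_status):
    """Return True if a person qualifies for the longitudinal sample.

    A person qualifies if they are a primary sample member and every month
    in which they were eligible has reported or imputed data.
    """
    if not is_primary_member:
        return False
    # Months of ineligibility are ignored, as in the selection rule.
    eligible = [s for s in monthly_status if s != INELIGIBLE]
    # Assumption: a person with no eligible month at all is excluded.
    return bool(eligible) and all(s in (REPORTED, IMPUTED) for s in eligible)


# A 4-month example: one ineligible month does not disqualify the person,
# but one missing month does.
print(in_longitudinal_sample(True, [REPORTED, IMPUTED, INELIGIBLE, REPORTED]))  # True
print(in_longitudinal_sample(True, [REPORTED, MISSING, REPORTED, REPORTED]))    # False
```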
The 1996 longitudinal file contains a smaller percentage of all primary sample members than the longitudinal files for previous SIPP panels, for two reasons. First, sample attrition was higher in the 1996 panel than in earlier panels because the 1996 panel was longer (12 waves, compared with 8 waves in previous panels). For example, the sample loss rate was 35.5 percent by the end of wave 12 in the 1996 panel, compared with 26.9 percent by the end of wave 8 in the 1993 panel.(3) Second, in creating the final data files, the Census Bureau typically imputes missing responses to individual questions or to entire wave interviews (see U.S. Census Bureau 2003, SIPP Data Editing and Imputation), thereby increasing the sample size in the analysis files. In creating the 1996 SIPP data files currently available, however, the Census Bureau has performed fewer imputations than it did for previous panels, so fewer sample members have complete data for every month of the panel.(4)
The longitudinal research file is available online through the FERRET system. As the Census Bureau notes, however, FERRET is practical only for downloading a small number of variables, because each variable must be requested separately through a series of menus and because downloading even a few variables takes considerable time. Because our study uses a large number of variables, we did not use FERRET to obtain the longitudinal data needed for the analysis.
Instead, we downloaded the complete ASCII data for each of the 12 core wave files from the SIPP Web page and constructed our own longitudinal file, following the same procedures the Census Bureau used to construct its longitudinal file. Specifically, we "flattened" each core wave file to obtain one record per person (rather than one per person-month) and merged the 12 flattened files using the unique person identifier (LGTKEY). We compared selected key variables (such as earnings and hourly wage rates) in our constructed longitudinal file with those in the longitudinal file on the FERRET system and found them to be identical in the two files.
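The flatten-and-merge step can be sketched in a few lines of Python. This is a hypothetical illustration under stated assumptions, not the code used for this study. It assumes each core wave record has already been parsed from the fixed-width ASCII file into a dict with a `LGTKEY` field, a reference-month field `refmon` (1 to 4, since each wave covers four reference months), and a single example variable `earnings`; the field names other than LGTKEY and the helpers `flatten_wave` and `merge_waves` are hypothetical, and real core wave files carry many more variables.

```python
from collections import defaultdict

MONTHS_PER_WAVE = 4  # each SIPP core wave covers four reference months


def flatten_wave(records, wave):
    """Collapse one wave's person-month records into one record per person.

    Each input record is a dict with hypothetical keys 'LGTKEY', 'refmon'
    (reference month 1-4 within the wave) and 'earnings'. Output variables are
    suffixed with a panel-relative month index (1-48 for a 12-wave panel).
    """
    flat = defaultdict(dict)
    for rec in records:
        panel_month = (wave - 1) * MONTHS_PER_WAVE + rec["refmon"]
        flat[rec["LGTKEY"]][f"earnings_m{panel_month}"] = rec["earnings"]
    return dict(flat)


def merge_waves(flattened_waves):
    """Merge flattened wave files on LGTKEY, keeping everyone seen in any wave.

    People missing from some waves simply lack those months' variables; a
    later completeness check decides who enters the longitudinal sample.
    """
    merged = defaultdict(dict)
    for wave_file in flattened_waves:
        for lgtkey, variables in wave_file.items():
            merged[lgtkey].update(variables)
    return dict(merged)


wave1 = [
    {"LGTKEY": "A", "refmon": 1, "earnings": 1500},
    {"LGTKEY": "A", "refmon": 2, "earnings": 1600},
]
wave2 = [
    {"LGTKEY": "A", "refmon": 1, "earnings": 1700},
    {"LGTKEY": "B", "refmon": 1, "earnings": 900},
]
panel = merge_waves([flatten_wave(wave1, 1), flatten_wave(wave2, 2)])
print(panel["A"])  # {'earnings_m1': 1500, 'earnings_m2': 1600, 'earnings_m5': 1700}
print(panel["B"])  # {'earnings_m5': 900}
```

The month index here is relative to each person's own panel timeline; because the four rotation groups are interviewed in staggered months, mapping it to calendar months would need the rotation group as well.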
Finally, the longitudinal research file contains panel weights (which we downloaded using the FERRET system) that take into account nonresponse, sample attrition, and the complex sample design of the 1996 SIPP, including the oversampling of poor households. These weights make the SIPP longitudinal sample representative of the noninstitutionalized resident population of the United States as of March 1996 (the only month common to all four rotation groups in wave 1).(5) We used the weights throughout the statistical analyses and adjusted the standard errors of our estimates to account for design effects due to weighting and clustering.
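The standard-error adjustment for weighting and clustering can be approximated with textbook design-effect formulas. The sketch below is an illustration under stated assumptions, not necessarily the adjustment used in this analysis. It uses Kish's approximation for the design effect of unequal weighting, n * sum(w^2) / (sum w)^2, and the approximation 1 + (m - 1) * rho for clustering, where m is the average cluster size and rho the intraclass correlation, and it multiplies the two. All function names and input values are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean using panel weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def deff_weighting(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def deff_clustering(avg_cluster_size, icc):
    """Approximate design effect from clustering: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc


def adjusted_se(srs_se, weights, avg_cluster_size, icc):
    """Inflate a simple-random-sampling SE by the square root of the combined design effect."""
    # Assumption: the weighting and clustering effects combine multiplicatively.
    deff = deff_weighting(weights) * deff_clustering(avg_cluster_size, icc)
    return srs_se * math.sqrt(deff)


weights = [1.0, 3.0]  # hypothetical panel weights
print(weighted_mean([10.0, 20.0], weights))  # 17.5
print(deff_weighting(weights))               # 1.25
```

Multiplying the two design effects treats weighting and clustering as separable, which is itself an approximation; design-based variance estimators that use the stratum and cluster identifiers directly avoid that assumption.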