Daniel Kasprzyk, Bureau of the Census
Robert Friedland, Ph.D., Employee Benefit Research Institute
MARY HARAHAN: Our next two speakers will be discussing a data base that is a little bit different. It is certainly not strictly a health care or long term care data base, but we think it has tremendous possibilities for those of you who are interested in looking at the income and asset characteristics of the population.
DANIEL KASPRZYK: This survey actually began about 10-15 years ago in the Office of the Assistant Secretary for Planning and Evaluation (ASPE) in the Department of Health and Human Services (DHHS). It was a combined effort of Census at that time, ASPE, and the Social Security Administration (SSA).
ASPE was extremely instrumental in the design and content of the survey. In fact, I am quite certain that, left to our own designs, the survey would look quite different now without the kind of broad governmental input we have received from various agencies.
I would like to just make it clear that this survey is not a long term care data base. It is a household survey principally designed to provide information about income and program participation. It does, however, have some questions designed by working groups of employees from government agencies which had an interest in long term care.
The Survey of Income and Program Participation (SIPP) is a nationally-representative household survey program. It was intended primarily to provide information on cash and noncash income, eligibility and participation in various government transfer programs, disability, labor force status, assets and liabilities and many more items.
SIPP arose in recognition that the best source of information on the distribution of household and personal income in the United States, the March Income Supplement to the Current Population Survey (CPS), had limitations that could only be rectified by a total redesign or a change in the instrument and procedures.
These deficiencies in the March income supplement in the CPS became especially apparent in the early 1970's when many public assistance programs were expanded and reorganized.
It was in response to these deficiencies that a development program arose. This development program was called the Income Survey Development Program. It was funded, principally, by ASPE and DHHS.
The purpose of this program was to develop methods to overcome some of the shortcomings of the CPS, namely the under-reporting of property income and other irregular sources of income and the under-reporting and misclassification of participation in various federal programs, and to provide information to assist in the analysis of program participation and eligibility.
This development program was in operation during the period 1977-1981. Various tests, including a feasibility test that was essentially a survey of 8,000 households conducted in 1979, were carried out. All these tests led to the design of SIPP.
The kinds of data that SIPP provides are personal, household, and family income data for each month of the calendar year.
The monthly income data are based on a wide variety of cash and noncash sources, monthly data on most government income transfer programs and detailed data on assets, liabilities, and a number of special topics which I will describe later.
SIPP began in October 1983. It is an ongoing survey program for Census. That sample, which began in October 1983, consisted of approximately 21,000 households in 174 areas around the country. It is designed to represent the noninstitutional population of the U.S.
Each household is interviewed once every 4 months for 2 1/2 years to produce sufficient data for short term longitudinal analyses, while attempting to provide a relatively short recall period for reporting monthly income.
The reference period for the principal survey items, namely the income and program participation data, is the 4 months preceding the interview.
We started a panel in October 1983 and began a new sample in February 1985 with the same characteristics: a national household sample in about the same number of areas around the country, extending for 2 1/2 years, with interviews every 4 months. We did the same in 1986 and 1987.
Several panels run concurrently. By looking at the timing of these data collections, you can combine samples to produce estimates from a larger sample.
Each sample is divided into four approximately equal sub-samples. We call these the "rotation groups." One rotation group is interviewed in each month. When I say the interviews were conducted between October 1983 and January 1984, I mean that one-fourth of the sample was interviewed in October 1983, the next fourth in November, the next fourth in December, and the final fourth in January. Then we repeat again.
The purpose of this design is to create manageable interviewing and processing workloads each month, instead of one large workload every 4 months. The real problem with the design is that each rotation group, or sub-sample, ends up with a slightly different reference period, offset by one month.
If you are interviewing in January, you ask questions about labor force participation, hours, earnings that you have received over the 4 month period for that 4 month period. It would be September, October, November and December.
Then the next rotation group comes in in February, and its reference period for the principal survey items is the 4 months preceding the interview month: October, November, December, and January, and so on. In order to get monthly data for the full sample, you have to understand that each rotation group has a slightly different reference period.
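The staggered reference periods just described can be sketched in a few lines (an illustrative sketch; the month arithmetic here is ours, not an official SIPP algorithm):

```python
# Each rotation group's reference period is the 4 months preceding its
# interview month, so consecutive groups cover calendar months offset by one.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def reference_months(interview_month):
    """Names of the 4 months preceding an interview month (0 = January)."""
    return [MONTHS[(interview_month - k) % 12] for k in range(4, 0, -1)]

print(reference_months(0))   # January interview covers Sep through Dec
print(reference_months(1))   # February interview covers Oct through Jan
```

Stacking the four rotation groups' outputs side by side shows why a full-sample monthly estimate must be assembled from four different interview months.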
An important feature of this survey is its panels. These are basically new samples, and they are initiated each year. Within a panel there are waves of interviews, one every 4 months; a wave is one round of interviews.
Each panel consists of eight interviews, except for the first one, which consists of nine; it got started a little earlier than we anticipated. Within each wave of interviewing are the rotation groups, that is, the sub-samples.
The data collection for the survey is handled through the Census Regional Office. The interviewers that are assigned to these offices conduct personal interviews with each sample household every 4 months.
At the time of the interviewer's visit, each person who is 15 years of age or older and present is asked to provide information about himself or herself. We do take proxies. We have not done much telephone interviewing at this point; it is used only as a stopgap measure to get information that could not otherwise be obtained in a personal interview.
For the interview, the median interview time was about 43 minutes overall. For a one person household, it was about 29 minutes.
We had planned on 30, so I guess we did all right there. It is a rather burdensome interview, or so most people think.
We do try to get mostly self-interviews, and we do give instructions for them. However, the self-interview rate is not all that terrific, in my opinion.
One feature of SIPP is that at the time the sample is drawn, we have an address. We go to the address at the time of the first interview and we enumerate everyone who lives at that address.
From that point on, the sample is no longer an address sample, but rather a person-based sample. We follow those individuals whom we identify at the first interview for the next 2 1/2 years. It actually becomes the cohort of people identified at the time of the first interview.
For cost and operational reasons, these personal visit interviews are only conducted at the new addresses when people move, if that new address is within 100 miles of one of our sampling areas.
I am told that the interviewers have taken it upon themselves to ignore our advice and follow-up when they move beyond the hash marks. They follow-up through telephone interviews.
When you are designing the survey, you try to think about how to get the best data without overburdening the interviewers. I must have spent I do not know how many meetings talking about how far we should follow people. We really did not know whether we ought to have a personal interview or just make phone calls and track them down. When we finally made the momentous decision to say yes, the interviewers should do this, we found out that, in fact, they had already been doing it. Our interviewers are quite flexible.
It is a person-based sample after the time of the first interview, so you can get some rates on how the respondents change over time. That may or may not affect the quality of the data.
There are four components to the SIPP data collection: first is a control card; then the core set of questions that are repeated in every interview; then modules, which we call fixed modules, that are assigned to specific waves of interviewing; and, finally, variable modules that are added from time to time.
The control card, the first method of collecting data, is used to obtain and maintain information on the basic characteristics associated with households and persons, and to record some information for operational control purposes.
The characteristics on this control card are recorded by the interviewer and include the basic demographics: age, race, sex, ethnic origin, marital status, educational level for each member of the household, some information on the housing unit, and relationships to the householder.
A household respondent, typically, provides this information. The control card is also used as a way of keeping track of the employment and income information that is reported in each interview.
Although the data are recorded on a questionnaire, the methodology is such that the interviewer refers to previously reported income types in each succeeding wave. This control card is a vehicle for writing down after the interview what exactly was reported and then the interviewer at the next interview refers to that.
Finally, the control card provides us with a way of keeping track of information for following people. The basic method is simply to ask, at the time of the first interview, whether there are one or more people within or outside the household who will always know where the respondent will be. That turns out to be fairly good in terms of being able to use that data for tracking people.
The other way we track people, of course, is through the ingenuity of the field staff. They can find more ways of nosing around the neighborhood to find out where our sample respondents move.
The core section of the questionnaire is the principal reason for the survey. I mean, the content of SIPP was developed around these core data. It was designed to measure the economic situation of persons in the U.S.
These questions, as I said, are repeated at each interview. The core data build, basically, an income profile for everyone who is 15 years of age and older in the sample household.
The profile is developed by asking a series of questions about labor force participation over the 4 month period; essentially, a calendar is developed, so you have weekly labor force participation. We then ask specific questions about the types and amounts of various sources of income and, particularly, a number of detailed questions on program participation and asset ownership. There are a few questions that deal with health insurance, also.
There are several different questions on income types that are asked about during the interview. It pretty much runs the range of all the major federal income security programs. Then there is the asset listing; the usual listing of assets, savings, CD's, NOW accounts, IRA's, mortgages, royalties. There are a lot of questions and a lot of details.
In addition to these questions that we repeat each interview, we also ask a series of questions depending on the interview that we call "fixed topical modules."
The data from these modules should allow an analysis of well-being that goes beyond strictly the income and demographic areas. The idea behind SIPP was to provide a broader context for analysis by adding questions on topics not covered in the core section.
The administration of these modules is made possible by the fact that when we go back in the second and subsequent interviews less time is required to update the income data and some time in the interview is freed up. Topics covered in these modules take up about 10 minutes, actually, for each module, each interview.
Typically, the data collected in a module does not have the same reference period that we have in the core data. It can be the last two jobs held. It could be over the last year. The reference period varies depending on the topic.
We have an information handout which you can request. It shows the breadth of data collected in this survey. It is called "Topical Modules for the 1984 Panel," and then goes on to 1985, 1986 and 1987 panels.
The kinds of questions that we ask deal with topics of health and disability and, in the third interview of the 1984 panel, education and work history. There is a detailed series of questions on assets and liabilities: assets held, liabilities owed, and the values of those assets. Pension plan coverage, retirement plans, and shelter costs are in another interview. Child care arrangements and expenses are in yet another interview.
Support for non-household members, marital history, fertility history, and migration history--all of these topics get asked in one module or another.
In the 1986 panel, and toward the end of the 1985 panel, there were a series of questions on health status and utilization of health care services, and another series of questions on long term care.
These tend to be questions about health conditions that last 3 months or longer. Respondents are then asked whether they need any help in looking after personal needs and who helps them, and whether they need help doing certain activities and, again, who helps.
These questions, and all questions that have to do with the topical content of SIPP, are not developed independently by Census.
When Census acquired funding for the survey in 1981, after the yearly budget cuts of the Reagan Administration, and DHHS had to bow out of the enterprise, a SIPP Advisory Committee was formed by the Office of Management and Budget (OMB). This Advisory Committee advises and recommends to Census changes in the SIPP content, particularly changes that relate to data required for policy analysis. The OMB Advisory Committee has representatives from over 20 federal agencies and it is through this Committee that we sort out the difficult task of deciding what goes on a survey of this nature.
It is a multipurpose survey and so you have got the problems of rule by committee. Everybody that comes to these meetings has their own agenda for action and for analysis, and it is through a process of working through committees and working groups that the content of SIPP is finally determined with regard to these topical modules.
The survey itself, because of the way we approach the modules, is viewed as a government data resource, and every effort is made by Census to maintain open lines of communication with this Advisory Committee with regard to the content.
In fact, we are now going through an exercise to develop the content for the 1988 panel of the survey. We have solicited information from various agencies on the Advisory Committee, asking whether there are any changes, demonstrated by data, that they require with regard to the core section of the questionnaire, and sometime this summer we will probably start the process with regard to the topical modules.
There is at this point one topical module open, sometime in the 1988 panel or the late part of the 1987 panel. I am sure the government will have lively debates as to what ought to be considered in that module.
A little bit about non-response. At the first interview, SIPP is a household address sample; it then becomes a person-based survey. So the kinds of non-response rates you can devise vary.
Household rates, of a fashion, take a somewhat complicated algorithm to create, because we do follow people and splits in households take place. It is not a one-time, cross-sectional concept.
You can see that in the 1984 panel our non-response rate for the first interview was 4.9 percent, about 60-75 percent of that were refusals. That rate is pretty comparable to the rates that they see in the CPS in the March supplement.
As you go down there from wave one through wave nine, you will see that obviously the non-response accumulates. You will also notice that from wave to wave we lose fewer households over time, so that it averages probably about 2 percent a wave. We lose most of the people in the first three interviews. It is similar with the 1985 panel.
Another way of looking at non-response in this survey is to look at it from a person-based point of view. That is probably a better way. Looking at it from a person point of view, you are interested, usually, in the cohort of people that were there at the time of the first interview.
We have got about 79 percent who were there for five interviews straight. The dominant pattern of missingness is attrition: once people drop out, they are gone. It is very difficult to convert them.
The way we treat our follow-up for non-response goes something like this: if a household is out for two interviews in a row, we just drop it from the sample and never go back. Otherwise, our interviewers try their best to convert the respondents, and they try hard.
Another form of non-response is item non-response. Obviously, the interest in the survey is in income and program participation and whether we are getting better measurements of them.
We are getting lower item non-response rates for many income types that we are concerned with, compared to the March CPS.
Because it is a multi-interview design, you want to be able to link the people across time. An identifier is used to match data across time.
By and large, it works. The matching using the ID is very effective. We can match over time. Problems arise and you should try to understand why something does not match, it becomes very complicated, because people drop out of the survey for a variety of reasons, only one of which is non-response. Another could be they have moved, another they could have been institutionalized, or they could have died. There is a variety of reasons why you might not make a match, but, by and large, this identifier does work and people have been successful in matching over time.
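The kind of cross-wave linkage described here can be sketched as a simple keyed match (a hypothetical illustration; the actual SIPP identifiers and record layouts differ):

```python
def match_waves(wave_a, wave_b):
    """Link person records from two waves on a persistent person ID.

    Returns matched record pairs plus the wave-a records with no wave-b
    match. An unmatched record is not necessarily nonresponse: the person
    may have moved out of scope, been institutionalized, or died.
    """
    b_by_id = {rec["id"]: rec for rec in wave_b}
    matched = [(rec, b_by_id[rec["id"]]) for rec in wave_a if rec["id"] in b_by_id]
    unmatched = [rec for rec in wave_a if rec["id"] not in b_by_id]
    return matched, unmatched
```

Keeping the unmatched records separate, rather than discarding them, is what lets an analyst distinguish the different reasons for a failed match afterward.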
The final thing I would like to mention is data products. SIPP has several data products. One report series is the Series P-70, Household Economic Studies, which originally provided average monthly data for a calendar quarter; it now looks more like a series that provides information on special topics. The last three reports have dealt with assets and liabilities, or wealth in the country; functional limitations and disability; and child care arrangements. Like any Census report, it is available through the Government Printing Office (GPO).
There are also working papers. These are papers developed by staff, principally evaluation papers that deal with survey methodology; in some cases they may even provide some substantive analyses, but only preliminary ones. That is, when you have done a study that you do not quite feel is final but want to get out to a broader community, the working papers vehicle is the way we do it.
The most important data product coming out of SIPP is the micro-data files. By nature, as I have mentioned, the content of SIPP is so diverse it is virtually impossible for Census or anyone to analyze all that data.
At this time, there are several kinds of micro-data files. We release data for the core portion, that is, the income and program participation data, in two structures. One is a complex structure that has a series of record types: household, family, person, and income. All these record types are related to each other through a series of pointers.
Another product is the same data just in a different format. That is a rectangular format. For every person in the sample there is one record and so it is a rectangular file. We have got what we call a complex file and a rectangular file--same data, just a different structure.
Then we release the topical module data. The topical module data is always in a rectangular format and it has all the core data collected at the time the module data were collected.
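The relationship between the two structures can be shown schematically (field names here are hypothetical; the real files carry many more record types and variables):

```python
def rectangularize(households):
    """Flatten a hierarchical household -> person structure into one
    record per person, copying household-level fields onto each row."""
    rows = []
    for hh in households:
        hh_fields = {"hh_" + k: v for k, v in hh.items() if k != "persons"}
        for person in hh["persons"]:
            rows.append({**hh_fields, **person})
    return rows

# A two-person household in the "complex" layout becomes two person-level
# records, each carrying the household fields alongside the person fields.
complex_file = [{"id": 7, "tenure": "own",
                 "persons": [{"age": 71, "income": 812},
                             {"age": 68, "income": 430}]}]
rectangular_file = rectangularize(complex_file)
```

The trade-off is the usual one: the complex layout avoids repeating household data on every person record, while the rectangular layout is simpler to read with one pass and no pointer chasing.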
These are the files that are currently available. We have released the core data for waves one through nine of the 1984 panel; all nine interviews are now available. We have released waves three, four, and five of the topical module data for the 1984 panel. The release of wave seven, which is another asset/liability module, is imminent, within the next month or so. The release of some core data from the 1985 panel will take place within the next several months.
Finally, we have a multi-wave data collection, but all the files I have just mentioned are cross-sectional; they are just for the interviews that we have conducted. One of the problems we have had difficulty dealing with is how to create a multi-wave file in such a way that it links the data and edits and imputes it over time so that it all makes sense. The project is difficult. An initial stab has been made at it, and that is what we call the "multi-wave research file." It is not, strictly speaking, a Census public use file. It is a research file that is not available through the normal means of going to our Data Users Services Division; it is available by writing to me or David McMillan at Census.
It is a file that puts together three interviews of income and program data, edits and imputes for missing data in a way that attempts to be more logically consistent than what you would find by linking the individual wave files.
The wave files, as we release them, are processed independently so that if you match them together there may be cases of change in status which are solely a function of the processing system, not a function of any reporting by the respondent.
Those are the basic data products. The public use cross-sectional files are available through our Data Users Services Division.
Another way of accessing SIPP, without going through Census, is through a data base at the University of Wisconsin. The National Science Foundation has funded what is called SIPP Access, a data base system at the University of Wisconsin. The contact there is Martin David, who is a Professor of Economics.
We have coming out shortly a users guide for SIPP that attempts to help people make their way through these files, and a quality profile for SIPP, which talks a little bit about the non-sampling error aspects of the survey and the other things you might run into.
MARY HARAHAN: We have a second speaker who will be talking about SIPP. By now you all know how very complicated it is.
ROBERT FRIEDLAND: The Employee Benefit Research Institute (EBRI) is a nonprofit, nonpartisan public policy research institute located in Washington, whose primary objective is to facilitate responsible public and private health and welfare retirement policies.
We are at the moment conducting two studies that explicitly examine the ability of retirees to finance health care for themselves and their dependents.
Both of these studies are using a number of public data bases, many of the data bases that we are discussing in these two days. One of the data bases that the two studies share in common is SIPP. SIPP is just one of the many data bases we are using for these two studies.
The first study, which should be out by this fall, is directed by my colleague Deborah Challet, and it examines the elderly's ability to finance their health care in general. I am conducting a companion study which examines financing of long term care, and I expect that to be complete a year hence.
The starting point for both studies is the economic status of the elderly, and, in particular, the degree to which the elderly are vulnerable to change, change from the loss of a spouse, illness, or inflation.
Long term care, in large part, is an issue related to retirement income adequacy. In a sense, this is why we turn to a data base like SIPP.
I was invited to talk about our views of SIPP. In particular, I would like to tell you why we chose this particular data base and how we ended up modifying SIPP to meet our purposes. In doing so, I would like to convey both the strengths and the weaknesses of this data base.
The primary reason we chose SIPP is that we felt this data could provide us with the most comprehensive picture of economic status of the elderly.
As we have just heard, it is quite a rich data base. Most of the first panel is now available. We can examine employment, earnings, sources and amount of retirement income, assets, liabilities, housing conditions, sources of health insurance including post-retirement employer provided health insurance, characteristics about current or past employers and occupations, the health and disability for any age, sex or marital group, or living arrangement that you want to put together.
SIPP will enable us to get a sense of the extent of limitations in activities of daily living (ADL), and instrumental activities of daily living (IADL).
We are able to get a sense of the degree of assistance needed, who provides that assistance, and whether or not this assistance is paid for.
In addition, some of the questions address which health conditions were considered the primary reason for a limitation. This is for the entire population.
We can get a sense of how many days an individual spent in bed in a year. There are a few questions that ask about health care use; however, it really is limited to hospital use and ambulatory care. Unfortunately, nursing home care is not asked about in the data that is now available. Also unfortunate, when there is exit from the survey and then reentry because of institutionalization, we do not know what kind of institution was entered.
The scope in this area is wide, but it is certainly not as deep as the 1982 National Long Term Care Survey (NLTCS). The assessment for disability is somewhat limited. It is limited to hearing, vision, speaking, mobility, transferring, some light housework, meal preparation, and a broad category called "personal needs."
Cognitive dysfunction and continence are not assessed, nor is there any attention paid to technological aids, such as you find in the 1982 NLTCS.
A secondary reason why we chose to use SIPP is that the data base offers potential for addressing so many different social, economic, and public policy questions. Its richness is enhanced by the longitudinal nature of the core questions, which, as we have just heard, are asked every 4 months.
There is also a wide range of topical modules scheduled over the next 6 years.
I have to commend the designers of SIPP. They were ingenious in their way of minimizing the cost of collecting the data and getting the data out quickly without sacrificing, it appears, the integrity of the data.
As soon as each of the four rotation groups is interviewed, the data is prepared for release as a wave.
Each of the 4 months of data is labeled as month one through month four. Month one, for example, of each of the four rotation groups of the wave does not correspond to the same calendar month for each person in the rotation group.
Because each rotation group is interviewed one after the other, there is partial overlap.
To illustrate, when you get a wave of data, you may be getting information about January, reported in July, on one quarter of the panel, while information on April is available for everybody.
It is not bad to use this approach for just one wave. But I think there are some limitations in just using one wave of data. The cost of using two waves of data increases tremendously when you try to put them together.
This is the biggest disadvantage of using SIPP. The cost of producing SIPP has been shifted to us who use the data base and minimized by Census. On one hand, as a taxpayer, I applaud this and as a researcher I get a little older trying to sort this through.
You practically need a commercial pilot's license to navigate through the relationship between rotation groups and waves. The data base is very complicated and can be very expensive to run.
We were not particularly happy with the wave format and we decided early on that we, for our purposes, needed to combine waves. We combined waves two through five to create a 12 month longitudinal file corresponding to 1984.
We started this process before Census announced that they would create a longitudinal file. I understand the longitudinal file that is available from Census is a 12 month file, but not necessarily a calendar year file.
We felt that, for purposes of public policy discussion, and in particular for those who deal with public policy, the ability to compare numbers from different data bases was important. After using the CPS for all these years, we had to worry in particular about how numbers that come out of one data base compare with the numbers in the CPS.
We put together, for that reason, among other reasons, a calendar year file. That was not an easy trick.
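The heart of that trick is lining up wave-relative months with calendar months for each rotation group. A minimal sketch of the bookkeeping, assuming (as described earlier in the session) that the first rotation group's first interview was October 1983 and that waves repeat every 4 months:

```python
def wave_reference_months(wave, rotation_group, first=(1983, 10)):
    """(year, month) reference months for one rotation group in one wave."""
    y0, m0 = first
    # Interview month expressed as a running count of months.
    interview = y0 * 12 + (m0 - 1) + (wave - 1) * 4 + (rotation_group - 1)
    return [((interview - k) // 12, (interview - k) % 12 + 1)
            for k in range(4, 0, -1)]

def months_in_year(wave, rotation_group, year=1984):
    """The wave's reference months that fall inside a target calendar year."""
    return [ym for ym in wave_reference_months(wave, rotation_group)
            if ym[0] == year]

# For rotation group 1, waves two through five piece together calendar 1984:
# wave 2 contributes January, wave 3 February-May, wave 4 June-September,
# and wave 5 October-December, which is why those four waves were combined.
```

Repeating the calculation for the other three rotation groups shows that each group draws its calendar-year months from a slightly different set of waves, which is what makes assembling the file expensive.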
We have just completed this process, and we are now attaching to our calendar year file of core data the data from two topical modules. The topical modules from waves three and four provide us information at a point in time on assets. I am glad to hear that the second asset module is coming; we will have two points in time for assets, liabilities, health, and disability.
We have really just begun to look at the data, and I am glad that we put the many waves together, because one finding, not terribly surprising, was the propensity for inconsistency on the part of the respondents. The data seems to be relatively clean on the coding side, and the Census edits are beautiful, but they cannot control for inconsistency on the part of the respondents. The inconsistency, if you are looking at the elderly, appears to increase dramatically with age, particularly when you hit around age 75.
If we had not put the waves together, we would not have seen this, and we would not have been able to come up with an algorithm to try and adjust for that.
I would like to close with a thank you to ASPE and the Office of the Assistant Secretary for Health (OASH) and all those who participated in the development of this Conference.
In some ways, finding out about data, especially forthcoming data, can be at times as frustrating as trying to find out about available community-based services for our loved ones.
In taking the analogy one step further, very often the answer depends on where you begin the process.
Having some personal experience in both of these matters, I know that the consequences cannot be compared. This forum will help a great deal. As research is conducted, as markets develop, and as public policy is formulated using information, the availability of these data bases will be critical. If you have questions about how we put together our SIPP data base, I would be happy to answer them.