National Invitational Conference on Long-Term Care Data Bases: Conference Proceedings. A. NHANES I Epidemiological Follow-Up Study


Jennifer Madans, Ph.D., National Center for Health Statistics

I am going to present the NHANES I Epidemiological Follow-up Study, which is a jointly funded and truly collaborative study between the National Center for Health Statistics (NCHS) and various agencies in the National Institutes of Health (NIH) and the Alcohol, Drug Abuse and Mental Health Administration.

As the name implies, this is a longitudinal study that uses as its baseline cohort those persons who were examined as part of the first National Health and Nutrition Examination Survey (NHANES).

We had three major objectives in designing this survey, with the overall aim of making the most of the longitudinal nature of this data base.

The first was to relate morbidity, mortality and institutionalization to risk factors measured at baseline.

Second, we wanted to look at changes in individual characteristics between baseline and the follow-up. Here we are basically talking about risk factors. In some cases, we have measurements at time one and a measurement at time two. In other cases, there was some retrospective history collected at the time of the follow-up.

Third, to the extent possible, we wanted to look at the natural history of chronic disease and also functional impairments.

First, let me say something about the NHANES program.

You heard earlier about two of the other data systems in NCHS, the National Health Interview Survey (NHIS) and the National Nursing Home Survey (NNHS). One of the other major data collection systems is the National Health Examination Survey (NHES).

These started in the 1960's. The first three were called Cycles I, II and III.

NHANES I is really the fourth in the cycle, but in the early 1970's the survey was expanded to take into account an interest at that time in poverty, the effects of poverty on nutritional status and then the effects of nutrition on health. There was a major nutritional component added to the NHES, and, hence, the name NHANES.

The second NHANES was done in the late 1970's. An Hispanic NHANES was done in the early 1980's. We are currently planning the third in the series and that should be fielded in about 2 years.

These surveys are unique in that they contain objective measures of health as opposed to the interview surveys which are based on self-reports or the surveys that are based on records.

We draw a multi-stage probability sample down to the household level. An interviewer visits the household, takes some history information and does a series of interviews there, and then the sample person is asked to come to a trailer where a standardized exam is administered. There is a lot of data collected at this point.

The first NHANES was done between 1971 and 1975. It has a very complicated sample design. There was a nutrition component and then a detailed component and sub-sampling within that. It is somewhat difficult to do longitudinal analysis on this kind of data base.

We decided to follow the 14,407 people who were 25 and over at the time of that survey.

The initial follow-up, which is what I will be talking to you about today, was conducted between 1982 and 1984. Based on earlier presentations, these sound like some very good years for data collection.

There were multiple parts of this design, which I will go through in some detail. This is an ongoing survey. In 1986, we did a telephone re-contact of survivors who were 55 and over at the time of the baseline. That data collection is completed and is now being processed and cleaned, and hopefully we will have data tapes for that in 1988-1989.

We are currently in the field with the third wave of follow-up, which is another phone re-contact of the entire surviving cohort.

We will be following this cohort through the National Death Index (NDI) until they are all deceased. We may have a series of other interview contacts over the next 10 years, depending on need and funding.

Let me say that the tapes for the initial follow-up will be released through National Technical Information Service (NTIS) in July. If you have any questions prior to that, we do have some materials in the office and would be happy to send those to you.

The kind of data collected at the time of NHANES I includes medical histories, a health care needs component, the nutrition component, the standardized examination, and a series of laboratory tests and X-rays.

To give you some idea of the magnitude of this data collection, there are 14 data tapes from NHANES I, all of which are public use and can be ordered from NTIS.

The design for the follow-up did not include an examination, but did have four other kinds of data collection mechanisms. The first activity, of course, was to trace the people in the cohort. This was somewhat problematic because NHANES I was not designed as a longitudinal survey, and there was no tracing information collected. We had no information about these people over the 10 year period of the follow-up.

We used various methods of tracing to determine vital status, which was the first data point, and also to get an address of the subject or of a proxy who could act as a respondent if the subject was deceased.

Once that address was obtained, we conducted personal interviews with surviving subjects and did some physical measurements at the time of the interview. We also did telephone interviews with proxies for decedents.

The other major data collection activity was getting the hospital and nursing home records for the 10 year period of follow-up. We got the names of all the hospitals and nursing homes they had been in, contacted those facilities and got copies of the records.

Finally, we obtained death certificates through state Vital Statistics Offices for all decedents.

We started out with 14,407 in the cohort and managed to trace 93 percent of them over the 10 year period. By "traced," I mean we could determine their vital status in one of three ways: we contacted a surviving subject who could respond to our validation questions, we got a proxy to do the proxy interview, or we got a death certificate.
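The definition of "traced" given here can be written down as a small rule. The following is only an illustrative sketch: the record layout, field names and function names are hypothetical and are not taken from the study's actual tracing system.

```python
from dataclasses import dataclass


@dataclass
class TracingResult:
    """Hypothetical per-subject tracing outcome (field names are assumptions)."""
    subject_interviewed: bool = False   # surviving subject answered the validation questions
    proxy_interviewed: bool = False     # a proxy completed the proxy interview
    death_certificate: bool = False     # a death certificate was obtained


def is_traced(result: TracingResult) -> bool:
    """A subject counts as traced if vital status was established by any one route."""
    return (
        result.subject_interviewed
        or result.proxy_interviewed
        or result.death_certificate
    )


def tracing_rate(results) -> float:
    """Proportion of the cohort whose vital status could be determined."""
    results = list(results)
    if not results:
        raise ValueError("cohort is empty")
    return sum(is_traced(r) for r in results) / len(results)
```

Applied to the full cohort of 14,407, a rule of this kind is what yields the 93 percent figure quoted above.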

In terms of the results of the interview component, we have broken that out to show the difference between the surviving subjects and the decedents.

The response rate for surviving subjects is 93 percent.

However, for deceased subjects, we could only get proxy interviews for about 84 percent. A lot of the tracing was done through mortality records. We would get a death certificate, but there was no way to contact a proxy. There were no leads that we could pick up on to find someone who would do the interview.

We have death certificates for about 96 percent of all known decedents.

Blood pressure, weight and pulse are the three physical measurements that we took; that also was quite successful, with a 96 percent response rate.

Longitudinal data is wonderful. That seems to be the wave of the future. But you have to have good follow-up; otherwise you cannot really generalize from your findings. The first thing we tried to do was evaluate how good the tracing was, particularly because there was no built-in tracing mechanism in the baseline.

Here you have the results of tracing by sex and age, and there are clearly some differences. The older people were much easier to trace and there is a race/sex interaction. White males tend to be the easiest to find and Black females the hardest to find.

We do have problems in the younger age groups, especially among females because of name changes.

When this survey was designed there was a lot of interest in poverty and the sample was designed to over-sample areas where they thought they would find high levels of malnutrition or health effects of malnutrition. There was over-sampling among women of child-bearing age, the elderly, and of people living in poverty areas.

The sample was designed to maximize the rates that were to be calculated by subgroups. It was not really even designed as an epidemiologic study. You get this very funny age distribution, which is bad for some things, but happens to be very good for studying nursing home utilization.

We did look at how health effects measured at baseline were related to whether or not a person was traced.

If we were having a hard time finding people who were sicker at baseline, that would indicate that our mortality follow-up was probably not as good as it should be. We ran a multiple regression with age, sex, race and health characteristics to see if they were significantly related to successful tracing.

The only variable that seems to be related to not being traced is smoking. There is some indication here that either we are missing some deaths, possibly among the younger age group, or smoking is acting as some kind of surrogate for some other characteristic that makes people hard to find, people who move often, for example.

In the course of doing these continued follow-ups, we keep trying to find those lost to follow-up. Eventually, we will find them, if through nothing else, through the mortality records.

As the NDI is expanded backwards in time (I think it starts in 1977 now), we will be able to fill in that gap where we have not been able to do adequate death tracing. We are finding people at each stage. You lose some; you find some. Especially in the elderly, I think our response rate is up close to 96-97 percent. We are always looking for new and interesting tracing mechanisms.

Finally, we compared the mortality experience of our cohort with what would be expected given the national mortality rates occurring at that time. What you see here are proportions surviving for White males and White females 65-69. They indicate that we do have a representative population and that we are not missing any significant portion of that population or a particular kind of person.

You would expect our cohort to have a lower death rate in the early years, because they had to be healthy enough to make it to the van.

On the other hand, we over-sampled in poverty areas and in other groups where we thought we would have higher mortality. The poverty areas do have higher mortality. The non-poverty areas have lower mortality.

When you put those two together, they just kind of converge on the center. In the final analysis, the data is behaving as one would expect it should.
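A comparison of this kind can be sketched as follows. This is a minimal illustration, not the procedure NCHS used: the function names are made up, the death rates passed in would have to come from national life tables for the matching age, sex and race group, and the observed curve assumes complete follow-up with no one lost.

```python
def expected_survival(annual_death_rates):
    """Cumulative proportion expected to survive after each year,
    given one national death probability per year of follow-up."""
    surviving = 1.0
    curve = []
    for q in annual_death_rates:
        if not 0.0 <= q <= 1.0:
            raise ValueError("death rates must be probabilities between 0 and 1")
        surviving *= 1.0 - q
        curve.append(surviving)
    return curve


def observed_survival(n_at_start, deaths_per_year):
    """Cumulative proportion of the cohort still alive after each year.
    Assumes every subject is followed for the whole period (no censoring)."""
    if n_at_start <= 0:
        raise ValueError("cohort size must be positive")
    alive = n_at_start
    curve = []
    for deaths in deaths_per_year:
        alive -= deaths
        curve.append(alive / n_at_start)
    return curve
```

Plotting the two curves for the same subgroup, year by year, gives the sort of picture described here: an observed curve that tracks the expected one suggests no large group of deaths is being missed.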

The next aspect of the data collection was the interview procedure and this consisted of a very lengthy questionnaire--it took about 2 hours--and also the physical measurements.

I said this was a collaborative study. It is truly a collaborative study. There are about 12 institutes that participated all with their own agendas. If you look at the questionnaire topics, you can identify who participated in this study.

It was an extremely complicated interview. I would say that the majority of it was taken up by determining whether someone had some chronic condition or acute condition in the 10 year follow-up. If they did, we asked when the onset was and whether they were hospitalized.

If they were hospitalized, the name of the hospital was obtained. All that information was taken down and used for the next kind of data collection.

There was also some more risk factor information collected, some psychosocial variables, some mental health variables, smoking history and that kind of thing.

Most of the subject interviews were done in person; the proxy interviews were done by telephone.

Again, the physical measurements were pulse, blood pressure and weight. The interviewers actually carried around a little scale and blood pressure equipment, and took three blood pressure measurements.

The non-response goes up with age. These were people who, for some reason, we felt should not take part in the physical measurement section because of a health condition. In some cases, we had an interview with someone using a proxy respondent because the subject was incapacitated; for example, they were in a nursing home where they were too ill to participate. In those cases, of course, we could not get physical measurements.

Finally, the health care facility data collection included the names of all of the institutions that someone had been in; all the hospitals, nursing homes, any other kind of overnight stay the person had. We asked people to sign a release form, sent those to the hospitals, got all the records back and so have, in essence, a 10 year history of utilization for 14,000 people.

The continued waves of follow-up also get this information and we are continuing to go back to hospitals to get these records.

We asked them to fill out an abstract form, but also to send us a Xerox of the face sheet and the discharge summary.

It is a little difficult to give you response rates on this kind of data collection, because we were not quite sure what we should have. We were asking people to recall dates and reasons for hospitalization over a 10 year period. People are not very good at remembering dates.

We do have information from about 2,500 facilities and 400 nursing homes. Some did refuse to participate, and, in some cases, the respondent refused to sign the release form.

We are dealing with about 17,000 hospital records and about 400 nursing home records.

We feel this is a very important data base and can be used for a lot of different activities, including health services utilization, but also used to verify certain diagnoses.

We have objective measures of health at baseline. We do not have objective measures at follow-up, but if someone reports that they had cancer, an MI or something like that, we can look at the hospital record and try to get a verification of that.

We do have some hospitals that refused plus some people that refused. We have an additional contract to do an evaluation of these records to try to merge what the person told us and on a case-by-case basis match that with what the hospital sent. We try to make up dummy records where we are pretty sure we should have a record, but the hospital refused to participate. The entire file will also be matched to the Medicare file to evaluate completeness. We also will do some methodological work on how far back people can remember and do they remember certain kinds of conditions better than others, etc.
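The case-by-case matching of what the person told us against what the facility sent might look something like the sketch below. It is purely illustrative and is not the contractor's procedure: the record types, the matching on facility name and year, and the one-year tolerance for recall error are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedStay:
    """A stay as recalled by the respondent (hypothetical layout)."""
    facility: str
    year: int


@dataclass(frozen=True)
class FacilityRecord:
    """A stay as documented in a record returned by the facility (hypothetical layout)."""
    facility: str
    year: int


def _same_facility(a: str, b: str) -> bool:
    # Compare names ignoring case and surrounding spaces.
    return a.strip().lower() == b.strip().lower()


def match_stays(reported, records, year_tolerance=1):
    """Pair each reported stay with at most one facility record.

    Returns (matched pairs, reported stays with no record, records never reported).
    Reported stays with no record are the candidates for a dummy record.
    """
    unused = list(records)
    matched, unmatched = [], []
    for stay in reported:
        hit = next(
            (rec for rec in unused
             if _same_facility(rec.facility, stay.facility)
             and abs(rec.year - stay.year) <= year_tolerance),
            None,
        )
        if hit is None:
            unmatched.append(stay)
        else:
            unused.remove(hit)
            matched.append((stay, hit))
    return matched, unmatched, unused
```

The third list, records that the respondent never mentioned, is the raw material for the methodological work on how far back people can remember.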

That is currently underway. The hospital records will be released with the entire file, but we will then do another release of this evaluation tape in about a year and a half.

Finally, the collection of death certificates. There are a few cases where we do not have the death certificate and keep going back to the states trying to obtain them.

That pretty well describes the data collection. I guess the next question is what can it do for issues of long term care. I think the first thing you should be aware of is this funny age distribution and that we do have a lot of people in the older age groups.

The epidemiologic follow-up is a representative sample, it is a large sample, and it is multipurpose. It was not specifically designed to look at nursing home care, and was not designed to look at institutionalization; it was not designed to look at any one particular thing.

Because of that, it has a lot of different kinds of information and you can start to look at things like the interrelationship between health and sociodemographic or socioeconomic factors and the use of nursing homes.

We have a couple of little scenarios about how people can get into a nursing home. In this case you have a health effect that leads to an income change. Then there is some outside factor, some home support not being available and the person goes into a nursing home.

Alternatively, you have some problem with income, then you have the health effect and that leads to a nursing home. You have various payment strategies once you are in the home.

You can have a scenario where a person had a much higher level of income and, through some outside factor like the death of a spouse, also ends up in a nursing home at some later date with a different kind of payment.

The epidemiologic follow-up clearly cannot differentiate between these patterns of health and utilization. It can start to look at some of the components in trying to understand these very complicated interrelationships between health and social factors and utilization.

What we are trying to do now is look at something fairly simple.

We are looking at some of the socioeconomic variables measured at baseline in relation to outcome. This is the kind of table that we are planning on running, using survival techniques. We can look at the percent institutionalized at any point in the follow-up period, the percent not institutionalized but functionally dependent, and then the percent not functionally dependent. Family income is measured in 1970-1975 dollars. About 15 percent of this sample had been in a nursing home at some time during the follow-up. About 9 percent were in a home at the time of the interview or had been in a home prior to their death, and about 3 percent had been in and out again.
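As one example of a survival technique that could sit behind such a table, here is a minimal Kaplan-Meier sketch for time to first nursing home admission. The talk does not say which survival method is planned, so this choice, the function name and the input layout are assumptions; subjects who die or reach the end of follow-up without being institutionalized are treated as censored.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the proportion not yet institutionalized.

    times  -- years from baseline to first admission or to end of observation
    events -- True if the subject was admitted at that time, False if censored
    Returns a list of (time, proportion not yet institutionalized) pairs,
    one pair for each distinct time at which an admission occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    surviving = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        admissions = 0
        leaving = 0
        # Gather everyone whose observation ends at time t.
        while i < len(data) and data[i][0] == t:
            admissions += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if admissions:
            surviving *= 1.0 - admissions / at_risk
            curve.append((t, surviving))
        # Admitted and censored subjects both drop out of the risk set.
        at_risk -= leaving
    return curve
```

Running the function separately within each baseline family income group, and taking one minus the result, would give the percent institutionalized by each point in the follow-up period.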

The data base includes these hospital and nursing home records. You can look at how the hospital experience relates to the nursing home stay, over a 10 year period, which is a fairly long period of observation. That is just for the initial follow-up, and now we are adding about another 3 years to that.
