Medicare Post-Acute Care: Quality Measurement Final Report. D. Instrument Refinement


1. Expert Panel

Once the four draft instruments were compiled, we convened a second expert panel that included content experts, methodologists, providers, and Federal policy officials (see Appendix H). All expert panel members were asked to comment on the relevance and feasibility of the measures, response burden, sampling issues, definitions of an episode of care, and redundancy with other reporting requirements. Content experts (disease specialists) in the four clinical conditions were also asked to comment on the balance between global and disease-specific measures, the feasibility of constructing a summary performance measure, and risk adjustment. Methodologists, including those with experience in post-acute care quality measurement, were asked to comment on the feasibility of administering the measures, the scaling (i.e., floor and ceiling effects), responsiveness of measures, and risk adjustment.

Prior to the in-person panel meeting, we convened three separate conference calls with disease specialists in the areas of CHF, pneumonia, and back and neck conditions. (Note: We did not convene a separate conference call for stroke because the stroke specialist was present at the in-person panel meeting.) Among the issues discussed during the specialist calls were disease-specific measures, post-acute care episode length, and case mix adjustment. Notes from the three specialist calls are included in Appendix I. One especially significant outcome of the specialist calls was the decision to narrow the back and neck condition down to the more specific condition of lumbar spinal stenosis. The back and neck specialist strongly recommended using lumbar spinal stenosis as a tracer condition for this study because it is a relatively prevalent condition that will yield a more homogeneous patient sample. Therefore, throughout the remainder of this report, we will refer to the former back and neck condition as lumbar spinal stenosis.

For the in-person panel meeting, we compiled a series of key questions (see Appendix J) related to issues such as post-acute care episode length, measure feasibility and responsiveness, response burden, and case mix adjustment, which served as a guide for panel discussion. Following are some highlights of the panel’s recommendations.

  • With respect to physical function, particularly as it relates to stroke outcomes, panel members recommended including a more thorough assessment of higher-order function (i.e., IADLs) in addition to basic ADL function. This would allow better discrimination between patients with varying degrees of functional ability, as well as better measurement of the recovery of higher-level functions following stroke.

  • In defining an episode of post-acute care (the interval between admission and follow-up assessments), the panel’s objective was to recommend an episode long enough for most disease-specific outcomes to have occurred, yet short enough to avoid overlapping with any new post-acute care episodes. Panel members therefore recommended a 60-day follow-up for CHF and pneumonia, and a 90-day follow-up for stroke and lumbar spinal stenosis.

  • Panel debate centered on the appropriate assessment of satisfaction. Patient confidentiality may be a concern if satisfaction is assessed upon admission, especially if the data collector is affiliated with the post-acute facility, whereas poor patient recollection of the post-acute stay may be a concern if satisfaction is assessed at follow-up. Mail-in satisfaction surveys were mentioned as a possible alternative; however, low response rates to mail-in surveys remain an issue. Consensus was not reached in this area.

  • The panel recommended assessing caregiver burden, satisfaction, and patient and caregiver education as outcomes that are reflective of quality of post-acute care.

  • Certain processes of care that may be difficult to abstract from post-acute care charts might better be assessed through patient interview at follow-up. For many of these measures, such as caregiver/family education and discussion of advance directives, patients’ perceptions of care processes might be more informative than the facility’s documentation of such processes.

  • The panel also discussed disease-specific risk adjustment items (e.g., Rankin Score for stroke and renal disease in CHF).
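
The follow-up intervals recommended by the panel can be written as a simple lookup. The sketch below is illustrative only: the condition labels, the `FOLLOW_UP_DAYS` table, and the `follow_up_date` function are hypothetical names, and it assumes that the interval is counted in calendar days from the date of post-acute admission.

```python
from datetime import date, timedelta

# Follow-up intervals (in days) recommended by the panel for each tracer condition.
# The dictionary keys are hypothetical labels chosen for this sketch.
FOLLOW_UP_DAYS = {
    "chf": 60,
    "pneumonia": 60,
    "stroke": 90,
    "lumbar_spinal_stenosis": 90,
}


def follow_up_date(condition: str, admission: date) -> date:
    """Return the target follow-up date, assuming calendar days from admission."""
    try:
        days = FOLLOW_UP_DAYS[condition]
    except KeyError:
        raise ValueError(f"unknown tracer condition: {condition!r}") from None
    return admission + timedelta(days=days)
```

Under these assumptions, `follow_up_date("stroke", date(2001, 1, 1))` gives April 1, 2001.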

2. Feasibility Tests

Prior to and concurrent with the second expert panel meeting, we began conducting feasibility tests of the four quality measurement instruments in SNFs, IRFs, and HHAs located in the Denver metropolitan area. We selected facilities based on feedback received from local clinicians regarding volume of admissions and responsiveness of facility staff. Following is a brief summary of the feasibility test issues and findings.

Recruitment Issues

Feasibility testing was conducted in two phases. During the first phase, four SNFs, two HHAs, and two IRFs were enrolled, and a social worker or intake person from each facility was asked to identify patients newly admitted with one of the four tracer conditions. Identified patients were subsequently invited to participate, and consent was obtained. Interviews were then conducted with the draft instruments to determine the feasibility of administering the questions.

In the first phase of piloting, 21 subjects were recruited and interviewed. Of the 21 interviews, 13 were conducted in SNFs, six were conducted in HHAs, and two were conducted in IRFs. Seven of these 21 subjects were admitted to the post-acute facility with pneumonia, five with CHF, two with back and neck conditions, and seven with stroke. Hospital and post-acute care charts were obtained on these subjects and reviewed using the chart review instruments. Revisions were made to the instruments based on insights gained from the first phase of piloting and feedback from the expert panel. A second phase of piloting was then conducted using the newly revised instruments. Four subjects were recruited and interviewed during this phase. Of the four interviews, one was conducted in an SNF, two were conducted in IRFs, and one was conducted in an HHA. One of these four subjects was admitted to the post-acute facility with a back and neck condition and three were admitted with strokes.
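
As a quick consistency check, the pilot counts reported above can be tallied by setting and by admitting condition. This is a minimal sketch; the variable names and the `total` helper are illustrative and not part of the study's materials.

```python
# Pilot interview counts as reported in the text, by setting and by admitting condition.
phase1_by_setting = {"SNF": 13, "HHA": 6, "IRF": 2}
phase1_by_condition = {"pneumonia": 7, "CHF": 5, "back and neck": 2, "stroke": 7}
phase2_by_setting = {"SNF": 1, "IRF": 2, "HHA": 1}
phase2_by_condition = {"back and neck": 1, "stroke": 3}


def total(counts: dict[str, int]) -> int:
    """Sum the counts in one breakdown."""
    return sum(counts.values())


phase1_total = total(phase1_by_setting)
phase2_total = total(phase2_by_setting)

# The two breakdowns of each phase describe the same subjects, so they must agree.
assert phase1_total == total(phase1_by_condition)
assert phase2_total == total(phase2_by_condition)

# Subjects interviewed across both phases of feasibility testing.
overall_total = phase1_total + phase2_total
```

Both breakdowns agree within each phase (21 subjects in the first phase and four in the second), for 25 pilot interviews overall.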

During both phases of feasibility testing, ongoing difficulties were encountered in identifying and recruiting patients. Facility staff who were responsible for identifying newly admitted residents for our study indicated on several occasions that there were numerous demands on their time and that they simply did not have the time (or additional staff) necessary to review the facility’s admissions on a regular basis. This issue was of particular concern to staff members in SNFs and HHAs. Although we had been successful with this method of patient recruitment for past studies, many of the facility staff and administrators we spoke with indicated that pressures under PPS had placed increasing demands on their time and available resources.

General Findings

  • The time to complete the entire admission interview, after consent was obtained, averaged 25 minutes.

  • Approximately 13% of eligible subjects declined to participate.

  • Subjects requiring a proxy were not eligible for the pilot study because we did not have IRB approval to use proxies. (Approximately 17% of subjects were below the mental status cut-off.)

  • The order of the sections was changed to improve the flow from one section to the next.

  • The time to complete the telephone follow-up interview averaged 14 minutes, and the response rate was approximately 63%.

Marital Status, Education, and Race

A question about the length of time since the subject was married, widowed, or divorced was deleted because its relevance, either to the measured outcomes or as a risk stratifier, was uncertain.

Mental Status

The time necessary to complete the MMSE averaged 10 minutes, and some subjects appeared to be annoyed or burdened by the questions. The SPMSQ, used on the stroke and lumbar spinal stenosis instruments, appeared to be less burdensome for subjects to complete, eliciting fewer complaints or difficulties. However, the purpose of mental status testing is three-fold: (1) to determine a subject’s capacity to consent to participate in the study and to understand the described risks and benefits; (2) to identify subjects whose memories are impaired and whose responses to questions about past functional capacity may be inaccurate; and (3) to serve as a descriptive variable for the study population and as a risk adjuster in later data analysis. Given these three needs, the MMSE was retained despite the time necessary for completion.

Physical Function

As mentioned, this section was initially expanded to include measurement of perceived physical function at the time of the interview as well as functioning one month prior to the hospitalization. However, subjects became confused when asked about function at two distinct time periods. Function at the time of admission to rehabilitation was therefore dropped from the patient survey because, for the purposes of risk adjustment, it can be derived from automated patient data in the form of a cross-walked Barthel Index. While functional items from automated patient data like the MDS are considered adequate for risk adjustment purposes (for function at the time of post-acute admission), recovery of patient-perceived function is the outcome of interest. Recovery of function requires measurement at a fixed time point across all settings and must be obtained from the patient or proxy, as previously discussed.

IADL measures were included in the final stroke instrument to represent higher levels of function as recommended by the panel. However, IADLs with high non-response rates (e.g., due to gender bias) were excluded based on prior IADL measurement experience at the University of Colorado Health Sciences Center.

Social and Role Function

The Re-integration to Normal Living Scale was difficult to administer in an interview format, although the original instrument was used in this context. First, the 5-point scale proved too cumbersome for subjects to answer. Second, the first-person wording of the questions was awkward in an interview format. The following changes were therefore made:

  • Several questions about role function were deleted prior to the pilot project because they were not relevant to an elderly, post-acute population.

  • The first-person narrative was changed to a question format (from “…I moved around my house…” to “were you able to move around your house…”).

  • The present tense of the questions as originally written was changed to past tense for the admission interview to allow measurement of social and role function prior to the index event, in concert with the other baseline measurements.

  • The answer scale was changed to a yes/no response. If the subject answered “no” (i.e., he/she was not able to perform a given social/role function), the response was followed by a rating of how bothered the subject is/was by that limitation.

  • Throughout the process of altering the Re-integration to Normal Living Scale, a concerted attempt was made to maintain the essence of the questionnaire in terms of measuring subjects’ function from their perspective, their perceived need to perform a certain function, and how disturbed they are/were by limitations in social and role function.

Pain (Lumbar Spinal Stenosis Instrument Only)

  • Questions specifically referring to back/neck or limb pain were removed because subjects’ responses to the general pain questions appeared to be related to back/neck/limb pain, making the site-specific questions redundant. To reduce the number of pain questions, questions related to numbness were also removed.

  • The pain instrument was derived from the MOS pain questionnaire. The original instrument uses a 20-point visual analogue scale. The 20-point scale proved awkward for pilot subjects, who were more familiar with a 10-point scale; a 10-point scale was therefore adopted.


Depression

The Geriatric Depression Scale (GDS) questions were easily administered. However, Question #9 was difficult for several subjects to answer because at the time of the interview they were in a skilled nursing or rehabilitation facility and were therefore limited in their ability to go out. It was therefore decided that this question would be asked only of home health patients.

Social Support

Two questions regarding the availability of a willing and able caregiver were added to reflect both short-term and long-term social support networks. The original instrument had similar questions, but they proved difficult for subjects to answer in their original format, particularly with respect to the time frame of support availability (“indefinite, short time, now and then”). Some participants answered that all three were true and specified a different person for each. The added questions were adapted from the Longitudinal Study on Aging, but differ in the addition of the two time frames: short term and longer term.

Quality of Life (CHF and Pneumonia Instruments)

Questions related to dyspnea were modified. The original questions had a seven-level response ranging from “Not at all short of breath” to “Extremely short of breath.” The response scale was reduced to three levels because subjects had difficulty responding to the seven-level question. This modification decreases the likelihood of finding the same degree of variability in these questions, as well as the likelihood of detecting small amounts of intra-subject change over time. The trade-off between making the questions easier to answer and decreasing the likelihood of finding change and variability was intended as a discussion point for the expert panel, but there was not ample time to cover this issue. Thus, a decision was made to simplify the questions based on the qualitative experience of conducting pilot interviews.
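
The loss of sensitivity can be illustrated with a small sketch. The report does not state how the seven original response levels correspond to the three new ones, so the mapping below (levels 1-2, 3-5, and 6-7) is purely an assumption for illustration, as are the names `SEVEN_TO_THREE` and `detectable_changes`.

```python
# Hypothetical mapping from a seven-level dyspnea response (1 = not at all short
# of breath, 7 = extremely short of breath) to a three-level response.
# The cut points are an assumption for illustration; the report does not give them.
SEVEN_TO_THREE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3}


def detectable_changes(mapping: dict[int, int]) -> tuple[int, int]:
    """Count one-level changes on the fine scale and how many survive coarsening."""
    levels = sorted(mapping)
    pairs = list(zip(levels, levels[1:]))  # adjacent levels on the fine scale
    visible = sum(1 for a, b in pairs if mapping[a] != mapping[b])
    return len(pairs), visible


total_steps, visible_steps = detectable_changes(SEVEN_TO_THREE)
```

Under this assumed mapping, only two of the six possible one-level changes on the seven-level scale would register on the three-level scale. The same count holds for any order-preserving mapping onto three levels, since such a mapping has exactly two cut points.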


Satisfaction

Patient satisfaction with care was ranked very highly by the original expert panel and considered a necessary component of quality measurement for post-acute care. However, measurement of satisfaction presents several challenging issues. First, when should satisfaction be measured? In the original instrument, satisfaction questions were placed in the follow-up telephone interview. This presents a problem because that interview is scheduled to take place 90 days later, and subjects’ ability to recall the care received 90 days prior may be suspect. Alternatively, measurement of satisfaction early in the post-acute stay via the baseline instrument may reflect satisfaction with acute hospital care, or may come too early for subjects to have formed an opinion. Also, at the baseline interview, facility staff will be collecting baseline data, which may bias subjects’ responses to satisfaction questions. We considered an alternative strategy of giving a written satisfaction survey to subjects and asking them to mail it at the end of their post-acute stay. However, this strategy is subject to selection bias. Ultimately, we decided to ask three satisfaction questions during the 90-day follow-up telephone interview.