Once there is agreement upon the goals of a program, the next step is to develop specific measures that reflect these goals. At this stage, both operational and theoretical concerns must be taken into account.
The availability of data is the primary operational concern. Overall goals must be linked to specific measures for which accurate and comparable data are available at the state level in a timely fashion and at a reasonable cost (Brown and Corbett, 1997; Hatry, 1999; Yates, 1997; Zornitsky and Rubin, 1988). Comparability of data means that the measures should detect real differences in program performance across states or localities or over time rather than reflect differences in the quality of data used to calculate the measures. Timeliness of data has not always been considered in the selection of measures, but it is necessary if the performance measurement system is expected to provide policy-relevant feedback to states on the results of their actions. At the July 1999 consultation, states expressed a clear preference for minimizing the cost and burden of data collection: they favored measures that could be assessed using national survey data or data they were already reporting over measures that would increase their data collection and reporting responsibilities.
Experience to date has shown that data systems at all levels of government fall short of the ideal (Brown and Corbett, 1997). Data for some measures cannot be collected at all, while other measures can be measured only poorly. Moreover, the cost of developing or improving data collection systems can be substantial. While states are collecting a range of information about TANF recipients beyond that required under the federal reporting rules, they do not all collect the same data elements. Even when the same general information is collected, states do not measure it consistently (APHSA, 2000).
Some performance measurement systems have had success with existing administrative data - such as Unemployment Insurance (UI) records - which are collected uniformly across a range of states or localities (Bartik, 1996; Yates, 1997). Administrative data usually attain some level of quality and the cost of collecting the data is limited since they are generally collected for other purposes. However, the types of measures that can be derived from administrative data are limited. For example, UI records include data on earnings over a quarter, but not on hourly wages. Moreover, the quality of administrative data is highest for data elements that are directly related to the purpose for which the data were collected - such as the amount of benefits paid - and lower for other elements - such as the educational level of recipients (GAO, 1997). National or state surveys can provide data on a wider range of measures, particularly if existing survey efforts can meet the needs of the performance measurement system. However, initiating survey efforts can be relatively expensive. If estimates must be valid for specific states or localities, relatively large sample sizes will be required to achieve that level of precision. Appendix D of this report reviews the merits and disadvantages of several potential data sources for outcome measures.
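The point about sample sizes can be made concrete with a rough calculation. The sketch below uses the standard textbook formula for the sample size needed to estimate a proportion (such as an employment rate) from a simple random sample. The function name, the 3-percentage-point margin of error, the 95 percent confidence level, and the count of ten states are illustrative assumptions and do not come from this report; real surveys with clustered or stratified designs would generally need larger samples.

```python
import math


def sample_size_for_proportion(margin_of_error, expected_rate=0.5, z=1.96):
    """Approximate simple-random-sample size needed to estimate a proportion.

    margin_of_error: half-width of the confidence interval, as a fraction.
    expected_rate:   anticipated proportion; 0.5 is the most conservative choice.
    z:               normal critical value (1.96 corresponds to 95% confidence).
    """
    n = (z ** 2) * expected_rate * (1 - expected_rate) / (margin_of_error ** 2)
    return math.ceil(n)


# Hypothetical scenario: state-level estimates within +/- 3 percentage points.
per_state = sample_size_for_proportion(0.03)  # 1068 respondents per state
number_of_states = 10                         # assumed, for illustration only
total_respondents = per_state * number_of_states

print(per_state, total_respondents)
```

Because the required sample is roughly the same for each state regardless of its population, a survey designed to support state-level estimates must be many times larger than one designed only for a national estimate.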
The theoretical concerns in developing measures are driven by the fact, as noted earlier, that all high-level outcome measures are affected by a range of factors, not just by program performance. This would not be a problem if there were a strong correlation between performance on an outcome measure and program effectiveness, as shown through evaluation. Unfortunately, research has shown that there is not always a consistent relationship between program outcomes and impacts.(2) For example, many welfare recipients find jobs on their own - without the assistance of welfare-to-work programs. The role of welfare-to-work programs is to add value to the "natural" movement off welfare and into employment. States with stronger economies and lower unemployment rates are generally able to move more individuals into employment than those with weaker economies. Similarly, states with a more disadvantaged caseload may have greater difficulty moving individuals into work than states with a more job-ready caseload. Performance on an outcome measure may therefore have more to do with differences in economic conditions or caseload composition than with the effectiveness of welfare-to-work programs.
Appendix A examines this issue in more detail. Data from random-assignment evaluations of welfare-to-work programs in five sites show no consistent relationship between the programs with the highest employment rates or average earnings - two possible outcome measures - and the programs that produced the greatest impact on these measures. The 1994 report to Congress (HHS, 1994) identified this problem as one of the major issues that must be resolved before an outcome-based performance measurement system can be adopted. However, the research also shows that this problem is not unique to outcome measures - participation rates over time also are poorly correlated with program impacts.
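The distinction between outcomes and impacts can be illustrated with a small sketch. In a random-assignment evaluation, the outcome is what the program group achieves, while the impact is the difference between the program group and the control group. The site names and employment rates below are invented for illustration only; they are not the Appendix A results or data from any actual evaluation.

```python
# Hypothetical employment rates (percent) for program and control groups.
# These figures are made up for illustration; they are NOT evaluation data.
sites = {
    "Site A": {"program": 62.0, "control": 58.0},
    "Site B": {"program": 48.0, "control": 36.0},
    "Site C": {"program": 55.0, "control": 50.0},
}


def outcome(site):
    """Outcome measure: the employment rate of the program group."""
    return sites[site]["program"]


def impact(site):
    """Impact: the program group rate minus the control group rate."""
    return sites[site]["program"] - sites[site]["control"]


# Rank sites from highest to lowest on each measure.
by_outcome = sorted(sites, key=outcome, reverse=True)
by_impact = sorted(sites, key=impact, reverse=True)

print("Ranked by outcome:", by_outcome)
print("Ranked by impact: ", by_impact)
```

In this invented example, the site with the highest employment rate has the smallest impact, because most of its participants would have found work anyway. A ranking based on the outcome alone would reward it over the site whose program added the most value.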
Several lessons can be drawn from this research:
- Outcome-based performance measurement is only one element of a comprehensive monitoring and research program. By necessity, performance measurement systems are limited to those elements for which data can be collected inexpensively, routinely and in a timely fashion. In-depth understanding of participant experiences and program effectiveness requires different approaches, including detailed participant surveys, rigorous evaluations, and advanced econometric analysis (Forsythe, 2000).
- Outcome-based performance measurement can still be a useful tool to monitor program operations and promote improvements, as long as stakeholders at all levels of operations agree that there is a clear logical system connecting the activities of program operators to the outcomes that are measured. Outcome measures may be used to identify areas where additional resources or technical assistance are needed (Perrin and Koshel, 1997).
- The chosen measures must not give programs incentives to achieve high levels on performance measures through the use of strategies that subvert their fundamental intent. For example, it is important to develop measures for welfare-to-work programs that minimize incentives for creaming (i.e., serving only those who are most job-ready and most likely to become employed on their own, with minimal program assistance). Likewise, in measuring how well states are succeeding at helping former welfare recipients achieve enduring self-sufficiency, care should be taken not to focus exclusively on welfare recidivism, since a state could achieve very low levels of recidivism by making it impossible for former recipients to reapply for cash assistance, even without providing supports for employment.
- Because it is impossible to fully account and adjust for all the variations in circumstances among states, no performance measurement system can be perfectly fair. It is important to develop mechanisms which recognize that states are facing different economic and demographic environments. The next section discusses a number of ways in which the standards for performance measures can be adjusted to provide a more level playing field for all states, along with the advantages and disadvantages of each.
The selection of specific measures inevitably involves trade-offs. The use of multiple measures can help guard against any unintended consequences that might be caused by reliance solely on a single measure. However, it is important not to err by going too far in the other direction - a relatively complex system can have a less immediate effect on motivating programs in any particular direction (Bartik, 1996). It is also important not to lose sight of the program goals and desired outcomes: the measures that have been chosen must reflect the initial choice of goals.
Part III of this report includes a detailed examination of several potential measures that could be used to assess the performance of state Temporary Assistance for Needy Families (TANF) programs. These include the measures that have been selected for the TANF High Performance Bonus.