Review Criteria for Assessing Program Evaluations
Used by the Special Panel of Senior Editorial Advisors
Significance: The study addresses a significant issue of policy relevance, and the evaluation findings are likely to be useful.
Conceptual foundations: A literature review is included; the project is shown to be logically based on previous findings; the report uses theory and/or models; program assumptions are stated; the evaluation draws on previous evaluations (if any); the evaluation is linked to, and describes, the program; multiple perspectives are presented if multiple relevant stakeholders are consulted and involved; the timing is appropriate, in that the program is ready for evaluation.
Questions for evaluation: The aims of the evaluation are clear, well specified, and testable; the questions are feasible, significant, linked to the program, appropriate with respect to resources and audience, and derived logically from the conceptual foundations. Ingenuity and creativity are shown.
Findings and interpretation: The conclusions are justified by the data analyses; the summary does not go beyond what the data can support; appropriate qualifiers are stated; the conclusions fit the entire analysis; equivocal findings are handled appropriately; the initial questions are answered; the interpretation ties in with the conceptual foundation; there is recognition of consistency with, or deviation from, the relevant literature; the presentation is understandable; the results have practical significance; and the extent of program implementation is assessed.
Recommendations: The recommendations follow from the findings and are worth carrying out, affordable, timely, feasible, useful, and appropriate; the recommendations are shown to be relevant to the questions asked; and the breadth or specificity of the recommendations is addressed. Any recommendations for future evaluations and/or for improvements in implementation are clearly presented.
Design: Considerations include overall appropriateness; soundness; feasibility; funding and time constraints; generalizability; applicability to culturally diverse populations; assessment of the extent of program delivery; validity; feasibility of data collection; reliability of selected measurements; use of multiple measures of key concepts; and appropriateness of the sample. In addition, variables are clearly specified and fit the questions and concepts, and the design permits both measuring the extent of program implementation and answering the evaluation questions.
Data collection: Data are collected using appropriate units of measurement for analysis, with controls for participant selection and assignment bias and proper handling of missing data and attrition. Other considerations include use of an appropriate comparison group or control; adequate sample size, response rate, and information about the sample; a data collection plan; data collection that is faithful to the plan; attention to and cooperation with the relevant community; project confidentiality; and consistency in data collection. The quality of the data (including the quality of any extant data sets used in the study) and the efficiency of sampling are addressed. The data collection is appropriate to the evaluation questions.
Data analysis: Among the factors the data analysis addresses are how attrition is handled; whether the analysis matches the design; the use of appropriate statistical controls; the use of methodology and levels of measurement appropriate to the type of data; and estimation of effect size. The analysis shows sensitivity to cultural categories, draws inferences that generalize appropriately, and uses an analysis type that is simple and efficient.
Crosscutting factors: The following crosscutting factors are likely to be important at all stages of a report: clarity, presentation, operating at a state-of-the-art level, appropriateness, understandability, innovativeness, generalizability, efficiency of approach, logical relationships, and a discussion of the report's limitations. The report should also address ethical issues, possible perceptual bias, cultural diversity, and any gaps in study execution.