The following core criteria were developed to address four types of evaluation conducted by HHS agencies, offices, and programs:
- Program Effectiveness (PrE) evaluations provide a way to determine the impact of HHS programs on achieving intended goals and objectives.
- Performance Measurement (PfM) projects are intended to assist in the development of data systems for monitoring progress on departmental strategic or agency performance goals.
- Environmental Assessments (EnA) provide a means for understanding the forces of change in the health and human services environment that will influence HHS programs and achievement of its goals and objectives.
- Program Management (PrM) studies reflect the needs of program managers to obtain information or data helpful for designing or managing a program.
The criteria are intended to be generally applicable to each of the four types of evaluations. However, some special considerations are noted for specific elements when applied to particular study types. All criteria are intended as general guidelines; not all elements may apply to every evaluation.
- OVERALL SIGNIFICANCE
- The study addresses a significant issue of policy relevance.
- Evaluation findings are likely to be useful; the study’s usefulness to the stakeholder audience(s) is described.
- CONCEPTUAL CRITERIA
- Conceptual Foundations
- The study is based on theory or conceptual models; it builds on relevant prior research, follows logically from previous findings (as appropriate), cites relevant literature, or otherwise justifies the study's focus and utility.2
- The program and/or study assumptions are stated.
- When the report is linked with a program, policy, or issue, the report describes this context.
- The timing is appropriate because the program/policy/issue is ready for study; evaluation methods used are appropriate to the program/policy/issue stage.3
- Study Questions
- The research questions are clearly stated, measurable, and fully specified.4
- The questions are feasible, significant, linked to the program or issue, appropriate for the resources and audience, and derive logically from the conceptual foundations.
- Evaluation/Study Design
- Design considerations should address each of the following, as appropriate: feasibility, funding and time constraints, generalizability5, applicability for cultural diversity, assessment of program delivery, validity, feasibility of data collection, reliability of selected measurements, use of multiple measures of key concepts, appropriateness of the sample, and assessment of statistical power prior to data collection.
- Variables and/or methods used are clearly specified and fit with the questions and concepts.
- The design matches the study questions; the design permits measurement of program implementation, as appropriate.
- Multiple methods are used as appropriate and should be well integrated to support triangulation of results.6
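The design criterion on assessing statistical power prior to data collection can be illustrated with a minimal sketch. This is an illustrative calculation only, not part of the criteria: the function name is hypothetical, and it uses the standard normal approximation to a two-sided, two-sample comparison of means rather than the exact t-distribution.

```python
from statistics import NormalDist

def required_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for detecting a standardized
    mean difference (Cohen's d) between two equal-sized groups, using
    the normal approximation to a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to desired power
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

# For a "medium" standardized effect (d = 0.5) at alpha = 0.05 and 80% power,
# the approximation gives about 62.8, i.e. 63 participants per group after
# rounding up (the exact t-based answer is slightly larger).
n = required_n_per_group(0.5)
```

Running such a calculation before data collection, as the criterion asks, shows whether the planned sample can plausibly detect effects of the size the program is expected to produce.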
- Data Collection
- Data collection includes: a data collection plan; data collection that is faithful to the plan; attention to and cooperation with the relevant community; data integrity; project confidentiality; ethical handling of data; and consistency.7
- Data are collected appropriate to the evaluation questions, use appropriate units of measurement, and appropriately handle missing data and attrition.8
- The quality of the data (including the quality of extant data sets used in the study), training of interviewers/data collection staff, and justification of the sampling frame are addressed.
- Data Analysis
- The data analysis addresses: handling of attrition, matching of the analysis to the design, use of appropriate statistical controls and techniques, use of methodology and levels of measurement appropriate to the type of data, estimation of effect size, and confidence intervals.9
- If multiple methods are used to collect data on the same phenomenon, the data are compared to assess the extent to which the findings are consistent (triangulation).
- The analysis shows sensitivity to cultural categories, if applicable.
- The analysis procedures are appropriate and correctly applied.10
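The analysis elements on effect-size estimation and confidence intervals can be sketched as follows. This is an illustrative example, not part of the criteria: the function names are hypothetical, Cohen's d is one common standardized effect-size measure among several, and the interval uses a normal approximation rather than the t-distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled = sqrt(((n1 - 1) * stdev(treatment) ** 2 +
                   (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled

def diff_ci(treatment, control, level=0.95):
    """Normal-approximation confidence interval for the difference in means."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    se = sqrt(stdev(treatment) ** 2 / len(treatment) +
              stdev(control) ** 2 / len(control))
    d = mean(treatment) - mean(control)
    return d - z * se, d + z * se
```

Reporting both the standardized effect and an interval around the raw difference, rather than a significance test alone, supports the later criterion that results have practical significance.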
- INTERPRETATION OF RESULTS
- The study questions are answered/addressed, or if not, explanation is provided.
- The interpretation of results is linked to and assesses the study’s conceptual foundation.11
- The report notes that the findings are either consistent with or deviate from the relevant literature and findings in the field.12
- Program implementation is assessed, as appropriate.13
- Generalizability inferences are addressed.
- The summary does not go beyond what the data will support; conclusions are justified by the analyses; qualifiers are stated as needed.
- Equivocal findings are addressed appropriately.
- The results have practical significance.
- The programmatic/policy recommendations follow from the findings; are worth carrying out; are affordable, timely, feasible, useful, and appropriate; and are relevant to stakeholders.
- Any recommendations for future studies, and/or program improvements are clearly presented and justified.14
- CROSS-CUTTING FACTORS
- The following cross-cutting factors are likely to be important at all stages of a report: the presentation is well written, clear, and understandable; state-of-the-art approaches are used where possible; the methodology is clearly described and defended; and the report is innovative and efficient, draws logical relationships, and discusses its limitations. The report presents multiple perspectives, as appropriate, and relevant stakeholders are consulted and involved. The report should also address ethical issues, possible perceptual bias, cultural diversity, and gaps in study execution.
2 This element may be less relevant to Program Management and Performance Measurement studies.
3 For Program Effectiveness studies, timing should be appropriate to implementation stage; for Performance Measurement projects, timing should be appropriate to establishment of strategic goals and objectives; for Environmental Assessments, timing should be appropriate due to changing or uncertain environments; for Program Management studies, timing should be appropriate to management priorities.
4 Program Effectiveness studies should specify research questions and testable hypotheses.
5 For Performance Measurement and Program Management studies, generalizability is not a key consideration, given study design and foci for these study types.
6 This element may be less relevant to Performance Measurement studies.
7 Program Effectiveness and Program Management studies that examine person-level outcomes should also address the use of an appropriate comparison or control group, adequate sample size, response rate, and information about the sample.
8 Program Effectiveness studies and Environmental Assessments that address person-level outcomes should control for participant selection and assignment bias.
9 For Program Management studies, which may use secondary data and other methods, data analysis should be appropriate to the kinds of data collected. Estimation of effect size may not apply to Environmental Assessments, where results may be descriptive.
10 For Program Effectiveness studies, appropriate sensitivity analyses are conducted, and uncertainty in key parameter estimates is addressed, as appropriate.
11 This element may not apply to Program Management studies, given their focus and design.
12 This element may not apply to Program Management studies, which may not have a conceptual foundation in the research literature.
13 This element may not apply to Environmental Assessments, which do not directly assess program implementation issues.
14 Performance Measurement studies should consider the context of the strategic goals and objectives of interest to the study.