Grouping and classifying entities is a hallmark of scientific inquiry. The goal of any subgrouping method is to divide a single population into meaningful subpopulations, each consisting of members who are as similar as possible to each other along one or more dimensions and who are as different as possible from all other subpopulations. In statistical terms, subgrouping methods seek to "minimize within-group variance while maximizing between-groups variance" (Bailey 1994). Therefore, an effective subgrouping strategy is one that produces subgroups that differ on the outcome of interest or, in the context of program evaluation, one that produces subgroups in which different levels of program impacts may be expected.
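The within-group and between-groups criterion quoted above can be made concrete with a small calculation. The following is only a minimal sketch: the function name `variance_components` and the outcome scores are hypothetical, and sums of squares are used as the measure of variation.

```python
from statistics import mean

def variance_components(groups):
    """Return (within_ss, between_ss) for a dict mapping subgroup -> outcome values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    within_ss = 0.0
    between_ss = 0.0
    for values in groups.values():
        group_mean = mean(values)
        # Spread of members around their own subgroup mean
        within_ss += sum((v - group_mean) ** 2 for v in values)
        # Distance of the subgroup mean from the overall mean, weighted by size
        between_ss += len(values) * (group_mean - grand_mean) ** 2
    return within_ss, between_ss

# Hypothetical outcome scores for the same six people under two candidate groupings
good_split = {"a": [1.0, 2.0, 3.0], "b": [7.0, 8.0, 9.0]}
poor_split = {"a": [1.0, 8.0, 3.0], "b": [7.0, 2.0, 9.0]}

good_within, good_between = variance_components(good_split)
poor_within, poor_between = variance_components(poor_split)
```

In this illustration the first grouping has the smaller within-group and larger between-groups sum of squares, which is the pattern an effective subgrouping strategy aims for.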
Defining subgroups for use in impact evaluations using an experimental design involves selecting one or more variables along which groups will be subdivided, and assigning individuals to a subgroup based on baseline (pre-random assignment) scores on these variables. For example, if impacts are hypothesized to differ according to whether or not an individual has a high school diploma (or its equivalent), then educational attainment must be collected at baseline and dichotomously coded as "diploma/GED" and "no diploma/GED"; individuals are then assigned to either the "diploma/GED" or "no diploma/GED" subgroup. The choice of variables and decisions regarding how to assign individuals to subgroups based on their scores on these variables will have major implications for the validity and predictive utility of the resulting subgroups.
In program-evaluation research, demographic factors reflecting qualitatively different statuses (such as custodial/non-custodial fathers) or characteristics (such as whites/Hispanics/African-Americans) have traditionally been used to create subgroups. Increasingly, however, variables reflecting an individual's risk level and/or service needs are being used to create low-risk and high-risk subgroups. The underlying assumption is that those with certain characteristics or with greater risks and needs may require more intensive services, a wider array of services, or a certain package of services, and that a program tailored to particular characteristics and needs will be more effective. But which characteristics, needs, and risks are likely important for distinguishing subgroups who may respond differently to intervention? Is it useful to define low risk versus high risk on one or more dimensions? Does it also make sense to think about moderate risk? How can conceptually relevant characteristics, risks, and service needs be adequately captured with valid and reliable measures? When defining valid and reliable subgroups for impact analyses, a critical first step is making sure that individuals are being grouped together according to the correct characteristics.
Once key constructs have been identified and measured, researchers need to decide how they will assign individuals to subgroups based on these characteristics.
1. The single-factor approach uses a single variable to divide individuals into mutually exclusive subgroups. Subgroups may reflect degree of risk (such as high school graduates and non-graduates) or may simply be qualitatively different (such as whites, Hispanics, and African-Americans).
2. The additive-risk approach sums across multiple variables, then divides individuals into mutually exclusive subgroups according to the number or severity of cumulative risks. For example, risks across multiple domains (such as education, employment, and family relationships) may be summed, and impacts examined for those with few risks and those with many risks (however operationalized). This approach relies on variables conceptualized specifically as risk factors.
3. The interactive approach also considers multiple variables, but it focuses on the co-occurrence of certain factors, and these factors may or may not reflect risks. Individuals sharing similar profiles on these variables are then assigned to the same subgroup.
Below we describe these approaches in greater detail, discussing the implicit assumption underlying each approach and the mechanics involved in creating subgroups, and providing additional examples. We also pose some general hypotheses regarding program impacts that may be expected among low-, moderate-, and high-risk subgroups defined using either the single-factor or the additive-risk approach. (Because hypotheses regarding program impacts in subgroups defined using the interactive approach are specific to the variables used, no generic hypotheses are presented.)
1. Single-Factor Approach
Researchers often use single categorical variables to create subgroups. For example, it may be of interest to learn whether a fatherhood program is effective for both custodial and non-custodial fathers. Custodial status would be the subgrouping variable, and subgroup impact analyses would examine whether custodial fathers assigned to the program group fare better than custodial fathers who were not offered the program, and whether non-custodial fathers assigned to the program group fare better than non-custodial fathers who were not offered the program.
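The comparison described above can be sketched in a few lines. This is an illustrative simplification, not a full impact analysis: the record fields (`custodial`, `arm`, `outcome`), the function name `subgroup_impacts`, and the data are all hypothetical, and the impact is taken as a simple difference in mean outcomes with no standard errors or covariate adjustment.

```python
from statistics import mean

def subgroup_impacts(records, subgroup_key):
    """Program-minus-control difference in mean outcome within each subgroup."""
    outcomes = {}
    for r in records:
        by_arm = outcomes.setdefault(r[subgroup_key], {"program": [], "control": []})
        by_arm[r["arm"]].append(r["outcome"])
    # Assumes every subgroup has at least one member in each research group
    return {
        subgroup: mean(arms["program"]) - mean(arms["control"])
        for subgroup, arms in outcomes.items()
    }

# Hypothetical records: custodial status is measured at baseline,
# "arm" is the randomly assigned research group
records = [
    {"custodial": True, "arm": "program", "outcome": 12.0},
    {"custodial": True, "arm": "control", "outcome": 10.0},
    {"custodial": True, "arm": "program", "outcome": 14.0},
    {"custodial": True, "arm": "control", "outcome": 12.0},
    {"custodial": False, "arm": "program", "outcome": 9.0},
    {"custodial": False, "arm": "control", "outcome": 9.0},
    {"custodial": False, "arm": "program", "outcome": 7.0},
    {"custodial": False, "arm": "control", "outcome": 7.0},
]

impacts = subgroup_impacts(records, "custodial")
```

Because the subgrouping variable is measured before random assignment, each within-subgroup comparison remains an experimental contrast between program and control group members.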
Researchers have also defined subgroups using continuous variables that naturally lend themselves to meaningful categorization. For example, years of schooling is often categorized into high school graduates versus non-graduates because the distinction predicts important outcomes, such as employment and earnings, and may therefore serve to differentiate who may benefit (or benefit most) from program intervention. Even for some non-demographic characteristics, empirically meaningful cut-offs may be available. For example, scores on a clinically validated checklist of depressive symptoms (Center for Epidemiological Studies-Depression Scale; Radloff 1977) are often used to categorize individuals as being at low, moderate, or high risk for clinical depression (Devins and Orme 1985). For continuous variables in which theoretically or empirically meaningful cut-offs are not immediately obvious, researchers either select what they view as cut-offs with face validity (for example, those who agree or strongly agree with a statement) or use data-driven considerations (such as median splits, or upper and lower quartiles) to create subgroups.
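Both ways of categorizing a continuous variable, fixed cut-offs and a data-driven median split, can be sketched as follows. The cut-off values, labels, and function names here are hypothetical and chosen only for illustration; they are not the published thresholds of any scale.

```python
from bisect import bisect_right
from statistics import median

def categorize(score, cutoffs, labels):
    """Map a score to a label; a score equal to a cut-off falls in the higher category."""
    return labels[bisect_right(cutoffs, score)]

def median_split(scores):
    """Label each score 'high' if it is above the sample median, otherwise 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

# Hypothetical cut-offs for illustration only
CUTOFFS = [10, 20]
LABELS = ["low", "moderate", "high"]

risk_levels = [categorize(s, CUTOFFS, LABELS) for s in (5, 10, 19, 20, 31)]
split = median_split([3, 8, 1, 9])
```

Note that the treatment of boundary cases (a score exactly at a cut-off, or exactly at the median) is itself an assumption that should be stated whenever subgroups are defined this way.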
2. Additive-Risk Approach
Researchers using multiple variables to define subgroups have the added task of deciding how to combine these variables to yield meaningful subgroups. The implicit assumption underlying the additive-risk approach is one of cumulative risk; that is, those with a greater number and/or severity of risks are at greater overall risk for poorer outcomes than those with fewer and/or less severe risks.
Both the additive-risk approach and the single-factor approach, when the latter is applied to a variable conceptualized as a risk factor, involve assigning individuals to low-risk, high-risk, and (if a trichotomy of risk is considered) moderate-risk subgroups. For example, researchers may count up the number of barriers to employment faced by low-income parents (for example, limited education and work history, physical or mental health problems, and logistical barriers, such as lack of child care and transportation), then assign individuals to either a "many-barriers" or a "few-barriers" subgroup (or even a "no-barriers" subgroup). Program impacts are then examined within each complementary subgroup (for example, "Is the program effective among those with many barriers to employment and for those with few barriers to employment?"). Differential impacts can also be examined (for example, "Is the program more effective among those with many or few barriers to employment?"). Program evaluators typically examine one or more of the following hypotheses regarding the nature of impacts found in low-, moderate-, and high-risk subgroups.
a. Low risk
· Hypothesis 1. Creaming. Creaming refers to skimming the best clients off the top in an effort to produce better outcomes. However, it does not necessarily follow that the best clients will benefit most from intervention (for the reasons described in Section A above). Despite a relatively low need for services, it is possible that those with fewer and/or less severe risks may still benefit from program services and may therefore fare better than those in the low-risk control group. In this case, there may, in fact, be positive impacts among a low-risk subgroup.
· Hypothesis 2. Irrelevant. On the other hand, those with fewer or less severe risks may not need the program and, if they do participate, the low-risk program group may do no better than a low-risk control group. Thus, there may be few, weak, or no impacts among a low-risk subgroup.
· Hypothesis 3. Counterproductive. Those with fewer or less severe risks may do better on their own, or the program may have unintended negative impacts. In either case, a low-risk control group might actually do better than a low-risk program group, resulting in negative impacts among a low-risk subgroup.
b. High risk
· Hypothesis 1. Compensatory. This refers to the assumption that program services help compensate for client risks. High-risk individuals who need and are offered program services may do better than what a high-risk control group could do on its own. Thus, the most or strongest program impacts may be among high-risk subgroups.
· Hypothesis 2. Poor program match. If the program does not meet the needs of high-risk individuals, then a high-risk program group may do no better than what a high-risk control group could manage to do on its own. Thus, no impacts may be expected.
· Hypothesis 3. Overwhelmed. If risks are so numerous or severe that high-risk individuals cannot participate fully in program services designed to meet their needs, then a high-risk program group may do no better than what a high-risk control group could manage to do on its own. Thus, no impacts may be expected.
· Hypothesis 4. Exacerbate risks. If the program inadvertently exacerbates existing problems and, as a result, a high-risk program group actually does worse than what a high-risk control group would do on its own, then the program may have negative impacts among high-risk subgroups.
c. Moderate risk
· Hypothesis 1. Goldilocks hypothesis. This refers to program benefits for those with just the right amount of risk: not so much that individuals cannot participate or that program services cannot meet their many needs, but not so little that services are irrelevant. It may be that those with a moderate number or severity of risks have the most to gain from program participation. With the benefit of program services, a moderate-risk program group may do better than what a moderate-risk control group could do on its own. Thus, we might expect the most and/or the strongest program impacts among a moderate-risk subgroup.
· Hypothesis 2. Poor program match. As with high-risk individuals, if the program does not meet the needs of moderate-risk individuals, then such a program group may do no better than what a moderate-risk control group could do on its own. Thus, no impacts may be expected among moderate-risk individuals.
· Hypothesis 3. Exacerbate risks. As with high-risk individuals, a program may inadvertently exacerbate whatever problems may exist among moderate-risk individuals. If so, then such a program group might do worse than what a moderate-risk control group would do on its own, resulting in negative impacts among moderate-risk subgroups.
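The mechanics of the additive-risk approach, counting barriers and then assigning a subgroup, can be sketched as follows. The barrier names, the threshold of three barriers for the "many-barriers" subgroup, and the function names are assumptions made for illustration; as noted above, "few" and "many" may be operationalized in different ways.

```python
# Hypothetical baseline indicators, each coded True when the barrier is present
BARRIERS = [
    "limited_education",
    "limited_work_history",
    "health_problem",
    "no_child_care",
    "no_transportation",
]

def risk_count(person, barriers=BARRIERS):
    """Number of barriers present; a missing indicator is treated as absent."""
    return sum(1 for b in barriers if person.get(b, False))

def risk_subgroup(count, many_threshold=3):
    """Assign a subgroup label from a barrier count (threshold is an assumption)."""
    if count == 0:
        return "no-barriers"
    return "many-barriers" if count >= many_threshold else "few-barriers"

# Hypothetical individuals
parent_a = {"limited_education": True, "no_child_care": True}
parent_b = {
    "limited_education": True,
    "limited_work_history": True,
    "health_problem": True,
    "no_transportation": True,
}
parent_c = {}

subgroups = [risk_subgroup(risk_count(p)) for p in (parent_a, parent_b, parent_c)]
```

A simple count weights every barrier equally. If severity is also meant to matter, each indicator would need a weight, which is a further modeling choice not shown here.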
3. Interactive Approach
The interactive approach does not seek to create low-, medium-, or high-risk subgroups a priori; rather, it seeks to identify subgroups of individuals who share certain characteristics hypothesized to shape if and how they benefit from a program. For example, it may be that only those individuals who both need and want the program may benefit. Because the interactive approach examines the constellation of a number of factors, researchers can simultaneously consider not only risk factors or service needs, but also protective factors (individual or family strengths and supports) that may buffer the negative effect of the "deficits" reflected in risk factors and service needs. The interactive approach posits that the meaning of any single factor depends upon the presence and constellation of other factors. For example, in developing a typology of parenting in a sample of African American single mothers receiving welfare, McGroder (2000) employed cluster analysis and found four distinct patterns of parenting distinguished by mothers' levels of cognitive stimulation, nurturance, and aggravation in the parenting role. These included a pattern characterized by both high aggravation and high nurturance, a combination not typically thought to co-occur, but one that predicted differential outcomes for children.
Researchers adopting an interactive approach may have an idea of which variables are important to consider, but they may or may not have a priori hypotheses about which combination of variables might matter or matter most. These researchers may adopt a more data-driven approach, selecting variables hypothesized to matter (as above) but then allowing a clustering algorithm to identify naturally occurring subgroups of individuals who share similar profiles of scores across clustering variables but whose profiles differ from those of individuals in the other subgroups. For example, McGroder and colleagues (2003) found that employment outcomes among single, welfare-receiving mothers were as problematic in the subgroup characterized by health problems but no other barriers as in the subgroup characterized by multiple barriers to employment, including depressive symptoms and limited education, literacy, and employment experience.
Data-driven approaches to defining subgroups require identifying baseline characteristics hypothesized to influence the likelihood that fathers benefit from program services, then subjecting these variables to a clustering algorithm that combines individuals who are similar along each of these dimensions. This approach is especially fruitful when there is little or no theory or empirical evidence to suggest exactly how these variables should be categorized or combined into subgroups.
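As an illustration of the data-driven step, the sketch below standardizes a few baseline variables and groups individuals with a basic k-means procedure. This is only one possible clustering algorithm and is not necessarily the one used in the studies cited above; the variables, the data, the choice of two clusters, and the function names are all hypothetical.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Convert each column to z-scores so that no single variable dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sds)) for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, seed=0, max_iter=100):
    """Basic k-means: returns one cluster label (0..k-1) per row."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # start from k randomly chosen profiles
    labels = None
    for _ in range(max_iter):
        # Assign each profile to its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centroids[j])) for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean profile of its members
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels

# Hypothetical baseline profiles:
# (depressive-symptom score, years of schooling, months employed in the prior two years)
profiles = [
    (2, 12, 20), (3, 13, 22), (1, 12, 24),
    (18, 9, 2), (20, 10, 1), (19, 8, 3),
]
labels = kmeans(standardize(profiles), k=2)
```

With these deliberately well-separated profiles, the first three and last three individuals should fall into different clusters. In practice the number of clusters, the distance measure, and the stability of the solution all require justification, and the resulting subgroups should be interpreted by examining each cluster's profile of scores.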
Variables selected for creating subgroups in program evaluation research should be expected to differentiate who will benefit, or benefit most, from intervention. Single variables can be used to define subgroups, but a more differentiated view of individuals can be obtained if multiple variables are used. In addition, while demographic variables can prove fruitful in identifying subgroups of individuals who may especially benefit from intervention, there likely remain important differences among demographically similar individuals, in psychosocial characteristics (such as motivation and readiness to change) and in life circumstances (such as access to social support), that may affect the likelihood they benefit from intervention. Additive-risk approaches produce subgroups that differ in the quantity of risk, so these approaches are warranted if the number or severity of risks is hypothesized to shape program impacts. Under various scenarios, program impacts may be positive, negative, or neutral in each risk subgroup. Interactive approaches produce qualitatively different subgroups. Interactive approaches are warranted if a particular combination of risks (needs), or of risk and protective factors, is expected to matter in shaping if, how, and how much an individual benefits from a program, compared to what his or her similarly situated counterparts would do on their own. As with single-factor and additive-risk approaches, impacts can also be positive, negative, or neutral in subgroups derived using interactive approaches. In Section VIII, we discuss how these approaches may be applied to examining subgroup impacts of fatherhood programs.