Inside the Black Box of Interactions Between Programs and Participants: Re-conceptualizing Subgroups for Fatherhood Program Evaluation

APPENDIX G: ROUNDTABLE SUMMARY


A. Project Purpose

The purpose of the Black Box project was to identify innovative approaches for defining subgroups of men who may benefit from fatherhood programs, focusing on psychosocial factors grounded in behavior change theories. A key project objective was to disseminate study findings at a roundtable in which federal and non-federal experts in program evaluation, fatherhood research, and fatherhood programs and policy would discuss implications for DHHS evaluation projects focused on low-income fathers. The ultimate goal is to improve future evaluations of fatherhood programs.

B. Overview of Project Tasks

The research team at Mathematica Policy Research: (1) conducted a scan of cutting-edge approaches to defining subgroups; (2) reviewed the behavioral sciences literature to identify psychosocial determinants of behavior change; (3) reviewed the extent to which these theoretically-relevant psychosocial factors have been examined in the literature on low-income fathers; (4) compiled a list of measures used to tap these psychosocial constructs; (5) summarized findings in a written report; and (6) convened the roundtable of experts to discuss study findings and implications for fatherhood programs and program evaluations.

C. Participant List

The following federal and non-federal experts participated in the roundtable:



Non-Federal Experts

Héctor Cordero-Gúzman (Baruch College, City University of New York)

Robin Dion (Mathematica Policy Research)

Derek Griffith (Vanderbilt University)

Joe Jones (Center for Urban Families)

Charles Michalopoulos

Ron Mincy (School of Social Work, Columbia University)

David Pate (University of Wisconsin - Milwaukee)

Elaine Sorensen (Urban Institute)

Matthew Stagner (Chapin Hall, The University of Chicago)

Brett Theodos (Urban Institute)

Federal Experts

Vicki Turetsky

Earl Johnson

Robin McDonald

Frank Fuentes

John Tambornino

Nancye Campbell

Linda Mellgren

Seth Chamberlain

Kimberly Clum



D. Meeting Proceedings

Below we present a summary of the roundtable discussion, organized according to key themes that were explicitly addressed or that emerged during the roundtable.

1. Federal and Programmatic Landscape

A practitioner opened the discussion by asking how a change in administration might affect priorities relating to fatherhood projects, research, and the focus of the fatherhood initiative, and what the implications might be for practitioners. Three federal experts, all long-time federal employees with expertise in fatherhood research and policy, agreed that fatherhood is not a particularly politically sensitive topic. Fatherhood programs have been a priority since the Clinton Administration, and this persists today.

The practitioner indicated that the significance for the field is great, especially given that federal funding for fatherhood programs has declined during past administration changes and that philanthropy has shifted its priorities in recent years as its investments have declined, which has meant that work in fatherhood has regressed. He noted the importance of building a knowledge base starting with the most recent round of fatherhood grantees.

This practitioner also emphasized that if our fatherhood program models are to be sustainable, we need to demonstrate that a child is better off if public dollars are invested in that child's father and that father's well-being. In particular, we need to figure out a way for programs to create opportunities for father-child interactions and a way to measure father involvement; otherwise, the current business model won't work.

A federal expert agreed. She said that this is the problem with the child support model. It incorporates economic self-sufficiency and cash, but father involvement is not part of the model, because it's easier to measure income and employment. Funding tends to be tied to these more easily measurable outcomes.

2. Clarifying Goals of Black Box Study

Early on, an evaluator asked for clarification on the goals of the study: specifically, whether the goal was to inform future evaluations in terms of what researchers and evaluators should be measuring at baseline, or to inform practitioners on how to target their programs in more effective ways.

Black Box Project Director Sharon McGroder said that although the primary audience for the project and its findings is evaluators, the findings are also important for academic researchers and for practitioners. We hope that practitioners will find the information useful in designing and implementing more effective programs and in targeting their programs more efficiently.

3. Value of Examining Subgroup Impacts in Program Evaluation Research

Experts agreed that there is value to examining subgroup impacts in program evaluation research, despite procedural and technical challenges in doing so (see Section 6). In response to Sharon McGroder's presentation of subgroup findings from the Strengthening Families Evidence Review (SFER; Avellar et al., 2011; Avellar et al., 2012), a researcher indicated that it would be useful to know the number of interventions in SFER for which there were overall impacts; he suspected that there weren't many, which is why subgroups weren't examined in many of the studies included in SFER. He lamented the standard practice of examining subgroup impacts only if aggregate impacts have been found. An evaluator agreed that this is standard practice, adding that studies finding no aggregate impacts may be statistically underpowered, in which case there would be insufficient power to detect subgroup impacts.

Sharon explained that even if research finds no aggregate impacts, there may be evidence of impacts for a particular subgroup, such as isolated or offsetting subgroup impacts. She agreed that it was important to continue to build the research base on what constitutes an effective program (a point also made by a federal expert who was not able to attend the roundtable but who provided written feedback on the Synthesis Report). However, we need not wait for a strong evidence base on which programs are effective overall before examining subgroups, because a program may show signs of effectiveness for particular subgroups even though it does not demonstrate impacts for its overall population.
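The arithmetic behind isolated and offsetting subgroup impacts can be sketched in a few lines. This is an illustration only: the subgroup shares and effect sizes below are hypothetical numbers chosen for the example, not findings from SFER or any study discussed at the roundtable, and `aggregate_impact` is a helper name introduced here.

```python
def aggregate_impact(subgroups):
    """Pooled impact as the sample-share-weighted average of subgroup impacts.

    `subgroups` is a list of (sample_share, impact) pairs; shares must sum to 1.
    Impacts are in arbitrary (hypothetical) effect-size units.
    """
    total_share = sum(share for share, _ in subgroups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("sample shares must sum to 1")
    return sum(share * impact for share, impact in subgroups)


# Hypothetical offsetting impacts: equal-sized subgroups with opposite effects.
offsetting = [(0.5, 0.20), (0.5, -0.20)]

# Hypothetical isolated impact: one small subgroup benefits, the rest do not.
isolated = [(0.2, 0.30), (0.8, 0.00)]

print(aggregate_impact(offsetting))
print(aggregate_impact(isolated))
```

In the first case the pooled impact is zero even though both subgroups were affected. In the second, an impact confined to one-fifth of the sample shrinks to a pooled impact one-fifth as large, which an evaluation may not be powered to detect.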

A researcher picked up on this point. Reflecting on findings from Parents' Fair Share, this researcher felt that it was a "tragedy of dissemination" that the evidence of positive impacts for severely disadvantaged men was not emphasized and was instead overshadowed by the well-publicized storyline of no impacts for the sample as a whole. An evaluator emphasized the importance of developing theory-based subgroups in program evaluation research, rather than just looking at everything. He cautioned against looking too hard for subgroup impacts, noting that the "kitchen sink" approach of looking at dozens of subgroups would yield some results due to chance alone, and he questioned whether findings from such multiple subgroup analyses are to be believed. The researcher noted that in Parents' Fair Share, there was in fact a theoretical basis for examining whether the program was effective for the most vulnerable men.
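The evaluator's caution about chance findings can be made concrete with a short calculation. The sketch below assumes that the subgroup tests are independent, that the program has no true impact in any subgroup, and that each test uses a 5 percent significance level. These are simplifying assumptions made here, not details from the roundtable. The Bonferroni threshold is shown as one common adjustment, not as a method the experts proposed, and both function names are illustrative.

```python
def chance_of_spurious_finding(num_subgroups, alpha=0.05):
    """Probability of at least one false-positive subgroup result.

    Assumes independent tests and no true impact in any subgroup.
    """
    return 1.0 - (1.0 - alpha) ** num_subgroups


def bonferroni_threshold(num_subgroups, alpha=0.05):
    """Per-test significance level that keeps the overall error rate near alpha."""
    return alpha / num_subgroups


for k in (1, 5, 12, 24):
    print(
        f"{k:>2} subgroups: "
        f"P(at least one false positive) = {chance_of_spurious_finding(k):.2f}, "
        f"Bonferroni threshold = {bonferroni_threshold(k):.4f}"
    )
```

Under these assumptions, testing two dozen subgroups makes at least one spurious "impact" more likely than not, which is the statistical case for choosing a small number of subgroups on theoretical grounds.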

Sharon noted that moving the field toward the development of theory-based subgroups is a key goal of the Black Box project. She speculated that we would probably find more subgroup impacts if we knew which subgroups to look at, rather than just throwing everything at the wall and hoping something sticks. Sharon also pointed out that if, in addition to considering studies whose samples we slice into subgroups, we considered impact studies involving narrow subpopulations of low-income men (such as African-American noncustodial fathers), there may be more "subgroup" impacts than we realize.

A federal expert agreed with this point. He noted that the Black Box Synthesis Report did not present the racial composition for the various studies reviewed, but he assumed that most of the low-income studies involved men of color, particularly African-American men. If men of color did comprise the majority in these samples, then the main "subgroup" is defined by race/ethnicity. And if this is true, he noted, we would need a different theoretical framework and race analysis to identify psychosocial factors germane to men of color.

A policy researcher added that it is important to distinguish pre-existing theory from a theory that emerges out of a particular study and that, because the field is still nascent, there is not a lot of pre-existing theory on who would likely most benefit from services. A federal expert concurred that fatherhood program evaluation is still in the formative stages, and that qualitative work during a study can lead researchers to develop hypotheses about who is and is not responding to intervention and why.

This point about selecting subgroups based on what emerges during the evaluation led to a discussion of the utility of relying on baseline characteristics as subgrouping variables. A practitioner indicated that he and his staff used to think they could identify at program entry who would and would not be successful, but they were often proved wrong. For example, they thought that fathers who had a support network of friends and family would do well, but they found that it was the fathers who lacked support who tended to respond best to his program. He hypothesized that it was precisely because the program was the only source of support for these men that they valued participating and responded to services.

A researcher likewise questioned the predictive utility of baseline characteristics, giving the example of men who are not entirely ready to make changes when they enter a program but whose readiness is "turned on" by what they receive in the program, so that they are then ready to benefit from program services.

A federal expert provided another example of how what happens during the program can lead to different outcomes. She suggested that in large-scale demonstration studies, one can examine differences in outcomes according to differences in treatment quality, noting that in Parents' Fair Share, treatment quality showed up "in a subgroup way" and could be important in interpreting program impacts.

Sharon pointed out that analyses of subgroups defined by post-baseline variables (such as changes in readiness or social support, and varying levels of program quality) are not experimental, so the causal evidence they provide regarding subgroup program impacts is not strong. Nevertheless, the experts agreed that it was important to understand fathers' program experiences in an effort to identify "active ingredients" and the pathways through which a program may have affected key outcomes.

4. Informing Subgroups with Impact vs. Outcomes Studies

To set the stage for the discussion of subgroup impacts, Sharon's "Getting on the Same Page" presentation clarified the distinction between outcomes and impacts. She also referred to outcomes when presenting study methods, noting that a key premise of the Black Box study was that baseline levels on outcomes of interest may serve as good subgrouping variables. Another underlying premise was that psychosocial predictors of behavior change may also serve as good subgrouping variables, so a key study goal was to examine the extent to which theoretically-relevant "determinants" of behavior and behavior change predicted fatherhood-related outcomes among low-income men.

Pia Caronongan's subsequent presentation of findings from this review of the literature prompted a discussion about the validity of the underlying premise that predictors of fatherhood-related outcomes were good candidates for subgrouping variables. A policy researcher pointed out that while the report distinguished outcomes from impacts, the bulk of the report (the literature review) focused on outcomes research. An evaluator also appreciated that we were careful to distinguish outcomes from impacts but was likewise curious as to why the report focused on outcomes, and he suggested dropping this chapter from the report. He argued that whereas outcomes tell you whether an individual needs services, they may be irrelevant when trying to identify who benefits from intervention or how best to target programs. Instead, he suggested focusing the search for subgroups on the subgroups that have already been examined in program evaluation research, even from non-experimental evaluations and SFER studies rated as moderate or lower in quality. A researcher echoed this point, wondering whether we should have simply reviewed impact studies rather than outcome studies. Sharon noted that a review of outcomes studies allowed us to examine the kinds of psychosocial factors already addressed in the fatherhood literature, even if the findings themselves don't necessarily suggest whether these variables would make good subgroups.

Also addressing this point, Federal Project Officer Kimberly Clum said that, in some ways, we've gotten stuck in how we think about serving subgroups; we rely on demographic characteristics and don't consider the ways in which people's experiences are heterogeneous. For example, for the key outcome of economic self-sufficiency, research tends to focus on whether or not men had a job at the outset of the study. Kim argued that the literature is very narrow in how it views subgroups of men, ignoring such factors as larger social structures, inequality, norms, and peer group support. So the overarching goal of the Black Box project, Kim noted, was to look beyond the demographic characteristics typically examined in program evaluations and consider whether additional constructs reflecting psychosocial factors were worth pursuing. There was wide agreement that this was a useful direction for future research.

5. Informing Subgroups with Qualitative and Small-Scale Studies

Early in the roundtable, many experts expressed discomfort over what they viewed as a premature attempt to extrapolate Black Box study findings to experimental evaluations of fatherhood programs. A policy researcher noted that it is a longer-term process because there remains much to learn about fathers in fatherhood programs and what leads to changed behavior. This policy researcher characterized the dilemma as trying to fast-forward to stage 8 when our knowledge base is only at stage 2. He suggested that stages 3 through 7 involve a series of qualitative and small-scale studies exploring theoretically derived psychosocial factors such as those identified in the Black Box study. He reiterated his earlier point that qualitative studies can help generate theories, which should occur prior to testing theories in large-scale impact studies.

A researcher concurred that we need smaller-scale studies not only to address the question currently before the group (i.e., what non-demographic variables may be important for defining baseline subgroups), but also to improve the science of fatherhood program research in general. He noted that for 30 years, scaling up of human service interventions occurred without attention to whether the programs were grounded in theories of human behavior. He likewise advocated for smaller-scale and qualitative studies of fatherhood programs to help develop the evidence base before ramping them up to scale.

This researcher then gave an example of what he learned from a process study of how a fatherhood program helps fathers navigate the child support system. His research team found that fathers did not make immediate use of program information on how to request a modification to child support orders but, rather, waited until they had a job. Through in-depth discussions with fathers, researchers learned that fathers thought the judge would be more responsive to a request for modification if he saw the father taking steps to improve his situation. This finding led the research team to propose altering the sequence of program activities, providing child support information at a time when men were most likely to find the information useful.

A second researcher likewise highlighted the value of process studies for addressing the big question not posed by the Black Box study: Why does this program work? He emphasized the importance of basic needs assessment information and our current lack of understanding of what fathers want in a program and what they say they need.

Picking up on the theme of identifying fathers' needs, a third researcher advocated for the use of community-based participatory research methods (in which study "subjects" are collaborative partners in the research endeavor). He provided an example from his research on men who present to a Workforce Investment Act (WIA) office, many of whom experienced trauma in childhood. Convinced they were "missing something," he and his research team are now exploring what race, gender, and masculinity mean to these men and how they relate to the men's ability to obtain and maintain employment.

Another researcher suggested that small-scale studies allow you to explore how best to measure novel topics prior to investing in large-scale evaluations. For example, if we suspect that readiness to change is affected by the program and is a proximal outcome that may need to change before more distal outcomes can be expected to change, then a small-scale study would allow us to examine various ways to measure readiness to change and explore the various pathways hypothesized to matter.

A federal expert echoed this call for more qualitative and smaller-scale studies. He reflected that only after William J. Wilson received funding to explore the "marriageability of men" (a concept Wilson hypothesized as critical to the declining marriage rates among African-Americans) was he able to study this construct empirically. His subsequent book, "When Work Disappears," helped this field progress. This federal expert further shared his concern that not all federally funded grant programs may be ready and/or appropriate for random assignment evaluation; another federal expert agreed that fatherhood programs are in the formative stage and that they may be subject to the imperative to conduct random assignment evaluation too soon in their development.

6. Feasibility of Examining Subgroup Impacts in Program Evaluation Research

Roundtable experts were quick to note the technical difficulties in examining subgroup impacts. A researcher recognized that the exercise we were asking roundtable participants to engage in (defining subgroups using psychosocial variables) needed to include a discussion of sample size and power. Relatedly, one needs to know what effect size might be expected for various subgroups so that the study design can be powered to detect impacts in those hypothesized subgroups. Sharon McGroder reflected that identifying minimum detectable effects (MDEs) requires doing the research and reflecting after the fact on whether the study may have been underpowered, and using this information to generate hypotheses and MDEs for the next study.
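The link between subgroup sample size and minimum detectable effects can be sketched with a standard textbook approximation. The example assumes individual-level random assignment with equal-sized program and control groups, a two-sided 5 percent test, 80 percent power, an outcome measured in standard-deviation units, and no covariate adjustment. The sample sizes are hypothetical, and the function name is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_total, treat_share=0.5, alpha=0.05, power=0.80):
    """Approximate MDE in standard-deviation units for a two-arm randomized design.

    Assumes individual random assignment and no covariate adjustment.
    """
    standard_normal = NormalDist()
    # Sum of the critical value for the test and the quantile for the desired power.
    multiplier = standard_normal.inv_cdf(1 - alpha / 2) + standard_normal.inv_cdf(power)
    # Standard error of the impact estimate when the outcome has unit variance.
    standard_error = sqrt(1.0 / (treat_share * (1 - treat_share) * n_total))
    return multiplier * standard_error


full_sample = 1000  # hypothetical evaluation sample
for share in (1.0, 0.5, 0.25):
    n = int(full_sample * share)
    print(f"n = {n:>4}: MDE = {minimum_detectable_effect(n):.2f} SD")
```

Because the MDE scales with the inverse square root of the sample size, a subgroup one-quarter the size of the full sample has an MDE twice as large under these assumptions. This is why a study powered only for aggregate impacts may be unable to detect subgroup impacts.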

An evaluator concurred that sample sizes in subgroups and questions surrounding measurement (for example, identification or development of valid and reliable scales, and questions about the timing of measurement for certain constructs) are key challenges in subgroup impact research. This evaluator reflected that measurement challenges may be one reason for the focus on demographic rather than psychosocial variables. She also voiced the concern that an interactive subgrouping approach involving multiple variables would yield small cell sizes, though Sharon noted that profile analyses (such as latent class analysis or cluster analysis) can address this problem. Despite these challenges, this evaluator agreed that there is value to thinking about how best to refine subgroups to make them more meaningful so we can learn more about who tends to benefit from programs.
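The small-cell-size concern follows from simple counting: fully crossing several subgrouping variables multiplies the number of cells. The sketch below uses hypothetical variables and a hypothetical sample size, and it assumes an even spread of participants across cells, which real data rarely show. The function names are illustrative.

```python
from math import prod


def number_of_cells(levels_per_variable):
    """Cells created by fully crossing the subgrouping variables."""
    return prod(levels_per_variable)


def average_cell_size(n_total, levels_per_variable):
    """Average participants per cell if the sample were spread evenly."""
    return n_total / number_of_cells(levels_per_variable)


# Hypothetical: three psychosocial measures, each split into low/medium/high.
levels = [3, 3, 3]
sample_size = 1000  # hypothetical evaluation sample

print(number_of_cells(levels))
print(round(average_cell_size(sample_size, levels), 1))
```

Profile approaches such as latent class or cluster analysis sidestep this multiplication by grouping participants into a handful of profiles rather than every possible combination of levels.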

Another evaluator pointed out that the best way to test scales and metrics and the best way to do subgrouping research is to actually do the research and learn from the findings.

In addition to the technical challenges in conducting subgroup impact analyses, experts identified tensions between the goals of evaluators and the goals of providers that have implications for subgroup impact analyses. A policy researcher mentioned the tension between the pressures faced by programs to demonstrate results and evaluators pushing for casting a wide enough net to yield large enough samples. He noted that practitioners tend to know who is best suited for their program, but when evaluators encourage them to cast a wider net to yield larger study samples, this can dilute impacts, and that maybe evaluators need to listen more to practitioners on this matter. An evaluator echoed this tension, noting that for the Building Strong Families (BSF) evaluation, programs felt pressure to meet sample size requirements and ended up enrolling people they suspected were unlikely to participate. So while larger sample sizes are needed to detect subgroup impacts, if many of the people being recruited aren't motivated or "ready to change," it is unclear whether the impact findings (or lack thereof) are meaningful.

A federal expert noted another tension. When federally-funded grantees propose serving narrow populations but then can't engage enough of these kinds of participants, they are not meeting funding requirements and may be required to broaden who they serve to reach service targets. In this case, it is not evaluators but funders who are driving the "wider net." This federal expert noted that this tension stems from the authorizing legislation, which defines the services to be provided, requiring providers to seek out the kinds of participants they think can benefit.

7. Cultural and Racial Frameworks

There was a lengthy discussion on the need to develop and apply a theoretical framework germane to the lives of men of color. A federal expert argued for a better and more nuanced representation of race in subgrouping research. He pointed to the importance of such factors as religiosity, stress, and trauma, which are often closely tied to race and ethnicity. A policy researcher agreed with the need for a theoretical framework that reflected the culture of men being studied, but he acknowledged that Mathematica researchers were hamstrung in doing so for the Black Box study, as they were stuck with the theoretical frameworks used by researchers whose studies they reviewed.

Sharon McGroder echoed the need for culturally relevant theoretical frameworks in studying fathers and fatherhood programs but wondered whether the psychosocial constructs and/or broader categories presented in the Black Box report (such as social norms, and identity as a man and as a father) were worth exploring among men of color as possible factors that could shape their response to a program. She explained that ASPE's vision was to draw from other disciplines, such as public health and social marketing, to see if there were relevant concepts we should consider in defining subgroups of men in fatherhood programs, and she asked whether there was value in drawing from these fields.

A federal expert agreed there was value. But he noted that while many of the factors, such as relationship with the child's mother and stressors, may be common, they will play out differently and will be influenced by a whole host of different factors, such as stress related to living in a community that doesn't want you there. One issue not addressed, this federal expert noted, is culture and language. We need to think about these types of factors, he argued, to add texture in terms of the realities of the experiences of low-income men. It is these factors, this federal expert argued, that will determine how ready a father is and what his motivation is to participate in a fatherhood program. These cultural issues also influence what service providers offer, who offers it, and the curricula being used.

Another federal expert noted that this requires researchers to get out of their comfort zone, and he wondered whether researchers examined religiosity and whether that plays a role in fathers' being successful. Sharon indicated that a few studies reviewed for the Black Box study did examine religiosity.

A researcher emphasized the importance of using a critical racial lens and provided an example from his work with a WIA agency in Milwaukee. He described his research, which focuses on the effect of policing in the areas where the target population works and lives, and how the specific community experience of racial and ethnic minorities (such as lack of civil liberties, racial profiling, and legal barriers) may affect program response or participation.

Another researcher provided an example from his research on incarcerated fathers in a parenting program. He found that some aspects of the curriculum were more salient than others, while other parts of the curriculum were disregarded altogether. For example, a lot of men were not willing to participate in the part of the curriculum that discouraged fighting because in their communities, they thought it would make their children less safe. This researcher noted that because this was a community-based participatory research study, this issue was identified early on and was used to tailor program messages to the needs of the fathers and their communities.

A practitioner concurred that violence and stress are huge issues, particularly for minority men; they are looking over their shoulders 24/7. He added that practitioners don't have the capacity to assess some of the specific mental health issues these men face.

A federal expert agreed, adding that men of particular racial or ethnic backgrounds often experience isolation and anxiety, even about small things. For example, taking his kids to school may be cause for stress and anxiety for a Mexican-American father in Arizona; even if he has documentation, he may be worried about being stopped and interrogated. These are very specific kinds of stress and anxiety related to his experience as a man of color.

Sharon McGroder reflected that it sounded like we need to better understand different aspects of a "community," such as safety concerns, legal issues, policing, social support, community networks, and community institutions or infrastructure. These could then be used to define a more nuanced theoretical framework from which specific subgroups could be derived.

A researcher pointed out that sometimes demographic characteristics, like race and ethnicity, are really proxies for other things such as environmental stressors, and he recommended a useful paper on this topic.[10] He argued that if we think more about what these demographic variables mean, we might be able to do a better job of defining, measuring, and using them in research.

8. Value of Examining Psychosocial Factors

After the fruitful discussion about the need to develop and adopt conceptual frameworks reflective of the lives of men of color, and the woeful lack of such a framework in the extant literature on low-income men, the conversation shifted to whether it was nevertheless valuable to consider psychosocial factors as possible subgrouping variables. An evaluator remarked that, not sitting in the fatherhood field, he thought the Black Box study was a more important accomplishment than what he was hearing from other experts. He noted that approaches to subgroup analyses are not well developed in the social sciences, so this report and expert roundtable are important for pushing the thinking forward. He acknowledged that progress would be "haphazard and messy" and that it may take years of "letting a thousand flowers bloom" to develop the theoretical frameworks and methodologies to define better subgroups, but at least he personally was provoked to think more about psychosocial factors in defining subgroups. The other experts agreed with this assessment.

Many roundtable experts indicated that readiness to change was an important factor worth considering. A researcher gave an example of a study of placement of children in foster care. He noted that when the judge was about to take a child from the home, family preservation programs were much more effective because that's when it really mattered to the parents. Only then did the parents finally take in everything they were being told about keeping their child safe; otherwise, they didn't engage in family preservation activities. He reflected that upon reaching this threshold, parents were ready to change because only then was the threat of losing their children real and imminent and, thus, what they had been learning about child safety suddenly became salient.

A policy researcher noted that he was the federal project officer on this study, and he recommended taking a look at the research by Julia Littell from Bryn Mawr who has studied readiness to change issues, as well as John Schuerman.[11]

A researcher reiterated the example from his study of how a fatherhood program he evaluated helps fathers navigate the child support system, in which many fathers requested an order modification only after they secured employment, believing that only then would the judge be responsive to the request. This suggested to this researcher and his research team that readiness to make use of information and program services is a key precursor to behavior change.

Citing self-determination theory, another researcher noted the importance of motivation in behavior change, especially the role of intrinsic motivation (motivation coming from within the individual) over extrinsic motivation (external rewards and punishments designed to induce behavior). He also noted that motivation and readiness to change are a "moving target," and that they are important not only for individuals to begin behavior change but also to sustain it.

A practitioner agreed that readiness to change can be a moving target, reflecting that knowing a father's readiness at baseline may not help you understand who may especially benefit from intervention. He said that he and his staff would sometimes try to predict who would complete the program but were often surprised by who ended up as a success story. Many men who entered the program at highest risk were precisely the men who most benefited from the program because of the change process that occurred in the program. The practitioner illustrated this point by noting that in his work with low-income African American men, many of the young men have missed the maturity process that results from having a father or father figure orient them to boyhood and manhood, so they lack order and discipline. When they walk into his program, they are not necessarily looking for structure but, the practitioner noted, if you create a space that allows them to get exposed to structure, many of them find it attractive, which then helps them evolve and increase their readiness for services. That's why he doesn't recruit only men who are ready from the beginning; he reaches out to the most disaffected men even if it means losing some of them, because many of them don't know what they want or need until they get exposure to it.

The conversation then moved beyond a discussion of the value of using psychosocial factors in creating baseline subgroups to a discussion of their potential value in targeting and designing interventions.

For example, an evaluator suggested that identifying key predictors of likely take-up can be particularly helpful in cases when services are oversubscribed and strategic targeting would be helpful. This evaluator noted that program folks currently are not able to base many of their decisions about targeting on research, but they are hungry for more to guide them in their practice.

Regarding the design of interventions, a federal expert picked up on a researcher's earlier point that readiness to change can be affected by a program, and she suggested that if a program's goal is to change behavior, and readiness to change is necessary to change behavior, then maybe we need interventions designed to change readiness to change. On this point, another federal expert referenced research from the fields of criminal justice and substance abuse, describing a re-analysis of data from the Serious and Violent Offender Re-entry Initiative (SVORI) demonstration, which targeted readiness to change itself as a proximal outcome.[12] She indicated that there is now increased focus on programs that first affect readiness to change before actually delivering the intervention. This federal expert surmised that these interventions may sometimes be more successful than skills-based interventions and that once a participant is ready to change, he may be able to take advantage of services already available, such as employment services (with additional services for those who need greater support). She wondered whether this approach would be more effective than providing employment services to someone who is not ready to change, and suggested that enrolling individuals who are not ready to change may be one reason for the lack of program impacts in many studies.

The conversation then broadened even further, moving beyond the discussion of psychosocial factors. A researcher emphasized that if we are to really understand which men benefit from programs and why, it is important to consider not only individual-level psychosocial factors but also structural and systemic factors, which have been missing from the conversation and the Black Box report. He noted that a series of complexities that are systemic in a given community, city, or state affect an individual's ability to be employed, pay child support, and be involved with his children. Approaching these issues the way legal scholars look at them, including a race analysis, would be informative, this researcher argued, especially for black men, who are under a different microscope.

This researcher illustrated his point by describing his research with men participating in home visiting programs. In surveys, program staff indicated their interest in helping these men succeed, but in focus groups, these same staff displayed anger and distrust toward the men they served and did not validate their role as fathers. This researcher hypothesized that these staff may have issues with their own fathers or negative personal experiences with men, which affect their attitudes toward the men they serve and, consequently, their decisions about service delivery. He emphasized the power that these "street bureaucrats" have in these men's lives as brokers and gatekeepers of important services, and that these systemic realities must also be taken into account alongside psychosocial factors.

9. Relevant Psychosocial Constructs

Roundtable experts identified a number of psychosocial phenomena that are relevant to the lives of low-income men and that should be included in future research, whether as subgrouping variables, proximal outcomes, or simply as descriptors of these men's lives. Below we summarize the factors identified as most salient, beginning with psychosocial factors identified in the Black Box study, then highlighting additional factors not specifically addressed in the Black Box report.

Psychosocial factors discussed as important that were addressed in the Black Box study included:

·      Readiness to change. This psychosocial factor was the most often discussed during the roundtable. One of the researchers summarized the three ways that experts had discussed readiness to change:

            ·      critical juncture, or forcing event (such as a meeting with a judge, or an arrest)

            ·      window of opportunity (such as the moment of a child's birth)

            ·      threshold (the point at which a father is prepared to make use of information or services because of the salience of the outcome expected)

·      Motivation. Experts also discussed the related concepts of intrinsic motivation (such as wanting a better relationship with one's child) and extrinsic motivation (such as the threat of having children removed from care, or having a request for a child support modification granted).

·      Men's past experiences. Experts agreed that it is important to understand men's experiences with their own fathers. For example, a practitioner emphasized the importance of having a father or father figure help orient a young boy to boyhood and manhood. A researcher and two federal experts each pointed to the important role that past trauma can play in men's lives.

·      Depression in men. A researcher noted that symptoms of depression in men are different from symptoms in women and can include anger and outbursts. This researcher noted that current measures of depression were designed to detect symptoms more commonly found in women than in men, and that researchers need to focus on and improve measurement of depression in men.

·      Stressors, stress, and anxiety. Many experts agreed that stressors and feelings of stress and anxiety are constant companions in these men's lives. A federal expert indicated that many Latinos experience stress from living in communities that don't want them there, whether or not they are undocumented immigrants. A practitioner concurred that many men of color live under suspicion and face harassment and constant surveillance; they also experience anxiety and fear for their safety from living in dangerous neighborhoods.

·      Religiosity. As part of his call for race analysis and a more nuanced representation of race in subgrouping research, a federal expert argued that religiosity is important to many men of color and should be more systematically examined in research on low-income fathers and men of color.

·      Peer norms. Experts agreed that peer and social norms help shape fathers' behavior.

·      Father's access to his children. Experts agreed that understanding the relationship dynamics between the father and the mother(s) of his child(ren) is critical for helping the father become more involved in his child's life, and that programs can help foster this access. For example, a practitioner described his collaboration with a not-for-profit, "Art with the Heart," in which fathers engage with their children and their child's mother in community art projects.

Experts also identified the need to consider additional psychosocial factors that were not addressed in the Black Box study but are particularly relevant to men of color, including:

·      Structure and discipline. A practitioner indicated that this is what many of the men who come to his program need, even if they don't realize it. This is an example of a proximal outcome targeted by fatherhood programs that might also be useful to explore as a mediating pathway through which fathers' outcomes are affected.

·      Isolation. Related to feelings of stress and depression, many experts mentioned that men of color targeted by fatherhood programs often experience feelings of isolation and loneliness.

Finally, in addition to considering psychosocial characteristics of men and their interpersonal relationships, many experts emphasized the importance of considering structural and systemic factors reflecting the conditions and contextual realities these fathers face, including:

·      Culture. Experts agreed that, in addition to peer and social norms, broader cultural norms are important, for they help define what it means to be a good father.

·      Structural opportunities for father involvement. A practitioner argued that noncustodial fathers need structural opportunities to engage with their children, and that the sustainability of the fatherhood program "business model" rests in the ability to measure and demonstrate the importance of such involvement.

·      Attitudes and behaviors of "street bureaucrats." A researcher highlighted the power that street bureaucrats have in directing fathers to services, and that their negative attitudes toward the men they serve can adversely affect the fathers' access to needed services.

In conclusion, a federal expert pointed out that research begins with assumptions and theories and that these assumptions and theories must be grounded in the realities of men's lives. This federal expert and others emphasized the importance of understanding whether certain motivations, conditions, contexts, experiences, and norms may be unique to, or play out differently for, various racial, ethnic, and cultural sub-populations. Only research studies that are grounded in these realities can provide dependable results that can effectively inform programming and lead to real results for men.

10. Implications for Programs

Throughout the roundtable, discussions of psychosocial factors and subgrouping often touched on issues relating to program design, delivery, and intake.

Regarding program design, experts pointed out the importance of tailoring program messages to the needs and goals of participants. For example, a researcher highlighted findings from the Fathers and Sons project, a prevention program targeting early sexual behavior, violence, and substance abuse among pre-adolescents. He noted that many men disagreed with the anti-violence message because their communities involve dangers that require that their children know how to defend themselves. Similarly, in his study of the Harlem Children's Zone, this researcher noted that the "no spanking" message was not well received, because men said that they needed their children to listen to them under all circumstances; men argued that a swat on the behind was preferable to being hit by a car, for example.

Drawing from his research in public health, a researcher described a program seeking to promote men's healthier eating through increased intake of fruits and vegetables. He developed a taxonomy of ethnic identity, then tailored the healthy eating messages to the 16 distinct profiles based on their sources of motivation for eating healthier.

Another researcher said that he liked that the Black Box project seeks to move beyond demographic characteristics in learning who responds best to intervention, but he pointed out that this requires moving beyond the individual-level framework of psychosocial characteristics to look at what programs actually do. This includes not only the services provided but also how participants are treated and the ambiance of the setting.

A practitioner reflected that his program sought to provide what men lacked in their lives: social support, structure, and discipline. He suspected that the men who benefited most from his program were those who lacked these at program entry. In particular, he noted that providing support groups for men in a comfortable setting really allows men to open up about their stresses and concerns; they begin unpacking deeper issues, but unfortunately, programs often don't have the resources to offer counseling to address these deeper issues.

A researcher thought that program providers need to be mindful about the timing and sequencing of service delivery. Programs might be most successful, he posited, if information and services are provided at a time when participants are best able to make use of them-whether due to a forcing event (like a custody hearing) or only after their readiness is "turned on." A federal expert suggested that programs may actually want to target readiness to change as a proximal outcome prior to providing services that individuals may not yet be ready for.

Experts also had ideas about how knowledge of fathers' psychosocial factors could help with targeting and triaging services. A researcher suggested that not every program effort has to be high reach and highly intensive. If we could figure out the "active ingredients" and how best to tailor services to the needs of particular individuals, this researcher said, then we could deliver the right intervention to the right individuals and thus use resources more cost-effectively.

An evaluator also made this point, drawing from his research on homelessness and public housing. He noted that services need not be all or nothing and, in fact, when services are oversubscribed (like the tens of thousands in Chicago eligible for limited public housing assistance), the key is figuring out who might benefit from intensive and comprehensive services and who needs only light-touch services. This evaluator mentioned that there is plenty of room to learn about predictive tools that might help triage services.

A federal expert noted that child support researchers are using predictive analytics to sort and segment caseloads to better target and sequence services. She thought that psychosocial variables have a place in this effort but that the variables would need to be easily measured by an intake worker.

11. How Findings Can Inform Fatherhood Evaluations

At the outset of the Black Box study, the goal of the roundtable was to identify strategies for incorporating project findings into evaluations of fatherhood programs and initiatives. However, our study revealed that research on the psychosocial determinants of behavior change among low-income fathers is in its infancy, and roundtable participants agreed that there is not enough evidence to make concrete recommendations regarding baseline measures of psychosocial factors that should be included in future fatherhood evaluations.

Despite the lack of empirical evidence, experts agreed that there was sufficient theoretical basis to supplement the use of demographic characteristics with an exploration of psychosocial factors that may shape a father's ability to benefit from a fatherhood program. Such exploratory research would help move the field forward by helping to build theory and generate hypotheses to test empirically.

In addition to considering psychosocial factors in creating baseline subgroups, experts believed that psychosocial factors implicitly or explicitly targeted by fatherhood programs (such as readiness to change, motivation to change, and social support) should be routinely measured and modeled as proximal outcomes. These proximal outcomes reflect the pathways through which the outcomes of ultimate interest (father involvement, employment, child support, partner relationships, and father well-being) may be achieved.

12. Challenges in Using Psychosocial Factors to Define Baseline Subgroups

Experts discussed three major challenges to using psychosocial factors to define subgroups for use in experimental program analyses: sample size requirements, measurement, and tensions between the needs of evaluators and practitioners.

·      Sample size. Many experts pointed to the common challenge of adequately powering impact and subgroup impact analyses, regardless of the subgrouping variables used, given the difficulty in obtaining sample sizes considered large enough to detect expected effect sizes. At the same time, however, experts agreed that psychosocial factors may well play a role in shaping which men benefit from intervention. Experts highlighted the need to ground the definition of subgroups in theories of behavior change and to generate theory-based subgroup hypotheses in order to do a better job of identifying subgroups for whom a program may be more or less effective. Though sample size would still matter, less "noise" from more theoretically-grounded subgroups may help to mitigate the sample size (power) issue.

·      Measurement. Experts reflected that one reason demographic variables were often used to create subgroups is that their operationalization is straightforward, even if, as a researcher noted, it is not entirely clear what demographic factors like race and ethnicity mean or what they are proxies for. Extensive research is needed to develop and validate measures of subjective constructs reflecting psychosocial factors. Experts noted that we are far from being able to recommend good measures of key psychosocial constructs because theories of behavior are rarely used to identify relevant psychosocial constructs, nor is there much empirical evidence to guide the selection of key constructs. A researcher commented that the Black Box literature review was designed to address these limitations by "shaking the trees" to see what psychosocial factors could be and have been examined pertaining to low-income fathers. Mathematica's Pia Caronongan noted there is also little commonality in how the same psychosocial construct is measured.

Experts recommended investment in developing and testing psychosocial measures; an evaluator suggested "letting a thousand flowers bloom," and a researcher suggested a more systematic process by providing researchers ready access to measures through a centralized system of validating and measuring such variables.

·      Tensions between evaluators, practitioners, and funding authorization. Experts described tensions between the technical requirements of a well-designed impact study and the realities of programs and the populations they serve. For example, evaluators strive for well-powered studies and thus large sample sizes, but this might require broadening recruitment beyond those expected to benefit from services, which would only serve to dilute program effectiveness. In some cases, increasing sample sizes may even require expanding program eligibility beyond what is allowed legislatively. A policy researcher suggested respecting practitioners' instincts regarding who they believe are most likely to benefit from services and helping practitioners increase recruitment of participants for whom their program is a good match.

The need for larger samples also works against evaluating smaller programs and programs that offer an array of services that not every participant is expected to need or receive. A researcher advocated "throwing a wider net closer to the ground" by evaluating smaller programs with a strong program theory, even if these are non-experimental evaluations.

A practitioner reflected that what he's learned about program evaluation he learned from experts conducting rigorous research-many of whom were in the room-and that most practitioners are not so fortunate. He emphasized the need for a bridge between research and practice, whereby researchers educate practitioners on the value of research by communicating study findings in such a way that practitioners learn how to design and better manage their programs.
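The sample size challenge described above can be made concrete with a back-of-the-envelope power calculation. The sketch below is illustrative only and was not part of the roundtable discussion. It assumes a simple two-arm random assignment design, a standardized continuous outcome, no covariate adjustment, and hypothetical sample sizes (1,000 fathers overall, 250 in a subgroup). The function name `minimum_detectable_effect` and its parameters are assumptions introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_total, alpha=0.05, power=0.80, treat_share=0.5):
    """Smallest true impact (in standard deviation units) that a two-arm
    experiment with n_total participants would detect with the given power,
    using a two-sided test at significance level alpha.

    Assumes a standardized outcome (variance 1) and no covariates.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    # Standard error of the treatment-control difference in means.
    std_error = sqrt(1 / (treat_share * (1 - treat_share) * n_total))
    return (z_alpha + z_power) * std_error


# Hypothetical sample sizes, chosen only to illustrate the pattern.
full_sample = 1000
subgroup = 250

print(f"Full sample (n={full_sample}): MDE = "
      f"{minimum_detectable_effect(full_sample):.3f} SD")
print(f"Subgroup    (n={subgroup}): MDE = "
      f"{minimum_detectable_effect(subgroup):.3f} SD")
```

Under these assumptions, a subgroup one-quarter the size of the full sample has a minimum detectable effect twice as large (roughly 0.35 versus 0.18 standard deviations), which illustrates why subgroup impact analyses are often underpowered even when the overall study is not.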

13. Directions for Future Research

At the close of the roundtable, we posed the question, "What more do we need to know?" Below is a summary of ideas and suggestions that experts proposed.

·      Develop a culturally-relevant theoretical framework. Roundtable experts agreed that we need a better understanding of the lives of low-income men of color. This would include qualitative studies and small-scale quantitative studies designed to help build a theoretical framework for identifying key factors-psychosocial factors, but also structural and systemic factors-that could influence the extent to which men of color participate in and benefit from fatherhood programs.

·      Consider participatory research. A federal expert cautioned researchers not to rely on preconceived notions or make assumptions based on personal experience when designing studies involving disadvantaged men and men of color; otherwise, study design and research questions can be "off the mark." Researchers extolled the value of participatory research for ensuring topics of study are relevant to and resonate with study "subjects."

·      Address systemic bias against exploring new measures. A researcher lamented the "vicious cycle" in the social sciences of measuring and reporting on the same five demographic characteristics because these are the easiest to measure-even if their theoretical underpinnings are not clear. He called for research that explicitly examines and tests various ways of measuring a host of psychosocial factors in an effort to test empirically the various fatherhood theories as they are developed. This researcher advocated for a centralized system of measuring and validating variables to more efficiently move the field forward on measures development and minimize reinventing the wheel. Another researcher concurred that even when psychosocial variables have been examined-such as depression-they are not contextualized to reflect the lives of African-American and Latino men.

·      Better understand program processes and how men respond. Experts uniformly agreed that we need more information on the programs themselves, how they are experienced by the men they serve, and why they do (or don't) work. A researcher advocated research identifying the "active ingredients" in fatherhood programs and process studies exploring why programs appear to work for some men and not others. He also emphasized the importance of information from basic needs assessments to better understand what fathers want in a program and what they feel they need. Another researcher concurred that it would be useful to know when key behavior change processes are "turned on" and how programs can foster this process. A federal expert agreed that it would be useful to know the circumstances under which readiness to change and other hypothesized determinants of behavior change are activated, whether interventions could be developed to explicitly target readiness to change as a critical proximal outcome, and whether individuals exposed to such interventions would then naturally seek out available services on their own.

Regarding the exploration of subgroups, a researcher pointed out the importance of testing various hypotheses (for example, do programs work better for men with more or fewer sources of social support?) through small-scale quantitative studies before going to scale. A federal expert concurred. He said that large-scale impact studies are very expensive and are designed to answer a single question (does the program work?), but the fatherhood field can benefit from asking many smaller questions first.

·      Examine proximal outcomes. Researchers argued that outcomes studies are important for exploring potential pathways through which programs may affect fathers and, ultimately, their children. One of these researchers proposed a research agenda by suggesting that we start by drawing from theories of human behavior to develop a theory of behavior change among low-income men of color, then design and test research-based programs on a small scale. We could then explore whether these programs appear to affect proximal outcomes, such as readiness to change and fathers' willingness to engage with their children, and whether any such changes appear linked to longer term outcomes, such as father involvement and, eventually, child well-being. A couple of researchers echoed the call for research on measures, especially to get to the point where researchers can recommend one or two key things that fatherhood programs should measure at baseline (as predictors of outcomes or as possible subgroups) and as proximal outcomes hypothesized as necessary precursors to changes in outcomes of ultimate interest. Only then can theoretically-relevant subgroups be tested, this researcher argued, and only then should program evaluation research go to scale.

·      Explore various methodological approaches to creating subgroups. Experts also suggested that more research is needed on various methodological and statistical approaches to creating subgroups and their relative utility. A federal expert highlighted the increased use of predictive analytics in segmenting child support caseloads and wondered whether this work could help inform the targeting of services at the program level. A researcher thought it would be interesting to see results from a cluster analysis of key psychosocial factors to see whether naturally occurring subgroups exist, taking into account a number of theoretically-relevant variables. A federal expert who was not able to attend the roundtable but who sent written comments on the Black Box Synthesis Report suggested researchers consider methods such as latent class analysis and other approaches presented at the 2009 Interagency Meeting on Subgroup Analysis. Similarly, a policy researcher suggested researchers examine a recent paper by Laura Peck for an approach to examining impacts for post-baseline or "endogenous" subgroups that retains the experimental design.[13]

·      Bridge research and practice. A practitioner emphasized the need for researchers to communicate their findings in a way that helps practitioners design, implement, manage, and improve their programs.

·      Fund research on culturally-diverse populations. In responding to a question about why research on fathers, especially disadvantaged fathers of color, is so limited, a federal expert answered that money has not been behind researchers who study diverse populations because there has been no recognition or awareness of the changing demographics in recent years or of the implications of these shifting patterns for the next 10 to 50 years. He added that even small studies could have a tremendous impact on the field. An evaluator pointed out that HHS has the power to shape research by including language encouraging subgrouping research in its RFPs and that the best way to test scales and metrics and identify theory-based subgroups is to actually do the research.

E. Summary and Conclusions

Roundtable participants agreed that there is value to considering fathers' psychosocial characteristics that may predispose them to benefit from fatherhood programs but that this research is still very nascent. Experts called for qualitative research to explore the lives of low-income men of color and their experiences in fatherhood programs, small-scale quantitative research to explore the links between psychosocial characteristics and fatherhood outcomes, and program evaluation research that employs a variety of innovative strategies for creating subgroups using both psychosocial and structural/systemic factors.

Research on what works in fatherhood programming is also still in the formative stages. Experts strongly believed that fatherhood research has a way to go before we are in a position to recommend psychosocial variables (and quality measures of those variables) for use as baseline subgroups in program impact research. Before we can test the predictive utility of psychosocial subgroups in large-scale fatherhood evaluations, smaller-scale and qualitative studies-grounded in theoretical frameworks reflecting the lives of men of color-are needed to identify both psychosocial and structural/systemic issues that may suggest for whom and under what circumstances a fatherhood program is most effective, and the processes by which men appear to benefit from programs.

[1] A key principle in social marketing, audience segmentation refers to the division of a target audience into homogeneous subgroups according to an individual's constellation of knowledge, beliefs, social norms, and behaviors pertaining to the outcome or behavior targeted for change (Slater 1996).

[2] Multivariate methods may still be subject to omitted variable bias if, even with the inclusion of covariates, there are other unobserved variables that are correlated with both the predictor and outcome.

[3] The attitude measures consisted of different items in each time point. Change in attitudes over time was assessed by standardizing the attitude variables at each time point and calculating the difference between the standardized scores.

[4] The direction of this association was contrary to the author's hypothesis.

[5] The direction of this association was contrary to the author's hypothesis.

[6] The direction of this association was contrary to the author's hypothesis.

[7] The direction of this association was contrary to the author's hypothesis.

[8] The direction of this association was contrary to the author's hypothesis.

[9] The direction of this association was contrary to the author's hypothesis.

[10] LaVeist, T. (1996). Why We Should Continue to Study Race But Do a Better Job: An Essay on Race, Racism, and Health. Ethnicity and Disease, 9, 21-29.

[11] Schuerman, J.R., Rzepnicki, T.L., and Littell, J.H. (1994). Putting Families First: An Experiment in Family Preservation. New York: Walter De Gruyter.

[12] Lattimore, P.K., Barrick, K., Cowell, A., Dawes, D., Steffey, D., Tueller, S., and Visher, C.A. (2012). Prisoner Reentry Services: What Worked for SVORI Evaluation Participants? Final Report. Research Triangle Park, NC: RTI International.
