## Introduction

Predictive analytics is increasingly seen as a technology that can improve child welfare outcomes, turning hindsight into insight and insight into value. Predictive analytics can be defined as analysis that uses data, statistics, and algorithms to answer the question "Given past behavior, what is likely to happen in the future?" It is important to note that predictive analytics has the potential to produce benefits only when model results are used to intervene differentially based on the model output, leading to improvement on the outcome of interest.

This tool will guide you through a series of criteria to help determine whether predictive analytics is an appropriate approach for the child welfare question you are considering. Before getting started, be prepared to walk through each of these sections with a specific question in mind, such as "Can we predict the number of children who will be reunited with families next year?" or "Can we predict which children in foster care are likely to experience a placement disruption in the next six months?" Reading and discussing the criteria below will enable you to decide whether you have the building blocks to use predictive analytics for your particular question in child welfare. If not, explore the suggestions provided for addressing each item. Keep in mind that answering "Yes" to each criterion does not eliminate all risk for your project, while any "No" answer indicates that predictive analytics is not currently an appropriate approach. As you work through each criterion, click the ⓘ for more information about the issue and why it is important.

For a more detailed discussion on this topic, see Predictive Analytics in Child Welfare: An Introduction for Administrators and Policy Makers.

### Data sufficiency ⓘ

#### Is there sufficient, quality data available that is relevant to the chosen question?

- Does the data capture attributes of the child and the perpetrator's environment that can help answer the question?
- Is there data on the prediction outcomes, both positive and negative?

Before building a predictive analytics model, an agency needs to determine whether it has the breadth of data required to analyze the question. Breadth of data refers to the extent to which a particular dataset provides a complete picture of a child, their environment, and the particular outcomes being analyzed for this question. To start, child welfare agency employees should conduct a preliminary examination, grounded in the research literature and practice experience, of what attributes could impact outcomes; these attributes can serve as the initial set of data required to analyze the question. This examination should be informed by a review of the emerging literature on both predictive analytics in child welfare and the topic addressed by the predictive analytics implementation, as well as by early lessons learned by agencies that have begun working with predictive analytics (see Predictive Analytics in Child Welfare: An Assessment of Current Efforts, Challenges and Opportunities).

Finally, an agency should consult with the data owners to ensure that the data meets certain quality standards. Just because a data element exists does not mean that it is worth using; if the data do not meet certain quality standards, they could incorrectly influence a predictive analytics model. This is often summarized by the saying "Garbage in, garbage out": poor-quality data inputs produce poor-quality model outputs.

This preliminary examination will help agencies determine whether they need to look further for other factors that may have been overlooked initially. The greater the breadth of data available, the more likely it is that useful models can be built. If every desired variable cannot be included in the final dataset, analysts can still build predictive models; however, administrators need to understand that model results are only as accurate as the data used to feed the model. If key factors related to the outcome of interest are not available, the agency should proceed cautiously—it is possible that the missing factors do not convey useful information, but it is also possible that the inability to include key variables will significantly reduce the model's accuracy. Despite missing information, the exercise of building a model may still be valuable given the correct interpretation and understanding of the model's shortcomings. Child welfare administrators should discuss the available data with their data scientists to determine if they have sufficient data to proceed.

Consider talking to your subject matter experts about which data elements are necessary. In addition, you can review relevant academic literature for suggestions on data elements that might be useful for exploring the outcome of your question. After determining the necessary data elements, coordinate efforts with corresponding data owners to join the elements into your database. This effort might require additional data use agreements if working outside of your agency.

If the answer to this criterion is "No," predictive analytics is not appropriate for your project at this time.

### Data quantity ⓘ

#### Is there enough data (that is, an ample number of observations of the event you are trying to predict and of the various circumstances that may contribute to the outcome) to provide an adequate base of information on which to build a model?

Because predictive analytics depends on the statistical properties of data to predict and classify, the more data available for a given event, the more likely the model will be useful. Additional observations typically help to average out randomness and enable more accurate predictions. While there is no standard minimum number of observations, most data scientists agree that having as many observations as possible to describe the outcomes is preferable. In the case of predicting categorical outcomes (such as the type of response made to an abuse hotline call), enough data should be captured on each outcome to describe the differences between them. Child welfare administrators should make sure to discuss the available data with their data scientists to determine if they have enough observations to run the desired predictive modeling algorithms. This criterion functions as a question-defining criterion, helping to home in on whether the desired outcome is the best application of predictive analytics given a child welfare agency's limited resources and risk tolerance for producing a useful model.
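As a concrete illustration, counts of each outcome category can be tallied before any modeling begins. The outcome labels, counts, and minimum-count threshold below are hypothetical; the appropriate threshold is something to agree on with your data scientists.

```python
from collections import Counter

# Hypothetical hotline-response records; labels and counts are invented
# purely to illustrate the check, not drawn from any real agency.
outcomes = (
    ["screened_in"] * 1200
    + ["screened_out"] * 3400
    + ["alternative_response"] * 45
)

counts = Counter(outcomes)
min_per_class = 100  # illustrative threshold, to be set with your data scientists

for outcome, n in sorted(counts.items()):
    flag = "OK" if n >= min_per_class else "too few observations"
    print(f"{outcome}: {n} ({flag})")
```

In this invented example, the rare "alternative_response" category would be flagged: even though the dataset is large overall, there may be too few observations of that outcome to model it reliably.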

Consider talking to your data scientists about how much data is necessary. Even if data is already available, discuss the idea of collecting additional data on the population being analyzed and the outcomes you are looking to predict. The data scientists should be able to provide a rough estimate of the amount of data they feel would be sufficient for modeling that question accurately. After determining the necessary quantity, coordinate efforts with the corresponding data owners to retrieve the data into your analysis environment. This could involve going further back in history or collecting additional data in the future.

If the answer to this criterion is "No," predictive analytics is not appropriate for your project at this time.

### Identified implementation strategy ⓘ

#### Is there an effective implementation strategy identified for use once the predictive analytics have been completed?

- Can the predictive model results be used in such a way that has a measurable, positive impact on the child welfare system?

Predictive analytics has the potential to produce benefits only when model results are used to intervene differentially based on the model output. It is imperative that the agency be clear about what it would do differently to intervene with families and have a strong theory of change (or better yet, research evidence) demonstrating how the differentiated intervention would lead to improvement on the outcome of interest. In general, a set of implementation strategies should be identified that would allow the child welfare agency to take action through direct use of the model results or by informing policy changes. The child welfare agency needs to have both a policy-related plan for using model results and a technical plan for integrating predictive results into existing IT systems (e.g., dashboards, flags in a screening tool). The plan should also provide for continual assessment and, as needed, revision as the model is put into production.

For any effort to be worthwhile, successful implementation should have a measurable impact on the children and families served by the child welfare system. Outputs of these models should be useful and actionable, not simply "nice to know." A child welfare agency might also be able to use the results in such a way as to inform policy decisions that improve efficiency and the allocation of resources. An agency would need to consider the spectrum of impact these results could have before deciding whether the model is worth the investment. In assessing this criterion, the agency needs to decide what metrics it will use to measure impact. What outcomes does the agency want to achieve by implementing the model? Before building a predictive model, the data scientist should work with subject matter experts to determine what level of performance on the specified metrics is acceptable enough to proceed.

Revisit your question and discuss how modeling this particular outcome could improve your organization's performance in meeting its goals. If technology is the barrier to implementing a predictive analytics solution, consider a discussion with your information technology department about what the barrier is and their ideas for overcoming it. If policy is the barrier, consider discussing with stakeholders any ethical or political issues involved.

If the answer to this criterion is "No," predictive analytics is not appropriate for your project at this time.

### Resource requirements ⓘ

#### Can the predictive analytics efforts be completed at a cost that is projected to be less than the perceived benefit after implementation?

As with any large project, cost and resource availability are limiting factors that can dictate the project's success or failure. Predictive analytics requires considerable expertise and technical infrastructure to succeed, and there are significant risks involved. For the endeavor to be worthwhile, the expected benefit to child welfare must justify the investment. This criterion transcends both the pre-modeling and model assessment phases and needs to be continually revisited throughout the entire duration of a predictive analytics project.

When considering available staff resources, recognize that subject matter experts most likely already have full workloads and, consequently, limited availability to contribute to new initiatives surrounding predictive analytics. Furthermore, many child welfare agencies do not have access to experienced data scientists or analysts, so hiring for new positions or contracting with external organizations to conduct the predictive analytics work is an added cost that needs to be considered.

Technological resources also need consideration under resource requirements—does the child welfare agency have access to the specialized technological tools required to implement predictive analytics? When choosing which software to utilize, a child welfare agency will often have a choice between an open source or proprietary solution to support their project. Each comes with advantages and disadvantages that should be discussed before selecting the software. Regardless of the chosen type of software, a child welfare agency also needs to have access to adequate hardware resources—such as servers, virtual machines, etc.—that provide enough computational power to run the software's algorithms. All of this investment has associated cost in terms of budget, as well as human resources to implement and maintain.

Lastly, in assessing the resources required for a predictive analytics project, an agency needs to consider the full timeline required to complete the project. A predictive analytics project is often a lengthy endeavor that cannot be completed in a few short days.

Consider the overall resources needed for a typical analytics-based process. Because a predictive analytics project usually requires more than one iteration on the data and analysis, plan for additional resources. While the number of iterations can be hard to predict, having a rough estimate can help you avoid running out of resources before the model can be implemented.

If the answer to this criterion is "No," predictive analytics is not appropriate for your project at this time.

### Stakeholder support ⓘ

#### Is there enough approval – within the child welfare agency, the government at large, and among key stakeholders – to support the implementation of the predictive analytics effort? Is the agency prepared to be transparent about both the analytics process and the results?

The child welfare agency needs to be able to obtain the support of its stakeholders on the usefulness of the predictive analytics and obtain agreement that such an investment is worth the cost and any associated risks. The agency must adequately address any ethical concerns surrounding the use of child welfare data to predict a potential future outcome. While predictive analytics are widely used across industries, the potential implications are much more significant in child welfare: the consequences of incorrectly identifying abuse that is not present, or of missing abuse that is, are far higher than those of incorrectly identifying a consumer's movie preference. For more information on ethical concerns behind predictive analytics projects in child welfare, see Predictive Analytics in Child Welfare: An Assessment of Current Efforts, Challenges and Opportunities.

Given the potential impact of implementing predictive analytics in child welfare, instilling public confidence in the predictive models and being transparent about the process is crucial. Without stakeholder and public buy-in for a predictive analytics project, barriers to success, ranging from political opposition to difficulty obtaining data sharing agreements and implementing findings, will likely increase. While there will always be concerned parties, a successful predictive analytics project will likely have multiple stakeholders intimately involved in the process, from data sharing to model building to model implementation. If a predictive analytics project does not consider stakeholder input, there is potential for the project to be halted before implementation despite its potential usefulness. This criterion transcends the planning and implementation phases and needs to be continually assessed throughout the potential predictive analytics project.

Explore the possible benefits of the predictive analytics from the perspective of your stakeholders. Identify any particular issues that might concern those stakeholders and discuss those concerns proactively. If the concern is the use of particular data elements, discuss why they are used and the relative importance each provides to the solution. If the concern is the potential impact on the outcome, such as over-intervening in a child's life, work with your data scientists to understand and estimate that impact before moving forward with the project.

If the answer to this criterion is "No," predictive analytics is not appropriate for your project at this time.

### Validity of the model ⓘ

#### Is the modeling process rigorous and appropriate for the chosen question?

- Does the model accurately represent the real world?
- Do subject matter experts approve of the model results?

The 20th-century statistician George E.P. Box famously said, "All models are wrong but some are useful." Because the real world is random and unpredictable, no predictive model will be perfect; however, some models will be better than others at approximating behavior. Immediately after running a modeling process, data scientists need to put the model through a validation process to understand how well it approximates the real world. Child welfare administrators should discuss the model validation, including the items described below, with their data scientists before implementing a predictive model.

The first step of model validation is assessing whether you have chosen an appropriate algorithm. Does the algorithm predict categories or quantities, and is this consistent with your question? Does the data satisfy any necessary conditions to produce reliable results? Once the model algorithm and assumptions have been validated, the data scientists will likely turn to the model results and compute measures of accuracy, precision, and other metrics that describe how good the model might be.
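To make the validation step concrete, the sketch below splits hypothetical historical records into training and held-out test sets, then scores a deliberately naive "majority class" baseline on the held-out portion. All data values are invented; the point is that any real model's metrics should be computed on data it has never seen, and the model should outperform a trivial baseline like this one.

```python
import random

random.seed(0)

# Hypothetical labeled records: 1 = placement disruption occurred, 0 = it did not.
records = [1] * 150 + [0] * 850
random.shuffle(records)

# Hold out 20% of the history that the model never sees during training.
split = int(len(records) * 0.8)
train, test = records[:split], records[split:]

# A deliberately naive baseline "model": always predict the majority
# class seen in training. Any real model must beat this to be useful.
majority = max(set(train), key=train.count)
predictions = [majority for _ in test]

accuracy = sum(p == y for p, y in zip(predictions, test)) / len(test)
print(f"baseline accuracy on held-out data: {accuracy:.2f}")
```

Note that because disruptions are rare in this invented dataset, the always-predict-no-disruption baseline scores deceptively well, which is one reason accuracy alone is not a sufficient validation metric.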

A final important step of model validation is looking at the important features suggested by the modeling output and how they interact to predict the desired outcome. Do the variables make sense given the subject matter experts' intuition and expertise? While not all important features may be known before running the predictive model, the chosen features should not be antithetical to knowledge about the question or about how the child welfare system works. If a feature is important and yet unexplainable, this could be a sign that something failed in the predictive modeling process or that there is a fault in the underlying data.

Discuss with your data scientists why they suspect the model results might not be valid. There could be many reasons, including a lack of data or a poor choice of algorithm. If data is lacking, consider revisiting the data being used and work with your subject matter experts to identify other elements that could describe the outcome you are modeling. This might involve reaching out to other organizations to collect additional data points to add to your model. In addition, discuss the benefits and challenges of the algorithm selected by the data scientists. Many algorithms might be applied to modeling a particular outcome, and none is perfect. Discuss applying more resources to expand the tuning of that particular algorithm or selecting a different algorithm to model this outcome.

If the answer to this criterion is "No," predictive analytics is not appropriate for your project at this time.

### Accuracy of the model ⓘ

#### To what extent does your model correctly predict the outcome of interest?

- Are the false positive and false negative rates within the risk tolerance for the agency?

Model accuracy is typically the most commonly discussed criterion when evaluating predictive analytics. Accuracy is defined as the ability of the model to predict the true value given a set of inputs, whether classifying something into a category or predicting a value. Accuracy is often assessed by withholding a set of historical data that the model has not seen before and comparing the model's predictions with known, historical outcomes. When predicting a categorical value, the model should minimize the rates of false positives (e.g., incorrectly identifying risk where it doesn't exist) and false negatives (e.g., failing to identify risk where it exists). However, no model can simultaneously optimize in both directions; there will always be a trade-off between minimizing false positives and minimizing false negatives. For models predicting a quantity, data scientists can calculate various metrics that assess how well the model fits your data. These metrics describe the model's overall accuracy but can also describe how well it performs in extreme cases.
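The error rates described above can be computed directly from a held-out comparison of predictions against known outcomes. The sketch below uses invented values purely to illustrate the arithmetic behind false positive and false negative rates.

```python
# Hypothetical held-out outcomes (1 = risk present, 0 = risk absent)
# and a hypothetical model's predictions for the same ten cases.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives

false_positive_rate = fp / (fp + tn)  # risk flagged where none existed
false_negative_rate = fn / (fn + tp)  # risk missed where it existed
accuracy = (tp + tn) / len(actual)

print(f"false positive rate: {false_positive_rate:.2f}")  # 2/6 ≈ 0.33
print(f"false negative rate: {false_negative_rate:.2f}")  # 1/4 = 0.25
print(f"accuracy:            {accuracy:.2f}")             # 7/10 = 0.70
```

Whether a 0.25 false negative rate is tolerable is exactly the kind of judgment, discussed below, that depends on the consequences of each error type for children and families.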

Regardless, the model's error rates must fall within the tolerances established by the agency for both types of errors for the model to be considered successful. If a model is not highly accurate, it could lead to missing abuse that is present or incorrectly identifying abuse that is not. Tolerances for such errors will depend on the consequences for the child, family, and agency, and on the implications of false positive and false negative results given the planned differential intervention and the outcome being modeled. Child welfare administrators should review the accuracy of a predictive model with their data scientists to understand how the model performs relative to other predictive models.

Discuss with your data scientists why they suspect the accuracy might be too low. There could be many reasons, including a lack of data or a poor choice of algorithm. If data is lacking, consider revisiting the data being used and work with your subject matter experts to identify other elements that could describe the outcome you are modeling. This might involve reaching out to other organizations to collect additional data points to add to your model. In addition, discuss the benefits and challenges of the algorithm selected by the data scientists. Many algorithms might be applied to modeling a particular outcome, and none is perfect. Discuss applying more resources to expand the tuning of that particular algorithm or selecting a different algorithm to model this outcome.

If the answer to this criterion is "No," predictive analytics is not appropriate for your project at this time.

### Precision of the model ⓘ

#### Can the model reliably predict accurate results for multiple cases?

- Does the model have enough consistency with its predictions to be implemented in the field?

A model may be able to predict the correct outcome, that is, have high accuracy, but it may not reliably repeat that correct prediction. This idea is known as the precision of a model. A model may be accurate but not precise, precise but not accurate, both, or neither. When a model is accurate and precise, it predicts the correct outcome a large percentage of the time. Conversely, when a model is accurate but not precise, it can predict the correct outcome, but there is so much variation that its predictions are inconsistent. When implementing a model, the agency needs to assess the model's ability to consistently predict accurate results and determine whether this rate is within its desired risk tolerance. As with model validity and accuracy, child welfare administrators need to review model precision with their data scientists to better understand the performance of the predictive model.
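One rough way to see precision separately from accuracy is to score a model many times, for example on resampled data, and look at the spread of the results. The sketch below simulates two hypothetical models with the same average accuracy but very different consistency; the `evaluate_once` function and all numbers are invented stand-ins for repeatedly refitting and re-scoring a real model.

```python
import random
import statistics

random.seed(1)

def evaluate_once(score_spread):
    # Stand-in for refitting and re-scoring a model on a resampled dataset;
    # score_spread mimics how sensitive the model is to which cases it sees.
    return max(0.0, min(1.0, random.gauss(0.80, score_spread)))

stable   = [evaluate_once(0.01) for _ in range(30)]  # precise model
unstable = [evaluate_once(0.10) for _ in range(30)]  # imprecise model

# Both are "accurate" on average (around 0.80), but the spread differs greatly:
# the second model's score depends heavily on which cases it happened to see.
print(f"stable:   mean={statistics.mean(stable):.2f} sd={statistics.stdev(stable):.3f}")
print(f"unstable: mean={statistics.mean(unstable):.2f} sd={statistics.stdev(unstable):.3f}")
```

An agency comparing these two hypothetical models on a single test run could easily be misled; the spread across repeated evaluations is what reveals the difference in precision.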

Discuss with your data scientists potential reasons why the precision might be too low. There could be many reasons, including the use of too many noisy or irrelevant data elements. Consider revisiting the data being used and work with your subject matter experts to identify unnecessary elements that do not clearly describe the outcome you are modeling. In addition, discuss the benefits and challenges of the algorithm selected by the data scientists. Many algorithms might be applied to modeling a particular outcome, and none is perfect. Discuss applying more resources to expand the tuning of that particular algorithm or selecting a different algorithm to model this outcome.

If the answer to this criterion is "No," predictive analytics is not appropriate for your project at this time.