Minutes of the Technical Assistance Workshop, May 3-5, 2000. Topic 2: Data Standardization


The Data Standardization session was led by Allen Harden of Chapin Hall. Harden said that the session might be better named data "decomposition and standardization," because one cannot standardize data without first thinking about how to break it apart. He noted that the aim of the session was to show how to standardize data for purposes of comparison.


Measurement. Using a variety of illustrations, Harden addressed the need to devise measures suited to the topic of interest. When constructing measures, it is important to recognize early on what is being measured, that is, the unit of analysis. In this work it is often a child, but it might be other things, such as attributes of families or places. Sometimes a single statistic, an absolute magnitude, is all a manager cares about, but this is rare. It is more common for the policy community to need more information, such as information for comparison. Comparisons can require rates, means, unduplication, and other adjustments of data.
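As a concrete illustration of the kinds of adjustments described here (a sketch supplied for these minutes, not an example from the session; the record layout, field names, and population figure are invented), the following Python fragment unduplicates child-level event records and converts the resulting count to a rate:

    # Hypothetical child-level records: one row per placement event,
    # so the same child can appear more than once.
    records = [
        {"child_id": 101, "region": "North"},
        {"child_id": 101, "region": "North"},  # same child, second event
        {"child_id": 102, "region": "South"},
        {"child_id": 103, "region": "North"},
    ]

    # Unduplication: count each child once, however many events appear.
    unique_children = {r["child_id"] for r in records}

    # A rate needs a population of reference as its denominator;
    # this population figure is invented for illustration.
    child_population = 1_000

    rate_per_1000 = len(unique_children) / child_population * 1_000
    print(f"{len(unique_children)} children, {rate_per_1000:.1f} per 1,000")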

When making comparisons among data, it is important to understand how the terms are defined. Harden used kinship care as an example, noting that differing state definitions of kinship care inhibited comparisons of foster care population statistics. He pointed out the importance of interviewing the sources of the data to ensure that you know what generated the measure, how it is defined, and who is included in the counted group.

Population of reference. Along with the measure, it is necessary to define the population of reference, the denominator of a proportional measure. A simple percentage is the share of children with particular attributes. That prevalence statistic may, by itself, be sufficient.

But trying to get information on what produced the statistic in the first place requires a more complicated approach. This is especially true when the numerator is an event or something else that is countable; in that case, it is important to define the denominator as the population that can appropriately be thought of as at risk of experiencing the event. When we can define an event and the likelihood of that event occurring to a population of interest, this is an incidence statistic. (Harden noted that because the language used to define these measures comes out of public health and epidemiology, it is negative sounding, including phrases like "at risk." He noted that one can be "at risk" of success.)
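In the usual epidemiological notation (supplied here for reference; Harden's own formulations are not reproduced in these minutes), the two statistics can be written as:

    \text{prevalence} = \frac{\text{number with the attribute at a point in time}}{\text{population of reference}}

    \text{incidence} = \frac{\text{number of new events during a period}}{\text{population at risk during that period}}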

Using the example of child welfare, Harden said that the initial prevalence statistic can represent two things: the rate of entry into care (or incidence) and the duration of time in care.
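The relationship behind this observation is commonly expressed, under steady-state assumptions, as

    \text{prevalence} \approx \text{incidence} \times \text{mean duration}

For example (figures invented for illustration), an entry rate of 2 children per 1,000 per year combined with an average stay of 2.5 years implies a point-in-time caseload of roughly 5 children per 1,000.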

Comparisons. Harden moved through an array of comparisons to which measures might be put, including comparisons over time, by age, by region of residence, by gender, and others. He pointed to the work done by demographers on fertility as an example: a crude measure is the birthrate for a particular population; more informative are age-specific birthrates, which get at the timing of births.
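A short sketch of the distinction (supplied for these minutes with invented counts, not figures from the session), in Python:

    # Hypothetical counts: births and female population by age group.
    births_by_age = {"15-24": 300, "25-34": 700, "35-44": 200}
    women_by_age = {"15-24": 10_000, "25-34": 12_000, "35-44": 11_000}

    # Crude birthrate: one number for the whole population; it hides
    # differences in the timing of births.
    crude = sum(births_by_age.values()) / sum(women_by_age.values()) * 1_000
    print(f"crude birthrate: {crude:.1f} per 1,000 women")

    # Age-specific birthrates: reveal timing, and allow fair comparison
    # between populations with different age structures.
    for age, births in births_by_age.items():
        rate = births / women_by_age[age] * 1_000
        print(f"ages {age}: {rate:.1f} per 1,000 women")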

An audience member said that the idea of standardization was attractive, but much of the data he encountered was compromised. He asked for guidance on how much compromise was too much. Harden said that such determinations depend on what the data user ultimately wants to know and on when he or she needs to go out and get better information. He also noted the importance of attaching the right explanations to the data so that there is no confusion about what the data can and cannot explain. Harden then showed some comparisons of child populations in Chicago.


Discussion revolved around appropriate uses of data.