Minutes of the Technical Assistance Workshop, May 3-5, 2000. Topic 3: Small Area Analysis


Fred Wulczyn and John Dilts of Chapin Hall led the Small Area Analysis session.

Geographic Information Systems and Small Area Analysis


In his introduction, Wulczyn announced that Asher Ben-Arieh would lead a discussion of issues he had raised in his talk the day before. Wulczyn then gave a brief history of small area analysis, tracing its origin in Chicago to the early days of the Cook County juvenile court a century ago, when the court began to map the addresses of the children coming before it. The recognition that many came from the same neighborhoods helped foster an investigation of the ways in which neighborhoods were implicated in child development, and of the possibility that neighborhood flaws, not the character flaws of children and families, might result in children coming to the court's attention. Early geographic mapping was done by hand. One purpose of this session was to cover the process of conducting small area analysis and some of the terminology associated with it. It was also meant to show how maps can be useful tools but can also create pitfalls. Wulczyn said of maps, "As is so often the case, pictures do say 1000 words, but there are at least 1000 words that are unsaid by those pictures and we want to show you both sides of that scenario in our presentation today."

Geographic Information Systems

Dilts set out to discuss geocoding and mapping. He defined geocoding and sketched his purposes, saying:

Geocoding is the process of taking data of any sort and making it analyzable in terms of its spatial characteristics. That sort of analysis can be a visual analysis, the kind you do when you're drawing maps, but it can also be statistical analysis. There are all kinds of techniques that statisticians have used and geographers have used for a long time to analyze data that has geographic characteristics attached to it.

There are two aspects to geocoding. One is the standardization of address records. The other is the process of matching that standardized data to its geographical reference point.

Standardization is necessary because address records are particularly complex and are typically stored as strings of data. Such strings can be difficult for matching software to interpret unless they are standardized. Steps in standardizing can include separating the elements of the address into units that can be recognized for matching, like house numbers, street names, or zip codes; cleaning out extraneous information; and always using the same abbreviation to represent the same thing. The approach to standardization depends upon the number and quality of records. A few hundred records might be best standardized by hand, but large record volumes or data of varying quality might be best standardized by computer. Chapin Hall uses a standardization program called AutoStand.
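
As an illustration only, the three standardization steps described above (separating the address into units, cleaning out extraneous information, and using consistent abbreviations) might be sketched in Python as follows. This is not AutoStand's method. The function name standardize, the lookup tables, and the sample addresses are all hypothetical, and the parsing rules are deliberately naive.

```python
import re

# Hypothetical lookup tables; a production standardizer would carry far more entries.
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
              "N": "N", "S": "S", "E": "E", "W": "W"}
SUFFIXES = {"STREET": "ST", "STR": "ST", "ST": "ST",
            "AVENUE": "AVE", "AV": "AVE", "AVE": "AVE",
            "BOULEVARD": "BLVD", "BLVD": "BLVD",
            "ROAD": "RD", "RD": "RD"}

def standardize(raw):
    """Split one free-text street address into labeled, uniformly abbreviated parts."""
    text = raw.upper()

    # Separate a trailing zip code (5 digits, optional +4) from the rest.
    zip_code = ""
    found = re.search(r"\b(\d{5})(?:-\d{4})?\s*$", text)
    if found:
        zip_code = found.group(1)
        text = text[:found.start()]

    # Clean out extraneous information: unit designators, then punctuation.
    text = re.sub(r"\b(?:APT|UNIT|SUITE)\b\.?\s*\w+", " ", text)
    text = re.sub(r"#\s*\w+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()

    # House number: a leading all-digit token.
    house_number = tokens.pop(0) if tokens and tokens[0].isdigit() else ""
    # Direction: taken only when enough tokens remain for a street name after it.
    direction = DIRECTIONS[tokens.pop(0)] if len(tokens) > 2 and tokens[0] in DIRECTIONS else ""
    # Suffix: always emit the same abbreviation for the same street type.
    suffix = SUFFIXES[tokens.pop()] if len(tokens) > 1 and tokens[-1] in SUFFIXES else ""

    return {"house_number": house_number, "direction": direction,
            "street_name": " ".join(tokens), "suffix": suffix, "zip": zip_code}

# Two invented spellings of the same address reduce to the same standardized record.
first = standardize("1234 n. Main Street, Apt 3B 60637")
second = standardize("1234 North Main St. #3B, 60637")
print(first == second)
```

Once two differently written records reduce to the same set of labeled parts, the matching step has uniform fields to compare rather than raw strings.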

Dilts discussed how addresses, once standardized, could be matched with a particular geographic location by the use of geographic reference files, such as the Census Bureau's TIGER (1) files. TIGER files contain geographic coordinates and Census information for every address within a given area. There are two methods of matching the address to a location: deterministic matching and probabilistic matching. Desktop packages such as MapInfo, ArcInfo, or MapPoint perform deterministic matching in linking addresses and geographic records. Data problems can interfere with the reliability of deterministic matching, and in those cases probabilistic matching is the better choice. Probabilistic matching is done by specialized software; Chapin Hall uses Auto Match. Auto Match links records on the basis of information that is similar, within certain parameters that are set by the user. It is called probabilistic because the matching is based on an underlying statistical probability that two records are in fact the same record. Once you've done this, you can move on to the fun step of analyzing your data either visually or statistically.


Wulczyn asked for questions.

An audience member asked whether Auto Match users can set their own levels of probability. Dilts replied:

Yes, absolutely. The software allows total control of the match in terms of the amount of importance that you want to attach to various elements. It might be very important that the house number match exactly, but not so important that street name match exactly. You may allow a little bit of difference in the spelling of the street names, that sort of thing. It allows you to determine those levels. It also allows you to determine the overall levels of certainty that you want that the two match.
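
The kind of user control Dilts describes can be illustrated with a small Python sketch of probabilistic record linkage in the Fellegi-Sunter style, where each field contributes an agreement or disagreement weight and the total is compared with a cutoff. This is not Auto Match's algorithm or interface. The m and u probabilities, the similarity thresholds, the cutoff, and the sample records are all invented for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field settings chosen by the user:
#   m = probability the field agrees when two records are the same address
#   u = probability the field agrees when they are different addresses
#   min_sim = how similar two values must be to count as agreeing
FIELDS = {
    "house_number": {"m": 0.95, "u": 0.01, "min_sim": 1.0},  # must match exactly
    "street_name":  {"m": 0.90, "u": 0.05, "min_sim": 0.8},  # small spelling differences allowed
    "zip":          {"m": 0.95, "u": 0.10, "min_sim": 1.0},  # must match exactly
}
MATCH_CUTOFF = 8.0  # hypothetical overall level of certainty required

def similarity(a, b):
    """Return a 0..1 similarity score between two strings."""
    return SequenceMatcher(None, a, b).ratio()

def match_weight(record, reference):
    """Sum log-likelihood-ratio weights over all compared fields."""
    total = 0.0
    for field, p in FIELDS.items():
        if similarity(record[field], reference[field]) >= p["min_sim"]:
            total += math.log2(p["m"] / p["u"])              # agreement weight
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight
    return total

def best_match(record, reference_file):
    """Return the highest-weight reference record, or None if below the cutoff."""
    scored = [(match_weight(record, ref), ref) for ref in reference_file]
    weight, ref = max(scored, key=lambda pair: pair[0])
    return ref if weight >= MATCH_CUTOFF else None

# Invented reference records standing in for a geographic reference file.
reference_file = [
    {"id": "ref-1", "house_number": "1234", "street_name": "HALSTED", "zip": "60608"},
    {"id": "ref-2", "house_number": "1234", "street_name": "ASHLAND", "zip": "60608"},
]
# The misspelled street name still links to ref-1 because small differences are tolerated.
case = {"house_number": "1234", "street_name": "HALSTEAD", "zip": "60608"}
print(best_match(case, reference_file)["id"])
```

Raising min_sim for a field makes that field stricter, and raising MATCH_CUTOFF demands more overall certainty before two records are linked.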

A number of state representatives described the mapping software in place in their areas and commented on the difficulties of obtaining the necessary specificity. Wulczyn commented:

On the geocoding side, the lowest level of granularity is the address. Literally, the TIGER file is latitude and longitude coordinate for every address that the census has. And from that point you can wrap it in any kind of political or social subdivision, zooming in and out depending upon the situation. The issue: if you only have zip codes, you don't necessarily have to go through a geocoding process; you already have a piece of geographic information. The question is how well does that overlap with political, social, or other types of subdivisions.


Like statistical analysis, mapping helps tease out the patterns in the data and make them apparent. With a map, information can be layered by overlaying successive slides containing different types of related information. Wulczyn and Dilts showed a dot-density map of the home addresses in substantiated abuse and neglect cases. That information was layered on a map of Chicago that contained the boundaries of the city's 77 community areas. Of that map, Dilts said:

You can begin to impose a little order on the data and allow comparisons to be made. But dot-density maps are themselves pretty limited because they don't allow you to . . . quantify comparisons between the areas. You can eyeball this and draw the conclusion that some areas have much less than others, because there's a density clustering around these dots, but that's about all you can say. Given that, the thing that one has to do next is to begin aggregating the data within these areas and to create what's called a thematic map.
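
The aggregation step Dilts mentions can be sketched in Python as follows: count the geocoded cases that fall in each area, convert the counts to rates, and assign each rate to a shading class for a thematic map. The area names, case list, populations, and class breaks are invented for illustration, and the sketch assumes each case has already been assigned to an area.

```python
from collections import Counter

# Invented data: the area each geocoded case falls in (one entry per dot on the map).
case_areas = ["Area A", "Area A", "Area B", "Area A", "Area C", "Area B"]
# Invented child population of each area.
child_population = {"Area A": 1500, "Area B": 4000, "Area C": 500}

def rates_per_thousand(case_areas, population):
    """Aggregate point data into a rate per 1,000 children for each area."""
    counts = Counter(case_areas)
    return {area: 1000 * counts.get(area, 0) / pop for area, pop in population.items()}

def shade_class(rate, breaks=(0.75, 1.5)):
    """Return a shading class from 0 (lightest) to len(breaks) (darkest)."""
    return sum(rate >= b for b in breaks)

rates = rates_per_thousand(case_areas, child_population)
classes = {area: shade_class(rate) for area, rate in rates.items()}
print(rates)
print(classes)
```

In this invented example, Area C shows a single dot and Area A shows three, yet both have the same rate once population is taken into account. That is the kind of quantified comparison a dot-density map alone cannot support.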

Wulczyn added, "The question here is . . . how do you use the maps to tell an effective story? That's essentially one of the things that we're trying to do within the social indicators context is to be able to describe what's happening." Wulczyn then showed how, by examining smaller geographic areas and studying the density of abuse and neglect incidence, the user can adjust the way he or she constructs a mental model "about the underlying social dynamics in these areas." He further developed this illustration by showing maps of small geographic areas that included both abuse and neglect rates and rates of AFDC (Aid to Families with Dependent Children) receipt. He showed how the seeming relationship, suggested by the map, between high rates of abuse and neglect and high rates of AFDC receipt was an oversimplification.

The presenters continued adding information to the map, explaining how far mapping would take them and noting the point when they reached the "limits to how much insight these visually appealing maps actually impart." By combining the maps with line charts, the presenters matched rates of child abuse and neglect cases with rates of AFDC participation to illustrate the relative levels of incidence, a comparison that helped reveal a weak relationship between high rates of AFDC participation and high rates of child abuse and neglect reporting. Wulczyn said:

So what was visually appealing upon closer examination, again the story shifts a bit. It suggests there is something more subtle going on. We have spikes here. The point is that in adjacent areas, adjacent not in a geographic sense, not neighboring Census tracts in a geographical sense, but neighboring areas with respect to how many AFDC participants they have or the rate per thousand, in that sense of a neighbor, they could have very different child abuse and neglect reporting rates. And maps are as likely to obscure that point as they are to reveal the point. So (we must) go back to a more standard X/Y plot... Remember, we started out this series of slides with the traditional X/Y plot . . . . we're not abandoning the need to look at things in a more traditional way because it reveals that there may be problems with imparting too much causality to the relationship between AFDC and child abuse and neglect reporting rates.
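
Wulczyn's notion of areas that are "neighbors" in AFDC terms rather than in geographic terms can be illustrated with a short Python sketch: order the tracts by AFDC rate, then compare abuse and neglect reporting rates between tracts that sit next to each other in that ordering. The tract labels and rates below are invented and are not figures from the presentation.

```python
# Invented rates per 1,000 for five hypothetical Census tracts.
tracts = [
    {"tract": "T1", "afdc": 120, "abuse": 8},
    {"tract": "T2", "afdc": 125, "abuse": 30},
    {"tract": "T3", "afdc": 300, "abuse": 12},
    {"tract": "T4", "afdc": 310, "abuse": 35},
    {"tract": "T5", "afdc": 60, "abuse": 9},
]

def gaps_between_afdc_neighbors(tracts):
    """Pair each tract with the next one in AFDC-rate order and
    report the absolute difference in their abuse and neglect rates."""
    ordered = sorted(tracts, key=lambda t: t["afdc"])
    return [(a["tract"], b["tract"], abs(a["abuse"] - b["abuse"]))
            for a, b in zip(ordered, ordered[1:])]

for low, high, gap in gaps_between_afdc_neighbors(tracts):
    print(low, high, gap)
```

In this made-up data, T1 and T2 have nearly identical AFDC rates but very different reporting rates. A spike of that kind is easy to see on an X/Y plot ordered by AFDC rate and easy to miss on a map.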

A member of the audience said:

If you were looking at Child Protective Services (CPS) and wanted to do an intervention in that area, what this is saying to me is that maybe the common thinking about the relationship between CPS and income isn't holding true. And then you would want to map some other things like foster care, family size, ethnicity, and that would start to tell you more and you would start to change your variables.

Wulczyn replied:

That's right. And that's the advantage, you're essentially creating multiple links to a variety of standardized data sources. And so the Census data will give you some things about household structures, so on and so forth. If you have other sources of administrative data or survey data that you can integrate with this sort of larger schemata of information, it maintains the organization of information so that the process of going through and generating questions isn't delayed because you have to get the information together. It allows you to do things in a very interactive way and change your perspective in a real-time sort of way whereas in the old days it would take a long time to get the information.

During the remainder of the session, there was some discussion of mapping software, including costs and limitations. Ann Segal said that Jake Jacobson of Charlotte, North Carolina, was using a variety of mapped data, including foster children's residence, Medicare receipt, bus routes, utility shut-offs, and other information, to help plan multi-generational daycare centers that would serve as both senior centers and child care centers. In addition to mapping, he evaluated children and used other tools. Segal noted that Jacobson's maps are not public documents but are used for internal planning purposes, which is a way of protecting confidentiality.