Core Dataset Project: Child Welfare Service Histories. Data linking and merging


Probabilistic record-linkage

The most reliable means of matching records is a process called probabilistic record-matching, first developed by researchers in the fields of demography and epidemiology (Newcombe, 1988; Winkler, 1988; Jaro, 1985, 1989; Baldwin, Acheson, & Graham, 1987). Probabilistic record-matching is based on the assumption that no single match between variables common to the source databases will identify a child with complete reliability. Instead, probabilistic record-matching calculates the probability that two records belong to the same child using multiple pieces of identifying information. Such identifying data may include name, birthdate, gender, race/ethnicity, and county of residence. The more pieces of identifying information that agree between two databases, the higher the probability of a correct match. A few commercial software programs perform record-matching and can be customized to perform matches between two databases.
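
The probability calculation described above can be sketched roughly as follows, in the style of the Fellegi-Sunter model commonly used for probabilistic linkage. This is an illustration only, not the procedure used in this project: the field names, the m and u probabilities, the prior, and the sample records are all assumed values chosen for demonstration.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | the two records belong to the same child)
#   u = P(field agrees | the two records belong to different children)
FIELD_PROBS = {
    "name":      (0.95, 0.01),
    "birthdate": (0.97, 0.003),
    "gender":    (0.99, 0.5),
    "race":      (0.95, 0.3),
    "county":    (0.90, 0.05),
}

def match_weight(rec_a, rec_b):
    """Sum the log2 likelihood ratios over fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total

def match_probability(weight, prior=0.001):
    """Convert a weight to a posterior match probability.

    `prior` is an assumed share of candidate pairs that are true matches.
    """
    prior_odds = prior / (1 - prior)
    odds = prior_odds * 2 ** weight
    return odds / (1 + odds)

# Made-up records: everything agrees except county of residence.
foster_rec = {"name": "JANIE SMITH", "birthdate": "1990-04-12",
              "gender": "F", "race": "W", "county": "COOK"}
mental_health_rec = {"name": "JANIE SMITH", "birthdate": "1990-04-12",
                     "gender": "F", "race": "W", "county": "LAKE"}

weight = match_weight(foster_rec, mental_health_rec)
print(round(weight, 2), round(match_probability(weight), 4))
```

No single field decides the outcome here. Agreement on a highly discriminating field such as birthdate adds far more weight than agreement on gender, and one disagreement lowers the total without ruling the pair out.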

Once a match has been determined, a unique number is assigned to the matched records so that each record can be uniquely identified. The end result of computer matching is a new file, in our parlance a "link-file", which contains the unique number assigned during matching, the child's identifying data (name, birthdate, race/ethnicity, gender, and county of residence), and all the identification numbers assigned by agencies from which the child received service. For example, if Janie Smith has been a foster child and received mental health services, the new file will contain her foster care and mental health ID numbers, her new unique number, and her name, birthdate, race/ethnicity, gender, and county of residence. In the aggregate, link-files serve to establish the relationships among data in source databases and provide a means of retrieving groups of records that meet specific criteria.
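
The structure of a link-file can be sketched as below. This is a simplified illustration, not the project's actual file layout: the column names, the ID formats, and every value other than the name Janie Smith are hypothetical.

```python
from itertools import count

def build_link_file(matches):
    """Build link-file rows from matched records.

    Each match is a dict holding the child's identifying data under
    "identity" and a mapping of agency name to agency-assigned ID
    under "agency_ids" (both key names are assumptions of this sketch).
    """
    next_id = count(1)  # source of the new unique numbers
    link_file = []
    for match in matches:
        row = {"link_id": next(next_id)}
        row.update(match["identity"])
        for agency, agency_id in match["agency_ids"].items():
            row[f"{agency}_id"] = agency_id
        link_file.append(row)
    return link_file

# Made-up matches for demonstration.
matches = [
    {"identity": {"name": "Janie Smith", "birthdate": "1990-04-12",
                  "race_ethnicity": "W", "gender": "F", "county": "Cook"},
     "agency_ids": {"foster_care": "FC-0001", "mental_health": "MH-0042"}},
    {"identity": {"name": "Sam Jones", "birthdate": "1988-11-30",
                  "race_ethnicity": "B", "gender": "M", "county": "Lake"},
     "agency_ids": {"foster_care": "FC-0002"}},
]

link_file = build_link_file(matches)

# Retrieve children who appear in both foster care and mental health data.
in_both = [row for row in link_file
           if "foster_care_id" in row and "mental_health_id" in row]
print([row["name"] for row in in_both])
```

Because each row carries every agency ID for one child, a selection like `in_both` can be used to pull the corresponding records from each source database.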

Child protection and child welfare data are stored in two separate systems in Illinois:

CANTS, the Child Abuse and Neglect Tracking System, and CYCIS, the Child and Youth Centered Information System. Since children are identified by different IDs in CANTS and CYCIS, we have applied probabilistic record-linkage techniques to identify the same child in both systems. Children are matched on a host of variables (name, address, date of birth, sex, and race), each of which can be given a different weight to allow for data entry error.
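
One way weighted matching can tolerate data entry error is sketched below. This is an assumption-laden illustration rather than the method applied to CANTS and CYCIS: the weights, the similarity floor, the choice of Python's standard `difflib` for string comparison, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical (agreement, disagreement) weights for each matching variable.
WEIGHTS = {
    "name":    (6.0, -4.0),
    "address": (4.0, -1.0),
    "dob":     (8.0, -5.0),
    "sex":     (1.0, -3.0),
    "race":    (1.5, -2.0),
}
FUZZY_FIELDS = {"name", "address"}  # free-text fields prone to typing errors
SIMILARITY_FLOOR = 0.8              # assumed cutoff for "close enough"

def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def field_score(field, a, b):
    """Score one field; near matches on fuzzy fields earn partial credit."""
    agree_w, disagree_w = WEIGHTS[field]
    if field in FUZZY_FIELDS:
        s = similarity(a, b)
        return agree_w * s if s >= SIMILARITY_FLOOR else disagree_w
    return agree_w if a == b else disagree_w

def total_score(rec_a, rec_b):
    """Sum the field scores across all matching variables."""
    return sum(field_score(f, rec_a[f], rec_b[f]) for f in WEIGHTS)

# Made-up records: the second spells the surname differently.
cants_rec = {"name": "Janie Smith", "address": "12 Oak St",
             "dob": "1990-04-12", "sex": "F", "race": "W"}
cycis_rec = dict(cants_rec, name="Janie Smyth")
other_rec = dict(cants_rec, name="Robert Jones")

print(total_score(cants_rec, cants_rec),
      total_score(cants_rec, cycis_rec),
      total_score(cants_rec, other_rec))
```

Under these assumptions, a one-letter spelling difference costs only a fraction of the name weight, while a clearly different name incurs the full disagreement penalty.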

Michigan assigns a recipient ID to every person receiving state services. As a result, its child protection and child welfare data systems were already "linked".

