The agencies' databases were linked on the basis of identifying information that they held in common about each individual. Determining which data were necessary and reliable was itself a challenge: it required extensive review of the existing data, as well as cooperation from the human service agencies in granting access to their administrative data.
Deterministic vs. Probabilistic Matching
The greatest technical challenge involved in creating an integrated database is accurately linking the records of individual clients across agencies. This process is complicated by the fact that no single variable can be relied upon to establish the identity of a child from the records of various agencies. Though each child receiving a service is typically given an identification number (ID) unique to a particular program, each agency and department uses its own system of identification numbers. Indeed, a single agency may issue a single client more than one ID, since IDs may be assigned each time a case is opened or a child or family receives services. Other variables that might be used to establish an "all-or-nothing" match are equally problematic: even names and birth dates that "match perfectly" may refer to two different individuals, as a result of incorrectly entered data or other human error.
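The weaknesses of an "all-or-nothing" match can be illustrated with a minimal Python sketch. The records, field names, and the deterministic_match helper below are hypothetical and invented for illustration; they are not drawn from the agencies' data.

```python
# Hypothetical records: agency_a and agency_b describe the same child,
# but each agency issues its own ID and one surname was keyed in differently.
agency_a = {"id": "A-1042", "first": "Maria", "last": "Gonzalez", "dob": "1991-03-14"}
agency_b = {"id": "B-77310", "first": "Maria", "last": "Gonzales", "dob": "1991-03-14"}

# Hypothetical record for a different child whose name and birth date
# happen to be recorded identically to agency_a's.
other_child = {"id": "C-5521", "first": "Maria", "last": "Gonzalez", "dob": "1991-03-14"}


def deterministic_match(rec1, rec2, keys):
    """Return True only if every listed field agrees exactly."""
    return all(rec1[k] == rec2[k] for k in keys)


# The same child is missed: IDs are agency-specific and the surname has a typo.
missed_by_id = deterministic_match(agency_a, agency_b, ["id"])
missed_by_name = deterministic_match(agency_a, agency_b, ["first", "last", "dob"])

# A different child is wrongly linked: name and birth date agree exactly.
false_link = deterministic_match(agency_a, other_child, ["first", "last", "dob"])

print(missed_by_id, missed_by_name, false_link)
```

In this sketch the exact-match rule both misses a true match and accepts a false one, which are the two failure modes described above.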
The most reliable means of matching records proves to be a process called probabilistic record-matching, first developed by researchers in the fields of demography and epidemiology (Newcombe, 1988; Winkler, 1988; Jaro, 1985, 1989; Baldwin, Acheson, & Graham, 1987). Probabilistic record-matching is based on the assumption that no single match between variables common to the source databases will identify a child with complete reliability. Instead, probabilistic record-matching calculates the probability that two records belong to the same child using multiple pieces of identifying information. Such identifying data may include name, birth date, gender, race/ethnicity, and county of residence. When multiple pieces of identifying information from two databases agree, the probability of a correct match increases. A few commercial software programs perform record-matching and can be customized to perform matches between two databases. The software program called Automatch was used for this analysis.
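The general idea can be sketched in Python using one common formulation, in which each field contributes an agreement or disagreement weight and the weights are summed into a score. This is an illustrative sketch only, not a description of how Automatch is implemented. The field names, the m and u probabilities, the threshold, and the sample records are all assumptions chosen for the example, not values from this analysis.

```python
import math

# Assumed, illustrative probabilities for each field:
#   m = P(field agrees | the two records belong to the same child)
#   u = P(field agrees | the two records belong to different children)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.95, 0.02),
    "birth_date": (0.97, 0.001),
    "gender": (0.99, 0.50),
    "county": (0.90, 0.10),
}

# Assumed cutoff: pairs scoring above it are treated as likely matches.
THRESHOLD = 10.0


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over all compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


# Hypothetical records: rec_b is the same child as rec_a with a misspelled
# surname; rec_c is a different child of the same gender and county.
rec_a = {"last_name": "Gonzalez", "first_name": "Maria",
         "birth_date": "1991-03-14", "gender": "F", "county": "Cook"}
rec_b = {"last_name": "Gonzales", "first_name": "Maria",
         "birth_date": "1991-03-14", "gender": "F", "county": "Cook"}
rec_c = {"last_name": "Nguyen", "first_name": "Linh",
         "birth_date": "1993-08-02", "gender": "F", "county": "Cook"}

w_same = match_weight(rec_a, rec_b)
w_diff = match_weight(rec_a, rec_c)
print(f"same child: {w_same:.2f}, different child: {w_diff:.2f}")
```

Under these assumed values, the misspelled surname lowers the score for the true pair but does not rule it out, because agreement on the other fields outweighs the single disagreement. Fields that rarely agree by chance, such as birth date, carry more weight than fields that often do, such as gender.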