Minutes of the Technical Assistance Workshop, May 3-5, 2000. Topic 1: Data Linkage


Bong Joo Lee of Chapin Hall spoke at this session. Each participant introduced herself or himself and explained why she or he was attending. As a prelude, Lee defined a data warehouse as an integrated information system used primarily for decision support, in contrast to a client-tracking system. Client-tracking systems need real-time capability; a data warehouse does not yield real-time data and cannot be made to do so. Instead, a data warehouse links data from a variety of sources to form a coherent picture.

Administrative Data

Lee pointed out two advantages that administrative data offer over survey data. First, they provide information on a complete population of interest rather than a sample, which allows economical analyses of very small subpopulations. Second, expense is further reduced because administrative data are collected as an integral part of the functioning of government departments and agencies. If administrative data can be used, there is no need to incur the additional expense of collecting new data.

Record Linking

Linking records involves:

  • Standardizing and cleaning data to ensure compatibility and avoid duplicate records.
  • Selecting a deterministic or probabilistic linking method.
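
The standardizing and cleaning step can be illustrated with a minimal sketch. The minutes do not record which rules were used, so everything here is an assumption made for illustration: the field names `name` and `dob`, the accepted date layouts, and the helper names `standardize_name`, `standardize_date`, and `clean_records` are all hypothetical.

```python
import re
from datetime import datetime


def standardize_name(name):
    """Upper-case a name, drop non-letter characters, collapse whitespace."""
    letters_only = re.sub(r"[^A-Za-z\s]", "", name).upper()
    return " ".join(letters_only.split())


def standardize_date(text):
    """Return an ISO date string, or None if no assumed layout fits."""
    # Assumed input layouts; a real agency file may use others.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%m-%d-%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(records):
    """Standardize each record and keep only the first copy of each duplicate."""
    seen = set()
    cleaned = []
    for rec in records:
        key = (standardize_name(rec["name"]), standardize_date(rec["dob"]))
        if key not in seen:
            seen.add(key)
            cleaned.append({"name": key[0], "dob": key[1]})
    return cleaned
```

With these rules, two records that differ only in capitalization, punctuation, or date layout collapse into one, which is the kind of compatibility the first bullet describes.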

Deterministic linking methods give equal weight to different types of information a record may contain. For example, a deterministic approach might place equal reliance on a match between the names on two records or a match between two birth dates.
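
As a rough sketch of the deterministic idea described above, each compared field can count as one agreement, with no field counting more than another. The function name `deterministic_match`, the field names, and the `required` threshold are hypothetical and are not taken from the session.

```python
def deterministic_match(rec_a, rec_b, fields=("name", "dob"), required=None):
    """Link two records when enough fields agree exactly.

    Every field contributes the same weight (one agreement each).
    By default all listed fields must agree.
    """
    if required is None:
        required = len(fields)
    agreements = sum(
        1
        for f in fields
        if rec_a.get(f) is not None and rec_a.get(f) == rec_b.get(f)
    )
    return agreements >= required


# Usage: identical name and birth date link; a differing birth date does not.
a = {"name": "MARY OBRIEN", "dob": "2000-05-03"}
b = {"name": "MARY OBRIEN", "dob": "2000-05-03"}
c = {"name": "MARY OBRIEN", "dob": "2000-05-08"}
linked_ab = deterministic_match(a, b)
linked_ac = deterministic_match(a, c)
```

Because a name match and a birth date match are interchangeable here, lowering `required` to 1 would link records on either field alone, which shows why equal weighting can be too blunt.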

Probabilistic approaches allow the researcher to exploit the fact that a match on a particular item may be more or less likely to indicate that the individuals named on two records are, in fact, the same individual. For example, a birth date can be corrupted by a mistake in a single digit, and the number of possible birth dates is relatively small. Names, in contrast, are more likely to remain recognizable even if a single error is made. Probabilistic linking allows the researcher to weight matches on each type of information appropriately.
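
One common way to weight fields, not named in the minutes and used here only as an illustration, is a log-ratio score in the style of Fellegi and Sunter. For each field, m is the assumed probability that the field agrees when two records belong to the same person, and u is the assumed probability that it agrees by chance for different people. The m and u values below are made-up placeholders, and the names `FIELD_PROBS`, `field_weight`, and `match_score` are hypothetical.

```python
import math

# Illustrative (m, u) pairs only; real values would be estimated from data.
FIELD_PROBS = {
    "name": (0.95, 0.001),  # chance agreement on a full name is assumed rare
    "dob": (0.90, 0.01),    # chance agreement on a birth date is assumed likelier
}


def field_weight(field, agrees):
    """Log2 weight for one field: positive on agreement, negative otherwise."""
    m, u = FIELD_PROBS[field]
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(rec_a, rec_b):
    """Sum the per-field weights; higher scores suggest the same individual."""
    return sum(
        field_weight(f, rec_a.get(f) == rec_b.get(f)) for f in FIELD_PROBS
    )
```

Under these assumed values, a pair that agrees on name but not birth date scores higher than a pair that agrees on birth date but not name. A researcher would then choose a score threshold above which pairs are treated as links.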