Key Themes: Reflections from the Child Indicators Projects. Political, Legal, and Technical Issues In Data Linking: Reflections From The Child Indicators Project


Mairéad Reidy, Ph.D.

Senior Research Associate
Chapin Hall Center for Children
University of Chicago
(773) 256 5174 (phone)

This short paper is based on discussions between the fourteen states participating in the ASPE Child Indicators Project. It focuses on state reflections on the political, legal, ethical, and technical challenges they face in data linking. It is not a comprehensive review of the challenges of data linking but rather focuses on those issues pertinent to participating states and discussed during the Child Indicators Technical Assistance workshops.

Sponsored by the U.S. Department of Health and Human Services (HHS) Office of the Assistant Secretary for Planning and Evaluation (ASPE), with additional support from the Administration for Children and Families (ACF) and The David and Lucile Packard Foundation, the Child Indicators Project has aimed over the past three years to promote state efforts to develop and monitor indicators of the health and well-being of children during this era of shifting policy. The fourteen participating states are Alaska, California, Delaware, Florida, Georgia, Hawaii, Maine, Maryland, Minnesota, New York, Rhode Island, Utah, Vermont, and West Virginia. Chapin Hall Center for Children provided technical assistance to grantees. Grantees typically exchanged knowledge and expertise through a series of technical assistance workshops coordinated by and held at Chapin Hall Center for Children. The workshops encouraged peer leadership and collaboration among states and gave states an opportunity to work with and learn from one another on areas of common interest. This short paper draws on the discussions at these meetings as well as individual consultations with states. I am grateful to participants for sharing their insights.

Purposes of Linked Data

  • The broad goal of linking or integrating administrative records among Child Indicator Initiative participants is to generate new knowledge about the prevalence and patterns of service use of children and their families.

    Sometimes data linking is necessary to satisfy federal reporting requirements, but more typically it is done to answer specific questions about outcomes or service utilization among clients. The Minnesota Department of Human Services, for example, has linked TANF data with Medicaid, housing, employment, and child support records, both for the TANF federal report and for use in a TANF longitudinal study. It has also linked Medicaid records with SSI records to help identify children with disabilities who are not receiving Medicaid services.
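    As a rough illustration of the second Minnesota example, the sketch below finds records that appear in one file but have no counterpart in another. The record layout, the field name `client_id`, and the sample values are hypothetical, and the sketch assumes both files already share a reliable common identifier, which is often the hardest part in practice.

```python
def unmatched_records(ssi_records, medicaid_records, key="client_id"):
    """Return SSI records whose key does not appear in the Medicaid file."""
    # Build the set of identifiers present in the Medicaid file once,
    # so each lookup below is a constant-time membership test.
    medicaid_ids = {rec[key] for rec in medicaid_records}
    return [rec for rec in ssi_records if rec[key] not in medicaid_ids]


# Hypothetical sample data, for illustration only.
ssi = [{"client_id": "A1"}, {"client_id": "B2"}, {"client_id": "C3"}]
medicaid = [{"client_id": "A1"}, {"client_id": "C3"}, {"client_id": "D4"}]

not_on_medicaid = unmatched_records(ssi, medicaid)
```

    Here `not_on_medicaid` holds the single record for client "B2", the only SSI client absent from the Medicaid file.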

Data Sharing Across Agencies

There is a general consensus that, although great progress has been made on the technological aspects of data linking and on establishing a common identifier, many questions remain unanswered regarding the political and legal challenges around confidentiality.

Political Concerns with Data Sharing

  • There is a need to build bridges between state initiatives, state agencies, and communities to help promote buy-in on the importance of data sharing by all responsible agencies, and to ensure public engagement in the sharing process.
  • The benefits of data linking must be articulated more clearly in order to rally support at the agency level. When approaching agencies to ask for data, it is essential to be up front about these benefits.
  • Even when there is buy-in on the importance of data sharing, agencies that own the data often have not established rules for sharing it, so their decisions to share can seem arbitrary. Some speculate that staff at certain agencies are reluctant to share data for tracking purposes because of the additional workload such an agreement would entail. Rules for data sharing would be welcomed as a way to avoid such arbitrariness and speculation.

Legal Concerns with Data Sharing Across Agencies

States noted the following legal concerns in their data linking work:

  • There is considerable variation in privacy laws across states.
  • When planning new data collection or the integration of administrative data, states suggest taking the following steps:
    1. Know the requirements of confidentiality before collecting data.
    2. Use lawyers as consultants at the planning stage, as this can help determine how far the study and data collection can go without violating laws.
    3. Examine all aspects of obtaining active or passive consent before settling on one form of obtaining consent.
  • Participants suggested a variety of approaches to sharing data legally; each raised its own concerns.
    1. Informed Consent
      An informed consent process could empower a family to approve the use and sharing of data related to them. Questions were raised about whether families understand the rights they are signing away. In legal circles, there are claims that an individual must understand what he or she is signing away for that act to be binding.
    2. Use of Social Security Numbers or 'Blocking' to Link Data
      An umbrella tracking system based on clients' Social Security numbers could allow researchers to identify when and in which services individuals enroll. However, the line where confidentiality begins and ends is blurred: some states regard the use of Social Security numbers as identifiers as a breach of federal law, and others do not. In addition, in large states such as California, fraudulent Social Security numbers are easily bought and sold. Tracking can also be done through "blocking," a process that uses a combination of individual characteristics, such as name and date of birth. This works well in some states. However, in substate jurisdictions with small populations, this kind of information could lead to disclosure of an individual's identity and thus violate confidentiality.
    3. Universal Identification Number
      Agencies could create a universal identifier based on encrypted names. This would allow tracking without breaching confidentiality. However, encryption requires high technological capacity and collaboration among agencies.
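The last two approaches can be combined in a small sketch: an identifier derived from a client's name and date of birth, so that agencies can link records without exchanging the name itself. Note that this uses a keyed hash (HMAC-SHA256) rather than encryption in the strict sense, which is a substitution made here for simplicity and not a method described by participants. The function names, the sample names, and the shared key are all hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(text):
    """Lowercase, split off accents, and drop anything that is not a letter
    or digit, so 'Renée' and 'RENEE' produce the same value."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.lower() if ch.isalnum())


def linkage_id(first_name, last_name, date_of_birth, secret_key):
    """Derive a pseudonymous identifier from name and date of birth.

    date_of_birth is expected as an ISO string such as '1995-03-14'.
    secret_key is a bytes value held only by the participating agencies
    (an assumption of this sketch).
    """
    message = "|".join(
        [normalize(first_name), normalize(last_name), date_of_birth]
    )
    digest = hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and clients, for illustration only.
key = b"hypothetical-shared-secret"
id_agency_a = linkage_id("Renée", "O'Neil", "1995-03-14", key)
id_agency_b = linkage_id("RENEE", "ONeil", "1995-03-14", key)
id_other = linkage_id("Renée", "O'Neil", "1995-03-15", key)
```

The two spellings of the same client yield the same identifier, while a different date of birth yields a different one. The sketch does nothing about misspelled names or two clients who share a name and birth date, and it does not remove the small-population disclosure risk noted above if the key becomes known.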

Criteria for a Common Identifier

California houses the largest Medicaid database in the country and is in the process of cleaning other data in the system to link them with it. Researchers in California have begun work on a common identifier for clients in the system, to minimize duplication and to allow tracking across systems once linking occurs. To that end, they have identified six criteria that the common identifier must meet:

  1. Universality. The identifier would be assigned with ease.
  2. Durability. Once assigned, the identifier would be able to follow an individual through his or her entire service-use history.
  3. Non-invasive. Assigning the identifier should not violate confidentiality.
  4. Flexibility. The identifier has to be able to move through and beyond agency boundaries with ease.
  5. Uniqueness. There must be enough digits/letters within the identifier so that it is unique to the individual.
  6. Financially feasible. The whole process has to take place under tight budgets.
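
The uniqueness criterion can be made concrete with a little arithmetic. The sketch below computes the shortest identifier that gives every client a distinct value when identifiers are assigned centrally, one per client. The function name and the population figure of 40 million are hypothetical and chosen only for illustration.

```python
def min_identifier_length(population, alphabet_size=36):
    """Smallest length n such that alphabet_size ** n >= population."""
    if population < 1 or alphabet_size < 2:
        raise ValueError("population must be >= 1 and alphabet_size >= 2")
    length = 1
    capacity = alphabet_size
    # Add one character at a time until there are enough distinct values.
    while capacity < population:
        capacity *= alphabet_size
        length += 1
    return length


# Hypothetical population of 40 million clients.
digits_only = min_identifier_length(40_000_000, alphabet_size=10)
letters_and_digits = min_identifier_length(40_000_000, alphabet_size=36)
```

Under these assumptions, eight digits suffice (10^8 = 100,000,000), or five characters if letters and digits are both used (36^5 = 60,466,176). Identifiers generated at random or derived from hashed names need to be considerably longer to keep accidental collisions rare.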

Technical Issues with Data Sharing Across Agencies

Though there are many obstacles to constructing linked data, participants maintained that the solutions lie in creativity. Everybody involved can contribute and must remain flexible. Some important considerations in establishing databases were highlighted, including:

  • It is critical to assign common identifiers at uniform points in clients' lifetimes (e.g., birth or immunizations).
  • It is important to distinguish between household and family data. An individual can live in multiple households and with multiple families. Relationship data, although difficult to maintain, is the most consistent data one can keep as far as a household is concerned: a mother will always be a mother, and foster parents remain foster parents until that status is terminated.
  • It is essential to link data incrementally.
  • When matching, there are two kinds of error rate. One is the mismatch rate, the rate at which matches are made but to the wrong people. The other is the rate at which true matches are missed, so that no match is made at all. Any matching procedure tends to produce more of one kind of error than the other. Both are inevitable; however, researchers must pay attention to why matching problems occur and adjust their matching techniques according to which type of error is most acceptable for their particular study.
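
The two error rates in the last point can be estimated when a sample of links has been verified by hand. The sketch below is one way to compute them; the function name, the choice of denominators, and the sample pairs are assumptions made for illustration, not a procedure reported by the states.

```python
def match_error_rates(proposed_links, true_links):
    """Compute the two error rates for a set of record links.

    Each link is a (record_id_in_file_a, record_id_in_file_b) pair.
    Returns (false_match_rate, missed_match_rate), where the first is
    the share of proposed links that are wrong and the second is the
    share of true links that were not found.
    """
    proposed = set(proposed_links)
    truth = set(true_links)
    false_matches = proposed - truth   # linked, but to the wrong person
    missed_matches = truth - proposed  # should have been linked, was not
    false_match_rate = len(false_matches) / len(proposed) if proposed else 0.0
    missed_match_rate = len(missed_matches) / len(truth) if truth else 0.0
    return false_match_rate, missed_match_rate


# Hypothetical hand-reviewed sample.
truth = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
proposed = [("a1", "b1"), ("a2", "b2"), ("a3", "b9")]

false_rate, missed_rate = match_error_rates(proposed, truth)
```

In this sample one of three proposed links is wrong and two of four true links are missed. Tightening the matching rules usually lowers the first rate at the cost of raising the second, which is the trade-off researchers have to weigh for each study.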

Data Warehousing: Lessons Shared About the Linking Process

The Minnesota Department of Human Services, in developing a data warehouse, put forward the following lessons learned over the course of its work:

  • Approach warehouse development incrementally.
  • When feasible, link the data in the source systems before extracting data to the warehouse.
  • Source data is most reliable when it is central to the purpose of the system that collects it.
  • Similar data submitted by disparate systems often yield unreliable comparisons.