Against this backdrop, our recommendations fall naturally into three categories: legal, technical, and institutional. In our interviews, and in those reported in another study (21), we found differences of opinion about the proper set of prescriptions. One perspective holds that data access will work only if there is a specific legislative mandate requiring it; otherwise, agencies will have no incentive to solve the many problems posed by efforts to make data more accessible. The other perspective holds that merely requiring public agencies to make data available does not give them the capacity to do so. On this view, the priority should be to provide the tools and resources needed to support research access to administrative data, with sparing use of statutory mandates. There seems to be some truth in both perspectives, and our recommendations draw on both.
Two sets of legal issues seem most pressing to us:
- Develop model state legislation allowing researchers to use administrative data. Although we have some models for legislation that would help researchers gain access to data, we do not have a thoroughgoing legal analysis of what it would take to facilitate access while protecting confidentiality. We strongly suspect, for example, that such legislation must carefully distinguish research from other uses by providing a suitable definition of research. In addition, it must describe how researchers could request data, who would decide whether they can have access, how data would be delivered to them, and how the data would be safeguarded. At the federal level, H.R. 2885, "The Statistical Efficiency Act of 1999," appears to provide an important means for improving researcher access to confidential data.
- Clarify the legal basis for research and matching with administrative data, with special attention to the role of informed consent and Institutional Review Boards (IRBs). Most projects using administrative data have relied on "routine use" and "program purposes" clauses to obtain access to the data, but IRBs prefer to base permission to use data on informed consent, which is typically not obtained for administrative data. These two approaches are somewhat at odds, and they have already begun to collide in cases where IRBs have been reluctant to let researchers use data collected without informed consent. Yet informed consent may not be the best way to protect administrative data, because it is difficult to ensure that subjects are fully informed about the benefits and risks of using these data for research. At the same time, "routine use" and "program purposes" clauses may not be the best vehicle either. Innovative legal thinking about these issues would be useful and might provide the basis for implementing our first recommendation.
New techniques may make it easier to protect data while making them accessible to researchers:
- Develop better methods for data alteration, especially "simulated" data. Although opinions differ about the usefulness of simulated data, there is general agreement that simulated data would at least help researchers get a "feel" for a data set before they go to the time and trouble of gaining access to a confidential version. It would be very useful to develop a simulated data set from one state's administrative data and then assess both how useful the data are for researchers and how well they protect confidentiality.
- Develop "thin clients" that would allow researchers to reach secure sites where research with confidential data could be conducted. Another model for protecting data is to provide access through terminals, called "thin clients," that are linked to special servers where the confidential data reside. The linkages would provide strong password protection and ongoing monitoring of data usage. All data would reside on the server, and the software would allow only certain kinds of analysis. As a result, agencies would have an ongoing record of who accessed what data, and they would be able to block some forms of sensitive analysis, such as disclosure matching.
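To make the thin-client model more concrete, here is a minimal Python sketch of the server side. It rests on assumptions of our own: the class `SecureDataServer`, the `ALLOWED_OPERATIONS` allowlist, and the `query` method are hypothetical names used for illustration, not part of any existing system. The sketch shows only two of the properties described above. Every request is written to an audit log, and only allowlisted aggregate analyses run, so record-level extraction (one route to disclosure matching) is refused. Password protection, networking, and the terminal itself are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import mean

# Hypothetical allowlist: only these aggregate analyses may run on the server.
# Record-level requests (e.g. listing individual rows) are refused, which is one
# simple way to make disclosure matching harder.
ALLOWED_OPERATIONS = {"count", "mean"}


@dataclass
class SecureDataServer:
    """Toy model of a server that holds confidential records for thin-client users."""

    records: list[dict]
    audit_log: list[dict] = field(default_factory=list)

    def query(self, user: str, operation: str, variable: str,
              group_by: str | None = None) -> dict:
        allowed = operation in ALLOWED_OPERATIONS
        # Every request is logged, permitted or not, so the agency keeps an
        # ongoing record of who asked for what.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "operation": operation,
            "variable": variable,
            "group_by": group_by,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"operation {operation!r} is not permitted")

        # Group the requested variable's values; only aggregates leave the server.
        groups: dict = {}
        for rec in self.records:
            key = rec[group_by] if group_by else "all"
            groups.setdefault(key, []).append(rec[variable])
        if operation == "count":
            return {k: len(v) for k, v in groups.items()}
        return {k: mean(v) for k, v in groups.items()}


# Usage with made-up records:
server = SecureDataServer(records=[
    {"county": "A", "months_on_welfare": 12},
    {"county": "A", "months_on_welfare": 6},
    {"county": "B", "months_on_welfare": 24},
])
print(server.query("researcher1", "mean", "months_on_welfare", group_by="county"))
try:
    server.query("researcher1", "list_records", "months_on_welfare")
except PermissionError as err:
    print("refused:", err)
print(len(server.audit_log), "requests logged")
```

Restricting users to aggregates is only a first line of defense. A real deployment would also need authentication and more careful output review, because combinations of aggregate queries can still disclose information about individuals.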
The primary lesson of our interviews with those doing Welfare Leavers Studies is that institutional factors can contribute enormously to the success or failure of an effort to use administrative data:
- Support agency staff who can make the case for research uses of administrative data. There is a large and growing infrastructure to protect data, but there is no corresponding effort to support staff who can make the case for research uses of administrative data. Without such staff, agencies may find it much easier to reject data requests, even when they are justified on legal and practical grounds.
- Support the creation of state data archives and data brokers who can facilitate access to administrative data. One way to build a critical mass of people who can help researchers is to develop data archives and data brokers whose job is to collect data and make them available both within the agency and to outside researchers. In our presentation of Data Access Principle 5, we described several models for creating central clearinghouses that negotiate and assist with the legal and technical issues related to data access. A data archive, or data warehouse, stores data from multiple state agencies, departments, and divisions. In some cases, an archive matches the data and provides requesters with match-merged files. In other cases, the archive simply provides a place where data from multiple agencies are stored, so that requesters can obtain the data from one source and match them themselves. Data brokers, by contrast, do not store data from other agencies; they "broker," or "electronically mine," data from other agencies on an ad hoc or regular basis. These organizations then analyze the data and report the results back to the requesting agency. The data are stored only temporarily at the data broker's site before being returned to the providing agency or destroyed.
- Support the creation of university-based research data centers. Another model worth exploring is university-based research data centers patterned after the Census Bureau's Research Data Centers (CRDCs). These centers, located around the country, provide sites where researchers can use nonpublic Census data; in return, the researchers help improve the quality of Census data by evaluating new ways to push the data to their limits. The centers are locked, secure facilities where researchers can work with microdata, but only after they have submitted a proposal indicating how their work will help improve the data and have signed a contract promising to meet all the confidentiality obligations required of Census Bureau employees. Once they have cleared these hurdles, they can work with the data in the CRDC facility, but they can remove output only after it has undergone disclosure analysis by an on-site Census Bureau employee. A similar model could be developed for administrative data.
- Use contract law to provide licenses, and criminal and civil law to provide penalties for misuse of data. Licensing arrangements would allow researchers to use data at their own workplaces. Researchers would describe their research and justify the need for restricted data, identify those who will have access to the data, submit affidavits of nondisclosure signed by those with such access, prepare and execute a computer security plan, and sign a license agreement binding themselves to these requirements. Criminal penalties could be invoked for confidentiality violations. This model would work especially well for discouraging matching in cases where unique identifiers have been removed from the data but other key identifiers remain, so that matching with outside data could still re-identify individuals.
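The data archive model described above, in which an archive matches records from several agencies and gives requesters a match-merged file, can be sketched in a few lines of Python. This is a minimal illustration built on assumptions that do not come from the text. The function `match_merge`, the field names, and the toy records are all hypothetical. Each agency file is assumed to share a common person identifier and to hold at most one record per person. The archive is also assumed to drop that identifier and substitute a study ID before release.

```python
import random


def match_merge(agency_files: dict, id_field: str) -> list:
    """Match records from several agencies on a shared identifier.

    Assumes each agency file holds at most one record per person. Only people
    found in every file are kept (an inner join), and the shared identifier is
    replaced by a study ID before the merged file is released.
    """
    # Index each agency's records by the shared identifier.
    indexed = {
        agency: {rec[id_field]: rec for rec in records}
        for agency, records in agency_files.items()
    }
    common_ids = list(set.intersection(*(set(idx) for idx in indexed.values())))
    # Assign study IDs in random order so they do not reveal the original IDs' order.
    random.SystemRandom().shuffle(common_ids)

    merged = []
    for study_id, person_id in enumerate(common_ids, start=1):
        row = {"study_id": study_id}
        for agency, idx in indexed.items():
            for key, value in idx[person_id].items():
                if key != id_field:
                    # Prefix each variable with its source agency to avoid name clashes.
                    row[f"{agency}.{key}"] = value
        merged.append(row)
    return merged


# Usage with made-up records from two hypothetical agency files:
welfare = [
    {"person_id": "p1", "exit_month": "1998-06"},
    {"person_id": "p2", "exit_month": "1998-09"},
]
ui_wages = [
    {"person_id": "p1", "q3_earnings": 3100},
    {"person_id": "p3", "q3_earnings": 2500},
]
print(match_merge({"welfare": welfare, "ui_wages": ui_wages}, id_field="person_id"))
```

Linking real administrative records is usually harder than an exact join on a clean identifier. Dropping the identifier also does not make the merged file safe by itself, because the remaining fields may still allow re-identification.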