When we began our examination of automated record-keeping operations, we expected that we could leave out entirely data systems maintained exclusively for statistical reporting or research. We were mindful that in the mid-1960's a series of proposals4 to establish a national statistical data center had alerted the public to some of the dangers inherent in computer-based record-keeping operations. We also knew that the Freedom of Information Act contains no clear statement of Congressional intent with respect to the disclosure of individually identifiable data maintained for statistical reporting and research. We had assumed, however, that statistical-reporting and research data systems, by and large, would not contain data in personally identifiable form, and that if they did, the anonymity of individual data subjects would be protected by specific statutory safeguards. We were not prepared for the discovery that in many instances files used exclusively for statistical reporting and research do contain personally identifiable data, and that the data are often totally vulnerable to disclosure through legal process. This holds for data in Federal agency files as well as for data in the possession of State agencies and private research organizations.
Changes in social policy, which computer technology has to some extent facilitated, are in large part responsible for the existence of unprotected statistical-reporting and research files. Since the late 1950's, the Federal Government has been distributing increasingly large sums of money to the States on the basis of formulas that take account of special population characteristics. The recipient State governments, in turn, have been redistributing this money among their own political subdivisions, using grant-in-aid formulas that tend to generate new requirements for statistical data about people at nearly every level of government. Often coupled with these grants, moreover, have been planning requirements demanding highly detailed information about the populations of small geographic areas.
Program evaluation requirements, first levied on grant-in-aid recipients by Federal agencies and later explicitly written into some of the agencies' authorizing legislation, have been a further stimulus to the proliferation of statistical-reporting and research files containing data about people. From their initial emphasis on simple input accounting (how much was spent, by whom, for what purpose, on how many people, with which characteristics), evaluation studies have rapidly come to focus on measuring program effects.5 Because effects measurement usually requires before-and-after data on program participants, it has become necessary to preserve individual identities in evaluation research files. Interest in the specific events and processes that may account for changes in participant behavior over time has also grown along with interest in output measurement. Many of the factors that account for a participant's behavior are so subtle that they can only be isolated if records of people's movements and experiences are kept over an extended period.
A third factor that has enlarged the number of data files containing information about identifiable individuals is the broad support given to fundamental research in the social and biomedical sciences. In fact, files for research in these two areas may be the most numerous of all, and they exist in a variety of settings. Many such files are coming into the possession of government agencies as a consequence of contract arrangements that make agencies the proprietors of data generated in government-supported research and demonstration projects. Not all of these files contain information that identifies individual data subjects, but of those that do, the ones dealing with controversial social and political issues are particularly vulnerable to misuse in the absence of specific statutory safeguards.