The obverse of the problem of data confidentiality is the need to make basic data more accessible for reuse or reanalysis by all qualified persons or institutions. Personal data systems for statistical reporting and research are largely in the hands of institutions that wield considerable power in our society. Hence, it is essential that data which help organizations to influence social policy and behavior be readily available for independent analysis.
The ubiquitous computer has increased both the quantity of data potentially available to users and the number of potential users. Unfortunately, the data dissemination capability of many funding and collecting institutions has not grown commensurately. Among the general purpose statistical operations of the Federal government, the Census Bureau has led the way in making data from standard statistical series easily available to users in a form that protects the anonymity of respondents. Other agencies, notably the National Center for Health Statistics, have followed suit.7 The Department of Health, Education and Welfare is currently preparing a guidebook of its "public use" data files.8
Laudable as these efforts are, it should be emphasized that they are being made, for the most part, by agencies or offices within agencies whose primary mission is statistical reporting and research. They do not address the problem of access to the statistical reporting and research files that operating agencies develop in the course of evaluating programs or in adding to the general knowledge of program administrators. It is true, as noted earlier, that anyone with enough money, time, and perseverance can probably gain access to substantial amounts of data not generally available for public use. Yet the individual researcher, or the independent critical expert, however perseverant, may not even know that important data exist, much less where to find them. If he does find them, and if he can afford to have them put in usable form, the documentation may not be sufficient to permit reconstruction of the conditions and suppositions under which the data were collected. An agency holding data collected under a pledge of confidentiality may not be willing to go to the trouble (or may itself not be able to afford the cost) of expunging elements that would serve to identify individual data subjects in order to make the data available.
In principle, there need be no conflict between informing the public about how the government conducts its business and protecting individual data subjects from harm. If data cannot be made available for reuse or reanalysis without disclosing the identity of data subjects, special precautions may have to be taken before making basic data accessible to qualified persons outside the collecting organization, but such precautions can be taken. For example, each data subject could be asked at the time of the initial data collection if he would consent to participate in a follow-up study, on the understanding that consent would be sought anew each time a further follow-up study is undertaken. Although such arrangements may add to the expense and difficulty of some data collections, a public institution that uses scientific approaches and methods has a duty to make the work it sponsors or supports available for critical appraisal.
Making fully documented data available for reuse and reanalysis by persons competent to assess the interpretations that have been made of them can bring two benefits. First, the knowledge that other investigators will have an early opportunity to challenge the original conclusions should tend to heighten the quality of the original collection and analysis. Second, advances in the sciences may produce more powerful techniques of analysis that make it possible to glean additional information from the data when they are re-examined.