Privacy and Health Research

A Report to the U.S. Secretary of Health and Human Services

OFFICE OF THE ASSISTANT SECRETARY
FOR PLANNING AND EVALUATION

From:
William W. Lowrance, Ph.D.
Consultant in Health Policy
18A, Quai du Seujet
CH-1201 Geneva
Switzerland
+41 22 738-4243 telephone
+41 22 738-4288 telefax
lowrance@iprolink.ch

Division of Data Policy
Office of Program Systems
Office of the Assistant Secretary for Planning and Evaluation
U.S. Department of Health and Human Services
200 Independence Avenue, SW
Washington, DC 20201

Preface

In September 1996 the U.S. Secretary of Health and Human Services, Dr. Donna E. Shalala, requested this study as background for policy decisions that her Department and American society, along with their counterparts in other countries, urgently must confront.

The study was conducted by Dr. William W. Lowrance, an external consultant, who for administrative purposes was appointed an interim government employee during the project. The project was supported by the Office of the Assistant Secretary for Planning and Evaluation.

The purposes of the study were to:

  • Identify privacy issues surrounding research on personally identifiable health data, paying special attention to the international aspects.
  • Review the ethical, legal, and general social context surrounding the privacy and confidentiality of health data.
  • Describe relevant privacy-protection practices and problems, and identify emerging issues.
  • Review the European situation, especially for American readers; and review the U.S. situation, especially for European readers.
  • Analyze, especially, the implications for the U.S. of the new European Union Data Privacy Directive and related policy and legal changes.
  • Recommend policy approaches and technical processes for ensuring that, as research proceeds to enhance the health of the public in general, the privacy of individuals is respected.

The author interviewed several hundred leaders in the U.S. and Europe, in government, academic, and private-sector research institutions; in government regulatory and public-health agencies; and in intergovernmental organizations. He also met with patient advocates, public policy experts, legal analysts, privacy advocates, and privacy commissioners. And he reviewed the relevant literature.

Everywhere, the author found deep interest in the issues, and unease about the present situation—concern that the very concept, "privacy," needs recasting; concern that health data are handled with too little respect for the people whose frailties they describe; and concern that research, disease prevention, and health care all will suffer if the current privacy, confidentiality, and security issues are not handled properly.

Executive Summary

This Report examines how society can best pursue two very important goods simultaneously: Protect individuals' privacy; and at the same time, preserve justified research access to personal health data, to gain health benefits for society.

As the fundamental nature of health care, and of health data and their uses, is changing dramatically, society must—now—examine and re-decide how much it cares about protecting health privacy. Health researchers must be certain that they are taking all reasonable measures to safeguard the data they collect and use, and to maintain the respect for privacy that is embodied in the very compact with society under which they work. And society must reformulate and update some of the rationales and criteria under which the health experience of individuals may be studied to benefit society.

Health research, compared with all the other potential avenues for intrusion, hardly threatens privacy. Many effective protections are in place. But possibilities for harm always exist. The challenge is to transpose and translate the traditional ethical and technical practices, which have served society reasonably well, to meet the contemporary demands.

Current Strains

New approaches are being taken in providing health care, which are posing new research questions and changing the setting in which much research is conducted. Much more research is now being performed on data from private-sector managed-care organizations, for instance.

Also, new approaches are being taken in research, such as elaborate computerized analysis of large multipurpose health databases. And health factors such as genetics are being explored as never before.

For many reasons, the public are rightly apprehensive about the erosion of privacy of information about their health. Among other matters, the security of computerized health records and electronically transmitted health data is not fully assured. And unwarranted disclosure can cause many harms.

Respect for individuals will be best served not by insisting on absolute privacy, which is unattainable in modern life anyway, but by seeking informed consent to reasonable use of health information under strictly delimited conditions; by safeguarding personal data carefully; by genuinely affording fair-information-use rights to data-subjects; and by enforcing sanctions against improper use.

The Diversity of Health Data

Health data of concern for ethics and policy, and for this Report, include not only primary medical and hospital records, but also pharmacy and laboratory data; vital records; administrative and financial data; data from surveys, clinical trials, adverse-drug-event reports, and outcomes and health-economics studies; registries organized by diseases, by treatment regimens, by demographics, or by other categories; and many other compilations. The data may be personally identified, or key-coded (pseudonymized), or fully anonymized.

Data from Research, Research on Data

Contemporary health research is generating a multitude of benefits for humankind, and the future benefits look at least as promising. As the above heading indicates, health research generates new data by observation and experiment, but also—in part because its questions are of such an "applied," practical nature—it often proceeds by analyzing data that were originally collected for another purpose. The two approaches can have different implications for privacy.

The purposes of research are many, and they overlap. Research is conducted:

  • To advance basic biomedical science
  • To know patterns of health, disease, and disability
  • To reduce public-health threats
  • To understand utilization of health care
  • To evaluate and improve practices
  • To make effective innovations
  • To analyze economic factors
  • To appraise markets.

The Report discusses these purposes, and the approaches, the character of the data, and the privacy-protection problems involved.

Identifiable---Key-Coded---Anonymized

From a privacy-protection perspective, there is a very wide distinction between personally identifiable data and truly anonymized data. But in practice the demarcation between these extremes is not sharp. Attending assiduously to where particular data lie on the spectrum between them, and especially to data that are somewhere in the middle, is a crucial protection strategy.

At present, large amounts of data lie in between: they are not completely anonymized, but they are not readily identified, either. The capacity of computers to perform elaborate, rapid searches, and the pressures for access, mean that merely assigning simple pseudonyms affords little protection.

For data whose identifiability has, up to now, been only lightly obscured, greater efforts must now be made either: (a) to much more effectively remove personally identifying information, or to aggregate, and thus anonymize, the data; or (b) to seek the data-subjects' informed consent and hold the data under a suitably protective regimen if identifiability is retained.

For key-coded data—that is, data for which personal identifiers are removed and secreted but which are still potentially traceable via a matching code, held separately—a variety of measures must be taken to mask the identifiability near the source, separate and lock up the identifiers, safeguard the linking codes, and carefully manage linking-back to the data-subject when it is required.
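The key-coding measures just described can be illustrated with a minimal Python sketch. It is only one possible arrangement, not a procedure prescribed by this Report: the function names `key_code` and `link_back`, the field names, and the sample records are all hypothetical, and a real system would hold the linking table in separate, secured storage rather than in the same program.

```python
import secrets

def key_code(records, id_fields):
    """Split records into coded research rows and a separately held linking table."""
    coded, linking_table = [], {}
    for record in records:
        # Random code: it is not derived from the identifiers, so it cannot
        # be reversed without access to the linking table.
        code = secrets.token_hex(8)
        linking_table[code] = {f: record[f] for f in id_fields}
        research_row = {k: v for k, v in record.items() if k not in id_fields}
        research_row["code"] = code
        coded.append(research_row)
    return coded, linking_table

def link_back(code, linking_table, authorized):
    """Trace a code back to the data-subject only when this has been authorized."""
    if not authorized:
        raise PermissionError("linking back requires authorization")
    return linking_table[code]

# Hypothetical sample data, for illustration only.
records = [
    {"name": "A. Example", "birth_date": "1950-01-01", "diagnosis": "asthma"},
    {"name": "B. Example", "birth_date": "1962-06-15", "diagnosis": "diabetes"},
]
coded, linking_table = key_code(records, id_fields=("name", "birth_date"))
```

Researchers would work only with `coded`; the custodian of `linking_table` decides when a call to `link_back` is justified.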

Reasons for Retaining Identifiability

For many purposes researchers must potentially be able to trace back, even if through intermediaries, to the data-subject. Irreversible anonymization is not necessarily desirable. There are a number of important reasons why retaining personal identifiability—either openly labelled or via key-coding—may be essential:

  • To allow technical validation of reports, such as to confirm the correspondence of various data with the data-subjects, or even to verify the very existence and identity of subjects, in order to prevent scientific errors or fraud.
  • To avoid duplicate records or redundant cases, such as to be certain that two case reports are independent and not just the same case recorded in two files.
  • To facilitate internal scientific data-quality control, such as enabling working-back to original records and ancillary data.
  • To allow case follow-up if more evidence or confirmation is needed.
  • To check data-subject consent records, or to examine Institutional Review Board stipulations or opinions on a case.
  • To allow tracking of consequences after some research intervention, so as to be able later, if necessary, to notify the patient or physician and recommend reexamination or other measures lying between research and health care.
  • To ensure accurate correspondence in linking data on data-subjects, or cases, or groups, or specimens, among different files or databases, perhaps over a long period, even over decades, and possibly to follow-on to descendants.

Consent, and IRB Review

The Federal Common Rule and other laws and regulations require many protections for human subjects of research. The main social instruments are informed consent of the data-subject, and Institutional Review Board (IRB) supervision. Both of these mechanisms have served society well. But both now need to be renewed.

For formal clinical trials and some other research, informed consent is routinely sought, Institutional Review Boards supervise the research, and other protections are enforced. But for many other kinds of research, for a variety of reasons notice is not routinely given nor explicit consent sought, and indeed these may be practically impossible to seek. The policy and pragmatic questions are obvious.

Retrospective studies, such as epidemiological reviews initiated years after the medical events, pose both special research opportunities and special ethical problems. So do secondary studies in databases. How should identifiability and consent be dealt with when such reviews are undertaken? And in general, when data are collected, how broad a consent should be sought for perhaps unanticipatable future studies?

There is no doubt that IRBs enhance research-subject protections and provide much public reassurance. They are an integral part of biomedical research. But it is less clear that IRBs have been attending as vigorously to privacy risks as they have to physical and emotional risks. For many IRBs the workload already is heavy. Now they may well have to be asked to become more deeply engaged with the privacy and confidentiality aspects of subject protection than they have been, in database research as well as in direct experimentation, and with genetic privacy. Whether they are able and willing to do so should be assessed.

Principles

The following principles are recommended for organizations that conduct, sponsor, or regulate health research involving personally identifiable data. They can be transposed into professional guidelines, standard operating principles, regulations, or laws. Criteria and procedures should be established that are specific to the context.

  • Overall in health research, cultivate an atmosphere of respect for the privacy of the people whose health experience is being studied.
  • Collect or use personally identifiable data only if the research is worthwhile and identifiability is required for scientific reasons.
  • Urge Institutional Review Boards and other ethics review bodies to become fully engaged with the privacy, confidentiality, and security aspects of subject protection, in secondary research on data as well as in direct experimentation.
  • Respect such standard fair-use practices as announcing the existence of data collections, allowing data-subjects to review data about themselves, and the like. If for scientific reasons exceptions have to be made to normal practice, this should be discussed as part of the informed-consent process before the study starts.
  • Attend sensitively to informing data-subjects and gaining informed consent.
  • Safeguard personal identifiers as close to the point of original data collection as possible.
  • Enforce a policy of "No access to personally identifiable information" as the default—then base exceptional access on need-to-know.
  • Generally limit the cordon-of-access to personally identifiable data. Allow access for formally justified research uses and to appropriate researchers. Maintain and monitor access "audit trails."
  • Remove data-subjects' personal identifiability as thoroughly as is compatible with research needs. If key-coding, aggregating, or otherwise removing personally identifying information, do so with adequate rigor.
  • Maintain proper physical safeguards and cybersecurity measures. Periodically challenge them, to test their adequacy.
  • Develop policies on seeking or allowing secondary use of personally identifiable data, and on the associated conditions and safeguards.
  • Before either (a) transferring data to other researchers or organizations, or (b) using data for new purposes, make conscientious decisions as to whether to proceed and what the privacy protections should be. Then if proceeding, implement appropriate protections.
  • Sensitize, train, and certify all personnel who handle personally identifiable data or supervise those who do. Make data stewardship responsibilities clear. Promote internal and external accountability.
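
Two of the principles above, "no access" as the default and the keeping of access audit trails, lend themselves to a short illustration. The Python sketch below is a hedged example only: the class `IdentifiableDataStore`, its methods, and the sample record are hypothetical names invented here, and a production system would need authentication, tamper-resistant logs, and much else.

```python
from datetime import datetime, timezone

class IdentifiableDataStore:
    """Holds identifiable records; every read is denied unless explicitly granted."""

    def __init__(self, records):
        self._records = dict(records)
        self._grants = set()      # (user, record_id) pairs with need-to-know
        self.audit_trail = []     # one entry per grant, read, or denial

    def _log(self, user, record_id, action, note=""):
        self.audit_trail.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "record_id": record_id,
            "action": action,
            "note": note,
        })

    def grant(self, user, record_id, justification):
        """Record an exceptional, justified grant of access."""
        if not justification:
            raise ValueError("a documented justification is required")
        self._grants.add((user, record_id))
        self._log(user, record_id, "grant", justification)

    def read(self, user, record_id):
        """Return a record only if a grant exists; log the attempt either way."""
        if (user, record_id) not in self._grants:
            self._log(user, record_id, "denied")
            raise PermissionError(f"{user} has no grant for {record_id}")
        self._log(user, record_id, "read")
        return self._records[record_id]

# Hypothetical sample record, for illustration only.
store = IdentifiableDataStore({"r1": {"name": "A. Example", "diagnosis": "asthma"}})
```

Because denied attempts are logged as well as successful reads, the audit trail can be monitored for probing as well as for legitimate use.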

Major Current Issue Clusters

The Report identifies many problem areas. The following are four large groups of issues that, while not entirely new, are growing rapidly in scale and complexity, and must urgently be attended to:

  • Secondary uses of data, and data linking
  • Research on private-sector health data
  • Cybersecurity
  • Genetic privacy.

Issue cluster: Secondary research use of data, and data linking

Secondary use is, as it sounds, use of data subsequent to the original use. Much highly beneficial health research depends on it.

As databases are maturing and increasing in size and quality, their appeal as research resources also is growing. Thus the databases of healthcare finance systems and managed-care organizations, among others, are much in demand. The data hunger of managed care, and of national healthcare systems, is insatiable. Ultimately the public will benefit from research studying these systems themselves as systems, as well as from research that uses data in the systems for external purposes.

If it is decided that personally identifiable data must be used, then the most difficult issue is consent. The Report proposes a scheme of "Consent scenarios in secondary research" in the hope that it will attract discussion and development.

Related to secondary research is data linking, in which associations (links) are made between data on the same data-subject(s) in more than one data collection. Beyond such considerations as consent, the concern about particular linking studies usually is whether they might assemble "too much" information about data-subjects or the social groups of which they are representative, even if personal identities are not revealed, and/or whether the linking can lead to data-subjects' becoming identifiable by deduction.
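
The risk of data-subjects' becoming identifiable by deduction after linking can be made concrete with a simple group-size check, of the kind commonly called k-anonymity. The Python sketch below is illustrative only and is not a method set out in this Report: the function `small_groups`, the field names, the sample rows, and the threshold `k` are all assumptions chosen for the example.

```python
from collections import Counter

def small_groups(rows, quasi_identifiers, k):
    """Return combinations of quasi-identifier values shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical linked dataset with names already removed.
linked = [
    {"zip3": "021", "birth_year": 1950, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "021", "birth_year": 1950, "sex": "F", "diagnosis": "diabetes"},
    {"zip3": "021", "birth_year": 1931, "sex": "M", "diagnosis": "melanoma"},
]
risky = small_groups(linked, ("zip3", "birth_year", "sex"), k=2)
```

Here the third row is the only one with its combination of area, birth year, and sex, so anyone who knows those three facts about a person could deduce the diagnosis even though no name appears. Such rows would call for further aggregation or suppression before release.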

Issue cluster: Research on private-sector health data

Immense volumes of personally identifiable data and lightly masked key-coded data, as well as effectively key-coded or anonymized data, are handled by managed-care organizations, pharmaceutical and related companies, and other private-sector institutions. Some State legal controls apply, as do the Privacy Act and Federal laws where there is Federal involvement.

But for many health data held in the private sector, few legal controls apply in theory or are enforced in practice regarding such matters as data-subject consent, public notification, Institutional Review Board supervision, or transfer of the data for secondary study. Effective privacy, confidentiality, and security safeguards may well be in place, but this may not be fully evident.

The status of private-sector health data deserves to be reviewed. Probably it should be brought under a uniform Federal regimen.

Issue cluster: Cybersecurity

It is not an exaggeration to say that all over the world, the protection of the confidentiality and security of health data, especially data that are stored, processed, and transferred electronically, is under review. Until the several intersecting (and perhaps conflicting) goals are clarified and these problems are resolved, the envisioned future of lifetime electronic medical databases, elaborate health-data networks, and the like, will not be realized.

These issues are very different from those surrounding the security of paper records and physical filing cabinets (although these are involved, too). Thus the rubric "cybersecurity" is used here to connote the new character of the problems.

For research, how are various consents and differential access conditions to be trailed along with various data as the data are moved around, combined with other data, linked to other data, split apart and reassorted, and processed by different users for different purposes?
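
The Report poses this as an open question. One possible convention, offered here only as a hedged Python sketch and not as an answer from the Report, is to attach the consented purposes to each data item and, whenever items are combined, to allow only the purposes that every source permits. The names `TaggedData`, `combine`, and `use`, and the purpose labels, are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedData:
    """A data item that carries the purposes its data-subject consented to."""
    values: dict
    permitted_purposes: frozenset

def combine(a, b):
    """Merge two items; the result may be used only for purposes both allow."""
    return TaggedData(
        values={**a.values, **b.values},
        permitted_purposes=a.permitted_purposes & b.permitted_purposes,
    )

def use(item, purpose):
    """Release the values only for a purpose covered by the attached consent."""
    if purpose not in item.permitted_purposes:
        raise PermissionError(f"purpose {purpose!r} is not covered by consent")
    return item.values

# Hypothetical items and purpose labels, for illustration only.
clinic = TaggedData({"diagnosis": "asthma"},
                    frozenset({"treatment", "outcomes_research"}))
pharmacy = TaggedData({"drug": "salbutamol"},
                      frozenset({"outcomes_research", "billing"}))
linked = combine(clinic, pharmacy)
```

In this example `linked` may be used for "outcomes_research" but not for "billing" or "treatment", because only the first purpose is permitted by both sources. Taking the intersection is a deliberately conservative design choice; other rules are conceivable.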

Issue cluster: Genetic privacy

As the news media constantly remind us, the world has entered an entirely new era in genetics: the human genome is being mapped, highly sensitive and precise genetic tests have been developed, genetic screening has become commonplace, and a remarkable array of genetic interventions is being explored. As an area of medicine and public-health practice, so much of the new genetics work is so innovative that for many purposes it must be considered "research."

What is changing rapidly is that we are becoming able to identify genetic factors that increase disease risk but are not uniquely the determinants of disease.

Ethical and policy solutions are being sought that will protect against using genetic data prejudicially against people's interests, such as eligibility for employment, financial credit, or health or life insurance.

Research on stored tissue samples, such as blood samples, biopsied tumor or other pathology materials, semen, and other human tissues that contain nucleated cells involves special questions. Identifiability, consent, and disclosure are the core issues.

Developing ethical guidance on genetic privacy is crucial to the future of both genetic research and applied genetics. Because genetic science is becoming more deeply integrated with other kinds of biomedical knowledge, genetic ethics must be integrated with basic biomedical ethics and not developed entirely separately.

The International Flow of Data

Personally identifiable health-research data are exchanged internationally every day, by governments, pharmaceutical firms, and others, and this will inevitably increase. Data on Americans are transferred, and American-based institutions do much transferring. Uniform international standards for protecting privacy, confidentiality, and security urgently must be developed.

New Laws in Europe

In October 1995 the European Union (E.U.) adopted a "Directive on the Protection of Individuals with regard to the Processing of Personal Data and on the Free Movement of Such Data." By October 1998 all fifteen E.U. Member States must bring their national laws into congruence with the Directive.

In February 1997 the Council of Europe adopted a "Recommendation on the Protection of Medical Data," the principles of which the 39 Members (which include all the E.U. countries) are urged to transpose into their national laws.

Thus the European countries are revising both their general privacy laws and their laws covering health data. The resulting changes in European laws, regulations, codes, guidelines, and practices will have important implications for international health research, and for movement of data from Europe to the U.S. and other non-European countries.

New Health Insurance Law in the U.S.

A new "Health Insurance Portability and Accountability Act," which became law in August 1996, established several provisions relating to confidentiality of medical records as they are handled in health insurance, billing and payment data, and the like. How these are worked out will have implications for how data are accessed and processed in health research.

Proposed New U.S. Laws

Versions of an omnibus "Medical Records Confidentiality Act" are being considered by the U.S. Congress, as is a "Genetic Confidentiality and Discrimination Act." Some States are revising their medical-privacy laws covering information on mental health, HIV–AIDS status, or genetics. All of these will have implications for research.

Dialogue Between the U.S. and Europe

For the U.S., it will be very important over the next few years to engage in high-level, broadly based dialogue with European leaders over the implementation of the E.U. Directive and the Council of Europe Recommendation. Discussions will have to be held with national governments and with intergovernmental organizations. Health care and health research must be addressed specifically; they simply cannot be dealt with in the same way as banking, credit, tax, education, transport, or criminal data. Private-sector organizations involved with health research should participate fully. So should regulatory agencies that require international transfer of health data.

Focal issues regarding health research will be:

  • Specifics in the implementation of the E.U. Directive by the Member States, and pan-E.U. decisions taken by the E.U. Working Party, the Commission, and the European Parliament.
  • Especially, the determination of "adequacy" of conditions for transfer of data from the E.U. to the U.S. and elsewhere outside the E.U.
  • The adoption by Members of the Council of Europe of the "Recommendation on Protection of Medical Data" and its implications for practice.
  • Recognition of special needs in health research (such as the need to take ethnic and sexual factors into account, the need to accommodate secondary studies in databases, the need to retain data for a long time, and the like). 
  • Recognition of the special requirements already established in government regulation of research, development, and postmarketing study of pharmaceuticals, biological products, diagnostics, and medical devices.
  • Recognition of the need to harmonize with the forthcoming E.U. Clinical Practice Guidelines (now in draft) and other international research guidelines.
  • Emphasis on the need for uniform criteria and standards that will foster the international flow of health data.

In all of this, the U.S. government and other American organizations should not only be asking for concessions and exemptions, but also taking the opportunity of this period of reform to improve the ways they themselves handle these matters, and exerting international leadership.

1. Coupled Societal Goods: Privacy and Research

At issue right now—as health care is rapidly becoming industrialized, collectivized, and computerized—is to what extent society will preserve the cherished tradition of patient–healthcare provider confidentiality with its many implications, and the related relationships of trust with those who perform health research.

Also at issue is whether the public's current apprehensiveness about invasions of privacy by various forces will result in "backlash" legal restrictions that will jeopardize aspects of health research, ultimately to society's detriment.

The policy and technical challenges are to devise improved ways for preserving individuals' informational privacy, while at the same time preserving justified research access to personal data in order to gain health benefits for society. Much is at stake.

Current Legislative Attention

Several important legislative events are focusing attention on the issues.

  • In October 1995 the European Union (E.U.) adopted a broad "Directive on the Protection of Individuals with Regard to the Processing of Personal Data and on the Free Movement of Such Data," the principles of which the E.U. Member States must embrace in their national laws by October 1998. This Directive applies to health data as well as to many other kinds of data.
  • In February 1997 the Committee of Ministers of the Council of Europe adopted a formal, specific "Recommendation on the Protection of Medical Data" which is expected to influence practices throughout Europe.
  • In August 1996 the U.S. adopted the "Health Insurance Portability and Accountability Act," which among other things established provisions relating to confidentiality of medical data as they are handled in health insurance, billing, and payment, which will have implications for research and will set precedent.
  • An omnibus "Medical Records Confidentiality Act," a "Medical Privacy in the Age of New Technologies Act," and a "Fair Health Information Practices Act" are being considered by the U.S. Congress, as are a "Genetic Confidentiality and Discrimination Act" and other genetic privacy bills.
  • Many U.S. States have adopted, or are considering adopting, specialized laws on confidentiality of genetic, or mental health, or vaccination, or HIV–AIDS data.
  • The Organization for Economic Cooperation and Development, the Canadian and other governments, and many nongovernmental organizations are developing policies or standards on encryption and electronic transmission of health data, on genetic privacy, and on medical confidentiality in general.

But even if all these formal activities weren't occurring, now would be a propitious time to review the issues. Indeed, for reasons that will be made evident below, now is almost too late.

As the fundamental nature of health care, and of health data and their uses, is changing dramatically, society must—now—examine and re-decide how much it cares about protecting health privacy. Health researchers must be certain that they are taking all reasonable measures to safeguard the data they collect and use, and to maintain the respect for privacy that is embodied in the very compact with society under which they work. And society must reformulate and update some of the rationales and criteria under which the health experience of individuals may be studied to benefit society.

Heightened Public Concern

The public are rightly concerned about the erosion of privacy of information about health, for at least the following reasons taken together.

  • There has been a rapid increase in the amount, diversity, and intimacy of health-related data recorded.
  • Computerization of health data storage, manipulation, linking, searching, transfer, and other processing continues to increase. Its progress is inevitable. This is bringing higher vulnerability to both accidental and intentional disclosure of sensitive data, and to misuse and abuse.
  • The scale of health-related databases, and of prescribing and billing records, has increased beyond all precedent, as has the interlinking and transferring of data among different, and different kinds of, databases, often at great distances, often internationally.
  • Increasingly, health care is being provided by large, complex institutional and commercial systems—managed care in the U.S., and versions of it, or nationalized health care (which, in a way, is managed care taken to extreme) in many other countries. This is bringing much more auditing, analysis, and critical evaluation of healthcare practice and economics data, and increasing commerce in medical data per se.
  • The number and diversity of parties seeking access to health data continue to increase.
  • New kinds of research are being conducted, such as detailed human genome mapping, and elaborate computerized studies in large multipurpose health databases.
  • Individuals feel a general loss of control over their privacy, both health-related and in most areas of life.
  • There continue to be intrusions into health data by employers, schools, insurers, courts, news media, and various snoops. This can be personally offensive, and it can harm individuals' employability, access to insurance or financial loans, personal relations, and community standing.

For the general public all of this has induced a cynical resignation, with undertones of resentment. There seem to be relatively few complaints about privacy intrusions by research. But important research access to data may well suffer, largely for wrong reasons. And research programs do probe people's bodies and lives in intimate ways, record and analyze data that people feel sensitive about, and move data around.

Privacy, Confidentiality, Security

Privacy is a deeply felt but elusive concept. Everyone is sensitive to having his privacy violated. The concepts of "personal matters" and "intimate knowledge" are familiar, as is the notion that individuals live in a "private sphere" over which they are to be granted autonomy. The right to private life was proclaimed in the Universal Declaration of Human Rights and has been reaffirmed in every other human rights declaration since 1945.

But defining privacy in a way that is applicable to all persons and situations is impossible. Everyone believes that some, indeed many, core aspects of his life "are nobody else's business." Yet what one person is fiercely secretive about, another may openly reveal.

Privacy is not an ersatz notion, just an elusive and relative one. It is a concept difficult to formalize. Philosophically it tends to be derived from, or gain force by being associated with, other societal goods, such as freedom of self-determination.1

Informational privacy is not explicitly protected by the U.S. Constitution. Nonetheless, many aspects of personal life that can be considered "private" are protected under a patchwork of Federal and State laws, and by interpretations derived from such Constitutional principles as due process or restriction on unreasonable searches and seizures. Obligations to respect confidentiality of shared information are standard elements in the law of contracts. Some U.S. Federal agencies' statutes, such as those governing the scientific work of the National Center for Health Statistics, set firm constraints on the redisclosure of personally identifiable data. So do the State laws on confidentiality of medical records.

One of the few widely cited legal expressions in this area is that of Louis Brandeis and Samuel Warren in 1890, who, themselves quoting an authority on tort law, defended the privacy "right to be let alone."2 Yet that doesn't carry much compulsion in the modern world (if indeed it did in the good jurists' era).

In his 1967 book, Privacy and Freedom, Alan Westin defined informational privacy as meaning "the claim of individuals, groups or institutions to determine for themselves when, how and to what extent information about them is communicated to others."3 "Privacy," according to Lawrence Gostin, "is the right of individuals to limit access by others to some aspect of their persons."4 The U.S. National Information Infrastructure Task Force, in 1995, formulated it this way:5

Information privacy is an individual's claim to control the terms under which personal information—information identifiable to an individual—is acquired, disclosed, and used.

Obviously, privacy is a highly relative matter—relative to personal and societal values, and relative to the context.

Obviously too, in the contemporary world it is easy for people, even at great remove, to know things about others without the subjects being aware of the knowing, which adds much more difficulty to the definitional problem.

Privacy can be demanded, and sometimes obedience to that demand can be compelled. But privacy, at essence, is something that we grant to others out of basic human respect.

Privacy and confidentiality are related to each other but are not identical notions. Privacy is much broader and is closer to moral fundamentals. Alan Westin, again, made a useful distinction:6

Privacy is the question of what personal information should be collected or stored at all for a given function. It involves issues concerning the legitimacy and legality of organizational demands for disclosures from individuals and groups, and setting of balances between the individual's control over the disclosure of personal information and the needs of society for the data on which to base decisions about individual situations and formulate public policies.

Confidentiality is the question of how personal data collected for approved social purposes shall be held and used by the organization that originally collected it, what other secondary or further uses may be made of it, and when consent by the individual will be required for such uses. It is to further the patient's willing disclosure of confidential information to doctors that the law of privileged communications developed.

Such distinctions are implied in the opening sentence of the "Information Practices" form that is discussed with patients entering the hospital at the U.S. National Institutes of Health: "We, here at the Clinical Center, strive to provide privacy for all our patients and to maintain the confidentiality of the sensitive personal information they share during the course of treatment."7

The U.S. Office for Protection from Research Risks asserts that "Confidentiality pertains to the treatment of information that an individual has disclosed in a relationship of trust and with the expectation that it will not be divulged to others in ways that are inconsistent with the understanding of the original disclosure without permission."8

Relating to privacy and confidentiality is "security." In a disturbing, constructive recent report on protection of computerized health records, a panel of the National Research Council construed it this way:9

Security consists of a number of measures that organizations implement to protect information and systems. It includes efforts not only to maintain the confidentiality of information, but also to ensure the integrity and availability of that information and the information systems used to access it.

As Alan Westin put it, "Security of data involves an organization's ability to keep its promises of confidentiality."10 Willis Ware once combined the three terms in one sentence: "If the security safeguards in an automated system fail or are penetrated, a breach of confidentiality can occur and the privacy of data subjects be invaded."11

Often issues are cast as "fair information practice" rather than as "privacy or confidentiality protection," to acknowledge that privacy is relative, not absolute; to convey the expectation that in complex modern societies most data will be put to multiple uses; and to imply the weighing-off of different interests, under considerations of fairness.

Fair information practices that are invoked include:12

  • Being open about the existence and purposes of data collections
  • Allowing individuals to inspect data about themselves and request corrections or amendments
  • Following lawful and proper procedures when collecting data
  • Only collecting or keeping data that are relevant, correct, and timely
  • Limiting uses of data
  • Limiting disclosures of data
  • Protecting data against unauthorized access, use, alteration, and destruction
  • Maintaining accountability of the data holders.

(1) Philosophical and ethical sources on privacy include Ferdinand David Schoeman, editor, Philosophical Dimensions of Privacy: An Anthology (Cambridge University Press, Cambridge, 1984); and David H. Flaherty, Protecting Privacy in Surveillance Societies (University of North Carolina Press, Chapel Hill, 1989).

(2) Louis D. Brandeis and Samuel D. Warren, "The right to privacy," 4 Harvard Law Review 193–197 (1890), quoting from Thomas Cooley's Treatise on the Law of Torts of 1878. The authors were addressing the new privacy threat from unannounced photography.

(3) Alan F. Westin, Privacy and Freedom, p. 7 (Atheneum, New York, 1967).

(4) Lawrence O. Gostin, p. 454 of "Health information privacy," Cornell Law Review 80, 451–528 (1995).

(5) U.S. Information Infrastructure Task Force, Privacy Working Group, Information Policy Committee, "Privacy and the National Information Infrastructure: Principles for providing and using personal information," § I.A.2 (National Telecommunications and Information Administration, U.S. Department of Commerce, Washington, DC, June 6, 1995). Available on the Internet at <http://www.iitf.nist.gov/ipc/ipc-pubs/niiprivprin_final.html>.

(6) Alan F. Westin, Computers, Health Records, and Citizen Rights, National Bureau of Standards Monograph 157, p. 6 (U.S. Government Printing Office, Washington, DC, 1976).

(7) Form NIH-2753 (10-94).

(8) U.S. National Institutes of Health, Office for Protection from Research Risks, Protecting Human Research Subjects: Institutional Review Board Guidebook, p. 3-27 (U.S. National Institutes of Health, Bethesda, Maryland, 1993 with later addenda).

(9) National Research Council, Committee on Maintaining Privacy and Security in Health Care Applications of the National Information Infrastructure, Computer Science and Telecommunications Board, For the Record: Protecting Electronic Health Information, p. 1-1 (National Academy Press, Washington, DC, March 1997).

(10) Alan F. Westin, as cited in endnote (6).

(11) Willis Ware, "Lessons for the future: Privacy dimensions of medical record keeping," Proceedings, Conference on Health Records: Social Needs and Personal Privacy, sponsored by the Department of Health and Human Services, p. 44 (U.S. Government Printing Office, Washington, DC, 1993).

(12) An early formulation of such principles as a list was U.S. Department of Health, Education, and Welfare, Secretary's Advisory Committee on Automated Personal Data Systems, Records, Computers, and the Rights of Citizens (U.S. Government Printing Office, Washington, DC, 1973). For history and commentary on fair information practices see U.S. Congress, House of Representatives, Committee on Government Operations, "Health Security Act Report," H.R. Report No. 103–601, pp. 81–82 (1994), in which the hand of Robert M. Gellman is clearly discernable.

Privacy and Confidentiality in Health Care(13)

An inevitable logical starting-point is the hallowed medical privacy tradition dating back at least as far as Hippocrates—but one doesn't have to be cynical to surmise that even Dr. H's own receptionist may have gossiped about patients' foibles and maladies.... The precept of nondisclosure is an ideal. But it has been, and should still be, central to the patient–physician relationship, and to the similar relationship with nurses, pharmacists, health social workers, and other care-providers.

The assurance that revelations made within the healthcare relationship will be held confidential encourages people to seek care in the first place, and then to be open in the exchanges involved—divulging information truthfully, asking questions even though doing so may be awkward or embarrassing, cooperating with procedures, and generally nurturing mutual confidence in the relationship. This is essential to effective health care, including public-health surveys and many other activities beyond primary care.14

Thus there is the expectation, embodied in most medical licensing laws and in professional codes, that medical care is delivered within a "medical circle" supervised by physicians and performed within accredited clinics and other institutions. Nurses, pharmacists, physical therapists, laboratory technicians, orderlies, data clerks, and the rest of the "healthcare team" are bound by licensing, ethical obligations, and/or their employment contracts, to respect patients' privacy.

Given the unlikelihood of strict supervision and enforcement within complex, bustling healthcare organizations, institutional "cultures" that emphasize respectful ethical practice are at least as important for patients' privacy as legal rules are.

As will be mentioned repeatedly in this Report, a major problem is that today physicians' span of control simply does not extend to follow or protect data as they are examined by all the different parties who claim rights to access. New responsibilities and liabilities need to be delineated.


(13) An excellent general review is Lawrence O. Gostin, as cited in endnote (4).

(14) During the course of this study the author was dismayed at the number of people, encountered in passing, who mentioned that they have stopped going to their gynecologists, for instance, or mistrust screening or counseling programs, or are reluctant to ask reimbursement for health care, because they "know" that medical confidences will not be respected, or because they fear negative discrimination. Sad to say, their apprehensions may be justified.

Privacy and Confidentiality in Health Research(15)

The ethos surrounding research on humans was recast and codified after World War II, as the world coped with the revelation of the medical atrocities perpetrated by the Nazis. The resulting "Nuremberg Code"—the opening sentence of which was, "The voluntary consent of the human subject is absolutely essential"—established principles having to do with the purposes of the research, gauging of risk and benefit to the subject, qualifications of researchers, and subject rights generally.16 Consent is central in all privacy negotiations.

Initially in 1964, then through subsequent revisions, these ethical concepts were developed and disseminated much further by the World Medical Association's "Declaration of Helsinki: Recommendations Guiding Medical Doctors in Biomedical Research Involving Human Subjects."17 The Declaration's sixth principle is: "Every precaution should be taken to respect the privacy of the subject...." Over the years a number of groups have firmed-up the philosophical foundations and guided the application of the Helsinki principles.

In the U.S. one of the most influential inquiries was the 1979 "Belmont Report" of the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. It probed the (often, soft) distinctions between routine healthcare practice and medical experiment. Then it crystallized three principles of subject protection, and at length discussed their application to various research situations:18

  • Respect for persons (treating subjects as autonomous agents, and giving special protection to subjects whose autonomy is reduced)
  • Beneficence (maximizing benefits to subjects and minimizing harm)
  • Justice (distributing benefits and burdens of research fairly).

These Belmont principles have been elaborated upon in many settings, and they serve as guides to researchers and Institutional Review Boards. Explicitly and implicitly, they have been widely applied to privacy and confidentiality decisions.

In the early 1970s such prophets as Alan Westin raised the alarm about erosion of privacy as the world moved into the computer age.19 Partly out of concern about computerized (then also called "automated") data systems, in 1974 the U.S. passed the landmark "Privacy Act," covering personally identifiable data held by the Federal government (discussed on page 59).

A Privacy Protection Study Commission, which had been created by the Privacy Act, in 1977 issued a sweeping report on the way "records mediate relationships between individuals and organizations and thus affect an individual more easily, more broadly, and often more unfairly than was possible in the past."20 It covered most government and commercial activities.

With respect to medical data the Commission's conclusions and predictions were absolutely correct. It noted the rapid broadening of the scope of data covered, and the decrease in data control by medical practitioners. Regarding secondary use of data, and consent, the Commission was prescient:

It appears that the importance of medical-record information to those outside of the medical-care relationship, and their demands for access to it, will continue to grow. ... There appears to be no natural limit to the potential uses of medical-record information for purposes quite different from those for which it was originally collected.

Moreover:

As third parties press their demands for access to medical-record information, the concept of consent to its disclosure, freely given by the individual to whom the information pertains, has less and less meaning.

The Commission emphasized three privacy-policy objectives:21

  • Minimizing intrusiveness
  • Maximizing fairness
  • Legitimizing expectations of confidentiality.

That was in 1977. The broad public agreement with the Commission's findings was not—still has not been—matched by legislation to attend to the problems.

A very important step for Europe was the passing by the Council of Europe, in 1981, of a carefully worked out "Convention for the Protection of Individuals with Regard to Automatic Processing of Personal Data," the principles of which have become practice in most of the Member States (discussed on page 54).

Various bodies have examined the issues since then, especially as the U.S. debated healthcare reform and dashed/stumbled toward "managed care" in the early 1990s.22


(15) Classic sources are Robert J. Levine, Ethics and Regulation of Clinical Research, Second Edition (Urban and Schwarzenberg, Baltimore, 1986); and Tom L. Beauchamp and James F. Childress, Principles of Biomedical Ethics, Third Edition (Oxford University Press, New York and Oxford, 1989).

(16) Trials of War Criminals before the Nuremberg Military Tribunals under Control Council Law No. 10, volume 2, pp. 181–182 (U.S. Government Printing Office, Washington, DC, 1949).

(17) World Medical Association, "Declaration of Helsinki," latest revision 1989, available on the Internet at <http://www.ncgr.org/gpi/odyssey/heldec.html>.

(18) U.S. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, "The Belmont Report: Ethical Principles for the Protection of Human Subjects," DHEW Publication No. (OS) 78-0012, with Appendices (OS) 78-0013 and (OS) 78-0014 (U.S. Government Printing Office, Washington, DC, 1978); the Report also was published in Federal Register 44, 23192–23197 (1979).

(19) Alan F. Westin, as cited in endnote (6).

(20) U.S. Privacy Protection Study Commission, Personal Privacy in an Information Society, pp. 290–291 (U.S. Government Printing Office, Washington, DC, July 1977).

(21) Ibid., pp. 15–21.

(22) U.S. Congress, Office of Technology Assessment, Protecting Privacy in Computerized Medical Information, Report No. OTA-TCT-576 (U.S. Government Printing Office, Washington, DC, September 1993).

Potential Harms from Wrongful Disclosure

Wrongful disclosure of confidential health data may occur either through carelessness—through gossip in a clinic, for instance, or lazy discarding of clinical records—or through deliberate transgression, either by someone associated with the data-holder or by an outsider.

Harm may be inflicted through the very fact of disclosure—that is, simply through other people's coming to know things that the data-subject, and presumably the entrusted data custodian, expected to be kept confidential. The subject may feel embarrassed, vulnerable, or otherwise violated, as well as feel betrayed by the data-holder, and personal or other relationships may suffer.

Or, harm may be incurred if discrimination is brought against the interests of the subject (in employment hiring or promotion, access to health or life insurance, access to housing, qualifying for a loan, exposure in legal proceedings, etc.) based on wrongfully disclosed information.

Abuses may be personally offensive and harmful, though not necessarily illegal; or they may be clearly illegal (such as blackmail).

Commentators usually surmise that threats to health data are more likely to be perpetrated from inside the data-holding organization, through curiosity, nosiness, mischief, or malice, than from outside. This is especially a vulnerability of computerized systems having many nodes and only weak security controls. Outside attackers of health data range from computer pranksters, to business competitors, to private detectives pursuing evidence of unfitness in divorce or child-custody cases, to journalists probing the lives of celebrities or other public figures.

An important issue for policy is whether to focus controls and sanctions on protection of confidentiality per se (i.e., protecting against unwarranted disclosure in-and-of-itself), or on punishing inflictions of harm that occur because data are used improperly, or both. (The author believes it should be both.)

Some Ancillary Privacy-Rights Claims

Privacy has variously been cited as a rationale to cover a great many situations.23 Here we can just note a few potentially relevant for health research. Among data-subject rights claimed have been:

  • Right to know information about oneself
  • Right to know what it is that others know (as is granted in most data-protection and fair-use laws)
  • Right not to know (such as medical diagnoses, prognoses, genetics)
  • Right to prevent others, even family members or partners, from knowing (which is very controversial in public-health ethics regarding infectious diseases and burdensome illnesses)
  • Right to continuation of informational privacy even after death (such as regarding the cause of death itself)
  • Right to object to use of data derived from oneself, even if they are anonymized
  • Right to insist that data on oneself not be irreversibly anonymized (so as, for instance, to be able to learn of the findings).

Each such claim has to be judged in its context.


(23) David H. Flaherty discusses these listed and more, in Protecting Privacy in Surveillance Societies (University of North Carolina Press, Chapel Hill, 1989).

Weighing Privacy against Research Need

This study takes it as given that because members of society benefit greatly from health research, research—if it is for justifiable purposes, and is conducted with proper protection of subjects—must continue to be allowed controlled access to individuals' health data.

In the compact between health researchers and the public, good-faith respect for privacy, and therefore stewardship of confidentiality, is necessary and expected.24 The idea of a compact pervades the "Guidelines for the Conduct of Research Involving Human Subjects at the National Institutes of Health," for instance, which are prefaced by the admonition:25

Society has granted a conditional privilege to perform research on human beings. The condition is that it must be conducted in a way that puts the rights and welfare of human subjects first.

In a similar spirit the "Nondisclosure Statement" that all employees of the U.S. National Center for Health Statistics are required to sign declares:26

The success of the Center's operations depends upon the voluntary cooperation of States, of establishments, and of individuals who provide the information required by Center programs under an assurance that such information will be kept confidential and be used only for statistical purposes.

The challenge is to devise criteria, standards, laws, regulations, systems, and professional practices for controlling physical and cyber access to data, managing personal identifiability, and securing informed consent—while, at the same time, facilitating justified research access.

Yes, what is taken to be "justified" is itself a crucial issue. Privacy demands can hardly be judged in isolation. Balance must be sought between the good of privacy and the good of contributing to the improvement of society's health through research.

This "selfishness example" makes the fundamental point more clearly than many tomes of social philosophy:27

Doctor: Here; this medication will help your condition.
Patient: How do you know?
Doctor: A study of 10,000 people's experience showed that it helped 9,247 of them get better. 
Patient: Good, I'll take it. But don't let anybody know whether I get better.

Patient-advocacy organizations should be urged to express publicly the willingness of the patients they represent to have themselves or their data studied in research, and in what kinds of research, and under what conditions. Some patient organizations have done this, and some have helped recruit volunteers to studies. Other groups, such as women's groups and organizations concerned with genetic conditions, have done the same, as has the U.S. Indian Health Service for its constituents.

Relevant basic legal logic was enunciated in 1980 by the U.S. Third Circuit Court of Appeals:28

The factors which should be considered in deciding whether an intrusion into an individual's privacy is justified are the type of record requested, the information it does or might contain, the potential for harm in any subsequent nonconsensual disclosure, the injury from disclosure to the relationship in which the record was generated, the adequacy of safeguards to prevent unauthorized disclosure, the degree of need for access, and whether there is an express statutory mandate, articulated public policy or other recognizable public interest militating toward access.

Similar logic must be applied in balancing individuals' privacy against the potential benefit to society of insights derived by studying individuals' health experience.

Respect for persons will best be served not by insisting on absolute privacy, which is unattainable in modern life anyway, but by seeking informed consent to reasonable use of health information under strictly delimited conditions; by safeguarding personal data carefully; by genuinely affording fair-information-use rights to data-subjects; and by enforcing sanctions against improper use.


(24) The notion of science's compact with society was developed in William W. Lowrance, Modern Science and Human Values, pp. 78–89 (Oxford University Press, New York and Oxford, 1985).

(25) U.S. National Institutes of Health, Office of Human Subjects Research, preface to "Guidelines" brochure (1995).

(26) NCHS Staff Manual on Confidentiality, p. 5.

(27) John P. Fanning, personal communication.

(28) United States v. Westinghouse Electric Corp., 638 F.2d 578 (Third Circuit, 1980).

2. Health Data and Data Holders

Several current changes in the context within which health data are collected and used must be recognized. First, the boundaries between classical medical care and "public health" are becoming ever less distinct. Over the past decades the rubric, "health," has been broadened to include many matters—from hyperactivity in children, to teenagers' nose shape, to memory loss associated with aging—that earlier were not viewed as matters of health, much less of medicine. At the same time, medical science has come to accord much more importance to such ordinary life factors as diet and stress as determinants of health, and therefore addresses them in medical care.29

This Report covers the whole range, and where distinctions are not sharp it refers to "health," which, after all, is the end of medicine. "Health data" includes all data collected under physicians' supervision, but also a wide range of other data that relate to health.

An expansive definition such as Lawrence Gostin's is necessary:30

The term "health data" is broadly defined as all records that contain information that describes a person's prior, current, or future health status, including [cause of disease], diagnosis, prognosis, or treatment, or methods of reimbursement for health services.

To quote an example from a statute, the newly enacted "U.S. Health Insurance Portability and Accountability Act" reaches broadly, as it must (§1171(4)):

The term "health information" means any information, whether oral or recorded in any form or medium, that—(A) is created or received by a health care provider, health plan, public health authority, employer, life insurer, school or university, or health care clearinghouse; and (B) relates to the past, present, or future physical or mental health or condition of an individual, the provision of health care to an individual, or the past, present, or future payment for the provision of health care to an individual.

Second, of course, health care has been evolving into systems, and systems of systems. As a consequence, the traditional clinician's notes, scribbled down or dictated and later transcribed, and then locked up in filing cabinets, increasingly are being recorded along with other files in electronic media, usually networked.31

And third, the trend clearly is toward not only recording health information in computerized form, but indeed basing health care around the "lifetime linked-data dossier" on the person. Many advantages are evident for assembling health data from disparate sources, understanding the person's life-and-health trajectory, providing health-promotion input and health care, transmitting orders and analyses, and networking and consulting at distances. Many advantages are evident for billing and paying, administrative review, and research. And as well, there can be many advantages for the patient's own awareness and documentation of his health "story."

Public-health records are being computerized just as quickly. In the future envisioned by seers, many aspects of public-health surveillance (such as scanning for infectious disease outbreaks), compilation of statistics (use of hospital outpatient services...), development of registries (vaccination...), and other analytic collections (effects of pharmaceuticals...) will simply be derived, whenever and in whatever form needed, from the networked lifetime dossiers.

These visionary technical developments, which are very exciting but not without negative aspects, are being explored diligently by many institutions.32,33 A potential vulnerability, even the "Achilles heel," of this movement is whether it will be able to deal adequately with privacy, confidentiality, and security.


(29) Kerr L. White, Healing the Schism: Epidemiology, Medicine, and the Public's Health (Springer-Verlag, New York, Berlin, and Heidelberg, 1991).

(30) Lawrence O. Gostin, as cited in endnote (4).

(31) One appreciates the remark, the validity of which is now fading, by the distinguished physician Sir Douglas Black (personal communication): "The only privacy protection the poor patient has left is the doctor's bad handwriting."

(32) Institute of Medicine, Richard S. Dick and Elaine B. Steen, editors, The Computer-based Patient Record: An Essential Technology for Health Care (National Academy Press, Washington, DC, 1991). A distinctive aspect of this report is its promotion of computer-based medical records, beyond merely computerized versions of traditional records. (A Revised Edition is forthcoming, 1997.)

(33) Institute of Medicine, Committee on Evaluating Clinical Applications of Telemedicine; Marilyn J. Field, editor, Telemedicine: A Guide to Assessing Telecommunications in Health Care (National Academy Press, Washington, DC, 1996).

"Data" Vocabulary

Although definitions need not be belabored here, a few concepts and items of vocabulary are necessary.

Data is taken to mean discrete bits of information. As one dictionary has it: "Data are facts or figures from which conclusions may be inferred." For most research now, data are converted into numerical form for processing by computers.

Data-subjects are the people about whom data are collected.

Databases are collections of data, recorded in standardized fashion, ordered for reference or research purposes.

Database research, then, is research that analyzes data in such collections.

Information is data set within a context of meaning. Raw data (such as lists of numbers that stand for blood-enzyme concentrations, or units on a mental-depression scale) make no "sense" as facts unless the measurement method and descriptive scale are known. And before any scientific meaning can be inferred, the data must be tied with data on other characteristics of the data-subjects and the circumstances.

Personally identifiable data are data that are associated with real persons, or that can be associated with real persons by deduction from descriptors such as birthdate, physical characteristics, occupation, residential location, social identification number, or history. Synonyms are "personal data" and "individually identifiable data." Often for brevity the descriptors, such as the person's name, that associate the data with a real person are referred to just as "identifiers."

Processing or handling of data, in an ethical or legal sense, may refer to recording, storing, retrieving, duplicating, transferring, destroying—in effect, any action through which someone may become cognizant of, or move, or alter, data.34 Verb lists of this kind are unavoidable; privacy of the data-subject can be affected by any such operations.


(34) The European Union Data Privacy Directive (95/46/EC) at Article 2(b) defines "processing" as being "any operation or set of operations which is performed upon personal data, whether or not by automatic means, such as collection, recording, organization, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, blocking, erasure or destruction."

The Universe of Health Data

So many kinds of health data are collected that it would be distracting and soporific to do more here than take note of the major categories. But it is essential to recognize: (a) that great research power resides in a diversity of health data, and (b) that privacy issues surround many kinds of data beyond those in primary medical records.

Health data include:

  • Primary medical, hospital, and clinic data (including various managed-care data)
  • Prescribing, pharmacy, clinical laboratory, and imaging data (x-ray, magnetic resonance, sonogram...)
  • Administrative and financial data (billing, payment, insurance, audit...)
  • Vital records (birth, adoption, death...)
  • Exposure registries (asbestos, x-rays, children's lead...)
  • Disease registries (melanoma, tuberculosis, burn, congenital malformation...)
  • Other monitoring and surveillance registries (drinking water fluoridation, infant nutrition, hearing conservation...)
  • Genetic data registries (pedigree analyses, screening, gene maps...)
  • Intervention registries (vaccination, cardiac pacemaker...)
  • Military health and hazard-exposure data (Agent Orange, artillery noise...)
  • Occupational health and hazard-exposure data (coal dust...)
  • Incident-, accident-, and disaster-exposure data (Love Canal, Three Mile Island, Bhopal...)
  • Tissue samples (blood, semen, ova, pathology...), with associated data
  • Surveys of attitudes and practices (diet, alcohol consumption, dental hygiene, condom use...)
  • Clinical-trial and other experimental data
  • Regulatory data (in the Food and Drug Administration, in city and State health departments...).

All kinds of data may reveal intimate information. Prescription data, for instance, often indicate the disease, or at least the kind of disease, being treated. Blood type holds implications about parentage. The mere fact that a person has entered into a relationship with a psychotherapist or a drug-abuse treatment center (as revealed, say, by billing records or clinic appointment logs) can be held against the person by employers or others.

Further, besides carrying technical observations relating to the main purpose of an encounter between a person and a healthcare or research system, records may contain subjective remarks on general health or lifestyle ("coughs a lot, probably heavy smoker"), incidental observations ("child has numerous bruises and small burn scars on back" or "spouse opposed to surgery"), or speculations ("taking anabolic steroids?" or "bulimic?").

Especially-Sensitive Data

Obviously some kinds of data are felt by data-subjects or the public in general to be especially sensitive. A commonly cited example is that HIV–AIDS data are much more sensitive than, say, data about wrist fracture. Whether sensitivity is somehow justified will always be debatable within the context. But for purposes of ethical practices, policy, and law, widely held public concerns must be recognized and respected appropriately.

Among the categories often taken to be highly sensitive are data about:

  • Mental health
  • Aberrant behavior (child battering...)
  • Alcohol or other chemical habituation
  • Reproduction (infertility, pregnancy, ova and sperm donation, spontaneous or elective abortion...)
  • Embarrassing problems (sexual impotence, urinary incontinence...)
  • Cancers
  • Sexual orientation, attitudes, practices, and functions
  • Sexually transmitted diseases
  • HIV–AIDS
  • Genetics.

But although these are among the more obviously delicate kinds of data, a person may just as well have anxieties about employers or others becoming aware of data regarding asthma, for instance, or epilepsy, cirrhosis of the liver, or a weak back.

Sensitivity may have to do with revelation of a past that a person has moved beyond and does not wish others to know about, or be reminded of himself. It may imply improper or socially marginal behavior. It may stem from resentment at ill fortune in the lottery of life, or from imputation of careless behavior, or implication of dysfunction. And of course it may stem from fear of negative discrimination.

This raises serious questions for policy. Should distinctions be made among kinds of health data with respect to how they are protected? Should special sensitivities be recognized? Should protections be scaled relative to the potential for social or physical harm, or emotional offense, to data-subjects?

A U.S. Task Force on the Privacy of Private-Sector Health Records expressed this view (with which the author agrees):35

The Task Force believes that any file containing health information should be considered a candidate for protection since it is the information itself, and not the form in which it is maintained, which could result in an invasion of privacy if released. ... Although the Task Force agrees that it is appealing to classify information according to sensitivity, it questions whether this is the most effective approach to protecting data that may potentially cause harm to an individual. Disease-specific segregation of records necessitates complicated administrative arrangements.... In addition, the definition of what constitutes a sensitive medical record may differ from decade to decade and from individual to individual. ... Protecting all health records adequately is the issue that must be addressed.


(35) U.S. Department of Health and Human Services, Task Force on the Privacy of Private-Sector Records, Final Report, p. 4 (report prepared under HHS Contract # 100-91-0036 by Kunitz and Associates, Inc., 6001 Montrose Road, Suite 920, Rockville, Maryland 20852, September 1995).

The Diversity of Data Holders

Just as varied as the types of health data, of course, are the types of individuals and organizations who hold or process the data. Data are processed by:

  • Clinics, hospitals, nursing homes, hospices, laboratories, independent physicians, and other primary-care and diagnostic services (breast cancer screening...)
  • Pharmacies (including large commercial chains)
  • Private-sector managed-care providers, healthcare services (physical therapy, speech therapy, social work...), disease-management businesses (diabetes maintenance...), and pharmacy-benefit management companies
  • Blood supply systems and manufacturers of blood products, ova and sperm banks, and organ transplant and other tissue brokers
  • Academic and other nonprofit health research centers
  • Government organizations (healthcare providers; payors; research and statistics centers; regulators; public-health authorities; immigration, military, penal, and social-services organizations...)
  • Manufacturers of pharmaceuticals, vaccines and other biotechnology products, diagnostics, and medical devices
  • International quasi-governmental organizations (World Health Organization, International Red Cross and Red Crescent...)
  • Health, life, and casualty insurance companies
  • Nonprofit patient organizations
  • Employers, schools
  • For-profit research firms (contract research organizations...)
  • Commercial data vendors.

Thus health data are held by a greater variety of organizations than ever before. Data flow, often at very high volume, within and among many of these organizations.

Although physicians, and staff nominally under their supervision, still collect much of the most intimate data, they are not necessarily any longer in position to control the movements, uses, or fate of the data. Data from a routine patient encounter with the healthcare system quickly are transmitted among care-providers and their local institutions, various technical support services, the paying institutions, and a variety of supervisors, inspectors, auditors, and researchers—many far removed from the data-subject, many not medically certified, and possibly many not sworn to confidentiality. Eventually the encounter may be examined in practice review, filed into statistical tabulations, recorded into ongoing registries, or scrutinized in research.

Databases Useful for Research

Among the most important resources for research are databases and registries of health experience. Some are highly specialized but not very large; some are broad and enormous. Some are maintained only for research; some are primarily maintained for administrative or other purposes but are available for research. They may be organized by illness (leprosy...), by exposure (oral contraceptives...), by mode of intervention (kidney transplant...), by general healthcare experience (nursing home stay...), or by population (residents of Saskatchewan).

Perhaps the largest collection of health databases in the world is the set of U.S. "Medicare" database systems, which every year processes the records of over 600 million reimbursement claims. (Medicare is the Federal health insurance program for people age 65 and over, people with serious disabilities, and people suffering from serious kidney disease.) The Medicare databases, which are managed by the Health Care Financing Administration (HCFA), contain enrollment and eligibility data, claims for payment, data on the ways healthcare services are used, and many specialized data (such as on end-stage renal disease).36

Much very useful research is performed on HCFA data, which as collected are personally identifiable. Public-use files are made available in which, HCFA certifies, "all identifiers have been encrypted, ranged, or blanked." For research projects which meet the criteria for release of identifiable data, HCFA supplies data under Release Agreements pursuant to "routine uses" announced under the Privacy Act. The protections are strict. (See page 59 regarding the Privacy Act, and page 68 regarding conditions on use of Medicare data.)
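As a rough illustration of what "encrypted, ranged, or blanked" can mean in practice, the following Python sketch treats one hypothetical claim record three ways. It is not HCFA's actual procedure: the field names, the five-year age bands, and the use of a keyed hash as a stand-in for encryption are all assumptions made for this example.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (an assumption for this sketch).
SECRET_KEY = b"hypothetical-secret-held-by-data-custodian"


def encrypt_id(identifier: str) -> str:
    """Replace an identifier with a keyed hash (a stand-in for 'encrypted')."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def range_age(age: int, width: int = 5) -> str:
    """Replace an exact age with a band such as '70-74' ('ranged')."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def make_public_use(record: dict) -> dict:
    """Build a public-use version of one record."""
    return {
        "beneficiary_key": encrypt_id(record["beneficiary_id"]),  # encrypted
        "age_range": range_age(record["age"]),                    # ranged
        "name": "",                                               # blanked
        "diagnosis_code": record["diagnosis_code"],               # kept for research
    }


# Invented sample record, for illustration only.
sample = {"beneficiary_id": "A123", "name": "Jane Doe", "age": 72, "diagnosis_code": "585"}
public = make_public_use(sample)
```

Because the same identifier always maps to the same key, records for one person can still be grouped within the file, while anyone without the secret cannot recover the original identifier from the key alone.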

"Medicaid" databases also are important research resources. (Medicaid programs are regimes under which the States pay for basic health care for low-income, blind, or otherwise disadvantaged people, using joint Federal–State funds.) Like Medicare data, Medicaid data are administrative and billing records.

Although the data may not be of highest quality and are not fully standardized nationally, they nonetheless provide large amounts of diverse information about the health and health care of millions of patients "in the real world." Sophisticated computer programs allow searching for data on patient age and sex, diagnoses, use of medicines and medical procedures, costs, and other factors. Researchers are allowed access to the data under restrictive conditions.37

Health databases useful for research are maintained in many places.38 In Europe, just to mention a few examples to suggest their variety, they include the 30 Regional Centers of the French Pharmacovigilance System, the Danish Psychiatric Central Register, the Crohn's Disease Register for the Brussels region, and the Prescription Event Monitoring System run by the Drug Safety Research Unit in Southampton. All of these hold personally identifiable data, as they must.


(36) U.S. Health Care Financing Administration, Bureau of Data Management and Strategy, Data Users Reference Guide (HCFA, Baltimore, Maryland, September 1995), and Overview of Health Care Financing Administration Data: Resource Guide (HCFA, Baltimore, Maryland, April 1996).

(37) Jeffrey L. Carson and Brian L. Strom, "Medicaid databases," pp. 199–216 of Brian L. Strom, editor, Pharmacoepidemiology, Second Edition (John Wiley & Sons, Chichester and New York, 1994).

(38) Several hundred databases were surveyed and described in a series of International Drug Benefit/Risk Data Resource Handbooks (covering North America, Europe, Japan, Australia, and New Zealand) prepared under the auspices of the International Medical Benefit/Risk Foundation, Geneva. Information on the Handbooks can be obtained from Dr. Judith K. Jones, The Degge Group, Ltd., 1616 North Fort Myer Drive, Arlington, Virginia 22209-3109.

The International Flow of Data

Health data are zipped around the world all day every day, by government research agencies, pharmaceutical firms, academic researchers, and many others. Data on Americans are transferred, American institutions do much data-transferring, and data are transferred for important American purposes.

A great many health data are imported into the U.S., and many are exported. Such U.S. agencies as the Centers for Disease Control and Prevention, working cooperatively in and with many other countries, import personally identifiable data, under safeguards. The National Heart, Lung, and Blood Institute, in joint programs with Canada and European countries, exchanges data internationally, under safeguards. So does the National Cancer Institute.

Huge volumes of clinical-trial data collected in medical centers are transferred all the time, on behalf of companies that develop and manufacture pharmaceuticals, diagnostics, and medical devices, and the National Institutes of Health, and the World Health Organization, and many others working to improve medical "tools." So are drug, device, and vaccine adverse-effect reports, which provide essential feedback.

Thus personally identifiable health-research data are exchanged internationally, for very good reasons, all the time, and inevitably this international data flow will increase. The importance of pressing for uniform international standards for protecting privacy, confidentiality, and security is evident.

3. Data from Research, Research on Data

Contemporary health research is generating a multitude of benefits for humankind, and the future benefits look at least as promising. The following sketches can hardly do justice to the myriad complex activities. But they indicate some of the research purposes and approaches, the character of the data, and the privacy-protection problems involved.

As the above title indicates, health research generates new data by observation and experiment, but also—in part because its questions are of such an "applied," practical nature—it often proceeds by analyzing data that were originally collected for another purpose. The two approaches can have different implications for privacy.

The purposes of research are many, and they overlap. Research is conducted:

  • To advance basic biomedical science
  • To know patterns of health, disease, and disability
  • To reduce public-health threats
  • To understand utilization of health care
  • To evaluate and improve practices
  • To make effective innovations
  • To analyze economic factors
  • To appraise markets.

Research to Advance Basic Biomedical Science

Basic research develops the fundamental science that underpins all applied research. It uses every experimental approach possible, every kind of instrumental observation, every epidemiological and other analytic technique. It uses social-scientific methods where these can illuminate basics. It studies simplified "model" systems, in search of insights and techniques that will help study (messier) natural systems. It develops methods.

Much of the task of basic research is to study baseline functioning and health: metabolic mechanisms, hormonal controls, immune responses, and the phenomena of conception, inheritance, development, cognition, memory, and aging. It studies the materials of the body, flows of energy, and how the body interacts with various environments.

Basic research also studies abnormal functioning, and disease states and processes. And it studies bacteria, viruses, fungi, worms, mites, radiation, noise, toxins, dietary factors, stress factors, dusts, allergens—all the agents and risk factors that can affect health.

Much basic biomedical research does not need to use personally identifiable data, but some of course does.

Research to Know Patterns of Health, Disease, and Disability

All over the world, health and disease are monitored. Starting with prenatal observations and birth data, throughout life health-related measurements and observations accumulate. Analyses are made to portray the "natural history" of diseases and disabilities—how they start, progress in a person or spread to others, and run their courses. Also analyzed are risk factors, and the effects of preventions and interventions. Now genetic patterns in populations are being analyzed much more actively, and in far greater detail, than ever before.

Public-health surveillance, recording the occurrence of events in populations, is one of the longest-established of public-health functions. A representative definition is this one by Stephen Thacker:39

Public health surveillance is the ongoing systematic collection, analysis, and interpretation of outcome-specific data for use in the planning, implementation, and evaluation of public health practice. A surveillance system includes the functional capacity for data collection and analysis as well as the timely dissemination of these data to persons who can undertake effective prevention and control activities.

The tasks of surveillance include assembling vital statistics (on births and deaths, and sometimes other events); profiling health status within populations; analyzing patterns of illness and disability, and health risks and risk factors; and studying how people interact with healthcare systems.

Practitioners of surveillance sometimes protest that what they do is not "research." Dr. Thacker insists: "The boundary of surveillance practice excludes actual research and implementation of delivery programs. Because of this separation, epidemiologic cannot accurately be used to modify surveillance."

This is controversial. Perhaps part of the problem is that much surveillance is of necessity based on less-than-fully-standardized, non-validated reports from local physicians and laboratories. Too, in emergencies, such as when surveillance is quickly mounted to trace a contagious disease outbreak, the observations may lack scientific elegance. And generally, surveillance does not itself test a hypothesis (about cause, for instance), but rather passively collects data (although the data generated by surveillance may be used to test a hypothesis). Surveillance may indicate that something is happening, but not necessarily why, or what the factors are. But it does perform highly structured searches for data that, among several purposes, become input for research. This deserves continued discussion. The reason it may matter for privacy is that it can have implications for how the activity is treated under human-subjects protection regulations.40

Notifiable disease reporting is a standard public-health activity everywhere. Under the National Notifiable Diseases Reporting System in the U.S., local and State health departments routinely forward case reports, including data on age, gender, and race, on around 50 diseases (measles, mumps, tuberculosis, hepatitis A and B, syphilis...) to the Centers for Disease Control and Prevention (CDC). The CDC then quickly publishes analyses, which help public-health experts and authorities discern patterns of occurrence, and intervene.41 The CDC receives the case reports with the identifiers removed. Many other such surveillance programs are in operation all over the world. The World Health Organization publishes summaries.

A spirit of openness and reassurance can encourage a community to cooperate. Robert Hahn has proposed this "Ethical checklist for public health surveillance":42

  1. Justify the surveillance system in terms of maximizing potential public health benefits and minimizing public and individual harm.
  2. Justify use of identifiers and the maintenance of records with identifiers.
  3. Have surveillance protocols and analytic research reviewed by colleagues, and share data and findings with colleagues and the public health community at large.
  4. Elicit informed consent from potential surveillance subjects.
  5. Assure the protection of the confidentiality of subjects.
  6. Inform health-care providers of conditions germane to their patients.
  7. Inform the public, the public health community, and clinicians of findings of surveillance.

Health statistics programs collect a very large variety and volume of facts, to provide the descriptive backdrop against which society can decide how to optimize interventions, use resources most effectively, and cope with change.

The U.S. National Center for Health Statistics (NCHS), a component of the Centers for Disease Control and Prevention, analyzes data from existing records, and it gathers data itself via interviews and examinations. The NCHS's National Health Interview Survey periodically collects data on a very wide range of health-status measures and illnesses, and on hospital use, dental care, hearing impairment, nursing home experience, and many other matters; the most recent survey interviewed 120,000 people. To provide data on infant-death risks, NCHS maintains linked files of live births and infant deaths. It also gathers data such as birthweight, which is a reliable index to both maternal and infant health, on a sampled national basis. To help epidemiologists identify subjects for in-depth causal analyses, NCHS assembles selected mortality data from the States into the National Death Index.

In the next round (IV) of its famous National Health and Nutrition Examination Survey (NHANES), the NCHS will examine some 30,000 carefully sampled people to determine health trends. Special coverage will be given to such subgroups as Blacks, Mexican-Americans, low-income persons, preschool children, and the elderly. Like earlier rounds, the Survey will be based on extensive confidential interviews, physical examinations, and laboratory tests. It will amass about 8,000 pieces of data on each subject. These NHANES surveys are data quarries, from which insights derived from data on relatively few people help improve health for countless others in the larger society, including people outside the U.S.

Many of the data collected by NCHS are personally identifiable data. The Center's statute stipulates that personally identifiable data must be carefully protected, and that they may not be used for any purpose other than that for which they were collected unless the data-subject gives new informed consent to the new use.43 NCHS shares identifiable data with researchers in other U.S. government agencies only if the data-subjects have been informed of and consented to such sharing, and then only under highly restrictive interagency agreements. NCHS never releases identifiable data to anyone else. It does release data for public use, but only after all personal identifiers, and all information that might allow deductive identification of the subjects, have been removed.

Registries. Public-health agencies and other organizations maintain many registries in addition to those for notifiable diseases. Registries usually collect data on individuals' or populations' experience over time, perhaps linking data from several sources (occupational hazard exposure + disease incidence...), and may be cumulated so that the progression of events can be studied. Registries may cover locally important diseases (Lyme disease...), for instance, or occupational illnesses (carpal tunnel syndrome...), or consequences of disasters (Chernobyl...).
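The linking and cumulation that registries perform can be pictured with a small Python sketch. It joins a hypothetical occupational-exposure file with a hypothetical disease-incidence file on a shared person identifier and orders each person's events in time. The record layouts, identifiers, and sample values are invented for illustration and do not describe any actual registry.

```python
from collections import defaultdict

# Invented sample rows; "person_id" is a hypothetical shared linkage key.
exposures = [
    {"person_id": "P1", "year": 1980, "hazard": "coal dust"},
    {"person_id": "P2", "year": 1982, "hazard": "asbestos"},
]
diagnoses = [
    {"person_id": "P1", "year": 1991, "disease": "pneumoconiosis"},
    {"person_id": "P1", "year": 1995, "disease": "chronic bronchitis"},
    {"person_id": "P3", "year": 1990, "disease": "asthma"},
]


def link_registries(exposure_rows, diagnosis_rows):
    """Cumulate events from both sources into one timeline per person."""
    timeline = defaultdict(list)
    for row in exposure_rows:
        timeline[row["person_id"]].append((row["year"], "exposure", row["hazard"]))
    for row in diagnosis_rows:
        timeline[row["person_id"]].append((row["year"], "diagnosis", row["disease"]))
    # Sorting the (year, kind, detail) tuples puts each person's events in time order.
    return {person: sorted(events) for person, events in timeline.items()}


linked = link_registries(exposures, diagnoses)
```

The sketch also shows why such linkage bears on privacy: it works only because the same identifier appears in both sources, so the linked file says more about each person than either source does alone.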

A crucial function of registries and other databases can be the identifying and monitoring of health problems of minority and underserved groups, and the effectiveness of interventions.44 Special precautions may need to be taken in order to protect the identities and rights of minority data-subjects.45


(39) Stephen B. Thacker, p. 3 in "Historical development," pp. 3–17 of Steven M. Teutsch and R. Elliott Churchill, editors, Principles and Practice of Public Health Surveillance (Oxford University Press, New York and Oxford, 1994). This book, written mostly by experts at the U.S. Centers for Disease Control and Prevention, is an excellent overview.

(40) A recent "viewpoint" essay on this from the CDC was Dixie E. Snider, Jr. and Donna F. Stroup, "Defining research when it comes to public health," Public Health Reports 112, 29–32 (1997); a "counterpoint" essay was Wendy K. Mariner, "Public confidence in public health research ethics," Public Health Reports 112, 33–36 (1997).

(41) The CDC's Morbidity and Mortality Weekly Report and much related CDC information are available on the Internet at <http://www.cdc.gov/epo/mmwr/mmwr.html>.

(42) Page 188 of "Ethical issues," in Teutsch and Churchill, as cited in endnote (39).

(43) Public Health Service Act § 308(d); 42 United States Code 242m(d).

(44) U.S. Department of Health and Human Services, "Directory of Minority Health and Human Services Data Resources," prepared under U.S. Agency for Health Care Policy and Research contract No. 282-90-0031 by Moshman Associates, Inc., of Bethesda, Maryland (October 1995); published only on the Internet at <http://www.hhs.gov/progorg/aspe/minority/>.

(45) For research examples and ethical context, see Jonathan R. Sugarman, Martha Holliday, Andrew Ross, and Doni Wilder, "Improving health data among American Indians and Alaska Natives: An approach from the Pacific Northwest" and other chapters in Audrey R. Chapman, editor, Health Care and Information Ethics: Protecting Fundamental Human Rights (Sheed & Ward, Kansas City, Missouri, April 1997).

Research to Reduce Public-Health Threats

Whether or not they are to be considered "research," a classic category of investigations has to do with coping with disease outbreaks and epidemics, and with other emerging or emergency threats.

In this regard it is hard not to think again of the renowned work of the U.S. Centers for Disease Control and Prevention (CDC). The CDC regards itself mainly as a "public-health practice" agency, as differentiated from a "research" agency. The CDC is depended upon, both by the U.S. and by other countries, for quick response to disease outbreaks, whether classical "food poisoning" or rabies, or known but rare diseases (bubonic plague, yellow fever...), or new or exotic ones (Ebola...). It also charts the waves of shiftily changing influenza viruses that drift around the globe seasonally, and many other threats. It performs much work outside the U.S., and of course it cooperates with host governments and exchanges data.

A recent report on emerging infectious diseases warns of staggering problems ahead:46

Despite historical predictions to the contrary, we remain vulnerable to a wide array of new and resurgent infectious diseases. ... Our vulnerability to emerging infections was dramatically demonstrated in 1993. A once obscure intestinal parasite, Cryptosporidium, caused the largest waterborne disease outbreak ever recognized in this country; an emerging bacterial pathogen, Escherichia coli O157:H7, caused a multi-state foodborne outbreak of severe bloody diarrhea and kidney failure; and a previously unknown hantavirus, producing an often lethal lung infection, was linked to exposure to infected rodent. ...

Methicillin-resistant Staphylococcus aureus, a common cause of hospital infections, may be developing resistance to vancomycin; penicillin resistance is spreading in Streptococcus pneumoniae; cholera will likely be introduced into the Caribbean islands from the current pandemic in Latin America, and the new strain, Vibrio cholerae O139, is spreading throughout southern Asia.

To combat these as well as the many more classically known infectious diseases, a great many personally identifiable data will have to be studied, by the CDC and others. And the research will have to be truly international, as much of HIV–AIDS research is.47

Survey studies. Some important research on public-health threats involves social-scientific methods. Attitudes are surveyed, to inform public-health promotion and disease prevention campaigns. For example, the U.S. National Institute of Child Health and Human Development has conducted large, highly confidential interview surveys of adolescents' sexual attitudes and practices; so have the Centers for Disease Control and Prevention. The consent process is conducted carefully, and the promised confidentiality is guarded closely.

Efforts can be made to respect privacy during data-gathering itself. In a large adolescent health ("Add Health") study under the U.S. National Institute of Child Health and Human Development and other agencies, privacy in interviews involving potentially sensitive questions was afforded by having the adolescents self-administer survey questions via dedicated computer terminals.48


(46) U.S. Centers for Disease Control and Prevention, "Addressing emerging infectious disease threats: A prevention strategy for the United States," Preface (CDC, 1600 Clifton Road, Atlanta, Georgia 30333, 1994); available on the Internet at <http://www.cdc.gov/ncidod/publications/eid_plan/home.htm>. For general background see Institute of Medicine; Joshua Lederberg, Robert E. Shope, and Stanley C. Oakes, Jr., editors, Emerging Infections: Microbial Threats to Health in the United States (National Academy Press, Washington, DC, 1992).

(47) A gripping book not to read before trying to sleep is Laurie Garrett, The Coming Plague: Newly Emerging Diseases in a World Out of Balance (Farrar, Straus and Giroux, New York, 1994).

(48) Principal investigator, Dr. J. Richard Udry (University of North Carolina at Chapel Hill).

Research to Understand Utilization of Health Care

Many approaches are taken for studying how people avail themselves of health care, why they do or don't take various actions, and what factors relate to the behavior. In efforts to enhance women's health, for instance, records are accumulated on Pap smears, breast exams, mammographic screening, obstetric examinations during pregnancy, and countless other interactions with healthcare systems. These data can then be linked with other health data to ask evaluative questions about how well preventive actions or other interventions "work." (What was the women's subsequent health history? How predictive did the screening turn out to be? Could the techniques or frequency of the screening have been improved? What best-practice does it imply for other women?)49

Survey interviews are conducted to try to understand the attitudes that shape behavior (for example, to learn why people don't take prescribed medications faithfully). Health services research is performed on the influence of facility location, costs and cost-sharing, and countless other factors that influence use.


(49) For many illustrations of how analysis can inform health care policy and delivery see Roberta Wyn, E. Richard Brown, and Hongjian Yu, "Women's use of preventive health services," pp. 49–75, and other chapters of Marilyn M. Falik and Karen Scott Collins, editors, Women's Health: The Commonwealth Fund Survey (The Johns Hopkins University Press, Baltimore and London, 1996).

Research to Evaluate and Improve Practices

Much research is performed to evaluate public-health programs, clinical practices, and the effects of innovations.50

Outcomes research is performed to analyze "what works best," so to speak, and for what subset of persons and situations, and under what conditions, and perhaps at what costs.51 Some of this research focuses on individuals, and some on populations. It has to do with whether various preventive or other healthcare services are available, and with how effectively they are used. It examines large samples of real cases, analyzes them statistically, and structures the data in analytic frameworks. Thus outcomes research informs the planning of public-health services delivery (such as smoking-cessation programs for pregnant women), for instance, and the optimal path through branching clinical judgments (what to do for patients with gallstones...).

Clinical practice guidelines have been prepared by many governmental, professional, patient, and managed-care organizations, based on outcomes studies and other evaluative research.

Leading roles have been played by the U.S. Agency for Health Care Policy and Research, which has prepared guidelines on problems ranging from sickle cell disease, to middle ear infections in children, to prostate enlargement. The Agency continues to work to advance the methods of practice evaluation and improvement.52

Drug utilization review, the analysis and critiquing of the use of pharmaceuticals in particular clinical settings (such as the use of antibiotics before and after surgery), provides facts that, along with cost-effectiveness and other considerations, help optimize use of drugs. Such review, along with other factors, supports the development of formularies, lists of pharmaceuticals that are approved for dispensing in the institution, and recommendations for use. Such review may determine whether the drugs qualify for cost reimbursement. Analogous studies are performed on surgical techniques, anesthetics, and use of diagnostic and other services.

Follow-up tracking follows the consequences of interventions. The U.S. Food and Drug Administration requires tracking of people in whom medical devices (such as artificial joints, cardiac pacemakers, heart valves, breast implants, and testicular prostheses) have been implanted. The patients are urged to keep themselves registered. The tracking enhances the patients' care, in that it allows communication with them to advise of relevant new knowledge or warn them to see their doctor or take other protective steps. It also provides data for research on patterns of use, outcomes, costs, patient attitudes about the implants, and other factors, which helps improve the devices and their use.

Quality-of-life studies and patient-attitude surveys use interviews, focus-group discussions, and other social-scientific methods to enquire into the valuations people make of various health states and treatments.


(50) A very readable set of case essays is Howard S. Frazier and Frederick Mosteller, Medicine Worth Paying For: Assessing Medical Innovations (Harvard University Press, Cambridge, Massachusetts and London, 1995).

(51) Michael F. Drummond, Greg L. Stoddart, and George W. Torrance, Methods for the Economic Evaluation of Health Care Programs (Oxford Medical Publications, Oxford, 1992).

(52) The Guidelines and much other related information from the AHCPR are available on the Internet at <http://www.ahcpr.gov/guide/>.

Research to Make Effective Innovations

A prime example of innovation is the elaborate work of developing and improving the use of pharmaceuticals, medical devices, diagnostic instruments and tests, vaccines, and other "tools" of health care.53 After much preliminary screening, an experimental entity or procedure is subjected to a long series of clinical trials, perhaps on tens of thousands of volunteers in many countries, to evaluate its efficacy, risks, and other attributes. Refinements are made, and many evaluations are conducted. Eventually the sponsoring company or agency submits the data to government regulatory authorities—in the U.S., the Food and Drug Administration (FDA)—to be reviewed for licensing.

In this process huge quantities of personally identifiable data are amassed. The raw data are collected in clinical settings. Then the data are transferred by the physician–investigators to the sponsor, usually assigning a key-code pseudonym to the data first, which allows tracing-back to the subject via the physician. The sponsor analyzes and prepares the key-coded data for regulatory submission, and transfers the data, still (or re-) key-coded, to the regulatory agency for review. All of this is conducted under international guidelines of good clinical practice, and under the various national human-subjects regulations and regulatory statutes. In the U.S. the confidentiality of these data is covered by regulatory controls and protected by the Federal Privacy Act.

The FDA audits selected trials, usually ones that are considered pivotal to the regulatory decisions. In 1995, for instance, specially credentialed inspectors from its Center for Drug Evaluation and Research conducted over 400 inspections, in each audit reviewing all of the subjects' records including the consent forms and IRB records. If they must take photocopies of personal data away from the site, they first remove the identifiers. They conduct inspections of sites in other countries, through arrangements made by the product sponsors.

After an innovation becomes licensed for general use, research continues. In pharmacovigilance, the company and regulators watch for previously unknown effects of drugs, and respond quickly to spontaneous adverse-event reports communicated by doctors, patients, or others. These data usually are identified by the patient's initials and physician's name. To allow tracing-back to the patient, identifiability (at least key-coded, at least back to the physician) must be preserved, in case it becomes scientifically necessary to review the full medical record and circumstances. The FDA "MedWatch" program, which collects the adverse-event reports for drugs and medical devices, received around 170,000 reports last year. MedWatch shares the identity of the reporting physicians with the manufacturers unless a physician requests it not to. After extensive evaluation these data inform decisions about keeping the product on the market, or revising uses, formulations, dosing, route of administration, labels, or packaging.

Similarly, a "Vaccine Adverse Event Reporting System" (VAERS), administered jointly by the FDA and the Centers for Disease Control and Prevention, collects and analyzes reports on vaccines, again keeping the patients' identity confidential. A prime focus is vaccination of children, to provide feedback that helps improve the vaccines as preventive tools, and to guide public-health missions in ensuring high rates of vaccination. The data are made available for research, after all identifying information is removed.54,55

Postmarketing surveillance may be carried out for a variety of purposes, to learn more about the innovation's medical effects, both beneficial and harmful, as it is tested by wider natural experience. Pharmacoepidemiology, "the study of the use of and the effects of drugs in large numbers of people," is a prime tool in postmarketing research on medicines.56 Similar techniques apply to surgery and other interventions.


(53) Bert Spilker, Guide to Clinical Trials (Raven Press, New York, 1991).

(54) Robert T. Chen, Suresh C. Rastogi, John R. Mullen, Scott W. Hayes, Stephen L. Cochi, Jerome A. Donlon, and Steven G. Wassilak, "The Vaccine Adverse Event Reporting System (VAERS)," Vaccine 12, 542–551 (1994). Information on VAERS is available on the Internet at <http://www.fda.gov/cber/vaerstxt.html>.

(55) For general background on data needed for analyzing vaccine effects, see Institute of Medicine, Vaccine Safety Committee; Kathleen R. Stratton, Cynthia J. Howe, and Richard B. Johnston, Jr., editors, Adverse Events Associated with Childhood Vaccines: Evidence Bearing on Causality (National Academy Press, Washington, DC, 1993).

(56) Brian L. Strom, p. 3 of "What is pharmacoepidemiology?" pp. 1–13 of Strom, as cited in endnote (37).

Research to Analyze Economic Factors

As every newspaper reader is aware, every aspect of health care now is being subjected to economic analysis—to size up the costs of illness and costs in specific episodes of care, evaluate cost-effectiveness of different interventions, see what effects various cost-related incentives have, and understand the component costs in healthcare systems. Virtually every healthcare institution and payor is performing such analyses.

Much of this economic research can be performed on anonymized/aggregated or key-coded data, but detailed analyses of individual patient experiences and the costs incurred may require the examination of personally identifiable data. Naturally much of such research draws at least partly on data in the large databases of healthcare payors.57,58


(57) Marthe R. Gold, Joanna E. Siegel, Louise B. Russell, and Milton C. Weinstein, editors, Cost-Effectiveness in Health and Medicine (Oxford University Press, New York, 1996).

(58) Laura A. Genduso and James G. Kotsanos, "Review of health economic guidelines in the form of regulations, principles, policies, and positions," Drug Information Journal 30, 1003–1016 (1996).

Research to Appraise Markets

Research on healthcare markets has to be noted here because often such market research now is being performed by, or for, units of organizations that have access to personal data collected for clinical research or disease management.

Obvious examples are the pharmaceutical enterprises, which analyze patterns and future projections of disease, the prescribing patterns of physicians and healthcare organizations, and the economic market for their current products and those under development. Much of this is no different from the market research performed by all businesses. Some is conducted by service companies that gather data, such as dispensing data from pharmacies, and convey them, in nonidentified form, to the drug companies. But what is special is that parts of the large firms may potentially have access, through the information amassed in their main innovative R&D research, to personally identifiable health data.

Several business developments of the past few years have raised this issue. Large research-based pharmaceutical firms have merged with large, highly computerized pharmacy supply companies to form pharmacy-benefit management businesses. Pharmaceutical companies have formed disease-management businesses (that is, supplying services, under contract, directly to patients such as diabetics). And pharmaceutical firms have acquired or formed networked physician informatics businesses. All of these activities are bringing these large companies much closer to patient care, and to large volumes of patient data.

May the commercial divisions of these companies access the personal healthcare data (such as the diabetics' data) for market research, or even carry out direct marketing to patients? Inversely, may the traditional R&D units (those that develop drugs, devices, and diagnostics) access the personal data collected by the affiliated pharmacy-benefit or disease-management businesses, to profile users of products or perform outcome studies or other analyses?

The temptations are obvious. A recent report from a panel of the National Research Council observed: "In many of these cases, specific agreements have been established to limit data sharing among affiliated companies, but the complex overlaps make security more difficult to ensure."59 Are effective barriers in place among these activities to protect the data-subjects? The companies should be urged to attend carefully to these matters.

(Also, commercial units of companies conduct product-acceptance research on health-related products, to gauge potential customers' opinions of product design, convenience, cost, packaging, and information. But this is little different from product research in other industries, except if the survey participants are identified as having particular diseases or disabilities. Informed consent and promises of nondisclosure can be incorporated.)


(59) NRC, Computer Science and Telecommunications Board, as cited in endnote (9).

4. Identifiability, Consent, and Protections

One of the most reassuring things a research organization can say with respect to the privacy of the people whose health data it is studying, is: "We don't know the personal identity of our data-subjects; and we really don't want to know."60

This should not necessarily mean that no-one can trace back to the data-subject if scientific reasons require it. But such tracing-back should itself be at least a small project, the difficulty of which should be scaled to suit the situation.


(60) As many researchers said to the author in interviews during this study.

Identifiable---Key-Coded---Anonymized

From a privacy-protection perspective, there is a very wide distinction between personally identifiable data and truly anonymized data. But in practice the demarcation between these extremes is not sharp. Attending assiduously to where particular data lie on the spectrum between them, and especially to data that are somewhere in the middle, is a crucial protection strategy.

At present, large amounts of data lie in between: they are not completely anonymized, but they are not readily identified, either. It is routine to decrease identifiability by assigning to data a pseudonym made up of numbers and/or letters. But if, for instance, the overall data category is known (say, epilepsy among men in a certain district) and the data are coded by, say, simply personal initials and birthdate, it may not be difficult to deduce who the data-subject is. The capacity of computers to perform elaborate, rapid searches, and the pressures for access, mean that merely assigning simple pseudonyms affords little protection.
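The weakness of a simple pseudonym can be illustrated with a rough sketch. It counts how many records in a known data category are singled out by a combination of coded fields. The field names, the made-up records, and the function name `singled_out` are assumptions for illustration only, and a real assessment would also have to consider outside information available to a searcher.

```python
from collections import Counter

def singled_out(records, coded_fields):
    """Return the records whose combination of coded-field values is unique."""
    def key(record):
        return tuple(record[field] for field in coded_fields)
    counts = Counter(key(record) for record in records)
    return [record for record in records if counts[key(record)] == 1]

# Made-up records for one known category (say, one diagnosis in one district).
cases = [
    {"initials": "JS", "birthdate": "1950-06-15"},
    {"initials": "JS", "birthdate": "1950-06-15"},
    {"initials": "MK", "birthdate": "1962-01-30"},
    {"initials": "MK", "birthdate": "1971-11-02"},
]

# Initials alone single out nobody here; initials plus birthdate expose two people.
print(len(singled_out(cases, ("initials",))))
print(len(singled_out(cases, ("initials", "birthdate"))))
```

Any record returned by such a check is, within that category, effectively labelled with a personal identifier rather than a pseudonym.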

For data whose identifiability has, up to now, been only lightly obscured, greater efforts now must be made either: (a) to much more effectively remove personally identifying information, or to aggregate, and thus anonymize, the data; or (b) to seek the data-subjects' informed consent and hold the data under a suitably protective regimen if identifiability is retained.

For key-coded data—that is, data for which personal identifiers are removed and secreted but which are still potentially traceable via a matching code, held separately—a variety of measures must be taken to mask the identifiability near the source, separate and lock up the identifiers, safeguard the linking codes, and carefully manage linking-back to the data-subject when it is required.

Institutions should clearly articulate their policies on use or sharing of personally identifiable data. An example of such a policy statement is this guidance by the U.S. Office for Protection from Research Risks, on HIV studies:61

Where identifiers are not required by the design of the study, they are not to be recorded. If identifiers are recorded, they should be separated, if possible, from data and stored separately, with linkage restored only when necessary to conduct the research. No lists should be retained identifying those who elected not to participate. Participants should be given a fair, clear explanation of how information about them will be handled. ...

As a general principle, information is not to be disclosed without the subject's consent. The protocol must clearly state who is entitled to see records with identifiers, both within and outside the project.


(61) "Dear Colleague Letter," OPRR Reports, p. 3 (December 26, 1984).

Anonymized Data

Much very useful health research is performed on completely anonymized data. If for a particular research project there are no compelling reasons for retaining at least potential identifiability, anonymized data should be used. Though this injunction might sound unnecessary, it is stated here because often, data with identifiers are used just because they happen already to be on hand in identified form.

Data may be non-identifiable if any of the following tactics have been employed:

  • Identifiers simply have never been collected.
  • Identifiers have been removed ("stripped") effectively.
  • Data have been aggregated—that is, within each data sub-element the data have been averaged or grouped into ranges, and only the averages or ranges reported, not revealing the identity of the data-subjects.
  • Data have been "micro-aggregated," with small randomly assembled clusters of cases averaged, in effect generating a set of pseudo-cases that represent the real population.62
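The last tactic can be sketched roughly as follows. This is a minimal illustration, assuming a single numeric variable and a fixed cluster size; it is not the procedure described in the cited paper, and the function name `micro_aggregate` and its parameters are hypothetical.

```python
import random
from statistics import mean

def micro_aggregate(values, cluster_size=3, seed=None):
    """Average small, randomly assembled clusters of cases into pseudo-cases."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    clusters = [shuffled[i:i + cluster_size]
                for i in range(0, len(shuffled), cluster_size)]
    # Fold an undersized final cluster into its neighbour so that no
    # pseudo-case is built from fewer than cluster_size real cases.
    if len(clusters) > 1 and len(clusters[-1]) < cluster_size:
        clusters[-2].extend(clusters.pop())
    return [mean(cluster) for cluster in clusters]

# Made-up lengths of hospital stay, in days.
stays = [2, 3, 3, 4, 5, 7, 9]
pseudo_cases = micro_aggregate(stays, cluster_size=3, seed=0)
print(pseudo_cases)  # two averaged pseudo-cases instead of seven real ones
```

When the clusters are of equal size, the overall mean of the pseudo-cases equals that of the real cases, while no single real value is released.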

The test of whether data actually are non-identifiable is whether a person without prior knowledge of the data or their collection can, from the data and any other available information (such as postal-code charts, or a casually-held key to a code, or a list of the people recruited to the study), deduce the personal identity of data-subjects.

In an area in which the issue is highly contentious, a "consensus statement" from a workshop on genetic research on stored human tissue samples stated emphatically:63

Samples are anonymous if and only if it is impossible under any circumstances to identify the individual source. At present, in settings such as those involving large population groups, it may be possible to ensure anonymity while retaining some information about the individual source, such as ethnic origin, sex, age cohort, or limited clinical data, with the sample. In other settings, such as DNA samples obtained from a small group of individuals at risk for a specific disorder, retention of additional information may compromise anonymity. Samples are not anonymous if it is possible for any person to link the sample with its source. Even if the researcher cannot identify the source of the tissue, the samples are not anonymous if some other individual or institution has the ability.

If data must be transformed before being released for research—whether into irreversibly anonymized or into key-coded form—characteristics that might indirectly lead to identification of the data-subject should be obscured, blurred, or masked. Residential addresses can be translated into regions. Since some postal zones may be sparsely populated or have a distinctive cast of inhabitants, postal-code identifiers might be avoided. Instead of birthdate, perhaps age, or age brackets, can be used. Instead of the exact number of beds in nursing homes, capacity categories can be used. And personal initials are personal.
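A minimal sketch of such blurring is given below. The postal-code lookup, the bed-count cut-points, the field names, and the function names are all hypothetical; suitable regions and brackets would have to be chosen for the population actually studied.

```python
from datetime import date

# Hypothetical lookup from postal code to a broader region.
REGION_OF_POSTAL_CODE = {"1201": "Region A", "1202": "Region A", "3000": "Region B"}

def age_bracket(birthdate, on_date, width=10):
    """Replace a birthdate by an age bracket such as '40-49'."""
    had_birthday = (on_date.month, on_date.day) >= (birthdate.month, birthdate.day)
    age = on_date.year - birthdate.year - (0 if had_birthday else 1)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def capacity_category(beds):
    """Replace an exact bed count by a capacity category (cut-points assumed)."""
    if beds < 50:
        return "small"
    if beds < 150:
        return "medium"
    return "large"

def blur_record(record, on_date):
    """Keep the research content; drop initials, postal code, and birthdate."""
    return {
        "region": REGION_OF_POSTAL_CODE.get(record["postal_code"], "Other"),
        "age_bracket": age_bracket(record["birthdate"], on_date),
        "diagnosis": record["diagnosis"],
    }

# A made-up record.
raw = {"initials": "JS", "postal_code": "1201",
       "birthdate": date(1950, 6, 15), "diagnosis": "asthma"}
print(blur_record(raw, on_date=date(1997, 5, 1)))
```

The output record carries no initials, postal code, or birthdate; how coarse the regions and brackets must be depends on the sample, as the next paragraph notes.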

The extent to which any transformations are employed should be scaled to the characteristics of the sample and the population of which it is a subset, the potential risks to the data-subjects, the subjects' expectations, and other factors.

Many technical methods of "disclosure limitation" can be applied to make deductive identification of data-subjects difficult, if not impossible. In population studies, for instance, only relatively small proportions of the populations can be sampled. For surveys, only a randomly selected subset of the responses might be released instead of all of the responses, to obviate guessing, by elimination, who said what. And so on.64


(62) Alexander M. Walker, "Generic data," Pharmacoepidemiology and Drug Safety 4, 265–267 (1995).

(63) Page 1787 of Ellen Wright Clayton, Karen K. Steinberg, Muin J. Khoury, Elizabeth Thomson, Lori Andrews, Mary Jo Ellis Kahn, Loretta M. Kopelman, and Joan O. Weiss, "Informed consent for genetic research on stored tissue samples," Journal of the American Medical Association 274, 1786–1792 (1995).

(64) National Research Council, Panel on Confidentiality and Access of the Committee on National Statistics, and the Social Sciences Research Council; George T. Duncan, Thomas B. Jabine, and Virginia de Wolf, editors, Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics (National Academy Press, Washington, DC, 1993).

Reasons for Retaining Identifiability

For many purposes researchers must keep the ability to trace back, even if through intermediaries, to the data-subjects. Irreversible anonymization is not necessarily desirable.

There are a number of important reasons why retaining personal identifiability—either openly labelled or via key-coding—may be essential:

  • To allow technical validation of reports, such as to confirm the correspondence of various data with the data-subjects, or even to verify the very existence and identity of subjects, in order to prevent scientific errors or fraud.
  • To avoid duplicate records or redundant cases, such as to be certain that two case reports are independent and not just the same case recorded in two files.
  • To facilitate internal scientific data-quality control, such as enabling working-back to original records and ancillary data.
  • To allow case follow-up if more evidence or confirmation is needed.
  • To check data-subject consent records, or to examine Institutional Review Board stipulations or opinions on a case.
  • To allow tracking of consequences after some research intervention, to be able later, if necessary, to notify the patient or physician and recommend reexamination or other measures in-between research and health care.
  • To ensure accurate correspondence in linking data on data-subjects, or cases, or groups, or specimens, among different files or databases, perhaps over a long period, even over decades, and possibly to follow-on to descendants.

One of the clearest examples of the need to retain potential identifiability is the analysis of pharmaceutical and medical-device side-effect risks. As was mentioned above, the U.S. Food and Drug Administration, like all regulatory authorities, properly requires that data-links to the patient record be maintained (usually through the data-subject's physician) so that adverse-drug-event reports, sent in by physicians, the public, or manufacturers, can be verified and scrutinized in clinical detail if necessary.

Managing Key-Coded Data

Because irreversible anonymization often is undesirable on scientific grounds, the procedures and methods of key-coding of various forms are essential techniques. Some of the practices are very technical. Degree of key-coding or "masking" is relative. It is a question of the extent to which personal identifiability is obscured—which is to say, the impedance against "cracking" of the code and matching the data with the data-subjects.

U.S. agencies, such as the National Heart, Lung, and Blood Institute (NHLBI), emphasize that the first step in protecting personally identifiable data is simply to hold the identifiers close to the point of collection. Before transferring data to other researchers, then, the data should be stripped of identifiers and either key-coded or anonymized. When the Institute sends data to pharmaceutical companies from clinical trials on an investigational new drug, it strips off not only the patient and physician name but location, birthdate, and other data that could point back to the data-subject. It takes similar care when it correlates data from several sources, as when it links heart disease data with socioeconomic data.
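The stripping and key-coding step can be sketched as follows. This is an illustrative sketch only, not the procedure of any agency named here; the function names, field names, and made-up records are assumptions. Random codes are used so that the code itself carries no personal information.

```python
import secrets

def key_code(records, identifier_fields):
    """Split records into key-coded data and a linking table to be held separately."""
    coded_records, linking_table = [], {}
    for record in records:
        code = secrets.token_hex(8)      # random code, unrelated to the person
        while code in linking_table:     # collision is very unlikely; retry if so
            code = secrets.token_hex(8)
        linking_table[code] = {f: record[f] for f in identifier_fields if f in record}
        data = {f: v for f, v in record.items() if f not in identifier_fields}
        data["code"] = code
        coded_records.append(data)
    return coded_records, linking_table

def trace_back(code, linking_table):
    """Recover the identifiers for one code; only the key-holder can do this."""
    return linking_table[code]

# Made-up records held at the point of collection.
patients = [
    {"name": "A. Example", "birthdate": "1950-06-15", "diagnosis": "epilepsy"},
    {"name": "B. Sample", "birthdate": "1962-01-30", "diagnosis": "asthma"},
]
coded, table = key_code(patients, identifier_fields={"name", "birthdate"})
print(coded)  # safe to transfer; `table` stays locked up near the source
```

Only the coded records would be transferred. The linking table stays with the collector or a trusted intermediary, which is what makes later tracing-back a deliberate act.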

Simply designating a reliable person within the research organization to be responsible for stripping identifiers—and formally certifying to the principal investigator and/or an administrator that the resulting set of stripped data is nonidentifiable—can be prudent.

Trusted intermediary organizations, such as public accounting or consulting firms, may be asked to remove identifiers, and perhaps to hold the key linking data with identifiers. For a detailed national analysis of hospital costs based on data provided by the States, the U.S. Agency for Health Care Policy and Research arranged for an intermediary organization to remove identifying information from the patient data, and also information that might identify the hospitals, before the Agency received the data.

In its alcohol-related studies, which may be painfully sensitive for the people studied, the U.S. National Institute on Alcohol Abuse and Alcoholism assigns pseudonym (key-coded) identifiers to all subjects and has the key held securely by an independent third party.

The U.S. National Institute of Child Health and Human Development (NICHD) requires that if researchers wish to perform a secondary study on data originally collected by other investigators under an NICHD grant, they must pay a fee to the original researchers to key-code the identifiers and take other protective steps before transferring the data for the secondary study.

The following example illustrates a rigorous approach to separating identifiers from data but retaining the ability to reconnect them if necessary. In several states of Germany an elaborate system is being tested for population-based cancer registries.65 A "trusted office" (Vertrauensstelle), directed by a physician, receives cancer case data from doctors and hospitals, classifies the cases as to type of tumor and so on, and, using cryptographic procedures, assigns pseudonyms, separating the case data from the person-identifying data. Then, using a secure system, it transfers the pseudonymized data to a separately located "registration office" (Registerstelle), which stores the data securely. After a short time the "trusted office" destroys its set of the data. Again separately, a master re-identification key is held by a "supervisory office." The "registration office" cannot match identifiers to the cases it stores. If, later, it becomes scientifically necessary to trace back to the patient's physician to obtain more information, with the approval of an ethics committee the supervising office can use its re-identification key to reassociate the case data with the identifying data. The system has been endorsed in the relevant laws. Whether such a system will be widely applicable is not yet clear; but such approaches deserve to be evaluated.
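The division of roles in such a scheme can be sketched in simplified form. The sketch below is not the cryptographic procedure of the German registries, which is described in the cited paper. It assumes, purely for illustration, that pseudonyms are keyed hashes (HMAC-SHA-256) and that the supervisory office holds a plain lookup table as its re-identification key. All class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets

class SupervisoryOffice:
    """Holds the re-identification key; releases an identity only with approval."""
    def __init__(self):
        self._reidentification_key = {}

    def deposit(self, pseudonym, identity):
        self._reidentification_key[pseudonym] = identity

    def reidentify(self, pseudonym, ethics_committee_approved):
        if not ethics_committee_approved:
            raise PermissionError("ethics committee approval is required")
        return self._reidentification_key[pseudonym]

class RegistrationOffice:
    """Stores pseudonymized case data; it never sees identifying data."""
    def __init__(self):
        self.cases = {}

    def store(self, pseudonym, case_data):
        self.cases.setdefault(pseudonym, []).append(case_data)

class TrustedOffice:
    """Receives identified reports, assigns pseudonyms, and keeps no copy."""
    def __init__(self, registration_office, supervisory_office):
        self._secret = secrets.token_bytes(32)   # known only to this office
        self._registration_office = registration_office
        self._supervisory_office = supervisory_office

    def report_case(self, identity, case_data):
        pseudonym = hmac.new(self._secret, identity.encode("utf-8"),
                             hashlib.sha256).hexdigest()
        self._registration_office.store(pseudonym, case_data)
        self._supervisory_office.deposit(pseudonym, identity)
        return pseudonym   # nothing about the case is retained here

# Made-up identity and case data.
supervisor = SupervisoryOffice()
registry = RegistrationOffice()
trusted = TrustedOffice(registry, supervisor)
p = trusted.report_case("Jane Doe|1950-06-15", {"tumor_type": "breast"})
print(registry.cases[p])
```

Because the same identity always yields the same pseudonym, the registration office can recognize repeated reports on one patient without learning who the patient is.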


(65) K. Pommerening, M. Miller, I. Schmidtmann, and J. Michaelis, "Pseudonyms for cancer registries," Methods of Information in Medicine 35, 112–121 (1996).

U.S. Subjects-Protection Policy

A uniform "Federal Policy for the Protection of Human Subjects," often called the "Federal Common Rule," is promulgated by sixteen Federal agencies that conduct, support, or regulate research. It governs such matters as subject rights, informed consent, Institutional Review Boards, disclosure policy, recordkeeping, and a variety of other matters.66

The Office for Protection from Research Risks (OPRR), in the National Institutes of Health, serves as a resource and makes certain that the Federal Common Rule is implemented. Research institutions, such as academic medical centers, which wish to perform research on humans under Federal funding or other Federal auspices must negotiate and enter into a formal "Assurance" with OPRR stipulating the overall means by which the institution will protect subjects and designating an officer responsible for being sure the protections are implemented.67

Initial questions of any investigatory activity are: Is it "research"? Who are "subjects"? According to the Federal Common Rule, research is "a systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge" [italics added] (§_.102(d)). Human subjects are "living individual(s) about whom an investigator (whether professional or student) conducting research obtains (1) data through intervention or interaction with the individual, or (2) identifiable private information" (§_.102(f)). Definitions such as these are not mere exercises. Rather, they determine how particular investigatory activities must be approached, whether they fall under Federal scrutiny, and whether they must be supervised by an Institutional Review Board.

The Federal Common Rule exempts from its IRB and other requirements "research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens... if the information is recorded by the investigator in such a manner that subjects cannot be identified, directly or through identifiers linked to the subjects" (§_.101(b)(4)). This is being reconsidered with respect to genetic analysis of stored tissues, now that genetic mapping techniques have become so revealing.

Beyond the provisions of the Federal Common Rule, many additional regulations in various agencies cover aspects of research involving ionizing radiation, research on high-risk biological agents, alcohol and drug-abuse research, research classified under national security regimens, and other special circumstances.


(66) The basic policy for the Department of Health and Human Services is codified at 45 Code of Federal Regulations 46, subpart A. Other subparts cover protections for fetuses and pregnant women (subpart B), prisoners (subpart C), and children (subpart D), and deal with such issues as in-vitro fertilization (subpart B). The Rule's generic provisions are designated in the form "§_.000," with the generic rules given to the right of the decimal, after the regulation part number assigned by the particular agency. The Federal Common Rule with its preamble reasoning was published in Federal Register 56, 28002–28032 (1991).

Food and Drug Administration regulations appear at 21 Code of Federal Regulations 50 (informed consent) and 56 (Institutional Review Boards), and elsewhere.

(67) Aspects of practice, regulation, and ethical principle are helpfully woven together in Office for Protection from Research Risks, U.S. National Institutes of Health, Protecting Human Research Subjects: Institutional Review Board Guidebook (U.S. National Institutes of Health, Bethesda, Maryland, 1993, with later addenda).

Data-Subject Consent

A universally endorsed ethical precept is that it is permissible to collect and use personally identifiable data, if the data-subject agrees to the conditions of data protection and use. The ideal is prior, informed, freely granted, specific consent. Researchers strive for this to varying degrees, and achieve it to varying degrees.68,69

Informing for consent almost always must include telling the prospective subject "the extent, if any, to which confidentiality of records identifying the subject will be maintained."70

For formal clinical trials and much other research, informed consent is routinely sought, Institutional Review Boards supervise the research protocols, and so on. But for many other kinds of research, for a variety of reasons, notice is not routinely given nor explicit consent sought, and indeed consent may be practically impossible to seek. The policy and pragmatic questions are obvious. For example, retrospective studies, such as epidemiological reviews initiated years after the medical events, pose special problems. How should identifiability and consent be dealt with when such reviews are undertaken? And in general when data are collected, how broad a consent should be sought for future studies that cannot be specifically anticipated? Important issues are the granting of consent for studies in large multipurpose databases, or retrospective secondary research (discussed in Chapter 7). How meaningful and sufficient is omnibus, indefinite consent?

Alas, as the ethicist Ruth Faden has rightly lamented:71

As a practical matter, how much moral weight the typical consent to access information can bear is dubious. The catchall phrases in the waivers and disclosure statements read and signed by patients and consumers—"Your records will be kept confidential and not be made available, except for statistical purposes," "except for research purposes," and "except for administrative purposes"—are doubtless not very meaningful to most people.

Unless we do a good job of soliciting genuine informed consent or conducting an extraordinarily public education and exchange to provide citizens with an understanding of who now has information and for what purposes, getting consent will not get us off the moral hook.

A real-world example is called for here, just to indicate the complications. From the front line of health social work, Jeanette Davidson and Tim Davidson have brought this sobering message, as relevant for research as for the provision of care:72

With managed care systems the reality is often that the name of the individual or organization receiving the disclosure may change without notice, the information to be disclosed may consist of a verbatim account of the client's most sensitive information given to persuade a gatekeeper to continue to authorize services, and the statement about the client's being able to revoke consent at any time is an illusory proposition given the virtual irretrievability of electronic transmissions of data that are stored in various locations.


(68) A classic source is Ruth R. Faden and Thomas L. Beauchamp, A History and Theory of Informed Consent (Oxford University Press, New York, 1986).

(69) A three-year grant program "to stimulate investigations into the informed consent process in scientific research" (RFA OD-97-001) was announced recently: NIH Guide to Grants and Contracts 25, No. 32 (September 27, 1996).

(70) 45 Code of Federal Regulations 46.116(a)(5), the Federal Common Rule's informed consent requirements.

(71) Ruth Faden, p. 12 of "Keynote speech," Proceedings, "Conference on Health Records: Social Needs and Personal Privacy," sponsored by the Department of Health and Human Services (U.S. Government Printing Office, Washington, DC, 1993).

(72) Jeanette R. Davidson and Tim Davidson, p. 212 of "Confidentiality and managed care: Ethical and legal concerns," Health & Social Work 21, 208–215 (1996).

Institutional Review Board Supervision

External ethical oversight provides additional protection for research subjects. Prime examples in the U.S. are the Institutional Review Boards (IRBs) that supervise human-subjects research conducted under Federal jurisdiction, which is very broad. IRBs are carefully constituted boards that conduct independent oversight of research.73

The IRB is an administrative body established to protect the rights and welfare of human research subjects recruited to participate in research activities conducted under the auspices of the institution with which it is affiliated. The IRB has the authority to approve, require modifications in, or disapprove all research activities that fall within its jurisdiction as specified by federal regulations and local institutional policy.

In the U.S. a research institution must have in place a properly constituted and functioning IRB to be eligible to receive Federal funding for research on humans. Some institutions pledge all of their research, regardless of the source of funding, to the standards of the Federal Common Rule. All Federal agencies conducting research on humans operate under IRBs; the Centers for Disease Control and Prevention has six IRBs, and each of the seventeen Institutes of the National Institutes of Health has at least one. At present some 3,500 IRBs are in operation in the U.S.74

No doubt different IRBs, in practice, deliver differing degrees of supervision (and thereby, protection), depending on their capabilities and how hard they apply themselves. Some Federal programs review IRB performance; others don't. The Food and Drug Administration, in its routine audits, reviews whether data submitted in the regulation of drugs, medical devices, and so on have been gathered and protected in conformance with the pertinent local IRB stipulations on the particular research protocol. Moreover, each year it inspects the work of several hundred IRBs as to adequacy of structure and performance.

"Evaluation of the risk/benefit ratio is the major ethical judgment that IRBs must make in reviewing research protocols," the OPRR Guidebook emphasizes.75 "Risks to research subjects posed by participation in research should be justified by the anticipated benefits to the subjects or society."

Importantly, the Guidebook states: "A risk is minimal where the probability and magnitude of harm or discomfort anticipated in the proposed research are not greater, in and of themselves, than those ordinarily encountered in daily life or in the performance of routine physical or psychological examinations or tests."76 Rationales of this kind are often invoked in judgments regarding design of research protocols and access to personal data in databases. They deserve elaboration now, to cope more fully with privacy risks in addition to physical and emotional risks.

For research conducted outside the U.S., the Federal Common Rule allows "Department or Agency heads to determine that the procedures prescribed by the institution afford protections that are at least equivalent" and allow substitution of the foreign procedures (§_.001(6)(h)).

Some private-sector institutions, such as managed-care organizations, have established IRBs that function similarly. This is becoming even more desirable now as more research is being performed on data from mixed sources, such as pooled or comparative data from private-sector managed-care organizations and government healthcare payors.

Similar criteria and systems of external oversight are operative in most European countries, and elsewhere.

For some kinds of research, especially perhaps for some database research for which highly dependable protections can be assured, a specially constituted national-level IRB might be workable. Some precedent might be seen in the ethics reviews that were conducted by the now disbanded Recombinant-DNA Advisory Committee. In Europe there has been some experience with multi-country IRBs for clinical trials.

There is no doubt that IRBs enhance research-subject protections and provide much public reassurance. They are an integral part of biomedical research. But it is less clear that IRBs have been attending as vigorously to privacy risks as they have to physical and emotional risks. For many IRBs the workload already is heavy. Now they may well have to be asked to become more deeply engaged with the privacy and confidentiality aspects of subject protection than they have been, in database research as well as in direct experimentation, and with genetic privacy. Whether they are able and willing to do so should be assessed.77


(73) OPRR Guidebook, as cited in endnote (67).

(74) Gary B. Ellis, remarks to the National Bioethics Advisory Commission (October 4, 1996).

(75) OPRR Guidebook, as cited in endnote (67).

(76) OPRR Guidebook, as cited in endnote (67). This is Federal Common Rule §_.002(i).

(77) A recent critique is Harold Edgar and David J. Rothman, "The Institutional Review Board and beyond: Future challenges to the ethics of human experimentation," Milbank Quarterly 73, 489–506 (1995).

5. New Laws in Europe

In contrast to the U.S., most European countries have for some years had in effect broad data-protection laws, based on human rights principles. All focus on personally identifiable data. Most deal with legitimacy of need-to-know; with notification of data-subjects, and consent; with data-subject rights, such as the right to examine data about oneself; with data security; and so on. And they establish remedies and sanctions against violations.78

Usually the laws are administered through independent national "data protection commissions" or "registrars." These bodies investigate complaints, critique the privacy implications of government programs, mediate privacy disputes, perhaps audit organizations' privacy protections, and represent the country's privacy interests internationally.79 In some countries, such as Germany, provincial, in addition to federal, data-protection laws and agencies also are important. (Australia, New Zealand, Canada and several of its provinces, South Africa, and Japan also have active data privacy laws and agencies.) Again: The U.S. has no equivalent bodies.

In Europe sensitivities about health data run very high. National healthcare systems of course process huge volumes of data about individuals. In Europe medical data increasingly are being processed via electronic media. Electronic "smart cards" are being tried for medical billing (in Germany) or to carry some health data (in France), but progress is slow, because of both medical objections and privacy concerns. A pan-European "electronic health passport" has been proposed which would carry at least emergency medical information such as blood type and allergy information, but movement toward such a system has met with much opposition on privacy grounds. In France the Health Ministry has announced that by 1999 doctors must submit all of their bills electronically; but the medical establishment is resisting. In the U.K., communication of medical data via a new "NHS-Net" Internet service has been promoted by the National Health Service (NHS); but protests by both doctors and the public, largely over security and confidentiality, have forced a standoff, which has not yet been resolved.

In the past few years most legislatures have been readdressing the issues of informational privacy, especially with respect to data processed electronically. Several have adopted, or are currently considering proposals for, new laws covering health data. Now the issues have gained Europe-wide dimensions. All of this has implications for the U.S. and other countries outside Europe.


(78) For comparative analysis see Colin J. Bennett, Regulating Privacy: Data Protection and Public Policy in Europe and the United States (Cornell University Press, Ithaca, New York, 1992).

(79) For background from the view of a privacy commissioner, see Flaherty, as cited in endnote (1).

The European Union Data Privacy Directive

On October 24, 1995, after five years of deliberation, the European Parliament and the Council of the European Union (E.U.) adopted a "Directive on the Protection of Individuals with Regard to the Processing of Personal Data and on the Free Movement of Such Data" (hereafter, Directive).80

The Directive is extremely broad, covering the processing of all information about individuals. Its dual purposes are aptly expressed in its title. It is not specifically oriented to health data, although at a few points it makes reference to public health and medical data. If enforced literally some of its provisions could be inimical to health research.81

The Directive is a "framework directive" establishing general principles, with which the fifteen E.U. Member States must bring their national "laws, regulations and administrative provisions" into congruence by October 1998 (Article 32).82

"Personal data" and "processing" are defined comprehensively (Article 2).

(a) "Personal data" shall mean any information relating to an identified or identifiable natural person ("data subject"); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity.

(b) "Processing of personal data" ("processing") shall mean any operation or set of operations which is performed upon personal data, whether or not by automatic means, such as collection, recording, organization, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, blocking, erasure or destruction.


(80) "Directive 95/46/EC of the European Parliament and of the Council," Official Journal of the European Communities No. L 281, 31–50 (November 23, 1995). Available on the Internet in English, Dutch, French, German, Italian, and Spanish via the European Union Web site <http://www2.echo.lu>.

(81) A useful early review was Stefaan Callens, "The Privacy Directive and the use of medical data for research purposes," European Journal of Health Law 2, 309–340 (1995).

(82) The Member States of the E.U. are Austria, Belgium, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, The Netherlands, Portugal, Spain, Sweden, and the United Kingdom.

Elements of the Directive

The Directive does not restrict the processing of data which are not personally identifiable. But for the processing of those that are, consent from the data-subject generally is required.

Article 7 stipulates that "Member States shall provide that personal data may be processed only if:

(a) the data subject has unambiguously given his consent; or
(b) processing is necessary for the performance of a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract; or
(c) processing is necessary for compliance with a legal obligation to which the controller is subject; or
(d) processing is necessary for protecting the vital interests of the data subject; or
(e) processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller or in a third party to whom the data are disclosed; or
(f) [some other circumstances apply]."

The data "controller" is "the natural or legal person, public authority, agency or any other body which alone or jointly with others determines the purposes and means of the processing of personal data" (Article 2(d)).

As for consent, Article 2(h) defines it broadly but firmly:

"The data subject's consent" shall mean any freely given specific and informed indication of his wishes by which the data subject signifies his agreement to personal data relating to him being processed.

Notice that the consent is to be "specific and informed." If applied literally, for some secondary research this would require solicitation of more-focused consent than is now sought.

The exception for "performance of contracts" presumably would apply to healthcare agreements between care-providers and patients. (But does this assume that consent is implicit, or waived? Consent to what?) The exception for "protecting the vital interests of the data subject" presumably would apply to emergency medical treatment and some other situations where consent is not feasible. Tasks "carried out in the public interest" are treated further in Article 8 (see below).

The Directive addresses data-quality issues (Article 6), such as requiring that "every reasonable step... be taken" to ensure that inaccurate data are erased or rectified. It sets out general public-notification requirements. It notes that data should not be stored longer than is required for meeting the initial purposes of collection. This requirement is directly in opposition to many research needs for retaining data for many years even if later uses cannot be predicted. (Recent large-scale studies of several decades' worth of data on the effects of oral contraceptives, and of estrogen replacement therapy, are among the many examples of the societal payback from retaining health research data.) Presumably in implementing the Directive national governments will recognize such requirements, which have long been embodied in regulations and good-practice guidelines covering research on medicines, vaccines, and medical devices.

In the interest of fair use, Articles 10 and 11 set out requirements for the notifying of data-subjects (whether the data have been collected from the subjects directly, or indirectly) as to the identity of the "data controllers," the purposes of the processing, and other circumstances. Article 11(2) provides, however, that the notification requirements "shall not apply where"

in particular for processing for statistical purposes or for the purposes of historical or scientific research, the provision of such information proves impossible or would involve a disproportionate effort or if recording or disclosure is expressly laid down by law. In these cases Member States shall provide appropriate safeguards.

Data-subject rights to inspect records about themselves, object to processing, request correction of erroneous data about themselves, and so on, are affirmed (Article 12). Public registration of processing operations is required (Article 21). The Directive covers all personally identifiable data processed in Europe, regardless of the origins of the data or the data-subject.

For administration and accountability, requirements are set for various supervisory authorities in the E.U. structure and in Member State governments. In most E.U. countries much of this apparatus already is in place, but more will have to be established, and duties will have to be adjusted. Judicial remedies, including compensatory liability, for individuals are required to be made available under Member States' laws for breach of the rights specified in the Directive.

Scope of coverage, and exemptions

Article 8, on "the processing of special categories of data," holds a number of provisions that could be problematic for health research.

¶ 1. Member States shall prohibit the processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, and the processing of data concerning health or sex life.

¶ 2. Paragraph 1 shall not apply where... the data subject has given his explicit consent to the processing of those data... [or where some special circumstances, listed, apply].

¶ 3. Paragraph 1 shall not apply where processing of the data is required for the purposes of preventive medicine, medical diagnosis, the provision of care or treatment or the management of health-care services, and where those data are processed by a health professional subject under national law or rules established by national competent bodies to the obligation of professional secrecy or by another person also subject to an equivalent obligation of secrecy.

¶ 4. Subject to the provision of suitable safeguards, Member States may, for reasons of substantial public interest, lay down exemptions in addition to those laid down in paragraph 2 either by national law or by decision of the supervisory authority.

What kinds of health research will be defined as being within the scope of "preventive medicine, medical diagnosis, the provision of care or treatment or the management of health-care services"? (A systematic check should be made against categories of health research such as those described in Chapter 3 of this Report.)

Will governments realize the importance in health research of taking into account factors relating to "ethnic origin" and "health and sex life"? Surely they should. Much essential public-health research is conducted with the very purpose of aiding subpopulations. Because many health factors are related to origin, research often selects groups by such criteria as ethnic origin to study specific afflictions, causes, or interventions. In pharmaceutical risk and efficacy studies, regulators rightly mandate that ethnic and sexual factors be taken account of. Genetics, dietary habits relating to ethnic background, sexual contacts and practices, and other factors strongly determine how health phenomena differ among people.

How broadly will "substantial public interest" be construed? Possibilities are mentioned in the Directive for a variety of national exemptions and derogations; but exemptions will not be recognized unless Member States positively enact them into their national laws. E.U. leaders have been saying publicly that not many "public interest" exemptions should be expected, but that, rather, safeguards should be emphasized.

Who—for instance, epidemiological analysts performing processing tasks in database research—will be considered to be "health professionals" or others "subject to an equivalent obligation of secrecy"? Presumably analysts can be positioned under responsible "data controllers."

Article 6 requires that personally identifiable data must be "collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes." But, no doubt to the relief of many researchers, it goes on to state:

¶ 1(b). Further processing of data for historical, statistical or scientific purposes shall not be considered as incompatible provided that Member States provide appropriate safeguards.

Conditions on international transfer

Article 25 deals with the movement of data, by whatever means, from E.U. Member States to other countries.

¶ 1. The Member States shall provide that the transfer to a [non E.U.] country of personal data which are undergoing processing or are intended for processing after transfer may take place only if ... the [recipient] country in question ensures an adequate level of protection.

How, in practice, will "adequate level of protection" be determined? What criteria will be applied? Article 25 continues:

¶ 2. The adequacy of the level of protection afforded by a [non E.U.] country shall be assessed in the light of all the circumstances surrounding a data transfer operation or set of data transfer operations; particular consideration shall be given to the nature of the data, the purpose and duration of the proposed processing operation or operations, the country of origin and the country of final destination, the rules of law, both general and sectoral, in force in the [non E.U.] country in question and the professional rules and security measures which are complied with in that country.

By whom and by what process will the determination be made? Article 29 establishes an independent Working Party on the Protection of Individuals with regard to the Processing of Personal Data, comprising representatives from all of the Member States (usually, in practice, their privacy commissioners) and representatives from the Commission structure itself. The Working Party has elected as its first chair Peter J. Hustinx, the President of the Registratiekamer of The Netherlands. The "adequacy" question is among the first topics the Working Party is addressing.83,84

Will the transferability determination be made institution-by-institution (medical clinic, pharmaceutical company, university, contract research firm, government agency)? More likely, E.U. officials suggest, the determination will be made on a country-by-country basis, probably sector-by-sector.

Such assessments surely will be more straightforward for non-E.U. recipient countries having strong national or provincial data-protection laws and authority to enforce them. For this reason, E.U. officials strongly encourage the U.S. to pass such a law. Although no overall data-protection law is under contemplation in the U.S., no doubt a sound Federal medical-records confidentiality law would go a long way toward meeting the E.U.'s concerns and keeping health-research data flowing.

The Directive leaves doors open for Member States to allow data-transfers to recipients in countries not certified as having adequate protection. Article 26(1)(d) mentions "important public interest grounds," for example, and Article 26(2) holds that a Member State may authorize data transfers "where the controller adduces adequate safeguards" in the recipient country, suggesting that "such safeguards may in particular result from appropriate contractual clauses." This seems to encourage parties wishing to transfer data to establish contractual undertakings regarding data protections.


(83) A background review of U.S. law was prepared for the E.U. Commission: Paul M. Schwartz and Joel R. Reidenberg, Data Privacy Law: A Study of United States Data Protection (Michie Law Publishers, Charlottesville, Virginia, 1996).

(84) The Commission has requested a study of the "adequacy" issues from Prof. Yves Poullet of the University of Namur; his report is expected to be delivered soon.

Implementation

According to the Treaty of Rome, under which the E.U. operates, the Member States thus have obligated themselves to bringing their national laws into conformance with the principles of the Directive within three years of adoption (i.e., by October 1998). In this "transposing" they can employ whatever instruments of law—statutes, regulations, decrees, and so on—they deem sufficient. Some believe that their protections already meet most of the Directive's requirements. Others are revising their laws substantially.

The Working Party is to coordinate the implementation with respect to uniform application throughout the E.U., periodically report to the Commission on progress, and eventually give the Commission its opinion on the level of protection in the E.U. and in various non-E.U. countries and "on any codes of conduct drawn up at Community level" (Article 30). A variety of Community implementation requirements are specified.

Some European countries that are not members of the E.U., such as Switzerland, have said that they intend to establish equivalent standards.

Codes of conduct as possible guides

A special provision, which recognizes the sector-specific nature of data, may provide an opening for health professionals to set guidelines to which public authorities could defer (Article 27).

¶ 1. The Member States and the Commission shall encourage the drawing up of codes of conduct intended to contribute to the proper implementation of the national provisions adopted by the Member States pursuant to this Directive, taking account of the specific features of the various sectors.

Some professional societies are considering drafting codes of practice, as are some industry associations. Such codes would have to be adopted by the practitioners in E.U. countries; eventually recognition could be sought from the E.U.

A Dutch example of the usefulness of such a code may be instructive. During the first years of the 1990s the Council for Medical Research, a medical society, voluntarily established a "Code of Conduct for Medical Research" covering research on pre-existing medical data.85 The Privacy Commission (Registratiekamer) was invited to monitor its implementation. Over several years the government found the Code to be effective, and in 1995 adopted the Code as national law.


(85) The organization is the Stichting Federatie van Medisch Wetenschappelijke Verenigingen. See description of the Dutch situation below.

Legal Revisions in Some European Countries

The following sketches of the situations in six European countries are meant simply to illustrate the kinds of legal activities that are taking place now. All European countries have some protections in operation, and all are now evaluating whether they must make adjustments to comply with the E.U. Directive. 1998 is expected to be a busy year in privacy legislation.

Belgium

Basic privacy law: "Law on the Protection of Privacy with Respect to the Treatment of Personal Data" (Loi relative à la protection de la vie privée à l'égard des traitements de données à caractère personnel) (1992). Authority: Commission for the Protection of Private Life (La Commission de Protection de la Vie Privée / Commissie voor de bescherming van de persoonlijke levenssfeer).

At present a law amending the basic privacy law is being drafted, with the E.U. Directive in mind. The Conseil d'Etat has received the draft. The Minister of Justice has said that he plans to send the draft to the Parliament by the end of 1997.

Also, a draft "Royal Decree relating to the protection of individuals in relation to the processing of data of a personal nature for scientific research in the field of medicine or public health" is being considered. Such a Decree would be subsidiary to the revised omnibus privacy law, and so could not have full legal force until that law is passed; but if adopted it would provide interim guidance.

France

Basic privacy law: "Law on Informatics, Records, and Freedoms" (Loi relative à l'informatique, aux fichiers et aux libertés) (January 6, 1978) (Law No. 78-17). Authority: National Commission on Informatics and Freedoms (Commission Nationale de l'Informatique et des Libertés (CNIL)).

Most French commentators say that the French safeguards in place are sufficiently protective that they meet the requirements of the E.U. Directive, and that no changes in the basic law will be required.

In 1994 an Amendment to the basic law was adopted, on "Computerized Processing of Name-Linked Data for the Purpose of Research in the Health Sector" (Loi du 1er juillet relative au traitement des données nominatives ayant pour fin la recherche dans le domaine de la santé) (Law No. 94-548). There is some controversy about implementation of the Amendment, which probably will be brought into effect during the course of 1997. A national committee has been appointed to give the CNIL its opinion on the scientific aspects of protocols that have been submitted (Comité Consultatif sur le Traitement de l'Information en Matière de Recherche dans le Domaine de la Santé). One aspect at issue is whether each research protocol must be submitted for approval in advance by the Comité Consultatif. Among others involved in discussions over implementation, the pharmaceutical industry association is negotiating for a streamlined process which might involve approval of some general research-protocol provisions, and perhaps for annual or other periodic review rather than study-by-study, to simplify and speed the approval process.86


(86) A recent recommendation on the handling of personally identifiable health data generally was Commission nationale de l'informatique et des libertés, "Délibération no. 97-008 du 4 février 1997 portant adoption d'une recommandation sur le traitement des données de la santé à caractère personnel," Journal Officiel de la République Française, 5806–5808 (April 12, 1997).

Germany

Basic privacy law: Federal Data Protection Act (Bundesdatenschutzgesetz) (1990). Authority: Federal Data Protection Commissioner (Bundesdatenschutzbeauftragter). State (Länder) data protection laws and agencies also are important.

The German privacy laws are already strict (too strict for health research, some believe), and the general opinion seems to be that they will not need to be deeply modified to conform to the E.U. Directive.

However, several detailed amendments to the basic privacy law are being considered that would better meet the special requirements of health research and public-health activities, such as secondary research use of health data. The Ministries of Justice, Health, Research, Labor, Commerce, and Finance are involved in the discussions, which are being led by the Ministry of the Interior (Innenministerium). Part of the background is a 1995 petition from a working group of 100 German medical societies, which, stating that the Federal Data Protection Act overemphasizes patients' privacy and impedes health research, urged the Minister of Research to seek changes in law so that health research, with its associated informed consent and ethics review, would be controlled separately and its special dimensions accommodated.87

Much activity in Germany now concerns implementation of a 1994 Law on Cancer Registries (Gesetz über Krebsregister), which requires that by the beginning of 1999 all of the Länder must maintain registries of cancer cases, mainly for epidemiological research.88


(87) An exchange over the issues was Thilo Weichert, "Datenschutz und medizinische Forschung–Was nützt ein 'medizinisches Forschungsgeheimnis'?" MedR 6, 258–261 (1996), and Hans Joachim Bochnik, "Bestehen Datenschützer auf Forschungsblockaden?" MedR 6, 262–264 (1996).

(88) Bundesgesetzblatt No. 79, 3351–3355 (November 11, 1994). For commentary see J. Michaelis, "Towards nation-wide cancer registration in the Federal Republic of Germany," Annals of Oncology 6, 344–346 (1995).

The Netherlands

Basic privacy law: Data Protection Act (Wet Persoonsregistratie) (1988). Authority: Registration Chamber (Registratiekamer).

In 1995 a new "Code of conduct for medical research," covering research on existing medical data, was adopted by the Registration Chamber after years of development. The Code originated from "the desire to achieve a good balance between the requirements of privacy protection on the one hand and those of scientific research on the other." Under this Code, the conditions on processing data in research are guided strongly by whether the data involved are anonymous, key-coded, or identifiable. Data-subject consent is emphasized, as is working with data in non-identifiable form as much as possible.89


(89) The Netherlands Registratiekamer, "Goed Gedrag: Gedragscode Gezondheidsonderzoek" (Registratiekamer, The Hague, The Netherlands, February 1995); "Gedragscode Gezondheidsonderzoek: Verklaring van overeenstemming inzake de gedragscode," Staatscourant 140, 14–16 (July 24, 1995).

Sweden

Basic privacy law: Data Protection Act (Datalagen) (1973). Authority: Data Inspection Board (Datainspektionen).

A special Governmental Data Act Committee has reviewed the shortcomings of the 1973 Act, which was judged to be inadequate for today's needs. In April 1997 the Committee proposed a total revision of the Data Protection Act, embodying many of the principles of the E.U. Directive but respecting Swedish concerns and the protections guaranteed by the Swedish Constitution.90

The proposed new national law would allow processing of personally identifiable data "for scientific research and statistics purposes where a research ethics committee has approved the project or where the public interests clearly override the risks to integrity."

The proposed new law now will be considered by the Minister of Justice, who, after consultations and revisions, may submit it to the Parliament, perhaps in the autumn of 1997.


(90) The Committee's report, including a summary in English, is available on the Internet at <http://www.skolverket.se/skolnet/dalk/engsmf.html>.

United Kingdom

Basic privacy law: Data Protection Act (1984), and Access to Health Records Act (1990). Authority: Data Protection Registrar.

Regarding implementation of the E.U. Directive, early in 1996 the Home Office circulated a "consultation document" inviting input on a range of issues from interested parties. The Home Office has been taking consideration of all of the responses and preparing to propose a strategy.

In April 1996 the Data Protection Registrar published a thought-provoking document comparing the E.U. Directive and U.K. law and raising many questions.91 In her recommendation to the Home Office a few months later, the Registrar argued that "we need the seamless approach which only new primary legislation can offer."92

A "Disclosure and Use of Personal Health Information Bill," developed by an interprofessional working party led by the British Medical Association, was introduced into the House of Lords in March 1996 by Lord Walton. But it seems not to have progressed very far.

Of course, now the new Labour government will have to develop its policies in this area.


(91) U.K. Data Protection Registrar, Questions to Answer: Data Protection and the E.U. Directive 95/46/EC (Wycliffe House, Water Lane, Wilmslow, Cheshire SK9 5AF, April 1996).

(92) U.K. Data Protection Registrar, Our Answers: Data Protection and the E.U. Directive 95/46/EC (Wycliffe House, Water Lane, Wilmslow, Cheshire SK9 5AF, July 1996).

New Council of Europe Recommendation on Protection of Medical Data

The Council of Europe is an intergovernmental organization of 39 countries, headquartered in Strasbourg. Compared with the E.U., it comprises 24 more countries (and includes all members of the E.U.), draws heavily upon expertise in its member countries, depends on a relatively smaller staff, and takes actions that are not formally enforceable.93 The two organizations coordinate their work. E.U. Commission staff represent the E.U. in all important activities of the Council of Europe, as they have done during the recent years' deliberations over data privacy in general and those over protection of health and medical data specifically.

In 1981 the Council of Europe passed an influential "Convention for the Protection of Individuals with Regard to Automatic Processing of Personal Data," which set out a number of principles.94 Within a few years most major European countries ratified the Convention. It was on the basis of this Convention, and the deliberations that had led up to it, that most European countries developed their own laws and set up data-protection regimes.

The 1981 Convention is not formally binding. But it has set the tone for much data-protection work, and has been referred to many times in judgments on such issues as international data transfer. Some countries require the obtaining of special permission, or the establishment of a contract surrounding a "data corridor," so to speak, between institutions, before allowing transfer of sensitive data from their country to an institution in a country that has not ratified the Convention or where the protections are deemed weak.

Countries which are not members of the Council of Europe have been encouraged to ratify or otherwise adopt the provisions of the Convention. The U.S. is not in a position to do so, because, among other reasons, it lacks Federal privacy law covering data in the private sector.

In February 1997, after five years' deliberation, the Council of Europe's Committee of Ministers (comprising the foreign ministers of all the Members) adopted a detailed "Recommendation on the Protection of Medical Data" (hereafter, Recommendation).95 Many observers believe that because this Recommendation is specific to medical data and is felt to be practicable, and also because it covers all of Europe, it may well come to be deferred to as the guiding document for Europe. Governments have already approved it in the Council of Europe, so they must expect to implement its principles. And it is thought that for this sector the E.U. eventually may amend its Directive and explicitly defer to the Council of Europe Recommendation.

Even though its title refers to "medical" data, the Recommendation in Article 1 makes clear that it covers health data broadly:

The expression "medical data" refers to all personal data concerning the health of an individual. It refers also to data which have a clear and close link with health as well as to genetic data.

The Recommendation's concerns are to protect personally identifiable data, but it notes (Article 1): "An individual shall not be regarded as 'identifiable' if identification requires an unreasonable amount of time and manpower."

Article 3 limits the circle allowed to process health data:

In principle, medical data should be collected and processed only by health-care professionals or by individuals or bodies working on behalf of health-care professionals. ... Controllers of files who are not health-care professionals should only collect and process medical data subject either to rules of confidentiality comparable to those incumbent upon a health-care professional or to equally effective safeguards provided for by domestic law.

Article 4.3 affirms: "Medical data may be collected and processed if provided for by law for public health reasons... or another important public interest."

The Recommendation includes the standard fair-practice requirements to inform subjects, seek informed express consent, allow data-subject access and rectification of data, and the like.

Article 12, "Scientific Research," lays out this series of conditions:

12.1.

Whenever possible, medical data used for scientific research purposes should be anonymous. Professional and scientific organizations and public authorities should promote the development of techniques and procedures securing anonymity.

12.2.

However, if such anonymization would make a scientific research project impossible, and the project is to be carried out for legitimate purposes, it could be carried out with personal data on condition that:

a.

the data subject has given his/her consent for one or more research purposes;

or

b.

[provision having to do with legally incapacitated subjects];

or

c.

disclosure of data for the purpose of a defined research project concerning an important public interest has been authorized by the body or bodies designated by domestic law, but only if:

i.

the data subject has not expressly opposed disclosure; and

ii.

despite reasonable efforts, it would be impracticable to contact the data subject to seek his consent; and

iii.

the interests of the research project justify the authorization;

or

d.

the scientific research is provided for by law and constitutes a necessary measure for public health reasons.
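
As a reading aid only, the alternatives in Article 12.2 as quoted above can be sketched as a small decision function. This is an illustrative paraphrase under stated assumptions, not a legal interpretation: the class, field, and function names are hypothetical, each condition is reduced to a yes/no fact that in practice requires judgment, and route (b), which this Report does not quote in full, is deliberately left unmodeled.

```python
from dataclasses import dataclass


@dataclass
class ResearchRequest:
    """Hypothetical summary of facts about a proposed use of personal data."""
    anonymization_makes_project_impossible: bool       # threshold in 12.2
    legitimate_purpose: bool                           # threshold in 12.2
    subject_consented: bool = False                    # 12.2(a)
    authorized_by_designated_body: bool = False        # 12.2(c)
    subject_expressly_opposed: bool = False            # 12.2(c)(i)
    contact_impracticable: bool = False                # 12.2(c)(ii)
    interests_justify_authorization: bool = False      # 12.2(c)(iii)
    provided_by_law_for_public_health: bool = False    # 12.2(d)


def personal_data_permitted(req: ResearchRequest) -> bool:
    """Return True if at least one quoted route of Article 12.2 is satisfied.

    Route (b), on legally incapacitated subjects, is not modeled here.
    """
    # 12.1 prefers anonymous data; 12.2 applies only if anonymization would
    # make the project impossible and the purpose is legitimate.
    if not (req.anonymization_makes_project_impossible and req.legitimate_purpose):
        return False

    route_a = req.subject_consented
    # Route (c) needs authorization plus all three sub-conditions.
    route_c = (
        req.authorized_by_designated_body
        and not req.subject_expressly_opposed
        and req.contact_impracticable
        and req.interests_justify_authorization
    )
    route_d = req.provided_by_law_for_public_health
    return route_a or route_c or route_d


# Example: an authorized registry study whose subjects cannot be contacted.
registry_study = ResearchRequest(
    anonymization_makes_project_impossible=True,
    legitimate_purpose=True,
    authorized_by_designated_body=True,
    contact_impracticable=True,
    interests_justify_authorization=True,
)
print(personal_data_permitted(registry_study))
```

The sketch makes the structure visible: the routes are alternatives joined by "or", whereas the three sub-conditions under (c) must all hold together.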

Transfer of personally identifiable data from a country which has ratified the Convention of 1981 of the Council of Europe to countries which have not is to be prohibited—unless "equivalent protection" is ensured, perhaps by contract, "and the data-subject has the possibility to object to the transfer" (Article 11).

An important question for the coming period is how the considerations of this Council of Europe Recommendation on the Protection of Medical Data will intersect with those in the implementation of the E.U. Data Privacy Directive.


(93) The Members of the Council of Europe are Albania, Andorra, Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Moldavia, The Netherlands, Norway, Poland, Portugal, Romania, Russia, San Marino, Slovakia, Slovenia, Spain, Sweden, Switzerland, "the Former Yugoslav Republic of Macedonia," Turkey, Ukraine, and the United Kingdom.

(94) Council of Europe, "Convention for the Protection of Individuals with Regard to Automatic Processing of Personal Data," European Treaty Series No. 108 (January 28, 1981).

(95) Council of Europe, "Recommendation of the Committee of Ministers to Member States on the Protection of Medical Data," No. R (97) 5 (February 13, 1997). For context see also the "Explanatory Memoranda."

Dialogue Between the U.S. and Europe

For the U.S., it will be very important over the next few years to engage in high-level, broadly based dialogue with European leaders over the implementation of the E.U. Directive and the Council of Europe Recommendation. Discussions will have to be held with national governments and with intergovernmental organizations. Health data and health research must be addressed specifically; they simply cannot be dealt with in the same way as banking, credit, tax, education, transport, or criminal data.

In these discussions private-sector organizations involved with health research should participate fully. So should regulatory agencies that require international transfer of health data.

Focal issues regarding health research will be:

  • Specifics in the implementation of the E.U. Directive by Member States, and pan-E.U. decisions taken by the E.U. Working Party, the Commission, and the European Parliament.
  • Especially, the determination of "adequacy" of conditions for transfer of data from the E.U. to the U.S. and other countries outside the E.U.
  • The adoption by Members of the Council of Europe of the "Recommendation on Protection of Medical Data" and its detailed implications for practice.
  • Recognition of the special needs in health research (such as the need to take ethnic and sexual factors into account, the need to accommodate secondary studies in databases, the need to retain data for a long time, and the like).
  • Recognition of the special requirements already established in government regulation of research, development, and postmarketing study of pharmaceuticals, biological products, diagnostics, and medical devices.
  • Recognition of the need to harmonize with the forthcoming E.U. Clinical Practice Guidelines (now in draft) and other international research guidelines.
  • Emphasis on the need for uniform criteria and standards that will foster the international flow of health data.

In all of this, the U.S. government and other American organizations should not only be asking for concessions and exemptions, but also taking the opportunity of this period of reform to improve the ways they themselves handle these matters, and exerting international leadership.

6. The U.S. Legal Context

Although they are more fragmented, more specific to data categories, and less uniform than those in some European countries, many protections are in place in the U.S.96


(96) For explication via another set of adjectives, see Robert M. Gellman, "Fragmented, incomplete, and discontinuous: The failure of federal privacy regulatory proposals," Software Journal 6, 199 (1993). A more recent article by Gellman is: "Can privacy be regulated on a national level? Thoughts on the possible need for international privacy rules," Villanova Law Review 41, 129–172 (1996).

Standard Protections

Professional ethics

Codifying the precepts of Hippocrates for current guidance, the Code of Ethics of the predominant U.S. medical society, the American Medical Association, specifies in its Core Principle IV that physicians "shall safeguard patient confidences within the constraints of the law." The Code's Opinion 5.05 affirms that "the information disclosed to a physician during the course of the relationship between physician and patient is confidential to the greatest possible degree. ... The physician should not reveal confidential communications or information without the express consent of the patient, unless required to do so by law." Its Opinion 5.07 insists: "The utmost effort and care must be taken to protect the confidentiality of all medical records, including computerized records"; then it lays out a series of safeguards.97

This and similar codes of practice provide guidance, and they are important as standards to which doctors are held in ethics inquiries and in court and other formal judgments.


(97) American Medical Association, Council on Ethical and Judicial Affairs, Code of Medical Ethics: Current Opinions with Annotations (American Medical Association, 515 North State Street, Chicago, Illinois, 1997).

Healthcare-provider certification and licensing

Under State medical certification and licensing laws, physicians are obliged to respect patients' privacy. But, contrary to widespread public belief, the physician–patient confidentiality privilege relates more to whether courts can force disclosure, than to whether physicians may reveal data to insurance companies or employers. Tort law provides partial reinforcement of obligations to confidentiality in healthcare relationships.

Weaker licensure laws cover nurses, speech therapists, psychologists, clinical laboratory personnel, and some other healthcare professionals. Nominally, under their licenses and their contracts with healthcare institutions, doctors and some other practitioners are responsible for the actions of persons working under their supervision. (But in actual practice, how firmly do they supervise?) These laws vary considerably.

Healthcare contracts

Healthcare provider obligations to protect medical confidentiality are specified in the statutes and patient documents of such Federal healthcare programs as Medicare (for persons 65 years of age and over), Medicaid (administered by the States, with joint Federal/State funding, for low-income and other disadvantaged persons), the Veterans Administration (for military veterans), and the Indian Health Service (for Native Americans).

Similarly, the service contracts of private-sector healthcare plans routinely assure that patient records will be held confidentially, although they may also require, as a condition for coverage, patient–member authorization of "administrative" or "research" use of their data.

Human-subjects protections

Respect for privacy is incorporated in a variety of human-subjects protections, such as the mandatory informed consent, Institutional Review Board supervision, and other requirements discussed in Chapter 4, most of which are Federal rules.

Some researchers and research settings are not covered. For example, an independent physician working in his own clinic may test an experimental clinical treatment; and although of course he is subject to a variety of licensure and other legal controls, and on the advice of his attorney he almost certainly will seek patients' informed consent, he is not required by Federal law to have his research supervised by an Institutional Review Board.

Certificates of Confidentiality

For defense against subpoenas, court orders, and other externally compelled disclosures of health-research data, the Public Health Service Act provides that special legal-confidentiality protection may be issued:98

The Secretary [of Health and Human Services] may authorize persons engaged in biomedical, clinical, or other research (including research on mental health, including research on the use and effect of alcohol and other psychoactive drugs) to protect the privacy of individuals who are the subject of such research by withholding from all persons not connected with the conduct of the research the names or other identifying characteristics of such individuals.

Persons so authorized to protect the privacy of such individuals may not be compelled in any Federal, State, or local civil, criminal, administrative, legislative, or other proceedings to identify such individuals.

Currently protection of this kind is granted in the form of "Certificates of Confidentiality" which are issued, upon application, for particular projects. The research need not be Federally sponsored to qualify. Once a Certificate is granted, the researcher must apply it. This mechanism allows researchers to give firm prior assurance of confidentiality, with few possible exceptions (such as for tightly controlled Federal audits), to data-subjects. If they wish to, data-subjects themselves may authorize specified disclosures.

Incidentally, unless a Certificate of Confidentiality has been obtained before research begins, or some other legal protection obtains, research data may be vulnerable to legally compelled disclosure. This legal area deserves some public policy review generally, and it may be acquiring new dimensions now (for instance, with respect to secondary research in databases, in which data are transferred and "reside" far from their source).


(98) Public Health Service Act § 301(d); 42 United States Code 241(d).

The Federal Privacy Act (99)

The Privacy Act of 1974 covers personally identifiable data held by the Federal government, no matter what their source or subject, that are stored in "systems of records" from which data are retrieved by personal identifiers. Thus it covers regulatory data held by the Food and Drug Administration, statistical data held by the National Center for Health Statistics, public-health surveillance data held by the Centers for Disease Control, and the like. It requires that agencies announce in the Federal Register the purposes and uses of the system of records, and that notice be provided to the data-subjects. It provides that individuals must be allowed upon request to see information about themselves. And it prohibits disclosure of data without the consent of the data-subject, except in some special circumstances set out in the Act.

However, under the Privacy Act the Federal agencies are allowed wide discretion in making disclosures pursuant to their mandates. They may designate information as being eligible for "routine use" disclosure without the consent of the data-subjects if it is "for a purpose which is compatible with the purpose for which it was collected." "Routine uses" must be announced in the Federal Register, and the conditions on use are restrictive.

The Department of Health and Human Services provides for "routine use" disclosure of specified data sets for health research, imposing conditions on disclosure and use.100

The Privacy Act has been widely noted to have serious weaknesses, among them that:101

  • It does not cover data held outside the Federal government.
  • It covers only data about U.S. citizens and aliens permanently residing in the U.S., not data about citizens of other countries.
  • Its "routine use" provision is lax.
  • Few legal avenues are provided for citizens to seek injunctive or other relief if they believe their rights are being violated.
  • Its protections do not continue after the death of the data-subject.

The Privacy Act does not negate the provisions of the Freedom of Information Act (the law that provides "transparency" in Federal records by allowing citizens access to them).102 Exemption 6 of the Freedom of Information Act states that the Act does not apply to "personal and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy." A few Freedom of Information demands for access to personally identifiable health data have succeeded, but for the most part health-research data have been defended.103


(99) 5 United States Code 552a.

(100) 5 United States Code 552a(e)(4)(D).

(101) For critique of the Privacy Act, see Schwartz and Reidenberg, as cited in endnote (83).

(102) 5 United States Code 552.

(103) See Gostin, as cited in endnote (4), pp. 501–503.

Federal Agencies' Statutes

Adding specificity and strictness beyond the Privacy Act, the statutes governing many Federal agencies that conduct, support, or regulate health research set detailed privacy-protection requirements.104 Generally these statutes' requirements extend to the agencies' grantees and contractors, and must be recognized in contractual arrangements with the agencies.

Special provisions of the Public Health Service Act apply to dedicated facilities that provide treatment for alcohol or drug abuse with Federal assistance. These set very strict rules on disclosure of data acquired in treatment, rehabilitation, training, and research.105


(104) Public Health Service Act § 306(d), 42 United States Code 242k(d), does so for the National Center for Health Statistics, for example, and § 903(c), 42 United States Code 299a-1, for the Agency for Health Care Policy and Research.

(105) Public Health Service Act § 543, 42 United States Code § 290dd-2.

State Laws and Activities

State laws, which have "just grown" independently over the years, vary greatly in the ways and extents to which they protect privacy of health information. Most recognize some form of patient–physician privilege (the patient right to defend against forced court disclosure of his record), but the scope of protection varies greatly. Most require that medical records be held closely, but allow a variety of disclosures for insurance and other "legitimate" purposes. All require, though not uniformly, that physicians and clinical laboratories notify public-health authorities of certain communicable diseases and some kinds of trauma (gunshot wounds, indications of child abuse...), and they may set constraints on disclosure of those data.

A recent analysis of State laws by Lawrence Gostin and colleagues found extreme variance in coverage of public-health data:106

Virtually all states reported some statutory protection for governmentally maintained health data for public health information in general (49 states), communicable diseases (42 states), and sexually transmitted diseases (43 states). State statutes permitted disclosure of data for statistical purposes (42 states), contact tracing (39 states), epidemiologic investigations (22 states), and subpoena or court order (14 states).

A number of States have statutes dealing with the confidentiality of personal data relating to specific diseases, such as cancer, HIV–AIDS, or mental-health problems. State legislative activity continues, with genetic data especially receiving attention.

Over the years State courts have awarded penalties against unwarranted disclosure of health data on grounds of malpractice, breach of contract or implied contract with patients, invasion of privacy, and public embarrassment.

Having reviewed the above legal matters, in 1993 the U.S. Office of Technology Assessment summarized:107

This patchwork of State and Federal Laws addressing the question of privacy in personal medical data is inadequate to guide the health care industry with respect to obligations to protect the privacy of medical information in a computerized environment. It fails to confront the reality that, in a computerized system, information will regularly cross State lines, and will therefore be subject to inconsistent legal standards with respect to privacy. The law allows development of private sector businesses dealing in computer databases and data exchanges of patient information without regulation, statutory guidance, or recourse for persons who believe they have been wronged by abuse of data. These laws do not address the questions presented by new demands for data prompted by computerization, and the obligations of secondary users in accessing and maintaining data.


(106) Lawrence O. Gostin, Zita Lazzarini, Verla S. Neslund, and Michael T. Osterholm, "The public health information infrastructure," Journal of the American Medical Association 275, 1921–1927 (1996).

More detail is given in Lawrence O. Gostin, Zita Lazzarini, and Kathleen M. Flaherty, "Legislative survey of state confidentiality laws, with specific emphasis on HIV and immunization," report of a study sponsored by the U.S. Centers for Disease Control and Prevention, the Council of State and Territorial Epidemiologists, and the Task Force for Child Survival and Development of the Carter Presidential Center (July 2, 1996).

(107) U.S. Congress, Office of Technology Assessment, Protecting Privacy in Computerized Medical Information, p. 15 (Report No. OTA-TCT-576, U.S. Government Printing Office, Washington, DC, September 1993).

The New Insurance Law

In August 1996 a Health Insurance Portability and Accountability Act was signed into law.108 The Act set new requirements for private health insurance, established new ways for providing health insurance, and created a framework for standardizing transmission of information for financial and administrative transactions relating to health care.

The law's "Administrative Simplification subtitle (F)" establishes several requirements relevant for privacy and research. Standards for electronic financial and administrative transactions must be adopted by the Secretary of Health and Human Services (HHS), including providing for "a standard unique health identifier for each individual, employer, health plan, and health care provider for use in the health care system" (§1173(b)). Such an identifier number may prove very useful for keeping track of research subjects, linking data, and so on, but its confidentiality will have to be safeguarded carefully.

The law also requires that the Secretary develop security standards and safeguards (§1173(d)). Within twelve months of the law's enactment (i.e., by August 1997) she must submit "detailed recommendations" to the Congress "on standards with respect to the privacy of individually identifiable health information" (§264). Among other matters these recommendations must cover data-subjects' rights, procedures for assuring those rights, and rules on use and disclosure of the data.109

On all of these matters the Secretary is required to consult the National Committee on Vital and Health Statistics. In early 1997 the Committee held a series of public hearings and will duly advise the Secretary.110

Even though the privacy-protection standards to be established under this law apply mainly to administrative and financial transactions in health care, the data covered (such as the "Medicaid" data which are so important for understanding the health problems of low-income people) are the subject of much research. Moreover, the standards surely will set some example for future standards covering other aspects of health data.


(108) Public Law 104-191, known as the "Kennedy–Kassebaum Act" after its Senate sponsors.

(109) If "legislation governing standards with respect to the privacy of individually identifiable health information [relating to electronically transmitted claims]" is not enacted by 36 months after this Act was enacted, the Secretary [of HHS] must promulgate final regulations containing such standards no later than 42 months after the Act was enacted. (§264).

(110) Transcripts of the hearings are available on the Internet at <http://ncvhs.hhs.gov>.

Proposed New Medical Privacy Laws

Several Federal bills governing medical privacy, or fair health information practices, have been proposed in the past few years. The previous Congress considered a broad "Medical Records Confidentiality Act of 1996" (Senate Bill 1360, proposed by Senator Robert Bennett, Republican from Utah); a "Medical Privacy in the Age of New Technologies Act of 1996" (House of Representatives Bill 3482, by Congressman Jim McDermott, Democrat from Washington); and a "Fair Health Information Practices Act of 1997" (House of Representatives Bill 52, by Congressman Gary Condit, Democrat from California).

Each of these Bills has distinctive features, but generally they seek to establish uniform national rules on the collection and protection of personally identifiable health data, no matter where they are held; affirm rights of data-subjects; set criteria and procedures for disclosure, fair use, and security; focus responsibilities for ensuring proper protection and use; and establish penalties for wrongful use of data.

In addition, versions of a genetic confidentiality act are being proposed. A prominent one is the "Genetic Confidentiality and Nondiscrimination Act of 1996" (Senate Bill 1898, co-sponsored by Senators Pete Domenici, Republican from New Mexico, and James Jeffords, Republican from Vermont). (Genetic issues are discussed on pages 74–76).

The negotiations over these Bills in the current Congress are moving quickly, and this Report cannot comment on the legislative fray. But it must remark that a broad medical privacy law would foster nationwide uniformity of practices, provide guidance over private-sector data, and be relevant for "adequacy of protection" determinations regarding international transfers of data. Genetic data should be covered firmly by an omnibus medical privacy law, with special genetic provisions stipulated if necessary. Because genetic factors are so thoroughly integrated with other health factors, a separate law on genetic privacy is not desirable, nor would a separate law on any other particular health condition or disease be.

7. Major Current Issue Clusters

The Report concludes by describing four large groups of issues that, while not entirely new, are growing rapidly in scale and complexity, and must urgently be attended to:

  • Secondary uses of data, and data linking
  • Research on private-sector health data
  • Cybersecurity
  • Genetic privacy.

Secondary Use of Data, and Data Linking

Secondary use is, as it sounds, use of data subsequent to the original use. As this Report has affirmed, much highly beneficial health research depends on it. The research may be performed either by the parties who initially collected the data, or by others, and either for reasons similar to the original ones, or for very different purposes. Most of the ethical and legal issues have to do with consent of the data-subject, and protections.

As databases are maturing and increasing in size and quality, their appeal as research resources also is growing. Thus the databases of healthcare finance systems and managed-care organizations, among others, are much in demand. These large collections of standardized, computerized data have much information to yield. But so do smaller, highly specialized data collections.

The data hunger of managed care, and of national healthcare systems, is insatiable. Ultimately the public will benefit from research studying the systems themselves as systems, as well as from research that uses data in the systems for external purposes. Much of this research will have to be performed retrospectively.

The privacy concerns surrounding secondary-research use begin the same way as for all research: Must the data be used in personally identifiable form, or can they be used in anonymized or key-coded form? If the data need to be transposed from identified to non-identifiable form, can this be performed effectively and efficiently? Usually these questions can be answered straightforwardly.

If it is decided that data must be used in personally identifiable form, then the most difficult issue is consent. Have the subjects agreed in advance to the new use, or should they be approached and asked for new consent? Going back to the data-subjects to ask for re-consent may be difficult or even impossible (people relocate, change their names, change their healthcare providers, die), and it may be costly. And obviously even the act of going back to the subjects has to be done without violating their privacy.

Consent scenarios in secondary research

Research projects may be seen as falling into scenarios such as the following, which overlap but nonetheless may be helpful in structuring rationales regarding identifiability and consent. This scheme is proposed here in the hope that it will attract discussion and development.

Scenario A: Data-subjects have given consent to a future secondary use, under specified conditions.

A straightforward example would be an expected follow-up study to review outcomes. Increasingly, investigators are trying to anticipate future uses, and are seeking appropriate consent when they collect the data. If done properly, this should be acceptable. Ideally, scope of use and time-limits should be specified.

Scenario B: Although data-subjects have not given consent to a secondary use, the purposes of the secondary use are similar to those for the original use, and protections can be assured.

The judgment on this will vary by circumstances. An assessment must be conducted, taking account of benefits and risks, for instance, and perhaps asking whether, ultimately, members of society who are similar to the data-subjects (such as people suffering from the same illness, or having similar vulnerabilities) are likely to benefit from the research.

Some insights on such a similar-use principle may be derived from experiences with the U.S. Privacy Act's "routine use" provision.

Scenario C: Although data-subjects have not given consent to a secondary use, that use is judged to be minimally intrusive, and protections can be assured.

If this appears to be the scenario, an assessment must be conducted. Of course there are issues of who does the assessing, and by what criteria. Again some precedent may exist. The U.S. Federal Common Rule (§_.101(b)(4)) authorizes Institutional Review Boards to "waive the requirements to obtain informed consent provided the IRB finds and documents that:

  1. the research involves no more than minimal risk to the subjects;
  2. the waiver... will not adversely affect the rights and welfare of the subjects;
  3. the research could not practicably be carried out without the waiver...; and
  4. whenever appropriate, the subjects will be provided with additional pertinent information after participation."

Scenario D: Data-subjects have given broad consent to unspecified future secondary uses.

Often such consent is considered to have been secured when people sign contractual care agreements with health-care organizations or insurers. But, as Ruth Faden was quoted on page 40 of this Report as saying, broad unqualified consent to unspecified purposes is neither reassuring nor protective. Even if waivers exist on file somewhere, specific informed consent should be sought if possible. If that is not possible, a partial corrective is to ensure that identifiable data will be handled only within a defined research group, under defined protections.

In countries having national healthcare systems, there may be misunderstanding or disagreement over whether a person, simply by the act of availing himself of service, implies consent to unspecified future research on the resulting data.

The statutes governing the U.S. Medicare and Medicaid programs allow analysis of the data under very controlled conditions—so long as the research purposes accord with the purposes of the programs studied, and specified procedures are followed and safeguards enforced.

In some situations researchers seek group or community consent, or approval to approach individuals to request consent, via public meetings and discussions with group or community leaders.

Scenario E: Data-subjects have not given consent to a secondary use, and the impact on privacy is not necessarily negligible.

Unless the circumstances are compelling against it—such as if the research concerns a public-health emergency—new informed consent probably should be sought.
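The scenarios above can be read as a rough decision scheme. The following is a minimal sketch of that reading, not part of the proposed scheme itself: the field names, the function name, and the order in which the scenarios are tested are all assumptions made for illustration, and because the scenarios overlap, a real assessment would weigh them together rather than stop at the first match.

```python
from dataclasses import dataclass


@dataclass
class SecondaryUse:
    """Hypothetical description of a proposed secondary use of identifiable data."""
    specific_consent: bool     # consent given to this future use, under specified conditions
    broad_consent: bool        # consent given to unspecified future uses
    similar_purpose: bool      # purposes similar to those of the original use
    minimally_intrusive: bool  # use judged to be minimally intrusive
    protections_assured: bool  # protections can be assured


def classify_scenario(use: SecondaryUse) -> str:
    """Return the letter of the first scenario (A to E) that fits.

    The test order is an assumption; the scenarios overlap, so this is
    only a starting point for structuring a rationale.
    """
    if use.specific_consent:
        return "A"
    if use.broad_consent:
        return "D"
    if use.similar_purpose and use.protections_assured:
        return "B"
    if use.minimally_intrusive and use.protections_assured:
        return "C"
    return "E"  # no consent, and impact on privacy not necessarily negligible


proposal = SecondaryUse(
    specific_consent=False,
    broad_consent=False,
    similar_purpose=True,
    minimally_intrusive=False,
    protections_assured=True,
)
print(classify_scenario(proposal))
```

A result of "B", "C", or "E" signals only which kind of assessment is called for; it does not settle whether the use is acceptable.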

All of the above scenarios concern data that are personally identifiable. For secondary research: If data truly are anonymized, consent should not be an issue; if data can be transposed through an effective key-coding process into non-identifiable form, again consent should not be an issue.
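As an illustration of key-coding, the sketch below replaces a personal identifier with a keyed code and drops the directly identifying fields. The report does not prescribe a method; the use of HMAC-SHA256, the field names, and the function names are assumptions chosen for the example. The secret key would be held by the data custodian, not by the researchers.

```python
import hashlib
import hmac


def key_code(identifier: str, secret_key: bytes) -> str:
    """Derive a stable study code from an identifier using a keyed hash.

    Without the secret key, the code cannot practically be traced back
    to the identifier; with it, the custodian can regenerate the code.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def key_code_record(record: dict, secret_key: bytes,
                    id_field: str = "patient_id",
                    drop_fields: tuple = ("name", "address")) -> dict:
    """Return a copy of the record with direct identifiers removed
    and the identifier replaced by its study code."""
    coded = {k: v for k, v in record.items()
             if k != id_field and k not in drop_fields}
    coded["study_code"] = key_code(str(record[id_field]), secret_key)
    return coded


# Hypothetical record and key, for illustration only.
custodian_key = b"example-key-held-by-the-custodian"
record = {"patient_id": "12345", "name": "Jane Doe",
          "address": "1 Main St", "diagnosis": "asthma"}
print(key_code_record(record, custodian_key))
```

Removing names and recoding identifiers does not by itself make data non-identifiable: the remaining fields may still permit identification by deduction, so whether a key-coding process is "effective" remains a judgment about the whole data set.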

Example: Research in HCFA databases

An example of the conditions that may be imposed on secondary research is instructive here. The protections for the personal data in the giant Health Care Financing Administration (HCFA) databases, which contain Medicare records on some 37 million older Americans and Medicaid data from 29 States on 22 million additional beneficiaries, are stringent. (The databases were described on page 19.)

HCFA's "Agreement for Release of [HCFA] Public Use Files" (files in which all personal identifiers have been removed) begins by saying: "In order to ensure the confidence of the American public regarding the confidentiality of information collected and maintained by the Federal government" HCFA expects recipients of its data to comply with specific requirements. Among other undertakings, data-recipients must:

  • Designate an official custodian for the files;
  • Agree to maintain a specified set of safeguards;
  • Agree "that the recipient shall neither publish nor release any information that is derived from the file and that could reasonably be expected to permit deduction of a beneficiary's identity";
  • Agree that without express written authorization from HCFA, the recipient will make no attempt to link records in the files to any other source of personally identifiable information; and
  • Agree that the recipient "shall make no attempt to identify any specific individual whose record is included in the file" or "unencrypt any person-level information in the file."

HCFA also releases personally identifiable data, under an even more strict "Agreement for Release of [HCFA] Data with Individual Identifiers." Among its additional provisions are that recipients must:

  • Submit a statement delineating the exact research purposes;
  • Promise to return the data or submit a certificate of destruction; and
  • Recognize the right of HCFA to inspect physical security arrangements.

A great many useful studies are performed under these Agreements. The rules are not easy to enforce, because it is difficult for HCFA to follow what the researchers actually do in practice. But surely they are the kinds of rules that make sense. Federal statutory penalties may be imposed if the rules are not followed, or if the data are used wrongfully. And from time to time HCFA does investigate and does sanction offending researchers and their institutions.

Data linking

Related to secondary research is data linking, in which associations (links) are made between data on the same data-subject(s) in more than one data collection [111].

(In everyday life we do this when we match a name on one list, say a school student list, with the list of names in the telephone directory to make a best-guess at her address and parents' names, and then, because it "rings a bell," find ourselves associating the mother's name with a name on a list of local attorneys, and so on.)

Linking may occur within a data set, or between data sets. It may occur within an organization, or between organizations. It may involve health data only, or health data and other data (such as lifestyle, socioeconomic, or police data).

The following example is typical of how secondary analysis, with data linking, can be useful [112]:

To examine health status of nursing home residents, cost issues, quality of care concerns (e.g., pressure ulcers, methicillin-resistant Staphylococcus aureus, or [hospital incurred] infections), outcomes (mortality, readmissions), and prevention for residents eligible for both Medicare and Medicaid, data are needed from two sources: Medicaid data to identify nursing home residents and their characteristics, and Medicare Part A data to assess hospitalization episodes.

Beyond such considerations as consent, the concern about particular linking studies usually is whether they might assemble "too much" information about data-subjects or the social groups of which they are representative, even if personal identities are not revealed, and/or whether the linking can lead to data-subjects' becoming identifiable by deduction.
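A minimal sketch of both points, linking and identifiability by deduction, follows. It assumes two hypothetical extracts that share a key-coded field named study_code; the field names, the function names, and the threshold of 5 are illustrative assumptions and are not drawn from any HCFA file layout or rule.

```python
from collections import Counter


def link_records(residents: list, hospitalizations: list,
                 key: str = "study_code") -> list:
    """Attach each resident's hospitalization episodes by matching on a shared key."""
    episodes_by_key = {}
    for episode in hospitalizations:
        episodes_by_key.setdefault(episode[key], []).append(episode)
    return [{**r, "hospitalizations": episodes_by_key.get(r[key], [])}
            for r in residents]


def small_cells(records: list, fields: list, threshold: int = 5) -> dict:
    """Find combinations of field values shared by fewer than `threshold` records.

    Such small groups are where data-subjects are most likely to become
    identifiable by deduction, even with personal identifiers removed.
    """
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Hypothetical data, for illustration only.
residents = [
    {"study_code": "a1", "age_band": "80-84", "state": "NC"},
    {"study_code": "b2", "age_band": "80-84", "state": "NC"},
    {"study_code": "c3", "age_band": "90+", "state": "NC"},
]
hospitalizations = [
    {"study_code": "a1", "reason": "pressure ulcer"},
    {"study_code": "a1", "reason": "infection"},
]

linked = link_records(residents, hospitalizations)
print(len(linked[0]["hospitalizations"]))
print(small_cells(residents, ["age_band", "state"], threshold=2))
```

The more fields a linked record accumulates, the more small cells tend to appear, which is one concrete way in which linking can assemble "too much" information.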


(111) U.S. Agency for Health Care Policy and Research, Report to Congress: The Feasibility of Linking Research-Related Data Bases To Federal and Non-Federal Medical Administrative Data Bases, AHCPR Publication No. 91-0003 (AHCPR, Rockville, Maryland, April 1991).

(112) Sanjaya Kumar and Charles Lucey, "Patient privacy and secondary use of administrative databases," Letter to the Editor, Journal of the American Medical Association 276, 1138–1139 (1996).

Is it "research"?

This question frequently arises with respect to nonroutine medical procedures, and may arise with respect to public-health surveillance, as was mentioned earlier, but it also arises with respect to many secondary analyses of data.

What, for example, is the status of studies performed by, or for, private-sector managed-care organizations on data they collect in providing care? As care-providers and as businesses, they review the ways their patient-members utilize services, the effectiveness of screening and diagnostic tests, the patterns of clinical practices, use of pharmaceutical and surgical and other resources, outcomes, costs incurred and cost-effectiveness achieved, the "market" need and demand for aspects of health care, and so on. For such analyses they use both their own and others' data. The formality of the studies ranges from casual internal scanning to scholarly analysis.

"Research" is defined by the Federal Common Rule this way (§_.102(d)):

"Research" means a systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge.

The Belmont Report, from which the above statement was adapted, expressed it more the way scientists would [113]:

The term "research" designates an activity designed to test an hypothesis, permit conclusions to be drawn, and thereby to develop or contribute to generalizable knowledge (expressed, for example, in theories, principles, and statements of relationships).

While some investigations within, say, managed-care organizations are of such scientific quality as to be generalizable (perhaps, publishable in peer-reviewed journals), many are not. Moreover, even if of quality to be generalizable, the findings may possibly be held internally for business advantage and not made generally available; they could, however, be thought of as being generalizable to the patient-member population.

An activity deemed to be "research on human subjects" may, depending on the context, fall under the Federal laws discussed, and require supervision by an IRB and the like. Private-sector organizations should work their way carefully through the issues of consent and data-subject protection, including the coverage of secondary studies that might formally be considered research.


(113) As cited in endnote (18), at (§ A).

Research on Private-Sector Health Data

The point here is large but can be made succinctly. Immense volumes of personally identifiable data and lightly masked key-coded data, as well as effectively key-coded or anonymized data, are handled by managed-care organizations, pharmaceutical and related companies, and other private-sector institutions. Some State legal controls apply, as may the Privacy Act and Federal laws where there is Federal involvement. Some managed-care organizations have chosen to conduct their research under the scrutiny of Institutional Review Boards.

But for many health data held in the private sector, few legal controls apply in theory or are enforced in practice regarding such matters as data-subject consent, public notification, Institutional Review Board supervision, or transfer of the data for secondary study. Effective privacy, confidentiality, and security safeguards may well be in place, but this may not be fully evident. A complication now is that much important research is being performed on private-sector data by government and other external organizations, and private-sector data are being mixed with, or examined in parallel with, public-sector data for study.

Several of the Federal confidentiality or fair-use laws now being considered in the U.S. would bring these private-sector data under much fuller coverage of law. As was mentioned earlier, lack of legal coverage of these data is seen by Europeans as being a major weakness in U.S. personal-data protections, and a reason they resist allowing transfer of personal data from Europe to the U.S.

The status of private-sector health data deserves to be reviewed. Probably it should be brought under a uniform Federal regimen.

Cybersecurity

Keeping data secure obviously is part of the craft of privacy-protection. Electronic processing poses security challenges far more complex than paper processing does. In networked computerized systems the notions of "record" and "file" lose much of their meaning; data can be copied, split apart, reordered, assembled into new combinations, altered, and moved around with technical ease. Moreover, data "location" in networks is elusive, being a matter of shifting multiple access-points on interconnected web segments. The nets themselves may easily transcend geopolitical boundaries. Thus the rubric, "cybersecurity," is used here to connote the new character of the problems.

Sheer scale and interconnectedness of databases can be cause for concern. As was vividly expressed by Ross Anderson, referring to the U.K. National Health Service's system-wide "NHS-Net" [114]:

We may not be much concerned that a general practitioner's receptionist has access to the records of 2,000 patients; but we would be very concerned indeed if 32,000 general practitioners' receptionists all had access to the records of 56,000,000 patients.


(114) Ross J. Anderson, Security in Clinical Information Systems, p. 5 (Commissioned for the Council, British Medical Association, Tavistock Square, London WC1H 9JP, January 1996). This is a solid review in British context. Available on the Internet at <http://www.cl.cam.ac.uk/users/rja14/policy11/policy11.html>.

Basic security measures

Security has many dimensions. The challenge is to keep data sequestered and protect its integrity, but at the same time to keep it accessible for authorized users who have legitimate need to use it.

In its provocative recent report on these issues, For the Record: Protecting Electronic Health Information, a committee of the National Research Council recommended immediate implementation of these technical practices and procedures [115]:

  • Individual authentication of users
  • Access controls based on legitimate need-to-know
  • Audit trails (maintaining access logs)
  • Physical security and disaster recovery (limiting physical access, carefully storing backup data)
  • Protection of remote access points (controlling external access)
  • Protection of external electronic communications (not sending personally identifiable data over public networks)
  • Software discipline (virus-checking, controlling software installation)
  • System assessment (testing security on an ongoing basis).

The committee also recommended adoption of these organizational practices:

  • Security and confidentiality policies
  • Security and confidentiality committees
  • Information security officers
  • Education and training programs
  • Sanctions
  • Improved authorization forms
  • Patient access to audit logs.

The report discussed all of these, and more advanced future practices, in detail. The committee "believes that adoption of these practices will help organizations meet the standards to be promulgated by the Secretary of Health and Human Services in connection with the Health Insurance Portability and Accountability Act—or can inform the development of such standards."
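Two of the technical practices listed above, access controls based on need-to-know and audit trails, can be sketched together. This is a minimal illustration under stated assumptions, not the committee's design: the class and method names are hypothetical, grants are kept in memory, and user authentication is assumed to have happened elsewhere.

```python
from datetime import datetime, timezone


class RecordStore:
    """Hypothetical store that denies access by default and logs every attempt."""

    def __init__(self, records: dict):
        self._records = dict(records)
        self._grants = set()   # (user, record_id) pairs with an approved need-to-know
        self.audit_log = []    # one entry per access attempt, allowed or not

    def grant(self, user: str, record_id: str) -> None:
        """Record an approved need-to-know for one user and one record."""
        self._grants.add((user, record_id))

    def read(self, user: str, record_id: str) -> dict:
        """Return the record if access was granted; log the attempt either way."""
        allowed = (user, record_id) in self._grants
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "record_id": record_id,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} has no approved access to {record_id}")
        return self._records[record_id]


store = RecordStore({"r1": {"diagnosis": "asthma"}})
store.grant("analyst_a", "r1")
print(store.read("analyst_a", "r1"))

try:
    store.read("clerk_b", "r1")
except PermissionError as err:
    print("denied:", err)

print(len(store.audit_log))
```

Logging denied attempts as well as successful reads is a design choice: it lets the audit trail support both patient access to the log and the ongoing system assessment the committee recommends.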

A special problem for the management of data in research is: How are various consents and differential access conditions to be trailed along with various data as the data are moved around, combined with other data, linked to other data, split apart into new combinations of data, and processed by different users for different purposes?
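One possible answer, offered only as a sketch, is to attach the consent terms to the data and to compute the terms of any combined data set from those of its parts. The rule used here (the combined set permits only the uses that every part permits, and expires at the earliest expiry date) is an assumption made for illustration, as are all the type and function names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConsentTerms:
    permitted_uses: frozenset  # purposes the data-subjects agreed to
    expires: date              # last day on which use is permitted


@dataclass(frozen=True)
class TaggedData:
    rows: tuple
    terms: ConsentTerms


def combine(a: TaggedData, b: TaggedData) -> TaggedData:
    """Merge two data sets; the result carries the most restrictive terms."""
    terms = ConsentTerms(
        permitted_uses=a.terms.permitted_uses & b.terms.permitted_uses,
        expires=min(a.terms.expires, b.terms.expires),
    )
    return TaggedData(rows=a.rows + b.rows, terms=terms)


def may_use(data: TaggedData, purpose: str, on: date) -> bool:
    """Check a proposed use against the terms travelling with the data."""
    return purpose in data.terms.permitted_uses and on <= data.terms.expires


# Hypothetical data sets, for illustration only.
outcomes = TaggedData(
    rows=({"study_code": "a1"},),
    terms=ConsentTerms(frozenset({"outcomes", "cost"}), date(2000, 12, 31)),
)
registry = TaggedData(
    rows=({"study_code": "b2"},),
    terms=ConsentTerms(frozenset({"outcomes"}), date(1999, 6, 30)),
)

merged = combine(outcomes, registry)
print(may_use(merged, "outcomes", date(1999, 1, 1)))
print(may_use(merged, "cost", date(1999, 1, 1)))
```

A most-restrictive rule is simple to reason about but can be overly cautious, since it applies one subject's limits to every row; tracking terms per row would be more precise at the cost of more bookkeeping.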


(115) NRC, Computer Science and Telecommunications Board, as cited in endnote (9).

Personal-data enclaves

Can and should cordons be drawn around the units of organizations that process personally identifiable data?

Most organizations that perform research on personally identifiable health data—clinical centers certainly, many academic units (such as those that perform detailed analyses of healthcare outcomes or economics), and pharmaceutical and related companies, for example—seem gradually to have come to consider themselves as being, in effect, health-data "enclaves." They transfer sensitive data rather freely within their organizations, and with other organizations under agreement, through a variety of communications conduits. Students, secretaries, data-entry clerks, and many others enter data into computers from paper records, make copies, send data around, and so on. Affiliated scientists who are not medically certified may be involved in analyzing and discussing the data. Some of those involved may legitimately be working under the supervision of a health professional; some may be bound by the terms of their employment not to reveal outside the organization personal data of which they become aware; but some may be little constrained.

Within Federal laboratories and research centers, and within most of the centers they support or regulate, data-access measures are maintained to varying degrees of strictness. Some research organizations formally authorize certain operating units, and internally certify some personnel, to work with personally identifiable data; but many organizations do not. The Clinical Center of the National Institutes of Health, for instance, specifically certifies personnel to handle patient data.

Likewise, organizations may, or may not, focus responsibility for ensuring data-confidentiality internally, and for assuring the public externally.

Organizations that perform research on personally identifiable health data can enhance confidentiality protection and security by: delimiting zones of access to personal data, formally establishing personal-data enclaves; internally training and certifying personnel to work in those enclaves; and focusing responsibilities for these matters.

Genetic Privacy [116]

As the news media are constantly reminding us, the world has entered an entirely new era in genetics: The human genome is being mapped, incredibly sensitive and precise genetic tests have been developed, genetic screening has become commonplace, and an almost incredible array of genetic interventions is being explored.

But the world has by no means prepared itself to cope with the genetic-privacy issues that accompany the scientific advances [117]. Genetic analyses and interventions have exceedingly sensitive attributes:

  • They broadly relate to health, to qualities of life, and to sense of fairness in the lottery of birth and treatment of the disadvantaged.
  • They relate to race, ethnicity, and parentage.
  • They relate to gender (and maybe to sexuality).
  • They relate to mental competencies and tendencies, and to behavioral predispositions.
  • They have relevance for descendants, and therefore possibly to reproductive choices.

So, are genetic data fundamentally different from other health data? One is tempted, for the reasons above and others, to think, Yes, of course: They can be very precise and determinative on core aspects of life and health, and they affect family and other relations. On reflection, though, the answer becomes, No, not really: Countless other health data can be precise and determinative (and besides, often genetic risk factors are just risk factors among others); and many kinds of health data have implications for family and other relations.

Until recently, genetic analyses mainly were able to identify the presence of genes which strongly determine diseases, such as cystic fibrosis, Tay-Sachs disease, and sickle cell trait. Some 5,000 such conditions are now known. What is changing rapidly now is that we are becoming able to identify genetic factors that increase disease risk but are not uniquely the determinants of disease: genes that relate to obesity, for instance, and some kinds of breast, prostate, and colon cancer, and susceptibility to alcoholism. We are learning more about genetic contributions to diabetes and heart disease. It may well not be evident what genetic data imply for the person's health, or what interventions or other responses might be reasonable. The information can be very unsettling. Knowing this kind of genetic information may or may not be helpful or comforting [118].

To take a poignant example, tests can determine, before she is born, that a girl has inherited a mutation in the BRCA-1 or BRCA-2 gene, which predisposes to breast cancer. But what should her parents be told and advised to do, and what and when should the girl be told, and how should she be protected against discrimination, and how can she be helped to minimize her risk? For many genetic conditions, of course, steps can be taken to minimize the health risk, such as by attending carefully to diet or other aspects of lifestyle, and monitoring for expression of the disease.

A special aspect of genetic privacy is that genetic data may relate not only to the data-subject but to blood relatives. Who must consent, then, before tests are performed, or before the results are revealed? Who must be informed that a test is being made, and who informed of the findings? As long-term genetic registries become established, who should have consent and other rights with respect to the data, given that the data will pertain to other members of the family, both present and future?

As an area of medicine and public-health practice, so much of the new genetics work is so innovative that for many purposes it must be considered "research."

Obviously genetic data can be used prejudicially against people's interests, such as eligibility for employment, financial credit, or health or life insurance. Should judgments based on genetic data be made at all? Should they be based on genetic testing data alone, or on family history or medical examination, or on actual expression of the genes as illness? How should people having a genetic makeup predisposing to disease, but who do not yet show symptoms, be treated? [119, 120]

A special, difficult issue for research is how to deal with research on stored tissue samples, such as blood samples, biopsied tumor or other pathology materials, semen, and other human tissues that contain nucleated cells. Large numbers of samples are saved as part of research. In some instances a future research need is specifically anticipated. In others, tissues are saved because so often in science, needs arise later that simply could not have been foreseen. Scientists save specimens; it is part of the culture. Even larger numbers of samples, of course, are stored in blood banks and other collections. Identifiability, consent, and disclosure are the core issues [121, 122].

Developing ethical guidance over genetic privacy is crucial to the future of both basic genetic research and applied genetics [123, 124, 125]. Because genetic science is becoming more deeply integrated with other kinds of biomedical knowledge, genetic ethics must be integrated with basic biomedical ethics and not developed entirely separately.


(116) Recent critical discussions include George J. Annas and Sherman Elias, Gene Mapping: Using Law and Ethics as Guides (Oxford University Press, New York and Oxford, 1992); Theresa Marteau and Martin Richards, editors, The Troubled Helix: Social and Psychological Implications of the New Human Genetics (Cambridge University Press, Cambridge, 1996); Thomas H. Murray, Mark A. Rothstein, and Robert F. Murray, Jr., editors, The Human Genome Project and the Future of Health Care (Indiana University Press, Bloomington and Indianapolis, 1996); and Philip R. Reilly, Mark F. Boshar, and Steven H. Holtzman, "Ethical Issues in genetic research: Disclosure and informed consent," Nature Genetics 15, 16–20 (1997).

(117) A general resource is the program on Ethical, Legal, and Social Implications of the Human Genome Project, of the U.S. National Human Genome Research Institute; the NHGRI's home page on the Internet is <http://www.nhgri.nih.gov>. Another resource is the "Bibliography on Bioethics" maintained by the National Center for Genome Resources, available on the Internet at <http://www.ncgr.org>.

(118) For some such situations the words of Ecclesiastes (I:18) may not be too extreme: "He that increaseth knowledge increaseth sorrow." Mak'óbâh, "mental anguish."

(119) Kathy L. Hudson, Karen H. Rothenberg, Lori B. Andrews, Mary Jo Ellis Kahn, and Francis S. Collins, "Genetic discrimination and health insurance: An urgent need for reform," Science 270, 391–393 (1995).

(120) Michael S. Yesley, "Genetic privacy, discrimination and social policy: Challenges and dilemmas," Microbial and Comparative Genetics (April 1997).

(121) Peter S. Harper, "Research samples from families with genetic diseases: A proposed code of conduct," British Medical Journal 306, 1391–1394 (1993).

(122) Ellen Wright Clayton, Karen K. Steinberg, Muin J. Khoury, Elizabeth Thomson, Lori Andrews, Mary Jo Ellis Kahn, Loretta M. Kopelman, and Joan O. Weiss, "Informed consent for genetic research on stored tissue samples," Journal of the American Medical Association 274, 1786–1792 (1995). A related commentary is: Bartha Maria Knoppers and Claude M. Laberge, "Research and stored samples: Persons as sources, samples as persons?" Journal of the American Medical Association 274, 1806–1807 (1995).

(123) In 1996 the Parliamentary Assembly of the Council of Europe adopted a "Convention for the Protection of Human Rights and Dignity of the Human Being with Regard to the Application of Biology and Medicine: Convention on Human Rights and Biomedicine," Dir/Jur (96) 7 (Strasbourg, June 1996). While dealing with many issues such as research-subject consent, the human genome, and organ transplantation, the Convention reaffirms in passing (in Article 10) that "everyone has the right to respect for private life in relation to information about his or her health"; but it does not elaborate.

(124) Mark A. Rothstein, editor, Genetic Secrets: Protecting Privacy and Confidentiality in the Genetic Era (Yale University Press, New Haven, forthcoming 1997).

(125) A volume being prepared under the auspices of the Ethical, Legal, and Social Implications program of the U.S. National Human Genome Research Institute and the U.S. Department of Energy is Alan F. Westin, editor, The Social Sciences, Privacy, and Genetic Information (forthcoming, Columbia University Press, New York, early 1998).

8. Principles

The following principles are recommended for organizations that conduct, sponsor, or regulate health research involving personally identifiable data. They can be transposed into professional guidelines, standard operating principles, regulations, or laws. Detailed criteria and procedures should be established that are specific to the context.

  • Overall in health research, cultivate an atmosphere of respect for the privacy of the people whose health experience is being studied.
  • Collect or use personally identifiable data only if the research is worthwhile and identifiability is required for scientific reasons.
  • Urge Institutional Review Boards and other ethics review bodies to become fully engaged with the privacy, confidentiality, and security aspects of subject protection, in secondary research on data as well as in direct experimentation.
  • Respect such standard fair-use practices as announcing the existence of data collections, allowing data-subjects to review data about themselves, and the like. If for scientific reasons exceptions have to be made to normal practice, this should be discussed as part of the informed consent process before the study starts.
  • Attend sensitively to informing data-subjects and gaining informed consent.
  • Safeguard personal identifiers as close to the point of original data collection as possible.
  • Enforce a policy of "No access to personally identifiable information" as the default, then base exceptional access on need-to-know.
  • Generally limit the cordon-of-access to personally identifiable data. Allow access for formally justified research uses and to appropriate researchers. Maintain and monitor access "audit trails."
  • Remove data-subjects' personal identifiability as thoroughly as is compatible with research needs. If key-coding, aggregating, or otherwise removing personally identifying information, do so with adequate rigor.
  • Maintain proper physical safeguards and cybersecurity measures. Periodically challenge them, to test their adequacy.
  • Develop policies on seeking or allowing secondary use of personally identifiable data, and on the associated conditions and safeguards.
  • Before either (a) transferring data to other researchers or organizations, or (b) using data for new purposes, make conscientious decisions as to whether to proceed and what the privacy protections should be. Then if proceeding, implement appropriate protections.
  • Sensitize, train, and certify all personnel who handle personally identifiable data or supervise those who do. Make data stewardship responsibilities clear. Maintain internal and external accountability.

Appendix: The Author, and ASPE

William W. Lowrance earned an A.B. degree in chemistry and biology from the University of North Carolina (Chapel Hill) in 1965, and a Ph.D. in organic and biological chemistry from The Rockefeller University (New York City) in 1970.

Early in his career Dr. Lowrance turned his attention to hybrid science policy and ethics issues. He has taught and conducted research on science and technology policy, environmental policy, health policy, and risk decisionmaking, at Harvard University (Cambridge, Massachusetts) and Stanford University (Palo Alto, California).

During 1980–1990 he was a Senior Fellow and the Director of the Life Sciences and Public Policy Program of The Rockefeller University.

Lowrance has published two books: Of Acceptable Risk: Science and the Determination of Safety (William Kaufmann, Inc., 1976); and Modern Science and Human Values (Oxford University Press, 1985).

He has served on numerous U.S. and international committees, including the U.S. Environmental Protection Agency's Science Advisory Board (SAB), and in 1989 he chaired a broad review by the SAB of its mission and functioning.

During 1991–1995, he was the Executive Director of the International Medical Benefit/Risk Foundation, a nonprofit foundation headquartered in Geneva.

Currently Dr. Lowrance is working as an international consultant in health policy.


The Assistant Secretary for Planning and Evaluation (ASPE) is the principal advisor to the U.S. Secretary of Health and Human Services on policy development, and is responsible for major activities in legislative development, policy planning, policy and economic analysis, and evaluation and policy research oversight.

The functions of the Office of the Assistant Secretary for Planning and Evaluation include coordination and policy development in the area of data policy. In connection with these latter responsibilities, ASPE supported this study of privacy and health research.

The ASPE contact regarding this study is Mr. John Fanning:
John P. Fanning
Division of Data Policy
Office of Program Systems
Office of the Assistant Secretary for Planning and Evaluation
U.S. Department of Health and Human Services
200 Independence Avenue, SW
Washington, DC 20201
+1 202/690-5896 telephone
+1 202/690-5882 telefax
jfanning@osaspe.dhhs.gov