The Report concludes by describing four large groups of issues that, while not entirely new, are growing rapidly in scale and complexity, and must urgently be attended to:
Secondary use is, as it sounds, use of data subsequent to the original use. As this Report has affirmed, much highly beneficial health research depends on it. The research may be performed either by the parties who initially collected the data, or by others, and either for reasons similar to the original ones, or for very different purposes. Most of the ethical and legal issues have to do with consent of the data-subject, and protections.
As databases are maturing and increasing in size and quality, their appeal as research resources also is growing. Thus the databases of healthcare finance systems and managed-care organizations, among others, are much in demand. These large collections of standardized, computerized data have much information to yield. But so do smaller, highly specialized data collections.
The data hunger of managed care, and of national healthcare systems, is insatiable. Ultimately the public will benefit from research studying the systems themselves as systems, as well as from research that uses data in the systems for external purposes. Much of this research will have to be performed retrospectively.
The privacy concerns surrounding secondary-research use begin the same way as for all research: Must the data be used in personally identifiable form, or can they be used in anonymized or key-coded form? If the data need to be transposed from identified to non-identifiable form, can this be performed effectively and efficiently? Usually these questions can be answered straightforwardly.
If it is decided that data must be used in personally identifiable form, then the most difficult issue is consent. Have the subjects agreed in advance to the new use, or should they be approached and asked for new consent? Going back to the data-subjects to ask for re-consent may be difficult or even impossible (people relocate, change their names, change their healthcare providers, die), and it may be costly. And obviously even the act of going back to the subjects has to be done without violating their privacy.
Research projects may be seen as falling into scenarios such as the following, which overlap but nonetheless may be helpful in structuring rationales regarding identifiability and consent. This scheme is proposed here in the hope that it will attract discussion and development.
Scenario A: Data-subjects have given consent to a future secondary use, under specified conditions.
A straightforward example would be an expected follow-up study to review outcomes. Increasingly, investigators are trying to anticipate future uses, and are seeking appropriate consent when they collect the data. If done properly, this should be acceptable. Ideally, scope of use and time-limits should be specified.
Scenario B: Although data-subjects have not given consent to a secondary use, the purposes of the secondary use are similar to those for the original use, and protections can be assured.
The judgment on this will vary by circumstances. An assessment must be conducted, taking account of benefits and risks, for instance, and perhaps asking whether, ultimately, members of society who are similar to the data-subjects (such as people suffering from the same illness, or having similar vulnerabilities) are likely to benefit from the research.
Some insights on such a similar-use principle may be derived from experiences with the U.S. Privacy Act's "routine use" provision.
Scenario C: Although data-subjects have not given consent to a secondary use, that use is judged to be minimally intrusive, and protections can be assured.
If this appears to be the scenario, an assessment must be conducted. Of course there are issues of who does the assessing, and by what criteria. Again some precedent may exist. The U.S. Federal Common Rule (§_.101(b)(4)) authorizes Institutional Review Boards to "waive the requirements to obtain informed consent provided the IRB finds and documents that:
- the research involves no more than minimal risk to the subjects;
- the waiver... will not adversely affect the rights and welfare of the subjects;
- the research could not practicably be carried out without the waiver...; and
- whenever appropriate, the subjects will be provided with additional pertinent information after participation."
Scenario D: Data-subjects have given broad consent to unspecified future secondary uses.
Often such consent is considered to have been secured when people sign contractual care agreements with health-care organizations or insurers. But, as Ruth Faden was quoted on page 40 of this Report as saying, broad unqualified consent to unspecified purposes is neither reassuring nor protective. Even if waivers exist on file somewhere, specific informed consent should be sought if possible. If that is not possible, a partial corrective is to ensure that identifiable data will be handled only within a defined research group, under defined protections.
In countries having national healthcare systems, there may be misunderstanding or disagreement over whether a person, simply by the act of availing himself of service, implies consent to unspecified future research on the resulting data.
The statutes governing the U.S. Medicare and Medicaid programs allow analysis of the data under very controlled conditions, so long as the research purposes accord with the purposes of the programs studied, and specified procedures are followed and safeguards enforced.
In some situations researchers seek group or community consent, or approval to approach individuals to request consent, via public meetings and discussions with group or community leaders.
Scenario E: Data-subjects have not given consent to a secondary use, and the impact on privacy is not necessarily negligible.
Unless the circumstances are compelling against it (such as if the research concerns a public-health emergency), new informed consent probably should be sought.
All of the above scenarios concern data that are personally identifiable. For secondary research: If data truly are anonymized, consent should not be an issue; if data can be transposed through an effective key-coding process into non-identifiable form, again consent should not be an issue.
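The key-coding idea mentioned above can be illustrated with a minimal sketch. It assumes a simple list of records and a hypothetical `patient_id` field; the function name `key_code` and the field names are illustrative only and are not drawn from any system described in this Report. Each identifier is replaced by a random study code, and the table that maps identifiers to codes is returned separately so that it can be held apart from the research data.

```python
import secrets

def key_code(records, id_field="patient_id"):
    """Split identified records into coded records plus a separately held key table."""
    key_table = {}  # identifier -> study code; to be stored apart from the research data
    coded = []
    for rec in records:
        identifier = rec[id_field]
        if identifier not in key_table:
            # Random code, not derived from the identifier itself
            key_table[identifier] = secrets.token_hex(8)
        stripped = {k: v for k, v in rec.items() if k != id_field}
        stripped["study_code"] = key_table[identifier]
        coded.append(stripped)
    return coded, key_table

# Hypothetical sample data
records = [
    {"patient_id": "A-001", "diagnosis": "asthma"},
    {"patient_id": "A-002", "diagnosis": "diabetes"},
    {"patient_id": "A-001", "diagnosis": "hypertension"},
]
coded, key_table = key_code(records)
```

Whether such coding is "effective" in the sense used above depends on who can reach the key table and on whether the remaining fields allow identities to be deduced; the sketch addresses neither.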
An example of the conditions that may be imposed on secondary research is instructive here. The protections for the personal data in the giant Health Care Financing Administration (HCFA) databases, which contain Medicare records on some 37 million older Americans and Medicaid data from 29 States on 22 million additional beneficiaries, are stringent. (The databases were described on page 19.)
HCFA's "Agreement for Release of [HCFA] Public Use Files" (files in which all personal identifiers have been removed) begins by saying: "In order to ensure the confidence of the American public regarding the confidentiality of information collected and maintained by the Federal government" HCFA expects recipients of its data to comply with specific requirements. Among other undertakings, data-recipients must:
HCFA also releases personally identifiable data, under an even more strict "Agreement for Release of [HCFA] Data with Individual Identifiers." Among its additional provisions are that recipients must:
A great many useful studies are performed under these Agreements. The rules are not easy to enforce, because it is difficult for HCFA to follow what the researchers actually do in practice. But surely they are the kinds of rules that make sense. Federal statutory penalties may be imposed if the rules are not followed, or if the data are used wrongfully. And from time to time HCFA does investigate and does sanction offending researchers and their institutions.
Related to secondary research is data linking, in which associations (links) are made between data on the same data-subject(s) in more than one data collection. (111)
(In everyday life we do this when we match a name on one list, say a school student list, with the list of names in the telephone directory to make a best-guess at her address and parents' names, and then, because it "rings a bell," find ourselves associating the mother's name with a name on a list of local attorneys, and so on.)
Linking may occur within a data set, or between data sets. It may occur within an organization, or between organizations. It may involve health data only, or health data and other data (such as lifestyle, socioeconomic, or police data).
How secondary analysis, with data linking, can be useful is indicated by this typical example: (112)
To examine health status of nursing home residents, cost issues, quality of care concerns (e.g., pressure ulcers, methicillin-resistant Staphylococcus aureus, or [hospital incurred] infections), outcomes (mortality, readmissions), and prevention for residents eligible for both Medicare and Medicaid, data are needed from two sources: Medicaid data to identify nursing home residents and their characteristics, and Medicare Part A data to assess hospitalization episodes.
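In schematic terms, a two-source linkage of the kind just quoted might look like the following sketch. It assumes, purely for illustration, that both sources carry a shared `beneficiary_id` field; the function `link_records` and all field names and values are hypothetical. Actual linkage often has to work without a clean shared key.

```python
from collections import defaultdict

def link_records(medicaid_residents, medicare_stays, key="beneficiary_id"):
    """Attach each resident's hospital stays, matched on a shared key."""
    stays_by_id = defaultdict(list)
    for stay in medicare_stays:
        stays_by_id[stay[key]].append(stay)
    linked = []
    for resident in medicaid_residents:
        # Residents with no matching stay get an empty list
        stays = stays_by_id.get(resident[key], [])
        linked.append({**resident, "hospital_stays": stays})
    return linked

# Hypothetical sample data
medicaid_residents = [
    {"beneficiary_id": "B1", "facility": "Home A"},
    {"beneficiary_id": "B2", "facility": "Home B"},
]
medicare_stays = [
    {"beneficiary_id": "B1", "admit": "1996-03-02"},
    {"beneficiary_id": "B1", "admit": "1996-09-15"},
    {"beneficiary_id": "B3", "admit": "1996-05-20"},
]
linked = link_records(medicaid_residents, medicare_stays)
```

Only persons present in the first source appear in the result, which mirrors the quoted design: one source identifies the study population and the other supplies episodes for that population.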
Beyond such considerations as consent, the concern about particular linking studies usually is whether they might assemble "too much" information about data-subjects or the social groups of which they are representative, even if personal identities are not revealed, and/or whether the linking can lead to data-subjects' becoming identifiable by deduction.
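One rough way to screen for identifiability by deduction is to count how many records share each combination of indirectly identifying attributes, and to flag combinations held by very few people. The sketch below is illustrative only: the function `small_cells`, the attribute names, and the threshold of 3 are assumptions, not a standard taken from this Report.

```python
from collections import Counter

def small_cells(records, quasi_identifiers, threshold=3):
    """Return attribute combinations shared by fewer than `threshold` records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < threshold}

# Hypothetical linked records with names already removed
records = [
    {"zip3": "208", "birth_year": 1931, "sex": "F"},
    {"zip3": "208", "birth_year": 1931, "sex": "F"},
    {"zip3": "208", "birth_year": 1931, "sex": "F"},
    {"zip3": "208", "birth_year": 1902, "sex": "M"},
]
risky = small_cells(records, ["zip3", "birth_year", "sex"], threshold=3)
```

Here the single record for a man born in 1902 stands alone in its cell, so a person who knew those facts could pick him out even though no name appears. Each additional linked attribute tends to make cells smaller, which is why linking can raise this risk.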
This question frequently arises with respect to nonroutine medical procedures, and may arise with respect to public-health surveillance, as were mentioned earlier, but it also arises with respect to many secondary analyses of data.
What, for example, is the status of studies performed by, or for, private-sector managed-care organizations on data they collect in providing care? As care-providers and as businesses, they review the ways their patient-members utilize services, the effectiveness of screening and diagnostic tests, the patterns of clinical practices, use of pharmaceutical and surgical and other resources, outcomes, costs incurred and cost-effectiveness achieved, the "market" need and demand for aspects of health care, and so on. For such analyses they use both their own and others' data. The formality of the studies ranges from casual internal scanning to scholarly analysis.
"Research" is defined by the Federal Common Rule this way (§_.102(d)):
"Research" means a systematic investigation, including research development, testing and evaluation, designed to develop or contribute to generalizable knowledge.
The Belmont Report, from which the above statement was adapted, expressed it more the way scientists would: (113)
The term "research" designates an activity designed to test an hypothesis, permit conclusions to be drawn, and thereby to develop or contribute to generalizable knowledge (expressed, for example, in theories, principles, and statements of relationships).
While some investigations within, say, managed-care organizations are of such scientific quality as to be generalizable (perhaps, publishable in peer-reviewed journals), many are not. Moreover, even if of quality to be generalizable, the findings may possibly be held internally for business advantage and not made generally available; they could, however, be thought of as being generalizable to the patient-member population.
An activity deemed to be "research on human subjects" may, depending on the context, fall under the Federal laws discussed, and require supervision by an IRB and the like. Private-sector organizations should work their way carefully through the issues of consent and data-subject protection, including the coverage of secondary studies that might formally be considered research.
The point here is large but can be made succinctly. Immense volumes of personally identifiable data and lightly masked key-coded data, as well as effectively key-coded or anonymized data, are handled by managed-care organizations, pharmaceutical and related companies, and other private-sector institutions. Some State legal controls apply, as may the Privacy Act and Federal laws where there is Federal involvement. Some managed-care organizations have chosen to conduct their research under the scrutiny of Institutional Review Boards.
But for many health data held in the private sector, few legal controls apply in theory or are enforced in practice regarding such matters as data-subject consent, public notification, Institutional Review Board supervision, or transfer of the data for secondary study. Effective privacy, confidentiality, and security safeguards may well be in place, but this may not be fully evident. A complication now is that much important research is being performed on private-sector data by government and other external organizations, and private-sector data are being mixed with, or examined in parallel with, public-sector data for study.
Several of the Federal confidentiality or fair-use laws now being considered in the U.S. would bring these private-sector data under much fuller coverage of law. As was mentioned earlier, lack of legal coverage of these data is seen by Europeans as being a major weakness in U.S. personal-data protections, and a reason they resist allowing transfer of personal data from Europe to the U.S.
The status of private-sector health data deserves to be reviewed. Probably it should be brought under a uniform Federal regimen.
Keeping data secure obviously is part of the craft of privacy-protection. Electronic processing poses security challenges far more complex than paper processing does. In networked computerized systems the notions of "record" and "file" lose much of their meaning; data can be copied, split apart, reordered, assembled into new combinations, altered, and moved around with technical ease. Moreover, data "location" in networks is elusive, being a matter of shifting multiple access-points on interconnected web segments. The nets themselves may easily transcend geopolitical boundaries. Thus the rubric, "cybersecurity," is used here to connote the new character of the problems.
Sheer scale and interconnectedness of databases can be cause for concern. As was vividly expressed by Ross Anderson, referring to the U.K. National Health Service's system-wide "NHS-Net": (114)
We may not be much concerned that a general practitioner's receptionist has access to the records of 2,000 patients; but we would be very concerned indeed if 32,000 general practitioners' receptionists all had access to the records of 56,000,000 patients.
Security has many dimensions. The challenge is to keep data sequestered and protect its integrity, but at the same time to keep it accessible for authorized users who have legitimate need to use it.
In its provocative recent report on these issues, For the Record: Protecting Electronic Health Information, a committee of the National Research Council recommended immediate implementation of these technical practices and procedures: (115)
The committee also recommended adoption of these organizational practices:
The report discussed all of these, and more advanced future practices, in detail. The committee "believes that adoption of these practices will help organizations meet the standards to be promulgated by the Secretary of Health and Human Services in connection with the Health Insurance Portability and Accountability Act, or can inform the development of such standards."
A special problem for the management of data in research is: How are various consents and differential access conditions to be trailed along with various data as the data are moved around, combined with other data, linked to other data, split apart into new combinations of data, and processed by different users for different purposes?
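One conceivable approach, offered only as a sketch and not as an answer to the question, is to attach the permitted purposes to each unit of data and, whenever units are combined, to carry forward only the purposes that all of them permit. The class `TaggedData`, the functions `combine` and `release`, and the purpose labels below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedData:
    values: dict                   # the data items themselves
    permitted_purposes: frozenset  # purposes the data-subject has consented to

def combine(a, b):
    """Merge two units; the result permits only purposes allowed by both."""
    return TaggedData(
        {**a.values, **b.values},
        a.permitted_purposes & b.permitted_purposes,
    )

def release(data, purpose):
    """Hand over the values only if the stated purpose is permitted."""
    if purpose not in data.permitted_purposes:
        raise PermissionError(f"purpose not covered by consent: {purpose}")
    return data.values

# Hypothetical example: two sources with different consent conditions
clinic = TaggedData({"diagnosis": "asthma"},
                    frozenset({"outcomes_followup", "quality_review"}))
claims = TaggedData({"stay_days": 4},
                    frozenset({"outcomes_followup"}))
merged = combine(clinic, claims)
```

Taking the intersection is a deliberately conservative choice: the combined data inherit the narrowest consent among the sources. It does not address time-limits, splitting data apart again, or differing access conditions for different users, all of which the question above also raises.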
Can and should cordons be drawn around the units of organizations that process personally identifiable data?
Most organizations that perform research on personally identifiable health data (clinical centers certainly, many academic units such as those that perform detailed analyses of healthcare outcomes or economics, and pharmaceutical and related companies, for example) seem gradually to have come to consider themselves as being, in effect, health-data "enclaves." They transfer sensitive data rather freely within their organizations, and with other organizations under agreement, through a variety of communications conduits. Students, secretaries, data-entry clerks, and many others enter data into computers from paper records, make copies, send data around, and so on. Affiliated scientists who are not medically certified may be involved in analyzing and discussing the data. Some of those involved may legitimately be working under the supervision of a health professional; some may be bound by the terms of their employment not to reveal outside the organization personal data of which they become aware; but some may be little constrained.
Within Federal laboratories and research centers, and within most of the centers they support or regulate, data-access measures are maintained to varying degrees of strictness. Some research organizations formally authorize certain operating units, and internally certify some personnel, to work with personally identifiable data; but many organizations do not. The Clinical Center of the National Institutes of Health, for instance, specifically certifies personnel to handle patient data.
Likewise, organizations may, or may not, focus responsibility for ensuring data-confidentiality internally, and for assuring the public externally.
Organizations that perform research on personally identifiable health data can enhance confidentiality protection and security by: delimiting zones of access to personal data, formally establishing personal-data enclaves; internally training and certifying personnel to work in those enclaves; and focusing responsibilities for these matters.
As the news media are constantly reminding us, the world has entered an entirely new era in genetics: The human genome is being mapped, incredibly sensitive and precise genetic tests have been developed, genetic screening has become commonplace, and an almost incredible array of genetic interventions is being explored.
But the world has by no means prepared itself to cope with the genetic-privacy issues that accompany the scientific advances. (117) Genetic analyses and interventions have exceedingly sensitive attributes:
So, are genetic data fundamentally different from other health data? One is tempted, for the reasons above and others, to think, Yes, of course: They can be very precise and determinative on core aspects of life and health, and they affect family and other relations. On reflection, though, the answer becomes, No, not really: Countless other health data can be precise and determinative (and besides, often genetic risk factors are just risk factors among others); and many kinds of health data have implications for family and other relations.
Until recently, genetic analyses mainly were able to identify the presence of genes which strongly determine diseases, such as cystic fibrosis, Tay-Sachs disease, and sickle cell trait. Some 5,000 such conditions are now known. What is changing rapidly now is that we are becoming able to identify genetic factors that increase disease risk but are not uniquely the determinants of disease: genes that relate to obesity, for instance, and some kinds of breast, prostate, and colon cancer, and susceptibility to alcoholism. We are learning more about genetic contributions to diabetes and heart disease. It may well not be evident what genetic data imply for the person's health, or what interventions or other responses might be reasonable. The information can be very unsettling. Knowing this kind of genetic information may or may not be helpful or comforting. (118)
To take a poignant example, tests can determine, before she is born, that a girl has inherited a mutation in the BRCA-1 or BRCA-2 gene, which predisposes to breast cancer. But what should her parents be told and advised to do, and what and when should the girl be told, and how should she be protected against discrimination, and how can she be helped to minimize her risk? For many genetic conditions, of course, steps can be taken to minimize the health risk, such as by attending carefully to diet or other aspects of lifestyle, and monitoring for expression of the disease.
A special aspect of genetic privacy is that genetic data may relate not only to the data-subject but to blood relatives. Who must consent, then, before tests are performed, or before the results are revealed? Who must be informed that a test is being made, and who informed of the findings? As long-term genetic registries become established, who should have consent and other rights with respect to the data, given that the data will pertain to other members of the family, both present and future?
As an area of medicine and public-health practice, so much of the new genetics work is so innovative that for many purposes it must be considered "research."
Obviously genetic data can be used prejudicially against people's interests, such as eligibility for employment, financial credit, or health or life insurance. Should judgments based on genetic data be made at all? Should they be based on genetic testing data alone, or on family history or medical examination, or on actual expression of the genes as illness? How should people having a genetic makeup predisposing to disease, but who do not yet show symptoms, be treated? (119), (120)
A special, difficult issue for research is how to deal with research on stored tissue samples, such as blood samples, biopsied tumor or other pathology materials, semen, and other human tissues that contain nucleated cells. Large numbers of samples are saved as part of research. In some instances a future research need is specifically anticipated. In others, tissues are saved because so often in science, needs arise later that simply could not have been foreseen. Scientists save specimens; it is part of the culture. Even larger numbers of samples, of course, are stored in blood banks and other collections. Identifiability, consent, and disclosure are the core issues. (121), (122)
Developing ethical guidance over genetic privacy is crucial to the future of both basic genetic research and applied genetics. (123), (124), (125) Because genetic science is becoming more deeply integrated with other kinds of biomedical knowledge, genetic ethics must be integrated with basic biomedical ethics and not developed entirely separately.
Last updated 7/23/97.