Several current changes in the context within which health data are collected and used must be recognized. First, the boundaries between classical medical care and "public health" are becoming ever less distinct. Over the past decades the rubric, "health," has been broadened to include many matters—from hyperactivity in children, to teenagers' nose shape, to memory loss associated with aging—that earlier were not viewed as matters of health, much less of medicine. At the same time, medical science has come to accord much more importance to such ordinary life factors as diet and stress as determinants of health, and therefore addresses them in medical care.29
This Report covers the whole range, and where distinctions are not sharp it refers to "health," which, after all, is the end of medicine. "Health data" includes all data collected under physicians' supervision, but also a wide range of other data that relate to health.
An expansive definition such as Lawrence Gostin's is necessary:30
The term "health data" is broadly defined as all records that contain information that describes a person's prior, current, or future health status, including [cause of disease], diagnosis, prognosis, or treatment, or methods of reimbursement for health services.
To quote an example from a statute, the newly enacted "U.S. Health Insurance Portability and Accountability Act" reaches broadly, as it must (§1171(4)):
The term "health information" means any information, whether oral or recorded in any form or medium, that (A) is created or received by a health care provider, health plan, public health authority, employer, life insurer, school or university, or health care clearinghouse; and (B) relates to the past, present, or future physical or mental health or condition of an individual, the provision of health care to an individual, or the past, present, or future payment for the provision of health care to an individual.
Second, of course, health care has been evolving into systems, and systems of systems. As a consequence, the traditional clinician's notes, scribbled down or dictated and later transcribed, and then locked up in filing cabinets, increasingly are being recorded along with other files in electronic media, usually networked.31
And third, the trend clearly is toward not only recording health information in computerized form, but indeed basing health care around the "lifetime linked-data dossier" on the person. Many advantages are evident for assembling health data from disparate sources, understanding the person's life-and-health trajectory, providing health-promotion input and health care, transmitting orders and analyses, and networking and consulting at distances. Many advantages are evident as well for billing and paying, administrative review, and research. And there can be many advantages for the patient's own awareness and documentation of his health "story."
Public-health records are being computerized just as quickly. In the future envisioned by seers, many aspects of public-health surveillance (such as scanning for infectious disease outbreaks), compilation of statistics (use of hospital outpatient services...), development of registries (vaccination...), and other analytic collections (effects of pharmaceuticals...) will simply be derived, whenever and in whatever form needed, from the networked lifetime dossiers.
These visionary technical developments, which are very exciting but not without negative aspects, are being explored diligently by many institutions.32,33 A potential vulnerability, even the "Achilles heel," of this movement is whether it will be able to deal adequately with privacy, confidentiality, and security.
(29) Kerr L. White, Healing the Schism: Epidemiology, Medicine, and the Public's Health (Springer-Verlag, New York, Berlin, and Heidelberg, 1991).
(30) Lawrence O. Gostin, as cited in endnote (4).
(31) One appreciates the remark, the validity of which is now fading, by the distinguished physician Sir Douglas Black (personal communication): "The only privacy protection the poor patient has left is the doctor's bad handwriting."
(32) Institute of Medicine, Richard S. Dick and Elaine B. Steen, editors, The Computer-based Patient Record: An Essential Technology for Health Care (National Academy Press, Washington, DC, 1991). A distinctive aspect of this report is its promotion of computer-based medical records, beyond merely computerized versions of traditional records. (A Revised Edition is forthcoming, 1997.)
(33) Institute of Medicine, Committee on Evaluating Clinical Applications of Telemedicine; Marilyn J. Field, editor, Telemedicine: A Guide to Assessing Telecommunications in Health Care (National Academy Press, Washington, DC, 1996).
Although definitions need not be belabored here, a few concepts and items of vocabulary are necessary.
Data are taken to mean discrete bits of information. As one dictionary has it: "Data are facts or figures from which conclusions may be inferred." For most research now, data are converted into numerical form for processing by computers.
Data-subjects are the people about whom data are collected.
Databases are collections of data, recorded in standardized fashion, ordered for reference or research purposes.
Database research, then, is research that analyzes data in such collections.
Information is data set within a context of meaning. Raw data (such as lists of numbers that stand for blood-enzyme concentrations, or units on a mental-depression scale) make no "sense" as facts unless the measurement method and descriptive scale are known. And before any scientific meaning can be inferred, the data must be tied with data on other characteristics of the data-subjects and the circumstances.
Personally identifiable data are data that are associated with real persons, or that can be associated with real persons by deduction from descriptors such as birthdate, physical characteristics, occupation, residential location, social identification number, or history. Synonyms are "personal data" and "individually identifiable data." Often for brevity the descriptors, such as the person's name, that associate the data with a real person are referred to just as "identifiers."
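Identification "by deduction from descriptors" can be illustrated with a minimal sketch. It assumes a small set of hypothetical records and descriptor names (birth_year, occupation, postcode) that do not come from any real database, and it simply flags records whose combination of descriptor values occurs only once in the collection. Such records are the ones most exposed to deduction, even though no name is attached.

```python
from collections import Counter

def unique_on(records, descriptors):
    """Return the records whose combination of descriptor values occurs only once."""
    def key(record):
        return tuple(record[d] for d in descriptors)
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]

# Hypothetical records for illustration only; no names are stored.
records = [
    {"birth_year": 1931, "occupation": "miner", "postcode": "A1"},
    {"birth_year": 1931, "occupation": "miner", "postcode": "A1"},
    {"birth_year": 1948, "occupation": "teacher", "postcode": "A1"},
]

# Records that stand alone on this combination of descriptors.
at_risk = unique_on(records, ["birth_year", "occupation", "postcode"])
```

In this sketch only the third record is flagged. With postcode as the sole descriptor, none is flagged, which suggests why the risk depends on which descriptors are released together.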
Processing or handling of data, in an ethical or legal sense, may refer to recording, storing, retrieving, duplicating, transferring, or destroying; in effect, any action through which someone may become cognizant of, move, or alter data.34 Verb lists of this kind are unavoidable, because the privacy of the data-subject can be affected by any such operation.
(34) The European Union Data Privacy Directive (95/46/EC) at Article 2(b) defines "processing" as being "any operation or set of operations which is performed upon personal data, whether or not by automatic means, such as collection, recording, organization, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, blocking, erasure or destruction."
The Universe of Health Data
So many kinds of health data are collected that it would be distracting and soporific to do more here than take note of the major categories. But it is essential to recognize: (a) that great research power resides in a diversity of health data, and (b) that privacy issues surround many kinds of data beyond those in primary medical records.
Health data include:
- Primary medical, hospital, and clinic data (including various managed-care data)
- Prescribing, pharmacy, clinical laboratory, and imaging data (x-ray, magnetic resonance, sonogram...)
- Administrative and financial data (billing, payment, insurance, audit...)
- Vital records (birth, adoption, death...)
- Exposure registries (asbestos, x-rays, children's lead...)
- Disease registries (melanoma, tuberculosis, burn, congenital malformation...)
- Other monitoring and surveillance registries (drinking water fluoridation, infant nutrition, hearing conservation...)
- Genetic data registries (pedigree analyses, screening, gene maps...)
- Intervention registries (vaccination, cardiac pacemaker...)
- Military health and hazard-exposure data (Agent Orange, artillery noise...)
- Occupational health and hazard-exposure data (coal dust...)
- Incident-, accident-, and disaster-exposure data (Love Canal, Three Mile Island, Bhopal...)
- Tissue samples (blood, semen, ova, pathology...), with associated data
- Surveys of attitudes and practices (diet, alcohol consumption, dental hygiene, condom use...)
- Clinical-trial and other experimental data
- Regulatory data (in the Food and Drug Administration, in city and State health departments...).
All kinds of data may reveal intimate information. Prescription data, for instance, often indicate the disease, or at least the kind of disease, being treated. Blood-type holds implications about parentage. Just the very fact that a person has entered into a relationship with a psychotherapist, or a drug-abuse treatment center—as revealed, say, by billing records or clinic appointment logs—can be held against the person by employers or others.
Further, besides carrying technical observations relating to the main purpose of an encounter between a person and a healthcare or research system, records may contain subjective remarks on general health or lifestyle ("coughs a lot, probably heavy smoker"), incidental observations ("child has numerous bruises and small burn scars on back" or "spouse opposed to surgery"), or speculations ("taking anabolic steroids?" or "bulimic?").
Obviously some kinds of data are felt by data-subjects or the public in general to be especially sensitive. A commonly cited example is that HIV–AIDS data are much more sensitive than, say, data about wrist fracture. Whether sensitivity is somehow justified will always be debatable within the context. But for purposes of ethical practices, policy, and law, widely held public concerns must be recognized and respected appropriately.
Among the categories often taken to be highly sensitive are data about:
- Mental health
- Aberrant behavior (child battering...)
- Alcohol or other chemical habituation
- Reproduction (infertility, pregnancy, ova and sperm donation, spontaneous or elective abortion...)
- Embarrassing problems (sexual impotence, urinary incontinence...)
- Sexual orientation, attitudes, practices, and functions
- Sexually transmitted diseases
But although these are among the more obviously delicate kinds of data, a person may just as well have anxieties about employers or others becoming aware of data regarding asthma, for instance, or epilepsy, cirrhosis of the liver, or a weak back.
Sensitivity may have to do with revelation of a past that a person has moved beyond and does not wish others to know about, or be reminded of himself. It may imply improper or socially marginal behavior. It may stem from resentment at ill fortune in the lottery of life, or from imputation of careless behavior, or implication of dysfunction. And of course it may stem from fear of negative discrimination.
This raises serious questions for policy. Should distinctions be made among kinds of health data with respect to how they are protected? Should special sensitivities be recognized? Should protections be scaled relative to the potential for social or physical harm, or emotional offense, to data-subjects?
A U.S. Task Force on the Privacy of Private-Sector Health Records expressed this view (with which the author agrees):35
The Task Force believes that any file containing health information should be considered a candidate for protection since it is the information itself, and not the form in which it is maintained, which could result in an invasion of privacy if released. ... Although the Task Force agrees that it is appealing to classify information according to sensitivity, it questions whether this is the most effective approach to protecting data that may potentially cause harm to an individual. Disease-specific segregation of records necessitates complicated administrative arrangements.... In addition, the definition of what constitutes a sensitive medical record may differ from decade to decade and from individual to individual. ... Protecting all health records adequately is the issue that must be addressed.
(35) U.S. Department of Health and Human Services, Task Force on the Privacy of Private-Sector Health Records, Final Report, p. 4 (report prepared under HHS Contract # 100-91-0036 by Kunitz and Associates, Inc., 6001 Montrose Road, Suite 920, Rockville, Maryland 20852, September 1995).
The Diversity of Data Holders
Just as varied as the types of health data, of course, are the types of individuals and organizations who hold or process the data. Data are processed by:
- Clinics, hospitals, nursing homes, hospices, laboratories, independent physicians, and other primary-care and diagnostic services (breast cancer screening...)
- Pharmacies (including large commercial chains)
- Private-sector managed-care providers, healthcare services (physical therapy, speech therapy, social work...), disease-management businesses (diabetes maintenance...), and pharmacy-benefit management companies
- Blood supply systems and manufacturers of blood products, ova and sperm banks, and organ transplant and other tissue brokers
- Academic and other nonprofit health research centers
- Government organizations (healthcare providers; payors; research and statistics centers; regulators; public-health authorities; immigration, military, penal, and social-services organizations...)
- Manufacturers of pharmaceuticals, vaccines and other biotechnology products, diagnostics, and medical devices
- International quasi-governmental organizations (World Health Organization, International Red Cross and Red Crescent...)
- Health, life, and casualty insurance companies
- Nonprofit patient organizations
- Employers, schools
- For-profit research firms (contract research organizations...)
- Commercial data vendors.
Thus health data are held by a greater variety of organizations than ever before. Data flow, often at very high volume, within and among many of these organizations.
Although physicians, and staff nominally under their supervision, still collect much of the most intimate data, they are no longer necessarily in a position to control the movements, uses, or fate of the data. Data from a routine patient encounter with the healthcare system are quickly transmitted among care-providers and their local institutions, various technical support services, the paying institutions, and a variety of supervisors, inspectors, auditors, and researchers, many of them far removed from the data-subject, many not medically certified, and possibly many not sworn to confidentiality. Eventually the encounter may be examined in practice review, filed into statistical tabulations, recorded into ongoing registries, or scrutinized in research.
Databases Useful for Research
Among the most important resources for research are databases and registries of health experience. Some are highly specialized but not very large; some are broad and enormous. Some are maintained only for research; some are primarily maintained for administrative or other purposes but are available for research. They may be organized by illness (leprosy...), by exposure (oral contraceptives...), by mode of intervention (kidney transplant...), by general healthcare experience (nursing home stay...), or by population (residents of Saskatchewan).
Perhaps the largest collection of health databases in the world is the set of U.S. "Medicare" database systems, which every year processes the records of over 600 million reimbursement claims. (Medicare is the Federal health insurance program for people age 65 and over, people with serious disabilities, and people suffering from serious kidney disease.) The Medicare databases, which are managed by the Health Care Financing Administration (HCFA), contain enrollment and eligibility data, claims for payment, data on the ways healthcare services are used, and many specialized data (such as on end-stage renal disease).36
Much very useful research is performed on HCFA data, which as collected are personally identifiable. Public-use files are made available in which, HCFA certifies, "all identifiers have been encrypted, ranged, or blanked." For research projects that meet the criteria for release of identifiable data, HCFA supplies data under Release Agreements pursuant to "routine uses" announced under the Privacy Act. The protections are strict. (See page 59 regarding the Privacy Act, and page 68 regarding conditions on use of Medicare data.)
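The three operations named in the HCFA certification (encrypting, ranging, and blanking identifiers) can be illustrated with a minimal sketch. It is not HCFA's actual procedure. The field names, the secret key, the ten-year age bands, and the use of a keyed hash (HMAC-SHA256) as a stand-in for "encrypted" are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret held by the data custodian; not part of any HCFA system.
SECRET_KEY = b"example-key-held-by-custodian"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (an assumed stand-in for 'encrypted')."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def age_band(age: int, width: int = 10) -> str:
    """Replace an exact age with a range such as '70-79' ('ranged')."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def deidentify(record: dict) -> dict:
    """Return a copy of the record with its identifiers encrypted, ranged, or blanked."""
    out = dict(record)
    out["beneficiary_id"] = pseudonymize(record["beneficiary_id"])
    out["age"] = age_band(record["age"])
    out["name"] = ""  # 'blanked'
    return out

# Hypothetical record for illustration only.
record = {
    "beneficiary_id": "A123",
    "name": "Jane Doe",
    "age": 73,
    "diagnosis": "end-stage renal disease",
}
public = deidentify(record)
```

The keyed hash gives the same pseudonym for the same beneficiary across files, so records can still be linked for research without exposing the original identifier. Unlike true encryption, however, it cannot be reversed even by the custodian, which is a design choice of this sketch and not a statement about HCFA practice.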
"Medicaid" databases also are important research resources. (Medicaid programs are regimes under which the States pay for basic health care for low-income, blind, or otherwise disadvantaged people, using joint Federal–State funds.) Like Medicare data, Medicaid data are administrative and billing records.
Although the data may not be of highest quality and are not fully standardized nationally, they nonetheless provide large amounts of diverse information about the health and health care of millions of patients "in the real world." Sophisticated computer programs allow searching for data on patient age and sex, diagnoses, use of medicines and medical procedures, costs, and other factors. Researchers are allowed access to the data under restrictive conditions.37
Health databases useful for research are maintained in many places.38 In Europe, just to mention a few examples to suggest their variety, they include the 30 Regional Centers of the French Pharmacovigilance System, the Danish Psychiatric Central Register, the Crohn's Disease Register for the Brussels region, and the Prescription Event Monitoring System run by the Drug Safety Research Unit in Southampton. All of these hold personally identifiable data, as they must.
(36) U.S. Health Care Financing Administration, Bureau of Data Management and Strategy, Data Users Reference Guide (HCFA, Baltimore, Maryland, September 1995), and Overview of Health Care Financing Administration Data: Resource Guide (HCFA, Baltimore, Maryland, April 1996).
(37) Jeffrey L. Carson and Brian L. Strom, "Medicaid databases," pp. 199–216 of Brian L. Strom, editor, Pharmacoepidemiology, Second Edition (John Wiley & Sons, Chichester and New York, 1994).
(38) Several hundred databases were surveyed and described in a series of International Drug Benefit/Risk Data Resource Handbooks (covering North America, Europe, Japan, Australia, and New Zealand) prepared under the auspices of the International Medical Benefit/Risk Foundation, Geneva. Information on the Handbooks can be obtained from Dr. Judith K. Jones, The Degge Group, Ltd., 1616 North Fort Myer Drive, Arlington, Virginia 22209-3109.
The International Flow of Data
Health data are zipped around the world all day every day, by government research agencies, pharmaceutical firms, academic researchers, and many others. Data on Americans are transferred, American institutions do much data-transferring, and data are transferred for important American purposes.
A great many health data are imported into the U.S., and many are exported. Such U.S. agencies as the Centers for Disease Control and Prevention, working cooperatively in and with many other countries, import personally identifiable data, under safeguards. The National Heart, Lung, and Blood Institute, in joint programs with Canada and European countries, exchanges data internationally, under safeguards. So does the National Cancer Institute.
Huge volumes of clinical-trial data collected in medical centers are transferred all the time, on behalf of companies that develop and manufacture pharmaceuticals, diagnostics, and medical devices, and the National Institutes of Health, and the World Health Organization, and many others working to improve medical "tools." So are drug, device, and vaccine adverse-effect reports, which provide essential feedback.
Thus personally identifiable health-research data are exchanged internationally, for very good reasons, all the time, and inevitably this international data flow will increase. The importance of pressing for uniform international standards for protecting privacy, confidentiality, and security is evident.