Personal Privacy in an Information Society. Research and Statistical Activities

07/12/1997

The term research will be used in this chapter to refer to any systematic, objective process designed to obtain new knowledge, regardless of whether it is "pure" (aimed at deriving general principles) or "applied" (aimed at solving a specific problem or at determining policy). Statistics refers both to the data obtained through enumeration and measurement and to the use of mathematical methods for dealing with data so obtained. Statistical methods can be descriptive, that is, any treatment designed to summarize or describe important features of data, or inferential, that is, techniques for arriving at generalizations that go beyond the sample being analyzed.
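The distinction between descriptive and inferential methods can be made concrete with a minimal sketch. The sample values, the function names `describe` and `mean_interval`, and the use of a normal approximation are all assumptions chosen for illustration; none of them comes from the report.

```python
import statistics

# Hypothetical sample: weekly earnings (in dollars) reported by ten respondents.
sample = [210, 185, 240, 198, 225, 205, 190, 230, 215, 202]


def describe(values):
    """Descriptive treatment: summarize features of the data in hand."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def mean_interval(values, confidence=0.95):
    """Inferential treatment: an approximate interval for the mean of the
    population from which the sample was drawn.

    Assumes a simple random sample and uses a normal approximation, which
    is rough for a sample this small.
    """
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    mean = statistics.mean(values)
    margin = z * statistics.stdev(values) / len(values) ** 0.5
    return mean - margin, mean + margin


summary = describe(sample)
low, high = mean_interval(sample)
print(summary)
print(f"approximate 95% interval for the population mean: {low:.1f} to {high:.1f}")
```

The first function says something only about the ten respondents; the second makes a claim, with stated uncertainty, about people who were never questioned.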

The research and statistical activities that use individually identifiable information draw huge quantities of it from Federal administrative records, both for routine production of statistical reports and for the performance of statistical analysis or other research tasks. Researchers draw other information directly from individuals as part of the research process. Some Federal agencies conduct the bulk of their research themselves. For example, the Bureau of the Census not only conducts all its own surveys but also performs data-collection services for other agencies on a reimbursable basis. A great deal of Federal agency research, however, is contracted out to private and semi-public research organizations, and Federal grants support numerous research projects at other levels of government and in the private sector.

A typical research project starts with a hypothesis and proceeds through four stages: data collection; data processing; data analysis and interpretation; and finally, publication or dissemination of findings. Before data collection can begin, assumptions must be made about what information is relevant to the hypothesis and what kind of individuals are appropriate data subjects or respondents. Processing may involve anything from simple arranging and manual tabulations to complex coding and sophisticated computer analysis. Data storage and retrieval may rely on anything from handwritten notes and human memory to punched cards, magnetic tapes and discs, films, and computer memory. Data can be analyzed and interpreted in terms of the original hypothesis, or, when the research design less closely approximates the canons of the scientific method, in the light of less clearly articulated assumptions. Statistical manipulation may or may not be required. For some studies, a simple tabulation or descriptive case study may be the result. The final step is a research report to make the findings available to others.

In most studies, the researcher or statistician is interested in the individual primarily as a carrier of attributes or characteristics of groups or distributions. Individual data are often used as major building blocks during the analytical process, but in the final stage both research findings and statistical data are characteristically presented in aggregate form. In research, the purpose is to discover and analyze relationships among variables; in statistics, the purpose is to define average characteristics or discover their distribution or both. Individual data are therefore grouped according to characteristics and reported in the aggregate.

To illustrate, suppose the Department of Labor, for its own policymaking purposes, sponsors a study comparing and contrasting two manpower training programs. The project design requires extensive questioning and observation of two groups of trainees over a two-year period during which at least three series of interviews are conducted. Despite the research team's close, long-term involvement with the research participants, no information supplied by the respondents is released until the final report, and then only in statistical summary form. If the final report contains quotations from respondents for illustrative purposes, they are presented anonymously, not as individual data with identifiers attached. The bulk of the data are presented in tables according to categories, such as training program A or B, sex, extent of formal education previously received, training, occupation, attitudes toward training programs, and whether participation was mandatory or voluntary.
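The kind of tabulation described in this illustration might be sketched as follows. The records, field names, and the helper functions `strip_identifiers` and `tabulate` are hypothetical; the sketch shows only the general idea of dropping identifiers and reporting counts by category.

```python
from collections import Counter

# Hypothetical trainee records; names, fields, and values are illustrative only.
records = [
    {"name": "R. Doe", "subject_id": 101, "program": "A", "participation": "voluntary"},
    {"name": "J. Roe", "subject_id": 102, "program": "A", "participation": "voluntary"},
    {"name": "M. Poe", "subject_id": 103, "program": "A", "participation": "mandatory"},
    {"name": "L. Low", "subject_id": 104, "program": "B", "participation": "voluntary"},
    {"name": "T. Tow", "subject_id": 105, "program": "B", "participation": "mandatory"},
    {"name": "S. Sow", "subject_id": 106, "program": "B", "participation": "mandatory"},
]

# Fields treated as direct identifiers in this sketch (an assumption).
IDENTIFIERS = {"name", "subject_id"}


def strip_identifiers(record):
    """Return a copy of the record without direct identifiers."""
    return {k: v for k, v in record.items() if k not in IDENTIFIERS}


def tabulate(records, categories):
    """Count records in each combination of the given category fields."""
    counts = Counter()
    for record in records:
        cleaned = strip_identifiers(record)
        counts[tuple(cleaned[c] for c in categories)] += 1
    return counts


table = tabulate(records, ["program", "participation"])
for cell, count in sorted(table.items()):
    print(cell, count)
```

Only the counts in `table` would appear in a final report; the individual records stay with the research team.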

In most cases, omitting identifiers, such as name, address, telephone number, or subject identification number, is enough to protect the participants' anonymity. In certain cases, however, other information can identify the respondents, as when the study is about people in a relatively unusual occupation such as network TV anchorwomen, or is limited to people in a specific geographic area or income bracket. In such cases, characteristics such as occupation, age, or income may have to be suppressed to preserve the participants' anonymity.
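One way to picture the suppression described above is a rule that blanks out characteristics shared by too few respondents. The sketch below is an assumption-laden illustration, not a procedure the Commission prescribes: the function name `suppress_rare_combinations`, the field names, and the threshold of three are all hypothetical.

```python
from collections import Counter


def suppress_rare_combinations(records, quasi_identifiers, min_group_size=3):
    """Return copies of the records in which the listed characteristics are
    blanked out wherever fewer than `min_group_size` records share the same
    combination of values. The input records are left unchanged.
    """
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    group_sizes = Counter(key(r) for r in records)
    released = []
    for record in records:
        copy = dict(record)
        if group_sizes[key(record)] < min_group_size:
            for q in quasi_identifiers:
                copy[q] = None  # suppressed
        released.append(copy)
    return released


# Hypothetical respondents with direct identifiers already removed.
respondents = [
    {"occupation": "clerk", "age_group": "30-39", "attitude": "favorable"},
    {"occupation": "clerk", "age_group": "30-39", "attitude": "unfavorable"},
    {"occupation": "clerk", "age_group": "30-39", "attitude": "favorable"},
    {"occupation": "network TV anchor", "age_group": "30-39", "attitude": "favorable"},
]

released = suppress_rare_combinations(respondents, ["occupation", "age_group"])
for row in released:
    print(row)
```

A real disclosure review would have to go further, since which characteristics are identifying often cannot be known in advance, but the sketch shows why an unusual occupation needs different handling from a common one.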

It is often difficult to decide in advance which information beyond the standard items of name, address, or telephone number will or will not constitute identifying information. It must be emphasized, however, that research and statistical activities are undertaken not in the investigative sense of discovering what there is to know about identified individuals, but in pursuit of systematic knowledge about human beings in groups. A distinction should also be drawn between the use of information for research and statistical purposes and the methods employed for information gathering and analysis. The methods researchers and statisticians use in data collection and analysis may also be useful for purposes wholly unrelated to research and statistics, notably for law enforcement, evaluating compliance with program requirements, assessing performance, and even for commercial exploitation. Thus, the duties and safeguards recommended in this chapter do not apply to all information about individuals collected or used according to what may be considered research or statistical methods.

Definitions

In the discussion of the Commission's recommendations, the following definitions apply:

Individual: any citizen or permanent resident of the United States.

Individually Identifiable Form: any material that could reasonably be uniquely associated with the identity of the individual to whom it pertains.

Research and Statistical Information: any information about an individual, obtained from any source, used for a research or statistical purpose.

Research and Statistical Record: any item, collection or grouping of information maintained in any form of record solely for a research or statistical purpose.

Research and Statistical Purposes: the developing and reporting of aggregate or anonymous information not intended to be used, in whole or in part, for making a decision about an individual that is not an integral part of the particular research project.

Functional Separation: separating the use of information about an individual for a research or statistical purpose from its use in arriving at an administrative or other decision about that individual.