From a privacy-protection perspective, there is a very wide distinction between personally identifiable data and truly anonymized data. But in practice the demarcation between these extremes is not sharp. Attending assiduously to where particular data lie on the spectrum between them, and especially to data that are somewhere in the middle, is a crucial protection strategy.
At present, large amounts of data lie in between: they are not completely anonymized, but they are not readily identified, either. It is routine to decrease identifiability by assigning to data a pseudonym made up of numbers and/or letters. But if, for instance, the overall data category is known (say, epilepsy among men in a certain district) and the data are coded simply by, say, personal initials and birthdate, it may not be difficult to deduce who the data-subject is. The power of computers to perform elaborate, rapid searches, and the pressures for access, mean that merely assigning simple pseudonyms affords little protection.
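To make the weakness of simple pseudonyms concrete, the following minimal Python sketch assumes a hypothetical coding scheme (initials plus birthdate) and a hypothetical roster of people known to fall in the data category. The names, dates, and function names are invented for illustration and do not come from any real dataset or system.

```python
from datetime import date

def simple_pseudonym(first, last, born):
    """Hypothetical weak code: initials plus birthdate, e.g. 'JS19520314'."""
    return f"{first[0]}{last[0]}{born:%Y%m%d}".upper()

def candidates_for(code, roster):
    """Return every person on the roster whose derived code matches."""
    return [person for person in roster if simple_pseudonym(*person) == code]

# Assumed roster of men in the district (illustrative values only).
roster = [
    ("John", "Smith", date(1952, 3, 14)),
    ("Jane", "Stone", date(1960, 7, 2)),
    ("Peter", "Adams", date(1952, 3, 14)),
]

# A "pseudonymized" record as it might appear in a shared file.
coded_record = {"code": "JS19520314", "diagnosis": "epilepsy"}

# Anyone holding the roster can recompute the codes and search for a match.
matches = candidates_for(coded_record["code"], roster)
```

Here `matches` contains a single person, so the record is effectively identified. The search cost grows only linearly with the size of the roster, which is why a code derived from personal attributes offers so little protection.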
For data whose identifiability has, up to now, been only lightly obscured, greater efforts must now be made either (a) to remove personally identifying information much more effectively, or to aggregate, and thus anonymize, the data; or (b) to seek the data-subjects' informed consent and, if identifiability is retained, to hold the data under a suitably protective regimen.
For key-coded data—that is, data for which personal identifiers are removed and secreted but which are still potentially traceable via a matching code, held separately—a variety of measures must be taken to mask the identifiability near the source, separate and lock up the identifiers, safeguard the linking codes, and carefully manage linking-back to the data-subject when it is required.
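The key-coding arrangement described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not a prescribed implementation: the function names, field names, and the simple `authorized` flag are all hypothetical, and a real system would also need encrypted storage for the linking table, access controls, and audit logging.

```python
import secrets

def key_code(records, identifier_fields):
    """Split records into coded data and a separately held linking table."""
    coded, link_table = [], {}
    for rec in records:
        # Random code: unlike initials or birthdate, it reveals nothing by itself.
        code = secrets.token_hex(8)
        link_table[code] = {f: rec[f] for f in identifier_fields}
        data = {f: v for f, v in rec.items() if f not in identifier_fields}
        coded.append({"code": code, **data})
    return coded, link_table

def link_back(code, link_table, authorized):
    """Return the identifiers for a code, but only for an authorized request."""
    if not authorized:
        raise PermissionError("linking back requires authorization")
    return link_table[code]

# Illustrative record; every value is invented.
records = [
    {"name": "John Smith", "birthdate": "1952-03-14", "diagnosis": "epilepsy"},
]
coded, link_table = key_code(records, {"name", "birthdate"})
```

The `coded` list can be passed to researchers, while `link_table` stays with a custodian near the source. Because the code is random rather than derived from personal attributes, it cannot be recomputed from a roster; re-identification has to go through `link_back` and whatever authorization procedure guards it.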
Institutions should clearly articulate their policies on the use or sharing of personally identifiable data. An example of such a policy statement is this guidance by the U.S. Office for Protection from Research Risks, on HIV studies (61):
Where identifiers are not required by the design of the study, they are not to be recorded. If identifiers are recorded, they should be separated, if possible, from data and stored separately, with linkage restored only when necessary to conduct the research. No lists should be retained identifying those who elected not to participate. Participants should be given a fair, clear explanation of how information about them will be handled. ...
As a general principle, information is not to be disclosed without the subject's consent. The protocol must clearly state who is entitled to see records with identifiers, both within and outside the project.
(61) "Dear Colleague Letter," OPRR Reports, p. 3 (December 26, 1984).