Privacy and Health Research. Identifiable---Key-Coded---Anonymized


From a privacy-protection perspective, personally identifiable data and truly anonymized data are far apart. In practice, however, the demarcation between these extremes is not sharp. Paying close attention to where particular data lie on the spectrum between them, and especially to data that are somewhere in the middle, is a crucial protection strategy.

At present, large amounts of data lie in between: they are not completely anonymized, but neither are they readily identifiable. Because computers can perform elaborate, rapid searches, and because the pressures for access are strong, merely assigning simple pseudonyms affords little protection.

For data whose identifiability has so far been only lightly obscured, greater efforts must now be made either: (a) to remove personally identifying information much more effectively, or to aggregate and thus anonymize the data; or (b) if identifiability is retained, to seek the data-subjects' informed consent and hold the data under a suitably protective regimen.
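As an illustration of option (a), the following minimal sketch shows one way record-level data might be aggregated into group counts so that no individual row is released. The field names (`name`, `age_band`, `diagnosis`) and the helper `aggregate` are hypothetical, chosen only for this example. Whether a given level of aggregation is enough to anonymize real data is a judgment the sketch does not make.

```python
from collections import Counter

def aggregate(records, group_fields):
    """Return counts per combination of group_fields, dropping all other fields.

    Direct identifiers (here the hypothetical 'name' field) never appear in
    the output because only the grouping fields are read.
    """
    return Counter(tuple(rec[f] for f in group_fields) for rec in records)

# Hypothetical record-level data, invented for illustration only.
records = [
    {"name": "A. Example", "age_band": "40-49", "diagnosis": "asthma"},
    {"name": "B. Example", "age_band": "40-49", "diagnosis": "asthma"},
    {"name": "C. Example", "age_band": "50-59", "diagnosis": "diabetes"},
]

counts = aggregate(records, ("age_band", "diagnosis"))
for group, n in sorted(counts.items()):
    print(group, n)
```

Only the counts would be passed on to researchers; the record-level data would stay with the original custodian.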

For key-coded data (that is, data from which personal identifiers have been removed and kept secret, but which remain potentially traceable via a matching code held separately), a variety of measures must be taken: mask the identifiability near the source, separate and lock up the identifiers, safeguard the linking codes, and carefully manage linking back to the data-subject when it is required.
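The key-coding arrangement described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a prescribed implementation: the field names, the functions `key_code` and `link_back`, and the boolean `authorized` flag are all hypothetical, and a real system would keep the key table in separate, access-controlled storage rather than in the same program.

```python
import secrets

def key_code(records, id_fields=("name", "health_number")):
    """Split records into coded research data and a separately held key table."""
    coded, key_table = [], {}
    for rec in records:
        # Random code, not derived from the identifiers themselves.
        code = secrets.token_hex(8)
        while code in key_table:  # regenerate in the unlikely event of a clash
            code = secrets.token_hex(8)
        # Identifiers go only into the key table.
        key_table[code] = {f: rec[f] for f in id_fields}
        # Research data keeps the code plus the non-identifying fields.
        stripped = {k: v for k, v in rec.items() if k not in id_fields}
        coded.append({"code": code, **stripped})
    return coded, key_table

def link_back(code, key_table, authorized):
    """Return the identifiers for a code, but only for an authorized request."""
    if not authorized:
        raise PermissionError("linking back requires authorization")
    return key_table[code]

# Hypothetical records, invented for illustration only.
records = [
    {"name": "A. Example", "health_number": "000-000-001", "diagnosis": "asthma"},
    {"name": "B. Example", "health_number": "000-000-002", "diagnosis": "diabetes"},
]

coded, key_table = key_code(records)
print(coded[0].keys())  # code and diagnosis only; no name or health number
```

Researchers would work with `coded` alone; `key_table` stands in for the linking codes that the custodian safeguards, and `link_back` stands in for the controlled procedure by which a data-subject is re-identified when that is required.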