For many purposes researchers must keep the ability to trace back, even if through intermediaries, to the data-subjects. Irreversible anonymization is not necessarily desirable.
There are a number of important reasons why retaining personal identifiability—either openly labelled or via key-coding—may be essential:
- To allow technical validation of reports, such as to confirm the correspondence of various data with the data-subjects, or even to verify the very existence and identity of subjects, in order to prevent scientific errors or fraud.
- To avoid duplicate records or redundant cases, such as to be certain that two case reports are independent and not just the same case recorded in two files.
- To facilitate internal scientific data-quality control, such as by enabling investigators to work back to original records and ancillary data.
- To allow case follow-up if more evidence or confirmation is needed.
- To check data-subject consent records, or to examine Institutional Review Board stipulations or opinions on a case.
- To allow tracking of consequences after a research intervention, so that, if necessary, the patient or physician can later be notified and advised on reexamination or other measures at the boundary between research and health care.
- To ensure accurate correspondence when linking data on data-subjects, cases, groups, or specimens among different files or databases, perhaps over a long period, even decades, and possibly to follow on to descendants.
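Several of the purposes listed above (detecting duplicate cases, linking files, and tracing back through an intermediary) can be illustrated with a minimal sketch of key-coding. This is one possible arrangement and not a prescribed method: the `KeyCustodian` class, the `key_code` function, the field names, and the sample records are all hypothetical, and a real system would also need secure key storage, access control, and audit logging.

```python
import secrets


class KeyCustodian:
    """Hypothetical intermediary that alone holds the identity-to-code key."""

    def __init__(self):
        self._code_by_identity = {}
        self._identity_by_code = {}

    def code_for(self, identity):
        # The same subject always receives the same random code, so
        # duplicates can be detected and files linked without names.
        if identity not in self._code_by_identity:
            code = secrets.token_hex(8)
            while code in self._identity_by_code:  # guard against collisions
                code = secrets.token_hex(8)
            self._code_by_identity[identity] = code
            self._identity_by_code[code] = identity
        return self._code_by_identity[identity]

    def reidentify(self, code):
        # Trace-back happens only through the custodian, for example to
        # verify a report or to notify the subject's physician.
        return self._identity_by_code[code]


def key_code(records, custodian):
    """Return copies of the records with the identity replaced by a code."""
    coded = []
    for record in records:
        stripped = {k: v for k, v in record.items() if k != "identity"}
        stripped["subject_code"] = custodian.code_for(record["identity"])
        coded.append(stripped)
    return coded


# Illustrative data only: two files that both contain "subject-1".
custodian = KeyCustodian()
file_a = key_code([{"identity": "subject-1", "event": "rash"}], custodian)
file_b = key_code(
    [
        {"identity": "subject-1", "event": "rash"},
        {"identity": "subject-2", "event": "nausea"},
    ],
    custodian,
)

# Researchers see only codes, yet can tell the two reports are one case.
same_case = file_a[0]["subject_code"] == file_b[0]["subject_code"]
```

In this sketch the codes are random rather than derived from the identity, so they reveal nothing by themselves; identifiability is retained only through the custodian's key, which is the property that distinguishes key-coding from irreversible anonymization.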
One of the clearest examples of the need to retain potential identifiability is the analysis of pharmaceutical and medical-device side-effect risks. As was mentioned above, the U.S. Food and Drug Administration, like all regulatory authorities, properly requires that data-links to the patient record be maintained (usually through the data-subject's physician) so that adverse-drug-event reports, sent in by physicians, the public, or manufacturers, can be verified and scrutinized in clinical detail if necessary.