Minimizing Disclosure Risk in HHS Open Data Initiatives. C. The Mosaic Effect


The “mosaic effect” is a new term in the literature on confidentiality. It received prominent mention in Memorandum M-13-13 from the Office of Management and Budget (OMB), “Open Data Policy—Managing Information as an Asset” (OMB 2013), but a search for the term in the database Google Scholar produced no relevant hits.

The notion of a mosaic effect is derived from the mosaic theory of intelligence gathering, in which disparate pieces of information—although individually of limited utility—become significant when combined with other types of information (Pozen 2005). Applied to public use data, the concept of a mosaic effect suggests that even anonymized data, which may seem innocuous in isolation, may become vulnerable to re-identification if enough datasets containing similar or complementary information are released. Even though personal identifiers are removed from these datasets, an intruder who is able to piece together enough information may be able to re-identify individuals whose data are contained in one or more of these datasets. To do so, the intruder must possess or be able to secure at least some data on known individuals. Such information is readily available in computerized form in voter registration records, hospital discharge records, commercially marketed databases, and other sources (Rothstein 2010).
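The linkage step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are fictitious, the field names (`zip`, `birth_date`, `sex`, `diagnosis`, `name`) are hypothetical, and the choice of quasi-identifiers is only one plausible example. The sketch joins a de-identified release to an identified list on the shared fields and reports a match only when the combination of values is unique in both sources.

```python
from collections import defaultdict

# Assumed quasi-identifiers: fields present in both the release and the
# identified source. The choice here is illustrative only.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Fictitious de-identified release: no names, but quasi-identifiers remain.
released = [
    {"zip": "00001", "birth_date": "1950-01-01", "sex": "F", "diagnosis": "A"},
    {"zip": "00002", "birth_date": "1980-06-15", "sex": "M", "diagnosis": "B"},
    {"zip": "00002", "birth_date": "1980-06-15", "sex": "M", "diagnosis": "C"},
]

# Fictitious identified source (for example, a public registry).
registry = [
    {"name": "Person 1", "zip": "00001", "birth_date": "1950-01-01", "sex": "F"},
    {"name": "Person 2", "zip": "00002", "birth_date": "1980-06-15", "sex": "M"},
    {"name": "Person 3", "zip": "00003", "birth_date": "1990-03-02", "sex": "F"},
]


def quasi_key(record):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def group_by_key(records):
    """Group records by their quasi-identifier tuple."""
    groups = defaultdict(list)
    for record in records:
        groups[quasi_key(record)].append(record)
    return groups


def link(released_records, registry_records):
    """Pair a name with a released record when the quasi-identifier
    combination is unique in both sources; ambiguous keys are skipped."""
    released_groups = group_by_key(released_records)
    registry_groups = group_by_key(registry_records)
    matches = []
    for key, released_group in released_groups.items():
        registry_group = registry_groups.get(key, [])
        if len(released_group) == 1 and len(registry_group) == 1:
            matches.append((registry_group[0]["name"], released_group[0]))
    return matches


matches = link(released, registry)
```

In this toy data only "Person 1" is linked. The two released records that share a key with "Person 2" cannot be told apart, so the sketch makes no claim about them, which is why coarsening or suppressing quasi-identifiers reduces this kind of risk.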

Another potential source of information on individuals is the World Wide Web. Malin (2005) demonstrates the application of “trail matching” methods to re-identify IP addresses of website visitors. Common patterns in the data trails left behind after website visits can be used to discover relationships among those trails, and these relationships enable re-identification when some of the locations capture identifying information along with anonymous data. While re-identification in this context is a different problem from re-identification of health data released by federal agencies, Malin’s results illustrate how re-identification can be accomplished with large amounts of mostly anonymous data when identifiers are attached to some of it.
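The general idea of matching trails can be sketched in simplified form. This is not Malin's algorithm, only a toy illustration under stated assumptions: every location logs an anonymous token for each visitor, some locations also capture a name for some visitors, and a named trail must therefore be contained in the anonymous trail of the same person. The site names, tokens, and person labels are all hypothetical.

```python
from collections import defaultdict

# Hypothetical logs: location -> anonymous tokens (e.g., IP addresses) seen there.
anonymous_visits = {
    "site_a": {"ip1", "ip2"},
    "site_b": {"ip1", "ip3"},
    "site_c": {"ip1", "ip2", "ip3"},
}

# Hypothetical logs: location -> names captured there (e.g., at a purchase).
identified_visits = {
    "site_a": {"person_x"},
    "site_b": {"person_x", "person_y"},
    "site_c": {"person_z"},
}


def build_trails(visits):
    """Invert location -> entities into entity -> set of locations (a trail)."""
    trails = defaultdict(set)
    for location, entities in visits.items():
        for entity in entities:
            trails[entity].add(location)
    return dict(trails)


def match_trails(anonymous, identified):
    """Link a name to a token when exactly one anonymous trail contains
    every location in the named trail; ambiguous names are left unlinked."""
    anonymous_trails = build_trails(anonymous)
    named_trails = build_trails(identified)
    links = {}
    for name, named_trail in named_trails.items():
        candidates = [
            token
            for token, trail in anonymous_trails.items()
            if named_trail <= trail  # subset test: named trail fits inside
        ]
        if len(candidates) == 1:
            links[name] = candidates[0]
    return links


links = match_trails(anonymous_visits, identified_visits)
```

Here only `person_x` is linked (to `ip1`), because no other token visited both `site_a` and `site_b`. The other two named trails each fit inside more than one anonymous trail, so they stay ambiguous. The sketch makes a single pass and does not use confirmed matches to narrow the remaining candidates.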

