Johnson noted that the IRS considers even revealing that a person filed a tax return to be a disclosure, so it sets a high bar to prevent disclosure. The IRS balances transparency and confidentiality by working in cooperation with its users. It formed a user group and asks the members to help make choices. Two outside users helped develop the updated version of the public use file, which increased utility and strengthened protection, and helped justify removing the geographic variable.
Oelschlaeger commented that CMS has recently focused on aggregated files rather than de-identified files. The stand-alone de-identified public use files are so focused on removing every variable that could lead to re-identification that the files might be considered useless. CMS has a number of ways to share data, only one of which is for researchers. Commercial researchers receive HIPAA Limited Datasets. The Qualified Entity program in the Affordable Care Act gives CMS the authority to release data for quality improvement and performance measurement. Any disclosure of identifiable data requires a data use agreement (DUA).
Scheuren observed that the mosaic effect comes into play when someone extracts data and tries to match it to another dataset. Billions of records in the insurance world are used for data mining. This is largely a good thing, but there are downsides, such as hackers. He noted that we have a trust system, but we need a trust-but-verify system. There are three things to do: impose penalties, enforce them, and scale back overzealous confidentiality.
Johnson commented that the IRS needs a legislative change to allow a DUA that would put the responsibility on users. Right now, the data the IRS releases must be completely safe, or it cannot be made available.