Utilizing Data from Various Data Partners in a Distributed Manner

Developing and testing the capability to conduct timely and secure distributed regression analysis in distributed data networks.
  • Food and Drug Administration (FDA)


Start Date
  • 7/15/2015


  • Use of Clinical Data for Research
  • Use of Enhanced Publicly-Funded Data Systems for Research


STATUS: Active Project


Currently, information on a patient's health care is captured across various data sources. The ability to link data across health care databases would provide more robust cross‑sectional or longitudinal patient profiles, enhance secondary uses of electronic health care information for research, and improve access to information that is not present in claims data, registry data, or electronic health records (EHRs) alone. To address this gap, the Food and Drug Administration (FDA) seeks to build upon previous distributed linear regression analysis efforts by developing enhanced analytic capabilities and fully automating distributed linear regression analysis of patient data across organizations.


This project is led by the FDA and will develop and test the capability to conduct timely and secure distributed regression analysis in distributed data networks. It will also explore the feasibility of creating virtual linkage capabilities to: 1) use data from multiple data sources with distinct patient populations (horizontally partitioned data); and 2) use data on the same patient held at different institutions (vertically partitioned data), linked through a unique key that identifies the patient. This would allow research networks to keep control of patient‑level data while generating valid regression estimates within and across networks without transferring protected health information, balancing analytic requirements, patient privacy and confidentiality, and proprietary considerations.
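
For the horizontally partitioned case, one common way to obtain pooled linear regression estimates without moving patient‑level rows is for each site to share only the aggregate matrices XᵀX and Xᵀy, which a coordinator then sums and solves. The sketch below illustrates that idea in plain Python. It is an assumption about how such an analysis could work, not a description of the project's software, and the function names (`site_summaries`, `pooled_fit`) and toy data are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def site_summaries(X: Matrix, y: Vector) -> Tuple[Matrix, Vector]:
    """Run at each data partner: return X^T X and X^T y (aggregates only)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (M[i][n] - known) / M[i][i]
    return beta


def pooled_fit(summaries: List[Tuple[Matrix, Vector]]) -> Vector:
    """Run at the coordinator: sum the site aggregates, then solve."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(p)] for i in range(p)]
    xty = [sum(s[1][i] for s in summaries) for i in range(p)]
    return solve(xtx, xty)


# Toy data (hypothetical): each row is [intercept, exposure]; y = 1 + 2 * exposure.
site_a = ([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0])
site_b = ([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0])

beta = pooled_fit([site_summaries(*site_a), site_summaries(*site_b)])
```

Here `beta` comes out close to an intercept of 1 and a slope of 2, the same estimates a pooled analysis of all four rows would give, even though the coordinator only ever sees a 2×2 matrix and a length‑2 vector from each site. Whether such aggregates are acceptable to share in a given network is a governance question that this sketch does not address.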

The project objectives are to:

  • Develop a new open‑source software application that uses PopMedNet™ (PMN) to automate multi‑step interactive processes, allowing stakeholders to conduct distributed regression analyses with data from different people held at different institutions without sharing potentially identifiable information across sites. PMN is an open‑source software application that enables the creation, operation, and governance of distributed health networks.

  • Develop this software application so that it can be supported by PMN and can be modified and adopted for non‑PMN applications.

  • Test the new, distributed regression application in an actual distributed research network.

  • Provide technical and user documentation to accompany the new software and allow for its widespread adoption.

  • Explore the feasibility of conducting distributed regression analyses in which data from the same people are held at different institutions.
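
For the vertically partitioned case in the last objective, rows held at different institutions first have to be put in the same patient order before any joint analysis can start. The sketch below shows one way that alignment step might look, assuming each site derives a pseudonymous key from its local patient identifier with a keyed hash (HMAC‑SHA256) under a secret shared by the sites. The source mentions only "a unique key"; the keyed hash, the function names, and the toy records are assumptions for illustration, and the sketch covers alignment only, not the regression itself.

```python
import hashlib
import hmac
from typing import Dict, List, Set


def pseudonym(patient_id: str, shared_secret: bytes) -> str:
    """Keyed hash of a patient identifier (HMAC-SHA256, hex-encoded)."""
    return hmac.new(shared_secret, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def keyed_records(records: Dict[str, List[float]], shared_secret: bytes) -> Dict[str, List[float]]:
    """Run at each site: re-key local records by pseudonym."""
    return {pseudonym(pid, shared_secret): values for pid, values in records.items()}


def common_order(*key_sets: Set[str]) -> List[str]:
    """Pseudonyms present at every site, in one agreed (sorted) order."""
    return sorted(set.intersection(*key_sets))


# Hypothetical toy data: site A holds a covariate, site B holds an outcome.
secret = b"example-shared-secret"  # assumption: agreed among sites out of band
site_a = keyed_records({"P1": [54.0], "P2": [61.0], "P3": [47.0]}, secret)
site_b = keyed_records({"P2": [1.0], "P3": [0.0], "P4": [1.0]}, secret)

# Only the pseudonym sets are exchanged, not the values.
order = common_order(set(site_a), set(site_b))

# Each site orders its own rows locally; row i now refers to the same patient.
rows_a = [site_a[k] for k in order]
rows_b = [site_b[k] for k in order]
```

After this step both sites hold two rows each, in matching order, for the two patients they have in common. A keyed hash protects identifiers only from parties that do not hold the secret, so a real deployment would need to decide who may see the pseudonyms and how the secret is managed.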