Making a Powerful Connection: The Health of the Public and the National Information Infrastructure

4.1 Data Collection and Analysis


Monitoring health status and environmental quality, diagnosing and investigating health problems and hazards, and evaluating the accessibility and effectiveness of clinical and population-based health services all require sophisticated data collection, linkage, and analysis. Because the client for public health is the community, data are needed not only about people (including their health status, personal risk behaviors, and medical treatment), but also about potential sources of disease and injury in the environment (such as restaurants, wells, water or sewage treatment plants, worksites, and insects) and about available resources that can be mobilized for effective action. Ultimately, these data need to be linked to each other and aggregated geographically. Linkage and aggregation make it possible to detect an incipient epidemic from isolated cases seen by different care providers, to relate clinical events to nearby health hazards, and to correlate the use and costs of personal health care services with ambient behavioral and environmental risks to health.
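As a rough illustration of geographic aggregation, the sketch below pools case reports from different providers by area and diagnosis and flags any group that reaches a chosen count. It is a minimal sketch under stated assumptions, not a surveillance method described in this report: the record layout, the area codes, the sample data, and the fixed threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical case reports: (provider, area code, diagnosis).
# All values are invented for illustration only.
REPORTS = [
    ("clinic_a", "area_1", "salmonellosis"),
    ("clinic_b", "area_1", "salmonellosis"),
    ("hospital_c", "area_1", "salmonellosis"),
    ("clinic_a", "area_2", "salmonellosis"),
    ("clinic_d", "area_2", "influenza"),
]


def flag_clusters(reports, threshold):
    """Group reports by (area, diagnosis) and keep groups with at least
    `threshold` cases. Returns {(area, diagnosis): (cases, providers)}."""
    cases = defaultdict(int)
    providers = defaultdict(set)
    for provider, area, diagnosis in reports:
        key = (area, diagnosis)
        cases[key] += 1
        providers[key].add(provider)
    return {
        key: (count, len(providers[key]))
        for key, count in cases.items()
        if count >= threshold
    }


# The threshold of 3 is an arbitrary assumption, not an epidemiological rule.
clusters = flag_clusters(REPORTS, threshold=3)
print(clusters)
```

Each provider in the sample data sees only one case in area_1, so none would notice a pattern alone; the group is flagged only after the reports are pooled. A real system would compare counts against an expected baseline rather than a fixed number.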

Data for these purposes come from a wide variety of sources. Currently, public health predominantly relies on seven types of data to meet its needs: vital statistics; health care utilization data; practitioner registries; disease and injury registries; disease, injury, and behavioral risk factor surveillance systems; periodic surveys; and programmatic data systems. Individually, some of these data sources are among the best in the world. Viewed as a whole, however, they suffer from fragmentation, lack of standardization, and episodic data collection. In addition, data for public health purposes are often collected separately and redundantly from encounter data in the medical treatment system. Together, these problems have increased the burden and cost of collecting data, limited the linkability and usefulness of the data that are collected, and resulted in critical data gaps.

The development of logically integrated health information systems, in which information collected once can serve multiple purposes, has the potential to overcome some of these problems. Realizing this potential will require:

  • a standardized, multipurpose nomenclature for all health concepts;
  • uniform standards for electronic data transmission;
  • unique identifiers for all "units" of interest, including individuals, providers, worksites, restaurants, and wells;
  • a secure environment for transmitting, linking, and anonymizing data;
  • strong privacy protections; and
  • appropriate data sharing policies.

Not surprisingly, these same factors are also prerequisites for effective computer-based patient record systems.
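Two of the requirements listed above, unique identifiers and the ability to link and anonymize data, can be sketched together. The example below joins clinical records to environmental records through a shared site identifier and replaces each person identifier with a keyed hash before the linked record is released. This is only an illustrative sketch: the field names, identifier formats, sample records, and key are hypothetical, and keyed hashing is strictly pseudonymization, which is weaker than full anonymization and would not on its own satisfy the privacy protections the list calls for.

```python
import hashlib
import hmac

# Hypothetical secret held by a trusted party; never hard-code a real key.
SECRET_KEY = b"example-key-held-by-a-trusted-party"

# Hypothetical records; identifiers and field names are invented.
CLINICAL = [
    {"person_id": "P001", "diagnosis": "gastroenteritis", "site_id": "WELL-17"},
    {"person_id": "P002", "diagnosis": "gastroenteritis", "site_id": "WELL-42"},
]
ENVIRONMENTAL = [
    {"site_id": "WELL-17", "hazard": "coliform detected"},
]


def pseudonymize(identifier, key=SECRET_KEY):
    """Replace an identifier with a keyed SHA-256 digest (HMAC).
    The same input always maps to the same pseudonym, so records
    about one person can still be linked without exposing the ID."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def link(clinical, environmental):
    """Attach known hazards to each clinical record via the shared
    site identifier, releasing a pseudonym instead of the person ID."""
    hazards_by_site = {}
    for record in environmental:
        hazards_by_site.setdefault(record["site_id"], []).append(record["hazard"])
    linked = []
    for record in clinical:
        linked.append({
            "person": pseudonymize(record["person_id"]),
            "diagnosis": record["diagnosis"],
            "site_id": record["site_id"],
            "hazards": hazards_by_site.get(record["site_id"], []),
        })
    return linked


for row in link(CLINICAL, ENVIRONMENTAL):
    print(row["person"][:12], row["diagnosis"], row["site_id"], row["hazards"])
```

The join works only because both data sets use the same identifier for the same well, which is the practical reason unique identifiers for all units of interest appear among the requirements.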