Improving Data for Decision Making: HHS Data Collection Strategies for a Transformed Health System




Introduction and Background

The collection, analysis and dissemination of timely and reliable data are essential to the success of the HHS mission of enhancing the health and well-being of Americans.  Indeed it would be difficult to conceive of an HHS program mission or policy initiative in which data and information do not play a central role.  In addition to HHS programs, the Department plays critical leadership and stewardship roles for health and human services data.  In those roles, HHS periodically assesses its survey and data collection portfolio with a view to identifying data collection strategies that would strengthen the capacity of HHS data resources; promote synergy across systems; assure efficiencies, quality, utility and timeliness; address high priority data gaps and minimize unintended redundancy.

Accordingly, the HHS Data Council's Executive Committee directed an assessment of the current HHS data collection system portfolio with a view towards recommendations that would promote continuing improvement in the way data are collected and made available for program management, research, surveillance, policy analysis and consumer health.  To carry out the review, the Executive Committee created two Working Groups.  The Research and Development Working Group was asked to review the current HHS data collection system portfolio and to identify and assess the benefits and costs of proven, promising and emerging practices, approaches, strategies and technologies that could be considered to improve the timeliness and responsiveness of HHS surveys and data collection systems, with special attention to web based data collection capabilities.  The Integration and Alignment Working Group was charged with outlining a framework for data coordination and integration in current and planned data collection systems, and identifying opportunities for future integration and alignment relating to surveys, administrative data systems and electronic health record (EHR) systems.

The Executive Committee noted that the value of HHS data is a function of its relevance, timeliness, availability when needed and distinctive contributions.  As a springboard for its work, the Executive Committee adopted a number of principles for an HHS data collection strategy that were used to guide the development of goals and recommendations provided in this report.  The principles embody the values of Privacy, Confidentiality, and Security; Relevance, Efficiency; Availability and Ease of Use; Innovation, Synergy, and Scientific Integrity.  The findings and recommendations from these two Working Groups provide a framework to support HHS leadership in planning a data strategy that involves maximizing existing resources and setting future priorities.

As the initial step in the assessment, the Working Groups developed an inventory of current and planned HHS data collection systems and assessed the methods and technology that form the basis for current HHS data capacity.  As part of this review, the Research and Development Working Group identified opportunities for Research and Development that would:

  • Strengthen the capacity of HHS data resources
  • Assure efficiencies, quality, utility and timeliness
  • Address high priority data gaps and minimize non-productive redundancy
  • Enhance HHS capacity for quick response data, nontraditional data, State and local data, and data on vulnerable populations.

Similarly, the Integration and Alignment Working Group reviewed the inventory of HHS data systems and assessed past and emerging integration initiatives in Departmental data collection systems including surveys, administrative data and public health reporting.  As part of their charge, this work group identified opportunities for Integration and Alignment that would:

  • Promote synergy across systems
  • Achieve further integration and alignment
  • Enhance data linkage across systems
  • Assure alignment and convergence of surveys, administrative data, and EHR data
  • Maximize the use of technology


Data Needs and Priorities

The review of the inventory of HHS data systems by the Working Groups included an assessment of critical data gaps and priorities that currently exist.  The Department will need to direct focus and attention in moving forward to address these and other priorities, including data on:

  • The impact of the Affordable Care Act (ACA) provisions and other forces affecting health system change on health status outcomes, insurance coverage rates, access and quality indicators, health care expenditures and population health measures.
  • The behavior of States, health plans, employers, providers, and consumers in the context of ACA.
  • The health status and health care disparities of vulnerable populations, such as racial and ethnic populations, persons with disabilities, rural populations and LGBT populations.
  • State and community level policy and public health data.
  • Changes in the adequacy of health care institutions and workforce to meet needs.
  • Social determinants of health and the changing nature of population health beyond the health care delivery system.

To address such priorities, HHS data collection systems will need to be more responsive to policy needs in terms of timeliness, flexibility, granularity, and the capacity to monitor change over time.  In addition, consideration of non-traditional data sources such as those available in the commercial sector will help to address some data needs and priorities.


Research and Development to Improve HHS Data Collection Systems

HHS data collection systems have historically employed the latest data collection modes and technologies consistent with data objectives, and they continue to do so.  Current technologies for surveys include Computer assisted personal interviewing (CAPI); Audio computer assisted self-interviewing (ACASI); Computer assisted telephone interviewing (CATI); and Interactive Voice Response (IVR).  Current activities are exploring the potential for survey pilots using EHR data, and ongoing health care provider and establishment surveys are employing mixed mode techniques that include web based and interactive online reporting.  Data collected from physical examinations, medical record abstraction, surveillance systems and administrative data systems employ a range of mechanisms for data capture and coding, which include electronic automated and web based systems.  Several promising practices for research, development and survey technology are described below.  These promising practices can improve the speed and efficiency of data collection and release; improve access to HHS data; and enhance methodology to address policy and population data needs.

Web Panels and Web Surveys

Web based research panels and web based surveys are increasingly being used alone or in combination with other modes for research.  The advantages of the use of the web include faster data collection, convenience, and lower costs compared to in-person interviewing.  Challenges in using web surveys and web panels include ensuring a representative sample, coverage of small or special populations, and variable response rates.  Considerable interest has been expressed in the development of a plan and feasibility study to add web panel components to the National Health Interview Survey, the National Health and Nutrition Examination Survey, and the Medical Expenditure Panel Survey.  In addition, SAMHSA is exploring the use of web panels to identify emerging patterns of substance use that need to be captured with questionnaire updates.  The potential of utilizing commercial web panels for health research also needs further exploration.  The Working Groups recognized the need for the development of HHS guidelines describing good practices for using web panels in the context of research objectives.

Timeliness of HHS Data Systems

There is considerable interest in expanding and enhancing the capacity of HHS data systems for providing quick response and faster turnaround for researchers, policymakers, State and local health departments and the public.  HHS surveys provide data for researchers, analysts, and policymakers within 9-12 months after data collection is completed.  Within the Department, a number of operating divisions have developed quick release analytical programs such as the National Health Interview Survey quarterly release program, the vital statistics advance data release program, and the MEPS employer component.  CMS also has made strides in improving the timeliness of administrative and claims data for Medicare and Medicaid.  Early release strategies for administrative data now include sample or partial year release of data for analysis along with early estimates.

Access to HHS Data

HHS agencies have expanded their efforts to provide access to the extensive data resources they maintain.  HHS data access initiatives now include the HHS Open Government Plan, Data.Gov, Health Data.Gov, the Health Indicators Warehouse, and a number of individual agency initiatives.  In addition, HHS agencies continue to make their data available through a variety of other methods, including public use data files, web based statistical tabulations and compendia, publications, and Research Data Centers (RDCs).  Research Data Centers currently serve as the mechanism for researchers to access restricted data files.  The present structure and process through which users gain access to RDCs can pose barriers, including difficulty for researchers to learn what linked data are currently available for analysis, travel to RDCs due to limited remote access options, and the challenge of determining which data sets involve linked data.  At the same time, considerable progress has been made in expanding access to non-restricted files, for example, through direct website access as with HCUPnet and MEPSnet.  SAMHSA has accelerated its program to expand data access.  Public use files for the Drug Abuse Warning Network have been released for the first time for download and online analysis.  An online restricted use data analysis system and a data portal to provide off-site access to restricted use data files are under development by SAMHSA, beginning with the National Survey on Drug Use and Health data.

Non-Traditional Data Sources

Data from commercial and non-traditional sources have the potential to enhance or augment data collected by HHS, and several commercial data sources are currently utilized within the Department.  HHS agencies make use of commercial data resources such as pharmacy claims data, commercial insurance databases, multi-payer claims databases, local health care market area data, local health care resource data and retail sales data.  An inventory of commercial data sources that agencies currently utilize is needed to facilitate broader shared use of commercial data.

Methodological Techniques to Increase Survey Efficiency

Responsive design interviewing has been used successfully in the National Survey of Family Growth and mixed modes are used in many HHS data systems.  These methodologies can be explored for expanded use in HHS data collections.

Augmenting HHS Analytic Capacity for Community-level Analysis and Special Populations

For surveys, sample size presents a challenge for small area analysis and analyses of vulnerable and special populations.  However, several techniques are currently being used to develop estimates for small areas and special populations.  A number of HHS surveys have the capability to provide State and community level data and special population data.  Other HHS surveys will need to explore the potential application of a variety of techniques for such estimates, including direct estimates and model based estimates.  A complementary strategy is to use administrative data, which are based on a census rather than a sample.  Work also is needed to improve coding in administrative data, ensure completeness, and expand the use of EHR data in HHS data systems.
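The contrast between direct and model based estimates can be illustrated with a small sketch.  The function below is a hypothetical illustration only, not a method used by any HHS survey: it assumes a simple composite (shrinkage) approach in which a direct survey estimate for a small area is blended with a model based estimate, with more weight given to the direct estimate when its sampling variance is small.  The function name, parameter names and all input values are invented for illustration.

```python
def composite_estimate(direct, sampling_var, model_based, model_var):
    """Blend a direct survey estimate with a model based estimate.

    Hypothetical illustration: the weight on the direct estimate is
    model_var / (model_var + sampling_var), so a small-sample area
    (large sampling variance) leans more on the model based estimate.
    """
    if sampling_var < 0 or model_var <= 0:
        raise ValueError("variances must be non-negative, model_var positive")
    weight = model_var / (model_var + sampling_var)
    return weight * direct + (1.0 - weight) * model_based


# Invented example: an area with a noisy direct estimate of 20 percent
# and a model based estimate of 10 percent, with equal variances.
blended = composite_estimate(0.20, 0.0004, 0.10, 0.0004)

# With no sampling error the direct estimate is returned unchanged.
exact = composite_estimate(0.20, 0.0, 0.10, 0.0004)
```

With equal variances the two estimates receive equal weight; as the sampling variance of the direct estimate grows, the blended value moves toward the model based estimate.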


Enhancing the Integration and Future Alignment of HHS Data Collection Systems

One powerful strategy to bolster the utility of data is to integrate data from two or more sources.  This can be accomplished, for example, by linking two surveys, linking surveys with administrative data, linking administrative data with clinical data, and other data linkages.
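As a rough illustration of what linking a survey with administrative data involves, the sketch below joins survey records to claims records on a shared identifier and summarizes the claims for each respondent.  It is a minimal example under stated assumptions, not a description of any HHS linkage program: the field names (person_id, paid, self_rated_health), the function name and the records are all hypothetical, and actual linkages typically rely on more elaborate matching and on confidentiality safeguards.

```python
from collections import defaultdict


def link_records(survey_rows, admin_rows, key="person_id"):
    """Attach a claims summary to each survey record sharing `key`.

    Hypothetical deterministic linkage: survey records with no
    matching administrative records are kept, with zero claims.
    """
    admin_by_key = defaultdict(list)
    for row in admin_rows:
        admin_by_key[row[key]].append(row)

    linked = []
    for row in survey_rows:
        matches = admin_by_key.get(row[key], [])
        linked.append({
            **row,
            "n_claims": len(matches),
            "total_paid": sum(m["paid"] for m in matches),
        })
    return linked


# Invented records for illustration only.
survey = [
    {"person_id": "A1", "self_rated_health": "good"},
    {"person_id": "B2", "self_rated_health": "fair"},
    {"person_id": "C3", "self_rated_health": "excellent"},
]
claims = [
    {"person_id": "A1", "paid": 120.0},
    {"person_id": "A1", "paid": 80.0},
    {"person_id": "B2", "paid": 45.5},
]
linked = link_records(survey, claims)
```

Each linked record keeps the survey content and gains the count and total of matching claims, which is the kind of added analytic value that linkage is intended to provide.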

Survey data

Integration of surveys with each other depends on a strong core of surveys with sufficient samples and breadth of content.  The HHS core surveys already provide a platform for integration, lessening the need for stand-alone survey efforts.  Thus, little evidence was found of duplication or unintended overlap within HHS data collections.  However, there is a greater need for coordination and standardization of data collection in some areas, such as insurance coverage and health plan data.

Successful past survey integration activities in HHS include the NHIS and MEPS, and the National Immunization Survey (NIS) and the State and Local Area Integrated Telephone Survey (SLAITS).  Several additional survey integration efforts are planned within the Department.  They include the integration of various health care provider surveys in the National Health Care Survey at NCHS, the SAMHSA-NCHS integration of emergency department surveys, the NCHS Long Term Care Family of Surveys involving the integration of long term care provider surveys, and the integration of the sample frame for SAMHSA's two national surveys of Mental Health and Substance Abuse facilities.  In planning its health care provider surveys, NCHS is looking to obtain increasing amounts of data from EHRs in conjunction with health care provider surveys.

Most surveys have the capacity for linkage to Medicare enrollment and claims data, Social Security Administration data, and social and economic contextual data, but the frequency of these linkages and access to data are unclear.  HHS provider surveys are testing the potential for data linkage with administrative and EHR data, and initiatives are underway in Medicare and Medicaid data access, linkage and timeliness, in multi payer claims data, and health plan data.

Although many HHS surveys have the capacity to link to administrative records, information on what data systems are currently linked and available for use is limited.  Several data access initiatives to promote the use of linked data are underway, but more effort is needed in this area.  Coordination and improvements also are needed for Research Data Centers to provide access to data, usually linked data that cannot be released to the public, while safeguarding the privacy and confidentiality of personally identifiable health information.  SAMHSA's data portal will facilitate the linkage of its survey data with other data sources in a secure environment.

Administrative and Claims Data

Administrative or claims data can provide a relatively inexpensive source of data with the capacity to drill down to States and localities and to assess racial, ethnic or other disparities in access, quality, or cost.  Considerable innovation is underway to expand the utility of administrative data by linking them to other data sources and improving the information in the source data.  AHRQ is actively supporting and evaluating State and hospital ability to link to additional clinical data that are already available electronically.  Progress is being made to adopt standards in claims data sets for primary language spoken.  Web-based toolkits are being made available in the area of laboratory data and point of admission coding (POA), easing the adoption process for staff and stakeholders through training materials.

Administrative data files created by CMS are an extremely valuable source of HHS data.  Efforts to speed the process through which users can extract information contained in CMS administrative data systems and to reduce barriers to use could also improve data timeliness.  To improve accessibility, the agency is examining the feasibility of creating claims-level public use files (PUFs) that can be made available to users free of charge on the CMS website.  CMS is testing new techniques of data acquisition from States in order to examine ways to reduce administrative costs and level of effort for both federal and State governments, and to improve the timeliness, detail, and reliability of Medicaid data.

Public Health Surveillance Data

National public health surveillance systems, including the National Notifiable Disease Surveillance System, the Behavioral Risk Factor Surveillance System, and several infectious and chronic disease surveillance systems provide a critical public health data resource.  Conducted in collaboration with local and State health department partners, these data systems constitute a great deal of the data collected and disseminated by CDC and represent a large portion of the CDC fiscal investment in data collection.  A number of initiatives are underway to strengthen and improve these data resources, including increased reliance on electronic reporting, the development and adoption of data standards, and coordination and integration efforts.

In addition, the meaningful use provisions of HITECH envision more effective ways of accomplishing public health functions such as disease reporting, registry development, and syndromic surveillance.  For this vision to be realized, State and local public health information systems need to be strengthened so that they can receive and process data coming from provider EHRs.  There is also a need for standards development based on functional requirements that reflect the workflow and data content needs of State and local health agencies.  On this front, work is being conducted in a number of public health areas at CDC including laboratory reporting, vital records, newborn screening, immunization, cancer reporting and other disease-specific areas.  The pathway toward adoption of the necessary standards and the considerable investment needed in bringing health departments up to an operating status is the subject of workgroups at CDC, the Office of the National Coordinator for Health IT (ONC), and numerous public health informatics groups such as the Joint Public Health Informatics Taskforce and the Public Health Data Standards Consortium.

Electronic Health Records

In the context of pursuing meaningful use through certified EHR systems, standards are being developed that could facilitate reporting of clinical data that will be useful not only for patient care, but for population monitoring and research as well.  Steps must be taken to develop techniques to extract meaningful data in ways that can in the future complement survey and administrative data collection.  Challenges for research with these records include non-standardized data elements as well as physical and organizational barriers to data integration when source data are housed in multiple repositories.  A research roadmap is needed to align the data capabilities afforded by new administrative data systems, EHRs, surveys, and the promise of standardized data repositories for public health and research.  Pilot tests, workshops, linkage and methodological research can promote alignment.  HHS Comparative Effectiveness Research investments to create practice-based networks can be used to learn how EHR data can be extracted for use in population health, provider surveys, and outcomes evaluation.


For purposes of data strategy, alignment involves the harmonization and convergence of current and future data sources through standards, data collection processes, analysis and dissemination policies and other means of assuring future convergence among EHR systems, administrative data systems and surveys.  The ultimate goal is to encourage synergy across these data producers in ways that enhance everyone's capability to improve health and health care and answer critical health policy, systems, and research questions.  A variety of activities are being undertaken in HHS to explore how greater standardization of data can enhance analytic capabilities.

  • Under the National Quality Strategy, efforts will be made to ensure that quality measures used by HHS will increasingly be harmonized for use with provider incentives and with further iterations of meaningful use standards.
  • In terms of workforce policy, there is a growing concern about the adequacy of the health professional workforce to address future demand, particularly the availability and distribution of primary care practitioners as well as the public health workforce.  HHS is undertaking a multi-agency effort to assess the current state of monitoring the workforce with existing data and the potential of new standardized data from States, professional groups, and other federal agencies to augment present efforts.
  • The administrative data that will be created as a byproduct of implementation of ACA provisions will be an important source of future information.  Careful attention should be paid to alignment across the data elements and standards adopted by both the federal government and States in developing insurance exchanges and other information systems.


Goals and Action Steps

The following goals and action steps have been identified in support of a Department wide data strategy to ensure that data activities address Administration and Secretarial priorities, promote coordination and efficiencies in data planning, and meet critical interagency data needs in a coordinated and integrated fashion.

Advancing Research and Development to Improve HHS Data Collection Systems

Goal 1:  Improve the speed and efficiency of data collection and release.

Action Steps:

  • HHS will explore approaches to improve the speed of data collection by maximizing the appropriate use of web surveys, quick turnaround phone surveys, and automating data collection for provider surveys.  These technologies have the potential to significantly reduce the time in the field required to collect data and provide meaningful analyses.
  • HHS agencies will explore the use of web surveys and web panels in HHS data collection systems.  Specifically,
    • NCHS will test the feasibility of adding a web based component to the National Health Interview Survey and will evaluate this effort as it relates to improvement in survey timeliness.
    • The HHS Data Council will develop policy and best practices for the use of web surveys and web panels in HHS data collections.
  • HHS agencies will explore use of quick response telephone surveys to improve timeliness.
    • AHRQ will test the feasibility of adding a telephone or web based panel to the MEPS Household Component as a quick response mechanism.
  • HHS agencies will fully explore and expand the use of electronic and automated data collection capabilities in data systems.
  • HHS agencies will develop and expand early data release programs, such as the NCHS, AHRQ, SAMHSA and CMS initiatives, and expand and adapt the strategy to new survey platforms and administrative data systems.
    • HHS will explore the expansion of administrative data analytic capability to include developing methods and capacity to improve coding, ensure data completeness, and use EHRs.  CMS is currently pursuing methods to collect encounter data for Medicare beneficiaries in managed care plans.

Goal 2:  Improve access to HHS data

Action Steps:

  • HHS agencies will continue to expand their efforts to provide access to the extensive data resources they maintain through outlets such as Data.Gov, Health Data.Gov, the Health Indicators Warehouse and individual agency initiatives.  In addition, HHS agencies will continue to make their data available through a variety of methods, including public use data files, web based statistical tabulations and compendia, publications, restricted-use data analysis systems, data portals and Research Data Centers (RDCs).
  • The Data Council will oversee periodic reviews of the data portfolios of HHS agencies to identify additional opportunities to make data accessible.
  • The Data Council will coordinate an assessment of Research Data Centers (RDCs) sponsored by HHS agencies with a view toward improvements to facilitate access to restricted data, while ensuring confidentiality protections.  This effort will include an assessment of expanded opportunities for remote access.
  • HHS will analyze de-identification methods to support greater development of public use files, and explore options to reduce the costs associated with using administrative data.
  • HHS will continue to make data and tools available to researchers and the public on, including the availability of new tools and instruments to analyze Departmental data.

Goal 3:  Enhance methodology to address policy and population data needs

Action Steps:

  • HHS will assess strategies to augment samples in population surveys to address community level and special population data needs, including data on vulnerable populations.
  • HHS agencies will develop and assess the potential of model based estimates to provide data for small areas and special populations.  In particular, NCHS will develop and assess the potential of small area estimation procedures for the NHIS and other NCHS surveys.
  • HHS will explore ways to expand use of mixed-mode surveys to address data needs.
  • HHS will compile and share an inventory of commercial data sets used by HHS agencies and identify situations and data objectives for which use of commercial data may be appropriate.

Promoting Data Collection Coordination, Integration and Future Alignment of Surveys, Administrative Data, and Electronic Health Record Systems

Goal 4:  Ensure that survey integration efforts are preserved and expanded.

Action Steps:

  • HHS agencies that conduct national surveys are planning and advancing several integration initiatives.  The HHS Data Council will ensure that these efforts remain on track, monitor their progress, learn from implementation experience, and assess further opportunities to increase value through further integration.  Specifically,
    • NCHS is integrating data collected from the National Hospital Discharge Survey and the National Hospital Ambulatory Medical Care Survey into the new National Hospital Care Survey.
    • SAMHSA and NCHS are collaborating on the integration of emergency department data collection systems into the NCHS family of national provider care surveys.
    • SAMHSA is currently integrating its two major surveys of treatment facilities for mental illness and substance abuse.
    • NCHS is developing a plan for a family of long term care facilities surveys.

Goal 5:  Maximize data linkages between HHS data systems to increase analytic capacity and survey efficiency

Action Steps:

  • Through the HHS Data Council, HHS will develop and publicize a description of all current data linkage capacities and linked data sets.  Attention will be directed at the array of on-going linkage capabilities between the national surveys and administrative datasets as well as plans for additional linked data sets.
  • Based on a review of the linked data set inventory, HHS will identify gaps across the major surveys that might be addressed through record linkages and assess the potential for additional linkages.

Goal 6:  HHS will promote opportunities for integration, standards and alignment of surveys, administrative data systems and Electronic Health Record Systems to meet emerging data needs in a coordinated and cost effective manner.

Action Steps:

  • The HHS Data Council will coordinate a review of current and new administrative data systems associated with ACA implementation to determine where greater alignment in terms of data standards can enhance the potential for ACA monitoring, policy analysis, and evaluation, and will describe how survey data can complement administrative data.
  • HHS will develop and implement data collection standards for race, ethnicity and other demographic data as outlined in Section 4302 of the Affordable Care Act.  Proposed standards for race, ethnicity, sex, primary language, and disability status have been published for public comment.  Final data standards are scheduled to be implemented by the Department in 2012.
  • The Data Council will establish a Working Group to coordinate plans and activities for adding new questions relating to monitoring the impact of health reform to HHS and Census Bureau surveys.
  • HHS will explore ways to build on federal research investments in practice-based networks to learn how EHR data can be most effectively extracted for use in population health activities, quality measurement, provider surveys, and outcomes evaluation.
  • The Data Council will develop a framework and research agenda to guide and support the alignment of the data systems and data capabilities that will be afforded by new administrative data systems, health information exchange, EHRs and personal health records, and surveys.  An agenda of pilots, workshops and methodological research will be developed to promote this alignment.
  • HHS initiatives in the areas of workforce, disparities reduction, and EHR meaningful use have implications for data standards development and alignment.  These cut across federal, State and private data systems.  The Data Council will serve as an overarching monitor, and when appropriate coordinator, of data alignment aspects of these domains.



In addition to the Action Steps described above, the Data Council offers several high level recommendations concerning continuing leadership support for the data collection infrastructure, goals and strategies proposed in this report.  These recommendations provide the framework to ensure that HHS continues to make progress in addressing data needs, policies and priorities.

  1. HHS should maintain and protect the core surveys and administrative data systems that provide the foundation for the Department's and the nation's capacity to monitor health and well-being, the quality and efficiency of healthcare, health and human services systems changes, and Secretarial initiatives and priorities.
  2. HHS should ensure that the core surveys maintain their capabilities to serve as robust platforms for integration, and look at possible ways of enhancing their designs to meet future integration needs.
  3. HHS should continue to support the development and use of timely and high quality administrative data to provide data for decision making, as well as administrative data initiatives aimed at more timely release and improved access and availability.
  4. HHS should implement the Data Council's proposed strategy to address data needs for racial and ethnic minority populations as well as other vulnerable populations, as outlined in the HHS Disparities Action Plan.  The strategy includes data collection standards, inclusion of standard questions in major surveys, oversampling in major surveys, use of administrative data, pooling and other innovative approaches to analysis, commitment to data access and dissemination, and special targeted studies for populations not amenable to national surveys.
  5. HHS should examine the potential of current surveys and data collection systems to provide data on high priority HHS data needs, including
    • The impact of ACA provisions and other forces affecting health system change on health status outcomes, insurance coverage rates, access and quality indicators, healthcare expenditures and population health measures.
    • Data to monitor the behavior of States, health plans, employers, providers, and consumers in the context of ACA.
    • Data to assess disparities in health status and health care of vulnerable populations, such as racial and ethnic populations, persons with disabilities, rural and LGBT populations.
    • State and community level policy and public health data.
    • Information to monitor and assess changes in and the adequacy of health care institutions and workforce to meet needs.
  6. HHS agencies should continue to expand efforts to provide user friendly access to and maximum availability of the extensive data resources they maintain through outlets such as Data.Gov, Health Data.Gov, the Health Indicators Warehouse and agency initiatives.  In addition, HHS agencies should continue to make their data available through a variety of other methods, including public use data files, web based statistical resources and compendia, publications and Research Data Centers (RDCs).



While the data collection systems sponsored by HHS are essential to support the HHS mission, they also provide most of the national statistical capacity to monitor the health of the population and the functioning of the health care and human services systems.  Although many HHS data systems represent the state of the art in their class, new challenges are placing increasing demands on these resources.  New data needs are arising, significant gaps exist, pressures for improved timeliness are mounting, and costs are increasing.  At the same time, new administrative data systems needed to support health reform, and the widespread adoption of EHR systems and electronic information exchange can provide a wealth of new data sources.  The need to improve survey efficiencies, address gaps, increase data for vulnerable populations and small areas and improve the timeliness of data collection and dissemination are challenges that a thoughtful and systematic data strategy can address.

The goals, action steps and recommendations described above are offered for HHS consideration to address many of the challenges outlined.  In many instances, work has already been initiated under the auspices of the HHS Data Council and individual agencies.  The Data Council stands ready to undertake, coordinate and oversee implementation of the action steps, recommendations and strategies outlined, and to work with and support other entities in doing so.