Toward a National Health Information Infrastructure: A Key Strategy for Improving Quality in Long-Term Care

Standards for Terminology Coding Systems

The phrase "terminology coding systems" refers to the range of approaches used to assure standardized recording and encoding of clinical data in electronic record systems. Such coded data are central to the efficient exchange of information in messages sent across documents, systems and applications. Terminology coding systems exist on a continuum that ranges from human readable, enumerated coding schemes to formal terminologies that enable machine "understanding."4

Enumerated coding schemes encode pre-coordinated phrases, so that users pick the most relevant terms from pre-defined lists. Typically, such systems provide very limited coverage of clinical content and focus only on the specific use for which the data are required. They reflect the technology available 20 years ago and the constraints that then applied to coding data for computer-based analysis. The MDS is an example of an enumerated coding scheme. The enormous collection of such single-purpose, stand-alone coding systems has created a situation often compared to the Tower of Babel, in which different data sets and software applications cannot meaningfully exchange or reuse data and information.

More recent research and development initiatives in electronic health records emphasize the use of formal terminologies. Formal terminology systems emphasize the indexing and retrieval of concepts and their associated terms, and the post-coordination of phrases.

Between the enumerated coding schemes and the formal terminologies that anchor this continuum lie other types of terminology coding systems, such as nomenclatures, classifications, and taxonomies. Each is differentiated by how terms are organized within the system and by the concept orientation of the coding system. More complex types of terminology systems have been made possible in large part by technologies that support more complex data structures, and by the development and use of description logics, based on first-order logic, as a foundation for the algorithms that enable the semantics or "machine understanding" of text.

The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is one of the most extensively developed terminologies of this type. SNOMED CT is discussed more extensively below, but an example is provided here to clarify the point. The pre-coordinated term "pneumonia due to Klebsiella pneumoniae" is equivalent to a phrase that could be post-coordinated using the following SNOMED CT codes: 56415008 "Klebsiella pneumoniae" and 233604007 "pneumonia". A portion of the SNOMED CT hierarchy is presented below.5 SNOMED CT specifies that pneumonia:

Is_a disease of the lower respiratory tract
Finding_site lung structure
Onset (subacute, acute, insidious, sudden)
Severity (mild, moderate, severe)
Episodicity (first, new, ongoing, other)
Course (acute, acute diffuse, acute-on-chronic, etc.).

Similarly, SNOMED CT specifies that Klebsiella pneumoniae:

Is_a Klebsiella
   Is_a enterobacteriaceae
      Is_a gram-negative bacillus
         Is_a gram-negative bacterium
            Is_a bacterium
               Is_a infectious agent
                  Is_a microorganism
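
The Is_a chain above is what allows software to infer, for example, that a Klebsiella pneumoniae finding is also a finding of a gram-negative bacterium. A minimal Python sketch of that inference follows. It assumes a single-parent table keyed by the concept names listed above (SNOMED CT itself uses numeric identifiers and allows multiple parents), and the function names `ancestors` and `is_a` are illustrative only.

```python
# Illustrative single-parent Is_a table, copied from the hierarchy above.
# Names stand in for SNOMED CT identifiers, which are not reproduced here.
IS_A = {
    "Klebsiella pneumoniae": "Klebsiella",
    "Klebsiella": "enterobacteriaceae",
    "enterobacteriaceae": "gram-negative bacillus",
    "gram-negative bacillus": "gram-negative bacterium",
    "gram-negative bacterium": "bacterium",
    "bacterium": "infectious agent",
    "infectious agent": "microorganism",
}


def ancestors(concept):
    """Return every concept reachable by following Is_a links upward."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, candidate_parent):
    """True if candidate_parent subsumes concept (or is the same concept)."""
    return concept == candidate_parent or candidate_parent in ancestors(concept)


if __name__ == "__main__":
    print(ancestors("Klebsiella pneumoniae"))
    print(is_a("Klebsiella pneumoniae", "gram-negative bacterium"))
```

A query for "any gram-negative bacterium" can then match records coded at the more specific level without the recording clinician having to enter the broader category.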

The linkage concept "due_to" specifies the relationship between the two concepts of "pneumonia" and "Klebsiella pneumoniae." More complex expressions are possible, and can be constructed at the point of care to reflect the clinically relevant data. Encoding of patient information is then accomplished through the post-coordination of the terms "pneumonia", "due to", and "Klebsiella pneumoniae", using a formalism such as description logics, which functions somewhat as an assembly language for expressing phrases.6
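
The post-coordination just described can be sketched in Python as follows. The two concept codes are those quoted in the text; the `Concept` and `Expression` classes, the `refine` and `render` methods, and the rendered string format are assumptions made for illustration and are not SNOMED CT's own expression syntax. The linkage is carried by name because the text gives no code for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    code: str   # SNOMED CT code as quoted in the text
    term: str   # human-readable term


@dataclass(frozen=True)
class Expression:
    """A focus concept refined by (linkage, value) pairs."""
    focus: Concept
    refinements: tuple = ()

    def refine(self, linkage, value):
        # Return a new expression; the atomic concepts are never modified.
        return Expression(self.focus, self.refinements + ((linkage, value),))

    def render(self):
        # Ad hoc display format chosen for this sketch only.
        parts = [f'{self.focus.code} "{self.focus.term}"']
        for linkage, value in self.refinements:
            parts.append(f'{linkage} {value.code} "{value.term}"')
        return " ".join(parts)


PNEUMONIA = Concept("233604007", "pneumonia")
KLEBSIELLA_PNEUMONIAE = Concept("56415008", "Klebsiella pneumoniae")

# Post-coordinate "pneumonia due to Klebsiella pneumoniae" from atomic parts.
expr = Expression(PNEUMONIA).refine("due_to", KLEBSIELLA_PNEUMONIAE)

if __name__ == "__main__":
    print(expr.render())
```

Because each refinement returns a new expression, the atomic concepts keep their meaning and can be reused in other expressions.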

In this example, the goal is that in well-designed and standardized PMRI systems, all lab reports indicating the type of pneumonia would be located within a standard document architecture, a standard coding scheme would be used to name the pneumonia, and a message could be sent from the PMRI to a reporting document such as the MDS indicating the presence of pneumonia. With respect to pneumonia, the MDS requires only data on its presence or absence. That information could be "messaged" from the PMRI, and the MDS form could be sent with only the information indicating pneumonia, excluding information on the biological agent that caused it. However, it is important to retain more detail in automated patient records in order to construct decision support systems that might, for example, suggest cost-effective antibiotics for specific types of pneumonia, detect possible drug-drug interactions, or enable the reporting of another type of pneumonia, e.g., "pneumonia in anthrax," to appropriate public health agencies.
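
The reuse described above, one detailed record feeding both a coarse MDS item and a finer-grained decision support function, might look like the following Python sketch. The record layout, field names, and function names are hypothetical; only the two SNOMED CT codes come from the text, and no actual MDS or messaging format is implied.

```python
# "233604007" is the SNOMED CT code for pneumonia quoted earlier in the text.
PNEUMONIA_CODE = "233604007"

# Hypothetical PMRI findings; the dictionary layout is an assumption.
pmri_findings = [
    # pneumonia due to Klebsiella pneumoniae
    {"focus": "233604007", "due_to": "56415008"},
]


def mds_pneumonia_item(findings):
    """Collapse detailed findings to the presence/absence value the MDS needs."""
    return any(f["focus"] == PNEUMONIA_CODE for f in findings)


def decision_support_view(findings):
    """Keep the causative agent, which the MDS drops but other uses need."""
    return [f.get("due_to") for f in findings if f["focus"] == PNEUMONIA_CODE]


if __name__ == "__main__":
    print(mds_pneumonia_item(pmri_findings))
    print(decision_support_view(pmri_findings))
```

The detail is stored once in the patient record, and each downstream use takes only the level of detail it requires.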

The relationship among messaging standards, document architecture, and coded terminology systems and formalisms is comparable to the grammar that enables us to put words together in order to communicate ideas. Achieving the vision of the NHII requires finding the relevant data within a source document (most easily achieved by creating documents in a structured fashion), composing expressions that include varying degrees of detail, and then "populating" the data fields of an HL7 version 3 message format for transmission to another application. Uniform data structures and encoding are required to accomplish this. All participants, from vendors supplying the software to providers to the government agencies providing oversight of quality, public health, and health policy, must adopt uniform data standards if the data are to be interoperable, that is, exchangeable across applications and systems. The goal is similar to the use of an ATM card to deposit and withdraw cash at locations remote from one's bank; a major difference is that we all agree on the naming and value of monetary units, while we do not agree on how to name our clinical data or how to formally represent those data.

To date, the NCVHS has not recommended standards around terminology systems. However, desirable characteristics of formal terminologies are well described in the literature and are briefly summarized below.7, 8, 9

Concept orientation: Tools that empower users to adapt "local terms" to reference terminologies are required when the concepts the terms represent are equivalent. For example, "pressure sores" may be the locally preferred term while "pressure ulcers" is the term in the reference terminology. Therefore well-formed terminologies must accommodate both synonymy and lexical variants, and a thesaurus must be available for automated identification of terms associated with concepts. A promising way to accomplish this is by assembling components into a dynamic terminology server, rather than presenting users with a laundry list of all possible terms.
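
A thesaurus lookup of the kind described under concept orientation can be sketched in Python as below. The concept identifier, the `THESAURUS` table, and the deliberately naive `normalize` rule for lexical variants are all assumptions for illustration; a production terminology server would use the reference terminology's own codes and a proper lexical normalizer.

```python
# Hypothetical concept identifier; a real server would use the reference
# terminology's own code for this concept.
PRESSURE_ULCER = "concept:pressure-ulcer"

# Synonyms recorded per concept; the first entry is the preferred term.
THESAURUS = {
    PRESSURE_ULCER: ["pressure ulcer", "pressure sore"],
}


def normalize(term):
    """Reduce simple lexical variants: case, extra spaces, trailing plural 's'."""
    words = term.lower().split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


# Index every normalized synonym back to its concept.
INDEX = {
    normalize(synonym): concept_id
    for concept_id, synonyms in THESAURUS.items()
    for synonym in synonyms
}


def lookup(local_term):
    """Return the concept a local term refers to, or None if unknown."""
    return INDEX.get(normalize(local_term))


def preferred_term(concept_id):
    """Return the reference terminology's preferred term for a concept."""
    return THESAURUS[concept_id][0]


if __name__ == "__main__":
    concept = lookup("Pressure Sores")
    print(concept, preferred_term(concept))
```

Users keep typing their locally preferred term, while the stored data point to a single concept regardless of which synonym was entered.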

Comprehensive and complete: Well-formed terminologies must provide the depth and breadth of content coverage relevant to specific domains. This means there must be a way to express all the clinical content required for a wide range of specified uses.

Atomic and compositional: Well-formed terminologies must ensure that "atomic" levels of data are available and that the meaning of atomic level data elements is preserved when combined or post-coordinated with other concepts. A closely related requirement is that concepts are organized within the framework of a reference terminology system that enables the assembly of atomic concepts into more complex expressions (as in the earlier pneumonia example).

Explicit formalism (e.g., description logic): Well-formed terminologies must have a formal logic or inference engine that enables the post-coordination of more complex expressions from atomic level data elements. Presently, description logics appear best suited to this task.

Multiple classifications: In order to support the reuse of clinical data across multiple special purpose classification systems, terminologies must enable concepts to be mapped to multiple "parents". For example, one MDS data element is "short-term memory," and the most similar SNOMED CT term is "uncompensated short term memory deficit." The short-term memory item in the MDS indicates that the patient "seems/appears to recall after 5 minutes." If using the SNOMED CT system for encoding data, one would need to decide which of the following parent classifications represents the intended use of the data.

"Uncompensated short term memory deficit" is classified in SNOMED CT as

   finding of memory performance
      memory finding
         functional finding
            clinical history and observation finding
                  SNOMED CT concept

"Uncompensated short term memory deficit" is also classified in SNOMED CT as

   short-term memory performance
      verbal short-term memory performance
         ability to recall random address at five minutes
            ability to recall five digit number at five minutes
               visual short-term memory performance
                  ability to reproduce geometric figure at five minutes
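
One way to picture multiple classification is as a table in which a concept may list more than one parent. The Python sketch below enumerates every upward path from the memory-deficit concept. The two immediate parents and the first chain are taken from the listings above; the second chain is left at its first level in this sketch, and concept names stand in for SNOMED CT identifiers. The function name `paths_to_root` is illustrative.

```python
# Illustrative parent table; a concept may have more than one parent.
PARENTS = {
    "uncompensated short term memory deficit": [
        "finding of memory performance",
        "short-term memory performance",
    ],
    "finding of memory performance": ["memory finding"],
    "memory finding": ["functional finding"],
    "functional finding": ["clinical history and observation finding"],
    "clinical history and observation finding": ["SNOMED CT concept"],
}


def paths_to_root(concept):
    """List every Is_a path from a concept up to a concept with no listed parent."""
    parents = PARENTS.get(concept, [])
    if not parents:
        return [[concept]]
    return [[concept] + path for p in parents for path in paths_to_root(p)]


if __name__ == "__main__":
    for path in paths_to_root("uncompensated short term memory deficit"):
        print(" -> ".join(path))
```

An application reusing the data for a special-purpose classification would select the path that matches its intended use.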

Representation of context: Some experts in the field of electronic medical records believe that well-formed terminologies must be coordinated with structural models of clinical documents within the electronic record in order to disambiguate meaning from use. For example, "History of heart disease" means something very different when recorded in a family history section of the record than when recorded in a past medical history section of the record.
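
The role of document context can be illustrated with a short Python sketch in which the same term is interpreted according to the section where it was recorded. The section names, the `Entry` class, and the `subject_of` function are hypothetical and do not correspond to any particular document architecture standard.

```python
from dataclasses import dataclass

# Hypothetical mapping from record section to the person the entry describes.
SECTION_SUBJECT = {
    "family history": "family member",
    "past medical history": "patient",
}


@dataclass(frozen=True)
class Entry:
    term: str      # term as recorded
    section: str   # section of the record in which it was recorded


def subject_of(entry):
    """Resolve who the term applies to from the section that contains it."""
    return SECTION_SUBJECT.get(entry.section.lower(), "unknown")


if __name__ == "__main__":
    term = "History of heart disease"
    print(subject_of(Entry(term, "Family History")))
    print(subject_of(Entry(term, "Past Medical History")))
```

The term alone is ambiguous; only the pairing of term and section tells a receiving application whether the patient or a relative had the condition.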

Clearly, the needs for health care data and information reflect multiple and complex uses of that information, and the requirements for terminology systems are extensive. Without terminology standards that support the composition and de-composition of clinically relevant and detailed expressions, interoperability and reuse of patient data across applications and systems will be seriously constrained. Formal terminology coding systems are critical to the success of uniform coding in PMRI systems and to support the evolution of the NHII. This study focuses on three coding systems. Only SNOMED CT has been developed with the specific purpose of meeting the requirements of a reference terminology for PMRI systems. The other two coding systems, ICF and ICNP, are included because they are believed to include many of the definitions and classifications of terms in two subject areas that are particularly relevant to long-term care (functioning, disability and health; and nursing).
