Improving the Collection and Use of Racial and Ethnic Data in HHS. 6. Use of Racial and Ethnic Data


The use of epidemiological data and other statistical data often involves gathering research and using it to help justify specific actions. Data help to frame an issue, providing evidence for subsequent actions or planning. Whether utilized by the public or private sectors, data can be used to plan programs, estimate burden of disease, profile patient characteristics, determine policies, assess outcomes, plan interventions, conduct health practices, perform evaluations, determine preferences, define clinical characteristics, provide investigational evidence, formulate and justify budgets, and allocate funds. Although data can be used to provide direct justification, this use is restricted by the current availability of complete, accurate, reliable, and relevant data.

Many current data systems are the best available to date, yet many are hampered by data collection difficulties, sampling issues, and methodological complexities that severely limit their use for specific applications. For instance, some hospitals collect racial and/or ethnic data, while others do not; thus, information is not available for all populations. Another example is national data collection on a specific topic, such as cancer. Currently, SEER, one of the best cancer-related data systems in the country, is used to oversample racial and ethnic minorities and continues to make strides in reporting cancer data and statistics for these groups. It is, however, hampered by sampling issues related to population representation and small sample sizes. The bottom line is that most data systems have complications that are not readily remedied. Therefore, it is important not to be overly critical of existing systems but, as improvements are made, to stay mindful of the limitations of the data and of their subsequent uses.
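The effect of small sample sizes on the reliability of a reported rate can be illustrated with a minimal sketch. It assumes a simple random sample and the normal approximation for a proportion; the function name, the 10 percent rate, and the sample sizes are hypothetical and are not drawn from SEER or any other data system.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95 percent margin of error for a proportion p
    estimated from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical example: the same estimated rate (10 percent) reported for a
# large group and for a small group.
large_group = margin_of_error(0.10, 10000)
small_group = margin_of_error(0.10, 100)

print(f"n = 10,000: 10% plus or minus {large_group:.1%}")
print(f"n = 100:    10% plus or minus {small_group:.1%}")
```

Under these assumptions, the estimate for the smaller group is ten times less precise, which is one reason rates for small racial and ethnic populations need to be interpreted with caution.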

Data are best used for the purposes defined by their design parameters. To use data effectively, it is important to read the documentation or consult with the source agency to understand the purposes of the data, as well as the data preparation procedures. Before using data, it is important to know the parameters of the data set, such as population characteristics, intent, sample size, variable labels, delayed reporting, and numerators. This is especially important when using racial and ethnic data, because of differing practices in labeling and collecting the information, as well as variations in sample size. When such details are properly understood, data can be used more effectively, data gaps can be identified, limitations can be clarified, criticisms can be reduced, and misinformation can be decreased. When used appropriately with design limitations acknowledged, data can sometimes be used to provide direct evidence of disease and treatment patterns, detect changes in the progression or course of a disease or treatment, and set priorities for programs.

There appears to be some concern among the general public that data on race and ethnicity will be misused. The use of these data must be made clear to the public; safeguards against misuse must be outlined clearly to gain the cooperation of individuals who will provide the data.

Data as Source of Direct Evidence

Data can provide direct information on many factors, such as the incidence, prevalence, and survival rates of various diseases, disease outcomes, screening patterns and rates, population characteristics, patterns of health care utilization, food intake, risk factors, mortality, health care needs assessments, patterns of care and health behavior, pathology, treatment methodology, costs, quality of life, genetics, epidemiology, etiology, and database linkages. Data can provide parameters pertinent to the estimation of the outcomes of a disease or the effect of an intervention. Examples include a decreased benefit (e.g., survival of an acute myocardial infarction), an increased probability of a particular risk (e.g., arrhythmia), or an increase in a symptom (e.g., prolonged chest pain).

If direct information is not available, related data are compiled to provide indirect evidence. Such indirect estimates are often effective in describing a condition when direct evidence is not available. However, caution should be used when making resolute statements or program decisions based on such figures. In addition, data users should always be mindful that results are influenced and limited by design. Scientists interpret results of a study to search for "proof" in the form of statistically significant results. These results can only be as effective as the design and the cultural competence of the investigation. This is an important prerequisite to using and interpreting racial and ethnic data as direct evidence. Cultural, language, and literacy differences influence the responses that create data. Consequently, the outcomes may differ from the original intent. Definitions of health outcomes must be precise and meaningful to those who will be affected by them. In addition, if the survey questions are not meaningful or culturally appropriate, response patterns may differ from the intended outcome. Thus, the design may adversely affect the data results.

Data Monitoring and Surveillance to Depict Patterns

Data monitoring and surveillance ascertain trends that reflect patterns of disease or treatment and give an overall perspective that a single data point cannot describe. This kind of data monitoring can assist in detecting changes in the progression or course of a disease or treatment. Such changes may prompt a change in treatment, which, in turn, could change treatment outcomes. A collective data portrayal provides stronger evidence than any single data point.

Although such information generates important contributions to understanding diseases and treatment patterns, it generally is not available for most racial and ethnic groups. In most cases, it is limited to white and black populations, mostly due to the historical lack of racial and ethnic qualifiers in previous data collection efforts. In the future, data collection efforts and inclusion may provide this valuable information for additional groups. Until then, the lack of information should not exclude these groups from funding and program priorities at the local, State, or Federal levels.

Data for Assistance in Assigning Priorities

In an age of limited resources, data are often used to assign priority when planning for programs and funding activities. Often this means that the resources used for one activity will not be available for subsequent activities. It follows then that, if the resources spent on one program could have yielded greater benefit if spent on another program, those resources have not been used in the most effective, efficient manner. These decisions are often swayed by what information is available, and they are discouraged when direct evidence is not available for all the components of the decisionmaking process. As a result, racial and ethnic groups for which such information does not exist may be overlooked as a priority.

When allocating limited resources, the customary practice is to give priority to interventions that are the most efficient, in that they yield the greatest improvement in terms of the stated objective for the available resources. Efficiency of allocation dictates that resources should be allocated to interventions according to their "bang for the buck," or their effectiveness per unit cost when compared with that of alternate interventions. Funds are said to be allocated efficiently if the intervention chosen delivers the maximum total effectiveness possible given available resources. However, it is impossible to make such determinations without all of the evidence. Frequently, these decisions are made with limited information and can subsequently disregard a priority for which supporting evidence (in the form of data) does not exist. This practice often results in specific population concerns not being addressed simply because the relevant data are not available.
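The "bang for the buck" ranking described above can be sketched in a few lines. This is only an illustration under stated assumptions: the program names, costs, and effectiveness values are hypothetical, and the greedy ranking by effectiveness per unit cost is a common heuristic that does not guarantee the maximum total effectiveness in every case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intervention:
    name: str
    cost: float                     # cost of funding the intervention
    effectiveness: Optional[float]  # estimated benefit; None when no data exist


def allocate(interventions, budget):
    """Rank interventions by effectiveness per unit cost and fund them in
    that order until the budget runs out.

    Returns (funded, unranked): the names funded, and the names that could
    not be ranked at all because effectiveness data are missing.
    """
    ranked = [i for i in interventions if i.effectiveness is not None]
    unranked = [i.name for i in interventions if i.effectiveness is None]
    ranked.sort(key=lambda i: i.effectiveness / i.cost, reverse=True)

    funded = []
    remaining = budget
    for item in ranked:
        if item.cost <= remaining:
            funded.append(item.name)
            remaining -= item.cost
    return funded, unranked


# Hypothetical programs; Program C serves a population with no data.
programs = [
    Intervention("Program A", cost=50.0, effectiveness=200.0),  # ratio 4.0
    Intervention("Program B", cost=30.0, effectiveness=90.0),   # ratio 3.0
    Intervention("Program C", cost=40.0, effectiveness=None),   # no data
    Intervention("Program D", cost=40.0, effectiveness=60.0),   # ratio 1.5
]

funded, unranked = allocate(programs, budget=100.0)
print(funded)    # ['Program A', 'Program B']
print(unranked)  # ['Program C']
```

In this sketch, Program C is never considered, not because it is ineffective but because it has no effectiveness estimate to rank. This mirrors how a population without data can drop out of the priority-setting process.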

In the management of programs with limited resources and budget cuts, funding is often reduced in areas with no data or restricted data. In most cases, this becomes a precursor to failure, as resources are not allocated efficiently. The target population, in particular certain racial and ethnic groups, is often wrongly blamed for the inadequacy. Needed data are not secured, and programs cease to be planned or funded in the budget process. In the end, populations with missing data cannot be justified as a priority.

Examples of Data Use

American Indian or Alaska Native tribal entities have the option of contracting or compacting with IHS to operate their own health programs. About 40 percent of the IHS budget is currently under tribal control. For tribal health programs that need data on demographics, morbidity, health care utilization and expenses, health risk behaviors, and other topics, tribal-specific data are ideal, but county-level Indian data can be used as a proxy. Such data are used for health program planning, budget formulation and justification, program advocacy to governments and other funding sources, performance measurement and evaluation, health status assessment, and research.

Minority community-based organizations use demographic, morbidity, health care utilization and expenses, and health risk behaviors data in much the same way. A substantial portion of this information is unavailable for minorities in general, and for specific subcategory populations, such as Mexican, Puerto Rican, Cuban, and South American. There are, however, a few Federal and State data systems that collect information by racial and ethnic groups and subcategories. These data include but are not limited to: (1) census material, including the Public Use Microdata Sample (PUMS); (2) vital statistics from selected States that are available at the national level; (3) NHIS; and (4) SEER data from NCI and NPCR data from CDC. Other data systems may collect sufficient minority and subcategory data at the aggregate level, but the data are not always analyzed or disseminated by the agencies.

No standard practice exists for data collection employing specific racial or ethnic identifiers. For example, the term "Hispanic" is not associated with a national origin, which makes it difficult to collect data from non-U.S.-born Hispanics who are unfamiliar with the term. Furthermore, because Hispanic is an ethnic classification, the related data may be masked by the racial category, and the particular needs of Hispanics may go undetected. Many community-based organizations are frustrated by the need to have data to apply for funding and develop programs when little or no data are available. This lack of data probably contributes to the shortage of Federal and State programs targeted specifically toward minorities.

Private industry is also a consumer of racial and ethnic data. It uses demographic data to target marketing and to determine which products to sell to which consumers. Certain segments of private industry have the capacity to conduct studies providing evidence of product usage broken down by specific groups and populations. Such data often provide the evidence for management and marketing plans.