Continuation of Research on Consumer Directed Health Plans: HSA Simulation Model Refinement. Data & Analytic Approach


Three data sources were used to complete this analysis.  These data sources and the steps taken to prepare the database are described in Figure 1.  The data sources include:

  • The 2001 Medical Expenditure Panel Survey (MEPS) developed and supported by the Agency for Healthcare Research and Quality (AHRQ).
  • Health plan choice data from four large employers participating in a Robert Wood Johnson Foundation (RWJF)-funded study on Consumer Directed Health Plans (CDHPs). Originally, we used data from three large employers.  This addition doubled the covered lives available for analysis.
  • Premium data. In the revised version, we used premiums that we computed from the actual claims experience of the different plans modeled. Originally, we used premium data for individual health insurance policies obtained from a web site.

These data sources were used for three major analysis tasks: Model estimation; Choice Set Assignment/Prediction; and Policy Simulation.  Often more than one database was required to complete a task.  Integral to this analysis was the use of consumer directed health plan data from four large employers working with the study investigators.  Below, we provide greater detail on database attributes, use of the databases, and the analytic methods used.

Database Descriptions

Medical Expenditure Panel Survey (2001):

The Medical Expenditure Panel Survey is an annual survey of the non-institutionalized, civilian population in the U.S.  For this project, we use the 2001 MEPS Household Component (HC), which is a public-use file containing detailed demographic, health status, employment, insurance, medical care utilization and expenditure information on individuals. We restrict our attention to individuals who are 19-64 years of age, not enrolled in public insurance programs, and not full-time students.  Our full sample has 16,282 individuals. When weighted to produce population estimates, this corresponds to 147,955,033 non-elderly adults in the United States. A breakdown of the 19-64 population for 2001 is provided in Figure A1.
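The sample restrictions and the weighting step can be illustrated with a minimal sketch. The records, field names, and weights below are hypothetical and are not actual MEPS variable names or values; the sketch only shows how an unweighted count and a weighted population estimate differ.

```python
# Hypothetical person records; field names and weights are illustrative only.
people = [
    {"age": 45, "public_insurance": False, "full_time_student": False, "weight": 9100.0},
    {"age": 20, "public_insurance": False, "full_time_student": True, "weight": 8700.0},
    {"age": 67, "public_insurance": True, "full_time_student": False, "weight": 10250.0},
    {"age": 33, "public_insurance": False, "full_time_student": False, "weight": 12400.0},
]

def in_analysis_sample(person):
    """Apply the three restrictions described in the text."""
    return (
        19 <= person["age"] <= 64
        and not person["public_insurance"]
        and not person["full_time_student"]
    )

sample = [p for p in people if in_analysis_sample(p)]

# Unweighted count of respondents versus the population they represent.
unweighted_n = len(sample)
weighted_population = sum(p["weight"] for p in sample)
```

In the report, the analogous quantities are the 16,282 sampled individuals and the 147,955,033 adults they represent.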

Figure A1: A Breakdown of the 19-64 population for 2001.

[Figure A1 is a diagram of the 19-64 age group, which corresponds to 147,955,033 non-elderly adults in the United States. The population is divided into three groups: Missing age, Not employed, and Employed.]

Consumer Directed Health Plan data (2001-2003):

The project investigators had access to de-identified data on employees' health plan selections, as well as their demographics.  For this analysis, data from four large employers representing approximately 160,000 covered lives (including dependents) were available.  Three of the four employers were national firms with substantial populations of employees; one was a large employer located in Minnesota.  Each of these employers offered a CDHP alongside traditional managed care plans.  The CDHP take-up rate at each employer ranged between 4% and 15% in the first year the plan was offered.

Model Estimation

The model estimation had several steps.  As a first step, we pooled the data from the four employers offering CDHPs to estimate a conditional logistic plan choice model similar to our earlier work (Parente, Feldman and Christianson, 2004).  Conceptually, we used a choice model based on utility maximization, where utility is considered to be a function of personal attributes such as age, gender, income, chronic illness, and family status; health plan attributes such as the tax-adjusted, out-of-pocket premium and the deductible amount; and the interaction of personal and plan attributes.  Personal characteristic variables were entered into the model as interactions with plan attribute variables.  The coefficient estimates produced by this model represent the utility of each plan attribute to an employee.

In the second step we used the estimated choice-model coefficients to predict health plan choices for individuals in the MEPS-HC.  In order to complete this step, it was necessary first to assign the number and types of health insurance choices that are available to each respondent in the MEPS-HC.  For this purpose we turned to the smaller, but more-detailed MEPS Household Component-Insurance Component linked file, which contained the needed information.

The steps taken to estimate this predictive model are highlighted in Figure 1.  More detail of how these steps were executed is described below.

Estimate plan offerings using the MEPS linked data:

The MEPS "linked" Household Component-Insurance Component data file is a random sample of individuals who reported being employed and offered health insurance in Round 1 of the Household Component survey.  These individuals were asked to provide contact information regarding their place of employment.  Employers of these individuals were surveyed to provide detailed information about the number and types of plans that they offered to eligible workers.  For each offered plan (up to four plans for private establishments and all plans for government organizations), an employer was asked to include the total premium, employee and employer shares of the total premium, and plan characteristics including hospital and physician coinsurance, hospital and physician copayments, and deductibles for individual and family coverage.

Since the linked sample only represents a subset of all offered workers in the Household Component, we checked the representativeness of the linked sample using a binary logistic regression and found:

  • Individuals in professional services and public administration were more likely to link than those in agriculture, mining, entertainment/recreation, personal services, and active military.
  • Midwesterners were more likely to link relative to westerners.
  • Whites were less likely to link relative to persons of "other" race.
  • Government workers had a higher response rate than private-sector workers.

The link process was a function of the following variables: age, sex, race, marital status, dependents, geographic region, metropolitan (MSA) location, government employment, establishment size, industry category, wage income, and chronic illness (defined as a binary variable).

The linked data have 3,127 individuals and 7,802 plan-person observations.  We do not have good information on response rates because we do not know what fraction of offered workers in MEPS was considered for the linked survey.  In absolute terms, it appears that approximately 36% of offered workers linked.

Approximately 40% of linked workers have one plan offered to them, 19.7% have two plans offered, 11.8% have three plans, and the remaining 29.5% have four or more plans from which to choose.  These percentages are not representative of the national proportions of workers who have one, two, three, and four or more plans offered to them because of the over-representation of government workers, who commonly have more offered plans than private-sector workers.

To predict the number and type of plans offered, we followed two steps:

1.  Use the MEPS linked insurance file to estimate a model for the number and types of health plans offered to eligible workers (age 19-64, not enrolled in public insurance, not full-time students).

More specifically, we estimated an ordered probit model with the dependent variable taking the values of 1, 2, 3, or 4+ plans.  The model included the following explanatory variables:  age, male, white, black, married, total number of dependents, wage income, union member, works for government, establishment size, whether the establishment has more than one location, northeast, midwest, south, and MSA.  The total number of observations was 2,891 and the R² was .12.
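As an illustration of how an ordered probit turns a linear index into probabilities for the 1, 2, 3, and 4+ plan outcomes, the following is a minimal sketch. The cutpoints and the index value are hypothetical, not the report's estimates.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, cutpoints):
    """Return P(y = k) for each ordered category.

    xb is the linear index (explanatory variables times coefficients);
    cutpoints are the increasing thresholds between adjacent categories.
    """
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [normal_cdf(b - xb) for b in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]

# Hypothetical values: three cutpoints separate the four outcomes (1, 2, 3, 4+ plans).
probs = ordered_probit_probs(xb=0.3, cutpoints=[0.0, 0.5, 0.9])
```

The four probabilities sum to one for any index value, so each respondent receives a full predicted distribution over the number of plans offered.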

2.  Apply the model estimates to the MEPS-HC full sample to predict the number of plans for all respondents who were offered insurance by an employer.

Using the model estimates, for each individual who reported being offered employer group coverage in the MEPS-HC, we predicted the probability of each outcome (1, 2, 3, 4+ plans offered).  We then identified the category that had the maximum probability among the four options.

We used a specific decision rule to assign the number of plans to each individual.  It used both the category with the highest predicted probability and the individual's direct response to a question asked in the MEPS-HC about whether he/she had a choice of plans.  If he/she reported not having a choice of plans, then the individual was assigned one plan.  If he/she reported having plan choice, then the assigned number of plans reflected the outcome with the highest predicted probability among the 2, 3, and 4 plan options.
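The decision rule can be sketched as a small function. The function name and argument layout are hypothetical; the value 4 stands for the "4+ plans" category.

```python
def assign_number_of_plans(probs, has_choice):
    """Assign a number of offered plans to one respondent.

    probs: predicted probabilities for 1, 2, 3, and 4+ plans, in that order.
    has_choice: the respondent's own report of whether a choice of plans existed.
    """
    if not has_choice:
        # Respondents reporting no choice are assigned a single plan.
        return 1
    # Otherwise pick the most likely outcome among the multi-plan categories only.
    multi_plan = {2: probs[1], 3: probs[2], 4: probs[3]}
    return max(multi_plan, key=multi_plan.get)
```

Note that a respondent who reports having a choice is never assigned one plan, even if the one-plan category has the highest predicted probability overall.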

The types of plans were based on the distribution of plan offerings from the linked sample, conditional on the total number of plans offered.  For example, individuals who had one plan offered to them were most likely to be offered a Preferred Provider Organization (PPO) plan.  So, we assigned a PPO to those with one offered plan.  The other assignments were as follows:

2 plans:       PPO and HMO

3 plans:       2 PPOs and 1 HMO

4+ plans:     3 PPOs and 1 HMO

Estimate Premiums for Simulation:

One challenge we faced was how to designate specific plan attributes (e.g., coinsurance rate, deductible, etc.) for the assigned plan choices.  Originally, we used summary statistics from the MEPS linked insurance file to identify the median characteristics of plans by type (PPO versus HMO) as well as coverage type (single versus family).  To predict the premium that would be associated with a particular bundle of attributes, we estimated "hedonic" premium models.  The specific equation used was:

Total premium = f(hospital coinsurance, physician coinsurance, and deductible).

The estimates for HMOs used patient co-payments (dollar payments per unit of service) rather than the physician coinsurance rate.

These equations were estimated separately by coverage type, plan type, and establishment size (e.g., single-coverage PPO offered by establishments with <50 workers).  The model estimates were then used with the summary statistics to predict premiums for each plan, coverage type, and establishment size category (< 50; 50-200; >200) combination.

Finally, to obtain the employee's out-of-pocket premium cost, we multiplied predicted total premiums by the average proportion paid by employees for single and family coverage.  We did not feel that the sample sizes were large enough to reliably perform this multiplication separately by coverage type and establishment size.
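A minimal sketch of the original two-stage premium calculation follows. It assumes a linear functional form for the hedonic equation, which the text does not specify, and all coefficients and the employee share are hypothetical.

```python
def predict_total_premium(coefs, hospital_coins, physician_coins, deductible):
    """Hedonic premium prediction for one plan; the linear form is an assumption."""
    return (
        coefs["intercept"]
        + coefs["hospital_coins"] * hospital_coins
        + coefs["physician_coins"] * physician_coins
        + coefs["deductible"] * deductible
    )

# Hypothetical coefficients for one cell (coverage type, plan type, establishment size).
coefs = {
    "intercept": 4000.0,
    "hospital_coins": -1500.0,
    "physician_coins": -1000.0,
    "deductible": -0.5,
}

total_premium = predict_total_premium(
    coefs, hospital_coins=0.20, physician_coins=0.20, deductible=500.0
)

# Hypothetical average share of the total premium paid by employees.
employee_share = 0.18
employee_premium = total_premium * employee_share
```

Each combination of coverage type, plan type, and establishment size would have its own coefficient set, with median plan characteristics supplied as inputs.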

In our new approach, we used the claims data from the different health plan types to develop experience-rated premiums for each person in the employer data, using variables common to both the employer and MEPS databases.  We then computed group market community-rated premiums for firms with different establishment sizes, as was done before.  The key difference in the premium by establishment size was the loading factors we assumed, with the smallest employers facing the highest loading charge and the largest employers facing the lowest loading charge.  In the individual market, everyone faced their computed experience-rated premium plus the loading charge assumed for the smallest groups.
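The role of the loading charge can be sketched as follows. This is an illustration under stated assumptions: the loading is applied multiplicatively, the loading values are hypothetical, the loading is assumed to fall as establishment size rises, and the individual market is assumed to use the loading of the smallest size category.

```python
# Hypothetical loading factors by establishment size category.
LOADING = {"<50": 0.30, "50-200": 0.20, ">200": 0.10}

def community_rated_premium(experience_rated_premiums, size_category):
    """Group premium: average expected cost of the group plus the size-specific loading."""
    mean_cost = sum(experience_rated_premiums) / len(experience_rated_premiums)
    return mean_cost * (1.0 + LOADING[size_category])

def individual_market_premium(own_experience_rated_premium):
    """Individual premium: own expected cost plus the loading of the smallest size category."""
    return own_experience_rated_premium * (1.0 + LOADING["<50"])
```

For example, two workers with experience-rated premiums of 2,000 and 4,000 in a firm with more than 200 workers would share one community-rated premium, while each would face a premium based on his or her own expected cost in the individual market.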

Estimate Plan Choice Regression:

We pooled plan choice data from the four employers offering CDHPs to specify a conditional logistic regression model similar to our earlier work (Parente, Feldman and Christianson, 2004).  Conceptually, we use a choice model based on utility maximization, where utility is considered to be a function of personal attributes such as health status, health plan attributes such as the out-of-pocket premium, and the interaction of premium and health status, formally stated as:

$U_{ij} = f(Z_j, Y_i, X_{ij})$

where $i$ is the decision-making employee choosing among health plan choices $j$, and:

  • $Y_i$ = employee personal attributes,
  • $Z_j$ = health plan attributes, and
  • $X_{ij}$ = interactions between alternative-specific constants and personal attributes.
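In a conditional logit model, the probability that an employee picks a plan is the exponentiated utility of that plan divided by the sum of exponentiated utilities over every plan in the employee's choice set. The sketch below illustrates that formula; the plan names and utility values are hypothetical.

```python
import math

def choice_probabilities(utilities):
    """Conditional logit probabilities for one person.

    utilities maps each plan in the person's choice set to its systematic utility.
    Subtracting the maximum before exponentiating avoids numerical overflow and
    leaves the resulting probabilities unchanged.
    """
    top = max(utilities.values())
    exp_v = {plan: math.exp(v - top) for plan, v in utilities.items()}
    total = sum(exp_v.values())
    return {plan: e / total for plan, e in exp_v.items()}

# Hypothetical utilities for a three-plan choice set.
utilities = {"Medium PPO": 0.4, "HMO": 0.1, "Self-financed HSA": -0.6}
probs = choice_probabilities(utilities)
```

Because the denominator covers only the plans available to a given person, the same coefficients produce different probabilities for people with different choice sets.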

A very important constraint in our modeling was that any plan attribute used in the model from the employer data also had to be available in the MEPS data to permit a simulation.  As a result, the key variables used in the plan choice model were:

  • SCALEDPREM = After tax premium paid by the employee
  • CLB = Amount of money in the employee’s health reimbursement account (HRA), if any.
  • CUB = Difference between the employee’s plan deductible and the HRA.
  • COIN = Coinsurance rate
  • CHRONIC = Employee or dependent has a chronic illness=1, else =0 (new variable)
  • AGE = Employee’s age (years)
  • FEM = Employee’s gender (1=female, 0=male)
  • FAM = Employee has a 2-person or family contract=1, else =0
  • INC = Employee’s annual wage income.
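The two account-related variables follow directly from a plan's design. A minimal sketch, assuming that a plan without an HRA is coded with an account amount of zero (the function name is hypothetical):

```python
def account_variables(deductible, hra_amount=0.0):
    """Return (CLB, CUB) for one plan.

    CLB is the amount in the health reimbursement account, if any.
    CUB is the difference between the plan deductible and that account.
    """
    clb = hra_amount
    cub = deductible - hra_amount
    return clb, cub
```

For a plan with a $3,500 deductible and a $1,000 account, this gives a CLB of 1,000 and a CUB of 2,500.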

Also included in the regression were alternative-specific constants (intercepts) for each of the possible health plan choices.  These intercepts are used to capture plan-specific features not represented by other identifiers of plan design.  They are also included as interaction terms with age, gender, family status and income.  The intercept terms include:

  • PPO_L = PPO Low (e.g., restrictive network, high co-pay, 15% coinsurance)
  • PPO_M = PPO Medium (e.g., better network, lower co-pay and coinsurance)
  • PPO_H = PPO High (e.g., open network, lowest co-pay, no coinsurance)
  • HRA = Health Reimbursement Account CDHP
  • HSA_E = Employer-sponsored HSA, modeled on higher premium cost HRA
  • HSA_S = Employee-paid HSA, no employer contribution, modeled on lower premium cost HRA
  • HMO = Health Maintenance Organization

Choice Set Assignment and Prediction

Assign Plan Choices to Full MEPS Sample:

We used the three data sources to develop two sets of plan choice predictions for the simulation: one set of data for workers with insurance offers and a second set for individuals who do not have employer offers of coverage.  This second set includes both uninsured individuals and those who take up non-group policies.  One group of individuals that we exclude from the simulation is non-offered individuals who reported having employer group coverage through another household member.  Below we outline the analytic steps taken to develop the individuals' choice sets for the simulations.

1.   Workers With Offers

We started with the original four choices predicted earlier, including three PPOs and an HMO.  Since a worker was assigned between one and four plans, we needed to make some assumptions for each.

  • 4 choices: Low PPO, Medium PPO, High PPO, HMO
  • 3 choices: Low PPO, High PPO, HMO
  • 2 choices: Medium PPO, HMO
  • 1 choice: Medium PPO

Here, low, medium, and high refer to the cost and quality of the plans (e.g., low implies low cost and lower quality).

To these choices we added four additional options:

  • Self-financed (full cost) HSA - Additional choice for all workers
  • Turned down health coverage - Additional choice for all workers
  • Employer sponsored HSA - Available to all workers in establishments with >500 employees, not available to other workers
  • Employer sponsored HRA - Available to all workers in establishments with >500 employees, not available to other workers
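The choice set assignment for workers with offers can be sketched as a lookup plus the additional options. The function and constant names are hypothetical; the plan labels and the 500-employee threshold come from the lists above.

```python
# Base plans by the number of plans assigned to the worker.
BASE_CHOICES = {
    4: ["Low PPO", "Medium PPO", "High PPO", "HMO"],
    3: ["Low PPO", "High PPO", "HMO"],
    2: ["Medium PPO", "HMO"],
    1: ["Medium PPO"],
}

def worker_choice_set(n_plans, establishment_size):
    """Build the simulated choice set for a worker with an insurance offer."""
    choices = list(BASE_CHOICES[n_plans])
    # Options added for all workers.
    choices += ["Self-financed HSA", "Turned down coverage"]
    # Options added only for workers in establishments with more than 500 employees.
    if establishment_size > 500:
        choices += ["Employer-sponsored HSA", "Employer-sponsored HRA"]
    return choices
```

A worker assigned four plans in a large establishment therefore faces eight alternatives, while a worker assigned one plan in a small establishment faces three.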

2.   Individuals Without an Insurance Offer

Individuals who did not have health insurance offered to them at work or who were not employed faced five health plan choices regardless of income, age or gender:

  • High PPO
  • Medium PPO
  • Low PPO
  • Self-financed HSA
  • Uninsured

Use Parameter Estimates to Predict Plan Choice Probabilities:

With a total set of possible choices for workers with insurance offers and individuals without insurance offers, we used the plan choice regression results to predict plan choice probabilities for each MEPS-HC sample respondent.

However, before we could predict the probabilities, we needed to develop some specific assumptions about benefit plan design and premiums for individual plans.  To get premium estimates, we originally used MEPS linked insurance data to develop a hedonic price model to predict premiums for individual plans.  We worked with the same hedonic plan regressions described above, except that for individuals without offers of coverage, we used the premium model for the smallest establishment size category, based on the assumption that this most closely represents an individual policy in terms of the loading charge for plan administrative costs.  The current approach used individual market premiums that were computed for each person in the MEPS plus a loading charge.  The premium estimates came from health plan specific cost regressions.  Originally, we needed to inflate premiums from 2001 prices to 2006 prices based on medical insurance price inflation during the period.  In the current model, we inflated the premiums from 2002 (the dominant year of the claims data used) to 2006.
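Inflating a premium across several years amounts to compounding annual growth rates. A minimal sketch, with hypothetical rates rather than the inflation figures actually used:

```python
def inflate_premium(premium, annual_rates):
    """Compound a base-year premium forward using one growth rate per year."""
    factor = 1.0
    for rate in annual_rates:
        factor *= 1.0 + rate
    return premium * factor

# Hypothetical: four annual rates carry a 2002 premium to 2006 prices.
premium_2006 = inflate_premium(3000.0, [0.10, 0.08, 0.07, 0.06])
```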

The plan characteristics that we used to define the three PPOs (low, medium, and high) came from the 2002 HIAA/ survey of plans purchased in the individual market. Roughly speaking, we used the 25th, 50th, and 75th percentiles of coinsurance and deductibles to assign the plan characteristics.

We also recognized that premiums in the individual market vary considerably by a person's age.  The MEPS survey included a table of average premiums by age cohort.  Originally, we created an index using the information in this table.  The index was set equal to 1.0 for the age group corresponding to the median age of adults in our sample (35-39).  Older individuals, who had higher premiums, had index values that were greater than 1.0.  Younger individuals, who had lower premiums, had index values less than 1.0.  The index values ranged from .59 to 2.18 for single coverage policies and .453 to 1.65 for family coverage policies.
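The original age index can be sketched as each cohort's average premium divided by the average premium of the reference cohort. The premium values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def build_age_index(avg_premium_by_cohort, reference_cohort):
    """Index each cohort's average premium to the reference cohort (index = 1.0)."""
    ref = avg_premium_by_cohort[reference_cohort]
    return {cohort: premium / ref for cohort, premium in avg_premium_by_cohort.items()}

# Hypothetical average premiums by age cohort.
avg_premium = {"25-29": 1200.0, "35-39": 1600.0, "55-59": 3200.0}
index = build_age_index(avg_premium, reference_cohort="35-39")

# A premium predicted for the reference age group is rescaled for an older person.
adjusted_premium = 2000.0 * index["55-59"]
```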

In the current model, we take age, gender, family contract and chronic illness into account to predict premiums using the health plans' claims data.  Finally, we adjusted all premiums to 2006 dollars.

Rescale Take-up Rates

One significant issue with our simulation is that we were not able to predict whether or not an individual would take up insurance in the employer-offered market or be uninsured in the individual market.  We faced this limitation because the CDHP employer data only include information on offered workers who held coverage.

To address this issue, we needed to calibrate our model to accurately reflect both the actual percentage of people who turn down employer offers and the actual percentage of people in the individual market who are uninsured.  To obtain more accurate estimates, we completed these calibrations separately for each income quartile and then compared our results to national non-take-up and uninsurance rates.  We also applied the national population weights to the calibrated model to represent the entire adult population, excluding full-time students, those with public insurance, and individuals with employer-based coverage through another household member.  This fairly tedious process was performed for each re-estimation and/or modification of the conditional logistic regression.
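The text does not spell out the mechanics of the calibration. One simple way to picture it, offered here only as an assumption, is to treat the predicted plan probabilities as conditional on holding coverage and to scale them so that the no-coverage alternative receives a target rate for the person's income quartile. The target rates below are hypothetical.

```python
# Hypothetical target non-take-up rates by income quartile (1 = lowest income).
TARGET_OPT_OUT = {1: 0.25, 2: 0.15, 3: 0.10, 4: 0.05}

def rescale_with_opt_out(plan_probs, income_quartile):
    """Scale plan probabilities so the no-coverage share matches the quartile target.

    plan_probs are assumed to sum to 1 across the plans in the choice set.
    """
    opt_out = TARGET_OPT_OUT[income_quartile]
    scaled = {plan: p * (1.0 - opt_out) for plan, p in plan_probs.items()}
    scaled["No coverage"] = opt_out
    return scaled
```

The rescaled probabilities still sum to one, and the relative shares among the insurance plans are unchanged.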

Policy Simulation

To complete the simulations, two final steps remained.  The first was to generate 2006 HSA premiums and benefit designs.  The second was to specify the various simulation proposals.

Define HSA Plan Design and Premium:

Starting in 2004, we assumed that all individuals in the non-group ("individual") market would have access to an HSA.  We relied on a website for current information on HSA premiums and plan characteristics.  We collected information on two HSA policies offered in the two largest cities in every state.  Next, we estimated a hedonic premium equation that allowed us to predict the premium for different HSA designs.  For all of the simulations, except one (described below), we used an HSA with a $1,000 spending account and a $3,500 deductible for single coverage and $2,000/$7,000 for families.  The average premium for our prototype HSA for a 40-year old non-smoking single male was $102.78 per month; for a 40-year old married male (also a non-smoker) with a spouse and two children under the age of ten, the monthly premium was $226.97.

We used the same approach in both simulations.  The only difference was that we updated the prices from 2005 to 2006.  When we completed a market scan of the same major markets examined before, we did not see many benefit design differences from the time of our original analysis.

The HSA premiums used in our simulations are the sum of the catastrophic policy price plus a $1,000 account.  For example, a $6,500 HSA premium in our simulation for a family policy would be based on a $5,500 premium for a catastrophic insurance policy and a $1,000 HSA.

Benefit differences in HSAs can be large. For example, below we list two different HSA options, a high- and a low-deductible HSA plan in Santa Clara County, CA, that we found in early 2005:

HSA Option #1

Single Coverage:

  • $1,000 HSA Account
  • $3,500 Deductible
  • $2,500 'Donut Hole' (DH starts at $1,001 of expenditure - ends at $3,500)
  • 0% Coinsurance
  • Premium includes catastrophic and $1,000 HSA Account.
  • Thus, 100% catastrophic coverage starts at $3,501

Family Coverage:

  • $1,000 HSA Account
  • $7,000 Deductible
  • $6,000 'Donut Hole' (DH starts at $1,001 of expenditure - ends at $7,000)
  • 0% Coinsurance
  • Premium includes catastrophic and $1,000 HSA Account.
  • Thus, 100% catastrophic coverage starts at $7,001

HSA Option #2

Single Coverage:

  • $1,000 HSA Account
  • $2,600 Deductible
  • $1,600 'Donut Hole' (DH starts at $1,001 of expenditure - ends at $2,600)
  • 0% Coinsurance
  • Premium includes catastrophic and $1,000 HSA Account.

Family Coverage:

  • $1,000 HSA Account
  • $2,600 Deductible
  • $1,600 'Donut Hole' (DH starts at $1,001 of expenditure - ends at $2,600)
  • 0% Coinsurance
  • Premium includes catastrophic and $1,000 HSA Account.
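To make the account, "donut hole," and catastrophic layers concrete, the sketch below splits a year's medical spending among the three payers. The default values follow the single-coverage design of HSA Option #1 listed above; the function name is hypothetical, and the sketch ignores any account balance carried over from earlier years.

```python
def hsa_cost_split(spending, account=1000.0, deductible=3500.0, coinsurance=0.0):
    """Split annual spending into (paid from account, paid by enrollee, paid by insurer)."""
    # The account covers the first dollars of spending.
    from_account = min(spending, account)
    # The enrollee pays out of pocket between the account amount and the deductible.
    in_donut_hole = min(max(spending - account, 0.0), deductible - account)
    # Spending above the deductible is shared according to the coinsurance rate.
    above_deductible = max(spending - deductible, 0.0)
    enrollee = in_donut_hole + coinsurance * above_deductible
    insurer = (1.0 - coinsurance) * above_deductible
    return from_account, enrollee, insurer
```

With $5,000 of spending under Option #1 single coverage, the account pays $1,000, the enrollee pays the full $2,500 donut hole, and the catastrophic policy pays the remaining $1,500.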

HSA premiums were age-adjusted using the same method described above to rescale individual PPO plan coverage.  Note that the premiums used in the predictions included an annual payment of $1,000 into an HSA for both the single and family policies.  We chose $1,000 because it was the lowest amount for a family coverage personal care account in our analysis of employer HRAs and a low to moderate amount for a single coverage personal care account.

Finally, it is important to note that for the Offered-turned down population, we have not explicitly taken account of whether these individuals have employer group coverage through another source (e.g., a working spouse).   From the MEPS data, we do know that approximately 25% of those who turn down an offer of employer coverage are uninsured.

Also, in our take-up estimates, we have excluded all non-offered individuals who reported having employer group coverage from their partner through the offered-group market.  This group represents approximately 29 million insured individuals.

Substantial Changes between the Original and Current Simulations
