Sample selection models typically comprise several equations. In this inquiry, the first equation estimates the variables related to the probability of having an LTC insurance policy. A limited dependent variable estimation method (e.g., probit analysis) is most appropriate for this stage of the modeling because the dependent variable, being an LTC insurance policyholder, is dichotomous. The result of this analysis is that each individual in the pooled 1994 NLTCS and 1999 Insured Panel sample is assigned a predicted probability of being an LTC insurance policyholder. These predictions are then used to calculate a variable known as the Inverse Mills Ratio. It is a nonlinear function of the predicted probability of having insurance: the ratio of the standard normal density to the standard normal cumulative distribution, both evaluated at the fitted probit index. It therefore captures the effect of residual or unobserved variables relating to the probability of being an insurance policyholder on the dependent variable of interest.
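The calculation of the Inverse Mills Ratio from a first-stage prediction can be illustrated with a minimal sketch. It assumes a probit first stage and uses only the Python standard library; the function names and the example probabilities are hypothetical and are not taken from the study.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
STD_NORMAL = NormalDist()


def index_from_probability(p: float) -> float:
    """Recover the probit index z from a predicted probability p = Phi(z)."""
    return STD_NORMAL.inv_cdf(p)


def inverse_mills_ratio(probit_index: float) -> float:
    """Return phi(z) / Phi(z): normal density over normal CDF at index z."""
    return STD_NORMAL.pdf(probit_index) / STD_NORMAL.cdf(probit_index)


# Illustrative (made-up) predicted probabilities of holding a policy.
for p in (0.1, 0.5, 0.9):
    z = index_from_probability(p)
    print(f"P(insured)={p:.1f}  index={z:+.3f}  IMR={inverse_mills_ratio(z):.3f}")
```

The ratio falls as the predicted probability rises, so policyholders whom the first equation predicts to be unlikely purchasers receive the largest correction term.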
The Inverse Mills Ratio (IMR) is then entered as a regressor into a second equation, which, for the purposes of this example, focuses on understanding service utilization among the insured group only. By including this variable in the second equation, we can control for the effect on service utilization of unobserved variables related to having an insurance policy. Put another way, we can isolate the "insurance purchase bias" from the insurance effect on service utilization. This allows us to obtain unbiased estimates of the insurance effect on service utilization among individuals in the Insured Panel. Sample selection models were employed in all analyses where the Insured Panel is compared to the NLTCS.22
For example, suppose we wished to measure the "insurance effect" on the amount of total care individuals use. The first equation is used to predict the probability of having insurance. The second equation predicts the number of hours of total care received by the Insured Panel. As shown below, the Inverse Mills Ratio (IMR) is entered into this second equation, where it controls for the unobserved variables associated with the insurance purchase decision. If the coefficient for the IMR is not significant, then the third equation is used to estimate hours of care for the entire sample. It includes a dummy variable that identifies whether or not someone has insurance.
Equation 1: The Probability of Being Privately Insured
$Z^* = \gamma' W + u$
Where $Z$ = a dichotomous variable indicating whether or not someone has insurance ($Z = 1$ when the latent index $Z^*$ is positive, $Z = 0$ otherwise);
Where $W$ = vector of explanatory variables
Where $u$ = the random disturbance term
Equation 2: The Number of Total Weekly Hours of ADL/IADL Care Received (Insured Panel Only)
$Y = \beta' X + \theta \lambda + \varepsilon$
Where $Y$ = a variable that measures weekly hours of ADL and IADL care for the Insured Panel
Where $X$ = vector of explanatory variables that includes at least one variable that is not included in Equation 1.
$\lambda$ = the Inverse Mills Ratio (IMR)
$\varepsilon$ = the random disturbance term
Equation 3: The Number of Total Weekly Hours of ADL/IADL Care Received (Total Sample)
$Y = \delta' V + \eta$
Where $Y$ = a variable that measures weekly hours of ADL and IADL care for the combined Insured Panel and 1994 NLTCS
Where $V$ = vector of explanatory variables that includes a dummy variable for having insurance.
$\eta$ = the random disturbance term.
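The role of the IMR in the second equation can be made explicit with a short derivation. This is a sketch under the standard assumptions of a two-step selection model, which the text does not spell out: the disturbance $u$ of the insurance equation and the disturbance $\varepsilon$ of the hours-of-care equation are jointly normal with correlation $\rho$, $\mathrm{Var}(u) = 1$, and $\mathrm{Var}(\varepsilon) = \sigma^2$. Here $\gamma' W$ stands for the probit index of the insurance equation and $\beta' X$ for the regression part of the hours-of-care equation; all of these symbols are illustrative notation and not necessarily the report's own.

```latex
\begin{align*}
E[\,Y \mid X,\ Z = 1\,]
  &= \beta' X + E[\,\varepsilon \mid u > -\gamma' W\,] \\
  &= \beta' X + \rho\sigma\,\frac{\phi(\gamma' W)}{\Phi(\gamma' W)} \\
  &= \beta' X + \rho\sigma\,\lambda(\gamma' W),
  \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}.
\end{align*}
```

Under these assumptions the coefficient on the IMR estimates $\rho\sigma$, so a coefficient indistinguishable from zero is consistent with $\rho = 0$, that is, with no correlation between the unobserved determinants of insurance purchase and of care use.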
If the coefficient on the Inverse Mills Ratio is not statistically significant, this suggests that the propensity to purchase insurance does not affect the propensity to use more hours of care among the Insured Panel and that there is no observed insurance purchase bias. The implication is that an ordinary least squares regression model based on the entire sample, insured and non-insured, can include a dummy variable for insurance. The coefficient on this variable can then be interpreted as the impact of being insured on the total number of weekly hours of care (holding other variables constant).
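The decision rule described above can be sketched in a few lines. This is an illustration only: it assumes a two-sided test based on the normal approximation, a 5 percent significance level, and a standard error for the IMR coefficient that has already been computed appropriately. The text specifies none of these, and the function names and example numbers are hypothetical.

```python
from statistics import NormalDist


def imr_p_value(coef: float, std_err: float) -> float:
    """Two-sided p-value for H0: the IMR coefficient equals zero."""
    z_stat = coef / std_err
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))


def choose_specification(coef: float, std_err: float, alpha: float = 0.05) -> str:
    """Pick the equation to report, following the rule in the text.

    alpha is an assumed significance level, not one stated in the source.
    """
    if imr_p_value(coef, std_err) < alpha:
        # Purchase bias detected: keep the selection-corrected model.
        return "Equation 2 (Insured Panel only, IMR included)"
    # No detectable purchase bias: pool the samples and use a dummy.
    return "Equation 3 (total sample, insurance dummy)"


# Made-up coefficient and standard error, for illustration only.
print(choose_specification(coef=0.8, std_err=0.5))
```

With the illustrative values shown, the test statistic is 1.6, which is not significant at the assumed 5 percent level, so the pooled regression with the insurance dummy would be used.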