Using National Survey Data to Analyze Children’s Health Insurance Coverage: An Assessment of Issues

3. Sources of Error in Estimates of the Uninsured


There are a number of sources of error encountered in attempting to measure uninsurance, and these affect the comparability of estimates from different surveys. These include certain limitations inherent in measuring uninsurance as a residual, as it is usually done; the possibility that respondents may not be aware of existing coverage; the bias introduced by respondents’ imperfect recall; the sensitivity of responses to question design; and the impact of basic survey design choices.

a. Limitations Inherent in Measuring Uninsurance as a Residual

Perhaps the most significant problem with measuring uninsurance as a residual is that a small error rate in the reporting of insurance becomes a large error in the estimate of the uninsured. With the number of children insured at a point in time being eight to nine times the number without insurance, and the number ever insured during a year being 18 to 19 times the number never insured, errors in the reporting of insurance coverage are multiplied many times over in their impact on estimates of the uninsured. Based on the SIPP estimates reported in Table 2, a 6 to 7 percent error in the reporting of children who ever had health insurance would double the estimated number who had no insurance. In Section 4, below, we argue that this multiplication of error explains why the CPS estimate of the uninsured resembles an estimate of children uninsured at a point in time rather than children uninsured for the entire year, which is what the questions are designed to yield.(9)
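To make this multiplication of error concrete, the sketch below works through the arithmetic with hypothetical counts chosen only to match the roughly 18-to-1 ratio of ever-insured to never-insured children cited above; they are not the actual Table 2 estimates.

```python
# Illustrative sketch: how a small misreporting rate among insured children
# inflates a residual estimate of the uninsured. All counts are hypothetical.

ever_insured = 66.6e6   # hypothetical number of children ever insured in the year
never_insured = 3.7e6   # hypothetical number never insured (an 18-to-1 ratio)

for error_rate in (0.05, 0.06, 0.07):
    misreported = error_rate * ever_insured          # insured children reported as uninsured
    residual_estimate = never_insured + misreported  # what the survey would show
    print(f"{error_rate:.0%} misreporting -> estimate {residual_estimate/1e6:.1f} million "
          f"({residual_estimate/never_insured:.1f}x the true 3.7 million)")
```

Under these assumptions, a misreporting rate in the range the text cites roughly doubles the residual estimate of never-insured children.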

Another implication of measuring uninsurance as a residual can be seen in the CPS estimates of the frequency of uninsurance among infants. The health insurance questions in the March CPS refer to coverage in the preceding calendar year--that is, the year ending December 31. If parents answer the CPS questions as intended, a child born after the end of the year cannot be identified as having had coverage during the previous year. With no reported coverage, such a child would be classified as uninsured. If all children born after the end of the year were classified as uninsured, this would add about one-sixth of all infants to the estimated number uninsured. Because the March CPS public use files lack a field indicating the month of birth, data users cannot identify infants born after the end of the year and cannot exclude them from their analyses. Is there any evidence that uninsurance is overstated among infants in the CPS? Table 3 addresses this question by comparing estimates of the rate of uninsurance for infants and older children, based on the March CPS and the SIPP. The CPS estimates of the proportion of infants who are uninsured are markedly higher than the SIPP estimates in both the 1993 and 1994 reference years: 11.5 versus 7.7 percent in 1993 and 17.3 versus 9.3 percent in 1994.
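The one-sixth figure cited above follows from simple arithmetic, as the sketch below makes explicit under the assumption that births are distributed evenly over the year.

```python
# Back-of-the-envelope check on the "one-sixth of infants" figure. Infants at
# a March interview were born during roughly the preceding 12 months; those
# born in January or February of the survey year (about 2 of those 12 months)
# could not have had any coverage during the preceding calendar year.
months_after_reference_year = 2
months_in_infant_cohort = 12
share = months_after_reference_year / months_in_infant_cohort
print(f"Share of infants with no possible prior-year coverage: {share:.2f}")  # about 0.17
```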

b. Awareness of Coverage

People may have insurance coverage without being aware that they have it. While this lack of awareness may seem improbable, both the CPS and SIPP provide direct evidence with respect to Medicaid coverage. Prior to welfare reform, families that received Aid to Families with Dependent Children (AFDC) were covered by Medicaid as well. Nevertheless, surveys that asked respondents about AFDC as well as Medicaid found that nontrivial numbers reported receiving AFDC but not being covered by Medicaid. Were such people unaware that they were covered by Medicaid, or did they know Medicaid by another name and not recognize the name(s) used in the surveys?(10)

We do not know the answer. To correct for such instances, the Census Bureau employs in both the CPS and SIPP a number of “logical imputations,” or edits, to reported health insurance coverage. All adult AFDC recipients and their children are assigned Medicaid coverage, for example. Of the 28.2 million people estimated to have had Medicaid coverage in 1996, based on the March 1997 CPS, 4.6 million, or 16 percent, had their Medicaid coverage logically imputed in this manner (Rosenbach and Lewis 1998). Most if not all of these 4.6 million would have been counted as uninsured if not for the Census Bureau’s edits. Now that AFDC, which accounted for half of Medicaid enrollment, has been replaced by the smaller Temporary Assistance for Needy Families (TANF) program, the number of logical imputations will be reduced significantly, which could increase the number of children who in fact have Medicaid coverage but are counted in the CPS and SIPP as uninsured.(11)
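As an illustration of how such an edit operates, the sketch below applies the AFDC rule described above to simple person records. The record layout and field names are hypothetical; the Census Bureau’s actual edit specifications are far more detailed.

```python
# Minimal sketch of a "logical imputation" edit of the kind described above.
# Records and field names are illustrative, not the CPS or SIPP processing
# specifications. `household` maps person IDs to person records.

def count_as_medicaid(person, household):
    """Return True if the person should be counted as covered by Medicaid."""
    if person["medicaid_reported"]:
        return True
    # Edit: all adult AFDC recipients are assigned Medicaid coverage.
    if person["afdc_recipient"]:
        return True
    # Edit: children of AFDC recipients are assigned Medicaid coverage as well.
    if person["age"] < 18 and any(
            household[pid]["afdc_recipient"] for pid in person["parent_ids"]):
        return True
    return False
```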


TABLE 3. PERCENTAGE OF CHILDREN UNINSURED, BY AGE IN YEARS

Survey and Date        Less than 1   1 to 5   6 to 14   15 to 18   Total
CPS, March 1994            11.5        11.6     13.7       19.4     14.1
CPS, March 1995            17.3        13.2     14.0       16.5     14.4
CPS, March 1996            16.7        12.7     13.7       16.1     14.0
SIPP, September 1993        7.7        10.9     13.7       16.7     13.1
SIPP, September 1994        9.3        10.5     13.1       16.3     12.7

SOURCE: Tabulations of public use files, CPS and SIPP.

c. Recall Bias

It is well known among experienced survey researchers that respondent recall of events in the past is imperfect and that recall error grows with the length of time between the event and the present. Error also increases with the amount of change in people’s lives. Respondents with steady employment have less difficulty recalling details of their employment than do respondents with intermittent jobs and uneven hours of work. Similarly, respondents who have had continuous health insurance coverage can more easily recall their coverage history than respondents with intermittent coverage. Obtaining accurate reports from respondents with complex histories places demands upon the designers of surveys and those who conduct the interviews. Panel surveys that ascertain health insurance coverage (and other information) with repeated interviews covering short reference periods are much more likely to obtain reliable estimates of coverage over time than one-time surveys that ask respondents to recall the details of the past year or more.

d. Sensitivity to Question Design

Even when recall is not an issue, as when insurance coverage is measured “at the present time,” survey questions that appear to request more or less the same information can generate markedly different responses. This point was demonstrated in dramatic fashion when the Census Bureau introduced some experimental questions into the CPS to measure current health insurance coverage. At the end of the sequence of questions used to measure insurance coverage during the preceding year, respondents were asked:

These next questions are about your CURRENT health insurance coverage, that is, health coverage last week. (Were you/Was anyone in this household) covered by ANY type of health insurance plan last week?

Those who answered in the affirmative were asked to identify who in the household was covered and then, for each such person, by what types of plans he or she was covered. This sequence of questions, which first appeared in the March 1994 survey, yielded an uninsured rate that was about double the rate measured by the NHIS and the SIPP, and the experimental questions were discontinued with the March 1998 supplement.

Even if these questions had not followed a lengthy sequence of items asking about several sources of coverage in the preceding year, it would have been difficult to imagine their generating such low estimates of coverage. That they did so despite the questions that preceded them is hard to fathom, and it underscores the point that researchers cannot simply write out a set of health insurance questions and expect to obtain the true measure of uninsurance, or even, necessarily, a good one. It is not at all clear why this should be so. Health insurance coverage appears straightforward enough: generally, people either have it or they do not. Yet the Census Bureau’s experience sends a powerful message that questions about health insurance coverage can yield quite unanticipated results. Researchers fielding surveys that attempt to measure health insurance coverage would be well advised to be wary of constructing new questions unless they can also conduct very extensive pretesting. In the absence of thorough testing, it is better to borrow from existing and thoroughly tested question sets than to construct new questions from scratch.

e. Impact of Survey Design and Implementation

While perhaps not as important as question wording, differences in the design and implementation of surveys can have a major impact on estimates of the uninsured. These differences include the choice of universe and the level of coverage achieved, the response rate among eligible households, the use of proxy respondents, the choice of interview mode, and the use of imputation.

Universe and Coverage. Surveys may differ in the universes that they sample and in how fully they cover these universes. Typically, surveys of the U.S. resident population exclude the homeless, the institutionalized population--that is, residents of nursing homes, mental hospitals, and correctional institutions, primarily--and members of the Armed Forces living in barracks. There may be other exclusions as well. For example, household surveys do not always include Alaska and Hawaii in their sampling frames.

All surveys--even the decennial census--suffer from undercoverage; that is, parts of the universe are unintentionally excluded from representation in the sample. In a household-based or “area frame” sample, undercoverage can be attributed to three principal causes: (1) failure to identify all street addresses in the sample area, (2) failure to identify all housing units within the listed addresses, and (3) failure to identify all household members within the sampled housing units. Nonresponse, discussed below, is not undercoverage, although the absence of household listings for nonresponding households can contribute to coverage errors (in either direction). The 1990 census undercounted U.S. residents by about 1.6 percent.(12) Sample surveys have much greater undercoverage. The Census Bureau has estimated the undercoverage of the civilian noninstitutionalized population in the monthly CPS to be about 8 percent in recent years. Undercoverage varies by demographic group. For children under 15, undercoverage is closer to 7 percent than to 8 percent. But among older teens it approaches 13 percent, and for black males within this group the rate of undercoverage reaches 25 to 30 percent.

To provide at least a nominal correction for undercoverage, the Census Bureau and other agencies or organizations adjust the sample weights so that they reproduce selected population totals. These population totals or “controls” may even incorporate adjustments for the census undercount.(13) This “post-stratification,” a statistical operation that serves other purposes as well, is based on a limited set of demographic characteristics--age, sex, race and Hispanic origin, typically, and sometimes state.(14) Other characteristics measured in the surveys are affected by this post-stratification to the extent that they covary with demographic characteristics. We know, for example, that Medicaid enrollment and uninsurance vary quite substantially by age, race, and Hispanic origin, so a coverage adjustment based on these demographic characteristics will improve the estimates of Medicaid enrollment and uninsurance. To the extent that people who are missing from the sampling frame differ from the covered population even within these demographic groups, however, the coverage adjustment will compensate only partially for the effects of undercoverage on the final estimates. It is quite plausible, for example, that the Hispanic children who are missed by the CPS have an even higher rate of uninsurance than those who are interviewed. We would suggest, therefore, that survey undercoverage, even with a demographic adjustment to population totals corrected for census undercount, contributes to underestimation of uninsured children.
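A minimal sketch of the ratio adjustment involved is given below, assuming a hypothetical record layout in which each person carries a base weight and a demographic cell label; the cell definitions and data structures are illustrative, not any survey’s actual weighting specification.

```python
# Minimal sketch of post-stratification: within each demographic cell, sample
# weights are ratio-adjusted so that they sum to an independent population
# control total. All field names are hypothetical.

def poststratify(records, controls):
    """Ratio-adjust weights so each demographic cell reproduces its control.

    records  -- list of dicts with "cell" and "weight" keys
    controls -- dict mapping cell label -> independent population total
    """
    weighted_sums = {}
    for r in records:                      # sum the base weights within each cell
        weighted_sums[r["cell"]] = weighted_sums.get(r["cell"], 0.0) + r["weight"]
    for r in records:                      # scale every weight by its cell's ratio
        r["weight"] *= controls[r["cell"]] / weighted_sums[r["cell"]]
    return records
```

For example, a cell whose weights sum to 900,000 against a control of 1,000,000 (reflecting undercoverage) has every weight inflated by a factor of about 1.11; as the text notes, this compensates only partially if the people missed differ from those covered within the same cell.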

Response Rate. Surveys differ in the fraction of their samples that they succeed in interviewing. Federal government survey agencies appear to enjoy a premium in this regard. The Census Bureau, which conducts both the CPS and the SIPP and carries out the field operations for the NHIS, reports the highest response rates among the surveys that provide our principal measures of health insurance coverage. For the March 1997 supplement to the CPS, the Census Bureau reported a response rate of 84 percent.(15) For the first interview of the 1992 SIPP panel the Bureau achieved a response rate of 91 percent, with the cumulative response rate falling to 74 percent by the ninth interview. The 1995 NHIS response rate for households that were eligible for selection into the MEPS was 94 percent (Cohen 1997). In contrast to these rates, MPR obtained a 65 percent response rate for the CTS, and Westat achieved a comparable percentage for the NSAF, which includes a substantial oversampling of lower-income households. For the first round of the MEPS, Westat secured an 83 percent response rate among the 94 percent of eligible households that responded to the NHIS in the second and third quarters of 1995, yielding a joint response rate of 78 percent (Cohen 1997). These response rates are based on people with whom interviews were completed, but there may have been additional nonresponse to individual items in the health insurance sequence. Unlike more sensitive items, however, such as those pertaining to income, health insurance questions do not appear to generate much item nonresponse.
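The joint MEPS rate cited above is simply the product of the two stage-specific rates:

```python
# Compounding of response rates across survey stages (figures from the text).
nhis_rate = 0.94         # eligible households responding to the 1995 NHIS
meps_round1_rate = 0.83  # of those, households responding to MEPS round 1
print(f"Joint response rate: {nhis_rate * meps_round1_rate:.0%}")  # 78%
```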

The reported response rates also do not account for undercoverage, which varies somewhat from survey to survey. Arguably, people who were omitted from the sampling frame never had an opportunity to respond and, therefore, may have less in common with those who refused to be interviewed than they do with respondents. Nevertheless, their absence from the collected data represents a potential source of bias and one for which some adjustment is desirable. Generally speaking, however, less is known about the characteristics of people omitted from the sampling frame than about those who were included in the sampling frame but could not be interviewed. Hence the adjustments for undercoverage, when they are carried out, tend to be based on more limited characteristics than the adjustments for nonresponse among sampled households.

How important is nonresponse as a source of bias in estimates of health insurance coverage? We are not aware of any information with which it is possible to address that question. Certainly the nearly 30 percentage point difference in response rates between the NHIS and the CTS or NSAF could have a marked impact on the estimated frequency of a characteristic (uninsurance) that occurs among less than 15 percent of all children, but we have no direct evidence that it does.
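Although we lack direct evidence, a simple sensitivity calculation illustrates the potential magnitude of the effect; every rate below is hypothetical.

```python
# Hypothetical sensitivity calculation: how much the population uninsurance
# rate could differ from the observed rate if nonrespondents differ from
# respondents. All rates are illustrative, not estimates.
response_rate = 0.65           # e.g., the CTS/NSAF level cited above
observed_uninsured = 0.13      # hypothetical rate among respondents

for nonrespondent_uninsured in (0.13, 0.18, 0.25):   # hypothetical scenarios
    population_rate = (response_rate * observed_uninsured
                       + (1 - response_rate) * nonrespondent_uninsured)
    print(f"uninsured rate among nonrespondents {nonrespondent_uninsured:.0%} "
          f"-> population rate {population_rate:.1%}")
```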

Proxy Respondents. Some members of a household may not be present when the household is interviewed. Surveys differ in whether and how readily they allow other household members to serve as “proxy” respondents. From the standpoint of data quality, the drawback of a proxy respondent is the increased likelihood that information will be misreported or that some information will not be reported at all. This is particularly true when the respondent and proxy are not members of the same family. For this reason some surveys restrict proxy respondents to family members. Ultimately, however, some responses are generally better than none, so it is rare that a survey will rule out particular types of proxy responses entirely. Rather, proxy responses may be limited to “last resort” situations, that is, as alternatives to closing out cases as unit nonrespondents. Accordingly, it is important to compare not only surveys’ stated policies on proxy respondents but also the actual frequency with which proxy respondents are used and the frequency with which household members are reported as missing.

Children represent a special case. While all the surveys we have discussed collect data on children, the surveys differ with respect to whether children are treated as respondents per se or merely as other members of the family or household, about whom information is collected only or largely indirectly. For example, both the CPS and SIPP define respondents as all household members 15 and older. Some information, such as income, is not collected for younger children at all, while health insurance coverage is collected through questions that ask respondents who else in the household is included under specific plans. With this indirect approach, children are more susceptible to being missed.

Mode: Telephone Versus In-person. Surveys may be conducted largely or entirely by telephone or largely or entirely in-person.(16) There are two aspects of the survey mode that are important to recognize. The first bears on population coverage while the second pertains to how the data are collected.

Pure telephone surveys, which are limited to households with telephones, cover a biased subset of the universe that is covered by in-person surveys. Methodologies have been developed to adjust such surveys for their noncoverage of households that were without telephone service during the survey period. These methodologies use the responses from households that report having had their telephone service interrupted during some previous number of months to compensate for the exclusion of households that had no opportunity to appear in the sample. How effectively such adjustments substitute for actually including households without telephones is likely to vary across the characteristics being measured, and for this reason some telephone surveys include a complementary in-person sample to obtain responses from households without telephones.(17)
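A stylized sketch of such an adjustment appears below; the single inflation factor and the field names are our own simplification of the published methodologies, not any survey’s actual procedure.

```python
# Stylized sketch of a telephone-noncoverage adjustment: households that
# reported a recent interruption in telephone service are reweighted to
# represent the nontelephone population as well. Field names and the
# single-factor form are illustrative simplifications.

def adjust_for_nontelephone(records, nontelephone_total):
    """Inflate the weights of interrupted-service households so that they
    also stand in for households with no telephone, whose total comes from
    an external control."""
    interrupted = [r for r in records if r["service_interrupted"]]
    base = sum(r["weight"] for r in interrupted)
    factor = 1.0 + nontelephone_total / base
    for r in interrupted:
        r["weight"] *= factor
    return records
```

After the adjustment, the interrupted-service households carry their own weight plus the nontelephone control total, which is the intended substitution described above.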

In addition to the coverage issue, distinguishing telephone from in-person interviews is important because the choice of mode can affect the way in which information is collected and the reliability with which responses are reported. Telephone surveys preclude showing a respondent any printed material during the interview (such as lists of health insurance providers), and they limit the rapport that can develop between an interviewer and a respondent. Furthermore, the longer the interview, the more difficult it is to maintain the respondent’s attention on the telephone, so data quality in long interviews may suffer. On the other hand, conducting interviews by telephone may limit interviewer bias and make respondents more comfortable about reporting personal information. Moreover, until the recent advent of computer-assisted personal interviewing, only telephone interviews could employ computer-based survey instruments, which minimize the risk of interviewer error in administering instruments with complex branching and skip patterns. For all of these reasons, survey researchers recognize that there can be “mode effects” on responses. The different modes may elicit different mean responses to the same questions, with neither mode being consistently more reliable than the other. To minimize differential mode effects when part of a telephone survey is conducted in person, survey organizations sometimes conduct the in-person interviews by cellular telephone, which field representatives loan to the respondents.

Panel surveys allow for another possibility: using a household-based sample design and conducting at least the initial interview in-person but using the telephone for subsequent interviews. Both the CPS and the SIPP have utilized this approach. In the CPS, the first and last of the eight interviews are conducted in person while the middle six are generally conducted by telephone. For any given month, then, about one-quarter of the interviews are conducted in person.(18)

The recent introduction of computer-assisted personal interviewing (CAPI) has created an important variation on the in-person mode and one with its own mode effects. In some respects, CAPI may be more like computer-assisted telephone interviewing than in-person interviewing with a paper and pencil instrument. The methodology is too new to have generated much information on its mode effects yet.

Imputation Methodology. Surveys differ in the extent to which they impute values to questions with missing responses and in the rigor of their imputation methodologies. For example, both the CPS and SIPP impute all missing responses, using methodologies that have been developed to do this very efficiently. Over time, the Census Bureau’s SIPP imputation algorithms have made increasing use of the responses reported in adjacent waves of the survey. Generally, questions about health insurance coverage elicit very little nonresponse, so imputation strategies are less important than they are for more sensitive items, such as income. Nevertheless, in the March 1997 CPS, the Census Bureau imputed 10 percent of the “reported” Medicaid participants (Rosenbach and Lewis 1999).(19) In the NHIS, responses of “don’t know” are not replaced by imputed values, and in published tabulations the insurance coverage of people whose coverage cannot be determined is treated as unknown. While this may not have a large impact on the estimated rates of uninsurance among children or adults, this strategy does make it more difficult for data users to replicate published results.
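The production imputation systems are elaborate, but the basic idea of a hot-deck imputation, one widely used approach in which a missing response is filled with the value reported by a demographically similar “donor,” can be sketched simply; the matching variables and field names below are illustrative assumptions, not the actual CPS or SIPP specifications.

```python
import random

# Simplified hot-deck imputation sketch: a missing health insurance response
# is replaced with the reported value of a donor who matches the recipient on
# a few demographic characteristics. Production systems match on many more
# variables; all field names here are hypothetical.

def hot_deck_impute(records, keys=("age_group", "sex", "poverty_status")):
    donors = {}
    for r in records:                      # pool reported values by matching cell
        if r["medicaid"] is not None:
            donors.setdefault(tuple(r[k] for k in keys), []).append(r["medicaid"])
    for r in records:                      # fill each missing value from its cell's pool
        if r["medicaid"] is None:
            pool = donors.get(tuple(r[k] for k in keys))
            if pool:                       # cells with no donors are left missing
                r["medicaid"] = random.choice(pool)
    return records
```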