In national time-series models, regression analysis is used to estimate the relationship between a measure of program participation for the entire country (e.g., the average caseload), observed at regular intervals (usually years or quarters), and a set of explanatory variables that represent, or serve as proxies for, hypothesized determinants of growth.
The Congressional Budget Office model (CBO, 1993) is the most recent national time-series model that we have found.(1) We describe this model in some detail below and use it to provide context for a discussion of the strengths and weaknesses of this methodology. We then briefly describe one earlier national time-series model.
The CBO model uses quarterly data for the period from 1973.1 through 1991.3 (73 quarters). Separate models are estimated for the Basic and UP caseloads.(2) For each model, the dependent variable is the average monthly caseload over the three months of the quarter. Each model specifies the caseload measure as a linear function of a set of explanatory variables and a disturbance term. The disturbance term represents all determinants of the caseload that are not captured by the explanatory variables, and is assumed to change slowly over time.(3)
Explanatory variables include: a measure of female-headed families, used in the Basic equation only;(4) the "employment gap" -- the difference between actual and potential employment; a measure of the real average (across states) maximum AFDC benefits for a family of three; average earnings for year-round, full-time workers aged 18 to 24 with exactly 12 years of schooling (a female series is used for the Basic model and a male series is used for the UP model); three dummy variables to capture the transitional and permanent effects of program changes enacted under OBRA81; and three quarterly dummies to capture seasonal caseload variation. The real earnings data are annual data that have been converted to quarterly data by interpolation. Both current and lagged values of the employment gap variable are included in the equations; the first through third lagged values appear in the Basic equation and the first through fifth lagged values appear in the UP equation.
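The construction of the regressors described above can be sketched in a few lines. This is a minimal illustration, not CBO's procedure: the function names and the sample values are hypothetical, and linear interpolation is an assumption, since the text says only that the annual earnings data were interpolated.

```python
def lagged(series, k):
    """Return the series lagged k periods (k <= len(series)); first k entries are None."""
    return [None] * k + list(series[:len(series) - k])

def quarterly_dummies(n_quarters, first_quarter=1):
    """Three seasonal dummies (Q1-Q3); Q4 is the omitted reference quarter."""
    rows = []
    for t in range(n_quarters):
        q = (first_quarter - 1 + t) % 4 + 1
        rows.append([1 if q == j else 0 for j in (1, 2, 3)])
    return rows

def annual_to_quarterly(annual):
    """Linear interpolation between successive annual values (assumed method)."""
    out = []
    for i in range(len(annual) - 1):
        step = (annual[i + 1] - annual[i]) / 4.0
        out.extend(annual[i] + step * j for j in range(4))
    out.append(annual[-1])
    return out

# Hypothetical employment-gap values, for illustration only.
gap = [1.0, 1.5, 2.0, 2.5, 3.0, 2.0]
max_lag = 3  # the Basic equation uses the current value and three lags
lags = [lagged(gap, k) for k in range(max_lag + 1)]
dums = quarterly_dummies(len(gap), first_quarter=1)

# Each row: current gap, lags 1-3, then the three seasonal dummies.
# The first max_lag quarters are dropped because their lags are undefined.
rows = [[lag[t] for lag in lags] + dums[t] for t in range(max_lag, len(gap))]
```

Setting `max_lag = 5` would give the lag structure described for the UP equation. Note that each additional lag costs one usable observation at the start of the sample.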
Almost all of the coefficients reported are statistically significant, and the few that are not have the expected sign. The most significant variables (i.e., those whose coefficients have the highest t-statistics) in the Basic equation are the measure of female-headed families and two of the three OBRA81 dummies. The four employment gap variables (current and three lags) are collectively very significant as well. In the UP equation, the first and second quarter dummies have the most significant individual coefficients. Collectively, the six employment gap variables (current and five lags) also stand out as especially significant.
For purposes of comparison with other studies, we used the CBO results and ancillary data reported by CBO to compute the estimated effect of a one percentage point change in the unemployment rate on the Basic and UP caseloads. Because the employment gap, rather than the unemployment rate, is included in the CBO model, it was first necessary to investigate the relationship between these two variables. Based on data reported in Figure 6 of CBO (1993), it appears that a one percentage point change in the unemployment rate is approximately equivalent to a one percentage point change in the employment gap variable. Given this, and assuming that the initial unemployment rate is five percent, the CBO estimates imply that a one percentage point increase in the unemployment rate increases the Basic caseload by 1.7 percent after four quarters and increases the UP caseload by 9.7 percent after six quarters.
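The arithmetic behind this comparison can be made explicit. The sketch below is our own illustration, with a hypothetical function name: it converts the percentage-point effects quoted above into elasticities, under the stated assumption that the unemployment rate starts at five percent, so that a one point increase is a 20 percent increase in the rate.

```python
def implied_elasticity(pct_change_caseload, initial_rate, point_change=1.0):
    """Elasticity = percent change in caseload / percent change in the rate."""
    pct_change_rate = 100.0 * point_change / initial_rate
    return pct_change_caseload / pct_change_rate

# Effects quoted in the text; the initial rate of 5 percent is the text's assumption.
basic = implied_elasticity(1.7, 5.0)  # 1.7 percent after four quarters
up = implied_elasticity(9.7, 5.0)     # 9.7 percent after six quarters

print(round(basic, 1), round(up, 1))
```

Rounded to one decimal place, these are the values of 0.1 and 0.5 that are used for CBO when the results are compared with Grossman's. A different initial unemployment rate would change the elasticities proportionally.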
There are numerous advantages of the national time-series methodology relative to others, but there are also significant disadvantages. We use the CBO study to illustrate some of the advantages and disadvantages, but they apply to other national time-series studies as well. The advantages include:
It is methodologically simpler than most other approaches. This makes it relatively easy to apply, and the results are relatively easy to describe.
National data for explanatory variables are readily available from published sources. Further, national explanatory variables are available at a level of specificity and measured with a degree of accuracy that is not available for smaller geographic units such as states. Most of the data used by CBO come from readily available published sources.
The researcher can examine the dynamics of the relationship between explanatory variables and participation, as CBO does by including both current and lagged values of the employment gap. This cannot be done with a pure cross-section approach, which uses data from only a single point in time.
In a statistical sense, the researcher is able to "explain" a very high percentage of the variation in participation over time (high adjusted R-square) and to produce simulated participation series that closely track the actual series. This is true of CBO's models -- the adjusted R-squares are 99.6 percent and 99.5 percent for the Basic and UP equations, respectively, and sample-period simulations track the actual series well. This occurs because time-series data -- especially aggregate data over large populations -- tend to be very highly correlated with one another. With a small to moderate size sample, such as CBO's, it is fairly easy to find a small set of explanatory variables that can achieve a good fit. There is, however, a negative aspect to this advantage, which we return to later.
The disadvantages of the national time-series approach include:
It can capture state-level changes in AFDC programs only in a very crude way -- through their impact on program variables that are aggregated across states, such as CBO's AFDC average maximum benefit for a family of three. The authors of the CBO report note that this variable may be endogenous: as caseloads increase, states may cut back maximum benefits for budgetary reasons (CBO, 1993, p. 39).
The method's ability to distinguish between the effects of a substantial number of variables is limited by the high correlations among explanatory variables that are typical of time-series data, and by the limited number of observations. For instance, it is difficult to be confident that the impact of an increase in unemployment ends after three quarters for Basic and five quarters for UP, given the likely high correlations among the various lags of the employment gap and the relatively short time series available for estimating CBO's model. The authors mention this as a specific problem with their model (CBO, 1993, p. 12).
Major national-level changes in the program are difficult to disentangle from the effects of other variables because they can usually be modeled only in a very crude way -- such as the three dummy variables that CBO used to capture the impact of OBRA81. The authors of the CBO report mention this as another specific problem with their model (CBO, 1993, p. 12). The same problem would arise in modeling the impact of the 1988 Family Support Act (FSA) using more recent data. Both OBRA81 and the FSA may be viewed as changes in "regime," and it could be that coefficients of other variables also changed with the regime shift. When regime shifts occur as frequently as they perhaps have for AFDC, time-series data alone are likely to be inadequate for testing whether other coefficients did change. For instance, using national time-series data we would be unlikely to determine whether the mandating of the UP program as of October 1990, under the FSA, had an impact on the models other than a one-time shift in UP participation.
There is a serious danger that the researcher will end up with a model that fits the data well (a very high adjusted R-square and a simulated series that tracks the actual series closely), but whose coefficients misrepresent the causal relationship between the explanatory variables and participation; further, out-of-sample predictions may be very poor. The reason is that the high correlations found in time-series data, combined with a fairly small sample size, make it relatively easy to get a good fit by trying a variety of different specifications or by constructing an explanatory variable that seems to track participation well but that does not have a clear theoretical rationale. CBO's measure of female-headed households (FHH) may be such a variable.
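A small simulation illustrates how easily a good fit can arise between unrelated series. The sketch below is our own illustration and uses no CBO data. It regresses one independent random walk on another, using 73 observations to match the sample length quoted for CBO, and counts how often the R-square exceeds 0.5. The cutoff, the number of replications, and the function names are arbitrary assumptions.

```python
import random

def r_squared(x, y):
    """R-square from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def random_walk(n, rng):
    """Cumulated Gaussian shocks: a highly persistent, trending-looking series."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def white_noise(n, rng):
    """Independent Gaussian draws with no persistence."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def share_high_fit(make_series, n=73, reps=500, cutoff=0.5, seed=0):
    """Share of replications in which two independent series give R-square > cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x, y = make_series(n, rng), make_series(n, rng)
        if r_squared(x, y) > cutoff:
            hits += 1
    return hits / reps

rw_share = share_high_fit(random_walk)  # persistent but unrelated series
wn_share = share_high_fit(white_noise)  # non-persistent, unrelated series
print(rw_share, wn_share)
```

In both cases the two series are unrelated by construction, so any high R-square is spurious. Only the persistent series should produce one with appreciable frequency, which is why a high R-square in aggregate time-series data is weak evidence of a causal relationship.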
The last point merits further discussion. According to the report, the FHH variable is defined as the "number of families headed by women with their own children under age 18, multiplied by the ratio of never-married mothers to mothers who had been married." This variable was developed after an attempt to include separate variables for families headed by never-married mothers and by ever-married mothers led to nonsensical results, evidently because of multicollinearity between the two variables (CBO, 1993, p. 14, fn. 17). Although the particular functional form used to aggregate the variables does not have an apparent theoretical rationale, the path of the variable has an upturn that coincides with the upturn in Basic caseload growth.
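The definition quoted above can be written out directly. The figures in the sketch below are hypothetical and chosen only to show how the composite behaves; they are not CBO's data, and the function name is ours.

```python
def fhh_measure(female_headed_families, never_married, ever_married):
    """Families headed by women with own children under 18, multiplied by
    the ratio of never-married mothers to mothers who had been married."""
    return female_headed_families * (never_married / ever_married)

# Hypothetical values (millions): the family count is flat,
# while the never-married share of mothers rises.
early = fhh_measure(6.0, 1.0, 4.0)
late = fhh_measure(6.0, 2.0, 4.0)
print(early, late)
```

In this example the composite doubles even though the number of female-headed families does not change. This shows how the ratio term can give the variable an upturn that the underlying family count lacks.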
While the variable is critical to the fit of the model, the authors find that its coefficient is implausibly large; the coefficient implies that 80 percent of all new female-headed households move onto AFDC (CBO, 1993, p. 18). The authors suggest that some of the growth attributed to this variable may be due to omitted factors, such as the Immigration Reform and Control Act of 1986 and Medicaid outreach.
Other than earlier versions of the CBO model, the next most recent national time-series model we have found comes from ASPE's last effort to model AFDC caseloads (Grossman, 1985). This effort is notable because the author also explored the development of a pooled, state-level model and contrasted its findings with those from a national time-series model. We describe the national model here, and return to the state-level model in Section D, below. The national model also provides additional examples of the advantages and disadvantages of the national time-series approach.
The Grossman model is estimated with quarterly data for the period from 1974.4 through 1983.4 (37 quarters). The Basic and UP programs are modeled separately. Two equations are estimated for each program: a caseload equation and an average benefits per case equation. The primary purpose of this effort was to improve national forecasts of AFDC expenditures for each program (the product of the program's caseload and average expenditure per case).
As in the CBO model, the caseload measure for each program is the average caseload over the three months of the quarter. The average benefit variable is the average of the monthly values for benefits per case over the same three months.
Explanatory variables in the Basic caseload equation include: the number of female-headed households, the poverty rate for families, the real average hourly wage rate in retail and service industries, the average standard of need for a family of three, the lagged unemployment rate (with lags for four quarters), quarterly dummies, and three dummies to capture the transitional and final impacts of OBRA81.(5) The explanatory variables in the UP caseload equation are the same except that the labor force in UP states replaces the number of female-headed households and the fifth lag of the unemployment rate is added. A first-order autoregressive disturbance is specified for each model.
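A first-order autoregressive disturbance means that the error in each quarter equals a fraction rho of the previous quarter's error plus a new random shock. The text does not describe how Grossman estimated the models, so the sketch below is only one common approach, in the style of Cochrane-Orcutt: estimate rho from the residuals of an initial regression, then quasi-difference the data so that the transformed disturbance is approximately uncorrelated over time. The function names and sample values are hypothetical.

```python
def estimate_rho(residuals):
    """Regress e_t on e_(t-1) without an intercept to estimate rho."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals[:-1])
    return num / den

def quasi_difference(series, rho):
    """Transform z_t into z_t - rho * z_(t-1); the first observation is dropped."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

# Hypothetical residuals that decay by half each quarter, so rho is 0.5.
resid = [1.0, 0.5, 0.25, 0.125]
rho_hat = estimate_rho(resid)

# Apply the same transformation to the dependent variable and to each regressor
# before re-estimating the equation by ordinary least squares.
caseload = [1.0, 2.0, 3.0]  # hypothetical values
transformed = quasi_difference(caseload, rho_hat)
```

In practice the two steps are usually repeated until the estimate of rho settles. A value of rho near one corresponds to a disturbance that changes slowly over time.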
Note that the CBO (1993) caseload specifications are very similar to Grossman's specifications in many respects. In Grossman's results, the variables with the most significant coefficients are the OBRA81 dummies and the female-headed household variable. The only variable whose coefficient is clearly insignificant (t-statistic less than 1.0) in both equations is the poverty rate. As with the CBO model, adjusted R-squares are very high: 97.5 percent for the Basic caseload and 99.0 percent for the UP caseload.
The implied long-run elasticities for the Basic and UP caseloads with respect to the unemployment rate are very similar to those calculated from CBO's findings: 0.1 for the Basic caseload (identical to the value we found for CBO) and 0.7 for the UP caseload (compared to 0.5 based on CBO's findings), even though Grossman's sample covers only about half of CBO's sample period.(6) Thus, it appears from the national time-series estimates that the impact of a change in the unemployment rate on caseloads has been reasonably stable over a long period of time.
The average benefit equation for the Basic program includes the following explanatory variables: the weighted average of the maximum benefit for a family of four, an estimate of the average number of persons per family in the United States, and dummy variables for OBRA81. The average benefit equation for the UP program also includes the maximum benefit variable and an OBRA dummy, but does not include the average family size variable. The adjusted R-squares for both equations are very high: 99.4 percent for the Basic equation and 98.6 percent for the UP equation. The maximum benefit variable has a very large t-statistic in both equations, and accounts for most of the models' explanatory power.
We note that Grossman did not include the unemployment rate in the average benefit model. Holding the maximum benefit constant, increases in unemployment may be associated with reductions in earnings among existing recipients, which would increase average benefits. Earnings and other benefit determining characteristics of cases that enter the caseload due to an increase in unemployment may differ from those of existing recipients. This could offset or add to the hypothesized positive effect of unemployment on average benefits for existing cases.