CHAPTER THREE

METHODOLOGY

A. INTRODUCTION

The methodology we use is technically described as "pooled cross-section time-series analysis" because it "pools" time-series data for a cross-section of individual states. We estimated the models using quarterly data for 51 "states" (including the District of Columbia) from the fourth quarter of 1979 (1979.4) through the third quarter of 1994 (1994.3).

The methodology is described in generic terms here. Plans for the specifications of the dependent and explanatory variables are discussed in Chapter Four. We describe the econometric model, including the estimation methodology, in Section B. In Section C, we discuss the methodology we use to control for changes in the age distribution of the population. Section D discusses how we model delayed impacts of the explanatory variables. The construction of quarterly time series for explanatory variables that are only observed annually is discussed in Section E, and the simulation techniques are described in Section F.

B. ECONOMETRIC MODELS

We estimate separate Basic and UP models for caseloads, recipients, and child recipients. In addition, we estimate an average monthly benefit (AMB) equation for the combined programs. All equations are methodologically identical, differing only in the specification of their variables. In this section we first describe the "generic" structure of the equations.

Each equation estimated has the following general form:

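Equation 3.1: $\Delta \ln(Y_{st}) = \alpha_1 Z_{1st} + \dots + \alpha_J Z_{Jst} + \beta_1 \Delta X_{1st} + \dots + \beta_K \Delta X_{Kst} + \gamma_2 Q_{2t} + \dots + \gamma_4 Q_{4t} + \delta_{80} T_{80t} + \dots + \delta_{94} T_{94t} + \varepsilon_{st}$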

where:

The model is specified in changes in order to eliminate state "fixed effects" -- factors that vary across states, but not over time. "Time," or "year" effects are captured by the time dummies. Hence, we are essentially relying on covariation in changes across states to estimate the model's parameters. Purely cross-sectional covariation is not used at all, and use of purely time-series covariation to estimate parameters other than the year and quarterly dummy coefficients is minimal.(1)
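The way differencing removes state fixed effects can be illustrated with a minimal sketch. The data below are simulated and noise-free, and all names (state_effect, beta, x) are hypothetical; they are not taken from the models estimated in this report.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_quarters = 5, 12

# Hypothetical panel: log participation = state fixed effect + beta * x.
# The fixed effect varies across states but is constant over time.
state_effect = rng.normal(0.0, 2.0, size=(n_states, 1))
x = rng.normal(size=(n_states, n_quarters)).cumsum(axis=1)
beta = 0.5
log_y = state_effect + beta * x

# First differences over time: the fixed effect cancels in every state.
d_log_y = np.diff(log_y, axis=1)
d_x = np.diff(x, axis=1)

# Pooled least squares (no intercept) on the differenced data.
beta_hat = (d_x * d_log_y).sum() / (d_x ** 2).sum()
print(round(beta_hat, 6))
```

Because the differenced dependent variable no longer contains the state effect, the pooled regression on changes recovers the slope without any state dummies.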

We specify the dependent variable as a change in the logarithm because we expect the effects of changes in the explanatory variables to be proportional to the size of the caseload, rather than independent of the caseload size. For instance, a change in the maximum monthly benefit and a change in the unemployment rate are both expected to have an impact on caseload size that is proportional to the size of the caseload -- the more cases, or potential cases, that may be affected, the larger is the effect.

Some continuous explanatory variables are also in logarithms; for such variables the corresponding coefficient is an elasticity -- the percent change in the dependent variable per percent change in the independent variable. Those not in logarithms are all rates of some sort (e.g., the average benefit reduction rate).(2)

Continuous variables are specified in changes in most cases. Some "flow" variables -- representing potential case openings (e.g., immigrants) or closings -- are specified in levels.(3)

The regression disturbance is assumed to follow a first-order autoregressive process, with different parameters in each state. Formally:

Equation 3.2: $\varepsilon_{st} = \rho_s \varepsilon_{s,t-1} + \nu_{st}$,

where $\rho_s$ is the autocorrelation coefficient for state s ($-1 < \rho_s < 1$), and $\nu_{st}$ is a random variable that is independent over time, but not across states. We assume that the variances and contemporaneous covariances of the $\nu_{st}$ are constant over time. This model is sometimes referred to as the "Parks" model (Parks, 1967).(4)
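The state-specific autoregressive step can be sketched as follows. This is only an illustration of Equation 3.2 on simulated disturbances; it is not the TSCSREG implementation, and it omits the cross-state covariance step of the Parks method. The state labels and parameter values are hypothetical.

```python
import numpy as np

def estimate_rho(resid):
    """Least-squares estimate of the first-order autocorrelation of one state's residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)

def quasi_difference(series, rho):
    """Transform z_t into z_t - rho * z_{t-1}, which removes AR(1) correlation."""
    series = np.asarray(series, dtype=float)
    return series[1:] - rho * series[:-1]

rng = np.random.default_rng(1)
true_rho = {"state_A": 0.6, "state_B": -0.3}  # hypothetical values
n_periods = 5000

rho_hat = {}
for state, rho in true_rho.items():
    nu = rng.normal(size=n_periods)           # innovations, independent over time
    eps = np.zeros(n_periods)
    for t in range(1, n_periods):
        eps[t] = rho * eps[t - 1] + nu[t]     # Equation 3.2 for one state
    rho_hat[state] = estimate_rho(eps)

print(rho_hat)
```

In a feasible generalized least squares procedure, each state's dependent and explanatory variables would be quasi-differenced with that state's estimated coefficient before the cross-state covariances are estimated.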

The Parks methodology has an important limitation -- the number of explanatory variables can be no larger than the number of time-series observations.(5) This prevented us from using the methodology for estimating models within subperiods and testing for stability across subperiods.(6) We did, however, estimate a UP model for all states using just the subperiod following the federal mandate, implemented in the fourth quarter of 1991, using an alternative specification. We assumed in this case that contemporaneous cross-state covariances of the disturbances are zero and that the autocorrelation coefficient is the same for all states. We also weighted each observation by the state's population.(7) To assess the influence of the alternative estimation methodology on the findings, we re-estimated our final caseload models for the full period via this method and compared the results to the Parks estimates.

C. MODELING CHANGES IN THE AGE DISTRIBUTION OF THE POPULATION

The age distribution of the population has changed substantially during the period under examination as the "baby boom" generation has aged. Since the proportion of households on AFDC varies by the age of household head, we developed variables to capture the effects of population age distribution changes on AFDC participation. These variables also capture the effects of population growth. For each program (Basic and UP), we developed three "expected participation" variables -- one each for caseloads, total recipients, and child recipients. Construction of these variables is discussed in Section C.1, below.

A second, related, issue is that some explanatory variables in the model can be expected to change with changes in the age distribution (e.g., the unemployment rate), but such changes would not be expected to have an impact on AFDC participation -- although they might be associated with changes in participation because AFDC participation is affected by changes in the age distribution. When feasible, we adjusted such variables to remove the effect of age distribution changes. We describe the construction of the adjusted variables in Section C.2.

1. "Expected" Participation

The value of the expected participation variable for a specific State and quarter is the level of participation we would expect if age-specific participation rates were the same as national monthly average age-specific participation rates in 1990. We chose 1990 as the base year because the national age-specific data needed to construct the variable is better in that year than in others, due to the Decennial Census.

For the Basic caseload, the age-specific participation rate is defined as the number of households in the Basic program headed by women in the age group divided by the number of women in the age group. Age-specific participation rates for the UP caseload are defined analogously, but using the number of households in the UP caseload, the age of the adult male in the household, and male population data. For expected recipients and expected child recipients in each program, we used the same scheme to classify households into age groups. The age-specific participation rate for recipients is the number of recipients in households in the age category divided by the number of women (Basic) or men (UP) in the age group, and the age-specific participation rate for child recipients is defined in the same way, but only including children. National age-specific participation rates for 1990 were estimated using the 1990 Survey of Income and Program Participation (see Chapter Four).

We constructed annual expected participation variables for each state by computing a weighted sum of the 1990 national age-specific participation rates, with the weight for each age group equal to the State's population of the relevant sex in the age category in the current year:

Equation 3.3: $A^*_{st} = \sum_a A_{a90} P_{ast}$

where $A^*_{st}$ is "expected" AFDC participation (i.e., expected caseload, recipients, or child recipients in one of the programs) in State s and year t, $A_{a90}$ is the 1990 national AFDC participation rate in age group a, and $P_{ast}$ is the size of the population of the relevant sex in State s and year t that is in age group a. The final step was to convert the annual series to quarterly series, using the methodology discussed in Section E, below.
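The weighted sum in Equation 3.3 can be illustrated with a small numerical sketch. The age groups, participation rates, and population counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical 1990 national participation rates (cases per woman) for three age groups.
rates_1990 = np.array([0.08, 0.05, 0.02])

# Hypothetical female population of one state by age group; one row per year.
pop_by_age = np.array([
    [100_000, 120_000,  90_000],   # year t
    [ 95_000, 125_000, 100_000],   # year t + 1
])

# Equation 3.3: expected participation = sum over age groups of rate * population.
expected = pop_by_age @ rates_1990

# The explanatory variable used in the models is the change in the logarithm.
d_log_expected = np.diff(np.log(expected))
print(expected, d_log_expected)
```

In this example expected caseloads are 15,800 and 15,850, so the expected participation variable rises slightly even though the youngest, highest-rate group shrinks, because the older groups grow.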

The change in the logarithm of each expected participation variable is used as an explanatory variable in the relevant participation equation. The coefficient of this variable can be interpreted as the percent change in the caseload associated with a one percent increase in the size of the population, holding the age distribution of the population and other explanatory variables constant. Hence, we would expect these coefficients to be close to one. In the initial models we estimated, the coefficients of these variables were highly significant, but not significantly different from one. In the models reported here, we have constrained the coefficients of these variables to be one.

2. Age-adjusted Explanatory Variables

Each age-adjusted variable is the logarithm of the ratio of the unadjusted variable to its "expected" value. Expected values are computed analogously to the computation of expected participation: they are a weighted average of age-specific national values for 1990, with the weight for each age group equal to the share of the State's population in the age category in the current year. Mathematically, the index variable ($X_{st}$) is defined by:

Equation 3.4: $X_{st} = \ln(W_{st} / W^*_{st})$

where $W_{st}$ is the unadjusted variable, and $W^*_{st}$ is the "expected" value of the variable, defined as:

Equation 3.5: $W^*_{st} = \sum_a W_{a90} P_{ast} / P_{st}$

where $W_{a90}$ is the 1990 national value of the variable for age group a and $P_{st}$ is the total population of State s in period t. Thus, for example, the unemployment rate index for State A in 1982 is the log of the actual rate divided by the rate we expect given the age distribution of State A's working age population in 1982 and national unemployment rates by age in 1990. Annual figures were converted to quarterly series as described in Section E, below.
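A minimal sketch of the index computation in Equations 3.4 and 3.5 follows. The function name and all numbers are hypothetical; they are not the rates or populations used in the analysis.

```python
import numpy as np

def age_adjusted_index(actual, national_by_age_1990, pop_by_age):
    """Return (index, expected) where expected is the population-share-weighted
    average of 1990 national age-specific values and index = ln(actual / expected)."""
    pop_by_age = np.asarray(pop_by_age, dtype=float)
    shares = pop_by_age / pop_by_age.sum()           # P_ast / P_st
    expected = float(np.dot(national_by_age_1990, shares))
    return float(np.log(actual / expected)), expected

# Hypothetical 1990 national unemployment rates for three age groups.
national_rates = np.array([0.11, 0.05, 0.04])
# Hypothetical working-age population of one state by age group (thousands).
state_pop = np.array([200.0, 500.0, 300.0])

index, expected_rate = age_adjusted_index(0.0885, national_rates, state_pop)
print(expected_rate, index)
```

Here the expected rate is 5.9 percent, the actual rate is 8.85 percent, and the index is ln(1.5), about 0.405. If the state's population shifted toward the youngest group, the expected rate would rise and the index would fall for the same actual rate.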

The complexity of the construction of each age-adjusted variable may diminish the ability of policy makers and others to understand and use the findings. The interpretation of the results and their potential use are not as difficult as they may first appear, however, and the results may be substantially more useful if adjusted variables are used than if unadjusted ones are used.

To illustrate, consider the logarithm of the age-adjusted unemployment rate as it would appear in a typical model:

Equation 3.6: $\ln(A_{st}) = \dots + \beta_u \ln(u_{st} / u^*_{st}) + \dots = \dots + \beta_u \ln(u_{st}) - \beta_u \ln(u^*_{st}) + \dots$

where $u_{st}$ is the unemployment rate and $u^*_{st}$ is the "expected" rate, defined as in Equation 3.5. It is apparent from the second representation of the unemployment term above that the coefficient of the age-adjusted variable can be interpreted as the elasticity of the caseload with respect to unemployment holding the other explanatory variables constant, and provided that the unemployment rate change is not due to a change in expected unemployment -- i.e., not due to a change in the age distribution of the population. This is no different than what the interpretation would be if we used the logarithm of the unadjusted unemployment rate as the explanatory variable, except in that case the interpretation would apply to changes due to changes in the age distribution of the population as well as any others. This also illustrates the reason for making an age adjustment: we would not expect a change in unemployment that is due to a change in the age distribution of the population to have the same impact on AFDC participation as a change that is due to the business cycle. In fact, we would expect it to have no effect other than the effect that is accounted for by the expected participation variables.

Continuing the illustration, contingency loans to States, intended to help them finance their AFDC payments during a recession, could be tied to the unemployment rate, with the maximum loan amount related to the gap between the unemployment rate and some "standard" unemployment rate. Some specific value for the unemployment rate would be the simplest choice for the standard rate, but the fact that unemployment rates vary because of changes in the age distribution of the population, not just because of the business cycle, means that the "standard" that would be appropriate for a given age distribution would be inappropriate for another one. The expected unemployment rate could be used as the standard instead, thereby recognizing the effect of a change in the age distribution of the population on the unemployment rate. Under the latter system, the maximum loan amount would be insulated from changes in the unemployment rate that are caused by changes in the age distribution of the population rather than by the business cycle.

D. MODELING DELAYED IMPACTS

There are strong reasons to believe that the impacts of changes in many determinants of AFDC participation are delayed. The most prominent example is the unemployment rate; substantial evidence already exists that increases in the unemployment rate have their full impact on participation only after several quarters have passed (see Lewin-VHI, 1995a).

The simplest way to capture delayed impacts of a specific explanatory variable is to include "lagged" (i.e., previous period) values of the variable as separate explanatory variables. For instance, in all models we include the current quarter's unemployment rate, the previous quarter's rate ("first lag"), the rate from the second previous quarter ("second lag"), etc., for as many as nine quarters. If $\Delta \ln U_{s,t-l}$ is the change in the log of the age-adjusted unemployment rate lagged $l$ periods, and $\beta_l$ is its coefficient, the unemployment rate specification can be represented as:

Equation 3.7: $\Delta \ln(Y_{st}) = \dots + \beta_0 \Delta \ln U_{st} + \beta_1 \Delta \ln U_{s,t-1} + \dots + \beta_l \Delta \ln U_{s,t-l} + \dots + \beta_L \Delta \ln U_{s,t-L} + \dots$

where $L$ is the longest lag length. The sum of the coefficients of the current and lagged values is the total, or long-run, elasticity of participation with respect to a permanent increase in the age-adjusted unemployment rate.(8)
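The lag structure in Equation 3.7 can be sketched for a single state as follows. The series is simulated and noise-free, the lag coefficients are hypothetical, and ordinary least squares stands in for the estimation method described in Section B.

```python
import numpy as np

def lag_matrix(x, max_lag):
    """Columns are x_t, x_{t-1}, ..., x_{t-max_lag}; rows start at t = max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.column_stack([x[max_lag - l : n - l] for l in range(max_lag + 1)])

rng = np.random.default_rng(2)
max_lag = 3
true_b = np.array([0.10, 0.20, 0.15, 0.05])   # hypothetical lag coefficients

d_ln_u = rng.normal(size=200)                 # change in log unemployment rate
X = lag_matrix(d_ln_u, max_lag)
d_ln_y = X @ true_b                           # change in log participation

b_hat, *_ = np.linalg.lstsq(X, d_ln_y, rcond=None)
long_run_elasticity = b_hat.sum()             # sum of current and lagged coefficients
print(b_hat, long_run_elasticity)
```

With these hypothetical coefficients the long-run elasticity is 0.5: a permanent one percent increase in the unemployment rate raises participation by 0.1 percent in the current quarter and by 0.5 percent once all lags have worked through.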

E. CONVERTING ANNUAL SERIES TO QUARTERLY SERIES

Some of the series to be used in the analysis are available only annually. In order to use them, we created quarterly series that fit the annual series exactly and that also follow a smooth pattern across quarters in each year. An example is a State's population in a specific age group. For most of the annual series, we utilized a method that first fits a smooth curve called a "cubic spline" to the annual series and then uses the fitted curve to generate quarterly series.(9) In some instances, however, this method produced unreasonable quarterly values. When this was the case, we used a method that first fits a linear spline to the annual series and then uses the fitted curve to generate the quarterly series.(10) The exact method applied to each series depends on the nature of the original series -- stock data are treated differently than flow data and "end-of-year" series are treated differently than "annual averages."(11) The fact that we had to estimate quarterly values from annual series implies that the quarterly values are measured with error. This measurement error presumably biases the coefficients of the affected variables toward zero.
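The linear spline variant can be sketched for the simplest case, an end-of-year stock series observed in consecutive years. This is an illustration only, not the SAS EXPAND procedure; the function name and data are hypothetical, and the cubic spline and flow-data cases are not covered.

```python
import numpy as np

def annual_to_quarterly_linear(years, values):
    """Interpolate end-of-year stock values to quarter-end values.

    Assumes consecutive years; each annual value is treated as the
    fourth-quarter observation, so the output reproduces it exactly.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    n_quarters = 4 * (len(years) - 1) + 1
    quarter_times = np.linspace(years[0], years[-1], n_quarters)
    return np.interp(quarter_times, years, values)

# Hypothetical end-of-year population (thousands) in one age group.
quarterly = annual_to_quarterly_linear([1990, 1991, 1992], [100.0, 108.0, 104.0])
print(quarterly)
```

Every fourth quarterly value equals the corresponding annual observation, which is the sense in which the quarterly series fits the annual series exactly.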

F. SIMULATIONS

We use the regression estimates to simulate the estimated effects of all explanatory variables, and selected subsets of the explanatory variables, on historical growth in AFDC participation over various subperiods of our sample.

For each simulation, we first calculate the change in the log of participation from the first quarter in the period to the last that is explained by the changes in the relevant explanatory variables by state. This change is divided by the number of quarters in the period and multiplied by four to get an estimate of the average annual rate of change due to the set of explanatory variables in the state over the period. We report these results directly for selected states.

To obtain a national average rate of change due to the explanatory variables, we compute the weighted average of the state changes using weights that are proportional to average participation in each state over the entire period.
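The two simulation steps can be sketched as follows, assuming the explained quarterly changes in log participation have already been computed from the regression estimates. The function names, the two states, and all values are hypothetical.

```python
import numpy as np

def annualized_change(explained_dlog):
    """Sum the explained quarterly log changes over the period, divide by the
    number of quarters, and multiply by four to get an average annual rate."""
    d = np.asarray(explained_dlog, dtype=float)
    return d.sum(axis=-1) / d.shape[-1] * 4.0

def national_average(state_rates, avg_participation):
    """Weighted average of state rates, with weights proportional to
    average participation in each state over the period."""
    w = np.asarray(avg_participation, dtype=float)
    return float(np.dot(state_rates, w / w.sum()))

# Hypothetical explained changes for two states over eight quarters.
explained = np.array([
    [0.010] * 8,   # state A
    [0.005] * 8,   # state B
])
state_rates = annualized_change(explained)
national_rate = national_average(state_rates, [3000.0, 1000.0])
print(state_rates, national_rate)
```

In this example the state rates are 0.04 and 0.02 log points per year, and the national rate is 0.035 because the larger state receives three times the weight of the smaller one.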

1. If we had included a dummy variable for each quarter of the entire period, we would have eliminated entirely the role of purely time-series covariation in determining the coefficients. Our quarterly and yearly dummies are more restrictive than such a specification, but we think the difference is inconsequential. We also include separate dummies for OBRA81 and DEFRA84 in the reported results, to capture any nationwide effects of the implementation of those laws that were missed by the year and quarterly dummies. We also tried dummies for implementation of other legislation, but the estimated coefficients were not at all significant.

2. Grossman (1985) provides an example of a State model in which the dependent variable, caseloads, is in levels. One of the explanatory variables in the model is a dummy for OBRA-81 implementation. The coefficient of this dummy represents the estimated effect of implementation on the level of the caseload in each State under the implicit assumption that the effect on the level is the same in all States. It would be more reasonable to assume that the effect in each State would be proportional to the size of the caseload -- the assumption implicit if the dependent variable were in logarithms.

3. We initially specified vital statistic variables -- marriages, out-of-wedlock births, and divorces -- in levels but switched to a change specification after finding that the latter had substantially more predictive power.

4. The autocorrelation parameter for each state could be negative because the dependent variable is a first difference. A value of zero in the first-difference specification corresponds to a value of one in a levels specification. We estimated the model using the SAS/ETS, Release 6.10, procedure TSCSREG. We adjusted the standard errors and t-statistics obtained from SAS because of an error in the program that was confirmed by the SAS Institute. The standard errors reported by SAS were multiplied by $[T/(T-P)]^{0.5}$, where T is the number of quarters in the sample period and P is the number of explanatory variables in the equation, and the t-statistics were divided by the same factor.

5. This limitation can be solved by imposing more structure on the variance and covariance parameters in the model, but this could not be implemented with TSCSREG.

6. For the same reason, we did not test for fixed state effects as originally planned. This would have required specifying the model in levels rather than changes and including 51 state dummies for the fixed effects, which was not possible given the length of the time series. Based on our earlier work concerning participation in SSA's disability programs, however, we were confident that we would not have rejected the fixed effect model.

7. The estimation was performed using the SAS procedure MODEL.

8. We initially planned to use polynomial distributed lags to impose some structure on the coefficients of the lagged unemployment rate variables because we expected collinearity among the lagged values to result in erratic patterns of the estimated coefficients. Collinearity was not, however, a serious problem, even with as many as nine lags.

9. The cubic spline specification requires each adjacent pair of annual observations to fall on a cubic function of time. The cubic function can be different for each adjacent pair, but the first and second derivatives of the two functions passing through a specific year's value are constrained to be the same.

10. The linear spline specification fits a continuous curve to the data by connecting successive input values with straight line segments. This method was used to produce the following quarterly series: IMMGTOTL, IRCA, MEDGAIN, MEDFAM3, SSIKIDS, and ZEBLEY. These variables are defined in Chapter 4.

11. This method is implemented using the EXPAND procedure in the Economic Time Series (ETS) component of SAS (see SAS/ETS User's Guide, 1993).