
Description of Variables

This section describes the creation of variables used in the analysis of this report. We group the variables into four categories: adult outcome variables, youth risky behavior variables, family environment variables, and other explanatory variables of adult outcomes.
Because of the longitudinal nature of the data, many variables were constructed using multiple waves of data from the NLSY79. Means and sample sizes for each variable used in the analysis are presented at the end of this section.


Adult Outcome Variables

The adult outcomes have four domains with a total of 10 measures.
The health domain includes two measures: alcohol abuse or dependence around age 30; drug use (marijuana or cocaine) in the past month around age 30.
A series of questions, asked during the 1982-1985, 1988-89, 1992, and 1994 surveys, elicited information on the development of drinking patterns, quantity of various alcoholic beverages consumed, frequency of use, impact of consumption on schoolwork and/or job performance, and types of physiological and behavioral dependency symptoms. Although the types and wording of alcohol abuse or dependence questions vary over the years, the items in the 1989 and 1994 surveys are almost identical. We use the 1989 data for those born in 1957-1960 and the 1994 data for those born in 1961-1964. The respondents were between ages 29-32 during these years. Diagnoses of current alcohol abuse and dependence were derived using a virtually identical set of 29 symptom-item questions in the 1989 and 1994 surveys designed to operationalize the DSM-III-R criteria for abuse and dependence. The creation of the measure of alcohol abuse or dependence follows a set of complicated criteria provided by Harford, and Grant (1994). Alcohol abuse and alcohol dependence are different measures of alcohol problems. We combine them to form a single variable that takes the value of one if either condition is met.
Questions on substance use were included in the 1984, 1988, 1992, 1994, and 1998 surveys. Among other usage information collected, these surveys collected information on the most recent use of marijuana and cocaine, from which the measure of past-month drug use was created. The 1988 data were used for those born in 1957-1960 and the 1994 data were used for those born in 1961-1964. Respondents were ages 28-32 in these years. Past-month drug use measures whether a respondent used marijuana or cocaine in the past month. We chose not to use a measure of drug use in the past year, another common measure, in order to reduce the likelihood of measuring occasional recreational use. We implicitly assume that adults who used these drugs in the past month are more likely to be regular drug users.
The economic domain has six measures: ever under the poverty line between ages 25-29; number of years in poverty between the ages of 25-29; ever on welfare (AFDC/TANF or food stamps) between ages 21-33; number of years on welfare between ages 21-33; percent of employable time spent employed between the end of formal schooling and age 33; and age when found a "steady" job, i.e. working at least 2 years for a single employer since leaving formal schooling.
Variables have been created in the NLSY database for each survey year (1979-1998) that indicate whether or not a respondent's total family income for the past calendar year was above or below the poverty level for a given family size. Our two poverty variables were created based on these variables from 1979-1994. The interview frequency of the survey changed to biennial after 1994 and the poverty information was not available for the missing years. As a result, the oldest age at which we could observe all individuals is 29. We chose 25 as the starting age to avoid classifying respondents in college as in poverty. If a respondent was in poverty in any of the years when he/she was 25-29 years old, a value of "1" was assigned to the respondent for the measure of ever under the poverty line between ages 25-29; otherwise, a value of "0" was assigned.
The "income" section of each year's questionnaire collects information on amounts and time periods during which cash and non-cash benefits were received from various sources of public assistance. The universe and type of data collected vary across survey years. This report focuses on AFDC/TANF and food stamps, since they are the major components of public assistance and were consistently collected over all survey years. Unlike poverty, information on welfare was collected using event histories and therefore information is available for every year from 1978-1997 even if the survey was not fielded in some years. Similar to the poverty measures, if a respondent received AFDC and/or food stamps in any of the years when he/she was 21-33 years old, a value of "1" was assigned to the respondent for the measure of ever on welfare between ages 21-33; otherwise, a value of "0" was assigned. The measure of years on welfare between the ages of 21-33 sums the years on welfare between the ages of 21-33.
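The "ever" indicators and the year counts described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `ever_and_years`, the age-keyed dictionary layout, and the decision to skip ages with no record are all assumptions made for the example.

```python
def ever_and_years(status_by_age, min_age, max_age):
    """Return (ever, years) for the age window [min_age, max_age].

    status_by_age maps an age to 1 (in poverty / on welfare that year) or 0.
    Ages without a record are skipped (an assumption of this sketch).
    """
    window = [status_by_age[age]
              for age in range(min_age, max_age + 1)
              if age in status_by_age]
    if not window:
        return None, None          # nothing observed in the window
    years = sum(window)            # number of years with the condition
    ever = 1 if years > 0 else 0   # 1 if the condition held in any year
    return ever, years


# Hypothetical respondent: in poverty at ages 26 and 27 only.
poverty = {25: 0, 26: 1, 27: 1, 28: 0, 29: 0}
print(ever_and_years(poverty, 25, 29))  # (1, 2)
```

The same function would apply to the welfare measures by passing the 21-33 window.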
The other two economic measures (percent time employed and tenure) were created using both the NLSY79 main and Work History Files. The NLSY79 Work History File was constructed from work experience data collected during the main NLSY79 surveys. It provides a week-by-week longitudinal work record of each respondent from January 1, 1978 through the current survey date. Because continued education may take time away from working, we measured percent time employed and tenure from the end of formal schooling to the age of 33, which gives every respondent a consistent starting point. The age at which a respondent ended his/her formal education was created using two sets of created variables that summarize each respondent's school enrollment status and highest grade completed as of May 1 of each survey year from the NLSY79 main file. ^{25}
"Percent of time employed" for each respondent is calculated as the ratio of weeks employed to weeks in the labor force between the end of formal schooling and age 33. This measure is intended to capture attachment to the labor force. We chose to limit this measure to time in the labor force so as not to "penalize" women for childbearing. Although some individuals maintain an attachment to the labor force, they have difficulty keeping a steady job. We define a "steady job" as one lasting at least two years. For a discussion of measuring the transition from school to work, see Pergamit (1995).
To calculate the age at which a respondent first reached two years of tenure after the end of formal schooling, we first calculated the age at which each listed job started. This starting age was then compared with the age when formal schooling ended. If the job started before formal schooling ended, the starting point for the job was replaced by the age when formal schooling ended. Tenure was calculated for each listed job as the difference between its starting and ending points. The age when two years of tenure was first reached was obtained by choosing the minimum age across jobs in which two years of tenure was reached. One limitation of this variable is that college graduates cannot achieve two years of tenure at ages as early as those with less education (see Table 6). Thus, they appear to do worse on this outcome. The impact on the analysis, however, is small, since college graduates generally attain two years of tenure quickly, leaving few such individuals in the right tail of the distribution.
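The tenure calculation can be sketched as below. This is an illustration under stated assumptions rather than the algorithm actually used: ages are taken to be fractional years, each job is a hypothetical `(start_age, end_age)` pair, and the age at which two years of tenure is reached is assumed to be the (clipped) start age plus two.

```python
def age_first_two_year_tenure(jobs, school_end_age, required_years=2.0):
    """Earliest age at which any job reached `required_years` of tenure.

    jobs: list of (start_age, end_age) pairs in fractional years.
    Job starts before the end of formal schooling are moved forward
    to school_end_age, as described in the text.
    """
    ages_reached = []
    for start_age, end_age in jobs:
        start = max(start_age, school_end_age)   # clip to end of schooling
        tenure = end_age - start                 # tenure counted after schooling
        if tenure >= required_years:
            ages_reached.append(start + required_years)
    return min(ages_reached) if ages_reached else None


# Hypothetical work history for a respondent who left school at age 19.
jobs = [(17.0, 18.5), (18.0, 21.0), (22.0, 25.0)]
print(age_first_two_year_tenure(jobs, school_end_age=19.0))  # 21.0
```

Respondents for whom the function returns `None` have not reached two years of tenure in any listed job.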
The family formation domain has one measure, which is a measure of six combinations of marriage and fertility status at the age of 33. Marital status and fertility status at the age of 33 were first created separately.
A series of edited Supplemental Fertility File variables that reflects the beginning and ending dates of marriages was constructed for 1982 through 1998. This information is derived from the marriage section of the NLSY79 questionnaire. We used the information to create length of first marriage and assigned marital status at the age of 33. Marital status at the age of 33 has three categories: never married, married and stayed married, and married but divorced at the age of 33.
Beginning in 1982, every wave of the NLSY79 data release has included a created variable that tracks the age at which respondents first had a child. This information was used to create fertility status at the age of 33, that is, whether a respondent had a child by the age of 33.
The marital status variable and the fertility status variable were then combined to create a six-combination measure of marriage and fertility status at the age of 33. The categories of this measure are: never married without children, never married with children, married without children, married with children, married but divorced without children, and married but divorced with children.
The crime domain has one measure: ever been in jail by age 33. NLSY79 respondents are followed and interviewed even when they enter an institution. Interviewers designate the "Type of Residence" which identifies those respondents who resided in jail at each interview date. This information was used to create a binary measure of ever being in jail by the age of 33. Since some jail terms will have occurred between interviews, this is an underestimate of the number of respondents who ever spent time in jail.


Youth Risky Behavior Variables

We explore five youth risky behaviors: alcohol usage, marijuana usage, cocaine usage, sexual activity, and delinquency. The first four behaviors are measured by age of initiation. Age when started a risky behavior was asked in multiple waves of interviews. In general, the value taken was from the earliest year the question was asked. However, to overcome potential data entry errors, if the reported age of initiation was less than 11, then a later year's entry was taken. Those who still reported an age of initiation less than 11 were included in the youngest age group. Age when started to use alcohol was asked in both the 1982 and the 1983 interviews. Age of marijuana and cocaine initiation was created using the 1984 and 1998 questions on drug initiation. Age of sex initiation was created using the 1983, 1984, and 1985 questions on sex initiation. Age of initiation is grouped into four categories: 11-15, 16-17, 18-19, and not by age 19 (i.e. after age 19 or never initiated).
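The selection and grouping rules for age of initiation can be sketched as follows. The function names and the list-of-reports input are hypothetical, and the handling of missing reports (`None`) is an assumption; the sketch only mirrors the rules stated in the paragraph above.

```python
def initiation_age(reports):
    """Pick one age of initiation from reports ordered earliest wave first.

    None means no initiation was reported in that wave. The earliest report
    is used unless it is below 11, in which case a later report of 11 or
    more is preferred; if none exists, the earliest report is kept.
    """
    valid = [age for age in reports if age is not None]
    if not valid:
        return None
    for age in valid:
        if age >= 11:
            return age
    return valid[0]


def initiation_group(age):
    """Map an age of initiation to the four categories used in the report."""
    if age is None or age > 19:
        return "not by 19"
    if age <= 15:            # ages below 11 fall in the youngest group
        return "11-15"
    if age <= 17:
        return "16-17"
    return "18-19"


print(initiation_group(initiation_age([9, 14])))   # 11-15
print(initiation_group(initiation_age([None])))    # not by 19
```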
Delinquency was measured as the total number of delinquent and/or criminal acts reported in the 1980 interview. The 1980 NLSY79 included a self-reported section detailing respondents' participation in and income from delinquent or criminal activities such as skipping school, alcohol/marijuana use, vandalism, shoplifting, drug dealing, robbery, assault, or gambling during the previous twelve-month period. We excluded alcohol and drug use from our count so as not to double-count with the other risky behavior measures. Measures of the total number of delinquent/criminal acts and the types of delinquent/criminal acts were both created. Since the distributions of both measures are very similar, we opt to use only the first measure. This measure could be problematic because of the age differences among respondents in 1980. However, the chosen measure predicted the likelihood of spending time in jail very well. Further research could separate personal from property crimes, a distinction frequently made in the criminal behavior literature. The number of delinquent/criminal activities was grouped into four categories: no delinquent/criminal acts, 1-2, 3-8, and 9 or more.


Family Environment Variables

Family Structure: In 1979, data were collected on whom respondents lived with at the age of 14. Twenty-eight categories of living arrangements were collapsed into six family structure categories: living with both biological parents, with single mother, with single father, with mother and stepfather, with father and stepmother, and other relatives and/or non-relatives (including institutions). If a mother (father) was living with a man (woman) to whom she (he) is not married, we consider that person a stepfather (stepmother).
Parental Educational Achievement: Highest grade completed was collected for respondents' biological mothers and biological fathers in 1979. Parental educational achievement was broken down into four categories: high school dropout, high school graduate, some college, and college graduate.
Parental Alcohol Problems: The 1988 NLSY79 interview collected family history on alcoholism and problem drinking. Respondents were asked whether they had any relatives who had been alcoholics/problem drinkers at any time, their relationship to the alcoholic relatives, and the number of years they lived with the alcoholic relatives. Based on this information, we created a binary measure of whether a respondent had at least one alcoholic parent whom he/she lived with for at least one year.


Other Explanatory Variables

Sex: Each respondent was assigned the value of "1" if he/she was identified as "male"; otherwise the value of "0" was assigned.
Race and Ethnicity: The NLSY79 distinguishes among three mutually exclusive and exhaustive race-ethnicity groups. A respondent was designated as "non-black, non-Hispanic", "black", or "Hispanic" based on a racial/ethnic variable from the NLSY79 Screener File. Because Hispanic is treated as a mutually exclusive group, this classification differs from the one used for Census Bureau tabulations, where Hispanics can also be classified as "black" or "white."
Educational Achievement: Data have been collected during each NLSY79 survey on respondents' current school enrollment status, highest grade attended and highest grade completed. Information on highest grade completed from 1979 through 1998 was compiled to obtain a measure for the respondent's educational achievement. There are three categories for this measure: high school dropout, high school graduate, and some college and up.
Rosenberg Self-Esteem Scale: The Rosenberg Self-Esteem Scale administered during the 1980 interview was used to construct a summary score that measures the self-evaluation that an individual makes and customarily maintains. It describes a degree of approval or disapproval toward oneself (Rosenberg, 1965). The summary score ranges from 4 to 40, with higher scores designating higher self-esteem.

Table A1. Means and Sample Sizes of Variables Used in the Analysis

Variable                                                              Sample Size  Weighted Mean

Youth Risky Behavior Variables
Age of Alcohol Initiation
  11-15                                                               9717          20.4%
  16-17                                                               9717          37.2%
  18-19                                                               9717          26.9%
  not by 19                                                           9717          15.5%
Age of Marijuana Initiation
  11-15                                                               8595          23.2%
  16-17                                                               8595          22.3%
  18-19                                                               8595          12.1%
  not by 19                                                           8595          42.4%
Age of Cocaine Initiation
  11-15                                                               8569           1.2%
  16-17                                                               8569           4.7%
  18-19                                                               8569           7.5%
  not by 19                                                           8569          86.5%
Age of Sex Initiation
  11-15                                                               9640          21.7%
  16-17                                                               9640          34.7%
  18-19                                                               9640          23.8%
  not by 19                                                           9640          19.8%
Number of Times Committed Crimes/Delinquencies
  9+                                                                  9320          15.1%
  3-8                                                                 9320          25.3%
  1-2                                                                 9320          22.7%
  0                                                                   9320          37.0%

Adult Outcome Variables
Alcohol Abuse or Dependence around Age 30                             9986          13.8%
Past-month Drug Use around Age 30                                     8803          11.1%
Ever Being in Jail by Age 33                                          9986           3.7%
Ever in Poverty between Ages 25-29                                    9986          23.5%
Ever on Welfare between Ages 21-33                                    9986          25.0%
Number of Years in Poverty at Ages 25-29                              9430           0.51
Number of Years on Welfare at Ages 21-33                              9986           1.08
Percent Time Employed between the End of Formal Schooling and Age 33  9801           0.91
Age when Reached 2 Years of Tenure after Formal Schooling             8467          24.97
Marital and Fertility Status at Age 33
  Never married without children                                      9144          15.3%
  Never married with children                                         9144           6.4%
  Married without children                                            9144          11.9%
  Married with children                                               9144          43.6%
  Married but divorced without children                               9144           5.2%
  Married but divorced with children                                  9144          17.7%

Other Explanatory Variables
Male                                                                  9986          50.9%
Race/Ethnicity
  Non-black non-Hispanic                                              9986          79.3%
  Black                                                               9986          14.2%
  Hispanic                                                            9986           6.6%
Educational Attainment
  High school dropout                                                 8399           9.5%
  High school graduate                                                8399          42.8%
  Some college and up                                                 8399          47.6%
Rosenberg Self-esteem Scale (4-40)                                    9555          32.46
Family Type
  Both biological parents                                             9847          75.2%
  Single mother                                                       9847          12.1%
  Single father                                                       9847           1.2%
  Mother and stepfather                                               9847           7.6%
  Father and stepmother                                               9847           1.8%
  Others (relatives, non-relatives, institution)                      9847           2.2%
Mother's Educational Attainment
  High school dropout                                                 9327          32.0%
  High school graduate                                                9327          46.9%
  Some college                                                        9327          11.1%
  College graduate                                                    9327          10.0%
Father's Educational Attainment
  High school dropout                                                 8478          33.7%
  High school graduate                                                8478          36.8%
  Some college                                                        8478          11.1%
  College graduate                                                    8478          18.4%
Biological Parent with Drinking Problem                               8094          20.3%


Description of Multivariate Regression Estimates

To obtain accurate estimates of the effects of youth risky behaviors and family environment on adult outcomes, different regression techniques were used for different adult outcomes. Many econometric textbooks provide good discussions of the principles and practice of the regression analyses used in this study, e.g. Greene (2000) and Judge et al. (1985). These regression techniques were applied to assess the independent effects of youth risky behaviors and family environment on adult outcomes. By independent effects, we mean the effects after other important factors affecting adult outcomes have been controlled or adjusted for. These "effects" are associations between the dependent and independent variables and do not necessarily reflect a causal relationship.


OLS Regression

Ordinary Least Squares (OLS) regression is used to estimate continuous outcome variables that are normally distributed. In this report, percent time employed between the end of formal schooling and the age of 33 is estimated using OLS regression.
Since OLS regression assumes a linear function, the interpretation of estimated coefficients is simple and straightforward. Estimated coefficients from OLS regressions measure changes in the outcome variable resulting from a unit change in an explanatory variable. For example, in estimating percent time employed, we obtained an estimated coefficient of 0.02 for males relative to females. This means that the percent time employed for male respondents was on average 0.02 higher than that of female respondents.
Most explanatory variables in our analysis are categorical variables. In estimating outcome models, one of the categories of each categorical variable has to be dropped due to collinearity. The omitted category becomes the reference group. Any estimates for other categories become relative to the reference group. For example, age of marijuana initiation has four categories: initiated at ages 11-15, at ages 16-17, at ages 18-19, and at ages older than 19 or never initiated. If the reference group is "initiated at ages 11-15" and the estimated parameter for the group who initiated at ages 16-17 is 0.003 in the OLS regression of percent time employed, this means that those who initiated at ages 16-17 had on average percent time employed 0.003 higher than those who initiated at ages 11-15.
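The role of the reference group can be illustrated with a small sketch. The data are made up, and the helper names `dummy_code` and `contrasts_vs_reference` are hypothetical. In an OLS regression that contains only an intercept and the indicator columns for one categorical variable, each coefficient equals the difference between that category's mean outcome and the reference category's mean; the report's models include other covariates as well, so its coefficients are adjusted differences rather than these raw ones.

```python
from statistics import mean


def dummy_code(values, categories, reference):
    """Return (kept_categories, rows): one 0/1 column per non-reference category."""
    kept = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


def contrasts_vs_reference(values, outcome, categories, reference):
    """Mean outcome of each category minus the reference category's mean."""
    def group_mean(category):
        return mean(y for v, y in zip(values, outcome) if v == category)

    ref_mean = group_mean(reference)
    return {c: group_mean(c) - ref_mean for c in categories if c != reference}


# Made-up data: age-of-initiation group and percent time employed.
categories = ["11-15", "16-17", "18-19", "not by 19"]
groups = ["11-15", "16-17", "11-15", "18-19", "16-17", "not by 19"]
employed = [0.80, 0.85, 0.90, 0.95, 0.89, 0.93]

kept, rows = dummy_code(groups, categories, reference="11-15")
effects = contrasts_vs_reference(groups, employed, categories, "11-15")
print(kept)      # ['16-17', '18-19', 'not by 19']
print(rows[0])   # [0, 0, 0]  (a reference-group respondent)
```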


Logistic Regression

Logistic regression is used for modeling outcomes that are binary (1/0) variables. A linear probability model has a number of shortcomings in estimating binary dependent variables (Judge et al., 1985; Cox and Snell, 1989). Adult outcomes that are binary in our report are alcohol abuse or dependence, past-month drug use, ever being in jail by the age of 33, ever being under poverty at ages 25-29, and ever being on welfare at ages 21-33.
Coefficient estimates from logistic regression are not easy to interpret directly. Instead, odds ratios, an alternative and preferred measure, are used to present estimated results. An odds ratio compares the odds of the outcome (the probability that it occurs divided by the probability that it does not) in one group with the odds in the reference group. For example, in estimating the probability of alcohol abuse or dependence, we obtained an odds ratio of 2.25 for males relative to females. The interpretation is that the odds of alcohol abuse or dependence among male respondents in the sample were 2.25 times the odds among female respondents in the sample. If an odds ratio for a group is greater than 1, then this group is more likely to end up with the outcome than the reference group (whose odds ratio is 1); if an odds ratio is less than 1, then this group is less likely to end up with the outcome than the reference group.
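A short sketch may help relate odds ratios to probabilities. The function names are hypothetical and the probabilities are invented for illustration. Note that an odds ratio coincides with a ratio of probabilities only approximately, and only when the outcome is rare in both groups.

```python
import math


def odds(p):
    """Odds corresponding to probability p (0 <= p < 1)."""
    return p / (1.0 - p)


def odds_ratio(p_group, p_reference):
    """Odds of the outcome in a group relative to the reference group."""
    return odds(p_group) / odds(p_reference)


def odds_ratio_from_coefficient(beta):
    """A logistic regression coefficient converts to an odds ratio via exp."""
    return math.exp(beta)


def apply_odds_ratio(p_reference, ratio):
    """Probability implied for a group, given the reference probability and an odds ratio."""
    group_odds = odds(p_reference) * ratio
    return group_odds / (1.0 + group_odds)


# Invented reference probability of 0.08 combined with an odds ratio of 2.25.
p_group = apply_odds_ratio(0.08, 2.25)
print(round(p_group, 3))         # about 0.164
print(round(p_group / 0.08, 2))  # ratio of probabilities, about 2.05
```

In this invented case the probability ratio is smaller than the odds ratio, which is why the two should not be read interchangeably.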


Multinomial Logistic Regression

Estimation of unordered-choice dependent variables requires a multinomial logistic model (Greene, 2000). It is intended for use when the dependent variable takes on more than two outcomes and the outcomes have no natural ordering. In our study, the family formation variable, "fertility and marital status at the age of 33", takes on six outcomes without natural ordering. Similar to logistic regression, multinomial logistic regression provides a measure of the probability of one outcome relative to the reference outcome, known as relative risk. However, it is more difficult to interpret the relative risk from multinomial logistic regression since there are multiple equations. As an alternative, prediction is used to aid interpretation. We use the "method of recycled predictions", in which we vary characteristics of interest across the whole data set and average the predictions (STATA Reference, version 7). For example, in our data set we have those who initiated sex at ages 11-15, 16-17, 18-19, and those who had not initiated sex by age 19. We first assume that all respondents initiated sex at ages 11-15 but hold their other characteristics constant. We then calculate the probabilities for each fertility and marriage outcome. We repeat this exercise for the other three initiation groups. The difference between any two sets of calculated probabilities, then, is the difference due to different ages of sex initiation, holding other characteristics constant. For example, the predicted probabilities of "never married with children" for the four age categories of sex initiation are 0.112, 0.087, 0.075, and 0.042, respectively.
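The method of recycled predictions can be sketched as follows. This is an illustration with invented coefficients and a two-outcome toy model, not the report's estimates. The names `mnl_probs` and `recycled_predictions`, the column layout (intercept, initiation indicator, male), and the outcome labels are all assumptions.

```python
import math


def mnl_probs(x, coefs):
    """Multinomial logit probabilities for one respondent.

    coefs maps each outcome to a coefficient list; the reference
    outcome carries all-zero coefficients.
    """
    scores = {k: sum(b * v for b, v in zip(beta, x)) for k, beta in coefs.items()}
    top = max(scores.values())                       # subtract max for stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def recycled_predictions(rows, coefs, set_columns):
    """Average predicted probabilities after overwriting set_columns in every row."""
    sums = {k: 0.0 for k in coefs}
    for row in rows:
        x = list(row)
        for index, value in set_columns.items():
            x[index] = value                         # counterfactual value for everyone
        for k, p in mnl_probs(x, coefs).items():
            sums[k] += p
    return {k: s / len(rows) for k, s in sums.items()}


# Invented model: columns are [intercept, initiated at 11-15, male].
coefs = {"reference": [0.0, 0.0, 0.0], "other": [-1.0, 0.8, 0.2]}
rows = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]

as_if_early = recycled_predictions(rows, coefs, {1: 1})   # everyone initiated early
as_if_not = recycled_predictions(rows, coefs, {1: 0})     # no one initiated early
difference = as_if_early["other"] - as_if_not["other"]
```

The quantity `difference` is the change in the average predicted probability attributable to the initiation indicator, with all other characteristics left at their observed values.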


Negative Binomial Regression

Negative binomial regression is used to model count dependent variables. A count variable, for example the number of years in poverty, is often assumed to follow a Poisson distribution, which has the feature that its mean equals its variance. In practice, the variance of a count variable is often larger than its mean, a situation known as overdispersion (Hausman, Hall and Griliches, 1984). Negative binomial regression accommodates overdispersion by assuming that the Poisson parameter itself follows a Gamma distribution. In our study, two outcome variables are count data: years in poverty between the ages of 25-29, and years on welfare between the ages of 21-33.
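Overdispersion can be checked informally by comparing a sample's variance with its mean. The sketch below uses invented counts and hypothetical function names. The variance formula shown is the common "NB2" form of the Poisson-Gamma mixture, mean plus a dispersion parameter times the squared mean; whether the report's software used exactly this parameterization is an assumption.

```python
from statistics import mean, pvariance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    return pvariance(counts) / mean(counts)


def nb2_variance(mu, alpha):
    """Variance of a Poisson-Gamma mixture with mean mu and dispersion alpha.

    alpha = 0 recovers the Poisson case, where the variance equals the mean.
    """
    return mu + alpha * mu ** 2


# Invented counts of years in poverty: mostly zeros with one long spell.
years_in_poverty = [0, 0, 0, 0, 1, 5]
print(dispersion_ratio(years_in_poverty) > 1)   # True: variance exceeds mean
print(nb2_variance(2.0, 0.5))                   # 4.0
```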
An estimated coefficient from a negative binomial regression is interpreted as the percentage change in the outcome variable given a unit change in an explanatory variable. To keep the interpretation comparable with the linear models, we present results from negative binomial regressions in the form of marginal effects. Marginal effects measure changes in an outcome variable resulting from a unit change in an independent variable. For example, in estimating the number of years in poverty, we obtained a marginal effect of -0.231 for males relative to females (whose marginal effect is set to 0), which means male respondents on average spent 0.231 fewer years in poverty than female respondents. In essence, the interpretation of marginal effects is equivalent to that of estimated coefficients in linear models.


Weibull Regression

The Weibull distribution is one of the distributions used in modeling survival or duration data. The variable of interest in the analysis of duration is the length of time that elapses from the beginning of some event either until its end or until the measurement is taken, which may precede termination. In our study, we apply Weibull regression to the age when reached 2 years of tenure with one employer after ending formal schooling. The starting point is the age when formal schooling ended and the ending point is when two years of tenure was reached.
The Weibull distribution function is nonlinear. To keep the interpretation of results comparable, we present results in the form of marginal effects as well. Marginal effects measure changes in an outcome variable resulting from a unit change in an independent variable. For example, in estimating age when reached 2 years of tenure after ending formal schooling, we obtained a marginal effect of -1.003 for males relative to females (whose marginal effect is set to 0). This means it took male respondents on average 1.003 years less than female respondents to reach 2 years of tenure after ending formal schooling.
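For readers unfamiliar with the Weibull distribution, the sketch below shows its survival function and expected duration, and how a difference in expected duration between two groups could be computed. It assumes a scale-shape parameterization in which a covariate multiplies the scale by exp(beta) (an accelerated failure time form). The report does not state which parameterization was estimated, so the function names and the form of the group effect are assumptions.

```python
import math


def weibull_survival(t, scale, shape):
    """Probability that the duration exceeds t."""
    return math.exp(-((t / scale) ** shape))


def weibull_mean(scale, shape):
    """Expected duration: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def weibull_median(scale, shape):
    """Duration by which half of the spells have ended."""
    return scale * math.log(2.0) ** (1.0 / shape)


def expected_duration_difference(scale_reference, shape, beta):
    """Expected duration for a group whose scale is multiplied by exp(beta),
    minus the expected duration for the reference group."""
    scale_group = scale_reference * math.exp(beta)
    return weibull_mean(scale_group, shape) - weibull_mean(scale_reference, shape)


# Invented parameters: a negative beta shortens the expected time to the event.
print(expected_duration_difference(5.0, 1.5, -0.2) < 0)   # True
```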


Statistical Significance of Estimates

The outcome variables in this study were assumed to follow different distributions and therefore a variety of regressions were estimated. To demonstrate statistical significance of estimated results consistently and as simply as possible, we chose to show p-values for all estimates, whether odds ratios or marginal effects. A p-value ranges from 0 to 1 and is the "tail" probability of observing a statistic at least as extreme as the one actually observed when the null hypothesis is true. Usually the null hypothesis in our study is that the estimated coefficient is 0. For example, if the estimated coefficient of male relative to female from the OLS regression of percent time employed is 0.02 and the t-statistic is 3.45 with an associated p-value of 0.01, then an estimate at least this far from 0 would arise only 1 percent of the time if the true effect of sex on percent time employed were 0. The lower the p-value, the higher the statistical significance of the estimated result. The cutoff for statistical significance in this report is a p-value of 0.1.
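The link between a test statistic and its p-value can be sketched with a large-sample normal approximation. This is an illustration only: the report's software would have used the reference distribution appropriate to each model, the function names are hypothetical, and treating a p-value strictly below 0.1 as significant is an assumption about how the cutoff was applied.

```python
import math


def two_sided_p_value(statistic):
    """P(|Z| >= |statistic|) for a standard normal Z (large-sample approximation)."""
    return math.erfc(abs(statistic) / math.sqrt(2.0))


def is_significant(p_value, cutoff=0.1):
    """Apply the report's cutoff; strict inequality is an assumption."""
    return p_value < cutoff


p = two_sided_p_value(1.96)
print(round(p, 2))                 # 0.05
print(is_significant(p))           # True
print(is_significant(two_sided_p_value(1.0)))   # False
```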


Endnote

25. The creation of the variable "last time enrolled in school" takes into account the highest degree obtained by each individual respondent. An algorithm was developed to create this variable.
