For each outcome measure, program impacts were estimated as the difference in regression-adjusted mean values between the program and control groups. These impacts were estimated both overall and for each site individually. The overall estimate was obtained simply by averaging the estimated impacts for each of the four individual sites. This approach was preferred to weighting each site according to the size of its sample, which would have arbitrarily given some sites (most notably Teens in Control) more importance when computing a pooled estimate.
Variable  Definition 

Measures of Risk Behavior and Behavioral Consequences  
Sexual Abstinence and Sexual Activity  
Remained Abstinent  Binary variable: equals 1 if youth reported never having had sexual intercourse; equals 0 if youth reported having had sexual intercourse (ever). 
Abstinent Last 12 Months  Binary variable: equals 1 if youth reported not having had sex in last 12 months; equals 0 if youth reported having had sex in last 12 months. 
Number of Sexual Partners  Categorical variable, with five categories: (1) remained abstinent; (2) one sexual partner ever; (3) two sexual partners ever; (4) three sexual partners ever; and (5) four or more sexual partners ever. 
Age at First Intercourse  Continuous variable, equal to the age that youth who have not remained abstinent report having first had intercourse. Youth who have remained abstinent are assigned missing values (dropped from the analysis). 
Expectations for Future Behavior  
Expect to Abstain Through High School  Binary variable: equals 1 if youth reported expecting to abstain through high school (including those who have previously had sex); equals 0 otherwise. Youth who were 18 or older at the time of the survey were dropped from the measure. 
Expect to Abstain as a Teenager  Binary variable: equals 1 if youth reported expecting to abstain until age 20 (including those who have previously had sex); equals 0 otherwise. Youth who were 20 or older at the time of the survey were dropped from the measure. 
Expect to Abstain Until Marriage  Binary variable: equals 1 if youth reported expecting to abstain until married (including those who have previously had sex); equals 0 otherwise. 
Risks of STDs and Pregnancy  
Unprotected Sex at First Intercourse  Categorical variable, with three categories: (1) remained abstinent; (2) had sex and reported using a condom the first time; (3) had sex and reported not using a condom the first time. 
Unprotected Sex Last 12 Months  Categorical variable, with four categories: (1) abstinent last 12 months; (2) had sexual intercourse last 12 months and always used condom; (3) had sexual intercourse last 12 months and sometimes used condom; and (4) had sexual intercourse last 12 months and never used condom. 
Birth Control at First Intercourse  Categorical variable, with three categories: (1) remained abstinent; (2) had sex and reported using birth control the first time; (3) had sex and reported not using birth control the first time. 
Birth Control Last 12 Months  Categorical variable, with four categories: (1) abstinent last 12 months; (2) had sexual intercourse last 12 months and always used birth control; (3) had sexual intercourse last 12 months and sometimes used birth control; and (4) had sexual intercourse last 12 months and never used birth control. 
Possible Consequences of Teen Sex  
Ever Been Pregnant  Binary variable: equals 1 if respondent reported ever having been (or gotten someone) pregnant; equals 0 otherwise. 
Ever Had a Baby  Binary variable: equals 1 if respondent reported ever having had a baby; equals 0 otherwise. 
Ever Had a (Reported) STD  Binary variable: equals 1 if youth reported that a doctor said s/he had an STD; equals 0 otherwise. 
Other Risk Behaviors  
Smoked Cigarette (Past Month)  Binary variable: equals 1 if respondent reported having smoked a cigarette at least once in last month; equals 0 otherwise. 
Drank Alcohol (Past Month)  Binary variable: equals 1 if youth reported having drunk alcohol at least once in last month; equals 0 otherwise. 
Used Marijuana (Ever)  Binary variable: equals 1 if youth reported ever having used marijuana; equals 0 otherwise. 
Potential Mediators of Teen Sexual Activity  
Ability to Identify STDs  
Overall Identification of STDs  Continuous (scale) variable: the percent of 13 diseases that are correctly identified as actual STDs (such as chlamydia) or false STDs (such as diabetes). 
Identification of True STDs  Continuous (scale) variable: the percent of the nine actual STDs correctly identified. 
Identification of False STDs  Continuous (scale) variable: the percent of the four non-STDs correctly identified. 
Understanding of Pregnancy and STD Risks  
Knowledge of Unprotected Sex Risks  Continuous (scale) variable: the percent correct of two items, which asked the respondent whether one instance of unprotected sex can result in (1) a pregnancy, (2) an STD. 
Knowledge of STD Consequences  Continuous (scale) variable: the percent correct of three items, which asked the respondent whether STDs can cause (1) cancer, (2) fertility problems, (3) increased risk for asthma. 
Perceived Effectiveness of Condoms  
Perceived Effectiveness at Preventing Pregnancy  Categorical variable: respondent reported that when used correctly, condoms either usually, sometimes, or never prevent pregnancy, or that s/he was unsure. 
Perceived Effectiveness at Preventing HIV  Categorical variable: respondent reported that when used correctly, condoms either usually, sometimes, or never prevent HIV, or that s/he was unsure. 
Perceived Effectiveness at Preventing Chlamydia and Gonorrhea  Categorical variable: respondent reported that when used correctly, condoms either usually, sometimes, or never prevent chlamydia and gonorrhea, or that s/he was unsure. 
Perceived Effectiveness at Preventing Herpes and HPV  Categorical variable: respondent reported that when used correctly, condoms either usually, sometimes, or never prevent herpes and HPV, or that s/he was unsure. 
Perceived Effectiveness of Birth Control Pills  
Perceived Effectiveness at Preventing Pregnancy  Categorical variable: respondent reported that when used correctly, birth control pills either usually, sometimes, or never prevent pregnancy, or that s/he was unsure. 
Perceived Effectiveness at Preventing HIV  Categorical variable: respondent reported that when used correctly, birth control pills either usually, sometimes, or never prevent HIV, or that s/he was unsure. 
Perceived Effectiveness at Preventing Chlamydia and Gonorrhea  Categorical variable: respondent reported that when used correctly, birth control pills either usually, sometimes, or never prevent chlamydia and gonorrhea, or that s/he was unsure. 
Perceived Effectiveness at Preventing Herpes and HPV  Categorical variable: respondent reported that when used correctly, birth control pills either usually, sometimes, or never prevent herpes and HPV, or that s/he was unsure. 
Source: Wave 4 Survey of Teen Activities and Attitudes (Mathematica Policy Research, Inc., 2005), administered to youth 42 to 78 months after enrolling in the Title V, Section 510 Abstinence Education Program study sample.
Note: See Appendix C for the wording of the individual survey questions (and responses) on which the measures are based. 

Multivariate Estimation

The regression analysis used weighted least squares models and pooled data across all four sites. Each regression model included a series of binary variables reflecting the interaction between program site and program status (program or control group). The site-specific estimate is obtained from the regression as the difference between the coefficients on the binary variables corresponding to that site's program and control groups. The pooled impact estimate for a given outcome is the average of these four program-control differences. The weights used in the regressions accounted for the variability in the probability of selection to the program or control groups as well as for youth who did not complete the final follow-up survey.^{(2)} Standard errors from the models were calculated taking into account the variability associated with these weights.
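The pooling step can be sketched as follows. This is a minimal illustration rather than the study's own code: the site labels, cell values, and sample sizes are hypothetical, and the weighted regression that would produce the adjusted cell means is omitted. It contrasts the equal-weight average used in the report with a sample-size-weighted alternative.

```python
from statistics import mean

# Hypothetical regression-adjusted cell means (site x program status).
# These numbers are invented for illustration; they are not study results.
cell_means = {
    ("Site A", "program"): 0.52, ("Site A", "control"): 0.50,
    ("Site B", "program"): 0.47, ("Site B", "control"): 0.45,
    ("Site C", "program"): 0.40, ("Site C", "control"): 0.44,
    ("Site D", "program"): 0.60, ("Site D", "control"): 0.52,
}
# Hypothetical site sample sizes; Site D is twice as large as the others.
sample_sizes = {"Site A": 400, "Site B": 400, "Site C": 400, "Site D": 800}


def site_impacts(cells):
    """Program-control difference within each site."""
    sites = sorted({site for site, _ in cells})
    return {s: cells[(s, "program")] - cells[(s, "control")] for s in sites}


def pooled_equal_weight(impacts):
    """Pooled estimate: simple average of the site-specific impacts."""
    return mean(impacts.values())


def pooled_size_weighted(impacts, sizes):
    """Alternative (not used in the report): weight sites by sample size."""
    total = sum(sizes[s] for s in impacts)
    return sum(impacts[s] * sizes[s] for s in impacts) / total


impacts = site_impacts(cell_means)
equal = pooled_equal_weight(impacts)
weighted = pooled_size_weighted(impacts, sample_sizes)
print(f"equal-weight pooled impact:  {equal:.3f}")
print(f"size-weighted pooled impact: {weighted:.3f}")
```

In this made-up example the largest site also has the largest impact, so size weighting pulls the pooled estimate toward that site, which is the kind of dominance the equal-weight average avoids.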
In addition to these variables, the regression models included a large number of variables to control for individual demographic and background characteristics measured from the baseline survey (Table III.4). For the small fraction of the sample who did not complete a baseline survey (fewer than 5 percent), a supplemental survey was administered at the next survey to collect key demographic information such as age, gender, and race/ethnicity. For other covariates, missing data were imputed using the mean for the sample in a given program site.
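The site-mean imputation described for the other covariates might look like the sketch below. The record layout and the `religiosity` field are assumptions made for illustration; the report does not describe its data structures or software.

```python
from collections import defaultdict


def impute_site_mean(records, covariate):
    """Return copies of `records` in which missing (None) values of
    `covariate` are replaced by the mean of the observed values in the
    same program site. Raises KeyError if a site has no observed values."""
    observed = defaultdict(list)
    for r in records:
        if r[covariate] is not None:
            observed[r["site"]].append(r[covariate])
    site_mean = {site: sum(vals) / len(vals) for site, vals in observed.items()}
    return [
        {**r, covariate: site_mean[r["site"]]} if r[covariate] is None else dict(r)
        for r in records
    ]


# Hypothetical baseline records (values invented for illustration).
baseline = [
    {"site": "A", "religiosity": 2.0},
    {"site": "A", "religiosity": 4.0},
    {"site": "A", "religiosity": None},
    {"site": "B", "religiosity": 1.0},
    {"site": "B", "religiosity": None},
]
imputed = impute_site_mean(baseline, "religiosity")
print([r["religiosity"] for r in imputed])
```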
Table III.4. Explanatory (Control) Variables Used in the Final Impact Analysis

Demographics and Background Characteristics
Site
Enrollment cohort
Date of interview
Responded to previous surveys
Gender
Age
Race/ethnicity
Presence of mother figure
Presence of father figure
Parents married

Baseline Contextual Factors
Communication with parents
Unmarried sister got pregnant
Sibling dropped out of school
Religiosity

Baseline Measures of Behaviors and Potential Mediators of Teen Sex
Had sex
Perceived consequences of sex
Views on abstinence
Ability to resist pressure for sex
Expectations to have sex
Knowledge of STDs

Along with site-level results, the report presents estimated impacts on behavioral outcomes for several subgroups of potential interest.^{(3)} Among these are subgroups defined by gender and several measures that might be linked to eventual behavior, such as baseline support for abstinence, religiosity, marital status of parents, and television viewing. All of these subgroups were defined from survey data collected at baseline, prior to any potential influence of the programs. A final subgroup, enrollment cohort, is also investigated because of important variation found across cohorts in an earlier DHHS study report (Maynard et al. 2005). The first of these cohort subgroups includes youth enrolled in the 1999-2000 or the 2000-2001 cohorts; the second includes youth enrolled in the final, 2001-2002 cohort.
Impacts were estimated for one subgroup at a time, following nearly the same methods as described above for the full sample. The only difference is that explanatory terms were added to the regression models reflecting the interaction between a given subgroup of interest (for example, gender) and each of the site dummies and the "site by treatment" interaction terms. Estimates for a given subgroup were then computed using the coefficients on these terms, following the same procedure described above.


Missing Outcomes Data

Although nonresponse on the individual survey questions was generally very low, typically just one or two percent, for certain outcomes it could still result in slightly biased estimates of outcome measures if left unaddressed. The first set of these questions pertains to knowledge questions (for example, "can you get pregnant if you have sexual intercourse only once?") where there is a single correct answer. For these questions, it is likely that youth who completed most of the survey section on knowledge, but skipped an individual question or two, did so because they did not know the correct answer. Thus, in order not to understate the proportion of youth who were unsure of a correct answer, the response on individually skipped knowledge questions was categorized as "don't know/unsure." In contrast, youth who skipped an entire section are excluded from the analysis for that set of outcomes.
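The recoding rule for skipped knowledge items can be expressed in a few lines. This is a sketch under one assumption not spelled out in the text: any section with at least one answered item is treated as partially completed, and only sections with every item skipped are excluded. The function and variable names are hypothetical.

```python
DONT_KNOW = "don't know/unsure"


def recode_knowledge_section(responses):
    """Recode one youth's answers to a knowledge section.

    `responses` is a list with None marking a skipped item. Returns None
    when the whole section was skipped (the youth is excluded from that set
    of outcomes); otherwise skipped items become "don't know/unsure"."""
    if all(r is None for r in responses):
        return None
    return [DONT_KNOW if r is None else r for r in responses]


partly_skipped = recode_knowledge_section(["yes", None, "no"])
fully_skipped = recode_knowledge_section([None, None, None])
print(partly_skipped)
print(fully_skipped)
```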
A more serious form of missing data pertains to conditional questions, meaning that they are answered by youth only if they provide a particular response on a prior question or questions. For example, in order to answer the question on the number of sexual partners, the respondent must first indicate on the survey that s/he has had sexual intercourse. Since youth who have not had sexual intercourse can correctly be assigned a value of zero partners, this conditional wording means that all missing values for the question will pertain to youth who have had sexual intercourse. In turn, unless there are no missing data, the reported mean value for the full sample will be incorrect, in this case understating the mean number of sexual partners. To correct for this conditional item nonresponse, missing values were imputed following a commonly used "hot-deck" procedure. This procedure assigns a value on the item that was missed based on the reported values of youth with characteristics similar to those of the item nonrespondents. Through this method, the estimates for the program and control groups preserve the natural variability of the sample.
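One common form of hot-deck imputation draws a donor value at random from respondents in the same matching cell. The sketch below illustrates that idea only; the report does not say which characteristics defined "similar" youth or how donors were drawn, so the matching keys (`site`, `gender`) and the random draw are assumptions, and the records are invented.

```python
import random


def hot_deck(records, item, match_keys, rng):
    """Fill missing (None) values of `item` with a value drawn at random
    from respondents who share the same values on `match_keys`.
    Raises KeyError if a nonrespondent's cell contains no donors."""
    donors = {}
    for r in records:
        if r[item] is not None:
            cell = tuple(r[k] for k in match_keys)
            donors.setdefault(cell, []).append(r[item])
    filled = []
    for r in records:
        if r[item] is None:
            cell = tuple(r[k] for k in match_keys)
            r = {**r, item: rng.choice(donors[cell])}  # copy, do not mutate
        filled.append(r)
    return filled


# Hypothetical records for youth who reported having had intercourse;
# None marks a skipped "number of partners" item.
records = [
    {"id": 1, "site": "A", "gender": "F", "partners": 1},
    {"id": 2, "site": "A", "gender": "F", "partners": 3},
    {"id": 3, "site": "A", "gender": "F", "partners": None},
    {"id": 4, "site": "B", "gender": "M", "partners": 2},
    {"id": 5, "site": "B", "gender": "M", "partners": None},
]
filled = hot_deck(records, "partners", ("site", "gender"), random.Random(0))
print([r["partners"] for r in filled])
```

Because imputed values are taken from actual respondents rather than set to a cell mean, the distribution of the item keeps its spread, which is the property the text refers to as preserving natural variability.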


Nonparticipation and Crossover

As noted in Chapter II, a sizeable proportion of youth assigned to the program group in the two sites with elective programs, ReCapturing the Vision and FUPTP, did not participate in any program classes or other services (35 percent and 43 percent, respectively). To address this program nonparticipation, impact estimates are presented two ways in the report. The first is for the full program group. This estimate reflects the average effect of having the opportunity to participate in the program, whether or not the youth actually chose to participate. These estimates are featured throughout the report since they generalize to the youth who were made eligible for the programs. The second is for only those youth in the program group who actually participated. These estimates are derived following the procedure developed by Bloom (1984), which divides the full-sample estimate by the participation rate. Because the standard errors and significance levels associated with the participant-only estimates are roughly similar to those for the full program group, impact estimates found not statistically significant for the full program group are typically not statistically significant for the participants either. As a result, the conclusions from the study do not differ substantively when based on one set of measures or the other.
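The Bloom (1984) adjustment described above amounts to a single division. The sketch below uses the participation rates implied by the nonparticipation figures in the text (65 and 57 percent) with a hypothetical full-sample impact and standard error. Scaling the standard error by the same factor treats the participation rate as a fixed number, which is a simplifying assumption of this sketch rather than something the report states.

```python
def bloom_adjust(full_sample_impact, full_sample_se, participation_rate):
    """Convert a full-program-group impact into a participant-only impact
    by dividing by the participation rate. The standard error is scaled by
    the same factor (participation rate treated as fixed)."""
    if not 0 < participation_rate <= 1:
        raise ValueError("participation_rate must be in (0, 1]")
    return (full_sample_impact / participation_rate,
            full_sample_se / participation_rate)


# Hypothetical full-sample impact of 3 percentage points (SE 2 points).
impact, se = 0.03, 0.02
for site, nonparticipation in [("ReCapturing the Vision", 0.35), ("FUPTP", 0.43)]:
    p_impact, p_se = bloom_adjust(impact, se, 1 - nonparticipation)
    print(f"{site}: participant impact {p_impact:.3f}, "
          f"t = {p_impact / p_se:.2f} (full sample t = {impact / se:.2f})")
```

Under this simplification the impact and its standard error grow by the same factor, so the t-statistic is unchanged. That is consistent with the observation that significance levels for participants are roughly similar to those for the full program group.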
Crossover of control group youth into the program group was rare, involving at most 5 percent of the sample. For this reason, the report does not present estimates that account for crossover. To the extent that youth who did cross over experienced positive benefits from participating in the programs, the impact estimates reported are understated slightly.


Statistical Power

For the full sample, the statistical power of the study to detect impacts is high. Based on the observed explanatory power of the regression models, the study sample supports detection of true overall program impacts of roughly 0.08 standard deviations. (This is based on standard assumptions of 80 percent statistical power and 90 percent statistical confidence, two-tailed.) For a proportional outcome with a mean of 50 percent, this reflects an estimated impact of roughly 4 percentage points. Program impacts that are smaller in size may also be detected from the study sample, but the likelihood of doing so is below the 80 percent probability (power level) that is commonly preferred.
For the individual program sites, statistical power is naturally lower. This is particularly true in the two sites that experienced program nonparticipation, ReCapturing the Vision and FUPTP. For example, in the absence of nonparticipation, the size and allocation of the study sample would support detection of true site-specific impacts on the order of 0.16 standard deviations or larger for ReCapturing the Vision and 0.18 standard deviations for FUPTP. However, in light of the existing nonparticipation, the impacts on participants would need to be considerably larger (about 0.25 standard deviations for ReCapturing the Vision and 0.32 standard deviations for FUPTP) given equivalent levels of statistical power and confidence. This means the available samples in these two sites provide a high likelihood of detecting (that is, stating as statistically significant) true participant impacts only if they are fairly large; for example, for a proportional outcome with a mean of 50 percent, the minimum detectable impacts for participants are about 13 and 16 percentage points in the two respective sites. For the remaining two sites, My Choice, My Future! and Teens in Control, detectable impacts (at 80 percent power) are better, roughly 0.17 and 0.13 standard deviations, respectively.
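The arithmetic linking these figures can be checked with a short calculation. The sketch assumes two conversions that reproduce the reported numbers but are not spelled out in the text: a proportion with mean p has standard deviation sqrt(p(1 - p)), and the participant-level minimum detectable impact equals the site-level figure divided by the participation rate. Function names are hypothetical.

```python
from math import sqrt


def mde_percentage_points(mde_sd_units, mean_proportion=0.5):
    """Convert a minimum detectable impact in standard deviation units to
    percentage points for a binary outcome with the given mean."""
    sd = sqrt(mean_proportion * (1 - mean_proportion))
    return 100 * mde_sd_units * sd


def participant_mde(mde_sd_units, nonparticipation_rate):
    """Scale a site-level minimum detectable impact up to the impact on
    participants, given the share of the program group not participating."""
    return mde_sd_units / (1 - nonparticipation_rate)


# Full sample: 0.08 SD on an outcome with a 50 percent mean.
print(f"full sample: {mde_percentage_points(0.08):.1f} points")

# Sites with nonparticipation (rates taken from the report).
for site, mde, nonpart in [("ReCapturing the Vision", 0.16, 0.35),
                           ("FUPTP", 0.18, 0.43)]:
    p_mde = participant_mde(mde, nonpart)
    print(f"{site}: {p_mde:.2f} SD, "
          f"{mde_percentage_points(p_mde):.1f} points for participants")
```

Rounded to two decimals, the participant-level figures come out at 0.25 and 0.32 standard deviations, matching the values quoted above.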


Hypothesis and Sensitivity Testing

For each impact estimate, a two-tailed t-statistic tests the null hypothesis that there is no difference between the regression-adjusted means for the program and control groups. The associated p-value, which reflects the probability of obtaining an impact estimate at least as large in magnitude as the one observed when the null hypothesis of no effect is true, is used to judge the likelihood that a program had a measurable (statistically significant) impact. For categorical outcome variables, a t-test is conducted on the mean (proportion) for each response. In addition, an F-statistic tests the null hypothesis that there is no difference between the distributions of responses for the two experimental groups. This statistic is computed from a site-specific multinomial logistic regression of the categorical outcome variable on an indicator for program status and the covariates listed in Table III.4. The findings based on the F-statistics are consistent with those based on the individual t-test statistics.
Impact estimates with p-values less than 0.10, on two-tailed tests, are denoted in the report by asterisks and referred to in the text as statistically significant (Table III.5). While researchers sometimes use a lower p-value, 0.05 or less, to determine significance, this higher threshold allows a careful assessment of the findings across the range of outcomes being examined. The adoption of this threshold, however, does raise the likelihood of detecting significant impacts that have resulted merely by chance. Therefore, when interpreting the findings, attention is paid to whether significant impact estimates are isolated or whether they are part of a pattern of significant estimates that would point more strongly to a true program effect.
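The asterisk convention can be written as a small lookup. The function name is hypothetical, and the handling of p-values that fall exactly on a threshold is an assumption, since the report's table uses strict inequalities on both sides.

```python
def significance_stars(p_value):
    """Map a two-tailed p-value to the asterisk symbol used in the report.
    Values exactly equal to a threshold fall into the less significant
    band here; the report does not specify this edge case."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""  # not statistically significant


for p in (0.004, 0.03, 0.08, 0.25):
    print(f"p = {p:<5} -> '{significance_stars(p)}'")
```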
Additional analyses were conducted to examine the robustness of the impact estimates presented in the report. These included estimating impacts through logistic regression models (for binary outcomes) rather than linear probability models, and estimating impacts dropping various combinations of regression adjustment, data imputation, and sample weights. Across all these alternative estimates, findings were consistent with those presented in the report.
Table III.5. Conventions for Describing Statistical Significance of Program Impact Estimates

p-value of Impact Estimate  Symbol Used to Denote p-value  Impact Estimate Is Considered Statistically Significant from Zero
p < 0.01  ***  Yes
0.01 < p < 0.05  **  Yes
0.05 < p < 0.10  *  Yes
p > 0.10  [none]  No

1. Copies of these surveys are available online at [http://www.mathematicampr.com].
2. Selection weights were calculated as the inverse probability of selection to the group of assignment. Nonresponse weights were calculated using standard modeling techniques to estimate the probability of survey nonresponse as a function of baseline covariates.
3. Subgroups defined by race/ethnicity could not be investigated because of the very high correlation between program site and a given racial/ethnic group.
