For each impact estimate, a two-tailed t-statistic tests the null hypothesis that there is no difference between the regression-adjusted means for the program and control groups. The associated p-value, which reflects the probability of obtaining an impact estimate at least as large in magnitude as the one observed when the null hypothesis of no effect is true, is used to judge whether a program had a measurable (statistically significant) impact. For categorical outcome variables, a t-test is conducted on the mean (proportion) for each response. In addition, an F-statistic tests the null hypothesis that there is no difference between the distributions of responses for the two experimental groups. This statistic is computed from a site-specific multinomial logistic regression of the categorical outcome variable on an indicator for program status and the covariates listed in Table III.4. The findings based on the F-statistics are consistent with those based on the individual t-statistics.
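As an illustration of how a two-tailed p-value follows from an impact estimate and its standard error, the sketch below uses a large-sample normal approximation rather than the exact t distribution the report refers to. The function name and the numeric inputs are hypothetical and are not taken from the report.

```python
from statistics import NormalDist


def two_tailed_p_value(impact, std_error):
    """Return the test statistic and an approximate two-tailed p-value.

    Assumption: the sample is large enough that the t distribution is
    well approximated by the standard normal distribution.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    t_stat = impact / std_error
    # Probability of a statistic at least this far from zero, in either
    # direction, when the true impact is zero.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Hypothetical regression-adjusted impact of -3.0 points with a standard error of 1.5.
t_stat, p_value = two_tailed_p_value(impact=-3.0, std_error=1.5)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

With small samples, the exact t distribution with the appropriate degrees of freedom would give a somewhat larger p-value than this approximation.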
Impact estimates with p-values less than 0.10, on two-tailed tests, are denoted in the report by asterisks and referred to in the text as statistically significant (Table III.5). While researchers sometimes use a lower threshold, a p-value of 0.05 or less, to determine significance, the higher threshold used here allows a careful assessment of the findings across the range of outcomes being examined. The adoption of this threshold, however, does raise the likelihood of flagging as significant impact estimates that arose merely by chance. Therefore, when interpreting the findings, attention is paid to whether significant impact estimates are isolated or whether they are part of a pattern of significant estimates that would point more strongly to a true program effect.
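The trade-off between the two thresholds can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that the tests are independent and that no program has any true effect; outcomes in a real evaluation are usually correlated, so the figures are only indicative. The function name and the choice of 20 tests are hypothetical.

```python
def chance_findings(num_tests, alpha=0.10):
    """Expected false positives and chance of at least one, under no true effects.

    Assumption: the tests are statistically independent.
    """
    expected_false_positives = alpha * num_tests
    prob_at_least_one = 1.0 - (1.0 - alpha) ** num_tests
    return expected_false_positives, prob_at_least_one


# Compare the 0.05 and 0.10 thresholds for a hypothetical set of 20 outcomes.
for alpha in (0.05, 0.10):
    expected, prob_any = chance_findings(num_tests=20, alpha=alpha)
    print(f"alpha={alpha:.2f}: expected false positives={expected:.1f}, "
          f"P(at least one)={prob_any:.3f}")
```

This is why an isolated significant estimate carries less weight than a consistent pattern of significant estimates across related outcomes.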
Additional analyses were conducted to examine the robustness of the impact estimates presented in the report. These included estimating impacts through logistic regression models (for binary outcomes) rather than linear probability models, and re-estimating impacts after dropping various combinations of regression adjustment, data imputation, and sample weights. Across all these alternative estimates, findings were consistent with those presented in the report.
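One way to see why logistic and linear probability models often agree for binary outcomes is the simplest case, in which the only regressor is the program indicator. In that case both models reproduce the two group proportions exactly, so the implied impacts coincide. The sketch below covers only that unadjusted case; the report's models also include covariates, where small differences can arise. The function names and rates are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def compare_models(program_rate, control_rate):
    """Impact implied by each model when the program indicator is the only regressor."""
    # Linear probability model: the coefficient equals the difference in proportions.
    lpm_impact = program_rate - control_rate
    # Logistic model: the fitted intercept and coefficient reproduce the group rates.
    intercept = logit(control_rate)
    log_odds_ratio = logit(program_rate) - intercept
    logit_impact = inv_logit(intercept + log_odds_ratio) - inv_logit(intercept)
    return lpm_impact, log_odds_ratio, logit_impact


# Hypothetical outcome rates: 40 percent in the program group, 50 percent in the control group.
lpm_impact, log_odds_ratio, logit_impact = compare_models(0.40, 0.50)
print(f"LPM impact: {lpm_impact:.3f}, logit impact: {logit_impact:.3f}")
```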
|p-Value of Impact Estimate||Symbol Used||Impact Estimate Is Considered Statistically Significantly Different from Zero|
|p < 0.01||***||Yes|
|0.01 ≤ p < 0.05||**||Yes|
|0.05 ≤ p < 0.10||*||Yes|
|p ≥ 0.10||[none]||No|
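The categorization in the table can be expressed as a small lookup. This is a minimal sketch; the function name is hypothetical, and assigning a p-value that falls exactly on a boundary (0.01, 0.05, or 0.10) to the less significant category is an assumption.

```python
def significance_symbol(p_value):
    """Map a two-tailed p-value to the asterisk notation used in the table."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""  # not statistically significant; no symbol shown


def is_significant(p_value):
    """True when the estimate would be described as statistically significant."""
    return significance_symbol(p_value) != ""


# Hypothetical p-values, one per row of the table.
for p in (0.004, 0.03, 0.08, 0.25):
    print(p, significance_symbol(p) or "[none]", is_significant(p))
```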
2. Selection weights were calculated as the inverse of the probability of selection into the assigned group. Non-response weights were calculated using standard modeling techniques to estimate the probability of survey non-response as a function of baseline covariates.
3. Subgroups defined by race/ethnicity could not be investigated because of the very high correlation between program site and a given racial/ethnic group.
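The weighting described in note 2 can be sketched as follows. The note states only that selection weights are inverse selection probabilities and that non-response probabilities were modeled. Treating the non-response weight as the inverse of the estimated response probability, and combining the two weights by multiplication, are assumptions made here for illustration. The function name and inputs are hypothetical.

```python
def analysis_weight(p_selection, p_response):
    """Combine a selection weight and a non-response weight for one sample member.

    p_selection: probability of selection into the assigned group.
    p_response:  estimated probability of responding to the survey, that is,
                 one minus the modeled probability of non-response.
    Assumption: the final weight is the product of the two inverse probabilities.
    """
    if not (0.0 < p_selection <= 1.0 and 0.0 < p_response <= 1.0):
        raise ValueError("probabilities must be in the interval (0, 1]")
    selection_weight = 1.0 / p_selection
    nonresponse_weight = 1.0 / p_response
    return selection_weight * nonresponse_weight


# Hypothetical sample member: 50 percent chance of assignment to this group
# and an estimated 80 percent chance of completing the survey.
weight = analysis_weight(p_selection=0.5, p_response=0.8)
print(weight)
```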