Examination of Clinical Trial Costs and Barriers for Drug Development. Appendix D: Additional Data Cleaning Steps


We performed the final cleaning and compilation of the clinical trial data elements using the statistical software Stata. For some combinations of cost component, phase, and therapeutic area, Medidata did not have enough underlying trial data to provide means and variances while still maintaining the confidentiality of client information. Because the resulting missing values would render the model’s total cost calculations incomplete, we worked closely with Medidata to extrapolate them as accurately as possible. For missing outsourcing and clinical costs, Medidata multiplied overall U.S. means by phase- and therapeutic area-specific factors to create tables of derived costs, which were used to fill in the phase-therapeutic area combinations for which those measures were missing. Similarly, missing variances were filled in using the overall U.S. variances from the same pool of data used to derive the means.

For the count (non-cost) data elements (Number of Site Management Months, Number of Project Management Months, and Number of Site Monitoring Days), Medidata used phase-specific factors to create tables of derived values. Due to data limitations, however, these could not be broken down further by therapeutic area. Thus, we used the derived means and variances for these fields to fill in missing values across all therapeutic areas. Missing values in the Number of Planned Patients (per site) and Number of Sites (per study) fields were extrapolated using phase-specific averages across all other therapeutic areas. Finally, Number of SDV Fields (per study) could not be derived by phase or therapeutic area; therefore, in all cases where this measure was blank, it was estimated with the overall U.S. number for all phases and all therapeutic areas.
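The fill-in rule for missing cost means can be sketched in a few lines of Python. This is an illustration only, not the Stata code used for the report. It assumes a single multiplicative factor per phase and therapeutic area combination, since the report does not describe how Medidata constructed its factors. The function name, keys, and all numbers are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (phase, therapeutic area); both labels are illustrative
Key = Tuple[str, str]


def fill_missing_means(
    observed: Dict[Key, Optional[float]],
    overall_us_mean: float,
    factors: Dict[Key, float],
) -> Dict[Key, float]:
    """Keep reported means; derive missing ones as overall mean x factor."""
    filled = {}
    for key, value in observed.items():
        if value is None:
            # Assumption: one combined factor per (phase, area) cell.
            filled[key] = overall_us_mean * factors[key]
        else:
            filled[key] = value
    return filled


# Made-up inputs: one reported cell and one missing cell.
observed = {
    ("Phase 1", "Oncology"): 1200.0,
    ("Phase 2", "Oncology"): None,
}
factors = {
    ("Phase 1", "Oncology"): 1.0,
    ("Phase 2", "Oncology"): 1.5,
}
filled = fill_missing_means(observed, overall_us_mean=1000.0, factors=factors)
```

Reported values pass through unchanged, so the derived figures only ever occupy cells that were blank in the source data.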

In addition to filling in missing values for the fields from Medidata, we also had to find data to populate other fields that were missing altogether. Medidata collects data on cost per IRB approval and cost per IRB amendment, which were provided to ERG; however, it does not currently collect data on the number of IRB approvals or IRB amendments for each study. Therefore, there were no counts by which to multiply the IRB-related costs. To generate counts of IRB approvals, we assumed that one approval would be needed for each site in the study, and created a field called Number of IRB Approvals (per study), which was set equal to the Number of Sites (per study) field provided by Medidata. To obtain counts of IRB amendments, we turned to the literature on clinical trial costs and found counts of protocol amendments in a 2011 study by Kenneth Getz and other researchers at Tufts CSDD (described in Section 4).17 The study reported average numbers of amendments by therapeutic area, and separately by phase (across all therapeutic areas). Thus, we were able to use a method similar to the one described above for extrapolating missing values to derive amendment counts by phase and therapeutic area: therapeutic area-specific factors were calculated and then multiplied by the phase-specific amendment counts, allowing us to fully populate a new field called “Number of IRB Amendments (per study).” For therapeutic areas for which there was no counterpart in Getz, et al. (2011), we used the counts for the “Other” category.
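As a rough sketch of the amendment-count derivation, the Python below multiplies a phase-specific count by a therapeutic area factor and falls back to the “Other” category when an area has no counterpart. The report does not say how the factors were calculated. Defining a factor as the area average divided by the overall average is an assumption made here for illustration, and every number is invented rather than taken from Getz, et al. (2011).

```python
from typing import Dict


def area_factor(area: str, area_means: Dict[str, float], overall_mean: float) -> float:
    """Hypothetical factor: area average relative to the overall average."""
    # Areas with no counterpart in the literature use the "Other" category.
    mean = area_means.get(area, area_means["Other"])
    return mean / overall_mean


def irb_amendments(
    phase: str,
    area: str,
    phase_means: Dict[str, float],
    area_means: Dict[str, float],
    overall_mean: float,
) -> float:
    """Phase-specific amendment count scaled by the area factor."""
    return phase_means[phase] * area_factor(area, area_means, overall_mean)


def irb_approvals(number_of_sites: int) -> int:
    """One IRB approval is assumed per site, as stated in the text."""
    return number_of_sites


# Invented inputs, for illustration only.
phase_means = {"Phase 2": 2.5}
area_means = {"Oncology": 3.0, "Other": 2.0}
overall_mean = 2.0

oncology = irb_amendments("Phase 2", "Oncology", phase_means, area_means, overall_mean)
unmatched = irb_amendments("Phase 2", "Dermatology", phase_means, area_means, overall_mean)
```

With these inputs, the unmatched area inherits the “Other” factor, so its count equals the phase-specific figure whenever the “Other” average matches the overall average.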

An additional cleaning step was necessary to reconcile some minor discrepancies between the data obtained from the literature and the data received from Medidata. Specifically, the mean trial phase lengths from DiMasi, Hansen, & Grabowski (2003) were, for a few therapeutic area-phase combinations, slightly shorter than the number of site management months or the number of project management months (defined below) provided by Medidata. To resolve these discrepancies, we set the trial phase length equal to the maximum of these three variables: the mean phase lengths from DiMasi, Hansen, & Grabowski (2003), the number of site management months (from Medidata), and the number of project management months (from Medidata).
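The reconciliation rule amounts to taking a maximum over three values. A minimal Python sketch follows, assuming all three inputs are expressed in the same unit (months); the function name and example values are hypothetical.

```python
def reconciled_phase_length(
    literature_months: float,
    site_management_months: float,
    project_management_months: float,
) -> float:
    """Return the longest of the three duration measures for one cell."""
    return max(literature_months, site_management_months, project_management_months)


# Made-up example: the literature value is slightly shorter than a Medidata value.
length = reconciled_phase_length(20.0, 22.0, 21.0)
```

Taking the maximum ensures the phase length is never shorter than the site or project management duration it has to contain.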

17 For the purposes of this study, amendments were defined as “any change to a protocol requiring internal approval followed by approval from the IRB, ethical review board (ERB), or regulatory authority. Only implemented amendments—that is, amendments approved both internally and by the ethics committee—were counted and analyzed in this study” (Getz, et al., 2011).
