The literature on telemedicine evaluation expresses concern about the rigor and consistency of methods used in the field. In a field where large, prospective randomized clinical trials (RCTs) are the methodological gold standard for evaluating the safety and efficacy of pharmaceuticals and other medical interventions, teleconsultations and other telemedicine applications present numerous evaluative challenges.
Among the shortcomings cited in the literature on telemedicine evaluations are small sample sizes, flawed and poorly implemented study designs, and inaccurate or imprecise measurement (Bashshur 1998). Specific recommendations for improving methodology include pooling data across programs, using RCTs, and using case-control studies with relevant meta-analyses (Yellowlees 1998).
A recent effort to conduct a meta-analysis of the costs associated with telemedicine is instructive regarding the methodological strength of the available body of telemedicine research. Drawing on a comprehensive literature search, the investigators identified 551 non-duplicative, English-language articles reporting findings from studies of the costs of telemedicine. Of these, only 38 articles contained usable quantitative cost data. Among those 38, so many were inadequately designed or conducted that a traditional meta-analysis was not possible. A large proportion of the studies had severe methodological flaws, such as omission of the number of consultations or patients, minimal longitudinal data, and a lack of uniformity in cost analysis. As a result, the investigators concluded that "it is premature for any statements to be made, either positive or negative, regarding the cost-effectiveness of telemedicine in general" (Whitten et al. 2000).
Nitzen et al. (1997) attempted to ensure methodological rigor by establishing a gold standard, requiring that each patient be examined by multiple physicians, conducting the in-person and teleconsultations within a very short time span, conducting matched-pair analyses on all study data, and calculating kappa coefficients, both to compare their findings with other studies and to check their success in reducing bias in the study design.
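The kappa coefficient mentioned above measures chance-corrected agreement between two raters, such as an in-person examiner and a teleconsultant assigning diagnoses to the same patients. A minimal sketch of Cohen's kappa is shown below; the diagnosis labels and example data are hypothetical, chosen only to illustrate the calculation.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is agreement expected by chance alone."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of paired cases where the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical matched-pair diagnoses: in-person exam vs. teleconsultation.
in_person = ["otitis", "otitis", "normal", "normal", "eczema", "otitis"]
tele      = ["otitis", "normal", "normal", "normal", "eczema", "otitis"]
print(round(cohens_kappa(in_person, tele), 3))  # → 0.739
```

A kappa of 1.0 indicates perfect agreement, 0 indicates agreement no better than chance; values in the 0.6-0.8 range are conventionally read as substantial agreement.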
In response to inquiries about the need to improve the rigor of telemedicine evaluations, several of our expert interviewees acknowledged shortcomings but also noted that many technologies in widespread clinical use have not been subjected to high standards of evidence.
Based on our review of the literature, expert interviews, and site visits, we have organized prevailing evaluation methodology issues into the following categories:
- technological maturity;
- focus of evaluation;
- perspective of evaluation;
- comparator (control group/intervention);
- randomization; and
- time horizon (i.e., study duration or follow-up).