Reporting Clinical Trial Results To Inform Providers, Payers, And Consumers
Results of randomized clinical trials are the preferred "evidence" for establishing the benefits and safety of medical treatments. We present evidence suggesting that the conventional approach to reporting clinical trials has fundamental flaws that can result in overlooking identifiable subgroups harmed by a treatment while underestimating benefits to others. A risk-stratified approach can dramatically reduce the chances of such errors. Since professional and economic incentives reward advocating treatments for as broad a patient population as possible, we suggest that payers and regulatory bodies might need to act to motivate prompt, routine adoption of risk-stratified assessments of medical treatments' safety and benefits.
Everything should be made as simple as possible, but not one bit simpler.
Clinicians, policymakers, and governmental regulatory bodies rely on the randomized controlled trial (RCT) as the preferred "evidence" for establishing the benefits and safety of medical treatments. Improving the quality of this evidence base has become a target of international health policy with the recent proposal for mandatory registration of clinical trials in an effort to eliminate post hoc decisions not to publish trial results when they reflect unfavorably on the treatment being studied.1 We present another problem with the base of evidence on the safety and benefits of medical treatments, along with a proposed solution.
The main results of clinical trials are presented as the average benefit across all people in the trial. To aid decisions in applying the trial results to individuals, researchers commonly conduct subgroup analyses to identify specific groups of patients who might benefit more or less than average. Such analyses typically compare groups that differ in a single attribute (such as age or sex), and such "one-variable-at-a-time" analyses often yield little useful information. Researchers have proposed that evaluating overall risk using multivariable prediction tools (hereafter referred to as "risk-stratified" analysis) could overcome some of the limitations inherent in conventional approaches to analyzing and reporting clinical trials.2 To examine this issue, we evaluated clinical trials using conventional analyses and compared these results with those obtained using risk-stratified analyses that examine benefits for patients who were at lower versus higher risk. We conclude that the conventional approach has fundamental flaws that can result in overlooking identifiable subgroups harmed by a treatment while greatly underestimating benefits to others.
Therefore, the "medical evidence" used to make critical policy decisions (for example, on drug safety, insurance benefit packages, and performance measures) might be systematically misleading or incomplete. We further conclude that although a risk-stratified approach could better elucidate how the safety and benefits of treatments vary across the population, it also runs counter to current professional and economic incentives to promote treatments to as broad a patient population as possible. Therefore, the adoption of risk-stratified assessments of the safety and benefits of medical treatments might require the active intervention of payers and regulatory bodies.
The average benefits observed in a clinical trial often do not reflect the benefits observed in all, or even most, patients in a clinical trial.3 For example, Exhibit 1
Further, when there is an appreciable risk of treatment-related adverse events, reporting only the average result of a clinical trial might obscure a group that is harmed by treatment.4 Consider a hypothetical treatment that decreases the baseline risk of patients suffering a bad outcome by 30 percent over five years, but at a price of two treatment-related severe adverse events every year for every thousand patients treated. In this instance, the average benefit (NNT = 125) greatly underestimates the benefit for high-risk patients (NNT = 29) but overestimates the benefit for the median patient (NNT = 200) (Exhibit 1).
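The arithmetic behind this hypothetical can be sketched in a few lines of Python. The 30 percent relative risk reduction, the five-year horizon, and the harm rate of two events per thousand patients per year come from the text. The baseline risks are not given in the text: the values of 5, 6, and 15 percent below are assumptions, back-solved so that the net number needed to treat (NNT) comes out close to the quoted 200, 125, and 29, and the 2 percent "low-risk" value is a further illustrative assumption. The sketch also assumes that the quoted NNTs are net of treatment-related harm.

```python
import math


def net_nnt(baseline_risk, relative_risk_reduction, harm_rate_per_year, years):
    """Number needed to treat over the horizon, net of treatment-related harm.

    Returns math.inf when expected harm equals or exceeds expected benefit,
    that is, when treatment does no net good for patients at this risk level.
    """
    benefit = baseline_risk * relative_risk_reduction  # bad outcomes prevented
    harm = harm_rate_per_year * years                  # severe adverse events caused
    net = benefit - harm
    if net <= 0:
        return math.inf
    return 1.0 / net


# Values stated in the text.
RRR = 0.30                # 30 percent relative risk reduction over five years
HARM_PER_YEAR = 2 / 1000  # two severe adverse events per 1,000 treated per year
YEARS = 5

# ASSUMPTION: five-year baseline risks are not given in the text. These values
# are back-solved (or, for the low-risk patient, simply chosen) for illustration.
ASSUMED_BASELINE_RISK = {
    "low-risk patient": 0.02,
    "median patient": 0.05,
    "trial average": 0.06,
    "high-risk patient": 0.15,
}

for label, risk in ASSUMED_BASELINE_RISK.items():
    nnt = net_nnt(risk, RRR, HARM_PER_YEAR, YEARS)
    shown = "net harm" if math.isinf(nnt) else f"NNT = {round(nnt)}"
    print(f"{label} (baseline risk {risk:.0%}): {shown}")
```

Under these assumptions, any patient whose five-year baseline risk is below roughly 3.3 percent (harm of 0.01 divided by the relative risk reduction of 0.30) is expected to be harmed on net, even though the trial average looks favorable.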
Heterogeneity of baseline risk (and benefits), like that demonstrated in Exhibit 1
Theory and calculations. When situations such as that shown in Exhibit 1
Exhibit 2
Real-world examples from the medical literature.
Having demonstrated the statistical utility of this approach, let us ask whether it really matters in some important clinical trials. The original analysis of the European Carotid Surgery Trial (ECST) found that carotid endarterectomy (CEA), a surgical procedure designed to relieve blockages of arteries in the neck, reduced the absolute risk of major stroke or death by almost 12 percent (NNT = 9). Conventional "one-variable-at-a-time" subgroup analysis failed to identify any patient subgroup that would not benefit from this surgical procedure (in agreement with a previous study), so the authors concluded that CEA should be recommended "for most patients with a recent nondisabling carotid TIA when the symptomatic stenosis is greater than 80%."7 However, in a landmark study, Peter Rothwell and Charles Warlow reanalyzed the ECST using a risk prediction tool. Upon reanalysis, patients with a higher risk score (baseline five-year stroke risk = 40%) received dramatic benefits from surgery (NNT = 3), but the typical patient in the study (baseline stroke risk
The Global Utilization of Streptokinase and Tissue Plasminogen Activator for Occluded Coronary Arteries (GUSTO) trial presents another dramatic example. GUSTO found a significant decrease in mortality for acute heart attack patients who were treated with a new "clot-busting" medication (a thrombolytic called tPA) compared with those treated with an older, less expensive "clot-buster" (streptokinase).
However, risk-stratified analyses of GUSTO found dramatic variations in benefit that once again could not be identified using conventional subgroup analysis.9 For example, David Kent and colleagues divided patients based on an externally validated risk/benefit stratification model that predicted (1) risk of death due to heart attack, (2) risk for brain hemorrhage (a known complication of clot-busters), and (3) relative benefit from treatment (determined by time from symptom onset to time of clot-buster administration).10 They found that 25 percent of GUSTO subjects accounted for more than 60 percent of the total benefit. However, they found that half of the GUSTO population received little to no net benefit from tPA, and they also identified a group in which the risk of tPA-related brain hemorrhage exceeded tPA's benefits.
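A minimal sketch of how such a risk-stratified summary can be tabulated is shown below. It is not the method used in the GUSTO reanalysis, and it uses no trial data: the four risk quartiles and their event counts are invented for illustration, and the names `Stratum`, `summarize`, and `pooled_arr` are hypothetical. The sketch assumes that patients have already been assigned to strata by a prespecified, externally validated prediction tool, and it reports the absolute risk reduction, the relative risk reduction, and each stratum's share of the net events prevented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """Outcome counts for one predicted-risk stratum of a two-arm trial."""
    label: str
    n_control: int
    events_control: int
    n_treated: int
    events_treated: int

    @property
    def control_risk(self) -> float:
        return self.events_control / self.n_control

    @property
    def treated_risk(self) -> float:
        return self.events_treated / self.n_treated

    @property
    def arr(self) -> float:
        """Absolute risk reduction; negative values mean net harm."""
        return self.control_risk - self.treated_risk


def summarize(strata):
    """Per-stratum ARR, RRR, and share of the net events prevented by treatment."""
    prevented = [s.arr * s.n_treated for s in strata]
    total = sum(prevented)
    rows = []
    for s, p in zip(strata, prevented):
        rows.append({
            "label": s.label,
            "control_risk": s.control_risk,
            "arr": s.arr,
            "rrr": s.arr / s.control_risk,
            # A harmed stratum contributes a negative share.
            "share_of_benefit": p / total if total > 0 else float("nan"),
        })
    return rows


def pooled_arr(strata):
    """The conventional 'average' result: ARR with all strata pooled."""
    control = sum(s.events_control for s in strata) / sum(s.n_control for s in strata)
    treated = sum(s.events_treated for s in strata) / sum(s.n_treated for s in strata)
    return control - treated


# ASSUMPTION: invented counts for illustration only, not data from any trial.
HYPOTHETICAL_STRATA = [
    Stratum("Q1 (lowest risk)", 1000, 10, 1000, 12),
    Stratum("Q2", 1000, 30, 1000, 27),
    Stratum("Q3", 1000, 60, 1000, 50),
    Stratum("Q4 (highest risk)", 1000, 200, 1000, 170),
]

print(f"Pooled ARR: {pooled_arr(HYPOTHETICAL_STRATA):.4f}")
for row in summarize(HYPOTHETICAL_STRATA):
    print(f"{row['label']}: ARR = {row['arr']:+.3f}, "
          f"share of net benefit = {row['share_of_benefit']:+.0%}")
```

In this invented example the pooled result is favorable, yet the highest-risk quarter of patients accounts for most of the net benefit and the lowest-risk quarter is harmed, which is the kind of pattern that a single average hides.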
Yet another example, similar to the "negative" trial presented in Exhibit 2
In each of the cases above, conventional subgroup analysis was unable to accurately detect variations in benefit and safety that were clinically important and readily identifiable using a risk-stratified approach. In the first two examples, interventions were promoted for people who receive little or no benefit, and potential harms to identifiable low-risk subgroups were ignored, while in the latter case, the potential benefit in a sizable patient subgroup was completely missed. These examples are particularly compelling in that they relate to treatments that are expensive and commonly used. It is noteworthy, therefore, that despite these important findings, including uncovering safety problems that were not identified in conventional subgroup analysis, subsequent clinical trials in these clinical areas (and there have been many) have generally not reported risk-stratified analysis.
Current approach to reporting RCT results. We reviewed clinical trials published in the Journal of the American Medical Association, the Lancet, or the New England Journal of Medicine during 2001 and identified 108 clinical trials that reported results on major patient outcomes, such as mortality or major morbidity.12 Of the 108 eligible trials, 42 (39 percent) reported no subgroup analysis. Nearly all subgroup analyses reported on single patient attributes in isolation. Only four studies (4 percent) reported treatment benefit for lower- versus higher-risk patients, and only one of these studies used a robust statistical method.
It is well recognized that investigators almost always have a perceptual bias toward viewing their results positively. Strong professional, political, and financial incentives often amplify this predisposition. This study addresses a variant of this bias: the desire to promote a beneficial treatment for use in as many people as possible. If we as a society are to make the best use of our health care dollars, we need to know who truly benefits from increasingly costly interventions. The current conventions for analyzing and reporting the results of clinical trials fail to provide policymakers with essential information for making such decisions. However, the evidence base for informing providers, payers, and consumers could be dramatically improved by one simple addition to conventional reporting of clinical trials: Whenever a multivariable prediction tool is available, the observed relative and absolute risk reduction for subjects with higher versus lower predicted net benefit should be reported using risk-stratified analysis. When using a validated prediction tool and robust statistical methods, this approach can represent a single a priori statistical comparison, thereby avoiding the high risk of false-positive and false-negative results inherent in multiple "one-variable-at-a-time" subgroup analyses. Even for small studies that have marginal statistical power, risk-stratified analysis will still be valuable for comparing results between different studies or conducting meta-analyses. When possible, prediction tools should be externally developed and validated and should be part of the prespecified a priori analysis plan. Certainly no analytic technique can fully account for all important factors (such as basic design or sample-size limitations); however, our results clearly suggest that risk-stratified analysis can detect safety problems and identify high-benefit subgroups that cannot be detected by conventional methods.
Proposals for change. Given the bias against publishing negative results, investigators might be more likely to conduct robust risk-stratified analysis when the overall study results show no benefit (a negative trial) than when the trial's average result is positive. For example, we found only one clinical trial published in 2001 that used an analytic approach similar to what we propose. This study, which examined a treatment for unstable coronary syndrome, used a multivariable risk-stratified analysis and found that low-risk patients did not receive substantial benefit from treatment.13 It is therefore interesting that subsequent positive clinical trials examining treatments for acute coronary syndromes (including a study by the same investigator published in the same journal three years later) have not used this risk-stratified approach.14 Given current incentives, we think it unlikely that most researchers will voluntarily conduct and report analyses evaluating whether low-risk subgroups do not benefit from a treatment. Journals also have an understandable bias toward reporting more positive and easy-to-understand results, which might in part explain why editorial boards have not required risk-stratified analysis. Pressure from organizations representing purchasers' and consumers' interests (such as the Leapfrog Group, National Committee for Quality Assurance, and others) and those setting guidelines for clinical trial reporting (such as the Consolidated Standards for Reporting of Trials, or CONSORT) could help advocate for more complete reporting of medical evidence.15 Given the obvious economic incentives for industry (and researchers with strong financial connections to industry) to get treatments approved for as broad a population as possible, regulatory agencies such as the U.S. Food and Drug Administration (FDA) and the U.K.
National Institute for Clinical Excellence (NICE) should consider requiring risk-stratified analysis, since, as discussed above, safety problems in identifiable subgroups can be missed if we continue to rely on conventional reporting. There is at least one precedent for having the FDA require risk-stratified analysis. In 2001 a clinical trial demonstrated the efficacy of a new and expensive treatment (drotrecogin; about $10,000 to $16,000 per patient) for people with severe life-threatening infections (sepsis). However, the FDA advisory board noted that a measure of disease severity (Acute Physiology and Chronic Health Evaluation, or APACHE, score) was collected in this study but not considered in the published analyses. When a risk-stratified analysis was later required, it was found that the 50 percent of patients with lower mortality risk (APACHE II scores less than 25) did not benefit from treatment (relative risk = 0.99 [0.75, 1.30]). As a result, this treatment was approved for use only in those with higher APACHE scores.16 Criticisms of this decision often focused on the post hoc nature of this analysis, whereas it could be more appropriate to wonder why the most logical and important subanalysis was not part of the a priori analysis plan.
Caveats and future study. Although a multivariable approach is an important advance in reporting medical evidence, it has limitations. Individual risk factors can have complex and linked effects on both the benefits and risks of treatment, which calls for great care in model development and validation.17 One particularly challenging question is how best to coordinate validating and updating population-specific prediction tools and facilitating their optimal use in day-to-day clinical practice. However, better information technology, especially Internet and handheld device applications, could greatly aid this effort.
In addition, clinical practice and informed patient decision making can be greatly improved by even a qualitative understanding, without any mathematical calculations, that benefit is highly dependent on baseline risk. Still, reality can present unwanted complexity. It can be far easier to deal with simple averages and artificial dichotomies. Thus, we predict that the most difficult challenge for risk-stratified analysis will come from the ways in which this approach will inevitably make decision making more challenging and nuanced. Risk-stratified analysis will make explicit that the amount of expected benefit for individuals almost always exists along a continuum, with some people residing in a range in which statistical certainty ranges from "no benefit" to "moderate benefit." The false dichotomies of the current paradigm might be erroneous, but they are also often much more congenial for provider, patient, and policymaker alike, since they are congruent with binary decision making (to treat or not to treat). Therefore, risk-stratified reporting could be more accurate in estimating an individual's risks and benefits of treatment, but it also runs the risk of inducing policy paralysis by illuminating the arbitrary nature of any prespecified treatment threshold. We propose that instead of retreating to the comfort of false dichotomies, we consider intermediate steps for our policy decisions. For example, instead of deciding whether coverage of a treatment should be zero or 100 percent, we could adjust copayments based on the amount of expected benefit (instead of basing copayments on the cost of treatment alone).18 Similarly, instead of considering performance measures as met or not met, we might need to consider the degree of importance of the deviation from recommended care.19 Certainly, allowance for patients' preferences should become even more important when there is greater uncertainty regarding the likely risks and benefits of treatment.
Understanding how treatment benefits vary between lower- versus higher-risk patients is fundamental to optimal policy decision making, but adoption of this approach will often run counter to the incentives faced by those funding, conducting, and reporting clinical trials. The public may best be served by proactive policies advancing risk-stratified analysis rather than simply expecting researchers to adopt this approach voluntarily. Regulatory bodies and payers have a particularly strong interest in advancing risk-stratified assessments of medical evidence, to decrease the chances that incomplete analyses mislead us into extending expensive and burdensome treatments to those who may derive no benefit or, worse, harm.
Rodney Hayward (rhayward{at}umich.edu) is director of the Department of Veterans Affairs (VA) Center for Practice Management and Outcomes Research, VA Ann Arbor Healthcare System, and a professor of medicine and public health at the University of Michigan. David Kent is an assistant professor of medicine at Tufts University School of Medicine and a clinical investigator at the Institute for Clinical Research and Health Policy Studies at Tufts-New England Medical Center in Boston. Sandeep Vijan is a research scientist at the VA Ann Arbor Healthcare System and an assistant professor in the Department of Internal Medicine, University of Michigan School of Medicine. Timothy Hofer is a research scientist at the VA Ann Arbor Healthcare System and an associate professor in the Department of Internal Medicine, University of Michigan School of Medicine. The authors thank Adam Tremblay for conducting the literature review and Joel Howell, Joy Pritts, and two anonymous reviewers for their comments on earlier drafts. This work was supported in part by Veterans Affairs (VA) Research and Development; the VA Health Services Research and Development Service (Grant no. QUERI DIB 98-001); and the VA Cooperative Studies Program (Grant no. CSP #465), with additional support provided by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health (Grant no. P60 DK-20572). David Kent is supported by a Career Development Award from the National Institute for Neurological Disorders and Stroke (Grant no. K23 NS44929-01). The above views and opinions are those of the authors and do not necessarily reflect those of the U.S. Department of Veterans Affairs or the University of Michigan.