PERSPECTIVE
Using Evidence Reports: Progress And Challenges In Evidence-Based Decision Making
Mark Helfand
This Perspective describes the advantages of using systematic evidence reviews in preferred drug deliberations. By involving decisionmakers in defining the scope, evidence reports help focus deliberations on clinically important questions and reduce the likelihood of bias. They define the limits of the evidence and the magnitude of differences among compared drugs; however, decisionmakers must consider other factors and apply their values to their decisions in an explicit, defensible manner. GRADE, an international consortium of systematic-review developers and users, is seeking to improve the process of incorporating considerations other than the strength of evidence in making decisions that are evidence based.
As noted in the paper by Earl Steinberg and Bryan Luce, the term "evidence based" is being attached to everything: not only guidelines and coverage decisions, but conferences, journals, departments, and even training programs.1 No certificate or license is required to use the label. The term "evidence based" generated 2.4 million Google hits, versus only 1.6 million for "low carb."
Once "evidence based" became entrenched in the lexicon, a legion of critics emerged, blaming evidence-based medicine for all of the social ills it was intended to remedy.2 By contrast, Steinberg and Luce present a refreshingly focused critique, describing the most important limitations of systematic reviews for decision making. They argue that systematic reviews are useful and incorporate important principles of research but do not ensure good decision making.3 In this Perspective I describe how systematic reviewers and users are addressing the problems Steinberg and Luce highlight in their paper.
Systematic Reviews, Background Papers, And Evidence Reports
Before the term "evidence based" was coined, a few organizations pioneered the use of comprehensive literature reviews to make clinical policies. In the United States, these were the U.S. Preventive Services Task Force (USPSTF), convened in 1984, and the American College of Physicians Clinical Efficacy Assessment Program (CEAP), which dates from about the same time. These bodies called their reviews "background papers" or "technology assessment reports." Each report addressed several bodies of evidence, examined observational studies as well as trials, and was used to make real-life decisions. Consequently, their subcommittee members had to consider the relevance of the evidence to practice; how to address information gaps; and the need to consider the magnitude of benefits and harms, not just their frequency.4
Over many years the mature evidence-based decision-making programs, particularly the USPSTF and the U.K. National Institute for Clinical Excellence (NICE), have taken measures to improve the quality of their decision making. In 1998, for example, the USPSTF revised its quality ratings to include more detailed assessment of the characteristics of individual randomized controlled trials (RCTs) and observational studies. It also revised its system for grading recommendations to incorporate the magnitude of net benefit (benefits minus harms) instead of relying only on the quality of the evidence.5
What Evidence Reports Already Do
The Drug Effectiveness Review Project (DERP), described by Daniel Fox, incorporates methods used by the AHRQ-designated Evidence-based Practice Centers (EPCs), which were established in 1997.6 Like the old "background papers," EPC evidence reports are broader in scope than most systematic reviews and have several features designed to make them more useful to decisionmakers. To develop the questions a report will address, the EPC seeks input from experts, stakeholders, and patients to identify the patient populations, interventions, health outcomes, and harms. These parameters are summarized in an analytic framework, which makes clear the chain of logic underlying the case for the service and the separate bodies of evidence that will be examined to test the strength of this case.7 These innovations provided a framework to address many of the issues Steinberg and Luce raise.
In 2001 Oregon introduced evidence reports into state Medicaid decision making about preferred drugs. The immediate effect was to engage participants, and the public, in wrestling with questions such as, What do we mean by good evidence? How should we reconcile conflicting studies? When do we use our judgment to fill in the gaps? Participants became aware that these questions are important and that their deliberative processes were a new laboratory for trying to answer them.
By laying out the available studies and their strengths and flaws, the DERP evidence reports clarify for clinicians, pharmacists, and the public which assertions about drugs are based on evidence from clinical studies and which are not. Some participants are shocked to find that the evidence base for common practices is flimsier than they had thought. In the absence of an evidence report, studies may be introduced selectively into deliberations; for example, an advocate for one drug or another will introduce an abstract from a recent meeting purporting to show that the favored drug is the only one that, say, is eliminated quickly from the kidney in patients who are taking blood thinners. The details of the study that would enable one to judge its strength and relevance may never actually be discussed. The fact that there may be ten other studies of the same drug that came to a different conclusion is not brought up. The scientific issues, rate of elimination and interaction with blood thinners, may or may not be clinically relevant; it could be that differences in the rate of elimination make no difference in the drug's dose or safety. This short advertisement for this drug may divert the group's attention from issues that are more important to patients, which may, in fact, be the whole purpose of discussing this abstract in the first place.
The evidence reports help decisionmakers identify the important questions up front and remove bias in finding evidence about these questions. In Oregon, the first state to use systematic drug-class reviews, panels of citizens met to select the questions each review should address. The purpose was to ensure that the reviews targeted the decisions and outcome measures that mattered to physicians, pharmacists, and, especially, patients. This approach is labor-intensive and messier than letting the systematic reviewers choose the questions, but it helps ensure that the evidence report itself and the subsequent deliberations about the evidence stay focused on what matters to the people who have to live with the results.
The DERP is a relatively new project. The states that use DERP reports have improved the quality and completeness of debate and have made the presentation of evidence less biased, but they have not yet developed a full complement of rules to link the evidence to their decisions. Unlike the USPSTF, for example, the DERP does not use analytic frameworks. Unlike NICE, the DERP does not use decision analysis or cost-effectiveness analysis to integrate information and examine the consequences of uncertainty about many probabilities and outcomes simultaneously.8 Also unlike NICE, the DERP has not explicitly considered the role of observational studies in complementing what can be learned from efficacy trials.9 Steinberg and Luce enumerate issues that systematic reviewers and DERP participants will recognize as a "to do" list, because they encounter these issues in their own work.
Remove bias from judgments about "fatal flaws."
There is no consensus on how to determine whether a study's flaws are so egregious that the study should be excluded from further consideration. The empirical data about the impact of flaws in study design or execution are incomplete, which makes overzealous application of systems to rate internal validity hazardous. As Steinberg and Luce note, the first step taken by the Agency for Healthcare Research and Quality (AHRQ) to address the problem was to commission a report from the RTI International/University of North Carolina (UNC) Evidence-based Practice Center that summarized the strengths and weaknesses of existing systems.10 The only logical solution, funding and conducting better research to examine the consequences of study flaws, will take time. In the meantime, decisions about excluding studies because of poor quality should be as explicit as possible, making it possible for individual skeptics to assess for themselves whether the specific flaws cited by the reviewers justify the low rating.
Assess external validity.
In the context of group decision making, physicians and others find it hard to take account of conflicting evidence, uncertainty, information gaps, and relevance to "real" practice. Too often, for example, participants in an evidence-based decision-making process pay too little attention to the relevance (external validity) of efficacy studies. But is it really the fault of our reviews? Or is the problem with the decisionmakers themselves?
In fact, evidence reports bring out the "reality gap" between the actual evidence ("what there is") and the key questions the review sought to answer ("what clinicians and patients want and need to know"). Because of the growth of evidence-based medicine, research studies are now expected to provide detailed information about the methods used to recruit and screen potential subjects; eligibility criteria for the study and how their application affected the sample; and whether run-in periods or other techniques were used to weed out subjects who may be noncompliant, unlikely to respond, or susceptible to adverse effects from the intervention under study. Systematic reviewers recognize the need to emphasize effectiveness studies over efficacy studies and to highlight discrepancies between everyday practice and the conditions of efficacy trials.
For example, our systematic review of osteoporosis screening pointed out that the major trials of bisphosphonates excluded women who have dyspepsia or gastroesophageal reflux disease (GERD), a group that includes the majority of postmenopausal women to whom the trial results have been applied. Another review noted that some trials of drugs to delay symptoms of Alzheimer's disease measured scores on a five-item memory test but not whether the patient could find his or her own way home from a bus stop.
Despite the attention paid, it is not at all clear that decisionmakers will begin to value information about relevance or applicability as much as they should. Instead of placing the burden on systematic reviews to enable decisionmakers to judge the relevance of efficacy trials, the answer may be to increase the number of trials that are conducted in everyday practice settings, making relevant, practice-based effectiveness research the norm rather than the exception.
Set criteria for grading a body of evidence and an overall service.
Decisionmakers must make value judgments about the importance of differences among drugs in their efficacy and safety. When there are few good-quality studies, or when the studies do not measure some of the important outcomes, decisionmakers must consider the gaps in the evidence. They also must take into account whether gaps in the research information base overwhelm what can be discerned from published studies. As Steinberg and Luce note, evidence reports themselves do not provide sufficient guidance to make these calls. Systematic reviews define the limits of the evidence, clarifying when the assertions about the value of the intervention are based on strong evidence from clinical studies. They do not tell one what to do. In addition to evidence, decisions should be influenced by other factors, including consideration of justice and equity; the value society places on different health outcomes; and the potential for unrecognized harms.
The most notable effort under way is the Grades of Recommendations, Assessment, Development, and Evaluation (GRADE) consortium, an initiative to make it easier for users of systematic reviews to assess the judgments behind recommendations.11 The steps in the GRADE approach are to consider (1) the quality of evidence across studies for each important outcome; (2) which outcomes are critical to a decision; (3) the overall quality of evidence across these critical outcomes; (4) the balance between benefits and harms; and (5) the strength of recommendations.
For the past three years, GRADE participants have examined what factors should be considered at each of these steps. The result is a more systematic, comprehensive approach to making these judgments. For example, to assess the overall quality across studies for a particular outcome, the GRADE system considers the strength (magnitude) of the association, the type of studies, and whether they have serious flaws, as well as other factors that might lower quality, such as inconsistency of results within and between study types, directness, the probability of reporting bias, evidence of a dose-response gradient, and whether plausible confounders would tend to reduce or augment the observed effect. The last step, assigning a grade reflecting the strength of the recommendation, requires consideration of trade-offs, the quality of evidence for critical outcomes, translation of evidence into practice for a specified clinical application, and uncertainty about baseline risk. The strength of the GRADE approach is that for the first time, it makes explicit the role of important considerations in decision making other than the strength of the evidence and the magnitude of effect. GRADE also has the great advantage of being closely aligned with the Cochrane Collaboration: As GRADE identifies approaches that facilitate greater transparency in making judgments, it will encourage the Cochrane review groups to incorporate these ways of summarizing evidence. For example, GRADE is helping the collaboration design a template for a balance sheet to summarize benefits and harms that will be incorporated into Cochrane reviews.
Expected utility.
Steinberg and Luce also argue persuasively that the strength of evidence required for a positive recommendation "should depend on the consequences of drawing a wrong conclusion." This concept follows directly from expected utility theory, which underlies decision analysis and several other prescriptions for "rational" decision making.
Although I agree completely with Steinberg and Luce that the threshold for evidence should depend on the expected value of the benefits and of the harms, I do not think that evidence-based medicine will be the deciding factor in its wider adoption, because many physicians are simply reluctant to make decisions in this manner. For example, the USPSTF rejected this type of reasoning when it decided to recommend against taking vitamin supplements to reduce the risk of breast cancer. The evidence is from observational studies and translates, at best, to a 50 percent chance that vitamin A supplements reduce the risk of cancer by 30 percent and a 50 percent chance that there is no benefit at all. On the other hand, large trials provide very strong evidence that, in the doses provided, supplemental vitamin A is not harmful. Therefore, the expected net benefit of using supplemental vitamin A to prevent breast cancer must be positive: A 50 percent chance of a 30 percent reduction translates into an overall 15 percent reduction in breast cancer incidence, and the net benefit (benefits minus harms) is positive because we are certain there is no harm. But the task force members were reluctant to recommend a practice when there is a 50 percent chance that it will be overturned by new evidence, and they rejected this line of reasoning.
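The arithmetic behind this example can be written out as a short expected-value calculation. The sketch below is illustrative only: the function name is hypothetical, the probabilities and effect sizes are the figures quoted above, and setting harm to zero is the simplifying assumption stated in the text, not a method used by the USPSTF.

```python
def expected_relative_reduction(scenarios):
    """Return the probability-weighted average of relative risk reductions.

    scenarios: list of (probability, relative_reduction) pairs whose
    probabilities must sum to 1.
    """
    total_probability = sum(p for p, _ in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * reduction for p, reduction in scenarios)


# Figures from the text: a 50 percent chance of a 30 percent reduction
# and a 50 percent chance of no benefit at all.
benefit = expected_relative_reduction([(0.5, 0.30), (0.5, 0.0)])

# Assumption taken from the text: trials show no harm at the doses studied.
harm = 0.0

net_benefit = benefit - harm
print(f"expected reduction: {benefit:.0%}, net benefit: {net_benefit:.0%}")
# prints "expected reduction: 15%, net benefit: 15%"
```

The calculation shows why the expected net benefit is positive under these assumptions. It says nothing about the concern that actually drove the task force, namely the 50 percent chance that a recommendation would later be overturned.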
We are still a long way from perfection in our efforts to implement an explicit, defensible evidence-based decision-making process. One has to start somewhere, though, and systematic reviews are a great starting point. Inserting systematic reviews into a deliberative process calls immediate attention to the type, quality, and quantity of evidence supporting assertions about effectiveness and harms. They can improve the quality of dialog about clinical and policy interventions and force decisionmakers to confront tough questions about evidence and decisions. Their real future, though, can be achieved only by meeting the challenges these questions present and translating the answers into a broader, more robust practice of decision making. Inevitably, if states stay with systematic reviews, they will want to invest in the "R&D" of evidence-based decision making and develop consistent, defensible strategies for using the evidence.
Mark Helfand (helfand@ohsu.edu) is a staff physician at the Portland Veterans Affairs (VA) Medical Center and a professor of medicine and director of the Oregon Evidence-based Practice Center, Oregon Health and Science University, all in Portland.
The author acknowledges John Santa for helpful suggestions. No outside funding was used in the preparation of this paper.
1. E.P. Steinberg and B.R. Luce, "Evidence based? Caveat Emptor!" Health Affairs 24, no. 1 (2005): 80-92.
2. See, for example, W.A. Rogers, "Evidence based Medicine and Justice: A Framework for Looking at the Impact of EBM upon Vulnerable or Disadvantaged Groups," Journal of Medical Ethics 30, no. 2 (2004): 141-145.
3. D. Fox, "Evidence of Evidence-based Health Policy: The Politics of Systematic Reviews in Coverage Decisions," Health Affairs 24, no. 1 (2005): 114-122.
4. See J.R. Feussner and L.J. White, "The Clinical Efficacy Assessment Program of the American College of Physicians," Annals of the New York Academy of Science 703 (1993): 268-271, which describes the process in 1993.
5. See R. Harris et al., "Current Methods of the U.S. Preventive Services Task Force: A Review of the Process," American Journal of Preventive Medicine 20, no. 3 Supp. (2001): 21-34, available online at www.ahrq.gov/clinic/ajpmsuppl/harris1.htm (26 October 2004).
6. Fox, "Evidence of Evidence-based Health Policy."
7. Harris et al., "Current Methods."
8. J. Mason et al., "A Framework for Incorporating Cost-Effectiveness in Evidence-based Clinical Practice Guidelines," Health Policy 47, no. 1 (1999): 37-52.
9. M.F. Drummond, "Experimental versus Observational Data in the Economic Evaluation of Pharmaceuticals," Medical Decision Making 18, no. 2 Supp. (1998): S12-18.
10. S. West et al., Systems to Rate the Strength of Scientific Evidence, Evidence Report/Technology Assessment no. 47, prepared by the RTI International-University of North Carolina Evidence-based Practice Center under Contract no. 290-97-0011, Pub. no. 02-E016 (Rockville, Md.: Agency for Healthcare Research and Quality, April 2002).
11. See Grades of Recommendations, Assessment, Development, and Evaluation (GRADE) Working Group, "Grading Quality of Evidence and Strength of Recommendations," British Medical Journal 328, no. 7454 (2004): 1490-1494. For a separate grading initiative led by Family Practice journals, see M.H. Ebell et al., "Strength of Recommendation Taxonomy (SORT): A Patient-centered Approach to Grading Evidence in the Medical Literature," American Family Physician 69, no. 3 (2004): 548-556.
