Health Affairs, 24, no. 1 (2005): 80-92
doi: 10.1377/hlthaff.24.1.80
© 2005 by Project HOPE
 
Evaluating Evidence

Evidence Based? Caveat Emptor!

Earl P. Steinberg and Bryan R. Luce

   Abstract
 
Medical practices, clinical practice guidelines, clinical performance measures and measurements, and a variety of health care–related administrative decisions, such as insurance coverage decisions, are claiming to be "evidence based" with increasing frequency. In this paper we examine the "evidence based" label; discuss how evidence ought to have been assembled, evaluated, and synthesized; and when evidence is sufficient for the "evidence-based" moniker to rightfully apply. We also highlight several considerations other than the strength of evidence that are relevant to several common types of health care–related administrative decisions and that influence the extent to which the resulting decisions are truly evidence based.


If you are doing almost anything related to health care today, being "evidence based" is de rigueur. Even when it is not obligatory to do so, claiming to be "evidence-based" conveys a measure of credibility nowadays that is valuable to have. Thus, it is useful to consider what the term really means in the health care context.

The factors driving interest in evidence-based decision making are essentially the same factors that earlier drove interest in technology assessment and outcomes research. These include (1) recognition that there is much geographic variation in the frequency with which medical and surgical procedures are performed, the way in which patients with a given disease are managed, patient outcomes, and the costs of care, which cannot be explained by differences in patients’ demographic or clinical characteristics; (2) strong evidence that much of the care that is being provided is inappropriate (that is, likely to provide no benefit or to cause more harm than good); (3) indications that many patients are not receiving beneficial services; and (4) continuously rising health care costs. Because the label "evidence based," like the label "low carb," is nearly ubiquitous, it is important to examine it more closely. In this paper we review the methods that should be employed to rate strength of evidence, and we describe the differences involved in evaluating the strength of evidence that emerges from a single study versus a body of evidence, as well as the additional issues that need to be considered when rating the evidence underlying a clinical practice guideline or performance measure. We then describe several considerations other than strength of evidence that may be pertinent to particular health care–related decisions. We conclude by discussing four types of health care–related decisions and the extent to which they deserve to be considered "evidence based."

   Brief Historical Perspective
 
The term "evidence-based medicine" seems to have first appeared in print in the early 1990s.1 Although described as a "paradigm shift," it was really the next stage in the evolution of a focus on critical appraisal of available evidence that had been in progress for years. From 1960, when randomized controlled trials (RCTs) became increasingly common, until the mid-1980s, bodies of evidence were summarized in review articles. These were written by clinical experts, lacked a formal critical appraisal of study design methods, and provided a more qualitative than quantitative synthesis of available evidence. Practicing physicians’ care patterns were based largely on their medical training, local custom and opinion, and their own clinical experience. This period is sometimes referred to as the period of "eminence-based medicine."

From 1970 through the mid-1980s there was an increasing focus on "technology assessment." During this time, the congressional Office of Technology Assessment and Institute of Medicine emphasized the need for well-designed studies to evaluate technologies meaningfully.2 Very few clinical practice guidelines were published during this period. Beginning in the mid-1980s, evaluative research then moved toward "outcomes research," reflecting an increased tendency to compare strategies for managing clinical problems rather than individual technologies, and a focus on functional status and patient satisfaction in addition to clinical outcomes. Interest in "real-world outcomes" also grew. Stimulated by concerns that payers’ efforts to cut costs would "cut muscle along with fat," professional medical and surgical societies began to issue clinical practice guidelines in an effort to preempt nonclinicians from dictating how medicine would be practiced. Early guidelines tended to be strongly influenced by expert opinion. During the past ten years, methods for identification, critical appraisal and synthesis of published evidence have become more formal, rigorous, quantitative, and sophisticated.3

   How Should The Strength Of Evidence Be Rated?
 
Much progress has been made during the past two decades in the development of sound methods for rating the strength of evidence derived from an individual study, implicit in a body of evidence composed of the results of many studies, and underlying a clinical practice guideline or standard. Although certain methods are shared in each of these three types of activities, additional methodological issues arise as one moves from assessing an individual study, to a body of evidence, to a guideline or standard.

Evaluation of an individual study. Unfortunately, the fact that a report regarding a scientific study has been published in a peer-reviewed journal does not guarantee that the study design was methodologically sound; that the study was well-conducted, even if the study design was methodologically sound; that the analysis of study data was performed correctly; or that the study results were interpreted properly. As a result, before deciding how much credence to give to any study’s findings, it is necessary to critically evaluate the "quality" of the study. Kathleen Lohr and Timothy Carey define study quality as "the extent to which all aspects of a study’s design and conduct can be shown to protect against systematic bias, non-systematic bias, and inferential error."4

The RTI International–University of North Carolina Evidence-based Practice Center (EPC) recently completed a comprehensive review of approaches that have been employed to rate the quality of evidence reported in individual studies.5 The EPC’s findings were striking: 121 different approaches for rating the quality of an individual study were identified, but only 19 of them met the EPC’s a priori standards for such assessments.6

Because different types of study designs have different degrees of susceptibility to bias that can threaten studies’ validity, one of the most important considerations when evaluating the quality of an individual study is the particular study design employed. There is general agreement that the susceptibility to bias is lowest in a well-designed and well-executed RCT, and increases in the following order of other types of study designs: nonrandomized controlled trials, prospective or retrospective cohort studies, cross-sectional studies, case-control studies, case series and registries, and case reports.7

That said, one cannot determine the quality of an individual study solely on the basis of the type of study design. A poorly designed study that is high in the preceding hierarchy may well be more susceptible to bias than a well-designed study that is lower. One of the authors (Steinberg) and his colleagues, for example, assigned a "methods score" to each of 857 articles related to management of end-stage renal disease based on a critical appraisal of twenty-four aspects of the methods employed in each study.8 The same group also assigned a subjective global rating of methodological quality (excellent, very good, good, fair, poor) to each study. They then examined the statistical association between the type of design employed in the study and each of these two measures of study quality. Although both the average quantitative methods score and the average subjective global rating of study quality decreased monotonically as one descended the hierarchy of study designs described above, there was a broad range of methods scores within any individual category of study-design types, with considerable overlap in scores across study-design categories. A detailed discussion of the many aspects of study design that need to be considered to meaningfully assess the strength of evidence that emerges from a study is beyond the scope of this paper; it is available in clinical epidemiology textbooks and elsewhere.9

One point that should be made here, however, relates to the relative strength of direct versus indirect evidence. Evidence is said to be "direct" if both the use of the treatment and the occurrence of the outcomes are observed in the same study. In contrast, evidence is said to be "indirect" if two or more bodies of evidence are required to relate the use of the treatment to the occurrence of health outcomes.10 Direct evidence is preferred because it does not require any assumptions about the integrity of the links needed to connect the different bodies of evidence. Many, if not most, evidence-based clinical practice guidelines are based on indirect evidence because they rely on a chain of reasoning that is based on several distinct bodies of evidence. We return to this point later.

Evaluation of a body of evidence. When evaluating a health care intervention, one would always like to be able to review the results of more than one study. Obvious benefits of having multiple studies include increased sample size and an increased likelihood of determining whether the results reported in any given study were attributable to chance. It is more difficult to rate the quality of a body of evidence related to a health care intervention than to evaluate the quality of an individual study, however. To rate the strength of evidence that emerges from a group of studies, one has to not only identify all relevant studies and evaluate the quality of each individual study, but also assess the consistency of study results and the heterogeneity of key elements of study design to determine the comparability of studies.

The RTI-UNC EPC also recently completed a comprehensive review of approaches used to grade the strength of evidence that emerges from the entire body of research on a particular topic.11 In this case, the EPC identified forty approaches for rating the strength of an overall body of evidence, only eight of which met standards that the EPC had set for those types of evaluations.12 Key considerations when evaluating a body of evidence include whether the approach used to identify potentially pertinent literature was comprehensive and unbiased, and whether bias was avoided in evaluating, synthesizing, and interpreting available evidence.

Meta-analysis and other statistical approaches that can be used to develop a single estimate of the effectiveness or safety of a particular intervention, or both, have become more sophisticated during the past decade. Approaches for rating the strength of a body of evidence also have become richer, based on consideration of the quality (internal validity of each study), quantity (for example, the number of studies or aggregate sample size), consistency (the extent to which similar findings are reported in studies that employed similar or different study designs), and coherence of evidence (do the findings make sense as a whole?).13 Even so, a synthesis of the results of multiple studies often involves not only quantitative analysis but also substantial subjectivity in the form of judgments regarding the admissibility (that is, is a particular study fatally flawed, and hence to be ignored?), relevance, or importance of individual pieces of evidence.
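
To make the idea of a "single estimate" concrete, the following is a minimal sketch of one common textbook approach, inverse-variance (fixed-effect) pooling. It is offered only as an illustration; no particular method is prescribed here, and the function name and the three study results are hypothetical. Note that the calculation addresses only the quantitative step. The judgments about admissibility, relevance, and importance discussed above determine which studies enter the list in the first place.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Return the inverse-variance weighted pooled estimate and its standard error.

    Assumes every study estimates the same underlying effect (fixed-effect model)
    and that each estimate comes with a valid standard error.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one positive standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # More precise studies (smaller standard error) receive more weight.
    weights = [1.0 / (se ** 2) for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates (for example, log odds ratios) from three studies.
estimates = [-0.30, -0.10, -0.20]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the "increased sample size" benefit in numerical form. A fixed-effect model is a strong assumption when study designs or populations are heterogeneous; a random-effects model would be the usual alternative in that case.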

Development of an evidence-based clinical practice guideline. Evaluating the strength of evidence underlying a clinical practice guideline is even more difficult and complicated than assessing the quality or strength of a body of evidence. One reason for this is that evaluation of the strength of evidence underlying a practice guideline typically requires evaluation of several bodies of evidence, each of which relates to a different "link" in a chain of reasoning that, in total, underlies the practice guideline. In this sense, many clinical practice guidelines are based on chains of direct evidence that, in aggregate, constitute indirect evidence. A second reason is that when there is such a chain of reasoning underlying a guideline, one needs a heuristic to produce a rating of the strength of evidence underlying all of the links in the chain of reasoning in combination.

One could employ many potential approaches to rate the quality of evidence underlying a clinical practice guideline that is based on a chain of reasoning.14 For example, one could use the lowest grade of evidence assigned to any of the links in the chain of reasoning; use the mean or median grade of evidence assigned to each of the links in the chain of evidence; decide what constitutes the most important link in the chain of reasoning and base the strength of evidence underlying the guideline on the strength of evidence underlying that link in the chain; or calculate a weighted average of the grades of evidence assigned to each link in the chain, where the weights are based either on the relative importance of, or the number of patients enrolled in studies related to, each link in the chain of reasoning. Thus, rating the strength of evidence underlying a particular clinical practice guideline is a more complicated exercise than many people realize. These considerations also highlight the fact that it is difficult to produce clinical practice guidelines that are completely evidence based. In fact, opinion often fills in gaps in the evidence base related to a chain of reasoning that underlies a clinical guideline.
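
The candidate heuristics listed above can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended scoring system: the numeric grading scale (higher means stronger evidence), the function names, the example chain, and the importance weights are all hypothetical.

```python
from statistics import mean, median


def weakest_link(grades):
    """Rate the chain by the lowest grade assigned to any link."""
    return min(grades)


def average_grade(grades, use_median=False):
    """Rate the chain by the mean (or median) grade across links."""
    return median(grades) if use_median else mean(grades)


def key_link(grades, key_index):
    """Rate the chain by the grade of the link judged most important."""
    return grades[key_index]


def weighted_grade(grades, weights):
    """Rate the chain by a weighted average of link grades.

    Weights might reflect relative importance or the number of patients
    enrolled in the studies behind each link.
    """
    if len(grades) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per grade, with a positive total")
    return sum(g * w for g, w in zip(grades, weights)) / sum(weights)


# Hypothetical chain of three links graded on an assumed 1-4 scale
# (1 = expert opinion only, 4 = strong trial evidence).
chain = [4, 3, 1]
importance = [1, 1, 2]  # assumed: the opinion-based link matters most

print("lowest grade:     ", weakest_link(chain))
print("mean grade:       ", round(average_grade(chain), 2))
print("median grade:     ", average_grade(chain, use_median=True))
print("key link (first): ", key_link(chain, 0))
print("weighted average: ", weighted_grade(chain, importance))
```

The same chain receives a different rating under each heuristic, which is why the choice of heuristic is itself a judgment call.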

A good example of a chain of reasoning consisting of links based on various strengths of evidence and a link based in part on opinion is a clinical practice guideline regarding reuse of hemodialyzers that was issued by the National Kidney Foundation (NKF).15 Some dialysis centers reuse hemodialyzers multiple times, which reduces cost but also can reduce the effectiveness and safety of dialysis. After reviewing all available evidence, an expert panel appointed by the NKF concluded that dialyzers should only be reused if they scored high enough on a particular test related to their structure. There was evidence regarding the best test to employ as well as how that test should be performed, but a recommendation regarding the cutoff value below which the dialyzer should no longer be used was based in part on evidence and in part on opinion. Should such a guideline be labeled as being based on evidence or opinion? If the strength of evidence regarding the best test to employ, compared with how that test should be performed, differs, how should the strength of evidence underlying such a guideline be rated?

   Important Considerations Other Than Strength Of Evidence
 
In addition to the quality or strength of evidence, which tends to be based on assessments of the internal validity of available studies, other issues should be considered when clinical management or policy decisions need to be made.

The first is that the absence of evidence regarding the effectiveness or safety of a particular health care intervention does not mean that the intervention is not safe or effective. Unfortunately, because many medical practices have not been rigorously evaluated, we do not really know what their impacts on effectiveness and safety are. Michael Millenson, citing work by John Williamson, claimed that more than half of all medical treatments, and perhaps as many as 85 percent, have never been validated by clinical trials.16 According to an expert committee of the Institute of Medicine (IOM), only about 4 percent of all services have strong strength of evidence, and more than half have very weak or no evidence.17

A second important consideration is that a rating of the strength of evidence regarding the effectiveness or safety of a particular health care intervention does not in and of itself provide any insight into the magnitude (or importance) of the effectiveness or safety of that technology. For example, there could be strong evidence that a particular health care intervention has a very large impact, or strong evidence that it has a very small impact, on patient outcome. Similarly, although the apparent impact of a technology on patient outcome might be large, the strength of evidence regarding that impact could be strong or weak.

A third important consideration is the relevance of available evidence to either the particular patient or patient population to be cared for, or the particular policy decision that needs to be made. In comparison to the (justifiably) tremendous importance that has been placed on the internal validity of individual studies (that is, the validity of the findings for the population and settings that were studied), it is striking how little importance has been placed on evaluating or considering the external validity of individual studies or an entire body of evidence (that is, the relevance or generalizability of the findings for populations or care settings other than those that were studied). This disparity is illogical. If one were confronted with a randomized or nonrandomized study that showed that a group of patients who received a particular health care intervention had clinically and statistically better outcomes than a group of patients who received no intervention or a placebo, but the two groups of patients were quite different in terms of age, sex, and clinical characteristics, one would not consider that evidence to be particularly strong or high quality. One would consider the evidence to be even weaker or more suspect if, in addition, there were important differences between the physicians who cared for the patients in the two groups. Somehow, however, the same individuals who would dismiss such a study as being seriously if not fatally flawed are willing to apply, or often don’t warn the intended audience about applying, the results of a well-designed study of an intervention evaluated in a narrowly defined group of patients to a much broader group of patients who differ from the study group in clinically meaningful ways.

To determine the external validity (generalizability) of a study’s findings, one needs to consider whether (1) the patients enrolled in the study are similar in terms of demographic (age, sex, race) and clinical characteristics (severity of primary disease, number and types of comorbidities) to those to whom the health care intervention might be applied, and (2) the real-life setting approximates that tested in the research setting (for example, the skill and experience of the practitioner and the quality of nursing and availability of specialized support services).

The importance of these two considerations is reflected in the distinction made between "efficacy" and "effectiveness." A health care intervention is considered to be "efficacious" when there is evidence that the intervention is beneficial when administered by experts in a research setting. Such evidence is typically derived from a controlled study in which a narrowly defined population has been enrolled. In contrast, a health care intervention is considered to be "effective" when there is evidence that the technology is beneficial when it is administered by a representative sample of physicians in routine practice settings to the full spectrum of patients to whom the technology is likely to be provided in real life. André Knottnerus and Geert Jan Dinant have cleverly referred to this conflict between internal and external validity as the need for "medicine-based evidence" as opposed to "evidence-based medicine."18 Lohr and colleagues have highlighted the importance of the same distinction in the realm of health care–related policy decision making: "The fundamental basic science imperative may be to generate information about the efficacy of health care interventions, but the practical realities of policy and economic decisions call for knowledge about effectiveness."19 In our view, a much higher priority needs to be placed on performance of Phase IV (postmarketing) studies of the safety and effectiveness of Food and Drug Administration (FDA)–approved technologies for both labeled and off-label indications.20

The importance of monitoring the impacts of health care interventions as they are used in real life was highlighted recently by a study that demonstrated that the impacts of a particular treatment were quite different (and sometimes lethal) when it was used in patients who differed in clinically meaningful ways from those enrolled in the clinical trials upon which a practice recommendation was based.21 The authors of the editorial that accompanied this study urged that "every effort should be made to define the inclusion criteria for clinical trials as broadly, and the exclusion criteria as narrowly, as possible, so that the findings are relevant to the greatest proportion of patients in clinical practice."22 Just as important, investigators should make a point of highlighting particular patient populations to whom it might be hazardous to apply a study’s findings.

A fourth important consideration is that a judgment regarding how strong the evidence should be when making a particular type of decision should depend on the consequences of drawing a wrong conclusion. One’s willingness to act on comparatively weak evidence may logically be influenced by the anticipated nature and magnitude of the impact of a particular intervention on patient outcome.

There are two types of errors that one can make when drawing conclusions from a study: (1) concluding that there is a difference between two alternatives when, in fact, an observed difference is attributable to chance (Type I error), and (2) concluding there is no difference between two alternatives when, in fact, there is a difference (Type II error). The consequences of these types of errors depend on the types of outcomes that may be affected by the intervention (for example, survival, major disability, or minor symptoms), the potential magnitude of the difference in impact of the alternative interventions on those types of outcomes (effect size), and the patient’s own preferences or value judgments with regard to tradeoffs between potential outcomes (for example, quantity versus quality of life).

The final relevant considerations relate to cost—the cost of intervening, the magnitude of the potential improvement in health outcome relative to the cost of the intervention (cost-effectiveness), and who is paying for the intervention.

   Types Of Decisions
 
One of the reasons that real-world decisions regarding health care are often so difficult to make is that all of the considerations we have just discussed apply: the potential nature and magnitude of the impact of a health care intervention; the best estimate of the probability that those impacts will occur; the uncertainty surrounding that estimate, as reflected in the strength of available evidence regarding safety and efficacy; the uncertainty surrounding that estimate as a result of uncertainty regarding the generalizability of study findings to other patient populations or care settings (efficacy versus effectiveness); the fact that data from studies regarding an intervention’s impacts are not always available; the consequences of being wrong from different people’s or entities’ perspectives; and costs in light of limited resources and potential alternative uses of them. To illustrate how these considerations may take on varying degrees of importance in different decision-making contexts, and the extent to which resulting decisions should be considered "evidence based," we discuss four types of health care decisions/judgments that policymakers and society likely assume are based on evidence.

FDA approval for marketing. For a company to market a drug or device, current law requires the company to demonstrate to the FDA’s satisfaction that the product is "safe and effective." In reality, the FDA requires companies to prove that their products are efficacious and improve net health outcome (that is, their benefits outweigh their harms) when used as described on the product’s label.

The FDA’s focus on efficacy rather than effectiveness is pragmatic. To develop evidence regarding whether a drug or a therapeutic device has a hypothesized effect, companies typically perform RCTs with narrowly defined patient populations with few comorbidities in research settings in which their product is compared with a placebo. When performed properly, such studies provide the best evidence of a biological effect. They also are easier to perform, and less likely to show adverse effects, than studies involving diverse patients and many physicians. Since it is impossible to demonstrate that a product will never cause adverse effects, the FDA requires that a company provide a variety of types of evidence that the risks associated with a product are "acceptable" compared with the demonstrated benefits of the product. For example, many chemotherapies are very toxic, but their side effects are deemed acceptable when considered in light of their benefits.

It is worth noting that although the FDA requires a company to show that its product is efficacious, it does not require a company to show that its product is at least as efficacious as products that are already in use. Given the FDA’s mission, this policy is reasonable, particularly in light of patients’ varied responses to different treatments. If a product "works" and is "safe enough," society probably benefits from having the product available for use, even if it is less efficacious or less safe than other products.

Although the FDA does not permit a company to market a product for an off-label indication, it does not regulate how physicians use a product once it is introduced into commerce. Consequently, many drugs and devices are used in patient populations in whom, and for indications for which, they were never evaluated. The net impact of some of those uses is undoubtedly quite different from those demonstrated in the RCTs that prompted the FDA to approve the product for marketing. The cost associated with performing studies of such off-label uses would be quite high, and the FDA has elected to let the medical profession and the marketplace weed out ineffective off-label uses, instead of requiring companies to study them. We would be much better informed if providers, insurers, and product manufacturers systematically evaluated the effectiveness of labeled and off-label uses. Unfortunately, manufacturers often lack an economic incentive, and providers and insurers typically lack the resources, to perform such studies.

In addition, because the number of patients enrolled in most RCTs is modest and the frequency with which many adverse events occur is low, clinically significant adverse events associated with a product often are not detected before a product is approved for marketing. In instances in which the FDA has lingering concerns about a product’s safety but the product has important benefits, it may require a manufacturer to perform postmarketing surveillance of the product until the product has been used in large numbers of patients. Voluntary reporting of adverse events is far less reliable than systematic postmarketing surveillance.
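
The arithmetic behind this point can be sketched as follows. The sketch assumes that each patient independently experiences the adverse event with the same fixed probability, which is a simplification; the function names, the event rate, and the trial size are hypothetical.

```python
import math


def prob_at_least_one(rate, n_patients):
    """Probability of observing at least one adverse event among n_patients.

    Assumes independent patients and a constant per-patient event rate.
    """
    if not 0.0 <= rate <= 1.0 or n_patients < 0:
        raise ValueError("rate must be in [0, 1] and n_patients non-negative")
    return 1.0 - (1.0 - rate) ** n_patients


def patients_needed(rate, confidence):
    """Smallest number of patients giving the stated chance of seeing one event."""
    if not 0.0 < rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("rate and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


# Hypothetical example: an event that occurs in 1 of every 10,000 patients,
# studied in premarketing trials that enroll 3,000 patients in total.
rate = 1.0 / 10_000
trial_size = 3_000

print(f"chance of seeing the event at all: {prob_at_least_one(rate, trial_size):.1%}")
print(f"patients needed for a 95% chance:  {patients_needed(rate, 0.95):,}")
```

Under these assumptions, trials of this size would more likely than not miss the event entirely, and roughly 30,000 patients would be needed for a 95 percent chance of observing even one case. Observing a single case is, of course, still far short of establishing that the product caused it.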

These considerations highlight an important but often unrecognized fact: namely, that FDA approval of a product for marketing does not in and of itself provide strong evidence that the product is either safe or effective when used in routine clinical practice. For this reason, David Eddy has suggested that even after a technology has been approved by the FDA, the technology should be considered "investigational" if there is not sufficient evidence to enable appropriately trained, motivated, and impartial people to draw conclusions about the magnitudes of the effects of the treatment as it is going to be used in clinical practice, compared with no treatment, on all of the health outcomes they consider important.23 Given the nature of FDA decision making, FDA approval thus is typically necessary but not sufficient to determine whether a technology is investigational in this sense.

Insurance coverage decisions. Just as heterogeneity of patients’ responses and variability in physicians’ technical expertise provide a rationale for the FDA to approve technologies that have been shown to be better than placebo but not better than technologies that are already available, the same considerations provide a rationale for insurance coverage in the same circumstances. Public and private insurers, however, can play an important role in ensuring that evidence exists regarding the effectiveness and safety of health care interventions in routine practice by conditioning coverage on the availability of such evidence for new technologies and by limiting coverage for new technologies to settings comparable to those in which the safety and efficacy of the technologies were established.

Eddy has argued persuasively for "evidence- and outcomes-based criteria" for health insurance coverage—that is, that treatments covered by health insurers should be backed by sufficient evidence to determine their effects on outcomes that people can experience and care about, such as death, pain, suffering, and disability; and that a comparison of the outcomes should show that the treatment is effective, beneficial, and cost-effective.24 The alternative to "evidence- and outcomes-based criteria" for insurance coverage is to cover treatments whose effectiveness and net impact on health outcomes are unproven.

The BlueCross BlueShield Association’s Technology Evaluation Center (TEC) employs many of these concepts when evaluating health care interventions.25 Specifically, TEC determines whether available scientific evidence permits a conclusion concerning the effect of the technology on health outcomes and whether the technology improves net health outcome. In addition, TEC determines whether a technology is at least as beneficial as any established alternatives and whether the improvement in health outcome is likely to be attainable outside the investigational setting. TEC does not make insurance coverage decisions. Rather, individual Blue Cross and Blue Shield plans, as well as other insurers, consider but are not bound by TEC assessments when making coverage decisions. As a result, Blues plans and other insurers sometimes cover technologies that don’t meet TEC criteria and sometimes don’t cover technologies that do meet TEC criteria. Thus, as was true in the case of FDA decision making, the fact that a technology is covered by an insurer does not necessarily mean there is strong evidence that the technology is safe and effective. In addition, the absence of insurance coverage for a technology does not mean that a technology is not both safe and effective.

There are circumstances in which there is a rationale for providing insurance coverage for interventions that Eddy defines as "investigational." One such circumstance involves interventions that have been in use for long periods of time but have not been evaluated in well-designed studies. Although we would like to have information regarding the net health impacts of these interventions, there may be no practical way of obtaining that information. In May 2004, however, the Tennessee state legislature limited coverage under the state Medicaid program (TennCare) to items and services for which there is adequate "empirically-based objective clinical scientific evidence of its safety and effectiveness for the particular use in question."26 While we support such a standard in the case of new services, we believe that more harm than good may be done if nonexperimental evidence is ignored when such a strict standard is being applied to services that have been in use for long periods of time.

A second circumstance in which there is a rationale for providing insurance coverage for interventions that Eddy defines as "investigational" involves situations in which the likelihood of death or severe disability in the absence of treatment is high and there are promising but not definitive data that a treatment could improve a patient’s outcome. In fact, many people might be happy to buy insurance that would cover investigational health care interventions. In a setting of limited resources, particularly with health insurance becoming so expensive, it makes more sense to offer coverage for investigational interventions as an option than as part of a basic insurance policy.

Insurance payment amounts. Once an insurer decides to cover a particular health care intervention, it must decide how much it will pay for the service and how much of that amount will be paid by the insurer and by the insured person. Although a full discussion of this subject is beyond the scope of this paper, we comment on one dimension of it: how much to pay for a technology relative to another technology. In our view, it is hard to rationalize paying more for Technology A than Technology B if the two technologies are functionally equivalent—that is, if their safety and effectiveness are the same. In such a circumstance, instead of covering only the less costly technology, we believe that insurers should cover both technologies but pay the lesser amount for both of them. The Medicare Prescription Drug, Improvement, and Modernization Act (MMA) of 2003, however, includes a provision that precludes the secretary of health and human services (HHS) from making a judgment that two health care interventions are functionally equivalent, thus precluding Medicare from employing the commonsense notion that one should not purchase a product that is more costly than another product that is equivalent to it.

Evaluation of quality of care. Quality of care is being evaluated for an increasing number of purposes with what are said to be "evidence-based" measures. These purposes include informing the public regarding the quality of care provided by particular providers, promoting quality improvement, financially rewarding providers who deliver higher-quality care, and providing a financial incentive to patients to use providers who deliver higher-quality care. Providers and payers sometimes differ in their views regarding how strong the evidence underlying a measure of the quality of care provided by a health plan, hospital, or physician must be to employ it to compare the performance of different providers, with payers sometimes being willing to employ measures that are based on evidence that providers consider to be less than compelling. Even when a clinical performance measure is based on a guideline with strong underlying evidence, when constructing the measure, one must be careful to consider the impact of alternative technical specifications on the sensitivity and specificity of the measure, as well as the implications of false positives versus false negatives; to ensure that the illness severity of the patient population to whom the quality measure is being applied is similar to that to which the guideline on which it is based applies; to distinguish between use of a technology for diagnosis versus screening; and to account for legitimate differences in patients’ preferences and physicians’ judgment.27 Physicians have good reason to be concerned about whether adequate attention has been paid to these issues during construction of measures used to assess the performance of individual physicians or physician groups.

   Concluding Comments
 
Rigorous methods now exist for identifying, critically appraising, and synthesizing available evidence derived from clinical studies regarding particular health care interventions. Even so, because these methods are not applied consistently and because the results of the syntheses are sometimes not interpreted properly, there is much variation in the validity of health care–related decisions, judgments, and recommendations that claim to be "evidence based." In addition, evidence may be available for some but not all issues related to a decision or recommendation that has to be made, or the evidence that is available may not be directly relevant to the situation to which it is being applied. As a result, physicians, policymakers, and others acting on the basis of judgments, recommendations, or measures labeled as being "evidence based" should not blindly assume that the label truly applies. Additionally, investigators and clinical specialty societies should comment more consistently on the patient populations to whom it may be hazardous to apply a study’s conclusions and should clarify how a study’s conclusions might be adapted for them to be relevant to patient populations or provider groups other than those studied.

   Editor's Notes
 
Earl Steinberg (esteinberg{at}resolutionhealth.com) is president and chief executive officer of Resolution Health Inc., a company based in San Jose, California, that is focused on reducing health care costs and improving quality of care, and an adjunct professor of medicine and of health policy and management at the Johns Hopkins University in Baltimore, Maryland. Bryan Luce is senior research leader and chairman of the board, MEDTAP International Inc., in Bethesda, Maryland.

The authors gratefully acknowledge the assistance of Kimberly Niebauer in performing literature searches for this paper.

   NOTES
 

  1. The term first appeared in G.H. Guyatt, "Evidence-based Medicine," ACP Journal Club 114, no. 2 (1991): A-16. A more substantial paper on the topic appeared shortly thereafter: Evidence-Based Medicine Working Group, "Evidence-based Medicine: A New Approach to Teaching the Practice of Medicine," Journal of the American Medical Association 268, no. 17 (1992): 2420–2425. For at least the prior decade, however, David Eddy and others had been championing efforts to develop evidence-based guidelines and to base insurance coverage decisions on critical appraisal of published evidence.
  2. Council on Health Care Technology, Institute of Medicine, Medical Technology Assessment Directory: A Pilot Reference to Organizations, Assessments, and Resources (Washington: National Academies Press, 1988); and Council on Health Care Technology and Divisions of Health Sciences Policy and Health Promotion and Disease Prevention, Assessing Medical Technologies (Washington: National Academies Press, October 1985).
  3. The term "meta-analysis" began to be used to describe quantitative approaches to synthesizing evidence from multiple randomized controlled trials. See, for example, K.A. L’Abbe, A.S. Detsky, and K. O’Rourke, "Meta-Analysis in Clinical Research," Annals of Internal Medicine 107, no. 2 (1987): 224–233.
  4. K.N. Lohr and T.S. Carey, "Assessing ‘Best Evidence’: Issues in Grading the Quality of Studies for Systematic Reviews," Joint Commission Journal on Quality Improvement 25, no. 9 (1999): 470–479.
  5. S. West et al., Systems to Rate the Strength of Scientific Evidence, Evidence Report, Technology Assessment no. 47, Pub. no. 02-E016 (Rockville, Md.: Agency for Healthcare Research and Quality, 2002).
  6. K.N. Lohr, "Rating the Strength of Scientific Evidence: Relevance for Quality Improvement Programs," International Journal for Quality in Health Care 16, no. 1 (2004): 9–18.
  7. See, for example, S.H. Woolf et al., "Assessing the Clinical Effectiveness of Preventive Maneuvers: Analytic Principles and Systematic Methods in Reviewing Evidence and Developing Clinical Practice Recommendations," Journal of Clinical Epidemiology 43, no. 9 (1990): 891–905; R.P. Harris et al., "Current Methods of the U.S. Preventive Services Task Force: A Review of the Process," American Journal of Preventive Medicine 20, no. 3S (2001): 21–35; and Canadian Task Force on Preventive Health Care, "CTFPHC History/Methodology," 5 August 2003, www.ctfphc.org/ctfphc&methods.htm (14 October 2004).
  8. E.P. Steinberg et al., "Methods Used to Evaluate the Quality of Evidence Underlying the National Kidney Foundation–Dialysis Outcomes Quality Initiative Clinical Practice Guidelines: Description, Findings, and Implications," American Journal of Kidney Diseases 36, no. 1 (2000): 1–11.
  9. See D.M. Eddy, A Manual for Assessing Health Practices and Designing Practice Policies: The Explicit Approach (Philadelphia: American College of Physicians, 1992), for a good overview of the topic.
  10. See D.M. Eddy, "Investigational Treatments: How Strict Should We Be?" Journal of the American Medical Association 278, no. 3 (1997): 179–185; and C.D. Mulrow and K.N. Lohr, "Proof and Policy from Medical Research Evidence," Journal of Health Politics, Policy and Law 26, no. 2 (2001): 249–266.
  11. West et al., Systems to Rate the Strength of Scientific Evidence.
  12. Lohr, "Rating the Strength of Scientific Evidence."
  13. See ibid.; and Eddy, "Investigational Treatments."
  14. See Steinberg et al., "Methods Used to Evaluate the Quality of Evidence"; and GRADE Working Group, "Grading Quality of Evidence and Strength of Recommendations," British Medical Journal 328, no. 7454 (2004): 1490–1494.
  15. National Kidney Foundation, "DOQI Clinical Practice Guidelines for Hemodialysis Adequacy and Peritoneal Dialysis Adequacy," American Journal of Kidney Diseases 30, Supp. 2 (1997): S1–S136.
  16. M.M. Millenson, Beyond the Managed Care Backlash: Medicine in the Information Age, PPI Policy Report no. 1 (Washington: Progressive Policy Institute, 1997); and J.W. Williamson, Assessing and Improving Health Care Outcomes: The Health Accounting Approach to Quality Assurance (Cambridge, Mass.: Ballinger Publishing Co., 1978).
  17. M.J. Field and K.N. Lohr, eds., Guidelines for Clinical Practice: From Development to Use (Washington: National Academies Press, 1992).
  18. J.A. Knottnerus and G.J. Dinant, "Medicine based Evidence, a Prerequisite for Evidence based Medicine," British Medical Journal 315, no. 7116 (1997): 1109–1110.
  19. K.N. Lohr, K. Eleazer, and J. Mauskopf, "Health Policy Issues and Applications for Evidence-based Medicine and Clinical Practice Guidelines," Health Policy 46 (1998): 1–19.
  20. See S.R. Tunis, D.B. Stryer, and C.M. Clancy, "Practical Clinical Trials: Increasing the Value of Clinical Research for Decision Making in Clinical and Health Policy," Journal of the American Medical Association 290, no. 12 (2003): 1624–1632.
  21. D.N. Juurlink et al., "Rates of Hyperkalemia after Publication of the Randomized Aldactone Evaluation Study," New England Journal of Medicine 351, no. 6 (2004): 543–551.
  22. J.J.V. McMurray and E. O’Meara, "Treatment of Heart Failure with Spironolactone—Trial and Tribulation," New England Journal of Medicine 351, no. 6 (2004): 526–528.
  23. Eddy, "Investigational Treatments."
  24. Ibid.
  25. See Blue Cross Blue Shield Association, "TEC Assessments," 2004, www.bcbs.com/tec/tecprocess.html (14 October 2004).
  26. A. Schneider, "Tennessee’s New ‘Medically Necessary’ Standard: Uncovering the Insured?" Pub. no. 7139, 30 July 2004, www.kff.org/medicaid/7139.cfm (14 October 2004).
  27. See L.C. Walter et al., "Pitfalls of Converting Practice Guidelines into Quality Measure: Lessons Learned from a VA Performance Measure," Journal of the American Medical Association 291, no. 20 (2004): 2466–2470.

