Health Affairs, 24, no. 1 (2005): 174-179
doi: 10.1377/hlthaff.24.1.174
© 2005 by Project HOPE
 

Implementing Evidence

PERSPECTIVE

Evidence-Based Guidelines As a Foundation For Performance Incentives

Alan M. Garber

   Abstract
 
Clinical guidelines, which increasingly build upon impartial analysis of evidence from well-designed studies, have become highly credible sources of information about what forms of care are effective. Consequently, they are attractive as foundations for performance incentives. Unfortunately, they are often complex, and frequently it is infeasible to gather the information required to assess compliance with guidelines at reasonable cost. I discuss the problems in implementing evidence-based guidelines and steps that could be taken to make them more useful as a basis for performance measurement.


It is no longer controversial to assert that rewards for meeting quality or performance criteria can change medical practice. Employers and health plans are committed to "pay-for-performance" initiatives, and the Medicare Prescription Drug, Improvement, and Modernization Act (MMA) of 2003 contains major provisions to develop financial incentives to improve hospital quality.1 Several years of experience confirm that the challenges of implementing incentives to promote better-quality care are not easily surmounted.2 Arguably, the biggest obstacle has been the mismatch between the size of the incentives and the investment needed to measure, then meet, quality targets. But a larger payout is not all that is needed. Changes in program design and implementation will also be necessary. Improvements in the ways that evidence-based clinical guidelines are adapted for use in performance measures would be an important step in that direction.

Clinical guidelines would not have such an important role in the design of performance incentives if it were easy to reward providers for improvements in the outcomes of the care they give. Outcome-based performance measures have obvious appeal: If a doctor or hospital achieves lower rates of complications, faster recoveries, and lower mortality, does it matter how? However, the practical barriers to their broader use are daunting. David Eddy writes:

When the main health outcomes for an important condition are infrequent, delayed, weakly controllable, or heavily confounded, blind adherence to outcomes will produce inaccurate results... A poorly designed outcome measure can easily do more harm than good. The solution is to use more process measures.3

Since outcome measures are unlikely to replace process measures anytime soon, it is imperative to ensure that the process measures used are valid indicators of health care quality. Evidence-based clinical guidelines, which have the credibility that comes from a combination of impartiality and deep clinical and scientific expertise, are attractive in this regard. Many are based on rigorous analyses of available evidence, carrying the imprimatur of specialty societies and other well-regarded expert groups. Furthermore, the best guidelines produced by the American College of Physicians (ACP), the American College of Cardiology (ACC), the American Heart Association, the U.S. Preventive Services Task Force, and other prestigious groups are clear and detailed, lending themselves to adaptation as criteria for the quality of care.

   Uneasy Application Of Guidelines To Performance Measurement
 
Despite their virtues—and the likelihood that no readily available alternative has a stronger scientific foundation—evidence-based guidelines are not an ideal platform for performance incentives. To be used effectively, of course, a high-quality guideline must exist. The number of conditions for which guidelines are available is steadily increasing, but many patients receive care for conditions or combinations of conditions for which suitable guidelines are not available.4 Furthermore, evidence-based clinical guidelines are rarely written with performance incentive programs in mind. They are intended to improve clinical care by describing a set of actions that physicians should consider when managing patients with specific health conditions, and sometimes to influence financing and other aspects of policy that could affect guideline implementation.

Flexibility. Most well-accepted guidelines, especially those produced by specialty societies, leave considerable discretion to the treating physician. The authors anticipate the objections that practicing physicians might raise, and most guidelines acknowledge numerous circumstances in which exceptions to their general recommendations are justified. To gain the approval of physicians, many of the recommendations incorporated in clinical guidelines are vague enough to permit flexibility ("choose the medication based on the patient’s medical history, preferences, and ability to comply with follow-up care"). Others trade vagueness for detail and complexity by explicitly describing clinical nuances and many modifying considerations.

Flexibility, whether obtained by vagueness or complexity, makes it difficult to determine whether the care a person received was consistent with the recommendations. Guidelines that address controversial practices, such as prostate cancer screening, commonly state that individual preferences are critical to determining which patients should receive the intervention. Without interviewing each patient directly, it can be impossible to determine whether the care complied with the guideline.5 Even when the guideline itself is unambiguous, costly chart review is usually necessary to determine whether it was followed. Few institutions or physician practices have information systems that enable them to extract detailed clinical data, comparable to the contents of a paper chart. And even a paper chart can lack important detail. Electronic health information has been a cornerstone of nearly all plans for quality improvement in hospitals and doctors’ offices for many years, but the effective deployment of electronic medical records and related advances has proceeded remarkably slowly. Until providers are able to use powerful information systems to improve the accuracy and drive down the cost of collecting such data, the translation of detailed evidence-based guidelines into performance measures will proceed slowly and imperfectly.6

Accountability. Accountability is one of the biggest challenges to quality improvement and measurement. From a payer’s or purchaser’s point of view, when all of a patient’s care is provided by an integrated system or group, such as Kaiser Permanente or a Department of Veterans Affairs (VA) facility, assigning responsibility for care is straightforward.7 That is, it is possible to identify who should receive a performance incentive, even if the group itself must develop internal mechanisms to ensure that the performance standards are met. The challenges are greater in a typical fee-for-service or preferred provider organization (PPO) environment. Even the conceptual basis for assigning responsibility is unclear when a patient is treated by multiple physicians, some of whom the patient selects without the concurrence or even knowledge of the others. An adult with diabetes mellitus could receive care regularly from an internist, cardiologist, ophthalmologist, and podiatrist, each of whom could adjust medications and share in the monitoring of disease complications and the side effects of treatment. Who is responsible if the patient fails to receive a recommended treatment such as an angiotensin-converting enzyme (ACE) inhibitor? If the patient requires a toe amputation that should have been preventable, which of several physicians and nurses caring for the patient should be considered responsible? To what degree does the patient bear responsibility? Accountability for health outcomes remains a challenge in nearly every setting, but it is most severe for the majority of Americans who do not receive care from an integrated group. Guidelines have offered little help. Few discuss who should be responsible for each aspect of care.8

Limits in the evidence base. Limits in the evidence base available to guideline authors have repercussions for those guidelines’ application to performance measurement. Randomized controlled trials, the most respected source of information for clinical guidelines, are conducted in relatively small and often unrepresentative populations. To make recommendations that apply to most of the patients that doctors see in their offices, guideline authors nearly always extrapolate to groups that were not adequately represented in the trials. Do estimates of the efficacy of mammography at diagnosing early-stage breast cancer in middle-aged women apply to younger populations that were not included in the large studies? Do results of tests for colon cancer in high-risk patients apply to low-risk groups? In the act of translating rigorous evidence into specific recommendations for patient care, judgment must be applied, and different experts can reach different conclusions from the same evidence.

Strong and weak evidence. The best guidelines clearly distinguish between recommendations that are strongly supported by direct evidence and those that depend on extrapolation or judgment. But it is not always clear how to use such information. Performance measures that are based only on the most rigorous and direct evidence often apply to a small, narrowly defined patient population. Sometimes performance incentives will be superfluous for that population, because so many of them already receive appropriate care. Performance measures based on more speculative recommendations will greatly expand the scope of care that can be assessed, but they risk sacrificing validity.

Costs and quality standards. Finally, nearly all U.S. evidence-based guidelines and performance incentive systems share an expressed goal of improving quality of care. Yet "quality of care" can be interpreted in different ways. Most modern guidelines are designed to promote care that has proved effective while discouraging practices that are of unknown effectiveness or ineffective. But beyond basic agreement that high-quality care excludes ineffective or harmful practices, there are fundamental, frequently unstated differences in interpretation. Most importantly, some guidelines consider the value, or cost-effectiveness, of care, rather than effectiveness alone. The distinction is crucial. It is all but a cliché to describe the rate at which beta-blockers and ACE inhibitors are prescribed for heart failure patients as a key measure of the quality of care. These inexpensive drugs are highly effective. The left ventricular assist device (LVAD) is also an effective treatment for patients with severe congestive heart failure, but few if any guidelines recommend LVAD placement, and the rate of LVAD placement does not appear in performance or quality-of-care measures.9 The LVAD is a relatively new technology, but the tremendous cost of keeping a patient alive with the device has undoubtedly dampened any enthusiasm for building performance measures around its use.

Few if any guidelines explicitly state that they are intended to maximize quality of care without regard to cost. Indeed, advocates for quality improvement efforts, including performance incentive programs, often claim that quality can be improved while lowering cost. But many, if not most, medical interventions that improve health outcomes increase spending. Explicit statements that cost-effectiveness (producing the best outcomes for a given cost) is the basis for including a specific recommendation in a guideline remain uncommon, at least in the United States. Absent such an explicit statement of goals, it is difficult to know whether and how performance improvement programs, which themselves are often silent about cost-effectiveness, should reward providers who offer care that is effective but not cost-effective. For those programs that are implicitly designed to reward value, evidence-based guidelines that ignore cost help with only part of the task.

Deficits in state-of-the-art guidelines. Many of these issues are manifest in state-of-the-art guidelines that may advance patient care but whose translation into performance measures poses enormous challenges. For example, guidelines recently issued by the ACC and the American Heart Association concerning the management of ST-segment elevation myocardial infarction, a common form of heart attack, are admirably comprehensive and detailed. Furthermore, like other sophisticated guidelines, they clearly indicate the quality of evidence supporting each recommendation. But the guideline document is more than 200 published pages long, has 1,400 references, and requires clinical information that would be difficult to acquire without in-depth chart review.10 Many other guidelines are much simpler yet also require information that is not readily available from electronic sources—for example, the recent ACP guidelines for the evaluation of patients with chronic stable angina are a model of precision and balance between simplicity and accuracy.11

   Bridging The Gap Between Guidelines And Performance Incentives
 
Guidelines such as these are designed to guide patient care, not to provide a framework for rewarding better clinical performance. Some of the problems in translating them into performance measures will diminish over time, especially when better information systems are ubiquitous. Until then, how can authors and expert groups produce guidelines that better support performance incentives?

Tie guidelines to strong incentives. Authors of guidelines should first recognize that employers and health plans trying to implement performance incentives are an important audience for their efforts. As these authors know, when guidelines are tied to strong incentives, they are much more likely to be used. Thus, their efforts will have greater impact if they produce guidelines that can be readily adapted for performance measurement. The communities that develop evidence-based clinical guidelines and those that develop performance incentive systems are largely distinct, and better communication between them will facilitate the translation of guidelines into performance measures.

Use data from diverse settings. To help make their guidelines translatable, authors should learn more about the kinds of data that are available—and unavailable—in diverse health care settings. Everything from specialized questionnaires to chart review to administrative or population-level data has been used to assess the quality of health care. Guideline authors, especially those expert in evaluating clinical evidence, are well aware that critical information needed to guide therapy is often missing from the literature. They could readily appreciate that the data used in published studies are more detailed than what would typically be available in administrative databases and the limited clinical data used for performance measurement. By applying their knowledge of the evidence base underlying the guidelines and their skills at interpreting the data, they could offer useful advice to the performance-incentive community. They could help determine, for example, whether the data available in many performance measurement efforts are simply too limited to enable conclusions about the quality of care. They could also identify areas in which quality is more easily judged or in which targeted data collection could, at reasonable cost, provide much information about quality.

Match magnitude of incentives with importance of performance criteria. Even in the best-designed programs, not all performance criteria are equally important. Ideally, pay-for-performance programs would reserve large rewards for meeting quality standards that lead to large benefits, while providing smaller incentives for meeting performance criteria that correspond to small health benefits.12 For example, delays in administering appropriate therapy for a heart attack or stroke greatly increase mortality and disability; the incentives to treat these conditions quickly and appropriately should thus be larger than the incentives for other practices that should be encouraged but produce smaller benefits, such as avoiding the use of ineffective antibiotics for treatment of the common cold.13 Some programs respond to the challenge by assigning greater weight to the more important criteria when calculating an overall performance score. Typically, however, there is only a loose relationship between the magnitude of the incentive and the importance of the specific performance measure.
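The weighting approach mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each criterion is scored as a compliance rate (patients whose care met the criterion divided by eligible patients) and that the overall score is the weight-averaged rate. The class name, function name, criterion names, weights, and patient counts are all hypothetical and are not drawn from any actual pay-for-performance program.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    weight: float   # hypothetical relative importance of the criterion
    met: int        # eligible patients whose care met the criterion
    eligible: int   # patients to whom the criterion applies


def composite_score(criteria):
    """Return the weight-averaged compliance rate, a value in [0, 1].

    Criteria with no eligible patients are skipped so that they neither
    reward nor penalize the provider.
    """
    scored = [c for c in criteria if c.eligible > 0]
    total_weight = sum(c.weight for c in scored)
    if total_weight <= 0:
        raise ValueError("no scorable criteria with positive weight")
    weighted = sum(c.weight * c.met / c.eligible for c in scored)
    return weighted / total_weight


# Illustrative numbers only: timely heart attack therapy is weighted
# five times as heavily as avoiding antibiotics for the common cold.
example = [
    Criterion("timely_heart_attack_therapy", weight=5.0, met=8, eligible=10),
    Criterion("no_antibiotics_for_cold", weight=1.0, met=5, eligible=10),
]
print(round(composite_score(example), 2))  # 0.75; an unweighted mean would be 0.65
```

In this sketch the provider's strong performance on the heavily weighted criterion raises the overall score above the simple average. How the weights themselves should be chosen is the harder question, and the sketch does not address it.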

Improving the match between the incentive and the benefit, of course, may require much analytic effort and detailed information. But some of the work is already done during guideline development. Particularly when they address complex problems or detailed management strategies, evidence-based guidelines could help by offering information about the benefit corresponding to each specific recommendation they make, or at least by assigning a priority to each recommendation. Many guidelines already contain qualitative statements about the extent of benefits. Furthermore, the evidence review that leads to state-of-the-art guidelines is a source of quantitative estimates of benefit.

Consider guideline authors’ opinions. Sometimes a treatment may seem to be beneficial, but convincing studies have not been completed. What should be done about the large gray areas, in which there is substantial but not compelling evidence of benefit? Frequently, for example, multiple observational studies, all of them subject to potential bias, have shown that a particular clinical strategy is associated with better outcomes. Some experts will conclude that such information is always insufficient to support a recommendation for the strategy. But some observational studies are more convincing than others. The opinions of the guideline authors, who know the evidence and clinical context well, would be very informative. Some of the recommendations might be strong candidates as performance measures if, for example, there were a consensus that the recommendation was almost certain to lead to better outcomes despite the lack of compelling studies.

Be explicit about guideline goals. Finally, both the authors of guidelines and those who design incentive payments need to be explicit about what they are trying to achieve. Are they trying to promote effective care, not considering costs, or high-value, cost-effective care? Do they want to improve patient satisfaction, or are they primarily concerned with mortality, functional status, and physical discomfort? The goals of guideline authors will not always be identical to those of performance incentive systems, but guidelines can still be useful if they are explicit about the rationale for and consequences of following each recommendation.

None of these steps is a substitute for better data and thoughtfully applied incentives. However, efforts to promote better-quality care are gaining momentum as large employers and policymakers have come to view poor quality as a crisis for U.S. health care. They will not wait for health care providers to put better information systems in place. As the validity and reliability of performance measures improve, it will become appropriate to increase the magnitude of the incentives. With better underlying performance measures and stronger incentives, we are likely to accelerate our progress toward a health care system that rewards, and produces, better health.

   Editor's Notes
 
Alan Garber (garber{at}stanford.edu) is a staff physician at the Department of Veterans Affairs (VA) Palo Alto Health Care System and associate director of the VA Center for Health Care Evaluation. Garber is the Henry J. Kaiser Jr. Professor at Stanford University, where he is also a professor of medicine, economics, and health research and policy. He is the founding director of the Center for Health Policy and the Center for Primary Care and Outcomes Research at Stanford’s School of Medicine and research associate and director of the Health Care Program, National Bureau of Economic Research.

This work is supported in part by the Homer Laughlin Endowment and by an Investigator Award in Health Policy Research from the Robert Wood Johnson Foundation. The author has benefited from comments by and conversations with Jay Bhattacharya, Douglas Owens, Sara Singer, Kathy McDonald, Arnold Milstein, and Robert Wachter, none of whom should be assumed to endorse the views expressed here.

   NOTES
 

  1. See M.B. Rosenthal et al., "Paying for Quality: Providers’ Incentives for Quality Improvement," Health Affairs 23, no. 2 (2004): 127–141. The CMS has an ongoing pay-for-performance demonstration program in addition to the new features added by MMA. MMA includes demonstration projects that incorporate pay-for-performance concepts, such as the Medicare Health Care Quality Demonstration Program in Section 646, and provisions for an Institute of Medicine study to examine performance measures and determine how to align them with the Medicare program.
  2. See R.A. Dudley et al., "Strategies to Support Quality-based Purchasing: A Review of the Evidence," Technical Review 10 (Prepared by the Stanford–University of California, San Francisco, Evidence-based Practice Center under Contract no. 290-02-0017), Pub. no. 04-0057 (Rockville, Md.: AHRQ, July 2004).
  3. D.M. Eddy, "Performance Measurement: Problems and Solutions," Health Affairs 17, no. 4 (1998): 17.
  4. AHRQ’s National Guideline Clearinghouse, which includes only guidelines that have been updated in the past five years and that meet a number of other criteria, lists hundreds of evidence-based guidelines, and new guidelines are continually added. See AHRQ, "National Guideline Clearinghouse," 18 October 2004, www.guideline.gov (19 October 2004).
  5. See L.C. Walter et al., "Pitfalls of Converting Practice Guidelines into Quality Measures: Lessons Learned from a VA Performance Measure," Journal of the American Medical Association 291, no. 20 (2004): 2466–2470.
  6. Some performance incentive programs reward providers directly for adopting electronic medical records and related information systems. It is costly, however, to offer incentives large enough to markedly accelerate the adoption of sophisticated information systems, particularly when any one payer or purchaser accounts for a small fraction of a provider’s patient population.
  7. When the actions and choices of patients themselves affect outcomes, as is true for many chronic diseases, it can be difficult to assign responsibility even when an integrated group provides all of the patient’s medical care. Without perfect risk adjustment, a provider could be rewarded and penalized based on the patients it attracts, rather than the quality of care it delivers.
  8. Guidelines that specify which type of provider should deliver each type of care could easily become lost in controversy. A recommendation that only a gastroenterologist should perform sigmoidoscopy, for example, would take the procedure out of the hands of the general internists and family physicians who often perform it. For some interventions, the financial consequences would be substantial. With so much at stake, such a guideline would be unlikely to gain broad acceptance.
  9. A randomized trial of LVADs demonstrated that the devices increased survival among patients with severe congestive heart failure, although nearly all treated patients died within two years. See E.A. Rose et al., "Long-Term Use of a Left Ventricular Assist Device for End-Stage Heart Failure," New England Journal of Medicine 345, no. 20 (2001): 1435–1443.
  10. For example, one recommendation states that "permanent ventricular pacing is indicated for persistent second-degree AV block in the His-Purkinje system with bilateral bundle-branch block or third-degree AV block within or below the His-Purkinje system after STEMI." An expert interpretation of the electrocardiogram would be needed to determine whether a patient had this indication. See E.M. Antman et al., ACC/AHA Guidelines for the Management of Patients with ST-Elevation Myocardial Infarction, 2004, www.acc.org/clinical/guidelines/stemi/index.pdf (19 October 2004).
  11. See V. Snow et al., "Evaluation of Primary Care Patients with Chronic Stable Angina: Guidelines from the American College of Physicians," Annals of Internal Medicine 141, no. 1 (2004): 57–64.
  12. Dudley et al., "Strategies to Support Quality-based Purchasing," reviews the theory underlying the design of performance incentives and implications for incentive size and type.
  13. Inappropriate antibiotic prescribing is a common measure of (poor) quality and is part of the California-wide pay-for-performance initiative of the Integrated Healthcare Association.



