
INTERVIEW
Wennberg & Mullan
Web Exclusive
7 October 2004
Wrestling With Variation:
An Interview With Jack Wennberg

The creator of modern-day evaluative clinical sciences discusses
what motivated him to define and pursue this area of study.


By Fitzhugh Mullan


ABSTRACT:

For thirty years Jack Wennberg has studied variations in medical practice, from rates of tonsillectomy in Vermont villages in the 1970s to the cost of dying in the nation’s major medical centers today. Along the way he has spawned the field of evaluative clinical sciences, created the Dartmouth Atlas of Health Care, stimulated the creation of a new federal agency (the Agency for Healthcare Research and Quality), and challenged many presumptions about what constitutes good medical care. In this interview with Fitzhugh Mullan, he reflects on health care reform and how to change clinical practice.

Fitzhugh Mullan: You can fairly be credited with being both the Christopher Columbus and the Johnny Appleseed of clinical variation—you discovered it, and you have worked hard to bring it to the attention of the medical and health policy communities. But when you started your work, the common presumption was that doctors practiced in “usual and customary” ways, which were quite well standardized. The term “variation” as you came to use it was really not part of the medical vocabulary. The first time that you wrote about variations in health services, I believe, was the paper you published with Alan Gittelsohn in Science in 1973. How did you get started studying variations?

John Wennberg: The early work was done when I was at the University of Vermont in the early 1970s as the director of the state Regional Medical Program (RMP). I had just finished my medical residency at Johns Hopkins, where I’d trained as an internist with a specialty in renal disease. But I had also taken an MPH [master of public health degree] and started on a doctorate in sociology. The RMP had a large budget with quite a vague set of goals having to do with controlling heart disease, cancer, and stroke. So since I’d been trained in epidemiology and was interested in social systems, it was a fairly natural thing for me to want to develop a way of measuring the performance of the health care system, particularly since our goal was to regionalize care in an effort to better combat heart disease, cancer, and stroke. We set up a data system and developed a strategy for measuring resource inputs to market areas so that we could correlate resource inputs with utilization. We wanted to measure outcomes, but mortality was about the only thing immediately available to us.

The whole process was made feasible by Kerr White’s work. Kerr had been at the University of Vermont prior to coming to Hopkins, where I was fortunate to study with him. Kerr had persuaded most Vermont hospitals to join a hospital discharge abstract system called the Physician’s Activity Study in Vermont. For every hospitalization, it generated information on the patient’s diagnoses, surgical procedures, age, sex, and place of residence. For hospitals that didn’t belong to the data service, we sent RMP staff into their record rooms to make our own abstracts. We thus obtained information on virtually all hospitalizations of Vermont residents. We also sent our staff into all nursing homes and home health agencies to obtain similar data. I was fortunate to be able to get the Medicare Part B database by simply going to the Blue Cross offices in Concord. We then began to examine Vermont’s health care system from a population-based, epidemiologic perspective. Some people say that we invented the concept of medical care epidemiology in this process.

Mullan: What led you to your observations on variations?

Wennberg: We had a very extensive database, which allowed us to look not only at the acute sector, but at the ambulatory care sector, the private care sector, and nursing homes. We divided Vermont into local hospital service areas based on how patients in each Vermont town used the system. For each “market” we could view the variation topic as a systems problem: were there trade-offs between use of hospitals, nursing homes, and home health agencies? The short answer was no. And we were able to develop measures to quantify the physician workforce allocated to each market, which we called labor input. We knew the number of internists, pediatricians, surgeons, and GPs [general practitioners] in various areas per 1,000 residents and could correlate those numbers with the number of hospitalizations and procedures; we could do the same thing for the numbers of hospital and nursing home beds. What we found was wide variation in resource input, utilization of services, and expenditures among neighboring communities and a strong association between resource inputs and use rates. Inevitably, we were led to the question: Were more doctors and more procedures actually producing better outcomes in terms of mortality?

Mullan: Did you have a sense that substantial variation was out there, or did you discover it as you went along?

Wennberg: I think it’s probably the latter, in the sense that there was virtually no literature to suggest the importance of supply in determining use rates. We really didn’t know what to expect. I went to Vermont believing in the general paradigm that science was advancing and that it was being translated rationally into effective care. At that time, economists and sociologists as well as patients and doctors believed in the concept that the physician was competent to act as the purchasing agent for the patient—that delegating decision making to the doctor led to wise choices on behalf of the patient. Public policy was also framed around the belief that the physician acted as an agent for society as well, so that when resources were used to capacity, society could confidently respond by increasing capacity to meet medical need as defined by the doctors. We could thus rely on the “agency” of the doctor for the well-being of the patient and the system. The central tendency of the market was rational.

I had read enough sociology, and was aware enough of the overt and covert functions within systems, that I came to the RMP work armed with some skepticism about human behavior. Having read that literature, I was prepared to interpret what we found. But I don’t think I went into it expecting that we would find such marked variation in medical practice. Variation, as it turned out, was everywhere. For instance, we lived between Stowe and Waterbury. My kids went to the Waterbury school system ten miles down the road. But if we had lived about a hundred yards north, they would have gone to the Stowe school system. In Stowe 70 percent of the kids had their tonsils out by the time they were fifteen years old, as opposed to only 20 percent in Waterbury.

Mullan: How did your documentation of variation link to your subsequent observations about supplier-induced demand?

Wennberg: All of our data suggested supplier-induced demand. The rates of surgical specialty activity were strongly associated with the presence of the respective types of surgeons. More internists were associated with more diagnostic tests and physician visits. And then there was that epidemic of tonsillectomies near my home. It was pretty easy to conclude, from the ground level, that supplier-induced demand was strongly operative.

Mullan: Why did you choose to publish this work in Science?

Wennberg: I didn’t choose it. We tried the conventional medical journals and received form-letter rejections. This still happens. Generally, we don’t bring good news. Science was the journal of last resort, but we were delighted to get the paper accepted.

Mullan: What sort of effect did the publication have?

Wennberg: The response was muted initially but gained ground over time. The paper was really very important for me because it described a set of problems that have occupied me ever since: What are the causes of unwarranted variation, of variation that cannot be explained on the basis of illness, patient preferences, or the dictates of scientific medicine? What are the consequences? When is more better? When is there too much care and the attendant likelihood of iatrogenic illness? Under what normative standards should variations in health care delivery be evaluated?

Mullan: In 1982 you published “Variations in Medical Care among Small Areas” in Scientific American, in which you recapitulated many of these themes. In that piece you also raised the notion of patient preference—the informed consumer. These ideas were a departure from the conventional medical thinking of the time. What led you to them?

Wennberg: I left Vermont in 1973 and went to Harvard. John Bunker, Benjamin Barnes, and Fred Mosteller had organized a yearlong seminar to examine surgical practices. The group included economists, decision theorists, epidemiologists, biostatisticians, and clinicians all sitting around, hassling, trying to figure out what was going on in surgical practice. The variation issue struck home for these people. Duncan Neuhauser wrote a paper on hernia operations in which he presented evidence that patients’ preferences were really quite variable. That was the first time that I’d seen patient preferences as a strong issue.

Mullan: You moved to Dartmouth in 1980, and from that time on your work was closely associated with Maine and with prostate disease. How did that come about?

Wennberg: In 1973 Dan Hanley, who was editor of the Maine Medical Journal, published three articles by Alan Gittelsohn and me that showed that Maine suffered as much as Vermont from wide variations in care. But Dan was interested in more than simply publishing the data. With financial support from the Commonwealth Fund, he established a program in Maine to organize physicians to respond to practice variations. The program was organized around what we called the three-step process: (1) Look for an obvious explanation involving “bad behavior” by physicians. (2) If this doesn’t resolve the variation, assess the literature and see whether or not it is possible to resolve the uncertainties and conflicts among the different clinical camps with existing information. (3) If this doesn’t work, undertake outcomes research.

The group that really took off was the urologists. We had recorded striking variations in surgery for benign prostatic hypertrophy (BPH), a noncancerous enlargement of the prostate. In some parts of Maine, 60 percent of men had their prostates removed by age eighty; in other parts less than 20 percent did. As we went through the three-step process, it quickly became evident that basic facts regarding the outcomes of care for BPH were missing. But even more surprising, the physicians themselves weren’t all on the same page when it came to the reasons for doing surgery in the first place. After a prolonged and sometimes heated debate, it became apparent that there were two schools of thinking about why surgery was indicated. Most worked under the hypothesis that BPH surgery was required to make people live longer—that early surgery prevented development of bladder obstruction, kidney failure, and premature death in later years. But a minority thought that the natural history of untreated BPH was for most men quite benign—the risks associated with early surgery were not paid back by a significant gain in life expectancy. For them, the reason for doing surgery was to improve the quality of life by reducing urinary tract symptoms.

At this time, Al Mulley, Mike Barry, Jack Fowler, and I had been meeting to develop a strategy for conducting outcomes research. The willingness of Hanley’s urologists to join forces with us to undertake the third step in the process, and the generosity of the Hartford Foundation in funding our work for more than a decade, provided the opportunity to put our interdisciplinary strategy into action and to see it through to completion. In brief, the preventive theory was shown to be wrong. The principal reason for doing surgery, it turned out, was to improve the quality of life as it was affected by urinary tract symptoms. But it also became clear that BPH surgery could have a negative impact on another important aspect of the quality of life: sexual functioning. In other words, the decision to undergo prostate surgery involved a significant trade-off between urinary tract health and sexual functioning.

Coming to this understanding was an important milestone for our research team. We now had concrete evidence that, for at least one important example of treatment variation, the key to rational choice for patients, and to learning how many resources society should allocate to meet surgical “need,” was to overcome the old model of agency, or delegated decision making. Medical ethics and good economics require that patients be actively involved in making decisions. In this context, the concept of shared decision making first emerged as the principal remedy for unwarranted variations in what we would later come to call “preference-sensitive” treatments.

This understanding led naturally to the next phase of our work: the development of decision aids to help patients sort out the complexity of treatment choices. We took advantage of interactive video technology that was becoming available at that time to develop a program to inform patients. We then undertook to study its impact on decision making. We learned some very interesting things. First, once patients are informed about what is at stake, most are willing—indeed, anxious—to participate in choosing their own treatment. Second, when patients participate actively, the treatment chosen more closely corresponds to their own values than it does when doctors choose for patients. We found that patients who were very concerned about the negative impact of BPH surgery on sexual functioning tended to choose watchful waiting rather than surgery, while those who were very concerned about urinary tract symptoms chose the opposite. Third, we began to get some benchmarks that told us something about the “right rate” for BPH surgery—the rate that results when patients rather than suppliers determine it. This was possible because one of our experimental sites was a staff-model HMO [health maintenance organization] where we could observe the rates of surgery before and after the implementation of shared decision making. Although the prestudy rates were already quite low compared to most places in the United States, once shared decision making was introduced, the rates dropped 40 percent, to a rate that was at the very bottom of the national distribution of BPH surgery at that time. The implication seemed pretty clear to us: The amount of BPH surgery provided in most parts of the United States probably exceeds the amount of surgery that informed patients want.

The Politics Of Research

Mullan: You have worked hard to bring quantitative methods to topics that weren’t recognized at all or thought not to be the stuff of hard science—practice variations, small-area analysis, patients’ preferences. Yet your thinking seems frequently to challenge the established order, to bring controversy to practices that have been long established and well accepted. Do you see your work as having a political mission?

Wennberg: Since our work has so much to say about how health care markets seem to be working, it was relatively easy to attract attention. It really began in 1984. Health Affairs published a theme issue on practice variations, and we held a press conference at the Capitol to publicize it. Someone from the AMA spoke, I spoke, and some members of Congress spoke. I think that planted a seed. Some of the politicians were intrigued by variability. Over the years several hearings were held on the Hill. Bill Gradison [R-OH] in the House and David Durenberger [R-MN] in the Senate became interested and supportive. Dan Hanley of the Maine Medical Assessment Program really recruited George Mitchell [Democratic senator from Maine]. He and I met with Mitchell on more than one occasion and persuaded him that he needed to get involved.

Mullan: What did you have in mind?

Wennberg: The initial idea was an amendment to National Center for Health Services Research (NCHSR) legislation to fund variations research. That was quickly supplanted by more ambitious legislation, which was enacted, establishing a new Public Health Service agency, the Agency for Health Care Policy and Research (AHCPR), in 1990. And at that point, I was pretty active politically, working with Congress and also with the research community. I saw this as a winner for everybody. Our goal was to introduce clinical research into the health services research agenda, which had been dominated pretty much by policy wonks and economists. The legislation called for a new research vehicle, Patient Outcome Research Teams (PORTs), that would carry out medical effectiveness research on specified clinical problems associated with costly practice variation. The concept was built on the model of research we had developed in Maine, which established an interdisciplinary group of good people to “patrol” a clinical problem such as BPH. The goal of the patrolling was to uncover and explicate theory and to apply various analytic tools and methodologies to test that theory. It was also to keep up with innovation by bringing new, promising technologies into early clinical trials as soon as possible. A number of PORTs were launched focusing on coronary artery disease, arthritis of the hip and knee, BPH, and low-back pain, conditions for which one of the treatment options involved discretionary surgery. While a good deal of progress was made in clarifying theory and explicating the role of patients’ preferences, the clinical trial networks needed for prospective evaluation of new treatment theories never materialized.

Mullan: The birth of AHCPR at the beginning of the 1990s was followed fairly quickly by the health care reform period, during which you were also quite active. As I recall, there was the belief that the evaluative sciences would be at the core of a reformed health system, that outcomes research would be the coin of the new health realm. What happened?

Wennberg: Health care reform turned out to be a disaster. I did get involved because I knew Hillary Clinton through Chick Koop [C. Everett Koop, the former surgeon general]. Chick and I were asked to read the whole reform plan over at the very end, and my task was to be sure that outcomes research and patients’ preferences were adequately present. We also were able to define a special role for practicing physicians in governing medical practice by including a model based on Dan Hanley’s work in Maine. But it all went down. I think it was pretty well doomed all along. Health care reform was like the donkey without a tail. Everybody wanted to put their own on. It just didn’t work.

Mullan: AHCPR has had a mercurial life in the decade since then, including the acquisition of a new name: the Agency for Healthcare Research and Quality (AHRQ). What kind of marks do you give it?

Wennberg: In the mid-1990s the agency suffered the coming together of several bad things all at once—the failure of the Clinton legislation, the “Gingrich revolution” in Congress, and a handful of dissident orthopedists and neurosurgeons. This latter group didn’t like the fact that the PORT under Rick Deyo and Jim Weinstein found that there were some serious problems with back surgery that needed national attention. They went directly to their congressmen, who considered the surgeons’ complaints one more reason for zeroing out a government program. By that time Mitchell, Durenberger, and Gradison had all left Congress. The agency escaped annihilation, but it had to abandon funding for the PORTs and other clinical analyses that sought to improve the scientific basis of medicine by testing conventional medical theories. I think that John Eisenberg saved AHRQ because he was respected and because he picked a safe topic for his agenda: medical errors. I mean, who wants medical errors? Everybody’s against medical errors. Today AHRQ is irrelevant to the problems that I was interested in having it address. It is out of the business of determining the scientific bases of clinical practice. That’s why I believe that we have to establish the concept of clinical outcomes research as a central theme at the NIH [National Institutes of Health].

The Role Of The NIH

Mullan: I interviewed Elias Zerhouni for Health Affairs recently (Web Exclusive, 8 January 2004), and I asked him about the role of health services research at the NIH. He responded with a comment to the effect that you can do research that looks for new science or you can do research that looks at “the difference between Coke and Pepsi.” He said he didn’t feel that both types of work could be done well at the NIH, and he favored the pursuit of new science. As long as your work is viewed as soft drink sampling, it seems hard to envision it at the NIH.

Wennberg: That’s a mindset that needs to be challenged, especially among people who believe in science. Medical theories need to be tested, and we’re awash in untested theories. Even new technologies that have passed muster in a clinical trial (and many have not) move into practice in many unevaluated ways. What is done with a new technology once it’s in the market depends on the inventiveness of physicians, and they’re terribly inventive. Untested theories and practices are huge, expensive, and dangerous problems in this country.

Mullan: Why is it so important to move the evaluative sciences onto the NIH campus, and why has the leadership of the biomedical research community been so reluctant to embrace them?

Wennberg: The evaluative sciences are an important part of biomedical science, a view that has not yet permeated academic medicine sufficiently. The research community doesn’t pay attention to evaluative science, in part because there’s no funding for it. There’s no basic training to speak of. There are no careers. In bench science or clinical investigation, researchers can get funding to work on a problem for decades. Through the PORT concept, we tried to make the same model work for the evaluative sciences. This is a problem that goes way beyond the research community. The employer community, the tax-paying community, and the Centers for Medicare and Medicaid Services (CMS) need to really understand that their cost problems relate, to a large extent, to unevaluated technologies. They need to give political support to a major upgrade in funding for evaluative clinical science. I would advocate moving AHRQ into the NIH, broadening its mandate, and increasing its funding to a billion dollars a year by getting contributions from the CMS and from the insurance industry, so that the agency goes into the NIH without competing for existing NIH research funds.

Mullan: You don’t think there’s inherent hostility toward health services research at the NIH?

Wennberg: I don’t think so. No scientist would be hostile to this kind of work.

Mullan: But they don’t consider it a priority, and some clinicians, as we’ve seen, aren’t eager to have evaluation scientists examining their practices.

Wennberg: I’d beg to differ on that. I think that most clinicians would not take that position. There will always be somebody whose theory is gored by evidence. I mean, that’s just the way life is. In the give-and-take of medicine’s evolution, there are going to be technologies and strategies that work better than others, and some will have to be weeded out. You just can’t live with the old and the new and have a sustainable economy.

The Dartmouth Atlas

Mullan: Tell me about the Dartmouth Atlas of Health Care. Where did the idea come from?

Wennberg: It came from all of our work in small-area analysis, which is quite geographical. We wanted to be able to use our data in a variety of different contexts—regulatory contexts as well as clinical management. It was the Clinton health plan that really motivated us, because we anticipated that it would be built around “Health Care Alliances” that were geographically based health insurance areas. So we approached the Robert Wood Johnson Foundation, and they gave us a large grant to take the Medicare data and to organize a national small-area analysis based on methods similar to those we had used in Vermont. When the Clinton legislation crashed, we had a lot of data but no customer. So we published an atlas. It has turned out to serve a lot of useful functions. It keeps reminding people about fundamental imbalances in health care, of the pervasiveness of supplier influence on utilization. The media love it and use it all the time.

The most important evolution is that over the last four years, we’ve moved beyond variation by geographic area to analyze variation among health care organizations. We can look at cohorts of patients who use one hospital or another and compare the resource allocation between them. We now know the answers to questions that previously could be asked only of staff-model HMOs. How many doctors per thousand are they using? How many hospital beds? What’s their surgery rate? How do they manage chronic illness? How many physician visits do they provide? How many days in the hospital? In intensive care? How much does it cost per capita? So all the variables that have been available at the area level now have an analogue available at the hospital-specific level.

Hospital-specific information is important because it should stimulate action. We hope that a recent article we published in the British Medical Journal will help motivate academic medical centers to get involved in rationalizing their own practice patterns. The variation is truly extraordinary. For example, during the last six months of life, patients who used the NYU [New York University] teaching hospital made, on average, seventy-six visits to physicians, 57 percent saw ten or more physicians, and the average patient spent almost a month in the hospital; patients using UCLA [University of California, Los Angeles] hospital spent, on average, 9.2 days in intensive care and made forty-four visits to physicians, and 51 percent saw ten or more physicians. Contrast this to the experience of patients loyal to UCSF [University of California, San Francisco]: the average patient spent less than twelve days in the hospital and only 2.6 days in intensive care; patients made twenty-seven visits to physicians, on average; and only 30 percent saw ten or more physicians.

View Of Personal Success

Mullan: Your writings and teachings over the years have achieved great credibility. Although health care has changed considerably during your career, it has not always changed in the direction that your work might have suggested. Variation in practice is still rampant. The penetration of evaluative science into the research world has been limited. U.S. health care remains expensive and not very efficient. Are you frustrated? What is your take on your own level of success?

Wennberg: Well, I guess it just depends on how long one is willing to wait. I think that these issues will eventually reemerge as mainstream. We keep trying to prod it a little bit. I am hopeful that our recent progress in profiling academic medical centers will ignite the debate once again. The significance of these differences must surely be more than the difference between Coke and Pepsi. They are associated with more than a 2.5-fold difference in per capita costs; moreover, as Elliott Fisher and his colleagues have shown, the overuse of these services seems to be associated with worse outcomes.

Mullan: When you say “reemerge,” that implies that there was a period when they were mainstream?

Wennberg: Well, I think we almost had it at the time of the Clinton health care reform initiative. I think that it will happen again. The system is stubborn because it is, after all, 15 percent of the economy. There are a lot of oars in that water, and information per se is not going to change the fundamental economic incentives.

We’ve been trying to pursue models of reform that would allow the reimbursement system to align itself with the quality agenda. First, we think that in most parts of the country we probably have an excess capacity of both physicians and hospital beds in terms of what’s beneficial to the population. Second, we don’t know what the supply of specialists, particularly surgeons, really should be, if patients were the determinants of demand. We know what the supply is now, and we know that it’s fully utilized everywhere, no matter how much there is. We also know that the reason for this is physicians’ influence over decision making. But once you open the market to information, as we’ve shown in several clinical trials of decision aids, demand drops among informed patients. We have identified three types of unwanted variation in the system: the underuse of effective care, the misuse of preference-sensitive care, and the overuse of supply-sensitive care. Lurking behind variations in patterns of care are often huge investments in expensive technologies that hospitals have made that are directly tied to the economic stability of those institutions. We have proposed the establishment of a Comprehensive Centers of Medical Excellence program in which medical centers would partner with Medicare, AHRQ, and the NIH to develop methods to deal with unwanted variation in the system. We were very pleased to see our proposal become law as part of last year’s Medicare Reform Bill.

The Inevitability Of Reform

Mullan: A number of times in the mid-1990s I heard (as you perhaps did) the late Eli Ginzberg fret about the future. “We have made it to a one trillion dollar health care system,” he would say, “but I really don’t think the economy can sustain a two trillion dollar system. Something will have to give.” Eli is no longer with us, but we are moving rapidly toward his two-trillion-dollar Armageddon. What do you think? Is there some dollar figure or percentage of gross domestic product that will bring disaster or reform?

Wennberg: I don’t know about the number, but I think the trend is pretty clear. Employers are basically giving up. They’re trying to shift health costs to their employees, or they’re not providing anything at all. Once the employers give up on employer-based health insurance, the demand will grow to put health insurance on the national tax agenda, as all other civilized countries seem to do. That’s when I think reform will happen. It’s happening already. Many health plans now offer “tiered products” in which patients have poorer and poorer coverage, a kind of skin game. But I think there will be only so much skin that people will put into this game before they try to change the rules.

Mullan: Those new rules would provide for less variation and better quality of care at a cost that is sustainable.

Wennberg: That would be my hope.

Jack Wennberg (john.wennberg{at}dartmouth.edu) directs the Center for the Evaluative Clinical Sciences at Dartmouth Medical School in Hanover, New Hampshire. Fitzhugh Mullan (fmullan{at}projecthope.org) is a pediatrician, writer, and former director of the Bureau of Health Professions in the U.S. Department of Health and Human Services. He is a contributing editor of Health Affairs and author of Big Doctoring in America: Profiles in Primary Care (University of California Press and Milbank Memorial Fund, 2002).

DOI: 10.1377/hlthaff.var.73
©2004 Project HOPE–The People-to-People Health Foundation, Inc.