Web Exclusive: Written by Michael Stuart and Sheri Ann Strite and reprinted
here with permission.
Pitfalls to performance measurement
When drawing conclusions from data, watch out for conflicting measures
Story originally published January 23, 2006
Regulators, payers,
patients and others understandably want to determine which healthcare
providers deliver the highest quality of care. Interest in performance
measurement has increased since groups such as the Institute
of Medicine, the Center for the Evaluative Clinical Sciences at
Dartmouth, the CMS, and the Rand Corp. have reported estimates that
20% to 50% of all prescriptions, visits, procedures and hospitalizations
in the U.S. may be inappropriate. That translates into patient deaths
and injuries and the waste of an estimated $400 billion annually.
Clearly there is a
need to improve healthcare, and performance measurement is an increasingly
applied component of quality improvement. Yet there are pitfalls
to performance measurement of which many users are unaware.
We offer an outline
of some important limitations we believe healthcare leaders and
managers should consider as they attempt to improve quality through
the use of performance measures within their organizations. Providers
must distinguish between three main uses of performance measures:
required reporting, quality improvement, and evaluating organizational
or individual performance. Each of these uses requires a different
approach because of the possibility of invalid measures. Providers
must understand the limitations and pitfalls of drawing conclusions
from performance measurement, because conflicting measures can lead
to decreased quality, waste and incorrect conclusions about
individual health organizations or practitioners.
Components of a measure
A good performance measure consists of a valid numerator and denominator,
frequency of occurrence and the data-gathering process. It is often
expressed as a percentage or a rate. The text statement for a diabetic
patient-care performance measure might be, "The percent of
patients with a diagnosis of diabetes mellitus receiving at least
one hemoglobin A1c annually."
A valid denominator
specifies the right base from which the measurement
will be made. It must have appropriate criteria for
the pool of whom or what is eligible for measurement. For example,
a denominator measuring clinical improvement for cervical Pap smears
that does not exclude women without a cervix would be invalid.
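To make the point concrete, here is a minimal sketch in Python of how the choice of denominator changes a screening rate. The `Woman` record, the `screening_rate` function and the ten-patient panel are all hypothetical and invented for illustration. They are not drawn from any real measure specification.

```python
from dataclasses import dataclass


@dataclass
class Woman:
    has_cervix: bool      # eligibility criterion for the denominator
    had_pap_smear: bool   # the event counted in the numerator


def screening_rate(patients, exclude_ineligible):
    """Percent of patients in the denominator who had a Pap smear."""
    if exclude_ineligible:
        # Valid denominator: only women for whom the test applies.
        eligible = [p for p in patients if p.has_cervix]
    else:
        # Invalid denominator: everyone, including women without a cervix.
        eligible = list(patients)
    if not eligible:
        return None  # the measure is undefined with an empty denominator
    screened = sum(1 for p in eligible if p.had_pap_smear)
    return 100.0 * screened / len(eligible)


# Hypothetical panel: 6 screened, 2 eligible but unscreened, 2 without a cervix.
panel = (
    [Woman(True, True)] * 6
    + [Woman(True, False)] * 2
    + [Woman(False, False)] * 2
)

invalid_rate = screening_rate(panel, exclude_ineligible=False)  # 6 of 10 -> 60.0
valid_rate = screening_rate(panel, exclude_ineligible=True)     # 6 of 8 -> 75.0
print(invalid_rate, valid_rate)
```

In this made-up panel the same care produces a rate of 60% or 75% depending only on who is counted as eligible.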
Numerators generally
count events such as something that happens to a patient or something
patients receive. Typically, this is an outcome, an intervention,
a service or a process. Examples of numerators are the number of
patients with diabetes mellitus from the denominator (the population
eligible for measurement) who have received an annual hemoglobin
A1c, or the number of charts available for patient appointments
at an outpatient clinic.
A key point -- often
misunderstood -- is that numerators should be based on valid, useful
and usable scientific evidence. For health status outcomes and interventions,
this requires that we know what interventions are likely to result
in improvement. With rare exceptions, only well-designed and conducted
randomized controlled trials can demonstrate cause and effect relationships,
and only valid and useful information from trials data should be
used for interventions.
Medical leaders, administrators
and others should be aware of the significant potential for selection
bias, observation bias, confounding and the play of chance when
relying on observational data -- and performance measures are one
form of observational data. These threats to validity can confound
performance comparisons between institutions, units and individuals.
Trying to resolve this by adjusting for case mix is analogous to
using models to adjust for patient differences in observational
studies dealing with therapy -- the potential to be misled by confounding
factors remains high. Databases and clinical records are useful
for measuring processes, but are not reliable for attempting to
"prove" that a health status improvement was the result
of an intervention. Database information, observational studies
and opinions of experts can inadvertently mislead.
Unfortunately, because
of a general lack of knowledge of these potential problems, providers
may be required to report outcomes to stakeholders even when the
performance measure or the organization's outcomes lack validity.
Problematic results
Feedback to a clinician in the form of a performance report may
be of great value as a way of encouraging his or her participation
in quality improvement efforts and focusing attention on improving
processes of care and attention to patients' needs. However, because
of significant validity and reliability problems inherent in observational
data, it is altogether a different issue when an individual provider's
performance is made available to others in the form of a performance
"report card" or when an individual's income is based
on a limited set of performance measures.
Here are several examples
of problems that can result:
- The wrong denominator:
A colon cancer screening quality improvement project at the
Veterans Administration Hospital in San Francisco resulted in
the facility failing to meet a national target and the hospital
faced financial penalties. However, an audit revealed that 47%
of the patients included in the measure had declined screening,
12% failed to make their appointments for screening, 11% had
chart documentation that screening was not indicated and 42%
of the counted patients received diagnostic testing rather than
screening (i.e., they had signs or symptoms of disease). Thus,
the conclusion that the hospital was failing to meet national
VA benchmarks was incorrect.
- The wrong numerator:
Some groups recommend routine screening of all newborns for
hearing problems during postpartum hospitalization -- this is even
required by law in many states. There is, however, insufficient
evidence to conclude that such testing leads to improved speech
and language skills at 3 years of age. It is also unclear from
the best available evidence if potential benefits outweigh potential
harms of false-positive tests. Unfortunately, many stakeholders
demand performance data on this measure. For internal quality
projects, it would be better to select a more valid measure.
- Problems in judging
the quality of a clinician: A physician may take appropriate
actions to improve quality of care, but because of patient factors,
systems factors or small sample size, the physician's performance
may not result in clinical improvement.
Examining the use of
profiling family physicians for glycemic control in their diabetic
patients is instructive. It has been reported that in a typical
family practice, only 4% or less of the variance in hospitalization
rates, visit rates, lab utilization and glycemic control in diabetics
can be attributed to differences in physician practice patterns.
For profiles of glycemic control, outlier physicians could dramatically
improve their profiles by pruning their panels of as few as one
to three patients with the highest HbA1c levels. This gaming of
the system could not be prevented by case-mix adjustment.
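The arithmetic behind this kind of panel pruning can be sketched in a few lines of Python. The HbA1c values and the `prune_highest` helper below are hypothetical, chosen only to show how strongly a handful of patients can drive a small-panel average. They do not reproduce the data from the report cited above.

```python
from statistics import mean


def prune_highest(values, k):
    """Return the panel with its k highest values removed."""
    return sorted(values)[: len(values) - k]


# Hypothetical HbA1c values (%) for a ten-patient diabetic panel.
panel = [6.5, 7.0, 7.5, 6.5, 7.0, 7.5, 7.0, 11.0, 12.0, 13.0]

before = mean(panel)                    # 8.5
after = mean(prune_highest(panel, 3))   # 7.0
print(before, after)
```

Dropping three patients moves this invented profile from an average of 8.5 to 7.0 without any change in the care delivered to the remaining seven.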
Leaders and managers
should remember that conclusions about individuals -- and even organizations
-- based on performance measures alone should be drawn with great
care. A key issue is how much of the outcome is due to selection
bias, sample size and other factors, and how much is really under
the control of the clinician or the health system.
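As a rough illustration of the sample-size issue, the sketch below computes an approximate 95% interval for an observed rate using the simple normal (Wald) approximation. The function name `wald_interval` and the counts are assumptions made for this example, and the Wald interval is only one of several possible methods and a crude one for small samples.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to the valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# The same observed rate of 70%, from two hypothetical sample sizes.
small = wald_interval(14, 20)       # roughly 0.50 to 0.90
large = wald_interval(1400, 2000)   # roughly 0.68 to 0.72
print(small, large)
```

With 20 patients, a measured 70% is compatible with a true rate anywhere from about 50% to about 90%, which is far too wide to rank one clinician against another.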
Performance measurement
is an important component of quality improvement efforts in healthcare.
However, if performance measures are not designed and used correctly,
they are rendered statistically or clinically meaningless -- a waste
of resources and a threat to quality. To bring clinicians on board
with quality efforts and to improve patient safety and outcomes,
we must ensure that each measure captures the data intended and
is used appropriately.
Michael Stuart is president
of the Delfini Group and clinical assistant professor in the Department
of Family Medicine, University of Washington School of Medicine,
Seattle. Sheri Ann Strite is a principal and managing partner in
the Delfini Group, Portland, Ore.