Web Exclusive: Written by Michael Stuart and Sheri Ann Strite
and reprinted here with permission.
Pitfalls to performance measurement
When drawing conclusions from data, watch out for conflicting measures
Story originally published January 23, 2006
Regulators, payers, patients and others understandably want to determine
which healthcare providers deliver the highest quality of
care. Interest in performance measurement has increased since
groups such as the Institute of Medicine, the Center
for the Evaluative Clinical Sciences at Dartmouth, the CMS,
and the Rand Corp. reported estimates that 20% to 50%
of all prescriptions, visits, procedures and hospitalizations
in the U.S. may be inappropriate. That translates into patient
deaths and injuries and the waste of an estimated $400 billion
annually.
Clearly, there is a need to improve healthcare, and performance measurement
is an increasingly applied component of quality improvement.
Yet there are pitfalls to performance measurement of which
many users are unaware.
We offer an outline of some important limitations we believe
healthcare leaders and managers should consider as they attempt
to improve quality through the use of performance measures
within their organizations. Providers must distinguish between
three main uses of performance measures: required reporting,
quality improvement, and evaluating organizational or individual
performance. Each of these uses requires a different approach
because of the possibility of invalid measures. Providers
must also understand the pitfalls of drawing conclusions from
performance measurement: conflicting measures can lead to
decreased quality, waste and incorrect conclusions about
individual health organizations or practitioners.
Components of a measure
A good performance measure consists of a valid numerator and
denominator, a frequency of occurrence and a data-gathering
process. It is often expressed as a percentage or a rate.
The text statement for a diabetic patient-care performance
measure might be, "The percent of patients with a diagnosis
of diabetes mellitus receiving at least one hemoglobin A1c
test annually."
A valid denominator means that it specifies the right base from
which the measurement will be made. The denominator must have
appropriate criteria for the pool of whom or what is eligible
for measurement. For example, a denominator measuring clinical
improvement for cervical Pap smears that does not exclude
women without a cervix would be invalid.
Numerators generally count events such as something that happens to a
patient or something patients receive -- typically, this is
an outcome, an intervention, a service or a process. Examples
of numerators are the number of patients with diabetes mellitus
from the denominator (the population eligible for measurement)
who have received an annual hemoglobin A1c test, or the number
of charts available for patient appointments at an outpatient
clinic.
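As a rough illustration of how the numerator is drawn from the denominator, the diabetes measure described above could be computed along these lines. This is a minimal sketch only: the `Patient` record, its field names and the sample panel are hypothetical and are not taken from any real measurement system.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields, chosen only for this illustration.
    has_diabetes: bool
    a1c_tests_this_year: int


def a1c_measure(patients):
    """Percent of eligible patients with at least one A1c test this year."""
    # Denominator: only patients who meet the eligibility criteria.
    eligible = [p for p in patients if p.has_diabetes]
    if not eligible:
        return None  # the measure is undefined with an empty denominator
    # Numerator: counted only from patients already in the denominator.
    met = [p for p in eligible if p.a1c_tests_this_year >= 1]
    return 100.0 * len(met) / len(eligible)


panel = [
    Patient(has_diabetes=True, a1c_tests_this_year=2),
    Patient(has_diabetes=True, a1c_tests_this_year=0),
    Patient(has_diabetes=False, a1c_tests_this_year=0),  # ineligible, excluded
    Patient(has_diabetes=True, a1c_tests_this_year=1),
]
rate = a1c_measure(panel)
print(f"{rate:.1f}% of eligible patients had an annual A1c test")
```

Leaving the ineligible patient in the denominator would lower the rate from two of three to two of four, which is the kind of error an invalid denominator introduces.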
A key point -- often misunderstood -- is that numerators should
be based on valid, useful and usable scientific evidence.
For health status outcomes and interventions, this requires
that we know what interventions are likely to result in improvement.
With rare exceptions, only well-designed and conducted randomized
controlled trials can demonstrate cause-and-effect relationships,
and only valid and useful information from trial data should
be used for interventions.
Medical leaders, administrators and others should be aware
of the significant potential for selection bias, observation
bias, confounding and the play of chance when relying on observational
data -- and performance measures are one form of observational
data. These threats to validity can confound performance comparisons
between institutions, units and individuals. Trying to resolve
this by adjusting for case mix is analogous to using models
to adjust for patient differences in observational studies
dealing with therapy -- the potential to be misled by confounding
factors remains high. Databases and clinical records are useful
for measuring processes, but are not reliable for attempting
to "prove" that a health status improvement was
the result of an intervention. Database information, observational
studies and opinions of experts can inadvertently mislead.
Unfortunately, because of a general lack of knowledge of these potential
problems, providers may be required to report outcomes to
stakeholders even when the performance measure or the organization's
outcomes lack validity.
Problematic results
Feedback to a clinician in the form of a performance report
may be of great value as a way of encouraging his or her participation
in quality improvement efforts and focusing attention on improving
processes of care and attention to patients' needs. However,
because of significant validity and reliability problems inherent
in observational data, it is altogether a different issue
when an individual provider's performance is made available
to others in the form of a performance "report card"
or when an individual's income is based on a limited set of
performance measures.
Here are several examples of problems that can result:
- The wrong denominator: A colon cancer screening quality
improvement project at the Veterans Administration Hospital
in San Francisco resulted in the facility failing to meet
a national target, and the hospital faced financial penalties.
However, an audit revealed that 47% of the patients included
in the measure had declined screening, 12% failed to make
their appointments for screening, 11% had chart documentation
that screening was not indicated and 42% of the counted
patients received diagnostic testing rather than screening
(i.e., they had signs or symptoms of disease). Thus, the
conclusion that the hospital was failing to meet national
VA benchmarks was incorrect.
- The wrong numerator: Some groups recommend routine screening
of all newborns for hearing problems during postpartum
hospitalization -- this is even required by law in many states.
There is, however, insufficient evidence to conclude that
such testing leads to improved speech and language skills
at 3 years of age. It is also unclear from the best available
evidence if potential benefits outweigh potential harms
of false-positive tests. Unfortunately, many stakeholders
demand performance data on this measure. For internal
quality projects, it would be better to select
a more valid measure.
- Problems in judging the quality of a clinician: A physician may
take appropriate actions to improve quality of care, but
because of patient factors, systems factors or small sample
size, the physician's performance may not result in clinical
improvement.
Examining the use of profiling family physicians for glycemic control
in their diabetic patients is instructive. It has been reported
that in a typical family practice, 4% or less of the variance
in hospitalization rates, visit rates, lab utilization and
glycemic control in diabetics can be attributed to differences
in physician practice patterns. For profiles of glycemic control,
outlier physicians could dramatically improve their profiles
by pruning from their panels as few as one to three patients
with the highest HbA1c levels. This gaming of the system
could not be prevented by case-mix adjustment.
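The sensitivity of a small panel to a handful of patients can be sketched as follows. The HbA1c values below are invented for illustration and do not come from the study referred to above; the `prune_highest` helper is likewise hypothetical.

```python
from statistics import mean


def prune_highest(values, k):
    """Return the values sorted ascending with the k highest removed."""
    if not 0 <= k < len(values):
        raise ValueError("k must be between 0 and len(values) - 1")
    return sorted(values)[: len(values) - k]


# Hypothetical HbA1c results (%) for one physician's panel of 20 patients.
panel = [6.2, 6.5, 6.6, 6.8, 6.9, 7.0, 7.1, 7.2, 7.3, 7.4,
         7.5, 7.6, 7.8, 8.0, 8.2, 8.5, 9.0, 11.5, 12.4, 13.1]

# Panel mean after dropping the 0, 1, 2 and 3 highest values.
means = [mean(prune_highest(panel, k)) for k in range(4)]
for k, m in enumerate(means):
    print(f"patients removed: {k}  mean HbA1c: {m:.2f}")
```

In this made-up panel, removing three patients shifts the average noticeably even though nothing about the care delivered to the remaining patients has changed.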
Leaders and managers should remember that conclusions about individuals
-- and even organizations -- based on performance measures
alone should be drawn with great care. A key issue is how
much of the outcome is due to selection bias, sample size
and other factors, and how much is really under the control
of the clinician or the health system.
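One way to see the effect of sample size is to put an uncertainty range around a measured rate. The sketch below uses the Wilson score interval, a standard statistical method that we have chosen for illustration; the patient counts are hypothetical, and the interval reflects only the play of chance, not selection bias or confounding.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# The same observed rate (70%) for a single clinician and a large organization.
small = wilson_interval(14, 20)       # hypothetical panel of 20 patients
large = wilson_interval(1400, 2000)   # hypothetical population of 2,000

for label, (low, high) in (("n=20", small), ("n=2000", large)):
    print(f"{label}: {100 * low:.0f}% to {100 * high:.0f}%")
```

With only 20 patients the plausible range is wide enough that a clinician measured at 70% cannot be reliably distinguished from one performing considerably better or worse.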
Performance measurement is an important component of quality improvement
efforts in healthcare. However, if performance measures are
not designed and used correctly, they are rendered statistically
or clinically meaningless -- a waste of resources and a threat
to quality. To bring clinicians on board with quality efforts
and to improve patient safety and outcomes, we must ensure
that each measure captures the data intended and is used appropriately.
Michael Stuart is president of the Delfini Group and clinical assistant
professor in the Department of Family Medicine, University
of Washington School of Medicine, Seattle. Sheri Ann Strite
is a principal and managing partner in the Delfini Group,
Portland, Ore.