About PMID
Numbers: We frequently utilize a PMID number in place
of a citation. Where PMID numbers are available, enter that
number into the PubMed search box to retrieve that citation
and listing.
|
|
Delfini
on Evidence-based Medicine (EBM)
From Delfini's
perspective, evidence-based medicine is all about effective
use of science in health care.
Delfini's
Definition of Evidence-based Medicine
"Evidence-based
medicine is the use of the scientific method and application
of valid and useful science to inform health care provision,
practice, evaluation and decisions."
The use of science
is required to help reduce medical uncertainty, increase predictability
and inform about the probability of benefit or harm to whom.
There are many
factors that do — and should — guide medical decision
making. We advocate starting with reviewing the available science
and only using valid studies with useful outcomes. Once we are
informed about the science, we should take other factors, such
as cost, the patient perspective (e.g., benefits, harms, alternatives,
costs and uncertainties), clinician and patient satisfaction,
and other triangulation issues (e.g., regulatory issues, PR,
medical community impacts, marketing issues, medical-legal issues,
issues of purchasers, liability and risk management, cost, community
standards, accreditors, press, overall impact on the health
care organization, etc.), into account.
We believe it is
fine to make decisions on factors other than science. Our position
is simply that we should know the science first — and
not confuse opinion or other factors with science.
Evaluate the science
first — then you can "throw it over the wall"
into the realm of other decision considerations (the other triangulation
issues).
Practicing an
evidence-based approach means to us -
Delfini's
Hallmarks of Evidence-based Medicine
1) When seeking
information on a topic, a systematic search is conducted for
science and science-based information using evidence-based
searching and filtering techniques.
2) All sources
of information to guide medical decision-making are critically
appraised, using science-based principles, for validity and
usefulness.
3) Any conclusions
drawn from the science are carefully crafted to be as valid
as possible.
4) Methods used
and reporting are transparent so that the work can be evaluated
for quality, replicated and updated.
5) Clinical information
sources are updated when significant new information becomes
available and such information is periodically sought.
We then apply evidence-based
medicine to clinical quality improvement. To learn more about
the Delfini approach to clinical quality improvement & value,
read about the Delfini Evidence- and Value-based Clinical Quality
Improvement Model here.
More on Evidence-based
Medicine for consumers and their clinicians. |
|
The Need for Critical Appraisal Skills
We believe that
everyone involved in health care decision-making that affects
patients needs basic critical appraisal skills. Why?
- Poor and misleading
information is published even in the best medical journals.
And there are no reliable resources to meet even half of your
health care information needs. Read about the problems here.
- You want to
be able to know immediately whether health care information
you are getting comes from a study design that makes it impossible
to draw cause and effect conclusions.
- You want to
be able to quickly categorize the many studies you see or
hear about into "potentially useful and worthy of a closer
look" — or a waste of time.
- You want to
understand why some sources are "trustable" and
others are not.
- And even if
there were more trustable information available, we believe
you would still need basic critical appraisal skills as a
core competency, if you are involved in decisions affecting
patient care.
- Even with
trusted sources, you would still get "had" by
poor and misleading information without these skills.
- Bad information
will continue to be published.
- Thinking "case
series" and experientially is intuitive; scientific
thinking is not — you would still continue to apply
"case series" thinking.
- You would
continue to get misleading information from your patients,
your colleagues, the "experts," manufacturers,
the news and "urban legends."
- You need to
be able to communicate about science and why some treatments
are better than others — this is not only for communicating
with your patients, but also friends, family, reporters
and lawyers — not to mention your colleagues and others.
- Critical
appraisal skills are important for your professionalism
and useful in life generally.
It does not have
to be that hard. It makes medicine more interesting —
and can even be fun. Not to mention, improve care, help patients
and help prevent waste of resources — which can result
in harms. Read below for more EBM tips. |
|
The
5 "A"s of Evidence-based Medicine
Evidence-based medicine is important to us to answer the following
questions:

Here are
the EBM process steps at a high level:

Modified
by Delfini Group, LLC (www.delfini.org) from Leung GM. Evidence-based
practice revisited. Asia Pac J Public Health. 2001;13(2):116-21
.............
Anatomy of an Article - Delfini-style

.............
The Hunt for Usable Evidence

.............
The Steps
in Critical Appraisal
At
a high level, there are three steps for all primary studies:
- What
is the study design? Important for seeking valid and useful
evidence.
- What
is the validity of the individual study?
- What
are the results? Is this going to be useful, usable information?
You want
"usable evidence."
"Usable
evidence" = Validity + Effectiveness + Appropriateness
+ Usability.

.............
Filtering
for Strength of Study Design (Evidence Grading)
"Best
Available" evidence requires valid (meaning "probably
true") and clinically useful evidence. It is not "evidence"
until it has passed a rigorous critical appraisal test.

.............
Quick EBM Tip: How to Tell the Difference
Between an Experiment and an Observational Study
Practically
speaking, most health care professionals do not need to know the
difference between a cohort study and a case-control study. However,
we all need to be able to distinguish an experiment from an
observational study so that we know the right study design to
answer our clinical question. The above graphic shows what kinds
of questions are to be answered by what kind of study design.
For therapy, screening and prevention, we always want to use
valid and useful information resulting from experiments, such
as randomized controlled trials.
The
quick tip. Did the patient or his or her physician CHOOSE
the treatment? If yes, this is an observational study,
and we are simply observing what happens naturally. If the treatment
was assigned, this is an experiment.
Reminder:
cause-and-effect conclusions can be drawn only from experiments,
not from observations. Observational studies can show only associations,
and not all associations are cause and effect.
.............
The Problems
with Case Series (to name but a few...)
Case
series are reports of interventions with NO COMPARISON group.
They can be useful for generating ideas for studies, but they
can be very harmful if applied as evidence! Unfortunately, physicians
and others are often misled by their results!
This issue is so important we have the text below available
as a [PDF]. Hand
one out to everyone you know!
| Problems
with Case Series |
Definition |
A
group of patients receives an intervention and outcomes
are assessed. There is no appropriate comparison group.
(Historical controls are sometimes used, but this is not
an appropriate comparison group.) |
Key
Points |
Case
series is not evidence – unless you have “all-or-none
results” which is rare — and there is still
potential for bias. Case series can be useful for hypothesis
generation. |
Key
Problem:
There is no comparison group. |
Patients
frequently improve after a medical visit, but outcomes might
otherwise be the same whether a treatment is administered
or not (see Observational Bias below). Comparisons reveal
associations by exposing differences. Lack of a comparison
group can make it appear as if there is an association between
an intervention and an outcome when, in fact, there is not. |
Conclusions |
Case
series can be useful in describing a clinical condition
or to generate ideas for study. However, because of the
above mentioned biases, case series can almost never
be relied upon to draw conclusions between interventions
and outcomes. Rarely, in conditions where morbidity or mortality
is nearly 100 percent and is decreased dramatically with the
intervention, case series may be sufficient to draw conclusions
about the effect of the intervention on outcomes, but it has to
be emphasized that this is extremely rare. |
| |
Bias
will always be present in case series.
- Selection
bias will always be present in case series.
- There
are usually no criteria for patient selection.
- Frequently
cases are not consecutively selected.
- Clinicians
usually report on those patients with the best outcomes.
-
Observation bias will always be present in case series.
- When
there is no blinding, clinician beliefs in or hopes
for an intervention can affect outcomes –
resulting in performance bias.
- Assessment
bias often occurs because lack of consecutively
selected patients can result in selective reporting
favoring the intervention.
- Key
Point —» Patients frequently
improve after a medical visit, but outcomes might otherwise
be the same whether a treatment is administered or not
(see above). Therefore, without a comparison group,
almost any intervention will appear to be beneficial
and attributed to medical care when, in fact, improvement
may be due to --
a) The self-limited nature of the condition,
b) Placebo effect,
c) Regression-to-the-mean – meaning that extreme
test values are statistically likely to move to an average
over time. When patients present with extreme values
and then seem to have improvement, it may be falsely
attributed to an intervention. A comparison group with
no intervention can help expose this effect. OR
d) Coincidence (chance).
- There
are reporting problems resulting from case series.
- Due
to publication bias, negative results are almost
never reported (the reporting of which would still
present its own problems since a negative-finding
case series would be highly prone to bias for the
above reasons).
- Authors
of case series frequently compare their results
to those of other case series. There is always the
possibility that the authors will select case series
for comparison that show their results in the best
light.
|
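Regression-to-the-mean, item (c) above, can be seen in a small simulation. The sketch below is illustrative only: the measurement scale, the cutoff of 160 and the noise levels are hypothetical assumptions, not data from any study. Every simulated patient is measured twice and nobody is treated, yet the patients selected for an extreme first reading look "improved" at follow-up.

```python
import random
import statistics

def simulate_regression_to_mean(n_patients=10_000, cutoff=160.0, seed=0):
    """Measure each untreated patient twice and return the mean first and
    second readings among those whose first reading exceeded `cutoff`.
    All numbers are hypothetical."""
    rng = random.Random(seed)
    first_readings, second_readings = [], []
    for _ in range(n_patients):
        true_value = rng.gauss(140.0, 10.0)         # stable underlying value
        first = true_value + rng.gauss(0.0, 10.0)   # visit 1: true value plus noise
        second = true_value + rng.gauss(0.0, 10.0)  # visit 2: no intervention given
        if first > cutoff:                          # patient "presents" with an extreme value
            first_readings.append(first)
            second_readings.append(second)
    return statistics.mean(first_readings), statistics.mean(second_readings)

mean_first, mean_second = simulate_regression_to_mean()
print(f"selected patients, first visit: {mean_first:.1f}; follow-up: {mean_second:.1f}")
```

The follow-up mean falls back toward the population average with no treatment at all, which is the apparent "benefit" a case series cannot distinguish from a real effect.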
The Problems with Narrative Reviews
(aka Overviews)
Narrative
reviews can be very biased and misleading. Here is why. A [PDF]
is also available for circulation to people who can be helped
by this.
| Problems
with Narrative Reviews (aka Overviews) |
| Definition |
Narrative reviews are “evidence-round ups” on
specific health care topics – but ones which do not
necessarily follow systematic evidence-based criteria. Systematic
reviews (including meta-analyses) can be contrasted with
narrative reviews – or overviews – in that they
generally follow a specific set of evidence-based criteria
(and yet, still require critical appraisal to determine
if they have met these standards.) |
| Key
Points |
Narrative reviews often do not meet important criteria to
help mitigate bias – frequently they lack explicit
criteria for article selection and frequently there is no
evaluation of selected articles for validity, as examples.
Only rigorously applied evidence-based methods can help
move toward predictability for health care outcomes. |
Key
Problem:
Big Potential for Bias |
There
is high potential for low methodological quality. Authors
frequently have expert opinions (and biases) and find studies
to support their positions (selection bias). |
| Conclusions |
Review articles can be useful for summarizing the literature
and providing guidance provided they are of high methodological
quality. However, because many reviews are not done in a
systematic way, they should not be relied upon to draw conclusions
about effective care. |
| |
Overviews
or narrative reviews are frequently published in the best
medical journals. Many clinicians rely upon these reviews
since they are considered a current “roundup”
of evidence and are accompanied by “useful”
recommendations from the (expert) author. Authors frequently
have expert opinions (and biases) and find studies to
support their positions (selection bias). You need to
be sure the review was done using a systematic approach.
- There
may be studies showing no effect or harm which are not
included in the review.
- The only
way to know if there has been a comprehensive search
and critical appraisal of the studies in the review
is to see the search strategy and criteria used for
study inclusion.
Unless you
see the search strategy, criteria for selecting and accepting
studies for entry into the review, the information (summary
points, conclusions) contained in the review may be invalid.
Caution is urged in using any reviews except valid systematic
reviews.
Criteria
—» If you are looking at a
review article that does not pass these criteria you are
likely to be wasting your time and drawing invalid conclusions
about the best clinical approach (you can get our Systematic
Review Appraisal Tool online at our Website.)
- Was there
an attempt to obtain all relevant studies for the review?
- Does
the review state inclusion/exclusion criteria for the
studies?
- Do the
criteria address study type, methods, the population
studied, intervention and outcomes?
- Were
the studies adequately evaluated for internal and external
validity?
- Population
- Study
type
- Study
Methods = Method of randomization; Blinded assessment;
Outcomes (benefits, harms and risks); Loss to
follow-up; ITT analysis
- Is there
a statement or chart rating/summarizing study quality?
- Were
there several reviewers who agreed on study validity?
- Was there
a summary/synthesis of the evidence?
Quantitative
summaries (meta-analysis) may not be possible in some
cases because of study heterogeneity. However, systematic
reviews may still be able to weight the best studies,
e.g., by validity, sample size. |
| For
questions of treatment, prevention and screening, it is important
to favor valid systematic reviews that utilize Randomized
Controlled Trials (RCTs). |
Assessing Results

.............
Measures of
Outcomes — ARR, RRR and NNT: A Visual Explanation
This too is very important (and a wee bit hard to produce here).
Our suggestion? Download the [PDF].
Here's another one to give to everyone you know.
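For readers who prefer arithmetic to pictures, here is a minimal sketch of the three measures using their standard definitions: ARR is the control event rate minus the treated event rate, RRR is ARR divided by the control event rate, and NNT is 1/ARR, rounded up by convention. The function name and the trial counts are hypothetical, chosen only for illustration.

```python
import math
from fractions import Fraction

def outcome_measures(control_events, control_n, treated_events, treated_n):
    """Return (ARR, RRR, NNT) from event counts in each group.

    Fractions keep the arithmetic exact. NNT is None when the
    treatment shows no absolute benefit.
    """
    control_rate = Fraction(control_events, control_n)
    treated_rate = Fraction(treated_events, treated_n)
    arr = control_rate - treated_rate            # absolute risk reduction
    rrr = arr / control_rate                     # relative risk reduction
    nnt = math.ceil(1 / arr) if arr > 0 else None
    return arr, rrr, nnt

# Hypothetical trial: 10 of 100 control patients and 8 of 100 treated
# patients have the event during the study period.
arr, rrr, nnt = outcome_measures(10, 100, 8, 100)
print(f"ARR = {float(arr):.0%}, RRR = {float(rrr):.0%}, NNT = {nnt}")
```

A 20% relative reduction sounds large, but here it corresponds to an absolute reduction of 2 percentage points. An NNT is only meaningful together with the length of the study it came from.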
............
Understanding
Number-Needed-to-Treat @
The DelfiniClick™
|
| Evidence
Grading
Introduction
The purpose of grading clinical studies is to
condense the results from critical appraisal of a study
or studies into a brief “tag” which allows
categorization into various levels of quality. The need
to know the quality of a study or group of studies derives
from a need to accurately predict what outcomes are likely
to occur when various health care interventions are chosen
by clinicians, patients and others. Higher quality studies
are more likely to accurately predict efficacy and safety
outcomes than studies of lower quality. It appears that
lower quality studies often falsely inflate study results.
The table below summarizes the relative amount of exaggeration
that may occur with low quality RCTs.
Exaggeration
of Differences Between Outcomes in Intervention vs Control
Groups (High Quality vs Low Quality RCTs)
To see study citation and abstract, enter the
PMID number in the PubMed search window.
| Study Area of Concern | Relative Amount of Exaggeration | Reference: First Author (PMID) |
| Generation of Randomization Sequence | 51% | Kjaergard (11730399) |
| Concealment of Allocation | Up to 51% | Schulz (7823387), Kjaergard (11730399), Moher (9746022) |
| Blinding | 17%-44% | Schulz (7823387), Poolman (17332104), Kjaergard (11730399) |
| Assessing outcomes through models | 50% or greater | Lachin (11018568) |
Evidence
Grading Systems
There are many systems or schemes for evidence grading.
They are all based on identifying study flaws, often referred
to as threats to validity, and involve evaluation of study
design, methodology and results.
Cautions
- When using
any grading system, it is important to review the criteria
used for arriving at each grade - these may vary even
when the grade “name” is identical.
- Also,
some grading systems may assign misleading quality grades
by inflating lower quality or invalid studies.
The Delfini
validity and usability grading scale is designed to be
easily understood, remembered and flexible to apply. The
concepts behind our grading system can be applied to individual
studies or conclusions from systematic reviews. It can
also be used to rate judgments based on evidence such
as clinical recommendations, guidelines, etc.
For more
details and grading advice, download our tool for Evidence
Grading, Wording Conclusions & Results Tables
[WORD]
which is also available at Tools:
Evidence Tool Set. Read on for Delfini's methods: |
Delfini
Validity & Usability Grading Scale for Summarizing
the Evidence for Interventions
Grades
can be applied to individual studies, to conclusions within
studies, a body of evidence or to secondary sources such
as guidelines or clinical recommendations. General advice
is provided below.
NOTES
- Author’s
Conclusions are Uncertain
If the author’s conclusions are uncertain, it
may not be necessary to do a critical appraisal of the
study because an uncertain outcome automatically renders
an “uncertain” grade designation.
- Grade
B-U Advice
Because of its use for clinical applications, B-U should
be used conservatively. B-U is not a default grade.
Rather, it should be used when the study is probably
a B and the outcomes are highly likely to be true, but
it doesn’t quite comfortably reach a Grade B.
|
|
Grade
A:
Useful |
The
evidence is strong and appears sufficient to use in making
health care decisions – it is both valid and useful
(e.g., meets standards for clinical significance, sufficient
magnitude of effect size, physician and patient acceptability,
etc.)
Advice:
Studies achieving this grade should be outstanding in
design, execution and reporting with useful information
to aid clinical decision-making, enabling reasonable certitude
in drawing conclusions.
For a body
of evidence:
Several well-designed and conducted studies that consistently
show similar results
- For therapy,
screening, prevention and diagnostic studies: RCTs.
In some cases a single, large well-designed and conducted
RCT may be sufficient; however, without confirmation
from other studies results could be due to chance, undetected
significant biases, fraud, etc. In such instance the
study might receive a Grade A, but the Strength of the
Evidence should include a cautionary note.
- For natural
history and prognosis: Cohort studies
|
Grade
B:
Possibly Useful |
The evidence appears potentially strong and is probably
sufficient to use in making health care decisions - some
threats to validity were identified
Advice:
Studies achieving this grade should be of high quality
in design, execution and reporting with non-lethal threats
to validity and with sufficiently useful information to
aid clinical decision-making, enabling reasonable certitude
in drawing conclusions.
For a body
of evidence:
The evidence is strong enough to conclude that the results
are probably valid and useful (see above); however, study
results from multiple studies are inconsistent or the
studies may have some (but not lethal) threats to validity.
- For therapy,
screening, prevention and diagnostic studies: RCTs.
In some cases a single, large well-designed and conducted
RCT may be sufficient; however, without confirmation
from other studies results could be due to chance, undetected
significant biases, fraud, etc. In such instance the
study might receive a Grade A, but the Strength of the
Evidence should include a cautionary note.
- Also
for diagnosis, valid studies assessing test accuracy
for detecting a condition when there is evidence of
effectiveness from valid, applicable RCTs.
- For natural
history and prognosis: Cohort studies
|
Grade
B-U:
Possible to uncertain usefulness
|
The
evidence might be sufficient to use in making health care
decisions; however, there remains sufficient uncertainty
that the evidence cannot fully reach a Grade B and the
uncertainty is not great enough to fully warrant a Grade
U.
Study quality
is such that it appears likely that the evidence is sufficient
to use in making health care decisions; however, there
are some study issues that raise continued uncertainty.
Health care decision-makers should be fully informed of
the evidence quality. |
|
Grade
U:
Uncertain Validity and/or
Uncertain Usefulness
|
There
is sufficient uncertainty that caution is urged regarding
its use in making health care decisions.
- Uncertain
Validity: This may be due to uncertain validity due
to methodology (enough threats to validity to raise
concern – our suggestion would be to not use such
a study in most circumstances) or may be due to conflicting
results.
- Uncertain
Usefulness: Or this may be due to uncertain applicability
due to results (good methodology, but questions due
to effect size, applicability of results when relating
to biologic markers, or other issues). These latter
studies may be useful and should be viewed in the context
of the weight of the evidence.
- Uncertain
Validity and Usefulness: This is a combination of the
above.
- Uncertainty
of Author: If the author has reached a conclusion that
the findings are uncertain, doing a critical appraisal
is unlikely to result in a different conclusion. The
evidence leaves us uncertain regardless of whether the
study is valid or not. Critical appraisal is at the
discretion of the reviewer.
|
|
| An
example of a critical appraisal report @
The DelfiniClick™
— radiofrequency for the treatment of gastro-esophageal
reflux disease example: full
story; appraisal
only |
| Intention-to-Treat
Analysis — The Biased Case of Migraine |
|
Intention-to-treat (ITT) analysis is an important consideration
in randomized, controlled trials. As described in the CONSORT
STATEMENT (http://www.consort-statement.org/),
among other things, ITT analysis “prevents bias caused
by the loss of participants, which may disrupt the baseline
equivalence established by random assignment and which may reflect
non-adherence to the protocol.”
ITT analysis is
defined as follows in the CONSORT STATEMENT:
“A strategy for analyzing data in which all participants
are included in the group to which they were assigned, whether
or not they completed the intervention given to the group.”
An easy way to
tell if an ITT analysis has been done is to look at the number
randomized in each group and see if that number is the same
number that is analyzed. Number in should be the same number
out — in each group as originally randomized.
And, as you can
see, determining whether an analysis meets the definition of
ITT analysis is incredibly easy. Yet many authors mislabel
their analyses as ITT when they are not. In one study of published
articles, authors who said they had performed an ITT analysis
had not actually done so 47% of the time. (Kruse R, Alper B,
et al. Intention-to-treat analysis: Who is in? Who is out? J Fam Pract.
2002 Nov;51(11).)
An article in BMJ
dealing with migraine illustrates some important points about
ITT analysis and reminds us that authors continue to
report outcomes in ways that are highly likely to be biased.
In the Schrader
study, 30 patients with migraine were randomized to receive
lisinopril and 30 were randomized to placebo. The authors, however,
only reported on 55 patients in their so-labeled
“intention-to-treat analysis” because of poor compliance.
This is not an intention-to-treat analysis.
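The "number in, number out" check is simple enough to write down. The sketch below applies it to the totals quoted above (30 + 30 randomized, 55 analyzed); the function name is our own, and a fuller check would compare the counts group by group, which the figures quoted here do not allow.

```python
def analysis_shortfall(n_randomized, n_analyzed):
    """Return (patients missing from the analysis, fraction of those randomized)."""
    missing = n_randomized - n_analyzed
    return missing, missing / n_randomized

missing, fraction = analysis_shortfall(n_randomized=30 + 30, n_analyzed=55)
meets_itt_count_check = (missing == 0)
print(f"{missing} of 60 randomized patients are absent from the analysis ({fraction:.1%})")
```

Any non-zero shortfall means the analysis does not meet the CONSORT definition of ITT quoted above, whatever label the authors give it.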
The following is
reported by the authors:
| Schrader
H, Stovner LJ, Helde G, Sand T, Bovim G. Prophylactic
treatment of migraine with angiotensin converting enzyme inhibitor
(lisinopril): randomised, placebo controlled, crossover
study. BMJ 2001;322:1-5. Article
available at http://bmj.bmjjournals.com/cgi/content/full/322/7277/19. |
Results
In the 47 participants with complete data, hours with headache,
days with headache, days with migraine, and headache severity
index were significantly reduced by 20% (95% confidence
interval 5% to 36%), 17% (5% to 30%), 21% (9% to 34%), and
20% (3% to 37%), respectively, with lisinopril compared
with placebo. Days with migraine were reduced by at least
50% in 14 participants for active treatment versus placebo
and 17 patients for active treatment versus run-in period.
Days with migraine were fewer by at least 50% in 14 participants
for active treatment versus placebo. Intention to treat
analysis of data from 55 patients supported the differences
in favour of lisinopril for the primary end points. In the
intention to treat analysis in 55 patients, significant
differences were retained for the primary efficacy end points:
|
Intention to Treat Analysis: 55 Participants with Means (SD)
| | Lisinopril | Placebo | Mean % reduction (95% CI) |
| Headache hours | 138 (130) | 162 (134) | 15 (0 to 30) |
| Headache days | 20.7 (14) | 24.7 (11) | 16 (5 to 27) |
| Migraine days | 14.6 (10) | 18.7 (9) | 22 |
| Conclusion:
The angiotensin converting enzyme inhibitor, lisinopril,
has a clinically important prophylactic effect in migraine.
|
The authors have
done as their primary analysis an “optimal compliance
analysis.” They also state they have done an ITT analysis
but they have not.
It is fine to do
non-ITT analyses – “as treated,” and “completer”
analysis are two common ones you will frequently see. But the
ITT analysis must be the primary analysis. Others are considered
secondary (and should be labeled and treated as such).
And so how does
one handle loss to follow-up? There are various methods, but
there is an important principle which should guide us —
the method should put the burden of proof on the intervention.
This is the opposite of our court system – “guilty
until proven innocent,” in effect. So what you do is assign
an outcome to those lost to follow-up that puts the intervention
through the toughest test. “Worst-case basis” is
one method; “last-observed result” is another.
If you put the
intervention through the hardest test, and you still have positive
results (assuming the study is otherwise valid), you can feel
much more confident about the reported outcomes truly being
valid. If the missing subjects in the above-mentioned migraine
article are handled this way, there is no statistically
significant difference between lisinopril and placebo.
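To make the worst-case idea concrete, here is a minimal sketch for a trial with a yes/no outcome. The counts are hypothetical and are not the migraine data (that study reported continuous outcomes); the function names are ours. Patients lost from the intervention group are counted as non-responders and patients lost from the control group as responders, so the intervention faces the toughest test.

```python
def completer_rates(treated_responders, treated_lost, treated_n,
                    control_responders, control_lost, control_n):
    """Responder rates among only the patients who have outcome data."""
    return (treated_responders / (treated_n - treated_lost),
            control_responders / (control_n - control_lost))

def worst_case_rates(treated_responders, treated_lost, treated_n,
                     control_responders, control_lost, control_n):
    """Responder rates per randomized patient, burden of proof on the intervention.

    Lost intervention patients count as non-responders (numerator unchanged);
    lost control patients count as responders.
    """
    if (treated_responders + treated_lost > treated_n
            or control_responders + control_lost > control_n):
        raise ValueError("responders plus lost cannot exceed the number randomized")
    return (treated_responders / treated_n,
            (control_responders + control_lost) / control_n)

# Hypothetical trial: 100 randomized per group, 10 lost from each group.
trial = dict(treated_responders=40, treated_lost=10, treated_n=100,
             control_responders=30, control_lost=10, control_n=100)
completer = completer_rates(**trial)
worst = worst_case_rates(**trial)
print(f"completers only: {completer[0]:.1%} vs {completer[1]:.1%}")
print(f"worst case:      {worst[0]:.1%} vs {worst[1]:.1%}")
```

In this made-up example an apparent advantage among completers disappears entirely under the worst-case assumption, which is why even a modest loss to follow-up deserves a "what if" calculation.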
We are frequently
asked what is an acceptable percent loss to follow-up. It depends
on whether the loss to follow-up will affect the results or
not. We have seen what we consider to be important changes even
with small numbers lost to follow-up. We recommend that you
do sensitivity analyses (“what if”s) to see what
the effect might be if you had the data. Without doing an ITT
analysis, we are very uncomfortable about the results if five
percent or more of subjects have missing data for analyzing
endpoints -- and even less than five percent might have impact.
For those who would
like more information, the following article is an excellent
one on the subject and is very helpful for understanding issues
pertaining to ITT analysis and randomization as well:
Schulz
KF, Grimes DA
Sample size slippages in randomised trials: exclusions and the
lost and wayward.
The Lancet. Vol 359. March 2, 2002: 781-785
PMID: 11888606
Other reading
on ITT analysis is available here.
Very special thanks
to Murat Akalin, MD, MPH, UCSD, for selecting
a great article for case study, participating in this review,
doing the ITT analysis and encouraging us to write this. |
| Summary
Points About Harms |
| Safety
issues concern risks and harms, which are events that cause problems
with meaningful outcomes (morbidity, mortality, quality of life,
functioning) or cause other unwanted effects.
- Terms “safety,
risk, harm, adverse event, adverse effect, ADE” often
used interchangeably
- We sometimes
distinguish risk from harm
Harms are infrequent
and hard to find. They are usually not the topic of study, and study
duration may be too short or the population studied too small. They are
often reported from weaker science such as case report data,
database research, observational studies or low quality RCTs.
(Reminder: Cause and effect can only be concluded from valid
RCTs.) Safety data are, therefore, usually not strong and often
likely due to chance. When good interventions are no longer
available due to poor safety information – which could
be inaccurate – patients may be harmed.
Clinical trials
tell us much about efficacy but little about harms because —
- Study populations
are carefully selected and frequently have only one disease;
frequently exclude pregnant women, children, elderly.
- Rare events
are difficult to find in RCTs because of small sample sizes.
- Harms from
long-term use of a drug are usually not known because trials
are of short duration.
- Drug interactions
are more likely in the real world than in RCTs.
Generally, there
is no easy way to find information regarding harms.
- Drug companies
do not have a large interest in studies looking at long-term
harms.
- The current
“system” is based on voluntary reporting.
- Case reports
(letters, short reports) are the main source of information
on harms.
- There are national
and international monitoring centers, e.g., FDA. However,
a problem with FDA data is that many (possibly most) reports do not
establish cause-and-effect relationships.
Searching for harms
using PubMed —
- Large RCTs
should be sought, but problematic if harms are rare or late.
Also look for long-term follow-up of RCTs.
- Systematic
reviews of RCTs dealing with harms should be sought, but
harms may not be detected if some of the included trials
do not report harms or if harms are described in various
ways in different studies. Therefore, in some cases systematic
reviews may falsely indicate lack of harms that are subsequently
detected in large, well-designed and conducted RCTs.
- Search for
case-control and cohort studies, keeping in mind that observational
studies are prone to bias
- Key words:
“adverse effects, adverse events, adverse reactions,
adverse reaction monitoring, ADR, pharmacovigilance.”
Strategies we employ
for approaching harms —
- Review multiple
studies
- Note if there is support
(e.g., biologic plausibility, relatedness in outcomes, dose-response
relationship)
- Review confidence intervals (CI) for non-significant findings
to discern if there is a clinically meaningful possibility
within the interval
- Consider applying
composite endpoints
- Review the exclusions.
Exclusion of patients otherwise likely to experience side-effects
may affect generalizability of results of side effects reporting
(e.g., this may happen if patients are restricted to those who are
not naïve or may occur through a run-in and exclusion
period).
- Caution:
if subjects who have experienced or are likely to experience
adverse outcomes known to be associated with the intervention
being studied are excluded, then search for condition,
intervention and adverse outcome (and potentially comparator)
without limits — but potentially narrowing for cohort.
It might be reasonable to limit review to recent studies
as they probably have a discussion of prior findings.
- Apply cautious
wording with caveats
- Clinicians are
urged to follow FDA recommendations
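The confidence-interval review listed among the strategies above can be sketched as a simple rule. This is one possible reading, not a Delfini tool: the function name, the wording of the categories and the threshold are hypothetical, and the smallest increase in risk that counts as clinically meaningful is a judgment the reviewer must supply.

```python
def interpret_harm_interval(ci_lower, ci_upper, meaningful_increase):
    """Classify a confidence interval for an absolute risk increase.

    `meaningful_increase` is the smallest increase in risk that would
    matter clinically (a reviewer's judgment, not a statistical output).
    """
    if ci_lower > 0:
        return "statistically significant increase in harm"
    if ci_upper >= meaningful_increase:
        return "non-significant, but a clinically meaningful harm is not ruled out"
    return "non-significant, and a clinically meaningful harm appears unlikely"

# Hypothetical intervals for a risk difference, with 2 percentage points
# taken as the smallest increase that would matter.
wide = interpret_harm_interval(-0.01, 0.04, meaningful_increase=0.02)
narrow = interpret_harm_interval(-0.01, 0.01, meaningful_increase=0.02)
print(wide)
print(narrow)
```

Two findings that are both "not statistically significant" can therefore carry very different messages about safety, depending on how far the interval extends.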
References:
- BMJ. 2004;329:2-3.
- BMJ. 2004;329:44-47.
|
| Flowing
Evidence into Cost Analysis: Powerful Ways with NNT |
| This
example serves several purposes.
It illustrates
the critical importance of including time period in NNT in order
to better understand efficacy.
It shows
how you can use NNT in an effectiveness and cost analysis.
1. Read the fictional
scenario below. This is typical of the kinds of issues
faced by doctors, P & T committees, QI staff, etc., daily.
2. Think about
your reaction.
3. Then click on
the icon to see how important NNT can be. Remember, NNT should always
be associated with the length of time of the study. This example
really drives this home.
Fictional
Scenario
- Lifetime
risk of hip fracture in women is 15% with significant mortality
(20-30% of women die in the first year following hip fracture).
- HRT is now found
to have many risks. Other fracture prevention drugs have risks.
There is a new (fictional) drug on the market that has fewer
risks and that many docs are starting to use on high and moderate
risk women. The drug is getting a lot of press attention and
has good evidence behind it.
- Many women in
your organization are requesting information and treatment
for prevention of fractures. Many are asking about this new
drug.
- One year of
treatment is cheaper than alendronate.
Should you treat these women with this new drug? How
do you decide? Click
on the question mark below to see an example of how you
can address questions like this one:
?
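As a rough sketch of how NNT can flow into a cost analysis, the arithmetic might look like the following. All numbers here are hypothetical placeholders (the scenario above gives no event rates or prices), and the function names are ours.

```python
def nnt(control_event_rate, treated_event_rate):
    """Number needed to treat = 1 / absolute risk reduction (ARR).

    Both rates must be measured over the same study period.
    """
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        raise ValueError("No absolute risk reduction; NNT is undefined.")
    return 1.0 / arr

def cost_per_event_prevented(nnt_value, annual_cost_per_patient, study_years):
    """Drug cost of treating NNT patients for the whole study period."""
    return nnt_value * annual_cost_per_patient * study_years

# Hypothetical inputs, for illustration only.
STUDY_YEARS = 3           # the NNT applies to this time period
CONTROL_RATE = 0.02       # hip fractures over 3 years without the drug
TREATED_RATE = 0.01       # hip fractures over 3 years with the drug
ANNUAL_DRUG_COST = 800.0  # per patient per year

study_nnt = nnt(CONTROL_RATE, TREATED_RATE)
total_cost = cost_per_event_prevented(study_nnt, ANNUAL_DRUG_COST, STUDY_YEARS)

print(f"NNT over {STUDY_YEARS} years: {study_nnt:.0f}")
print(f"Drug cost to prevent one fracture: ${total_cost:,.0f}")
```

Stating the NNT without the time period hides the fact that every one of those patients has to be treated, and paid for, for the full duration of the study.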
|
| Diagnostic
Testing & Measures of Test Function
Bottom line is
that there are special considerations for critically appraising
studies of diagnostic tests in addition to usual considerations
of study validity, clinical relevance, applicability and usability.
Measures of test function, also known as indices of accuracy, help determine
the accuracy and usefulness of diagnostic tests. They are:
Sensitivity, Specificity,
Positive predictive value, Negative predictive value, Positive
likelihood ratio, Negative likelihood ratio, Post-test odds,
Post-test probabilities and Number-needed-to-diagnose.
However, there are
some special challenges surrounding these measures which we
describe below. You can download this one-pager here [PDF].
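For readers who want to see how these measures relate to one another, here is a minimal sketch that derives them from a 2x2 table of test results against a reference standard. The counts are hypothetical, the function name is ours, and the sketch assumes all four cells are non-zero.

```python
def diagnostic_measures(tp, fp, fn, tn, pretest_probability=None):
    """Measures of test function from a 2x2 table.

    tp/fp/fn/tn = true positives, false positives, false negatives,
    true negatives. If no pre-test probability is given, the prevalence
    in the tested pool is used.
    """
    if min(tp, fp, fn, tn) <= 0:
        raise ValueError("This sketch assumes all four cells are non-zero.")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    youden = sensitivity + specificity - 1
    if pretest_probability is None:
        pretest_probability = (tp + fn) / (tp + fp + fn + tn)
    if not 0 < pretest_probability < 1:
        raise ValueError("Pre-test probability must be between 0 and 1.")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * lr_pos  # odds of disease after a positive test
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "posttest_odds": posttest_odds,
        "posttest_probability": posttest_odds / (1 + posttest_odds),
        # Number needed to diagnose = 1 / (sensitivity + specificity - 1)
        "nnd": 1 / youden if youden > 0 else float("inf"),
    }

# Hypothetical study: 100 patients with the disease, 200 without.
results = diagnostic_measures(tp=90, fp=30, fn=10, tn=170)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

When the pre-test probability equals the prevalence in the tested pool, the post-test probability after a positive test equals the positive predictive value.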
|
| Evaluating
Diagnostic Tests: Challenges with Measures of Test Function |
| |
- A goal
of diagnostic testing is to reduce diagnostic uncertainty.
Yet there is usually uncertainty associated with diagnostic
testing itself.
- The obvious
question in diagnostic testing is, “Does the patient
have this condition, disorder or disease?” An
equally important question, however, is, “Will
the patient experience improved outcomes if the condition
is detected?” Thus diagnostic testing requires
considering both test accuracy and the evidence about
clinical outcomes from various interventions.
- Evaluating
a diagnostic test usually entails making a comparison
when one is available. Yet, there are often problems
with making test comparisons. And there are problems
when there is no comparator.
- “Measures
of Test Function” evaluate the accuracy and prediction
capabilities of diagnostic tests. There are often problems
with Measures of Test Function.
Consequently
there are special considerations for critically appraising
studies of diagnostic tests in addition to usual considerations
of study validity, clinical relevance, applicability and
usability. |
How
much uncertainty is associated with this test?
|
Uncertainty
can rarely be eliminated due to –
a) Uncertainty
about what equates with a meaningful result since assignment
of normal and abnormal values is usually arbitrary when
dealing with a range, and “normal/abnormal”
may not equate with being disease-free or having a disease;
b) Trade-offs
between sensitivity and specificity – Setting the
cut-off to identify more patients with the disorder will
almost always yield more patients with false positives.
For example, if you set the cut-off for an abnormal fasting
blood sugar at a low level to identify more diabetics
(higher sensitivity), you will pay the price of including
more non-diabetics (lower specificity and high false positive
rate) as well.
c) Variations
in the test’s accuracy and precision, its application,
its predictive capabilities, and/or its interpretation;
d) Variations
in how values might vary within an individual, a population
or within different populations – including assumptions
about who is and who is not disease-free, and variations
in disease spectrum such as early to late disease, or
mild or severe disease, or rate of disease progression;
e) Sometimes
having a test for an intermediate outcome, but not having
good information available about whether treatment of
the intermediate outcome is actually associated with meaningful
clinical outcomes (e.g., PVCs following an MI indicate
higher risk for cardiac mortality, but does treating them
reduce the mortality risk?)
f) Frequently
needing to choose a less accurate method due to cost or
risk (e.g., chest x-ray vs lung biopsy).
Consequently, the uncertainty surrounding the diagnostic
test must be evaluated. |
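The sensitivity and specificity trade-off in item b) can be illustrated with a small sketch. The fasting blood sugar values below are invented for illustration and the cut-offs are arbitrary; the only point is the direction of the trade-off.

```python
# Hypothetical fasting blood sugar values (mg/dL), for illustration only.
DIABETIC = [118, 124, 130, 135, 142, 150, 160, 175]
NON_DIABETIC = [82, 88, 92, 96, 101, 108, 115, 127]

def sensitivity_specificity(cutoff, diseased, healthy):
    """Treat any value at or above the cut-off as an abnormal (positive) test."""
    true_positives = sum(value >= cutoff for value in diseased)
    true_negatives = sum(value < cutoff for value in healthy)
    return true_positives / len(diseased), true_negatives / len(healthy)

for cutoff in (140, 126, 110):
    sens, spec = sensitivity_specificity(cutoff, DIABETIC, NON_DIABETIC)
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, "
          f"specificity {spec:.2f}, false positive rate {1 - spec:.2f}")
```

Lowering the cut-off picks up more of the diabetics, but more non-diabetics are labeled abnormal as well.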
| |
- Frequently
there is no single, accurate test for diagnosis. For
example the diagnosis of rheumatoid arthritis involves
history, physical exam plus laboratory testing.
- Often
there is no way that is 100% accurate to establish a
diagnosis. Comparing a new diagnostic test or procedure
to an inaccurate “standard” may make it
seem that the new method is in error even if it is actually
better than the current “standard.”
- Are patients
representative of the population to which the test will
be applied and consecutive?
- Are study
investigators blinded to the results of the gold standard
and the test being evaluated?
- It is
rare that a test is both highly sensitive and highly
specific, which can make it difficult to find a perfect
gold standard.
- Often
good information about negative tests is lacking since
patients with negative tests are generally not subsequently
exposed to further invasive or uncomfortable testing.
Therefore, it is more likely to be unknown if the negative
results are valid or invalid.
Consequently,
the comparison method must be evaluated. |
Measures
of Test Function – Is the test accurate and predictive?
|
Results
for measures of test function may be misleading depending
on the population used to make those calculations.
- Calculations
are often based not on prevalence within a community,
but on prevalence in the pool tested.
- The test's
reported sensitivity can be misleading if the sensitivity
is determined in a patient population that is different
from where the test is applied. For example, the sensitivity
of CPK for an MI may be overstated if it is determined
using CCU patients, but the test is used in general
hospital admissions. This is due to the greater severity
of disease in the CCU, which may result in a greater
likelihood that those with the disease test positive.
Conversely, a general population includes more people
presenting earlier in the course of the disease. These
people with early disease are more likely to have lower
CPK levels, resulting in a lower sensitivity for the
test.
- Prevalence
and severity of disease are often higher in academic
settings than in the general population, and sensitivity
may be higher than if the study or test had been performed
in a “usual care” setting.
Consequently,
variables like age and gender in the study subjects must
be evaluated, along with severity, stage and duration
of disease to ensure that various stages of disease have
been studied, to determine if the measures of test function
are appropriate for your population. |
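To see why the population matters, the following sketch recomputes predictive values for the same test at different prevalences. The sensitivity, specificity and prevalence figures are hypothetical. The sketch also holds sensitivity and specificity fixed, which the discussion above notes may not be true across settings.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value at a given prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Hypothetical test with 90% sensitivity and 90% specificity.
for setting, prevalence in (("referral clinic", 0.50),
                            ("general practice", 0.10),
                            ("community screening", 0.01)):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"{setting} (prevalence {prevalence:.0%}): PPV {ppv:.2f}, NPV {npv:.2f}")
```

A positive predictive value reported from a high-prevalence study pool can greatly overstate what a positive result means in a low-prevalence population.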
|
| Performance
Measures — Quick Tips Checklist
We have an entire training program and tool for this work, but
we list our quick tips here.
See also
Delfini
Showcase: Publications — Performance
Measures |
| Steps
for Quality Improvement Project Design |
| Steps
I. through IV. Help for Selecting Good Projects |
| Step I. Do
you have a gap between current & optimal care? Apply
considerations for determining importance of area for clinical
improvement. |
| Step II. What
will close the gap and improve quality? Search for valid
and useful evidence for quality improvement effort. If none
available and this is for an intervention, STOP. What will
be your quality improvement? |
| Step
III. Is attempting the improvement feasible in your environment?
- Are you
going to be able to successfully make clinical practice
change happen?
- Are resources
available to support the initiative?
|
| Step
IV. Can you measure it?
- Is your
measure quantifiable?
- Is your
measure valid, accurate and dependable?
- Is the
measure useful and usable?
- Comprehensible
- Assists
with QI projects
- Is measurement
achievable in your local circumstances?
| Table: Measure Name/Descriptor/Validity Consideration | Example |
| Numerator = what you are counting = validity ideally based on valid, useful evidence | An Rx for ACEIs |
| Denominator = the pool for the count = validity based on inclusions and exclusions | In unexcluded patients with CHF admitted to a hospital |
| Frequency = time interval for the occurrence (e.g., performance or process) = validity ideally based on valid, useful evidence | By the time of hospital discharge |
|
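The numerator, denominator and frequency in the table above can be turned into a calculation along these lines. This is only a sketch: the record fields and the sample data are hypothetical and do not reflect any particular data system.

```python
from dataclasses import dataclass

@dataclass
class Admission:
    """Hypothetical record for one hospital admission."""
    has_chf: bool
    excluded: bool              # meets an exclusion criterion for the measure
    acei_rx_by_discharge: bool  # ACEI prescribed by the time of discharge

def acei_measure(admissions):
    """Return (numerator, denominator, rate) for the ACEI-at-discharge measure."""
    # Denominator: the pool for the count (unexcluded patients admitted with CHF).
    pool = [a for a in admissions if a.has_chf and not a.excluded]
    # Numerator: what is being counted, within the frequency window (by discharge).
    count = sum(a.acei_rx_by_discharge for a in pool)
    rate = count / len(pool) if pool else None
    return count, len(pool), rate

# Hypothetical sample data.
sample = [
    Admission(has_chf=True, excluded=False, acei_rx_by_discharge=True),
    Admission(has_chf=True, excluded=False, acei_rx_by_discharge=False),
    Admission(has_chf=True, excluded=True, acei_rx_by_discharge=False),
    Admission(has_chf=False, excluded=False, acei_rx_by_discharge=True),
    Admission(has_chf=True, excluded=False, acei_rx_by_discharge=True),
]

numerator, denominator, rate = acei_measure(sample)
print(f"{numerator} of {denominator} eligible CHF admissions had an ACEI Rx by discharge")
```

Note how the exclusions change the denominator, and therefore the rate, without any change in what clinicians actually did.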
| Steps
V. through VIII. Help for Applying Performance Measures |
| Step V. How
are you going to gather the data to measure the improvement?
|
| Step VI. What
is the meaning of your measurement, i.e., what goal will
you set to define “improvement?” |
| Step VII.
How are you going to report it and to whom? |
| Step
VIII. What is your process for updating your improvement?
|
|
| Research
Searching Tips |
| Caveats
abound! I was asked to lead a session on research searching
at UCSD. I make no claims to the utility or completeness of
this document — plus this is specifically geared to UCSD
faculty, staff and students. But in case this is useful to someone
else, here 'tis. [PDF]
Sheri |
|
|