Quality of Evidence: Secondary Studies

Newest:
- 04/05/2012: Have You Seen PRISMA?
- 04/03/2012: A Caution When Evaluating Systematic Reviews and Meta-analyses
Systematic Reviews: Quality & Searching Tips
While we believe you should still do a very thorough search (the grey literature can dramatically affect how effective a treatment appears), here is a nice piece on dealing with inaccessible and low-quality literature when doing a systematic review. The article makes a good case for concentrating on a thorough quality review of what you can easily obtain rather than digging too deeply to make sure you have caught everything. What is clear from the article is that including poorly designed studies can have a substantial impact on treatment effects, often making them appear more beneficial than they actually are.
Egger M, Jüni P, Bartlett C, Holenstein F, Sterne J. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study. Health Technology Assessment 2003; Vol. 7: No. 1 (Executive Summary).
https://pdfs.semanticscholar.org/18da/e5c7d340138dba23fcfe35754b09d1dd89c8.pdf
Also, see our Recommended
Reading on Meta-analyses.
Systematic Reviews: The Need for Critical Appraisal — Antioxidants Case Study
12/10/08
In previous DelfiniClicks we have emphasized that secondary studies (eg, systematic reviews
(SRs) and meta-analyses), like primary studies, require critical
appraisal. A key determination is whether the investigators
have drawn their conclusions from studies they deemed high quality
or not. We have repeatedly shown examples of how low quality
studies are likely to overestimate benefit. A recent meta-analysis
of antioxidants (Bjelakovic G, Nikolova D, Gluud LL, Simonetti
RG, Gluud C. Mortality in randomized trials of antioxidant supplements
for primary and secondary prevention: systematic review and
meta-analysis. JAMA. 2007 Feb 28;297(8):842-57. PMID*: 17327526)
provides yet another example of how observational studies may
indicate benefit and high quality studies may demonstrate harm.
The take-home messages are the same:
- Besides looking at the research question, clinical significance, search methods of an SR, and heterogeneity of the included studies, pay close attention to the inclusion and exclusion criteria and look carefully at how the investigators assessed the quality of the included studies.
Approximately ten
to twenty percent of individuals in North America and Europe
take antioxidant supplements because of the general belief,
based on several observational studies, that antioxidants improve
health. It now looks like some antioxidants are likely to increase
all-cause mortality. The authors report that beta carotene,
vitamin A, and vitamin E seem to increase the risk of death
and that further randomized trials are needed to establish the
effects of vitamin C and selenium.
Multivariate meta-regression
analyses showed that low-bias risk trials (RR, 1.16; 95% CI,
1.05-1.29) and selenium (RR, 0.998; 95% CI, 0.997-0.9995) were
significantly associated with mortality. In 47 low-bias trials
with 180,938 participants, the antioxidant supplements significantly
increased mortality (RR, 1.05; 95% CI, 1.02- 1.08). In low-bias
risk trials, after exclusion of selenium trials, beta carotene
(RR, 1.07; 95% CI, 1.02-1.11), vitamin A (RR, 1.16; 95% CI,
1.10-1.24), and vitamin E (RR, 1.04; 95% CI, 1.01-1.07), singly
or combined, significantly increased mortality. Vitamin C and selenium had no significant effect on mortality. The trials in which vitamin C was given singly or in various combinations with beta carotene, vitamin A, vitamin E, and selenium found no significant effect on mortality; the confidence intervals indicated that for vitamin C neither a small beneficial effect nor large harmful effects could be excluded.
Comments
Strengths:
In this meta-analysis the investigators should be congratulated
for stratifying studies by risk of bias (methodological quality).
They defined trials with low-risk of bias as trials with adequate
generation of the allocation sequence, adequate allocation concealment,
adequate blinding, and adequate follow-up. Trials with one or
more unclear or inadequate quality components were classified
as high-bias risk trials. As might be expected, the high-bias risk trials reported that mortality was significantly decreased in the groups taking antioxidant supplements (RR, 0.91; 95% CI, 0.83-1.00) without significant heterogeneity (I2 = 4.5%).
Once again we emphasize
what has been demonstrated by investigators interested in how
study quality affects reported study results:
- Inadequate generation
of the allocation sequence may result in an exaggeration of
benefit of up to a relative 51% (Kjaergard: PMID: 11730399)
- Inadequate concealment
of allocation may also result in an exaggeration of benefit
of up to a relative 51% (Kjaergard: PMID: 11730399)
- Inadequate blinding
may result in exaggeration of benefit of up to a relative
48% (Schulz, PMID: 7823387; Poolman, PMID:17332104, Kjaergard,
PMID: 11730399)
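Exaggeration figures like these are typically expressed as ratios of odds ratios. A hedged arithmetic sketch, using entirely hypothetical numbers (not from any of the cited studies), of how such a figure deflates an observed effect:

```python
# Entirely hypothetical numbers, for arithmetic illustration only.
observed_or = 0.40   # pooled OR from trials with inadequate concealment
ror = 0.49           # ratio of odds ratios implied by a relative 51% exaggeration
adjusted_or = observed_or / ror   # back out a bias-adjusted estimate
print(round(adjusted_or, 2))      # a much less impressive effect
```

The point is the direction of the correction: an impressive-looking benefit from methodologically weak trials shrinks substantially once the empirically estimated exaggeration is removed.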
Weaknesses:
Follow-up was considered adequate if the numbers and reasons
for dropouts and withdrawals in all intervention groups were
described or if it was specified that there were no dropouts
or withdrawals. In a previous DelfiniClick, we discuss the problems of using a description of dropouts or withdrawals rather than an intention-to-treat and sensitivity analysis. Use of the Jadad scale (Jadad PMID: 8721797), for example, allows points to be awarded:
- Merely for reporting
rather than giving appropriate attention to methodological
quality.
- Even if the
investigators do not address intention-to-treat analysis as
long as they describe the lost subjects.
Therefore using
the Jadad scale, randomized trials with large numbers of dropouts
that are well-described, using only a per-protocol analysis
(and having myriad other biases such as differences between
groups), may be scored as of the highest methodological quality
(five points).
Bottom Line: We will use the results of this SR to inform decisions, but we were forced to grade this otherwise Grade B study as Grade B-U because of the major problem of considering follow-up “adequate” based only on a description of dropouts or withdrawals.
*PMID is the PubMed Identification Number, which allows you to quickly find an article by entering it into the search window.
Case Study
Reference
- Bjelakovic G,
Nikolova D, Gluud LL, Simonetti RG, Gluud C. Mortality in
randomized trials of antioxidant supplements for primary and
secondary prevention: systematic review and meta-analysis.
JAMA. 2007 Feb 28;297(8):842-57. PMID: 17327526.
Have You Seen PRISMA?
04/05/2012
Systematic reviews and meta-analyses are needed to synthesize evidence regarding clinical questions. Unfortunately, the quality of these reviews varies greatly. As part of a movement to improve the transparency and reporting of important details in meta-analyses of randomized controlled trials (RCTs), the QUOROM (quality of reporting of meta-analysis) statement was developed in 1999.[1] In 2009, that guidance was updated and expanded by a group of 29 review authors, methodologists, clinicians, medical editors, and consumers, and the name was changed to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses).[2] Although some authors have used PRISMA to improve the reporting of systematic reviews, thereby helping critical appraisers assess the benefits and harms of a healthcare intervention, we (and others) continue to see systematic reviews that include high-risk-of-bias RCTs in their analyses. Critical appraisers should be aware of the PRISMA statement.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2714672/?tool=pubmed
1. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, et al. Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 1999;354:1896-1900. PMID: 10584742.
2. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009 Jul 21;339:b2700. doi: 10.1136/bmj.b2700. PubMed PMID: 19622552.
Systematic
Reviews: Untrustable "Trustable" Sources?
We write this in
grieving…
For evidence-based
information, there are three sources that are highly respected
and fairly universally considered “trustable.” These
sources are the Cochrane Collaboration, Clinical Evidence (from
the British Medical Journal) and the Database of Abstracts of
Reviews of Effects (DARE) from the Centre for Reviews and Dissemination
at the University of York, England.
- Cochrane is
billed as, “The reliable source of evidence in health
care.”
- Clinical Evidence
refers to itself as the “International source of the
best available evidence for effective health care.”
- DARE (possibly
wisely) offers up no such claim.
We’ve always known that there is variation in quality even among these most respected evidence-based information sources. However, because their reputations are so high, coupled with transparent and reasonably robust procedures, many users go directly to these sources and rely on what they find there.
In our efforts to provide a solid footing for our evidence reviews, Delfini frequently audits even these sources, considered the most rigorous of the evidence-based sources. Regrettably, we routinely find that we cannot rely on the conclusions of the reviewers. The problems we frequently encounter fall into three camps:
1) Poor assessment
of study quality. This problem seems to happen for a couple
of reasons. a) The method to assess the original study quality
is poor (e.g., reliance on the Jadad scale (Jadad PMID 8721797));
or b) potentially, lack of critical appraisal skills of reviewers.
2) Lack of exclusion
of studies of uncertain validity or usefulness — which
has the likelihood of bias in favor of interventions.
3) Most astonishingly
— cause and effect conclusions drawn from evidence which
has been solidly declared invalid.
Example
1: Cochrane Review
Reference:
Fouque D, Wang P, Laville M, Boissel JP. Low protein diets
for chronic renal failure in non-diabetic adults. The
Cochrane Database of Systematic Reviews 2000, Issue 4.
Art. No.: CD001892. DOI: 10.1002/14651858.CD001892.
Quality Assessment:
Data collected for each trial included inclusion and exclusion
criteria, patient details (age, gender), type of diet
prescribed (level of proposed protein intake, nature of
proteins, supplementation in energy or amino-acids), time
to the start of dialysis if available. The nature of renal
disease was recorded to verify that the distribution of
prognostic factors was balanced between the groups. No
quality assessment of the studies was performed.
Main results:
Two hundred and forty two renal deaths were recorded,
101 in the low protein diet and 141 in the higher protein
diet group, giving an odds ratio of 0.62 with a 95% confidence
interval of 0.46 to 0.83 (p=0.006). To avoid one renal
death, four to 56 patients need to be treated with a low
protein diet during one year.
Authors'
conclusions: Reducing protein intake in patients with
chronic renal failure reduces the occurrence of renal
death by about 40% as compared with higher or unrestricted
protein intake. The optimal level of protein intake cannot
be confirmed from these studies.
Comment:
Assessment of study quality is an “absolute must”
and defines critical appraisal of the medical literature.
It is well known that including low quality studies in
a systematic review is likely to yield invalid results
and conclusions.
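The review's "four to 56 patients" figure comes from inverting the confidence limits of the absolute risk reduction (NNT = 1/ARR). A minimal sketch; the event rates and ARR limits below are hypothetical, chosen only to reproduce a range like the one quoted, since the review's group denominators are not given here:

```python
def nnt(cer, eer):
    """Number needed to treat = 1 / absolute risk reduction."""
    return 1.0 / (cer - eer)

# Hypothetical one-year renal-death rates (not from the review):
cer, eer = 0.20, 0.15                 # control vs low-protein diet
print(f"NNT {nnt(cer, eer):.0f}")

# Inverting hypothetical CI limits of the ARR yields an NNT range
# like the abstract's "four to 56":
arr_lo, arr_hi = 0.018, 0.25
print(f"NNT range {1 / arr_hi:.0f} to {1 / arr_lo:.0f}")
```

Note that the wide NNT range reflects the uncertainty in the pooled estimate, uncertainty that is itself suspect here because no quality assessment of the included studies was performed.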
Example
2: Cochrane Review of serenoa repens (saw palmetto) for
BPH
Reference
Wilt T, Ishani A, Mac Donald R. Serenoa repens for benign
prostatic hyperplasia. The Cochrane Database of Systematic
Reviews 2002,Issue 3. Art. No.: CD001423. DOI: 10.1002/14651858.CD001423.
The authors
of this review concluded that, “The evidence suggests
that Serenoa repens provides mild to moderate improvement
in urinary symptoms and flow measures.”
However,
a well-done RCT — Bent S, Kane C, Shinohara K, et
al. Saw palmetto for benign prostatic hyperplasia. N Engl
J Med 2006;354:557-66 — demonstrated that saw palmetto
did not improve symptoms or objective measures of benign
prostatic hyperplasia.
Comment:
We believe that explanation for differing results is that
poor quality studies were included in the Cochrane review.
Example
3: Clinical Evidence
A systematic
review of primary prevention of cardiovascular disease
utilized two studies (study 1, Blood Pressure Lowering
Treatment Trialists Collaborative. Effects of ACE inhibitors,
calcium antagonists, and other blood pressure lowering
drugs: results of prospectively designed randomised trials.
Lancet 2000;355:1955–1964. Study 2, Staessen JA,
Wang JG & Thijs L. Cardiovascular prevention and blood
pressure reduction: a quantitative overview updated until
1 March 2003. Journal of Hypertension 2003, 21: 1055–1076)
which, when audited by Delfini, were found to have lethal threats to validity.
Example
4: DARE
In an appraisal
of a systematic review of hypertension, DARE rated the
review as “good.” Delfini’s audit of this DARE appraisal did not find the appraised article to be valid. (See the above reference to study 1: Trialists.)
Our
Advice for Working with Secondary Studies
Our advice is as
follows for Cochrane, Clinical Evidence and for secondary studies
that appear to pass a critique by DARE (or which pass your own
critical appraisal):
First, determine
if they made their conclusions on the basis of studies they
deemed high quality or not.
If they included
low quality studies, such that you have to disregard their conclusions
as untrustable, can you otherwise benefit from any of their
efforts?
a) Can you accept their search output? (Remember to update following
the date of their search.)
b) Are you comfortable accepting their exclusions?
c) Are you comfortable accepting their designation of a low grade so that you can exclude those studies as well?
d) Are you comfortable accepting their inclusions as the basis for your critical appraisals? If yes, appraise every article they give a high score to.
If they appear
to have only included studies they grade high:
a) Review their methods for grading — do you agree?
b) From the included studies, select a study or two that is
deemed of high quality and a study or two of the lowest quality,
and critically appraise your selection yourself. There will
probably be agreement over a low quality study — but since
you want to determine if low quality studies are being misgraded
high, it’s their high ground you want to head for.
c) If you agree that the studies you have audited merit a high quality grade, do a face validity check of the number of studies getting high marks. Since probably over 80% of the medical literature earns a Delfini Grade U for uncertain validity and/or usefulness, there are likely to be very few studies worthy of inclusion in any review.
And then, let us
pray for a quality source we can all trust.
Review
of Cochrane Groups’ Assessment of Bias in Studies
05/08/08
The Cochrane Collaboration publishes systematic reviews developed by various groups; after peer review, the reviews are edited by Cochrane Review Groups, who use their own assessment recommendations as well as those published in the Cochrane Handbook of Systematic Reviews of Interventions.
An important study from the Nordic Cochrane Centre has examined
how the different Cochrane review groups currently recommend
assessment and handling of the risk of bias in studies evaluated
for inclusion in systematic reviews. The authors focus on the
use of study components versus numerical scales, and suggest
possible improvements[1] as there is significant variation in
the quality of approaches by Cochrane groups. Some key points
from the study are presented below.
Background
It is important in developing systematic reviews to evaluate
for bias each study being considered for inclusion in the review.
There are four major areas of bias that should be considered:
selection bias, performance bias, attrition bias, and assessment bias. (Anything other than chance that leads away from "truth" is a bias.) The reason for this assessment is that biased studies are likely to produce unreliable estimates of effect (ie, misleading results). Unreliable conclusions then
lead to inaccurate predictions of outcomes of benefits and harms
for users. Assessing studies for bias has generally been done
by Cochrane groups using two different approaches, namely —
- Component Approach,
meaning assessing the study components for bias: Evaluating
the methodological areas of each study for bias, e.g., details
of randomization, blinding, similarity of care experiences
except for the intervention being studied, handling of drop-outs,
methods for calculating outcomes. This approach is supported
by empirical evidence.
- Scale Approach,
meaning assessing the study by assigning an overall quality
score: This requires providing numerical values to validity
threats. The Jadad scale is an example of this approach [2].
The use of scales is not well-supported by empirical research.
For example, Jüni et al[3] used 25 existing scales to
identify high-quality trials, and found that the effect estimates
and conclusions of the same meta-analysis varied substantially
with the scale used. The Cochrane Collaboration advises against
use of scales for assessing studies.
Methods
The authors from the Nordic Cochrane Centre examined the instructions
to authors of the 50 Cochrane Review Groups that focus on clinical
interventions for recommendations on methodological quality
assessment of studies.
Results
The following table summarizes the main findings.
Groups recommending a component approach: 41/50 (82%). Groups recommending a scale approach: 9/50 (18%). Twenty-three component-approach groups had their own checklists, ranging from 4 to 23 items; 5 scale-approach groups recommend the Jadad scale.

Areas recommended for assessment, component groups vs. scale groups, n (%):
- Sequence generation: 26 (63) vs. 9 (100)
- Concealment of allocation: 41 (100) vs. 2 (22)
- Blinding of patients: 33 (80) vs. 9 (100)
- Blinding of caregivers: 32 (78) vs. 1 (11)
- Blinding of outcome assessors: 39 (95) vs. 9 (100)
- Follow-up: 38 (93) vs. 9 (100)
- Intention-to-treat analysis: 20 (49) vs. 1 (11)

Recommendations for using quality assessments of individual studies in reviews, component groups vs. scale groups, n (%):
- Analytical approach (e.g., sensitivity analysis to test whether including only trials of higher methodological quality changes the effect estimates): 20 (49) vs. 8 (89)
- Descriptive approach: 1 (2) vs. 0 (0)
- No information: 20 (49) vs. 1 (11)

Recommendations for type of analysis, component groups vs. scale groups, n (%):
- Sensitivity analysis: 17 (85) vs. 7 (88)
- Threshold: 4 (20) vs. 3 (38)
- Subgroup analysis: 4 (20) vs. 3 (38)
- Cumulative analysis: 1 (5) vs. 2 (25)
- Weights: 1 (5) vs. 1 (13)
- Meta-regression: 0 (0) vs. 1 (13)
Authors’
Conclusions
- Cochrane Reviews
are undertaken by authors with different levels of methodological
training, and following Cochrane Review Groups’ guidelines
for assessing bias can be problematic if the guidelines are
not in accordance with the empirical research on bias.
- Despite advising
against scales, the Cochrane Handbook actually recommends
a ranking scale. The scale distinguishes between low risk
of bias (all criteria met), moderate risk of bias (one or
more criteria partly met) and high risk of bias (one or more
criteria not met). However, the Cochrane Handbook does not
specify the criteria to be used. Instead it states that the
criteria used should be few and address substantive threats
to the validity of the study results.
- The Jadad scale
consists of three items, and up to two points are given for
randomization, two for double blinding and one for description
of withdrawals and dropouts. An overall score between zero
and five is assigned, where three is commonly regarded as
adequate trial quality. The Jadad scale is problematic for
a variety of reasons (we agree and present some of our observations
below). Also, studies have shown low interrater agreement,
particularly for withdrawals and dropouts, where kappa values
below zero have been reported, which is an agreement that
is worse than that expected by chance.
- The Cochrane
Handbook recommends analyzing all data according to the intention-to-treat
principle using different methods without specifying what
those methods should be. The Handbook should give clearer
recommendations to ensure a more homogeneous methodology.
- The authors
list multiple errors made by the various Cochrane groups —
for example, the grading system recommended by the Back Group
has five levels of evidence and was developed using a consensus
method. Consistent findings among multiple, low-quality non-randomized
studies are considered to be the same level of evidence as
one high-quality randomized trial, which is not in accordance
with findings from empirical studies or with the Cochrane
Handbook. The four-level grading system used by the Musculoskeletal
Group is also based on consensus and is also highly problematic.
The system is based on arbitrary cut-points such as sample
size above 50 and more than 80% follow-up, which are not based
on empirical evidence. The only difference between platinum
and gold evidence is that there needs to be two randomized
trials for platinum and one for gold, which is not reasonable
as, for example, the platinum trials could involve 60 patients
each and the gold trial, 500 patients. Silver level can be
either a randomized trial with a 'head-to-head' comparison
of agents or a high-quality case-control study, which is not
supported by empirical research, and bronze level can be a
high-quality case series without controls or expert opinion.
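On the kappa point above: Cohen's kappa measures agreement between two raters beyond what chance alone would produce, and it goes negative when raters disagree more often than chance predicts. A self-contained sketch with made-up ratings (not data from the cited studies):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two raters scoring ten trials on a 0/1 item; they disagree more
# often than chance predicts, so kappa comes out negative:
a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
print(round(cohens_kappa(a, b), 2))  # -0.6
```

Here observed agreement is only 0.2 while chance agreement is 0.5, illustrating how a kappa below zero (agreement worse than chance) can arise for items like withdrawals and dropouts.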
Delfini Comment on the Jadad Scale
We have long had
issues with the Jadad Scale.
- Points can be
awarded merely for reporting rather than giving appropriate
attention to methodological quality.
- For randomization,
the scale addresses explicitly the sequence generation, but
not concealment of allocation of the sequence.
- The scale does
not address intention-to-treat analysis among many other considerations.
Therefore, randomized trials with an appropriate randomization
sequence, but with no concealment of allocation, with large
numbers of dropouts that are well described, using only a
per-protocol analysis and having myriad other biases such
as differences between groups, may be scored as of the highest
methodological quality (five points).
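The failure mode described above can be made concrete with a toy scorer. This is our own sketch of the five Jadad items, not code from Jadad et al., and the item names are invented for illustration; note that nothing in it asks about allocation concealment or intention-to-treat analysis:

```python
JADAD_ITEMS = [
    "randomized",
    "randomization_method_appropriate",
    "double_blind",
    "blinding_method_appropriate",
    "withdrawals_described",   # mere description earns the point
]

def jadad_score(trial):
    """Toy sketch of Jadad scoring (0-5): one point per reported item."""
    return sum(bool(trial.get(item, False)) for item in JADAD_ITEMS)

# A trial with unconcealed allocation, heavy (but well-described)
# dropout, and a per-protocol analysis still earns the maximum score:
flawed = {item: True for item in JADAD_ITEMS}
print(jadad_score(flawed))  # 5
```

The sketch shows why the scale rewards reporting rather than conduct: the biases we list above are simply invisible to it.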
After years of
doing critical appraisals involving thousands of studies, we
feel confident in stating that no scale can be developed that
is sufficient to evaluate studies. Instead an understanding
of critical appraisal concepts combined with critical and clinical
thinking is required.
Delfini Approach to Secondary Studies
As to use of systematic
reviews, Delfini frequently starts evidence-based reviews and
quality improvement projects by checking Cochrane for recent
systematic reviews. We have over the years noted a great deal
of variation in the quality of the reviews (and have reported
this to Cochrane). This report from the Cochrane Nordic Centre
is further evidence that we should keep doing what we have been
doing—auditing Cochrane reviews to be sure that the included
studies are of high quality and not just accepting the results
and conclusions of all reviews as valid because they come from
a “most-trusted source.”
This is how we
approach use of systematic reviews:
- The critical
appraisal process for secondary studies includes a number
of elements, including a review of the systematic search methods
employed and an assessment of whether the secondary source
includes only valid and clinically useful primary sources.
To assess the latter, the source is reviewed to attempt to
determine whether critical appraisal of the content from included
studies was performed, along with an attempt to assess the
skill-level of the appraisers and the robustness of their
review. One or two of the included primary studies considered
to be of the highest quality are critically appraised for
validity and usefulness as an audit by Delfini. If these studies
pass the audit, one or two included primary studies of the
lowest quality are critically appraised as well. If these
lower quality studies also pass, it is assumed that the authors
have employed good critical appraisal techniques.
- If the source
passes an audit for validity and usefulness, the source’s
efficacy and safety conclusions are used in the Delfini evidence
synthesis and new research published following the date of
the source’s search strategy is sought.
- If the source
does not pass the audit for validity and usefulness, but has
utilized a sound search strategy and sound criteria for excluding
efficacy studies lacking relevance, validity or for other
problems, all the primary studies selected for inclusion by
the source are critically appraised, and valid, useful studies
form the basis of the Delfini review, which will then be updated
with any new valid and clinically useful primary studies published
since the date of the secondary source’s search.
We applaud the
Nordic Cochrane Group for their efforts in helping to improve
an important resource.
References
1. Lundh A, Gotzsche PC. Recommendations by Cochrane Review
Groups for assessment of the risk of bias in studies. BMC Med
Res Methodol. 2008 Apr 21;8(1):22 [Epub ahead of print]. PMID:
18426565
2. Jadad AR, Moore
RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay
HJ. Assessing the quality of reports of randomized clinical
trials: is blinding necessary? Control Clin Trials. 1996 Feb;17(1):1-12.
PMID: 8721797
3. Jüni P,
Witschi A, Bloch R, Egger M: The hazards of scoring the quality
of clinical trials for metaanalysis. JAMA 1999, 282:1054-60.
PMID: 10493204.
Bell’s
Palsy Update
02/11/08; updated 12/03/2010 with instructional slides demonstrating how to perform a conservative intention-to-treat (ITT) analysis. [Note: This is actually a primary studies issue, but it is stored here for historical reasons, as our Bell's Palsy journey began with a narrative review.]
Things have changed. It now appears that early treatment with prednisone most likely does benefit patients with Bell’s Palsy. In the past, we were critical of narrative reviews published in the BMJ and the NEJM because — as had been nicely pointed out by a Cochrane review — there were no high quality RCTs demonstrating improved outcomes with steroids in the treatment of acute Bell’s Palsy. (See our letters in the boxes below.)
Sullivan et al. (N Engl J Med 2007;357:1598-607. PMID: 17942873) have now presented probably valid and clinically useful data to change our conclusions about the use of steroids in Bell’s Palsy. In a 4-arm, 9-month study comparing 1) prednisolone + placebo versus 2) acyclovir + placebo versus 3) prednisolone + acyclovir versus 4) two placebo capsules, the investigators report rates of complete recovery of 94.4% for patients who received prednisolone and 81.6% for those who did not, a difference of 12.8 percentage points (95% CI, 7.2 to 18.4; P<0.001). They concluded that, “in patients with Bell’s palsy, early treatment with prednisolone significantly improves the chances of complete recovery at 3 and 9 months.”
Critical Appraisal of the Sullivan Study
We critically appraised this study and found the following threats to validity:
- Would have preferred
more details of randomization.
- No mention of co-interventions or allowed/disallowed concomitant Rx.
- 7 patients assigned
to prednisolone received the wrong drug.
- 3 patients assigned
to placebo received the wrong drug.
- In the prednisolone
+ placebo group, the loss to follow-up was 11/138=7.9%; in
the double placebo group the loss was 19/141=13.5%. For the
acyclovir + prednisolone group the dropout rate was 10/124=8%, and for the acyclovir + placebo group it was 15/123=12%.
Total loss = 55 / 526 = 10.5%.
The authors performed neither a true ITT analysis nor an adequate sensitivity analysis. Therefore we “redid” the analysis using the following assumptions:
- No Prednisolone Group: We applied the percent-recovered rate in the no-prednisolone
group (control event rate) to those missing or who discontinued
(which agreed with statistics on the natural history of the
condition), excepting those who sought active treatment who
were counted as treatment failures.
- Prednisolone
Group: We failed all who were missing and those who sought
active treatment.
Our reanalysis yielded a statistically significant difference between those subjects receiving prednisolone and those who did not, at a p-value of 0.0487.
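A rough sketch of this kind of conservative recalculation follows. The pooled arm sizes only approximate the trial's figures, and we use simplified imputation rules (in particular, this sketch does not fail subjects who sought active treatment), so its p-value will differ from the 0.0487 above:

```python
from math import erf, sqrt

def two_prop_p(x1, n1, x2, n2):
    """Two-sided z-test p-value for a difference in two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # normal CDF via erf

# Pooled arm sizes approximating the trial's reported numbers:
n_pred, lost_pred = 262, 21     # prednisolone arms, subjects lost
n_ctl, lost_ctl = 264, 34       # no-prednisolone arms, subjects lost

# Prednisolone group: count everyone lost as a treatment failure.
rec_pred = round((n_pred - lost_pred) * 0.944)
# Control group: impute the control recovery rate to those lost.
rec_ctl = round((n_ctl - lost_ctl) * 0.816) + round(lost_ctl * 0.816)

p = two_prop_p(rec_pred, n_pred, rec_ctl, n_ctl)
print(f"recovered: {rec_pred}/{n_pred} vs {rec_ctl}/{n_ctl}, p = {p:.3f}")
```

Under these deliberately harsh assumptions the difference narrows considerably, which is exactly why a conservative sensitivity analysis is informative: it shows how much the conclusion depends on what is assumed about the missing subjects.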
Comment: Because of the authors’ analysis, the reported efficacy results are likely to be inflated, but the evidence suggests that prednisolone is likely to be an effective agent for adults with acute onset of Bell’s palsy when administered within 72 hours of onset. Because of the loss to follow-up, some uncertainty remains.
Grade
B-U: Possible to uncertain usefulness
- The evidence
might be sufficient to use in making health care decisions;
however, there remains sufficient uncertainty that the evidence
cannot fully reach a Grade B and the uncertainty is not great
enough to fully warrant a Grade U.
- Study quality
is such that it appears likely that the evidence is sufficient
to use in making health care decisions; however, there are
some study issues that raise continued uncertainty. Health
care decision-makers should be fully informed of the evidence
quality.
12/03/2010:
See the story in pictures. PDF of our instructional slides
on Intention-to-Treat
Recalculation of Sullivan 2007, PMID: 17942873, Prednisolone
for Treatment of Bell's Palsy
Delfini Letter to BMJ: Corticosteroids are Not Proven for Treatment
of Bell's Palsy (Or are they? See update here.)
The cover feature of the September 4, 2004 BMJ is about improving outcomes in Bell's palsy. Yet they published an invalid review: Holland JN, Weiner GM. Recent developments in Bell's palsy. BMJ 2004;329:553-557. PMID: 15345630.
Despite a good Cochrane review that exposes some fatal flaws in prior research, the review referenced above not only includes flawed studies but excludes a valid one. Here is our response to the BMJ (reprinted below):
Corticosteroids
are Not Proven for Treatment of Bell's Palsy
Editor—Holland and Weiner support the use of corticosteroids for treatment
of moderate to severe facial palsy based on two systematic
reviews [1,2]. We believe that such a conclusion is not justified
by the medical evidence presented by the authors and that
the two systematic reviews are fatally flawed.
In the Ramsey
review, three studies met the authors’ criteria for
validity [3,4,5], but in Ramsey’s analysis the study
by May et al.[3]—a valid study—was excluded because
it was an “outlier”, i.e., the results were not
consistent with the other two studies included in the review.
Excluding a study on the basis of results rather than methodology
is inappropriate. Results from non-valid studies should not
be utilized in decision-making.
May et al. reported
that corticosteroids resulted in poorer facial recovery than
placebo. It should be noted that the quality index score of
the excluded (May) study was better than one of the studies
included in Ramsey’s review. If the May et al study
is not excluded, the results do not support the authors’
conclusion of benefit from corticosteroid treatment.
It should also
be noted that a Cochrane systematic review [6] included the
May study as a valid study, but excluded the Austin study
because 29 percent of subjects in the Austin study were lost
to follow-up. Cochrane also excluded the Shafshak study because
it was a non-randomized study.
As pointed out by the Cochrane group, the Grogan “practice parameter” is probably invalid because Grogan included the Shafshak and Austin studies; when these two trials were excluded from the pooled estimate, the results were no longer in favor of steroids for the treatment of Bell’s palsy.
1. Grogan PM, Gronseth GS. Practice parameter: steroids, acyclovir, and surgery for Bell's palsy (an evidence-based review): report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology 2001;56:830-6.
2. Ramsey MJ, DerSimonian R, Holtel MR, Burgess LP. Corticosteroid treatment for idiopathic facial nerve paralysis: a meta-analysis. Laryngoscope 2000;110:335-41.
3. May M, Wette R, Hardin WB Jr, Sullivan J. The use of steroids in Bell’s palsy: a prospective controlled study. Laryngoscope 1976;86:1111-22.
4. Austin JR, Peskind SP, Austin SG, Rice DH. Idiopathic facial nerve paralysis: a randomized double blind controlled study of placebo versus prednisone. Laryngoscope 1993;103:1326-33.
5. Shafshak TS, Essa AY, Bakey FA. The possible contributing factors for the success of steroid therapy in Bell’s palsy: a clinical and electrophysiological study. J Laryngol Otol 1994;108:940-3.
6. Salinas RA, Alvarez G, Alvarez MI, Ferreira J. Corticosteroids for Bell's palsy (idiopathic facial paralysis) [Review]. The Cochrane Database of Systematic Reviews, The Cochrane Library, 2003. Date of most recent update: 26 November 2001; date of most recent substantive update: 15 October 2001.
November
5, 2004 Update: BMJ
Responds
Delfini Letter to NEJM: Bell's Palsy - REDUX! (see update here)
Is it something about the pull of the moon? No sooner do we write a letter to the BMJ about flaws in a recent article on Bell's palsy than the New England Journal of Medicine publishes a review based on some of the same poor research: Gilden DJ. Clinical practice. Bell's palsy. N Engl J Med. 2004 Sep 23;351(13):1323-31. PMID: 15385659.
We disagree with Gilden, who concludes that “the data suggest that glucocorticoids decrease the incidence of permanent facial paralysis…” What does Gilden base this conclusion on? Answer: an observational study, an RCT with 29 percent of subjects lost to follow-up, and the American Academy of Neurology’s practice parameter
stating that early treatment with corticosteroids is “probably
effective”. The Cochrane group has pointed out that the
American Academy of Neurology’s practice parameter is
probably invalid because the Academy's evidence synthesis included
a fatally flawed trial and a non-randomized trial. We concur
with the Cochrane group and believe corticosteroids have not
been demonstrated to be effective in Bell’s palsy. Cause
and effect conclusions cannot be reliably drawn from observational
studies and results from non-valid studies should not be utilized
in drawing conclusions regarding effectiveness and making treatment
decisions.
This is also another
reminder about the problems with systematic reviews. See our
systematic review appraisal tool and our tips sheet: Systematic
Review Validity Tool & The
Problems with Narrative Reviews
Substandard
Evidence: POEMS and Diabetes: Readers, Beware!
Allen Shaughnessy and David Slawson, EBM pioneers who in 1994 gave us POEMs (patient-oriented evidence that matters) and DOEs (disease-oriented evidence), have published in the BMJ an article examining how experts represented the results of a clinically useful study — the UKPDS study. Full text is available at http://bmj.com/cgi/content/full/327/7409/266.
The reason the authors picked the UKPDS study is that it is a good example of patient-oriented evidence that matters — it evaluated the effect of intensive blood glucose control on various outcomes that matter:
- Tight control
did not prevent premature mortality.
- Metformin decreased
mortality and diabetes-related outcomes in overweight patients.
- Tight BP control
decreased complications (greater effect than blood glucose
control).
- Quality of life
was not affected by tight glucose control.
Shaughnessy and
Slawson systematically reviewed reviews that met their inclusion
criteria (see http://www.delfini.org/Delfini_Tool_SR.doc).
They found that:
- Information that tight glucose control does not change overall mortality or diabetes-related mortality was mentioned in only 6 of the 35 reviews.
- Only 14 of
the reviews mentioned the effect of metformin on diabetes-related
outcomes in overweight patients.
- 17 of the reviews
did not mention the need for BP control in patients with diabetes.
- Only 5 reviewers
reported that diabetic patients with hypertension benefit
more from BP control than glucose control.
- Only 7 of the
reviews reported that ACE inhibitors and beta blockers were
equivalent as starting drugs for hypertension.
Implications
from this review: Important POEMs are missing from
recent diabetes review articles. Narrative reviews and expert
reviews remain problematic in that these reviews represent major
vehicles for transmitting research to clinicians, and often
these reviews are of poor quality. Clinicians may be getting
from expert reviews what these authors call PROSE—"Prescriptive
recommendations based on substandard evidence."
This is another reminder that systematic reviews are a good place to start when looking for valid, relevant information. Readers, beware of expert reviews, which continue to appear in the best journals.
Quality
of Systematic Reviews: Misleading “POEM” on Hormone
Therapy
Here's a letter
Dr. Brian Alper and Delfini submitted to the American Family
Physician which was not accepted, but we think makes some very
important points:
Misleading
“POEMs and Tips” item on hormone therapy
TO THE EDITOR:
As “Tips from Other Journals” was found to be one
of your four most popular departments[1], it has a substantial
impact on disseminating research results.
We would like to
alert your readers to a recent “Tip” that increases
the dissemination of a flawed meta-analysis concluding that
hormone replacement therapy (HRT) reduced mortality in women
less than sixty years old [2,3]. This particular meta-analysis
may “get past” some critical appraisal screens because
it describes a comprehensive search and seemingly appropriate
inclusion and quality criteria. However, detailed appraisal
finds four fatal flaws in this meta-analysis.
First, the subgroup
analysis addressing younger women was based on trials with a
mean age less than sixty years (n = 4,141 women). The use of
mean age for each included trial rather than actual age of included
women has been previously reported as a significant flaw[4].
The method of subgroup analysis used does not account for most
of the available data on women less than sixty years old. The
Women’s Health Initiative trial alone provided data on
HRT in a population that had a mean age of 63.3 years but included
5,522 women aged 50-59 years.
Second, the authors
of the meta-analysis evaluated the methodology of the studies
they included but did not fully consider the impact of study
methods on their results. For example, seven of the seventeen
trials in the subgroup analysis failed to meet the quality criteria
of being double-blinded. All seven of these trials had results
that tended to favor HRT, while only two of the ten double-blinded
trials had results that tended to favor HRT.
Third, a meta-analysis
must provide appropriate weights to individual studies included
in the analysis. This usually is done by providing greater weights
to larger studies, basing the weight on the denominator or sample
size. In this meta-analysis, weights were based on the number
of deaths reported. For example, a study with 406 patients and
1 death contributed 1.9% of the meta-analysis summary statistic,
while a study with 130 patients and 73 deaths accounted for
10.3% of the final results for trials with mean age less than
sixty years.
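The effect of this weighting choice can be sketched with a small calculation. The 406-patient/1-death and 130-patient/73-death figures come from the letter above; the third trial and the resulting totals are invented for illustration, since the full trial list is not reproduced here.

```python
# Illustration: how weighting trials by number of deaths, rather than by
# sample size, can let a small trial dominate a pooled result.
# Trials A and B use figures cited in the letter; Trial C is hypothetical.
trials = [
    {"name": "Trial A", "n": 406, "deaths": 1},
    {"name": "Trial B", "n": 130, "deaths": 73},
    {"name": "Trial C", "n": 1200, "deaths": 24},  # invented for contrast
]

total_n = sum(t["n"] for t in trials)
total_deaths = sum(t["deaths"] for t in trials)

for t in trials:
    by_n = t["n"] / total_n          # weight share if based on sample size
    by_deaths = t["deaths"] / total_deaths  # weight share if based on deaths
    print(f'{t["name"]}: by sample size {by_n:.1%}, by deaths {by_deaths:.1%}')
```

Under these assumed totals, the 130-patient trial contributes under a tenth of the pooled weight by sample size but roughly three quarters of it by deaths, which is the distortion the letter describes.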
Fourth, the study
accounting for 10.3% of this result [5] was a trial involving
postmenopausal women with ovarian cancer (mean age 51 years),
excluding those with low malignant potential. Including (and
emphasizing) such a highly select population invalidates the
use of this analysis for conclusions regarding the general population.
Any one of these
fatal flaws is sufficient to invalidate this conclusion. The
evidence does not support the concept of lower mortality from
HRT use in younger women.
Brian S. Alper
MD, MSPH
Editor-in-Chief, DynamicMedical.com
Research Assistant Professor in Family and Community Medicine,
University of Missouri-Columbia
Michael Stuart,
MD
President, Delfini Group, LLC
Clinical Assistant Professor, University of Washington School
of Medicine
Sheri Ann Strite
Managing Director & Principal, Delfini Group, LLC
1. Merriman JA.
Reader surveys help determine AFP’s direction. Am Fam
Physician 2005;71:1459.
2. Wellberry C. Does beginning HT earlier decrease mortality?
Am Fam Physician 2005;71:1594. (page number may be wrong, accessed
via web at http://www.aafp.org/afp/2005/0415/p1600.html)
3. Salpeter SR, et al. Mortality associated with hormone replacement
therapy in younger and older women. J Gen Intern Med July 2004;19:791-804.
4. Furberg CD, Psaty BM. Review: hormone replacement therapy
may reduce the risk for death in younger but not older postmenopausal
women. ACP J Club 2005;142:1.
5. Guidozzi F, Daponte A. Estrogen replacement therapy for ovarian
carcinoma survivors: A randomized controlled trial. Cancer 1999;86:1013-8.
Summarizing the Strength of the Evidence: AHRQ-EHCP Grading System
07/07/2011
Most groups that systematically review the medical literature have an approach to grading the quality of individual studies and to rating the quality of evidence for specific outcomes after evaluating the totality of evidence.
The AHRQ-EHCP (the Agency for Healthcare Research and Quality and the Effective Health Care Program group) grading methodology is worth knowing about.[1] AHRQ-EHCP utilizes four domains when rating the overall strength of the evidence (SOE). These domains were selected after reviewing grading methodologies used by the U.S. Preventive Services Task Force (USPSTF),[2] the GRADE working group,[3] and other evidence-based practice centers.[4,5]
Briefly, the AHRQ-EHCP approach assesses the risk of bias, consistency, directness, and precision for each outcome or comparison of interest after rating each study or key outcome from each study for bias (paraphrased, in some instances, below):
- Bias: each study is scored based on study design and methodology, and the aggregate of studies is rated for an overall “risk of bias” score. Aggregate risk of bias is scored as low, medium, or high. The aggregate quality of studies is rated as good, fair or poor.
- Consistency (the degree of similarity of effect sizes of included studies) is scored as consistent, inconsistent or unknown/not applicable.
- Directness (the linkage between the intervention and health outcomes) is scored as direct or indirect (indirect meaning intermediate or surrogate outcome measures, which may or may not be valid measures of clinical usefulness).
- Precision concerns the ability to draw a clinically useful conclusion from the confidence intervals. An imprecise estimate, for example, is one for which the confidence interval is wide enough to include clinically distinct conclusions (e.g., favoring both the interventions being compared).
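The precision criterion above can be illustrated with a small sketch: a 95% confidence interval for a risk difference that spans both benefit and harm is compatible with clinically distinct conclusions and would be scored imprecise. All numbers below are invented for illustration.

```python
# Hypothetical sketch: scoring precision by asking whether the 95% CI for a
# risk difference is compatible with clinically distinct conclusions.
import math

def risk_difference_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Wald 95% CI for the risk difference (normal approximation)."""
    p_t, p_c = events_t / n_t, events_c / n_c
    diff = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return diff - z * se, diff + z * se

# Invented small trial: 12/40 events on treatment vs 16/40 on control
lower, upper = risk_difference_ci(12, 40, 16, 40)
if lower < 0 < upper:
    print(f"Imprecise: 95% CI ({lower:.2f}, {upper:.2f}) spans both benefit and harm")
```

With this small invented trial the interval crosses zero, so the same data are compatible with the treatment helping or harming — exactly the situation the precision domain is meant to flag.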
AHRQ-EHCP—like GRADE—has additional domains which may be included when rating evidence:
- Dose-response associations (present or not present)
- Plausible confounding that would increase or decrease effect (present or absent)
- Strength of association (magnitude of effect)
- Publication bias (not necessary to formally score)
The AHRQ-EHCP overall strength of evidence (SOE) for each outcome of interest includes four grades: high, moderate, low, and insufficient. For example, if the SOE is high, further research is unlikely to change confidence in the estimate of effect. If evidence is unavailable or does not permit a conclusion, the outcome in the AHRQ-EHCP system is graded as insufficient.
Delfini Modification
We use our usual grading system for individual studies (A, B, B-U, U). When we use our modified AHRQ-EHCP system for rating the overall level of evidence (LOE), we add a “borderline” category to increase clarity, as we believe that “moderate” is not precise enough to address evidence of borderline usefulness. And we prefer the term “inconclusive” to “insufficient.”
Table 1. AHRQ-EHCP and Delfini Evidence Grading Methodologies (for each outcome)
- AHRQ-EHCP Evidence Grading and Strength of Evidence Methodology: Overall SOE rated high, moderate, low, or insufficient.
- Delfini Evidence Grading and Level of Evidence Methodology: Each study or outcome graded A, B, B-U, or U for validity and usefulness; overall LOE rated high, moderate, borderline, or inconclusive.
Table 2. Examples of AHRQ-EHCP Strength of Evidence (SOE) Ratings
Outcome | Number of studies; N | Risk of Bias | Consistency | Directness | Precision | SOE
Mortality | 1; 80 | RCT/Medium | Unknown | Direct | Imprecise | Insufficient
Improved Quality of Life | 6; 265 | RCTs/Low | Consistent | Direct | Precise | High
References
1. Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions-Agency for Healthcare Research and Quality and the Effective Health Care Program. J Clin Epidemiol. 2009 Jul 10.
2. Sawaya GF, Guirguis-Blake J, LeFevre M, Harris R, Petitti D; U.S. Preventive Services Task Force. Update on the methods of the U.S. Preventive Services Task Force: estimating certainty and magnitude of net benefit. Ann Intern Med. 2007 Dec 18;147:871–5.
3. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924e6.
4. West S, King V, Carey TS, et al. Systems to Rate the Strength of Scientific Evidence. Evidence Report/ Technology Assessment No. 47 (Prepared by the Research Triangle Institute- University of North Carolina Evidence-based Practice Center under Contract No. 290–97–0011). AHRQ Publication No. 02–E016. Rockville, MD: Agency for Healthcare Research and Quality; 2002.
5. Treadwell JR, Tregear SJ, Reston JT, Turkelson CM. A system for rating the stability and strength of medical evidence. BMC Med Res Methodol 2006;6:52.
A Caution When Evaluating Systematic Reviews and Meta-analyses
04/03/2012
We would like to draw critical appraisers' attention to an infrequent but important problem encountered in some systematic reviews: the accuracy of standardized mean differences. Meta-analysis of trials that have used different scales to record outcomes of a similar nature requires transforming the data to a uniform scale, the standardized mean difference (SMD). Gøtzsche and colleagues, in a review of 27 meta-analyses utilizing SMDs, found that a high proportion of meta-analyses based on SMDs contained meaningful errors in data extraction and calculation of point estimates.[1] Gøtzsche et al. audited two trials from each review and found that, in 17 meta-analyses (63%), there were errors for at least one of the two trials examined. We recommend that critical appraisers be aware of this issue.
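For readers less familiar with the transformation, here is a minimal sketch of one common SMD, Cohen's d, using invented summary statistics. The extraction errors Gøtzsche et al. describe occur upstream of this formula, in pulling the wrong means, standard deviations, or group sizes from the trial reports.

```python
# Minimal sketch: standardized mean difference (Cohen's d) for one trial,
# using invented summary statistics (means, SDs, and group sizes).
import math

def smd_cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

# Invented trial: pain score 12.0 (SD 4.0) on treatment vs 14.0 (SD 4.0) on control
d = smd_cohens_d(12.0, 4.0, 50, 14.0, 4.0, 50)
print(f"SMD = {d:.2f}")  # prints: SMD = -0.50 (negative favors treatment here)
```

Because every quantity feeding this calculation must be extracted by hand from each trial report, a single transcription slip changes the pooled point estimate, which is why auditing the extracted numbers matters.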
1. Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA. 2007 Jul 25;298(4):430-7. Erratum in: JAMA. 2007 Nov 21;298(19):2264. PubMed PMID:17652297.