Evidence-based Medicine: DelfiniClick™

A Cool Click for Evidence-based Medicine (EBM) and Evidence-based Practice (EBP) Commentaries & Health Care Quality Improvement Nibblets

The EBM Information Quest: Is it true? Is it useful? Is it usable?™

Validity Detectives: Michael E. Stuart MD, President & Medical Director; Sheri Ann Strite, Managing Director & Principal


Volume — Quality of Evidence:
Secondary Studies

04/05/2012: Have You Seen PRISMA?
04/03/2012: A Caution When Evaluating Systematic Reviews and Meta-analyses


Related DelfiniClicks

Primary Studies

Go to DelfiniClick™ for all volumes.

Systematic Reviews: Quality & Searching Tips

Here's a nice piece on dealing with inaccessible literature and literature of low quality when doing a systematic review. While we believe you should still do a very thorough search (the grey literature can dramatically affect how effective a treatment appears), the article makes a good case for concentrating on a thorough quality review of what you can easily obtain rather than digging too deeply to be sure you have caught everything. What is clear from the article is that including poorly designed studies can have a substantial impact on treatment effects, often making them appear more beneficial than they actually are.

Egger M, Juni P, Bartlett C, Holenstein F, Sterne J. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study. Health Technology Assessment 2003;7(1) (Executive Summary).


Also, see our Recommended Reading on Meta-analyses.


Systematic Reviews: The Need for Critical Appraisal — Antioxidants Case Study

In previous DelfiniClicks we have emphasized that secondary studies (e.g., systematic reviews (SRs) and meta-analyses), like primary studies, require critical appraisal. A key determination is whether the investigators have drawn their conclusions from studies they deemed high quality. We have repeatedly shown examples of how low quality studies are likely to overestimate benefit. A recent meta-analysis of antioxidants (Bjelakovic G, Nikolova D, Gluud LL, Simonetti RG, Gluud C. Mortality in randomized trials of antioxidant supplements for primary and secondary prevention: systematic review and meta-analysis. JAMA. 2007 Feb 28;297(8):842-57. PMID*: 17327526) provides yet another example of how observational studies may indicate benefit while high quality studies demonstrate harm. The take-home messages are the same:

  • Besides looking at the research question, clinical significance, searching methods of a SR and heterogeneity of the included studies, pay close attention to the inclusion and exclusion criteria and look carefully at how the investigators assessed the quality of the included studies.

Approximately ten to twenty percent of individuals in North America and Europe take antioxidant supplements because of the general belief, based on several observational studies, that antioxidants improve health. It now looks like some antioxidants are likely to increase all-cause mortality. The authors report that beta carotene, vitamin A, and vitamin E seem to increase the risk of death and that further randomized trials are needed to establish the effects of vitamin C and selenium.

Multivariate meta-regression analyses showed that low-bias risk trials (RR, 1.16; 95% CI, 1.05-1.29) and selenium (RR, 0.998; 95% CI, 0.997-0.9995) were significantly associated with mortality. In 47 low-bias trials with 180,938 participants, the antioxidant supplements significantly increased mortality (RR, 1.05; 95% CI, 1.02-1.08). In low-bias risk trials, after exclusion of selenium trials, beta carotene (RR, 1.07; 95% CI, 1.02-1.11), vitamin A (RR, 1.16; 95% CI, 1.10-1.24), and vitamin E (RR, 1.04; 95% CI, 1.01-1.07), singly or combined, significantly increased mortality. Vitamin C and selenium had no significant effect on mortality: trials in which vitamin C was given singly or in different combinations with beta carotene, vitamin A, vitamin E, and selenium found no significant effect, and the confidence intervals indicated that for vitamin C neither a small beneficial effect nor large harmful effects could be excluded.
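For orientation on where numbers like these come from, here is a minimal sketch (Python) of computing a relative risk and its 95% confidence interval from 2x2 counts using the usual standard error of log(RR). The counts are hypothetical illustrations, not data from this meta-analysis:

```python
import math

def relative_risk(a, n1, c, n2, z=1.96):
    """Relative risk with a 95% CI from 2x2 counts.
    a: events in group 1 (n1 total); c: events in group 2 (n2 total)."""
    rr = (a / n1) / (c / n2)
    # standard error of log(RR)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# hypothetical example: 1,050/10,000 deaths vs 1,000/10,000
rr, lo, hi = relative_risk(1050, 10000, 1000, 10000)
```

Note that a CI whose lower bound sits at or above 1.00 is what the authors mean by a "significant" increase in mortality; widen the interval (smaller trials, fewer events) and the same point estimate stops being significant.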


Strengths: In this meta-analysis the investigators should be congratulated for stratifying studies by risk of bias (methodological quality). They defined trials with low risk of bias as trials with adequate generation of the allocation sequence, adequate allocation concealment, adequate blinding, and adequate follow-up. Trials with one or more unclear or inadequate quality components were classified as high-bias risk trials. As might be expected, the high-bias risk trials reported that mortality was significantly decreased in the groups taking antioxidant supplements (RR, 0.91; 95% CI, 0.83-1.00), without significant heterogeneity (I2 = 4.5%).

Once again we emphasize what has been demonstrated by investigators interested in how study quality affects reported study results:

  • Inadequate generation of the allocation sequence may result in an exaggeration of benefit of up to a relative 51% (Kjaergard: PMID: 11730399)
  • Inadequate concealment of allocation may also result in an exaggeration of benefit of up to a relative 51% (Kjaergard: PMID: 11730399)
  • Inadequate blinding may result in exaggeration of benefit of up to a relative 48% (Schulz, PMID: 7823387; Poolman, PMID:17332104, Kjaergard, PMID: 11730399)

Weaknesses: Follow-up was considered adequate if the numbers and reasons for dropouts and withdrawals in all intervention groups were described or if it was specified that there were no dropouts or withdrawals. In a previous DelfiniClick, we discuss the problems of relying on a description of dropouts or withdrawals rather than intention-to-treat and sensitivity analyses. Use of the Jadad scale (Jadad PMID: 8721797), for example, allows points to be awarded:

  • Merely for reporting rather than giving appropriate attention to methodological quality.
  • Even if the investigators do not address intention-to-treat analysis as long as they describe the lost subjects.

Therefore, using the Jadad scale, randomized trials with large numbers of dropouts that are well-described, using only a per-protocol analysis (and having myriad other biases such as differences between groups), may be scored as of the highest methodological quality (five points).
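To make the point concrete, here is a small sketch of the three Jadad items as we read them (a Python illustration of the scale's logic, not an official implementation). Nothing in the score touches concealment of allocation or intention-to-treat analysis, so a badly biased trial can still earn the maximum of five points:

```python
def jadad_score(randomization="none", blinding="none",
                withdrawals_described=False):
    """Jadad score (0-5). randomization/blinding take one of:
    "none", "described", "appropriate", "inappropriate"."""
    score = 0
    if randomization != "none":
        score += 1                      # randomization is mentioned
        if randomization == "appropriate":
            score += 1                  # method described and adequate
        elif randomization == "inappropriate":
            score -= 1
    if blinding != "none":
        score += 1                      # double blinding is mentioned
        if blinding == "appropriate":
            score += 1
        elif blinding == "inappropriate":
            score -= 1
    if withdrawals_described:
        score += 1                      # mere description suffices
    return score

# a trial with large, well-described losses analyzed per protocol
# still reaches the maximum score:
jadad_score("appropriate", "appropriate", withdrawals_described=True)  # -> 5
```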

Bottom Line: We will use the results of this SR to inform decisions, but we were forced to grade this otherwise Grade B study as of B-U quality because of the major problem of considering follow-up “adequate” with only a description of dropouts or withdrawals.

*PMID is the PubMed Identification Number; entering it into the PubMed search window quickly retrieves the article.

Case Study Reference

  • Bjelakovic G, Nikolova D, Gluud LL, Simonetti RG, Gluud C. Mortality in randomized trials of antioxidant supplements for primary and secondary prevention: systematic review and meta-analysis. JAMA. 2007 Feb 28;297(8):842-57


Have You Seen PRISMA?

Systematic reviews and meta-analyses are needed to synthesize evidence regarding clinical questions. Unfortunately, the quality of these reviews varies greatly. As part of a movement to improve the transparency and reporting of important details in meta-analyses of randomized controlled trials (RCTs), the QUOROM (quality of reporting of meta-analyses) statement was developed in 1999.[1] In 2009, that guidance was updated and expanded by a group of 29 review authors, methodologists, clinicians, medical editors, and consumers, and the name was changed to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses).[2] Although some authors have used PRISMA to improve the reporting of systematic reviews, thereby helping critical appraisers assess the benefits and harms of a healthcare intervention, we (and others) continue to see systematic reviews that include RCTs at high risk of bias in their analyses. Critical appraisers might want to be aware of the PRISMA statement.


1. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, et al. Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 1999;354:1896-1900. PMID: 10584742.

2. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009 Jul 21;339:b2700. doi: 10.1136/bmj.b2700. PubMed PMID: 19622552.


Systematic Reviews: Untrustable "Trustable" Sources?

We write this in grieving…

For evidence-based information, there are three sources that are highly respected and fairly universally considered “trustable.” These sources are the Cochrane Collaboration, Clinical Evidence (from the British Medical Journal) and the Database of Abstracts of Reviews of Effects (DARE) from the Centre for Reviews and Dissemination at the University of York, England.

  • Cochrane is billed as, “The reliable source of evidence in health care.”
  • Clinical Evidence refers to itself as the “International source of the best available evidence for effective health care.”
  • DARE (possibly wisely) offers up no such claim.

We’ve always known that there is variation in quality even among these most respected evidence-based information sources. However, because their reputations are so high, coupled with transparent and reasonably robust procedures, many users go directly to these sources and rely on what they find there.

In our efforts to provide a solid footing for our evidence reviews, Delfini frequently audits even these sources, considered to be the most rigorous of the evidence-based sources. Regrettably, we routinely find that we cannot rely on the conclusions of the reviewers. The problems we frequently encounter fall into three camps:

1) Poor assessment of study quality. This seems to happen for a couple of reasons: a) the method used to assess the quality of the original studies is poor (e.g., reliance on the Jadad scale (Jadad PMID 8721797)); or b) potentially, the reviewers lack critical appraisal skills.

2) Lack of exclusion of studies of uncertain validity or usefulness — which has the likelihood of bias in favor of interventions.

3) Most astonishingly — cause and effect conclusions drawn from evidence which has been solidly declared invalid.

Example 1: Cochrane Review

Reference: Fouque D, Wang P, Laville M, Boissel JP. Low protein diets for chronic renal failure in non-diabetic adults. The Cochrane Database of Systematic Reviews 2000, Issue 4. Art. No.: CD001892. DOI: 10.1002/14651858.CD001892.

Quality Assessment: Data collected for each trial included inclusion and exclusion criteria, patient details (age, gender), type of diet prescribed (level of proposed protein intake, nature of proteins, supplementation in energy or amino-acids), time to the start of dialysis if available. The nature of renal disease was recorded to verify that the distribution of prognostic factors was balanced between the groups. No quality assessment of the studies was performed.

Main results: Two hundred and forty-two renal deaths were recorded, 101 in the low protein diet group and 141 in the higher protein diet group, giving an odds ratio of 0.62 with a 95% confidence interval of 0.46 to 0.83 (p = 0.006). To avoid one renal death, four to 56 patients need to be treated with a low protein diet during one year.
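The "four to 56" figure is a number needed to treat (NNT): the reciprocal of the absolute risk reduction, which depends on the baseline risk as well as the pooled odds ratio. A minimal sketch of the arithmetic follows (Python; the baseline risks are hypothetical illustrations, so these particular outputs are not the review's own range):

```python
def nnt_from_or(cer, odds_ratio):
    """Number needed to treat to prevent one event, given a control
    event rate (CER) and a protective odds ratio (< 1)."""
    control_odds = cer / (1 - cer)
    treated_odds = odds_ratio * control_odds
    eer = treated_odds / (1 + treated_odds)   # experimental event rate
    arr = cer - eer                           # absolute risk reduction
    return 1 / arr

# hypothetical one-year baseline risks of renal death:
high_risk = nnt_from_or(0.40, 0.62)  # smaller NNT in high-risk patients
low_risk = nnt_from_or(0.02, 0.62)   # larger NNT in low-risk patients
```

The spread in the review's NNT range reflects exactly this sensitivity to baseline risk across the included trials.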

Authors' conclusions: Reducing protein intake in patients with chronic renal failure reduces the occurrence of renal death by about 40% as compared with higher or unrestricted protein intake. The optimal level of protein intake cannot be confirmed from these studies.

Comment: Assessment of study quality is an “absolute must” and defines critical appraisal of the medical literature. It is well known that including low quality studies in a systematic review is likely to yield invalid results and conclusions.

Example 2: Cochrane Review of serenoa repens (saw palmetto) for BPH

Wilt T, Ishani A, Mac Donald R. Serenoa repens for benign prostatic hyperplasia. The Cochrane Database of Systematic Reviews 2002,Issue 3. Art. No.: CD001423. DOI: 10.1002/14651858.CD001423.

The authors of this review concluded that, “The evidence suggests that Serenoa repens provides mild to moderate improvement in urinary symptoms and flow measures.”

However, a well-done RCT — Bent S, Kane C, Shinohara K, et al. Saw palmetto for benign prostatic hyperplasia. N Engl J Med 2006;354:557-66 — demonstrated that saw palmetto did not improve symptoms or objective measures of benign prostatic hyperplasia.

Comment: We believe the explanation for the differing results is that poor quality studies were included in the Cochrane review.

Example 3: Clinical Evidence

A systematic review of primary prevention of cardiovascular disease utilized two studies (Study 1: Blood Pressure Lowering Treatment Trialists Collaborative. Effects of ACE inhibitors, calcium antagonists, and other blood pressure lowering drugs: results of prospectively designed randomised trials. Lancet 2000;355:1955–1964. Study 2: Staessen JA, Wang JG, Thijs L. Cardiovascular prevention and blood pressure reduction: a quantitative overview updated until 1 March 2003. Journal of Hypertension 2003;21:1055–1076) which, when audited by Delfini, were found to be lethally threatened.

Example 4: DARE

In an appraisal of a systematic review of hypertension, DARE rated the review as “good.” Delfini’s audit of this DARE appraisal did not find the audited article to be valid. (See the reference to Study 1 (Trialists) above.)

Our Advice for Working with Secondary Studies

Our advice is as follows for Cochrane, Clinical Evidence and for secondary studies that appear to pass a critique by DARE (or which pass your own critical appraisal):

First, determine if they made their conclusions on the basis of studies they deemed high quality or not.

If they included low quality studies, such that you have to disregard their conclusions as untrustable, can you otherwise benefit from any of their efforts?
a) Can you accept their search output? (Remember to update from the date of their search.)
b) Are you comfortable accepting their exclusions?
c) Are you comfortable accepting their designations of a low grade, so that you can exclude those studies as well?
d) Are you comfortable accepting their inclusions as the basis for your own critical appraisals? If yes, appraise every article to which they give a high score.

If they appear to have only included studies they grade high:
a) Review their methods for grading — do you agree?
b) From the included studies, select a study or two that is deemed of high quality and a study or two of the lowest quality, and critically appraise your selection yourself. There will probably be agreement over a low quality study — but since you want to determine if low quality studies are being misgraded high, it’s their high ground you want to head for.
c) If the studies you have audited agree with their high quality grades, do a face-validity check of the number of studies getting high marks. Since probably over 80% of the medical literature earns a Delfini Grade U for uncertain validity and/or usefulness, there are likely to be very few studies worthy of inclusion in any review.

And then, let us pray for a quality source we can all trust.


Review of Cochrane Groups’ Assessment of Bias in Studies

The Cochrane Collaboration publishes systematic reviews developed by various groups; after peer review, the reviews are edited by Cochrane Review Groups, who use their own assessment recommendations as well as those published in the Cochrane Handbook of Systematic Reviews of Interventions. An important study from the Nordic Cochrane Centre has examined how the different Cochrane review groups currently recommend assessing and handling the risk of bias in studies evaluated for inclusion in systematic reviews. The authors focus on the use of study components versus numerical scales and, because there is significant variation in the quality of the groups' approaches, suggest possible improvements.[1] Some key points from the study are presented below.

It is important in developing systematic reviews to evaluate each study being considered for inclusion for bias. There are four major areas of bias that should be considered: selection bias, performance bias, attrition bias and assessment bias. Anything that leads away from "truth," except for chance effects, is a bias. The reason for this assessment is that biased studies are likely to produce unreliable estimates of effect (i.e., misleading results). Unreliable conclusions then lead to inaccurate predictions of the benefits and harms users can expect. Assessing studies for bias has generally been done by Cochrane groups using two different approaches, namely —

  • Component Approach, meaning assessing the study components for bias: Evaluating the methodological areas of each study for bias, e.g., details of randomization, blinding, similarity of care experiences except for the intervention being studied, handling of drop-outs, methods for calculating outcomes. This approach is supported by empirical evidence.
  • Scale Approach, meaning assessing the study by assigning an overall quality score: This requires providing numerical values to validity threats. The Jadad scale is an example of this approach [2]. The use of scales is not well-supported by empirical research. For example, Jüni et al[3] used 25 existing scales to identify high-quality trials, and found that the effect estimates and conclusions of the same meta-analysis varied substantially with the scale used. The Cochrane Collaboration advises against use of scales for assessing studies.
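As a sketch of what the component approach looks like in practice, the following is our own simplified illustration (the domain names and the roll-up rule are assumptions modeled on the handbook's low/moderate/high language, not the Cochrane tool itself):

```python
# Each domain is judged "low", "unclear", or "high" risk of bias;
# no numeric score is ever produced.
DOMAINS = ("sequence_generation", "allocation_concealment",
           "blinding", "follow_up")

def overall_risk(judgments):
    """Roll up per-domain judgments: the overall call is driven by the
    worst domain; unreported domains default to "unclear"."""
    levels = [judgments.get(d, "unclear") for d in DOMAINS]
    if all(j == "low" for j in levels):
        return "low"          # all criteria met
    if "high" in levels:
        return "high"         # one or more criteria not met
    return "unclear"          # one or more criteria partly met/unreported

overall_risk({"sequence_generation": "low",
              "allocation_concealment": "unclear",
              "blinding": "low",
              "follow_up": "low"})   # -> "unclear"
```

The design point is that a single inadequate component is enough to downgrade the whole trial, whereas a scale lets strong reporting elsewhere mask a fatal flaw.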

The authors from the Nordic Cochrane Centre examined the instructions to authors of the 50 Cochrane Review Groups that focus on clinical interventions for recommendations on methodological quality assessment of studies.

The following table summarizes the main findings. Of the 50 groups, 41 (82%) recommended a component approach and 9 (18%) a scale approach; 23 of the component groups had their own checklists, ranging from 4 to 23 items, and 5 of the scale groups recommended the Jadad scale.

Areas recommended for assessment, n (%)         Component groups   Scale groups
  Sequence generation                           26 (63)            9 (100)
  Concealment of allocation                     41 (100)           2 (22)
  Blinding of patients                          33 (80)            9 (100)
  Blinding of caregivers                        32 (78)            1 (11)
  Blinding of outcome assessors                 39 (95)            9 (100)
  Follow-up                                     38 (93)            9 (100)
  Intention-to-treat analysis                   20 (49)            1 (11)

Recommendations for using quality assessments of individual studies in reviews, n (%)
  Analytical approach (e.g., sensitivity analysis to test whether including
  only trials of higher methodological quality changes the effect estimates)
                                                20 (49)            8 (89)
  Descriptive approach                          1 (2)              0 (0)
  No information                                20 (49)            1 (11)

Recommendations for type of analysis, n (%)
  Sensitivity analysis                          17 (85)            7 (88)
  Threshold                                     4 (20)             3 (38)
  Subgroup analysis                             4 (20)             3 (38)
  Cumulative analysis                           1 (5)              2 (25)
  Weights                                       1 (5)              1 (13)
  Meta-regression                               0 (0)              1 (13)

Authors’ Conclusions

  • Cochrane Reviews are undertaken by authors with different levels of methodological training, and following Cochrane Review Groups’ guidelines for assessing bias can be problematic if the guidelines are not in accordance with the empirical research on bias.
  • Despite advising against scales, the Cochrane Handbook actually recommends a ranking scale. The scale distinguishes between low risk of bias (all criteria met), moderate risk of bias (one or more criteria partly met) and high risk of bias (one or more criteria not met). However, the Cochrane Handbook does not specify the criteria to be used. Instead it states that the criteria used should be few and address substantive threats to the validity of the study results.
  • The Jadad scale consists of three items: up to two points are given for randomization, two for double blinding and one for description of withdrawals and dropouts. An overall score between zero and five is assigned, where three is commonly regarded as adequate trial quality. The Jadad scale is problematic for a variety of reasons (we agree, and present some of our observations below). Studies have also shown low interrater agreement, particularly for withdrawals and dropouts, where kappa values below zero have been reported, i.e., agreement worse than that expected by chance.
  • The Cochrane Handbook recommends analyzing all data according to the intention-to-treat principle using different methods without specifying what those methods should be. The Handbook should give clearer recommendations to ensure a more homogeneous methodology.
  • The authors list multiple errors made by the various Cochrane groups — for example, the grading system recommended by the Back Group has five levels of evidence and was developed using a consensus method. Consistent findings among multiple, low-quality non-randomized studies are considered to be the same level of evidence as one high-quality randomized trial, which is not in accordance with findings from empirical studies or with the Cochrane Handbook. The four-level grading system used by the Musculoskeletal Group is also based on consensus and is also highly problematic. The system is based on arbitrary cut-points such as sample size above 50 and more than 80% follow-up, which are not based on empirical evidence. The only difference between platinum and gold evidence is that there needs to be two randomized trials for platinum and one for gold, which is not reasonable as, for example, the platinum trials could involve 60 patients each and the gold trial, 500 patients. Silver level can be either a randomized trial with a 'head-to-head' comparison of agents or a high-quality case-control study, which is not supported by empirical research, and bronze level can be a high-quality case series without controls or expert opinion.

Delfini Comment on the Jadad Scale

We have long had issues with the Jadad Scale.

  • Points can be awarded merely for reporting rather than giving appropriate attention to methodological quality.
  • For randomization, the scale addresses explicitly the sequence generation, but not concealment of allocation of the sequence.
  • The scale does not address intention-to-treat analysis among many other considerations. Therefore, randomized trials with an appropriate randomization sequence, but with no concealment of allocation, with large numbers of dropouts that are well described, using only a per-protocol analysis and having myriad other biases such as differences between groups, may be scored as of the highest methodological quality (five points).

After years of doing critical appraisals involving thousands of studies, we feel confident in stating that no scale can be developed that is sufficient to evaluate studies. Instead an understanding of critical appraisal concepts combined with critical and clinical thinking is required.

Delfini Approach to Secondary Studies

As to use of systematic reviews, Delfini frequently starts evidence-based reviews and quality improvement projects by checking Cochrane for recent systematic reviews. We have over the years noted a great deal of variation in the quality of the reviews (and have reported this to Cochrane). This report from the Cochrane Nordic Centre is further evidence that we should keep doing what we have been doing—auditing Cochrane reviews to be sure that the included studies are of high quality and not just accepting the results and conclusions of all reviews as valid because they come from a “most-trusted source.”

This is how we approach use of systematic reviews:

  • The critical appraisal process for secondary studies includes a number of elements, including a review of the systematic search methods employed and an assessment of whether the secondary source includes only valid and clinically useful primary sources. To assess the latter, the source is reviewed to attempt to determine whether critical appraisal of the content from included studies was performed, along with an attempt to assess the skill level of the appraisers and the robustness of their review. One or two of the included primary studies considered to be of the highest quality are critically appraised for validity and usefulness as an audit by Delfini. If these studies pass the audit, one or two included primary studies of the lowest quality are critically appraised as well. If these lower quality studies also pass, it is assumed that the authors have employed good critical appraisal techniques.
  • If the source passes an audit for validity and usefulness, the source’s efficacy and safety conclusions are used in the Delfini evidence synthesis and new research published following the date of the source’s search strategy is sought.
  • If the source does not pass the audit for validity and usefulness, but has utilized a sound search strategy and sound criteria for excluding efficacy studies lacking relevance, validity or for other problems, all the primary studies selected for inclusion by the source are critically appraised, and valid, useful studies form the basis of the Delfini review, which will then be updated with any new valid and clinically useful primary studies published since the date of the secondary source’s search.

We applaud the Nordic Cochrane Group for their efforts in helping to improve an important resource.

1. Lundh A, Gotzsche PC. Recommendations by Cochrane Review Groups for assessment of the risk of bias in studies. BMC Med Res Methodol. 2008 Apr 21;8(1):22 [Epub ahead of print]. PMID: 18426565

2. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996 Feb;17(1):1-12. PMID: 8721797

3. Jüni P, Witschi A, Bloch R, Egger M: The hazards of scoring the quality of clinical trials for metaanalysis. JAMA 1999, 282:1054-60. PMID: 10493204.


Bell’s Palsy Update
02/11/08; updated 12/03/2010 with instructional slides demonstrating how to perform a conservative intention-to-treat (ITT) analysis. [Note: This is actually a primary studies issue, but it is stored here for historical reasons, as our Bell's Palsy journey began with a narrative review.]

Things have changed. It now appears that early treatment with prednisolone most likely does benefit patients with Bell’s Palsy. In the past, we were critical of narrative reviews published in the BMJ and the NEJM because, as had been nicely pointed out by a Cochrane review, there were no high quality RCTs demonstrating improved outcomes with steroids in the treatment of acute Bell’s Palsy. (See our letters in the boxes below.)

Sullivan et al. (N Engl J Med 2007;357:1598-607. PMID: 17942873) have now presented probably valid and clinically useful data to change our conclusions about the use of steroids in Bell’s Palsy. In a 4-arm, 9-month study comparing 1) prednisolone + placebo, 2) acyclovir + placebo, 3) prednisolone + acyclovir and 4) two placebo capsules, the investigators report rates of complete recovery of 94.4% for patients who received prednisolone and 81.6% for those who did not, a difference of 12.8 percentage points (95% CI, 7.2 to 18.4; P<0.001). They concluded that, “in patients with Bell’s palsy, early treatment with prednisolone significantly improves the chances of complete recovery at 3 and 9 months.”

Critical Appraisal of the Sullivan Study
We critically appraised this study and found the following threats to validity:

  • Would have preferred more details of randomization.
  • No mention of co-interventions or allowed/disallowed concomitant treatments.
  • 7 patients assigned to prednisolone received the wrong drug.
  • 3 patients assigned to placebo received the wrong drug.
  • In the prednisolone + placebo group, the loss to follow-up was 11/138 = 8.0%; in the double-placebo group the loss was 19/141 = 13.5%. For the acyclovir + prednisolone group the dropout rate was 10/124 = 8.1%, and for the acyclovir + placebo group it was 15/123 = 12.2%. Total loss = 55/526 = 10.5%.

A true ITT analysis was not done, and an adequate sensitivity analysis was not performed by the authors. We therefore “redid” the analysis using the following assumptions:

  • No Prednisolone Group: We applied the percent-recovered rate of the no-prednisolone group (the control event rate) to those who were missing or discontinued (this rate agreed with statistics on the natural history of the condition), except that those who sought active treatment were counted as treatment failures.
  • Prednisolone Group: We failed all who were missing and those who sought active treatment.

Our reanalysis yielded a statistically significant difference between those subjects receiving prednisolone and those who did not, at a p-value of 0.0487.
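For readers who want the mechanics, here is a rough sketch of such a conservative recalculation (Python). The recovery counts in the example are illustrative stand-ins back-calculated from the reported percentages, and a simple two-proportion z-test stands in for whatever test one prefers, so running it will not reproduce our p-value of 0.0487:

```python
import math

def two_sided_p(x1, n1, x2, n2):
    """Two-proportion z-test, two-sided p-value (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def conservative_itt(recovered_tx, n_tx, lost_tx,
                     recovered_ctl, n_ctl, lost_ctl):
    """Fail every lost subject in the treatment arm; impute recoveries
    for lost control subjects at the observed control recovery rate."""
    assert recovered_tx + lost_tx <= n_tx
    ctl_rate = recovered_ctl / (n_ctl - lost_ctl)
    imputed_ctl = recovered_ctl + round(ctl_rate * lost_ctl)
    # treatment recoveries are unchanged: its losses count as failures
    return two_sided_p(recovered_tx, n_tx, imputed_ctl, n_ctl)

# illustrative counts for a treatment arm of 138 (11 lost) vs a
# control arm of 141 (19 lost):
p = conservative_itt(recovered_tx=120, n_tx=138, lost_tx=11,
                     recovered_ctl=100, n_ctl=141, lost_ctl=19)
```

The point of the exercise is directional: failing all treatment-arm losses can only shrink the treatment effect, so a difference that survives this recalculation is robust to the missing data.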

Comment: Because of the authors’ analysis, the reported efficacy results are likely to be inflated, but the evidence suggests that prednisolone is likely to be an effective agent for adults with acute onset of Bell’s palsy when administered within 72 hours of onset. Because of the loss to follow-up, some uncertainty remains.

Grade B-U: Possible to uncertain usefulness

  • The evidence might be sufficient to use in making health care decisions; however, there remains sufficient uncertainty that the evidence cannot fully reach a Grade B and the uncertainty is not great enough to fully warrant a Grade U.
  • Study quality is such that it appears likely that the evidence is sufficient to use in making health care decisions; however, there are some study issues that raise continued uncertainty. Health care decision-makers should be fully informed of the evidence quality.

12/03/2010: See the story in pictures. PDF of our instructional slides on Intention-to-Treat Recalculation of Sullivan 2007, PMID: 17942873, Prednisolone for Treatment of Bell's Palsy


Delfini Letter to BMJ: Corticosteroids are Not Proven for Treatment of Bell's Palsy (Or are they? See update here.)

The cover feature of the September 4, 2004 BMJ is about improving outcomes in Bell's palsy. Yet it publishes an invalid review: Holland JN, Weiner GM. Recent developments in Bell's palsy. BMJ 2004;329:553-557. PMID: 15345630 (4 September).

Despite a good Cochrane review that exposes fatal flaws in prior research, the review referenced above not only relies on flawed studies but excludes a valid one. Here is our response to the BMJ (reprinted below):

Corticosteroids are Not Proven for Treatment of Bell's Palsy

Editor—Holland and Weiner support the use of corticosteroids for treatment of moderate to severe facial palsy based on two systematic reviews [1,2]. We believe that such a conclusion is not justified by the medical evidence presented by the authors and that the two systematic reviews are fatally flawed.

In the Ramsey review, three studies met the authors’ criteria for validity [3,4,5], but in Ramsey’s analysis the study by May et al. [3]—a valid study—was excluded because it was an “outlier,” i.e., its results were not consistent with those of the other two studies included in the review. Excluding a study on the basis of its results rather than its methodology is inappropriate. Results from non-valid studies should not be utilized in decision-making.

May et al. reported that corticosteroids resulted in poorer facial recovery than placebo. It should be noted that the quality index score of the excluded (May) study was better than that of one of the studies included in Ramsey’s review. If the May et al. study is not excluded, the results do not support the authors’ conclusion of benefit from corticosteroid treatment.

It should also be noted that a Cochrane systematic review [6] included the May study as a valid study but excluded the Austin study because 29 percent of its subjects were lost to follow-up. Cochrane also excluded the Shafshak study because it was non-randomized.
As the Cochrane group pointed out, the Grogan “practice parameter” is probably invalid because Grogan included the Shafshak and Austin studies; when these two trials were excluded from the pooled estimate, the results no longer favored steroids for the treatment of Bell’s palsy.

  1. Grogan PM, Gronseth GS. Practice parameter: steroids, acyclovir, and surgery for Bell's palsy (an evidence-based review): report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology 2001;56: 830-6
  2. Ramsey MJ, DerSimonian R, Holtel MR, Burgess LP. Corticosteroid treatment for idiopathic facial nerve paralysis: a meta-analysis. Laryngoscope 2000;110: 335-41
  3. May M, Wette R, Hardin WB Jr, Sullivan J. The use of steroids in Bell’s palsy: a prospective controlled study. Laryngoscope 1976; 86:1111–1122
  4. Austin JR, Peskind SP, Austin SG, Rice DH. Idiopathic facial nerve paralysis: a randomized double blind controlled study of placebo versus prednisone. Laryngoscope 1993; 103:1326–1333.
  5. Shafshak TS, Essa AY, Bakey FA. The possible contributing factors for the success of steroid therapy in Bell’s palsy: a clinical and electrophysiological study. J Laryngol Otol 1994; 108:940–943.
  6. Salinas RA, Alvarez G, Alvarez MI, Ferreira J. Corticosteroids for Bell's palsy (idiopathic facial paralysis) [Review]. The Cochrane Database of Systematic Reviews. The Cochrane Library, Copyright 2003, The Cochrane Collaboration. Date of most recent update: 26 November 2001; date of most recent substantive update: 15 October 2001.

November 5, 2004 Update: BMJ Responds


Delfini Letter to NEJM: Bell's Palsy - REDUX! (see update here)

Is it something about the pull of the moon? No sooner do we write a letter to the BMJ about flaws in a recent article on Bell's palsy than the New England Journal of Medicine publishes a review based on some of the same poor research — Gilden DJ. Clinical practice. Bell's Palsy. N Engl J Med. 2004 Sep 23;351(13):1323-31. PMID: 15385659.

We disagree with Gilden, who concludes that “the data suggest that glucocorticoids decrease the incidence of permanent facial paralysis…” On what does Gilden base this conclusion? An observational study, an RCT with 29 percent of subjects lost to follow-up, and the American Academy of Neurology’s practice parameter stating that early treatment with corticosteroids is “probably effective.” The Cochrane group has pointed out that the Academy’s practice parameter is probably invalid because its evidence synthesis included a fatally flawed trial and a non-randomized trial. We concur with the Cochrane group and believe corticosteroids have not been demonstrated to be effective in Bell’s palsy. Cause-and-effect conclusions cannot be reliably drawn from observational studies, and results from non-valid studies should not be utilized in drawing conclusions about effectiveness or in making treatment decisions.

This is also another reminder of the problems with reviews. See our systematic review appraisal tool and our tips sheet: Systematic Review Validity Tool & The Problems with Narrative Reviews


Substandard Evidence: POEMS and Diabetes: Readers, Beware!

Allen Shaughnessy and David Slawson, the EBM pioneers who in 1994 gave us POEMs (patient-oriented evidence that matters) and DOEs (disease-oriented evidence), have published in the BMJ an article examining how experts represented the results of a clinically useful study — the UKPDS study. Full text is available at http://bmj.com/cgi/content/full/327/7409/266.

The authors picked the UKPDS study because it is a good example of patient-oriented evidence that matters — it evaluated the effect of intensive blood glucose control on various outcomes that matter:

  • Tight control did not prevent premature mortality.
  • Metformin decreased mortality and diabetes-related outcomes in overweight patients.
  • Tight BP control decreased complications (greater effect than blood glucose control).
  • Quality of life was not affected by tight glucose control.

Shaughnessy and Slawson systematically reviewed reviews that met their inclusion criteria (see http://www.delfini.org/Delfini_Tool_SR.doc).

They found that:

  • Only 6 of the 35 reviews mentioned that tight glucose control does not change overall mortality or diabetes-related mortality.
  • Only 14 of the reviews mentioned the effect of metformin on diabetes-related outcomes in overweight patients.
  • 17 of the reviews did not mention the need for BP control in patients with diabetes.
  • Only 5 of the reviews reported that diabetic patients with hypertension benefit more from BP control than from glucose control.
  • Only 7 of the reviews reported that ACE inhibitors and beta blockers were equivalent as starting drugs for hypertension.

Implications from this review: Important POEMs are missing from recent diabetes review articles. Narrative and expert reviews remain problematic: they are major vehicles for transmitting research to clinicians, and they are often of poor quality. Clinicians may be getting from expert reviews what these authors call PROSE — "prescriptive recommendations based on substandard evidence."

This is another reminder that systematic reviews are a good place to start when looking for valid, relevant information. Readers, beware of expert reviews, which continue to appear in the best journals.


Quality of Systematic Reviews: Misleading “POEM” on Hormone Therapy

Here's a letter Dr. Brian Alper and Delfini submitted to American Family Physician; it was not accepted, but we think it makes some very important points:

Misleading “POEMs and Tips” item on hormone therapy

TO THE EDITOR: As “Tips from Other Journals” was found to be one of your four most popular departments [1], it has a substantial impact on the dissemination of research results.

We would like to alert your readers to a recent “Tip” that increases the dissemination of a flawed meta-analysis concluding that hormone replacement therapy (HRT) reduced mortality in women less than sixty years old [2,3]. This particular meta-analysis may “get past” some critical appraisal screens because it describes a comprehensive search and seemingly appropriate inclusion and quality criteria. However, detailed appraisal finds four fatal flaws in this meta-analysis.

First, the subgroup analysis addressing younger women was based on trials with a mean age less than sixty years (n = 4,141 women). The use of mean age for each included trial rather than actual age of included women has been previously reported as a significant flaw[4]. The method of subgroup analysis used does not account for most of the available data on women less than sixty years old. The Women’s Health Initiative trial alone provided data on HRT in a population that had a mean age of 63.3 years but included 5,522 women aged 50-59 years.

Second, the authors of the meta-analysis evaluated the methodology of the studies they included but did not fully consider the impact of study methods on their results. For example, seven of the seventeen trials in the subgroup analysis failed to meet the quality criteria of being double-blinded. All seven of these trials had results that tended to favor HRT, while only two of the ten double-blinded trials had results that tended to favor HRT.

Third, a meta-analysis must assign appropriate weights to the individual studies included in the analysis. This is usually done by giving greater weight to larger studies, basing the weight on the denominator, or sample size. In this meta-analysis, weights were based on the number of deaths reported. For example, a study with 406 patients and 1 death contributed 1.9% of the meta-analysis summary statistic, while a study with 130 patients and 73 deaths accounted for 10.3% of the final results for trials with a mean age of less than sixty years.
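The distortion this weighting choice produces can be seen with a toy calculation. Only the first two (deaths, sample size) pairs below come from the letter; the remaining trials, and the crude "share" weights themselves, are hypothetical and purely for illustration (real meta-analyses typically use inverse-variance weights):

```python
# Hypothetical set of trials as (deaths, sample_size) pairs; the first
# two mimic the studies cited in the letter, the rest are invented.
trials = [(1, 406), (73, 130), (10, 500), (25, 1200), (40, 900)]

total_deaths = sum(d for d, _ in trials)
total_n = sum(n for _, n in trials)

for deaths, n in trials:
    by_deaths = deaths / total_deaths   # event-based weight share
    by_size = n / total_n               # sample-size-based weight share
    print(f"deaths={deaths:2d} n={n:4d}  "
          f"weight by deaths {by_deaths:6.1%}  by size {by_size:6.1%}")
```

The small, high-mortality trial dominates under event-based weighting, while the large, low-mortality trial nearly vanishes — the pattern the letter criticizes.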

Fourth, the study accounting for 10.3% of this result [5] was a trial of postmenopausal women with ovarian cancer (mean age 51 years), excluding those with tumors of low malignant potential. Including (and so heavily weighting) such a highly select population invalidates the use of this analysis for conclusions about the general population.

Any one of these fatal flaws is sufficient to invalidate this conclusion. The evidence does not support the concept of lower mortality from HRT use in younger women.

Brian S. Alper MD, MSPH
Editor-in-Chief, DynamicMedical.com
Research Assistant Professor in Family and Community Medicine, University of Missouri-Columbia

Michael Stuart, MD
President, Delfini Group, LLC
Clinical Assistant Professor, University of Washington School of Medicine

Sheri Ann Strite
Managing Director & Principal, Delfini Group, LLC

1. Merriman JA. Reader surveys help determine AFP’s direction. Am Fam Physician 2005;71:1459.
2. Wellberry C. Does beginning HT earlier decrease mortality? Am Fam Physician 2005;71:1594. (page number may be wrong, accessed via web at http://www.aafp.org/afp/2005/0415/p1600.html)
3. Salpeter SR, et al. Mortality associated with hormone replacement therapy in younger and older women. J Gen Intern Med July 2004;19:791-804.
4. Furberg CD, Psaty BM. Review: hormone replacement therapy may reduce the risk for death in younger but not older postmenopausal women. ACP J Club 2005;142:1.
5. Guidozzi F, Daponte A. Estrogen replacement therapy for ovarian carcinoma survivors: A randomized controlled trial. Cancer 1999;86:1013-8.


Summarizing the Strength of the Evidence: AHRQ-EHCP Grading System

Most groups that systematically review the medical literature have an approach to grading the quality of individual studies and to rating the quality of evidence for specific outcomes after evaluating the totality of the evidence.

The AHRQ-EHCP (Agency for Healthcare Research and Quality Effective Health Care Program) grading methodology is worth knowing about.[1] AHRQ-EHCP utilizes four domains when rating the overall strength of the evidence (SOE). These domains were selected after reviewing grading methodologies used by the U.S. Preventive Services Task Force (USPSTF),[2] the GRADE working group,[3] and other evidence-based practice centers.[4,5]

Briefly, the AHRQ-EHCP approach assesses risk of bias, consistency, directness, and precision for each outcome or comparison of interest, after rating each study (or each key outcome from each study) for bias (paraphrased, in some instances, below):

  • Risk of bias: each study is scored based on study design and methodology, and the aggregate of studies is rated for an overall risk-of-bias score of low, medium, or high. The aggregate quality of the studies is rated as good, fair, or poor.
  • Consistency (the degree of similarity of the effect sizes of the included studies) is scored as consistent, inconsistent, or unknown/not applicable.
  • Directness is the linkage between the intervention and health outcomes, scored as direct or indirect (indirect meaning intermediate or surrogate outcome measures, which may or may not be valid measures of clinical usefulness).
  • Precision concerns the ability to draw a clinically useful conclusion from the confidence intervals. An imprecise estimate, for example, is one whose confidence interval is wide enough to include clinically distinct conclusions (e.g., favoring each of the interventions being compared).

AHRQ-EHCP—like GRADE—has additional domains which may be included when rating evidence:

  • Dose-response associations (present or not present)
  • Plausible confounding that would increase or decrease effect (present or absent)
  • Strength of association (magnitude of effect)
  • Publication bias (not necessary to formally score)

The AHRQ-EHCP overall strength of evidence (SOE) for each outcome of interest is graded as high, moderate, or low. For example, if the SOE is high, further research is unlikely to change confidence in the estimate of effect. If evidence is unavailable or does not permit a conclusion, the outcome in the AHRQ-EHCP system is graded as insufficient.

Delfini Modification
We use our usual grading system for individual studies (A, B, B-U, U). When we use our modified AHRQ-EHCP system for rating the overall level of evidence (LOE), we add a fourth category—“borderline”—to increase clarity, as we believe that “moderate” is not precise enough to address evidence of borderline usefulness. And we prefer the term “inconclusive” to “insufficient.”

Table 1. AHRQ-EHCP and Delfini Evidence Grading Methodologies

AHRQ-EHCP Evidence Grading and Strength of Evidence Methodology

For each outcome:

  • Bias: each study is rated, and an aggregate risk-of-bias level is selected from low/medium/high. The aggregate quality of the studies under consideration is rated as good/fair/poor.
  • Consistency
  • Directness
  • Precision

Overall SOE: high, moderate, low, insufficient

Delfini Evidence Grading and Level of Evidence Methodology

  • Each study or outcome: A, B, B-U, U for validity and usefulness

Overall LOE: high, moderate, borderline, inconclusive

Table 2. Examples of AHRQ-EHCP Strength of Evidence (SOE) Ratings

[Table not reproduced here. Its columns included the number of studies and total N, risk of bias, and the outcome assessed (e.g., improved quality of life), leading to an overall rating such as “High SOE.”]


1. Owens DK, Lohr KN, Atkins D, et al. Grading the strength of a body of evidence when comparing medical interventions—Agency for Healthcare Research and Quality and the Effective Health Care Program. J Clin Epidemiol. 2009 Jul 10.

2. Sawaya GF, Guirguis-Blake J, LeFevre M, Harris R, Petitti D; U.S. Preventive Services Task Force. Update on the methods of the U.S. Preventive Services Task Force: estimating certainty and magnitude of net benefit. Ann Intern Med. 2007 Dec 18;147:871-5.

3. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924-6.

4. West S, King V, Carey TS, et al. Systems to Rate the Strength of Scientific Evidence. Evidence Report/Technology Assessment No. 47 (Prepared by the Research Triangle Institute-University of North Carolina Evidence-based Practice Center under Contract No. 290-97-0011). AHRQ Publication No. 02-E016. Rockville, MD: Agency for Healthcare Research and Quality; 2002.

5. Treadwell JR, Tregear SJ, Reston JT, Turkelson CM. A system for rating the stability and strength of medical evidence. BMC Med Res Methodol 2006;6:52.


A Caution When Evaluating Systematic Reviews and Meta-analyses

We would like to draw critical appraisers' attention to an infrequent but important problem encountered in some systematic reviews—the accuracy of standardized mean differences. Meta-analysis of trials that have used different scales to record outcomes of a similar nature requires transforming the data to a uniform scale, the standardized mean difference (SMD). Gøtzsche and colleagues, in a review of 27 meta-analyses utilizing SMDs, found that a high proportion of meta-analyses based on SMDs contained meaningful errors in data extraction and in the calculation of point estimates.[1] Auditing two trials from each review, they found errors for at least 1 of the 2 trials examined in 17 meta-analyses (63%). We recommend that critical appraisers be aware of this issue.

1. Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA. 2007 Jul 25;298(4):430-7. Erratum in: JAMA. 2007 Nov 21;298(19):2264. PubMed PMID:17652297.
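One reason such errors creep in is that the SMD computation, while simple, has several places to slip (wrong standard deviation, wrong pooling, sign errors). A minimal sketch of the standard calculation — Cohen's d with the usual Hedges' g small-sample correction — using hypothetical summary statistics, not data from any cited trial:

```python
import math

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """SMD: difference in means divided by the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d multiplied by the small-sample correction factor J."""
    d = cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c)
    return d * (1 - 3 / (4 * (n_t + n_c) - 9))

# Hypothetical trial: treatment mean 10 (SD 2, n=50) vs control mean 8
# (SD 2, n=50); pooled SD is 2, so the SMD is (10 - 8) / 2.
print(cohens_d(10, 2, 50, 8, 2, 50))  # → 1.0
```

Re-deriving a review's SMDs this way from the trial summary statistics is a quick check on the data-extraction errors Gøtzsche and colleagues describe.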



© 2002-2020 Delfini Group, LLC. All Rights Reserved Worldwide.
Use of this website implies your agreement to our Notices.
