 Delfini Evidence-based medicine essentials: tips at a high level view

Also see DelfiniClick™ for a collection of more in depth commentaries. See here for freely available tools.

About PMID Numbers: We frequently utilize a PMID number in place of a citation. Where PMID numbers are available, enter that number into the PubMed search box to retrieve that citation and listing.

Delfini on Evidence-based Medicine (EBM)

From Delfini's perspective, evidence-based medicine is all about effective use of science in health care.

Delfini's Definition of Evidence-based Medicine

"Evidence-based medicine is the use of the scientific method and application of valid and useful science to inform health care provision, practice, evaluation and decisions."

The use of science is required to help reduce medical uncertainty, increase predictability and inform us about the probability of benefit or harm, and for whom.

There are many factors that do, and should, guide medical decision making. We advocate starting by reviewing the available science and using only valid studies with useful outcomes. Once we are informed about the science, we should take other factors into account: cost; the patient perspective (e.g., benefits, harms, alternatives, costs and uncertainties); clinician and patient satisfaction; and other triangulation issues (e.g., regulatory issues, PR, medical community impacts, marketing issues, medical-legal issues, issues of purchasers, liability and risk management, community standards, accreditors, press, overall impact on the health care organization, etc.).

We believe it is fine to make decisions on factors other than science. Our position is simply that we should know the science first — and not confuse opinion or other factors with science.

Evaluate the science first — then you can "throw it over the wall" into the realm of other decision considerations (the other triangulation issues).

Practicing an evidence-based approach means to us -

Delfini's Hallmarks of Evidence-based Medicine

1) When seeking information on a topic, a systematic search is conducted for science and science-based information using evidence-based searching and filtering techniques.

2) All sources of information to guide medical decision-making are critically appraised, using science-based principles, for validity and usefulness.

3) Any conclusions drawn from the science are carefully crafted to be as valid as possible.

4) Methods used and reporting are transparent so that the work can be evaluated for quality, replicated and updated.

5) Clinical information sources are updated when significant new information becomes available and such information is periodically sought.

We then apply evidence-based medicine to clinical quality improvement. To learn more about the Delfini approach to clinical quality improvement & value, read about the Delfini Evidence- and Value-based Clinical Quality Improvement Model here.

More on Evidence-based Medicine for consumers and their clinicians.

The Need for Critical Appraisal Skills

We believe that everyone involved in health care decision-making that affects patients needs basic critical appraisal skills. Why?

  • Poor and misleading information is published even in the best medical journals. And there are no reliable resources to meet even half of your health care information needs. Read about the problems here.
  • You want to be able to know immediately whether health care information you are getting comes from a study design that makes it impossible to draw cause and effect conclusions.
  • You want to be able to quickly categorize the many studies you see or hear about into "potentially useful and worthy of a closer look" — or a waste of time.
  • You want to understand why some sources are "trustable" and others are not.
  • And even if more trustable information were available, we believe you would still need basic critical appraisal skills as a core competency if you are involved in decisions affecting patient care.
    • Even with trusted sources, you would still get "had" by poor and misleading information without these skills.
    • Bad information will continue to be published.
    • Thinking "case series" and experientially is intuitive; scientific thinking is not — you would still continue to apply "case series" thinking.
    • You would continue to get misleading information from your patients, your colleagues, the "experts," manufacturers, the news and "urban legends."
    • You need to be able to communicate about science and why some treatments are better than others — this is not only for communicating with your patients, but also friends, family, reporters and lawyers — not to mention your colleagues and others.
    • Critical appraisal skills are important for your professionalism and useful in life generally.

It does not have to be that hard. It makes medicine more interesting and can even be fun. It can also improve care, help patients and help prevent waste of resources, which can itself result in harms. Read below for more EBM tips.

The 5 "A"s of Evidence-based Medicine

Evidence-based medicine is important to us to answer the following questions:

Here are the EBM process steps at a high level:

[Figure: EBM health care quality improvement steps]

Modified by Delfini Group, LLC (www.delfini.org) from Leung GM. Evidence-based practice revisited. Asia Pac J Public Health. 2001;13(2):116-21
.............
Anatomy of an Article - Delfini-style

[Figure: EBM Article Advice]

.............
The Hunt for Usable Evidence

[Figure: EBM Steps to Evaluate Medical Science]

.............
The Steps in Critical Appraisal

At a high level, there are three for all primary studies:

  1. What is the study design? Important for seeking valid and useful evidence.
  2. What is the validity of the individual study?
  3. What are the results? Is this going to be useful, usable information?

You want "usable evidence."

"Usable evidence" = Validity + Effectiveness + Appropriateness + Usability.

[Figure: Critical Appraisal Steps]

.............
Filtering for Strength of Study Design (Evidence Grading)

"Best Available" evidence requires valid (meaning "probably true") and clinically useful evidence. It is not "evidence" until it has passed a rigorous critical appraisal test.

[Figure: Medical evidence]

.............
Quick EBM Tip: How to Tell the Difference Between an Experiment and an Observational Study

Practically speaking, most health care professionals do not need to know the difference between a cohort study and a case-control study. However, we all need to be able to distinguish an experiment from an observational study so that we know the right study design to answer our clinical question. The above graphic shows what kinds of questions are to be answered by what kind of study design. For therapy, screening and prevention, we always want to use valid and useful information resulting from experiments, such as randomized controlled trials.

The quick tip. Did the patient or his or her physician CHOOSE the treatment? If yes, this is an observational study, and we are simply observing what happens naturally. If the treatment was assigned, this is an experiment.

Remember that cause-and-effect conclusions can be drawn only from experiments, not from observations. Observational studies can only show associations, and not all associations are cause and effect.

.............
The Problems with Case Series (to name but a few...)

Case series are reports of interventions with NO COMPARISON group. They can be useful for generating ideas for studies, but they can be very harmful if applied as evidence! Unfortunately, physicians and others are often misled by their results!

This issue is so important we have the text below available as a [PDF]. Hand one out to everyone you know!

Problems with Case Series
Definition
A group of patients receives an intervention and outcomes are assessed. There is no appropriate comparison group. (Historical controls are sometimes used, but this is not an appropriate comparison group.)
Key Points
A case series is not evidence unless you have “all-or-none results,” which is rare, and even then there is still potential for bias. Case series can be useful for hypothesis generation.
Key Problem:
There is no comparison group.
Patients frequently improve after a medical visit, but outcomes might otherwise be the same whether a treatment is administered or not (see Observational Bias below). Comparisons reveal associations by exposing differences. Lack of a comparison group can make it appear as if there is an association between an intervention and an outcome when, in fact, there is not.
Conclusions
Case series can be useful for describing a clinical condition or generating ideas for study. However, because of the biases inherent in the design, case series can almost never be relied upon to draw conclusions about the relationship between interventions and outcomes. Rarely, for conditions where morbidity or mortality is nearly 100 percent and is decreased dramatically with the intervention, a case series may be sufficient to draw conclusions about the effect of the intervention on outcomes, but it has to be emphasized that this is extremely rare.

 

Discussion

Bias will always be present in case series.

  • Selection bias will always be present in case series.
    • There are usually no criteria for patient selection.
    • Frequently cases are not consecutively selected.
    • Clinicians usually report on those patients with the best outcomes.
  • Observation bias will always be present in case series.
    • When there is no blinding, clinician beliefs in or hopes for an intervention can affect outcomes – resulting in performance bias.
    • Assessment bias often occurs because lack of consecutively selected patients can result in selective reporting favoring the intervention.
  • Key Point: Patients frequently improve after a medical visit, but outcomes might be the same whether or not a treatment is administered (see above). Therefore, without a comparison group, almost any intervention will appear to be beneficial, and improvement will be attributed to medical care when, in fact, it may be due to:

    a) The self-limited nature of the condition,
    b) Placebo effect,
    c) Regression to the mean, meaning that extreme test values are statistically likely to move toward the average over time. When patients present with extreme values and then seem to improve, the improvement may be falsely attributed to an intervention. A comparison group with no intervention can help expose this effect, or
    d) Coincidence (chance).
  • There are reporting problems resulting from case series.
    • Due to publication bias, negative results are almost never reported (the reporting of which would still present its own problems since a negative-finding case series would be highly prone to bias for the above reasons).
    • Authors of case series frequently compare their results to those of other case series. There is always the possibility that the authors will select case series for comparison that show their results in the best light.
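
Regression to the mean, listed above as one reason untreated patients appear to improve, can be illustrated with a small simulation. The sketch below is not part of the Delfini material; the measurement scale, the enrollment threshold and the noise levels are all hypothetical values chosen for illustration. Each simulated patient has a stable true value and is measured twice with random noise, with no treatment in between. Only patients whose first reading is extreme are "enrolled," as in a typical case series.

```python
import random
import statistics


def simulate_regression_to_mean(n_patients=10_000, threshold=160.0, seed=0):
    """Return (mean first reading, mean second reading) for enrolled patients.

    No intervention is applied between the two readings. All numeric values
    are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    first_readings, second_readings = [], []
    for _ in range(n_patients):
        true_value = rng.gauss(140.0, 10.0)         # stable underlying level
        first = true_value + rng.gauss(0.0, 10.0)   # noisy reading at visit 1
        second = true_value + rng.gauss(0.0, 10.0)  # noisy reading at visit 2
        if first >= threshold:                      # enrolled for an extreme value
            first_readings.append(first)
            second_readings.append(second)
    return statistics.mean(first_readings), statistics.mean(second_readings)


baseline, follow_up = simulate_regression_to_mean()
print(f"Mean at enrollment:               {baseline:.1f}")
print(f"Mean at follow-up (no treatment): {follow_up:.1f}")
```

The follow-up mean is lower than the enrollment mean even though nothing was done to these simulated patients. In a case series, that drop would be easy to credit to whatever intervention happened to be given; a comparison group selected the same way would show the same drop and expose the effect.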

The Problems with Narrative Reviews (aka Overviews)
Narrative reviews can be very biased and misleading. Here is why. A [PDF] is also available for circulation to people who can be helped by this.

Problems with Narrative Reviews (aka Overviews)
Definition
Narrative reviews are “evidence-round ups” on specific health care topics – but ones which do not necessarily follow systematic evidence-based criteria. Systematic reviews (including meta-analyses) can be contrasted with narrative reviews – or overviews – in that they generally follow a specific set of evidence-based criteria (and yet, still require critical appraisal to determine if they have met these standards.)
Key Points
Narrative reviews often do not meet important criteria to help mitigate bias – frequently they lack explicit criteria for article selection and frequently there is no evaluation of selected articles for validity, as examples. Only rigorously applied evidence-based methods can help move toward predictability for health care outcomes.
Key Problem:
Big Potential for Bias
There is high potential for low methodological quality. Authors frequently have expert opinions (and biases) and find studies to support their positions (selection bias).
Conclusions
Review articles can be useful for summarizing the literature and providing guidance provided they are of high methodological quality. However, because many reviews are not done in a systematic way, they should not be relied upon to draw conclusions about effective care.

 

Discussion

Overviews or narrative reviews are frequently published in the best medical journals. Many clinicians rely upon these reviews since they are considered a current “roundup” of evidence and are accompanied by “useful” recommendations from the (expert) author. Authors frequently have expert opinions (and biases) and find studies to support their positions (selection bias). You need to be sure the review was done using a systematic approach.

  • There may be studies showing no effect or harm which are not included in the review.
  • The only way to know if there has been a comprehensive search and critical appraisal of the studies in the review is to see the search strategy and criteria used for study inclusion.

Unless you see the search strategy, criteria for selecting and accepting studies for entry into the review, the information (summary points, conclusions) contained in the review may be invalid. Caution is urged in using any reviews except valid systematic reviews.

Criteria: If you are looking at a review article that does not pass these criteria, you are likely to be wasting your time and drawing invalid conclusions about the best clinical approach. (You can get our Systematic Review Appraisal Tool online at our website.)

  • Was there an attempt to obtain all relevant studies for the review?
  • Does the review state inclusion/exclusion criteria for the studies?
  • Do the criteria address study type, methods, the population studied, intervention and outcomes?
  • Were the studies adequately evaluated for internal and external validity?
    • Population
    • Study type
    • Study methods: method of randomization; blinded assessment; outcomes (benefits, harms and risks); loss to follow-up; ITT analysis
  • Is there a statement or chart rating/summarizing study quality?
  • Were there several reviewers who agreed on study validity?
  • Was there a summary/synthesis of the evidence?

Quantitative summaries (meta-analysis) may not be possible in some cases because of study heterogeneity. However, systematic reviews may still be able to weight the best studies, e.g., by validity, sample size.

For questions of treatment, prevention and screening, it is important to favor valid systematic reviews that utilize Randomized Controlled Trials (RCTs).

Assessing Results

[Figure: Medical Research Results]

.............
Measures of Outcomes — ARR, RRR and NNT: A Visual Explanation
This too is very important (and a wee bit hard to produce here). Our suggestion? Download the [PDF]. Here's another one to give to everyone you know.
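
For readers without the PDF to hand, here is a minimal sketch of the standard definitions: absolute risk reduction (ARR) is the control event rate minus the treated event rate, relative risk reduction (RRR) is ARR divided by the control event rate, and number-needed-to-treat (NNT) is 1/ARR, conventionally rounded up. The function name and the trial counts are hypothetical and are not taken from the Delfini handout. The sketch assumes the outcome is an unwanted event and that the control event rate is greater than zero.

```python
from fractions import Fraction
import math


def outcome_measures(control_events, control_n, treated_events, treated_n):
    """Return ARR, RRR and NNT for an unwanted outcome (fewer events is better)."""
    control_risk = Fraction(control_events, control_n)
    treated_risk = Fraction(treated_events, treated_n)
    arr = control_risk - treated_risk   # absolute risk reduction
    rrr = arr / control_risk            # relative risk reduction
    # NNT is rounded up by convention; it is undefined when there is no benefit.
    nnt = math.ceil(1 / arr) if arr > 0 else None
    return {"arr": float(arr), "rrr": float(rrr), "nnt": nnt}


# Two hypothetical trials with the same relative effect but different baseline risk.
high_risk = outcome_measures(control_events=20, control_n=100,
                             treated_events=15, treated_n=100)
low_risk = outcome_measures(control_events=20, control_n=1000,
                            treated_events=15, treated_n=1000)
print(high_risk)
print(low_risk)
```

Both hypothetical trials have an RRR of 25%, but the NNT is 20 in the high-risk population and 200 in the low-risk one. This is why a relative measure reported alone can make a small absolute benefit look large.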

[Figure: EBM Measures of Outcomes]

............

Understanding Number-Needed-to-Treat @ The DelfiniClick

Evidence Grading

Introduction
The purpose of grading clinical studies is to condense the results from critical appraisal of a study or studies into a brief “tag” which allows categorization into various levels of quality. The need to know the quality of a study or group of studies derives from a need to accurately predict what outcomes are likely to occur when various health care interventions are chosen by clinicians, patients and others. Higher quality studies are more likely to accurately predict efficacy and safety outcomes than studies of lower quality. It appears that lower quality studies often falsely inflate study results. The table below summarizes the relative amount of exaggeration that may occur with low quality RCTs.

Exaggeration of Differences Between Outcomes in Intervention vs Control Groups (High Quality vs Low Quality RCTs)
To see study citation and abstract, enter the PMID number in the PubMed search window.

Study Area of Concern                  Relative Amount of Exaggeration   Reference: First Author (PMID)
Generation of Randomization Sequence   51%                               Kjaergard (11730399)
Concealment of Allocation              Up to 51%                         Schulz (7823387), Kjaergard (11730399), Moher (9746022)
Blinding                               17%-44%                           Schulz (7823387), Poolman (17332104), Kjaergard (11730399)
Assessing outcomes through models      50% or greater                    Lachin (11018568)

Evidence Grading Systems
There are many systems or schemes for evidence grading. They are all based on identifying study flaws, often referred to as threats to validity, and involve evaluation of study design, methodology and results.

Cautions

  • When using any grading system, it is important to review the criteria used for arriving at each grade - these may vary even when the grade “name” is identical.
  • Also, some grading systems may assign misleading quality grades by inflating lower quality or invalid studies.

The Delfini validity and usability grading scale is designed to be easily understood, remembered and flexible to apply. The concepts behind our grading system can be applied to individual studies or conclusions from systematic reviews. It can also be used to rate judgments based on evidence such as clinical recommendations, guidelines, etc.

For more details and grading advice, download our tool for Evidence Grading, Wording Conclusions & Results Tables [WORD] which is also available at Tools: Evidence Tool Set. Read on for Delfini's methods:

Delfini Validity & Usability Grading Scale for Summarizing the Evidence for Interventions

Grades can be applied to individual studies, to conclusions within studies, a body of evidence or to secondary sources such as guidelines or clinical recommendations. General advice is provided below.

NOTES

  • Author’s Conclusions are Uncertain
    If the author’s conclusions are uncertain, it may not be necessary to do a critical appraisal of the study because an uncertain outcome automatically renders an “uncertain” grade designation.
  • Grade B-U Advice
    Because of its use for clinical applications, B-U should be used conservatively. B-U is not a default grade. Rather, it should be used when the study is probably a B and the outcomes are highly likely to be true, but it doesn’t quite comfortably reach a Grade B.

 

Grade A:
Useful

The evidence is strong and appears sufficient to use in making health care decisions – it is both valid and useful (e.g., meets standards for clinical significance, sufficient magnitude of effect size, physician and patient acceptability, etc.)

Advice:
Studies achieving this grade should be outstanding in design, execution and reporting with useful information to aid clinical decision-making, enabling reasonable certitude in drawing conclusions.

For a body of evidence:
Several well-designed and conducted studies that consistently show similar results

  • For therapy, screening, prevention and diagnostic studies: RCTs. In some cases a single, large well-designed and conducted RCT may be sufficient; however, without confirmation from other studies, results could be due to chance, undetected significant biases, fraud, etc. In such an instance the study might receive a Grade A, but the Strength of the Evidence should include a cautionary note.
  • For natural history and prognosis: Cohort studies
Grade B:
Possibly Useful

The evidence appears potentially strong and is probably sufficient to use in making health care decisions - some threats to validity were identified

Advice:
Studies achieving this grade should be of high quality in design, execution and reporting with non-lethal threats to validity and with sufficiently useful information to aid clinical decision-making, enabling reasonable certitude in drawing conclusions.

For a body of evidence:
The evidence is strong enough to conclude that the results are probably valid and useful (see above); however, study results from multiple studies are inconsistent or the studies may have some (but not lethal) threats to validity.

  • For therapy, screening, prevention and diagnostic studies: RCTs. In some cases a single, large well-designed and conducted RCT may be sufficient; however, without confirmation from other studies, results could be due to chance, undetected significant biases, fraud, etc. In such an instance the study might receive a Grade A, but the Strength of the Evidence should include a cautionary note.
  • Also for diagnosis, valid studies assessing test accuracy for detecting a condition when there is evidence of effectiveness from valid, applicable RCTs.
  • For natural history and prognosis: Cohort studies
Grade B-U:
Possible to uncertain usefulness

The evidence might be sufficient to use in making health care decisions; however, there remains sufficient uncertainty that the evidence cannot fully reach a Grade B and the uncertainty is not great enough to fully warrant a Grade U.

Study quality is such that it appears likely that the evidence is sufficient to use in making health care decisions; however, there are some study issues that raise continued uncertainty. Health care decision-makers should be fully informed of the evidence quality.

 

Grade U:
Uncertain Validity and/or Uncertain Usefulness

There is sufficient uncertainty that caution is urged regarding its use in making health care decisions.

  • Uncertain Validity: This may be due to methodology (enough threats to validity to raise concern; our suggestion would be to not use such a study in most circumstances) or to conflicting results.
  • Uncertain Usefulness: This may be due to results (good methodology, but questions about effect size, applicability of results when relating to biologic markers, or other issues). These latter studies may be useful and should be viewed in the context of the weight of the evidence.
  • Uncertain Validity and Usefulness: This is a combination of the above.
  • Uncertainty of Author: If the author has reached a conclusion that the findings are uncertain, doing a critical appraisal is unlikely to result in a different conclusion. The evidence leaves us uncertain regardless of whether the study is valid or not. Critical appraisal is at the discretion of the reviewer.
An example of a critical appraisal report @ The DelfiniClick™ — radiofrequency for the treatment of gastro-esophageal reflux disease example: full story; appraisal only

Intention-to-Treat Analysis — The Biased Case of Migraine

Intention-to-treat (ITT) analysis is an important consideration in randomized, controlled trials. As described in the CONSORT STATEMENT (http://www.consort-statement.org/), among other things, ITT analysis “prevents bias caused by the loss of participants, which may disrupt the baseline equivalence established by random assignment and which may reflect non-adherence to the protocol.”

ITT analysis is defined as follows in the CONSORT STATEMENT:
“A strategy for analyzing data in which all participants are included in the group to which they were assigned, whether or not they completed the intervention given to the group.”

An easy way to tell if an ITT analysis has been done is to look at the number randomized in each group and see if that number is the same number that is analyzed. Number in should be the same number out — in each group as originally randomized.

And, as you can see, determining whether an analysis meets the definition of ITT analysis is incredibly easy. Yet many authors label their analyses as ITT when they are not. In one study of published articles, authors who said they had performed an ITT analysis had not done so 47% of the time (Kruse R, Alper B, et al. Intention-to-treat analysis: Who is in? Who is out? J Fam Pract. 2002 Nov;51(11)).

An article in BMJ dealing with migraine illustrates some important points about ITT analysis and reminds us that authors continue to report outcomes in ways that are highly likely to be biased.

In the Schrader study, 30 patients with migraine were randomized to receive lisinopril and 30 were randomized to placebo. The authors, however, only reported on 55 patients in their so-labeled “intention-to-treat analysis” because of poor compliance. This is not an intention-to-treat analysis.
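
The "number in equals number out" check can be written down in a few lines. This is only a sketch: the function name is hypothetical, and passing the check is necessary but not sufficient for an ITT analysis, since participants must also be analyzed in the group to which they were assigned. Because the text gives only a total number analyzed for the Schrader study, the example compares totals rather than per-group counts.

```python
def numbers_in_equal_numbers_out(randomized, analyzed):
    """Return True only if every randomized group is analyzed in full.

    Both arguments map a group label to a participant count.
    """
    return randomized.keys() == analyzed.keys() and all(
        analyzed[group] == count for group, count in randomized.items()
    )


# Totals taken from the text: 30 + 30 randomized, 55 analyzed.
schrader_randomized = {"all participants": 30 + 30}
schrader_analyzed = {"all participants": 55}
print(numbers_in_equal_numbers_out(schrader_randomized, schrader_analyzed))  # prints False
```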

The following is reported by the authors:

Schrader H, Stovner LJ, Helde G, Sand T, Bovim G. Prophylactic treatment of migraine with angiotensin converting enzyme inhibitor (lisinopril): randomised, placebo controlled, crossover study. BMJ 2001;322:1-5. Article available at http://bmj.bmjjournals.com/cgi/content/full/322/7277/19.
Results
In the 47 participants with complete data, hours with headache, days with headache, days with migraine, and headache severity index were significantly reduced by 20% (95% confidence interval 5% to 36%), 17% (5% to 30%), 21% (9% to 34%), and 20% (3% to 37%), respectively, with lisinopril compared with placebo. Days with migraine were reduced by at least 50% in 14 participants for active treatment versus placebo and 17 patients for active treatment versus run-in period. Days with migraine were fewer by at least 50% in 14 participants for active treatment versus placebo. Intention to treat analysis of data from 55 patients supported the differences in favour of lisinopril for the primary end points. In the intention to treat analysis in 55 patients, significant differences were retained for the primary efficacy end points:
Intention to Treat Analysis—55 Participants with Means (SD)
                  Lisinopril    Placebo      Mean % reduction (95% CI)
Headache hours    138 (130)     162 (134)    15 (0 to 30)
Headache days     20.7 (14)     24.7 (11)    16 (5 to 27)
Migraine days     14.6 (10)     18.7 (9)     22
Conclusion: The angiotensin converting enzyme inhibitor, lisinopril, has a clinically important prophylactic effect in migraine.

The authors have done as their primary analysis an “optimal compliance analysis.” They also state they have done an ITT analysis but they have not.

It is fine to do non-ITT analyses; “as treated” and “completer” analyses are two common ones you will frequently see. But the ITT analysis must be the primary analysis. Others are considered secondary (and should be labeled and treated as such).

And so how does one handle loss to follow-up? There are various methods, but an important principle should guide us: the method should put the burden of proof on the intervention. This is the opposite of our court system; in effect, the intervention is “guilty until proven innocent.” So you assign an outcome to those lost to follow-up that puts the intervention through the toughest test. “Worst-case basis” is one method; “last-observed result” is another.

If you put the intervention through the hardest test, and you still have positive results (assuming the study is otherwise valid), you can feel much more confident about the reported outcomes truly being valid. If the missing subjects in the above-mentioned migraine article are handled this way, there is no statistically significant difference between lisinopril and placebo.

We are frequently asked what is an acceptable percent loss to follow-up. It depends on whether the loss to follow-up will affect the results or not. We have seen what we consider to be important changes even with small numbers lost to follow-up. We recommend that you do sensitivity analyses (“what if”s) to see what the effect might be if you had the data. Without doing an ITT analysis, we are very uncomfortable about the results if five percent or more of subjects have missing data for analyzing endpoints -- and even less than five percent might have impact.
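
One way to run such a "what if" is a worst-case analysis for a yes/no outcome, sketched below. The function names and all counts are hypothetical; none of them come from the migraine study. The worst case for the intervention is taken here to mean that every treated participant lost to follow-up had the bad outcome and every control participant lost to follow-up did not.

```python
from fractions import Fraction


def risk_difference(events_control, n_control, events_treated, n_treated):
    """Control risk minus treated risk for a bad outcome (positive favors treatment)."""
    return Fraction(events_control, n_control) - Fraction(events_treated, n_treated)


def worst_case_sensitivity(randomized, completed, events):
    """Return (completer risk difference, worst-case risk difference).

    Each argument is a dict with keys "treated" and "control".
    Worst case for the intervention: treated participants lost to follow-up
    are counted as having the bad outcome; lost controls are counted as not
    having it.
    """
    lost_treated = randomized["treated"] - completed["treated"]
    completers = risk_difference(
        events["control"], completed["control"],
        events["treated"], completed["treated"],
    )
    worst_case = risk_difference(
        events["control"], randomized["control"],
        events["treated"] + lost_treated, randomized["treated"],
    )
    return float(completers), float(worst_case)


# Hypothetical trial: 100 randomized per arm, 5 lost from each arm.
completers, worst_case = worst_case_sensitivity(
    randomized={"treated": 100, "control": 100},
    completed={"treated": 95, "control": 95},
    events={"treated": 10, "control": 20},
)
print(f"Completer analysis:  {completers:.3f}")
print(f"Worst-case analysis: {worst_case:.3f}")
```

With only five percent lost from each arm, the apparent absolute benefit in this hypothetical trial is roughly halved. This is the kind of change that makes even small losses to follow-up worth testing.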

For those who would like more information, the following article is an excellent one on the subject and is very helpful for understanding issues pertaining to ITT analysis and randomization as well:

Schulz KF, Grimes DA
Sample size slippages in randomised trials: exclusions and the lost and wayward.
The Lancet. Vol 359. March 2, 2002: 781-785
PMID: 11888606

Other reading on ITT analysis is available here.

Very special thanks to Murat Akalin, MD, MPH, UCSD, for selecting a great article for case study, participating in this review, doing the ITT analysis and encouraging us to write this.

Summary Points About Harms

Safety issues concern risks and harms, which are events that cause problems with meaningful outcomes (morbidity, mortality, quality of life, functioning) or cause other unwanted effects.

  • The terms “safety, risk, harm, adverse event, adverse effect, ADE” are often used interchangeably
  • We sometimes distinguish risk from harm

Harms are infrequent, hard to find and usually not the topic of study; study duration may be too short, or the population studied too small. They are often reported from weaker science such as case report data, database research, observational studies or low quality RCTs. (Reminder: Cause and effect can only be concluded from valid RCTs.) Safety data are, therefore, usually not strong, and findings are often likely due to chance. When good interventions are no longer available due to poor safety information, which could be inaccurate, patients may be harmed.

Clinical trials tell us much about efficacy but little about harms because —

  • Study populations are carefully selected and frequently have only one disease; frequently exclude pregnant women, children, elderly.
  • Rare events are difficult to find in RCTs because of small sample sizes.
  • Harms from long-term use of a drug are usually not known because trials are of short duration.
  • Drug interactions are more likely in the real world than in RCTs.

Generally, there is no easy way to find information regarding harms.

  • Drug companies do not have a large interest in studies looking at long-term harms.
  • The current “system” is based on voluntary reporting.
  • Case reports (letters, short reports) are the main source of information on harms.
  • There are national and international monitoring centers, e.g., FDA. However, a problem with FDA data is that many (most?) reports do not establish cause-and-effect relationships.

Searching for harms using PubMed —

    • Large RCTs should be sought, but they are problematic if harms are rare or occur late. Also look for long-term follow-up of RCTs.
    • Systematic reviews of RCTs dealing with harms should be sought, but harms may not be detected if some of the included trials do not report harms or if harms are described in various ways in different studies. Therefore, in some cases systematic reviews may falsely indicate lack of harms that are subsequently detected in large, well-designed and conducted RCTs.
    • Search for case-control and cohort studies, keeping in mind that observational studies are prone to bias.
    • Key words: “adverse effects, adverse events, adverse reactions, adverse reaction monitoring, ADR, pharmacovigilance.”

Strategies we employ for approaching harms —

  • Review multiple studies
  • Note whether there is support (e.g., biologic plausibility, relatedness in outcomes, dose-response relationship)
  • Review confidence intervals (CIs) for non-significant findings to discern whether a clinically meaningful effect remains possible within the interval
  • Consider applying composite endpoints
  • Review the exclusions. Exclusion of patients otherwise likely to experience side effects may affect the generalizability of reported side-effect results (e.g., this may happen if patients are restricted to those who are not naïve, or may occur through a run-in and exclusion period).
    • Caution: if subjects who have experienced or are likely to experience adverse outcomes known to be associated with the intervention being studied are excluded, then search for condition, intervention and adverse outcome (and potentially comparator) without limits — but potentially narrowing for cohort. It might be reasonable to limit review to recent studies as they probably have a discussion of prior findings.
  • Apply cautious wording with caveats
  • Clinicians are urged to follow FDA recommendations
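
The confidence-interval check in the list above can be sketched in a few lines. This is a minimal illustration, assuming a simple Wald (normal-approximation) 95% interval for a risk difference. The event counts and the 1% threshold for a clinically meaningful harm are invented for illustration and do not come from any study.

```python
import math


def risk_difference_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Risk difference (treatment minus control) with a Wald CI."""
    p_tx = events_tx / n_tx
    p_ctl = events_ctl / n_ctl
    diff = p_tx - p_ctl
    se = math.sqrt(p_tx * (1 - p_tx) / n_tx + p_ctl * (1 - p_ctl) / n_ctl)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: 12/500 harmed on treatment, 8/500 on control.
diff, lower, upper = risk_difference_ci(12, 500, 8, 500)

# Hypothetical threshold: an absolute increase of 1% would matter clinically.
MEANINGFUL_HARM = 0.01

not_significant = lower <= 0 <= upper
harm_not_ruled_out = upper >= MEANINGFUL_HARM

print(f"Risk difference {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
if not_significant and harm_not_ruled_out:
    print("Non-significant, but a clinically meaningful harm is not excluded.")
```

With these invented counts the interval crosses zero, yet its upper bound exceeds the chosen threshold, so "no significant difference" cannot be read as "no harm."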

References:

  • BMJ. 2004;329:2-3.
  • BMJ. 2004;329:44-47.


Flowing Evidence into Cost Analysis: Powerful Ways with NNT

This example serves several purposes.

It illustrates the critical importance of including time period in NNT in order to better understand efficacy.

It shows how you can use NNT in an effectiveness and cost analysis.

1. Read the fictional scenario below. This is typical of the kinds of issues faced by doctors, P & T committees, QI staff, etc., daily.

2. Think about your reaction.

3. Then click on the icon to see how important NNT can be. Remember, NNT must always be associated with the length of time of the study. This example really drives this home.

Fictional Scenario

  • Lifetime risk of hip fracture in women is 15% with significant mortality (20-30% of women die in the first year following hip fracture).
  • HRT is now found to have many risks. Other fracture prevention drugs have risks. There is a new (fictional) drug on the market that has fewer risks and that many docs are starting to use on high and moderate risk women. The drug is getting a lot of press attention and has good evidence behind it.
  • Many women in your organization are requesting information and treatment for prevention of fractures. Many are asking about this new drug.
  • One year of treatment with the new drug is cheaper than one year of alendronate.


Should you treat these women with this new drug? How do you decide? Click on the question mark below to see an example of how you can address questions like this one:

?
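
As a rough sketch of the arithmetic involved, the snippet below computes an NNT from an absolute risk reduction and carries the study period through to a drug cost per fracture prevented. The scenario above gives no efficacy or price figures, so every number here (risks, study length, annual cost) is invented purely for illustration, and the function names are hypothetical.

```python
def nnt(control_risk, treated_risk):
    """Number needed to treat = 1 / absolute risk reduction (ARR)."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("No absolute risk reduction; NNT is undefined.")
    return 1 / arr


def cost_per_event_prevented(nnt_value, years, annual_cost):
    """Drug cost of treating NNT patients for the whole study period."""
    return nnt_value * years * annual_cost


# All numbers below are invented for illustration only.
STUDY_YEARS = 5
control_risk = 0.04   # hypothetical 5-year hip fracture risk, untreated
treated_risk = 0.02   # hypothetical 5-year hip fracture risk, treated
annual_cost = 600.0   # hypothetical drug cost per patient per year

n = nnt(control_risk, treated_risk)
cost = cost_per_event_prevented(n, STUDY_YEARS, annual_cost)

print(f"NNT = {n:.0f} women treated for {STUDY_YEARS} years")
print(f"Drug cost per fracture prevented = ${cost:,.0f}")
```

The same NNT reported over 1 year instead of 5 would imply one fifth of the drug cost per fracture prevented, which is why the time period has to travel with the number.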

Diagnostic Testing & Measures of Test Function

The bottom line is that there are special considerations for critically appraising studies of diagnostic tests in addition to the usual considerations of study validity, clinical relevance, applicability and usability. Measures of test function, also known as indices of accuracy, help determine the accuracy and usefulness of diagnostic tests. They are:

Sensitivity, Specificity, Positive predictive value, Negative predictive value, Positive likelihood ratio, Negative likelihood ratio, Post-test odds, Post-test probabilities, Number-needed-to-diagnose.
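
For readers who want to see how these measures relate, here is a minimal sketch that derives them from a 2x2 table using the standard textbook definitions. The function name `measures_of_test_function` and the counts are hypothetical, and the sketch does not handle empty cells (for example, a specificity of exactly 1 would cause a division by zero).

```python
def measures_of_test_function(tp, fp, fn, tn, pretest_probability=None):
    """Compute common indices of accuracy from a 2x2 table.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives and true negatives against the reference standard.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    results = {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        # Number needed to diagnose = 1 / (sensitivity - (1 - specificity))
        "nnd": 1 / (sensitivity - (1 - specificity)),
    }
    if pretest_probability is not None:
        # Post-test odds after a positive result = pre-test odds x LR+
        pretest_odds = pretest_probability / (1 - pretest_probability)
        post_odds = pretest_odds * results["lr_positive"]
        results["post_test_odds_positive"] = post_odds
        results["post_test_probability_positive"] = post_odds / (1 + post_odds)
    return results


# Hypothetical 2x2 table: 90 TP, 30 FP, 10 FN, 270 TN.
m = measures_of_test_function(90, 30, 10, 270, pretest_probability=0.25)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this invented table the pre-test probability of 0.25 equals the prevalence in the table itself, so the post-test probability after a positive result matches the positive predictive value.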

However, there are some special challenges surrounding these measures, which we describe below. You can download this one-pager here [PDF].

Evaluating Diagnostic Tests: Challenges with Measures of Test Function

 

Key Points

  • A goal of diagnostic testing is to reduce diagnostic uncertainty. Yet there is usually uncertainty associated with diagnostic testing itself.
  • The obvious question in diagnostic testing is, “Does the patient have this condition, disorder or disease?” An equally important question, however, is, “Will the patient experience improved outcomes if the condition is detected?” Thus diagnostic testing requires considering both test accuracy and the evidence about clinical outcomes from various interventions.
  • Evaluating a diagnostic test usually entails making a comparison when one is available. Yet, there are often problems with making test comparisons. And there are problems when there is no comparator.
  • “Measures of Test Function” evaluate the accuracy and prediction capabilities of diagnostic tests. There are often problems with Measures of Test Function.

Consequently there are special considerations for critically appraising studies of diagnostic tests in addition to usual considerations of study validity, clinical relevance, applicability and usability.

 

How much uncertainty is associated with this test?

Uncertainty can rarely be eliminated due to –

a) Uncertainty about what equates with a meaningful result since assignment of normal and abnormal values is usually arbitrary when dealing with a range, and “normal/abnormal” may not equate with being disease-free or having a disease;

b) Trade-offs between sensitivity and specificity – Setting the cut-off to identify more patients with the disorder will almost always yield more false positives. For example, if you set the cut-off for an abnormal fasting blood sugar at a low level to identify more diabetics (higher sensitivity), you will pay the price of including more non-diabetics as well (lower specificity and a higher false positive rate).

c) Variations in the test’s accuracy and precision, its application, its predictive capabilities, and/or its interpretation;

d) Variations in how values might vary within an individual, a population or within different populations – including assumptions about who is and who is not disease-free, and variations in disease spectrum such as early to late disease, or mild or severe disease, or rate of disease progression;

e) Sometimes having a test for an intermediate outcome, but not having good information available about whether treatment of the intermediate outcome is actually associated with meaningful clinical outcomes (e.g., PVCs following an MI indicate higher risk for cardiac mortality, but does treating them reduce the mortality risk?);

f) Frequently needing to choose a less accurate method due to cost or risk (e.g., chest x-ray vs lung biopsy).

Consequently, the uncertainty surrounding the diagnostic test must be evaluated.
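
The trade-off between sensitivity and specificity in item b) can be made concrete with a small sketch. The fasting blood sugar values below are invented, not real patient data, and the three cut-offs are chosen only to show the pattern.

```python
# Hypothetical fasting blood sugar values (mg/dL); not real patient data.
diabetic = [105, 112, 118, 124, 131, 140, 152, 168]
non_diabetic = [82, 88, 91, 95, 99, 103, 108, 115]


def sens_spec(cutoff):
    """Classify values at or above the cutoff as abnormal."""
    true_pos = sum(v >= cutoff for v in diabetic)
    true_neg = sum(v < cutoff for v in non_diabetic)
    return true_pos / len(diabetic), true_neg / len(non_diabetic)


for cutoff in (126, 110, 100):
    sensitivity, specificity = sens_spec(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sensitivity:.2f}, "
          f"specificity {specificity:.2f}")
```

In this invented sample, lowering the cut-off from 126 to 100 raises sensitivity from 0.50 to 1.00 while specificity falls from 1.00 to about 0.63.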

 

Is the comparison valid?

  • Frequently there is no single, accurate test for diagnosis. For example the diagnosis of rheumatoid arthritis involves history, physical exam plus laboratory testing.
  • Often there is no way that is 100% accurate to establish a diagnosis. Comparing a new diagnostic test or procedure to an inaccurate “standard” may make it seem that the new method is in error even if it is actually better than the current “standard.”
  • Are patients consecutive and representative of the population to which the test will be applied?
  • Are study investigators blinded to the results of the gold standard and the test being evaluated?
  • It is rare that a test is both highly sensitive and highly specific, which can make it difficult to find a perfect gold standard.
  • Often good information about negative tests is lacking since patients with negative tests are generally not subsequently exposed to further invasive or uncomfortable testing. Therefore, it is often unknown whether negative results are valid.

Consequently, the comparison method must be evaluated.

 

Measures of Test Function – Is the test accurate and predictive?

Results for measures of test function may be misleading depending on the population used to make those calculations.

  • Calculations are often based not on prevalence within a community, but on prevalence in the pool tested.
  • The test's reported sensitivity can be misleading if the sensitivity is determined in a patient population that is different from where the test is applied. For example, the sensitivity of CPK for an MI may be overstated if it is determined using CCU patients, but the test is used in general hospital admissions. This is due to the greater severity of disease in the CCU, which may result in a greater likelihood that those with the disease test positive. Conversely, a general population includes more people presenting earlier in the course of the disease. These people with early disease are more likely to have lower CPK levels, resulting in a lower sensitivity for the test.
  • Prevalence and severity of disease are often higher in academic settings than in the general population, and sensitivity may be higher than if the study or test had been performed in a “usual care” setting.

Consequently, variables like age and gender in the study subjects must be evaluated, along with severity, stage and duration of disease to ensure that various stages of disease have been studied, to determine if the measures of test function are appropriate for your population.
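
To illustrate why the population matters, the sketch below applies Bayes' theorem to one hypothetical test at two hypothetical prevalences. It assumes, for simplicity, that sensitivity and specificity stay fixed across settings. As noted above, they may also shift with disease severity, which would widen the gap further.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence via Bayes' theorem."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test: 90% sensitive and 90% specific.
# Hypothetical prevalences: 50% in a referral pool, 2% in the community.
for setting, prevalence in (("pool tested", 0.50), ("community", 0.02)):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"{setting}: prevalence {prevalence:.0%}, "
          f"PPV {ppv:.2f}, NPV {npv:.2f}")
```

With these assumed figures, a positive result is right about 9 times in 10 in the tested pool but only about 1 time in 6 in the community.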

Performance Measures — Quick Tips Checklist
We have an entire training program and tool for this work, but we list our quick tips here.

See also Delfini Showcase: Publications — Performance Measures

Steps for Quality Improvement Project Design
Steps I. through IV. Help for Selecting Good Projects
Step I. Do you have a gap between current & optimal care? Apply considerations for determining importance of area for clinical improvement.
Step II. What will close the gap and improve quality? Search for valid and useful evidence for quality improvement effort. If none available and this is for an intervention, STOP. What will be your quality improvement?

Step III. Is attempting the improvement feasible in your environment?

  • Are you going to be able to successfully make clinical practice change happen?
  • Are resources available to support the initiative?

Step IV. Can you measure it?

  • Is your measure quantifiable?
  • Is your measure valid, accurate and dependable?
  • Is the measure useful and usable?
    • Comprehensible
    • Assists with QI projects
  • Is measurement achievable in your local circumstances?

Table: Measure Name/Descriptor/Validity Consideration, with Example

  • Numerator = what you are counting = validity ideally based on valid, useful evidence. Example: An Rx for ACEIs
  • Denominator = the pool for the count = validity based on inclusions and exclusions. Example: In unexcluded patients with CHF admitted to a hospital
  • Frequency = time interval for the occurrence (e.g., performance or process) = validity ideally based on valid, useful evidence. Example: By the time of hospital discharge

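
The numerator, denominator and frequency in the example can be sketched as a small calculation. This is only an illustration: the `Admission` record, its field names and the five sample records are hypothetical, and a real measure would need explicit, evidence-based exclusion criteria.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    """Hypothetical record for one CHF hospital admission."""
    patient_id: str
    excluded: bool              # e.g., a documented contraindication
    acei_rx_by_discharge: bool  # frequency: by the time of discharge


def acei_measure(admissions):
    """Return (numerator, denominator, rate) for the example measure."""
    eligible = [a for a in admissions if not a.excluded]    # denominator
    met = [a for a in eligible if a.acei_rx_by_discharge]   # numerator
    rate = len(met) / len(eligible) if eligible else None
    return len(met), len(eligible), rate


# Invented records for illustration only.
records = [
    Admission("A", excluded=False, acei_rx_by_discharge=True),
    Admission("B", excluded=False, acei_rx_by_discharge=False),
    Admission("C", excluded=True, acei_rx_by_discharge=False),
    Admission("D", excluded=False, acei_rx_by_discharge=True),
    Admission("E", excluded=False, acei_rx_by_discharge=True),
]

numerator, denominator, rate = acei_measure(records)
print(f"{numerator}/{denominator} = {rate:.0%}")
```

Excluded patients are removed before counting, so they affect neither the numerator nor the denominator.
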
Steps V. through VIII. Help for Applying Performance Measures
Step V. How are you going to gather the data to measure the improvement?
Step VI. What is the meaning of your measurement, i.e., what goal will you set to define “improvement?”
Step VII. How are you going to report it and to whom?
Step VIII. What is your process for updating your improvement?
Research Searching Tips

Caveats abound! I was asked to lead a session on research searching at UCSD. I make no claims to the utility or completeness of this document — plus this is specifically geared to UCSD faculty, staff and students. But in case this is useful to someone else, here 'tis. [PDF] Sheri


© Delfini Group, LLC, 2002-2010. All Rights Reserved Worldwide.
Use of this website implies your agreement to our Notices.
