Volume — Quality of Evidence: Primary Studies & General Concepts
Newest
02/01/2015: Progression Free Survival (PFS) in Oncology Trials
10/06/2014: Comparison of Risk of Bias Ratings in Clinical Trials—Journal Publications Versus Clinical Study Reports
06/18/2014: Comparative Study Designs: Claiming Superiority, Equivalence and Non-inferiority—A Few Considerations & Practical Approaches
01/14/2014: Attrition Bias Update
Basics For Evaluating Medical Research Studies:
A Simplified Approach
And Why Your Patients Need You To Know This
Delfini Group Evidence-based Practice Series
Short How-to Guide Book
"Best help with evidence-based medicine available."
Marty Gabica, MD, Chief Medical Officer, Healthwise
This book explains how to evaluate the reliability and clinical usefulness of clinical trials. Written for physicians and other health care professionals, it is presented in easy-to-understand terms that even a layperson can understand and put to use. Now available for purchase.
Contents
- See also Delfini Commentary on "Lies, Damned Lies, and Medical Science," by David H. Freedman, The Atlantic, November 2010, including a link to the article and our published Letter to the Editor, "How to Evaluate Medical Science," The Atlantic, January/February 2011
- Quality of Studies: Lower Quality = Greater Effect Size
- 5 “A”s of Evidence-based Medicine & PICOTS: Using “Population, Intervention, Comparison, Outcomes, Timing, Setting” (PICOTS) In Evidence-Based Quality Improvement Work
- Comparison of Risk of Bias Ratings in Clinical Trials—Journal Publications Versus Clinical Study Reports
- Must Clinical Trials be Randomized? A Look at Minimization Methods
- Advice On Some Quasi-Experimental Alternatives To Randomization
- Concealment of Allocation
- Blinding and RCTs
- Blinding and Objective Outcomes
- Blinding in Surgery Trials
- The Importance of Blinded Assessors in RCTs
- Testing the Success of Blinding
- Time-related Biases Including Immortality Bias
- Empirical Evidence of Attrition Bias in Clinical Trials
- Attrition Bias: Intention-to-Treat Basics
- Loss to Follow-up Update
- Intention-to-Treat Analysis & the Effects of Various Methods of Handling Missing Subjects: The Case of the Compelling Rationale
- See also Bell's Palsy Update for the addition of instructional slides demonstrating how to perform a conservative intention-to-treat (ITT) analysis. [This is a primary studies issue, but it is stored in secondary studies for historical reasons concerning our initial commentary on a narrative review.]
- Intention-to-Treat & Imputing Missing Variables: Last-Observation-Carried-Forward (LOCF)—When We Might Draw Reasonable Conclusions
- Intention-to-Treat Analysis & Censoring: Rofecoxib Example
- Intention-to-Treat Analysis: Misreporting and Migraine
- Missing Data Points: Difference or No Difference
- Quality of Studies: VIGOR
- Avoiding Overestimates of Benefit: Composite Endpoints in Cardiovascular Trials
- Confidence Intervals, Power & Meaningful Clinical Benefit
- Why Statements About Confidence Intervals Often Result in Confusion Rather Than Confidence
- Confidence Intervals: Overlapping Confidence Intervals—A Clarification
- Primary and Secondary Outcomes: Significance Issues
- Progression Free Survival (PFS) in Oncology Trials
- Adjusting for Multiple Comparisons
- Safety Issues
- When Is a Measure of Outcomes Like a Coupon for a Diamond Necklace?
- Understanding Number Needed to Treat (NNT)
- Obtaining Absolute Risk Reduction (ARR) and Number Needed To Treat (NNT) From Relative Risk (RR) and Odds Ratios (OR) Reported in Systematic Reviews
- Early Discontinuation of Clinical Trials: Oncology Medication Studies—Recent Developments and Concern
- Early Termination of Clinical Trials—2012 Update
- Advanced Concepts: Can Useful Information Be Obtained From Studies With Significant Threats To Validity? A Case Study of Missing Data Points in Venous Thromboembolism (VTE) Prevention Studies & A Case Study of How Evidence from One Study Might Support Conclusions from a Flawed Study
- Review of Bias In Diabetes Randomized Controlled Trials
- Comparative Study Designs: Claiming Superiority, Equivalence and Non-inferiority—A Few Considerations & Practical Approaches
- Are Adaptive Trials Ready For Primetime?
Go to DelfiniClick™ for all volumes.
Quality of Studies: Lower Quality = Greater Effect Size
The quality of studies in systematic reviews and meta-analyses has repeatedly been shown to affect the amount of benefit reported. This DelfiniClick is a quick reminder that just because a study is an RCT does not mean it will provide you with a reliable estimate of effect size. A nice illustration of this point is provided in a classic article by Moher D et al. (Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 1998; 352: 609–13)[1].
In this study, the authors randomly selected 11 meta-analyses that involved 127 RCTs on the efficacy of interventions used for circulatory and digestive diseases, mental health, and pregnancy and childbirth. The authors evaluated each RCT by examining the description of randomization, allocation concealment, blinding, and dropouts and withdrawals.
The results are in line with other authors’ findings regarding quality of methods and amount of benefit (effect size), reported as relative measures below:
- The quality of trials was low overall.
- Low-quality trials, compared with high-quality trials (score >2), were associated with an increased estimate of benefit of 34%.
- Trials that used inadequate allocation concealment, compared with those that used adequate methods, were also associated with an increased estimate of benefit (37%).
- The average treatment benefit was 39% for all trials, 52% for low-quality trials, and 29% for high-quality trials.
The authors conclude that including studies of low methodological quality in meta-analyses can alter the interpretation of the benefit of the intervention.
We continue to see this problem in systematic reviews and clinical guidelines and suggest that, when evaluating secondary studies, readers pay close attention to the quality of included studies.
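To see how trial quality can shift a pooled result, here is a minimal sketch of fixed-effect inverse-variance pooling on the log-odds scale. The trial values are hypothetical illustrations, not Moher's data; the point is only that including lower-quality trials pulls the pooled odds ratio toward greater apparent benefit.

```python
import math

# Hypothetical trials: (log odds ratio, standard error, quality rating).
# Purely illustrative values, not taken from Moher et al.
trials = [
    (-0.70, 0.25, "low"),
    (-0.55, 0.30, "low"),
    (-0.30, 0.20, "high"),
    (-0.25, 0.22, "high"),
]

def pooled_odds_ratio(subset):
    """Fixed-effect inverse-variance pooling on the log-odds scale."""
    weights = [1 / se ** 2 for _, se, _ in subset]
    pooled_log_or = sum(w * lor for w, (lor, _, _) in zip(weights, subset)) / sum(weights)
    return math.exp(pooled_log_or)

print("All trials pooled: OR =", round(pooled_odds_ratio(trials), 2))   # ~0.66
print("High quality only: OR =",
      round(pooled_odds_ratio([t for t in trials if t[2] == "high"]), 2))  # ~0.76
```

With these invented numbers, adding the two low-quality trials moves the pooled odds ratio from about 0.76 to about 0.66, i.e., toward a larger apparent benefit, which is the pattern Moher and colleagues reported.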
[1] Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998 Aug 22;352(9128):609-13. PubMed PMID: 9746022.
Overestimation of Effect Size in Studies of Low Quality
Updated 02/11/2013
In a previous DelfiniClick, we summarized an article by Moher and colleagues (1) in which the authors randomly selected 11 meta-analyses involving 127 RCTs which evaluated the efficacy of interventions used for circulatory and digestive diseases, mental health, and pregnancy and childbirth. Moher and colleagues concluded that:
- Low-quality trials, compared with high-quality trials (score >2), were associated with a relative increased estimate of benefit (34%).
- Trials that used inadequate allocation concealment, compared with those that used adequate methods, were associated with a relative increased estimate of benefit (37%).
Below we summarize another study that confirms and expands Moher’s findings. In a study similar to Moher’s, Kjaergard and colleagues (2) evaluated the effects of methodologic quality on estimated intervention effects in randomized trials.
The study evaluated 23 large and 167 small randomized trials with a total of 136,164 participants. Methodologic quality was defined as the confidence that the trial’s design, conduct, analysis, and presentation minimized or avoided biases in the trial’s intervention comparisons (3). The reported methodologic quality was assessed using four separate components and a composite quality scale.
The quality score was ranked as low (≤2 points) or high (≥3 points), as suggested by Moher et al. (1). The four components were (1) generation of allocation sequence; (2) concealment of allocation; (3) double-blinding; and (4) reporting of loss-to-follow-up.
RESULTS OF KJAERGARD ET AL.’S REVIEW (all reported exaggerations are relative increases):
Generation of Allocation Sequence
The odds ratios generated by all trials (large and small) with inadequate generation of the allocation sequence were on average significantly exaggerated by 51% compared with all trials reporting adequate generation of the allocation sequence (ratio of odds ratios (95% CI) = 0.49 (0.30–0.81), P < 0.001).
Concealment of Allocation
All trials with inadequate allocation concealment exaggerated intervention benefits by 40% compared with all trials reporting adequate allocation concealment (ratio of odds ratios (95% CI) = 0.60 (0.31–1.15), P = 0.12). Odds ratios were significantly exaggerated by 52% in small trials with inadequate versus adequate allocation concealment (ratio of odds ratios (95% CI) = 0.48 (0.25–0.92), P = 0.027).
Double Blinding
The odds ratios generated by all trials without double blinding were significantly exaggerated by 44% compared with all double-blind trials (ratio of odds ratios (95% CI) = 0.56 (0.33–0.98), P = 0.041).
Reporting of Loss-to-Follow-up
The analyses showed no significant association between reported follow-up and estimated intervention effects (ratio of odds ratios (95% CI) = 1.50 (0.80–2.78), P = 0.2).
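As a quick check on the figures above, the "percent exaggeration" follows directly from each ratio of odds ratios as (1 - ROR) x 100. A minimal sketch:

```python
# Ratios of odds ratios quoted in the Kjaergard summary above.
rors = {
    "inadequate sequence generation (all trials)": 0.49,
    "inadequate allocation concealment (all trials)": 0.60,
    "inadequate allocation concealment (small trials)": 0.48,
    "no double blinding (all trials)": 0.56,
}
for threat, ror in rors.items():
    # (1 - ROR) * 100 reproduces the 51%, 40%, 52%, and 44% figures.
    print(f"{threat}: benefit exaggerated by {(1 - ror) * 100:.0f}%")
```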
Kjaergard and Colleagues’ Conclusions
- Adequate generation of the allocation sequence and adequate allocation concealment should be required for adequate randomization. Unlike previous investigators (1,3,4,5), the authors found that trials with inadequate generation of allocation sequence exaggerate intervention effects significantly.
- Trials with inadequate allocation concealment also generate exaggerated results. This is in accordance with previous evidence (1,3,5). The authors found that, despite the considerable overlap between generation of allocation sequence and allocation concealment, both factors may independently affect the estimated intervention effect.
- Trials without double blinding exaggerate results. This study supports Schulz and colleagues’ finding of a significant association between intervention effects and double blinding and extends the evidence by including trials from several therapeutic areas.
- There was no association between reported follow-up and intervention effect.
Delfini Comment
It is useful to know quantitatively how various threats to validity affect results when doing critical appraisal of a study. The study by Kjaergard and colleagues summarized above expands the findings of Schulz, Moher, Juni and others.
Previous studies have questioned the reliability of reported losses to follow-up (5, 6). In accordance with Schulz and colleagues’ results (5), the authors found no association between intervention effects and reported follow-up.
In agreement with the findings of Moher and associates (1,3) and Juni and colleagues (7), the authors found that trials with a low quality score on the scale developed by Jadad and colleagues (8) significantly exaggerate intervention benefits.
Kjaergard and colleagues conclude that assessment of methodologic quality should focus on generation of allocation sequence, allocation concealment, and double blinding. Delfini feels this is not sufficient, but appreciates this study as one that further demonstrates the importance of effective approaches to some of these methodologic areas.
References
1. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352:609-13. PMID: 9746022
2. Kjaergard LL, Villumsen J, Gluud C. Reported Methodologic Quality and Discrepancies between Large and Small Randomized Trials in Meta-Analyses. Ann Intern Med. 2001;135:982-989. PMID: 11730399
3. Moher D, Cook DJ, Jadad AR, Tugwell P, Moher M, Jones A, et al. Assessing the quality of reports of randomised trials: implications for the conduct of meta-analyses. Health Technol Assess. 1999;3:i-iv, 1-98. PMID: 10374081
4. Emerson JD, Burdick E, Hoaglin DC, Mosteller F, Chalmers TC. An empirical study of the possible relation of treatment differences to quality scores in controlled randomized clinical trials. Control Clin Trials. 1990;11:339-52. PMID: 1963128
5. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273:408-12. PMID: 7823387
6. Gøtzsche PC. Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal antiinflammatory drugs in rheumatoid arthritis. Control Clin Trials. 1989;10:31-56. PMID: 2702836
7. Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999;282:1054-60. PMID: 10493204
8. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17:1-12. PMID: 8721797
5 “A”s of Evidence-based Medicine & PICOTS: Using “Population, Intervention, Comparison, Outcomes, Timing, Setting” (PICOTS) In Evidence-Based Quality Improvement Work
09/25/2012
Much of what we do when answering key clinical questions can be summarized using the 5 “A” EBM Framework—Ask, Acquire, Appraise, Apply and "A"s Again.[1] Key clinical questions create the focus for the work and, once created, drive the work or project. In other words, the 5 “A”s form a scaffolding for us to use in doing EB quality improvement work of many types.
When healthcare professionals look to the medical literature for answers to various clinical questions or when planning comparative reviews, they frequently utilize checklists which employ the mnemonics PICO (population, intervention, comparison, outcome)[2], PICOTS (same as PICO with the addition of timing and setting), or, less frequently, PICOT-SD (which also includes study design).[3] PICOTS (patient population, intervention, comparison, outcomes, timing and setting) is a checklist that can remind us of important considerations in all of the 5 "A" areas.
PICOTS in Forming Key Clinical Questions and Searching
PICOTS is a useful framework for constructing key questions, but it should be applied thoughtfully, because not all PICOTS elements are needed for every useful clinical question. For example, if I am interested in the evidence regarding prevention of venous thromboembolism in hip replacement surgery, I would want to include the population and study design and perhaps key outcomes, but I would not want to limit the question to any specific interventions in case there are some useful interventions of which I am not aware. So the question might be, “What is the evidence that thromboembolism or deep vein thrombosis (DVT) prophylaxis with various agents reduces mortality and clinically significant morbidity in hip replacement surgery?” In this case, I was somewhat specific about P (the patient population—which frequently is the condition of interest—in this case, patients undergoing hip replacement surgery), less specific about O (mortality and morbidities) and not specific about I and C.
I could be even more specific about P if I specified patients at average risk for VTE or only patients at increased risk. If I were interested in the evidence about the effect of glycemic control on important outcomes in type 2 diabetes, I might pose the question as, “What is the effect of tight glycemic control on various outcomes?” and type in the terms “type 2 diabetes” AND “tight glycemic control,” which would not limit the search to particular outcomes and so would retrieve studies reporting outcomes of which I was unaware.
Learners are frequently taught to use PICO when developing search strategies. (When actually conducting a search, we use "condition" and not "population" because the condition is more likely to activate the MeSH headings in PubMed, which produces a search with key synonyms.) As illustrated above, the PICO elements chosen for the search should frequently be limited to P (the patient population or condition) and I so as to capture all outcomes that have been studied. Therefore, it is important to remember that many of your searches are best done using only one or two elements, along with SD limits such as for clinical trials, in order to increase the sensitivity of your search.
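As one way to script this kind of two-element search, here is a minimal sketch using Biopython's Entrez module against PubMed; the email address, query terms, and retmax value are illustrative assumptions (Biopython must be installed), and the publication-type tag stands in for the study-design limit discussed above.

```python
from Bio import Entrez  # Biopython

Entrez.email = "you@example.org"  # NCBI asks for a contact address; placeholder

# P ("type 2 diabetes" as the condition) and I ("tight glycemic control"),
# with a publication-type limit serving as the study-design (SD) filter.
query = ('"type 2 diabetes" AND "tight glycemic control" '
         'AND randomized controlled trial[pt]')

handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
record = Entrez.read(handle)
handle.close()
print(record["Count"], "hits; first PMIDs:", record["IdList"])
```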
PICOTS in Assessing Studies for Validity and Synthesizing Evidence
When critically appraising studies for reliability or synthesizing evidence from multiple studies, PICOTS reminds us of the areas where heterogeneity is likely to be found. PICOTS is also useful in comparing the relevance of the evidence to our population of interest (external validity) and in creating decision support for various target groups.
PICOTS in Documenting Work
Transparency can be made easier by using PICOTS when documenting our work. You will notice that many tables found in systematic reviews and meta-analyses include PICOTS elements.
References
1. Modified by Delfini Group, LLC (www.delfini.org) from Leung GM. Evidence-based practice revisited. Asia Pac J Public Health. 2001;13(2):116-21. Review. PubMed PMID: 12597509.
2. Guyatt GH, Oxman AD, Kunz R, Atkins D, Brozek J, Vist G, Alderson P, Glasziou P, Falck-Ytter Y, Schünemann HJ. GRADE guidelines: 2. Framing the question and deciding on important outcomes. J Clin Epidemiol. 2011 Apr;64(4):395-400. Epub 2010 Dec 30. PubMed PMID: 21194891.
3. Methods Guide for Effectiveness and Comparative Effectiveness Reviews. AHRQ Publication No. 10(12)-EHC063-EF. Rockville, MD: Agency for Healthcare Research and Quality. April 2012. Chapters available at: www.effectivehealthcare.ahrq.gov.
Comparison of Risk of Bias Ratings in Clinical Trials—Journal Publications Versus Clinical Study Reports
10/06/2014
Many critical appraisers assess bias using tools such as the Cochrane risk of bias tool (Higgins 2011) or tools freely available from us (http://www.delfini.org/delfiniTools.htm). Internal validity is assessed by evaluating important items such as generation of the randomization sequence, concealment of allocation, blinding, attrition and assessment of results.
Jefferson et al. recently compared the risk of bias in 14 oseltamivir trials using information from previous assessments based on the study publications and the newly acquired, more extensive clinical study reports (CSRs) obtained from the European Medicines Agency (EMA) and the manufacturer, Roche.
Key findings include the following:
- Evaluations using more complete information from the CSRs resulted in no change in the number of previous "high" risk of bias assessments.
- However, over half (55%, 34/62) of the previous "low" risk of bias ratings were reclassified as "high."
- Most of the previous "unclear" risk of bias ratings (67%, 28/32) were changed to "high" risk of bias ratings when CSRs were available.
The authors discuss the idea that the risk of bias tools are important because they facilitate the process of critical appraisal of medical evidence. They also call for greater availability of the CSRs as the basic unit available for critical appraisal.
Delfini Comment
We believe that both sponsors and researchers need to provide more study detail so that critical appraisers can provide more precise ratings of risk of bias. Study publications frequently lack information needed by critical appraisers.
We agree that CSRs should be made available so that critical appraisers can use them to improve their assessments of clinical trials. However, our experience has been the opposite of the authors’. When companies have invited us to work with them to assess the reliability of their studies and have made CSRs available to us, we have frequently found important information not otherwise available in the study publication. When this happens, studies otherwise given a rating at higher risk of bias have often been determined to be at low risk of bias and of high quality.
References
1. Higgins JP, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, Savovic J, Schulz KF, Weeks L, Sterne JA; Cochrane Bias Methods Group; Cochrane Statistical Methods Group. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011 Oct 18;343:d5928. doi: 10.1136/bmj.d5928. PubMed PMID: 22008217.
2. Jefferson T, Jones MA, Doshi P, Del Mar CB, Hama R, Thompson MJ, Onakpoya I, Heneghan CJ. Risk of bias in industry-funded oseltamivir trials: comparison of core reports versus full clinical study reports. BMJ Open. 2014 Sep 30;4(9):e005253. doi: 10.1136/bmjopen-2014-005253. PubMed PMID: 25270852.
Must Clinical Trials be Randomized? A Look at Minimization Methods
07/20/2010
In clinical trials, any difference between groups, except for what is being studied, could explain or distort the study results. In randomized clinical trials (RCTs), the purpose of randomization is to distribute people into study groups in such a way that prognostic variables are evenly distributed. Thus, the goal of the randomization process in RCTs is to generate study groups with similar known and unknown prognostic variables so that the groups being compared have similar baseline characteristics. Randomization is very likely to achieve balanced groups, especially in large trials. Adequate simple or unrestricted randomization is achieved by generating random number sequences and concealing the randomization process from everyone involved in the study.
Minimization is a non-random method of allocating patients to study groups. Since it is not random, is it necessarily bad? Possibly not.
With minimization, the goal is to ensure that several pre-specified patient factors and the number of subjects are balanced across the study groups. Each subject’s allocation is recorded, and that information is used to increase the likelihood that subjects are allocated to the group which will best balance the pre-specified patient factors. This can be accomplished by models that identify the number of patients in each group with the pre-specified factors and increase the likelihood, or ensure, that the next subject will be allocated to the group with fewer patients with the pre-specified factor. Numerous methods for accomplishing minimization have been described. Minimization may effectively distribute known prognostic variables, and many authors consider it methodologically equivalent to simple randomization. One potential threat to validity is whether knowledge of the impending allocation assignment by individuals involved in the study could affect the allocation process. Benefits, drawbacks and extensive methodological detail are available in a review by Scott et al., who conclude that minimization is a highly effective allocation method [1].
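To make the mechanics concrete, below is a minimal sketch of one marginal-totals minimization scheme in the Pocock–Simon style, with a biased coin to preserve some unpredictability. The factors, group names, and coin probability are illustrative assumptions, not taken from Scott et al.

```python
import random
from collections import defaultdict

# Hypothetical balancing factors and groups (illustrative only).
FACTORS = {"sex": ["F", "M"], "age_band": ["<65", ">=65"], "site": ["A", "B"]}
GROUPS = ["treatment", "control"]

# counts[group][(factor, level)] = number of subjects already allocated
counts = {g: defaultdict(int) for g in GROUPS}

def minimize_allocate(subject, p_prefer=0.8):
    """Allocate one subject by marginal-totals minimization.

    For each candidate group, sum the counts of subjects already in that
    group who share each of the new subject's factor levels; the group
    with the smaller sum is preferred. A biased coin (p_prefer) keeps
    the assignment from being fully predictable.
    """
    scores = {g: sum(counts[g][(f, subject[f])] for f in FACTORS)
              for g in GROUPS}
    best = min(GROUPS, key=lambda g: scores[g])
    other = [g for g in GROUPS if g != best][0]
    if scores[best] == scores[other]:
        group = random.choice(GROUPS)      # tie: pure randomization
    elif random.random() < p_prefer:
        group = best                       # usually follow minimization
    else:
        group = other
    for f in FACTORS:                      # update the marginal totals
        counts[group][(f, subject[f])] += 1
    return group

# Example: allocate a few subjects
for subj in [{"sex": "F", "age_band": "<65", "site": "A"},
             {"sex": "F", "age_band": ">=65", "site": "A"},
             {"sex": "M", "age_band": "<65", "site": "B"}]:
    print(minimize_allocate(subj))
```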
1. Scott NW, McPherson GC, Ramsay CR, Campbell MK. The method of minimization for allocation to clinical trials: a review. Control Clin Trials. 2002 Dec;23(6):662-74. Review. PubMed PMID: 12505244.
Advice On Some Quasi-Experimental Alternatives To Randomization
03/23/2012
We have found a lot of help over the years in reading the advice and postings of statistician Dr. Steve Simon. Here’s an entry in which he discusses some considerations when dealing with quasi-experimental designs. You can sign up for his newsletter to receive it directly. (Note: if you keep reading to the next entry, about how much of practice is estimated to be evidence-based, we suspect that the reported percent might be inflated if the reviewers were not applying a solid critical appraisal approach.) You can read Steve’s advice about quasi-experimental design considerations here:
http://www.pmean.com/news/201201.html#1
Concealment of Allocation
In 1996, the CONSORT statement encouraged the reporting of concealment of allocation. Concealment of allocation is the process of assigning each patient to a study group without breaking blinding. Hewitt et al., in a recent issue of BMJ, reviewed the prevalence of adequate concealment of allocation in four journals—BMJ, Lancet, JAMA and NEJM (Hewitt C et al. BMJ 2005;330:1057-1058. PMID: 15760970). They scored the allocation as adequate (i.e., the subject recruiter was a different person from the person executing the allocation sequence), inadequate, or unclear. Sealed envelopes were considered inadequate unless handled by an independent third party.
Results
Studies included: 234
Adequate concealment: 132 (56%)
Inadequate concealment: 41 (18%)
Unclear concealment: 61 (26%)
Delfini Commentary
The authors point out that previous studies have found an association between inadequate concealment and the reporting of significant results. Of interest is that studies included in this review with inadequate concealment tended to show a significant result—OR 1.8, 95% CI (0.8 to 3.7).
This is another study suggesting that the critical appraisal of RCTs is “critical” and that lower-quality studies are more likely to report significant benefit than are higher-quality studies.
Blinding and RCTs
A recent article, Boutron I, Estellat C, Guittet L, Dechartres A, Sackett DL, et al. (2006) Methods of blinding in reports of randomized controlled trials assessing pharmacologic treatments: A systematic review. PLoS Med 3(10): e425. DOI: 10.1371/journal.pmed.0030425, provides a great deal of useful information about blinding in research studies and a way of classifying it. The authors evaluated blinding in RCTs of pharmacologic treatment published in 2004 in high-impact-factor journals. The following are some key points from the article:
• The authors identified 819 reports, with about 60% describing the method of blinding. The classification identified three main methods of blinding:
(1) methods to provide identical treatments in both arms,
(2) methods to avoid unblinding during the trial, and
(3) methods of blinded outcome assessment.
• ESTABLISHING BLINDING OF PATIENTS AND PROVIDERS: 472 [58%] described the method of blinding, but 236 [29%] gave no detail and 111 [13%] gave only some data on blinding (i.e., reporting that treatments were similar or that double dummies were used, with no description of the method). The methods of blinding identified varied in complexity. The authors reported use of centralized preparation of similar capsules, tablets, or treatments embedded in hard gelatin capsules (193/336 [57%]), similar syringes (37/336 [11%]), or similar bottles (38/336 [11%]). Use of a double-dummy procedure was described in 79 articles (23%). Other methods consisted of a sham intervention performed by an unblinded health care provider who was not actively involved in the care of patients and had no other contact with patients or other caregivers and outcome assessors (17/336 [5%]). To mask the specific taste of the active treatments, researchers in ten articles used a specific flavor such as peppermint or sugar to coat treatments. For treatments administered by care providers, authors reported use of centralized preparation of opaque coverage to adequately conceal intravenous treatments with different appearances (14/336 [4%]).
• AVOIDING UNBLINDING OF PATIENTS AND PROVIDERS: Only 28/819 [3%] reported methods to avoid unblinding. Methods to blind dosage adaptation relied on use of a centralized adapted dosage or provision of sham results of complementary investigations for treatments necessitating dosage adaptation. Methods to avoid unblinding because of side effects relied mainly on centralized assessment of side effects, partial information to patients about side effects, use of active placebo, or systematic prevention of adverse effects in both arms.
• BLINDING ASSESSORS: These methods depend on the main outcomes and are particularly important when blinding cannot be established and maintained by the methods described above. A total of 112 articles [14%] described these methods, which relied mainly on a centralized assessment of the main outcome. Blinding of outcome assessors is presumably achieved if neither patients nor those involved in the trial have any means to discover which arm a patient is in, for example because the placebo and active drugs are indistinguishable and allocation is via a central randomization service. Of the 112 reports describing specific measures to blind the outcome assessor, 96 (86%) concerned trials in which patients were reported as blinded or in which double or triple blinding was reported. These results suggest that, although blinding was performed at an earlier stage, the investigators nevertheless decided to perform a specific method of blinding the outcome assessor.
• AUTHORS’ COMMENTS AND CONCLUSIONS:
• Although blinding is essential to avoid bias, the reporting of blinding is generally quite poor, and reviews of trials that test the success of blinding methods indicate that a high proportion of trials are unblinded.
• The study results might be explained in part by the insufficient coverage of blinding in the Consolidated Standards of Reporting Trials (CONSORT) statements. For example, three items of the CONSORT statements are dedicated to the description of the randomization procedure, whereas only one item is dedicated to the blinding issue. The CONSORT statements mainly focus on reporting who is blinded and less on reporting the details of the method of blinding, yet this information is essential to appraise the success of blinding.
• Some evidence suggests that although participants are reported as blinded, the success of blinding might be questionable. For instance, in a study assessing zinc treatment for the common cold, the blinding procedure failed because the taste and aftertaste of zinc were distinctive. And yet tools used to assess the quality of trials included in meta-analyses and systematic reviews focus on the reporting of the blinding status for each participant and rarely provide information on the methods of blinding and the adequacy of the blinding method.
• There is a need to strengthen the reporting guidelines related to blinding issues, emphasizing adequate reporting of the method of blinding.
Delfini Commentary
Lack of blinding appears to be a major source of bias in RCTs. Just as well-done randomization and concealment of allocation to the study groups decrease the likelihood of selection bias, blinding of subjects and of everyone working with the subjects or study data to the assigned intervention (double-blinding) decreases the likelihood of performance bias. Performance bias occurs when patients in one group experience care or exposures not experienced by patients in the other group(s) and the differences in care affect the study outcomes. Lack of blinding may affect outcomes in that:
- Unblinded subjects may report outcomes differently from blinded subjects, have different thresholds for leaving a study, or seek (and possibly receive) additional care in different ways.
- Unblinded clinicians may behave differently towards patients than blinded clinicians.
- Using unblinded assessors may result in systematic differences in outcomes assessment (assessment bias).
A number of studies have shown that lack of blinding is associated with inflated treatment effects.
In some cases blinding may not be possible. For example, side effects or taste may result in unblinding. The important point is that even if blinding is not possible, the investigators do not get “extra” validity points for doing the best they could (i.e., the study should not be “upgraded”).
Blinding and Objective Outcomes
03/28/2011
We provide some general references on blinding at Recommended Reading. A frequent question (or assumption) that we hear concerns lack of blinding and objective outcomes such as mortality. There appears to be a consensus that lack of blinding can distort subjective outcomes. However, there also appears to be a belief that lack of blinding is not likely to distort hard outcomes. We are not so sure.
In reviewing the literature on blinding, we find only one reference that actually attempts to address this question. Wood et al. found little evidence of bias in trials with objective outcomes.[1] Yet, as we know, absence of evidence is not evidence of absence. Therefore, anything that contradicts these findings raises the specter that we are not “distortion-free” when it comes to lack of blinding and hard outcomes.
The RECORD trial is an interesting case in point. Caregivers were not blinded, but adjudication was. However, Psaty and Prentice point out that it appears that it is possible that lack of blinding might have affected which cases were submitted to adjudication, potentially causing a meaningful change in outcomes.[2] We wrote a letter in response that pressed even further for the importance of blinding.[3] You can read more about this particular case in the DelfiniClick that immediately follows below.
A classic study is Chalmers’ review of the effect of randomization and concealment of allocation on the objective outcome, mortality, in 145 trials of interventions for acute myocardial infarction.[4] Although this study did not focus on blinding beyond the concealment phase of studies, it may help shine some light on this area. Chalmers showed (and others confirmed later) that lack of effective allocation concealment is associated with changes in study results. It is also possible that lack of blinding of patients and investigators in studies with objective outcome measures can affect patient management and patient experiences, thereby distorting results.
In Salpeter et al. a meta-analysis of hormone replacement therapy, mortality was an outcome of interest.[5] The trials were analyzed by mean age of women in the trials (creating one of several serious threats to validity), to create a “younger women” and an “older women” analysis set. No benefit was shown in the “older women” trials, but benefit was shown in the “younger women” set. Interestingly, many of the studies in the younger women group were open-label, but none were open-label in the older women group. Although clearly not proof, this is intriguing and potentially suggestive of a distorting effect of non-blinding in studies with objective outcome measures.
To us, irrespective of any hard evidence of the impact of lack of blinding on hard outcomes, the fact that a distortion is possible, is of concern. If it is true that clinicians’ interventions can have an impact on mortality, then it is entirely possible that knowing which treatment a patient is receiving could have an impact on mortality outcomes. We know that the placebo effect is real. A patient’s knowledge of his or her treatment could be impacted by that effect and/or by a change in behaviors on the part of clinicians, investigators, patients or others involved in clinical trials, and that could affect a hard outcome such as mortality.
As critical appraisers we want to know—
Who was blinded (including an express statement about blinded assessment)?
How was blinding managed?
Was the blinding likely to have been successful?
1. Wood L, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ. 2008 Mar 15;336(7644):601-5. Epub 2008 Mar 3. PubMed PMID: 18316340.
2. Psaty BM, Prentice RL. Minimizing bias in randomized trials: the importance of blinding. JAMA. 2010 Aug 18;304(7):793-4. PubMed PMID: 20716744. [See below for DelfiniClick on this study.]
3. Strite SA, Stuart ME. Importance of blinding in randomized trials. JAMA. 2010 Nov 17;304(19):2127-8; author reply 2128. PubMed PMID: 21081725.
4. Chalmers TC et al. Bias in Treatment Assignment in Controlled Clinical Trials. N Engl J Med 1983;309:1358-61. PMID: 6633598.
5. Salpeter SR, et al. Mortality associated with hormone replacement therapy in younger and older women. J Gen Intern Med July 2004;19:791-804. PMID: 15209595
Open-Label Trials and Importance of Blinding (Even with Hard Outcomes)
One of our heroes is Dr. Bruce Psaty, a brilliant and dedicated University of Washington researcher Sheri worked with years ago during her stint at the Group Health Cooperative Center for Health Studies (now retitled the Group Health Research Institute). Bruce does some really interesting and important work, and frequently his efforts add to our collection of cases for critical appraisal training.
In a recent issue of JAMA, he and Dr. Ross Prentice, a statistician and leader at the Fred Hutchinson Cancer Research Center, address “Minimizing Bias in Randomized Trials: The Importance of Blinding.”[1] They explore the “prospective randomized open trial with blinded endpoints” and examine other evidence supporting the importance of investigator blinding in clinical trials. In their commentary, they examine the RECORD trial (Rosiglitazone Evaluated for Cardiac Outcomes and Regulation of Glycemia in Diabetes), which was an open-label trial with blinded assessment. They report that event rates for myocardial infarction in the control group were determined to be unexpectedly low, and they summarize some findings from an independent review by the FDA, which identified myriad problems with case report forms created prior to any blind assessment. The FDA review resulted in a re-analysis, using the available readjudicated case information, with the end result that the outcome of non-significance for risk of MI in the original study report changed to a statistically significant difference; the results were reported to be “remarkably close to results” reported in the original meta-analysis that raised concerns about rosiglitazone and cardiovascular risk.[2]
In our letter to JAMA,[3] we express that Drs. Psaty and Prentice add to the evidence on the importance of blinding, and we raise some points to carry this further, including an example, specific to the commentary, that addresses the potential for unbalancing study groups.
We want to expand upon this to make two basic key points:
1. As a general principle, nondifferential errors between treatment groups can, in fact, systematically bias summary measures. Example: inaccurate measuring instruments equally applied. What if a question on a survey instrument fails to capture an outcome of interest? It might show no difference between groups when a true difference actually exists.
2. Nondifferential errors may be nondifferential in appearance only. Missing data are a case in point. Missing data points are a frequent problem in clinical trials. Some reviewers are unconcerned by missing data provided that the percent of missing data is balanced between groups. We disagree. Just because data may be missing in equal measure doesn’t mean that a distortion of results has not occurred.
In our letter, we also point out that unblinded investigators may treat patients differently, which is a performance bias. Patients with differing care experiences could have dramatically different outcomes, including myocardial infarction, in keeping with the RECORD study example.
We are grateful to Drs. Psaty and Prentice for their work and agree that they have put a greater spotlight on “likely important departures from the crucial equal outcome ascertainment requirement under open-label trial designs.”[1] We hope from their work and our letter that people will increasingly see the important role blinding plays in clinical trial design and execution.
References
1. Psaty BM, Prentice RL. Minimizing bias in randomized trials: the importance of blinding. JAMA. 2010 Aug 18;304(7):793-4. PubMed PMID: 20716744.
2. Nissen SE, Wolski K. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. N Engl J Med. 2007 Jun 14;356(24):2457-71. Epub 2007 May 21. Erratum in: N Engl J Med. 2007 Jul 5;357(1):100. PubMed PMID: 17517853.
3. Strite SA, Stuart ME. Importance of Blinding in Randomized Trials: To the Editor. JAMA. 2010 Nov 17;304(19):2127-8.
Blinding In Surgical Trials — It is Through Blinding We Become Able To See
11/17/2010
Blinding is an important consideration when evaluating a study. Without blinding, the likelihood of bias increases. Bias occurs when patients in one group experience care or exposures not experienced by patients in the other group(s), and the differences in care affect the study outcomes. Lack of blinding may be a major source of this type of bias in that unblinded clinicians, who are frequently “rooting for the intervention,” may behave differently than blinded clinicians towards patients whom they know to be receiving the study drug or intervention being studied. The result is likely to be that in unblinded studies, patients may receive different or additional care. Unblinded subjects may be more likely to drop out of a study or to seek care in ways that differ from blinded subjects. Unblinded assessors may also be “rooting for the intervention” and assess outcomes differently from blinded assessors.
How much difference does blinding make? Jüni et al. reviewed four studies that compared double-blinded versus non-blinded RCTs and attempted to quantify the amount of distortion (bias) caused by lack of double blinding [1]. Overall, the overestimation of effect was about 14%. The largest study reviewed by Jüni assessed the methodological quality of 229 controlled trials from 33 meta-analyses and then analyzed, using multiple logistic regression models, the associations between those assessments and estimated treatment effects [2]. Trials that were not double-blind yielded on average 17% greater effect, 95% CI (4% to 29%), than blinded studies (P = .01).
Lack of double blinding is frequently found in surgical trials and results in uncertain evidence because of the problems stated above. A case study helps to illustrate this. A recent multicenter RCT, the Spine Patient Outcomes Research Trial (SPORT)[3], was a non-blinded trial that serves as an interesting case study of the blinding issues that arise when a surgical intervention is compared to a non-surgical intervention and blinding is not attempted. The trial included patients with persistent (at least 6 weeks) disk-related pain and neurologic symptoms (sciatica) who were randomized to undergo diskectomy or receive usual care (not standardized but frequently including patient education, anti-inflammatory medication, and physical therapy, alone or in combination). There were a number of problems with this study, including lack of power, poor control of non-study interventions, a high proportion of patients who crossed over between treatment strategies (43% of those randomized to surgery did not undergo surgery by 2 years, and 42% of those randomized to conservative care did receive surgery), and lack of blinding. The degree of missing data was 24%-27%, without a true intention-to-treat analysis. Of great interest was an editorial that dealt with the problem of non-blinding in surgical studies. The editorialist, Flum, makes the following points [4]:
- While the technique of sham intervention is well accepted in studies of medications using inactive pills (placebos), simulated acupuncture, and nontherapeutic conversation in place of therapeutic psychiatric interventions, it has only occasionally been applied to surgical trials. This is unfortunate because the use of sham controls has been critical in understanding just how much patient expectation influences outcomes after an operation.
- A sham-controlled trial would be particularly relevant for spine surgery since the most commonly occurring and relevant outcomes are subjective.
- Patients choosing surgical options may have high expectations. These may include a higher level of emotional “investment” in surgical care compared with usual care, based on the level of commitment resulting from a decision to have an operation and get through recovery. After the patient has accepted the risks of surgical intervention, the desire for improvement may drive perceptions about improvement.
- Patients who opt for surgery may also differ from patients who decline surgery in their beliefs regarding the benefits of invasive interventions.
- The surgeon’s expectations and direction are likely to play an important role in patient improvement.
- Given the proliferation of operative procedures for the treatment of subjective complaints like back pain, the need for sham-controlled trials has never been greater.
Flum goes on to present multiple examples of the power of suggestion and the problem of doing non-blinded trials in the field of surgery. Observational trials have often reported procedural success, but sham-controlled trials for the same conditions demonstrate how much of that success is due to the placebo effect.
- Example 1 — Ligation of the Internal Mammary Artery: After multiple observational studies suggested that ligation of the internal mammary artery was helpful in patients with coronary disease, Cobb et al. randomized patients to operative arterial ligation or a sham procedure. Both groups improved after the intervention, but there were similar, if not greater, improvements in subjective measures such as exercise tolerance and nitroglycerin use in the sham surgical group.
- Examples 2 and 3 — Osteoarthritic Knee Surgery and Osteoarthritic Knee Joint Irrigation: After multiple case series reported that patients with osteoarthritis of the knee improve after arthroscopic surgery, Moseley et al. demonstrated just how much of that effect is related to the hopes, expectations, and beliefs of the patient. The investigators randomized 180 patients to undergo arthroscopy with debridement, arthroscopy with lavage, or sham arthroscopy. The power of expectation was strong, and patients were unable to determine whether they had been assigned to the treatment or sham groups—and all groups improved. At 2 years after randomization, all patients reported comparable pain scores and functional scores. Another sham-controlled study in patients with knee osteoarthritis demonstrated that patients benefit equally from irrigation of the joint and from sham irrigation.
- Example 4 — Parkinson’s Disease: Researchers found similar improvements in quality of life after direct brain injections of embryonic neurons or placebo in patients with advanced Parkinson’s disease.
- Example 5 — Transmyocardial Laser Revascularization in Heart Failure: Heart failure patients undergoing transmyocardial laser revascularization or sham procedures had equal improvements in subjective outcomes.
- Example 6 — Hernia: After hernia repair, there was equal improvement in pain control after cryoablation of nerves or sham interventions.
- Examples 7-9 — Laparoscopic Interventions: Multiple case series have reported benefit on subjective outcomes such as pain control, function, and readiness for discharge with laparoscopic cholecystectomy, colon resection, and appendectomy compared with conventional approaches. Bias arises when the clinical care team influences patient and discharge expectations through coaching, communication, and management. Randomized trials of these three procedures that blinded both the patients and the discharging clinicians to the treatment patients received, by placing large, side-to-side abdominal wall dressings, demonstrate little or no difference in patients reaching discharge criteria. A reasonable conclusion is that when the clinician’s expectations and “coaching” were removed by placing a large bandage on the abdominal wall, the subjective benefits disappeared. Flum concludes that studies not addressing the effect of both patient and clinician expectation on subjective outcomes do not inform the clinical community about the true role of the intervention.
Delfini Commentary
Blinding of subjects and everyone working with the subjects or study data to the assigned intervention (double-blinding) decreases the likelihood of bias. Bias may be more likely to occur when evaluating subjective outcomes such as pain, satisfaction, and function in non-blinded studies, but it has also been reported with objective outcomes such as mortality. When dealing with subjective outcomes, as Flum points out, it is critical to distinguish the effect of the intervention from the effect of the patient’s expectation of the intervention. The only way to distinguish the effect of a patient’s positive expectations of an operation from the intervention itself is to blind patients to the treatment they receive and randomize them to receive the intervention of interest or a sham intervention (placebo). Yet we frequently hear, “But blinding is not possible in surgical studies.” Frequently the argument is raised that subjecting people to anesthesia and sham surgery is not ethical. However, conducting clinical trials employing methods that result in avoidable fatal flaws is also problematic. Flum’s position is that when the risk of a placebo does not exceed a threshold of acceptable research risk and the knowledge to be gained is substantial, a sham-controlled trial is needed and is ethical. He reasons that the ethical justification of placebo-controlled trials is based on the following considerations:
- Invasive procedures are associated with risks.
- There are great harms created by conducting studies that are of uncertain validity.
- Establishing community standards based on uncertain evidence is more likely to result in more harm than good.
- Sham-controlled trials are justified when uncertainty exists among clinicians and patients about the merits of an intervention.
The SPORT trial draws attention to the problem of non-blinding in surgical trials. This was a very expensive, labor-intensive study that provides no useful efficacy data. Research subjects were undoubtedly told this study would provide answers regarding the relative efficacy of surgery vs conservative care for lumbar spine disease. The authors of the SPORT trial state that a sham-controlled trial was impractical and unethical, possibly — according to Flum — because the risk of the sham would include general anesthesia (to truly blind the patients). He would argue that in this case blinding, which would require anesthesia, is the only way that valid, useful evidence could have been created. Even though we graded the study U (uncertain validity and usefulness) and would not use the results to inform decisions about efficacy or effectiveness because of the threats to validity, the study does report information regarding risks of surgery that may be of great value to patients.
References
1. Jüni P, Altman DG, Egger M. Systematic reviews in health care: Assessing the quality of controlled clinical trials. BMJ. 2001;323:42-46. PMID: 11440947
2. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273:408-12. PMID: 7823387
3. Weinstein JN, Tosteson TD, Lurie JD, et al. Surgical vs nonoperative treatment for lumbar disk herniation: the Spine Patient Outcomes Research Trial (SPORT): a randomized trial. JAMA. 2006;296:2441-2450. PMID: 17119141
4. Flum DR. Interpreting Surgical Trials With Subjective Outcomes: Avoiding UnSPORTsmanlike Conduct. JAMA. 2006;296(20):2483-2484. PMID: 17119146
The Importance of Blinded Assessors in RCTs
We have previously summarized the problems associated with lack of blinding in surgical (and other) studies — see Blinding in Surgery Trials in a previous DelfiniClick™. The major problem with unblinded studies is that the outcomes in the intervention group are likely to be falsely inflated because of the biases introduced by lack of blinding.
Recently a group of orthopedists identified and reviewed thirty-two randomized controlled trials published in The Journal of Bone and Joint Surgery between 2003 and 2004 to evaluate the effect of blinded versus non-blinded assessment on reported outcomes [1].
Results
- Sixteen of the thirty-two randomized controlled trials did not report blinding of outcome assessors when blinding would have been possible.
- Among the studies with continuous outcome measures, unblinded outcomes assessment was associated with significantly larger treatment effects than blinded outcomes assessment (standardized mean difference, 0.76 compared with 0.25; p = 0.01).
- In the studies with dichotomous outcomes, unblinded outcomes assessments were associated with significantly greater treatment effects than blinded outcomes assessments (odds ratio, 0.13 compared with 0.42; p < 0.001).
- This translates into a relative risk reduction of 38% for blinded outcome assessments compared with 71% for unblinded outcome assessments (a difference of 33%).
Conclusion
Unblinded outcomes assessment dramatically inflates the reported effectiveness of treatments.
Delfini Commentary
This is yet another study pointing out the importance of blinding. Based on this and other similar studies, it is our conclusion that studies, or the results of studies, without blinded assessors are grade U or at best grade B-U (see our evidence-grading scale here).
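Converting the odds ratios above into relative risk reductions requires a control-group event rate, which this summary does not report. Here is a minimal sketch using the standard Zhang-Yu approximation with an assumed 50% baseline risk, so the outputs only roughly approximate the 38% and 71% figures quoted above; the paper's exact conversion depends on its actual event rates.

```python
def or_to_rr(odds_ratio, baseline_risk):
    """Approximate a relative risk from an odds ratio given the
    control-group event rate (Zhang & Yu approximation)."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)

# Assumed 50% control-group event rate (illustrative only).
for label, o in [("blinded assessment", 0.42), ("unblinded assessment", 0.13)]:
    rr = or_to_rr(o, 0.50)
    print(f"{label}: RR = {rr:.2f}, RRR = {(1 - rr):.0%}")
```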
Reference
1. Poolman RW, Struijs PA, Krips R, Sierevelt IN, Marti RK, Farrokhyar F, Bhandari M. Reporting of outcomes in orthopaedic randomized trials: does blinding of outcome assessors matter? J Bone Joint Surg Am. 2007 Mar;89(3):550-8. PMID: 17332104.
Testing the Success of Blinding
03/23/09
Blinding in clinical trials of medical interventions is important. Researchers have reported that lack of blinding is likely to overestimate benefit by up to a relative 72%. [1-4] Optimal reporting of blinding entails who was blinded, how the blinding was performed, and whether the blind was likely to have been successfully maintained.
To assess the latter, investigators at times attempt to test the success of blinding following a clinical trial by asking clinicians and/or patients to identify which arm they believed they were assigned to. However, the results of this attempt may be misleading due to chance, and there is a strong possibility of confounding due to pre-trial hunches about efficacy, as described by Sackett in a letter to the BMJ, "Why not test success of blinding?" PMID: 15130997.[5]
To illustrate Sackett's point with a brief scenario, let us say that a new agent is approved and interest in the agent is running high. A clinician participating in a new clinical trial of that agent who is already predisposed to believe the drug works is likely to guess that all treatment successes were a result of patients being assigned to this arm. If an agent actually is effective, then it will likely appear that blinding was not successful even if it was.
Sackett describes the reverse scenario here: http://www.bmj.com/cgi/content/full/328/7448/1136-a
References
1. Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med. 2001;135:982-989. PMID: 11730399
2. Poolman RW, Struijs PA, Krips R, Sierevelt IN, Marti RK, Farrokhyar F, Bhandari M. Reporting of outcomes in orthopaedic randomized trials: does blinding of outcome assessors matter? J Bone Joint Surg Am. 2007 Mar;89(3):550-8. PMID: 17332104
3. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273:408-12. PMID: 7823387
4. Jüni P, Altman DG, Egger M. Systematic reviews in health care: Assessing the quality of controlled clinical trials. BMJ. 2001 Jul 7;323(7303):42-6. Review. PubMed PMID: 11440947; PubMed Central PMCID: PMC1120670
5. Sackett DL. Letter to the BMJ: "Why not test success of blinding?" PMID: 15130997
Time-related Biases Including Immortality Bias
10/08/2013
We were recently asked about the term “immortality bias.” The easiest way to explain immortality bias is to start with an example. Imagine a study of hospitalized COPD patients undertaken to assess the impact of drug A, an inhaled corticosteroid preparation, on survival. In our first example, people are randomized to receive a prescription to drug A post-discharge or not to receive a prescription. If someone in group A dies prior to filling their prescription, they should be analyzed as randomized and, therefore, they should be counted as a death in the drug A group even though they were never actually exposed to drug A.
Let's imagine that drug A confers no survival advantage and that mortality for this population is 10 percent. In a study population of 1,000 patients in each group, we would expect 100 deaths in each group. Let us say that 10 people in the drug A group died before they could receive their medication. If we did not analyze the unexposed people who died in group A as randomized, that would be 90 drug A deaths as compared to 100 comparison group deaths—making it falsely appear that drug A resulted in a survival advantage.
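The arithmetic of this example is worth making explicit. A minimal sketch, using only the numbers given above:

```python
# Counting the 10 pre-exposure deaths "as randomized" versus excluding them.
n_per_group = 1000
deaths_drug_a = 100        # 10% mortality; drug A confers no advantage
deaths_control = 100
died_before_filling = 10   # drug A deaths occurring before any exposure

# Intention-to-treat: 100/1000 vs 100/1000 -- correctly shows no difference.
itt_rate = deaths_drug_a / n_per_group

# Excluding the unexposed deaths: 90/990 vs 100/1000 -- drug A falsely
# appears to confer a survival advantage.
biased_rate = (deaths_drug_a - died_before_filling) / (n_per_group - died_before_filling)

print(f"ITT drug A mortality:     {itt_rate:.1%}")     # 10.0%
print(f"Biased drug A mortality:  {biased_rate:.1%}")  # 9.1%
print(f"Control mortality:        {deaths_control / n_per_group:.1%}")
```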
If drug A actually works, the time that patients are not exposed to the drug works a little against the intervention (oh, yes, and do people actually take their drug?), but as bias tends to favor the intervention, this probably evens up the playing field a bit—there is a reason why we talk about "closeness to truth" and "estimates of effect."
"Immortality bias" is a risk in studies when there is a time period (the "immortal" or the "immune" time when the outcome is other than survival) in which patients in one group cannot experience an event. Setting aside the myriad other biases that can plague observational studies, such as the potential for confounding through choice of treatment, to illustrate this, let us compare our randomized controlled trial (RCT) that we just described to a retrospective cohort study to study the same thing. In the observational study, we have to pick a time to start observing patients, and it is no longer randomly decided how patients are grouped for analysis, so we have to make a choice about that too.
For our example, let us say we are going to start the clock on recording outcomes (death) at the date of discharge. Patients are then grouped for analysis by whether or not they filled a prescription for drug A within 90 days of discharge. Because "being alive" is a requirement for picking up a prescription, but not for membership in the comparison group, the drug A group potentially receives a "survival advantage" if this bias isn't taken into account in some way in the analysis.
In other words, by design, no deaths can occur in the drug A group prior to picking up a prescription. However, in the comparison group, death never gets an opportunity to "take a holiday," as it were. If you die before getting a prescription, you are automatically counted in the comparison group. If you live and pick up your prescription, you are automatically counted in the drug A group. So the outcome of "being alive" is a prerequisite to being in the drug A group, and all deaths among people who did not fill a prescription during that 90-day window get counted in the comparison group. This is yet another example of how groups that differ, or are treated differently, in ways other than what is being studied can produce biased outcomes.
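A small simulation shows how this grouping rule manufactures a survival difference out of nothing. Everyone below shares the same true death hazard; all parameters (hazard, fill window, fill probability) are assumptions for illustration.

```python
# Immortal time in a cohort design: grouping by "filled within 90 days"
# means the drug A group must, by construction, survive to its fill date.
import random

random.seed(2)
N = 10_000
deaths_drug = n_drug = deaths_comp = n_comp = 0

for _ in range(N):
    death_day = random.expovariate(1 / 400)      # same true hazard for everyone
    will_fill = random.random() < 0.5            # half would fill a prescription
    fill_day = random.uniform(0, 90) if will_fill else None
    if will_fill and death_day > fill_day:       # survived long enough to fill
        n_drug += 1
        deaths_drug += death_day <= 365
    else:                                        # never filled, or died first
        n_comp += 1
        deaths_comp += death_day <= 365

# Despite identical true mortality, the "drug A" group looks safer.
print(f"1-year mortality, drug A group:  {deaths_drug / n_drug:.1%}")
print(f"1-year mortality, comparison:    {deaths_comp / n_comp:.1%}")
```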
Many readers will recognize the similarity between immortality bias and lead time bias. Lead time bias occurs when earlier detection of a disease, because of screening, makes it appear that the screening has conferred a survival advantage—when, in fact, the "greater length of time survived" is really an artifact resulting from the additional time counted between disease identification and when it would have been found if no screening had taken place.
Another instance where a time-dependent bias can occur is in oncology studies when intermediate markers (e.g., tumor recurrence) are assessed at the end of follow-up segments using Kaplan-Meier methodology. Recurrence may have occurred in some subjects at the beginning of the time segment rather than at the end of a time segment.
It is always good to ask whether, in the course of the study, the passing of time could have had an impact on any outcomes.
Other Examples —
- Might the population under study have significantly changed during the course of the trial?
- Might the time period of the study affect study results (e.g., studying an allergy medication, but not during allergy season)?
- Could awareness of adverse events affect future reporting of adverse events?
- Could test timing or a gap in testing result in misleading outcomes (e.g., in studies comparing one test to another, might discrepancies have arisen in test results if patients’ status changed in between applying the two tests)?
All of these time-dependent biases can distort study results.
Empirical Evidence of Attrition Bias in Clinical Trials
04/12/2012
The commentary, “Empirical evidence of attrition bias in clinical trials,” by Jüni and Egger [1] is a nice review of what has transpired since 1970, when attrition bias received attention in a critical appraisal of a non-valid trial of extracranial bypass surgery for transient ischemic attack. [2] At about the same time, Bradford Hill coined the phrase “intention-to-treat.” He wrote that excluding patient data after “admission to the treated or control group” may affect the validity of clinical trials and that “unless the losses are very few and therefore unimportant, we may inevitably have to keep such patients in the comparison and thus measure the ‘intention-to-treat’ in a given way, rather than the actual treatment.”[3] The next major development was meta-epidemiological research, which assessed trials for associations between methodological quality and effect size and found conflicting results in terms of the effect of attrition bias on effect size. However, as the commentary points out, the studies assessing attrition bias were flawed. [4,5,6]
Finally, a breakthrough in understanding the distorting effect of losing subjects after randomization came from two authors evaluating attrition bias in oncology trials.[7] The investigators compared the results of their own analyses, which utilized individual patient data and invariably followed the intention-to-treat principle, with those done by the original investigators, which often excluded some or many patients. The results showed that pooled analyses of trials with patient exclusions reported more beneficial effects of the experimental treatment than analyses based on all or most patients who had been randomized. Tierney and Stewart showed that, in most meta-analyses they reviewed based on only "included" patients, the results favored the research treatment (P = 0.03). The commentary gives deserved credit to Tierney and Stewart for their tremendous contribution to critical appraisal and is a very nice, short read.
References
1. Jüni P, Egger M. Commentary: Empirical evidence of attrition bias in clinical trials. Int J Epidemiol. 2005 Feb;34(1):87-8. Epub 2005 Jan 13. Erratum in: Int J Epidemiol. 2006 Dec;35(6):1595. PubMed PMID: 15649954.
2. Fields WS, Maslenikov V, Meyer JS, Hass WK, Remington RD, Macdonald M. Joint study of extracranial arterial occlusion. V. Progress report of prognosis following surgery or nonsurgical treatment for transient cerebral ischemic attacks. PubMed PMID: 5467158.
3. Bradford Hill A. Principles of Medical Statistics, 9th edn. London: The Lancet Limited, 1971.
4. Schulz KF, Chalmers I, Hayes RJ, Altman D. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408–12. PMID: 7823387
5. Kjaergard LL, Villumsen J, Gluud C. Reported methodological quality and discrepancies between large and small randomized trials in metaanalyses. Ann Intern Med 2001;135:982–89. PMID 11730399
6. Balk EM, Bonis PA, Moskowitz H, Schmid CH, Ioannidis JP, Wang C, Lau J. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA. 2002 Jun 12;287(22):2973-82. PubMed PMID: 12052127.
7. Tierney JF, Stewart LA. Investigating patient exclusion bias in meta-analysis. Int J Epidemiol. 2005 Feb;34(1):79-87. Epub 2004 Nov 23. PubMed PMID: 15561753.
Attrition Bias: Intention-to-Treat Basics
Updated 02/11/2013
In
general, we approach critical appraisal of RCTs by evaluating
the four major components of a trial— study population
(including how established), the intervention, the follow-up
and the assessment. There is very little controversy about the
process of randomizing in order to distribute known and unknown
confounders as equally as possible between the groups. There
also appears to be general understanding that the only difference
between the two groups should be what is being studied. However,
what seems to receive much less attention is the considerable
potential for bias that occurs when data is missing from subjects
because they do not complete a study or are lost to follow-up,
and investigators use models to deal with that missing data.
The only way to prevent this bias is to have data on all randomized
subjects. This is frequently not possible. And bias creeps in.
Intent-to-treat
designs that provide primary outcome data on all randomized
patients are the ideal. All patients randomized are included
in the analysis — and patients are analyzed in the same
groups to which they were randomized. Unfortunately we are rarely
provided with all of this information, and we must struggle
to impute the missing data—i.e., we must do our own sensitivity
analysis and recalculate p-values based on various assumptions
(e.g., worst case scenario, all missing subjects fail, etc.)
— when possible! All too often, papers do not report sufficient
data to perform these calculations, or the variables do not
lend themselves to this type of analysis because they cannot
be made binomial, and we are left with the authors’ frequently
inadequate analysis which might result in our assigning a low grade to the study.
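For binomial outcomes, the recalculation we describe is straightforward. Below is a minimal sketch of a worst-case sensitivity analysis; the trial counts are hypothetical, and Fisher's exact test is simply one reasonable choice of significance test.

```python
# Worst-case imputation: all missing intervention patients fail, all
# missing control patients succeed, then re-test the difference.
from scipy.stats import fisher_exact

def worst_case_p(success_rx, analyzed_rx, missing_rx,
                 success_ctl, analyzed_ctl, missing_ctl):
    total_rx = analyzed_rx + missing_rx      # restore randomized denominators
    total_ctl = analyzed_ctl + missing_ctl
    s_rx = success_rx                        # missing rx patients count as failures
    s_ctl = success_ctl + missing_ctl        # missing controls count as successes
    table = [[s_rx, total_rx - s_rx],
             [s_ctl, total_ctl - s_ctl]]
    return fisher_exact(table)[1]

# Hypothetical trial: 60/100 successes on treatment (10 more lost),
# 45/100 on control (10 more lost).
print(f"Worst-case p = {worst_case_p(60, 100, 10, 45, 100, 10):.3f}")
```

If the result survives this harshest assumption, we can be considerably more confident in the reported difference.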
We see many studies
where the analysis is accomplished using Kaplan-Meier estimates
and other models to deal with excluded patient data. As John
Lachin has pointed out, this type of “efficacy subset”
analysis has the potential for Type I errors (study findings=significant
difference between groups; truth=no significant difference)
as large as 50 percent or higher [1]. Lachin and others have
shown that the statistical methods used when data is censored
(meaning not included in analysis either through patient discontinuation
or data being removed), frequently assume that —
- Missing data
is missing at random to some degree;
- It is reasonable
to impute missing data using assumptions from non-missing
data; and,
- The bias from
efficacy subset analysis is not a major factor.
We want to see
data on all patients randomized. When patients are lost to follow-up
or do not complete a study, we want to see intent-to-treat analyses
with clear statements about how the missing data is imputed.
We agree with Lachin’s suggestion that the intent-to-treat
design is likely to be more powerful (than statistical modeling),
and especially powerful when an effective treatment slows progression
of a disease during its administration—i.e., when a patient
benefits long after the patient becomes noncompliant or the
treatment is terminated. Lachin concludes that, “The
bottom line is that the only incontrovertibly unbiased study
is one in which all randomized patients are evaluated and included
in the analysis, assuming that other features of the study are
also unbiased. This is the essence of the intent-to-treat philosophy.
Any analysis which involves post hoc exclusions of information
is potentially biased and potentially misleading.”
We also agree with
an editorial comment made by Colin Begg who states that, “The
properly conducted randomized trial, where the primary endpoint
and the statistical method are specified in advance, and all
randomized patients contribute to the analysis in an intent-to-treat
fashion, provides a structure that severely limits our opportunity
to obscure the facts in favor of our theories.” Begg concludes
by supporting Lachin’s assessment: “He is absolutely
correct in his view that the recent heavy emphasis on the development
of missing data methodologies in statistical academic circles
has led to a culture in which poorly designed studies with lots
of missing data are perceived to be increasingly more acceptable,
on the flimsy notion that sophisticated statistical modeling
can overcome poor quality data. Mundane though it may sound,
I strongly support his [Lachin’s] assertion that ‘…the
best way to deal with the problem (of missing data) is to have
as little missing data as possible…’ Attention to
the development of practical strategies for obtaining outcome
data from patients who withdraw from trials, notably short-term
trials with longitudinal repeated measures outcomes, is more
likely to lead to improvement in the quality of clinical trials
than the further development of statistical techniques that
impute the missing data. [2]”
It would be difficult
to express our concern more eloquently than what is stated above.
The two examples below amplify this.
Example 1: A group
of rheumatologists were uncomfortable with Kaplan-Meier statistical
methods for analysis of outcomes in rheumatology studies. Their
concern was that, even though Kaplan-Meier methods are frequently used to analyze cancer data, very little research has been done to validate the use of Kaplan-Meier methods for drug studies (i.e., endpoints such as stopping medication because of side effects or lack of efficacy). They tested three assumptions upon which Kaplan-Meier survival analysis depends:
Kaplan-Meier survival analysis depends:
1. Patients recruited
early in the study should have the same drug survival (i.e.
time to determination of lack of efficacy or onset of side-effects)
as those recruited later;
2. Patients receiving their first drug later in the study should
have the same drug survival characteristics as those receiving
it earlier; and,
3. Drug survival characteristics should be independent of the
time that a patient has been in the study before receiving the
disease modifying drug.
To examine the
above assumptions, the authors plotted survival curves for the
different groups (i.e. subjects recruited early vs those recruited
later) and showed that, in each case, the drug survival characteristics
were statistically different between the two groups (p<0.01).
They conclude, as did Lachin, that it is not possible to prove that survival analysis is always invalid (even though they did show that, in this case, the Kaplan-Meier analysis was invalid). However, this group feels that the onus of proof is on those who advocate for drug survival analysis—i.e., using statistical modeling rather than presenting all the data so that the reader can do an ITT analysis or sensitivity analysis [3].
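The kind of check these authors performed can be sketched with a survival library. Below is a hedged example using the Python lifelines package; the file name, column names, and the early/late grouping are all assumptions standing in for the authors' actual data.

```python
# Compare drug survival for early vs late recruits: if the curves differ,
# the pooled Kaplan-Meier analysis rests on a violated assumption.
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.read_csv("drug_survival.csv")   # hypothetical file: one row per patient,
                                        # columns: recruit_period, months_on_drug, stopped
early = df[df["recruit_period"] == "early"]
late = df[df["recruit_period"] == "late"]

result = logrank_test(early["months_on_drug"], late["months_on_drug"],
                      event_observed_A=early["stopped"],
                      event_observed_B=late["stopped"])
print(f"log-rank p = {result.p_value:.3f}")   # p < 0.05 -> assumption violated

# Plotting the two curves makes any divergence visible.
kmf = KaplanMeierFitter()
ax = kmf.fit(early["months_on_drug"], early["stopped"], label="early").plot_survival_function()
kmf.fit(late["months_on_drug"], late["stopped"], label="late").plot_survival_function(ax=ax)
```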
Example 2: A similar
situation occurred when a group of geriatricians became concerned
that many different, and sometimes inappropriate, statistical
techniques are used to analyze the results of randomized controlled
trials of falls prevention programs for elderly people. To evaluate
this, they used raw data from two randomized controlled trials
of a home exercise program to compare the number of falls in
the exercise and control groups using two different survival
analysis models (Andersen-Gill and marginal Cox regression)
and a negative binomial regression model for each trial.
In one trial, the
three different statistical techniques gave similar results
for the efficacy of the intervention but, in the second trial,
underlying assumptions were violated for the two Cox regression
models. Negative binomial regression models were easier to use
and more reliable.
Proportional Hazards and Cox Regression Models: The authors point out that, although proportional hazards or Cox regression models can test whether several factors (for example, intervention group, baseline prognostic factors) are independently related to the rate of a specific event (e.g., a fall), using survival probabilities to analyze time to fall events assumes that, at any time, participants who are censored before the end of the trial have the same risk of falling as those who complete the trial. A further assumption of proportional hazards models is that the ratio of the risks of the events in the two groups is constant over time and that the ratio is the same for different subgroups of the data, such as age and sex groups. This is known as the proportionality of hazards assumption. No particular distribution is assumed for the event times, that is, the time from the trial start date for the individual to the outcome of interest (in this case, a fall event), as one would assume for, say, death following cardiac surgery, where a greater frequency of deaths occurs close to the surgical event.
Andersen-Gill and
marginal Cox proportional hazards regression: These models are
used in survival analyses when there are multiple events per
person in a trial. The Andersen-Gill extension of the proportional
hazards regression model and the marginal proportional hazards
regression model are both statistical techniques used for analyzing
recurring event data.
Negative Binomial
Regression: The negative binomial regression model can also
be used to compare recurrent event rates in different groups.
It allows investigation of the treatment effect and confounding
variables, and adjusts for variable follow-up times by using
time at risk.
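As a rough sketch of what such a model looks like in practice, here is a negative binomial regression using Python's statsmodels; the data file, column names, and adjustment covariates are hypothetical examples, not the authors' specification.

```python
# Negative binomial regression of fall counts, with time-at-risk as the
# exposure term to adjust for variable follow-up.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("falls_trial.csv")   # hypothetical: one row per participant
# Assumed columns: n_falls, exercise (1/0), age, days_followed
X = sm.add_constant(df[["exercise", "age"]])
model = sm.GLM(df["n_falls"], X,
               family=sm.families.NegativeBinomial(),
               exposure=df["days_followed"])
fit = model.fit()
print(fit.summary())
# exp(coefficient on `exercise`) is the fall-rate ratio between groups;
# a value near 0.6 would correspond to the ~40% reduction reported below.
```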
In the first study
of falls in the elderly, all three statistical approaches indicated
that falls were significantly reduced by 40% (Andersen-Gill
Cox model), 44% (marginal Cox model) and 39% (negative binomial
regression model) in the exercise group compared with those
in the control group. The tests for the proportionality of hazards
for both types of survival regression models indicated that
these models “worked” for the recurring falls problem.
In the second study,
there was evidence that the proportional hazards assumption
was violated in the Andersen-Gill and marginal Cox regression
models (proportional hazards test). The authors point out that
survival analysis is not valid if participants who are censored
do not have the same rate of outcome (risk of falling) as those
who continue in the trial. They also cite a reference concluding that those not completing a falls prevention trial are at higher risk of falling; if fewer participants withdraw from one group than from the other, this may point to a study-related cause for the difference in discontinuation, and results may be biased [4].
Summary
Unfortunately, readers are in a very difficult position when
evaluating the quality of studies that use survival analyses
and statistical modeling because the assumptions used in the
models are almost never given and the missing data points are
frequently quite large.
Many researchers, biostatisticians and others struggle with this area—there appears to be no clear agreement in the clinical research community about how to best address these issues. There also is inconsistent evidence on the effects of attrition on study results. We, therefore, believe that studies should be evaluated on a case-by-case basis.
The key question is, "Given that attrition has occurred, are the study results likely to be true?" It is important to look at the contextual elements of the study and reasons for discontinuation and loss-to-follow up and to look at what data is missing and why to assess likely impact on results. Attrition may or may not impact study outcomes depending, in part, upon the reasons for withdrawals, censoring rules and the resulting effects of applying those rules, for example. However, differential attrition issues should be looked at especially closely. Unintended differences between groups are more likely to happen when patients have not been allocated to their groups in a blinded fashion, groups are not balanced at the onset of the study and/or the study is not effectively blinded or an effect of the treatment has caused the attrition.
References
1. Lachin JM. Statistical considerations in the intent-to-treat principle. Control Clin Trials 2000;21:167–189. PMID: 11018568
2. Begg CB. Ruminations on the intent-to-treat principle. Control Clin Trials 2000;21:241–243.
3. Utley M, et al. Potential bias in Kaplan-Meier survival analysis applied to rheumatology drug studies. Rheumatology 2000;39:1–6.
4. Robertson MC, et al. Statistical Analysis of Efficacy in Falls Prevention. Journal of Gerontology 2005;60:530–534.
Loss to Follow-up Update
Updated 02/11/2013
Heads up about an important systematic review of the effects of attrition on outcomes of randomized controlled trials (RCTs) that was recently published in the BMJ.[1]
Background
- Key Question: Would the outcomes of the trial change significantly if all persons had completed the study, and we had complete information on them?
- Loss to follow-up in RCTs is important because it can bias study results.
BMJ Study
The aim of this review was to assess the reporting, extent and handling of loss to follow-up and its potential impact on the estimates of the effect of treatment in RCTs. The investigators evaluated 235 RCTs published from 2005 through 2007 in the five general medical journals with the highest impact factors: Annals of Internal Medicine, BMJ, JAMA, Lancet, and New England Journal of Medicine. All eligible studies reported a significant (P<0.05) primary patient-important outcome.
Methods
The investigators did several sensitivity analyses to evaluate the effect of varying assumptions about the outcomes of participants lost to follow-up on the estimate of effect for the primary outcome. Their analysis strategies were as follows (a code sketch of the extreme scenarios appears after this list)—
- None of the participants lost to follow-up had the event
- All the participants lost to follow-up had the event
- None of those lost to follow-up in the treatment group had the event and all those lost to follow-up in the control group did (best case scenario)
- All participants lost to follow-up in the treatment group had the event and none of those in the control group did (worst case scenario)
- More plausible assumptions using various event rates, which the authors call the “event incidence”: The investigators performed sensitivity analyses using what they considered plausible ratios of event rates in the dropouts compared with the completers, using ratios of 1, 1.5, 2, 3 and 5 in the intervention group relative to the control group (see examples taken from Appendix 2 at the link at the end of this post below the reference). They chose an upper limit of 5 because it represents the highest relative event incidence reported in the literature.
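Below is a compact sketch, with invented counts, of the four extreme scenarios just listed: impute events to those lost to follow-up under each assumption, then re-test the treatment effect (Fisher's exact test is our choice for illustration, not necessarily the authors' method).

```python
from scipy.stats import fisher_exact

def scenario_p(events, completers, lost, imputed):
    # imputed: events assigned to the lost participants, per arm
    table = []
    for arm in ("rx", "ctl"):
        e = events[arm] + imputed[arm]
        n = completers[arm] + lost[arm]
        table.append([e, n - e])
    return fisher_exact(table)[1]

events = {"rx": 30, "ctl": 50}            # observed events among completers
completers = {"rx": 470, "ctl": 460}
lost = {"rx": 30, "ctl": 40}

scenarios = {
    "none had event": {"rx": 0,  "ctl": 0},
    "all had event":  {"rx": 30, "ctl": 40},
    "best case":      {"rx": 0,  "ctl": 40},
    "worst case":     {"rx": 30, "ctl": 0},
}
for name, imputed in scenarios.items():
    print(f"{name:15s} p = {scenario_p(events, completers, lost, imputed):.4f}")
```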
Key Findings
- Of the 235 eligible studies, 31 (13%) did not report whether or not loss to follow-up occurred.
- In studies reporting the relevant information, the median percentage of participants lost to follow-up was 6% (interquartile range 2-14%).
- The method by which loss to follow-up was handled was unclear in 37 studies (19%); the most commonly used method was survival analysis (66, 35%).
- When the investigators varied assumptions about loss to follow-up, results of 19% of trials were no longer significant if they assumed no participants lost to follow-up had the event of interest, 17% if they assumed that all participants lost to follow-up had the event, and 58% if they assumed a worst case scenario (all participants lost to follow-up in the treatment group and none of those in the control group had the event).
- Under more plausible assumptions, in which the incidence of events in those lost to follow-up relative to those followed-up was higher in the intervention than control group, 0% to 33% of trials—depending upon which plausible assumptions were used (see Appendix 2 at the link at the end of this post below the reference)— lost statistically significant differences in important endpoints.
Summary
When plausible assumptions are made about the outcomes of participants lost to follow-up in RCTs, this study reports that up to a third of positive findings in RCTs lose statistical significance. The authors recommend that authors of individual RCTs and of systematic reviews test their results against various reasonable assumptions (sensitivity analyses). Only when the results are robust under all reasonable assumptions should readers draw inferences from those study results.
Reference
1. Akl EA, Briel M, You JJ, et al. Potential impact on estimated treatment effects of information lost to follow-up in randomised controlled trials (LOST-IT): systematic review. BMJ 2012;344:e2809. doi: 10.1136/bmj.e2809 (published 18 May 2012).
Article is freely available at—
http://www.bmj.com/content/344/bmj.e2809
Supplementary information is available at—
http://www.bmj.com/content/suppl/2012/05/18/bmj.e2809.DC1
For sensitivity analysis results tables, see Appendix 2 at—
http://www.bmj.com/content/suppl/2012/05/18/bmj.e2809.DC1
Intention-to-Treat Analysis & the Effects of Various Methods of Handling Missing Subjects: The Case of the Compelling Rationale
08/04/08
The goals of intention-to-treat (ITT) analysis are to preserve the benefits of randomization and to mitigate bias from missing data. Not doing so is equivalent to changing a study design from a randomized controlled trial (RCT), which is an experiment, into a study with many features of a cohort design, resulting in many of the problems inherent in observational studies. For example, removal or attrition of patients after randomization (e.g., through disqualification, a decision not to include them in the analysis, discontinuations, missingness, etc.) may systematically introduce bias, or bias may be introduced through various aspects related to the interventions used.
In ITT analysis,
all patients are included in the analysis through an assignment
of a value for those missing final data points. For background
on this, get basic information above and in our EBM
tips, plus the table of contents on this page for further
reading.
The purpose of this Click is to provide some resistance to the concept of a “compelling rationale” for excluding patients from analysis. Sometimes researchers come up with a seemingly compelling rationale for removing patients from analysis; but, as several EBM experts suggest, such “sample size slippages” put the study on a slippery slope.
Examples
Patients
Excluded Pre-Treatment
Some researchers consider it reasonable to exclude patients who die before a treatment begins, or before the treatment could take effect, since clearly the treatment was not responsible. If groups are balanced, such a move should be considered unnecessary, because differences unrelated to treatment should occur equally in each group, except by chance. Moreover, to keep from introducing bias by treating groups differently except for the intervention or exposure under study, the same exclusions would have to be applied in the placebo group, something one would not ordinarily think to do. The rationale for excluding is the same in both groups.
Case in point:
imagine a study comparing surgery to medical treatment. As
pointed out by Hollis and Campbell, if patients assigned to
surgery but not medical therapy were removed because of dying
prior to the intervention, this would create a falsely low
mortality rate in the surgical group.[1] Schulz and Grimes clarify that this is unnecessary if the study is successfully randomized, as randomization balances non-attributable deaths.[2]
Patients
Determined Ineligible Post-randomization
Some investigators remove patients from analysis who are found
post-randomization to be in fact, ineligible for study. Why
would this be a problem if uniformly applied to both groups?
Schulz and Grimes argue that discovery of ineligibility is “probably not random.” They point out that there is the potential for a) greater attention paid to those not responsive to treatment or having side effects; b) systematic removal of subjects’ data; and, c) physicians to withdraw patients if they “think” they were randomized to the wrong group. They state that there is a possible reduction
of bias if this is done fully blinded and equally between
groups, but stress that it is best not done at all, pointing
out that such problems should even out if the groups are truly
balanced in the first place due to effective randomization.
Excluding
Patients Post-randomization Who Don’t Pick Up Medication
Frequently, we see that investigators have defined their intention-to-treat
population as being all patients who filled a study prescription
— and then claim to have performed ITT analysis. Firstly,
this should not be called an ITT-analysis — it is more
correctly a modified ITT. Secondly, a problem with excluding
patients after randomization who have not picked up their
prescription is that it allows choice to enter into the experiment,
and choice may be related to differences in the characteristics
(prognostic factors) of individuals who choose to pick-up
their medications as compared to those who do not.
Also, there is
always a possibility that some patients are systematically
discouraged from picking up their medication. If there is
a differential loss in those not picking up their medication,
a systematic bias is possible and is worrisome. If there is
no differential loss, including those who did not pick up
a study medication in the analysis should not be an issue
if groups were created through true randomization.
Excluding
Protocol Deviations
Schulz and Grimes present a case study of a trial of placebo versus prophylactic antibiotics for IUD insertion in which 25% of the patients in the treatment group were found not to be compliant. Why not exclude them from the analysis? In response, they raise the question: what if those 25% were in better health or would tolerate an IUD insertion more easily? Excluding them would leave the treatment group systematically biased toward those more susceptible to infection.
A Final
Example
One of our favorite musings on ITT analysis is presented by
Gerard E. Dallal, PhD on his website at http://www.jerrydallal.com/LHSP/itt.htm
Dallal reports
that Paul Meier (of Kaplan-Meier fame), then of the University
of Chicago, offered an example involving a subject in a heart
disease study where there is a question of whether his death
should be counted against the intervention or set aside. The
subject disappeared after falling off his boat. He had been
observed carrying two six-packs of beer on board before setting
off alone. Meier argues that most researchers would set this
event aside as unrelated to the treatment, while intention-to-treat
would require the death be counted against the treatment.
But suppose, Meier continues, that the beer is eventually
recovered and every can is unopened.
“Intention-to-treat
does the right thing in any case. By treating all events the
same way, deaths unrelated to treatment should be equally
likely to occur in all groups and the worst that can happen
is that the treatment effects will be watered down by the
occasional, randomly occurring outcome unrelated to treatment.
If we pick and choose which events should count, we risk introducing
bias into our estimates of treatment effects.” [3]
Key Points
- If groups are
balanced, most adjustments should be considered to be unnecessary.
- Randomization
is the best means of creating balanced groups.
- The effect of
removing patients from an analysis is a potential derandomization,
potentially leaving groups with differing prognostic variables.
- Investigators
should more appropriately deal with these issues in a sensitivity
analysis which can be reported as a secondary analysis.
References
1. Hollis S, Campbell F. What is meant by intention to treat
analysis? Survey of published randomised controlled trials.
BMJ. Vol 319. Sept 1999: 670-674. http://bmj.com/cgi/content/full/319/7211/670?maxtoshow=?eaf
NOTE: Delfini agrees that differential loss is important to
note, but even equivalent loss of greater than five percent
could be a threat to validity.
2. Schulz KF, Grimes
DA. Sample size slippages in randomised trials: exclusions and
the lost and wayward. The Lancet. Vol 359. March 2, 2002: 781-785.
PMID: 11888606
NOTE: Delfini stresses that the approach taken for missing values
should not give an advantage to the intervention.
3. Gerard E. Dallal,
PhD: http://www.jerrydallal.com/LHSP/itt.htm accessed on 08/01/2008
Intention-to-Treat & Imputing Missing Variables: Last-Observation-Carried-Forward (LOCF)—When We Might Draw Reasonable Conclusions
09/23/2011
Principles of intention-to-treat (ITT) analysis require analyzing all patients in the groups to which they were assigned. This is regardless of whether they received their assigned intervention and regardless of whether they completed the trial. For those who do not complete the study, or for whom data on endpoints is missing, a value is assigned—which is referred to as “data imputation.” As anything that systematically leads away from truth is a bias, imputing data necessarily introduces bias. However, ITT analysis is generally considered the preferred analysis method because it is thought to help preserve the benefits of randomization and deal with problems of missing data.
Imputing outcomes for missing data points is either done to try and approximate what might have been true or is used as a method to test the strength of the results—meaning if I put the intervention through a tough challenge, such as assigning failure to those missing in the intervention group and success to those missing in the comparison group, is any difference favoring the intervention still statistically significant?
This DelfiniClick™ is focused on "last-observation-carried-forward" (LOCF) which is frequently used to assign missing variables. LOCF simply means, for example, that if I lost a patient at month 6 in a 12-month trial, I assign the 12-month value for my data point from what I observed in month 6. A number of authors consider this a method prone to bias for various reasons [1-6] not the least of which is that it is not robust and may not be a reasonable predictor of outcomes.
However, as many researchers use LOCF for data imputation, it is worth exploring whether there are circumstances that allow us to draw reasonable conclusions from otherwise valid studies when LOCF is employed. Although using LOCF in progressive conditions clearly distorts results, we might be able to get at some reasonable answers despite its use because we know the direction or trend line without effective treatment.
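Mechanically, LOCF is trivial, which is part of its appeal. A minimal pandas sketch (patient names and scores invented):

```python
# Last-observation-carried-forward: each patient's last recorded score is
# propagated into later, missing visits.
import pandas as pd

scores = pd.DataFrame(
    {"month_3": [52.0, 48.0], "month_6": [55.0, None], "month_12": [None, None]},
    index=["patient_A", "patient_B"],    # hypothetical 12-month trial
)
locf = scores.ffill(axis=1)              # carry last observation forward
print(locf)
# patient_A's month_6 value (55.0) becomes the month_12 endpoint;
# patient_B's month_3 value (48.0) stands in for months 6 and 12.
```

In a progressive disease, freezing a patient's score this way is exactly what makes LOCF distort results, which is the behavior the scenarios below work through.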
Scenario 1: Ideal Study Circumstances & Drug Does Not Work
Assumptions
- Ineffective agent versus placebo
- Study is of a progressive condition in which overall improvement could not be expected to happen without some kind of effective intervention
- Randomization is successful
- Concealment of allocation was performed successfully
- Blinding is successful and was maintained
- Missing data between groups is equal and timing of missing data is similar
- Study is otherwise valid
Imagine a graph that plots results between the groups over various time points—see below. We would expect the lines to be roughly the same. The resulting bias would be that the rate and lower boundary of the reported outcome would be higher than what would actually be true. However, in considering the difference in outcomes between groups, we would have a truthful answer: no difference between the groups.
[Figure: outcomes plotted over various time points; the two groups' lines track together, showing no difference between groups]
Scenario 2: Ideal Study Circumstances & Drug Does Work
Assumptions
- Effective agent versus placebo
- Study is of a progressive condition in which overall improvement could not be expected to happen without some kind of effective intervention
- Randomization is successful
- Concealment of allocation was performed successfully
- Blinding is successful and was maintained
- Missing data between groups is equal and timing of missing data is similar
- Study is otherwise valid
Imagine a graph that plots results between the groups over various time points—see below. We would expect the lines to diverge. The resulting bias would be that the rate and lower boundary of the reported outcome would be higher than what would actually be true in the placebo group. Conversely, the rate and the upper boundary of the reported outcome would be lower than what would actually be true in the active agent group. So the bias would favor placebo and be conservative against the intervention. However, in considering the difference in outcomes between groups, we would have a truthful answer: a difference between the groups.
[Figure: outcomes plotted over various time points; the groups' lines diverge in favor of the active agent]
Scenario 3: Uncertain Study Circumstances & Unknown if Drug Works
Assumptions
- Agent of unknown efficacy versus placebo
- Study is of a progressive condition in which overall improvement could not be expected to happen without some kind of effective intervention
- Randomization appears successful: random method used to assign people to their groups plus a review of the table of baseline characteristics is suggestive that the groups are balanced
- Concealment of allocation appears to have been performed successfully: call-in-center was used
- Blinding appears to have been well attended to and drug side effects or other circumstances would not seem to break an effective blind
- Missing data between groups is roughly similar, but timing of missing data is unknown
- Study is otherwise valid insofar as we can tell
Again imagine a graph that plots results between the groups over various time points—see below. If the lines do diverge, it seems reasonable to conclude one of three things: 1) we have a chance effect; 2) a systematic bias explains the reported improvement in the active agent group; or 3) the agent actually works.
[Figure: outcomes plotted over various time points; the groups' lines diverge]
Chance is a possibility, though not so likely with a prespecified outcome. If the reporting were actually graphed out over time rather than just reported as a summary measure, and we saw consistency in the data points, we would conclude it would be unlikely to be a chance effect.
Another possibility could be differences in care or co-interventions. Effective concealment of allocation and effective blinding would be likely to enable us to rule out such differences being due to bias from knowing the group to which a person was assigned. Therefore, any such resulting differences would be reasonably likely to be a result of some action of the agent.
Actions of the agent would generally be either benefit or harm. If the agent caused a harm that resulted in a greater number of people in the active agent group receiving a co-intervention, that intervention would have to be effective or synergistic with the active agent, in order to see a reported benefit—which is probably not very likely. (And it is possible that this kind of situation would result in failure of successful blinding—in that instance, we would look for anything that may have resulted in improvement to patients other than the agent.)
If the agent is truly working, it is unlikely that subjects would be receiving a co-intervention. That scenario would be more likely to result if the patient were on placebo or the drug did not work. In the latter instance, probably an equal number of subjects in both groups would be getting a co-intervention and the likelihood would be no or little difference between the groups.
Conclusion Using LOCF in Progressive Illness
We strongly prefer that LOCF not be utilized for data imputation for reasons studied by various authors [1-6], but, in the case of a progressive illness, for example, with unlikely spontaneous improvement, it may be reasonable to trust claims of efficacy under the right study conditions, with a recognition that the estimates of effect will likely be distorted.
Using LOCF in progressive illnesses has the disadvantage of likely upgrading of an estimate of effect where there is actually no effect and downgrading estimates for true effectiveness.
However, our ability to discern potentially efficacious treatment is aided by expected trending. For example, in a study with a placebo group with progressive disease and an intervention group with improving disease, LOCF would be conservative because it would impute better-than-actual observations in the placebo group and worse-than-actual observations in the intervention group.
Reporting by various time points strengthens confidence that outcomes are not due to chance.
Conclusion Using LOCF in Non-progressive Illness
Using LOCF in non-progressive illness is possibly more problematic as we do not have the assistance of an expected trend for either group. Consequently, we have fewer clues to aid us in drawing any conclusion.
References [Delfini LOCF Summary Notes]
- Carpenter J, Kenward M. Guidelines for handling missing data in Social Science Research. www.missingdata.org.uk [Strongly recommends avoiding LOCF.]
- Gadbury GL, Coffey CS, Allison DB. Modern statistical methods for handling missing repeated measurements in obesity trial data: beyond LOCF. Obes Rev. 2003 Aug;4(3):165-84. PubMed PMID: 12916818. [Reports on some simulations of LOCF producing bias for all three general categories of missing data. “Both multiple imputation and mixed effects models appear to produce unbiased estimates of a treatment effect for all types of missing data.”]
- O'Brien PC, Zhang D, Bailey KR. Semi-parametric and non-parametric methods for clinical trials with incomplete data. Stat Med. 2005 Feb 15;24(3):341-58. Erratum in: Stat Med. 2005 Nov 15;24(21):3385. PubMed PMID: 15546952. [LOCF should not be used.]
- Shih W. Problems in dealing with missing data and informative censoring in clinical trials. Curr Control Trials Cardiovasc Med. 2002 Jan 8;3(1):4. PubMed PMID: 11985668; PubMed Central PMCID: PMC134466. [Discusses various biases with use of LOCF.]
- Wood AM, White IR, Thompson SG. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clin Trials. 2004;1(4):368-76. Review. PubMed PMID: 16269265. [LOCF is crude and rarely appropriate.]
- Woolley SB, Cardoni AA, Goethe JW. Last-observation-carried-forward imputation method in clinical efficacy trials: review of 352 antidepressant studies. Pharmacotherapy. 2009 Dec;29(12):1408-16. Review. PubMed PMID: 19946800. [Cautions depending on the pattern of missing data and emphasizes need for explicitly describing this in published reports along with the likely effect of dropouts and how they reached their conclusions. Recommends mixed-effects modeling as it is “less likely to introduce substantial bias.”]
Intention-to-Treat Analysis & Censoring: Rofecoxib Example
In
a recent DelfiniClick, we voiced concern about models used for
analysis of study outcomes, especially when information about
assumptions used is not reported. In the July 13, 2006 issue
of the NEJM (published early on-line), there is a very informative
example of what can happen when authors claim to analyze data
using the intention-to-treat (ITT) principle, but do not actually
do an ITT analysis.
Case Study
The NEJM published a correction to an original study of cardiovascular
events associated with rofecoxib versus placebo[1]. This correction
illustrates how Kaplan-Meier curves can be misleading to readers
and how they differ with various censoring assumptions. In this
case, by censoring data that occurred 14+ days after subjects
discontinued the study, the Kaplan-Meir curves for thrombotic
events did not separate until 18 months. The following is part
of the correction published by NEJM:
“…Statements
regarding an increase in risk after 18 months should be removed
from the Abstract (the sentence ‘The increased relative
risk became apparent after 18 months of treatment; during
the first 18 months, the event rates were similar in the two
groups’ should be deleted…”
The reason for
the correction appears to be an analysis of data released by
Merck to the FDA on May 11, 2006. These data provide information
about events in the subgroup of participants whose data were
censored if they had an event more than 14 days after early
discontinuation of the study medication.
Twelve thrombotic
events that occurred more than 14 days after the study drug
was stopped but within 36 months after randomization were noted.
Eight of the “new” events were in the rofecoxib
group, and these events had a definite effect on the published
survival curve for rofecoxib (Fig. 2 of the original article).
When including the new data, the separation of the rofecoxib
and placebo curves begins earlier than 18 months.
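A toy sketch of the censoring rule at issue (all numbers invented): events occurring more than 14 days after a subject stops the drug are simply dropped from the analysis, and the arm with more post-discontinuation events loses the most.

```python
# Count events under an all-events (ITT-style) rule versus a "censor events
# occurring more than 14 days after stopping the drug" rule.
events = [
    # (arm, days between stopping the drug and the thrombotic event)
    ("rofecoxib", 3), ("rofecoxib", 40), ("rofecoxib", 200),
    ("placebo", 5), ("placebo", 10),
]

def counted(window):
    tally = {"rofecoxib": 0, "placebo": 0}
    for arm, gap in events:
        if window is None or gap <= window:
            tally[arm] += 1
    return tally

print("All events counted: ", counted(None))  # rofecoxib 3, placebo 2
print("14-day censor rule: ", counted(14))    # rofecoxib 1, placebo 2
```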
The point of all
this is that it is difficult to determine the validity of a
study when assumptions used in censoring of data are not reported.
With insufficient information about loss to follow-up, we cannot
do our own sensitivity analyses for imputing missing data with
our goal being to “test” the P-value reported by
the authors.
To reiterate
from our previous DelfiniClick:
- Intent-to-treat
designs that provide primary outcome data on all randomized
patients are the ideal. All patients randomized are included
in the analysis. The same patients randomized at the beginning
of the RCT are analyzed in the same groups to which they were
randomized.
- Authors should
use a CONSORT diagram to report what happened to various patients
during the course of the study – plus they should provide
detailed information about missing data points including timing.
- Sensitivity
analyses are welcomed, especially those that subject the intervention
to the toughest trial. If p-values remain statistically significant
after such a test, we can be more confident about anticipated
outcomes in an otherwise valid study.
1. Correction to:
Cardiovascular events associated with rofecoxib in a colorectal
adenoma chemoprevention trial. N Engl J Med 2006;355:221.
2. Bresalier RS,
Sandler RS, Quan H, et al. Cardiovascular events associated
with rofecoxib in a colorectal adenoma chemoprevention trial.
N Engl J Med 2005;352:1092-102.
Intention-to-Treat Analysis: Misreporting and Migraine
Intention-to-treat
analysis (ITT) is an important consideration in randomized,
controlled trials. And determining whether an analysis meets
the definition of ITT analysis or not is incredibly easy. Yet
many authors mislabel their analyses as ITT when they are not
and report their results in a biased way. An article in BMJ
dealing with migraine illustrates some important points about
ITT analysis and reminds us that authors continue to
report outcomes in ways that are highly likely to be biased.
Case Study
As described in
the CONSORT STATEMENT (http://www.consort-statement.org/),
among other things, ITT analysis “prevents bias caused
by the loss of participants, which may disrupt the baseline
equivalence established by random assignment and which may reflect
non-adherence to the protocol.”
ITT analysis is
defined as follows in the CONSORT STATEMENT:
“A strategy for analyzing data in which all participants
are included in the group to which they were assigned, whether
or not they completed the intervention given to the group.”
An easy way to
tell if an ITT analysis has been done is to look at the number
randomized in each group and see if that number is the same
number that is analyzed. Number in should be the same number
out — in each group as originally randomized.
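This check is simple enough to express in a few lines. A trivial sketch (the per-group split of the missing patients is assumed, since the point is only the comparison of counts):

```python
# Not ITT unless every randomized patient is analyzed in the group to
# which he or she was originally randomized.
def is_itt(randomized, analyzed):
    return all(randomized[g] == analyzed.get(g, 0) for g in randomized)

# The Schrader trial discussed below randomized 30 + 30 but analyzed 55;
# the split of the 5 missing patients between groups is assumed here.
print(is_itt({"lisinopril": 30, "placebo": 30},
             {"lisinopril": 27, "placebo": 28}))   # False -- not an ITT analysis
```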
As you can see, determining whether an analysis meets the definition of ITT analysis is incredibly easy. Yet many authors mislabel their analyses as ITT when they are not: in one review of published articles, authors who said they had performed an ITT analysis had not actually done so 47% of the time. (Kruse RL, Alper BS, et al. Intention-to-treat analysis: who is in? Who is out? J Fam Pract. 2002 Nov;51(11).)
In the Schrader
study, 30 patients with migraine were randomized to receive
lisinopril and 30 were randomized to placebo. The authors, however, only reported on 55 patients in their so-labeled
“intention-to-treat analysis” because of poor compliance. This is not an intention-to-treat analysis.
The following is
reported by the authors:
Schrader H, Stovner LJ, Helde G, Sand T, Bovim G. Prophylactic treatment of migraine with angiotensin converting enzyme inhibitor (lisinopril): randomised, placebo controlled, crossover study. BMJ 2001;322:19-22. Article available at: http://bmj.bmjjournals.com/cgi/content/full/322/7277/19
Results
In the 47 participants with complete data, hours with headache, days with headache, days with migraine, and headache severity index were significantly reduced by 20% (95% confidence interval 5% to 36%), 17% (5% to 30%), 21% (9% to 34%), and 20% (3% to 37%), respectively, with lisinopril compared with placebo. Days with migraine were reduced by at least 50% in 14 participants for active treatment versus placebo and in 17 patients for active treatment versus run-in period. In the intention to treat analysis of data from 55 patients, significant differences in favour of lisinopril were retained for the primary efficacy end points:
Intention to Treat Analysis—55 Participants with Means (SD)

                  Lisinopril    Placebo      Mean % reduction (95% CI)
Headache hours    138 (130)     162 (134)    15 (0 to 30)
Headache days     20.7 (14)     24.7 (11)    16 (5 to 27)
Migraine days     14.6 (10)     18.7 (9)     22
Conclusion: The angiotensin converting enzyme inhibitor, lisinopril, has a clinically important prophylactic effect in migraine.
The authors have done as their primary analysis an “optimal compliance analysis.” They also state they have done an ITT analysis, but they have not.
It is fine to do
non-ITT analyses – “as treated,” and “completer”
analysis are two common ones you will frequently see. But the
ITT analysis must be the primary analysis. Others are considered
secondary (and should be labeled and treated as such).
And so how does
one handle loss to follow-up? There are various methods, but
there is an important principle which should guide us —
the method should put the burden of proof on the intervention.
This is the opposite of our court system – “guilty
until proven innocent,” in effect. So what you do is assign
an outcome to those lost to follow-up that puts the intervention
through the toughest test. “Worst-case basis” is one method; “last-observed result” is another.
If you put the
intervention through the hardest test, and you still have positive
results (assuming the study is otherwise valid), you can feel
much more confident about the reported outcomes truly being
valid. If the missing subjects in the above-mentioned migraine
article are handled this way, there is no statistically
significant difference between lisinopril and placebo.
We are frequently
asked what is an acceptable percent loss to follow-up. It depends
on whether the loss to follow-up will affect the results or
not. We have seen what we consider to be important changes even
with small numbers lost to follow-up. We recommend that you
do sensitivity analyses (“what if”s) to see what
the effect might be if you had the data. Without doing an ITT
analysis, we are very uncomfortable about the results if five
percent or more of subjects have missing data for analyzing
endpoints -- and even less than five percent might have impact.
For those who would
like more information, the following article is an excellent
one on the subject and is very helpful for understanding issues
pertaining to ITT analysis and randomization as well:
Schulz KF, Grimes DA. Sample size slippages in randomised trials: exclusions and the lost and wayward. The Lancet. Vol 359. March 2, 2002: 781-785. PMID: 11888606
Other reading on ITT analysis is available here.
Very special thanks
to Murat Akalin, MD, MPH, UCSD, for selecting
a great article for case study, participating in this review,
doing the ITT analysis and encouraging us to write this.
Missing Data Points: Difference or No Difference — Does it Matter?
Update 02/11/2013
& 01/14/2014
We continue to study the "evidence on the evidence" — meaning we are continually on the lookout for information which may shed light on the impact of certain kinds of bias on reported outcomes, or information that provides help in handling different biases. Missing data points are an issue affecting the majority of studies, but currently there is no clarity on how big an issue this is, especially when there is no differential loss between groups.
We have spoken
about this issue with John M. Lachin, Sc.D., Professor of Biostatistics
and Epidemiology, and of Statistics, The George Washington University,
and author. (And then we did some "hard thinking"
as David Eddy would say.) Even without differential loss between
the groups overall, a differential loss could occur in prognostic
variables — and readers are rarely going to have access
to data about changes in prognostic characteristics post-baseline
reporting.
Attrition Bias Update—Here's our current thinking as of 02/13/2013:
Significant attrition, whether it be due to loss of patients or discontinuation or some other reason, is a reality of many clinical trials. And, of course, the key question in any study is whether attrition significantly distorted the study results. We've spent a lot of time researching the evidence on the distorting effects of bias and have found that many researchers, biostatisticians and others struggle with this area—there appears to be no clear agreement in the clinical research community about how to best address these issues. There also is inconsistent evidence on the effects of attrition on study results.
We, therefore, believe that studies should be evaluated on a case-by-case basis and doing so often requires sleuthing and sifting through clues along with critically thinking through the unique circumstances of the study.
The key question is, "Given that attrition has occurred, are the study results likely to be true?" It is important to look at the contextual elements of the study. These contextual elements may include information about the population characteristics, potential effects of the intervention and comparator, the outcomes studied and whether patterns emerge, timing and setting. It is also important to look at the reasons for discontinuation and loss-to-follow up and to look at what data is missing and why to assess likely impact on results.
Attrition may or may not impact study outcomes depending, in part, upon the reasons for withdrawals, censoring rules and the resulting effects of applying those rules, for example. However, differential attrition issues should be looked at especially closely. Unintended differences between groups are more likely to happen when allocation of patients to their groups has not been concealed, when groups are not balanced at the onset of the study, when the study is not effectively blinded, or when an effect of the treatment has caused the attrition.
One piece of the puzzle, at times, may be whether prognostic characteristics remained balanced. Authors could help us all out tremendously by assessing comparability between baseline characteristics at randomization and the characteristics of those analyzed. However, an imbalance may be an important clue too, because it might be informative about efficacy or side effects of the agent under study.
In general, we think it is important to attempt to answer the following questions:
Examining the contextual elements of a given study—
- What could explain the results if it is not the case that the reported findings are true?
- What conditions would have to be present for an opposing set of results (equivalence or inferiority) to be true instead of the study findings?
- Were those conditions met?
- If these conditions were not met, is there any reason to believe that the estimate of effect (size of the difference) between groups is not likely to be true?
Attrition Bias Update 01/14/2014
A colleague recently wrote us to ask us more about attrition bias. We shared with him that the short answer is that there is less conclusive research on attrition bias than on other key biases. Attrition does not necessarily mean that attrition bias is present and distorting statistically significant results. Attrition may simply result in a smaller sample size which, depending upon how small the remaining population is, may be more prone to chance due to outliers or false non-significant findings due to lack of power.
If randomization successfully results in balanced groups, if blinding is successful (including concealed allocation of patients to their study groups), if adherence is high, if protocol deviations are balanced and low, if co-interventions are balanced, if unbiased censoring rules are used, and if there are no differences between the groups except for the interventions studied, then it may be reasonable to conclude that attrition bias is not present even if attrition rates are large. Balanced baseline comparisons between completers provide further support for such a conclusion, as does comparability in reasons for discontinuation, especially if many categories are reported.
On the other hand, other biases may result in attrition bias. For example, imagine a comparison of an active agent to a placebo in a situation in which blinding is not successful. A physician who knows a patient is on placebo might encourage that patient to drop out of the study, resulting in biased attrition that, in sufficient numbers, could distort the results from what they would otherwise have been.
A Letter on This Topic: Attrition
Bias Caution: Non-differential Loss Between Groups Can Threaten
Validity
01/16/2011
Read our BMJ Rapid Response letter to a critical appraisal and quiz that we thought missed an important point about non-differential dropouts, our rationale and our recommendations for future reporting.
Attrition Bias and Baseline Characteristic Testing (Esp for Non-Dichotomous Variables)
05/19/2011; Update 02/11/2013
Not having complete information on all study subjects is a common problem in research. The key issue is whether those subjects for whom data is missing are similar to those for whom data is available. In other words, the question is whether reported outcomes might be distorted due to an imbalance in the groups for which we have information. As Schulz and Grimes state, “Any erosion…over the course of the trial from those initially unbiased groups produces bias, unless, of course, that erosion is random…”. [1] As of this date, we are not aware of a preferred way to handle this problematic area, and the effect of various levels of attrition remains unclear. [2,3]
We have previously summarized our position on performing sensitivity analyses when variables are dichotomous. Non-dichotomous data pose unique challenges. We think it is reasonable to perform a sensitivity analysis on subjects for whom data is available and for whom it is not. Others have recommended this approach. Dumville et al. state, “Attrition can introduce bias if the characteristics of people lost to follow-up differ between the randomised groups. In terms of bias, this loss is important only if the differing characteristic is correlated with the trial’s outcome measures.…we suggest it is informative to present baseline characteristics for the participants for whom data have been analysed and those who are lost to follow-up separately. This would provide a clearer picture of the subsample not included in an analysis and may help indicate potential attrition bias.”
Other suggestions regarding missing data through censoring have been provided to us by John M. Lachin, Sc.D., Professor of Biostatistics and Epidemiology, and of Statistics, The George Washington University (personal communication):
- Evaluate censoring by examining both administrative censoring and censoring due to loss-to-follow-up. Administrative censoring (censoring of subjects who enter a study late) may not result in significant bias. Censoring because of loss-to-follow-up or discontinuing is more likely to pose a threat to validity.
- Compare characteristics of losses (e.g., withdrawing consent, adverse events, loss to follow-up, protocol violations) versus completers (including administratively censored) within groups.
- Compare characteristics of losses (not administratively censored) between groups.
- Adjust group effect for factors in which groups differ.
There are some caveats that should be raised regarding this kind of sensitivity analysis. There may be other resulting imbalances between groups that are not measurable. Also, a finding of no differences in characteristics of the groups could be due to insufficient power to reveal true differences. And importantly, differences found could be due to chance.
However, if the groups appear to be similar, we think it may be reasonable to conclude that such sensitivity analyses suggest the groups remained balanced despite the number of discontinuations. If the groups remained balanced, then, depending on details of the study, the discontinuations may not have created any meaningful distortion of results.
However, even if they are not balanced, it may be that the results are dependable. Read our update on attrition bias.
References
1. Schulz KF, Grimes DA. Sample size slippages in randomised trials: exclusions and the lost and wayward. Lancet. 2002 Mar 2;359(9308):781-5. PubMed PMID: 11888606.
2. Dumville JC, Torgerson DJ, Hewitt CE. Reporting attrition in randomized controlled trials. BMJ. 2006 Apr 22;332(7547):969-71. Review. PubMed PMID: 16627519; PubMed Central PMCID: PMC1444839.
3. Hewitt CE, Kumaravel B, Dumville JC, Torgerson DJ; Trial attrition study group. Assessing the impact of attrition in randomized controlled trials. J Clin Epidemiol. 2010 Nov;63(11):1264-70. Epub 2010 Jun 22. PubMed PMID: 20573482.
Attrition Bias & A Biostatistician Weighs In: Dr. Steve Simon on "Why is a 20% dropout rate bad?"
12/05/2011; Update 02/11/2013
We have written numerous times about attrition bias. Large numbers of patients dropping out of studies, or unable to complete participation in them, tend to be one of the biggest barriers to passing critical appraisal screenings. This area is also one of the least understood in evaluating impact on outcomes, with a paucity of helpful evidence.
Biostatistician Steve Simon addresses dropout rates in this month’s newsletter in his helpful entry titled “Why is a 20% dropout rate bad?” Steve provides us with some math to tell us that, “If both the proportion of dropouts is small and the difference in prognosis between dropouts and completers is small, you are truly worry free.”
He also gives us help with differential loss: “The tricky case is when only one [proportion of dropouts] is small. You should be okay as long as the other one isn't horribly bad. So a small dropout rate is okay even with unequal prognosis between completers and dropouts as long as the inequality is not too extreme. Similarly, if the difference in prognosis is small, then any dropout rate that is not terribly bad (less than 30% is what I'd say), should leave you in good shape.”
He gives us a rule of thumb to go by: “Now it is possible to construct settings where a 10% dropout rate leads to disaster or where you'd be safe even with a 90% dropout rate, but these scenarios are unrealistic. My rule is don't worry about a dropout rate less than 10% except in extraordinary settings. A dropout rate of 30% or higher though, is troublesome unless you have pretty good inside information that the difference in prognosis between dropouts and completers is trivially small.”
Here's our current thinking on attrition bias.
You can read Steve’s full entry here and even sign-up to be on his mailing list:
http://www.pmean.com/news/201111.html#1
Quality of Studies: VIGOR
Why is it that Vioxx made the front page of The New York Times in December of 2005 when it was withdrawn from the market in 2004? Reason: it was discovered that the authors “removed” 3 patients with CV events from the data in the days preceding final hardcopy submission of the VIGOR study to the NEJM. Here are some key points made by the NEJM in an editorial entitled Expression of Concern: Bombardier et al., “Comparison of Upper Gastrointestinal Toxicity of Rofecoxib and Naproxen in Patients with Rheumatoid Arthritis,” N Engl J Med 2000;343:1520-8, published on the web 12/8/05 and in hard copy, N Engl J Med 2005;353:2813-4:
- The VIGOR study
was designed primarily to compare gastrointestinal events
in patients with rheumatoid arthritis randomly assigned to
treatment with rofecoxib (Vioxx) or naproxen (Naprosyn), but
data on cardiovascular events were also monitored.
- Three myocardial
infarctions, all in the rofecoxib group, were not included
in the
data submitted to the Journal in hardcopy.
- Until the end
of November 2005, the NEJM believed that these were late events
that were not known to the authors in time to be included
in the article published in the Journal on November 23, 2000.
- It now appears,
however, from a memorandum dated July 5, 2000, that was obtained
by subpoena in the Vioxx litigation and made available to
the NEJM, that at least two of the authors knew about the
three additional myocardial infarctions at least two weeks
before the authors submitted the paper version of their manuscript.
- Lack of inclusion
of the three events resulted in an understatement of the difference
in risk of myocardial infarction between the rofecoxib and
naproxen groups.
- The NEJM determined
from a computer diskette that some of these data were deleted
from the VIGOR manuscript two days before it was initially
submitted to the Journal on May 18, 2000.
- Taken together,
these inaccuracies and deletions call into question the integrity
of the data on adverse cardiovascular events in this article.
Merck's position
is that the additional heart attacks became known after the
publication's "cutoff" date for data to be analyzed
and were therefore not reported in the Journal article. To our
knowledge, NEJM has not responded to Merck's point.
In any event, without the 3 missing subjects the relative risk of myocardial infarction was 4.25 for rofecoxib versus naproxen, 95% CI (1.39 to 17.37). This is based on 17 MIs out of 2315 person-years of exposure for rofecoxib and 4 MIs out of 2336 person-years for naproxen.
Adding in the 3 missing subjects (for a new total of 20 MIs in the rofecoxib group) increases the relative risk to 5.00, 95% CI (1.68 to 20.13). This demonstrates how losing just a few subjects, even in a large study, can change results dramatically.
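A quick check of this arithmetic (the published relative risks came from the study's own time-to-event analysis, so crude incidence-rate ratios come close to, but do not exactly match, the reported figures):
```python
# Crude MI incidence-rate ratios in VIGOR, before and after adding the
# three events omitted from the submitted manuscript.
def rate_ratio(mi_rofecoxib, mi_naproxen, py_rof=2315, py_nap=2336):
    return (mi_rofecoxib / py_rof) / (mi_naproxen / py_nap)

print(round(rate_ratio(17, 4), 2))  # about 4.3; the study reported 4.25
print(round(rate_ratio(20, 4), 2))  # about 5.0; three added events
```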
For readers, the
important point is to look carefully to be sure that all randomized
patients were accounted for.
Avoiding
Overestimates of Benefit: Composite Endpoints in Cardiovascular
Trials
04/20/09
Composite endpoints
represent the grouping together of individual endpoints to serve
as a single outcome measure. They are frequently used in clinical
trials to reduce requirements for sample size. In other words,
composite endpoints — by adding together individual outcomes
— increase the overall event rates and, thus, the statistical
power of a study to demonstrate a statistical and clinically
meaningful difference between groups if one exists. Composite
endpoints also enable researchers to conduct studies of smaller
size and still reach what may be clinically meaningful outcomes.
It has been pointed out, however, that the trade-offs for this
increased power may include difficulties for readers in correctly
interpreting results.
Several investigators
[1,2] have pointed out that composite endpoints may be misleading
if the investigators —
- Include individual
outcomes that have differing importance to patients;
- Include individual
outcomes that have differing rates of occurrence; or,
- Do not include
rates for individual outcomes.
For example, in
cardiovascular trials the composite endpoint of cardiovascular
mortality, myocardial infarction and revascularization procedures
is frequently encountered. The reader is very likely to conclude
that the effect for meaningful outcomes is much greater than
the reported results based on the composite endpoint. If one
misunderstands that the apparent effect is driven largely by
revascularization — which is frequently driven by subjective
symptoms and subjective decision-making to perform the procedure
— rather than objective outcomes such as myocardial infarction
and death, then the reported composite endpoint is likely to
result in erroneous (falsely inflated) conclusions by the reader.
Lim and colleagues
[3] found in a review of 304 cardiovascular trials published
in 14 leading journals between January 2000 and January 2007
that 73% of trials reported composite primary outcomes. The total
number of individual events and the total number of events represented
by the composite outcome differed in 79% of trials. P values
for composite outcomes less than 0.05 were more frequently reported
than P values of 0.05 or greater. Additionally, death as an
individual endpoint made a relatively small contribution to
estimates of effect summarized by the trials’ composite
endpoints, whereas revascularization made a greater contribution.
Lim et al. recommend that authors report results for each individual
endpoint in addition to the composite endpoint so that readers
can ascertain the contribution of each individual endpoint.
Readers should bear in mind that safety outcomes, when reported as single events, can be made to appear “insignificant” since P values are frequently greater than 0.05. If investigators report efficacy results as composite outcomes, it may be reasonable to expect safety results to also be reported as composites.
Bottom Lines for
Recent Cardiovascular Studies (That Also Apply to Trials in
Other Areas):
1. Composite outcomes increase event rates and statistical power.
2. Composite outcomes in cardiovascular trials are frequent
and often comprise 3 to 4 individual end points.
3. Individual events frequently vary in clinical significance.
4. Meaningful differences between the total number of individual
events in a trial and those reported for the composite outcomes
are very common.
5. When studies include composite outcomes comprised of individual
outcomes of varying importance and frequency, interpreting results
becomes difficult for readers.
6. Interpretation becomes easier if authors include individual
outcomes along with the composite measures.
References
1. Freemantle N,
Calvert M, Wood J, Eastaugh J, Griffin C. Composite outcomes
in randomized trials: greater precision but with greater uncertainty?
JAMA. 2003;289:2554-9. [PMID: 12759327].
2. Ferreira-González I, Busse JW, Heels-Ansdell D, Montori
VM, Akl EA, Bryant DM, Alonso-Coello P, Alonso J, Worster A,
Upadhye S, Jaeschke R, Schünemann HJ, Permanyer-Miralda
G, Pacheco-Huergo V, Domingo-Salvany A, Wu P, Mills EJ, Guyatt
GH. Problems with use of composite end points in cardiovascular
trials: systematic review of randomised controlled trials. BMJ.
2007 Apr 14;334(7597):786. Epub 2007 Apr 2. [PMID: 17403713].
3. Lim E, Brown A, Helmy A, Mussa S, Altman DG. Composite outcomes
in cardiovascular research: a survey of randomized trials. Ann
Intern Med. 2008 Nov 4;149(9):612-7. [PMID: 18981486]
Confidence-Intervals,
Power & Meaningful Clinical Benefit:
Advice to Readers on How to Stop Worrying about Power and Start
Using Confidence Intervals &
Using Confidence Intervals to Evaluate Clinical Benefit of "Statistically
Significant*" Findings
(Special thanks to Brian Alper, MD, MSPH and Ted Ganiats,
MD for their help in understanding this issue.)
*[Important Note: Historically, if P<0.05, it has been the convention to say that the results were “ statistically significant”, i.e., statistical testing strongly argues against the null hypothesis (null hypothesis means that there truly is no difference between study samples). If P>0.05, the convention has been to say the results were “non-significant.” It is now preferred to state the exact P-value and avoid categorizing results as statistically significant or non-significant. However, use of the older conventions persists and some of the explanations below make use of the older terms since readers are certain to encounter results reported as “significant” and “non-significant.”]
Problems
with Non-Statistically Significant Findings
Research outcomes which are not statistically significant (also
referred to as “non-significant findings”) raise
the question, "Is there TRULY no difference, or were there
not enough people to show a difference if there is one?"
(This is known as beta- or Type II error.)
Power calculations are performed prior to a study to help investigators determine the number of people they should enroll in the study to try to detect a statistically significant difference if there is one. A power of >= 80% is conventional and provides some leeway for chance. Power calculations are generally performed only for the primary outcome. They entail a lot of assumptions.
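For readers curious about what such a calculation involves, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions; the event rates and thresholds are illustrative assumptions only.
```python
# Approximate sample size per group to compare two proportions
# (normal approximation; illustrative event rates only).
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # 1.96 for a two-sided alpha of 0.05
    z_beta = z.inv_cdf(power)            # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Assume a 10% control event rate and a hoped-for reduction to 5%:
print(n_per_group(0.10, 0.05))  # roughly 430 subjects per group
```
Note how sensitive the result is to the assumed event rates, which is one reason power calculations entail so many assumptions.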
Good News
About Power!
The good news for readers is that you don’t need to worry about power, since you can evaluate the conclusiveness of findings by using confidence intervals.
Here’s what
they are, and here’s how it’s done:
About Confidence
Intervals (CIs)
The results of a valid study represent an approximation of truth.
There might be other possible values that could equally approximate
truth. (What if the study had been done on Friday instead of
on Tuesday, for example? Maybe the difference in outcomes would
be an absolute 4 percent and not 5 percent.) In recognition of this, confidence intervals represent a range of statistically plausible results (within a 95% chance) consistent with an outcome from a single valid study. (As
with all allowances for chance findings, 95 percent is conventional.)
You can apply confidence intervals to any measure of outcomes
such as an odds ratio or absolute risk reduction (ARR).
This is how confidence
intervals are reported:
Example: ARR
= 5%; 95% CI (3% to 7%)
How to
Use Confidence Intervals to Determine Statistical Significance
Absolute
Risk Reduction and Relative Risk Reduction
For measures reported as percentages, if the range includes
zero, the outcomes are not statistically significant.
Relative
Risk (aka Risk Ratio) and Odds Ratio
For measures reported as ratios, if the range includes 1,
the outcomes are not statistically significant.
How to
Use Confidence Intervals to Determine Conclusiveness of Non-significant
Findings
And if something is not statistically significant (also referred
to as non-significant or NS findings), you don’t know
if there truly is no difference, or whether there were not enough
people to show a difference if there is one.
You can look to
the CIs to help you with this situation. But first you want
to decide what you would consider to be your minimum requirement
for a clinically significant outcome (difference between outcomes
in the intervention and comparison groups). This is a judgment
call.
Let’s assume
we are looking at a study, the primary outcome for which is
absolute reduction in mortality. One might reasonably conclude
that an outcome of 1 percent or more is, indeed, a clinically
meaningful benefit.
[Below is a text
explanation. Pictures tell this best, however. Click here to
view a PDF of what this looks like graphically. Note
that the PDF starts out first with how to determine clinical
significance of statistically significant outcomes and then
demonstrates how to determine conclusiveness of non-significant
findings.]
Example:
Clinical Significance Goal
>=1% absolute reduction in mortality
For Non-Significant
Findings:
Example
1
- ARR = 2%;
95% CI (-1% to 5%)
- The upper
boundary tells you it is possible that the true result WOULD
meet your requirements for clinical significance –
thus, from that perspective this trial is inconclusive about
NO DIFFERENCE BETWEEN GROUPS - you do not know if the trial
was insufficiently powered (false negative due to insufficient
number of people to show a statistically significant difference
if there is one)
Example
2
- ARR = 0%; 95% CI (-0.5% to 0.5%)
- The upper
boundary does not reach your goal – therefore, this
can be considered sufficient evidence that there is no difference
between the groups that you would consider clinically significant
How to Use Confidence Intervals to Determine Meaningful Clinical Benefit of Statistically Significant Findings
Again, you can also use confidence intervals to determine whether
a result from a valid study is of meaningful clinical benefit.
Requirements
for Meaningful Clinical Benefit
Remember that outcomes of clinical significance are those which
benefit patients in some way in the areas of morbidity, mortality,
symptom relief, physical or emotional functioning or health-related
quality of life. Intermediate markers are assumed to benefit
patients in these areas, but they may not - thus, a direct causal
chain of benefit must be proved to avoid waste and potential
patient harms occurring as unintended consequences. Meaningful
clinical benefit is a combination of benefits in a clinically
significant area along with the size of the results.
As with evaluating
the conclusiveness of a non-significant finding, you apply judgment
to set your minimum requirement for meaningful clinical significance.
Using the same example of your choosing 1 percent absolute reduction
in mortality as meaningful clinical benefit:
Example:
Clinical Significance Goal
>=1% absolute reduction in mortality
For Statistically
Significant Findings:
Example
1
- ARR = 2%; 95% CI (0.5% to 3.5%)
- The lower
boundary tells you it is possible that the true result will
NOT meet your requirements for clinical significance –
thus, from that perspective this trial is inconclusive
Example
2
- ARR = 2%; 95% CI (1% to 3%)
- The lower
boundary reaches your goals for clinical significance –
therefore, this can be considered sufficient evidence of
benefit
Again, pictures
probably tell this best. Click here to view the PDF.
The Authors
Did Not Report CIs?
If you can create a 2 x 2 table from the study data, you can compute them yourself. Look for an online calculator. Many are available and easy to use.
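If you would rather see the arithmetic, below is a minimal sketch using the normal (Wald) approximation for an absolute risk reduction; the counts are hypothetical, and online calculators may use more exact methods.
```python
# 95% confidence interval for an absolute risk reduction (ARR)
# from the four cells of a 2x2 table (normal/Wald approximation;
# the counts below are hypothetical).
from math import sqrt

def arr_ci(events_ctrl, n_ctrl, events_treat, n_treat, z=1.96):
    cer = events_ctrl / n_ctrl    # control event rate
    eer = events_treat / n_treat  # experimental event rate
    arr = cer - eer
    se = sqrt(cer * (1 - cer) / n_ctrl + eer * (1 - eer) / n_treat)
    return arr, arr - z * se, arr + z * se

arr, lo, hi = arr_ci(events_ctrl=20, n_ctrl=200, events_treat=10, n_treat=200)
print(f"ARR = {arr:.1%}, 95% CI ({lo:.1%} to {hi:.1%})")
```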
Evaluate
Definitions for Outcomes
And remember, ensure you agree with the authors’ definitions of the outcomes, especially if they are using a term like “improved,” “success,” or “failure.” Is a three-point change on a 200-point scale really a meaningful clinical difference that should define success? You get to be the judge.
Why Statements About Confidence Intervals Often Result in Confusion Rather Than Confidence
11/20/2013
A recent paper by McCormack reminds us that authors may mislead readers by making unwarranted "all-or-none" statements and that readers should be mindful of this and carefully examine confidence intervals.
When examining results of a valid study, confidence intervals (CIs) provide much more information than p-values. The results are statistically significant if a confidence interval does not touch the line of no difference (zero in the case of measures of outcomes expressed as percentages such as absolute risk reduction and relative risk reduction and 1 in the case of ratios such as relative risk and odds ratios). However, in addition to providing information about statistical significance, confidence intervals also provide a plausible range for possibly true results within a margin of chance (5 percent in the case of a 95% CI). While the actual calculated outcome (i.e., the point estimate) is “the most likely to be true” result within the confidence interval, having this range enables readers to judge, in their opinion, if statistically significant results are clinically meaningful.
However, as McCormack points out, authors frequently do not provide useful interpretation of the confidence intervals, and authors at times report different conclusions from similar data. McCormack presents several cases that illustrate this problem, and this paper is worth reading.
As an illustration, assume two hypothetical studies report very similar results. In the first study of drug A versus drug B, the relative risk for mortality was 0.9, 95% CI (0.80 to 1.05). The authors might state that there was no difference in mortality between the two drugs because the difference is not statistically significant. However, the upper confidence limit only slightly crosses the line of no difference, so the confidence interval tells us that it is possible that a difference would have been found if more people were studied; a flat statement of “no difference” is therefore misleading. A better statement for the first study would include the confidence intervals and a neutral interpretation of what the results for mortality might mean. Example—
“The relative risk for overall mortality with Drug A compared to Drug B was 0.9, 95% CI (0.80 to 1.05). The confidence interval tells us that Drug A may reduce mortality by up to a relative 20% (i.e., the relative risk reduction), but may increase mortality, compared to Drug B, by approximately 5%.”
In a second study with similar populations and interventions, the relative risk for mortality might be 0.93, 95% CI (0.83 to 0.99). In this case, some authors might state, “Drug A reduces mortality.” A better statement for this second hypothetical study would ensure that the reader knows that the upper confidence limit is close to the line of no difference and, therefore, close to non-significance. Example—
“Although the mortality difference is statistically significant, the confidence interval indicates that the relative risk reduction may be as great as 17% but may be as small as 1%.”
The Bottom Line
- Remember that p-values refer only to statistical significance and confidence intervals are needed to evaluate clinical significance.
- Watch out for statements containing the words “no difference” in the reporting of study results. A finding of no statistically significant difference may be a product of too few people studied (or insufficient time).
- Watch out for statements implying meaningful differences between groups when one of the confidence intervals approaches the line of no difference.
- None of this means anything unless the study is valid. Remember that bias tends to favor the intervention under study.
If authors do not provide you with confidence intervals, you may be able to compute them yourself, if they have supplied you with sufficient data, using an online confidence interval calculator. For our favorites, search "confidence intervals" at our web links page: http://www.delfini.org/delfiniWebSources.htm
Reference
McCormack J, Vandermeer B, Allan GM. How confidence intervals become confusion intervals. BMC Med Res Methodol. 2013 Oct 31;13(1):134. [Epub ahead of print] PubMed PMID: 24172248.
Confidence Intervals: Overlapping Confidence Intervals—A Clarification
11/28/2011
Confidence intervals are useful in studies that compare the difference in outcomes between two interventions, because they provide a range of values (representing the estimate of effect) within which the true difference between the two interventions is likely to be found—assuming that the study is valid.
However, a common error is to draw conclusions based on overlapping 95% confidence intervals when the results in the two groups are compared. The error is to conclude that the means of two different groups are not statistically significantly different from each other. The error frequently occurs when the investigators in such cases do not calculate the confidence intervals for the difference between the groups. For example, two groups of patients with diabetes received two different drug regimens and hemoglobin A1c measurements were assessed. Results are presented in the table below.
Table 1. Example of Overlapping 95% CIs With Statistical Differences
Group #1 receiving drug A: hemoglobin A1c 7.4, 95% CI (7 to 7.8)
Group #2 receiving drug B: hemoglobin A1c 8.0, 95% CI (7.6 to 8.4)
P-value for the difference in means(a): P = 0.0376
(a) For a detailed mathematical explanation about the problems of variability that occur when comparing two means, and details about calculating the P-value, see Austin et al. [2]
As pointed out by Altman, “In comparative studies, confidence intervals should be reported for the differences between groups, not for the results of each group separately.”[1]
In theory, two treatment groups can have a statistically significant difference in mean effects at the 5% level of significance, with an overlap of as much as 29% between the corresponding 95% CIs. [2,3,4] Calculations illustrating 6 cases of statistically significant differences in groups with overlapping 95% CIs are shown in Table 2.
Table 2. Percent of Overlapping of 95% CIs and P-Values For Differences Between Groups [2]
Percent overlap | P-value for the difference
0% | .0056
5% | .0085
10% | .0126
15% | .0185
20% | .0266
25% | .0376
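To see why the overlap rule of thumb fails, here is a minimal sketch that approximately reconstructs the Table 1 P-value from each group's confidence interval, assuming the intervals were built with the normal approximation.
```python
# Reconstruct the test for a difference in means from each group's 95% CI
# (normal approximation), using the values in Table 1.
from math import sqrt, erf

def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a normal-approximation 95% CI."""
    return (upper - lower) / (2 * z)

se1 = se_from_ci(7.0, 7.8)   # group 1: A1c 7.4, 95% CI (7 to 7.8)
se2 = se_from_ci(7.6, 8.4)   # group 2: A1c 8.0, 95% CI (7.6 to 8.4)

z = (8.0 - 7.4) / sqrt(se1**2 + se2**2)     # test statistic for the difference
p = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # two-sided p-value
print(f"z = {z:.2f}, P = {p:.4f}")
# P is about 0.038 despite the clearly overlapping CIs (Table 1: 0.0376).
```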
References
1. Altman DG. Statistics and ethics in medical research. In: Statistics in practice. London: British Medical Association; 1982. Chapter VI.
2. Austin P, Hux J. A brief note on overlapping confidence intervals. Journal of Vascular Surgery. 2002; 36, 1, 194-195.
3. Payton ME, Greenstone MH, Schenker N. Overlapping confidence intervals or standard error intervals: What do they mean in terms of statistical significance? Journal of Insect Science. 2003; 3, 34.
4. Odueyungbo A, Thabane L, Markle-Reid M. Tips on overlapping confidence intervals and univariate linear models. Nurse Res. 2009;16(4):73-83. Review. PubMed PMID: 19653548.
Primary and Secondary Outcomes: Significance Issues
08/08/2011
I am a fan of statistician Steve Simon. You can sign up for his newsletter here: http://www.pmean.com/. I recently wrote him to ask his opinion about significant secondary outcomes when the primary is not statistically significant. Here's the essence of my letter to him, and his response follows. At the end of the day, I think it ends up being like many critical appraisal conundrums: "It depends."
From Sheri Strite to Steve Simon: Excerpts
Assume that my examples represent studies done otherwise “perfectly” to be at low risk of bias with biological plausibility. Let us say for reasons having to do with science fiction (and trying to keep my question completely statistical and logical) that these will be the only studies ever done on their topic using these classes of agents, and I need an option for patient care. So the answer for me can’t be, “Wait for confirmatory studies.”
I have heard off and on that, if a primary outcome is not statistically significant*, you should just discount any statistically significant secondary outcomes. I have never been able to find or to conceptualize why this should be so. I found the following written by you.
*[Important Note: Historically, if P<0.05 it has been the convention to say that the results were “ statistically significant”, i.e., statistical testing strongly argues against the null hypothesis (null hypothesis means that there truly is no difference between study samples). If P>0.05 the convention has been to say the results were “non-significant.” It is now preferred to state the exact P-value and avoid categorizing results as statistically significant or non-significant. However, use of the older conventions persists and some of the explanations below make use of the older terms since readers are certain to encounter results reported as “significant” and “non-significant.”]
“Designating primary outcome variables
When you need to examine many different outcome measures in a single research study, you still may be able to keep a narrow focus by specifying a small number of your outcome measures as primary variables. Typically, a researcher might specify 3-5 variables as primary. The fewer primary outcome variables, the better. You would then label as secondary those variables not identified as primary outcome variables.
“When you designate a small number of primary variables, you are making an implicit decision. The success or failure of your intervention will be judged almost entirely by the primary variables. If you find that none of the primary variables are statistically significant, then you will conclude that the intervention was not successful. You would still discuss any significant findings among your secondary outcome variables, but these findings would be considered tentative and would require replication.”
But I am not getting the why. And is this necessarily so? Read on. I’d be grateful if I could give you a couple of short scenarios.
Please keep in mind that my goals are as a reviewer of evidence (generally on efficacy or safety of therapies) and not as a researcher, so a helpful answer to me would be more in the nature of what I can use, if clinically meaningful, and not how I might redesign my study to make more sense. I am working with what’s out there and not creating new knowledge.
Background
Probably 99 percent of the time, the studies I review have a single primary outcome. The other 1 percent has 2. So I never see 3 to 5. But then I always see a multiplicity of outcomes defined as secondary, all of which seems somewhat arbitrary to me.
Scenario
I read a study comparing Drug X to placebo for prevention of cardiovascular events in type 1 diabetics.
The primary outcome is overall mortality.
Let us say that the researchers chose 4 secondary outcomes:
— death from stroke
—stroke
—death from MI
—MI
Let us assume that Drug X really works. Let us say that we have non-significant findings for overall mortality, which I realize could be a simple case of lack of power.
Let’s say that stroke and MI were statistically significant, favoring Drug X over placebo. Is it really true that you believe I should consider these findings tentative? I find it hard to think why that should be. They are related and the lack of significant mortality outcome could again be a power issue.
If I am correct that I can consider these clinically useful findings provided the effect size meets my minimum, then what about a scenario in which a researcher chose a really unlikely primary outcome (even a goofy one), but reasonable secondary outcomes? Setting aside the fact that such a choice would give me pause about the rigor of the study—setting this aside just to focus on statistical logic—what if in an otherwise valid study—
Drug Y versus Placebo
Clinical Question: Is Drug Y effective in weight reduction over placebo in women between the ages of 20 through 30?
Primary outcome:
—Death, not statistically significant
Secondary outcomes:
—Weight loss of > 10 pounds, statistically significant and clinically meaningful
—Clinically meaningful change in BMI, statistically significant and clinically meaningful
It seems to me that secondary outcomes should be able to be used with as much confidence as primary outcomes given certain factors such as attention to chance effects, relatedness of several outcomes, etc.
If I am wrong about this, can you enlighten me or steer me to some helpful resources?
Most gratefully yours, Sheri
And here is Steve's response:
http://www.pmean.com/news/201105.html#2
Progression Free Survival (PFS) in Oncology Trials
02/01/2015
Progression Free Survival (PFS) continues to be a frequently used endpoint in oncology trials. It is the time from randomization to the first of either objectively measured tumor progression or death from any cause. It is a surrogate outcome because it does not directly assess mortality, morbidity, quality of life, symptom relief or functioning. Even if a valid trial reports a statistically significant improvement in PFS and the reported effect size is large, PFS only provides information about biologic activity of the cancer and tumor burden or tumor response. Even though correlational analysis has shown associations between PFS and overall survival (OS) in some cancers, we believe that extreme caution should be exercised when drawing conclusions about efficacy of a new drug. In other words, PFS evidence alone is insufficient to establish a clinically meaningful benefit for patients or even a reasonable likelihood of net benefit. Many tumors do present a significant clinical burden for patients; however, clinicians frequently mistakenly believe that simply having a reduction in tumor burden equates with clinical benefit and that delaying the growth of a cancer is a clear benefit to patients.
PFS has a number of limitations which increase the risk of biased results and make those results difficult for readers to interpret. Unlike OS, PFS does not "identify" the time of progression, since assessment occurs only at scheduled visits, and it is likely to overestimate time to progression. Also, it is common in PFS studies to stop or add anti-cancer therapies prior to documentation of tumor progression (also a common problem in trials of OS), which may confound outcomes. Further, measurement errors may occur because of complex issues in tumor assessment. Adequate blinding is required to reduce the risk of performance and assessment bias. Other methodological issues include complex calculations to adjust for missed assessments and the need for complete data on adverse events.
Attrition and assessment bias are made even more difficult to assess in oncology trials using time-to-event methodologies. The intention-to-treat principle requires that all randomly assigned patients be observed until they experience the end point or the study ends. Optimal follow-up in PFS trials is to follow each subject to both progression and death.
Delfini Comment
FDA approval based on PFS may result in acceptance of new therapies with greater harms than benefits. The limitations listed above, along with a concern that investigators may be less willing to conduct trials with OS as an endpoint once a drug has been approved, suggest that we should use great caution when considering evidence from studies using PFS as the primary endpoint. We believe that PFS should be thought of as any other surrogate marker—i.e., it represents extremely weak evidence (even in studies judged to be at low risk of bias) unless it is supported by acceptable evidence of improvements in quality of life and overall survival.
When assessing the quality of a trial using PFS, we suggest the following:
- Remember that although in some cases PFS appears to be predictive of OS, in many cases it is not.
- In many cases, improved PFS is accompanied by unacceptable toxicity and unacceptable changes in quality of life.
- Improved PFS results of several months may be due to methodological flaws in the study.
- As with any clinical trial, assess the trial reporting PFS for bias such as selection, performance, attrition and assessment bias.
- Compare characteristics of losses (e.g., due to withdrawing consent, adverse events, loss to follow-up, protocol violations) between groups and, if possible, between completers and those initially randomized.
- Pay special attention to censoring due to loss-to-follow-up. Administrative censoring (censoring of subjects who enter a study late and do not experience an event) may not result in significant bias, but non-administrative censoring (censoring because of loss-to-follow-up or discontinuing) is more likely to pose a threat to validity.
References
Carroll KJ. Analysis of progression-free survival in oncology trials: some common statistical issues. Pharm Stat. 2007 Apr-Jun;6(2):99-113. Review. PubMed PMID: 17243095.
D'Agostino RB Sr. Changing end points in breast-cancer drug approval—the Avastin story. N Engl J Med. 2011 Jul 14;365(2):e2. doi: 10.1056/NEJMp1106984. Epub 2011 Jun 27. PubMed PMID: 21707384.
Fleming TR, Rothmann MD, Lu HL. Issues in using progression-free survival when evaluating oncology products. J Clin Oncol. 2009 Jun 10;27(17):2874-80. doi: 10.1200/JCO.2008.20.4107. Epub 2009 May 4. PubMed PMID: 19414672.
Lachin JM. (John M. Lachin, Sc.D., Professor of Biostatistics and Epidemiology, and of Statistics, The George Washington University personal communication)
Lachin JM. Statistical considerations in the intent-to-treat principle. Control Clin Trials. 2000 Jun;21(3):167-89. Erratum in: Control Clin Trials 2000 Oct;21(5):526. PubMed PMID: 10822117.
Adjusting for Multiple Comparisons
05/22/2012
Frequently studies report results that are not the primary or secondary outcome measures—sometimes because the finding is not anticipated, is unusual or judged to be important by the authors. How should these findings be assessed? A common belief is that if outcomes are not pre-specified, serious attention to them is not warranted. But is this the case? Kenneth J. Rothman in 1990 wrote an article that we feel is very helpful in such situations.[1]
- Rothman points out that making statistical adjustments for multiple comparisons is similar to the problem of statistical significance testing, where the investigator uses the P-value to estimate the probability of a study demonstrating an effect size as great as or greater than the one found in the study, given that the null hypothesis is true—i.e., that there is truly no difference between the groups being studied (with alpha as the arbitrary cutoff for statistical significance, frequently set at 5%). Obviously, if the risk of rejecting a truly null hypothesis is 5% for every hypothesis examined, then examining multiple hypotheses will generate a larger number of falsely positive statistically significant findings as the number of hypotheses examined increases (a short numerical illustration follows this list).
- Adjusting for multiple comparisons is thought by many to be desirable because it will result in a smaller probability of erroneously rejecting the null hypothesis. Rothman argues this “pay for peeking” at more data by adjusting P-values with multiple comparisons is unnecessary and can be misleading. Adjusting for multiple comparisons might be paying a penalty for simply appropriately doing more comparisons, and there is no logical reason (or good evidence) for doing statistical adjusting. Rather, the burden is on those who advocate for multiple comparison adjustments to show there is a problem requiring a statistical fix.
- Rothman’s conclusion: It is reasonable to consider each association on its own for the information it conveys—he believes that there is no need for adjusting P-values with multiple comparisons.
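As the illustration promised above, here is a minimal sketch of the inflation at issue, assuming independent tests of truly null hypotheses (real study outcomes are rarely fully independent, so this is an upper-end illustration only):
```python
# Chance of at least one falsely "significant" result among k truly null,
# independent hypotheses, each tested at alpha = 0.05.
for k in (1, 5, 10, 20):
    print(k, "tests:", f"{1 - 0.95 ** k:.0%}")
# 1 test: 5%; 5 tests: 23%; 10 tests: 40%; 20 tests: 64%
```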
Delfini Comment: Reading his paper is a bit difficult, but he makes some good points about our not really understanding what chance is all about, and about how evaluating study outcomes for validity requires critical appraisal for the assessment of bias and other factors, as well as the use of statistics for evaluating chance effects.
Reference
Rothman KJ. No adjustments are needed for multiple comparisons. Epidemiology. 1990 Jan;1(1):43-6. PubMed PMID: 2081237.
Getting
“Had” by P-values: Confidence Intervals vs P-values
in Evaluating Safety Results: Low-molecular-weight Heparin (LMWH)
Example
In one of our DelfiniClicks we have pointed out that confidence intervals (CIs) can be very
useful when examining results of randomized controlled trials
(Confidence-Intervals, Power
& Meaningful Clinical Benefit). The first step
in examining safety results is to decide what you consider to
be a range for clinically significant outcomes (i.e., the difference
between outcomes in the intervention and comparison group).
This is a judgment call. Then examine the 95% CI to see if a
clinically significant difference is included in the confidence
interval. If it is, the study has not excluded the possibility
of a clinically significant harm even if the authors state there
is no difference (usually stated as “no difference”
based on a non-significant p-value.) It is important to remember
that a non-significant p-value can be very misleading in this
situation.
This can be illustrated
by an interesting conversation we recently had with an orthopedic
surgeon who felt he couldn’t trust the medical literature
to guide him because it gave him “misleading information.”
He based his conclusion on a study he read (he wasn’t
sure which study it was) regarding bleeding in orthopedic surgery.
After talking with him, we searched for studies that may have
led to his conclusion and found the following study which illustrates
why CIs are preferable to p-values in evaluating safety results
and possibly why he was misled.
Case
Study: An orthopedic surgeon reads an article comparing
outcomes, including bleeding rates, between fondaparinux and
enoxaparin in orthopedic surgery and sees the following statement
by the authors in the Abstract section of
the paper: “The two groups did not differ in frequency
of death or clinically relevant bleeding.” [1]
He looks at the Results section of the paper and reads the
following: “The number of patients who had major bleeding
did not differ between groups (p=0.11).” He knows that
if the p-value is greater than 0.05, the differences are not
considered statistically significant, and he concludes that
there is no difference in bleeding between the groups. His
confidence is shaken when he switches to fondaparinux and
his patients experience increased postoperative bleeding.
Let’s evaluate
this study’s bleeding rates using confidence intervals.
One might reasonably conclude that an outcome of 1 percent or
more difference between the groups is, indeed, a clinically
meaningful difference in bleeding:
- The actual rates for major bleeding were 47/1140 (4.1%) in the fondaparinux group vs 32/1133 (2.8%) in the enoxaparin group, up to day 11, a difference of 1.3%, p=0.11.
- But CIs provide more information: the absolute risk increase (ARI) with fondaparinux was 1.3%, but the 95% CI was (-0.3% to 2.9%), and since the true difference could be as great as 2.9% (i.e., clinically relevant), the authors’ conclusions are misleading.
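Here is a minimal sketch of that calculation from the reported counts; we use the normal (Wald) approximation, so the bounds may differ slightly from those quoted above, which may have been computed by another method.
```python
# Absolute risk increase (ARI) in major bleeding, fondaparinux vs
# enoxaparin, from the reported counts (normal/Wald 95% CI; exact
# bounds vary slightly with the method used).
from math import sqrt

p_f = 47 / 1140   # 4.1% major bleeding with fondaparinux
p_e = 32 / 1133   # 2.8% with enoxaparin
ari = p_f - p_e
se = sqrt(p_f * (1 - p_f) / 1140 + p_e * (1 - p_e) / 1133)
lo, hi = ari - 1.96 * se, ari + 1.96 * se
print(f"ARI = {ari:.1%}, 95% CI ({lo:.1%} to {hi:.1%})")
# The interval includes differences well above a 1% clinical threshold,
# even though p = 0.11 is "not significant."
```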
The Cochrane Handbook
summarizes this problem nicely:
"A common
mistake when there is inconclusive evidence is to confuse
‘no evidence of an effect’ with ‘evidence
of no effect.’ When there is inconclusive evidence,
it is wrong to claim that it shows that an intervention has
‘no effect’ or is ‘no different’ from
the control intervention. It is safer to report the data,
with a confidence interval, as being compatible with either
a reduction or an increase in the outcome. When there is a
‘positive’ but statistically non-significant trend,
authors commonly describe this as ‘promising,’
whereas a ‘negative’ effect of the same magnitude
is not commonly described as a ‘warning sign.’
Authors should be careful not to do this." [2]
Comments:
Following the Lassen study referenced above, others confirmed the increased bleeding rate, leading to re-operation and other significant bleeding, with fondaparinux vs enoxaparin. [3]
Click here for our
primer on confidence intervals.
When investigators
provide p-values but not confidence intervals, readers can quickly
calculate the 95% CIs if the outcomes are dichotomous and the
investigators report the actual rates of events, as in the example
above, by using the calculator available at:
http://www.graphpad.com/quickcalcs/NNT1.cfm
Also, see our web
links for other sources (search “confidence intervals”):
http://www.delfini.org/delfiniWebSources.htm
References:
- Lassen MR, Bauer
KA, Eriksson BI, Turpie AG. Postoperative fondaparinux versus
preoperative enoxaparin for prevention of venous thromboembolism
in elective hip replacement surgery: a randomised double-blind
comparison. Lancet. 2002;359:1715- 20. [PMID: 12049858]
- Higgins JPT,
Green S, editors. 9.7 Common errors in reaching conclusions.
Cochrane Handbook for Systematic Reviews of Interventions
4.2.6 [updated September 2006]. http://www.cochrane.org/resources/handbook/hbook.htm
(accessed 22nd January 2008).
- Vormfelde SV.
Comment on: Lancet. 2002 May 18;359(9319):1710-1. Lancet.
2002 Nov 23;360(9346):1701. PMID 12457831.
A Cautionary Tale of Harms versus Benefits: Misleading Findings Due to Potentially Inadequate Data Capture — Aprotinin Example
05/22/08
Assessing safety
of interventions is frequently challenging for many reasons,
and it is made even more so when data is missing. It is easy
to draw conclusions about the clinical usefulness of new interventions
from studies that have only limited outcome measures without
noticing what is missing.
Aprotinin is a
recent example of a drug which was approved by the FDA on the
basis of reduced bleeding in coronary artery bypass graft (CABG)
surgery and which was quickly adopted by surgeons, but with
what now appears to be outcomes of greater harms than benefits.
There now appears to be increased mortality in patients receiving
aprotinin even though there is a decreased need for blood transfusions.
Aprotinin received FDA approval in 1993 for use in CABG surgery to decrease blood loss. However, observational studies in 2006 and 2007 reported increased mortality with aprotinin [1,2]. A 2007 Cochrane Review
[3] of 211 RCTs reported that patients receiving aprotinin were
less likely to have red blood cell transfusions than were those
receiving lysine analogues, tranexamic acid (TXA), and epsilon
aminocaproic acid (EACA). When the pooled estimates from the
head-to-head trials of the two lysine analogues were combined
and compared to aprotinin alone, aprotinin appeared superior
in reducing the need for red blood cell transfusions: RR 0.83
(95% CI 0.69 to 0.99). The Cochrane review concluded that aprotinin
may be superior to the lysine analogues TXA and EACA in reducing
blood loss and the need for transfusion of red cells in patients
undergoing cardiac surgery. The Cochrane review, however, was
limited by inclusion of what appear to be studies with limited
or no mortality reporting.
In contrast, in May 2008, the Blood Conservation Using Antifibrinolytics in a Randomized Trial (BART) study [4], which compared massive postoperative bleeding rates in patients undergoing high-risk cardiac surgery treated with aprotinin versus the lysine analogues tranexamic acid and aminocaproic acid, reported decreased massive bleeding but increased mortality in patients receiving aprotinin. The trial was terminated early because of a higher rate of death at 30 days in patients receiving aprotinin.
- 74 patients
(9.5%) in the aprotinin group had massive bleeding, as compared
with 93 (12.1%) in the tranexamic acid group and 94 (12.1%)
in the aminocaproic acid group (relative risk in the aprotinin
group for both comparisons, 0.79; 95% confidence interval
[CI], 0.59 to 1.05).
- At 30 days,
the rate of death from any cause was 6.0% in the aprotinin
group, as compared with 3.9% in the tranexamic acid group
(relative risk, 1.55; 95% CI, 0.99 to 2.42) and 4.0% in the
aminocaproic acid group (relative risk, 1.52; 95% CI, 0.98
to 2.36).
- The relative
risk of death in the aprotinin group, as compared with that
in both groups receiving lysine analogues, was 1.53 (95% CI,
1.06 to 2.22).
The authors concluded
that —
In summary, despite
the possibility of a modest reduction in the risk of massive
bleeding, the strong and consistent negative mortality trend
associated with aprotinin as compared with lysine analogues
precludes its use in patients undergoing high-risk cardiac
surgery.
Delfini Comments
Given a relative risk potentially as high as roughly 2 (meaning that those receiving aprotinin may have roughly twice the likelihood of death of those receiving lysine analogues), it is likely, based on the BART study, that aprotinin will no longer be used in high-risk, and perhaps all, cardiac surgery because of what appears to be increased mortality with aprotinin not seen with the lysine analogues.
And so what possibly
explains this conflict in findings? While it is possible that
the results in the BART study are due to chance, that seems
unlikely given a) the previously observed findings in the 2006
and 2007 observational studies, and b) the consistency of results
in comparing aprotinin against each agent.
- The Cochrane
review of 113 studies, many of low quality, failed to detect
the increased mortality with aprotinin. It is not clear why
the systematic review did not detect the increased mortality
trend, but it may be explained by the Cochrane group’s
inclusion of studies not evaluating or incompletely reporting
mortality data.
- A lesson
from this is that pooling of data in secondary studies
may fail to identify important safety issues if the studies
are small or if outcomes are infrequent or insufficiently
reported.
- The aprotinin
story appears to be an example of how a large, well-designed
and conducted RCT paying close attention to adverse events,
identified a meaningful increase in mortality that a meta-analysis
of many small RCTs of variable quality did not detect. Small,
low-quality RCTs and meta-analyses of small, low-quality RCTs
may distort results because of various deficiencies and biases,
including absence of safety findings due to small sample size
or incomplete reporting of outcomes.
And so what can a diligent reader do? Our advice is to carefully consider whether primary and secondary outcomes in clinical trials are sufficient in terms of providing evidence regarding benefits and risks. If outcome measures are few and are all from small studies or meta-analyses of small studies, it is possible that clinically important harms will not be detected. Uncertainty is reduced when large RCTs confirming results of earlier, smaller studies become available, or, as in the case of aprotinin, when a large RCT identifies meaningful adverse events.
References
1. Mangano DT, Tudor IC, Dietzel C. The risk associated with aprotinin in cardiac surgery. N Engl J Med 2006;354:353-65.
2. Mangano DT, Miao Y, Vuylsteke A, et al. Mortality associated with aprotinin during 5 years following coronary artery bypass graft surgery. JAMA 2007;297:471-9.
3. Henry DA, Carless PA, Moxey AJ, et al. Anti-fibrinolytic use for minimising perioperative allogeneic blood transfusion. Cochrane Database Syst Rev 2007;4:CD001886.
4. Fergusson DA, Hébert PC, Mazer CD, et al. A comparison of aprotinin and lysine analogues in high-risk cardiac surgery. N Engl J Med 2008;358:2319-31.
When Is a Measure of Outcomes Like a Coupon for a Diamond Necklace?
03/20/2013
For those of you who struggle with the fundamental difference between absolute risk reduction (ARR) and relative risk reduction (RRR) and their counterparts, absolute and relative risk increase (ARI/RRI), we have always explained that knowing only the RRR or the RRI, without other quantitative information about the frequency of events, is akin to knowing that a store is having a half-off sale—but when you walk in, you find that they aren't posting the actual price! And so your question is: 50 percent off of what?
You should ask yourself the same question whenever you are provided with a relative measure (and if you aren't told whether the measure is relative or absolute, you are safer assuming it is relative). Below is a link to a great short cartoon that turns the lens a little differently and might help.
However, we will add that, in our opinion, ARR alone isn't fully informative either, nor is its kin the number-needed-to-treat (NNT), nor, for ARI, the number-needed-to-harm (NNH). A 5 percentage-point reduction in risk may be perceived very differently when "10 people out of a hundred benefit with one intervention compared to 5 with placebo" than when "95 people out of a hundred benefit with one intervention as compared to 90 with placebo." As a patient, I might be less willing to expose myself to side effects if it is highly likely I am going to improve without treatment, for example. Providing this full information, for critically appraised studies that are deemed valid, may best provide patients with information that helps them make choices based on their own needs and requirements, including their values and preferences.
We think that anyone involved in health care decision-making—including the patient—is best helped by knowing the event rates for each of the groups studied, i.e., the numerators and denominators for the outcome of interest by group. These are the four numbers of the 2-by-2 table used to calculate many statistics. (A small sketch contrasting the two scenarios above follows.)
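As a minimal sketch (the function is ours; the event rates are the hypothetical ones from the two scenarios above), the same 5-point absolute difference can correspond to very different relative differences:

def abs_and_rel_difference(rate_tx, rate_ctl):
    # Both rates are fractions (events per person).
    abs_diff = (rate_tx - rate_ctl) * 100             # percentage points
    rel_diff = (rate_tx - rate_ctl) / rate_ctl * 100  # percent of control rate
    return abs_diff, rel_diff

# Scenario 1: 10 per 100 benefit with treatment vs 5 per 100 with placebo
print(abs_and_rel_difference(0.10, 0.05))  # ~5.0 points absolute; 100% relative
# Scenario 2: 95 per 100 benefit vs 90 per 100 with placebo
print(abs_and_rel_difference(0.95, 0.90))  # ~5.0 points absolute; ~5.6% relative

A "100% relative improvement" and a "5.6% relative improvement" describe the same absolute difference here, which is exactly why event rates for each group matter.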
Isn't it great when learning can be fun too! Enjoy!
http://www.ibtimes.com/articles/347476/20120531/relative-risk-absolute-comic-health-medical-reporting.htm
Obtaining Absolute Risk Reduction (ARR) and Number Needed To Treat (NNT) From Relative Risk (RR) and Odds Ratios (OR) Reported in Systematic Reviews
07/02/2012
Background
Estimates of effect in meta-analyses can be expressed as either relative or absolute effects. Relative risks (aka risk ratios) and odds ratios are relative measures; absolute risk reduction (aka risk difference) and number-needed-to-treat are absolute measures. When reviewing meta-analyses of dichotomous outcomes, readers will almost always see pooled between-group results presented as relative risks or odds ratios. The reason is that relative risks are considered the most consistent statistic when combining results from multiple studies; meta-analysts usually avoid pooling absolute differences for this reason.
Fortunately, we are now seeing more meta-analyses reporting relative risks along with ARR and NNT. The key point is that meta-analyses almost always pool relative effect measures (relative risk or odds ratio) and then (hopefully) re-express the results using absolute effect measures (ARR or NNT).
You may see the term, "assumed control group risk" or "assumed control risk" (ACR). This frequently refers to risk in a control group or subgroup of patients in a meta-analysis, but could also refer to risk in any group (i.e., patients not receiving the study intervention) being compared to an intervention group.
The Cochrane Handbook now recommends that meta-analysts provide a summary table for the main outcome and that the table include the following items:
- The topic, population, intervention and comparison
- The assumed (control) risk and the corresponding risk (i.e., the risk in those receiving the intervention)
- The relative effect statistic (RR or OR)
When RR is provided, ARR can easily be calculated. Odds ratios deal with odds, not probabilities: odds express the number with an outcome relative to the number without it, not events as a proportion of a population. An OR therefore cannot be converted to an ARR without first converting the odds back to risks using an assumed control risk, as shown below. For more on "odds," see—http://www.delfini.org/page_Glossary.htm#odds
Example 1: Antihypertensive drug therapy compared to control for hypertension in the elderly (60 years or older)
Reference: Musini VM, Tejani AM, Bassett K, Wright JM. Pharmacotherapy for hypertension in the elderly. Cochrane Database Syst Rev. 2009 Oct 7;(4):CD000028. Review. PubMed PMID: 19821263.
Computing ARR and NNT from Relative Risk
- When RR is reported in a meta-analysis, determine (this is a judgment) the assumed control risk (ACR)—i.e., the risk in the group being compared to the new intervention—from the control event rate or another data source.
- Formula: ARR (%) = 100 × ACR × (1 − RR)
Calculating the ARR and NNT from the Musini Meta-analysis
- In the above meta-analysis of 12 RCTs in elderly patients with moderate hypertension, the RR for overall mortality with treatment compared to no treatment over 4.5 years was 0.90.
- The event rate (ACR) in the control group was 116 per 1000, or 0.116
- ARR = 100 × 0.116 × (1 − 0.90) = 100 × 0.116 × 0.10 = 1.16%
- NNT = 100/1.16 ≈ 87 (rounding up)
- Interpretation: The risk of death with treatment is 90% of that in the control group (in this case, elderly patients not receiving treatment for hypertension), which translates into 1 to 2 fewer deaths per 100 treated patients over 4.5 years. In other words, you would need to treat 87 elderly hypertensive people at moderate risk with antihypertensives for 4.5 years to prevent one death. (A small calculation sketch follows.)
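For readers who like to check such numbers, here is a minimal sketch of the calculation (the function name and rounding convention are ours):

import math

def arr_nnt_from_rr(rr, acr):
    # rr:  relative risk reported in the meta-analysis
    # acr: assumed control risk, as a fraction (e.g., 0.116)
    arr = 100 * acr * (1 - rr)   # ARR (%) = 100 x ACR x (1 - RR)
    nnt = math.ceil(100 / arr)   # NNT is conventionally rounded up
    return arr, nnt

# Musini example: RR 0.90 for overall mortality; control event rate 116/1000
arr, nnt = arr_nnt_from_rr(0.90, 0.116)
print(f"ARR = {arr:.2f}%  NNT = {nnt}")  # -> ARR = 1.16%  NNT = 87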
Computing ARR and NNT from Odds Ratios
In some older meta-analyses you may not be given the assumed control risk (ACR).
Example 2: Oncology Agent
Assume a meta-analysis on an oncology agent reports an estimate of effect (mortality) as an OR of 0.8 over 3 years for a new drug. In order to do the calculation, an ACR is required. Hopefully this information will be provided in the study. If not, the reader will have to obtain the assumed control group risk (ACR) from other studies or another source. Let’s assume that the control risk in this example is 0.3.
Formula for converting OR to ARR: first convert the OR into the corresponding intervention risk, then subtract that risk from the ACR:
- Corresponding intervention risk = (OR × ACR) / (1 − ACR + OR × ACR)
- ARR = 100 × (ACR − corresponding intervention risk)
In this example:
- Corresponding intervention risk = (0.8 × 0.3) / (1 − 0.3 + 0.8 × 0.3) = 0.24/0.94 ≈ 0.255
- ARR = 100 × (0.3 − 0.255) ≈ 4.5%
- Thus the ARR is approximately 4.5% over 3 years.
- The NNT to benefit one patient over 3 years is 100/4.5, or (rounding up) 23. (A small calculation sketch follows.)
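A minimal sketch of the same conversion (the function is ours; the OR, ACR and time frame are the hypothetical values above, and the conversion is the one described in the Cochrane Handbook chapter cited below):

import math

def arr_nnt_from_or(odds_ratio, acr):
    # First turn the OR into the corresponding intervention risk,
    # then subtract that risk from the assumed control risk (ACR).
    corresponding_risk = (odds_ratio * acr) / (1 - acr + odds_ratio * acr)
    arr = 100 * (acr - corresponding_risk)
    nnt = math.ceil(100 / arr)
    return arr, nnt

# Hypothetical oncology example: OR 0.8 for mortality over 3 years, ACR 0.3
arr, nnt = arr_nnt_from_or(0.8, 0.3)
print(f"ARR = {arr:.1f}%  NNT = {nnt}")  # -> ARR = 4.5%  NNT = 23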
As described above, odds ratios deal with odds rather than probabilities, so the distinction matters most when outcomes are common (e.g., occur in more than about 5% of subjects); in that situation, treating an odds ratio as if it were a relative risk will overestimate the effect of a treatment.
For more information see The Cochrane Handbook, Part 2, Chapter 12.5.4 available at http://www.cochrane-handbook.org/
Early Discontinuation of Clinical Trials: Oncology Medication Studies—Recent Developments and Concern
04/28/08
With the trend toward more rapid approval of oncology drugs has come concern regarding the validity of reported results because of methodological problems. The validity and usefulness of reported results from oncology (and other) studies are clearly threatened by lack of randomization, lack of blinding, the use of surrogate outcomes and other methodological problems. Trotta et al. have extended this concern in a recent study that highlights an additional problem with oncology studies: stopping oncology trials early [1]. The aim of the study was to assess the use of interim analyses in randomized controlled trials (RCTs) testing new anticancer drugs, focusing on oncology trials stopped early for benefit. A second aim was to estimate how often trials stopped prematurely as a result of an interim analysis are used for registration, i.e., approval by the European Medicines Agency (EMEA), the European equivalent of FDA approval. The authors searched Medline, hand-searched The Lancet, The New England Journal of Medicine, and The Journal of Clinical Oncology, and evaluated all clinical trials of anticancer drugs containing an interim analysis that were stopped early for benefit and published in the last 11 years.
Results and Authors' Conclusions
Twenty-five RCTs were analyzed. In 95% of studies, efficacy at the interim analysis was evaluated using the same end point as planned for the final analysis. The authors found a consistent increase (>50%) in prematurely stopped oncology trials during the last 3 years. As a consequence of early stopping after the interim analysis, approximately 3,300 patients across all studies were spared the potential harms of continued therapy. This may appear clearly beneficial but, as the authors point out, stopping a trial early does not guarantee that other patients will receive the apparent benefit of stopping, assuming one exists, unless study findings are immediately publicly disseminated. The authors found long delays (approximately 2 years) between study termination and published reports. If the trials had continued for those 2 years, more efficacy and safety data could have been gathered. Delays in reporting results further lengthen the time needed to translate trial findings into practice.
Surprisingly, only a very small percentage of trials (approximately 4%) were stopped early because of harms, i.e., serious adverse events. Toxicity therefore does not appear to be the main factor leading to early termination of trials. Of the 25 trials, six had no data and safety monitoring board (DSMB) and five had enrolled less than 40% of the planned sample size. Even so, 11 were used to support licensing applications on the basis of what could have been exaggerated chance findings. Thus, more than 78% of the oncology RCTs published in the last 3 years were used for registration purposes. The authors argue that only untruncated trials can provide a full level of evidence useful for informing clinical practice decisions without further confirmatory trials. They concluded that early termination may be done for ethical reasons, such as minimizing the number of people given an unsafe, ineffective, or clearly inferior treatment. However, interim analyses also have drawbacks, since stopping trials early for apparent benefit will systematically overestimate treatment effects [2], and the authors raise new concerns about what they describe as "market-driven intent." Some additional key points made by the authors:
- Repeated interim analyses at short intervals raise concern about data reliability: this strategy risks the appearance of seeking the statistical significance necessary to stop a trial;
- Repeated analyses on the same data pool often lead to statistically significant results only by chance;
- If a trial is evaluating the long-term efficacy of a treatment for conditions such as cancer, short-term benefits, no matter how statistically significant, may not justify early stopping. Data on disease recurrence and progression, drug resistance, metastasis, or adverse events could easily be missed. Early stopping may reduce the likelihood of detecting a difference in overall survival (the only relevant endpoint in this setting).
The authors conclude that:
"…a decision whether to stop a clinical trial before its completion requires a complex of ethical, statistical, and practical considerations, indicating that results of RCTs stopped early for benefit should be viewed with criticism and need to be further confirmed. The main effect of such decisions is mainly to move forward to an earlier-than-ideal point along the drug approval path; this could jeopardise consumers' health, leading to unsafe and ineffective drugs being marketed and prescribed. Even if well designed, truncated studies should not become routine. We believe that only untruncated trials can provide a full level of evidence which can be translated into clinical practice without further confirmative trials."
Lancet Comment
In a Lancet editorial of April 19, 2008, the editorialist states that early stopping of RCTs should require proof beyond reasonable doubt that equipoise no longer exists. Data safety and monitoring boards must balance the decision to stop, which favors immediate stakeholders (participants, investigators, sponsors, manufacturers, patients' advocates, and editors), against continuing the study to obtain more accurate estimates not only of effectiveness but also of longer-term safety. In judging whether or not to stop a trial early for benefit, the plausibility of the findings and their clinical significance are as important as statistical boundaries.
Delfini Comments
Overall, we are concerned about the FDA's loosening of standards, accepting oncology study data as valid when it comes from studies that many would judge to be fatally flawed, and about the likelihood that such studies will accentuate clinical advantages because of falsely inflated results. We are seeing more oncology medications receive FDA approval based on observational studies. The trend toward early stopping of studies in many instances represents yet another step toward acceptance of low-quality oncology studies.
We believe that:
- Oncologists may not be aware of the threats to validity in many of the newest oncology medication studies and may develop unwarranted enthusiasm for unproven, possibly harmful new agents.
- Patients should receive complete information about the risks of distorted study results when low-quality studies are used to inform decisions that entail unproven benefits and significant potential risks.
We also agree that, in most studies, the benefits of longer follow-up, with more accurate assessment of outcomes including more complete assessment of adverse events, will provide a greater likelihood of deriving valid, useful information for informing clinical decisions.
References
1. Trotta F, Apolone G, Garattini S, Tafuri G. Stopping a trial early in oncology: for patients or for industry? Ann Oncol. 2008 Apr 9 [Epub ahead of print] PMID: 18304961.
2. Pocock SJ. When (not) to stop a clinical trial for benefit. JAMA 2005;294:2228–2230. PMID: 16264167.
Early Termination of Clinical Trials—2012 Update
07/17/2012
Several years ago we presented the increasing evidence of problems with early termination of clinical trials for benefit after interim analyses.[1] The bottom line is that results are very likely to be distorted because of chance findings. A useful review of this topic has been recently published.[2] Briefly, this review points out that—
- Trials stopped early for benefit frequently report results that are not credible; e.g., in one review, relative risk reductions were over 47% in half of the trials and over 70% in a quarter. The apparent overestimates were larger in smaller trials.
- Stopping trials early for apparent benefit is highly likely to systematically overestimate treatment effects.
- Large overestimates were common when the total number of events was less than 200.
- Smaller but important overestimates are likely with 200 to 500 events, and trials with over 500 events are likely to show small overestimates.
- Stopping rules do not appear to ensure protection against distortion of results.
- Despite the fact that stopped trials may report chance findings that overestimate true effect sizes—especially when based on a small number of events—positive results receive significant attention and can bias clinical practice, clinical guidelines and subsequent systematic reviews.
- Trials stopped early reduce opportunities to find potential harms.
The authors provide 3 examples to illustrate the above points where harm is likely to have occurred to patients.
Case 1 is the use of perioperative beta blockers in non-cardiac surgery. In 1999, a clinical trial of bisoprolol in patients with vascular disease having non-cardiac surgery, with a planned sample size of 266, was stopped early after enrolling 112 patients, with 20 events. Two of 59 patients in the bisoprolol group and 18 of 53 in the control group had experienced a composite endpoint event (cardiac death or myocardial infarction). The authors reported a 91% reduction in relative risk for this endpoint, 95% confidence interval (63% to 98%). In 2002, an ACC/AHA clinical practice guideline recommended perioperative use of beta blockers for this population. In 2008, a systematic review and meta-analysis including over 12,000 patients having non-cardiac surgery reported a 35% reduction in the odds of non-fatal myocardial infarction, 95% CI (21% to 46%), a twofold increase in non-fatal strokes, odds ratio 2.1, 95% CI (1.27 to 3.68), and a possible increase in all-cause mortality, odds ratio 1.20, 95% CI (0.95 to 1.51). Despite the results of this good-quality systematic review, subsequent guidelines published in 2009 and 2012 continued to recommend beta blockers.
Case 2 is the use of intensive insulin therapy (IIT) in critically ill patients. In 2001, a single-center randomized trial of IIT in critically ill patients with raised serum glucose reported a 42% relative risk reduction in mortality, 95% CI (22% to 62%). The authors used a liberal stopping threshold (P=0.01) and took frequent looks at the data, strategies they said were "designed to allow early termination of the study." Results were rapidly incorporated into guidelines, e.g., American College of Endocrinology practice guidelines, with recommendations for an upper glucose limit of 8.3 mmol/L or less. A systematic review published in 2008 summarized the results of subsequent studies, which did not confirm lower mortality with IIT and documented an increased risk of hypoglycemia. A later good-quality systematic review confirmed these findings. Nevertheless, some guideline groups continue to advocate limits of 8.3 mmol/L or less, while other guidelines, utilizing the results of more recent studies, recommend a range of 7.8-10 mmol/L.
Case 3 is the use of activated protein C in critically ill patients with sepsis. The original 2001 trial of recombinant human activated protein C (rhAPC) was stopped early after the second interim analysis because of an apparent difference in mortality. In 2004, the Surviving Sepsis Campaign, a global initiative to improve management, recommended use of the drug as part of a "bundle" of interventions in sepsis. A subsequent trial, published in 2005, reinforced previous concerns from studies reporting increased risk of bleeding with rhAPC and raised questions about the apparent mortality reduction in the original study. As of 2007, trials had failed to replicate the favorable results reported in the pivotal Recombinant Human Activated Protein C Worldwide Evaluation in Severe Sepsis (PROWESS) study. Nevertheless, the 2008 iteration of the Surviving Sepsis guidelines and another guideline in 2009 continued to recommend rhAPC. Finally, after further discouraging trial results, Eli Lilly withdrew the drug, drotrecogin alfa (activated) (Xigris), from the market in 2011.
Key points about trials terminated early for benefit:
- Truncated trials are likely to overestimate benefits.
- Results should be confirmed in other studies.
- Maintain a high level of skepticism regarding the findings of trials stopped early for benefit, particularly when those trials are relatively small and replication is limited or absent.
- Stopping rules do not protect against overestimation of benefits.
- Stringent criteria for stopping for benefit would include not stopping before approximately 500 events have accumulated. (An illustrative simulation sketch follows.)
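To see why truncation inflates effect estimates, here is a minimal Monte Carlo sketch (ours, with arbitrary parameters, not taken from the review): it simulates many two-arm trials with a true relative risk of 0.8, applies a naive interim stopping rule for benefit, and compares the relative risks reported by stopped versus completed trials.

import random

def simulate(n_trials=2000, n_per_arm=1000, risk_ctl=0.10, true_rr=0.8,
             interim_looks=(250, 500, 750), z_stop=2.58):
    # Trials that cross the interim "benefit" boundary are stopped and
    # report their RR at that look; the rest run to completion.
    stopped, completed = [], []
    for _ in range(n_trials):
        tx = [random.random() < risk_ctl * true_rr for _ in range(n_per_arm)]
        ctl = [random.random() < risk_ctl for _ in range(n_per_arm)]
        stopped_early = False
        for n in interim_looks:
            e_tx, e_ctl = sum(tx[:n]), sum(ctl[:n])
            p = (e_tx + e_ctl) / (2 * n)           # pooled event rate
            se = (2 * p * (1 - p) / n) ** 0.5      # SE of the risk difference
            z = ((e_ctl - e_tx) / n) / se if se else 0.0
            if z > z_stop and e_tx > 0:            # boundary crossed: stop
                stopped.append((e_tx / n) / (e_ctl / n))
                stopped_early = True
                break
        if not stopped_early:
            e_tx, e_ctl = sum(tx), sum(ctl)
            if e_tx and e_ctl:
                completed.append((e_tx / n_per_arm) / (e_ctl / n_per_arm))
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    print(f"true RR {true_rr}; "
          f"stopped-early trials (n={len(stopped)}) mean RR {mean(stopped):.2f}; "
          f"completed trials mean RR {mean(completed):.2f}")

random.seed(1)
simulate()  # stopped-early trials report RRs well below the true 0.8

Trials that happen to cross the boundary do so because chance favored the treatment arm at that look, so their estimated relative risks cluster well below the true value, while completed trials average close to it.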
References
1. http://www.delfini.org/delfiniClick_PrimaryStudies.htm#truncation
2. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ. 2012 Jun 15;344:e3863. doi: 10.1136/bmj.e3863. PMID:22705814.
Advanced Concepts: Can Useful Information Be Obtained From Studies With Significant Threats To Validity? A Case Study of Missing Data Points in Venous Thromboembolism (VTE) Prevention Studies & A Case Study of How Evidence from One Study Might Support Conclusions from a Flawed Study
09/02/09
We approach critical appraisal of the medical literature by applying critical appraisal concepts coupled with critical thinking. This requires a movement from the general to the very particular circumstances before us. Paraphrasing (is it Voltaire?), along with one of our favorite medical leaders, Dr. Tim Young: "It is important to keep perfection from being the enemy of the good." Ultimately, the goal of doing critical appraisal work is not to "pass" or "fail" studies, but to help predict, for example, the effect of an intervention on outcomes of interest, based on what we can glean as possible explanations for the observed outcomes.
Understanding critical appraisal concepts is necessary to conceive of possible explanations. Critical appraisal is not, and should not be, a mere recording of tick marks on a checklist or earning points on a quality scale. Despite the attempts of various groups to do so, we maintain that the reliability and usefulness of a study cannot be "scored" through such a system. Moher has pointed out a number of shortcomings of "scales" designed to quantitate the likelihood of freedom from bias in clinical trials.[1]
It requires reflective thought to determine why we might see a particular outcome, and this is wholly dependent upon a variety of factors, including the population, the circumstances of care, and the threats to validity identified in applying critical appraisal concepts. It is also important to keep in mind that failing a critical appraisal screening does NOT mean something doesn't work. Furthermore, sometimes, despite "failing" a critical appraisal for research design, execution or reporting, a study will in fact give us evidence that is reasonable to rely upon. Our venous thromboembolism (VTE) prevention story is a case in point.
Recently we assisted Kaiser Permanente Hawaii in developing a standardized, evidence-based VTE prophylaxis guideline for the known high-risk total knee and hip replacement population.
Our key questions were as follows:
- What is the evidence that thromboembolism or DVT prophylaxis with various agents reduces mortality and clinically significant morbidity in hip and knee replacement surgery?
- What is the evidence regarding timing (starting and duration) of anticoagulant prophylaxis for appropriate agents when used for prevention of thromboembolism in hip and knee replacement surgery?
- What is the evidence regarding bleeding from thromboembolism prophylaxis with the various appropriate agents?
There are several interesting lessons this project taught us about applying general critical appraisal concepts to individual trials and keeping one's eye focused on the true goal behind the concepts. First, in much of the literature on VTE and DVT prophylaxis, the rates of missing venogram data are very high, frequently as high as 40 percent. Delfini's stance on missing data is that even a small drop-out rate or percentage of missing data can threaten validity.[2,3] But it is the reason for the missing data that truly matters. A fundamental issue in critical appraisal of clinical trials is that, apart from the intervention, there should be no differences between the groups being studied, since any such difference may account for the difference in outcomes.
As stated above, in examining multiple studies of VTE prophylaxis in THR and TKR surgery, we found that a high percentage of studies had missing venogram information. It appears that patients and their clinicians frequently chose to omit the final venogram despite a study protocol requiring a venogram for assessing DVT rates. From a clinical standpoint and a patient perspective, this makes perfect sense: most patients in the study will be asymptomatic, there are risks associated with the procedure, and undergoing a venogram is inconvenient (e.g., creating a delay in hospital discharge).
So the key question becomes: do the groups differ with respect to the missing data? Success of concealed allocation, blinding and comparable rates of missing data are all validity detection clues that help ensure it is unlikely that the groups were different or were treated differently. In our review of the data, we think it may be reasonable to conclude that a decision to have a final venogram was independent of anything about the interventions and prognostic variables in the two groups and unlikely to be the factor responsible for different DVT rates between the groups.
A different yet interesting challenge with the Westrich study revolved around the scientific evidence on compression devices.[4] This study reported the overall incidence of deep vein thrombosis (DVT) in total knee replacement (TKR) surgery with mechanical compression plus enoxaparin versus mechanical compression plus aspirin (ASA). Our original grading of this study (partly due to problems in study reporting) was Grade U: Uncertain Validity. Delfini almost never utilizes a Grade U study for questions of efficacy. [NB: Following discussions with the author, clarifying certain questions, the study was upgraded to Grade B-U: Possible to uncertain usefulness using the Delfini Evidence Grading System.] However, upon careful review and reasoning, and armed with evidence from another Grade B-U study, Haas, which studied aspirin alone for VTE prophylaxis, our team was able to deduce that the results of Westrich were likely to be valid and useful.[5]
Here is our summation of the Westrich results:
Westrich 06 (Grade B-U) reported an overall DVT rate in TKR surgery of 14.1% in the mechanical compression plus enoxaparin group versus 17.8% in the mechanical compression plus ASA group; ARR 1.36%; 95% CI (-6.83% to 9.55%); p=0.27. Rates in both groups are significantly lower than the 41% to 85% DVT incidence rates reported in the literature for no VTE prophylaxis and the reported distal DVT rate of 47% (Haas 90) for aspirin alone.
- Mechanical compression was initiated in the recovery room; 325 mg of enteric-coated aspirin twice daily was started the night prior to surgery; enoxaparin was started ~48 hours after removal of the epidural catheter.
Here is our reasoning as to why the Westrich results are likely to be reliable:
- The Haas study provided information about the rates of DVT likely to be expected with use of aspirin (a reported DVT rate of 47%). DVT rates in the Westrich study groups (14.1% and 17.8%) were dramatically better than what one would expect from aspirin alone. Even after taking into account differences in the subjects and other care in the two studies, the difference in DVT rates between the two studies remains extremely large.
- In Westrich, mechanical compression was used on both lower extremities. Therefore, the difference between the two groups was likely to be attributable to enoxaparin versus ASA.
- In Westrich, the incidence rate of DVT in both groups was less than would be expected based on the DVT rates reported in the Haas study, in which the intervention was aspirin versus mechanical compression. Therefore, we considered it reasonable to conclude that mechanical devices provide significant benefit in preventing DVT, since that would appear to explain the much lower incidence rates of DVT in both Westrich study groups.
At times it makes sense to grade individual study conclusions. Documentation of reasons is always important and required as good evidence-based practice.
Bottom Line: It is important to understand critical appraisal concepts, and it is important to critically appraise studies. The goal, however, is getting close to truth. Doing so requires critical thinking about the unique circumstances of each study and each study topic.
References
1. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995 Feb;16(1):62-73. PMID: 7743790.
2. Strite SA, Stuart ME, Urban S. Process steps and suggestions for creating drug monographs and drug class reviews in an evidence-based formulary system. Formulary. April 2008;43:135–145.
3. Delfini Group White Paper — Missing Data: Considerations.
4. Westrich GH, Bottner F, Windsor RE, Laskin RS, Haas SB, Sculco TP. VenaFlow plus Lovenox vs VenaFlow plus aspirin for thromboembolic disease prophylaxis in total knee arthroplasty. J Arthroplasty. 2006 Sep;21(6 Suppl 2):139-43. PMID: 16950076.
5. Haas SB, Insall JN, Scuderi GR, et al. Pneumatic sequential compression boots compared with aspirin prophylaxis of deep-vein thrombosis after total knee arthroplasty. J Bone Joint Surg Am 1990;72:27–31. PMID: 2404020.
Review of Bias In Diabetes Randomized Controlled Trials
04/30/2013
Healthcare professionals must evaluate the internal validity of randomized controlled trials (RCTs) as a first step in the process of considering the application of clinical findings (results) for particular patients. Bias has been repeatedly shown to increase the likelihood of distorted study results, frequently favoring the intervention.
Readers may be interested in a new systematic review of diabetes RCTs. Risk of bias (low, unclear or high) was assessed in 142 trials using the Cochrane Risk of Bias Tool. Overall, 69 trials (49%) had at least one of seven domains at high risk of bias. Inadequate reporting frequently hampered the risk of bias assessment: the method of producing the allocation sequence was unclear in 82 trials (58%), and allocation concealment was unclear in 78 trials (55%). There was no significant reduction over time in the proportion of studies at high risk of bias, nor in the adequacy of reporting of risk of bias domains. The authors conclude that these trials have serious limitations that put the findings in question and therefore inhibit evidence-based quality improvement (QI). There is a need to limit the potential for bias when conducting QI trials and to improve the quality of reporting of QI trials so that stakeholders have adequate evidence for implementation. The full study is freely available at—
http://bmjopen.bmj.com/content/3/4/e002727.long
Ivers NM, Tricco AC, Taljaard M, Halperin I, Turner L, Moher D, Grimshaw JM. Quality improvement needed in quality improvement randomised trials: systematic review of interventions to improve care in diabetes. BMJ Open. 2013 Apr 9;3(4). doi:pii: e002727. 10.1136/bmjopen-2013-002727. Print 2013. PubMed PMID: 23576000.
Comparative Study Designs: Claiming Superiority, Equivalence and Non-inferiority—A Few Considerations & Practical Approaches
06/18/2014
This is a complex area, and we recommend downloading our freely available 1-page summary to help assess issues with equivalence and non-inferiority trials. Here is a short sampling of the problems seen in these designs: lack of sufficient evidence confirming efficacy of the referent treatment ("referent" means the comparator treatment); a study not sufficiently similar to the referent study; inappropriate Deltas (the margin established for equivalence or non-inferiority); or significant biases or analysis methods that would tend to diminish an effect size and "favor" no difference between groups (e.g., conservative application of ITT analysis, insufficient power, etc.), thus pushing toward non-inferiority or equivalence.
However, we do want to say a few more things about non-inferiority trials based on some recent questions and readings.
Is it acceptable to claim superiority in a non-inferiority trial? Yes. The Food and Drug Administration (FDA) and the European Medicines Agency (EMA), among others, including ourselves, all agree that declaring superiority in a non-inferiority trial is acceptable. What's more, there is agreement that multiplicity adjusting does not need to be done when first testing for non-inferiority and then superiority.
See Delfini Recommended Reading: Included here is a nice article by Steve Snapinn. Snapinn even recommends that "…most, if not all, active-controlled clinical trial protocols should define a noninferiority margin and include a noninferiority hypothesis." We agree. Clinical trials are expensive to do, take time, have opportunity costs, and—most importantly—are of impact on the lives of the human subjects who engage in them. This is a smart procedure that costs nothing especially as multiplicity adjusting is not needed.
What does matter is having an appropriate population for doing a superiority analysis. For superiority, in studies with dichotomous variables, the population should be Intention-to-Treat (ITT) with an appropriate imputation method that does not favor the intervention under study. In studies with time-to-event outcomes, the population should be based on the ITT principle (meaning all randomized patients should be used in the analysis by the group to which they were randomized) with unbiased censoring rules.
Confidence intervals (CIs) should be evaluated to determine superiority. Some evaluators seem to suggest that superiority can be declared only if the CI sits wholly above the Delta. Schumi et al. express their opinion that you can declare superiority if the confidence interval for the new treatment is above the line of no difference (i.e., the result is statistically significant). They state, "The calculated CI does not know whether its purpose is to judge superiority or non-inferiority. If it sits wholly above zero [or 1, depending upon the measure of outcome], then it has shown superiority." The EMA would seem to agree. We agree as well. If one wishes to take a more conservative approach, one method we recommend is to judge whether the Delta seems clinically reasonable (you should always do this) and, if not, to establish your own through clinical judgment. Then determine whether the entire CI meets or exceeds what you deem to be clinically meaningful. To us, this method satisfies both approaches and makes practical and clinical sense. (A small decision sketch follows.)
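As a minimal sketch of this decision logic (ours, not taken from any guidance document; the margin and interval values are hypothetical), consider a difference measure where values above zero favor the new treatment and -Delta is the prespecified non-inferiority margin:

def interpret_ci(ci_low, ci_high, delta):
    # Two-sided 95% CI for (new minus referent) on a difference scale:
    # 0 is no difference; -delta is the prespecified non-inferiority margin.
    if ci_low > 0:
        return "superior: CI wholly above no difference"
    if ci_low > -delta:
        return "non-inferior: CI wholly above -Delta"
    if ci_high < -delta:
        return "inferior: CI wholly below -Delta"
    return "inconclusive: CI crosses -Delta"

print(interpret_ci(0.5, 4.0, delta=2.0))    # superior
print(interpret_ci(-1.0, 3.0, delta=2.0))   # non-inferior
print(interpret_ci(-3.5, -2.5, delta=2.0))  # inferior

The more conservative approach described above simply raises the bar for "superior" from zero to whatever difference you judge clinically meaningful.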
Is it acceptable to claim non-inferiority in a superiority trial? It depends. This area is controversial, with some saying no and some saying it depends. However, there is agreement among those on the "it depends" side that it generally should not be done because of the validity issues described above.
References
US Department of Health and Human Services, Food and Drug Administration: Guidance for Industry Non-Inferiority Clinical Trials (DRAFT). 2010. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM202140.pdf
European Agency for the Evaluation of Medicinal Products Committee for Proprietary Medicinal Products (CPMP): Points to Consider on Switching Between Superiority and Non-Inferiority. 2000. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2014556/
http://www.delfini.org/delfiniReading.htm#equivalence
Are Adaptive Trials Ready For Primetime?
06/26/2012
It is well known that many patients volunteer for clinical trials because they mistakenly believe that the goal of the trial is to improve outcomes for the volunteers. A type of trial that does attempt to improve outcomes for those who enter the trial late is the adaptive trial. In adaptive trials, investigators change the enrollment and treatment procedures as the study gathers data about treatment efficacy. For example, if a study compares a new drug against a placebo and the drug appears to be working, subjects enrolling later will be more likely to receive it. The idea is that adaptive designs will attract more study volunteers.
As pointed out in a couple of recent commentaries, however, there are many unanswered questions about this type of trial. A major concern is the problem of unblinding that may occur with this design with resulting problems with allocation of patients to groups. Frequent peeks at the data may influence decisions made by monitoring boards, investigators and participants. Another issue is the unknown ability to replicate adaptive trials. Finally, there are ethical questions such as the issue of greater risk for early enrollees compared to risk for later enrollees.
For further information see—
1. van der Graaf R, Roes KC, van Delden JJ. Adaptive trials in clinical research: scientific and ethical issues to consider. JAMA. 2012 Jun 13;307(22):2379-80. PubMed PMID: 22692169.
2. Meurer WJ, Lewis RJ, Berry DA. Adaptive clinical trials: a partial remedy for the therapeutic misconception? JAMA. 2012 Jun 13;307(22):2377-8. PubMed PMID: 22692168.