Use of Evidence: Reporting the Evidence

Contents
- Untrustable Abstracts & P-Values
- CONSORT Statement on Harms
- When There is No Evidence
- TREND: Reporting Standards for Non-randomized Studies
- Poorly Written Papers
- Media Heyday: Aspirin and (Potentially) Reduced Risk of Breast Cancer

Untrustable P-Values & Abstracts
One of the first things we teach our EBM learners is that, although abstracts
can be useful for getting a sense of what an article is about and can at times
be used to exclude studies from further review, abstracts cannot reliably be
used to determine whether a study is valid. Validity must be determined by
examining the methods of the study (assuming it is the right study type). A
little-known problem with abstracts is that, in some of the top-tier medical
journals, up to 68% of abstracts contain information that cannot be verified
in the body of the paper [Pitkin R, et al. Accuracy of Data in Abstracts of
Published Research Articles. JAMA. 1999;281:1110-1111. PMID: 10188662;
reviewing JAMA, NEJM, The Lancet, Annals of Internal Medicine, BMJ and the
Canadian Medical Association Journal]. In this DelfiniClick we report another
problem with abstracts: the problem of bias.

Peter C. Gøtzsche, in a BMJ article (Believability of relative risks and odds
ratios in abstracts: cross sectional study. BMJ 2006;333:231-234. PMID:
16854948), reviews earlier publications documenting biased reporting of results
and of conclusions. He also presents additional evidence of bias in the reporting
of P values.

We do not have the expertise to evaluate all the points made in his paper.
However, we present his comments and findings here so that you can evaluate
them and draw your own conclusions. Although we believe the assumptions on
which Gøtzsche bases his conclusions can be challenged, the following should
interest anyone engaged in critical appraisal of the medical literature.

Gøtzsche’s Comments
- Significant results in abstracts should generally be disbelieved.
- Ongoing research has shown that trial protocols sometimes specify more than
  200 statistical tests. Suppose you compare a treatment with itself, so the
  null hypothesis of no difference is known to be true. If we assume the tests
  are independent, the chance that one or more of 200 tests will be
  statistically significant at the 5% level is 99.996%.
  - Thus, the investigators or sponsor can be fairly confident that "something
    interesting will turn up."
- Due allowance for multiple testing is rarely made, and it is generally not
  possible to discern reliably between primary and secondary outcomes.
- Recent studies that compared protocols with trial reports have shown
  selective publication of outcomes, depending on the P values obtained. They
  also found that at least one primary outcome was changed, introduced, or
  omitted in 62% of the trials.
- The scope for bias is also large in observational studies. Many studies are
  underpowered and do not give any power calculations.
- Furthermore, a survey found that 92% of articles adjusted for confounders,
  reporting a median of seven confounders. Most of these articles did not
  specify whether the confounders were pre-declared.
- Fourteen per cent of these articles reported more than 100 effect estimates.
  Subgroup analyses appeared in 57% of studies and were generally believed.
- The preponderance of significant results could be reduced if the following
  actions were taken:
  - First, if we need a conventional significance level at all, which is
    doubtful, it should be set at P < 0.001.
  - Second, data analysis and manuscript writing should be done blind. The
    nature of the interventions, exposures, or disease status, as applicable,
    should stay hidden until all authors have approved the two versions of the
    text. One version is written as if group A received the intervention or
    exposure, the other as if group B did.
  - Third, journal editors should scrutinize abstracts more closely. They should
    also demand that research protocols and raw data be submitted with the
    manuscript, both for randomized trials and for observational studies.

In short, this is yet another reminder to read the methods section of papers
and not to rely on the results or conclusions presented in abstracts.

Gøtzsche’s Findings in Brief
- The first result in the abstract was statistically significant in 70% of the
  trials, 84% of cohort studies and 84% of case-control studies. Many of these
  results were derived from subgroup or secondary analyses, or from biased
  selection of results. Even so, they were presented without reservations in
  98% of the trials.
- The distribution of P values in the interval 0.04 to 0.06 in the studies he
  reviewed was extremely skewed.
- The number of P values in the interval 0.05 <= P < 0.06 would be expected to
  be similar to the number in the interval 0.04 <= P < 0.05. However, he found
  five in the first interval compared with 46 in the second. That split is
  highly unlikely to occur (P < 0.0001) if researchers are unbiased when they
  analyze and report their data.
- The distribution of P values between 0.04 and 0.06 was even more extreme for
  the observational studies he reviewed.
- Nine cohort studies and eight case-control studies gave P values in this
  interval, but in all 17 cases the P values were presented as < 0.05.
- One of the nine cohort studies and two of the eight case-control studies gave
  a confidence interval with one limit touching 1, the value that indicates no
  effect for a ratio. In all three studies this was interpreted as a positive
  finding. In one of them, it seemed to be the only positive result out of the
  six time periods the authors had reported.
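
Two of these findings can be checked with a few lines of arithmetic. The Python sketch below (standard library only) runs an exact two-sided binomial test on the 5 versus 46 split. It assumes, as the text does, that without bias a P value between 0.04 and 0.06 is equally likely to fall on either side of 0.05. It applies the same test to the 17 out of 17 observational results presented as < 0.05, which is our own extension of the argument. It also converts a ratio and its 95% confidence interval into an approximate P value, assuming the interval was computed symmetrically on the log scale. This shows why a limit touching 1 means P is about 0.05 rather than clearly significant. The function names and the example ratio of 0.70 (95% CI 0.49 to 1.00) are hypothetical.

```python
from math import comb, erfc, log, sqrt


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes out of n against p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided P value is
    twice the probability of a split at least as lopsided as min(k, n - k),
    capped at 1.
    """
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def p_from_ratio_ci(ratio: float, lower: float, upper: float,
                    z_crit: float = 1.96) -> float:
    """Approximate two-sided P value for a relative risk or odds ratio,
    recovered from its 95% CI on the log scale (normal approximation)."""
    se = (log(upper) - log(lower)) / (2 * z_crit)  # standard error of log(ratio)
    z = log(ratio) / se                            # distance from log(1) = 0
    return erfc(abs(z) / sqrt(2))                  # equals 2 * (1 - Phi(|z|))


# 5 P values in [0.05, 0.06) versus 46 in [0.04, 0.05)
print(f"5 vs 46: P = {binomial_two_sided_p(5, 51):.1e}")
# all 17 observational P values in the 0.04-0.06 window on the 'significant' side
print(f"17 of 17: P = {binomial_two_sided_p(17, 17):.1e}")
# hypothetical ratio whose 95% CI has its upper limit touching 1
print(f"RR 0.70 (0.49 to 1.00): P = {p_from_ratio_ci(0.70, 0.49, 1.00):.3f}")
```

The 5 versus 46 split gives a P value on the order of 10^-9, consistent with Gøtzsche's P < 0.0001. The hypothetical ratio with a limit at 1 lands at P of about 0.05, right on the conventional boundary.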

CONSORT Statement on Harms
One of the main reasons for using valid, relevant evidence in health care is to
predict the outcomes of various interventions more accurately, and thus to be
equipped to make informed choices. Harms have always been a problematic area
for three reasons. The terminology used in the literature varies greatly,
adverse events are frequently rare, and they are often detected by observational
means long after a drug or intervention has become the standard of care. Finding
adverse events may also require a separate search after one has found quality
evidence regarding benefit.
CONSORT (Consolidated Standards of Reporting Trials) is a checklist aimed at
standardizing published reports of RCTs, but it contained only one item dealing
with harms. The CONSORT Group is now adding a number of items dealing with harms
to the checklist.
Ioannidis JP, Evans SJ, Gøtzsche PC, et al., for the CONSORT Group, have
published an article in the Annals of Internal Medicine titled “Better
Reporting of Harms in Randomized Trials: An Extension of the CONSORT Statement”
(Ann Intern Med. 2004 Nov 16;141(10):781-788). The group made 10 new
recommendations about reporting harms-related issues, with examples to
highlight specific aspects of proper reporting. For instance, authors should
list adverse events with their definitions, and state in the title and abstract
that the study collected data about harms.
The 2001 CONSORT Statement (without this update) is available at
http://www.consort-statement.org. We hope the new items dealing with harms will
help authors improve their reporting and help readers find harms-related data.

When There is No Evidence: One Perspective...

The Delfini definition of evidence-based medicine is simply the use of the
scientific method and the application of valid and useful science to inform
health care provision, practice, evaluation and decisions. To do this, you look
first to the evidence and then work to assess its scientific quality,
usefulness and applicability. In the absence of useful evidence, you then have
to make choices based on an assortment of what we refer to as “other
triangulation issues.” These include patient preferences, community standards,
legal considerations, publicity, and so forth.

Aron Sousa, MD, from the Department of Medicine at Michigan State University,
writes:

"I have a question about the very bottom of the evidence hierarchy. Most of my
work as an educator and clinician deals with issues at the top of the evidence
hierarchy, but of late I have become involved in a clinical area with no high
level and little low level clinical evidence. I am an internist who has begun
to care for adult patients who were born with ambiguous genitalia (intersex
conditions). Most of these people underwent (and many children still undergo)
surgeries designed to "normalize" the appearance of their genitals (we are not
talking about urinary, sexual, or reproductive function). In terms of the
available evidence, the intellectual basis of the surgeries (children with
abnormal genitals become abnormal adults) is based on a fraudulent case study
(John-Joan), there is no evidence of a need for these surgeries, there are a
series of poorly done case series of short-term surgical outcomes, and there is
a whole host of expert opinions and published MGSATs (multiple guys sitting
around together). When pressed for justification, surgeons (and parents) tend
to fall back to fears of future schoolyard and locker room bullying and
harassment.

In general I'd say that you have to do the best you can with the evidence you
have, but here is the thing. The adult patient reports of their treatment are
horrific and impressive in their volume and consistency. Multiple scholars and
reporters have looked for patients happy with their treatment and not found
one -- not one, not even one who is happy but not willing to go public. In
truth finding such a patient is a bit hard to do since a successfully treated
patient would have been lied to and would not know of their condition. (There
are clearly ethical problems as well.) Independent patient report does not make
most hierarchies of evidence but in the Internet era is one of the most
prevalent data reports we have.

In this situation there are patient opinions on the value of surgery that are
nearly unanimous but uncontrolled and self selecting vs. experts with little
intellectual or ethical standing. How can EBM help me deal with this? No fair
punting and suggesting I get better data."

Our reaction is this:

We would consider the reports from patients to be "evidence" as well. They are
of "uncertain" quality, as is the "evidence" from the experts, for all the
excellent reasons Dr. Sousa has raised.

"How EBM can help" is simply to say that you strive to see whether valid and
useful scientific information can reduce your uncertainty. At this point, with
the available information, the medical literature cannot provide us with a
clear answer.

In a situation such as this, first round up everything that might be germane
to the issue and understand the quality of that evidence. After that, we would
suggest involving the patient as a real partner.

The Delfini model for patient decision-making suggests approaches for this
situation. When a lack of helpful evidence leaves one uncertain, we believe the
task is to share that information and other relevant facts with the patient.
Then engage with the patient to determine which mode of decision-making they
prefer:
http://www.delfini.org/page_SamePage_PatDM.htm#dm

Dr. Sousa writes back:

"Thanks very much for this. While I find uncertainty a motivating factor to
seek better data and more understanding, my surgical colleagues appear to view
uncertainty as something that can be cut out with a scalpel. The issue of risk
data gets at the very heart of our problem...without evidence of need, we do
not need therapy. The painful retort "absence of evidence is not evidence of
absence" loses sight of the fact the burden of proof should fall on the therapy
and not on the patient.

As you clearly realize, shared decision making is the only reasonable model for
helping these patients.

Thanks very much for your insights.

Aron"

TREND: Reporting Standards for Non-randomized Studies
In an article entitled “Evidence-Based Public Health: Moving Beyond Randomized
Trials,” Cesar G. Victora, MD, PhD, Jean-Pierre Habicht, MD, PhD, and Jennifer
Bryce, EdD, describe the evidence-based movement in public health practice.

Victora CG, Habicht JP, Bryce J. Evidence-Based Public Health: Moving Beyond
Randomized Trials. Am J Public Health. 2004 Mar;94(3):400-5.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14998803

The authors argue that there is an urgent need to develop evaluation standards
and protocols for using non-randomized studies in two circumstances. The first
is where RCTs are not appropriate. The second is where reporting intermediate
steps along a causal pathway can provide strong plausibility support for the
findings of RCTs.

For example, consider a study of 1-year-old children in Brazil attending 14
health centers randomized to a health care training program. Over 6 months,
these children had significantly greater weight gain than children attending 14
matched clinics providing standard care.

Victora et al. acknowledge the limited internal validity of the study. However,
they believe the study would have been less convincing if the authors had not
demonstrated that:
o It was possible to train a large number of health care workers,
o Trained workers performed better,
o Mothers were receptive and understood the messages,
o Mothers in the intervention group changed their breastfeeding behavior, and
o Children in the intervention group had better growth rates.

In a commentary, Des Jarlais DC, Lyles C, Crepaz N, and the TREND Group present
the initial version of the Transparent Reporting of Evaluations with
Non-randomized Designs (TREND) checklist. TREND is a checklist for reporting
behavioral and public health interventions evaluated with non-randomized
designs (Am J Public Health. 2004 Mar;94(3):361-6).

The TREND checklist will be of interest to everyone reading the behavioral and
public health literature. The initial version of the TREND checklist is
summarized at:
http://www.ajph.org/cgi/content/abstract/94/3/361

Poorly Written Papers
Horacio Plotkin, assistant professor of paediatrics and orthopaedics at the
University of Nebraska Medical Center, Omaha, has written a spoof on how to get
your paper rejected. However, in our line of work we see a lot of this getting
published! Here's what not to do:

Plotkin H. How to Get Your Paper Rejected. BMJ 2004;329:1469 (18 December).
doi:10.1136/bmj.329.7480.1469

(And then there is also the wee annual BMJ Christmas present.)

Media Heyday: Aspirin and (Potentially) Reduced Risk of Breast Cancer
We have repeatedly seen clinicians and patients make therapeutic decisions
based on observational data. The HRT story is the classic case. Using HRT in
women with coronary artery disease became usual care on the basis of
case-control and cohort studies. Years later, RCTs showed that HRT had more
harms than benefits and provided no cardiac protection.

It is interesting to look at some of the language used in the media when a
“breakthrough” publication appears. Below are some quotes from various news
sources about a JAMA case-control study. The study reported an association
between aspirin (ASA) and a decreased risk of breast cancer (Terry MB, Gammon
MD, Zhang FF, et al. Association of frequency and duration of aspirin use and
hormone receptor status with breast cancer risk. JAMA. 2004;291:2433-2440.
PMID: 15161893).

The ASA-breast cancer study demonstrates how unproven interventions get
attention. They then are likely to become common practice, then usual care, and
at times the standard of care. All of this can happen before valid evidence of
benefit has been presented.

Health-AFP
o “Women who regularly take aspirin appear to have a reduced risk of breast
cancer, a study in the May 26 issue of the Journal of the American Medical
Association found.”
o “Other studies already had shown a link between aspirin consumption and
reducing breast cancer risk but this was the first to show a link between the
medicine and reducing the breast cancer risk in women with
hormone-receptor-positive cancers.”

Health-Associated Press
o “An effective weapon against many women's most feared disease might be as
close as their medicine cabinets, according to new research linking aspirin
with a reduced risk of breast cancer.”
o "It's a landmark study," said Dr. Sheryl Gabram, a breast specialist at Loyola
University Medical Center in suburban Chicago who was not involved in the study.
o “…The results are tantalizing and make biological sense, the researchers and
other doctors said.”

Los Angeles Times:
o “An aspirin a day…may protect women against breast cancer, especially those
who have gone through menopause.”
o “The study also found that daily aspirin use reduced by 32% the incidence of
tumors fueled by estrogen, which accounted for 70% to 75% of all breast
cancers…” [A correct statement would be that daily aspirin use was associated
with a reduced incidence.]
o “In an accompanying editorial, Dr. Raymond N. DuBois of Vanderbilt University
in Nashville said that despite emerging evidence supporting aspirin's
potential, it was too soon to recommend it for breast cancer prevention because
doctors didn't know the optimal dose or regimen.”

And the headlines themselves can be very misleading. Some articles responsibly
include something in their headlines indicating that this is still an open
question. Others blatantly assert a cause-and-effect relationship.

Here’s the title of a National Public Radio news audio segment: Study: "Aspirin
Cuts Breast Cancer Risk.” This is despite the use of “may” in the body of the
text.

And the AFP title of its article is “Aspirin can reduce breast cancer risk:
study.” The article itself, however, uses the word “appears.”

And from Reuters Health Information: "Hormones Affect Aspirin's Anti-breast
Cancer Effect.”

And the headline at KRON 4 (The Bay Area's News Station, voted California's
Best TV Website by the Associated Press) announces: "Aspirin Reduces Breast
Cancer Risk."

So now we know.

To be fair, most of the newspaper articles point out that the study is not
definitive. But without further explanation of confounding, most lay (and
professional) readers will take phrases such as “linked to” and “appear to have
a decreased risk” as statements of cause and effect.

What we might do to help…

If we could get media writers to understand these issues, perhaps they could
add something like this:

“It is important to point out that this type of study cannot show cause and
effect. In such a study, people choose whether to take a treatment (aspirin in
this case). The researchers then compare the incidence of breast cancer in these
people with the incidence in people who do not choose to take aspirin. The
results are very likely to be 'confounded' by other factors. The biggest problem
in studies of this type is that the group taking aspirin differs from the group
not taking aspirin. Women who choose to take aspirin may take better care of
themselves in many ways: diet, optimal weight, not smoking, good exercise, etc.
They may have genetic differences from those who choose not to take aspirin. Not
all of the potential differences can ever be known, so 'adjusting' for these
factors statistically (as is done in this type of study) will never be enough.

“What should be done? Only a different type of study can tell us whether aspirin
truly results in a reduced incidence of breast cancer. Women would have to be
blindly 'randomized' to each group in order to distribute the unknown
differences (confounders) equally between the aspirin and non-aspirin groups.
Only then can we isolate the intervention (aspirin or placebo). If a difference
is found, we can then know that it is truly due to aspirin and not to some other
factor (one of the many confounders).”
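
The confounding argument above can be made concrete with a toy simulation. In the Python sketch below, aspirin has, by construction, no effect at all on breast cancer. The only thing that changes risk is a hypothetical "health-conscious" trait, which also makes women more likely to choose aspirin. All of the probabilities are invented for illustration; they are not estimates from the JAMA study. When women choose for themselves, aspirin users still appear to have a lower risk. When aspirin is assigned at random, the spurious association disappears.

```python
import random


def simulated_relative_risk(n_women: int, randomized: bool, seed: int = 2004) -> float:
    """Relative risk of breast cancer (aspirin vs no aspirin) in a toy cohort.

    All probabilities are hypothetical. Aspirin has NO causal effect here:
    risk depends only on an unmeasured 'health-conscious' trait.
    """
    rng = random.Random(seed)
    cases = {True: 0, False: 0}   # keyed by takes_aspirin
    totals = {True: 0, False: 0}
    for _ in range(n_women):
        health_conscious = rng.random() < 0.5
        if randomized:
            # Assignment by coin flip, independent of any personal trait.
            takes_aspirin = rng.random() < 0.5
        else:
            # Self-selection: health-conscious women choose aspirin more often.
            takes_aspirin = rng.random() < (0.7 if health_conscious else 0.3)
        # Risk depends on the confounder only, never on aspirin.
        risk = 0.02 if health_conscious else 0.04
        cases[takes_aspirin] += int(rng.random() < risk)
        totals[takes_aspirin] += 1
    risk_aspirin = cases[True] / totals[True]
    risk_no_aspirin = cases[False] / totals[False]
    return risk_aspirin / risk_no_aspirin


observational = simulated_relative_risk(200_000, randomized=False)
trial = simulated_relative_risk(200_000, randomized=True)
print(f"Self-selected (observational) RR: {observational:.2f}")
print(f"Randomized RR:                     {trial:.2f}")
```

With these made-up inputs, the expected observational relative risk is about 0.76 (a risk of 0.026 versus 0.034). That is an apparent 24% reduction produced entirely by confounding, while the randomized comparison hovers around 1.0. Because the trait is simulated here, we could adjust for it. In a real study the equivalent traits may be unknown or unmeasured, which is why statistical adjustment cannot be relied on.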

Also, we think it is important to point out harms. In the case of aspirin, the
following would be responsible reporting:

"Before taking aspirin, patients should be aware that taking aspirin daily
carries risks such as stomach problems and bleeding. For example, over 5 years
of taking aspirin, the risk of developing a major problem with bleeding is about
1 in 500." (Ref: Sanmuganathan PS, et al. Aspirin for primary prevention of
coronary heart disease: safety and absolute benefit related to coronary risk
derived from meta-analysis of randomised trials. Heart. 2001;85:265-271.)