DelfiniClick

Use of Evidence:
Reporting the Evidence

Contents      

  • Untrustable Abstracts & P-Values »
  • CONSORT Statement on Harms »
  • When There is No Evidence »
  • TREND: Reporting Standards for Non-randomized Studies »
  • Poorly Written Papers »
  • Media Heyday: Aspirin and (Potentially) Reduced Risk of Breast Cancer »

Untrustable P-values & Abstracts

One of the first things we teach our EBM learners is that although abstracts can be useful for getting a sense of what an article is about, and can at times be used to exclude studies from further review, abstracts cannot reliably be used to determine if a study is valid.

Validity must be determined by examining the methods of the study (assuming it is the right study type). A little-known problem with abstracts is that the information provided in the abstract cannot be documented in the body of the paper up to 68% of the time in some of the top-tier medical journals [Pitkin R, et al. Accuracy of Data in Abstracts of Published Research Articles. JAMA. 1999;281:1110-1111. PMID: 10188662; reviewing JAMA, NEJM, The Lancet, Annals of Internal Medicine, BMJ and the Canadian Medical Association Journal]. In this DelfiniClick we report another problem with abstracts: the problem of bias.

Peter C Gøtzsche, in a BMJ article (Believability of relative risks and odds ratios in abstracts: cross sectional study. BMJ 2006;333:231-234; PMID: 16854948), reviews previous publications documenting biased reporting of results and conclusions, and he presents additional evidence of bias in the reporting of P values.

We do not have the expertise to evaluate all the points made in his paper; however, we present his comments and findings here for you to evaluate and draw your own conclusions. Although we believe the assumptions upon which Gøtzsche bases his conclusions can be challenged, the following should be of interest to anyone interested in critical appraisal of the medical literature.

Gøtzsche’s Comments

  • Significant results in abstracts should generally be disbelieved
  • Ongoing research has shown that more than 200 statistical tests are sometimes specified in trial protocols. If you compare a treatment with itself—that is, the null hypothesis of no difference is known to be true—the chance that one or more of 200 tests will be statistically significant at the 5% level is 99.996% if we assume the tests are independent
  • Thus, the investigators or sponsor can be fairly confident that “something interesting will turn up.”
  • Due allowance for multiple testing is rarely made, and it is generally not possible to discern reliably between primary and secondary outcomes
  • Recent studies that compared protocols with trial reports have shown selective publication of outcomes, depending on the obtained P values, and that at least one primary outcome was changed, introduced, or omitted in 62% of the trials.
  • The scope for bias is also large in observational studies. Many studies are underpowered and do not give any power calculations.
  • Furthermore, a survey found that 92% of articles adjusted for confounders and reported a median of seven confounders but most did not specify whether they were pre-declared.
  • Fourteen per cent of these articles reported more than 100 effect estimates, and subgroup analyses appeared in 57% of studies and were generally believed.
  • The preponderance of significant results could be reduced if the following actions were taken.
    • First, if we need a conventional significance level at all, which is doubtful, it should be set at P < 0.001
    • Second, analysis of data and writing of manuscripts should be done blind, hiding the nature of the interventions, exposures, or disease status, as applicable, until all authors have approved the two versions of the text
    • Third, journal editors should scrutinize abstracts more closely and demand that research protocols and raw data—both for randomized trials and for observational studies—be submitted with the manuscript.

In short, yet another reminder to read the methods section of papers and not rely on results or conclusions presented in abstracts.
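
The 99.996% figure in Gøtzsche's comments follows from the complement rule: if each test has a 95% chance of staying non-significant, all 200 stay non-significant with probability 0.95 raised to the 200th power. A minimal sketch in Python, assuming (as he does) that the tests are independent; the function name familywise_error_rate is ours and is used for illustration only:

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent tests comes out
    'significant' at level `alpha` when every null hypothesis is true."""
    # A single test stays non-significant with probability (1 - alpha);
    # all of them stay non-significant with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for n in (1, 10, 50, 200):
        print(f"{n:>3} tests at P < 0.05: {familywise_error_rate(n):.3%}")
    # The stricter threshold proposed in the comments above.
    print(f"200 tests at P < 0.001: {familywise_error_rate(200, 0.001):.1%}")
```

For 200 tests at the 5% level this reproduces the 99.996% quoted above. Under the same independence assumption, the proposed threshold of P < 0.001 would lower the chance of at least one false positive among 200 tests to roughly 18%.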

Gøtzsche’s Findings in Brief

  • The first result in the abstract was statistically significant in 70% of the trials, 84% of cohort studies and 84% of case-control studies. Although many of these results were derived from subgroup or secondary analyses, or biased selection of results, they were presented without reservations in 98% of the trials
  • The distribution of P values in the studies he reviewed in the interval 0.04 to 0.06 was extremely skewed
  • The number of P values in the interval 0.05 <= P < 0.06 would be expected to be similar to the number in the interval 0.04 <= P < 0.05, but he found five in the first interval compared with 46 in the second, which is highly unlikely to occur (P < 0.0001) if researchers are unbiased when they analyze and report their data.
  • The distribution of P values between 0.04 and 0.06 was even more extreme for the observational studies he reviewed
    • Nine cohort studies and eight case-control studies gave P values in this interval, but in all 17 cases P values were presented as < 0.05
  • One of the nine cohort studies and two of the eight case-control studies gave a confidence interval where one of the borders was touching one; in all three studies, this was interpreted as a positive finding, although in one this seemed to be the only positive result out of six time periods the authors had reported
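
To see why a split of 5 versus 46 is so unlikely, one can treat each of the 51 P values as equally likely to fall on either side of 0.05 and ask how often a split at least this lopsided would occur by chance. The sketch below uses an exact binomial calculation; this is our assumption for illustration, since this summary does not state which test Gøtzsche used, and the function name is hypothetical.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided P value for a split of k versus (n - k) when each
    item is equally likely (probability 0.5) to land on either side."""
    smaller = min(k, n - k)
    # Probability of a count this small or smaller on one side.
    one_tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    # With probability 0.5 the distribution is symmetric, so double the tail.
    return min(1.0, 2 * one_tail)


if __name__ == "__main__":
    # 5 values in 0.05 <= P < 0.06 versus 46 values in 0.04 <= P < 0.05.
    print(two_sided_binomial_p(5, 5 + 46))
```

Under this assumption the result is far below 0.0001, consistent with the statement in the findings above.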

CONSORT Statement on Harms

One of the main reasons for using valid, relevant evidence in health care is to more accurately predict outcomes from various interventions and thus be equipped to make informed choices. The area of harms has always been problematic because the terminology used in the literature varies greatly, and because adverse events are frequently rare and are often detected by observational means long after a drug or intervention has become standard of care. Finding adverse events may also require a separate search after finding quality evidence regarding benefit.


CONSORT (Consolidated Standards of Reporting Trials) is a checklist aimed at standardizing published reports of RCTs, but the checklist contained only one item dealing with harms. Now the CONSORT group is adding a number of items dealing with harms to the checklist.


Ioannidis JP, Evans SJ, Gøtzsche PC, et al., for the CONSORT Group, have published in the Annals of Internal Medicine an article titled “Better Reporting of Harms in Randomized Trials: An Extension of the CONSORT Statement” (16 November 2004; Volume 141; Issue 10; Pages 781-788). The group made 10 new recommendations about reporting harms-related issues (e.g., listing adverse events with definitions, stating in the title and abstract that the study collected data about harms), along with examples to highlight specific aspects of proper reporting.


The 2001 CONSORT Statement (without this update) is available at http://www.consort-statement.org. Hopefully the new items dealing with harms will help authors improve their reporting and help users find harms-related data.

When There is No Evidence: One Perspective...

The Delfini definition of evidence-based medicine is simply the use of the scientific method and application of valid and useful science to inform health care provision, practice, evaluation and decisions. To do this, you look first to the evidence and then work to assess its scientific quality, usefulness and application. Failing useful evidence, you then have to make choices based on an assortment of what we refer to as “other triangulation issues,” which include patient preferences, community standards, legal considerations, publicity, and so forth.

Aron Sousa, MD, from the Department of Medicine at Michigan State University writes:

“I have a question about the very bottom of the evidence hierarchy. Most of my work as an educator and clinician deals with issues at the top of the evidence hierarchy, but of late I have become involved in a clinical area with no high level and little low level clinical evidence. I am an internist who has begun to care for adult patients who were born with ambiguous genitalia (intersex conditions). Most of these people underwent (and many children still undergo) surgeries designed to "normalize" the appearance of their genitals (we are not talking about urinary, sexual, or reproductive function). In terms of the available evidence, the intellectual basis of the surgeries (children with abnormal genitals become abnormal adults) is based on a fraudulent case study (John-Joan), there is no evidence of a need for these surgeries, there are a series of poorly done case series of short-term surgical outcomes, and there is a whole host of expert opinions and published MGSATs (multiple guys sitting around together). When pressed for justification, surgeons (and parents) tend to fall back to fears of future schoolyard and locker room bullying and harassment.

In general I'd say that you have to do the best you can with the evidence you have, but here is the thing. The adult patient reports of their treatment are horrific and impressive in their volume and consistency. Multiple scholars and reporters have looked for patients happy with their treatment and not found one -- not one, not even one who is happy but not willing to go public. In truth finding such a patient is a bit hard to do since a successfully treated patient would have been lied to and would not know of their condition. (There are clearly ethical problems as well.) Independent patient report does not make most hierarchies of evidence but in the Internet era is one of the most prevalent data reports we have.

In this situation there are patient opinions on the value of surgery that are nearly unanimous but uncontrolled and self selecting vs. experts with little intellectual or ethical standing. How can EBM help me deal with this? No fair punting and suggesting I get better data."

Our reaction is this:

We would consider the reports from patients to be "evidence" as well, and of "uncertain" quality, as is the "evidence" from the experts, for all the excellent reasons Dr. Sousa has raised.

"How EBM can help" is simply to say that you strive to see if valid and useful scientific information can reduce your uncertainty. At this point, with the available information, the medical literature cannot provide us with a clear answer.

After trying to round up everything that might be germane to the issue and understanding what the quality of that evidence is, in a situation such as this, we would suggest one look to patient involvement as a real partner.

The Delfini model for patient decision-making suggests approaches for this situation. When a lack of helpful evidence leaves one uncertain, we believe it is a matter of sharing that information and assorted facts with the patient, then engaging with them to determine what mode of decision making they desire.

http://www.delfini.org/page_SamePage_PatDM.htm#dm

Dr. Sousa writes back:
"Thanks very much for this. While I find uncertainty a motivating factor to seek better data and more understanding, my surgical colleagues appear to view uncertainty as something that can be cut out with a scalpel. The issue of risk data gets at the very heart of our problem...without evidence of need, we do not need therapy. The painful retort "absence of evidence is not evidence of absence" loses sight of the fact the burden of proof should fall on the therapy and not on the patient.

As you clearly realize, shared decision making is the only reasonable model for helping these patients.

Thanks very much for your insights.

Aron"

TREND: Reporting Standards for Non-randomized Studies

An article entitled “Evidence-Based Public Health: Moving Beyond Randomized Trials” by Cesar G. Victora, MD, PhD, Jean-Pierre Habicht, MD, PhD and Jennifer Bryce, EdD describes the evidence-based movement in public health practice.

Victora CG, Habicht JP, Bryce J. “Evidence-Based Public Health: Moving Beyond Randomized Trials.” Am J Public Health. 2004 Mar;94(3):400-5.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14998803

The authors argue that there is an urgent need to develop evaluation standards and protocols for use of non-randomized studies in circumstances where RCTs are not appropriate or where strong plausibility support for RCTs can be provided by reporting intermediate steps along a causal pathway.

For example, the authors discuss a study reporting that 1-year-old children in Brazil attending 14 health centers randomized to a health care training program had significantly greater weight gain over 6 months than children attending 14 matched clinics with standard care.

Victora et al. acknowledge the limited internal validity of the study, but believe the study would be less convincing if the authors had not demonstrated that:
o It was possible to train a large number of health care workers,
o Trained workers performed better,
o Mothers were receptive and understood the messages,
o Mothers in the intervention group changed their breast feeding behavior, and
o Children in the intervention group had better growth rates.

In a commentary, Des Jarlais DC, Lyles C, Crepaz N; TREND Group present the initial version of the Transparent Reporting of Evaluations with Non-randomized Designs (TREND), a checklist for reporting behavioral and public health interventions using non-randomized designs (Am J Public Health. 2004 Mar;94(3):361-6).

The TREND checklist will be of interest to everyone reading the behavioral and public health literature. The initial version of the TREND checklist is summarized at:

http://www.ajph.org/cgi/content/abstract/94/3/361

Poorly Written Papers

Horacio Plotkin, assistant professor of paediatrics and orthopaedics at the University of Nebraska Medical Center, Omaha, has written a spoof on how to get your paper rejected. However, in our line of work, we see a lot of this that still gets published! Here's what not to do...

Plotkin H. How to Get Your Paper Rejected. BMJ 2004;329:1469 (18 December), doi:10.1136/bmj.329.7480.1469

(And then there is also the wee BMJ annual Christmas present — here.)

Media Heyday: Aspirin and (Potentially) Reduced Risk of Breast Cancer

We have repeatedly seen clinicians and patients make therapeutic decisions based on observational data. The hormone replacement therapy (HRT) story is the classic case. Using HRT in women with coronary artery disease became usual care based on case-control and cohort studies. Years later, RCTs showed there were more harms than benefits with HRT and no cardiac protection.

It is interesting to look at some of the language in the media when a “breakthrough” publication appears. Below are some quotes from various newspapers regarding the association between aspirin (ASA) use and a decreased risk of breast cancer reported in a JAMA case-control study (Terry MB, Gammon MD, Zhang FF, et al. Association of frequency and duration of aspirin use and hormone receptor status with breast cancer risk. JAMA. 2004;291:2433-2440; PMID: 15161893).

The ASA-breast cancer study demonstrates how unproven interventions get attention —» and then are likely to become common practice —» then usual care —» and at times standards of care — before valid evidence of benefit has been presented.

Health-AFP
o “Women who regularly take aspirin appear to have a reduced risk of breast cancer, a study in the May 26 issue of the Journal of the American Medical Association found.”
o “Other studies already had shown a link between aspirin consumption and reducing breast cancer risk but this was the first to show a link between the medicine and reducing the breast cancer risk in women with hormone-receptor-positive cancers.”

Health-Associated Press
o “An effective weapon against many women's most feared disease might be as close as their medicine cabinets, according to new research linking aspirin with a reduced risk of breast cancer.”
o "It's a landmark study," said Dr. Sheryl Gabram, a breast specialist at Loyola University Medical Center in suburban Chicago who was not involved in the study.
o “…The results are tantalizing and make biological sense, the researchers and other doctors said.”

Los Angeles Times:
o “An aspirin a day…may protect women against breast cancer, especially those who have gone through menopause.”
o “The study also found that daily aspirin use reduced by 32% the incidence of tumors fueled by estrogen, which accounted for 70% to 75% of all breast cancers…” [A correct statement would be that daily aspirin use was associated with a reduced incidence.]
o “In an accompanying editorial, Dr. Raymond N. DuBois of Vanderbilt University in Nashville said that despite emerging evidence supporting aspirin's potential, it was too soon to recommend it for breast cancer prevention because doctors didn't know the optimal dose or regimen.”

And the headlines themselves can be very misleading. While some articles responsibly include something in their headers that indicates this is still a question, others blatantly indicate a cause/effect relationship.

Here’s the title of a National Public Radio news audio: “Study: Aspirin Cuts Breast Cancer Risk” – despite the use of “may” in the body of the text.

And the title of the AFP article is “Aspirin can reduce breast cancer risk: study” – despite the use of the word “appears” in the article itself.

And from Reuters Health Information – “Hormones Affect Aspirin's Anti-breast Cancer Effect”

And the headline at KRON 4 — The Bay Area's News Station and voted California's Best TV Website by the Associated Press, announces — "Aspirin Reduces Breast Cancer Risk." So now we know.

To be fair, most of the newspaper articles point out that the study is not definitive, but without further explanation of confounding, most lay (and professional) readers will read phrases such as “linked to” and “appear to have a decreased risk” as statements of cause and effect.

What we might do to help…
If we could get media writers to understand, perhaps they could add something like this:

“It is important to point out that this type of study cannot show cause and effect. When people choose to take a treatment (aspirin in this case) and the researchers compare their incidence of breast cancer to that of people who do not choose to take aspirin, the results are very likely to be “confounded” by another factor. The biggest problem in studies of this type is that the group taking aspirin differs from the group not taking aspirin. Women who choose to take aspirin may take better care of themselves in many ways: diet, optimal weight, not smoking, good exercise, etc. They may have genetic differences from those who choose not to take aspirin. All of the potential differences could never be known, so 'adjusting' for these factors statistically (as is done in this type of study) will never be enough.

What should be done? Only a different type of study can tell us if aspirin truly results in a reduced incidence of breast cancer. Women would have to be blindly 'randomized' to each group in order to distribute the unknown differences (confounders) equally between the aspirin and non-aspirin groups. Only then can we isolate the intervention (aspirin or placebo) and know that, if a difference is found, the difference is truly due to aspirin and not some other factor (one of the many confounders).”
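
The point about confounding and randomization can be made concrete with a small simulation. The sketch below is ours and is illustrative only: every number in it is invented, the risks are deliberately exaggerated and are not taken from the JAMA study, and aspirin is given no effect at all. A hidden "healthy lifestyle" factor lowers cancer risk and, in the observational version, also makes a woman more likely to choose aspirin.

```python
import random


def simulated_relative_risk(randomized: bool, n: int = 200_000, seed: int = 0) -> float:
    """Relative risk of cancer (aspirin vs. no aspirin) in a simulated
    population in which aspirin has NO effect. All numbers are invented."""
    rng = random.Random(seed)
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        healthy = rng.random() < 0.5  # hidden confounder
        if randomized:
            # A coin flip decides, regardless of lifestyle.
            takes_aspirin = rng.random() < 0.5
        else:
            # Healthier women are more likely to choose aspirin.
            takes_aspirin = rng.random() < (0.7 if healthy else 0.3)
        # Risk depends only on lifestyle, never on aspirin.
        has_cancer = rng.random() < (0.05 if healthy else 0.10)
        totals[takes_aspirin] += 1
        cases[takes_aspirin] += has_cancer
    risk_with_aspirin = cases[True] / totals[True]
    risk_without_aspirin = cases[False] / totals[False]
    return risk_with_aspirin / risk_without_aspirin


if __name__ == "__main__":
    print("observational:", round(simulated_relative_risk(randomized=False), 2))
    print("randomized:   ", round(simulated_relative_risk(randomized=True), 2))
```

With these invented inputs, the observational comparison shows aspirin users with a clearly lower risk (a relative risk well below 1) even though aspirin does nothing, while the randomized comparison comes out close to 1. In a real study the confounders are unknown, which is why statistical adjustment cannot be relied on to remove this kind of bias.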

Also, we think it is important to point out harms. In the case of aspirin, the following would be responsible reporting:

"Before taking aspirin, patients should be aware of the fact that taking aspirin daily carries risks such as stomach problems and bleeding. For example, over 5 years of taking aspirin, the risk of developing a major problem with bleeding is about 1 in 500." (Ref: PS Sanmuganathan et al. Aspirin for primary prevention of coronary heart disease: safety and absolute benefit related to coronary risk derived from meta-analysis of randomised trials. Heart 2001 85: 265-271).


© Delfini Group, LLC, 2002-2008. All Rights Reserved Worldwide.
Use of this website implies your agreement to our Notices »
