More on The Problem with Drawing Cause-Effect Conclusions from Observational Studies
Our
last teaching engagement was in Framingham, Massachusetts and
reminds us of the value of observational studies to assist us
in developing risk stratification models. The Framingham Study
began in 1948 as the first prospective study of cardiovascular
disease and is important because through observations it has
identified cardiovascular disease (CVD) risk factors which can
be associated with morbidity and mortality.
But there is good
evidence that drawing cause and effect conclusions from observational
studies is unreliable. Cause and effect conclusions should be
based on randomized controlled trials (RCTs) where bias, confounding
and chance have been ruled out as possible explanations for
the observed association between the intervention and the outcome.
Because there are so many observational studies published each
week and because we keep seeing health professionals inappropriately
basing treatment decisions on them, it is worthwhile summarizing
an excellent review of the literature on this topic.
The study and literature
review can be found in the reference:
Deeks JJ, Dinnes
J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, Petticrew M,
Altman DG. Evaluating non-randomised intervention studies. Health
Technology Assessment 2003; Vol. 7: No. 27.
Some key points from this article:
- Comparisons of the results of randomized and non-randomized studies
  across multiple interventions in multiple studies demonstrate that,
  in the majority of cases, the results of observational studies are
  not consistent with those of RCTs
- This study, using meta-epidemiological techniques, demonstrates that:
  - None of the study results from observational studies using
    historical and concurrent controls can be adequately adjusted
    for bias
  - Logistic regression on average increases bias when applied to
    observational studies
Conclusions
- Non-randomized
studies may still give seriously misleading results even when
those treated and control groups appear similar in key prognostic
factors
- Standard methods
of case-mix adjustment do not guarantee removal of bias
- Omission of
important confounding factors can explain failure of adjustment
as a substitute for randomization
- There is no
known method for reliably adjusting for confounding factors
in observational studies
Delfini Commentary
Extreme caution is urged when considering results of
observational studies in interventions for screening, prevention
and therapy. Cause and effect conclusions should only be drawn
from RCTs.
One reason for
this is that there may be major differences in the characteristics
(prognostic factors) of individuals who choose a therapy compared
to people who do not choose that therapy. A classic example
is hormone replacement therapy after myocardial infarction (MI)
in women. Most observational studies reported that roughly twice
as many women who did not choose to take hormone replacement
therapy (HRT) had a recurrent MI compared to women who chose
to take HRT. This led people to believe, incorrectly, that HRT
caused this benefit. Later, well-done RCTs were conducted and no
such benefit was found. Why? The most likely reason is that the
observational studies were highly prone to biases arising from
differences between the groups. These differences could not be
eliminated even with statistical adjustments, such as adjusting
for smoking, in which researchers try to balance confounders
between the groups.
Another reason
is that in observational studies, investigators do not “control”
all elements of the study as they do in RCTs. The end result
is that in observational studies other aspects affecting the
groups are almost certain to be different in important ways
which are likely to explain or affect the study results.
Key Point
Any difference between groups, except for what is being
studied (e.g., HRT use), is a bias.
In the case of
HRT after MI, selection bias was present in that women who chose
to take HRT were probably more likely to be “health-conscious,”
exercise, watch their diets, etc., making them different from
the women who did not take HRT. It is also likely that there
were other differences in how the two groups experienced their
health care. Observational studies have no formal protocol, so
the groups will differ in many ways that could affect observed
outcomes, such as other therapies used, how outcomes are assessed,
frequency of follow-up, and so on.
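The selection effect described above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the function name `untreated_to_treated_risk_ratio`, the single "health-conscious" trait and every probability below are hypothetical and are not taken from any HRT study. HRT is given no effect at all on recurrent MI, yet the self-selected comparison shows an apparent benefit (in expectation, a risk ratio of 0.24 / 0.16 = 1.5), while the randomized comparison stays close to 1.

```python
import random

def untreated_to_treated_risk_ratio(n=200_000, randomized=False, seed=0):
    """Ratio of recurrent-MI risk in non-users to risk in HRT users.

    All probabilities are made-up illustrative values. HRT has NO effect
    on risk in this simulation, so the unbiased answer is 1.0.
    """
    rng = random.Random(seed)
    events = {True: 0, False: 0}   # keyed by HRT use
    totals = {True: 0, False: 0}
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        if randomized:
            # Coin flip: treatment is independent of every patient trait.
            takes_hrt = rng.random() < 0.5
        else:
            # Self-selection: health-conscious women choose HRT more often.
            takes_hrt = rng.random() < (0.7 if health_conscious else 0.3)
        # Risk depends only on health-consciousness, never on HRT.
        risk = 0.10 if health_conscious else 0.30
        events[takes_hrt] += rng.random() < risk
        totals[takes_hrt] += 1
    return (events[False] / totals[False]) / (events[True] / totals[True])

observational = untreated_to_treated_risk_ratio(randomized=False)
trial = untreated_to_treated_risk_ratio(randomized=True)
print(f"observational: {observational:.2f}  randomized: {trial:.2f}")
```

The only difference between the two runs is how treatment is assigned, which is the point of the example: the apparent benefit comes entirely from who chooses the therapy.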
Even with statistical
adjustments for differences between potential and known prognostic
characteristics of the groups, bias cannot be reliably eliminated
because whatever is actually responsible for the outcome (i.e.,
the confounder) is what would have to be adjusted for. This would
entail having advance knowledge of cause and effect, which is
precisely what the study is being conducted to establish. In
addition, statistical adjustment has limitations. How could every
single factor that made the HRT users different be adjusted for?
Humans embody an effectively unlimited number of variables, such
as characteristics and exposures.
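A hedged sketch of why adjustment falls short: the simulated cohort below has one measured confounder (smoking) and one unmeasured confounder (health-consciousness), and HRT again has no true effect. The names `make_cohort` and `stratified_ratios` and all probabilities are hypothetical, chosen only for illustration. Stratifying on smoking alone leaves an apparent benefit in every stratum; the bias disappears only when the analysis also stratifies on the trait that, in a real study, nobody recorded.

```python
import random
from collections import defaultdict

def make_cohort(n=200_000, seed=1):
    """Simulated women after MI; every probability is an invented value."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        hc = rng.random() < 0.5                          # unmeasured trait
        smoker = rng.random() < (0.2 if hc else 0.4)     # measured trait
        p_hrt = 0.2 + (0.4 if hc else 0.0) - (0.1 if smoker else 0.0)
        hrt = rng.random() < p_hrt                       # self-selected
        # HRT does not appear here: its true effect is zero.
        risk = 0.10 + (0.10 if smoker else 0.0) + (0.0 if hc else 0.15)
        mi = rng.random() < risk
        cohort.append((hc, smoker, hrt, mi))
    return cohort

def stratified_ratios(cohort, stratum_of):
    """Risk ratio (non-users / HRT users) within each stratum."""
    tally = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for person in cohort:
        hc, smoker, hrt, mi = person
        cell = tally[stratum_of(person)][hrt]   # [events, total]
        cell[0] += mi
        cell[1] += 1
    return {
        stratum: (arms[False][0] / arms[False][1]) / (arms[True][0] / arms[True][1])
        for stratum, arms in tally.items()
    }

cohort = make_cohort()
crude = stratified_ratios(cohort, lambda p: "all")["all"]
by_smoking = stratified_ratios(cohort, lambda p: p[1])
by_both = stratified_ratios(cohort, lambda p: (p[0], p[1]))

print("crude:", round(crude, 2))
print("adjusted for smoking only:", {k: round(v, 2) for k, v in by_smoking.items()})
print("adjusted for both traits:", {k: round(v, 2) for k, v in by_both.items()})
```

In real data the second adjustment is not available, because the analyst cannot stratify on a variable that was never measured or never suspected.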
Comparisons of
RCTs and observational studies of the same interventions have
repeatedly demonstrated that even with the most meticulous statistical
adjustments, bias cannot be reliably eliminated from observational
studies. The key message is that without randomization and assurance
that interventions and assessments are the same for both study
and comparison groups, one cannot reliably draw conclusions
about cause and effect relationships. Associations between interventions
and outcomes in observational studies are very likely to be
due to bias or confounding. Therefore, observational studies
are useful only for generating hypotheses when considering questions
of preventive, screening or therapeutic interventions.
Database Studies
Some groups have tried to demonstrate improved health outcomes
(e.g., death, stroke, etc.) through studies of their databases.
It should be remembered that this type of study is an observational
study and prone to bias and confounding for the reasons explained
above, plus it is highly prone to chance findings of statistical
significance. Therefore, database studies may be useful for
suggesting areas for further study, but they should not be thought
of as valid studies from which cause and effect relationships
can be concluded.
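One plausible reason database studies are prone to chance findings (an assumption here, since the point is not spelled out above) is that a large database invites many comparisons. The arithmetic can be sketched as follows, assuming independent comparisons with no true effect, each tested at a significance threshold of 0.05. The function name `chance_of_false_positive` is hypothetical.

```python
def chance_of_false_positive(comparisons, alpha=0.05):
    """Probability that at least one of several independent null
    comparisons reaches p < alpha purely by chance."""
    return 1 - (1 - alpha) ** comparisons

for k in (1, 10, 20, 100):
    print(f"{k:>3} comparisons: {chance_of_false_positive(k):.0%}")
```

Under these assumptions, about ten unplanned comparisons already give roughly a 40% chance of at least one "significant" result when nothing real is going on.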