Analytic perspective
The Bradford Hill considerations on causality: a counterfactual perspective
Michael Höfler
Correspondence: Michael Höfler
Author Affiliations
Clinical Psychology and Epidemiology, Max Planck Institute of Psychiatry, Kraepelinstrasse 2-10, 80804 München, Germany
Emerging Themes in Epidemiology 2005, 2:11 doi:10.1186/1742-7622-2-11
The electronic version of this article is the complete one and can be found online at: http://www.ete-online.com/content/2/1/11
Received: 10 June 2005
Accepted: 3 November 2005
Published: 3 November 2005
© 2005 Höfler; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited.
Abstract
Bradford Hill's considerations published in 1965 had an enormous influence on attempts to separate causal from non-causal explanations
of observed associations. These considerations were often applied as a checklist of criteria, although they were by no means intended to
be used in this way by Hill himself. Hill, however, avoided defining explicitly what he meant by "causal effect".
This paper provides a fresh point of view on Hill's considerations from the perspective of counterfactual causality. I argue that
counterfactual arguments strongly contribute to the question of when to apply the Hill considerations. Some of the considerations,
however, involve many counterfactuals in a broader causal system, and their heuristic value decreases as the complexity of a system
increases; the danger of misapplying them can be high. The impacts of these insights for study design and data analysis are discussed.
The key analysis tool to assess the applicability of Hill's considerations is multiple bias modelling (Bayesian methods and Monte Carlo
sensitivity analysis); these methods should be used much more frequently.
Introduction
Sir Austin Bradford Hill (1897 – 1991) was an outstanding pioneer in medical statistics and epidemiology [1-4]. His summary of a
lecture entitled "The environment and disease: Association or causation" [5] had an enormous impact on epidemiologists and medical
researchers. Ironically, this paper became famous for something it was by no means intended to be [6,7]: a checklist for causal criteria
(e.g. [8-10]).
Hill [5] provided nine considerations for assessing whether an observed association involved a causal component or not. These
considerations were influenced by others before him [11,12]. He avoided defining explicitly what he meant by a causal effect, although
seemingly he had the counterfactual conceptualisation in mind. My core thesis in this paper is that counterfactual arguments contribute
much to the question of when to apply Hill's considerations to a specific causal question. This is not to say that other conceptualisations
of causality would not contribute to clarifying Hill's considerations, but the counterfactual model is the one that directly relates to many
statistical methods [13,14], and it links the "metaphysical" side of causality to epidemiological practice. Moreover, I shall argue that
some of Hill's considerations involve many counterfactuals in a broader causal system, and that the heuristic value of these
considerations can be low.
Analysis
Counterfactual causality
Hill [5] avoided defining exactly what he meant by a causal effect:
"I have no wish, nor the skill to embark upon a philosophical discussion of the meaning of 'causation'."
However, he seems to have applied the counterfactual model, because he goes on to write:
"... the decisive question is whether the frequency of the undesirable event B would be influenced by a change in the environmental
feature A."
Counterfactual causality dates back at least to the 18th-century Scottish philosopher David Hume [15] but only became standard in
epidemiology from the 1980s. As the inventor of randomised clinical trials [3,4], Hill was strongly influenced by the idea of
randomised group assignment, which precludes confounding. The idea of randomisation, introduced by R.A. Fisher in the 1920s and
1930s, was in turn stimulated by Hume [3]. As Fisher and Hill were friends for at least some years [3], it seems likely that Hill was
strongly influenced by counterfactual thinking. Today, the counterfactual, or potential outcome, model of causality has become more or
less standard in epidemiology, and it has been argued that counterfactual causality captures most aspects of causality in health
sciences [13,14].
To define a counterfactual effect, imagine an individual i at a fixed time. In principle, we assume that
(a) this individual could be assigned to either of the two exposure levels we aim to compare (X = 0 and X = 1) and
(b) the outcome exists under both exposure levels (denoted by $Y_i^0$ and $Y_i^1$, respectively) [[14] and references therein].
The causal effect of X = 1 versus X = 0 within an individual i at the time of treatment or exposure assignment can be defined as [13-20]:
$$Y_i^1 - Y_i^0.$$
Note that the difference measure is not the only option: for strictly positive outcomes one can also use the ratio measure $Y_i^1 / Y_i^0$.
For a binary outcome, this definition means that the outcome event occurs under one exposure level but not under the other. Therefore, a
causal effect of a binary event is a necessary condition without which the event would not have occurred; it is not necessarily a sufficient
condition. Clearly, the outcome is not observable under at least one of the two exposure levels of interest. Thus, the outcome has to be
estimated under the unobserved or counterfactual condition, known as the counterfactual or potential outcome.
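To make the definition concrete, the following Python sketch tabulates hypothetical potential outcomes for four individuals and a binary outcome. All values and names (`Individual`, `people`) are invented for illustration. In a real study only `observed_outcome()` would be available; `individual_effect()` needs both potential outcomes and therefore exists only as a thought experiment.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    name: str
    y0: int  # hypothetical potential outcome under X = 0
    y1: int  # hypothetical potential outcome under X = 1
    x: int   # exposure level actually received

    def individual_effect(self) -> int:
        # Counterfactual effect Y^1 - Y^0: needs both potential outcomes
        return self.y1 - self.y0

    def observed_outcome(self) -> int:
        # Only the potential outcome under the received exposure is ever observed
        return self.y1 if self.x == 1 else self.y0


people = [
    Individual("a", y0=0, y1=1, x=1),  # exposure is a necessary condition for the event
    Individual("b", y0=1, y1=1, x=0),  # event occurs under both levels: no effect
    Individual("c", y0=0, y1=0, x=1),  # event occurs under neither level: no effect
    Individual("d", y0=1, y1=0, x=0),  # exposure would prevent the event
]

for p in people:
    print(p.name, "effect:", p.individual_effect(), "observed Y:", p.observed_outcome())
```

Individual "a" is the case described above: the event occurs under X = 1 but not under X = 0. Yet the observed data on "a" alone (X = 1, Y = 1) cannot distinguish this from an individual whose event would have occurred anyway.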
According to Rothman [21], a comprehensive causal mechanism is defined as a set of factors that are jointly sufficient to induce a binary
outcome event, and that are minimally sufficient; that is, under the omission of just one factor the outcome would change. Rothman [21]
called this the sufficient-component cause model. A similar idea can be found in an earlier paper by Lewis [22]. Since several causal
mechanisms are in line with the same specific counterfactual difference for a fixed individual at a fixed time, the sufficient-component
cause model can be regarded as a finer version of the counterfactual model [[14], and references therein].
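The claim that the sufficient-component cause model is finer than the counterfactual model can be checked with a small Python sketch. It assumes a deterministic model in which each mechanism is a list of minimal sufficient sets of hypothetical component causes (X, A, B, U); the outcome occurs if any set is complete. The two mechanisms below differ in the partner cause of X, yet they yield the same counterfactual difference for the same individual.

```python
from typing import Dict, FrozenSet, List

Mechanism = List[FrozenSet[str]]  # minimal sufficient sets of component causes


def outcome(mechanism: Mechanism, factors: Dict[str, int]) -> int:
    """Return 1 if at least one sufficient set is complete, else 0."""
    present = {name for name, value in factors.items() if value == 1}
    return int(any(component_set <= present for component_set in mechanism))


def counterfactual_difference(mechanism: Mechanism, background: Dict[str, int]) -> int:
    """Y^1 - Y^0 for one individual, holding all factors other than X fixed."""
    y1 = outcome(mechanism, {**background, "X": 1})
    y0 = outcome(mechanism, {**background, "X": 0})
    return y1 - y0


# Two hypothetical causal structures that differ only in the partner cause of X
mechanism_1: Mechanism = [frozenset({"X", "A"}), frozenset({"U"})]
mechanism_2: Mechanism = [frozenset({"X", "B"}), frozenset({"U"})]

# An individual in whom both partner causes are present and U is absent
person = {"A": 1, "B": 1, "U": 0}

print(counterfactual_difference(mechanism_1, person))  # 1
print(counterfactual_difference(mechanism_2, person))  # 1
```

Observing the difference of 1 for this person cannot reveal which mechanism produced it; only individuals in whom A and B differ would discriminate between the two.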
As there are often no objective criteria to determine individual counterfactual outcomes, the best option is usually to estimate population
average effects. The population average effect is defined as the average of individual causal effects over all individuals in the target
population on whom inference is to be made. The estimation of average causal effects in epidemiology is subject to various biases [23].
These biases are determined both by the study design and the mechanism that generates the data. In a randomised controlled trial (RCT),
bias due to confounding cannot occur, but confounders might be distributed unequally across treatment levels by chance, especially in
small samples. If compliance is perfect, there is no measurement error in the treatment. Other biases might still occur, however, such as
bias due to measurement error in the outcome and selection bias (because individuals in the RCT might not represent all individuals in
the target population). Observational studies are prone to all kinds of biases, and these depend on the causal mechanism underlying the
data. For instance, bias due to confounding is determined by the factors that affect both exposure and outcome, and the distribution of
these factors.
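The following Python sketch, built on an invented population of eight individuals, contrasts the population average causal effect with two estimators. When exposure coincides with a confounder C, the observed risk difference is badly biased. Under complete randomisation (4 of 8 exposed), averaging the estimate over all 70 possible allocations recovers the true average effect exactly, while single allocations scatter widely; this scatter is the chance imbalance in small samples mentioned above. Exact fractions are used to avoid rounding noise, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Set

# Hypothetical target population: (confounder C, Y^0, Y^1) for each individual.
# C raises the baseline risk; in the observational scenario it also drives exposure.
population = [
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 1, 1),
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0),
]

# Population average causal effect: mean of the individual differences Y^1 - Y^0
ace = Fraction(sum(y1 - y0 for _, y0, y1 in population), len(population))


def difference_in_means(exposed: Set[int]) -> Fraction:
    """Observed risk difference when the individuals in `exposed` receive X = 1."""
    treated = [population[i][2] for i in exposed]  # Y^1 is observed for the exposed
    control = [population[i][1] for i in range(len(population)) if i not in exposed]  # Y^0 observed
    return Fraction(sum(treated), len(treated)) - Fraction(sum(control), len(control))


# Observational scenario: exposure coincides with the confounder (X = C)
confounded = difference_in_means({i for i, (c, _, _) in enumerate(population) if c == 1})

# Complete randomisation: every way of exposing 4 of the 8 individuals is equally likely
estimates = [difference_in_means(set(s)) for s in combinations(range(len(population)), 4)]
randomised_average = sum(estimates, Fraction(0)) / len(estimates)

print("true average effect:", ace)                      # 1/4
print("confounded estimate:", confounded)               # 1
print("mean over randomisations:", randomised_average)  # 1/4
print("range over randomisations:", min(estimates), "to", max(estimates))
```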
I shall demonstrate that most of Hill's considerations involve more than the X – Y association and biases in that association; their
application depends on assumptions about a comprehensive causal system, of which the X – Y effect is just one component. I argue that
the heuristic value of Hill's considerations converges to zero as the complexity of a causal system and the uncertainty about the true
causal system increase.
The Bradford Hill considerations
The discussion of Hill's considerations is organised as follows: first, I use my own wording (in italics) to summarise the respective
consideration. Hill's own argumentation is then briefly reviewed, followed by arguments of other authors (a subjective selection from the
vast literature). I will then show which counterfactuals are involved in the application of a given consideration and what novel insights
can be derived for the interpretation of study design and data analysis. To simplify the discussions, I will sometimes disregard random
variation. Some of my arguments apply to several of Hill's considerations and I will occasionally not repeat them to avoid redundancy.
1. Strength of association
A strong association is more likely to have a causal component than is a modest association
Hill [5] illustrated this point with the high risk ratios for the association between exposure levels of smoking and incidence of lung
cancer. However, he demonstrated with two counter-examples that the absence of a strong association does not rule out a causal effect.
Hill acknowledged that the impression of strength of association depended on the index used for the magnitude of association [5].
Rothman and Greenland [[18], p. 24] provided counter-examples for strong but non-causal relationships. Note that, unlike ratio measures,
difference measures tend to be small unless there is nearly a one-to-one association between exposure and outcome [24]. The
fundamental problem with the choice of an effect measure is that "neither relative risk nor any other measure of association is a
biologically consistent feature of an association... it is a characteristic of a study population that depends on the relative prevalence of
other causes" [[18], p. 24]. Rothman and Poole [25] described how studies should be designed to detect weak effects. For Rothman and
Greenland [[18], p. 25] the benefit of the consideration on strength was that strong associations could not be solely due to small biases,
whether through modest confounding or other sources of bias.
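A quick arithmetic illustration of why the impression of strength depends on the index, using hypothetical risks: for a rare outcome a tenfold risk ratio corresponds to a risk difference of less than one percentage point, whereas for a common outcome a modest ratio can correspond to a large difference.

```python
def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    return risk_exposed / risk_unexposed


def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    return risk_exposed - risk_unexposed


# Hypothetical (exposed, unexposed) risks chosen for illustration only
scenarios = {
    "rare outcome": (0.010, 0.001),
    "common outcome": (0.60, 0.30),
}

for label, (r1, r0) in scenarios.items():
    print(f"{label}: RR = {risk_ratio(r1, r0):.1f}, RD = {risk_difference(r1, r0):.3f}")
```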
The consideration on strength involves two main counterfactual questions about biases that have presumably produced the observed
association (in terms of a pre-specified index): how strong would the association be expected to be as compared to the observed
association if the data were free of bias? Would the interval estimate that properly accounts for not only random, but also systematic
error (uncertainty about bias parameters such as misclassification probabilities) allow for the desired conclusion or not? (A desired
conclusion might be the simple existence of a causal effect or a causal effect of at least a certain magnitude, for instance a two-fold
increase in risk.)
Bias can be addressed with multiple bias modelling. Contemporary methods for multiple bias modelling include Bayesian methods and
Monte Carlo sensitivity analysis (MCSA, which can be modified to be approximately interpretable in a Bayesian manner under certain
conditions [27]). These methods address the uncertainties about bias parameters by assigning prior distributions to them. Although Hill
pointed out that non-random error was often under-estimated, these methods were hardly available in his time. With Bayesian and
MCSA methods one can assess whether the observed magnitude of association is sufficiently high to allow for a certain conclusion. This
requires a bias model including assumptions on which kinds of biases exist, how biases act together and which priors should be used for
them. However, even a bias model that addresses the understood biases can leave the inference distorted by misunderstood or
unknown bias. Moreover, one can calculate which X – Y association and random error values would have allowed for the desired
conclusion. One may also ask which priors on bias parameters would, if applied, have allowed for the desired conclusion ("reverse
Bayesian analysis") and assess the probability that these priors are not counterfactual.
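As a minimal sketch of MCSA, the Python code below corrects an observed risk ratio for a single source of bias, an unmeasured binary confounder U, using the classical bias-factor formula that assumes U does not modify the effect of X. The prior distributions, the observed risk ratio of 2.0 and its standard error of 0.15 on the log scale are all hypothetical. A genuine multiple bias model would chain further corrections (e.g. for misclassification and selection) inside the same loop, and the resulting interval is a simulation interval, not necessarily a Bayesian posterior interval.

```python
import math
import random
from typing import List, Tuple


def confounding_bias_factor(p1: float, p0: float, rr_ud: float) -> float:
    """Bias factor for a binary unmeasured confounder U (no effect modification by U).

    p1, p0: prevalence of U among exposed and unexposed; rr_ud: U-outcome risk ratio.
    """
    return (p1 * (rr_ud - 1) + 1) / (p0 * (rr_ud - 1) + 1)


def mcsa(rr_obs: float, se_log_rr: float, n_draws: int = 50_000, seed: int = 1) -> Tuple[float, float, float]:
    """2.5th, 50th and 97.5th percentiles of the bias-adjusted risk ratio."""
    rng = random.Random(seed)
    adjusted: List[float] = []
    for _ in range(n_draws):
        # Hypothetical prior distributions on the bias parameters
        p1 = rng.uniform(0.3, 0.5)
        p0 = rng.uniform(0.1, 0.3)
        rr_ud = rng.uniform(1.5, 3.0)
        # Random error: draw the log RR from its approximate sampling distribution
        log_rr = rng.gauss(math.log(rr_obs), se_log_rr)
        adjusted.append(math.exp(log_rr) / confounding_bias_factor(p1, p0, rr_ud))
    adjusted.sort()

    def pct(q: float) -> float:
        return adjusted[int(q * (len(adjusted) - 1))]

    return pct(0.025), pct(0.5), pct(0.975)


low, median, high = mcsa(rr_obs=2.0, se_log_rr=0.15)
print(f"bias-adjusted RR: median {median:.2f}, 95% simulation interval {low:.2f} to {high:.2f}")
```

The width of the interval reflects both random error and uncertainty about the bias parameters; with wider priors, a larger observed association would typically be needed to keep the lower limit above a given threshold.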
Clearly, high uncertainty about bias parameters requires larger associations than modest uncertainty does. In studies that control for
several sources of bias, modest associations might still be indicative of a causal effect, whereas in more error-prone designs more bias
and higher non-random error have to be taken into account. However, specifying a bias model can be a difficult task if the knowledge
about biases is also limited. This uncertainty then carries over to the uncertainty about applying the consideration on strength. Despite this,
there seems to be no alternative to multiple bias modelling when assessing which magnitude of association is necessary for the desired
conclusion.
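The counterfactual question of which bias parameters, or which observed association, would have been needed for a conclusion can in the simplest case be answered deterministically. The Python sketch below is such a tipping-point analysis, a simpler relative of the reverse Bayesian analysis mentioned above rather than that method itself. For a hypothetical binary confounder with fixed prevalences among exposed and unexposed (and, as before, no effect modification by U), it solves the bias-factor formula for the confounder-outcome risk ratio that would fully explain an observed risk ratio greater than 1.

```python
from typing import Optional


def bias_factor(p1: float, p0: float, rr_ud: float) -> float:
    """Confounding bias factor for a binary confounder U, assuming no effect modification by U.

    p1, p0: prevalence of U among exposed and unexposed; rr_ud: U-outcome risk ratio.
    """
    return (p1 * (rr_ud - 1) + 1) / (p0 * (rr_ud - 1) + 1)


def tipping_point_rr_ud(rr_obs: float, p1: float, p0: float) -> Optional[float]:
    """U-outcome risk ratio at which confounding alone would fully explain rr_obs (> 1).

    Solves bias_factor(p1, p0, rr_ud) == rr_obs for rr_ud. Returns None when no finite
    value suffices, i.e. when p1 <= rr_obs * p0.
    """
    denom = p1 - rr_obs * p0
    if denom <= 0:
        return None
    return 1 + (rr_obs - 1) / denom


# Hypothetical prevalences of U: 40% among the exposed, 20% among the unexposed
for rr_obs in (1.2, 1.5, 1.8, 2.0, 3.0):
    print(rr_obs, tipping_point_rr_ud(rr_obs, p1=0.4, p0=0.2))
```

With these prevalences, confounding can never inflate the risk ratio beyond p1/p0 = 2, so the function returns None for observed risk ratios of 2 or more; weaker observed associations can be explained by weaker confounding, which is the sense in which strong associations are harder to attribute to bias alone.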
2. Consistency
A relationship is observed repeatedly
For Hill [5], the repeated observation of an association included "different persons, places, circumstances and time". The benefit of this
rule was that consistently finding an association with different study designs (e.g. in both retrospective and prospective studies) reduced
the probability that an association would be due to a "constant error or fallacy" in the same study design. On the other hand, he pointed
out that shared flaws in different studies would tend to replicate the same wrong conclusion. Likewise, differing results in different
investigations might indicate that some studies correctly showed a causal relationship, whereas others failed to identify it.
This point is explained by Rothman and Greenland [[18], p. 25]: causal agents might require that another condition be present; for
instance, transfusion could lead to infection with the human immunodeficiency virus only if the virus was present. Now, according to the
sufficient-component cause model [20,21], and as stated by Rothman and Greenland [[18], p. 25], whether and to what extent there is a
causal effect on average depends on the prevalences of complementary causal factors.
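The dependence of the average effect on the prevalence of complementary causes can be computed directly. The Python sketch below assumes a hypothetical mechanism in which Y occurs if X and a complementary cause A are both present (as with transfusion and the presence of the virus), or if an unrelated sufficient cause U is present, with A and U independent. The mechanism is identical in every simulated study population; only the prevalence of A changes.

```python
from itertools import product


def average_risk_difference(p_a: float, p_u: float) -> float:
    """Average effect of X when Y = 1 iff (X and A) or U, with A and U independent.

    p_a: prevalence of the complementary component cause A
    p_u: prevalence of U, a sufficient cause that does not involve X
    """
    risk = {0: 0.0, 1: 0.0}
    for x in (0, 1):
        for a, u in product((0, 1), repeat=2):
            weight = (p_a if a else 1 - p_a) * (p_u if u else 1 - p_u)
            y = 1 if (x == 1 and a == 1) or u == 1 else 0
            risk[x] += weight * y
    return risk[1] - risk[0]


# Same causal mechanism, different hypothetical study populations
for p_a in (0.2, 0.5, 0.8):
    print(p_a, round(average_risk_difference(p_a, p_u=0.1), 3))
```

Under these assumptions the risk difference equals P(A)(1 - P(U)), so studies in populations with different prevalences of A should disagree even though the causal mechanism is the same.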
Cox and Wermuth [[28], p. 225] have added the consideration that an association that does not vary strongly across the values of
intrinsic variables would be more likely to be causal. If an association were similar across individuals with different immutable
properties, such as sex and birth date, the association would be more likely to have a stable substantive interpretation. By contrast,
mutable variables other than X and Y might change, for instance as a consequence of interventions on other factors in a comprehensive
causal system. One should be careful when applying this guideline; effect heterogeneity depends on the choice of the effect measure. This
choice should be based on a relevant substantive theory and on correspondence with the counterfactual and sufficient-component cause
models (the latter two indicating that differences rather than ratios should be used); the two criteria may, however, conflict [29].
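Whether an association is "similar" across intrinsic variables depends on the measure. In the hypothetical risks below, the risk ratio is identical in both strata while the risk difference triples, so the same data show homogeneity on one scale and heterogeneity on the other.

```python
import math

# Hypothetical risks (unexposed, exposed) in two strata of an immutable variable
strata = {"stratum 1": (0.10, 0.20), "stratum 2": (0.30, 0.60)}

for name, (r0, r1) in strata.items():
    print(f"{name}: RR = {r1 / r0:.2f}, RD = {r1 - r0:.2f}")

ratios = [r1 / r0 for r0, r1 in strata.values()]
differences = [r1 - r0 for r0, r1 in strata.values()]
homogeneous_on_ratio_scale = math.isclose(ratios[0], ratios[1])
homogeneous_on_difference_scale = math.isclose(differences[0], differences[1])
print(homogeneous_on_ratio_scale, homogeneous_on_difference_scale)  # True False
```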
From the counterfactual perspective, the following questions arise when asking whether to apply the consideration on consistency:
a) If the causal effect was truly the same in all studies, would one expect to observe different associations in different studies (possibly
involving different persons, places, circumstances and time)? To what degree would the associations be expected to differ?
b) If the causal effect varied across the studies, would one expect to observe equal or different associations? What magnitude of
differences would one expect?
Note that in the presence of effect modifiers there is no such thing as "the causal effect"; the effect modifiers need to be fixed at
suitable values. Also note that only one of a) and b) is actually counterfactual, depending on whether the effect truly varies across the different
studies or not. Answering these questions requires a comprehensive causal theory that indicates how different entities (individual factors,
setting, time, etc.) act together in causing Y. Within such a causal system one can predict how the X – Y association should change if one
used different persons, places, circumstances and times in different studies. As one can only observe associations this also involves bias,
and bias might operate differently in different studies.
An observed pattern of association across the different studies that is in line with the expected pattern would provide evidence for an
effect of X on Y if the underlying causal theory applies. Another pattern would indicate that there is either no effect of X on Y or that the
supposed theory is false. In complex situations and bias-prone designs, the probability might be substantial that a causal theory does not
include important features that change the expected X – Y association. Here, the uncertainty regarding whether or not to demand an
association (or which magnitude of association) could be high, and so the consistency consideration might do more harm than good.
3. Specificity
A factor influences specifically a particular outcome or population
For Hill [5], if one observed an association that was specific for an outcome or group of individuals, this was a strong argument for a
causal effect. However, Hill alluded to the fallacy of inferring the absence of a causal effect from the absence of specificity:
Diseases may have more than one cause (which Hill considered to be the predominant case). In turn, a factor might cause several
diseases. According to Hill, the value of this rule lay in its combination with the strength of an association: For instance, among smokers,
the risk of death from lung cancer should be elevated to a higher degree as compared to the risk of other causes of death. Hill's
consideration on specificity for persons apparently contradicts his consideration on consistency, where repeatedly observing an
association in different populations would increase the evidence for a causal effect.
Rothman and Greenland [[18], p. 25] have argued that when applying this rule one assumes that a cause had one single effect. This
assumption is often meaningless; for instance, smoking has effects on several diseases. They considered demanding specificity
"useless and misleading". Cox and Wermuth [[28], p. 226f.] pointed out that this rule applies to systems where quite specific processes
act, rather than to systems where the variables involved represent aggregates of many characteristics. Weiss [30] has mentioned a
situation in which it can be meaningful to require specificity with respect to an outcome: a theory could predict that an exposure affects a
certain outcome, but does not affect other particular outcomes. He illustrated this with the example that wearing helmets should be
protective specifically against head injury, not against injury of other parts of the body. If wearing helmets also protected against other
injuries, so he argues, this could be indicative that the association is confounded by more careful riders tending to use helmets. He
provided similar arguments for specificity of exposure and specificity with regard to individuals in whom a theory predicts an effect.
Counterfactual causality, and the logically equivalent causal graphs [19,30], generalise the argument of Weiss [30] and solve the problem
of specificity with respect to other exposures and outcomes: outcomes other than the one under consideration (Y) must be related to the
exposure (X) if they are either part of the causal chain between X and Y or a causal consequence of Y. Otherwise, they must not be
associated with X. In the example above, other injuries are neither part of the causal chain between wearing helmets and head injury, nor
the causal consequence of head injury. Hence, if wearing helmets were related to both head and other injuries, the association between
wearing helmets and head injury might be due, at least in part, to the common cause of carefulness. Likewise, exposures other than X
must be associated with Y if they belong to the causal chain between X and Y. Whether exposures that occur before X are associated with Y or not is not informative
about causality between X and Y.
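Under the additional, purely illustrative assumption of a linear structural model with independent unit-variance errors, the implications of the helmet graph for specificity can be computed exactly: each variable is represented by its coefficients on the error terms, and covariances follow from the shared terms. The coefficients and the names (`helmet_model`, `combine`, `cov`) are hypothetical. With carefulness C affecting both helmet use H and other injuries O, H and O covary although there is no arrow between them; removing the C → H arrow removes that association.

```python
from typing import Dict, Tuple

Vector = Dict[str, float]  # coefficients on independent, unit-variance error terms


def combine(*terms: Tuple[float, Vector]) -> Vector:
    """Weighted sum of variables, each given as (weight, Vector)."""
    out: Vector = {}
    for weight, vec in terms:
        for key, coef in vec.items():
            out[key] = out.get(key, 0.0) + weight * coef
    return out


def cov(u: Vector, v: Vector) -> float:
    # Errors are independent with variance 1, so only shared error terms contribute
    return sum(coef * v.get(key, 0.0) for key, coef in u.items())


def helmet_model(a: float, b: float, c: float, d: float) -> Dict[str, Vector]:
    """Hypothetical linear model: C -> H (a), C -> O (b), H -> Y (c), C -> Y (d).

    C: carefulness, H: helmet use, O: other injury, Y: head injury. No arrow between H and O.
    """
    C = {"e_C": 1.0}
    H = combine((a, C), (1.0, {"e_H": 1.0}))
    O = combine((b, C), (1.0, {"e_O": 1.0}))
    Y = combine((c, H), (d, C), (1.0, {"e_Y": 1.0}))
    return {"C": C, "H": H, "O": O, "Y": Y}


confounded = helmet_model(a=0.5, b=-0.8, c=-0.6, d=-0.4)
unconfounded = helmet_model(a=0.0, b=-0.8, c=-0.6, d=-0.4)

# Negative: helmet wearers appear "protected" against other injuries via carefulness
print(cov(confounded["H"], confounded["O"]))
# Zero: without the common cause, helmet use and other injuries are unrelated
print(cov(unconfounded["H"], unconfounded["O"]))
```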
When applying this consideration, a bias model is required for each association in the entire causal system, involving the assessment of
as many counterfactual differences as there are associations. A single wrong conclusion about the existence of a particular effect can be
enough to yield a graph that contradicts the theory and, thus, a wrong conclusion about the existence of the X – Y effect. RCTs only allow a
small number of factors to be simultaneously randomised, and whether they are associated with one another is just a question of how the
randomisation is done. The assessment of specificity with respect to other factors in RCTs is therefore limited. Cohort studies are more
useful here, but confounding and measurement error in the exposure are of higher importance. To conclude, the consideration of
specificity appears to be useful only when a causal system is simple and the knowledge about it is largely certain.
4. Temporality
The factor must precede the outcome it is assumed to affect
Hill [5] introduced this reflection with the proverb "Which is the cart and which is the horse?" For instance, he asked whether a particular
diet triggered a certain disease or whether the disease led to subsequently altered dietary habits. According to Hill, temporal direction
might be difficult to establish if a disease developed slowly and initial forms of disease were difficult to measure.
Considering individuals in whom X has occurred before Y, it is logically not possible that X would have changed if Y had changed,
because X is fixed at the time when Y occurs (or not). Thus, Y cannot have caused X in these individuals. This is, indeed, the only sine qua
non criterion for a counterfactual effect in a single individual [[7], p. 27, 11] – a point missed by Hill. Note that there is no logical link to
individuals in whom Y has occurred before X. Among those, Y might or might not have caused X [[7], p. 25].
Even more confusion in applying this criterion arises when one aggregates information across several individuals. Some researchers
believe that an association that is only observed in one direction is more likely to be causal than an association that is observed in both
directions. If the presence of a certain disease (X) is associated with a higher subsequent incidence rate of another disease (Y), and if
prior Y also predicts an elevated probability of subsequent onset of X, it is sometimes assumed that a shared vulnerability was the
common cause of both associations. This possibility has, for instance, been discussed for the role of anxiety in the development of
depression [31].
Applying this argument, however, requires a causal system that produces no Y – X association. This requires sufficient knowledge about
shared risk factors of X and Y. To assess temporal order between X and Y, a longitudinal design is preferable to a design in which the
temporal direction between X and Y has to be assessed retrospectively. In RCTs, temporal direction can be established without error.
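The Python sketch below illustrates only the first half of the shared-vulnerability argument: a common cause V, with no effect of X on Y or of Y on X, produces associations in both directions. Timing is not modelled, and the probabilities and function names are hypothetical. The reverse inference, reading a one-directional association as causal, would additionally require a causal system known to produce no association of Y with subsequent X, as noted above.

```python
from itertools import product
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int], float]


def joint_distribution(p_v: float, px_given_v: Dict[int, float], py_given_v: Dict[int, float]) -> Joint:
    """P(X = x, Y = y) when V causes both X and Y, and X, Y are independent given V."""
    probs: Joint = {}
    for x, y in product((0, 1), repeat=2):
        total = 0.0
        for v in (0, 1):
            pv = p_v if v else 1 - p_v
            px = px_given_v[v] if x else 1 - px_given_v[v]
            py = py_given_v[v] if y else 1 - py_given_v[v]
            total += pv * px * py
        probs[(x, y)] = total
    return probs


def risk_ratios(probs: Joint) -> Tuple[float, float]:
    """Risk ratio of Y given X, and of X given Y."""
    y_given_x1 = probs[(1, 1)] / (probs[(1, 0)] + probs[(1, 1)])
    y_given_x0 = probs[(0, 1)] / (probs[(0, 0)] + probs[(0, 1)])
    x_given_y1 = probs[(1, 1)] / (probs[(0, 1)] + probs[(1, 1)])
    x_given_y0 = probs[(1, 0)] / (probs[(0, 0)] + probs[(1, 0)])
    return y_given_x1 / y_given_x0, x_given_y1 / x_given_y0


# Hypothetical shared vulnerability V raising the risk of both disorders
probs = joint_distribution(
    p_v=0.3,
    px_given_v={0: 0.10, 1: 0.50},
    py_given_v={0: 0.05, 1: 0.40},
)
rr_y_given_x, rr_x_given_y = risk_ratios(probs)
print(f"RR(Y | X) = {rr_y_given_x:.2f}, RR(X | Y) = {rr_x_given_y:.2f}")
```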
5. Biological gradient
The outcome increases monotonically with increasing dose of exposure or according to a function predicted
by a substantive theory
Hill [5] favoured linear relationships between exposure level and outcome, for instance, between the number of cigarettes smoked per
day and the death rate from cancer. If the dose-response relationship were more complex, especially non-monotonic, this would
require a more complex substantive explanation.
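Hill's preferred pattern can be checked crudely in code. The Python sketch below tests whether rates never decrease across ordered dose categories and fits an unweighted least-squares slope; the dose categories and rates are hypothetical, random error is ignored, and a real analysis would use a regression model appropriate to the outcome. Under the less demanding view discussed next, the monotonicity check would be replaced by a check against whatever shape a substantive theory predicts.

```python
from typing import Sequence


def is_monotone_increasing(rates: Sequence[float]) -> bool:
    """True if rates never decrease across ordered dose categories."""
    return all(later >= earlier for earlier, later in zip(rates, rates[1:]))


def least_squares_slope(doses: Sequence[float], rates: Sequence[float]) -> float:
    """Slope of an unweighted straight-line fit of rate on dose."""
    n = len(doses)
    mean_d = sum(doses) / n
    mean_r = sum(rates) / n
    sxy = sum((d - mean_d) * (r - mean_r) for d, r in zip(doses, rates))
    sxx = sum((d - mean_d) ** 2 for d in doses)
    return sxy / sxx


# Hypothetical dose categories (e.g. cigarettes per day) and death rates
doses = [0, 10, 20, 30]
rates = [0.1, 0.5, 1.0, 1.6]
print(is_monotone_increasing(rates), round(least_squares_slope(doses, rates), 3))
```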
Others have been less demanding and more specific in their definition of a dose-response relation, requiring only a particular shape of
relationship (not necessarily linear or monotonic), which is predicted from a substantive theory [[28], p. 225]. Rothman and Greenland
[[18], p. 26] have argued that parts of J-shaped dose-response curves might be caused by the respective exposure levels while others
might be due to confounding only. They also provided a counter-example for a non-causal dose-response association. Demanding a
dose-response relationship could be misleading if such an assumption contradicted substantive knowledge. No dose-response relationship in
presumably causal effects has been found, for example, between the intake of inhaled corticosteroids and lung function among asthma...