
Introduction
Introduction.Rmd
To estimate a population attributable fraction two ingredients are required:
-
beta
: the relative risk or a value from which to calculate the relative risk (with variancevar_beta
). -
p
: the exposure prevalence (with variancevar_p
).
and in the case of potential impact fractions we also require: -
p_cft
: the counterfactual prevalence.
Note An important hypothesis of the current method is that the relative risk estimate and the prevalence of exposure are both independent in the sense that they were estimated in different populations and studies.
For example consider the population attributable fraction of depression into dementia. Consider:
- A relative risk of relative risk of
1.90
with a variance of0.02
. - A prevalence of depression of 12% (
0.12
) with variance0.0009
.
We can calculate a point estimate of the PAF as follows:
paf(p = 0.12, beta = 1.90, var_p = 0.0009, var_beta = 0.02)
#>
#> ── Population Attributable Fraction ──
#>
#> PAF = 9.747% [95% CI: 4.510% to 14.698%]
#> standard_deviation(paf %) = 2.598
#> standard_deviation(link(paf)) = 0.029
A potential impact fraction of reducing the prevalence of depression to 6% can be calculated as follows:
pif(p = 0.12, p_cft = 0.06, beta = 1.90, var_p = 0.0009, var_beta = 0.02)
#>
#> ── Potential Impact Fraction ──
#>
#> PIF = 4.874% [95% CI: 0.013% to 9.498%]
#> standard_deviation(pif %) = 2.419
#> standard_deviation(link(pif)) = 0.025
The following examples show how to calculate the PIF and PAF under different scenarios as well as how to combine the fractions of subpopulations and of different risk estimates.
Example 1: A dichotomous potential impact fraction
The article (1) discusses the population attributable fraction (PAF) of dementia associated with 12 different risk factors in US adults. It also estimates a potential impact fraction correspondent to a 15% reduction in the exposures of such factors.
The dataset dementiarisk
contains the relative risk of
dementia for different exposures as well as the prevalence of exposure
for the total population and for different subpopulations (races):
data(dementiarisk)
#> risk_factor logrr sdlog total hispanic asian black white
#> 1 Less education 0.4637340 0.119140710 10.7 27.1 6.4 10.6 5.5
#> 2 Hearing loss 0.6626880 0.174038430 10.8 13.1 6.9 6.5 10.6
#> 3 TBI 0.6097656 0.090990178 17.1 10.3 6.0 9.2 20.1
#> 4 Hypertension 0.4762342 0.167874478 42.2 38.5 38.5 61.0 39.8
#> 5 Excessive alcohol 0.1655144 0.054020949 3.6 2.0 0.7 2.7 4.2
#> 6 Obesity 0.4700036 0.091750556 44.0 48.3 14.6 54.3 43.5
#> 7 Smoking 0.4637340 0.165486566 8.5 6.9 4.9 11.7 8.4
#> 8 Depression 0.6418539 0.103984905 7.4 10.7 4.3 6.6 7.2
#> 9 Social isolation 0.4510756 0.086112272 11.9 24.0 8.0 12.1 10.8
#> 10 Physical inactivity 0.3293037 0.092961816 62.8 68.6 56.6 73.2 61.3
#> 11 Diabetes 0.4317824 0.075776055 28.6 41.0 44.1 37.2 25.4
#> 12 Air pollution 0.0861777 0.009362766 22.8 44.4 55.2 41.3 17.2
Here we show how to calculate the potential impact fraction and the population attributable fraction for smoking among the 4 subpopulations (race groups) and then how to calculate the total attributable fraction.
For that purpose we will choose smoking which has a log relative risk
of beta = 0.4637340
with variance of
var_beta = 0.0273858
(= 0.165486566^2
).
Among hispanic individuals 6.9% (p = 0.069
) smoke. Hence
their paf is:
paf_hispanic <- paf(p = 0.069, beta = 0.4637340, var_beta = 0.0273858,
rr_link = exp)
#> ! Assuming parameters `p` have no variance Use `var_p` to input their link_variances and/or covariance
paf_hispanic
#>
#> ── Population Attributable Fraction ──
#>
#> PAF = 3.912% [95% CI: 0.569% to 7.142%]
#> standard_deviation(paf %) = 1.676
#> standard_deviation(link(paf)) = 0.017
Note that the model throws a warning for us not specifying the
variance of var_p
. We will assume that the value was
perfectly estimated and it has no associated variance thus setting
var_p = 0
:
paf_hispanic <- paf(p = 0.069, beta = 0.4637340, var_beta = 0.0273858,
var_p = 0, rr_link = exp)
paf_hispanic
#>
#> ── Population Attributable Fraction ──
#>
#> PAF = 3.912% [95% CI: 0.569% to 7.142%]
#> standard_deviation(paf %) = 1.676
#> standard_deviation(link(paf)) = 0.017
The coef
, confint
, summary
,
and as.data.frame
functions have been extended to allow the
user to extract information from a paf
(or
pif
):
as.data.frame(paf_hispanic)
#> PAF standard_deviation ci_low ci_up confidence
#> 1 0.03911752 0.01676291 0.005694675 0.07141689 0.95
The other fractions can be computed in the same way noting that 4.9 of non-hispanic asians, 11.7 of non-hispanic blacks and 8.4 of non-hispanic whites smoke:
paf_asian <- paf(p = 0.049, beta = 0.4637340, var_beta = 0.0273858,
var_p = 0, rr_link = exp)
paf_black <- paf(p = 0.117, beta = 0.4637340, var_beta = 0.0273858,
var_p = 0, rr_link = exp)
paf_white <- paf(p = 0.084, beta = 0.4637340, var_beta = 0.0273858,
var_p = 0, rr_link = exp)
Correlations (correlation
) and covariances
(covariance
) can also be calculated for multiple
paf
objects:
covariance(paf_white, paf_hispanic, paf_asian, paf_black)
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> [,1] [,2] [,3] [,4]
#> [1,] 0.0004025772 0.0003363365 0.0002443575 0.0005404935
#> [2,] 0.0003363365 0.0002809952 0.0002041506 0.0004515599
#> [3,] 0.0002443575 0.0002041506 0.0001483209 0.0003280704
#> [4,] 0.0005404935 0.0004515599 0.0003280704 0.0007256577
Combining subpopulations into a total
We can calculate the total population attributable fraction by combining the fractions of the subpopulations weighted by the proportion of the population. According to Wikipedia the distribution in 2020 was as follows:
pif_weights <- c("white" = 0.5784, "hispanic" = 0.1873, "black" = 0.1205, "asian" = 0.0592)
#Normalized pif_weights to sum to 1
pif_weights <- pif_weights / sum(pif_weights)
The paf_total
function computes the aggregated
paf
of the whole population:
paf_total(paf_white, paf_hispanic, paf_black, paf_asian, pif_weights = pif_weights, sigma_pif_weights = 0)
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#>
#> ── Population Attributable Fraction ──
#>
#> PAF = 4.663% [95% CI: 0.704% to 8.464%]
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> standard_deviation(paf %) = 1.979
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> standard_deviation(link(paf)) = 0.021
where we set sigma_pif_weights = 0
as no covariance
matrix was known for the race distribution data (it does exist its just
not on Wikipedia).
Why are we using the log relative risk?
A common question is why in this case we are using the log relative
risk (logrr
) with exponential
link and not the
relative risks. The reason depends on what standard deviation can be
properly recovered. Recall that the classic (Wald-type) confidence
intervals should have the same distance from the point estimate (in this
case the RR
) to the bounds. We can see that this does not
happen here as for example in the case of smoking the
RR is 1.59 [1.15-2.20] and:
2.20 - 1.59
#> [1] 0.61
while
1.59 - 1.15
#> [1] 0.44
The distances are not the same!
However in the log-scale they are much more similar:
which suggests that the uncertainty was estimated in the log-scale
and hence we need to use the log relative risk as beta and exponentiate
it (link = "exponential"
). Furthermore we can go to the
original paper, check the methodology and conclude that the confidence
interval (and hence the variance!) was estimated using the log relative
risk and then exponentiated. Hence the uncertainty should be propagated
for the log risk and then exponentiated.
Mathematically the explanation of why this happens is because usually relative risks, odds ratios and hazard ratios are transformed to the log-scale to estimate the confidence intervals. You can see in Wikipedia, for example, the classic formulas for the relative risks CIs and for the odds ratios all involve a log-transform.
Example 2: A categorical population attributable fraction
The article (2) estimates the population attributable fraction of different cancers given distinct levels of alcohol consumption for the Australian population.
The alcohol
dataframe included in the package contains
an estimate of the intake of alcohol by sex as well as the proportion of
individuals in each of the intake-categories:
data(alcohol)
#> sex alcohol_g median_intake age_18_plus
#> 1 MALE None 0.0 47.2
#> 2 MALE >0-<1 g 0.8 1.4
#> 3 MALE 1-<2g 1.6 2.2
#> 4 MALE 2-<5g 3.2 8.1
#> 5 MALE 5-<10g 7.1 8.1
#> 6 MALE 10-<15g 12.6 6.9
#> 7 MALE 15-<20g 17.4 4.6
#> 8 MALE 20-<30g 24.5 7.3
#> 9 MALE 30-<40g 34.7 4.2
#> 10 MALE 40-<50g 44.2 3.2
#> 11 MALE 50-<60g 54.4 2.0
#> 12 MALE ≥60g 85.2 4.8
#> 13 FEMALE None 0.0 60.3
#> 14 FEMALE >0-<1 g 0.8 2.0
#> 15 FEMALE 1-<2g 1.6 5.4
#> 16 FEMALE 2-<5g 3.9 9.1
#> 17 FEMALE 5-<10g 7.1 8.1
#> 18 FEMALE 10-<15g 12.6 6.1
#> 19 FEMALE 15-<20g 17.4 2.6
#> 20 FEMALE 20-<30g 24.5 3.4
#> 21 FEMALE 30-<40g 34.7 1.5
#> 22 FEMALE 40-<50g 44.2 0.7
#> 23 FEMALE 50-<60g 54.4 0.3
#> 24 FEMALE ≥60g 78.1 0.5
The relative risk of alcohol varies by consumption level
(algohol_g
). For example in the case of rectal cancer the
estimated relative risk is 1.10 (1.07–1.12) per 10g/day. We will need to
compute the relative risk for each of the consumption level groups using
their median intake as follows:
#Get alcohol intake for men divided by 10 cause RR is per 10g/day
alcohol_men_intake <- c(0, 0.8, 1.6, 3.2, 7.1, 12.6, 17.4, 24.5, 34.7, 44.2, 54.4, 85.2) / 10
#Divided by 10 cause risk is per 10 g/day
relative_risk <- exp(log(1.10)*alcohol_men_intake)
#Divide by 100 to get the prevalence in decimals
prevalence <- c(0.472, 0.014, 0.022, 0.081, 0.081, 0.069, 0.046, 0.073, 0.042, 0.032, 0.02, 0.048)
#Calculate the paf
paf(prevalence, beta = relative_risk, var_p = 0, var_b = 0)
#>
#> ── Population Attributable Fraction ──
#>
#> PAF = 13.184% [95% CI: 13.184% to 13.184%]
#> standard_deviation(paf %) = 0.000
#> standard_deviation(link(paf)) = 0.000
If we want to add uncertainty we can approximate the covariance of the relative risk as follows:
#Calculate an approximate covariance of the betas this is given by var*intake*intake
beta_sd <- (1.12 - 1.07) / (2*qnorm(0.975))
beta_var <- matrix(NA, ncol = length(prevalence), nrow = length(prevalence))
for (k in 1:length(prevalence)){
for (j in 1:length(prevalence)){
beta_var[k,j] <- beta_sd^2*(alcohol_men_intake[j]/10)*(alcohol_men_intake[k]/10)
}
}
#Calculate the paf
paf(prevalence, beta = relative_risk, var_p = 0, var_b = beta_var)
#>
#> ── Population Attributable Fraction ──
#>
#> PAF = 13.184% [95% CI: 12.951% to 13.417%]
#> standard_deviation(paf %) = 0.119
#> standard_deviation(link(paf)) = 0.001