Skip to contents

To estimate a population attributable fraction two ingredients are required:

  • beta: the relative risk or a value from which to calculate the relative risk (with variance var_beta).
  • p: the exposure prevalence (with variance var_p).

and in the case of potential impact fractions we also require: - p_cft: the counterfactual prevalence.

Note An important hypothesis of the current method is that the relative risk estimate and the prevalence of exposure are both independent in the sense that they were estimated in different populations and studies.

For example consider the population attributable fraction of depression into dementia. Consider:

  • A relative risk of relative risk of 1.90 with a variance of 0.02.
  • A prevalence of depression of 12% (0.12) with variance 0.0009.

We can calculate a point estimate of the PAF as follows:

paf(p = 0.12, beta = 1.90, var_p = 0.0009, var_beta = 0.02)
#> 
#> ── Population Attributable Fraction ──
#> 
#> PAF = 9.747% [95% CI: 4.510% to 14.698%]
#> standard_deviation(paf %) = 2.598
#> standard_deviation(link(paf)) = 0.029

A potential impact fraction of reducing the prevalence of depression to 6% can be calculated as follows:

pif(p = 0.12, p_cft = 0.06, beta = 1.90, var_p = 0.0009, var_beta = 0.02)
#> 
#> ── Potential Impact Fraction ──
#> 
#> PIF = 4.874% [95% CI: 0.013% to 9.498%]
#> standard_deviation(pif %) = 2.419
#> standard_deviation(link(pif)) = 0.025

The following examples show how to calculate the PIF and PAF under different scenarios as well as how to combine the fractions of subpopulations and of different risk estimates.

Example 1: A dichotomous potential impact fraction

The article (1) discusses the population attributable fraction (PAF) of dementia associated with 12 different risk factors in US adults. It also estimates a potential impact fraction correspondent to a 15% reduction in the exposures of such factors.

The dataset dementiarisk contains the relative risk of dementia for different exposures as well as the prevalence of exposure for the total population and for different subpopulations (races):

data(dementiarisk)
#>            risk_factor     logrr       sdlog total hispanic asian black white
#> 1       Less education 0.4637340 0.119140710  10.7     27.1   6.4  10.6   5.5
#> 2         Hearing loss 0.6626880 0.174038430  10.8     13.1   6.9   6.5  10.6
#> 3                  TBI 0.6097656 0.090990178  17.1     10.3   6.0   9.2  20.1
#> 4         Hypertension 0.4762342 0.167874478  42.2     38.5  38.5  61.0  39.8
#> 5    Excessive alcohol 0.1655144 0.054020949   3.6      2.0   0.7   2.7   4.2
#> 6              Obesity 0.4700036 0.091750556  44.0     48.3  14.6  54.3  43.5
#> 7              Smoking 0.4637340 0.165486566   8.5      6.9   4.9  11.7   8.4
#> 8           Depression 0.6418539 0.103984905   7.4     10.7   4.3   6.6   7.2
#> 9     Social isolation 0.4510756 0.086112272  11.9     24.0   8.0  12.1  10.8
#> 10 Physical inactivity 0.3293037 0.092961816  62.8     68.6  56.6  73.2  61.3
#> 11            Diabetes 0.4317824 0.075776055  28.6     41.0  44.1  37.2  25.4
#> 12       Air pollution 0.0861777 0.009362766  22.8     44.4  55.2  41.3  17.2

Here we show how to calculate the potential impact fraction and the population attributable fraction for smoking among the 4 subpopulations (race groups) and then how to calculate the total attributable fraction.

For that purpose we will choose smoking which has a log relative risk of beta = 0.4637340 with variance of var_beta = 0.0273858 (= 0.165486566^2).

Among hispanic individuals 6.9% (p = 0.069) smoke. Hence their paf is:

paf_hispanic <- paf(p = 0.069, beta = 0.4637340, var_beta = 0.0273858, 
                    rr_link = exp)
#> ! Assuming parameters `p` have no variance Use `var_p` to input their link_variances and/or covariance
paf_hispanic
#> 
#> ── Population Attributable Fraction ──
#> 
#> PAF = 3.912% [95% CI: 0.569% to 7.142%]
#> standard_deviation(paf %) = 1.676
#> standard_deviation(link(paf)) = 0.017

Note that the model throws a warning for us not specifying the variance of var_p. We will assume that the value was perfectly estimated and it has no associated variance thus setting var_p = 0:

paf_hispanic <- paf(p = 0.069, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp)
paf_hispanic
#> 
#> ── Population Attributable Fraction ──
#> 
#> PAF = 3.912% [95% CI: 0.569% to 7.142%]
#> standard_deviation(paf %) = 1.676
#> standard_deviation(link(paf)) = 0.017

The coef, confint, summary, and as.data.frame functions have been extended to allow the user to extract information from a paf (or pif):

as.data.frame(paf_hispanic)
#>          PAF standard_deviation      ci_low      ci_up confidence
#> 1 0.03911752         0.01676291 0.005694675 0.07141689       0.95

The other fractions can be computed in the same way noting that 4.9 of non-hispanic asians, 11.7 of non-hispanic blacks and 8.4 of non-hispanic whites smoke:

paf_asian <- paf(p = 0.049, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp)
paf_black <- paf(p = 0.117, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp)
paf_white <- paf(p = 0.084, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp)

Correlations (correlation) and covariances (covariance) can also be calculated for multiple paf objects:

covariance(paf_white, paf_hispanic, paf_asian, paf_black)
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#>              [,1]         [,2]         [,3]         [,4]
#> [1,] 0.0004025772 0.0003363365 0.0002443575 0.0005404935
#> [2,] 0.0003363365 0.0002809952 0.0002041506 0.0004515599
#> [3,] 0.0002443575 0.0002041506 0.0001483209 0.0003280704
#> [4,] 0.0005404935 0.0004515599 0.0003280704 0.0007256577

Combining subpopulations into a total

We can calculate the total population attributable fraction by combining the fractions of the subpopulations weighted by the proportion of the population. According to Wikipedia the distribution in 2020 was as follows:

pif_weights <- c("white" = 0.5784, "hispanic" = 0.1873, "black" = 0.1205, "asian" = 0.0592)

#Normalized pif_weights to sum to 1
pif_weights <- pif_weights / sum(pif_weights)

The paf_total function computes the aggregated paf of the whole population:

paf_total(paf_white, paf_hispanic, paf_black, paf_asian, pif_weights = pif_weights, sigma_pif_weights = 0)
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> 
#> ── Population Attributable Fraction ──
#> 
#> PAF = 4.663% [95% CI: 0.704% to 8.464%]
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> standard_deviation(paf %) = 1.979
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> ! The relative risk beta parameters for the potential impact fractions appear to be the same. If they are, set `uncorrelated_beta = FALSE`. Otherwise set `uncorrelated_beta = TRUE
#> standard_deviation(link(paf)) = 0.021

where we set sigma_pif_weights = 0 as no covariance matrix was known for the race distribution data (it does exist its just not on Wikipedia).

Why are we using the log relative risk?

A common question is why in this case we are using the log relative risk (logrr) with exponential link and not the relative risks. The reason depends on what standard deviation can be properly recovered. Recall that the classic (Wald-type) confidence intervals should have the same distance from the point estimate (in this case the RR) to the bounds. We can see that this does not happen here as for example in the case of smoking the RR is 1.59 [1.15-2.20] and:

2.20 - 1.59
#> [1] 0.61

while

1.59 - 1.15
#> [1] 0.44

The distances are not the same!

However in the log-scale they are much more similar:

log(2.20) - log(1.59)
#> [1] 0.3247233
log(1.59) - log(1.15)
#> [1] 0.3239721

which suggests that the uncertainty was estimated in the log-scale and hence we need to use the log relative risk as beta and exponentiate it (link = "exponential"). Furthermore we can go to the original paper, check the methodology and conclude that the confidence interval (and hence the variance!) was estimated using the log relative risk and then exponentiated. Hence the uncertainty should be propagated for the log risk and then exponentiated.

Mathematically the explanation of why this happens is because usually relative risks, odds ratios and hazard ratios are transformed to the log-scale to estimate the confidence intervals. You can see in Wikipedia, for example, the classic formulas for the relative risks CIs and for the odds ratios all involve a log-transform.

Example 2: A categorical population attributable fraction

The article (2) estimates the population attributable fraction of different cancers given distinct levels of alcohol consumption for the Australian population.

The alcohol dataframe included in the package contains an estimate of the intake of alcohol by sex as well as the proportion of individuals in each of the intake-categories:

data(alcohol)
#>       sex alcohol_g median_intake age_18_plus
#> 1    MALE      None           0.0        47.2
#> 2    MALE   >0-<1 g           0.8         1.4
#> 3    MALE     1-<2g           1.6         2.2
#> 4    MALE     2-<5g           3.2         8.1
#> 5    MALE    5-<10g           7.1         8.1
#> 6    MALE   10-<15g          12.6         6.9
#> 7    MALE   15-<20g          17.4         4.6
#> 8    MALE   20-<30g          24.5         7.3
#> 9    MALE   30-<40g          34.7         4.2
#> 10   MALE   40-<50g          44.2         3.2
#> 11   MALE   50-<60g          54.4         2.0
#> 12   MALE      ≥60g          85.2         4.8
#> 13 FEMALE      None           0.0        60.3
#> 14 FEMALE   >0-<1 g           0.8         2.0
#> 15 FEMALE     1-<2g           1.6         5.4
#> 16 FEMALE     2-<5g           3.9         9.1
#> 17 FEMALE    5-<10g           7.1         8.1
#> 18 FEMALE   10-<15g          12.6         6.1
#> 19 FEMALE   15-<20g          17.4         2.6
#> 20 FEMALE   20-<30g          24.5         3.4
#> 21 FEMALE   30-<40g          34.7         1.5
#> 22 FEMALE   40-<50g          44.2         0.7
#> 23 FEMALE   50-<60g          54.4         0.3
#> 24 FEMALE      ≥60g          78.1         0.5

The relative risk of alcohol varies by consumption level (algohol_g). For example in the case of rectal cancer the estimated relative risk is 1.10 (1.07–1.12) per 10g/day. We will need to compute the relative risk for each of the consumption level groups using their median intake as follows:

#Get alcohol intake for men divided by 10 cause RR is per 10g/day
alcohol_men_intake <- c(0, 0.8, 1.6, 3.2, 7.1, 12.6, 17.4, 24.5, 34.7, 44.2, 54.4, 85.2) / 10 

#Divided by 10 cause risk is per 10 g/day
relative_risk <- exp(log(1.10)*alcohol_men_intake)

#Divide by 100 to get the prevalence in decimals
prevalence    <- c(0.472, 0.014, 0.022, 0.081, 0.081, 0.069, 0.046, 0.073, 0.042, 0.032, 0.02, 0.048)

#Calculate the paf
paf(prevalence, beta = relative_risk, var_p = 0, var_b = 0)
#> 
#> ── Population Attributable Fraction ──
#> 
#> PAF = 13.184% [95% CI: 13.184% to 13.184%]
#> standard_deviation(paf %) = 0.000
#> standard_deviation(link(paf)) = 0.000

If we want to add uncertainty we can approximate the covariance of the relative risk as follows:

#Calculate an approximate covariance of the betas this is given by var*intake*intake 
beta_sd <- (1.12 - 1.07) / (2*qnorm(0.975))
beta_var <- matrix(NA, ncol = length(prevalence), nrow = length(prevalence))
for (k in 1:length(prevalence)){
  for (j in 1:length(prevalence)){
    beta_var[k,j] <- beta_sd^2*(alcohol_men_intake[j]/10)*(alcohol_men_intake[k]/10)
  }
}


#Calculate the paf
paf(prevalence, beta = relative_risk, var_p = 0, var_b = beta_var)
#> 
#> ── Population Attributable Fraction ──
#> 
#> PAF = 13.184% [95% CI: 12.951% to 13.417%]
#> standard_deviation(paf %) = 0.119
#> standard_deviation(link(paf)) = 0.001

References

1.
Lee M, Whitsel E, Avery C, et al. Variation in population attributable fraction of dementia associated with potentially modifiable risk factors by race and ethnicity in the US. JAMA network open. 2022;5(7):e2219672–e2219672.
2.
Pandeya N, Wilson LF, Webb PM, et al. Cancers in australia in 2010 attributable to the consumption of alcohol. Australian and New Zealand journal of public health. 2015;39(5):408–413.