Skip to contents

Estimates the potential impact fraction, pif, for individual-level exposure data (and covariates), X, from a cross-sectional survey. Exposure is assumed to be associated with a relative risk function, rr, with parameter theta. A counterfactual scenario as a function of the exposure cft is assumed.

The potential impact fraction is defined Chan et al. (2023) : \[ \text{PIF} = \dfrac{\mathbb{E}\Big[RR(X;\theta)\Big] - \mathbb{E}\Big[RR\big(\text{cft}(X);\theta\big)\Big]}{\mathbb{E}\Big[RR(X;\theta)\Big]} \]

where:

  • \(X\) denotes the individual-level matrix of exposure and covariates,

  • \(\theta\) represents additional parameters of the relative risk function,

  • \(RR(X,\theta)\) denotes the relative risk of exposure (and covariates) at level \(X\) given parameters \(\theta\),

  • \(cft(X)\) denotes the counterfactual function applied to exposure and covariates,

  • \({\mathbb{E}\Big[RR(X;\theta)\Big]}\) and \({\mathbb{E}\Big[RR(\text{cft}(X);\theta)\Big]}\) denote the population average relative risk under current (observed) conditions and the relative risk under the counterfactual scenario.

  • \(\text{PIF}\) represents the potential impact fraction.

Usage

pif(
  design,
  theta,
  rr,
  cft = NULL,
  additional_theta_arguments,
  n_bootstrap_samples = NULL,
  theta_distribution = "default",
  is_paf = FALSE,
  weights = NULL,
  .options.future = list(seed = TRUE),
  ...
)

Arguments

design

(survey.design, data.frame,tibble, or svyrep.design) survey data structure. If data comes from a survey set the design with survey::svydesign(). It can also support a survey::svrepdesign() design if your survey comes with replicates. Finally, the model can also accommodate a data.frame or tibble with weights assuming simple random sampling without replacement.

theta

(vector/double) parameters of the relative risk function rr.

rr

(function/list) a relative risk function with two parameters: a data.frame called X containing the individual-level exposure and covariates, and theta (in that order). It can also be a list of several relative risk functions to apply with each function being a different modelling scenario.

cft

(function/list) a counterfactual function that takes a data.frame, X, of individual-level exposure and covariates and returns a new data.frame of individual-level counterfactual exposure and covariates. It can also be a list of several counterfactual functions to apply with each function being a different modelling scenario.

additional_theta_arguments

any additional information on theta utilized for obtaining bootstrap samples from the paramter. Options are:

  • (double) the variance of theta if theta is one dimensional and asymptotical normality is assumed (default).

  • (vector) the variances of each entry of theta if theta is n-dimensional and its entries are uncorrelated and asymptotical normality is assumed (default).

  • (matrix) the variance-covariance matrix of theta if theta is n-dimensional and its entries are correlated and asymptotical normality is assumed (default).

  • any list of arguments to pass via base::do.call() to theta_distribution to simulate samples from theta if theta is not assumed to be asymptotically normally distributed.

Optional

n_bootstrap_samples

(double) number of bootstrap samples. If a svyrep.design is passed as an argument, then n_bootstrap_samples represents the number of number of replicates in the design.

theta_distribution

(function) random number generator that follows the distribution of the estimator theta. By default, theta is assumed to be asymptotically normal and thus theta_distribution is set to mvtnorm::rmvnorm() with variance given by additional_theta_arguments. The number of simulations for the theta_distribution function must be parametrized by a parameter of name n.

is_paf

(boolean) Whether the function being estimated is the Population Attributable Fraction (is_paf = TRUE) or the Potential Impact Fraction (is_paf = FALSE)

weights

(vector) If you are not following the recommended version and use a svydesign object for the design you can still use weights to associate weights to your estimation. Beware that it might not give accurate estimations of the variance nor the uncertainty intervals.

.options.future

List of additional options for doFuture::%dofuture%().

...

Additional parameters for svrep::as_bootstrap_design().

Value

A pif_class() object containing the bootstrap simulations for the potential impact fraction, the average relative risk, and the average counterfactual if applicable.

Additional parallelization options

Faster computation occurs when doing parallelization which allows to use more cores in your machine. Parallelization utilizes the future::future() package. For paralelization to work you need to establish a plan (see future::plan() for more information). The most common way to create parallelization in your local machine is to do:

plan(multisession) #Start parallelization
pif(...)
plan(sequential) #Return to 'normal'

References

Chan CE, Zepeda-Tello R, Camacho-García-Formentí D, Cudhea F, Meza R, Rodrigues E, Spiegelman D, Barrientos-Gutierrez T, Zhou X (2023). “Nonparametric Estimation of the Potential Impact Fraction and Population Attributable Fraction with Individual-Level and Aggregated Data.” 2207.03597.

Examples

# Use the ensanut dataset
data(ensanut)

# EXAMPLE 1
# Setup the survey design
options(survey.lonely.psu = "adjust")
design <- survey::svydesign(data = ensanut, ids = ~1, weights = ~weight, strata = ~strata)
rr <- function(X, theta) {
  exp(-2 +
    theta[1] * X[, "age"] + theta[2] * X[, "systolic_blood_pressure"] / 100)
}
cft <- function(X) {
  X[, "systolic_blood_pressure"] <- X[, "systolic_blood_pressure"] - 5
  return(X)
}
pif(design,
  theta = log(c(1.05, 1.38)), rr, cft,
  additional_theta_arguments = c(0.01, 0.03), n_bootstrap_samples = 10,
)
#> ── Potential Impact Fraction (PIF) ─────────────────────────────────────────────
#>     counterfactual   relative_risk potential_impact_fraction
#> 1 Counterfactual_1 Relative_Risk_1              0.0116826670
#> 2 Counterfactual_1 Relative_Risk_1             -0.0001784108
#> 3 Counterfactual_1 Relative_Risk_1              0.0235437449
#>   average_relative_risk average_counterfactual           type
#> 1              49697.33               48849.14 point_estimate
#> 2            -266864.41             -262185.39     Lower 2.5%
#> 3             366259.06              359883.67    Upper 97.5%
#> ────────────────────────────────────────────────────────────────────────────────
#> • Number of bootstrap simulations: 10
#>  A low number of bootstrap simulations will result in an unstable estimate.
#> • Use `as.data.frame` to access values.
#> • Use `summary` to save list of main results.

# EXAMPLE 2
# Now do the same but using a replicate design
rep_design <- svrep::as_bootstrap_design(design, replicates = 10)
pif(rep_design,
  theta = log(c(1.05, 1.38)), rr, cft,
  additional_theta_arguments = c(0.01, 0.03)
)
#> ── Potential Impact Fraction (PIF) ─────────────────────────────────────────────
#>     counterfactual   relative_risk potential_impact_fraction
#> 1 Counterfactual_1 Relative_Risk_1               0.019710411
#> 2 Counterfactual_1 Relative_Risk_1               0.009555227
#> 3 Counterfactual_1 Relative_Risk_1               0.029865595
#>   average_relative_risk average_counterfactual           type
#> 1              480346.1               470521.3 point_estimate
#> 2            -2902961.4             -2843589.3     Lower 2.5%
#> 3             3863653.6              3784632.0    Upper 97.5%
#> ────────────────────────────────────────────────────────────────────────────────
#> • Number of bootstrap simulations: 10
#>  A low number of bootstrap simulations will result in an unstable estimate.
#> • Use `as.data.frame` to access values.
#> • Use `summary` to save list of main results.

# EXAMPLE 3
# Calculate two different relative risks
rr <- list(
 function(X, theta) {exp(theta[1] * X[, "systolic_blood_pressure"] / 100)},
 function(X, theta) {exp(theta[2] * X[, "systolic_blood_pressure"] / 100 + theta[3]* X[,"age"])}
)

# Calculate three counterfactual scenarios of SBP reduction from 1 to 3 mmhg
cft <- list(
function(X){X[, "systolic_blood_pressure"]  <- X[, "systolic_blood_pressure"]  - 1; return(X)},
function(X){X[, "systolic_blood_pressure"]  <- X[, "systolic_blood_pressure"]  - 2; return(X)},
function(X){X[, "systolic_blood_pressure"]  <- X[, "systolic_blood_pressure"]  - 3; return(X)}
)

pif(design,
  theta = log(c(1.05, 1.38, 1.21)), rr, cft,
  additional_theta_arguments = c(0.01, 0.03, 0.025),
  n_bootstrap_samples = 10,
)
#> ── Potential Impact Fraction (PIF) ─────────────────────────────────────────────
#>      counterfactual   relative_risk potential_impact_fraction
#> 1  Counterfactual_1 Relative_Risk_1              0.0006866338
#> 2  Counterfactual_2 Relative_Risk_1              0.0013718955
#> 3  Counterfactual_3 Relative_Risk_1              0.0020557870
#> 4  Counterfactual_1 Relative_Risk_2              0.0031625969
#> 5  Counterfactual_2 Relative_Risk_2              0.0063121574
#> 6  Counterfactual_3 Relative_Risk_2              0.0094487477
#> 7  Counterfactual_1 Relative_Risk_1             -0.0015423152
#> 8  Counterfactual_1 Relative_Risk_1              0.0029155829
#> 9  Counterfactual_2 Relative_Risk_1             -0.0030836996
#> 10 Counterfactual_2 Relative_Risk_1              0.0058274907
#> 11 Counterfactual_3 Relative_Risk_1             -0.0046241582
#> 12 Counterfactual_3 Relative_Risk_1              0.0087357322
#> 13 Counterfactual_1 Relative_Risk_2             -0.0009287064
#> 14 Counterfactual_1 Relative_Risk_2              0.0072539002
#> 15 Counterfactual_2 Relative_Risk_2             -0.0018366025
#> 16 Counterfactual_2 Relative_Risk_2              0.0144609172
#> 17 Counterfactual_3 Relative_Risk_2             -0.0027238747
#> 18 Counterfactual_3 Relative_Risk_2              0.0216213700
#>    average_relative_risk average_counterfactual           type
#> 1           1.092660e+00           1.091795e+00 point_estimate
#> 2           1.092660e+00           1.090931e+00 point_estimate
#> 3           1.092660e+00           1.090069e+00 point_estimate
#> 4           3.040966e+07           3.027446e+07 point_estimate
#> 5           3.040966e+07           3.013986e+07 point_estimate
#> 6           3.040966e+07           3.000587e+07 point_estimate
#> 7           8.080367e-01           8.097297e-01     Lower 2.5%
#> 8           1.377283e+00           1.373860e+00    Upper 97.5%
#> 9           8.080367e-01           8.114210e-01     Lower 2.5%
#> 10          1.377283e+00           1.370441e+00    Upper 97.5%
#> 11          8.080367e-01           8.131108e-01     Lower 2.5%
#> 12          1.377283e+00           1.367028e+00    Upper 97.5%
#> 13         -1.102794e+08          -1.097862e+08     Lower 2.5%
#> 14          1.710987e+08           1.703351e+08    Upper 97.5%
#> 15         -1.102794e+08          -1.092952e+08     Lower 2.5%
#> 16          1.710987e+08           1.695749e+08    Upper 97.5%
#> 17         -1.102794e+08          -1.088064e+08     Lower 2.5%
#> 18          1.710987e+08           1.688181e+08    Upper 97.5%
#> ────────────────────────────────────────────────────────────────────────────────
#> • Number of bootstrap simulations: 10
#>  A low number of bootstrap simulations will result in an unstable estimate.
#> • Use `as.data.frame` to access values.
#> • Use `summary` to save list of main results.