Population Attributable Fraction
paf.Rd
Estimates the population attributable fraction, paf, for individual-level exposure data (and covariates), X, from a cross-sectional survey. Exposure is assumed to be associated with a relative risk function, rr, with parameter theta. A counterfactual scenario, cft, defined as a function of the exposure, is assumed.
The population attributable fraction is defined by Chan et al. (2023) as: \[ \text{PAF} = \dfrac{\mathbb{E}\Big[RR(X;\theta)\Big] - 1}{\mathbb{E}\Big[RR(X;\theta)\Big]} \]
where:
\(X\) denotes the individual-level matrix of exposure and covariates,
\(\theta\) represents additional parameters of the relative risk function,
\(RR(X;\theta)\) denotes the relative risk of exposure (and covariates) at level \(X\) given parameters \(\theta\),
\({\mathbb{E}\Big[RR(X;\theta)\Big]}\) represents the average relative risk in the population,
\(1\) represents the relative risk under the theoretical minimum risk scenario,
\(\text{PAF}\) represents the population attributable fraction.
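The definition above can be checked by hand on a toy data set. The sketch below is only a point estimate in base R, not the package's internal implementation: it ignores the survey design and the bootstrap. The data frame, the weights `w`, and the binary `exposure` column are hypothetical values chosen for illustration.

```r
# Toy individual-level data (hypothetical): binary exposure, equal weights
X <- data.frame(exposure = c(0, 0, 1, 1))
w <- c(1, 1, 1, 1)

# Relative risk function taking X and theta, in that order
rr <- function(X, theta) exp(theta[1] * X[, "exposure"])

# Assumed parameter: the exposed have a relative risk of 2
theta <- log(2)

# Weighted estimate of the average relative risk E[RR(X; theta)]
mean_rr <- weighted.mean(rr(X, theta), w)

# PAF = (E[RR] - 1) / E[RR]
paf_hat <- (mean_rr - 1) / mean_rr
paf_hat
```

Here half of the sample is exposed with a relative risk of 2, so the average relative risk is 1.5 and the PAF is 0.5 / 1.5, or one third.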
Usage
paf(
design,
theta,
rr,
additional_theta_arguments,
n_bootstrap_samples = NULL,
theta_distribution = "default",
weights = NULL,
.options.future = list(seed = TRUE),
...
)
Arguments
- design
(survey.design, data.frame, tibble, or svyrep.design) survey data structure. If the data come from a survey, set the design with survey::svydesign(). A survey::svrepdesign() design is also supported if your survey comes with replicates. Finally, the model can also accommodate a data.frame or tibble with weights, assuming simple random sampling without replacement.
- theta
(vector/double) parameters of the relative risk function rr.
- rr
(function/list) a relative risk function with two parameters: a data.frame called X containing the individual-level exposure and covariates, and theta (in that order). It can also be a list of several relative risk functions, each one representing a different modelling scenario.
- additional_theta_arguments
any additional information on theta used to obtain bootstrap samples of the parameter. Options are:
  - (double) the variance of theta, if theta is one-dimensional and asymptotic normality is assumed (default).
  - (vector) the variances of each entry of theta, if theta is n-dimensional, its entries are uncorrelated, and asymptotic normality is assumed (default).
  - (matrix) the variance-covariance matrix of theta, if theta is n-dimensional, its entries are correlated, and asymptotic normality is assumed (default).
  - any list of arguments to pass via base::do.call() to theta_distribution to simulate samples from theta, if theta is not assumed to be asymptotically normally distributed.
Optional
- n_bootstrap_samples
(double) number of bootstrap samples. If a svyrep.design is passed as an argument, then n_bootstrap_samples represents the number of replicates in the design.
- theta_distribution
(function) random number generator that follows the distribution of the estimator theta. By default, theta is assumed to be asymptotically normal, so theta_distribution is set to mvtnorm::rmvnorm() with variance given by additional_theta_arguments. The number of simulations drawn by theta_distribution must be controlled by an argument named n.
- weights
(vector) If you do not follow the recommended approach of passing a svydesign object as the design, you can still use weights to attach weights to your estimation. Beware that this might not give accurate estimates of the variance or of the uncertainty intervals.
- .options.future
List of additional options for doFuture::%dofuture%().
- ...
Additional parameters for svrep::as_bootstrap_design().
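As a rough illustration of what the forms of additional_theta_arguments mean under the default normality assumption, the sketch below draws samples of theta in base R. It is not the package's internal code, which according to the description above relies on mvtnorm::rmvnorm(). The names `draw_theta` and `my_theta_distribution` are hypothetical, and the exact way paf() assembles the do.call() arguments is an assumption.

```r
# Hypothetical helper (not part of the package): draws n samples of theta
# from a normal distribution centred at theta.
# `variance` may be a scalar, a vector of variances, or a covariance matrix.
draw_theta <- function(n, theta, variance) {
  d <- length(theta)
  # A scalar or vector of variances becomes a diagonal covariance matrix
  sigma <- if (is.matrix(variance)) variance else diag(variance, nrow = d)
  # Upper-triangular Cholesky factor: t(R) %*% R equals sigma
  R <- chol(sigma)
  # Standard normal draws, one row per simulation
  z <- matrix(rnorm(n * d), nrow = n, ncol = d)
  # Rows of z %*% R have covariance sigma; shift each column by theta
  sweep(z %*% R, 2, theta, `+`)
}

set.seed(42)
# Vector of variances: entries of theta treated as uncorrelated
draws <- draw_theta(1000, theta = log(c(1.05, 1.38)), variance = c(0.01, 0.03))

# Non-normal case: a custom generator whose number of draws is set by `n`,
# called through do.call() with a list of extra arguments (assumed usage)
my_theta_distribution <- function(n, shape, rate) {
  rgamma(n, shape = shape, rate = rate)
}
extra_args <- list(shape = 2, rate = 4)
gamma_draws <- do.call(my_theta_distribution, c(list(n = 5), extra_args))
```

Each row of `draws` is one simulated value of theta. In the last case the list `extra_args` plays the role of additional_theta_arguments.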
Value
A pif_class() object containing the bootstrap simulations for the population attributable fraction and the average relative risk.
Additional parallelization options
Computation is faster when it is parallelized across several cores of your machine. Parallelization relies on the future package. For parallelization to work you need to establish a plan (see future::plan() for more information). A common way to parallelize on your local machine is to set a plan with future::plan() before calling paf().
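One possible setup is sketched below, assuming the future package is installed. The helper name `run_with_workers` and the default of two workers are hypothetical choices, not taken from the package documentation. future::multisession is one of the standard local backends offered by future.

```r
# Hypothetical helper (not part of the package): runs `task` under a
# temporary multisession plan and restores the previous plan afterwards.
run_with_workers <- function(task, workers = 2) {
  # Assumption: run the task directly if 'future' is not installed
  if (!requireNamespace("future", quietly = TRUE)) {
    return(task())
  }
  # plan() returns the previous plan, so it can be restored on exit
  old_plan <- future::plan(future::multisession, workers = workers)
  on.exit(future::plan(old_plan), add = TRUE)
  task()
}
```

With design and rr defined as in the Examples, one could call run_with_workers(function() paf(design, theta = log(c(1.05, 1.38)), rr, additional_theta_arguments = c(0.01, 0.03), n_bootstrap_samples = 10)). Calling future::plan() once at the top of a script works as well; the wrapper only makes sure the previous plan is restored.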
References
Chan CE, Zepeda-Tello R, Camacho-García-Formentí D, Cudhea F, Meza R, Rodrigues E, Spiegelman D, Barrientos-Gutierrez T, Zhou X (2023). “Nonparametric Estimation of the Potential Impact Fraction and Population Attributable Fraction with Individual-Level and Aggregated Data.” arXiv:2207.03597.
Examples
# Use the ensanut dataset
data(ensanut)
# EXAMPLE 1
# Setup the survey design
options(survey.lonely.psu = "adjust")
design <- survey::svydesign(data = ensanut, ids = ~1, weights = ~weight, strata = ~strata)
# Relative risk as a function of age and systolic blood pressure
rr <- function(X, theta) {
  exp(theta[1] * X[, "age"] + theta[2] * X[, "systolic_blood_pressure"] / 100)
}
paf(design,
  theta = log(c(1.05, 1.38)), rr,
  additional_theta_arguments = c(0.01, 0.03), n_bootstrap_samples = 10
)
#> ── Population Attributable Fraction (PAF) ──────────────────────────────────────
#> counterfactual relative_risk
#> 1 Theoretical_minimum_risk_level Relative_Risk_1
#> 2 Theoretical_minimum_risk_level Relative_Risk_1
#> 3 Theoretical_minimum_risk_level Relative_Risk_1
#> population_attributable_fraction average_relative_risk average_counterfactual
#> 1 -10.15319 56724.76 1
#> 2 -71.39238 -233921.36 1
#> 3 51.08599 347370.87 1
#> type
#> 1 point_estimate
#> 2 Lower 2.5%
#> 3 Upper 97.5%
#> ────────────────────────────────────────────────────────────────────────────────
#> • Number of bootstrap simulations: 10
#> ✖ A low number of bootstrap simulations will result in an unstable estimate.
#> • Use `as.data.frame` to access values.
#> • Use `summary` to save list of main results.
# EXAMPLE 2
# Now do the same but using a replicate design
options(survey.lonely.psu = "adjust")
rep_design <- svrep::as_bootstrap_design(design, replicates = 10)
paf(rep_design,
  theta = log(c(1.05, 1.38)), rr,
  additional_theta_arguments = c(0.01, 0.03)
)
#> ── Population Attributable Fraction (PAF) ──────────────────────────────────────
#> counterfactual relative_risk
#> 1 Theoretical_minimum_risk_level Relative_Risk_1
#> 2 Theoretical_minimum_risk_level Relative_Risk_1
#> 3 Theoretical_minimum_risk_level Relative_Risk_1
#> population_attributable_fraction average_relative_risk average_counterfactual
#> 1 -2.58360 38075.31 1
#> 2 -23.23141 -186912.70 1
#> 3 18.06421 263063.32 1
#> type
#> 1 point_estimate
#> 2 Lower 2.5%
#> 3 Upper 97.5%
#> ────────────────────────────────────────────────────────────────────────────────
#> • Number of bootstrap simulations: 10
#> ✖ A low number of bootstrap simulations will result in an unstable estimate.
#> • Use `as.data.frame` to access values.
#> • Use `summary` to save list of main results.