
Preprocess data.frame for generating a nowcast
Source: R/preprocess_for_nowcast.R, preprocess_for_nowcast.Rd
Function that takes a data frame with true_date and report_date columns and generates all possible combinations of true dates and report dates that are observable as of now (the nowcast date), controlling for the covariates specified in strata.
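The description above can be made concrete with a small base-R sketch. It is not the package's implementation. The helper observable_grid() is hypothetical and assumes weekly data. Its rule for which delays to include is inferred from the example outputs on this page: delays run from the smallest to the largest delay seen among reports made by now, capped at max_delay.

```r
# Hypothetical helper, NOT the package source. Assumes weekly reporting and
# infers (from this page's example outputs) that onsets run from the earliest
# observed onset to `now` and that delays span the range seen in the data.
observable_grid <- function(onset, report, now, max_delay = Inf) {
  # Only reports made on or before `now` are known at nowcast time
  known <- report <= now
  onset <- onset[known]
  report <- report[known]

  # Delay in whole weeks between onset and report
  delays <- as.integer(report - onset) %/% 7L
  delay_range <- seq(min(delays), min(max(delays), max_delay))

  # Every onset week from the first observed onset up to `now`
  onset_weeks <- seq(min(onset), now, by = "week")

  # Cross onsets with delays, then drop cells whose report date is after `now`
  grid <- expand.grid(delay = delay_range, onset = onset_weeks)
  grid$report <- grid$onset + 7L * grid$delay
  grid <- grid[grid$report <= now, c("onset", "delay", "report")]
  rownames(grid) <- NULL
  grid
}

# Data from the second example in the Examples section
onset  <- as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03"))
report <- as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
grid <- observable_grid(onset, report, now = as.Date("1994-10-10"))
nrow(grid)  # 7, the same number of rows as that example's output
```

Because the delay rule is inferred, treat this as an illustration of what "observable" means here rather than as a specification.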
Usage
preprocess_for_nowcast(
  .disease_data,
  true_date,
  report_date,
  strata = NULL,
  now,
  units,
  max_delay = Inf,
  data_type = c("auto", "linelist", "count"),
  verbose = TRUE
)
Arguments
- .disease_data
A time series of reporting data in aggregated line list format, such that each row has a column for onset date, a column for report date, and (optionally) columns for strata.
- true_date
In quotations, the name of the column of datatype Date designating the date of case onset, e.g. "onset_week".
- report_date
In quotations, the name of the column of datatype Date designating the date of case report, e.g. "report_week".
- strata
Character vector of the names of the columns that define the strata in the data, e.g. c("gender", "state").
- now
An object of datatype Date indicating the date at which to perform the nowcast.
- units
Time scale of reporting. Options: "1 day", "1 week" (the examples below pass "weeks").
- max_delay
Maximum possible delay observed or considered for estimation of the delay distribution (numeric). Default: Inf.
- data_type
Either "linelist" if each row represents a test, or "count" if there is a column named n with counts of how many tests had that onset and report dates. With the default, "auto", the function infers the type and, if verbose = TRUE, prints the assumption it made.
- verbose
Boolean. Whether to print the data type assumptions.
Value
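A data.frame with the counts for every possible onset-delay combination. Combinations with no observed tests get a count of 0.
The new column with the counts is named n. Additional columns .tval and .delay are added. .tval codifies the onset dates as consecutive integers (the earliest onset is 1 in the example output). .delay codifies the difference between onset and report, measured in the reporting time scale given by units.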
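Here is a minimal base-R sketch of the two encodings for weekly data. It assumes, as the example outputs suggest, that .tval is 1 for the earliest onset week and that .delay counts whole reporting periods. The helper encode_weeks() is hypothetical and not part of the package.

```r
# Hypothetical helper: encode weekly onset/report dates as .tval and .delay
encode_weeks <- function(onset, report) {
  data.frame(
    # Weeks since the earliest onset, shifted so the first week is 1
    .tval = as.numeric(difftime(onset, min(onset), units = "weeks")) + 1,
    # Whole weeks between onset and report
    .delay = as.numeric(difftime(report, onset, units = "weeks")),
    onset_week = onset,
    report_week = report
  )
}

enc <- encode_weeks(
  onset  = as.Date(c("1994-09-19", "1994-09-26", "1994-10-03")),
  report = as.Date(c("1994-09-26", "1994-10-10", "1994-10-03"))
)
enc
```

With daily data, units = "days" in difftime() would give the analogous encoding.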
Examples
data(denguedat)
# Get counts by onset date and report week, considering all possible delays
preprocess_for_nowcast(denguedat, "onset_week", "report_week",
  units = "weeks", now = as.Date("1990-03-05")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 55 × 5
#> n .tval .delay onset_week report_week
#> <int> <dbl> <dbl> <date> <date>
#> 1 3 1 0 1990-01-01 1990-01-01
#> 2 24 1 1 1990-01-01 1990-01-08
#> 3 23 1 2 1990-01-01 1990-01-15
#> 4 8 1 3 1990-01-01 1990-01-22
#> 5 1 1 4 1990-01-01 1990-01-29
#> 6 0 1 5 1990-01-01 1990-02-05
#> 7 1 1 6 1990-01-01 1990-02-12
#> 8 0 1 7 1990-01-01 1990-02-19
#> 9 0 1 8 1990-01-01 1990-02-26
#> 10 1 1 9 1990-01-01 1990-03-05
#> # ℹ 45 more rows
# Complete an onset week with no cases (1994-09-26 gets rows with n = 0)
df <- data.frame(
  onset_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03")),
  report_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 7 × 5
#> n .tval .delay onset_week report_week
#> <int> <dbl> <dbl> <date> <date>
#> 1 1 1 0 1994-09-19 1994-09-19
#> 2 0 1 1 1994-09-19 1994-09-26
#> 3 0 2 0 1994-09-26 1994-09-26
#> 4 0 2 1 1994-09-26 1994-10-03
#> 5 1 3 0 1994-10-03 1994-10-03
#> 6 2 3 1 1994-10-03 1994-10-10
#> 7 0 4 0 1994-10-10 1994-10-10
# Fill in delays with no reports (onset week 1994-09-19 is only reported at delay 3)
df <- data.frame(
  onset_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-10")),
  report_week = as.Date(c("1994-10-10", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 10 × 5
#> n .tval .delay onset_week report_week
#> <int> <dbl> <dbl> <date> <date>
#> 1 0 1 0 1994-09-19 1994-09-19
#> 2 0 1 1 1994-09-19 1994-09-26
#> 3 0 1 2 1994-09-19 1994-10-03
#> 4 1 1 3 1994-09-19 1994-10-10
#> 5 0 2 0 1994-09-26 1994-09-26
#> 6 0 2 1 1994-09-26 1994-10-03
#> 7 0 2 2 1994-09-26 1994-10-10
#> 8 1 3 0 1994-10-03 1994-10-03
#> 9 1 3 1 1994-10-03 1994-10-10
#> 10 1 4 0 1994-10-10 1994-10-10
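# Sketch (not the package's code): the zero rows above can be reproduced in
# base R by tabulating onset and report weeks as factors whose levels cover
# every week, so that empty cells are kept with a count of 0. The variable
# names below are illustrative only.
onset <- as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-10"))
report <- as.Date(c("1994-10-10", "1994-10-03", "1994-10-10", "1994-10-10"))
all_weeks <- seq(as.Date("1994-09-19"), as.Date("1994-10-10"), by = "week")
tab <- table(
  onset_week = factor(format(onset), levels = format(all_weeks)),
  report_week = factor(format(report), levels = format(all_weeks))
)
cells <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
cells$onset_week <- as.Date(cells$onset_week)
cells$report_week <- as.Date(cells$report_week)
cells$.delay <- as.numeric(cells$report_week - cells$onset_week) / 7
# Keep delays from 0 up to the largest delay present in the data (3 weeks here)
largest <- max(as.numeric(report - onset) / 7)
cells <- cells[cells$.delay >= 0 & cells$.delay <= largest, ]
cells <- cells[order(cells$onset_week, cells$.delay), ]
cells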
# Get counts by onset date and report week, stratifying by gender and state
# (the data are sampled at random, so the counts vary from run to run)
df <- data.frame(
  onset_week = sample(as.Date(c("1994-09-19", "1994-10-03", "1994-10-10")), 100, replace = TRUE),
  gender = sample(c("Male", "Female"), 100, replace = TRUE),
  state = sample(c("A", "B", "C", "D"), prob = c(0.5, 0.2, 0.2, 0.1), size = 100, replace = TRUE)
)
df$report_week <- df$onset_week +
  sample(c(lubridate::weeks(1), lubridate::weeks(2)), 100, replace = TRUE)
preprocess_for_nowcast(df, "onset_week", "report_week", c("gender", "state"),
  units = "weeks",
  now = as.Date("1994-09-26")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 8 × 7
#> n .tval .delay onset_week gender state report_week
#> <int> <dbl> <dbl> <date> <chr> <chr> <date>
#> 1 2 1 1 1994-09-19 Female B 1994-09-26
#> 2 4 1 1 1994-09-19 Female A 1994-09-26
#> 3 0 1 1 1994-09-19 Female D 1994-09-26
#> 4 2 1 1 1994-09-19 Female C 1994-09-26
#> 5 3 1 1 1994-09-19 Male B 1994-09-26
#> 6 3 1 1 1994-09-19 Male A 1994-09-26
#> 7 1 1 1 1994-09-19 Male D 1994-09-26
#> 8 2 1 1 1994-09-19 Male C 1994-09-26
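The last example's output contains a row with n = 0 for a gender and state combination with no tests (Female, D, in the run shown). The base-R sketch below shows one way such completion can work: it merges the observed counts onto every combination of strata with merge(). It uses small fixed data instead of random samples. It illustrates the idea and is not the package's implementation.

```r
# Observed tests: all with onset 1994-09-19 and report 1994-09-26 (delay 1)
obs <- data.frame(
  onset_week  = as.Date("1994-09-19"),
  report_week = as.Date("1994-09-26"),
  gender = c("Female", "Female", "Male"),
  state  = c("A", "A", "B")
)
obs$n <- 1L
counts <- aggregate(n ~ onset_week + report_week + gender + state, data = obs, FUN = sum)

# Every gender x state combination for the same onset/report cell
grid <- expand.grid(
  onset_week  = as.Date("1994-09-19"),
  report_week = as.Date("1994-09-26"),
  gender = c("Female", "Male"),
  state  = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Left join keeps all grid rows; strata without tests get n = 0
completed <- merge(grid, counts, all.x = TRUE)
completed$n[is.na(completed$n)] <- 0L
completed
```

The result has 8 rows, one per gender and state pair, matching the shape of the stratified output above.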