Preprocess data.frame for generating a nowcast
preprocess_for_nowcast.Rd
Function that takes a data frame with `onset_date` and `report_date` columns and generates all observable combinations of onset dates and report dates, stratified by the covariates specified in `strata`.
Usage
preprocess_for_nowcast(
  .disease_data,
  onset_date,
  report_date,
  strata = NULL,
  now,
  units,
  max_delay = Inf,
  data_type = c("auto", "linelist", "count")
)
Arguments
- .disease_data
A time series of reporting data in aggregated line list format such that each row has a column for onset date, report date, and
- onset_date
In quotations, the name of the column of datatype `Date` designating the date of case onset, e.g. "onset_week".
- report_date
In quotations, the name of the column of datatype `Date` designating the date of case report, e.g. "report_week".
- strata
Character vector of names of the strata included in the data.
- now
An object of datatype `Date` indicating the date at which to perform the nowcast.
- units
Time scale of reporting. Options: "1 day", "1 week".
- max_delay
Maximum possible delay observed or considered for estimation of the delay distribution (numeric). Default: `Inf`
- data_type
Either `linelist` if each row represents a test or `count` if there is a column named `n` with counts of how many tests had that onset date and report date. With the default, `auto`, the function assumes a type and reports the assumption in a message.
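To make the difference between the two data layouts concrete, here is a minimal base R sketch that collapses a line list into the count layout, with one row per onset/report pair and a column `n`. The data frame `linelist` is hypothetical and the aggregation is only an illustration of the layout, not the package's internal code.

```r
# Hypothetical line list: one row per test
linelist <- data.frame(
  onset_week = as.Date(c(
    "1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03", "1994-10-10"
  )),
  report_week = as.Date(c(
    "1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10", "1994-10-10"
  ))
)

# Give every test a unit count, then sum within each onset/report pair
linelist$n <- 1L
counts <- aggregate(n ~ onset_week + report_week, data = linelist, FUN = sum)

# One row per distinct onset/report pair, with the number of tests in `n`
counts
```

A data frame shaped like `counts` is what the `"count"` option in the usage signature describes, while `linelist` without the `n` column corresponds to `"linelist"`.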
Value
A `data.frame` with counts for all observable delay-onset combinations. The new column with the counts is named `n`. Additional columns `.tval` and `.delay` are added, where `.tval` codifies the onset dates as numbers (starting at 1 in the examples on this page) and `.delay` codifies the difference between onset and report in the chosen `units`.
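The structure of the returned table can be sketched in base R for weekly data without strata. This is a reading of the printed example outputs under stated assumptions (delays range from 0 to the largest observed delay, and only rows with a report date on or before `now` are kept). It is not the package's implementation, and the helper vectors `weeks`, `delays`, and `key_obs` are names made up for the illustration.

```r
onset  <- as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-10"))
report <- as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
now    <- as.Date("1994-10-10")

# Observed delays in weeks, and every onset week from the first onset to `now`
obs_delay <- as.numeric(report - onset) / 7
weeks     <- seq(min(onset), now, by = "1 week")
delays    <- 0:max(obs_delay)

# Cross every onset week with every delay
grid <- data.frame(
  onset_week = rep(weeks, each = length(delays)),
  .delay     = rep(delays, times = length(weeks))
)
grid$report_week <- grid$onset_week + 7 * grid$.delay

# Keep only combinations that could have been observed by `now`
grid <- grid[grid$report_week <= now, ]

# Count the tests in each cell; cells with no tests get 0
key_obs <- paste(onset, report)
grid$n <- vapply(
  paste(grid$onset_week, grid$report_week),
  function(k) sum(key_obs == k),
  integer(1),
  USE.NAMES = FALSE
)

# Number the onset weeks 1, 2, 3, ...
grid$.tval <- as.numeric(grid$onset_week - min(onset)) / 7 + 1
grid
```

With these inputs the sketch yields seven rows, including zero-count rows for the onset week 1994-09-26, which never appears in the input.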
Examples
data(denguedat)
# Get counts by onset date and report week, considering all possible delays
preprocess_for_nowcast(denguedat, "onset_week", "report_week",
  units = "weeks", now = as.Date("1990-03-05")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 55 × 5
#> onset_week .delay report_week n .tval
#> <date> <dbl> <date> <int> <dbl>
#> 1 1990-01-01 0 1990-01-01 3 1
#> 2 1990-01-01 1 1990-01-08 24 1
#> 3 1990-01-01 2 1990-01-15 23 1
#> 4 1990-01-01 3 1990-01-22 8 1
#> 5 1990-01-01 4 1990-01-29 1 1
#> 6 1990-01-01 5 1990-02-05 0 1
#> 7 1990-01-01 6 1990-02-12 1 1
#> 8 1990-01-01 7 1990-02-19 0 1
#> 9 1990-01-01 8 1990-02-26 0 1
#> 10 1990-01-01 9 1990-03-05 1 1
#> # ℹ 45 more rows
# Complete one date when there was no onset week
df <- data.frame(
  onset_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-10")),
  report_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 7 × 5
#> onset_week .delay report_week n .tval
#> <date> <dbl> <date> <int> <dbl>
#> 1 1994-09-19 0 1994-09-19 1 1
#> 2 1994-09-19 1 1994-09-26 0 1
#> 3 1994-09-26 0 1994-09-26 0 2
#> 4 1994-09-26 1 1994-10-03 0 2
#> 5 1994-10-03 0 1994-10-03 1 3
#> 6 1994-10-03 1 1994-10-10 1 3
#> 7 1994-10-10 0 1994-10-10 1 4
# Complete one date when there was no report of delay 3 mostly
df <- data.frame(
  onset_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-10")),
  report_week = as.Date(c("1994-10-10", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 10 × 5
#> onset_week .delay report_week n .tval
#> <date> <dbl> <date> <int> <dbl>
#> 1 1994-09-19 0 1994-09-19 0 1
#> 2 1994-09-19 1 1994-09-26 0 1
#> 3 1994-09-19 2 1994-10-03 0 1
#> 4 1994-09-19 3 1994-10-10 1 1
#> 5 1994-09-26 0 1994-09-26 0 2
#> 6 1994-09-26 1 1994-10-03 0 2
#> 7 1994-09-26 2 1994-10-10 0 2
#> 8 1994-10-03 0 1994-10-03 1 3
#> 9 1994-10-03 1 1994-10-10 1 3
#> 10 1994-10-10 0 1994-10-10 1 4
# Get counts by onset date and report week stratifying by gender and state
df <- data.frame(
  onset_week = sample(as.Date(c("1994-09-19", "1994-10-03", "1994-10-10")), 100, replace = TRUE),
  gender = sample(c("Male", "Female"), 100, replace = TRUE),
  state = sample(c("A", "B", "C", "D"), prob = c(0.5, 0.2, 0.2, 0.1), size = 100, replace = TRUE)
)
df$report_week <- df$onset_week +
  sample(c(lubridate::weeks(1), lubridate::weeks(2)), 100, replace = TRUE)
preprocess_for_nowcast(df, "onset_week", "report_week", c("gender", "state"),
  units = "weeks",
  now = as.Date("1994-09-26")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 8 × 7
#> onset_week .delay gender state report_week n .tval
#> <date> <dbl> <chr> <chr> <date> <int> <dbl>
#> 1 1994-09-19 1 Male A 1994-09-26 2 1
#> 2 1994-09-19 1 Male D 1994-09-26 2 1
#> 3 1994-09-19 1 Male B 1994-09-26 0 1
#> 4 1994-09-19 1 Male C 1994-09-26 2 1
#> 5 1994-09-19 1 Female A 1994-09-26 2 1
#> 6 1994-09-19 1 Female D 1994-09-26 2 1
#> 7 1994-09-19 1 Female B 1994-09-26 3 1
#> 8 1994-09-19 1 Female C 1994-09-26 5 1