
Preprocess data.frame for generating a nowcast
Source: R/preprocess_for_nowcast.R
Function that takes a data frame with true_date and report_date columns and generates all
possible observable combinations of true dates and report dates, controlling for the covariates
specified in strata.
Usage
preprocess_for_nowcast(
  .disease_data,
  true_date,
  report_date,
  strata = NULL,
  now,
  units,
  max_delay = Inf,
  data_type = c("auto", "linelist", "count"),
  verbose = TRUE
)
Arguments
- .disease_data
A time series of reporting data in aggregated line list format such that each row has a column for onset date, report date, and (optionally) strata.
- true_date
In quotations, the name of the column of datatype Date designating the date of case onset, e.g. "onset_week".
- report_date
In quotations, the name of the column of datatype Date designating the date of case report, e.g. "report_week".
- strata
Character vector of names of the strata included in the data.
- now
An object of datatype Date indicating the date at which to perform the nowcast.
- units
Time scale of reporting. Options: "1 day", "1 week".
- max_delay
Maximum possible delay observed or considered for estimation of the delay distribution (numeric). Default: Inf.
- data_type
One of "auto", "linelist", or "count". Use "linelist" if each row represents a test, or "count" if there is a column named n with counts of how many tests had that onset date and report date.
- verbose
Boolean. Whether to print the data type assumptions.
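As a rough illustration of the count format described under data_type, the base R sketch below collapses a small line list into one row per onset/report pair with a column named n. The aggregation step is an assumption about how one might prepare such data, not something the package requires, and the final call is left commented out because it needs the package to be loaded.

```r
# Small line list: one row per test (same data as the second example below)
linelist <- data.frame(
  onset_week  = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03")),
  report_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
)

# Each row counts as one test; summing gives the number of tests per
# onset/report combination in a column named n
linelist$n <- 1L
counts <- aggregate(n ~ onset_week + report_week, data = linelist, FUN = sum)

# With the package loaded, the aggregated data could then be passed as counts:
# preprocess_for_nowcast(counts, "onset_week", "report_week",
#   units = "weeks", now = as.Date("1994-10-10"), data_type = "count"
# )
```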
Value
A data.frame with the counts for all possible delay-onset combinations.
The new column with the counts is named n. Additional columns .tval and .delay
are added, where .tval codifies the onset dates as numbers (starting at 1) and .delay
codifies the difference between onset and report, measured in the reporting units.
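To make the meaning of n, .tval and .delay concrete, here is a minimal base R sketch that builds the same kind of grid by hand for weekly data. It is an illustration under stated assumptions (weekly units, delays from 0 up to the largest observed delay, onset weeks from the first onset up to now), not the package's implementation; the helper names obs_delay, grid, key_obs and key_grid are made up for this sketch.

```r
# Line list from the second example below
onset_week  <- as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03"))
report_week <- as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
now <- as.Date("1994-10-10")

# Observed delays, in weeks
obs_delay <- as.numeric(report_week - onset_week) / 7

# Every onset week from the first onset up to `now`, crossed with every
# delay from 0 to the largest observed delay
grid <- expand.grid(
  .delay     = 0:max(obs_delay),
  onset_week = seq(min(onset_week), now, by = "week")
)
grid$report_week <- grid$onset_week + 7 * grid$.delay

# Keep only combinations that could already have been reported by `now`
grid <- grid[grid$report_week <= now, ]

# Number the onset weeks, starting at 1
grid$.tval <- as.numeric(grid$onset_week - min(onset_week)) / 7 + 1

# Count observed cases per cell; cells with no observations get 0
key_obs  <- paste(onset_week, report_week)
key_grid <- paste(grid$onset_week, grid$report_week)
grid$n   <- as.integer(table(factor(key_obs, levels = key_grid)))

grid[, c("n", ".tval", ".delay", "onset_week", "report_week")]
```

Under these assumptions the sketch should yield the same seven onset/report cells as the second example below, including the week of 1994-09-26, which has no observed onsets and is filled with zeros.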
Examples
data(denguedat)
# Get counts by onset date and report week, considering all possible delays
preprocess_for_nowcast(denguedat, "onset_week", "report_week",
  units = "weeks", now = as.Date("1990-03-05")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 55 × 5
#> n .tval .delay onset_week report_week
#> <int> <dbl> <dbl> <date> <date>
#> 1 3 1 0 1990-01-01 1990-01-01
#> 2 24 1 1 1990-01-01 1990-01-08
#> 3 23 1 2 1990-01-01 1990-01-15
#> 4 8 1 3 1990-01-01 1990-01-22
#> 5 1 1 4 1990-01-01 1990-01-29
#> 6 0 1 5 1990-01-01 1990-02-05
#> 7 1 1 6 1990-01-01 1990-02-12
#> 8 0 1 7 1990-01-01 1990-02-19
#> 9 0 1 8 1990-01-01 1990-02-26
#> 10 1 1 9 1990-01-01 1990-03-05
#> # ℹ 45 more rows
# Complete a week in which no onsets were recorded
df <- data.frame(
  onset_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03")),
  report_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 7 × 5
#> n .tval .delay onset_week report_week
#> <int> <dbl> <dbl> <date> <date>
#> 1 1 1 0 1994-09-19 1994-09-19
#> 2 0 1 1 1994-09-19 1994-09-26
#> 3 0 2 0 1994-09-26 1994-09-26
#> 4 0 2 1 1994-09-26 1994-10-03
#> 5 1 3 0 1994-10-03 1994-10-03
#> 6 2 3 1 1994-10-03 1994-10-10
#> 7 0 4 0 1994-10-10 1994-10-10
# Complete missing weeks and delays when the first onset week is only reported with a delay of 3
df <- data.frame(
  onset_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-10")),
  report_week = as.Date(c("1994-10-10", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 10 × 5
#> n .tval .delay onset_week report_week
#> <int> <dbl> <dbl> <date> <date>
#> 1 0 1 0 1994-09-19 1994-09-19
#> 2 0 1 1 1994-09-19 1994-09-26
#> 3 0 1 2 1994-09-19 1994-10-03
#> 4 1 1 3 1994-09-19 1994-10-10
#> 5 0 2 0 1994-09-26 1994-09-26
#> 6 0 2 1 1994-09-26 1994-10-03
#> 7 0 2 2 1994-09-26 1994-10-10
#> 8 1 3 0 1994-10-03 1994-10-03
#> 9 1 3 1 1994-10-03 1994-10-10
#> 10 1 4 0 1994-10-10 1994-10-10
# Get counts by onset date and report week, stratifying by gender and state
df <- data.frame(
  onset_week = sample(as.Date(c("1994-09-19", "1994-10-03", "1994-10-10")), 100, replace = TRUE),
  gender = sample(c("Male", "Female"), 100, replace = TRUE),
  state = sample(c("A", "B", "C", "D"), prob = c(0.5, 0.2, 0.2, 0.1), size = 100, replace = TRUE)
)
df$report_week <- df$onset_week +
  sample(c(lubridate::weeks(1), lubridate::weeks(2)), 100, replace = TRUE)
preprocess_for_nowcast(df, "onset_week", "report_week", c("gender", "state"),
  units = "weeks",
  now = as.Date("1994-09-26")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 8 × 7
#> n .tval .delay onset_week gender state report_week
#> <int> <dbl> <dbl> <date> <chr> <chr> <date>
#> 1 2 1 1 1994-09-19 Female B 1994-09-26
#> 2 4 1 1 1994-09-19 Female A 1994-09-26
#> 3 0 1 1 1994-09-19 Female D 1994-09-26
#> 4 2 1 1 1994-09-19 Female C 1994-09-26
#> 5 3 1 1 1994-09-19 Male B 1994-09-26
#> 6 3 1 1 1994-09-19 Male A 1994-09-26
#> 7 1 1 1 1994-09-19 Male D 1994-09-26
#> 8 2 1 1 1994-09-19 Male C 1994-09-26