Preprocess data.frame for generating a nowcast — preprocess_for_nowcast

Takes a data frame with a true_date column and a report_date column and generates all observable combinations of true dates and report dates, stratified by the covariates specified in strata.

Usage

preprocess_for_nowcast(
  .disease_data,
  true_date,
  report_date,
  strata = NULL,
  now,
  units,
  max_delay = Inf,
  data_type = c("auto", "linelist", "count"),
  verbose = TRUE
)

Arguments

.disease_data: A time series of reporting data in aggregated line list format, with one column each for onset date, report date, and (optionally) strata.
true_date: Quoted name of the column of datatype Date holding the date of case onset, e.g. "onset_week".
report_date: Quoted name of the column of datatype Date holding the date of case report, e.g. "report_week".
strata: Character vector of names of the strata included in the data.
now: An object of datatype Date indicating the date at which to perform the nowcast.
units: Time scale of reporting. Options: "1 day", "1 week". (The examples below pass "weeks".)
max_delay: Maximum possible delay observed or considered for estimation of the delay distribution (numeric). Default: Inf
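data_type: One of "auto", "linelist", or "count". Use "linelist" if each row represents a test, or "count" if there is a column named n with counts of how many tests had that onset date and report date. With the default, "auto", the function makes an assumption about the type and reports it when verbose = TRUE.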
verbose: Boolean. Whether to print the data type assumptions.
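
The difference between the two data types described under data_type can be illustrated with a minimal base R sketch. The object names linelist and counts are hypothetical, and the aggregation shown here is only an illustration of the two layouts, not the package's internal code.

```r
# Hypothetical line list: one row per test
linelist <- data.frame(
  onset_week  = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03")),
  report_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
)

# Equivalent count layout: one row per onset/report pair,
# with the number of tests stored in a column named n
counts <- aggregate(
  n ~ onset_week + report_week,
  data = cbind(linelist, n = 1L),
  FUN = sum
)

print(counts)
```

A table shaped like counts is the layout that data_type = "count" describes; the original linelist corresponds to data_type = "linelist".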

Value

A data.frame with counts for every observable onset-delay combination, including combinations with zero cases. The count column is named n. Two additional columns are added: .tval codes the onset dates as consecutive numbers (starting at 1 in the examples below), and .delay codes the difference between the onset date and the report date.
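
To make the shape of the returned table concrete, the following base R sketch builds an onset-delay grid by hand for the small line list used in the second example of the Examples section. It is not the package's implementation. It assumes weekly units, that delays are capped at the largest observed delay, and that only reports dated on or before now are observable; the names cases, grid, onsets, and delays are hypothetical.

```r
# Hypothetical line list: one row per case
cases <- data.frame(
  onset_week  = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03")),
  report_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
)
now <- as.Date("1994-10-10")

# Observed delay in weeks (Date differences are measured in days)
cases$.delay <- as.numeric(cases$report_week - cases$onset_week) / 7

# All onset weeks up to `now`, crossed with all delays up to the largest observed
# (capping at the largest observed delay is an assumption of this sketch)
onsets <- seq(min(cases$onset_week), now, by = "1 week")
delays <- 0:max(cases$.delay)
grid <- expand.grid(onset_week = onsets, .delay = delays)

# Keep only combinations whose report date is observable at `now`
grid$report_week <- grid$onset_week + 7 * grid$.delay
grid <- grid[grid$report_week <= now, ]

# Count cases per combination; unobserved combinations get 0
grid$n <- vapply(seq_len(nrow(grid)), function(i) {
  sum(cases$onset_week == grid$onset_week[i] & cases$.delay == grid$.delay[i])
}, integer(1))

# Code onset weeks as consecutive numbers starting at 1
grid$.tval <- as.numeric(grid$onset_week - min(grid$onset_week)) / 7 + 1

grid <- grid[order(grid$onset_week, grid$.delay), ]
print(grid)
```

The week of 1994-09-26 has no cases in the input, yet it appears in grid with n equal to 0, which is the completion step the examples illustrate.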

Examples

data(denguedat)

# Get counts by onset week and report week, considering all possible delays
preprocess_for_nowcast(denguedat, "onset_week", "report_week",
  units = "weeks", now = as.Date("1990-03-05")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 55 × 5
#>        n .tval .delay onset_week report_week
#>    <int> <dbl>  <dbl> <date>     <date>     
#>  1     3     1      0 1990-01-01 1990-01-01 
#>  2    24     1      1 1990-01-01 1990-01-08 
#>  3    23     1      2 1990-01-01 1990-01-15 
#>  4     8     1      3 1990-01-01 1990-01-22 
#>  5     1     1      4 1990-01-01 1990-01-29 
#>  6     0     1      5 1990-01-01 1990-02-05 
#>  7     1     1      6 1990-01-01 1990-02-12 
#>  8     0     1      7 1990-01-01 1990-02-19 
#>  9     0     1      8 1990-01-01 1990-02-26 
#> 10     1     1      9 1990-01-01 1990-03-05 
#> # ℹ 45 more rows

# Complete a week (1994-09-26) for which no onset was recorded
df <- data.frame(
  onset_week  = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03")),
  report_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 7 × 5
#>       n .tval .delay onset_week report_week
#>   <int> <dbl>  <dbl> <date>     <date>     
#> 1     1     1      0 1994-09-19 1994-09-19 
#> 2     0     1      1 1994-09-19 1994-09-26 
#> 3     0     2      0 1994-09-26 1994-09-26 
#> 4     0     2      1 1994-09-26 1994-10-03 
#> 5     1     3      0 1994-10-03 1994-10-03 
#> 6     2     3      1 1994-10-03 1994-10-10 
#> 7     0     4      0 1994-10-10 1994-10-10 

# Complete the missing delays when the only report for the first onset week has delay 3
df <- data.frame(
  onset_week  = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-10")),
  report_week = as.Date(c("1994-10-10", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 10 × 5
#>        n .tval .delay onset_week report_week
#>    <int> <dbl>  <dbl> <date>     <date>     
#>  1     0     1      0 1994-09-19 1994-09-19 
#>  2     0     1      1 1994-09-19 1994-09-26 
#>  3     0     1      2 1994-09-19 1994-10-03 
#>  4     1     1      3 1994-09-19 1994-10-10 
#>  5     0     2      0 1994-09-26 1994-09-26 
#>  6     0     2      1 1994-09-26 1994-10-03 
#>  7     0     2      2 1994-09-26 1994-10-10 
#>  8     1     3      0 1994-10-03 1994-10-03 
#>  9     1     3      1 1994-10-03 1994-10-10 
#> 10     1     4      0 1994-10-10 1994-10-10 

# Get counts by onset date and report week stratifying by gender and state
df <- data.frame(
  onset_week = sample(as.Date(c("1994-09-19", "1994-10-03", "1994-10-10")), 100, replace = TRUE),
  gender = sample(c("Male", "Female"), 100, replace = TRUE),
  state = sample(c("A", "B", "C", "D"), prob = c(0.5, 0.2, 0.2, 0.1), size = 100, replace = TRUE)
)
df$report_week <- df$onset_week +
  sample(c(lubridate::weeks(1), lubridate::weeks(2)), 100, replace = TRUE)
preprocess_for_nowcast(df, "onset_week", "report_week", c("gender", "state"),
  units = "weeks",
  now = as.Date("1994-09-26")
)
#> ℹ Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 8 × 7
#>       n .tval .delay onset_week gender state report_week
#>   <int> <dbl>  <dbl> <date>     <chr>  <chr> <date>     
#> 1     2     1      1 1994-09-19 Female B     1994-09-26 
#> 2     4     1      1 1994-09-19 Female A     1994-09-26 
#> 3     0     1      1 1994-09-19 Female D     1994-09-26 
#> 4     2     1      1 1994-09-19 Female C     1994-09-26 
#> 5     3     1      1 1994-09-19 Male   B     1994-09-26 
#> 6     3     1      1 1994-09-19 Male   A     1994-09-26 
#> 7     1     1      1 1994-09-19 Male   D     1994-09-26 
#> 8     2     1      1 1994-09-19 Male   C     1994-09-26