Takes a data frame with a true_date column and a report_date column and generates all observable combinations of true dates and report dates, stratified by the covariates specified in strata.

Usage

preprocess_for_nowcast(
  .disease_data,
  true_date,
  report_date,
  strata = NULL,
  now,
  units,
  max_delay = Inf,
  data_type = c("auto", "linelist", "count"),
  verbose = TRUE
)

Arguments

.disease_data

A time series of reporting data in aggregated line list format, such that each row has a column for onset date, report date, and (optionally) strata.

true_date

Quoted name of the column of datatype Date that holds the date of case onset, e.g. "onset_week".

report_date

Quoted name of the column of datatype Date that holds the date of case report, e.g. "report_week".

strata

Character vector of names of the strata included in the data.

now

An object of datatype Date indicating the date at which to perform the nowcast.

units

Time scale of reporting. Options: "1 day", "1 week".

max_delay

Maximum possible delay observed or considered for estimation of the delay distribution (numeric). Default: Inf

data_type

One of "auto" (the default), "linelist" if each row represents a test, or "count" if there is a column named n with the number of tests for each combination of onset and report dates.

verbose

Boolean. Whether to print the data type assumptions.
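
The difference between the two data_type layouts can be illustrated with base R. The sketch below collapses a small line list (one row per test) into count format with a column named n. It is only an illustration of the two input shapes, not the package's internal code, and the object names linelist and counts are made up for this example.

```r
# Line list: one row per test (same toy data as in the Examples section).
linelist <- data.frame(
  onset_week  = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03")),
  report_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
)

# Count format: one row per observed (onset, report) pair, with the
# number of tests stored in a column named `n`.
counts <- aggregate(
  n ~ onset_week + report_week,
  data = transform(linelist, n = 1L),
  FUN  = sum
)
counts

# Assumption: a table shaped like `counts` is what data_type = "count" expects.
# preprocess_for_nowcast(counts, "onset_week", "report_week",
#   units = "weeks", now = as.Date("1994-10-10"), data_type = "count")
```

Only observed pairs appear in counts; filling in the unobserved pairs with zeros is the job of preprocess_for_nowcast itself.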

Value

A data.frame with counts for all possible delay-onset combinations. The new column with the counts is named n. Two additional columns are added: .tval encodes the onset dates as consecutive numbers (starting at 1 in the examples below), and .delay encodes the difference between onset and report dates in the chosen units.
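
To make the structure of the returned grid concrete, here is a minimal base R sketch that enumerates every onset-delay pair whose report date falls on or before now. The helper observable_grid and its arguments first_onset and step_days are hypothetical names for this illustration; the package's actual implementation may differ.

```r
# Hypothetical helper, not part of the package: list every (.tval, .delay)
# pair that could already have been reported by `now`.
observable_grid <- function(first_onset, now, step_days = 7, max_delay = Inf) {
  # Onset dates from the first onset up to `now`, one per reporting unit.
  onsets  <- seq(first_onset, now, by = step_days)
  n_steps <- length(onsets) - 1

  # All candidate combinations of delay and onset index.
  grid <- expand.grid(.delay = 0:n_steps, .tval = seq_along(onsets))
  grid$onset  <- onsets[grid$.tval]
  grid$report <- grid$onset + grid$.delay * step_days

  # Keep pairs reported on or before `now` and within the delay cap.
  keep <- grid$report <= now & grid$.delay <= max_delay
  grid <- grid[keep, c(".tval", ".delay", "onset", "report")]
  rownames(grid) <- NULL
  grid
}

full   <- observable_grid(as.Date("1994-09-19"), as.Date("1994-10-10"))
capped <- observable_grid(as.Date("1994-09-19"), as.Date("1994-10-10"),
                          max_delay = 1)
nrow(full)    # 10 pairs: 4 + 3 + 2 + 1
nrow(capped)  # 7 pairs: 2 + 2 + 2 + 1
```

The grid is triangular because later onset dates have had less time to accumulate reports. The row counts of 10 and 7 agree with the two small 1994 examples in the Examples section, although this sketch does not compute the n column.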

Examples

data(denguedat)

# Get counts by onset week and report week, considering all possible delays
preprocess_for_nowcast(denguedat, "onset_week", "report_week",
  units = "weeks", now = as.Date("1990-03-05")
)
#>  Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 55 × 5
#>        n .tval .delay onset_week report_week
#>    <int> <dbl>  <dbl> <date>     <date>     
#>  1     3     1      0 1990-01-01 1990-01-01 
#>  2    24     1      1 1990-01-01 1990-01-08 
#>  3    23     1      2 1990-01-01 1990-01-15 
#>  4     8     1      3 1990-01-01 1990-01-22 
#>  5     1     1      4 1990-01-01 1990-01-29 
#>  6     0     1      5 1990-01-01 1990-02-05 
#>  7     1     1      6 1990-01-01 1990-02-12 
#>  8     0     1      7 1990-01-01 1990-02-19 
#>  9     0     1      8 1990-01-01 1990-02-26 
#> 10     1     1      9 1990-01-01 1990-03-05 
#> # ℹ 45 more rows

# Complete an onset week with no cases (1994-09-26 is absent from the input)
df <- data.frame(
  onset_week  = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-03")),
  report_week = as.Date(c("1994-09-19", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#>  Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 7 × 5
#>       n .tval .delay onset_week report_week
#>   <int> <dbl>  <dbl> <date>     <date>     
#> 1     1     1      0 1994-09-19 1994-09-19 
#> 2     0     1      1 1994-09-19 1994-09-26 
#> 3     0     2      0 1994-09-26 1994-09-26 
#> 4     0     2      1 1994-09-26 1994-10-03 
#> 5     1     3      0 1994-10-03 1994-10-03 
#> 6     2     3      1 1994-10-03 1994-10-10 
#> 7     0     4      0 1994-10-10 1994-10-10 

# Complete the missing onset-delay combinations when the earliest case was reported with a delay of 3
df <- data.frame(
  onset_week  = as.Date(c("1994-09-19", "1994-10-03", "1994-10-03", "1994-10-10")),
  report_week = as.Date(c("1994-10-10", "1994-10-03", "1994-10-10", "1994-10-10"))
)
preprocess_for_nowcast(df, "onset_week", "report_week",
  units = "weeks",
  now = as.Date("1994-10-10")
)
#>  Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 10 × 5
#>        n .tval .delay onset_week report_week
#>    <int> <dbl>  <dbl> <date>     <date>     
#>  1     0     1      0 1994-09-19 1994-09-19 
#>  2     0     1      1 1994-09-19 1994-09-26 
#>  3     0     1      2 1994-09-19 1994-10-03 
#>  4     1     1      3 1994-09-19 1994-10-10 
#>  5     0     2      0 1994-09-26 1994-09-26 
#>  6     0     2      1 1994-09-26 1994-10-03 
#>  7     0     2      2 1994-09-26 1994-10-10 
#>  8     1     3      0 1994-10-03 1994-10-03 
#>  9     1     3      1 1994-10-03 1994-10-10 
#> 10     1     4      0 1994-10-10 1994-10-10 

# Get counts by onset week and report week, stratifying by gender and state
df <- data.frame(
  onset_week = sample(as.Date(c("1994-09-19", "1994-10-03", "1994-10-10")), 100, replace = TRUE),
  gender = sample(c("Male", "Female"), 100, replace = TRUE),
  state = sample(c("A", "B", "C", "D"), prob = c(0.5, 0.2, 0.2, 0.1), size = 100, replace = TRUE)
)
df$report_week <- df$onset_week +
  sample(c(lubridate::weeks(1), lubridate::weeks(2)), 100, replace = TRUE)
preprocess_for_nowcast(df, "onset_week", "report_week", c("gender", "state"),
  units = "weeks",
  now = as.Date("1994-09-26")
)
#>  Assuming data is linelist-data where each observation is a test. If you are working with count-data set `data_type = "count"`
#> # A tibble: 8 × 7
#>       n .tval .delay onset_week gender state report_week
#>   <int> <dbl>  <dbl> <date>     <chr>  <chr> <date>     
#> 1     2     1      1 1994-09-19 Female B     1994-09-26 
#> 2     4     1      1 1994-09-19 Female A     1994-09-26 
#> 3     0     1      1 1994-09-19 Female D     1994-09-26 
#> 4     2     1      1 1994-09-19 Female C     1994-09-26 
#> 5     3     1      1 1994-09-19 Male   B     1994-09-26 
#> 6     3     1      1 1994-09-19 Male   A     1994-09-26 
#> 7     1     1      1 1994-09-19 Male   D     1994-09-26 
#> 8     2     1      1 1994-09-19 Male   C     1994-09-26