Introduction to diseasenowcasting • diseasenowcasting

The diseasenowcasting package provides a disease-agnostic framework for nowcasting the number of cases affected by certain disease.

What is nowcasting?

In many applications, it is of interest to predict the current number of cases as there might be a lag in the registry. Examples of lags can be administrative such as tests taking time to yield results, sociological such as patients taking time to report at their healthcare facility after symptom onset or biological: symptoms not happening at the same time as colonization. In such cases you might end up with registries looking like the one in Figure 1: where the number of cases observed by today is an incomplete fraction of the overall number of cases that will eventually be registered.

Current number of cases for a certain disease (left) vs the eventual number of registered cases for that disease (right)

The difference between the current observed number of cases and the final number of cases that will eventually be reported is accounted for by a model. By nowcasting, we refer to predicting the number of cases for a specific disease in a date that has already happened (or is currently happening) and for which we have incomplete information. The most basic example involves predicting the number of cases for now given that we have already seen some of today’s cases arrive at our system but not all of them yet. Hence the nowcast refers to the number of cases that we will eventually observe for a certain time frame as depicted in Figure 2:

$Nowcasted number of cases for the disease of the previous image given that we had only observed a fraction of them$

Nowcasted number of cases for the disease of the previous image given that we had only observed a fraction of them

This vignette will give you an overview of the basics of the diseasenowcasting package. The typical workflow involves the following steps:

Getting the data.
Fitting a model.
Evaluating the model.
Updating with new data.

In this tutorial we will focus only on the steps in a reproducible manner and we will refer to specific vignettes for advanced usage on each of the steps.

set.seed(26758)
library(diseasenowcasting)
library(dplyr)

1. Getting the data

Data for the nowcasting function should be either individual-level data or line-list data. In any case at least two columns in Date format are required: the true_date and the report_date.

true_date Refers to the date when the case happened.
report_date Refers to the date when the case was reported.

In general we assume that the true_date always happened before the report_date as you cannot report something that hasn’t already happened.

Types of data

Individual-level data

By individual-level data we mean a dataset for which each row represents a case. For example, the denguedat dataset included in the package contains individual-level data:

#Call the dataset
data("denguedat")

#Preview 
denguedat |> head()
#>   onset_week report_week gender
#> 1 1990-01-01  1990-01-01   Male
#> 2 1990-01-01  1990-01-01 Female
#> 3 1990-01-01  1990-01-01 Female
#> 4 1990-01-01  1990-01-08 Female
#> 5 1990-01-01  1990-01-08   Male
#> 6 1990-01-01  1990-01-15 Female

Notice that beyond the two dates (here true_date = "onset_week" and report_date = "report_week") there can be additional covariates in the data (such as gender in this example).

Line-list data

By line-list data we mean a dataset for which each row represents a count of cases For example, the mpoxdat dataset included in the package contains line-list data:

#Call the dataset
data("mpoxdat")

#Preview 
mpoxdat |> head()
#> # A tibble: 6 × 4
#> # Rowwise: 
#>   dx_date    dx_report_date race                   n
#>   <date>     <date>         <chr>              <int>
#> 1 2022-07-08 2022-07-12     Asian                  4
#> 2 2022-07-08 2022-07-12     Black                  6
#> 3 2022-07-08 2022-07-12     Hispanic               6
#> 4 2022-07-08 2022-07-12     Non-Hispanic White     6
#> 5 2022-07-08 2022-07-13     Asian                  2
#> 6 2022-07-08 2022-07-13     Black                  3

Both line-list and individual-level data can be used in the package. Also, the package allows for different time granularity. The denguedat contains counts per week while mpoxdat has counts per day. Both weekly and daily time frames are usable in diseasenowcasting.

2. Fitting the nowcast

You can use the nowcast() function to generate nowcasts for your data. In the following example we use only a fraction of denguedat: until October 1990 to speed things up (you can use the whole denguedat if you want):

#Reduce the dataset
denguedat_fraction <- denguedat |> 
  filter(report_week <= as.Date("1990/10/01", format = "%Y/%m/%d"))

For the nowcast we specify which one is the true_date and which one corresponds to the report_date. Remember that the true_date should always be smaller or equal to the report_date as the assumption is that no cases can be reported before happening:

Note For this example we are setting the method to variational just to improve the speed of the example. The sampling method should always be prefered.

#Generate the nowcast. Set verbose option t
ncast <- nowcast(denguedat_fraction, true_date = "onset_week", 
                 report_date = "report_week",
                 method = "variational") #<- Change to sampling when in production

The ncast object itself shows the characteristics of the fitted model:

ncast
#> 
#> ── Nowcast for 1990-10-01 ──
#> 
#> • Column with `true_date` = "onset_week"
#> • Column with `report_date` = "report_week"
#> • units = "weeks"
#> 
#> ── Epidemic effects:
#> The following effects are in place:
#>   • week_of_year
#> 
#> ── Delay effects:
#> No temporal effects are considered
#> 
#> Use the `summary` function to obtain the summary of predictions or `plot` to
#> generate an image
#>

You can generate plots of the nowcast with the plot function:

plot(ncast, title = "Nowcast")

And get the predictions with the summary function

summary(ncast)
#> # A tibble: 40 × 7
#>    onset_week  mean    sd median  `5%` `95%` Strata_unified
#>    <date>     <dbl> <dbl>  <dbl> <dbl> <dbl> <chr>         
#>  1 1990-01-01    61     0     61    61    61 No strata     
#>  2 1990-01-08    50     0     50    50    50 No strata     
#>  3 1990-01-15    44     0     44    44    44 No strata     
#>  4 1990-01-22    46     0     46    46    46 No strata     
#>  5 1990-01-29    39     0     39    39    39 No strata     
#>  6 1990-02-05    34     0     34    34    34 No strata     
#>  7 1990-02-12    24     0     24    24    24 No strata     
#>  8 1990-02-19    17     0     17    17    17 No strata     
#>  9 1990-02-26    17     0     17    17    17 No strata     
#> 10 1990-03-05    16     0     16    16    16 No strata     
#> # ℹ 30 more rows

You can see how to add holiday effects, change the priors, and add further options to the model, in the Advanced nowcast options.

For now we will show you how to evaluate a nowcast once it has been prepared:

3. Evaluating the model

Once a nowcast has been constructed you can use the [backtest()] function to evaluate what the current model would have predicted in the past vs what was actually observed. Here, for example, we are evaluating what the model predicted for May, June and July:

#This will take a while as it refits the model several times
btest <- backtest(ncast, start_date = as.Date("1990/05/01"), end_date = as.Date("1990/07/31"), refresh = 0, model_name='model_NB')

The backtest_metrics() function calculates the mean absolute error (mae), root mean squared error (rmse) and weighted interval scoring (wis):

#This will take a while as it refits the model several times
mtr <- backtest_metrics(btest)
mtr
#>       model        now horizon Strata_unified    mae   rmse       ape      wis
#> 1  model_NB 1990-05-07       0      No strata 16.587 16.587 3.3174000 7.821429
#> 2  model_NB 1990-05-14       0      No strata 13.524 13.524 1.9320000 5.000000
#> 3  model_NB 1990-05-21       0      No strata 16.621 16.621 5.5403333 7.672143
#> 4  model_NB 1990-05-28       0      No strata 14.775 14.775 2.9550000 6.564464
#> 5  model_NB 1990-06-04       0      No strata 16.846 16.846 5.6153333 8.278750
#> 6  model_NB 1990-06-11       0      No strata 14.631 14.631 2.4385000 6.786429
#> 7  model_NB 1990-06-18       0      No strata 14.240 14.240 2.3733333 6.671429
#> 8  model_NB 1990-06-25       0      No strata 11.011 11.011 1.0010000 4.128571
#> 9  model_NB 1990-07-02       0      No strata 13.246 13.246 2.2076667 5.942857
#> 10 model_NB 1990-07-09       0      No strata 13.130 13.130 1.6412500 5.064286
#> 11 model_NB 1990-07-16       0      No strata 13.359 13.359 2.2265000 5.500000
#> 12 model_NB 1990-07-23       0      No strata 12.644 12.644 1.8062857 4.757143
#> 13 model_NB 1990-07-30       0      No strata  3.393  3.393 0.1995882 1.714464
#>    overprediction underprediction dispersion bias interval_coverage_50
#> 1       6.1428571               0   1.678571 1.00                    0
#> 2       3.2857143               0   1.714286 0.90                    0
#> 3       6.0000000               0   1.672143 1.00                    0
#> 4       5.0000000               0   1.564464 1.00                    0
#> 5       6.7142857               0   1.564464 1.00                    0
#> 6       5.2714286               0   1.515000 1.00                    0
#> 7       5.2857143               0   1.385714 1.00                    0
#> 8       2.5714286               0   1.557143 0.90                    0
#> 9       4.5714286               0   1.371429 1.00                    0
#> 10      3.4285714               0   1.635714 0.90                    0
#> 11      4.0000000               0   1.500000 0.95                    0
#> 12      3.2857143               0   1.471429 0.90                    0
#> 13      0.2857143               0   1.428750 0.50                    1
#>    interval_coverage_90 ae_median
#> 1                     0        15
#> 2                     1        11
#> 3                     0        14
#> 4                     0        13
#> 5                     0        15
#> 6                     0        13
#> 7                     0        13
#> 8                     1        10
#> 9                     0        12
#> 10                    1        12
#> 11                    0        12
#> 12                    1        11
#> 13                    1         2

You can generate a summary and and plot backtest_metrics() object with thesummary and plot functions.

summary(mtr)
#> # A tibble: 11 × 8
#>    model    Strata_unified horizon metric                 avg    sd n_now     se
#>    <chr>    <chr>          <chr>   <chr>                <dbl> <dbl> <int>  <dbl>
#>  1 model_NB No strata      0       ae_median          11.8    3.30     13 0.914 
#>  2 model_NB No strata      0       ape                 2.56   1.56     13 0.431 
#>  3 model_NB No strata      0       bias                0.927  0.136    13 0.0378
#>  4 model_NB No strata      0       dispersion          1.54   0.111    13 0.0309
#>  5 model_NB No strata      0       interval_coverage…  0.0769 0.277    13 0.0769
#>  6 model_NB No strata      0       interval_coverage…  0.385  0.506    13 0.140 
#>  7 model_NB No strata      0       mae                13.4    3.45     13 0.958 
#>  8 model_NB No strata      0       overprediction      4.30   1.75     13 0.484 
#>  9 model_NB No strata      0       rmse               13.4    3.45     13 0.958 
#> 10 model_NB No strata      0       underprediction     0      0        13 0     
#> 11 model_NB No strata      0       wis                 5.84   1.78     13 0.493
plot(mtr, title = "Backtest")

backtest_metrics works with multiple backtest objects

#creating a new nowcast using a Poisson distribution
ncast2 <- nowcast(denguedat_fraction, true_date = "onset_week", 
                 report_date = "report_week",
                method = "variational", dist = "Poisson") 
btest2 <- backtest(ncast2, start_date = as.Date("1990/05/01"), end_date = as.Date("1990/07/31"), refresh = 0, model_name='model_Poisson') 

comparison_mtr <- backtest_metrics(btest, btest2)

generate a summary of the comparison with summary and generate a plot with plot

summary(comparison_mtr)
#> # A tibble: 22 × 8
#>    model    Strata_unified horizon metric                 avg    sd n_now     se
#>    <chr>    <chr>          <chr>   <chr>                <dbl> <dbl> <int>  <dbl>
#>  1 model_NB No strata      0       ae_median          11.8    3.30     13 0.914 
#>  2 model_NB No strata      0       ape                 2.56   1.56     13 0.431 
#>  3 model_NB No strata      0       bias                0.927  0.136    13 0.0378
#>  4 model_NB No strata      0       dispersion          1.54   0.111    13 0.0309
#>  5 model_NB No strata      0       interval_coverage…  0.0769 0.277    13 0.0769
#>  6 model_NB No strata      0       interval_coverage…  0.385  0.506    13 0.140 
#>  7 model_NB No strata      0       mae                13.4    3.45     13 0.958 
#>  8 model_NB No strata      0       overprediction      4.30   1.75     13 0.484 
#>  9 model_NB No strata      0       rmse               13.4    3.45     13 0.958 
#> 10 model_NB No strata      0       underprediction     0      0        13 0     
#> # ℹ 12 more rows
plot(comparison_mtr, title = "Backtest comparison")

4. Updating with new data

The update() function takes an existing nowcast and updates it with new_data to generate predictions for a more recent now. For example, consider now the denguedat information with one additional new week:

#This new dataset contains a new additional week
denguedat_fraction_updated <- denguedat |> 
  filter(report_week <= as.Date("1990/10/08", format = "%Y/%m/%d"))

We can update our nowcast with this new data:

ncast_2 <- update(ncast, new_data = denguedat_fraction_updated)

And do the same operations as before:

plot(ncast_2, title = "Updated nowcast")