
Introduction to diseasenowcasting
Source:vignettes/articles/diseasenowcasting.Rmd
diseasenowcasting.Rmd
The diseasenowcasting
package provides a
disease-agnostic framework for nowcasting the number of cases affected
by certain disease.
What is nowcasting?
In many applications, it is of interest to predict the current number of cases as there might be a lag in the registry. Examples of lags can be administrative such as tests taking time to yield results, sociological such as patients taking time to report at their healthcare facility after symptom onset or biological: symptoms not happening at the same time as colonization. In such cases you might end up with registries looking like the one in Figure 1: where the number of cases observed by today is an incomplete fraction of the overall number of cases that will eventually be registered.

Current number of cases for a certain disease (left) vs the eventual number of registered cases for that disease (right)
The difference between the current observed number of cases and the final number of cases that will eventually be reported is accounted for by a model. By nowcasting, we refer to predicting the number of cases for a specific disease in a date that has already happened (or is currently happening) and for which we have incomplete information. The most basic example involves predicting the number of cases for now given that we have already seen some of today’s cases arrive at our system but not all of them yet. Hence the nowcast refers to the number of cases that we will eventually observe for a certain time frame as depicted in Figure 2:

Nowcasted number of cases for the disease of the previous image given that we had only observed a fraction of them
This vignette will give you an overview of the basics of the
diseasenowcasting
package. The typical workflow involves
the following steps:
In this tutorial we will focus only on the steps in a reproducible manner and we will refer to specific vignettes for advanced usage on each of the steps.
set.seed(26758)
library(diseasenowcasting)
library(dplyr)
1. Getting the data
Data for the nowcasting
function should be either
individual-level data or line-list data. In any case at least two
columns in Date
format are required: the
true_date
and the report_date
.
-
true_date
Refers to the date when the case happened. -
report_date
Refers to the date when the case was reported.
In general we assume that the
true_date
always happened before thereport_date
as you cannot report something that hasn’t already happened.
Types of data
Individual-level data
By individual-level data we mean a dataset for which each row
represents a case. For example, the denguedat
dataset
included in the package contains individual-level data:
#Call the dataset
data("denguedat")
#Preview
denguedat |> head()
#> onset_week report_week gender
#> 1 1990-01-01 1990-01-01 Male
#> 2 1990-01-01 1990-01-01 Female
#> 3 1990-01-01 1990-01-01 Female
#> 4 1990-01-01 1990-01-08 Female
#> 5 1990-01-01 1990-01-08 Male
#> 6 1990-01-01 1990-01-15 Female
Notice that beyond the two dates (here
true_date = "onset_week"
and
report_date = "report_week"
) there can be additional
covariates in the data (such as gender
in this
example).
Line-list data
By line-list data we mean a dataset for which each row represents a
count of cases For example, the mpoxdat
dataset included in
the package contains line-list data:
#Call the dataset
data("mpoxdat")
#Preview
mpoxdat |> head()
#> # A tibble: 6 × 4
#> # Rowwise:
#> dx_date dx_report_date race n
#> <date> <date> <chr> <int>
#> 1 2022-07-08 2022-07-12 Asian 4
#> 2 2022-07-08 2022-07-12 Black 6
#> 3 2022-07-08 2022-07-12 Hispanic 6
#> 4 2022-07-08 2022-07-12 Non-Hispanic White 6
#> 5 2022-07-08 2022-07-13 Asian 2
#> 6 2022-07-08 2022-07-13 Black 3
Both line-list and individual-level data can be used in the package.
Also, the package allows for different time granularity. The
denguedat
contains counts per week while
mpoxdat
has counts per day. Both weekly and daily time
frames are usable in diseasenowcasting
.
2. Fitting the nowcast
You can use the nowcast()
function to generate nowcasts
for your data. In the following example we use only a fraction of
denguedat
: until October 1990 to speed things up (you can
use the whole denguedat if you want):
#Reduce the dataset
denguedat_fraction <- denguedat |>
filter(report_week <= as.Date("1990/10/01", format = "%Y/%m/%d"))
For the nowcast we specify which one is the true_date
and which one corresponds to the report_date
. Remember that
the true_date
should always be smaller or equal to the
report_date
as the assumption is that no cases can be
reported before happening:
Note For this example we are setting the method to variational just to improve the speed of the example. The sampling method should always be prefered.
#Generate the nowcast. Set verbose option t
ncast <- nowcast(denguedat_fraction, true_date = "onset_week",
report_date = "report_week",
method = "variational") #<- Change to sampling when in production
The ncast
object itself shows the characteristics of the
fitted model:
ncast
#>
#> ── Nowcast for 1990-10-01 ──
#>
#> • Column with `true_date` = "onset_week"
#> • Column with `report_date` = "report_week"
#> • units = "weeks"
#>
#> ── Epidemic effects:
#> The following effects are in place:
#> • week_of_year
#>
#> ── Delay effects:
#> No temporal effects are considered
#>
#> Use the `summary` function to obtain the summary of predictions or `plot` to
#> generate an image
#>
You can generate plots of the nowcast
with the
plot
function:
plot(ncast, title = "Nowcast")
And get the predictions with the summary function
summary(ncast)
#> # A tibble: 40 × 7
#> onset_week mean sd median `5%` `95%` Strata_unified
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 1990-01-01 61 0 61 61 61 No strata
#> 2 1990-01-08 50 0 50 50 50 No strata
#> 3 1990-01-15 44 0 44 44 44 No strata
#> 4 1990-01-22 46 0 46 46 46 No strata
#> 5 1990-01-29 39 0 39 39 39 No strata
#> 6 1990-02-05 34 0 34 34 34 No strata
#> 7 1990-02-12 24 0 24 24 24 No strata
#> 8 1990-02-19 17 0 17 17 17 No strata
#> 9 1990-02-26 17 0 17 17 17 No strata
#> 10 1990-03-05 16 0 16 16 16 No strata
#> # ℹ 30 more rows
You can see how to add holiday effects, change the priors, and add further options to the model, in the Advanced nowcast options.
For now we will show you how to evaluate a nowcast once it has been prepared:
3. Evaluating the model
Once a nowcast has been constructed you can use the [backtest()] function to evaluate what the current model would have predicted in the past vs what was actually observed. Here, for example, we are evaluating what the model predicted for May, June and July:
#This will take a while as it refits the model several times
btest <- backtest(ncast, start_date = as.Date("1990/05/01"), end_date = as.Date("1990/07/31"), refresh = 0, model_name='model_NB')
The backtest_metrics()
function calculates the mean
absolute error (mae
), root mean squared error
(rmse
) and weighted interval scoring
(wis
):
#This will take a while as it refits the model several times
mtr <- backtest_metrics(btest)
mtr
#> model now horizon Strata_unified mae rmse ape wis
#> 1 model_NB 1990-05-07 0 No strata 16.587 16.587 3.3174000 7.821429
#> 2 model_NB 1990-05-14 0 No strata 13.524 13.524 1.9320000 5.000000
#> 3 model_NB 1990-05-21 0 No strata 16.621 16.621 5.5403333 7.672143
#> 4 model_NB 1990-05-28 0 No strata 14.775 14.775 2.9550000 6.564464
#> 5 model_NB 1990-06-04 0 No strata 16.846 16.846 5.6153333 8.278750
#> 6 model_NB 1990-06-11 0 No strata 14.631 14.631 2.4385000 6.786429
#> 7 model_NB 1990-06-18 0 No strata 14.240 14.240 2.3733333 6.671429
#> 8 model_NB 1990-06-25 0 No strata 11.011 11.011 1.0010000 4.128571
#> 9 model_NB 1990-07-02 0 No strata 13.246 13.246 2.2076667 5.942857
#> 10 model_NB 1990-07-09 0 No strata 13.130 13.130 1.6412500 5.064286
#> 11 model_NB 1990-07-16 0 No strata 13.359 13.359 2.2265000 5.500000
#> 12 model_NB 1990-07-23 0 No strata 12.644 12.644 1.8062857 4.757143
#> 13 model_NB 1990-07-30 0 No strata 3.393 3.393 0.1995882 1.714464
#> overprediction underprediction dispersion bias interval_coverage_50
#> 1 6.1428571 0 1.678571 1.00 0
#> 2 3.2857143 0 1.714286 0.90 0
#> 3 6.0000000 0 1.672143 1.00 0
#> 4 5.0000000 0 1.564464 1.00 0
#> 5 6.7142857 0 1.564464 1.00 0
#> 6 5.2714286 0 1.515000 1.00 0
#> 7 5.2857143 0 1.385714 1.00 0
#> 8 2.5714286 0 1.557143 0.90 0
#> 9 4.5714286 0 1.371429 1.00 0
#> 10 3.4285714 0 1.635714 0.90 0
#> 11 4.0000000 0 1.500000 0.95 0
#> 12 3.2857143 0 1.471429 0.90 0
#> 13 0.2857143 0 1.428750 0.50 1
#> interval_coverage_90 ae_median
#> 1 0 15
#> 2 1 11
#> 3 0 14
#> 4 0 13
#> 5 0 15
#> 6 0 13
#> 7 0 13
#> 8 1 10
#> 9 0 12
#> 10 1 12
#> 11 0 12
#> 12 1 11
#> 13 1 2
You can generate a summary and and plot
backtest_metrics()
object with thesummary
and
plot
functions.
summary(mtr)
#> # A tibble: 11 × 8
#> model Strata_unified horizon metric avg sd n_now se
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <int> <dbl>
#> 1 model_NB No strata 0 ae_median 11.8 3.30 13 0.914
#> 2 model_NB No strata 0 ape 2.56 1.56 13 0.431
#> 3 model_NB No strata 0 bias 0.927 0.136 13 0.0378
#> 4 model_NB No strata 0 dispersion 1.54 0.111 13 0.0309
#> 5 model_NB No strata 0 interval_coverage… 0.0769 0.277 13 0.0769
#> 6 model_NB No strata 0 interval_coverage… 0.385 0.506 13 0.140
#> 7 model_NB No strata 0 mae 13.4 3.45 13 0.958
#> 8 model_NB No strata 0 overprediction 4.30 1.75 13 0.484
#> 9 model_NB No strata 0 rmse 13.4 3.45 13 0.958
#> 10 model_NB No strata 0 underprediction 0 0 13 0
#> 11 model_NB No strata 0 wis 5.84 1.78 13 0.493
plot(mtr, title = "Backtest")
backtest_metrics
works with multiple
backtest
objects
#creating a new nowcast using a Poisson distribution
ncast2 <- nowcast(denguedat_fraction, true_date = "onset_week",
report_date = "report_week",
method = "variational", dist = "Poisson")
btest2 <- backtest(ncast2, start_date = as.Date("1990/05/01"), end_date = as.Date("1990/07/31"), refresh = 0, model_name='model_Poisson')
comparison_mtr <- backtest_metrics(btest, btest2)
generate a summary of the comparison with summary
and
generate a plot with plot
summary(comparison_mtr)
#> # A tibble: 22 × 8
#> model Strata_unified horizon metric avg sd n_now se
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <int> <dbl>
#> 1 model_NB No strata 0 ae_median 11.8 3.30 13 0.914
#> 2 model_NB No strata 0 ape 2.56 1.56 13 0.431
#> 3 model_NB No strata 0 bias 0.927 0.136 13 0.0378
#> 4 model_NB No strata 0 dispersion 1.54 0.111 13 0.0309
#> 5 model_NB No strata 0 interval_coverage… 0.0769 0.277 13 0.0769
#> 6 model_NB No strata 0 interval_coverage… 0.385 0.506 13 0.140
#> 7 model_NB No strata 0 mae 13.4 3.45 13 0.958
#> 8 model_NB No strata 0 overprediction 4.30 1.75 13 0.484
#> 9 model_NB No strata 0 rmse 13.4 3.45 13 0.958
#> 10 model_NB No strata 0 underprediction 0 0 13 0
#> # ℹ 12 more rows
plot(comparison_mtr, title = "Backtest comparison")
4. Updating with new data
The update()
function takes an existing nowcast and
updates it with new_data
to generate predictions for a more
recent now. For example, consider now the denguedat
information with one additional new week:
#This new dataset contains a new additional week
denguedat_fraction_updated <- denguedat |>
filter(report_week <= as.Date("1990/10/08", format = "%Y/%m/%d"))
We can update our nowcast with this new data:
ncast_2 <- update(ncast, new_data = denguedat_fraction_updated)
And do the same operations as before:
plot(ncast_2, title = "Updated nowcast")