Nowcasting methods
Intuitive explanation
We consider the problem of reporting the number of cases of a given disease. Test results are reported with a delay, so on any given day we see only some of the cases that occurred that day; the remaining cases for that day are reported over the following days.
Figure 1 shows this idea, assuming a maximum delay of 6 days. By today, April 11th, all of the data for April 2nd has already arrived: the data that arrived with zero delay (report Apr. 2) as well as the data that arrived with a delay of 1 (report Apr. 3), 2 (report Apr. 4), 3 (report Apr. 5), and so on, up to the data reported on April 8th with the maximum delay of 6. For other dates not all of the data has arrived yet. For April 9th we only see the data with delays 0 to 2 (the last of these reported on April 11th). Data with a delay larger than 2 will be seen in the future.
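The bookkeeping behind this picture can be sketched in a few lines of Python. The function name `observed_delays` and the year 2024 are hypothetical choices made only for illustration; the only assumptions taken from the text are that a case for a given date with delay d is reported d days later, and that the maximum delay is 6.

```python
from datetime import date, timedelta

MAX_DELAY = 6  # maximum reporting delay in days, as in the example above


def observed_delays(event_day, today, max_delay=MAX_DELAY):
    """Return the delays already reported for `event_day` as of `today`.

    A case that occurred on `event_day` with delay d is reported on
    event_day + d, so it is visible only if that date is not after `today`.
    """
    return [d for d in range(max_delay + 1)
            if event_day + timedelta(days=d) <= today]


today = date(2024, 4, 11)  # hypothetical year; only the day and month matter
complete = observed_delays(date(2024, 4, 2), today)  # a fully reported date
partial = observed_delays(date(2024, 4, 10), today)  # a recent date
print(complete)
print(partial)
```

Dates far enough in the past return every delay from 0 to the maximum, while recent dates return only the first few; the missing delays are exactly what a nowcast has to fill in.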
The idea of the model is to predict the number of cases that will be seen at time $t$, denoted $n_t$, as the sum of the delayed cases $n_{t,d}$ over all delays $d$. This includes both the cases we've already observed and the cases we haven't yet observed.
The main objective is then to model $n_{t,d}$, the number of cases at time $t$ that will appear with delay $d$. What we actually model is the log expected value of $n_{t,d}$, that is, $\log \mathbb{E}[n_{t,d}]$. The average of this variable is driven by a process composed of two elements: a time-dependent process and a delay-time-dependent process. You can think of the time-dependent process as the process that drives the epidemic curve, while the delay-time-dependent process is the process that drives the testing (or changes in the testing).
Each of the processes can be further decomposed into a trend, a season (or multiple seasons), and a cycle.
The model thus captures seasonality both for the epidemic curve and for the delay curve as well as their own trends and noise.
Mathematical explanation
Let $n_{t,d}^s$ denote the number of incident cases (individuals) in stratum $s$ (say race/gender) at time $t$ who were reported with a delay $d$. That is, $n_{t,0}^s$ denotes the number of individuals diseased at time $t$ that were reported at moment $t$, $n_{t,1}^s$ the number of individuals diseased at time $t$ that were reported at moment $t+1$, and in general $n_{t,d}^s$ the number of individuals diseased at time $t$ that were reported at moment $t+d$.
We assume that the expected value of the count $n_{t,d}^s$ is given by $$\mathbb{E}\left[n_{t,d}^s\right] = \exp\left(\ell_{t,d}^s\right),$$ where the log-mean $\ell_{t,d}^s$ follows a linear state-space model with covariates (see Durbin and Koopman (2012)) driven by two latent processes: a time-dependent one and a delay-dependent one.
The system is defined for every time $t$, delay $d$, and stratum $s$. Table 1 describes each of the variables and their dimensions.
Variable(s) | Dimension |
---|---|
, | |
, | |
, | |
In this model, ,, ,,, and are given. Variables , , and represent random (correlated) noise; is a vector of known covariates, is a vector of unknown parameters. Additionally, and are also unknown parameters.
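The total number of incident cases for stratum $s$ expected to occur at time $t$ is the sum of the delay-specific counts $n_{t,d}^s$ over all delays: $$n_t^s = \sum_{d=0}^{D} n_{t,d}^s,$$ where $D$ is the maximum delay. At time $t$, the predicted number of cases with delay $d$ of stratum $s$ is denoted by $\hat{n}_{t,d}^s$. Finally, the predicted number of cases (nowcasted cases) in stratum $s$ for time $t$ is estimated by: $$\hat{n}_t^s = \sum_{d=0}^{d^*_t} n_{t,d}^s + \sum_{d=d^*_t+1}^{D} \hat{n}_{t,d}^s,$$ where $d^*_t$ denotes the latest delay observed with the current data.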
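As a sketch of this last step, the nowcast for one stratum and one date adds the counts already reported to the predicted counts for the delays that have not been reported yet. The function `nowcast_total` and all the numbers are hypothetical; how the predictions themselves are produced by the state-space model is not shown here.

```python
def nowcast_total(observed, predicted):
    """Nowcast for a single date and stratum.

    observed  -- counts already reported, indexed by delay 0, 1, ..., d_star
    predicted -- model predictions for every delay 0, 1, ..., max_delay
    Observed delays keep their reported counts; only the delays after the
    latest observed one are filled in with predictions.
    """
    if len(observed) > len(predicted):
        raise ValueError("more observed delays than the maximum delay allows")
    unobserved = predicted[len(observed):]
    return sum(observed) + sum(unobserved)


# Made-up numbers: maximum delay 6, delays 0 to 2 already reported.
observed = [10, 4, 3]
predicted = [9.5, 4.2, 2.8, 1.5, 1.0, 0.5, 0.25]
total = nowcast_total(observed, predicted)
print(total)
```

Once every delay up to the maximum has been reported, the list of unobserved delays is empty and the nowcast is simply the sum of the reported counts.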
We should add the option for zero-inflation
Construction of the model matrices
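The vectors and matrices of both latent processes can be constructed by blocks. Since the construction is the same for the two processes, in what follows we'll only show how it works for a generic set of vectors and matrices, which stands for either the set of the time-dependent process or the set of the delay-dependent process.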
The general idea is that each of these vectors and matrices can be constructed from three blocks: a trend, a seasonality, and a cyclical component.
In this notation, if a component of the model is not specified, then its empty block is not considered. For example, in a model without seasonality only the trend and cycle blocks remain, and the definitions of the other vectors and matrices follow the same pattern.
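A minimal sketch of a block construction of this kind is given below. It assumes the usual convention for structural time series models (Durbin and Koopman (2012)): vector blocks are concatenated, square matrix blocks are placed along the diagonal, and a component that is not specified contributes nothing. The helper names `concat_vectors` and `block_diag` are hypothetical, and the numbers inside the blocks are arbitrary placeholders rather than the blocks defined in the following sections.

```python
def concat_vectors(blocks):
    """Concatenate the vector blocks that are present (None = not specified)."""
    return [x for block in blocks if block is not None for x in block]


def block_diag(blocks):
    """Place square matrix blocks along the diagonal, skipping missing ones."""
    present = [b for b in blocks if b is not None]
    size = sum(len(b) for b in present)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in present:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(b)
    return out


# Placeholder blocks: 1-dimensional trend, no seasonality, 2-dimensional cycle.
trend_vec, season_vec, cycle_vec = [1.0], None, [1.0, 0.0]
trend_mat, season_mat, cycle_mat = [[1.0]], None, [[0.5, 0.3], [-0.3, 0.5]]

vec = concat_vectors([trend_vec, season_vec, cycle_vec])
mat = block_diag([trend_mat, season_mat, cycle_mat])
print(vec)
for row in mat:
    print(row)
```

Passing `None` for the seasonal block gives a 3 by 3 matrix made of the trend and cycle blocks only, which mirrors the model without seasonality described above.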
Trend
The trend describes the general direction of the process. There are three trend options:
Constant trend
The constant trend model is given by: in which case , , , and .
Local linear trend
The simplest local linear trend model is given by: in which case , , , and . Notice that if we recover the constant trend model.
Local trend of degree
In general (for smoothing purposes), we can adjust the local linear trend of degree by fitting the model: where and in general . The model can be rewritten using the general formula for higher order (backward) differences as: where , , is a vector of zeroes with a one in the first entry, and
Developers: the local trend of general degree encompasses both the constant trend and the local linear trend as special cases. As this is the most general option, it is the one programmed.
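The general formula for higher order backward differences mentioned above states that the $k$-th difference of a series $x$ at time $t$ equals $\sum_{i=0}^{k} (-1)^i \binom{k}{i} x_{t-i}$. The sketch below checks numerically that this closed form agrees with applying the first difference $k$ times. The function names are hypothetical and the series is arbitrary; the code illustrates the difference formula only, not the trend model itself.

```python
from math import comb


def backward_difference_coefficients(k):
    """Coefficients c_0, ..., c_k such that the k-th backward difference of x
    at time t equals sum_i c_i * x[t - i], with c_i = (-1)**i * C(k, i)."""
    return [(-1) ** i * comb(k, i) for i in range(k + 1)]


def difference_closed_form(x, k):
    """k-th backward difference at the last index, via the closed form."""
    coefs = backward_difference_coefficients(k)
    return sum(c * x[len(x) - 1 - i] for i, c in enumerate(coefs))


def difference_recursive(x, k):
    """k-th backward difference at the last index, differencing k times."""
    for _ in range(k):
        x = [b - a for a, b in zip(x, x[1:])]
    return x[-1]


series = [1, 4, 9, 16, 25, 36]  # arbitrary example: the first six squares
coefs3 = backward_difference_coefficients(3)
closed = [difference_closed_form(series, k) for k in range(4)]
recursive = [difference_recursive(series, k) for k in range(4)]
print(coefs3)
print(closed, recursive)
```

Both computations require the series to have more than $k$ points.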
Seasonality
There are two types of seasonality considered in this model:
Discrete seasonality
We assume that there are given seasons of length . For example if represents days we can have (weekly) seasons each of length (as each week contains days). This is represented in the following baseline model: where is a gaussian white noise. This expression can be represented in matrix form with and with if is an integer and otherwise. The vector has length . It is defined initially for such that
Matrix is given by the following expression:
where the first row of has a every th column starting from column until column . The last entry of the first row of is . For the rest of the entries, has zeroes except in the entries () where it has value . And the last entry of the last row column of is also .
Example
Consider a seasonality of seasons each of length . We then have and then:
Substituting we obtain that next component is: defining and using the fact that the sum of independent gaussians is independent from one of its summands we recover the original expression.
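For intuition, here is a minimal sketch of the textbook dummy-variable seasonal component (Durbin and Koopman (2012)), in which every season lasts exactly one time step and, without noise, the seasonal effects sum to zero over a full period. It is meant only to illustrate the mechanism and does not reproduce the more general construction of this section, where a season may span several time steps. Function names and numbers are hypothetical.

```python
def dummy_seasonal_transition(n_seasons):
    """Transition matrix of the textbook dummy seasonal component.

    The state holds the last n_seasons - 1 seasonal effects. The first row
    is all -1 (the new effect is minus the sum of the previous ones) and the
    sub-diagonal of ones shifts the older effects down by one position.
    """
    size = n_seasons - 1
    mat = [[0] * size for _ in range(size)]
    mat[0] = [-1] * size
    for i in range(1, size):
        mat[i][i - 1] = 1
    return mat


def step(mat, state):
    """One noise-free transition: a matrix-vector product."""
    return [sum(a * s for a, s in zip(row, state)) for row in mat]


n_seasons = 4
transition = dummy_seasonal_transition(n_seasons)
state = [3, -1, -4]           # arbitrary starting effects, most recent first
effects = []
for _ in range(2 * n_seasons):
    effects.append(state[0])  # the first entry is the current seasonal effect
    state = step(transition, state)
print(transition)
print(effects)
```

Without noise the sequence of effects repeats every `n_seasons` steps; adding white noise to the first state entry lets the seasonal pattern drift over time.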
Trigonometric seasonality
A different approach is to use harmonic functions. In particular, let define the period (number of time frames in a cycle) and define and use the following baseline model Following Durbin and Koopman (2012) we can write: with , the identity, and where and
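A minimal sketch of the trigonometric form in Durbin and Koopman (2012), where harmonic $j$ of a seasonal pattern with period $s$ has frequency $2\pi j / s$ and evolves through a 2 by 2 rotation matrix. The function names, the weekly period, and the starting values are hypothetical, and the noise terms are omitted.

```python
from math import cos, sin, pi


def rotation_block(frequency):
    """2 x 2 transition block of one harmonic: a rotation by `frequency`
    radians per time step."""
    c, s = cos(frequency), sin(frequency)
    return [[c, s],
            [-s, c]]


def step(block, state):
    """One noise-free transition of the two-dimensional harmonic state."""
    return [block[0][0] * state[0] + block[0][1] * state[1],
            block[1][0] * state[0] + block[1][1] * state[1]]


period = 7                               # e.g. a weekly pattern in daily data
block = rotation_block(2 * pi / period)  # first harmonic, j = 1
start = [1.0, 0.5]                       # arbitrary starting values
state = list(start)
path = []
for _ in range(period):
    path.append(state[0])                # first entry: the seasonal effect
    state = step(block, state)
print(path)
print(state)
```

After one full period the state has rotated by a complete turn and is back at its starting values, up to floating-point error.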
Multiple seasons
Multiple seasonalities can be combined, each as its own block, to construct the seasonal block, which is then used in the definitions of the corresponding vectors and matrices.
Cycle
Cycles represent rises and falls that do not have a fixed period. They are usually longer than seasonal patterns: for example, an epidemic wave might be a cycle, while daily effects are seasonal. A cycle is modelled as a trigonometric season with an unknown frequency and a damping factor.
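A minimal sketch of a damped cycle, following the textbook form in Durbin and Koopman (2012): a 2 by 2 rotation by the cycle frequency, multiplied by a damping factor. The values of `frequency` and `damping` are hypothetical and fixed here only for illustration (in the model the frequency is unknown); the noise terms are omitted.

```python
from math import cos, sin, hypot


def cycle_block(frequency, damping):
    """2 x 2 transition block of a damped cycle: a rotation by `frequency`
    radians per time step, shrunk by `damping` at every step."""
    c, s = cos(frequency), sin(frequency)
    return [[damping * c, damping * s],
            [-damping * s, damping * c]]


def step(block, state):
    """One noise-free transition of the two-dimensional cycle state."""
    return [block[0][0] * state[0] + block[0][1] * state[1],
            block[1][0] * state[0] + block[1][1] * state[1]]


frequency = 0.3  # hypothetical value, radians per time step
damping = 0.9    # hypothetical value between 0 and 1
block = cycle_block(frequency, damping)
state = [2.0, 0.0]
amplitudes = []
for _ in range(5):
    amplitudes.append(hypot(state[0], state[1]))  # current amplitude
    state = step(block, state)
print(amplitudes)
```

With a damping factor below one, the amplitude shrinks geometrically, so without new noise the wave dies out instead of repeating forever as a seasonal component would.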