FasteR
functions with memoise
memoise
.
Hey, stop calling me!
Memoisation is a technique for speeding up functions via memorization of previously calculated results. To better explain the idea letâs consider a recursive formulation of the Fibonacci sequence:
\[ f(n) = f(n- 1) + f(n-2) \]
with \(f(1) = 1 = f(2)\). 1
An implementation of the function is given by:
<- function(n){
fibonacci if(n <= 2){
return(1)
else {
} return(
fibonacci(n - 1) + fibonacci(n - 2)
)
} }
We can calculate the time it takes to estimate the number up to 20
:
::microbenchmark(fibonacci(20)) microbenchmark
Unit: milliseconds
expr min lq mean median uq max neval
fibonacci(20) 4.633172 5.810971 6.92568 6.418399 7.803796 17.74069 100
Larger values (like fibonacci(100)
) start taking a lot of time. And thatâs because fibonacci(100)
estimates the same values several times! To illustrate this point, consider fibonacci(5)
. You can see that fibonacci(3)
is estimated twice: once under fibonacci(5)
itself and one under fibonacci(4)
. This is extremely inefficient!
graph TD f5[fibonacci 5] --> f4[fibonacci 4] f5[fibonacci 5] --> f3[fibonacci 3] f3 --> f13[fibonacci 1] f3 --> f23[fibonacci 2] f4 --> f32[fibonacci 3] f4 --> f2[fibonacci 2] f32 --> f21[fibonacci 2] f32 --> f11[fibonacci 1]
We can calculate how many times the function fibonacci
is called when estimating different numbers. In theory it should scale linearly i.e. fibonacci(20)
should only calculate fibonacci(1)
, fibonacci(2)
, etc up to fibonacci(19)
once. Hence the function should be called at most 20 times. However this isnât the case:
<- 0
calls_f <- function(n){
fibonacci_calls <<- calls_f + 1
calls_f if(n <= 2){
return(1)
else {
} return(
fibonacci_calls(n - 1) + fibonacci_calls(n - 2)
)
}
}invisible(fibonacci_calls(20))
cat(paste("fibonacci_calls was called",calls_f,"times"))
fibonacci_calls was called 13529 times
The algorithm is extremely inefficient because it doesnât remember when estimating fibonacci_calls(8)
that it already has estimated fibonacci_calls(7)
during its estimation of fibonacci_calls(9)
. Memoisation is a solution for this forgetfulness.
Hard coded memoisation
As we have seen, fibonacci(20)
calls the fibonacci
function 13,529 times. This inefficiency could be saved if the function could memorize that it has already calculated the previous results. That is the utility of the memoisation trick. To save these memorizations (memoise!), letâs create a global vector where weâll keep the previous results of the fibonacci
function. That is, the j
-th entry of our vector will correspond to fibonacci(j)
.
#Fibonacci cache vector up til fibonacci 1000
#fibonacci_cache[j] = fibonacci(j)
<- rep(NA, 1000)
fibonacci_cache
#fibonacci_cache[1] = fibonacci(1) and fibonacci_cache[2] = fibonacci(2)
1:2] <- c(1, 1)
fibonacci_cache[
<- function(n){
memoised_fibonacci #Only calculate the values we haven't previously estimated
if (is.na(fibonacci_cache[n])){
<<- memoised_fibonacci(n - 1) + memoised_fibonacci(n - 2)
fibonacci_cache[n]
} return(fibonacci_cache[n])
}
Letâs compare the speed of the previous function with this one:
::microbenchmark(fibonacci(20), memoised_fibonacci(20)) microbenchmark
Unit: nanoseconds
expr min lq mean median uq max
fibonacci(20) 3942361 4796700.5 5059247.18 4838703.0 4977222 8489579
memoised_fibonacci(20) 398 589.5 2730.04 1642.5 3855 28640
neval
100
100
This speed-up happens because the new memoised_fibonacci
only calculates the value 18
times! In contrast with the previous 13,529. For any custom function you build you can memoise
this way ooooor you can let the memoise
package do it for you.
The memoise
package
The memoise
package does exactly what we did in the previous section by automatically memoising functions (with arguments that arenât necessarily numbers). It sets limits to the memory (our fibonacci_cache
) as well as the amount of time a function will remember previous results (time limit). To memoise a function you just need to call memoise
over it:
library(memoise)
<- function(n){
mfibo if(n <= 2){
return(1)
else {
} return(
mfibo(n - 1) + mfibo(n - 2) #It's important to call this with the memoized name
)
}
}<- memoise(mfibo) #Memoization line mfibo
Note that this memoisation
has additional overhead over the memoisation we did in Section 2 because memoise
works even for non numeric arguments (which would have failed in our vectorized example). However the speed-up over the original is still there:
::microbenchmark(fibonacci(20), mfibo(20), memoised_fibonacci(20)) microbenchmark
Unit: nanoseconds
expr min lq mean median uq max
fibonacci(20) 5291457 6031574.5 7138627.87 6599172 7929635.0 14128101
mfibo(20) 58632 72422.5 151487.39 139772 190614.5 569502
memoised_fibonacci(20) 606 1206.0 5186.56 3470 7506.5 39376
neval
100
100
100
The memoise
package uses cachem which allows for fine control over where the previous results of the function. You can substitute the #Memoization line
in the previous code for this memoise with finer control.
#Set memory to 100MB and time to memorizing only for 15 minutes
<- cachem::cache_mem(max_size = 100 * 1024^2, max_age = 15 * 60)
cm <- memoise(mfibo, cache = cm) mfibo
To keep previous computations across different R sessions you can cache directly to disk (slower) instead of memory:
<- cachem::cache_disk(max_size = 100 * 1024^2, max_age = 15 * 60)
cm <- memoise(mfibo, cache = cm) mfibo
And in other languages?
You can also memoise in the most recent versions of Python either vie the built in functools
or the memoization
project. In Julia you can Memoize.jl. Let me know if you are interested in an entry for any of these.
Footnotes
Some definitions start with \(f(0)\) but as
R
indexes vectors in \(1\) weâd better start with \(1\).â©ïž