FasteR functions with memoise

R
memoise
Advanced R / R avanzado
programming tips / tips de programaciĂłn
Author

Rodrigo Zepeda-Tello

Published

December 27, 2022

Abstract
In this entry I discuss how to speed up recursive R functions with the memory-trick of memoise.

Hey, stop calling me!

Memoisation is a technique for speeding up functions via memorization of previously calculated results. To better explain the idea let’s consider a recursive formulation of the Fibonacci sequence:

\[ f(n) = f(n- 1) + f(n-2) \]

with \(f(1) = 1 = f(2)\). 1

An implementation of the function is given by:

fibonacci <- function(n){
  if(n <= 2){
    return(1)
  } else {
    return(
      fibonacci(n - 1) + fibonacci(n - 2)
    )
  }
}

We can calculate the time it takes to estimate the number up to 20:

microbenchmark::microbenchmark(fibonacci(20))
Unit: milliseconds
          expr      min       lq    mean   median       uq      max neval
 fibonacci(20) 4.633172 5.810971 6.92568 6.418399 7.803796 17.74069   100

Larger values (like fibonacci(100)) start taking a lot of time. And that’s because fibonacci(100) estimates the same values several times! To illustrate this point, consider fibonacci(5). You can see that fibonacci(3) is estimated twice: once under fibonacci(5) itself and one under fibonacci(4). This is extremely inefficient!

graph TD
    f5[fibonacci 5] --> f4[fibonacci 4]
    f5[fibonacci 5] --> f3[fibonacci 3]
    f3 --> f13[fibonacci 1]
    f3 --> f23[fibonacci 2]
    f4 --> f32[fibonacci 3]
    f4 --> f2[fibonacci 2]
    f32 --> f21[fibonacci 2]
    f32 --> f11[fibonacci 1]

We can calculate how many times the function fibonacci is called when estimating different numbers. In theory it should scale linearly i.e. fibonacci(20) should only calculate fibonacci(1), fibonacci(2), etc up to fibonacci(19) once. Hence the function should be called at most 20 times. However this isn’t the case:

calls_f   <- 0
fibonacci_calls <- function(n){
  calls_f <<- calls_f + 1
  if(n <= 2){
    return(1)
  } else {
    return(
      fibonacci_calls(n - 1) + fibonacci_calls(n - 2)
    )
  }
}
invisible(fibonacci_calls(20))

cat(paste("fibonacci_calls was called",calls_f,"times"))
fibonacci_calls was called 13529 times

The algorithm is extremely inefficient because it doesn’t remember when estimating fibonacci_calls(8) that it already has estimated fibonacci_calls(7) during its estimation of fibonacci_calls(9). Memoisation is a solution for this forgetfulness.

Hard coded memoisation

As we have seen, fibonacci(20) calls the fibonacci function 13,529 times. This inefficiency could be saved if the function could memorize that it has already calculated the previous results. That is the utility of the memoisation trick. To save these memorizations (memoise!), let’s create a global vector where we’ll keep the previous results of the fibonacci function. That is, the j-th entry of our vector will correspond to fibonacci(j).

#Fibonacci cache vector up til fibonacci 1000 
#fibonacci_cache[j] = fibonacci(j)
fibonacci_cache <- rep(NA, 1000)

#fibonacci_cache[1] = fibonacci(1) and fibonacci_cache[2] = fibonacci(2) 
fibonacci_cache[1:2] <- c(1, 1)

memoised_fibonacci <- function(n){
  #Only calculate the values we haven't previously estimated
  if (is.na(fibonacci_cache[n])){
    fibonacci_cache[n] <<- memoised_fibonacci(n - 1) + memoised_fibonacci(n - 2)
  } 
  return(fibonacci_cache[n])
}

Let’s compare the speed of the previous function with this one:

microbenchmark::microbenchmark(fibonacci(20), memoised_fibonacci(20))
Unit: nanoseconds
                   expr     min        lq       mean    median      uq     max
          fibonacci(20) 3942361 4796700.5 5059247.18 4838703.0 4977222 8489579
 memoised_fibonacci(20)     398     589.5    2730.04    1642.5    3855   28640
 neval
   100
   100

This speed-up happens because the new memoised_fibonacci only calculates the value 18 times! In contrast with the previous 13,529. For any custom function you build you can memoise this way ooooor you can let the memoise package do it for you.

The memoise package

The memoise package does exactly what we did in the previous section by automatically memoising functions (with arguments that aren’t necessarily numbers). It sets limits to the memory (our fibonacci_cache) as well as the amount of time a function will remember previous results (time limit). To memoise a function you just need to call memoise over it:

library(memoise)
mfibo <- function(n){
  if(n <= 2){
    return(1)
  } else {
    return(
      mfibo(n - 1) + mfibo(n - 2) #It's important to call this with the memoized name
    )
  }
}
mfibo <- memoise(mfibo) #Memoization line

Note that this memoisation has additional overhead over the memoisation we did in Section 2 because memoise works even for non numeric arguments (which would have failed in our vectorized example). However the speed-up over the original is still there:

microbenchmark::microbenchmark(fibonacci(20), mfibo(20), memoised_fibonacci(20)) 
Unit: nanoseconds
                   expr     min        lq       mean  median        uq      max
          fibonacci(20) 5291457 6031574.5 7138627.87 6599172 7929635.0 14128101
              mfibo(20)   58632   72422.5  151487.39  139772  190614.5   569502
 memoised_fibonacci(20)     606    1206.0    5186.56    3470    7506.5    39376
 neval
   100
   100
   100

The memoise package uses cachem which allows for fine control over where the previous results of the function. You can substitute the #Memoization line in the previous code for this memoise with finer control.

#Set memory to 100MB and time to memorizing only for 15 minutes
cm    <- cachem::cache_mem(max_size = 100 * 1024^2, max_age = 15 * 60)
mfibo <- memoise(mfibo, cache = cm)

To keep previous computations across different R sessions you can cache directly to disk (slower) instead of memory:

cm    <- cachem::cache_disk(max_size = 100 * 1024^2, max_age = 15 * 60)
mfibo <- memoise(mfibo, cache = cm)

And in other languages?

You can also memoise in the most recent versions of Python either vie the built in functools or the memoization project. In Julia you can Memoize.jl. Let me know if you are interested in an entry for any of these.

Footnotes

  1. Some definitions start with \(f(0)\) but as R indexes vectors in \(1\) we’d better start with \(1\).↩


Stay in touch / Mantente en contacto

If you liked this post, subscribe to avoid missing future ones.

Si te gustĂł esta entrada, suscrĂ­bete para no perderte ninguna.