Rcpp is smoking fast for agent-based models in data frames
Here’s the blog post originally posted on babelgraph.org on July 11, 2012. Thanks to Hadley Wickham for referencing some of content here, and apologies for the broken URL. NB. The original C++ code didn’t seem to compile on my computer today. It required a small tweak with NumericVector::create – see below.
In particular, R data frames provide a simple framework for representing large cohorts of agents in stochastic epidemiological models, such as those representing disease transmission. This approach is much easier and likely faster than trying to implement cohorts of R objects. In this post we’ll explore a simple agent-based model, and then benchmark a few different approaches to iterating through the cohort. Rcpp outperforms all of them by a few orders of magnitude. Priceless.
Let’s say we are trying to predict the probability of someone choosing to receive a vaccination in a given year. The decision will be based on their age (age), gender (female), and whether or not they were infected with the virus last year (ily). Let’s make up some data:
The probability of choosing to be vaccinated will be given by the following function:
Try some iterators: for loop, apply, ddply
Let’s create a testable function for each strategy. The objective is to take a cohort data frame as input, calculate the vaccination probability for each member of the cohort, then return a data frame with the cohort data plus a new column for the vaccination probability (p).
Now rewrite the test using a traditional for-loop in C++ including a helper function to calculate the vaccination probability. I use the inline library so I can embed the C++ directly in the R script, thus obviating additional .cpp or .h files. Self-contained code is nice.