5 Rcpp
or how to easily embed C++
code into a package
Rcpp
(“R-C-Plus-Plus”) is a package which facilitates the interface between C++
and . is an interpreted language, a feature that makes a number of things easy (including giving us access to the console in which we can evaluate code and variables on the fly). Nevertheless, this ease of use is counterbalanced by larger computation times compared to lower level languages such as C
, Fortran
and C++
(but which require compilation).
The curious reader is directed towards the online book Rcpp for everyone by Masaki E. Tsuda, which is a very thorough and complete resource for understanding how to use Rcpp
, in complement to the introduction that can be found in the “Rcpp” section of Hadley Wickham’s book Advanced R4.
5.1 First function in Rcpp
👉 Your turn !
To make your package ready for use with
Rcpp
, start by running the following command:usethis::use_rcpp()
See the changes made
You should also add the following 2 Roxygen comments in the general help page of the package, as indicated in the console:
#' @useDynLib mypkgr #' @importFrom Rcpp sourceCpp, .registration = TRUE NULL
We are now going to create a first function in Rcpp
to invert a matrix. For this, we will use the C++
library Armadillo
. It is a modern and simple linear algebra library, highly optimized, and interfaced with via the RcppArmadillo
package.
C++
is not a very different language from . The main differences that will have an impact for us:
C++
is very efficient for for loops (including nested for loops – ⚠️ there is often one order that is faster than the other, due to the wayC++
allocates and walks through memory).Each command must end with a semicolon
;
.C++ is a typed language: you must declare the type of each variable before you can use it in the code.
👉 Your turn !
- Create a new
C++
file from RStudio (via theFile
>New File
>C++ File
menu), and save it in thesrc
folder. Take the time to read it and try to understand each line.- Compile and load your package (via the “Install and Restart” button) and try using the
timesTwo()
function from the console.- Install the
RcppArmadillo
👉 package, and don’t forget to make the necessary additions toDESCRIPTION
(useusethis::use_rcpp_armadillo()
)- Using Hadley Wickham’s introduction to
Rcpp
in his book Advanced R, as well as the documentation for theRcppArmadillo
package and for theC++
library Armadillo, try to write a short functioninvC()
inC++
that computes the inverse of a matrix.- When you have successfully compiled your
invC
function and it is accessible from , create amvnpdf_invC()
function from themvnpdfsmart
implementation replacing only the matrix inverse calculations with a call toinvC
.- Evaluate the performance gain of this new implementation
mvnpdf_invC
.
n <- 1000
mb <- microbenchmark(mvtnorm::dmvnorm(matrix(1.96, nrow = n, ncol = 2)),
mvnpdf(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfsmart(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfoptim(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdf_invC(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
times=100L)
mb
## Unit: microseconds
## expr min
## mvtnorm::dmvnorm(matrix(1.96, nrow = n, ncol = 2)) 43.706
## mvnpdf(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE) 3334.735
## mvnpdfsmart(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE) 2309.243
## mvnpdfoptim(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE) 1737.785
## mvnpdf_invC(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE) 2295.590
## lq mean median uq max neval cld
## 53.5255 66.97186 66.8095 76.465 101.926 100 a
## 3453.1430 3609.39687 3541.6005 3638.443 7066.104 100 b
## 2341.6330 2542.28823 2370.2100 2400.899 6528.512 100 c
## 1788.3585 1899.61774 1817.4070 1847.993 6017.037 100 d
## 2340.0955 2531.82134 2363.2605 2406.680 6656.678 100 c
5.2 Optimize thanks to C++
As a general rule, not much computational time is gained by replacing an optimized function with a C++
function. Indeed, most of the base
functions are in fact already wrappers around well optimized C
or Fortran
routines. The gain is then limited to the suppression of argument checking and type management (which is there for a reason!).
👉 Your turn !
From
mvnpdfsmart
, propose a complete implementation inC++
for computating the density of the multivariate Normal distributionmvnpdfsmartC()
.Evaluate the performance gain of this new implementation
mvnpdfsmartC
.
You can download our proposal for mvnpdfsmartC.cpp
here.
For (relatively small) additional speed gain (at the cost of code readability!), you can have a look at our optimized Armadillo C++
implementation in mvnpdfoptimC.cpp
.
n <- 1000
mb <- microbenchmark(mvtnorm::dmvnorm(matrix(1.96, nrow = n, ncol = 2)),
mvnpdf(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfsmart(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfoptim(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdf_invC(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfsmartC(x=matrix(1.96, nrow = 2, ncol = n), mean = rep(0, 2), varcovM = diag(2), Log=FALSE),
mvnpdfoptimC(x=matrix(1.96, nrow = 2, ncol = n), mean = rep(0, 2), varcovM = diag(2), Log=FALSE),
times=100L)
mb
## Unit: microseconds
## expr
## mvtnorm::dmvnorm(matrix(1.96, nrow = n, ncol = 2))
## mvnpdf(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE)
## mvnpdfsmart(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE)
## mvnpdfoptim(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE)
## mvnpdf_invC(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE)
## mvnpdfsmartC(x = matrix(1.96, nrow = 2, ncol = n), mean = rep(0, 2), varcovM = diag(2), Log = FALSE)
## mvnpdfoptimC(x = matrix(1.96, nrow = 2, ncol = n), mean = rep(0, 2), varcovM = diag(2), Log = FALSE)
## min lq mean median uq max neval cld
## 43.829 53.7100 64.21994 61.8895 73.9845 105.903 100 a
## 3284.797 3390.5770 3620.55092 3462.2655 3582.8465 7417.392 100 b
## 2300.387 2340.7925 2500.75113 2353.8305 2386.7535 5860.335 100 c
## 1728.601 1776.1405 1982.84569 1793.4220 1824.0285 9680.633 100 d
## 2297.353 2336.0160 2392.90760 2356.0240 2380.2550 5090.355 100 c
## 51.209 53.5050 58.07404 56.1905 60.2085 122.344 100 a
## 35.670 37.8635 42.37268 41.1230 44.4645 115.866 100 a
Note that Rcpp
functions can be used outside of a package architecture thanks to the Rcpp::sourceCpp()
function. But, as we have seen that it is always desirable to manage all of one’s code inside a package, it is unlikely that you will need this !