5 Rcpp or how to easily embed C++ code into a package
Rcpp (“R-C-Plus-Plus”) is a package which facilitates the interface between C++ and . is an interpreted language, a feature that makes a number of things easy (including giving us access to the console in which we can evaluate code and variables on the fly). Nevertheless, this ease of use is counterbalanced by larger computation times compared to lower level languages such as C, Fortran and C++ (but which require compilation).
The curious reader is directed towards the online book Rcpp for everyone by Masaki E. Tsuda, which is a very thorough and complete resource for understanding how to use Rcpp, in complement to the introduction that can be found in the “Rcpp” section of Hadley Wickham’s book Advanced R3.
5.1 First function in Rcpp
👉 Your turn !
To make your package ready for use with
Rcpp, start by running the following command:usethis::use_rcpp()See the changes made
You should also add the following 2 Roxygen comments in the general help page of the package, as indicated in the console:
#' @useDynLib mypkgr #' @importFrom Rcpp sourceCpp, .registration = TRUE NULL
We are now going to create a first function in Rcpp to invert a matrix. For this, we will use the C++ library Armadillo. It is a modern and simple linear algebra library, highly optimized, and interfaced with via the RcppArmadillo package.
C++ is not a very different language from . The main differences that will have an impact for us:
C++is very efficient for for loops (including nested for loops – ⚠️ there is often one order that is faster than the other, due to the wayC++allocates and walks through memory).Each command must end with a semicolon
;.C++ is a typed language: you must declare the type of each variable before you can use it in the code.
👉 Your turn !
- Create a new
C++file from RStudio (via theFile>New File>C++ Filemenu), and save it in thesrcfolder. Take the time to read it and try to understand each line.- Compile and load your package (via the “Install and Restart” button) and try using the
timesTwo()function from the console.- Install the
RcppArmadillo👉 package, and don’t forget to make the necessary additions toDESCRIPTION(useusethis::use_rcpp_armadillo())- Using Hadley Wickham’s introduction to
Rcppin his book Advanced R, as well as the documentation for theRcppArmadillopackage and for theC++library Armadillo, try to write a short functioninvC()inC++that computes the inverse of a matrix.- When you have successfully compiled your
invCfunction and it is accessible from , create amvnpdf_invC()function from themvnpdfsmartimplementation replacing only the matrix inverse calculations with a call toinvC.- Evaluate the performance gain of this new implementation
mvnpdf_invC.
n <- 1000
mb <- microbenchmark(mvtnorm::dmvnorm(matrix(1.96, nrow = n, ncol = 2)),
mvnpdf(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfsmart(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfoptim(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdf_invC(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
times=100L)
mb
## Unit: microseconds
## expr min
## mvtnorm::dmvnorm(matrix(1.96, nrow = n, ncol = 2)) 26.568
## mvnpdf(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE) 1624.830
## mvnpdfsmart(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE) 1205.195
## mvnpdfoptim(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE) 670.432
## mvnpdf_invC(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE) 1166.737
## lq mean median uq max neval
## 52.562 64.02273 64.0830 75.5630 130.462 100
## 1836.677 1989.70294 1955.1055 2053.3620 4688.022 100
## 1251.135 1341.50524 1267.0025 1293.5910 3291.357 100
## 714.261 785.75762 735.9295 759.0945 3363.558 100
## 1250.910 1391.14394 1268.7245 1296.1535 3995.983 100
5.2 Optimize thanks to C++
As a general rule, not much computational time is gained by replacing an optimized function with a C++ function. Indeed, most of the base functions are in fact already wrappers around well optimized C or Fortran routines. The gain is then limited to the suppression of argument checking and type management (which is there for a reason!).
👉 Your turn !
From
mvnpdfsmart, propose a complete implementation inC++for computating the density of the multivariate Normal distributionmvnpdfsmartC().Evaluate the performance gain of this new implementation
mvnpdfsmartC.
You can download our proposal for mvnpdfsmartC.cpp here.
For (relatively small) additional speed gain (at the cost of code readability!), you can have a look at our optimized Armadillo C++ implementation in mvnpdfoptimC.cpp.
n <- 1000
mb <- microbenchmark(mvtnorm::dmvnorm(matrix(1.96, nrow = n, ncol = 2)),
mvnpdf(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfsmart(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfoptim(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdf_invC(x=matrix(1.96, nrow = 2, ncol = n), Log=FALSE),
mvnpdfsmartC(x=matrix(1.96, nrow = 2, ncol = n), mean = rep(0, 2), varcovM = diag(2), Log=FALSE),
mvnpdfoptimC(x=matrix(1.96, nrow = 2, ncol = n), mean = rep(0, 2), varcovM = diag(2), Log=FALSE),
times=100L)
mb## Unit: microseconds
## expr
## mvtnorm::dmvnorm(matrix(1.96, nrow = n, ncol = 2))
## mvnpdf(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE)
## mvnpdfsmart(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE)
## mvnpdfoptim(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE)
## mvnpdf_invC(x = matrix(1.96, nrow = 2, ncol = n), Log = FALSE)
## mvnpdfsmartC(x = matrix(1.96, nrow = 2, ncol = n), mean = rep(0, 2), varcovM = diag(2), Log = FALSE)
## mvnpdfoptimC(x = matrix(1.96, nrow = 2, ncol = n), mean = rep(0, 2), varcovM = diag(2), Log = FALSE)
## min lq mean median uq max neval
## 27.347 53.9150 63.23389 66.2150 73.5745 93.439 100
## 1614.539 1857.1975 2031.68202 1974.0475 2073.8620 4048.422 100
## 1171.862 1256.5475 1369.85961 1276.4120 1295.7025 3715.379 100
## 656.533 715.4295 791.64604 735.1710 748.8240 3474.832 100
## 1156.774 1253.4110 1345.08823 1274.0955 1293.2630 4705.242 100
## 34.030 37.8225 41.17343 39.6060 42.4145 141.573 100
## 18.081 21.3200 24.79147 23.1035 25.4200 135.792 100

Note that Rcpp functions can be used outside of a package architecture thanks to the Rcpp::sourceCpp() function. But, as we have seen that it is always desirable to manage all of one’s code inside a package, it is unlikely that you will need this !