## Modernizing sugar in Rcpp11

I’m in the process of modernizing the implementation of sugar in Rcpp11. Previous work already improved performance of sugar by allowing sugar classes themselves to implement how to apply themselves into their target vector. For example the sugar class `SeqLen` leverages `std::iota` instead of a manual for loop.

``````template <typename Target>
inline void apply( Target& target ) const {
std::iota( target.begin(), target.end(), 1 ) ;
}
``````

sugar is based on the expression templates technique, a very popular pattern in C++ libraries that reduces the use of temporaries. Consider the following R code:

``````y <- exp(abs(x))
``````

To calculate `y` in R, we must first create a temporary vector to hold the result of `abs(x)` and then create a new vector to hold the result of `exp(abs(x))`. Then the vector we allocated to hold `abs(x)` is no longer referenced so becomes candidate for garbage collection, etc … we just allocated something to throw it away right after.

If we were to implement this manually in C++, we would typically write a for loop.

``````int n = x.size() ;
NumericVector y(n) ;
for( int i=0; i<n; i++){
y[i] = exp( abs( x[i] ) ) ;
}
``````

There are other ways to do it as well. For example using `std::transform`

``````NumericVector y(n) ;
std::transform( x.begin(), x.end(), y.begin(), [](double a){
return exp(abs(a)) ;
}) ;
``````

or using `Rcpp::transform` :

``````NumericVector y = transform( x.begin(), x.end(), []( double a){
return exp(abs(a));
}) ;
``````

But sugar definitely gives us the most expressive and closest to R solution:

``````NumericVector y = exp(abs(x)) ;
``````

Here `exp` and `abs` operate on the entire vector, not just the scalar `double`. As with other expression templates libraries, sugar delays the actual work as much as possible. The expression `exp(abs(x))` itself does not create a vector but creates an object that can be assigned to a `NumericVector` :

``````auto y = exp(abs(x)) ;
Rprintf( "type(y) = %s", DEMANGLE(y) ) ;
// type(y) = Rcpp::sugar::Sapply<14, true, Rcpp::sugar::Sapply<14, true, Rcpp::Vector<14, Rcpp::PreserveStorage>, double (*)(double)>, double (*)(double)>
``````

The expression `exp(abs(x))` has created the same object as the expression `sapply(sapply(x,::abs),::exp)`. The first benefit from this is that the code for `abs` and for `exp` is exactly the same. After all, we just vectorize a function that operate on scalar values.

The second benefit is that we can identify that we want function composition here. We don’t want to first `sapply` `abs` over `x` and then `sapply` `exp` over the previous result, what we really want is `sapply` the composition of the `abs` and `exp` scalar functions. That’s exactly how it is implemented in `Rcpp11`. We can even retrieve that function:

``````auto fun = y.fun ;
Rprintf( "type(fun) = %s\n", DEMANGLE(decltype(fun)) ) ;
// type(fun) = Rcpp::functional::Compose<int (*)(int), double (*)(double)>
``````

And the `Compose` class looks like this:

``````template <typename F1, typename F2>
class Compose : public Functoid<Compose<F1,F2>> {
private:
F1 f1 ;
F2 f2 ;

public:
Compose( F1 f1_, F2 f2_ ) : f1(f1_), f2(f2_){}

template <typename... Args>
inline auto operator()( Args&&... args ) const -> decltype( f2( f1( std::forward<Args>(args)... ) ) ) {
return f2( f1( std::forward<Args>(args)... ) );
}

} ;
``````

And we can just as easily compose lambda functions:

``````#include <Rcpp.h>
using namespace Rcpp ;

// [[export]]
NumericVector test(NumericVector x){
auto square = []( double a ){ return a*a ; } ;
auto twice  = []( double a ){ return a*2 ; } ;

auto y = sapply( sapply(x, square), twice ) ;
Rprintf( "type(y) = %s\n", DEMANGLE(decltype(y)) ) ;

auto fun = y.fun ;
Rprintf( "type(fun) = %s\n", DEMANGLE(decltype(fun)) ) ;

double val = fun(3.0);
Rprintf( "val = %5.3f\n", val ) ;

NumericVector res = y ;
return res ;
}

/*** R
test(1:10)
*/
``````

which gives:

``````\$ Rcpp11Script /tmp/exp.cpp

> test(1:10)
type(y) = Rcpp::sugar::Sapply<14, true, Rcpp::sugar::Sapply<14, true, Rcpp::Vector<14, Rcpp::PreserveStorage>, test(Rcpp::Vector<14, Rcpp::PreserveStorage>)::\$_0>, test(Rcpp::Vector<14, Rcpp::PreserveStorage>)::\$_1>
type(fun) = Rcpp::functional::Compose<test(Rcpp::Vector<14, Rcpp::PreserveStorage>)::\$_0, test(Rcpp::Vector<14, Rcpp::PreserveStorage>)::\$_1>
val = 18.000
   2   8  18  32  50  72  98 128 162 200
``````

It becomes even more interesting for how missing values should be treated. Let’s now consider that the expression is used on an integer vector.

``````auto square = []( int a ){ return a*a ; } ;
auto twice  = []( int a ){ return a*2 ; } ;
auto y = sapply( sapply(x, square), twice ) ;
``````

In an iteration based implementation of sugar (like e.g. the implementation in Rcpp), to be correct we would have no choice but to check for `NA` twice because the two functions operate somewhat independently from one another. So the iteration based implementation in Rcpp would lead to code equivalent to this:

``````for( int i=0; i<n; i++){
int x_i = x[i] ;
int tmp_i = (x_i == NA_INTEGER) ? NA_INTEGER : square(i) ;
y[i] = (tmp_i == NA_INTEGER ) ? NA_INTEGER : twice(i) ;
}
``````

With the new composition based approach, we only have to check for `NA` once, which leads to code much closer to what we would intuitively write manually:

``````for( int i=0; i<n; i++){
int x_i = x[i] ;
y[i] = x_i == NA_INTEGER ? NA_INTEGER : twice(square(x_i)) ;
}
``````

The composition based approach is still a work in progress, but I believe it will be yet another way to achieve performance improvements for the modernized version of sugar.

We can also generalize the composition approach to several input vectors, via mapply. Consider the expression : `x + exp(y) + abs(sin(z))`. The challenge is to identify actual vectors we want to iterate over: `x`, `y` and `z` and generate the appropriate function composition. Should be fun.