gofast
Today was about adding some more confidence about the feasability of ergo
. In particular I was interested about these two problems that I decided to play with at the same time.
Returning a slice of go strings to the R side. We’ve already seen how to return a single string and how to return a slice of numbers but slices of strings are a different beast.
Using the ast package to manipulate go code as an abstract syntax tree (ast).
The motivation for the first item should be obvious, so let’s talk about the motivation for the second ⚫. We need some background first. Remember Rcpp attributes, most notoriously these // [[Rcpp::export]]
comments you add before a C++ function to express that you want this function available to the R side, something like this:
// [[Rcpp::export]]
double fahrenheit(double celcius){
return 32 + 1.8 * celcius ;
}
Attributes is one of the best things that ever happened to Rcpp, this was an amazing piece of engineering work by JJ that has changed for the better how we use C++ functions with Rcpp. But this was not the original plan. Initially we wanted something automatic, something that would identify automatically functions that could be exported, or perhaps based on some naming convention. The problem is C++ is very hard to parse.
Go on the other hand is a simpler language, and its standard library comes with all the tools to analyse Go code, transform it into an ast, manipulate it, and so on. There are many resources to understand how this works, e.g. I’ve watched this video:
and played with this blog post. In short once you have parsed a Go file with parser.ParseFile
you can traverse the nodes of the ast with ast.Inspect
function.
I’ve used it to make a simple function that gives the names of the functions in a Go file:
package gofast
import (
"go/ast"
"go/parser"
"go/token"
"log"
)
func Gofast( code string ) []string {
fset := token.NewFileSet()
node, err := parser.ParseFile(fset, "", code, parser.ParseComments)
if err != nil {
log.Fatal(err)
}
functions := []string{} ;
ast.Inspect(node, func(n ast.Node) bool {
// Find Functions
fn, ok := n.(*ast.FuncDecl)
if ok {
functions = append( functions, fn.Name.Name )
}
return true
})
return functions
}
So we have the Gofast
function that takes some code in a string and return a slice of strings.
This brings us back to the first problem, we want to call this function from R, so we want to be able to return a slice of strings as an R character vector. We need a few tools from the R api. - Rf_allocVector
to create a vector of the right type, here STRSXP
- Rf_protect
to protect that vector from the garbage collector - Rf_unprotect
to lift that protection - SET_STRING_ELT
to set an individual R string (i.e. a CHARSXP
) in the vector - Rf_mkCharLenCE
to create an R string from a sequence of char
and a size.
So following the pattern I’ve used in my previous go adventures, here is another Go function that sits between the real pure go code we’ve seen before, and the R things. Don’t worry if it looks ugly, it involves both low level stuff from Go and R apis.
package main
/*
#define USE_RINTERNALS
#include <R.h>
#include <Rinternals.h>
*/
import "C"
import "gofast"
//export Gofast
func Gofast( x string ) C.SEXP {
functions := gofast.Gofast(x)
n := len(functions)
var out C.SEXP = C.Rf_allocVector( C.STRSXP, C.long(n) )
C.Rf_protect(out)
defer C.Rf_unprotect(1)
for i, s := range functions {
C.SET_STRING_ELT( out, C.R_xlen_t(i), C.Rf_mkCharLenCE( C._GoStringPtr(s), C.int(len(s)), C.CE_UTF8 ) )
}
return out ;
}
func main() {}
This starts nicely by calling the other Gofast
function to get the slice of strings functions := gofast.Gofast(x)
and then the rest of it is just mitigating R and Go low level interfaces. I found out about C._GoStringPtr
by asking this question on stack overflow.
The whole purpose of ergo
is that we will not have to care about that as eventually it will be generated automatically.
We also need an R api compatible C function, i.e. a function that only used SEXP
in and out. It’s a bit less intimidating than the other one, although I would not want to write this one manually either, I did, but I’m weird.
#include "_cgo_export.h"
SEXP gofast( SEXP x ){
if( TYPEOF(x) != STRSXP ) error("expecting an string") ;
SEXP sx = STRING_ELT(x, 0) ;
GoString gos = { (char*)CHAR(sx), SHORT_VEC_LENGTH(sx) } ;
return Gofast(gos) ;
}
Finally, we need an R function to call gofast
, it looks like this:
#' @useDynLib gofast
#' @export
gofast <- function(x) {
.Call("gofast", x, PACKAGE = "gofast")
}
and then finally we can all it:
code <- '
package foo
func Test() int{ return 3 }
func Bla(){}
'
gofast::gofast(code)
## [1] "Test" "Bla"
If we take a step back from all the layers, we started from a Go function that takes a single string and returns a slice of strings and we call that function from R.
Incidentally, that function’s job is to list the Go functions in the Go file. This is just a first step, down the line, we’ll be able to get more information from the ast and use it to generate all the boiler plate intermediate functions. It’s been done with Rcpp from an approximate parser, so there’s no reason why this can’t be done too with exact parsing from Go.
Mission accomplished, I have some more confidence about the potential of ergo
. The code discussed here is in the gofast repo in the rstats-go organisation we created to structure the development of ergo
.