arrow, rrrow, rcher, spurrrow
The naming conundrum
Here I am again at the conundrum of choosing a name for a thing. This is hard. I like it when it’s over, I have the perfect name, and I finally feel free to try to match the personality of the code to the name.
The Python front end is called
pyarrow, but I guess names are less of an issue with Python, as the first thing I’ve seen in many scripts is
import pyarrow as pa.
So it looks like
[rR]arrow is the natural pattern to use to name the bindings. I have mixed feelings about this, so I coined
rrrow instead, making the regex
That syntax looks solid to me. pic.twitter.com/vdCd3VN0uY— JD Long (@CMastication) March 6, 2018
Here is an extract from my conversation with Wes about it:
arrow makes a lot of sense actually, we’re already in R so we don’t need a prefix reminder.
I kind of like
rcher too, without the capital R though in the interest of saving ⌨️ time, and I ❤️ the idea of pretty much outsourcing the marketing to Mara, who has the superpower to tweet archer gifs faster than … well I don’t remember the typical expression for something fast, but pretty fast …
More things to learn
I don’t just sit around and think about naming things all day, I also sometimes procrastinate, but not today, I’ll procrastinate tomorrow.
Arrow is already a mature and somewhat complex project with many moving parts, so being tasked to “do the r thing” is kind of intimidating at first. I’ll try not to give in to anxiety too soon.
I spent my first
#arrowtuesday reading documentation, installing things, and generally getting a feel for the project, mostly through the Python front end.
current status 🐍 pic.twitter.com/Ra9c0bxUOL— Romain François 🦄 (@romain_francois) March 6, 2018
I need to learn about Python, here’s my current Amazon cart. I’ve been meaning to read Wes’s book for some time and I’m not the only member of my 👪 who wants to learn about 🐍
In essence the task is to make the Arrow data structures accessible to R, and to be in line with the Arrow principle of keeping copies to a minimum.
The tool we have at our disposal in R for this is the external pointer. It lets us get hold of an instance of a C++ class, with enough hooks to destruct the object once the wrapping R object around it (the external pointer, aka
EXTPTRSXP) is no longer reachable and gets garbage collected.
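To make the life cycle idea concrete, here is a minimal sketch in Python rather than R or C++. It is only an analogy: `NativeBuffer` and `ExternalPointer` are hypothetical names invented for illustration, and `weakref.finalize` stands in for the finalizer hook that R attaches to an external pointer. None of this is the actual R or Arrow API.

```python
import gc
import weakref


class NativeBuffer:
    """Hypothetical stand-in for a C++ object living outside the managed heap."""

    def __init__(self, values):
        self.values = list(values)
        self.released = False

    def release(self):
        # Plays the role of the C++ destructor.
        self.released = True
        self.values = []


class ExternalPointer:
    """Hypothetical thin wrapper: it only holds a handle to the native object."""

    def __init__(self, native):
        self.native = native
        # Register a finalizer that runs once the wrapper is unreachable,
        # similar in spirit to a finalizer registered on an external pointer.
        self._finalizer = weakref.finalize(self, native.release)


native = NativeBuffer([1, 2, 3])
ptr = ExternalPointer(native)
print("released while wrapped:", native.released)

del ptr       # drop the only reference to the wrapper
gc.collect()  # make sure collection has happened on any interpreter
print("released after collection:", native.released)
```

The point of the sketch is that the wrapper owns nothing but the handle and the cleanup hook. Getting at the values in a form the host language understands is a separate problem.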
Rcpp has modules built around external pointers, but I’m not really satisfied with them because they take forever to compile and still require a lot of boilerplate work when used with a C++ library that goes beyond hello world.
But we need to go further, because external pointers only give you ways to get hold of an object and maintain its life cycle. As soon as you want to do anything in R with the data, you have to convert it to R data types. However, there’s ALTREP on the horizon.
ALTREP is a big deal. It makes it possible to decouple the metadata of R objects (all the stuff that goes in the
SEXPREC bits) from the actual data, so whereas now the actual data directly follows the header, ALTREP adds abstractions that we can use to add indirections.
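The difference between converting data and merely pointing at it can be sketched in Python with the standard library. This is an analogy only and has nothing to do with the ALTREP API itself: a `memoryview` carries a little metadata (format, length) and reaches the values through an indirection, whereas `list(...)` makes a full copy.

```python
from array import array

# The data lives in one buffer, owned by something other than the view.
data = array("d", [1.0, 2.0, 3.0])

# The view holds metadata plus a reference to the original buffer,
# not its own copy of the values.
view = memoryview(data)

# An eager conversion, by contrast, duplicates every value.
copied = list(data)

# Mutate the underlying buffer and compare what each one sees.
data[0] = 42.0

print("view sees:", view[0])
print("copy sees:", copied[0])
print("view metadata:", view.format, len(view))
```

The view follows the change to the underlying buffer while the copy keeps the old value, which is the kind of behaviour one would hope for from an R vector whose data actually sits in an Arrow structure.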
This is still somewhat obscure to me, but in short, if the data can be elsewhere, it can definitely come from some Arrow structure. Exciting times ahead. I’m leaving this here, the thread has some references about 📦 using ALTREP.
is there a minimal 📦 using ALTREP ?— Romain François 🦄 (@romain_francois) March 6, 2018
See you next Tuesday for more R and Arrow stuff.