Pro Grammar and Devel Hoper

7 min read

R grammar

I’ve been teasing about this post for some time now.

So I’m going to try prove that this is not just for the pun, however proud of it I might be, and believe me I love it when a language plays tricks with my brain and a pun presents itself to me. This is valid for most of the languages I use daily, including French, that is described in Wikipedia as a Romance language,
English our current lingua franca, C++ this weird high maintenance language that lets you accomplish amazing things once you’ve demonstrated that you can manage countless hours of head banging and investigate Sherlock class compiler error messages, and finally R.

There is no describing how much I love and care about R. Well first of all, I make a living out of selling R services, that’s got to count for something right. But beyond that, I’ve been using R for more than 13 years now. It might seem a bit irrationnal to be addicted to such a quirky language especially given all the temptations around for better, faster, cleaner, whatever, languages out there. Sure I can play with these fancy languages, but for some reason I always come back to R, R makes me tick.

Something else that gives me pride is that I’ve been able to make a difference in the R community, mostly by my involvement in bridging R and C++. When you use a language for that long you get to become familiar with the most intimate details. This had led me a few years back to play with the parser for the purpose of my syntax highlighter package, just one of those why not
projects, which evidently by some random course of action had the effect of landing some of my code directly into the R source. So it’s like there is always a part of me in R.

So yes, I’m proud of that. Just however, realize that me being proud of something does not necessarily make it a great accomplishment. For example, lately I’ve been proud I successfully managed to grow tomatoes.

Anyway, the gram.y file in which my name is, is perhaps one of the deepest component, it defines the grammar, the set of rules that make up the language. R’s grammar is not that complicated and it has not changed much over the years.

It is one thing to add vocabulary, functions or verbs as the hipsters call them now ;) but messing with the grammar is a whole new dimension.

R does not really let you change its grammar from the perspective of say a package. There have been some things getting close to grammatical changes, i.e. adoption from data.table of the phantom := to do stuff like that:

DT[, p := x/sum(x), by=group]  

However biased I might be about data.table, I got to admit that the use of := here is beautiful.

This has only been possible because := is syntactically valid, i.e. it parses, but it does not mean anything in standard R, so data.table was able to reclaim the meaning.

Another thing that comes close to grammatical change is the weird operator construct between two %. What a weird and lovely feature. You can just define whatever binary operator you like, as long as it is between two % :

`%w/o%` <- function(x, y){ setdiff(x, y) }
letters %w/o% c("a", "b")

This is what makes possible the latest storm in the R world, at least as I felt it at useR!2014, the piping operators as introduced by magrittr and excessively promoted by dplyr.

We’ve seen lots of look how piping is awesome in association with dplyr and again this is probably something I’m biased about, so I’ll use a dplyr free piping example, taken from @stefanbache’s presentation in the CopenhagenR meetup.

The example is about expressing a cooking recipe, something that appeals naturally to me as a pretentious frenchman.

First expressed with regular R nesting of function calls. It has obviously been made cryptic and perhaps some indenting could help. But fundamentally you can’t read that. What is degress an argument of ?

The pipe operator %>% turns this around into something that looks a lot like more what a cook would do. It reads as a story.

Fine, enough of blind admiration for now. I love piping and I truly think this is one of the most exciting recent R innovations, but I still stand by this. %>% is ugly.

Since I have the skills for that sort of weird stuff, I’ve been playing with the grammar in this branch of a fork of @winston_chang’s read only mirror of the R source. In a less convoluted way: I’ve been messing with the grammar.

Change the grammar has been on my list of fantasies for some time, and this gave be enough incentive to actually look into it. And this is a pretty simple change, but I’ll get into other stuff later.

For now, what I’ve done in the branch is legalize some tokens. I’ve legalized >>, |> and <<, but their meaning is not defined, similarly to what R does with :=. So we can e.g. define >> and |> to be synonyms of magrittr’s %>% :

`|>` <-  magrittr::`%>%`

iris |> summary  

The other one >> could be used e.g. to stream some content into a file.

`<<` <- function(x, y){ ... }
f <- file( "/tmp/yada.txt", open = "w" )  
f << "yada" << "yada" << "yada" << "I'm really tired today"  

There are other things I’d like to play with in terms of the R grammar, which very likely transform into something more challenging than what I anticipate.

I’d like some syntax for optional typing. R is dynamically typed, and that’s not going to change, we all love that, yada, yada, yada, but optional typing could help and remove some clutter. Consider:

x <- foo()  
if( ! is(x, "bar") ){  
  stop( "nope, this not a bar" )
}

We see many variations of that littering the code we write and the packages we use. What about something that would let you express more elegantly the type you require, i.e. something like this:

bar x <- foo()  

This could make its way into function arguments too and give us some method dispatch that would be for once nice to use. What’s the deal with setMethod, setGeneric and all that verbosity. Other languages have come up with nicer ways to deal with overloading. We could have something like this:

foo <- function( character x ){  
   # do some characer stuff
}
foo <- function( data.frame x) {  
   # do some data frame stuff
}

This could sit on top of what already exists, i.e. generate appropriate setGeneric and setMethod calls underneath, but give us some syntax that would be much nicer to use.

Furthermore with the verbosity, the way we define classes in R, from various implementations (S4, ref classes, R6, …) is trapped into the whole “everything is a function call” mantra.

Most languages that started without OO and incorporated it later (e.g. php) had their grammar improved to deal with expressing classes. In R, we are stuck with coming up with new vocabulary. I strongly believe that we could come up with a much better way to express classes if we were willing to think about it. This is blurry to me still, but what about something like this, borrowing the Person class from Introduction to R6

class Person {  
  character name = NA
  character hair = NA

  Person(character name, character hair){
     if (!missing(name)) self$name <- name
     if (!missing(hair)) self$hair <- hair
     self$greet()
  }

  set_hair(character val) {
      self$hair <- val
  }

  greet() {
      cat(paste0("Hello, my name is ", self$name, ".\n"))
  }

}

As I said higher up, implementing this could materialize into a big challenge, but I think R deserves a grammar update.

To come back to the title of this post, I hope I’ve demonstrated that I’m a pro grammar, i.e. I believe that we could benefit greatly from upgrading the grammar, and a devel hoper ad I hope to be able to commit enough enthusiasm to play further with this.