filter with context

Romain François

Wednesday, Jun 28, 2017

The new programming tools that arrived with the 0.7 series of dplyr are pretty cool. Goodbye to the old clunky functions suffixed with an underscore and their weird use of lazyeval::interp …

The tidyeval framework gives us a way to make new verbs with dplyr-like syntax. It takes some getting used to and might not be the easiest thing to teach, although it makes a lot more sense than the old approach. In the webinar, Hadley said that he does not yet really know how to teach tidyeval. I don’t pretend to have it covered, but here’s an example of using tidy eval to filter with context.

The idea is to add context to filter, similar to the -A and -B options of Unix grep. In other words, we want the lines that match the filter condition (as usual), plus a given number of lines before and a given number of lines after.

To make things simple for now, let’s first consider a single filter condition, so we want a function with this interface:

context_filter <- function( data, expr, before = 0L, after = 0L){
  ...
}
context_filter( mtcars, cyl == 4 )

What the tidy eval framework gives us is the ability to pass cyl == 4 by expression so that we can inline it into some other expression. The game is to get the indices that match the condition, expand those to add the before and after indices, and then use the result in a slice call.

First we need a tool to do the expanding. Nothing fancy here, just plain old regular rep and seq stuff. For each element in idx we add the context, and then we make sure that each index appears only once and falls within the valid row numbers, 1 to n.

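context <- function(idx, n, before = 0L, after = 0L){
  # offsets around each matching index, e.g. -1 0 1
  span <- seq( -before, after )
  # expand every index by every offset, dropping duplicates
  res <- unique( rep( idx, each = length(span) ) + span )
  # keep only valid row numbers
  res[ res >= 1L & res <= n]
}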
context( c(4, 8), 10, before= 1, after = 1)

## [1] 3 4 5 7 8 9

context( c(4, 5), 10, before= 1, after = 1)

## [1] 3 4 5 6

context( c(1, 10), 10, before= 1, after = 1)

## [1]  1  2  9 10
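
To make the recycling step concrete, here is a small base R walk-through of what happens inside context for the overlapping case c(4, 5). The intermediate names (expanded, deduped, kept) are only for illustration and do not appear in the function itself.

```r
# Walk through context( c(4, 5), 10, before = 1, after = 1 ) step by step.
idx <- c(4, 5)
n <- 10
before <- 1
after <- 1

# offsets relative to each matching index: -1 0 1
span <- seq(-before, after)

# repeat each index once per offset, then let R recycle span across them
expanded <- rep(idx, each = length(span)) + span
# expanded is 3 4 5 4 5 6

# drop the duplicates created by the two overlapping windows
deduped <- unique(expanded)
# deduped is 3 4 5 6

# keep only valid row numbers (nothing is dropped in this case)
kept <- deduped[deduped >= 1L & deduped <= n]
print(kept)
```

The each = length(span) argument is what lines every index up with a full copy of span, so the addition works without an explicit loop.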

Now we just need to feed that context function with indices:

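library(dplyr)

context_filter <- function( data, expr, before = 0L, after = 0L){
  # capture the condition as a quosure instead of evaluating it
  expr <- enquo(expr)
  # inline it into which() to get the matching rows, then expand and slice
  slice( data, context(which(!!expr), n(), before, after) )
}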

The tidyeval magic is to:

first capture the expression with enquo
then inline it into another expression with the unquoting operator !!

So that we can let R do the copy and paste for us:

context_filter( mtcars, cyl == 4, before = 1, after = 1)

## # A tibble: 20 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
##  2  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
##  3  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
##  4  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
##  5  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
##  6  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
##  7  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
##  8  14.7     8 440.0   230  3.23 5.345 17.42     0     0     3     4
##  9  32.4     4  78.7    66  4.08 2.200 19.47     1     1     4     1
## 10  30.4     4  75.7    52  4.93 1.615 18.52     1     1     4     2
## 11  33.9     4  71.1    65  4.22 1.835 19.90     1     1     4     1
## 12  21.5     4 120.1    97  3.70 2.465 20.01     1     0     3     1
## 13  15.5     8 318.0   150  2.76 3.520 16.87     0     0     3     2
## 14  19.2     8 400.0   175  3.08 3.845 17.05     0     0     3     2
## 15  27.3     4  79.0    66  4.08 1.935 18.90     1     1     4     1
## 16  26.0     4 120.3    91  4.43 2.140 16.70     0     1     5     2
## 17  30.4     4  95.1   113  3.77 1.513 16.90     1     1     5     2
## 18  15.8     8 351.0   264  4.22 3.170 14.50     0     1     5     4
## 19  15.0     8 301.0   335  3.54 3.570 14.60     0     1     5     8
## 20  21.4     4 121.0   109  4.11 2.780 18.60     1     1     4     2
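
To see what enquo and !! actually build, it can help to construct the inlined expression without evaluating it. The sketch below assumes the rlang package is installed (dplyr depends on it); show_inlined is a hypothetical helper made up for this illustration, not part of dplyr or rlang.

```r
library(rlang)

# hypothetical helper: capture a condition and inline it into which()
show_inlined <- function(expr) {
  expr <- enquo(expr)
  quo(which(!!expr))
}

q <- show_inlined(cyl == 4)

# quo_squash() flattens the nested quosure into a plain call for display
inlined <- quo_squash(q)
print(inlined)

# evaluating the quosure with mtcars as the data mask gives the row indices
matches <- eval_tidy(q, data = mtcars)
print(matches)
```

The printed call should read which(cyl == 4), which is the expression we would otherwise have had to paste together by hand.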

Now we can generalise this to multiple filter conditions with quos and !!!. Each filter condition gives us a logical vector, and we want to & them all together. That’s a job for Reduce:

Reduce( "&", list( c(TRUE,TRUE,FALSE), c(TRUE,FALSE,FALSE), c(TRUE,TRUE,TRUE) )  )

## [1]  TRUE FALSE FALSE
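
The same Reduce pattern works on real conditions. As a quick sanity check in base R, using the two conditions from the mtcars call further down, the reduced vector should agree with writing the & by hand. The names conds, combined and by_hand are only for illustration.

```r
# two logical vectors, one per filter condition
conds <- list(mtcars$cyl == 4, mtcars$disp > 100)

# fold them together with &, element by element
combined <- Reduce("&", conds)

# the same thing written out by hand
by_hand <- mtcars$cyl == 4 & mtcars$disp > 100

print(identical(combined, by_hand))

# row numbers where both conditions hold
print(which(combined))
```

Because Reduce takes a list of any length, the same code handles one condition or ten, which is what makes it a good fit for conditions arriving through the dots.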

Now we can capture all the conditions given in the ... by expression and splice them into a list via the !!! operator:

context_filter <- function( data, ..., before = 0, after = 0){
  dots <- quos(...)
  slice( data, context(which( Reduce("&", list(!!!dots) )  ), n(), before, after) ) 
}
context_filter( mtcars, cyl == 4, disp > 100, before = 1)

## # A tibble: 11 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
##  2  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
##  3  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
##  4  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
##  5  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
##  6  33.9     4  71.1    65  4.22 1.835 19.90     1     1     4     1
##  7  21.5     4 120.1    97  3.70 2.465 20.01     1     0     3     1
##  8  27.3     4  79.0    66  4.08 1.935 18.90     1     1     4     1
##  9  26.0     4 120.3    91  4.43 2.140 16.70     0     1     5     2
## 10  15.0     8 301.0   335  3.54 3.570 14.60     0     1     5     8
## 11  21.4     4 121.0   109  4.11 2.780 18.60     1     1     4     2

context_filter( starwars, skin_color == "gold", eye_color == "yellow", before = 1, after = 1)

## # A tibble: 3 x 13
##             name height  mass hair_color  skin_color eye_color birth_year
##            <chr>  <int> <dbl>      <chr>       <chr>     <chr>      <dbl>
## 1 Luke Skywalker    172    77      blond        fair      blue         19
## 2          C-3PO    167    75       <NA>        gold    yellow        112
## 3          R2-D2     96    32       <NA> white, blue       red         33
## # ... with 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>
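
As with a single condition, the spliced expression can be built without evaluating it, which shows what !!! contributes. This is a sketch that assumes the rlang package is available; show_spliced is a hypothetical helper introduced only for this illustration.

```r
library(rlang)

# hypothetical helper: capture all conditions and splice them into list()
show_spliced <- function(...) {
  dots <- quos(...)
  quo(Reduce("&", list(!!!dots)))
}

q <- show_spliced(cyl == 4, disp > 100)

# flatten the nested quosures into a plain call for display
spliced <- quo_squash(q)
print(spliced)

# evaluate with mtcars as the data mask, then turn the logical into indices
rows <- which(eval_tidy(q, data = mtcars))
print(rows)
```

Each captured condition becomes its own argument to list(), so the call looks as if the conditions had been typed there directly.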

More about tidy eval in the dplyr programming vignette.