filter with context
The new programming tools that arrived with the 0.7 series of dplyr are pretty cool. Goodbye old clunky functions suffixed by the underscore and their weird use of lazyeval::interp
…
The tidyeval framework gives us a way to make new verbs with dplyr-like syntax. It takes some getting used to and might not be the easiest thing to teach, but it makes a lot more sense than the old approach. In the webinar Hadley said that he does not yet really know how to teach tidyeval. I don’t pretend I have it covered, but here’s an example of using tidy eval to filter with context.
The idea is to add context to filter, similar to the -A and -B options of the Unix grep. In other words, we want the lines that match the filter condition (as usual), plus a given number of lines before and a given number of lines after.
To make things simple for now, let’s first consider a single filter condition, so we want a function with this interface:
context_filter <- function( data, expr, before = 0L, after = 0L){
...
}
context_filter( mtcars, cyl == 4 )
What the tidy eval framework gives us is the ability to pass cyl == 4 by expression so that we can inline it into some other expression. The game is to get the indices that match the condition, expand those to add before and after indices, and then use these in a slice call.
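Before wrapping this in a function, the three steps can be sketched by hand in base R. This is only an illustration on mtcars with one row of context on each side; the variable names (idx, expanded, res) are made up for the example and are not part of the final function.

```r
# Step 1: indices of the rows that match the condition
idx <- which(mtcars$cyl == 4)

# Step 2: add one row of context before and after each match,
# then drop duplicates and positions outside the data
before <- 1L
after <- 1L
expanded <- unique(unlist(lapply(idx, function(i) seq(i - before, i + after))))
expanded <- expanded[expanded >= 1L & expanded <= nrow(mtcars)]

# Step 3: keep those rows (base R equivalent of a slice call)
res <- mtcars[expanded, ]
nrow(res)
```

The rest of the post packages step 2 into a helper and uses tidy eval so that the condition in step 1 can be supplied by the caller.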
First we need a tool to do the expanding. Nothing fancy here, just plain old regular rep and seq stuff. For each element in idx we add the context, and then we just make sure the indices appear only once and are restricted to the extent of the rows:
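context <- function(idx, n, before = 0L, after = 0L){
  # offsets relative to each matching index
  span <- seq( -before, after )
  # shift every index by every offset, keeping each result once
  res <- unique( rep( idx, each = length(span) ) + span )
  # drop positions that fall outside 1..n
  res[ res >= 1L & res <= n]
}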
context( c(4, 8), 10, before= 1, after = 1)
## [1] 3 4 5 7 8 9
context( c(4, 5), 10, before= 1, after = 1)
## [1] 3 4 5 6
context( c(1, 10), 10, before= 1, after = 1)
## [1] 1 2 9 10
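A couple of edge cases are worth checking as well. The sketch below repeats the context definition from above so that it runs on its own, then looks at the default arguments and at an empty set of matches.

```r
# Same helper as above, repeated so this snippet is self-contained
context <- function(idx, n, before = 0L, after = 0L) {
  span <- seq(-before, after)
  res <- unique(rep(idx, each = length(span)) + span)
  res[res >= 1L & res <= n]
}

# With the defaults (no context) the indices come back unchanged
no_context <- context(c(4, 8), 10)

# With no matching index there is nothing to expand
no_match <- context(integer(0), 10, before = 1, after = 1)
length(no_match)
```

Note that the helper does not sort its result. That is fine here because the indices will come from which, which returns them in increasing order.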
Now we just need to feed that context function with indices:
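context_filter <- function( data, expr, before = 0, after = 0){
  # capture the condition as typed by the caller
  expr <- enquo(expr)
  # inline it into which(), expand the indices, and slice
  slice( data, context(which(!!expr), n(), before, after) )
}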
The tidyeval magic is to:
- first capture the expression with enquo
- then inline it into another expression with the unquoting operator !!
So that we can let R do the copy and paste for us:
context_filter( mtcars, cyl == 4, before = 1, after = 1)
## # A tibble: 20 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 2 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 4 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 5 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 6 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 7 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 8 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## 9 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## 10 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## 11 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## 12 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## 13 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## 14 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## 15 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## 16 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## 17 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## 18 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## 19 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## 20 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
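The capture and inline steps can also be looked at in isolation, outside of dplyr. The sketch below uses rlang directly; captured_expr is a hypothetical helper written just for this illustration, and it strips the quosure down to its bare expression with quo_get_expr so the result is easy to inspect.

```r
library(rlang)

# Hypothetical helper: capture the argument, return the bare expression
captured_expr <- function(expr) {
  expr <- enquo(expr)   # a quosure: the expression plus its environment
  quo_get_expr(expr)    # keep only the expression part
}

e <- captured_expr(cyl == 4)

# Inline the captured expression into a larger call with !!
inlined <- expr(which(!!e))

# Evaluate the resulting call with mtcars as the data mask
matches <- eval_tidy(inlined, data = mtcars)
```

Here inlined is the call which(cyl == 4), built without typing the condition a second time. That is the "copy and paste" that context_filter relies on inside slice.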
Now we can generalise this to multiple filter conditions with quos and !!!. Each filter condition gives us a logical vector and we want to & them all. That’s a job for Reduce:
Reduce( "&", list( c(TRUE,TRUE,FALSE), c(TRUE,FALSE,FALSE), c(TRUE,TRUE,TRUE) ) )
## [1] TRUE FALSE FALSE
Now we can capture all the conditions given in the ... by expression and splice them into a list via the !!! operator:
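context_filter <- function( data, ..., before = 0, after = 0){
  # capture every condition passed through ...
  dots <- quos(...)
  # splice them into a list, & them together, expand, and slice
  slice( data, context(which( Reduce("&", list(!!!dots) ) ), n(), before, after) )
}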
context_filter( mtcars, cyl == 4, disp > 100, before = 1)
## # A tibble: 11 x 11
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 2 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 3 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 4 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 5 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 6 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## 7 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## 8 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## 9 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## 10 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## 11 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
context_filter( starwars, skin_color == "gold", eye_color == "yellow", before = 1, after = 1)
## # A tibble: 3 x 13
## name height mass hair_color skin_color eye_color birth_year
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
## 1 Luke Skywalker 172 77 blond fair blue 19
## 2 C-3PO 167 75 <NA> gold yellow 112
## 3 R2-D2 96 32 <NA> white, blue red 33
## # ... with 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
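One way to sanity-check the multi-condition version is to compare it with plain filter when no context is requested: with before and after left at 0, both should keep the same rows. The sketch below repeats the two definitions so that it runs on its own, and converts mtcars to a tibble first (an assumption made only to sidestep row-name differences in the comparison).

```r
library(dplyr)

context <- function(idx, n, before = 0L, after = 0L) {
  span <- seq(-before, after)
  res <- unique(rep(idx, each = length(span)) + span)
  res[res >= 1L & res <= n]
}

context_filter <- function(data, ..., before = 0, after = 0) {
  dots <- quos(...)
  slice(data, context(which(Reduce("&", list(!!!dots))), n(), before, after))
}

cars <- as_tibble(mtcars)

# No context requested: should behave like filter()
with_context_filter <- context_filter(cars, cyl == 4, disp > 100)
with_filter <- filter(cars, cyl == 4, disp > 100)

isTRUE(all.equal(with_context_filter, with_filter))
```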
More about tidy eval in the dplyr programming vignette.