Emojis at #useR2017

Because of the imminent birth of my second daughter, I did not get to go to useR, so I’ve been feeling a mix of frustration, impatience and FOMO. I was really pleased to find out that, for the second year, part of the conference would be streamed live and most of it would be available later.

I combined watching the live stream with constantly refreshing the #useR2017 column in my Tweetbot, until this became overwhelming enough for me to come up with a way to visualise the tweet storm from a few angles. This was a good opportunity to learn the rtweet package to grab the tweets, and shinydashboard for the visualisation.

A few hours and a few tweets later, we released this as the pet project tweetstorm, a shiny dashboard to glimpse activity on twitter, to answer some questions:

Some of us use emojis in our tweets, and I’ve learned from Sean Kross how to extract them from a tweet’s text, borrowing a regex from his article: Which Emojis Does Lucy Use in Commit Messages?

There are now two emoji-related displays in the app. The first one packs emojis together based on how many times they have been used, through the tweetstorm::extract_emojis function.

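extract_emojis <- function(text){
  # emoji_regex and not_equal() are not defined in this post;
  # presumably they are helpers from tweetstorm
  str_extract_all(text, emoji_regex ) %>%   # emoji matches, one vector per tweet
    flatten_chr() %>% 
    str_split("") %>%                       # split each match into single characters
    flatten_chr() %>% 
    not_equal("-") %>%                      # presumably drops stray "-" entries
    table() %>%                             # count occurrences of each emoji
    sort(decreasing = TRUE) %>% 
    as_tibble() %>% 
    set_names( c("Emoji", "n") )
}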
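For readers who want to try the counting idea without installing anything, here is a minimal base R sketch. It is not the tweetstorm code: `count_emojis` and `emoji_pattern` are hypothetical names, and the pattern is a deliberately simplified code point range (U+1F300 to U+1FAFF) that I am assuming as a stand-in for the regex borrowed from Sean's article, so it will miss some emojis.

```r
# Simplified emoji range; an assumption, not the regex used in tweetstorm.
emoji_pattern <- "[\\x{1F300}-\\x{1FAFF}]"

# Hypothetical helper: count how often each emoji appears across all texts.
count_emojis <- function(text) {
  # One character vector of single-emoji matches per element of `text`
  matches <- regmatches(text, gregexpr(emoji_pattern, text, perl = TRUE))
  emojis <- unlist(matches)
  # Tabulate and sort from most to least used
  counts <- sort(table(emojis), decreasing = TRUE)
  data.frame(
    Emoji = as.character(names(counts)),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Made-up example texts
texts <- c("I \U0001F496 R", "\U0001F496\U0001F4E6 and more \U0001F496")
emoji_counts <- count_emojis(texts)
emoji_counts
```

Because the pattern matches one code point at a time, there is no need for the extra split step used in the package version.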

There’s still some work to deal with skin tone modifiers, but it’s great to see that the most popular emojis spread love and package use.
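One possible way to handle the modifiers, sketched here in base R, is to strip them before counting so that all skin tone variants of an emoji are pooled. This is only an idea, not what the app does: `strip_skin_tones` is a hypothetical helper, and it relies on the five skin tone modifiers occupying the code points U+1F3FB to U+1F3FF.

```r
# Hypothetical helper, not part of tweetstorm: remove the five skin tone
# modifiers (U+1F3FB to U+1F3FF) so that variants of an emoji compare equal.
strip_skin_tones <- function(text) {
  gsub("[\\x{1F3FB}-\\x{1F3FF}]", "", text, perl = TRUE)
}

# Thumbs up: plain, light skin tone, dark skin tone
thumbs <- c("\U0001F44D", "\U0001F44D\U0001F3FB", "\U0001F44D\U0001F3FF")
pooled <- unique(strip_skin_tones(thumbs))
pooled
```

The trade-off is that the information about which tone was used is lost, which may or may not matter for a popularity ranking.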

The other emoji-related display in the app groups them by user, so that we can verify that Lucy also makes extensive use of emojis on twitter. That’s the job of the tweetstorm::extract_emojis_users function:

extract_emojis_users <- function(tweets){
  data <- tweets %>% 
    select( user_id, text ) %>% 
    mutate( 
      # emojis found in each tweet, as a list column
      emojis = str_extract_all(text, emoji_regex ) %>% map( not_equal, "-" )
    ) %>% 
    filter( map_int(emojis, length) > 0 ) %>%   # keep tweets with at least one emoji
    group_by( user_id ) %>% 
    summarise( 
      # one table of emoji counts per user
      emojis = map(emojis, ~ flatten_chr(str_split(., "") ) ) %>% flatten_chr() %>% table() %>% list()
    ) %>% 
    mutate( 
      total = map_int(emojis, sum), 
      distinct = map_int(emojis, length), 
      emojis = map_chr( emojis, ~ paste( names(.)[ order(., decreasing = TRUE)], collapse = "") )
    ) %>% 
    arrange( desc(total) )

  # attach profile information; lookup_users() comes from rtweet
  left_join( data, lookup_users(data$user_id), by = "user_id" ) %>% 
    mutate( img = sprintf('<img src="%s" />', profile_image_url ) ) %>% 
    select( img, name, total, distinct, emojis )
}
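The per-user grouping can also be sketched in base R on made-up data, which makes the three summary columns easier to inspect. Everything here is illustrative: the toy `tweets` data frame, the `summarise_user` helper and the simplified `emoji_pattern` range are my assumptions, and the profile lookup step is left out because it needs Twitter API access.

```r
# Toy data standing in for the tweets; user ids and texts are made up.
tweets <- data.frame(
  user_id = c("a", "a", "b", "c"),
  text = c("\U0001F496 R \U0001F496", "\U0001F4E6", "no emoji here", "\U0001F389"),
  stringsAsFactors = FALSE
)

# Simplified emoji range; an assumption, not the regex used in tweetstorm.
emoji_pattern <- "[\\x{1F300}-\\x{1FAFF}]"

# One character vector of emojis per tweet
per_tweet <- regmatches(tweets$text, gregexpr(emoji_pattern, tweets$text, perl = TRUE))

# Pool the emojis of each user, then drop users who never used one
per_user <- lapply(split(per_tweet, tweets$user_id), unlist)
per_user <- per_user[lengths(per_user) > 0]

# Hypothetical helper: total uses, distinct emojis, and the emojis
# pasted together from most to least used
summarise_user <- function(emojis) {
  counts <- sort(table(emojis), decreasing = TRUE)
  data.frame(
    total = sum(counts),
    distinct = length(counts),
    emojis = paste(names(counts), collapse = ""),
    stringsAsFactors = FALSE
  )
}

summary_df <- do.call(rbind, lapply(per_user, summarise_user))
summary_df$user_id <- names(per_user)
summary_df <- summary_df[order(summary_df$total, decreasing = TRUE), ]
summary_df
```

In this toy example user "b" disappears from the result, which mirrors the filter on tweets with at least one emoji in the function above.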

I come out first, but that’s not fair because at some point while developing the app I tweeted the list of all the emojis used so far.

But apart from that, there’s no doubt that Lucy uses lots of emojis:

Here’s an example:

The app is live here; it may still change and/or move to a new URL at some point.

Right now it always uses the data I extracted, before it was too late, between these two instants:

> tweetstorm::useR2017 %>% pull(created_at) %>% range
[1] "2017-06-29 10:07:14 UTC" "2017-07-11 09:55:06 UTC"

I’ll keep updating the dataset for some time as there are likely to be new tweets, e.g. when we go through the slides or videos, …