Emojis at #useR2017

Romain François

Tuesday, Jul 11, 2017

emojis tweetstorm useR useR2017 rtweet shiny shinydashboard

Because of the imminent birth of my second daughter, I did not get to go to useR, so I’ve been feeling a mix of frustration, impatience and FOMO. I was really pleased to find out that, for the second year, part of the conference would be streamed live and most of it would be available later.

I combined watching the live stream with constantly updating the column with the #useR2017 filter in Tweetbot, until this became overwhelming enough for me to come up with a way to visualise the tweet storm from a few angles. This was a good opportunity to learn the rtweet package to grab the tweets, and shinydashboard for the visualisation.

A few hours and a few tweets later, we released this as the pet project tweetstorm, a shiny dashboard that gives a glimpse of activity on Twitter and answers questions such as:

How many tweets from how many Twitter users
Related hashtags
Which tweets were the most fav’ed or retweeted
Who tweeted the most, was quoted or replied to the most
…

Some of us use emojis in our tweets, and I’ve learned from Sean Kross how to extract them from a tweet, borrowing a regex from his article: Which Emojis Does Lucy Use in Commit Messages?

There are now two emoji-related displays in the app. The first one to appear packs emojis together based on how many times they have been used, through the tweetstorm::extract_emojis function.

#' Extract emojis
#'
#' @param text text with emojis
#'
#' @return a tibble with the columns `Emoji` and `n`
#' 
#' @references 
#' Inspired from [Which Emojis Does Lucy Use in Commit Messages?](http://seankross.com/2017/05/30/Which-Emojis-Does-Lucy-Use-in-Commit-Messages.html)
#' 
#' @export
#' @importFrom stringr str_extract_all str_split
#' @importFrom purrr flatten_chr
#' @importFrom tibble as_tibble
#' @importFrom magrittr %>% set_names
extract_emojis <- function(text){
  # emoji_regex and not_equal() are not shown in this post; they are
  # presumably defined elsewhere in the package
  str_extract_all(text, emoji_regex ) %>% 
    flatten_chr() %>% 
    # split each match into single characters
    str_split("") %>% 
    flatten_chr() %>% 
    not_equal("-") %>% 
    # count each emoji, most used first
    table() %>% 
    sort(decreasing = TRUE) %>% 
    as_tibble() %>% 
    set_names( c("Emoji", "n") )
}
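
To try the counting idea outside the package, here is a minimal base R sketch. It is not the tweetstorm implementation: since `emoji_regex` is not shown in this post, the sketch swaps the regex for a simple code point range check (U+1F300 to U+1FAFF, an assumption that covers many emojis but not all of them), and `count_emojis` is a hypothetical name.

```r
# Hypothetical helper, not part of tweetstorm: count emoji-like characters
# in a character vector using base R only.
count_emojis <- function(text) {
  # Convert every string to its Unicode code points
  code_points <- unlist(lapply(enc2utf8(text), utf8ToInt))
  # Assumption: treat U+1F300..U+1FAFF as "emoji"; this misses some symbols
  is_emoji <- code_points >= 0x1F300 & code_points <= 0x1FAFF
  emojis <- intToUtf8(code_points[is_emoji], multiple = TRUE)
  # Tabulate and sort, most used first
  counts <- sort(table(emojis), decreasing = TRUE)
  data.frame(
    Emoji = names(counts),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Made-up tweets for illustration
tweets <- c(
  "Loving #useR2017 \U0001F49C\U0001F49C",
  "New package \U0001F4E6 released \U0001F49C",
  "No emoji here"
)
res <- count_emojis(tweets)
print(res)
```

The result has one row per emoji, with the purple heart first because it appears three times in the made-up tweets.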

There’s still some work to do on skin tone modifiers, but it’s great to see that the most popular emojis spread love and package use.
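
One possible way to handle skin tones, sketched under assumptions rather than taken from tweetstorm: the skin tone modifiers occupy the code points U+1F3FB to U+1F3FF, so they can be dropped before counting and no longer show up as emojis of their own. `drop_skin_tones` is a hypothetical helper name.

```r
# Hypothetical helper: remove skin tone modifiers (U+1F3FB..U+1F3FF)
# from each string so that only the base emoji remains.
drop_skin_tones <- function(text) {
  vapply(enc2utf8(text), function(x) {
    code_points <- utf8ToInt(x)
    is_modifier <- code_points >= 0x1F3FB & code_points <= 0x1F3FF
    intToUtf8(code_points[!is_modifier])
  }, character(1), USE.NAMES = FALSE)
}

# Clapping hands followed by a light skin tone modifier (U+1F44F U+1F3FB)
x <- "bravo \U0001F44F\U0001F3FB"
cleaned <- drop_skin_tones(x)
print(cleaned)
```

The trade-off of this approach is that all skin tone variants of an emoji are counted together as the base emoji.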

The other emoji-related display in the app groups them by user, so that we can verify that Lucy also makes extensive use of emojis on Twitter. That’s the job of the tweetstorm::extract_emojis_users function:

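#' Extract information about users that use emojis
#' 
#' @param tweets tweets data set
#' @export
#' @importFrom dplyr select mutate filter group_by summarise arrange desc left_join
#' @importFrom purrr map map_int map_chr flatten_chr
#' @importFrom stringr str_extract_all str_split
#' @importFrom rtweet lookup_users
extract_emojis_users <- function(tweets){
  
  data <- tweets %>% 
    select( user_id, text ) %>% 
    # extract the emojis of each tweet, keeping only tweets that have some
    mutate( 
      emojis = str_extract_all(text, emoji_regex ) %>% map( not_equal, "-" )
    ) %>% 
    filter( map_int(emojis, length) > 0 ) %>% 
    # one frequency table of single emojis per user
    group_by( user_id ) %>% 
    summarise( 
      emojis = map(emojis, ~ flatten_chr(str_split(., "") ) ) %>% flatten_chr() %>% table() %>% list()
    ) %>% 
    # total uses, number of distinct emojis, and emojis ordered by use
    mutate( 
      total = map_int(emojis, sum), 
      distinct = map_int(emojis, length), 
      emojis = map_chr( emojis, ~ paste( names(.)[ order(., decreasing = TRUE)], collapse = "") )
    ) %>% 
    arrange( desc(total) )

  # add the user name and profile picture from Twitter
  left_join( data, lookup_users(data$user_id), by = "user_id" ) %>% 
    mutate( img = sprintf('<img src="%s" />', profile_image_url ) ) %>% 
    select( img, name, total, distinct, emojis )
  
}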
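
To make the `total` and `distinct` columns concrete, here is a small base R sketch on made-up data. The user ids and tweets are invented, emojis are again assumed to be the code points in U+1F300 to U+1FAFF instead of matches of `emoji_regex`, and nothing is looked up on Twitter.

```r
# Made-up tweets: user ids and texts are invented for illustration
tweets <- data.frame(
  user_id = c("a", "a", "b"),
  text = c(
    "hello \U0001F49C\U0001F49C",
    "package \U0001F4E6",
    "party \U0001F389"
  ),
  stringsAsFactors = FALSE
)

# Assumption: emojis are the code points in U+1F300..U+1FAFF
emojis_of <- function(text) {
  code_points <- unlist(lapply(enc2utf8(text), utf8ToInt))
  keep <- code_points >= 0x1F300 & code_points <= 0x1FAFF
  intToUtf8(code_points[keep], multiple = TRUE)
}

# One row per user: total emojis used, number of distinct emojis,
# and the emojis pasted together from most to least used
summarise_user <- function(texts) {
  counts <- sort(table(emojis_of(texts)), decreasing = TRUE)
  data.frame(
    total = sum(counts),
    distinct = length(counts),
    emojis = paste(names(counts), collapse = ""),
    stringsAsFactors = FALSE
  )
}
per_user <- do.call(rbind, lapply(split(tweets$text, tweets$user_id), summarise_user))
per_user$user_id <- rownames(per_user)

# Heaviest emoji users first
per_user <- per_user[order(per_user$total, decreasing = TRUE), ]
print(per_user)
```

Here user "a" comes first with three emojis in total, two of them distinct.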

I come first there, but that’s not fair, because at some point while developing the app I tweeted the list of all the emojis used so far.

#useR2017 emojis: 💜😊🏻🙌📦👏🔥👍😻😂🌍🎉💻😍🤔🚀😁🙈🇧🇪👩😉🍻🎶🏆👀👉👶💕🤓🤗😀😎😱🌌🌎🌳🌻🍺🏀🏖👇👯💁💝💩🦄😃😅🙏🚄🚦🤞🇫🇷🌧🌾🍀🍁🍓🍕🍾🍿🎾🏈🏗🏸🏼🏽🐈🐍🐑🐒🐓🐔🐩🐱🐶👈👬👴👵👷💎💙💪📈📉📑📕📚🔍🔎🔑🔷🕵️🗻🗾🤖🤘🦁😄😇😒😕😩😮🙁🙃🙉🙋🚂🚲🤦
— Romain François 🦄 (@romain_francois) July 7, 2017

But apart from that, there’s no doubt that Lucy uses lots of emojis:

Here’s an example:

so grateful for inclusion initiatives at #useR2017 including
👯 newbie session
💰 diversity scholarships
🍼 breast feeding area
👶 child care
— 𝓛𝓾𝓬𝔂::maternity_leave🤱 (@LucyStats) July 7, 2017

The app is live here; it may still change and/or move to a new URL at some point.

Right now it always uses the data I extracted before it was too late, which covers the tweets between these two instants:

> tweetstorm::useR2017 %>% pull(created_at) %>% range
[1] "2017-06-29 10:07:14 UTC" "2017-07-11 09:55:06 UTC"

I’ll keep updating the dataset for some time, as there are likely to be new tweets, e.g. when we go through the slides or videos, …