Sharla Gelfand

Are people obsessed with Virgos?

As you may know, I like astrology. Astro Poets is a great Twitter account run by two poets who post painfully accurate descriptions of, and commentary on, the twelve zodiac signs.

Today they tweeted that a Virgo had written in asking whether people like tweets about Virgos more.

Now, I’m not a Virgo, and they definitely made fun of the person for asking – but now I gotta know.

Are people obsessed with Virgos? Which signs garner the most attention?

We’ll take a look at Astro Poets’ tweet activity using the excellent rtweet package. I’ve already set up my access token, but if you’ve never used this package before, the instructions are here.

# install.packages("rtweet")
library(rtweet)

The Astro Poets account doesn’t have too many tweets, so we’ll use rtweet to pull them all in, excluding retweets and replies. The function lookup_users() grabs information on a certain user, including their number of tweets, so I don’t have to manually check the account and hard code that number in (reproducibility!).

library(dplyr)

n_tweets <- lookup_users("poetastrologers") %>% with(statuses_count)
# note: the timeline endpoint only returns a user's most recent ~3,200 tweets
tweets <- get_timeline("poetastrologers", n = n_tweets)

# keep original tweets only (no retweets, no replies) and the columns we need
tweets <- tweets %>%
  dplyr::filter(!is_retweet & is.na(in_reply_to_status_status_id)) %>%
  dplyr::select(created_at, text, retweet_count, favorite_count)

tweets %>% glimpse()
## Observations: 3,175
## Variables: 4
## $ created_at     <dttm> 2017-12-21 16:02:11, 2017-12-21 00:19:08, 2017...
## $ text           <chr> "Your Sun Sign is only part of your personality...
## $ retweet_count  <int> 58, 152, 957, 796, 33, 1100, 1763, 866, 836, 83...
## $ favorite_count <int> 323, 962, 4141, 3541, 394, 4802, 6446, 4090, 27...
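The filter above keeps a tweet only if it is neither a retweet nor a reply. As a sketch of that logic that needs no API access, here it is in base R on a tiny made-up data frame. The rows are hypothetical, and the column names just mirror the ones used above (the reply-ID column may be named differently depending on your rtweet version, so check what yours returns).

```r
# Hypothetical toy timeline; rows are made up purely for illustration.
toy <- data.frame(
  text = c("Virgo: alphabetize your feelings.",
           "RT someone else's tweet",
           "@fan thanks!",
           "Leo: you are the main character."),
  is_retweet = c(FALSE, TRUE, FALSE, FALSE),
  in_reply_to_status_status_id = c(NA, NA, "12345", NA),
  stringsAsFactors = FALSE
)

# Keep original tweets only: not a retweet AND not a reply
# (a reply has a non-missing in_reply_to_status_status_id).
keep <- !toy$is_retweet & is.na(toy$in_reply_to_status_status_id)
originals <- toy[keep, "text", drop = FALSE]
print(originals$text)
```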

To see whether people interact with each sign's tweets differently, we'll first figure out which sign each tweet is about by checking for each sign's name in the tweet text. For simplicity's sake, I'll only look at tweets that mention exactly one sign (not zero, and not more than one).

library(stringr)
library(fuzzyjoin)

signs <- tibble(sign = c("Aries", "Taurus", "Gemini", "Cancer",
                         "Leo", "Virgo", "Libra", "Scorpio",
                         "Sagittarius", "Capricorn", "Aquarius", "Pisces"))

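tweets_with_sign <- tweets %>%
  # one row per (tweet, sign) pair for every sign name found in the text
  regex_inner_join(signs, by = c("text" = "sign"), ignore_case = TRUE) %>%
  # count how many signs each tweet matched, and keep single-sign tweets
  group_by(text) %>%
  mutate(n = n()) %>%
  filter(n == 1) %>%
  ungroup() %>%
  select(-created_at, -n)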

tweets_with_sign %>% glimpse()
## Observations: 2,610
## Variables: 4
## $ text           <chr> "Rolling into Capricorn season like https://t.c...
## $ retweet_count  <int> 957, 836, 834, 1012, 792, 1230, 733, 2009, 703,...
## $ favorite_count <int> 4141, 2701, 2522, 2906, 2513, 3981, 2416, 4944,...
## $ sign           <chr> "Capricorn", "Pisces", "Aquarius", "Capricorn",...
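The regex join produces one row per (tweet, sign) match, so counting rows per tweet tells us how many signs it mentions. Here's a rough base R sketch of the same idea using grepl() on a few hypothetical tweets: build a tweet-by-sign matrix of matches, then keep the rows with exactly one hit.

```r
signs <- c("Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
           "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces")

# Hypothetical tweets, made up for illustration.
texts <- c("Virgo season is coming.",
           "Leo and Libra walk into a bar.",
           "Mercury is in retrograde.",
           "PISCES: drink some water.")

# Logical matrix: one row per tweet, one column per sign,
# TRUE where the sign's name appears in the tweet (case-insensitive).
hits <- sapply(signs, function(s) grepl(s, texts, ignore.case = TRUE))

# Keep only tweets that mention exactly one sign, and record which one.
one_sign <- rowSums(hits) == 1
result <- data.frame(
  text = texts[one_sign],
  sign = apply(hits[one_sign, , drop = FALSE], 1, function(r) signs[r]),
  stringsAsFactors = FALSE
)
print(result)
```

Like the regex join, this is plain substring matching, so a word such as "Leonardo" would count as a Leo mention; wrapping each sign in `\\b` word boundaries would be a stricter option.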

First things first – do they write about each sign equally? Their style is typically serialized: a burst of tweets at once, one about each sign. So we'd expect each sign to have a 1/12 chance of being written about, around 8.3%.
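To make the expected share concrete: if the 2,610 single-sign tweets found above were spread perfectly evenly, each sign would get 2,610 / 12 = 217.5 of them. A quick check of the arithmetic:

```r
n_signs <- 12
n_single_sign <- 2610  # rows in tweets_with_sign, from the glimpse() output above

expected_prop  <- 1 / n_signs
expected_count <- n_single_sign * expected_prop

round(100 * expected_prop, 1)  # [1] 8.3
expected_count                 # [1] 217.5
```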

library(forcats)
library(ggplot2)

count_by_sign <- tweets_with_sign %>%
  count(sign) %>%
  mutate(total = sum(n),
         prop = n/total)

ggplot(count_by_sign, aes(x = reorder(sign, prop), y = prop)) + 
  geom_col() + 
  scale_y_continuous(labels = scales::percent, name = "Percent of Tweets") + 
  scale_x_discrete(name = "Sign") + 
  coord_flip() +  
  theme_minimal()

Not so! We see some fairly big discrepancies here – in particular, Sagittarius is written about almost 10% of the time, while Aquarius squeaks in at just over 7%. Favouritism is clear!

Next, we’ll look at how people interact with each of the signs’ tweets, with a focus on likes.

library(tidyr)

likes_by_sign <- tweets_with_sign %>%
  group_by(sign) %>%
  summarise(Median = median(favorite_count),
            Max = max(favorite_count)) %>%
  gather(Measure, Likes, Median, Max) %>%
  mutate(Measure = factor(Measure, c("Median", "Max")))

ggplot(likes_by_sign, aes(x = reorder(sign, Likes), y = Likes)) + 
  geom_col() + 
  facet_grid(.~Measure, scales = "free") + 
  scale_x_discrete(name = "Sign") + 
  coord_flip() + 
  theme_minimal()
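Median and max can tell quite different stories, since a single viral tweet moves the max but barely touches the median. Here's a small base R sketch of the same summarise-then-gather step on made-up like counts (the numbers are hypothetical), using tapply() for the per-sign summaries and building the long Measure/Likes table by hand.

```r
# Hypothetical like counts, made up for illustration.
toy <- data.frame(
  sign = c("Virgo", "Virgo", "Virgo", "Leo", "Leo"),
  favorite_count = c(100L, 300L, 5000L, 200L, 400L),
  stringsAsFactors = FALSE
)

# Per-sign summaries; tapply() returns them in alphabetical order of sign.
med <- tapply(toy$favorite_count, toy$sign, median)
mx  <- tapply(toy$favorite_count, toy$sign, max)

# Long format, one row per (sign, Measure), similar to what gather() produces.
long <- data.frame(
  sign    = rep(names(med), times = 2),
  Measure = factor(rep(c("Median", "Max"), each = length(med)),
                   levels = c("Median", "Max")),
  Likes   = c(as.numeric(med), as.numeric(mx)),
  stringsAsFactors = FALSE
)
print(long)
```

Both toy signs end up with the same median even though the Virgo max is far higher: one runaway tweet shifts the max without moving the median.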

While Virgo comes in fourth when we look at the median number of likes for each sign, it takes first place in terms of the most likes on a single tweet – 11,202 likes, a whopping ~2,500 more than any other sign. What's the tweet?

tweets_with_sign %>%
  filter(sign == "Virgo") %>%
  filter(favorite_count == max(favorite_count)) %>%
  pull(text)
## [1] "We are just days away from Virgo season. Clean your apartments, be on time, get ready to be criticized."
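One subtlety with looking up a tweet by its like count: a tweet about another sign could have exactly the same number of likes. A base R sketch on hypothetical data shows the difference between matching on the count alone and restricting to the sign first. Note that which.max() returns only the first maximum if there's a tie.

```r
# Hypothetical tweets: a Leo tweet ties the top Virgo tweet on likes.
toy <- data.frame(
  sign = c("Virgo", "Leo", "Virgo", "Leo"),
  text = c("Virgo tweet A", "Leo tweet", "Virgo tweet B", "Another Leo tweet"),
  favorite_count = c(900L, 900L, 400L, 100L),
  stringsAsFactors = FALSE
)

# Restrict to Virgo tweets, then take the most-liked one.
virgo <- toy[toy$sign == "Virgo", ]
top_virgo <- virgo$text[which.max(virgo$favorite_count)]
print(top_virgo)  # [1] "Virgo tweet A"

# Matching on the like count alone also picks up the Leo tweet.
naive <- toy$text[toy$favorite_count == max(virgo$favorite_count)]
print(naive)      # [1] "Virgo tweet A" "Leo tweet"
```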

😬

I’m not sure if this tweet having the most likes supports the claim that Virgos get the most love, but it’s certainly… something.