Trump and Twitter

Donald Trump likes to tweet a lot. Some of his tweets have an angry sentiment or contain insults, and some do not. Trump seems to send tweets from a Samsung Galaxy when he is insulting people, places, and things, and from other devices, such as an iPhone, when he is not.

Like many politicians, Trump has staff members who often post tweets to his account. Do the tweets Trump writes himself have an angrier sentiment than the tweets written by his staff?

Trump’s last tweet from an Android was on March 25, 2017.

Brendan Brown created an archive of Trump’s Twitter posts that can be used as a data source.

This analysis is based on David Robinson’s post "Trump’s Android and iPhone tweets, one year later."

Data

I downloaded a JSON file of all of Trump’s tweets from Brown’s archive of Trump’s Twitter posts on September 28, 2017.

Question: Is there evidence that tweets sent from different devices follow different usage patterns?

library(jsonlite)
library(tidyverse)
library(stringr)
library(lubridate)

url <- "http://utstat.toronto.edu/~nathan/teaching/sta4002/Class4/trumptweets.JSON"
trumptweets <- fromJSON(url)

tweets <- trumptweets %>%
  select(source, text, created_at, id_str) %>%
  # Twitter timestamps look like "Sat Mar 25 14:36:53 +0000 2017"
  mutate(created_at = parse_date_time(created_at, "a b! d! H!:M!:S! z!* Y!")) %>%
  # keep tweets up to Trump's last Android tweet (March 25, 2017)
  filter(created_at < "2017-03-26") %>%
  # collapse the client string to "iPhone", "Android", or NA
  mutate(source = case_when(
    str_detect(source, "iPhone") ~ "iPhone",
    str_detect(source, "Android") ~ "Android",
    TRUE ~ NA_character_)) %>%
  filter(source %in% c("iPhone", "Android"))
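The cleaning step above does two things: it parses Twitter's timestamp string and collapses the free-form source field into a device label. The base-R sketch below mirrors that logic so the rules can be checked on a few rows. The sample strings are hypothetical (the exact form of the archive's source field is an assumption), and strptime() with %z stands in for lubridate's parse_date_time().

```r
# Base-R sketch; the example rows are hypothetical, not taken from the archive
invisible(Sys.setlocale("LC_TIME", "C"))  # English day/month names for %a and %b

# Collapse a client string to "iPhone", "Android", or NA
classify_source <- function(src) {
  ifelse(grepl("iPhone", src, fixed = TRUE), "iPhone",
         ifelse(grepl("Android", src, fixed = TRUE), "Android", NA_character_))
}

# Parse Twitter's "Sat Mar 25 14:36:53 +0000 2017" layout as UTC
parse_twitter_time <- function(x) {
  as.POSIXct(strptime(x, "%a %b %d %H:%M:%S %z %Y", tz = "UTC"))
}

raw <- data.frame(
  source = c("Twitter for iPhone", "Twitter for Android", "Twitter Web Client"),
  created_at = c("Sat Mar 25 14:36:53 +0000 2017",
                 "Sun Mar 26 09:00:00 +0000 2017",
                 "Fri Mar 24 18:20:05 +0000 2017"),
  stringsAsFactors = FALSE)

raw$source <- classify_source(raw$source)
raw$created_at <- parse_twitter_time(raw$created_at)

# Same cutoff as the filter() step: keep tweets before 2017-03-26
cutoff <- as.POSIXct("2017-03-26", tz = "UTC")
kept <- raw[raw$created_at < cutoff & raw$source %in% c("iPhone", "Android"), ]
print(kept)
```

Only the iPhone row survives: the Android row falls after the cutoff, and the web-client row has no device label.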

Exploring the data

Questions:

Interpret the plots below.
What information do the plots below give you about the patterns and content of Trump’s tweets?

library(lubridate)
library(scales)

tweets %>%
  count(source, hour = hour(with_tz(created_at, "EST"))) %>%
  # percentages within each device, so each line sums to 100%
  group_by(source) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup() %>%
  ggplot(aes(hour, percent, color = source)) +
  geom_line() +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Hour of day (EST)",
       y = "% of tweets",
       color = "")
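The hour-of-day plot only compares devices fairly if each percentage is taken within a device rather than over all tweets. The base-R sketch below shows that computation on hypothetical timestamps. It assumes that the "EST" zone used above is a fixed UTC-5 offset with no daylight saving, so the UTC hour is simply shifted by five.

```r
# Hypothetical timestamps (UTC) and devices, standing in for `tweets`
created <- as.POSIXct(c("2016-08-01 12:00:00", "2016-08-01 13:30:00",
                        "2016-08-02 23:15:00", "2016-08-03 02:00:00"), tz = "UTC")
device <- c("Android", "Android", "iPhone", "iPhone")

# Fixed UTC-5 offset: shift the UTC hour and wrap around midnight
hour_est <- (as.integer(format(created, "%H", tz = "UTC")) - 5) %% 24

# Share of each device's tweets in each hour: normalise within rows
counts  <- table(device, hour_est)
percent <- prop.table(counts, margin = 1)
print(round(100 * percent))
```

Each row of percent sums to 1, which is what the group_by(source) step achieves in the dplyr version.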

library(tidyverse)
# drop quoted retweets (text starts with "); an escaped t\\.co link marks a picture/link
tweet_picture_counts <- tweets %>% filter(!str_detect(text, '^"')) %>%
  count(source, picture = ifelse(str_detect(text, "t\\.co"), "Picture/link", "No picture/link"))

ggplot(tweet_picture_counts, aes(source, n, fill = picture)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "", y = "Number of tweets", fill = "")
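Two details in the picture/link count are easy to miss: tweets starting with a double quote are treated as manual retweets and dropped, and the pattern "t.co" is a regular expression in which "." matches any character. The base-R sketch below, on hypothetical tweet texts, shows why the dot is worth escaping.

```r
# Hypothetical tweet texts and devices, standing in for `tweets`
text <- c("Great crowd tonight! https://t.co/Ab12Cd",
          "tacos for lunch",
          "\"@someone: quoted tweet text\"",
          "Thank you! https://t.co/Xy34Zw")
device <- c("iPhone", "Android", "Android", "Android")

# Unescaped, "." matches any character, so "t.co" also matches "taco"
print(grepl("t.co", text[2]))    # a false positive
print(grepl("t\\.co", text[2]))  # no match once the dot is escaped

# Drop manual retweets (text starts with a double quote), then classify
keep <- !grepl('^"', text)
picture <- ifelse(grepl("t\\.co", text[keep]), "Picture/link", "No picture/link")
print(table(device[keep], picture))
```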

Comparison of words

Trump’s tweets can be tokenized into individual words, with stop words (very common words such as "the" and "and") removed.

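library(tidytext)
library(tidyverse)
library(stringr)

# split on any character that is not a letter, digit, #, @ or apostrophe, and on
# apostrophes not followed by a word character, so #hashtags and @mentions survive
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

tweet_words <- tweets %>%
  filter(!str_detect(text, '^"')) %>%   # drop quoted retweets
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,    # remove stop words
         str_detect(word, "[a-z]"))     # keep tokens containing a letter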

head(tweet_words)

## # A tibble: 6 x 4
##    source          created_at             id_str             word
##     <chr>              <dttm>              <chr>            <chr>
## 1 Android 2013-02-06 01:53:40 298972696438521857 @sherrieshepherd
## 2 Android 2013-02-06 01:53:40 298972696438521857             nice
## 3 Android 2013-02-06 01:53:40 298972696438521857         comments
## 4 Android 2013-02-06 01:53:40 298972696438521857             view
## 5 Android 2013-02-06 01:53:40 298972696438521857         terrific
## 6 Android 2013-02-15 04:18:36 302270661387231232      @rosscooker
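The tokenizing pipeline removes links and the "&amp;" entity, lowercases, splits on the regular expression reg, and filters out stop words. The base-R sketch below applies the same steps to one hypothetical tweet. The short stop-word list is an assumption standing in for tidytext's much larger stop_words table, and strsplit(perl = TRUE) stands in for unnest_tokens() because the pattern needs a lookahead.

```r
# Hypothetical tweet; stop_list is a small stand-in for tidytext's stop_words
tweet <- "Thanks @exampleuser for the great #MAGA rally &amp; don't forget to vote! https://t.co/Ab12Cd"
stop_list <- c("the", "for", "to", "a", "and", "of")

# Same splitting pattern as `reg` above (PCRE is needed for the lookahead)
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Remove t.co links and the HTML entity, then lowercase and split
clean  <- gsub("https://t\\.co/[A-Za-z\\d]+|&amp;", "", tweet, perl = TRUE)
tokens <- strsplit(tolower(clean), reg, perl = TRUE)[[1]]

# Drop stop words and empty tokens left by consecutive separators
tokens <- tokens[!tokens %in% stop_list & grepl("[a-z]", tokens)]
print(tokens)
```

The apostrophe in "don't" is kept because it is followed by a letter, while "#maga" and "@exampleuser" survive as single tokens.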

tweet_words %>% count(word) %>% top_n(50, n) %>% ggplot(aes(word, n)) + geom_col() +
  coord_flip() + labs(x = "Word", y = "Number of times the word occurs in tweets")

Clean up the plot: order the bars by frequency and keep only the 20 most common words.

tweet_words %>% count(word) %>% mutate(word = reorder(word, n)) %>% top_n(20, n) %>%
  ggplot(aes(word, n)) + geom_col() + coord_flip() + labs(x = "Word", y = "Number of times the word occurs in tweets")
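The cleaned-up plot differs from the first one mainly through reorder(): ggplot orders a character or factor axis by its factor levels, which are alphabetical by default, and reorder() re-sorts those levels by count. The base-R sketch below shows the effect on a few hypothetical tokens.

```r
# Hypothetical tokens standing in for tweet_words$word
words <- c("rally", "great", "vote", "great", "rally", "great")

# Count occurrences, most frequent first (like count(word) + sorting by n)
counts <- sort(table(words), decreasing = TRUE)
top_words <- head(counts, 2)

# Default levels are alphabetical; reorder() sorts them by count instead
word_alpha <- factor(names(counts))
word_by_n  <- reorder(word_alpha, as.vector(counts))
print(levels(word_alpha))  # alphabetical order
print(levels(word_by_n))   # ascending by count, so coord_flip() puts the top word last
```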

Question: Which words are more commonly used when the source is Android versus iPhone?
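One possible way to approach this question, assumed here rather than prescribed by the notes, is a smoothed log ratio: for each word, compare its share of Android words with its share of iPhone words, adding 1 to every count so a word never used on one device does not produce log(0). The base-R sketch below uses hypothetical (source, word) pairs.

```r
# Hypothetical (source, word) pairs standing in for tweet_words
sources <- c("Android", "Android", "Android", "iPhone", "iPhone", "iPhone", "iPhone")
words   <- c("weak", "weak", "great", "great", "join", "join", "tomorrow")

tab     <- table(word = words, source = sources)
android <- tab[, "Android"]
iphone  <- tab[, "iPhone"]

# Smoothed share of each device's words, compared on a log2 scale
log_ratio <- log2(((android + 1) / (sum(android) + 1)) /
                  ((iphone + 1) / (sum(iphone) + 1)))

# Positive: relatively more common on Android; negative: more common on iPhone
print(sort(log_ratio, decreasing = TRUE))
```

A log2 ratio of 1 means a word is twice as common, relative to each device's total, on Android as on iPhone.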

Sentiment analysis of words used in Trump’s Tweets

The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were made manually through crowdsourcing.

In the lexicon, both the two sentiments and the eight emotions are stored in the sentiment column, and a single word can have one sentiment association and several emotion associations. For example, "abandon" falls into three categories:

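library(tidytext)
# Newer tidytext releases no longer ship NRC inside `sentiments`; there, use
# get_sentiments("nrc") (requires the textdata package) instead of this filter/select chain
sentiments %>% filter(lexicon == "nrc") %>% filter(word == "abandon") %>% select(-score) %>% head()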

## # A tibble: 3 x 3
##      word sentiment lexicon
##     <chr>     <chr>   <chr>
## 1 abandon      fear     nrc
## 2 abandon  negative     nrc
## 3 abandon   sadness     nrc
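Because one word can carry several associations, joining tokens to the lexicon can turn one token into several rows, and tokens not in the lexicon disappear. The base-R sketch below shows this with merge() as an inner join. The three "abandon" rows come from the output above; the "rally" row and the token table are made up for illustration and are not NRC entries.

```r
# Toy lexicon: abandon's rows match the output above; "rally" is hypothetical
lexicon <- data.frame(
  word = c("abandon", "abandon", "abandon", "rally"),
  sentiment = c("fear", "negative", "sadness", "positive"),
  stringsAsFactors = FALSE)

# Hypothetical tokens from two tweets
tokens <- data.frame(
  id_str = c("1", "1", "2"),
  word = c("abandon", "today", "rally"),
  stringsAsFactors = FALSE)

# Inner join on word: "abandon" becomes three rows, "today" is dropped
joined <- merge(tokens, lexicon, by = "word")
print(joined)

# Sentiment counts per tweet, like count(sentiment, id_str)
print(table(joined$id_str, joined$sentiment))
```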

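# NRC word-sentiment pairs (newer tidytext: nrc <- get_sentiments("nrc"))
nrc <- sentiments %>%
  filter(lexicon == "nrc") %>%
  dplyr::select(word, sentiment)

# one row per tweet, with the total number of words tweeted from its device
sources <- tweet_words %>%
  group_by(source) %>%
  mutate(total_words = n()) %>%
  ungroup() %>%
  distinct(id_str, source, total_words)

# number of words with each sentiment, per device
by_source_sentiment <- tweet_words %>%
  inner_join(nrc, by = "word") %>%
  count(sentiment, id_str) %>%
  ungroup() %>%
  # give each tweet an explicit 0 for sentiments it has no words in
  complete(sentiment, id_str, fill = list(n = 0)) %>%
  inner_join(sources, by = "id_str") %>%
  group_by(source, sentiment, total_words) %>%
  summarize(words = sum(n)) %>%
  ungroup()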

head(by_source_sentiment)

## # A tibble: 6 x 4
##    source    sentiment total_words words
##     <chr>        <chr>       <int> <dbl>
## 1 Android        anger       36134  2228
## 2 Android anticipation       36134  2240
## 3 Android      disgust       36134  1537
## 4 Android         fear       36134  2057
## 5 Android          joy       36134  1777
## 6 Android     negative       36134  4040
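The words column holds raw counts, so Android and iPhone are only comparable after dividing by each device's total_words. The base-R sketch below computes these rates for the six Android rows printed above; the same division would apply to the iPhone rows, which are not shown here.

```r
# The six Android rows printed above
android <- data.frame(
  sentiment   = c("anger", "anticipation", "disgust", "fear", "joy", "negative"),
  total_words = 36134,
  words       = c(2228, 2240, 1537, 2057, 1777, 4040),
  stringsAsFactors = FALSE)

# Share of all Android words that carry each NRC association
android$rate <- android$words / android$total_words
print(android[order(-android$rate), ])
```

For example, about 6.2% of Android words (2228 of 36134) are associated with anger.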

Modelling