Donald Trump tweets a lot. Some of his tweets have an angry sentiment or contain insults, and some do not. Trump seems to tweet from a Samsung Galaxy when he is insulting people, places, and things, and from other devices, such as an iPhone, when he is not.
Like the staff of many politicians, Trump’s staff members often post tweets to his account. Are the tweets that Trump writes himself angrier than the tweets written by his staff?
Trump’s last tweet from an Android was on March 25, 2017.
Brendan Brown created an archive of Trump’s Twitter posts that can be used as a data source.
This analysis is based on David Robinson’s post “Trump’s Android and iPhone tweets, one year later”.
I downloaded a JSON file of all Trump’s tweets from Brown’s archive on September 28, 2017.
Question: Is there evidence that usage patterns differ according to the device used to tweet?
library(jsonlite)
library(tidyverse)
library(stringr)
library(lubridate)
url <- "http://utstat.toronto.edu/~nathan/teaching/sta4002/Class4/trumptweets.JSON"
trumptweets <- fromJSON(url)
# Keep the columns used below, parse the timestamps, and keep only tweets
# sent before March 26, 2017 (the last Android tweet was on March 25, 2017).
tweets <- trumptweets %>%
  select(source, text, created_at, id_str) %>%
  mutate(created_at = parse_date_time(created_at, "a b! d! H!:M!:S! z!* Y!")) %>%
  filter(created_at < "2017-03-26") %>%
  mutate(source = ifelse(str_detect(source, "iPhone"), "iPhone",
                         ifelse(str_detect(source, "Android"), "Android", "NA"))) %>%
  filter(source %in% c("iPhone", "Android"))
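The recoding of `source` can be checked on a few example strings before it is applied to the full archive. The sketch below is an illustration only: the `source` strings are made up to resemble the HTML anchor text found in tweet metadata, and `toy` and `toy_devices` are hypothetical names. It uses `case_when()` as an alternative to the nested `ifelse()` and a real missing value instead of the string "NA".

```r
library(dplyr)
library(stringr)

# Hypothetical `source` strings, made up for illustration
toy <- tibble(source = c(
  "<a href=\"http://twitter.com/download/android\">Twitter for Android</a>",
  "<a href=\"http://twitter.com/download/iphone\">Twitter for iPhone</a>",
  "<a href=\"http://twitter.com\">Twitter Web Client</a>"
))

toy_devices <- toy %>%
  mutate(device = case_when(
    str_detect(source, "iPhone")  ~ "iPhone",
    str_detect(source, "Android") ~ "Android",
    TRUE                          ~ NA_character_  # any other client
  )) %>%
  filter(!is.na(device))  # drop tweets from other clients

toy_devices$device
```

Only the Android and iPhone rows remain; the web client row is dropped.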
Questions:
Interpret the plots below.
What information do the plots below give you about the patterns and content of Trump’s tweets?
library(lubridate)
library(scales)
tweets %>%
  count(source, hour = hour(with_tz(created_at, "EST"))) %>%
  # Group by device so that each device's percentages sum to 100%
  group_by(source) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup() %>%
  ggplot(aes(hour, percent, color = source)) +
  geom_line() +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Hour of day (EST)",
       y = "% of tweets",
       color = "")
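When interpreting the hourly plot, it matters whether `percent` is a share of all tweets or a share of each device's tweets; that depends on whether the counts are grouped by `source` before dividing. A minimal sketch on made-up counts (the numbers and the names `toy_counts` and `toy_percent` are hypothetical) shows the within-device version:

```r
library(dplyr)

# Made-up hourly counts, for illustration only
toy_counts <- tibble(
  source = c("Android", "Android", "iPhone", "iPhone"),
  hour   = c(8, 20, 8, 20),
  n      = c(30, 10, 5, 15)
)

toy_percent <- toy_counts %>%
  group_by(source) %>%              # divide within each device
  mutate(percent = n / sum(n)) %>%
  ungroup()

toy_percent
```

With grouping, each device's shares sum to 1, so the two lines are comparable even when one device has many more tweets than the other.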
library(tidyverse)
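# Drop quoted tweets (text starting with a quotation mark), then flag tweets with a t.co link.
# fixed() matches the literal string "t.co"; as a regex, "." would match any character.
tweet_picture_counts <- tweets %>%
  filter(!str_detect(text, '^"')) %>%
  count(source,
        picture = ifelse(str_detect(text, fixed("t.co")), "Picture/link", "No picture/link"))
ggplot(tweet_picture_counts, aes(source, n, fill = picture)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "", y = "Number of tweets", fill = "")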
Trump’s tweets can be tokenized with stop words removed.
library(tidytext)
library(tidyverse)
library(stringr)
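# Split on any character that is not a letter, digit, #, @, or an apostrophe
# followed by one of those characters
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
tweet_words <- tweets %>%
  filter(created_at < "2017-03-26") %>%
  filter(!str_detect(text, '^"')) %>%
  # Remove links and the HTML entity for "&" before tokenizing
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))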
head(tweet_words)
## # A tibble: 6 x 4
## source created_at id_str word
## <chr> <dttm> <chr> <chr>
## 1 Android 2013-02-06 01:53:40 298972696438521857 @sherrieshepherd
## 2 Android 2013-02-06 01:53:40 298972696438521857 nice
## 3 Android 2013-02-06 01:53:40 298972696438521857 comments
## 4 Android 2013-02-06 01:53:40 298972696438521857 view
## 5 Android 2013-02-06 01:53:40 298972696438521857 terrific
## 6 Android 2013-02-15 04:18:36 302270661387231232 @rosscooker
tweet_words %>%
  count(word) %>%
  top_n(50, n) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Word", y = "Number of times word occurs in tweets")
Clean up the plot by ordering the words by frequency and showing only the 20 most common.
tweet_words %>%
  count(word) %>%
  mutate(word = reorder(word, n)) %>%
  top_n(20, n) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Word", y = "Number of times word occurs in tweets")
Question: Which words are more commonly used when the source is Android versus iPhone?
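One possible way to approach this question is to compare each word's share of the Android vocabulary with its share of the iPhone vocabulary on a log scale. The sketch below is an illustration under stated assumptions, not the required answer: the words and counts are made up, `toy_words` and `word_ratios` are hypothetical names, and adding 1 to every count is a smoothing choice made so that a word absent from one device does not produce a division by zero.

```r
library(dplyr)
library(tidyr)

# Made-up tokens, for illustration only
toy_words <- tibble(
  source = c(rep("Android", 6), rep("iPhone", 6)),
  word   = c("crooked", "crooked", "crooked", "badly", "badly", "join",
             "join", "join", "join", "#maga", "#maga", "crooked")
)

word_ratios <- toy_words %>%
  count(word, source) %>%
  pivot_wider(names_from = source, values_from = n,
              values_fill = list(n = 0L)) %>%
  mutate(
    android_share = (Android + 1) / sum(Android + 1),  # add-one smoothing
    iphone_share  = (iPhone + 1) / sum(iPhone + 1),
    logratio      = log2(android_share / iphone_share)
  ) %>%
  arrange(desc(logratio))

word_ratios
```

Positive values of `logratio` indicate words relatively more common on Android, and negative values indicate words relatively more common on iPhone. The same pipeline could be run on `tweet_words`, ideally after dropping very rare words.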
The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were made manually through crowdsourcing.
A word can be associated with a sentiment and with several emotions at once. For example, "abandon" falls into three categories:
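library(tidytext)
# Assumes an older tidytext release in which `sentiments` has a `lexicon` column;
# newer releases provide the NRC lexicon through get_sentiments("nrc") instead.
sentiments %>% filter(lexicon == "nrc") %>% filter(word == "abandon") %>% select(-score) %>% head()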
## # A tibble: 3 x 3
## word sentiment lexicon
## <chr> <chr> <chr>
## 1 abandon fear nrc
## 2 abandon negative nrc
## 3 abandon sadness nrc
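# NRC lexicon: one row per (word, sentiment or emotion) pair
nrc <- sentiments %>%
  filter(lexicon == "nrc") %>%
  dplyr::select(word, sentiment)
# Total number of words per device, attached to every tweet id
sources <- tweet_words %>%
  group_by(source) %>%
  mutate(total_words = n()) %>%
  ungroup() %>%
  distinct(id_str, source, total_words)
# Number of words in each sentiment or emotion category, by device
by_source_sentiment <- tweet_words %>%
  inner_join(nrc, by = "word") %>%
  count(sentiment, id_str) %>%
  ungroup() %>%
  # Add zero counts for categories that do not occur in a tweet
  complete(sentiment, id_str, fill = list(n = 0)) %>%
  inner_join(sources, by = "id_str") %>%
  group_by(source, sentiment, total_words) %>%
  summarize(words = sum(n)) %>%
  ungroup()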
head(by_source_sentiment)
## # A tibble: 6 x 4
## source sentiment total_words words
## <chr> <chr> <int> <dbl>
## 1 Android anger 36134 2228
## 2 Android anticipation 36134 2240
## 3 Android disgust 36134 1537
## 4 Android fear 36134 2057
## 5 Android joy 36134 1777
## 6 Android negative 36134 4040
Questions:
Describe and fit at least one statistical model to compare word sentiments and emotions in Android versus iPhone tweets.
What do you conclude?
What are the limitations of your model?
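As a starting point, one candidate model treats the number of words in a given emotion category as a Poisson count whose exposure is the total number of words from that device, and compares the two rates. The sketch below uses base R's `poisson.test()` on made-up counts; the numbers and the names `emotion_words` and `total_words` are hypothetical and are not taken from `by_source_sentiment`. It is one option among several (a Poisson regression with an offset would be another).

```r
# Made-up counts for a single emotion, for illustration only
emotion_words <- c(200, 100)    # words tagged with the emotion: Android, iPhone
total_words   <- c(4000, 4000)  # all words from each device:    Android, iPhone

# Two-sample comparison of Poisson rates (Android rate / iPhone rate)
fit <- poisson.test(emotion_words, total_words)

fit$estimate  # estimated rate ratio
fit$conf.int  # 95% confidence interval for the rate ratio
fit$p.value   # test of the null hypothesis that the ratio equals 1
```

A rate ratio above 1 would suggest the emotion is relatively more frequent in Android tweets. This model assumes words occur independently, which is doubtful for words within the same tweet, and repeating the test for all ten categories raises a multiple-comparisons issue.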