Using Twitter API to Solve the GOAT Debate: Michael Jordan vs. LeBron James

Claremont Colleges

Scholarship @ Claremont

CMC Senior Theses CMC Student Scholarship

2021


Jordan Trey Leonard


Part of the Applied Mathematics Commons, Data Science Commons, and the Statistics and

Probability Commons

Recommended Citation

Leonard, Jordan Trey, "Using Twitter API to Solve the GOAT Debate: Michael Jordan vs. LeBron James" (2021). CMC Senior Theses. 2733.

https://scholarship.claremont.edu/cmc_theses/2733

This Open Access Senior Thesis is brought to you by Scholarship@Claremont. It has been accepted for inclusion in this collection by an authorized administrator.

Claremont McKenna College

Using Twitter API to Solve the GOAT Debate:

Michael Jordan vs. LeBron James

submitted to

Professor Mark Huber

Jordan Leonard

for

Senior Thesis in Mathematics

May 3, 2021

Introduction

What is Sentiment Analysis?

Sentiment analysis (Feldman 2013) is a unique data mining (Hand and Adams

2014) tool that refers to the use of natural language processing (Chowdhury 2003),

text analysis (Bernard and Ryan 1998), computational linguistics (Grishman 1986),

and biometrics (Jain, Flynn, and Ross 2007) to identify, extract, quantify, and study

subjective information. It is commonly used to gather information on public opinion

by breaking down text to determine whether it contains positive or negative sentiment.

Many studies tend to gather their text data from social media platforms due to the

large number of users and available content. In this case, I use sentiment analysis

(Feldman 2013) to analyze tweet data collected from Twitter in RStudio (Allaire 2012).

Problem Description

In this paper, I gather and analyze tweets from real Twitter users to compare the social sentiment toward professional athletes in the National Football League (NFL), National Basketball Association (NBA), and Major League Baseball (MLB), as well as athletes who play National Collegiate Athletic Association (NCAA) Division 1 basketball. My analysis began with an interest in settling the dispute over which athletes deserve the label of GOAT, or “Greatest of All Time.” Every professional athlete is extremely talented and reached the professional level for a reason, but the label of GOAT is reserved for the best of the best. In the NBA, the discussion tends to come down to comparing LeBron James and Michael Jordan. I therefore decided to employ sentiment analysis (Feldman 2013) and a Twitter API (Makice 2009) in an attempt to find some sort of resolution. The analysis is also timely: a documentary of Michael Jordan's NBA career called The Last Dance (“Everything You Need to Know about ‘The Last Dance’” 2020) was released in April of 2020, and LeBron James won an NBA Championship with the Los Angeles Lakers in October of 2020. With these two major events occurring in the same year, I hoped there would be enough material to compare both athletes on a reasonably even scale.

Data Gathering Process

Accessing tweets freely requires an application process, after which one is granted confidential keys to the Twitter API (Makice 2009) for a specific project. Once granted, these keys serve as search tokens within the rtweet package in RStudio (Allaire 2012), which runs the API calls that enable tweet collection. Using the search_tweets() function, I can supply a keyword to search for across the Twitter database and receive the tweets that include it. However, the Twitter API (Makice 2009) access that I was granted is limited: I can search for at most 18,000 tweets in a given 15-minute period, and the tweets searched for a given keyword must have been posted in the past 6-9 days. This limited time frame was unfortunate, as I had hoped to access tweets from the past couple of years. A larger time frame would have allowed me to see how the social sentiment around athletes fluctuated with their athletic performances and achievements within their respective sporting seasons.

Although the tweet data only covers the 6-9 days before the search_tweets() function is called, I was still able to gather meaningful results for the athletes in my study because the NBA, MLB, and NCAA basketball seasons were ongoing. The NFL season was not in progress like the other sports, but there was still valuable tweet data to collect and analyze. In the data gathering code, the search_tweets() call is simple and readable: it uses a search query argument, q = [Name] GOAT OR [Name] Goat, for every athlete. It also includes the arguments include_rts = FALSE, -filter = "replies", and lang = en. These arguments specify the desired number of tweets to be returned while filtering out retweets and replies and collecting only tweets that are in English. They keep the data concise and focused on the portions that are necessary for further analysis.
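The query pattern described above can be sketched in R as follows. This is a minimal illustration, not the thesis code: build_goat_query() and collect_goat_tweets() are hypothetical helper names, the default of 18,000 tweets is an assumed value taken from the rate limit mentioned earlier, and the reply filter is left out because its exact form is not shown in the text. The wrapper is only defined, not called, since running it requires approved API credentials.

```r
# Hypothetical helper: builds the query pattern described in the text,
# "[Name] GOAT OR [Name] Goat".
build_goat_query <- function(name) {
  paste0(name, " GOAT OR ", name, " Goat")
}

# Hypothetical wrapper around rtweet::search_tweets(). It is defined but not
# called here because a real call needs Twitter API keys.
collect_goat_tweets <- function(name, n_tweets = 18000) {
  rtweet::search_tweets(
    q = build_goat_query(name),
    n = n_tweets,          # assumed value; the text does not state n
    include_rts = FALSE,   # drop retweets
    lang = "en"            # English tweets only
  )
}

build_goat_query("Michael Jordan")
# "Michael Jordan GOAT OR Michael Jordan Goat"
```

Keeping the query construction in its own function makes it easy to reuse the same pattern for every athlete in the study.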

Analysis

NBA

Exploring Tweets

Read in Data

Using the search_tweets() function from the rtweet package, we are able to collect tweet data regarding Michael Jordan. Since the focus is on tweets that contain the term GOAT, the search query q = “Michael Jordan GOAT OR Michael Jordan Goat” is adopted to narrow the scope of the search. The results are stored in the variable mj_goat_tw, which contains 279 observations of 91 variables. This means there are 279 tweets in total, each described by 91 column variables such as “user_id,” “created_at,” and “text.” To ensure that the analysis is done on the same set of data rather than on repeatedly recollected tweets, the write_as_csv() function stores the contents of the mj_goat_tw variable in a CSV file. This not only saves the data in a safe, readable format but also makes it possible to read the data back in for each session using the read.csv() function.

### Michael Jordan

mj_goat_tw <- read.csv("mj_goat_tw.csv", fill = TRUE)

mj_goat_tw2 <- read.csv("mj_goat_tw.csv", fill = TRUE)
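The save-once, reload-later workflow can be illustrated with a small round trip. This sketch uses base R's write.csv() in place of rtweet's write_as_csv() so that it runs without API access, and the tweet table is invented toy data.

```r
# Toy stand-in for a collected tweet table (values invented for illustration).
tweets <- data.frame(
  user_id = c("1", "2", "3"),
  text = c("MJ is the GOAT", "LeBron is the GOAT", "Jordan, no debate"),
  favorite_count = c(5L, 0L, 12L),
  stringsAsFactors = FALSE
)

# Save once, then reload the same snapshot in later sessions.
csv_path <- tempfile(fileext = ".csv")
write.csv(tweets, csv_path, row.names = FALSE)
reloaded <- read.csv(csv_path, fill = TRUE, stringsAsFactors = FALSE)

dim(reloaded)   # same number of rows and columns as the original table
```

Working from a saved file means every later step sees the same tweets, even though a fresh API search would return different results.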

Now that the tweet data for Michael Jordan is collected, we can explore it by performing exploratory data analysis (EDA). This helps us understand the collected data before performing further analyses. In the following code, we pipe the data into sample_n() to draw a sample of size 3 and then select column variables of interest. The columns considered are “created_at,” “screen_name,” “text,” “favorite_count,” and “retweet_count,” while the output printed here shows “created_at,” “screen_name,” and “favorite_count.” Inspecting the “text” column confirms that it contains the terms “Michael Jordan” and GOAT. This is important because it shows that the search_tweets() function from the data gathering process is returning relevant results.

mj_goat_tw %>%

sample_n(3) %>%

select(created_at, screen_name, favorite_count)

## created_at screen_name favorite_count

## 1 2021-03-30 02:16:03 not_andrew____ 1

## 2 2021-03-31 03:31:48 ulforicks 1

## 3 2021-04-02 22:48:08 Jimmyrealdeal 0

This process can be replicated for the other NBA players within our sample to

conﬁrm that the tweets we analyze are in fact referencing each speciﬁed player.

Below, we can see the sample of tweets for LeBron James, James Harden, Kevin

Durant, and Kobe Bryant which used the same coding process on their respective

dataset of tweets.

Since the data outputs contain the “created_at” column variable, which records the date and time that each tweet was published, it is worth looking at how frequently users who are invested in the NBA “GOAT” conversation tweet.

Timeline of Tweets - Frequency Plot

The ts_plot() function allows us to investigate the frequency of tweets posted between “2021-03-26 06:25:39” and “2021-04-03 03:12:00.” In the code below, we specify the time interval over which tweets are aggregated, which is where the "hours" and "days" arguments come into effect. Both plots display a spike in tweet frequency on “2021-03-28,” which totals more than 50 tweets for that day.

### Michael Jordan

ts_plot(mj_goat_tw, "hours") +

labs(x = NULL, y = NULL,

title = "Frequency of tweets with Michael Jordan GOAT Keyword",

caption = "Data collected from Twitter's API via rtweet") +

theme_minimal()

[Figure: Frequency of tweets with Michael Jordan GOAT Keyword, aggregated by hour, Mar 27 to Apr 02. Data collected from Twitter's API via rtweet.]

ts_plot(mj_goat_tw2, "days") +

labs(x = NULL, y = NULL,

title = "Frequency of tweets with Michael Jordan GOAT Keyword",

caption = "Data collected from Twitter's API via rtweet") +

theme_minimal()

[Figure: Frequency of tweets with Michael Jordan GOAT Keyword, aggregated by day, Mar 27 to Apr 02. Data collected from Twitter's API via rtweet.]

Similarly, we can investigate LeBron James’ tweet frequency for tweets between “2021-03-26 23:49:59” and “2021-04-04 02:17:50.” The following frequency plot has spikes on “2021-03-28” and “2021-03-31,” with totals of about 70 and 55 tweets for those days, respectively.

[Figure: Frequency of tweets with LeBron James GOAT Keyword, Mar 28 to Apr 03. Data collected from Twitter's API via rtweet.]

Next, we can look at James Harden’s tweet frequency between “2021-03-26 23:49:59” and “2021-04-04 02:17:50.” The following frequency plot has a spike on “2021-03-31,” with a total of about 25 tweets for that day.

[Figure: Frequency of tweets with James Harden GOAT Keyword, Mar 28 to Apr 03. Data collected from Twitter's API via rtweet.]

Then, we can look at Kevin Durant’s tweet frequency between “2021-03-27 21:57:47” and “2021-04-04 02:26:39.” The following frequency plot has spikes on “2021-03-28” and “2021-03-30,” with totals of about 20 and 25 tweets for those days, respectively.

[Figure: Frequency of tweets with Kevin Durant GOAT Keyword, Mar 29 to Apr 04. Data collected from Twitter's API via rtweet.]

Finally, Kobe Bryant’s tweets fall between “2021-04-05 06:16:15” and “2021-04-13 03:00:21.” The following frequency plot has spikes on “2021-04-06,” “2021-04-09,” and “2021-04-12,” with totals of about 6, 10, and 12 tweets for those days, respectively.

[Figure: Frequency of tweets with Kobe Bryant GOAT Keyword, Apr 06 to Apr 12. Data collected from Twitter's API via rtweet.]

When comparing the hourly and daily intervals, the daily interval provides an interesting view of the day-to-day frequency, but the hourly interval offers better insight since we are dealing with a time frame of only 6-9 days. For a dataset of tweets spanning a month or more, the daily interval would be the more effective choice.

The frequency plots above offer valuable insight into the NBA GOAT conversation: these athletes are consistently being “talked” about throughout the time frames of their datasets. Since each athlete had at least one spike in tweet frequency, it may be beneficial to understand the sentiment polarity during those periods. This would allow us to determine whether those spikes were positive or negative and how they compare to the other days in the dataset.
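The difference between hourly and daily aggregation can be sketched in base R by truncating each timestamp before counting. This is only an illustration of the idea under invented toy timestamps; it is not how ts_plot() is implemented.

```r
# Toy timestamps (invented for illustration), stored as text like a CSV column.
created_at <- c("2021-03-28 01:15:00", "2021-03-28 01:45:00",
                "2021-03-28 17:05:00", "2021-03-29 09:30:00")
stamps <- as.POSIXct(created_at, tz = "UTC")

# Count tweets per day and per hour by truncating each timestamp.
per_day  <- table(format(stamps, "%Y-%m-%d"))
per_hour <- table(format(stamps, "%Y-%m-%d %H:00"))

per_day    # one count per calendar day
per_hour   # one count per hour that contains at least one tweet
```

With only a few days of data, the daily table has very few entries, which is why the hourly view is more informative for a 6-9 day window.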

Top Tweeting Location

Another variable that could play a role in public sentiment toward an athlete is where a Twitter user lives. Sports fans tend to develop competitive attitudes, which may lead to resentment toward opponents. Some fans follow the teams from their hometown or state while others do not; either way, location is a factor worth examining.

In the following code chunk, we filter the mj_goat_tw variable to remove any NA values in the location column variable and count the tweets from each location. NA values are excluded because they provide no useful information, and removing them gives a cleaner summary. From the first output, we can see that there were 104 tweets from a blank location, 16 tweets from Chicago, IL, and 4 tweets each from the United States and Washington, DC. The 104 tweets from a blank location were not represented as NA values, and so were not removed, because the users had not enabled the location feature when they published the tweet. The missing location value was therefore stored as a blank cell when the data was read in from the CSV file using the fill = TRUE argument. To correct this, we run the assignment below on mj_goat_tw2 to replace those blank cells with NA values so that the filter properly selects the non-NA values. Comparing the two outputs shows that the blank cells have been removed.

mj_goat_tw2[mj_goat_tw2 == ""] <- NA

mj_goat_tw %>%

filter(!is.na(location)) %>%

count(location, sort = TRUE) %>%

top_n(5) %>%

kable()

location n

(blank) 104

Chicago, IL 16

United States 4

Washington, DC 4

Boston, MA 3

mj_goat_tw2 %>%

filter(!is.na(location)) %>%

count(location, sort = TRUE) %>%

top_n(5) %>%

kable()

location n

Chicago, IL 16

United States 4

Washington, DC 4

Boston, MA 3

Charlotte, NC 2

Dallas, TX 2

Detroit, MI 2

Downtown 2

Lagos, Nigeria 2

Los Angeles, CA 2

Miami, FL 2

San Francisco, CA 2

Somewhere 2

Your head rent free 2
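The effect of recoding blank cells as NA can be reproduced on a small example. The sketch below uses base R and an invented location column; it mirrors the assignment used above but is not the thesis data.

```r
# Toy location column (invented values); "" mimics users with no location set.
tweets <- data.frame(
  location = c("Chicago, IL", "", "Chicago, IL", "Boston, MA", "", ""),
  stringsAsFactors = FALSE
)

# Blank strings are not NA, so an is.na() filter would keep them.
n_na_before <- sum(is.na(tweets$location))

# Recode blanks as NA, then count the remaining locations.
tweets[tweets == ""] <- NA
n_na_after <- sum(is.na(tweets$location))
location_counts <- sort(table(tweets$location), decreasing = TRUE)

location_counts
```

Here table() drops NA values by default, so only the two real locations remain in the counts.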

From the following bar chart, we observe that our dataset includes Twitter users from all over the United States and even reaches users as far away as Lagos, Nigeria. It makes sense that Chicago would be the top location, since Michael Jordan played for the Chicago Bulls for 14 years, which was essentially his entire career.

# Omits NA Locations

mj_goat_tw2 %>%

count(location, sort = TRUE) %>%

mutate(location = reorder(location, n)) %>%

na.omit() %>%

top_n(12) %>%

ggplot(aes(x = location, y = n)) +

geom_col(fill = "red", color = "black") +

coord_flip() +

labs(x = "Location", y = "Count",

title = "Top Locations of Michael Jordan GOAT Tweets")

[Figure: bar chart “Top Locations of Michael Jordan GOAT Tweets” (axes: Location, Count; count axis from 0 to 15). Chicago, IL is the top location, followed by Washington, DC, United States, and Boston, MA.]

As with the bar chart for Michael Jordan, every other NBA athlete has top locations that span the U.S. and even extend to other countries and continents. However, some locations listed by users are not real locations but appeared often enough to make the list. Regardless, it is worthwhile to explore all aspects of our data even if that leads to locations such as “Your head rent free.”

[Figure: bar chart “Top Locations of LeBron James GOAT Tweets” (axes: Location, Count). Locations shown: 2011 and 2007 finals; Boise, ID; Florida, USA; Lagos, Nigeria; Michigan, USA; San Antonio, TX; Chicago, IL; Iowa, USA; Washington, DC; United States.]

[Figure: bar chart “Top Locations of James Harden GOAT Tweets” (axes: Location, Count). Locations shown: Houston, TX; nfl; Texas, USA; NEVADA; Your head rent free.]

[Figure: bar chart “Top Locations of Kevin Durant GOAT Tweets” (axes: Location, Count). Locations shown: 16 he/him; Boise, ID; Cleveland, OH; Los Angeles, CA.]

[Figure: bar chart “Top Locations of Kobe Bryant GOAT Tweets” (axes: Location, Count). Locations shown: Agoura Hills, CA; Buena Park, CA.]

Like other social media applications, Twitter allows users to like/favorite tweets. If a tweet has a considerable number of likes, it is reasonable to assume that others share the same opinion and agree with what is being communicated. The output below shows the top three tweets by likes/favorites for Michael Jordan. Each of these tweets relays a positive attitude toward Michael Jordan being considered the GOAT; however, this may not always be the case. Eventually, we will investigate the overall attitude toward Michael Jordan and the other NBA players to see just how positive and/or negative it is.

## created_at screen_name favorite_count

## 1 2021-03-28 22:22:59 AllThingsSnyder 387

## 2 2021-03-26 21:50:08 BurnerKhris 356

## 3 2021-04-01 15:17:43 undisputed 252
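One way to obtain a top-three listing of this kind is to sort by the like count and keep the first three rows. The following is a base R sketch on invented data, not the code used to produce the output above.

```r
# Toy tweet table (invented values for illustration).
tweets <- data.frame(
  screen_name = c("user_a", "user_b", "user_c", "user_d"),
  favorite_count = c(12L, 387L, 3L, 252L),
  stringsAsFactors = FALSE
)

# Sort by likes in descending order and keep the first three rows.
top_liked <- head(tweets[order(-tweets$favorite_count), ], 3)
top_liked
```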

Word Cloud Analysis

Another text analysis involves creating a word cloud (Heimerl et al. 2014), which allows us to visualize common words within tweets. What makes this visualization method distinctive is that the size of each word is determined by its frequency, which signals its relevance to the overall Twitter dataset. To build the dataset of top tweeted words, each tweet must be cleaned by removing unnecessary characters and symbols and then split into individual words. In this case, we filter using regular expressions (Li et al. 2008). Stop words, which are commonly used words viewed as unimportant to the text analysis, must also be filtered out to shift the focus onto the more important words. Once the frequency of each word is counted, we can apply the wordcloud() function from the wordcloud package.

## Top Words

### Michael Jordan

# Requires dplyr, stringr, and tidytext.
data("stop_words")
words_mj_goat <- mj_goat_tw %>%
  # Remove HTML entities, URLs, and non-ASCII characters from each tweet
  mutate(text = str_remove_all(text, "&amp;|&lt;|&gt;"),
         text = str_remove_all(text,
                               "\\s?(f|ht)(tp)(s?)(://)([^\\.]*)[\\.|/](\\S*)"),
         text = str_remove_all(text, "[^\x01-\x7F]")) %>%
  # Split tweets into one word per row
  unnest_tokens(word, text, token = "tweets") %>%
  # Drop stop words, tokens without letters, hashtags, and mentions
  filter(!word %in% stop_words$word,
         !word %in% str_remove_all(stop_words$word, "'"),
         str_detect(word, "[a-z]"),
         !str_detect(word, "^#"),
         !str_detect(word, "@\\S+")) %>%
  count(word, sort = TRUE)
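The cleaning and counting steps can also be followed on a few toy tweets in base R. This is a simplified sketch: the tweets and the tiny stop-word list are invented, the URL pattern is a shortened stand-in for the one above, and splitting on non-letters is a rough substitute for unnest_tokens().

```r
# Toy tweets (invented for illustration).
tweets <- c("Michael Jordan is the GOAT https://t.co/abc123",
            "LeBron &amp; Jordan: the GOAT debate never ends",
            "jordan > everyone")

# Tiny hypothetical stop-word list; tidytext's stop_words is far larger.
stop_list <- c("is", "the", "never", "everyone")

clean <- gsub("&amp;|&lt;|&gt;", "", tweets)             # drop HTML entities
clean <- gsub("https?://\\S+", "", clean)                # drop URLs (simplified)
clean <- gsub("[^\x01-\x7F]", "", clean)                 # drop non-ASCII characters
tokens <- unlist(strsplit(tolower(clean), "[^a-z']+"))   # split on non-letters
tokens <- tokens[tokens != "" & !tokens %in% stop_list]  # drop blanks, stop words

word_counts <- sort(table(tokens), decreasing = TRUE)
word_counts
```

The resulting table plays the same role as the word and n columns passed to wordcloud(): each word's count determines how large it would be drawn.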

Because word size reflects frequency, we can confidently say that “Michael,” “Jordan,” and GOAT are the top words within the dataset. These words occur 287, 287, and 267 times, respectively, and the next most frequent word is “LeBron,” with a frequency of 73. This outcome was expected, since the NBA GOAT debate usually compares Michael Jordan and LeBron James. In LeBron’s dataset of top words, Michael Jordan’s name is the seventh-most frequent word, with 57 occurrences.

Note: The following figures may contain inappropriate language, which is included to illustrate the prevalence of such terms within the dataset.

words_mj_goat %>%

with(wordcloud(word, n, random.order = FALSE,

max.words = 100, colors = "red"))

[Figure: word cloud for the Michael Jordan GOAT tweets (up to 100 words, in red). Among the largest words: jordan, michael, goat, lebron, basketball, james, nba.]

words_lebron_goat %>%

with(wordcloud(word, n, random.order = FALSE,

max.words = 100, colors = "purple"))

[Figure: word cloud for the LeBron James GOAT tweets (up to 100 words, in purple). Among the largest words: lebron, james, goat, team, nets, basketball, jordan.]

words_harden_goat %>%

with(wordcloud(word, n, random.order = FALSE,

max.words = 100, colors = "black"))

[Figure: word cloud for the James Harden GOAT tweets (up to 100 words, in black). Among the largest words: james, harden, goat.]

words_kd_goat %>%

with(wordcloud(word, n, random.order = FALSE,

max.words = 100, colors = "gray1"))

[Figure: word cloud for the Kevin Durant GOAT tweets (up to 100 words, in gray). Among the largest words: durant, kevin, goat, lebron, basketball, james, nba.]

words_kobe_goat %>%

with(wordcloud(word, n, random.order = FALSE,

max.words = 100, colors = "yellow2"))

[Figure: word cloud for the Kobe Bryant GOAT tweets (up to 100 words, in yellow). Among the largest words: kobe, bryant, goat, lebron, jordan.]

It is interesting to note that most of the word clouds contain words referencing

other elite athletes that could be considered the GOAT in their respective sports.

Word Networks

After performing some initial exploratory data analysis on the datasets of our NBA athletes, we can dive deeper into text analyses such as bigram and trigram (Martin, Liermann, and Ney 1998) analysis. We have used the unnest_tokens() function to tokenize by word, but we can also use it to tokenize by consecutive sequences of words, called n-grams (Cavnar, Trenkle, and others 1994). By determining how often one word is followed by another, we can model the relationship between them. This is done by adding the token = “ngrams” argument to unnest_tokens() and setting the n argument to the number of words we wish to capture in each n-gram.
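To make the idea of an n-gram concrete, the sketch below builds bigrams from a few toy tweets in base R. The helper make_ngrams() is a hypothetical name introduced for illustration; the analysis itself relies on unnest_tokens() with token = "ngrams".

```r
# Toy cleaned tweets (invented for illustration).
tweets <- c("michael jordan is the goat",
            "lebron james is the goat",
            "michael jordan forever")

# Hypothetical helper: split one text into words and paste each run of n
# consecutive words together. n-grams never cross tweet boundaries.
make_ngrams <- function(text, n = 2) {
  words <- strsplit(text, " +")[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

bigrams <- unlist(lapply(tweets, make_ngrams, n = 2))
bigram_counts <- sort(table(bigrams), decreasing = TRUE)
bigram_counts
```

Calling make_ngrams() with n = 3 on the same tweets would produce trigrams such as "michael jordan is" in the same way.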

## Michael Jordan

### Bigram Analysis

mj_goat_tw_paired_words <- mj_goat_tw %>%

select(stripped_text) %>%

unnest_tokens(paired_words, stripped_text, token = "ngrams", n = 2)

head(mj_goat_tw_paired_words %>%

count(paired_words, sort = TRUE), 10) %>%

kable()

paired_words n

michael jordan 256

the goat 155

is the 92

jordan is 73

lebron james 29

of all 27

â â 24

u 0001f410 23

goat michael 22

goat u 22

mj_goat_tw_sep_words <- mj_goat_tw_paired_words %>%

separate(paired_words, c("word1", "word2"), sep = " ")

mj_goat_tw_filtered_01 <- mj_goat_tw_sep_words %>%

filter(!word1 %in% stop_words$word) %>%

filter(!word2 %in% stop_words$word)

# new bigram counts:

mj_goat_tw_bigram_counts <- mj_goat_tw_filtered_01 %>%

count(word1, word2, sort = TRUE)

head(mj_goat_tw_bigram_counts) %>%

kable()

word1 word2 n

michael jordan 256

lebron james 29

â â 24

goat michael 22

undisputed goat 17

tom brady 16

To create Michael Jordan’s bigram word network (Zuo, Zhao, and Xu 2016), we must set n = 2. In the figure below, we can visualize the relationships between pairs of words that form a bigram. Each node represents a word within the filtered dataset, and each connection is represented by an arrow that begins at the first word of the bigram and points to the second. The frequency of each bigram is indicated by the size and boldness of the arrow; compare, for example, the arrow connecting “Michael” and “Jordan” with the arrow connecting “Steph” and “Curry.” It is fascinating to see that there are multiple bigram chains, with the largest located in the bottom left of the figure.

Note: The following figures contain inappropriate language, which is included to illustrate the prevalence of such terms within the dataset.

mj_goat_tw_bigram_counts %>%

filter(n >= 3) %>%

graph_from_data_frame() %>%

ggraph(layout = "fr") +

# `a` is an arrow specification that is defined elsewhere (not shown here)
geom_edge_link(aes(edge_alpha = n, edge_width = n), arrow = a) +

geom_node_point(color = "red", size = 3) +

geom_node_text(aes(label = name), vjust = 1.8, size = 3) +

labs(title = "Bigram Word Network",

subtitle = "Tweets using Michael Jordan GOAT Keyword",

x = "", y = "")

[Figure: Bigram Word Network, Tweets using Michael Jordan GOAT Keyword. Edge legend for n: 100, 150, 200, 250.]

As before, Michael Jordan’s trigram word network can be found by adjusting the unnest_tokens() function such that n = 3. The resulting figure visualizes the relationships among three consecutive words that form a trigram. Each node represents a word within the filtered dataset, and the connections are represented by arrows that run from the first word to the second and then from the second to the third. The frequency of each trigram is again indicated by the size and boldness of the arrow. Unlike the bigram figure, there are fewer trigram chains, and half of them are extremely bold. In the tables of paired words for the bigram and trigram analysis, “â â” and “â â â” appear because they are special characters that were included in the “stripped_text” column during the tweet gathering process.

### Trigram Analysis

mj_goat_tw_tri_paired_words <- mj_goat_tw %>%

select(stripped_text) %>%

unnest_tokens(paired_words, stripped_text, token = "ngrams", n = 3)

head(mj_goat_tw_tri_paired_words %>%

count(paired_words, sort = TRUE), 10) %>%

kable()

paired_words n

michael jordan is 70

is the goat 54


jordan is the 53

â â â 20

goat michael jordan 20

the undisputed goat 16

is the undisputed 15

goat of all 14

lebron is the 14

of all sports 14

mj_goat_tw_sep_words_3 <- mj_goat_tw_tri_paired_words %>%

separate(paired_words, c("word1", "word2", "word3"), sep = " ")

mj_goat_tw_filtered_02 <- mj_goat_tw_sep_words_3 %>%

filter(!word1 %in% stop_words$word) %>%

filter(!word2 %in% stop_words$word) %>%

filter(!word3 %in% stop_words$word)

# new trigram counts:

mj_goat_tw_trigram_counts <- mj_goat_tw_filtered_02 %>%

count(word1, word2, word3, sort = TRUE)

head(mj_goat_tw_trigram_counts) %>%

kable()

word1 word2 word3 n

â â â 20

goat michael jordan 20

baseball babe ruth 13

basketball michael jordan 13

boxing muhammed ali 13

football tom brady 13

[Figure: Trigram Word Network, Tweets using Michael Jordan GOAT Keyword.]

Replicating the bigram and trigram analysis for LeBron James, James Harden,

Kevin Durant, and Kobe Bryant produces the following word network ﬁgures.

[Figure: Bigram Word Network, Tweets using LeBron James GOAT. Edge legend for n: 100, 200.]

[Figure: Trigram Word Network, Tweets using LeBron James GOAT Keyword.]

[Figure: Bigram Word Network, Tweets using James Harden GOAT.]

[Figure: Trigram Word Network, Tweets using James Harden GOAT Keyword.]

[Figure: Bigram Word Network, Tweets using Kevin Durant GOAT.]

[Figure: Trigram Word Network, Tweets using Kevin Durant GOAT Keyword.]

[Figure: Bigram Word Network, Tweets using Kobe Bryant GOAT.]

[Figure: Trigram Word Network, Tweets using Kobe Bryant GOAT Keyword.]

The varying shapes of the word networks and the bigram/trigram word relationships among the athletes are intriguing to interpret. As expected, the boldest arrow, or one of the boldest, in each network links the player’s first and last name, but the presence of other professional athletes’ names emphasizes how intertwined the GOAT debate is across sports. As long as the search for the GOAT goes on, athletes will continue to be compared and grouped together with those from other sports as Twitter users, fans, and sports media voice their opinions.

Sentiment Analysis

To truly understand the connotation behind the GOAT tweets in the datasets involving the NBA athletes, we can perform a text analysis known as sentiment analysis (Feldman 2013). The get_sentiments() function offered by the tidytext package enables us to retrieve data frames containing words and their corresponding sentiment within a given lexicon (Ding, Liu, and Yu 2008). The lexicons available through the get_sentiments() function are selected with the arguments “bing,” “afinn,” “loughran,” and “nrc.” What differentiates the lexicons is their word list, the size of each word list, and how the sentiment is evaluated. For example, the bing lexicon categorizes sentiment as either “positive” or “negative”; the afinn lexicon labels sentiment as integer values ranging from [−5, 5]; the loughran lexicon categorizes sentiment as “negative,” “positive,” “litigious,” “uncertainty,” “constraining,” or “superfluous”; the nrc lexicon labels each word as “positive” or “negative” or with one of the 8 emotions from Plutchik's Wheel of Emotions (Tromp and Pechenizkiy 2014).

To see this in action, we can randomly sample 5 rows from each lexicon data frame

using the following code. This grants us a glimpse into the lexicons’ word variety and

the sentiment values associated with each lexicon.

set.seed(12345)

sample_n(get_sentiments("bing"), 5) %>% kable()

word          sentiment
undisputably  positive
accursed      negative
bump          negative
buoyant       positive
senseless     negative

sample_n(get_sentiments("afinn"), 5) %>% kable()

word          value
empathetic     2
delighting     3
trauma        -3
protected      1
affectionate   3

sample_n(get_sentiments("loughran"), 5) %>% kable()

word       sentiment
frivolous  negative
drag       negative
quitting   negative
mediators  litigious
injures    negative

sample_n(get_sentiments("nrc"), 5) %>% kable()

word       sentiment
peaceful   trust
unhealthy  negative
cultivate  anticipation
crowning   positive
alien      fear

When we are ready to perform sentiment analysis (Feldman 2013) on our dataset of tweets, we are only interested in the literal text of each tweet so that the analysis runs smoothly and does not encounter any unnecessary errors. In the following code, a new column, stripped_text, is created within the original dataset to hold each tweet's text with any URLs (text beginning with “http”) removed. Once the text column has been identified, we can create a cleaned dataset that breaks down each tweet in the stripped_text column into one word per row. Finally, we remove any stop words from the dataset and arrive at the final product, which is stored as mj_goat_tw_clean_02.

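#### Data Cleaning
##### Michael Jordan
# Remove URLs: delete everything from the first "http" to the end of the tweet.
mj_goat_tw$stripped_text <- gsub("http.*", "", mj_goat_tw$text)
# "http.*" already matches "https" links, so this second pass changes nothing.
mj_goat_tw$stripped_text <- gsub("https.*", "", mj_goat_tw$stripped_text)
# Split each tweet into one lower-cased word per row.
mj_goat_tw_clean_01 <- mj_goat_tw %>%
  select(stripped_text) %>%
  unnest_tokens(word, stripped_text)
# Drop common stop words such as "the" and "of".
mj_goat_tw_clean_02 <- mj_goat_tw_clean_01 %>%
  anti_join(stop_words)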
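As a minimal, self-contained sketch of the same three steps (strip URLs, split into words, drop stop words), the base R code below works on two made-up tweets. The tweets, the tiny toy_stop_words vector, and all variable names are illustrative assumptions; they are not the thesis data or the tidytext stop word list.

```r
# Two made-up tweets standing in for mj_goat_tw$text (assumption).
toy_tweets <- c(
  "Michael Jordan is the GOAT https://t.co/abc123",
  "LeBron is great but MJ is the goat"
)

# Step 1: remove everything from the first "http" onward,
# mirroring gsub("http.*", "", ...) in the code above.
stripped <- gsub("http.*", "", toy_tweets)

# Step 2: lower-case, then split on anything that is not a letter,
# digit, or apostrophe; drop any empty pieces.
tokens <- unlist(strsplit(tolower(stripped), "[^a-z0-9']+"))
tokens <- tokens[tokens != ""]

# Step 3: drop stop words (a tiny hand-made list, not tidytext::stop_words).
toy_stop_words <- c("is", "the", "but")
clean_tokens <- tokens[!tokens %in% toy_stop_words]

print(clean_tokens)
```

Because "http.*" is greedy, any words that follow a link in the same tweet are removed along with it, which is worth keeping in mind when links appear mid-tweet.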

In order to see the sentiment frequencies among the four sentiment lexicons, we can create bar charts that group by each lexicon's sentiment values and show the most frequent words within those values. Using the cleaned dataset, mj_goat_tw_clean_02, we can perform an inner join with the bing lexicon word list, which attaches a sentiment value to every word that occurs in both sets. We can then count the number of times each word occurs. In the following figure, we can see the most frequent words grouped into the “negative” and “positive” sentiment values that the bing lexicon assigns.
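The join-and-count step described here can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used in the thesis: toy_words stands in for mj_goat_tw_clean_02 and toy_bing is a four-word stand-in for the bing lexicon.

```r
# Cleaned words, one per row, standing in for mj_goat_tw_clean_02 (assumption).
toy_words <- data.frame(
  word = c("love", "goat", "win", "hate", "love", "jordan", "lost", "win", "love"),
  stringsAsFactors = FALSE
)

# A four-word stand-in for the bing lexicon (assumption).
toy_bing <- data.frame(
  word = c("love", "win", "hate", "lost"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Inner join: merge() keeps only the words present in both data frames,
# so "goat" and "jordan" drop out.
joined <- merge(toy_words, toy_bing, by = "word")

# Count how often each (word, sentiment) pair occurs, most frequent first.
counts <- aggregate(n ~ word + sentiment,
                    data = transform(joined, n = 1), FUN = sum)
counts <- counts[order(-counts$n, counts$word), ]
rownames(counts) <- NULL

print(counts)
```

Words with no entry in the lexicon contribute nothing to the bar charts, which is why player names rarely appear in them.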

Note: The following figures may contain inappropriate language, which is included to illustrate the prevalence of such terms within the dataset.

[Figure: Sentiment of Michael Jordan Goat Tweets using Bing Lexicon. Bar charts of the most frequent words in the “negative” and “positive” panels; x-axis: Word Frequency.]

If we slightly change the above code such that the inner join is performed on the afinn, loughran, and nrc lexicons, we are able to have more words represented in the figures, since these lexicons offer a greater variety of sentiment values.

[Figure: Sentiment of Michael Jordan Goat Tweets using AFINN Lexicon. Bar charts of the most frequent words, one panel per AFINN value; x-axis: Word Frequency.]

[Figure: Sentiment of Michael Jordan Goat Tweets using Loughran Lexicon. Bar charts of the most frequent words in the “positive,” “uncertainty,” “litigious,” and “negative” panels; x-axis: Word Frequency.]

[Figure: Sentiment of Michael Jordan Goat Tweets using NRC Lexicon. Bar charts of the most frequent words in the “anger,” “anticipation,” “disgust,” “fear,” “joy,” “negative,” “positive,” “sadness,” “surprise,” and “trust” panels; x-axis: Word Frequency.]

Now that we have seen each lexicon's word frequency variation for Michael Jordan, let us conduct the visualization process for the others as well. Later, we will determine the sentiment polarity values for each player using the bing lexicon.

By creating a function called sentiment_bing_score(), we can input the “text” values from Michael Jordan's tweet dataset and receive a list of bing sentiment polarities that we can then transform into a readable tibble. One important facet of the function is that it scores −1 for each word with “negative” sentiment and 1 for each word with “positive” sentiment, and assigns a score of 0 when no words remain in the “text” column for a tweet after it has been cleaned and filtered. The new tibble can be displayed in a histogram to understand the statistical distribution of the bing sentiment polarities. Repeating this procedure for LeBron James, James Harden, Kevin Durant, and Kobe Bryant lets us compare the sentiment polarity distributions visually.
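The body of sentiment_bing_score() is not shown in this section, so the following is only a sketch of how a per-tweet scorer of this kind might behave. The function name toy_bing_score(), the six-word toy_lexicon, and the "no_match"/"scored" labels are all hypothetical; only the scoring rule (+1, −1, and 0 when nothing matches) comes from the description above.

```r
# Stand-in lexicon with +1 / -1 already attached (assumption).
toy_lexicon <- c(love = 1, win = 1, great = 1, hate = -1, lost = -1, bad = -1)

# Hypothetical scorer: returns a list holding the tweet's total score and a
# type flag, similar in spirit to the "score" and "type" fields used later.
toy_bing_score <- function(tweet) {
  words <- unlist(strsplit(tolower(tweet), "[^a-z']+"))
  matched <- toy_lexicon[words[words %in% names(toy_lexicon)]]
  if (length(matched) == 0) {
    # No lexicon words survive cleaning: score the tweet as 0.
    return(list(score = 0, type = "no_match"))
  }
  list(score = sum(matched), type = "scored")
}

toy_tweets <- c("I love MJ, great win", "Bad loss, I hate it", "MJ vs LeBron")
scores <- vapply(toy_tweets, function(x) toy_bing_score(x)$score, numeric(1))
print(unname(scores))
```

Summing the signed word scores is what lets a single tweet reach values such as −4 or 4 in the histograms: several lexicon words in one tweet add up.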

ggplot(mj_goat_tw_sent_score_bing2, aes(x = Score)) +
  geom_histogram(bins = 15, alpha = 0.9,
                 fill = "red", color = "black") +
  xlab("Sentiment Polarity: Michael Jordan") + ylab("Count") +
  theme_minimal()

[Figure: Histogram of bing sentiment polarity scores for Michael Jordan; x-axis: Sentiment Polarity: Michael Jordan; y-axis: Count.]

From the histogram, we see that Michael Jordan's bing sentiment polarity is fairly neutral, with slightly more weight on the right side, which may make his overall score positive. The following code aims to interpret the above histogram by finding the range of values, the overall mean score, and the standard error of that score.

mj_goat_tw_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -4.00000 0.00000 0.00000 0.07885 1.00000 4.00000

tibble(
  sent_mean = mean(mj_goat_tw_sent_score_bing2$Score),
  sent_err =
    sd(mj_goat_tw_sent_score_bing2$Score) /
    sqrt(length(mj_goat_tw_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
0.078853   0.0585912

For Michael Jordan, the bing sentiment polarity scores range from [−4, 4] with a mean and standard error of 0.079 ± 0.059. The purpose of the standard error is to measure the statistical accuracy of the mean, so the mean is estimated to lie within one standard error of that value, between [0.020, 0.138]. Therefore, the dataset we gathered and analyzed using the bing lexicon indicates a minor positive sentiment polarity for Michael Jordan.
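The interval arithmetic used here, the mean plus or minus one standard error, can be sketched in a few lines of base R. The toy_scores vector is made up for illustration; the last two lines reuse the mean and standard error printed above for Michael Jordan.

```r
# Made-up per-tweet polarity scores (assumption), not the thesis data.
toy_scores <- c(-2, -1, 0, 0, 0, 0, 1, 1, 1, 2)

sent_mean <- mean(toy_scores)
sent_err  <- sd(toy_scores) / sqrt(length(toy_scores))

# One-standard-error interval around the mean.
interval <- c(lower = sent_mean - sent_err, upper = sent_mean + sent_err)
print(round(c(mean = sent_mean, se = sent_err, interval), 3))

# The same arithmetic applied to the values reported above for Michael Jordan.
mj_interval <- c(lower = 0.078853 - 0.0585912, upper = 0.078853 + 0.0585912)
print(round(mj_interval, 3))
```

With the unrounded inputs the upper bound comes out at about 0.137; rounding the mean and standard error to three decimals first gives 0.138. A one-standard-error interval is narrower than a conventional 95% interval, which would use roughly 1.96 standard errors.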

Moving to LeBron James, we see that he also has a fairly even histogram shape

with the majority at 0. Yet, he has a higher frequency of negative scores which may

dock his overall polarity.

[Figure: Histogram of bing sentiment polarity scores for LeBron James; x-axis: Sentiment Polarity: LeBron James; y-axis: Count.]

For LeBron James, the bing sentiment polarity scores range from [−4, 3] with a mean and standard error of −0.005 ± 0.056. This means that the mean is estimated to be between the values of [−0.061, 0.051]. Therefore, the dataset we gathered and analyzed using the bing lexicon indicates a neutral sentiment polarity with a slight lean in the negative direction for LeBron James.

lebron_goat_tw_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -4.000000 0.000000 0.000000 -0.004706 1.000000 3.000000

tibble(
  sent_mean = mean(lebron_goat_tw_sent_score_bing2$Score),
  sent_err =
    sd(lebron_goat_tw_sent_score_bing2$Score) /
    sqrt(length(lebron_goat_tw_sent_score_bing2$Score))
) %>% kable()

sent_mean   sent_err
-0.0047059  0.0559446

Next, James Harden's histogram is somewhat even but has been shifted in the negative direction such that it now has a center at about −1. This differs from the previous histograms, which leads us to believe that his polarity is likely to be negative.

[Figure: Histogram of bing sentiment polarity scores for James Harden; x-axis: Sentiment Polarity: James Harden; y-axis: Count.]

James Harden's bing sentiment polarity scores range from [−3, 3] with a mean and standard error of −0.758 ± 0.093. As a result, the mean is estimated to be between the values of [−0.851, −0.665]. Hence, the dataset we gathered and analyzed using the bing lexicon indicates a negative sentiment polarity for James Harden.

harden_goat_tw_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -3.0000 -1.0000 -1.0000 -0.7576 0.0000 3.0000

tibble(
  sent_mean = mean(harden_goat_tw_sent_score_bing2$Score),
  sent_err =
    sd(harden_goat_tw_sent_score_bing2$Score) /
    sqrt(length(harden_goat_tw_sent_score_bing2$Score))
) %>% kable()

sent_mean   sent_err
-0.7575758  0.0931491

Unlike James Harden's, Kevin Durant's histogram continues the trend of a distribution centered at 0; however, it does have a higher frequency of negative polarity values.

[Figure: Histogram of bing sentiment polarity scores for Kevin Durant; x-axis: Sentiment Polarity: Kevin Durant; y-axis: Count.]

Kevin Durant's sentiment polarity scores can be seen to range from [−5, 4] with a mean and standard error of −0.27 ± 0.12. Then, the mean can be estimated to be between the values of [−0.39, −0.15]. Consequently, the dataset we gathered and analyzed using the bing lexicon indicates a slightly negative sentiment polarity for Kevin Durant.

kd_goat_tw_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -5.0000 -1.0000 0.0000 -0.2667 0.0000 4.0000

tibble(
  sent_mean = mean(kd_goat_tw_sent_score_bing2$Score),
  sent_err =
    sd(kd_goat_tw_sent_score_bing2$Score) /
    sqrt(length(kd_goat_tw_sent_score_bing2$Score))
) %>% kable()

sent_mean   sent_err
-0.2666667  0.1180364

[Figure: Histogram of bing sentiment polarity scores for Kobe Bryant; x-axis: Sentiment Polarity: Kobe Bryant; y-axis: Count.]

Kobe Bryant's sentiment polarity scores range from [−4, 5] with a mean and standard error of −0.08 ± 0.16. The overall mean sentiment polarity can be estimated to be between the values of [−0.24, 0.08]. Thus, the dataset we gathered and analyzed using the bing lexicon indicates a slightly negative sentiment polarity with some neutral influence for Kobe Bryant.

kobe_goat_tw_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -4.00000 0.00000 0.00000 -0.07843 0.00000 5.00000

tibble(
  sent_mean = mean(kobe_goat_tw_sent_score_bing2$Score),
  sent_err =
    sd(kobe_goat_tw_sent_score_bing2$Score) /
    sqrt(length(kobe_goat_tw_sent_score_bing2$Score))
) %>% kable()

sent_mean   sent_err
-0.0784314  0.1604971

With each NBA athlete's bing sentiment polarity score having been calculated, they can be ordered from first to last as Michael Jordan, LeBron James, Kobe Bryant, Kevin Durant, and James Harden, with the first being the most positive and the last being the least. Many factors could contribute to a player receiving a positive or negative sentiment polarity score based on tweets, but identifying them would require a larger dataset that covers a longer period than 6 to 9 days.
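The ordering can be reproduced from the printed means with a short base R sketch. The data frame name nba_bing and its layout are assumptions made for illustration; the numbers are copied from the kable() outputs above.

```r
# Means and standard errors copied from the kable() outputs above.
nba_bing <- data.frame(
  player    = c("Michael Jordan", "LeBron James", "James Harden",
                "Kevin Durant", "Kobe Bryant"),
  sent_mean = c(0.078853, -0.0047059, -0.7575758, -0.2666667, -0.0784314),
  sent_err  = c(0.0585912, 0.0559446, 0.0931491, 0.1180364, 0.1604971),
  stringsAsFactors = FALSE
)

# Sort from most positive to most negative mean polarity.
ranked <- nba_bing[order(-nba_bing$sent_mean), ]
rownames(ranked) <- NULL
print(ranked)
```

Keeping the standard errors next to the means makes it easy to see where neighbouring players' one-standard-error intervals overlap, so the ordering of the top three is less clear-cut than the ordering of the bottom two.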

Bing Sentiment Polarities of Tweet Frequency Plots

Earlier in the paper, we plotted the tweet frequency by Twitter users for each of the NBA players and their tweet datasets. For each player, there was at least one noticeable spike in tweet frequency, which raised interest in understanding why it took place and whether it was beneficial or detrimental to the sentiment. Using the following code, we can create new datasets which solely contain tweets from the dates of the frequency spikes. Once that is done, we can find the bing sentiment polarity score as we did in the previous section by taking the mean and standard error.

### Bing Sentiment Score
#### Michael Jordan
# Keep only the tweets posted on the day of the spike (2021-03-28).
mj_goat_tw_freq <-
  mj_goat_tw[(mj_goat_tw$created_at >= "2021-03-28 00:00:00" &
              mj_goat_tw$created_at < "2021-03-29 00:00:00"), ]
# Score each tweet with the bing lexicon.
mj_freq_sent_score_bing <-
  lapply(mj_goat_tw_freq$text,
         function(x){sentiment_bing_score(x)})
# Collect the scores into one tibble; the final ")" closes rbind().
mj_freq_sent_score_bing2 <- rbind(
  tibble(
    Name = "Michael Jordan",
    Score = unlist(map(mj_freq_sent_score_bing, "score")),
    Type = unlist(map(mj_freq_sent_score_bing, "type"))
  )
)
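The date filter is the step most likely to misbehave, so here is a small base R sketch of the same half-open window applied to made-up timestamps. The toy_tw data frame and the explicit UTC time zone are assumptions for illustration; the thesis code compares created_at against character strings directly.

```r
# Made-up tweets with timestamps (assumption), standing in for mj_goat_tw.
toy_tw <- data.frame(
  created_at = as.POSIXct(c("2021-03-27 22:15:00", "2021-03-28 09:30:00",
                            "2021-03-28 23:59:59", "2021-03-29 00:00:00"),
                          tz = "UTC"),
  text = c("tweet a", "tweet b", "tweet c", "tweet d"),
  stringsAsFactors = FALSE
)

# Half-open window [start, end): includes midnight on the 28th,
# excludes midnight on the 29th.
start <- as.POSIXct("2021-03-28 00:00:00", tz = "UTC")
end   <- as.POSIXct("2021-03-29 00:00:00", tz = "UTC")

toy_spike <- toy_tw[toy_tw$created_at >= start & toy_tw$created_at < end, ]
print(toy_spike$text)
```

Converting the window bounds to POSIXct with a stated time zone keeps the comparison independent of the session's local time zone, which matters when a spike sits close to midnight.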

The bing sentiment polarity score for the day of Michael Jordan's tweet frequency spike, “2021-03-28,” ranges from [−3, 2] with a mean and standard error of 0.08 ± 0.11. This means that the true mean polarity is estimated to be within the values [−0.03, 0.19]. This produces a similar mean estimate to that of the overall polarity score, but the spike has a lower bottom estimate and a higher upper estimate. While this frequency spike has a greater chance of producing a negative bing sentiment polarity, it also has a greater chance of producing a positive sentiment polarity.

mj_freq_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -3.00000 0.00000 0.00000 0.07692 0.00000 2.00000

tibble(
  sent_mean = mean(mj_freq_sent_score_bing2$Score),
  sent_err =
    sd(mj_freq_sent_score_bing2$Score) /
    sqrt(length(mj_freq_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
0.0769231  0.1093176

Tweets referring to LeBron James as the GOAT experienced spikes on “2021-03-28” and “2021-03-31.” On these dates, the sentiment polarities range between [−4, 3] with a mean and standard error of −0.14 ± 0.11 such that the true mean is estimated to be within the values of [−0.25, −0.03]. This frequency spike produced a much more negative bing sentiment polarity when compared to the polarity of his entire dataset. On the above dates the Los Angeles Lakers, for whom LeBron James plays, had two games, of which they won one and lost the other. LeBron did not play in either game, so the negative sentiment does not seem to be the result of his own personal performance, but could have been caused by his team's performance and the fact that he did not participate.

lebron_freq_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -4.0000 -1.0000 0.0000 -0.1368 0.0000 3.0000

tibble(
  sent_mean = mean(lebron_freq_sent_score_bing2$Score),
  sent_err =
    sd(lebron_freq_sent_score_bing2$Score) /
    sqrt(length(lebron_freq_sent_score_bing2$Score))
) %>% kable()

sent_mean   sent_err
-0.1367521  0.1064567

Tweets about James Harden had a spike on “2021-03-31.” On this date, the sentiment polarities have a range between [−3, 0] with a mean and standard error of −1.04 ± 0.14 such that the true mean is estimated to be within the values of [−1.18, −0.90]. On “2021-03-31” the team that James Harden plays for, the Brooklyn Nets, won a game against his former team, the Houston Rockets. He participated in the game and had a decent performance in which he scored 17 points and had 6 assists and 8 rebounds in 27 minutes of play. Despite the victory and performance, the estimated sentiment polarity on this day is about 0.3 more negative than that of his dataset as a whole.

harden_freq_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -3.000 -1.000 -1.000 -1.042 -1.000 0.000

tibble(
  sent_mean = mean(harden_freq_sent_score_bing2$Score),
  sent_err =
    sd(harden_freq_sent_score_bing2$Score) /
    sqrt(length(harden_freq_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
-1.041667  0.1408973

Kevin Durant's GOAT tweets experienced spikes on “2021-03-28” and “2021-03-30.” On these dates, the sentiment polarities range between [−2, 4] with a mean and standard error of −0.02 ± 0.14 such that the true mean is estimated to be within the values of [−0.16, 0.12]. Kevin Durant also plays for the Brooklyn Nets with James Harden, but there was not a game on the above dates, so the increase in frequency was not related to any game performance. However, on “2021-03-30” Kevin Durant and actor Michael Rapaport exchanged direct messages which were screenshotted and posted to Twitter by Rapaport. The contents of the messages were not necessarily friendly, yet Kevin Durant's sentiment polarity on these dates is much more neutral and is approximately 0.2 more positive than that of the original dataset.

kd_freq_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -2.00000 0.00000 0.00000 -0.02128 0.00000 4.00000

tibble(
  sent_mean = mean(kd_freq_sent_score_bing2$Score),
  sent_err =
    sd(kd_freq_sent_score_bing2$Score) /
    sqrt(length(kd_freq_sent_score_bing2$Score))
) %>% kable()

sent_mean   sent_err
-0.0212766  0.1442367

Kobe Bryant's tweet dataset encountered frequency spikes on “2021-04-06,” “2021-04-09,” and “2021-04-12.” On these dates, the sentiment polarities range between [−2, 1] with a mean and standard error of −0.04 ± 0.15 such that the true mean is estimated to be within the values of [−0.19, 0.11]. The spike in frequency presented a polarity score which is almost identical to that of Kobe's entire dataset, just a touch more positive. The increase of tweets on “2021-04-12” is most likely due to it being the five-year anniversary of his farewell game, in which he played his final NBA game and scored 60 points.

kobe_freq_sent_score_bing2$Score %>% summary()

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## -2.00000 0.00000 0.00000 -0.04348 0.00000 1.00000

tibble(
  sent_mean = mean(kobe_freq_sent_score_bing2$Score),
  sent_err =
    sd(kobe_freq_sent_score_bing2$Score) /
    sqrt(length(kobe_freq_sent_score_bing2$Score))
) %>% kable()

sent_mean   sent_err
-0.0434783  0.1471503

After comparing the bing sentiment polarity values for each player over their dataset as a whole and over the spikes in tweet frequency, the frequency spikes were more positive than the overall dataset for only two of the five NBA athletes, Kevin Durant and Kobe Bryant.
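A side-by-side comparison of the overall and spike means makes this count easy to check. The sketch below is illustrative only: the data frame cmp and its column names are assumptions, while the numbers are copied from the kable() outputs in this and the previous section.

```r
# Mean bing polarity per player, copied from the kable() outputs above.
cmp <- data.frame(
  player  = c("Michael Jordan", "LeBron James", "James Harden",
              "Kevin Durant", "Kobe Bryant"),
  overall = c(0.078853, -0.0047059, -0.7575758, -0.2666667, -0.0784314),
  spike   = c(0.0769231, -0.1367521, -1.041667, -0.0212766, -0.0434783),
  stringsAsFactors = FALSE
)

# Positive difference: the spike days were more positive than the full dataset.
cmp$diff <- cmp$spike - cmp$overall
cmp$spike_more_positive <- cmp$diff > 0

print(cmp[, c("player", "diff", "spike_more_positive")])
```

The differences are small relative to the standard errors reported for the spike days, so they are best read as descriptive rather than as evidence of a real shift in sentiment.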

The following ﬁgures represent the histograms of the NBA players from the original

bing sentiment polarities along with the sentiment polarities of the frequency spikes

as a method of comparison.

ggplot(nba_goat_tw_sent_score_bing, aes(x = Score, fill = Name)) +
  geom_histogram(bins = 15, alpha = 0.9) +
  facet_grid(~Name) +
  xlab("Original Sentiment Polarity") + ylab("Count") +
  theme_minimal()

[Figure: Histograms of the original bing sentiment polarity scores, one panel per player (James Harden, Kevin Durant, Kobe Bryant, LeBron James, Michael Jordan); x-axis: Original Sentiment Polarity; y-axis: Count.]

ggplot(nba_freq_sent_score_bing, aes(x = Score, fill = Name)) +
  geom_histogram(bins = 15, alpha = 0.9) +
  facet_grid(~Name) +
  xlab("Frequency Spike Sentiment Polarity") + ylab("Count") +
  theme_minimal()

[Figure: Histograms of the frequency spike bing sentiment polarity scores, one panel per player (James Harden, Kevin Durant, Kobe Bryant, LeBron James, Michael Jordan); x-axis: Frequency Spike Sentiment Polarity; y-axis: Count.]

From the above results and analyses, the GOAT debate between Michael Jordan and LeBron James can be decided as a victory in favor of Michael Jordan for having the most positive sentiment polarity of 0.079 ± 0.059. LeBron James' sentiment polarity was not too far behind, so it would be interesting to see how much the results vary with new datasets.

We can also extend our analyses to other sports to determine how their athletes

respond to the GOAT debate. With that being said, we may be able to crown a

GOAT for each sport using the athletes we sampled from.

NFL

For the NFL, the professional athletes that we gathered Twitter data on include Aaron Rodgers, Jerry Rice, Patrick Mahomes, and Tom Brady. Aaron Rodgers is a quarterback for the Green Bay Packers who won Super Bowl XLV, was named Super Bowl MVP, and is considered to be one of the best quarterbacks in the NFL. Jerry Rice is a former wide receiver who won three Super Bowls (XXIII, XXIV, XXIX) and a Super Bowl MVP, and was named to the Pro Football Hall of Fame in 2010. Patrick Mahomes is a quarterback for the Kansas City Chiefs who won Super Bowl LIV and was named Super Bowl MVP. Finally, Tom Brady is a quarterback for the Tampa Bay Buccaneers who has won seven Super Bowls (XXXVI, XXXVIII, XXXIX, XLIX, LI, LIII, LV) and five Super Bowl MVPs (XXXVI, XXXVIII, XLIX, LI, LV), and is widely considered to be the NFL's GOAT.

With the above NFL athletes, we will focus on the unique and interesting results from the various analyses that the NBA athletes were put through.

Timeline of Tweets - Frequency Plot

Tom Brady

To begin, we can take a look at the frequency plots of Tom Brady in

both situations of plotting by hours and days. Within the NFL sample, Tom Brady

has the highest frequency of tweets with consistent spikes in activity. The two biggest

frequency spikes occurred on “2021-03-27” and “2021-03-28” with approximately 60

and 50 tweets, respectively.

[Figure: Frequency of tweets with Tom Brady GOAT Keyword, two time-series plots (by hour and by day) covering late March to early April 2021. Data collected from Twitter's API via rtweet.]

Word Cloud Analysis

Tom Brady

It was also interesting to see how Tom Brady's word cloud analysis stacked up against the others, because his word cloud not only outnumbers the others but also emphasizes his presence and/or dominance within the NFL's GOAT debate.

Note: The following figures may contain inappropriate language, which is included to illustrate the prevalence of such terms within the dataset.

words_tb_goat %>%
  with(wordcloud(word, n, random.order = FALSE,
                 max.words = 100, colors = "red3"))

[Figure: Word cloud of the most frequent words in tweets using the Tom Brady GOAT keyword.]

Word Networks

Tom Brady

Tom Brady's bigram and trigram figures also have interesting results, as the names of Aaron Rodgers and Patrick Mahomes both appear in them. It can also be noted that Tom Brady shares a connection with elite athletes in other sports like Michael Jordan, Wayne Gretzky, and Muhammad Ali, since they also appear in the networks.

Note: The following figures may contain inappropriate language, which is included to illustrate the prevalence of such terms within the dataset.

[Figure: Tweets using Tom Brady GOAT Keyword, Bigram Word Network]

[Figure: Tweets using Tom Brady GOAT Keyword, Trigram Word Network]

Bing Sentiment Polarity

As we move past the initial analysis phase, we can transition into the sentiment analysis phase to determine each NFL player's polarity within their dataset. Like before, we can use the results to name a GOAT within the NFL sample and compare the results to the NBA sentiment polarity values.

Aaron Rodgers

Using the same

sentiment_bing_score()

function from the

NBA analysis, we can calculate Aaron Rodgers’ sentiment polarity scores to range

from [

−

5] with a mean and standard error of 0.44 ± 0.23. Using the standard error, the true polarity is within the values of [0.21, 0.67], which can be interpreted as Aaron Rodgers having a generally positive dataset.

tibble(
  sent_mean = mean(arodgers_goat_tw_sent_score_bing2$Score),
  sent_err =
    sd(arodgers_goat_tw_sent_score_bing2$Score) /
    sqrt(length(arodgers_goat_tw_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
0.4444444  0.2269342

Jerry Rice

Likewise, we can calculate Jerry Rice’s sentiment polarity scores to

range from [

−

2] with a mean and standard error of −0.16 ± 0.14. Then, the true mean is within the values of [−0.30, −0.02], which means that the tweets in Jerry Rice's dataset have a slightly negative attitude towards him.

tibble(
  sent_mean = mean(jrice_goat_tw_sent_score_bing2$Score),
  sent_err =
    sd(jrice_goat_tw_sent_score_bing2$Score) /
    sqrt(length(jrice_goat_tw_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
-0.15625   0.1427649

Patrick Mahomes

Next, Patrick Mahomes’ sentiment polarity scores seem to

range between [

−

3] with a mean and standard error of 0.24 ± 0.22. So, the true mean is within the values of [0.02, 0.46], which is positive but not as much as Aaron Rodgers.

tibble(
  sent_mean = mean(mahomes_goat_tw_sent_score_bing2$Score),
  sent_err =
    sd(mahomes_goat_tw_sent_score_bing2$Score) /
    sqrt(length(mahomes_goat_tw_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
0.2380952  0.2171763

Tom Brady

Finally, Tom Brady’s sentiment polarity scores have the greatest

range of polarity scores which are between [

−

5] and have an estimated mean and standard error of 0.371 ± 0.077. Therefore, the true polarity is in [0.294, 0.448], which is highly positive. It is even greater than Michael Jordan's sentiment polarity, which was the highest among the NBA players.

tibble(
  sent_mean = mean(tb_goat_tw_sent_score_bing2$Score),
  sent_err =
    sd(tb_goat_tw_sent_score_bing2$Score) /
    sqrt(length(tb_goat_tw_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
0.3712575  0.077264

Bing Sentiment Polarities of Tweet Frequency Plots

While the sentiment polarities of the spikes in frequency plots for the NBA athletes

were not necessarily higher than their overall sentiment polarity, it is sensible to look

into how the NFL sample reacts because their response could be entirely diﬀerent.

Aaron Rodgers

Within Aaron Rodgers’ tweet frequency plot which covers the

days from “2021-03-27” to “2021-04-03,” there was an increase in tweets on “2021-04-

01” and “2021-04-03.” On these days, the sentiment polarity scores have a range of

[

−

2] with a mean and error of 0.38 ± 0.32. Since the true sentiment polarity for these two frequency spikes is within [0.06, 0.70], the spikes can be seen as positive influences on the overall sentiment polarity. However, the frequency spikes do not grant a better sentiment polarity, since the overall dataset has a greater floor estimate and an almost identical ceiling.

tibble(
  sent_mean = mean(arodgers_freq_sent_score_bing2$Score),
  sent_err =
    sd(arodgers_freq_sent_score_bing2$Score) /
    sqrt(length(arodgers_freq_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
0.375      0.3238992

Jerry Rice

Jerry Rice’s tweet frequency plot spans from “2021-03-27” to “2021-04-

04” with frequency spikes on “2021-04-01” and “2021-04-03.” The sentiment polarity

scores for the two days have a range of [

−

1], as well as a mean and error of −0.16 ± 0.18. Since the true sentiment polarity is within [−0.34, 0.02], the spikes can be seen as negative influences on the overall sentiment polarity. Also, the tweet spikes have a worse floor estimate, so they are not better than the dataset as a whole.

tibble(
  sent_mean = mean(jrice_freq_sent_score_bing2$Score),
  sent_err =
    sd(jrice_freq_sent_score_bing2$Score) /
    sqrt(length(jrice_freq_sent_score_bing2$Score))
) %>% kable()

sent_mean   sent_err
-0.1578947  0.1754386

Patrick Mahomes

The tweets from Patrick Mahomes' tweet frequency plot were tweeted between the dates of “2021-03-27” and “2021-04-04” with a surge coming on “2021-04-01.” The sentiment polarity scores for this day range from [0, 3], along with a mean and error of 0.60 ± 0.60. Given that the true sentiment polarity is within [0, 1.2], the increase in frequency was a positive influence on the overall sentiment polarity. On another note, the sentiment polarity for this spike is also greater than the dataset's, making it a successful day.

tibble(
  sent_mean = mean(mahomes_freq_sent_score_bing2$Score),
  sent_err =
    sd(mahomes_freq_sent_score_bing2$Score) /
    sqrt(length(mahomes_freq_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
0.6        0.6

Tom Brady

Lastly, Tom Brady’s tweet frequency plot extends from “2021-03-27”

to “2021-04-04” and captured tweet frequency spikes on “2021-03-27” and “2021-03-28.”

The sentiment polarity score for these days have a minimum and maximum of [

−

5], in addition to a mean and error of 1.41 ± 0.21. With the true sentiment polarity of the frequency spikes in the range of [1.2, 1.62], we are able to consider the spikes as being positive; they are also higher than the sentiment polarity calculated in the previous section.

tibble(
  sent_mean = mean(tb_freq_sent_score_bing2$Score),
  sent_err =
    sd(tb_freq_sent_score_bing2$Score) /
    sqrt(length(tb_freq_sent_score_bing2$Score))
) %>% kable()

sent_mean  sent_err
1.409836   0.2134508

In review, the tweet frequency spikes that took place in the NFL datasets had a better impact than those in the NBA datasets, seeing as the spike sentiment polarities were higher than the overall polarities for Patrick Mahomes and Tom Brady. Moreover, we can crown Aaron Rodgers as the GOAT over the other NFL players we researched for having the highest bing sentiment polarity, 0.44 ± 0.23.

Comparing Sentiment Histograms

In the ﬁgures below, we can compare the histograms of the sentiment polarity

distributions that we computed in the past two sections. While the shapes of the

histograms vary among the situations, the biggest change was the decrease in the

count of the values which dropped from 150 to 25.

ggplot(nfl_goat_tw_sent_score_bing, aes(x = Score, fill = Name)) +
  geom_histogram(bins = 15, alpha = 0.9) +
  facet_grid(~Name) +
  xlab("Original Sentiment Polarity") + ylab("Count") +
  theme_minimal()

[Figure: Histograms of the original bing sentiment polarity scores, one panel per player (Aaron Rodgers, Jerry Rice, Patrick Mahomes, Tom Brady); x-axis: Original Sentiment Polarity; y-axis: Count.]

ggplot(nfl_freq_sent_score_bing, aes(x = Score, fill = Name)) +

geom_histogram(bins = 15, alpha = 0.9) +

facet_grid(~Name) +

xlab("Frequency Spike Sentiment Polarity") + ylab("Count") +

theme_minimal()

[Figure: histograms of Frequency Spike Sentiment Polarity (x-axis) against Count (y-axis), one panel per player: Aaron Rodgers, Jerry Rice, Patrick Mahomes, Tom Brady.]

MLB

Another sport that we analyze is baseball, and our sampled MLB athletes are Clayton Kershaw, Miguel Cabrera, Mike Trout, and Shohei Ohtani. Clayton Kershaw is currently a pitcher for the Los Angeles Dodgers who has three Cy Young Awards (2011, 2013, 2014) and won the World Series in 2020. Miguel Cabrera plays for the Detroit Tigers as a first baseman, is a two-time American League MVP (2012, 2013), and won the World Series in 2003. Mike Trout plays center field for the Los Angeles Angels, was selected to the All-MLB First Team in 2019 and 2020, and is a three-time American League MVP (2014, 2016, 2019). Shohei Ohtani also plays for the Los Angeles Angels as a pitcher; he is a Japan Series champion (2016) and a Pacific League MVP (2016).

Timeline of Tweets - Frequency Plot

Mike Trout

Among the MLB players in our sample, Mike Trout has the most

unique and active frequency plot with multiple spikes, but the most signiﬁcant came

on “2021-03-30” and “2021-04-02.” On those days, the second frequency plot allows us

to decipher that the number of tweets increased from 1 to 6 and 9 to 18, respectively.
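As a rough illustration of how daily counts like these could be tabulated, the following base R sketch counts tweets per calendar day and reports the day with the largest day-over-day increase. The timestamps are made up, and the variable names (`created_at`, `daily_counts`, `spike_day`) are assumptions for this example rather than objects from the analysis; the frequency plots themselves were produced with rtweet.

```r
# Toy timestamps, made up for illustration; real values would come from
# the created_at column of a tweet dataset.
created_at <- as.POSIXct(
  c("2021-03-29 12:00:00",
    "2021-03-30 09:30:00", "2021-03-30 14:10:00", "2021-03-30 21:45:00",
    "2021-03-31 08:05:00", "2021-03-31 19:20:00"),
  tz = "UTC"
)

# Count tweets per calendar day (days with no tweets are simply absent)
daily_counts <- table(as.Date(created_at, tz = "UTC"))

# Day-over-day change in the number of tweets
changes <- diff(as.integer(daily_counts))
names(changes) <- names(daily_counts)[-1]

# The day with the largest increase is a candidate "spike"
spike_day <- names(changes)[which.max(changes)]
print(daily_counts)
print(spike_day)
```

Ranking days by their increase is only one possible definition of a spike; reading the peaks off the plot, as done in the text, is another.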

[Figure: two frequency plots titled “Frequency of tweets with Mike Trout GOAT Keyword,” with x-axis dates running from Mar 30 to Apr 06. Caption: “Data collected from Twitter's API via rtweet.”]

Word Cloud Analysis

Mike Trout

Mike Trout’s word cloud was the most distinctive in the sample, as it contained more than just his name and referenced baseball terms such as the American and National League, as well as baseball legend Barry Bonds. Michael Jordan’s last name also appeared in the word cloud.
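The word cloud call used for each athlete expects a table with a `word` column and a count column `n`. A minimal base R sketch of how such a table might be built is shown here; the tweets and the tiny stop-word list are made up, and this is a stand-in for the tidytext-based cleaning used in the thesis, not a copy of it.

```r
# Toy tweets, made up for illustration
tweets <- c("Mike Trout is the GOAT",
            "Trout hit another one",
            "mike trout mvp")

# Lowercase and split on anything that is not a letter, digit, or apostrophe
tokens <- unlist(strsplit(tolower(tweets), "[^a-z0-9']+"))
tokens <- tokens[nzchar(tokens)]

# A tiny stand-in stop-word list (assumption; the analysis uses a full list)
stop_list <- c("is", "the", "another", "one")
tokens <- tokens[!tokens %in% stop_list]

# Count each word and store the result with columns `word` and `n`
counts <- sort(table(tokens), decreasing = TRUE)
word_counts <- data.frame(
  word = names(counts),
  n = as.integer(counts),
  stringsAsFactors = FALSE
)
print(word_counts)
```

A table of this shape can be passed to `wordcloud(word, n, ...)` in the same way as the athlete-specific tables.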

Note: The following figures may contain inappropriate language, which is included to illustrate the prevalence of such terms within the dataset.

words_mtrout_goat %>%

with(wordcloud(word, n, random.order = FALSE,

max.words = 100, colors = "red"))

[Figure: word cloud for Mike Trout. Legible terms include trout, mike, goat, won, angels, baseball, hit, mvp, series, world, bonds, jordan, mlb, mookie, players, season, and watch.]

Word Networks

Shohei Ohtani

For the bigram and trigram word network analysis, Shohei Ohtani’s data produced the clearest networks while still having substance. By contrast, Mike Trout’s n-gram networks have many connections, but they are so crowded that they are not readable. In Shohei’s trigram network, it is interesting that all of the word pairings are bold and connect to form distinct shapes. Another interesting fact is that his network contains only baseball terms and other MLB players, not athletes from other sports as in the NBA and NFL word networks.
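To make the idea of a bigram network concrete, the sketch below counts adjacent word pairs in base R. The tweets are made up, and `bigrams_of` is a hypothetical helper written for this example; the thesis builds its networks with tidytext, igraph, and ggraph, so this is a simplified stand-in for the counting step only.

```r
# Hypothetical helper: return all adjacent word pairs in one piece of text
bigrams_of <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z0-9']+"))
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Toy tweets, made up for illustration
tweets <- c("Shohei Ohtani home run", "Shohei Ohtani is talented")

# Pool the bigrams from every tweet and count how often each occurs
all_bigrams <- unlist(lapply(tweets, bigrams_of))
bigram_counts <- sort(table(all_bigrams), decreasing = TRUE)
print(bigram_counts)
```

Each counted pair corresponds to one edge in the network, and the count plays the role of the edge weight that controls how bold the connection is drawn.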

[Figure: Bigram Word Network and Trigram Word Network, each subtitled “Tweets using Shohei Ohtani GOAT Keyword.” Node labels include shohei, ohtani, mike, trout, mookie, shane, bieber, yordan, baseball, goat, game, home, mound, and plate.]

Bing Sentiment Polarity

Clayton Kershaw

After using the sentiment_bing_score() function on Clayton Kershaw’s dataset, we can use the following code to see that the bing sentiment polarities range from a negative minimum to a maximum of 1. In addition, the sentiment polarities have a mean and standard error of −0.25 ± 0.75, which puts the true polarity value in [−1.00, 0.50]. This implies that the tweets in his dataset have a slightly negative connotation.

tibble(

sent_mean = mean(kershaw_goat_tw_sent_score_bing2$Score),

sent_err =

sd(kershaw_goat_tw_sent_score_bing2$Score) /

sqrt(length(kershaw_goat_tw_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

-0.25 0.75
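The interval quoted for each athlete is the mean plus or minus one standard error, where the standard error is the sample standard deviation divided by the square root of the number of tweets. A small sketch of that arithmetic follows; `sent_summary` is a hypothetical helper, and the four scores are made-up values chosen only because they reproduce a mean of −0.25 and a standard error of 0.75, not the actual tweet scores.

```r
# Hypothetical helper (not part of the original analysis): summarize a vector
# of per-tweet scores as mean, standard error, and the mean +/- 1 SE interval.
sent_summary <- function(scores) {
  n  <- length(scores)
  m  <- mean(scores)
  se <- sd(scores) / sqrt(n)
  c(mean = m, se = se, lower = m - se, upper = m + se)
}

# Made-up scores for illustration only
example_scores <- c(-2, -1, 1, 1)
result <- sent_summary(example_scores)
print(round(result, 2))
```

Note that a one-standard-error interval is narrower than a conventional 95% confidence interval, so it should be read as a rough indication of uncertainty.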

Miguel Cabrera

Taking a look into Miguel Cabrera’s sentiment scores, we see that they range over [0, 4] with a mean and standard error of 1 ± 0.77. This would put the true polarity value somewhere within [0.23, 1.77]. Even with the bounds being so large, the attitudes are still positive.

tibble(

sent_mean = mean(mcabrera_goat_tw_sent_score_bing2$Score),

sent_err =

sd(mcabrera_goat_tw_sent_score_bing2$Score) /

sqrt(length(mcabrera_goat_tw_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

1 0.7745967

Mike Trout

In the case of Mike Trout, the polarity scores range from a negative minimum to a maximum of 2, with a mean and standard error of 0.000 ± 0.084. Because the true sentiment polarity lies in [−0.084, 0.084], which is equally as negative as it is positive, the dataset’s tweets portray a neutral opinion of Mike Trout.

tibble(

sent_mean = mean(mtrout_goat_tw_sent_score_bing2$Score),

sent_err =

sd(mtrout_goat_tw_sent_score_bing2$Score) /

sqrt(length(mtrout_goat_tw_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

0 0.0842842

Shohei Ohtani

Finally, after observing Shohei Ohtani’s sentiments, we notice that the polarities range from a negative minimum to a maximum of 1. These polarity values have a mean and standard error of −0.14 ± 0.34, which places the true sentiment polarity within [−0.48, 0.20]. With that said, the tweets in the dataset have a more negative tone towards Shohei than positive.

tibble(

sent_mean = mean(sohtani_goat_tw_sent_score_bing2$Score),

sent_err =

sd(sohtani_goat_tw_sent_score_bing2$Score) /

sqrt(length(sohtani_goat_tw_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

-0.1428571 0.340068

Bing Sentiment Polarities of Tweet Frequency Plots

Once again, we will gather the sentiment polarity of the spikes in the tweet frequency plots for each MLB athlete before determining whether the spikes had a positive or negative impact on the sentiment polarity of the entire dataset. It is helpful to understand whether an increase in tweet frequency was positive or negative, because that offers a reason for further research into what the cause was.
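Selecting the tweets that fall on spike days amounts to filtering on the calendar date of each tweet. The sketch below shows one way this might be done in base R; `filter_spike_days` and the toy data frame are hypothetical, and the code assumes `created_at` is a POSIXct column in UTC.

```r
# Hypothetical helper: keep only tweets created on the given spike days.
# Assumes `created_at` is POSIXct and that dates are interpreted in UTC.
filter_spike_days <- function(tweets, spike_days) {
  tweet_day <- as.Date(tweets$created_at, tz = "UTC")
  tweets[tweet_day %in% as.Date(spike_days), , drop = FALSE]
}

# Toy data, made up for illustration
toy_tweets <- data.frame(
  created_at = as.POSIXct(
    c("2021-03-30 10:15:00", "2021-03-31 08:00:00", "2021-04-05 23:59:00"),
    tz = "UTC"
  ),
  text = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

spike_tweets <- filter_spike_days(toy_tweets, c("2021-03-31", "2021-04-05"))
print(spike_tweets)
```

Passing the spike days as a vector avoids writing one pair of timestamp comparisons per day, which becomes cumbersome when an athlete has several spikes.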

Clayton Kershaw

Clayton Kershaw’s frequency plot covers the dates “2021-03-30” to “2021-04-05,” including the tweet surges on “2021-03-31” and “2021-04-05.” The sentiment polarity values for these days range from a negative minimum to a maximum of 1, with a mean and standard error of −0.67 ± 0.88. This leaves us with a sentiment polarity that is mostly negative and lies within [−1.55, 0.21]. Consequently, the increase in tweet frequency has a negative influence and is no better than the polarity of the entire dataset.

tibble(

sent_mean = mean(kershaw_freq_sent_score_bing2$Score),

sent_err =

sd(kershaw_freq_sent_score_bing2$Score) /

sqrt(length(kershaw_freq_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

-0.6666667 0.8819171

Miguel Cabrera

Secondly, Miguel Cabrera’s frequency plot extends from “2021-04-01” to “2021-04-02.” Within this time frame, a single tweet on “2021-04-02” caused a spike and obtained a value of 4 ± NA. Since there is only one tweet, the standard error cannot be computed, and it is not really possible to interpret the result without being biased, especially with the value being 4.

tibble(

sent_mean = mean(mcabrera_freq_sent_score_bing2$Score),

sent_err =

sd(mcabrera_freq_sent_score_bing2$Score) /

sqrt(length(mcabrera_freq_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

4 NA
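The NA in the table arises because the sample standard deviation is undefined for a single observation. The following sketch demonstrates this and shows one possible guard; `safe_se` and its `min_n` argument are hypothetical names introduced for this example.

```r
# With a single score, the sample standard deviation is undefined,
# so sd() returns NA and the standard error is NA as well.
single_score <- c(4)
se_single <- sd(single_score) / sqrt(length(single_score))
print(se_single)

# Hypothetical guard: return NA explicitly when there are too few scores
safe_se <- function(scores, min_n = 2) {
  if (length(scores) < min_n) return(NA_real_)
  sd(scores) / sqrt(length(scores))
}
print(safe_se(single_score))
print(safe_se(c(0, 2)))
```

A guard of this kind makes the small-sample case explicit, so that a spike consisting of one tweet can be flagged rather than interpreted.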

Mike Trout

Next, Mike Trout’s dataset contains tweets from “2021-03-29” to “2021-04-05,” with increases in tweet frequency on “2021-03-30” and “2021-04-02.” These days have sentiment values that range from a negative minimum to a maximum of 1, with a mean and standard error of −0.04 ± 0.12. The true sentiment polarity therefore lies within [−0.16, 0.08], which is marginally negative. This also follows the trend of being worse than the dataset’s overall polarity, since the original sentiment is neutral.

tibble(

sent_mean = mean(mtrout_freq_sent_score_bing2$Score),

sent_err =

sd(mtrout_freq_sent_score_bing2$Score) /

sqrt(length(mtrout_freq_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

-0.04 0.122202

Shohei Ohtani

Finally, Shohei Ohtani’s dataset captures tweets from “2021-03-30” through “2021-04-05.” The day with a frequency spike is “2021-04-05,” which has a mean and standard error of 0.25 ± 0.25 and minimum and maximum values of [0, 1]. This gives a true sentiment polarity within [0.00, 0.50], which is positive and greater than Shohei’s sentiment for the entire dataset.

tibble(

sent_mean = mean(sohtani_freq_sent_score_bing2$Score),

sent_err =

sd(sohtani_freq_sent_score_bing2$Score) /

sqrt(length(sohtani_freq_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

0.25 0.25

To sum up the sentiment analysis (Feldman 2013) of the MLB players, Shohei Ohtani was the only player whose spike in tweet frequency had a positive sentiment polarity and was positively influential to the polarity of his dataset as a whole. Furthermore, Miguel Cabrera’s sentiment polarity of 1 ± 0.77 was the highest in the sample; thus, we can crown him the GOAT of our MLB sample.

Comparing Sentiment Histograms

In the ﬁgures below, we are able to compare the histograms of the sentiment polarity

distributions that we computed in the prior two sections. The shapes of the histograms

marginally shift from one ﬁgure to the next, but the main diﬀerences are the decrease

in the count of the sentiment values which dropped from 40 to 15, and Miguel Cabrera’s

lack of negative values in the sentiment frequency histogram.

ggplot(mlb_goat_tw_sent_score_bing, aes(x = Score, fill = Name)) +

geom_histogram(bins = 15, alpha = 0.9) +

facet_grid(~Name) +

xlab("Original Sentiment Polarity") + ylab("Count") +

theme_minimal()

[Figure: histograms of Original Sentiment Polarity (x-axis) against Count (y-axis), one panel per player: Clayton Kershaw, Miguel Cabrera, Mike Trout, Shohei Ohtani.]

ggplot(mlb_freq_sent_score_bing, aes(x = Score, fill = Name)) +

geom_histogram(bins = 15, alpha = 0.9) +

facet_grid(~Name) +

xlab("Frequency Spike Sentiment Polarity") + ylab("Count") +

theme_minimal()

[Figure: histograms of Frequency Spike Sentiment Polarity (x-axis) against Count (y-axis), one panel per player: Clayton Kershaw, Miguel Cabrera, Mike Trout, Shohei Ohtani.]

NCAA Basketball

After focusing on the NBA, NFL, and MLB, I thought that it would be interesting to shift the focus to the NCAA. The NCAA Division I Men’s and Women’s Basketball Tournaments both take place in March, and given their vast popularity and media coverage, I took the opportunity to include two players from each tournament in my NCAA Basketball sample. On the men’s side, I selected Drew Timme and Jalen Suggs, who both played for the Gonzaga Bulldogs, the runners-up in the 2021 championship game. For the women, I selected Aari McDonald, who played for the Arizona Wildcats, the runners-up in the 2021 championship game. I also chose Paige Bueckers, who plays for the UConn Huskies, as she led her team to the Final Four and became the first freshman to win AP Player of the Year, the Naismith Trophy, the Wooden Award, and the Nancy Lieberman Award.

Timeline of Tweets - Frequency Plot

Paige Bueckers

Paige Bueckers’ tweet frequency plot below shows a high count of tweets referring to her as the GOAT, as well as surges that appear throughout the entire dataset. The most notable and drastic peaks take place on “2021-03-30” and “2021-04-01.” The second frequency plot looks at the dataset from another angle by adjusting the time interval to group tweets by days instead of hours.

[Figure: two frequency plots titled “Frequency of tweets with Paige Bueckers GOAT Keyword,” with x-axis dates running from Mar 30 to Apr 06 and y-axis counts from 0.0 to 12.5. Caption: “Data collected from Twitter's API via rtweet.”]

Word Cloud Analysis

Jalen Suggs

A few things to notice about Jalen Suggs’ word cloud are that Paige Bueckers’ name appears, that the word “final” is mentioned, which may refer to the championship game, and that “buzzer beater” is listed because of his game-winning shot in the semifinal against the University of California, Los Angeles (UCLA) that advanced Gonzaga to the final.

Note: The following figures may contain inappropriate language, which is included to illustrate the prevalence of such terms within the dataset.

[Figure: word cloud for Jalen Suggs. Legible terms include jalen, suggs, goat, paige, bueckers, game, jim, nantz, perspective, buzzerbeater, final, gonzagas, uconns, winners, and the goat emoji (U+1F410), along with some profanity.]

Paige Bueckers

Similarly, Jalen Suggs’ name appears in Paige’s word cloud.

Paige’s word cloud contains the most words among her fellow NCAA basketball

players with some referencing her achievements and extraordinary skills.

[Figure: word cloud for Paige Bueckers. Legible terms include bueckers, paige, goat, suggs, freshman, jalen, uconn, season, unprecedented, wnba, basketball, womens, baylor, phenom, auriemma, geno, coach, poy, and the goat emoji (U+1F410).]

Word Networks

Jalen Suggs

Using the bigram and trigram analyses, we are able to see the

connections between words that are in pairs and in trios. After seeing the top words

in his word cloud, it is interesting to see how those words are all connected.

Paige Bueckers

Using the bigram and trigram analyses, we are also able to see the connections between pairs and trios of words in Paige’s dataset. One interesting point is that Paige has a stronger connection to the term GOAT in both the bigram and trigram plots when compared to Jalen’s figures.

[Figure: Bigram Word Network and Trigram Word Network for tweets using the Jalen Suggs GOAT keyword. Node labels include jalen, suggs, paige, bueckers, jim, nantz, buzzer, beater, ucla, gonzaga's, uconn's, and goat.]

[Figure: Bigram Word Network and Trigram Word Network for tweets using the Paige Bueckers GOAT keyword. Node labels include paige, bueckers, jalen, suggs, freshman, unprecedented, phenom, uconn, geno, auriemma, wnba, and goat.]

Bing Sentiment Polarity

As we did for the other sports and athletes, we will use the

sentiment_bing_score()

function to calculate the bing sentiment polarity for the NCAA athletes’ datasets.

Drew Timme

Drew Timme’s sentiment polarities range from a negative minimum to a maximum of 2, with a mean and standard error of 0.06 ± 0.14. Adjusting for the error produces a true sentiment polarity between [−0.08, 0.20], which is slightly positive overall.

tibble(

sent_mean = mean(timme_goat_tw_sent_score_bing2$Score),

sent_err =

sd(timme_goat_tw_sent_score_bing2$Score) /

sqrt(length(timme_goat_tw_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

0.0625 0.1434326

Jalen Suggs

Jalen Suggs’ sentiment polarities range from a negative minimum to a maximum of 3, with a mean and standard error of 0.030 ± 0.067. Adjusting for the error produces a true sentiment polarity between [−0.037, 0.097], which is slightly positive, like Drew Timme’s.

tibble(

sent_mean = mean(jsuggs_goat_tw_sent_score_bing2$Score),

sent_err =

sd(jsuggs_goat_tw_sent_score_bing2$Score) /

sqrt(length(jsuggs_goat_tw_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

0.03 0.0673525

Aari McDonald

Aari McDonald’s sentiment polarities range from a negative minimum to a maximum of 0, with a mean and standard error of −0.091 ± 0.091. Adjusting for the error produces a true sentiment polarity between [−0.182, 0.000], which is slightly negative.

tibble(

sent_mean = mean(amcdonald_goat_tw_sent_score_bing2$Score),

sent_err =

sd(amcdonald_goat_tw_sent_score_bing2$Score) /

sqrt(length(amcdonald_goat_tw_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

-0.0909091 0.0909091

Paige Bueckers

Paige Bueckers’ sentiment polarities range from a negative minimum to a maximum of 3, with a mean and standard error of 0.37 ± 0.10. Adjusting for the error produces a true sentiment polarity between [0.27, 0.47], which is positive and the highest among the NCAA athletes.

tibble(

sent_mean = mean(pbueckers_goat_tw_sent_score_bing2$Score),

sent_err =

sd(pbueckers_goat_tw_sent_score_bing2$Score) /

sqrt(length(pbueckers_goat_tw_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

0.369863 0.1004307

Bing Sentiment Polarities of Tweet Frequency Plots

### Frequency Bing Sentiment Score

timme_goat_tw_freq <-

timme_goat_tw[

(timme_goat_tw$created_at >= "2021-04-04 00:00:00" &

timme_goat_tw$created_at < "2021-04-05 00:00:00") |

(timme_goat_tw$created_at >= "2021-04-05 00:00:00" &

timme_goat_tw$created_at < "2021-04-06 00:00:00"), ]

timme_freq_sent_score_bing <-

lapply(timme_goat_tw_freq$text,

function(x){sentiment_bing_score(x)})

timme_freq_sent_score_bing2 <- rbind(
  tibble(
    Name = "Drew Timme",
    Score = unlist(map(timme_freq_sent_score_bing, "score")),
    Type = unlist(map(timme_freq_sent_score_bing, "type"))
  )
)

Drew Timme

Referring back to the tweet frequency plots, Drew Timme’s tweet dataset ranges from “2021-03-30” to “2021-04-06,” with two spikes on “2021-04-05” and “2021-04-06.” On those days, the bing sentiment polarity scores range over [0, 2] with a mean and standard error of 0.25 ± 0.25, meaning that the true mean is likely between [0.00, 0.50]. These values are much more positive than Timme’s overall sentiment polarity score, so the spike acted in his favor.

tibble(

sent_mean = mean(timme_freq_sent_score_bing2$Score),

sent_err =

sd(timme_freq_sent_score_bing2$Score) /

sqrt(length(timme_freq_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

0.25 0.25

Jalen Suggs

Jalen Suggs’ tweet dataset ranges from “2021-03-31” to “2021-04-06,” with a spike in frequency on “2021-04-04.” The sentiment polarity ranges from a negative minimum to a maximum of 3, with a mean and standard error of 0.024 ± 0.076. This would place the true mean between [−0.052, 0.100], whose upper bound is marginally higher than that of his original sentiment polarity, although the mean itself is slightly lower.

tibble(

sent_mean = mean(jsuggs_freq_sent_score_bing2$Score),

sent_err =

sd(jsuggs_freq_sent_score_bing2$Score) /

sqrt(length(jsuggs_freq_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

0.0238095 0.0756994

Aari McDonald

Aari McDonald’s dataset, which covers the dates “2021-04-03” to “2021-04-05,” underwent a couple of spikes on “2021-04-03” and “2021-04-05,” with polarities that range from a negative minimum to a maximum of 0 and a mean and standard error of −0.17 ± 0.17. This gives a true mean within [−0.33, 0], suggesting that the spike was detrimental, since the floor estimate almost doubled in magnitude.

tibble(

sent_mean = mean(amcdonald_freq_sent_score_bing2$Score),

sent_err =

sd(amcdonald_freq_sent_score_bing2$Score) /

sqrt(length(amcdonald_freq_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

-0.1666667 0.1666667

Paige Bueckers

Finally, the tweet dataset for Paige Bueckers captures the dates “2021-03-29” to “2021-04-05,” with spikes on “2021-03-30” and “2021-04-01.” On these dates, the sentiment polarity ranges from a negative minimum to a maximum of 3, with a mean and standard error of 0.22 ± 0.10. With the true estimate being between [0.12, 0.32], these values are still smaller than those of the overall dataset but have more of a neutral effect, since they remain positive.

tibble(

sent_mean = mean(pbueckers_freq_sent_score_bing2$Score),

sent_err =

sd(pbueckers_freq_sent_score_bing2$Score) /

sqrt(length(pbueckers_freq_sent_score_bing2$Score))

) %>% kable()

sent_mean sent_err

0.2195122 0.1018858

After comparing each NCAA basketball player’s bing sentiment polarity for the entire dataset to that of the spikes in tweet frequency, the frequency spikes’ polarity was more positive, or better, for only one of the four players. On a positive note, Paige Bueckers has earned the title of GOAT among the NCAA Basketball athletes within our sample after earning a bing sentiment polarity score of 0.37 ± 0.10.

Comparing Sentiment Histograms

The following ﬁgures represent the histograms of the NCAA basketball players

from the original

bing

sentiment polarities along with the sentiment polarities of the

frequency spikes as a method of comparison.

ggplot(ncaab_goat_tw_sent_score_bing, aes(x = Score, fill = Name)) +

geom_histogram(bins = 15, alpha = 0.9) +

facet_grid(~Name) +

xlab("Original Sentiment Polarity") + ylab("Count") +

theme_minimal()

[Figure: histograms of Original Sentiment Polarity (x-axis) against Count (y-axis), one panel per player: Aari McDonald, Drew Timme, Jalen Suggs, Paige Bueckers.]

ggplot(ncaab_freq_sent_score_bing, aes(x = Score, fill = Name)) +

geom_histogram(bins = 15, alpha = 0.9) +

facet_grid(~Name) +

xlab("Frequency Spike Sentiment Polarity") + ylab("Count") +

theme_minimal()

[Figure: histograms of Frequency Spike Sentiment Polarity (x-axis) against Count (y-axis), one panel per player: Aari McDonald, Drew Timme, Jalen Suggs, Paige Bueckers.]

Comparisons Across Sports

The two athletes with the highest bing sentiment polarity are Miguel Cabrera of the MLB and Aaron Rodgers of the NFL, with scores of 1 ± 0.77 and 0.44 ± 0.23,

respectively. On the other hand, the two athletes with the lowest bing sentiment polarity are James Harden of the NBA and Aari McDonald of NCAA Women’s Basketball, with scores of −0.758 ± 0.093 and −0.17 ± 0.17, respectively. This shows

that athletes receive criticism and praise even as professionals who are at the top

of their game. To summarize, the GOATs within our four-sport sample are Michael

Jordan for the NBA, Aaron Rodgers for the NFL, Miguel Cabrera for the MLB, and

Paige Bueckers for NCAA Women’s Basketball.
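When comparing athletes by mean ± standard error, it can be informative to check whether the two intervals overlap. The sketch below applies such a check to the values reported for Miguel Cabrera (1 ± 0.77) and Aaron Rodgers (0.44 ± 0.23); `intervals_overlap` is a hypothetical helper, and overlap of one-standard-error intervals is only a rough heuristic, not a formal statistical test.

```r
# Hypothetical helper: do the intervals mean +/- se of two groups overlap?
intervals_overlap <- function(m1, se1, m2, se2) {
  (m1 - se1) <= (m2 + se2) && (m2 - se2) <= (m1 + se1)
}

# Values reported in the text for Miguel Cabrera and Aaron Rodgers
cabrera_vs_rodgers <- intervals_overlap(1, 0.77, 0.44, 0.23)
print(cabrera_vs_rodgers)
```

Because the interval [0.23, 1.77] overlaps [0.21, 0.67], the ranking between these two athletes is sensitive to the small sample behind the larger standard error.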

Conclusion

Regardless of the results that were presented by the various analyses performed on the athletes’ datasets, there are always improvements that can be made. To begin, it would be more insightful to increase the sample size of the athletes within each sport along with the number of sports being analyzed. This would allow us to compare and contrast on a greater scale, whether within a sport or across sports. Another improvement would involve gathering more tweets per athlete over a period of time greater than 6–9 days. This would improve the accuracy of the bing sentiment scores by decreasing the standard error. Increasing the time period would also make it easier to locate frequency trends, which could assist in gaining a better understanding of how to predict an athlete’s sentiment by using results based on prior events. One final improvement is simply repeating the analysis multiple times, because social media can be more subjective than factual, meaning that opinions are easily changed. For example, an athlete’s sentiment polarity could be positive one day but negative the next due to a bad performance or some other “negative” event. Although the Twitter API (Makice 2009) did present some limitations, it was a valuable experience learning to access real Twitter data and exploring the capabilities that it had to offer.
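The effect of collecting more tweets on the standard error can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the sample standard deviation stays fixed at 1.5 while the number of tweets grows; under that assumption, quadrupling the number of tweets halves the standard error.

```r
# Assumed, fixed sample standard deviation (illustrative value only)
s <- 1.5

# Hypothetical numbers of tweets per athlete
n <- c(25, 100, 400)

# Standard error of the mean for each sample size
se <- s / sqrt(n)
print(data.frame(n = n, se = se))
```

In practice the standard deviation would also change as more tweets are collected, so this shows the trend rather than an exact prediction.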

In conclusion, the answer to the main question that motivated this entire project

and analysis is that Michael Jordan has been conﬁrmed to be the GOAT over LeBron

James based on the

bing

sentiment polarities that were calculated using the tweet

datasets gathered by the Twitter API (Makice 2009).

Bibliography

Allaire, J. 2012. “RStudio: Integrated Development Environment for R.” Boston, MA 770: 394.

Bernard, H Russell, and Gery Ryan. 1998. “Text Analysis.” Handbook of Methods in

Cultural Anthropology 613.

Cavnar, William B, John M Trenkle, and others. 1994. “N-Gram-Based Text Categorization.” In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval. Vol. 161175. Citeseer.

Chowdhury, Gobinda G. 2003. “Natural Language Processing.” Annual Review of

Information Science and Technology 37 (1): 51–89.

Ding, Xiaowen, Bing Liu, and Philip S Yu. 2008. “A Holistic Lexicon-Based Approach

to Opinion Mining.” In Proceedings of the 2008 International Conference on Web

Search and Data Mining, 231–40.

“Everything You Need to Know about ‘The Last Dance’.” 2020. ESPN.com. https://www.espn.com/nba/story/_/id/28973557/the-last-dance-updates-untold-story-michael-jordan-chicago-bulls.

Feldman, Ronen. 2013. “Techniques and Applications for Sentiment Analysis.” Communications of the ACM 56 (4): 82–89.

Grishman, Ralph. 1986. Computational Linguistics: An Introduction. Cambridge

University Press.

Hand, David J, and Niall M Adams. 2014. “Data Mining.” Wiley StatsRef: Statistics

Reference Online, 1–7.

Heimerl, Florian, Steﬀen Lohmann, Simon Lange, and Thomas Ertl. 2014. “Word

Cloud Explorer: Text Analytics Based on Word Clouds.” In 2014 47th Hawaii

International Conference on System Sciences, 1833–42. IEEE.

Jain, Anil K, Patrick Flynn, and Arun A Ross. 2007. Handbook of Biometrics.

Springer Science & Business Media.

Li, Yunyao, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanathan,

and HV Jagadish. 2008. “Regular Expression Learning for Information Extraction.”

In Proceedings of the 2008 Conference on Empirical Methods in Natural Language

Processing, 21–30.

Makice, Kevin. 2009. Twitter API: Up and Running: Learn How to Build Applications with the Twitter API. O’Reilly Media, Inc.

Martin, Sven, Jörg Liermann, and Hermann Ney. 1998. “Algorithms for Bigram and

Trigram Word Clustering.” Speech Communication 24 (1): 19–37.

Tromp, Erik, and Mykola Pechenizkiy. 2014. “Rule-Based Emotion Detection on Social

Media: Putting Tweets on Plutchik’s Wheel.” arXiv Preprint arXiv:1412.4682.

Zuo, Yuan, Jichang Zhao, and Ke Xu. 2016. “Word Network Topic Model: A Simple

but General Solution for Short and Imbalanced Texts.” Knowledge and Information

Systems 48 (2): 379–98.

Appendix

Code that was applied to Michael Jordan’s tweet data but not included in the paper

can be found here. As far as the other athletes involved, their results are replicated

using the same code but utilize their dataset’s name in place of the mj seen below.

Libraries

# Set-up

library(rtweet)

library(ggmap)

library(igraph)

library(ggraph)

library(tidytext)

library(ggplot2)

library(dplyr)

library(readr)

library(magrittr)

library(wordcloud)

library(widyr)

library(tidyr)

library(utils)


library(purrr)

library(stringr)

library(knitr)

library(grid)

library(ggpubr)

library(tinytex)

Code

## Michael Jordan

### Code Collection

mj_goat_tw <-

search_tweets(

q = "Michael Jordan GOAT OR Michael Jordan Goat",

n = 300,

include_rts = FALSE,

`-filter` = "replies",

lang = "en")

## Michael Jordan

### Exporting tweets to CSV

write_as_csv(mj_goat_tw, "mj_goat_tw.csv")

## Michael Jordan

### Most Liked Tweets

mj_goat_tw %>%

arrange(-favorite_count) %>%

top_n(3, favorite_count) %>%

select(created_at, screen_name, favorite_count)

### Arrow within bigram/trigram networks

a <- arrow(length = unit(.075, "inches"), type = "closed")

## Michael Jordan

### Trigram plot

mj_goat_tw_trigram_counts %>%

filter(n >= 3) %>%

graph_from_data_frame() %>%

ggraph(layout = "fr") +

geom_edge_link(aes(edge_alpha = n, edge_width = n), arrow = a) +

geom_node_point(color = "red", size = 3) +

geom_node_text(aes(label = name), vjust = 1.8, size = 3) +

labs(title = "Trigram Word Network",

subtitle = "Tweets using Michael Jordan GOAT Keyword",

x = "", y = "")

## NBA

### Arranging bigram/trigram plots

ggarrange(lebron_bigram_plot, lebron_trigram_plot, ncol = 2)

ggarrange(harden_bigram_plot, harden_trigram_plot, ncol = 2)

ggarrange(kd_bigram_plot, kd_trigram_plot, ncol = 2)

ggarrange(kobe_bigram_plot, kobe_trigram_plot, ncol = 2)

## Michael Jordan

### Bing Lexicon Counts/Histogram

mj_goat_tw_bing_word_counts <- mj_goat_tw_clean_02 %>%

inner_join(get_sentiments("bing")) %>%

count(word, sentiment, sort = TRUE) %>%

ungroup()

mj_goat_tw_bing_word_counts %>%

group_by(sentiment) %>%

top_n(10) %>%

ungroup() %>%

mutate(word = reorder(word, n)) %>%

ggplot(aes(word, n, fill = sentiment)) +

geom_col(show.legend = FALSE) +

facet_wrap(~sentiment, scales = "free_y") +

labs(title = "Sentiment of Michael Jordan Goat Tweets

using Bing Lexicon",

y = "Word Frequency",

x = NULL) +

coord_flip()

## Michael Jordan

### AFINN Lexicon Counts/Histogram

mj_goat_tw_afinn_word_counts <- mj_goat_tw_clean_02 %>%

inner_join(get_sentiments("afinn")) %>%

count(word, value, sort = TRUE) %>%

ungroup()

mj_goat_tw_afinn_word_counts %>%

group_by(value) %>%

top_n(4) %>%

ungroup() %>%

mutate(word = reorder(word, n)) %>%

ggplot(aes(word, n, fill = value)) +

geom_col(show.legend = FALSE) +

facet_wrap(~value, scales = "free_y") +

labs(title = "Sentiment of Michael Jordan Goat Tweets

using AFINN Lexicon",

y = "Word Frequency",

x = NULL) +

coord_flip()

## Michael Jordan

### Loughran Lexicon Counts/Histogram

mj_goat_tw_loughran_word_counts <- mj_goat_tw_clean_02 %>%

inner_join(get_sentiments("loughran")) %>%

count(word, sentiment, sort = TRUE) %>%

ungroup()

mj_goat_tw_loughran_word_counts %>%

group_by(sentiment) %>%

top_n(5) %>%

ungroup() %>%

mutate(word = reorder(word, n)) %>%

ggplot(aes(word, n, fill = sentiment)) +

geom_col(show.legend = FALSE) +

facet_wrap(~sentiment, scales = "free_y") +

labs(title = "Sentiment of Michael Jordan Goat Tweets

using Loughran Lexicon",

y = "Word Frequency",

x = NULL) +

coord_flip()

## Michael Jordan

### NRC Lexicon Counts/Histogram

mj_goat_tw_nrc_word_counts <- mj_goat_tw_clean_02 %>%

inner_join(get_sentiments("nrc")) %>%

count(word, sentiment, sort = TRUE) %>%

ungroup()

mj_goat_tw_nrc_word_counts %>%

group_by(sentiment) %>%

top_n(5) %>%

ungroup() %>%

mutate(word = reorder(word, n)) %>%

ggplot(aes(word, n, fill = sentiment)) +

geom_col(show.legend = FALSE) +

facet_wrap(~sentiment, scales = "free_y") +

labs(title = "Sentiment of Michael Jordan Goat Tweets

using NRC Lexicon",

y = "Word Frequency",

x = NULL) +

coord_flip()

### Bing Sentiment Score Function

sentiment_bing_score <- function(twt) {
  # Step 1: Perform text cleaning (on tweet)
  twt_tbl <- tibble(text = twt) %>%
    mutate(
      # Remove http elements manually
      stripped_text = gsub("http\\S+", "", text)
    ) %>%
    unnest_tokens(word, stripped_text) %>%
    anti_join(stop_words, by = "word") %>% # remove stop words
    inner_join(get_sentiments("bing"), by = "word") %>% # merge with bing sentiment
    count(word, sentiment, sort = TRUE) %>%
    ungroup() %>%
    # Create a column "score" that assigns -1 to each negative word
    # and 1 to each positive word, weighted by its count n
    mutate(
      score = case_when(
        sentiment == "negative" ~ n * (-1),
        sentiment == "positive" ~ n * 1
      )
    )
  # Step 2: Calculate the total score
  sentiment.score <- case_when(
    nrow(twt_tbl) == 0 ~ 0, # if there are no words, score is 0
    nrow(twt_tbl) > 0 ~ sum(twt_tbl$score) # sum the positives and negatives
  )
  # Step 3: Keep track of which tweets contained no words at all
  # from the bing list
  zero.type <- case_when(
    nrow(twt_tbl) == 0 ~ "Type 1", # Type 1: no bing words at all
    nrow(twt_tbl) > 0 ~ "Type 2" # Type 2: a zero means the word scores sum to 0
  )
  list(score = sentiment.score, type = zero.type, twt_tbl = twt_tbl)
}

## Michael Jordan

### Applying Bing Sentiment Score Function

mj_goat_tw_sent_score_bing <-

lapply(mj_goat_tw$text,

function(x){sentiment_bing_score(x)})

mj_goat_tw_sent_score_bing2 <- rbind(
  tibble(
    Name = "Michael Jordan",
    Score = unlist(map(mj_goat_tw_sent_score_bing, "score")),
    Type = unlist(map(mj_goat_tw_sent_score_bing, "type"))
  )
)

## NBA

### Bing Sentiment

nba_goat_tw_sent_score_bing <- rbind(
  tibble(
    Name = "Michael Jordan",
    Score = unlist(map(mj_goat_tw_sent_score_bing, "score")),
    Type = unlist(map(mj_goat_tw_sent_score_bing, "type"))
  ),
  tibble(
    Name = "LeBron James",
    Score = unlist(map(lebron_goat_tw_sent_score_bing, "score")),
    Type = unlist(map(lebron_goat_tw_sent_score_bing, "type"))
  ),
  tibble(
    Name = "James Harden",
    Score = unlist(map(harden_goat_tw_sent_score_bing, "score")),
    Type = unlist(map(harden_goat_tw_sent_score_bing, "type"))
  ),
  tibble(
    Name = "Kevin Durant",
    Score = unlist(map(kd_goat_tw_sent_score_bing, "score")),
    Type = unlist(map(kd_goat_tw_sent_score_bing, "type"))
  ),
  tibble(
    Name = "Kobe Bryant",
    Score = unlist(map(kobe_goat_tw_sent_score_bing, "score")),
    Type = unlist(map(kobe_goat_tw_sent_score_bing, "type"))
  )
)

## NBA

### Bing Sentiment of Frequency Plots

nba_freq_sent_score_bing <- rbind(
  tibble(
    Name = "Michael Jordan",
    Score = unlist(map(mj_freq_sent_score_bing, "score")),
    Type = unlist(map(mj_freq_sent_score_bing, "type"))
  ),
  tibble(
    Name = "LeBron James",
    Score = unlist(map(lebron_freq_sent_score_bing, "score")),
    Type = unlist(map(lebron_freq_sent_score_bing, "type"))
  ),
  tibble(
    Name = "James Harden",
    Score = unlist(map(harden_freq_sent_score_bing, "score")),
    Type = unlist(map(harden_freq_sent_score_bing, "type"))
  ),
  tibble(
    Name = "Kevin Durant",
    Score = unlist(map(kd_freq_sent_score_bing, "score")),
    Type = unlist(map(kd_freq_sent_score_bing, "type"))
  ),
  tibble(
    Name = "Kobe Bryant",
    Score = unlist(map(kobe_freq_sent_score_bing, "score")),
    Type = unlist(map(kobe_freq_sent_score_bing, "type"))
  )
)