Sentiment analysis using R
And there is this package:
sentiment: Tools for Sentiment Analysis
sentiment is an R package with tools for sentiment analysis including bayesian classifiers for positivity/negativity and emotion classification.
Update 14 Dec 2012: the package has been moved to the archive...
Update 15 Mar 2013: the qdap package has a polarity function, based on Jeffrey Breen's work.
Confused with sentiment package in R?
Thanks rawr. I found this helpful.
> library(qdap)
> polarity("Not Good")
  all total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
1 all               1           2       -0.707          NA                 NA
> polarity("It's cool but not great")
  all total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
1 all               1           5       -0.894          NA                 NA
> polarity("It's awesome")
  all total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
1 all               1           2        0.707          NA                 NA
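To make the numbers above less opaque, here is a minimal base R sketch of the general idea: sum the signed scores of the polarized words, flip a word that directly follows a negator, and divide by the square root of the word count. The word lists and the function name simple_polarity are made up for illustration; this is not qdap's algorithm or lexicon. qdap looks at a wider window of context words around each polarized word, which is why this sketch matches the first and third results but not the second.

```r
# Simplified polarity sketch; NOT qdap's actual algorithm or lexicon.
# The three word lists below are tiny, hypothetical examples.
positive <- c("good", "great", "cool", "awesome")
negative <- c("bad", "awful", "terrible")
negators <- c("not", "never", "no")

simple_polarity <- function(sentence) {
  words <- tolower(strsplit(sentence, "\\s+")[[1]])
  # +1 for positive words, -1 for negative words, 0 otherwise
  score <- (words %in% positive) - (words %in% negative)
  # flip the sign of any word that directly follows a negator
  after_negator <- c(FALSE, head(words %in% negators, -1))
  score[after_negator] <- -score[after_negator]
  # scale by the square root of the number of words
  sum(score) / sqrt(length(words))
}

simple_polarity("Not Good")       # -1 / sqrt(2), about -0.707
simple_polarity("It's awesome")   #  1 / sqrt(2), about  0.707
```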
sentiment analysis with R
Here's an example:
df <- read.table(header=TRUE, text="word score
laughter 8.50
happiness 8.44
love 8.42
happy 8.30
laughed 8.26
laugh 8.22")
sentence <- "I love happiness"
words <- strsplit(sentence, "\\s+")[[1]]
score <- sum(df$score[match(words, df$word)], na.rm = TRUE)
print(score)
# [1] 16.86
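The lookup above only matches words that appear exactly as in the dictionary, so "LOVE" or "happiness!" would be missed. A possible variant, assuming you want case-insensitive matching and do not care about punctuation, is sketched below; score_sentence and lexicon are names introduced here for illustration, and the scores are the same ones used above.

```r
# Same toy lexicon as above, rebuilt so the block runs on its own
lexicon <- data.frame(
  word  = c("laughter", "happiness", "love", "happy", "laughed", "laugh"),
  score = c(8.50, 8.44, 8.42, 8.30, 8.26, 8.22),
  stringsAsFactors = FALSE
)

score_sentence <- function(sentence, lexicon) {
  # lowercase and strip punctuation (note: this also drops apostrophes)
  cleaned <- gsub("[[:punct:]]", "", tolower(sentence))
  words <- strsplit(trimws(cleaned), "\\s+")[[1]]
  # words missing from the lexicon give NA and are ignored by na.rm
  sum(lexicon$score[match(words, lexicon$word)], na.rm = TRUE)
}

score_sentence("I LOVE happiness!", lexicon)
# [1] 16.86
```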
Dutch sentiment analysis using R
Sentiment analysis (using a dictionary) is basically just a pattern matching task. I think this becomes clear when using the tidytext package and reading the book about it.
So I wouldn't bother with such a complex setup here. Instead, I would convert the dictionary they are using (the nl-sentiment.xml file from the clips/pattern repository) into a data.frame and then use tidytext. Unfortunately, the dictionary is stored in XML format and I'm not very familiar with that, so the code looks a little hacky:
library(tidyverse)
library(xml2)
library(tidytext)
sentiment_nl <- read_xml(
  "https://raw.githubusercontent.com/clips/pattern/master/pattern/text/nl/nl-sentiment.xml"
) %>%
  as_list() %>%
  .[[1]] %>%
  # one row per attribute (name/value pair) of each <word> element
  map_df(function(x) {
    tibble::enframe(attributes(x))
  }) %>%
  # every "form" attribute starts a new dictionary entry
  mutate(id = cumsum(name == "form")) %>%
  unnest(value) %>%
  pivot_wider(id_cols = id) %>%
  mutate(form = tolower(form), # lowercase all words to ignore case during matching
         polarity = as.numeric(polarity),
         subjectivity = as.numeric(subjectivity),
         intensity = as.numeric(intensity),
         confidence = as.numeric(confidence))
But the output is correct for the purpose:
head(sentiment_nl)
#> # A tibble: 6 x 11
#> id form cornetto_id cornetto_synset… wordnet_id pos sense polarity
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 1 amst… r_a-16677 "" "" JJ van … 0
#> 2 2 ange… r_a-8929 "" "" JJ Enge… 0.1
#> 3 3 arab… r_a-16693 "" "" JJ van … 0
#> 4 4 arde… r_a-17252 "" "" JJ van … 0
#> 5 5 arnh… r_a-16698 "" "" JJ van … 0
#> 6 6 asse… r_a-16700 "" "" JJ van … 0
#> # … with 3 more variables: subjectivity <dbl>, intensity <dbl>,
#> # confidence <dbl>
Now we can use the functions from tidytext and the broader tidyverse to look up the words in the dictionary and attach the score to each word. summarise() is used to get exactly one value per text (that's also why you need the text_id).
df <- data.frame(text = c("Het eten was heerlijk en de bediening was fantastisch",
                          "Verschrikkelijk. Ik had een vlieg in mijn soep",
                          "Het was oké. De bediening kon wat beter, maar het eten was wel lekker. Leuk sfeertje wel!",
                          "Ondanks dat het druk was toch op tijd ons eten gekregen. Complimenten aan de kok voor het op smaak brengen van mijn biefstuk"))
df %>%
  mutate(text_id = row_number()) %>%
  unnest_tokens(output = word, input = text, drop = FALSE) %>%
  inner_join(sentiment_nl, by = c("word" = "form")) %>%
  group_by(text_id) %>%
  summarise(text = head(text, 1),
            polarity = mean(polarity),
            subjectivity = mean(subjectivity),
            .groups = "drop")
#> # A tibble: 4 x 4
#> text_id text polarity subjectivity
#> <int> <chr> <dbl> <dbl>
#> 1 1 Het eten was heerlijk en de bediening was fanta… 0.56 0.72
#> 2 2 Verschrikkelijk. Ik had een vlieg in mijn soep -0.5 0.9
#> 3 3 Het was oké. De bediening kon wat beter, maar h… 0.6 0.98
#> 4 4 Ondanks dat het druk was toch op tijd ons eten … -0.233 0.767
As I said, more on this (and NLP) is explained on tidytextmining.com, so don't worry if this looks complicated to you now.
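If the pipeline still feels opaque, the same tokenize, join and average steps can be sketched in base R without any packages. The lexicon below is a hypothetical three-word stand-in with made-up polarity values (not taken from nl-sentiment.xml), and toy_lexicon, tokens, matched and result are names introduced only for this illustration.

```r
# Hypothetical mini lexicon; the polarity values are invented for the demo
toy_lexicon <- data.frame(
  form     = c("heerlijk", "fantastisch", "verschrikkelijk"),
  polarity = c(0.5, 0.7, -0.9),
  stringsAsFactors = FALSE
)
texts <- c("Het eten was heerlijk en de bediening was fantastisch",
           "Verschrikkelijk. Ik had een vlieg in mijn soep")

# tokenize: one row per (text_id, word), lowercased, punctuation removed
tokens <- do.call(rbind, lapply(seq_along(texts), function(i) {
  words <- strsplit(gsub("[[:punct:]]", "", tolower(texts[i])), "\\s+")[[1]]
  data.frame(text_id = i, word = words, stringsAsFactors = FALSE)
}))

# inner join: keep only the words that occur in the lexicon
matched <- merge(tokens, toy_lexicon, by.x = "word", by.y = "form")

# one mean polarity per text
result <- aggregate(polarity ~ text_id, data = matched, FUN = mean)
result
```

With these invented values, the first text averages 0.5 and 0.7 and the second has a single match; texts with no dictionary word at all simply drop out of the result, just as they would after inner_join().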
Emoji Sentiment Analysis in R
Check this discussion: VaderSentiment: unable to update emoji sentiment score
"Vader transforms emojis to their word representation prior to extracting sentiment"
Basically, from what I tested, the emojis' values are hidden but are part of the score and can influence it. If you need the score for a specific emoji, you can load library(lexicon) and run data.frame(hash_emojis_identifier) (a data frame that contains identifiers for emojis and matches them to a lexicon format) and data.frame(hash_sentiment_emojis) to get each emoji's sentiment value. It is not possible, though, to determine from these tables (using libraries such as vader and lexicon) what impact a series of emojis had on the total message score without knowing how vader calculates their cumulative impact on the score.
You can, however, evaluate the impact of the emojis by taking the difference between the total score of the message with emojis and the score of the same message without them:
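library(vader)
# data_sample: character vector of messages that contain emojis
# data_samplewithout: the same messages with the emojis removed
allvals <- NULL
for (i in seq_along(data_sample)) {
  outs <- vader_df(data_sample[i])
  allvals <- rbind(allvals, outs)
}
allvalswithout <- NULL
for (i in seq_along(data_samplewithout)) {
  outs <- vader_df(data_samplewithout[i])
  allvalswithout <- rbind(allvalswithout, outs)
}
# difference in compound score attributable to the emojis
emojiscore <- allvals$compound - allvalswithout$compound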
Then:
allvals <- cbind(allvals,emojiscore)
For large datasets it would be ideal to automate removing the emojis from the texts. Here I just removed them manually to illustrate this approach to the problem.
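One way the emoji removal could be automated is with a regular expression over Unicode ranges. The sketch below is a rough approximation: emoji_pattern and strip_emoji are names made up here, and the ranges cover only the common emoji blocks (for example, flag sequences are not included), so check it against your own data before relying on it.

```r
# Rough emoji pattern: common pictograph blocks, misc symbols and dingbats,
# plus the variation selector and zero-width joiner used inside sequences.
# This is an approximation, not a complete list of all emojis.
emoji_pattern <- "[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]"

strip_emoji <- function(x) {
  no_emoji <- gsub(emoji_pattern, "", x, perl = TRUE)
  # collapse the whitespace left behind and trim the ends
  trimws(gsub("\\s+", " ", no_emoji))
}

# made-up sample messages, only to show the function at work
data_sample <- c("Great day \U0001F600",
                 "This is awful \U0001F621\U0001F621",
                 "No emoji here")
data_samplewithout <- strip_emoji(data_sample)
data_samplewithout
# [1] "Great day"     "This is awful" "No emoji here"
```

The resulting data_samplewithout vector can then be scored next to data_sample in the same way as in the loops above.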