Is There Any Other Package Other Than "Sentiment" to Do Sentiment Analysis in R

Sentiment analysis using R

And there is this package:

sentiment: Tools for Sentiment Analysis

sentiment is an R package with tools for sentiment analysis, including Bayesian classifiers for positivity/negativity and emotion classification.

Update 14 Dec 2012: the package has been removed from CRAN and moved to the archive...

Update 15 Mar 2013: the qdap package has a polarity function based on Jeffrey Breen's work.

Confused with sentiment package in R?

Thanks rawr. I found this helpful.

> library(qdap)
> polarity("Not Good")
  all total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
1 all               1           2       -0.707          NA                 NA
> polarity("It's cool but not great")
  all total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
1 all               1           5       -0.894          NA                 NA
> polarity("It's awesome")
  all total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
1 all               1           2        0.707          NA                 NA

sentiment analysis with R

Here's an example:

df <- read.table(header=TRUE, text="word            score   
laughter        8.50    
happiness       8.44    
love            8.42    
happy           8.30    
laughed         8.26    
laugh           8.22")
sentence <- "I love happiness"

words <- strsplit(sentence, "\\s+")[[1]]
score <- sum(df$score[match(words, df$word)], na.rm = TRUE)

print(score)
# [1] 16.86
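
The same lookup can be extended to several sentences at once. The following is only a minimal base R sketch: score_sentences() is a hypothetical helper (not part of any package), and it assumes that lower-casing the text and replacing punctuation with spaces is acceptable preprocessing for your data.

```r
# Toy lexicon copied from the example above
lexicon_df <- data.frame(
  word  = c("laughter", "happiness", "love", "happy", "laughed", "laugh"),
  score = c(8.50, 8.44, 8.42, 8.30, 8.26, 8.22),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sums the lexicon scores of all matched words per sentence
score_sentences <- function(sentences, lexicon) {
  vapply(sentences, function(s) {
    # Lower-case, then replace anything that is not a letter, digit,
    # or apostrophe with a space
    cleaned <- gsub("[^[:alnum:]']+", " ", tolower(s))
    words <- strsplit(trimws(cleaned), "\\s+")[[1]]
    # Words missing from the lexicon give NA and are dropped by na.rm = TRUE
    sum(lexicon$score[match(words, lexicon$word)], na.rm = TRUE)
  }, numeric(1), USE.NAMES = FALSE)
}

scores <- score_sentences(
  c("I love happiness", "Happy? We laughed!", "Nothing to see here"),
  lexicon_df
)
print(scores)
```

A sentence without any lexicon word gets a score of 0 here, which is indistinguishable from a truly neutral sentence; returning NA in that case may be preferable, depending on the analysis.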

Dutch sentiment analysis using R

Sentiment analysis (using a dictionary) is basically just a pattern matching task. I think this becomes clear when using the tidytext package and reading the book about it.

So I wouldn't bother with such a complex setup here. Instead, I would convert the dictionary they are using (which is from here) into a data.frame and then use tidytext. Unfortunately, the dictionary is stored in XML format and I'm not very familiar with that, so the code looks a little hacky:

library(tidyverse)
library(xml2)
library(tidytext)

sentiment_nl <- read_xml(
  "https://raw.githubusercontent.com/clips/pattern/master/pattern/text/nl/nl-sentiment.xml"
) %>% 
  as_list() %>% 
  .[[1]] %>% 
  map_df(function(x) {
    tibble::enframe(attributes(x))
  }) %>% 
  mutate(id = cumsum(name == "form")) %>% # a new word entry starts at each "form" attribute
  unnest(value) %>% 
  pivot_wider(id_cols = id) %>% 
  mutate(form = tolower(form), # lowercase all words to ignore case during matching
         polarity = as.numeric(polarity),
         subjectivity = as.numeric(subjectivity),
         intensity = as.numeric(intensity),
         confidence = as.numeric(confidence))

But the output is correct for the purpose:

head(sentiment_nl)
#> # A tibble: 6 x 11
#>      id form  cornetto_id cornetto_synset… wordnet_id pos   sense polarity
#>   <int> <chr> <chr>       <chr>            <chr>      <chr> <chr>    <dbl>
#> 1     1 amst… r_a-16677   ""               ""         JJ    van …      0  
#> 2     2 ange… r_a-8929    ""               ""         JJ    Enge…      0.1
#> 3     3 arab… r_a-16693   ""               ""         JJ    van …      0  
#> 4     4 arde… r_a-17252   ""               ""         JJ    van …      0  
#> 5     5 arnh… r_a-16698   ""               ""         JJ    van …      0  
#> 6     6 asse… r_a-16700   ""               ""         JJ    van …      0  
#> # … with 3 more variables: subjectivity <dbl>, intensity <dbl>,
#> #   confidence <dbl>

Now we can use the functions from tidytext and the broader tidyverse to look up the words in the dictionary and attach the score to each word. summarise() is used to get exactly one value per text (that's also why you need the text_id).

df <- data.frame(text = c("Het eten was heerlijk en de bediening was fantastisch", 
                          "Verschrikkelijk. Ik had een vlieg in mijn soep", 
                          "Het was oké. De bediening kon wat beter, maar het eten was wel lekker. Leuk sfeertje wel!",
                          "Ondanks dat het druk was toch op tijd ons eten gekregen. Complimenten aan de kok voor het op smaak brengen van mijn biefstuk"))

df %>% 
  mutate(text_id = row_number()) %>% 
  unnest_tokens(output = word, input = text, drop = FALSE) %>% 
  inner_join(sentiment_nl, by = c("word" = "form")) %>%
  group_by(text_id) %>% 
  summarise(text = head(text, 1),
            polarity = mean(polarity),
            subjectivity = mean(subjectivity),
            .groups = "drop")
#> # A tibble: 4 x 4
#>   text_id text                                             polarity subjectivity
#>     <int> <chr>                                               <dbl>        <dbl>
#> 1       1 Het eten was heerlijk en de bediening was fanta…    0.56         0.72 
#> 2       2 Verschrikkelijk. Ik had een vlieg in mijn soep     -0.5          0.9  
#> 3       3 Het was oké. De bediening kon wat beter, maar h…    0.6          0.98 
#> 4       4 Ondanks dat het druk was toch op tijd ons eten …   -0.233        0.767

As I said, more on this (and NLP) is explained on tidytextmining.com, so don't worry if this looks complicated to you now.
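
One thing to be aware of: inner_join() drops every token that is not in the dictionary, so a text without a single match disappears from the result entirely. The sketch below shows one possible way to keep such texts by using left_join() instead. The toy_lexicon values are made up for illustration (they are not taken from nl-sentiment.xml), and the n_matched column is an addition of this sketch.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy lexicon standing in for sentiment_nl (scores are invented)
toy_lexicon <- tibble::tibble(
  form     = c("heerlijk", "fantastisch", "verschrikkelijk"),
  polarity = c(0.8, 0.9, -1.0)
)

reviews <- tibble::tibble(
  text = c("Het eten was heerlijk en fantastisch",
           "Verschrikkelijk",
           "Ik had een vlieg in mijn soep")
)

scores <- reviews %>%
  mutate(text_id = row_number()) %>%
  unnest_tokens(output = word, input = text, drop = FALSE) %>%
  # left_join keeps unmatched tokens (with NA polarity) instead of dropping them
  left_join(toy_lexicon, by = c("word" = "form")) %>%
  group_by(text_id) %>%
  summarise(text = first(text),
            n_matched = sum(!is.na(polarity)),
            # NA rather than NaN when no word of the text is in the lexicon
            polarity = if (n_matched > 0) mean(polarity, na.rm = TRUE) else NA_real_,
            .groups = "drop")

print(scores)
```

The third review contains no lexicon word, so it is kept with n_matched equal to 0 and an NA polarity rather than being silently removed.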

Emoji Sentiment Analysis in R

Check this discussion: VaderSentiment: unable to update emoji sentiment score

"Vader transforms emojis to their word representation prior to extracting sentiment"

Basically, from what I tested, emoji values are hidden but are part of the score and can influence it. If you need the score for a specific emoji, load library(lexicon) and run data.frame(hash_emojis_identifier) (a data frame that maps each emoji to an identifier in lexicon format) and data.frame(hash_sentiment_emojis) to get each emoji's sentiment value. From those tables alone, however, you cannot determine how much a series of emojis contributed to a message's total score, because that depends on how vader combines their individual values into the overall score.

You can still estimate the impact of the emojis by taking the difference between the total score of the message with emojis and the score of the same message without them:

library(vader)

# data_sample: messages with emojis
# data_samplewithout: the same messages with the emojis removed
allvals <- NULL
for (i in seq_along(data_sample)) {
  outs <- vader_df(data_sample[i])
  allvals <- rbind(allvals, outs)
}
allvalswithout <- NULL
for (i in seq_along(data_samplewithout)) {
  outs <- vader_df(data_samplewithout[i])
  allvalswithout <- rbind(allvalswithout, outs)
}

emojiscore <- allvals$compound-allvalswithout$compound

Then:

allvals <- cbind(allvals,emojiscore)

For large datasets it would be better to automate the removal of emojis from the texts. Here I just removed them manually to illustrate the approach.
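
One rough way to automate the emoji removal is a regular expression over Unicode code-point ranges. This is only a sketch under stated assumptions: strip_emojis() is a hypothetical helper, and the ranges below cover the common emoji blocks rather than every character that Unicode classifies as an emoji, so check the result against your own data.

```r
# Assumption: these code-point ranges approximate "emoji"; they are not exhaustive.
# U+1F000 to U+1FAFF: pictographs, emoticons, transport symbols, flags, skin tones
# U+2600 to U+27BF:   miscellaneous symbols and dingbats
# U+FE0F, U+200D:     variation selector-16 and zero-width joiner
emoji_pattern <- "[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]"

# Hypothetical helper: removes matching characters and tidies leftover whitespace
strip_emojis <- function(x) {
  no_emoji <- gsub(emoji_pattern, "", x, perl = TRUE)
  trimws(gsub("\\s+", " ", no_emoji))
}

example <- c("Great movie \U0001F600",
             "So sad \U0001F62D\U0001F62D today",
             "no emoji here")
cleaned <- strip_emojis(example)
print(cleaned)
```

With such a helper, data_samplewithout could be derived as strip_emojis(data_sample) instead of being edited by hand.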