Twitter Sentiment Analysis with R Using the German-Language Lexicon SentiWS
This may work for you:
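# Read a SentiWS file and flatten it into a lower-case vector of all word forms.
readAndflattenSentiWS <- function(filename) {
  words <- readLines(filename, encoding = "UTF-8")
  # Replace the "|POS<TAB>weight<TAB>" part with a comma: "lemma,inflection,..."
  words <- sub("\\|[A-Z]+\t[0-9.-]+\t?", ",", words)
  words <- unlist(strsplit(words, ","))
  words <- tolower(words)
  return(words)
}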
# Combine the English word lists with the flattened German SentiWS lists.
pos.words <- c(scan("positive-words.txt", what = "character", comment.char = ";", quiet = TRUE),
               readAndflattenSentiWS("SentiWS_v1.8c_Positive.txt"))
neg.words <- c(scan("negative-words.txt", what = "character", comment.char = ";", quiet = TRUE),
               readAndflattenSentiWS("SentiWS_v1.8c_Negative.txt"))
score.sentiment <- function(sentences, pos.words, neg.words, .progress = 'none') {
  # ... see OP ...
}
sample <- c("ich liebe dich. du bist wunderbar",
            "Ich hasse dich, geh sterben!",
            "i love you. you are wonderful.",
            "i hate you, die.")
(test.sample <- score.sentiment(sample,
                                pos.words,
                                neg.words))
#   score                              text
# 1     2 ich liebe dich. du bist wunderbar
# 2    -2      ich hasse dich, geh sterben!
# 3     2    i love you. you are wonderful.
# 4    -2                  i hate you, die.
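The body of score.sentiment is not shown above, so here is a minimal sketch of what a count-based scorer of this kind might look like. It assumes the score is simply the number of positive matches minus the number of negative matches. The name score_sentiment_sketch and the tiny word lists are hypothetical stand-ins for illustration, not the OP's function or the real lexicons.

```r
# Hypothetical stand-in for the elided score.sentiment: counts lexicon hits.
score_sentiment_sketch <- function(sentences, pos.words, neg.words) {
  scores <- vapply(sentences, function(sentence) {
    # Replace punctuation, control characters and digits with spaces.
    clean <- gsub("[[:punct:]]", " ", sentence)
    clean <- gsub("[[:cntrl:]]", " ", clean)
    clean <- gsub("[[:digit:]]+", " ", clean)
    clean <- tolower(clean)
    # Split on whitespace and drop empty tokens.
    words <- unlist(strsplit(clean, "\\s+"))
    words <- words[nzchar(words)]
    # Score = positive matches minus negative matches.
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

# Toy word lists (assumptions for illustration only).
toy.pos <- c("liebe", "wunderbar", "love", "wonderful")
toy.neg <- c("hasse", "sterben", "hate", "die")

result <- score_sentiment_sketch(
  c("ich liebe dich. du bist wunderbar", "Ich hasse dich, geh sterben!"),
  toy.pos, toy.neg)
print(result)
```

One thing to watch when merging English and German lists: the last sample line above scores -2, which suggests the English negative list contains "die". That token is also a German article, so German sentences may pick up spurious negative hits unless such collisions are filtered out.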
Sentiment analysis of non-English texts
As Andy has pointed out above, the best approach would be to train your own classifier. A quicker, rougher approach would be to use a German sentiment lexicon such as SentiWS and compute the polarity of a sentence from the polarity values of its individual words (for example, by summing them). This method isn't foolproof (it doesn't take negation into account, for example), but it gives reasonable results relatively quickly.
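The summing approach can be sketched roughly as follows. This assumes the SentiWS line layout of word|POS, a tab, the weight, a tab, and a comma-separated list of inflected forms. The helper names parse_sentiws and sentence_polarity are hypothetical, and the weights in the sample lines are made up for the example rather than taken from the lexicon.

```r
# Illustrative lines in the assumed SentiWS layout:
# word|POS<TAB>weight<TAB>inflections. The weights here are invented.
sentiws.lines <- c(
  "gut|ADJX\t0.4\tgute,guter,gutes",
  "wunderbar|ADJX\t0.7\twunderbare,wunderbares",
  "schlecht|ADJX\t-0.8\tschlechte,schlechter",
  "hassen|VVINF\t-0.5\thasse,hasst"
)

# Build a named numeric vector mapping every word form to its weight.
parse_sentiws <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  unlist(lapply(fields, function(f) {
    lemma <- sub("\\|.*$", "", f[1])
    # Some entries may have no inflected forms.
    forms <- if (length(f) >= 3) {
      unlist(strsplit(f[3], ",", fixed = TRUE))
    } else {
      character(0)
    }
    all.forms <- tolower(c(lemma, forms))
    setNames(rep(as.numeric(f[2]), length(all.forms)), all.forms)
  }))
}

lexicon <- parse_sentiws(sentiws.lines)

# Sum the weights of all known words; unknown words contribute nothing.
sentence_polarity <- function(sentence, lexicon) {
  clean <- tolower(gsub("[[:punct:]]", " ", sentence))
  words <- unlist(strsplit(clean, "\\s+"))
  sum(lexicon[words], na.rm = TRUE)
}

polarities <- vapply(
  c("Das Essen war gut, der Service wunderbar.",
    "Ich hasse schlechte Filme."),
  sentence_polarity, numeric(1), lexicon = lexicon, USE.NAMES = FALSE)
print(polarities)
```

With these invented weights the first sentence comes out positive and the second negative. As noted, negation is ignored: "nicht gut" would still receive the positive weight of "gut".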