Could Not Find Function tagPOS

could not find function tagPOS

I think tagPOS is not a built-in function in any of those packages, so you'll have to define it yourself.

Here is the R Code:

library(NLP)
library(openNLP)

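tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  # Treat the whole input as a single sentence, then tokenize it into words
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  # Add POS tags to the word annotations
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  # Pull the POS feature out of each word annotation
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  # Build a "word/TAG" string alongside the raw tag vector
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}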

str <- "this is a the first sentence."
tagged_str <- tagPOS(str)

Output:

> tagged_str
$POStagged
[1] "this/DT is/VBZ a/DT the/DT first/JJ sentence/NN ./."

$POStags
[1] "DT"  "VBZ" "DT"  "DT"  "JJ"  "NN"  "."

Hope this helps.

How to use OpenNLP to get POS tags in R?

There might be more elegant ways to obtain the result, but this one should work:

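# Keep the text before the first "/NN" tag ...
q <- strsplit(unlist(tagged_str[1]), "/NN")
# ... and take its last space-separated token, i.e. the noun itself
q <- tail(strsplit(unlist(q[1]), " ")[[1]], 1)
#> q
#[1] "sentence"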
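
If you need every noun rather than just the first one, a more general route is to split the tagged string back into words and tags. This is only a sketch in base R; the names tokens, words, tags and nouns are mine, and the input string is copied from the output shown earlier:

```r
# Tagged string as returned in tagged_str$POStagged
tagged <- "this/DT is/VBZ a/DT the/DT first/JJ sentence/NN ./."

# One "word/TAG" token per element
tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]

# The tag is everything after the last "/", the word everything before it
words <- sub("/[^/]*$", "", tokens)
tags  <- sub("^.*/", "", tokens)

# Keep all noun tags (NN, NNS, NNP, NNPS)
nouns <- words[grepl("^NN", tags)]
print(nouns)
```

Changing the pattern in grepl() selects other tag families in the same way.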

Hope this helps.

parallel parLapply setup

Since you're calling functions from NLP on the cluster workers, you should load it on each of the workers before calling parLapply. You can do that from the worker function, but I tend to use clusterCall or clusterEvalQ right after creating the cluster object:

clusterEvalQ(cl, {library(openNLP); library(NLP)})

Because as.String and Maxent_Word_Token_Annotator come from those packages, there is no need to export them to the workers.

Note that while running your example on my machine, I noticed that the PTA object doesn't work after being exported to the worker machines. Presumably there is something in that object that can't be safely serialized and unserialized. After I created that object on the workers using clusterEvalQ, the example ran successfully. Here it is, using openNLP 0.2-1:

library(parallel)
tagPOS <-  function(x, ...) {
    s <- as.String(x)
    word_token_annotator <- Maxent_Word_Token_Annotator()
    a2 <- Annotation(1L, "sentence", 1L, nchar(s))
    a2 <- annotate(s, word_token_annotator, a2)
    a3 <- annotate(s, PTA, a2)
    a3w <- a3[a3$type == "word"]
    POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
    POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
    list(POStagged = POStagged, POStags = POStags)
}
text.var <- c("I like it.", "This is outstanding soup!",
    "I really must get the recipe.")
cl <- makeCluster(mc <- getOption("cl.cores", detectCores()/2))
clusterEvalQ(cl, {
    library(openNLP)
    library(NLP)
    PTA <- Maxent_POS_Tag_Annotator()
})
m <- parLapply(cl, text.var, tagPOS)
print(m)
stopCluster(cl)
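
In case it isn't obvious why tagPOS can see PTA on the workers: PTA is a free variable in the function, so each worker looks it up in its own global environment, which is where clusterEvalQ created it. Here is a small sketch of the same pattern with no Java involved; add_prefix and prefix are hypothetical stand-ins for tagPOS and PTA:

```r
library(parallel)

# Stand-in for tagPOS(): it refers to `prefix`, which is never
# defined in the master session.
add_prefix <- function(x) paste0(prefix, x)

cl <- makeCluster(2)

# Create `prefix` in the global environment of every worker,
# the same way PTA is created above.
invisible(clusterEvalQ(cl, prefix <- "worker:"))

# The function is shipped to the workers; `prefix` is resolved there.
res <- unlist(parLapply(cl, c("a", "b", "c"), add_prefix))
stopCluster(cl)
print(res)
```

Nothing is exported from the master here, so the object never has to be serialized.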

If clusterEvalQ fails because Maxent_POS_Tag_Annotator is not found, you might be loading the wrong version of openNLP on the workers. You can determine what package versions you're getting on the workers by executing sessionInfo with clusterEvalQ:

library(parallel)
cl <- makeCluster(2)
clusterEvalQ(cl, {library(openNLP); library(NLP)})
clusterEvalQ(cl, sessionInfo())

This will return the results of executing sessionInfo() on each of the cluster workers. Here is the version information for some of the packages that I'm using and that work for me:

other attached packages:
[1] NLP_0.1-0     openNLP_0.2-1

loaded via a namespace (and not attached):
[1] openNLPdata_1.5.3-1 rJava_0.9-4

Count number of verbs for each speech in data frame R

Assuming you are using a function similar to this one (from the answer to "could not find function tagPOS"):

tagPOS <-  function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}

Create a function that counts the number of POS tags that contain the letters 'VB':

count_verbs <- function(x) {
  pos_tags <- tagPOS(x)$POStags
  sum(grepl("VB", pos_tags))
}
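
To see what the counting step does without running the tagger, you can try it on a plain tag vector. This is just an illustration; the vector is the POStags output from the first example, and the startsWith() variant is my own suggestion rather than part of the original function:

```r
# Tag vector copied from the tagPOS() example output
pos_tags <- c("DT", "VBZ", "DT", "DT", "JJ", "NN", ".")

# The Penn Treebank verb tags (VB, VBD, VBG, VBN, VBP, VBZ) all contain "VB"
n_verbs <- sum(grepl("VB", pos_tags))
print(n_verbs)

# Slightly stricter variant: only count tags that start with "VB"
n_verbs_strict <- sum(startsWith(pos_tags, "VB"))
print(n_verbs_strict)
```

Both give the same count here, because "VBZ" is the only verb tag in the vector.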

And use dplyr to group by Group and summarise using count_verbs():

library(dplyr)
data %>% 
  group_by(Group) %>%
  summarise(num_verbs = count_verbs(Description))

Extract Pronouns from text in R

What, exactly, do you want as the output? This seems to give what I think you want:

library("stringr")

prp <- str_extract_all(acqTag$POStagged,"\\w+/PRP\\$?")
str_replace(unlist(prp), "/PRP\\$?", "")
#[1] "my" "He"
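
If you would rather not depend on stringr, roughly the same thing can be done in base R. This is a sketch on a made-up tagged string (the real acqTag object isn't shown here), so treat tagged, matches and pronouns as hypothetical names:

```r
# Hypothetical tagged string in the same "word/TAG" format
tagged <- "my/PRP$ dog/NN barked/VBD ./. He/PRP ran/VBD ./."

# Find every token tagged PRP or PRP$
matches <- regmatches(tagged, gregexpr("\\w+/PRP\\$?", tagged))[[1]]

# Drop the tag and keep only the word
pronouns <- sub("/PRP\\$?$", "", matches)
print(pronouns)
```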

Running out of Memory with POS Tagging in R

Give this a try:

# Set the JVM heap size before rJava (or any package that uses it) is loaded
options(java.parameters = "-Xmx3000m")
library(rJava)
library(NLP)
library(openNLP)
library(data.table)
tagPOS <-  function(x) {
  s <- as.String(x)
  sent_token_annotator = Maxent_Sent_Token_Annotator()
  word_token_annotator = Maxent_Word_Token_Annotator()
  a2 = annotate(s, list(sent_token_annotator, word_token_annotator))
  pos_tag_annotator = Maxent_POS_Tag_Annotator()
  a3 = annotate(s, pos_tag_annotator, a2)
  a3w = subset(a3, type == "word")
  POStags = unlist(lapply(a3w$features, `[[`, "POS"))
  gc()
  return(paste(POStags,collapse = " "))
}
dat <- data.table(Tweet = rep("This is a tweet.", 10000L))
dat[,c("ID"):= .I]
dat[,c("POS"):= tagPOS(Tweet),by = .(ID)]
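
Something else that may be worth trying: the function above builds all three annotators on every call, and as far as I can tell each Maxent_*_Annotator() call loads its model again. Below is a sketch that creates them once and reuses them. The names sent_ann, word_ann, pos_ann and tag_pos_reuse are mine, and I have not measured how much memory this saves:

```r
library(NLP)
library(openNLP)

# Build each annotator once, outside the tagging function
sent_ann <- Maxent_Sent_Token_Annotator()
word_ann <- Maxent_Word_Token_Annotator()
pos_ann  <- Maxent_POS_Tag_Annotator()

tag_pos_reuse <- function(x) {
  s <- as.String(x)
  # Run sentence, word and POS annotation as one pipeline
  a <- annotate(s, list(sent_ann, word_ann, pos_ann))
  w <- a[a$type == "word"]
  # One POS tag per word, collapsed into a single string
  paste(vapply(w$features, `[[`, character(1), "POS"), collapse = " ")
}

out <- tag_pos_reuse("This is a tweet.")
print(out)
```

You can then call tag_pos_reuse(Tweet) in place of tagPOS(Tweet) in the data.table line.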
