could not find function tagPOS
I think tagPOS is not a built-in function of any of these packages, so you'll have to define it yourself.
Here is the R Code:
library(NLP)
library(openNLP)
tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}
str <- "this is a the first sentence."
tagged_str <- tagPOS(str)
Output:
> tagged_str
$POStagged
[1] "this/DT is/VBZ a/DT the/DT first/JJ sentence/NN ./."
$POStags
[1] "DT" "VBZ" "DT" "DT" "JJ" "NN" "."
Hope this helps.
How to use OpenNLP to get POS tags in R?
There might be more elegant ways to obtain the result, but this one should work:
q <- strsplit(unlist(tagged_str[1]),'/NN')
q <- tail(strsplit(unlist(q[1])," ")[[1]],1)
#> q
#[1] "sentence"
Hope this helps.
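As a possible alternative to the two strsplit calls above, here is a minimal base R sketch that pulls out every token whose tag starts with NN, not only the last one. It assumes the tagged string has the "word/TAG" layout shown in the output earlier; the variable names tagged, tokens, is_noun and nouns are illustrative only.

```r
# Tagged string in the same "word/TAG" layout that tagPOS() produced above
tagged <- "this/DT is/VBZ a/DT the/DT first/JJ sentence/NN ./."

# Split into individual word/TAG tokens
tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]

# Keep tokens whose tag starts with NN (NN, NNS, NNP, NNPS)
is_noun <- grepl("/NN[A-Z]*$", tokens)

# Drop the trailing "/TAG" part to recover the words
nouns <- sub("/[^/]*$", "", tokens[is_noun])
nouns
#[1] "sentence"
```

Splitting on spaces first keeps the tag pattern anchored to the end of each token, so a word that happens to contain "/NN" elsewhere would not be picked up.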
parallel parLapply setup
Since you're calling functions from NLP on the cluster workers, you should load it on each of the workers before calling parLapply. You can do that from the worker function, but I tend to use clusterCall or clusterEvalQ right after creating the cluster object:
clusterEvalQ(cl, {library(openNLP); library(NLP)})
Since as.String and Maxent_Word_Token_Annotator are in those packages, they don't need to be exported to the workers.
Note that while running your example on my machine, I noticed that the PTA object doesn't work after being exported to the worker machines. Presumably there is something in that object that can't be safely serialized and unserialized. After I created that object on the workers using clusterEvalQ, the example ran successfully. Here it is, using openNLP 0.2-1:
library(parallel)
tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  # PTA is looked up in the worker's global environment (created below)
  a3 <- annotate(s, PTA, a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}
text.var <- c("I like it.", "This is outstanding soup!",
              "I really must get the recipe.")
cl <- makeCluster(mc <- getOption("cl.cores", detectCores()/2))
# Load the packages and create the POS annotator on every worker
clusterEvalQ(cl, {
  library(openNLP)
  library(NLP)
  PTA <- Maxent_POS_Tag_Annotator()
})
m <- parLapply(cl, text.var, tagPOS)
print(m)
stopCluster(cl)
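The pattern used above (create an object on each worker with clusterEvalQ, then refer to it by name from the function passed to parLapply) can be tried without openNLP. The following is only a sketch of that mechanism using the parallel package; prefix and use_prefix are made-up names standing in for PTA and tagPOS.

```r
library(parallel)

cl <- makeCluster(2)

# Create the object on each worker instead of exporting it from the master.
# Returning NULL avoids sending the object back to the master.
invisible(clusterEvalQ(cl, {
  prefix <- "tag:"
  NULL
}))

# 'prefix' is not defined on the master; it is resolved on the worker
use_prefix <- function(x) paste0(prefix, x)

res <- parLapply(cl, c("a", "b"), use_prefix)
stopCluster(cl)

print(res)
```

Because the worker function only looks up the name at call time, the object never has to be serialized, which is what made the PTA version work.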
If clusterEvalQ fails because Maxent_POS_Tag_Annotator is not found, you might be loading the wrong version of openNLP on the workers. You can determine which package versions the workers are loading by executing sessionInfo with clusterEvalQ:
library(parallel)
cl <- makeCluster(2)
clusterEvalQ(cl, {library(openNLP); library(NLP)})
clusterEvalQ(cl, sessionInfo())
This will return the results of executing sessionInfo() on each of the cluster workers. Here is the version information for some of the packages that I'm using and that work for me:
other attached packages:
[1] NLP_0.1-0 openNLP_0.2-1
loaded via a namespace (and not attached):
[1] openNLPdata_1.5.3-1 rJava_0.9-4
Count number of verbs for each speech in data frame R
Assuming you are using a function similar to this one (found here: could not find function tagPOS):
tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}
Create a function that counts the number of POS tags that contain the letters 'VB':
count_verbs <- function(x) {
  pos_tags <- tagPOS(x)$POStags
  sum(grepl("VB", pos_tags))
}
And use dplyr to group by Group and summarise using count_verbs():
library(dplyr)
data %>%
  group_by(Group) %>%
  summarise(num_verbs = count_verbs(Description))
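The counting step itself can be checked without running the tagger. The sketch below assumes the tags are already available as character vectors; the first vector is the POStags output shown earlier, the second is a hand-written, hypothetical example, and count_verb_tags and tags_by_speech are illustrative names. It anchors the pattern with "^VB" so only tags that start with VB are counted.

```r
# Count tags that start with "VB" (VB, VBD, VBG, VBN, VBP, VBZ)
count_verb_tags <- function(pos_tags) {
  sum(grepl("^VB", pos_tags))
}

tags_by_speech <- list(
  c("DT", "VBZ", "DT", "DT", "JJ", "NN", "."),  # tags from the output above
  c("PRP", "VBD", "CC", "VBD", ".")             # hypothetical tag vector
)

# One verb count per speech
verb_counts <- vapply(tags_by_speech, count_verb_tags, integer(1))
verb_counts
#[1] 1 2
```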
Extract Pronouns from text in R
What, exactly, do you want as the output? This seems to give what I think you want:
library("stringr")
prp <- str_extract_all(acqTag$POStagged,"\\w+/PRP\\$?")
str_replace(unlist(prp), "/PRP\\$?", "")
#[1] "my" "He"
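If you would rather not depend on stringr, roughly the same extraction can be sketched in base R with gregexpr and regmatches. The input string below is a hypothetical stand-in for acqTag$POStagged, which is not shown here, and the variable names are illustrative.

```r
# Hypothetical tagged text in the "word/TAG" layout
tagged <- "He/PRP took/VBD my/PRP$ book/NN ./."

# Find every word tagged PRP or PRP$
hits <- regmatches(tagged, gregexpr("[[:alpha:]]+/PRP\\$?", tagged))[[1]]

# Strip the tag to leave only the pronouns
pronouns <- sub("/PRP\\$?$", "", hits)
pronouns
#[1] "He" "my"
```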
Running out of Memory with POS Tagging in R
Give this a try:
# Set the JVM heap size before rJava is loaded (no space after the dash)
options(java.parameters = "-Xmx3000m")
library(rJava)
library(NLP)
library(openNLP)
library(data.table)
tagPOS <- function(x) {
  s <- as.String(x)
  sent_token_annotator <- Maxent_Sent_Token_Annotator()
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
  pos_tag_annotator <- Maxent_POS_Tag_Annotator()
  a3 <- annotate(s, pos_tag_annotator, a2)
  a3w <- subset(a3, type == "word")
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  gc()  # release memory after each call
  return(paste(POStags, collapse = " "))
}
dat <- data.table(Tweet = rep("This is a tweet.", 10000L))
dat[, c("ID") := .I]
dat[, c("POS") := tagPOS(Tweet), by = .(ID)]
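One detail worth double-checking is the heap setting itself. rJava reads the java.parameters option when it starts the JVM, so the option has to be set before rJava (or any package that loads it, such as openNLP) is attached, and the flag must be written without a space after the dash. A minimal check, assuming a fresh R session and using the 3000 MB value from the code above:

```r
# Run this at the top of a fresh session, before library(rJava)
options(java.parameters = "-Xmx3000m")

# Confirm what the JVM will be started with
heap_flag <- getOption("java.parameters")
heap_flag
#[1] "-Xmx3000m"

# A well-formed flag has no whitespace
grepl("[[:space:]]", heap_flag)
#[1] FALSE
```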