Extracting all Nouns from a text file using nltk
If you are open to options other than NLTK, check out TextBlob. It extracts noun phrases easily through the noun_phrases property:
>>> from textblob import TextBlob
>>> txt = """Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages."""
>>> blob = TextBlob(txt)
>>> print(blob.noun_phrases)
[u'natural language processing', 'nlp', u'computer science', u'artificial intelligence', u'computational linguistics']
Extracting most common nouns and verbs from category using numpy and NLTK
Thanks for your help. This is what I ended up with, and it served my purpose:
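import nltk
from nltk import word_tokenize

# text_reviews is a pandas DataFrame with "Trip Type" and "text" columns.
ByTripType = text_reviews.groupby("Trip Type")

def findtags(tag_prefix, tagged_text):
    # Count words per tag, for every tag that starts with tag_prefix.
    cfd = nltk.ConditionalFreqDist((tag, word) for (word, tag) in tagged_text
                                   if tag.startswith(tag_prefix))
    # Map each matching tag to its 10 most common words.
    return dict((tag, cfd[tag].most_common(10)) for tag in cfd.conditions())

for name, group in ByTripType:
    # Join all reviews of the group into one lowercase string.
    sentences = group['text'].str.cat(sep=' ')
    sentences = sentences.lower()
    # remove_punctuation is a helper not shown here; it is assumed to return the cleaned string.
    sentences = remove_punctuation(sentences)
    sentences = '"' + sentences + '"'
    text = word_tokenize(sentences)
    tagged = nltk.pos_tag(text)
    for i in ('NN', 'VBP'):
        tagdict = findtags(i, tagged)
        print(name, tagdict)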
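To see what findtags returns without loading the review data, here is a minimal sketch that reproduces its logic with only the standard library. The tagged sample and the name find_tags_sketch are made up for illustration, and nltk.ConditionalFreqDist is swapped for a dict of collections.Counter objects.

```python
from collections import Counter, defaultdict

def find_tags_sketch(tag_prefix, tagged_text, top_n=10):
    """Group words by POS tag and keep the top_n most frequent words per tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_text:
        if tag.startswith(tag_prefix):
            counts[tag][word] += 1
    return {tag: counter.most_common(top_n) for tag, counter in counts.items()}

# Hypothetical, hand-tagged sample standing in for nltk.pos_tag output.
sample = [("hotel", "NN"), ("rooms", "NNS"), ("hotel", "NN"),
          ("love", "VBP"), ("pool", "NN"), ("rooms", "NNS")]

noun_counts = find_tags_sketch("NN", sample)
print(noun_counts)
```

The prefix "NN" also matches NNS, which is why the result can hold more than one key per call.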
extract nouns and verbs using NLTK
Use NLTK's POS tagger:
>>> import nltk
>>> text = nltk.word_tokenize("They refuse to permit us to obtain the refuse permit")
>>> pos_tagged = nltk.pos_tag(text)
>>> pos_tagged
[('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'),
('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')]
>>> nouns = list(filter(lambda x: x[1] == 'NN', pos_tagged))
>>> nouns
[('refuse', 'NN'), ('permit', 'NN')]
Nouns are tagged NN and verbs VB, with variants such as NNS or VBP that share the same prefix, so you can filter on those tags accordingly.
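Because the tagger uses several noun tags (NN, NNS, NNP, NNPS) and verb tags (VB, VBD, VBG, VBN, VBP, VBZ), matching on the tag prefix catches more than an exact comparison does. A small sketch, reusing the tagged output shown above; words_with_prefix is a name chosen here for illustration:

```python
# Tagged tokens copied from the nltk.pos_tag output above.
pos_tagged = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'),
              ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'),
              ('refuse', 'NN'), ('permit', 'NN')]

def words_with_prefix(tagged, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

nouns = words_with_prefix(pos_tagged, 'NN')
verbs = words_with_prefix(pos_tagged, 'VB')
print(nouns)  # ['refuse', 'permit']
print(verbs)  # ['refuse', 'permit', 'obtain']
```

Note that the verb list includes the first 'refuse' (tagged VBP), which an exact match on 'VB' would miss.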
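NOTE: If you have not downloaded punkt and averaged_perceptron_tagger for NLTK yet, you might have to do that first. Recent NLTK releases may ask for punkt_tab and averaged_perceptron_tagger_eng instead; the LookupError message names the resource to download.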
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
Extract verbs from sentence in R?
You can get them with the udpipe_annotate function from the udpipe library:
library(udpipe)
ud_model <- udpipe_download_model(language = "english")
ud_model <- udpipe_load_model(ud_model$file_model)
system.time(
  x <- udpipe_annotate(ud_model, x = df$recipe_name, doc_id = df$id)
)
x <- as.data.frame(x)
# Keep the rows whose xpos tag contains NN or VB.
abc <- c("NN", "VB")
stats <- dplyr::filter(x, grepl(pattern = paste(abc, collapse = "|"), x = xpos, ignore.case = TRUE))
You can also filter on other word types by adding their part-of-speech tags to abc.
Extracting Nouns and Verbs from Text
For example, the following extracts words tagged /VB or /VBx, where x is any single character:
library("openNLP")
acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."
acqTag <- tagPOS(acq)
sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) sub("(^.*\\s)(\\w+$)", "\\2", x))
[,1]
[1,] "said"
[2,] "sold"
[3,] "engaged"
[4,] "said"
[5,] "is"
[6,] "did"
[7,] " not/RB explain./NN Reuter./."
Ok, my regular expression needs some improvement in order to get rid of the last line in the result.
EDIT
An alternative could be to ignore results containing a space character:
sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) {res = sub("(^.*\\s)(\\w+$)", "\\2", x); res[!grepl("\\s",res)]} )
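For comparison, the same idea can be sketched in Python: instead of splitting on the tag and cleaning up afterwards, match the word directly in front of each /VB tag. The tagged string below is a shortened, hand-written stand-in in the word/TAG style, not the real tagPOS output, and extract_verbs is a hypothetical helper name.

```python
import re

# Hypothetical word/TAG string in the style of the tagPOS output.
tagged = ("it/PRP sold/VBD its/PRP$ subsidiaries/NNS engaged/VBN in/IN "
          "pipeline/NN operations/NNS ./. The/DT sale/NN is/VBZ subject/JJ ./.")

def extract_verbs(tagged_text):
    """Return words whose tag is VB, or VB plus one more letter."""
    # (\w+) captures the word; /VB[A-Z]? matches VB, VBD, VBZ, and so on;
    # the lookahead requires the tag to end at whitespace or end of string.
    return re.findall(r"(\w+)/VB[A-Z]?(?=\s|$)", tagged_text)

verbs = extract_verbs(tagged)
print(verbs)  # ['sold', 'engaged', 'is']
```

Capturing the word directly means there is no leftover tail to filter out after the last verb.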