Extracting Nouns and Verbs from Text

Extracting all Nouns from a text file using nltk

If you are open to options other than NLTK, check out TextBlob. It extracts all nouns and noun phrases easily:

>>> from textblob import TextBlob
>>> txt = """Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages."""
>>> blob = TextBlob(txt)
>>> print(blob.noun_phrases)
[u'natural language processing', 'nlp', u'computer science', u'artificial intelligence', u'computational linguistics']

Extracting most common nouns and verbs from category using numpy and NLTK

Thanks for your help. This is what I ended up with, and it served my purpose:

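import nltk
from nltk.tokenize import word_tokenize

# text_reviews is a pandas DataFrame with "Trip Type" and "text" columns
ByTripType = text_reviews.groupby("Trip Type")

def findtags(tag_prefix, tagged_text):
    # Map every tag starting with tag_prefix to its 10 most common words
    cfd = nltk.ConditionalFreqDist((tag, word) for (word, tag) in tagged_text if tag.startswith(tag_prefix))
    return dict((tag, cfd[tag].most_common(10)) for tag in cfd.conditions())

for name, group in ByTripType:
    # Join all reviews of this trip type into one lowercase string
    sentences = group['text'].str.cat(sep=' ')
    sentences = sentences.lower()
    # remove_punctuation is a helper defined elsewhere; it is assumed to
    # return the cleaned string, so the result has to be assigned back
    sentences = remove_punctuation(sentences)
    sentences = '"' + sentences + '"'
    text = word_tokenize(sentences)
    tagged = nltk.pos_tag(text)
    for i in ('NN', 'VBP'):
        tagdict = findtags(i, tagged)
        print(name, tagdict)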
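
To see what findtags returns without loading any review data, here is a minimal sketch of the same grouping using only the standard library. The tagged sample is hand-written for illustration (it is not output from the reviews), and find_tags_plain is a hypothetical stand-in for the NLTK version, not part of any library.

```python
from collections import Counter, defaultdict

def find_tags_plain(tag_prefix, tagged_text, top_n=10):
    """Map each tag starting with tag_prefix to its top_n most common words."""
    counts = defaultdict(Counter)
    for word, tag in tagged_text:
        if tag.startswith(tag_prefix):
            counts[tag][word] += 1
    return {tag: counter.most_common(top_n) for tag, counter in counts.items()}

# Hand-written (word, tag) pairs in the shape that nltk.pos_tag returns.
sample = [
    ("hotel", "NN"), ("rooms", "NNS"), ("hotel", "NN"), ("pool", "NN"),
    ("love", "VBP"), ("rooms", "NNS"), ("was", "VBD"), ("love", "VBP"),
]

noun_counts = find_tags_plain("NN", sample)
verb_counts = find_tags_plain("VBP", sample)
print(noun_counts)
print(verb_counts)
```

The prefix 'NN' also picks up NNS, NNP and NNPS, whereas 'VBP' matches only that one verb tag ("was", tagged VBD, is skipped here). Passing 'VB' instead would cover every verb tag.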

Extract nouns and verbs using NLTK

Use the NLTK POS tagger:

>>> import nltk
>>> text = nltk.word_tokenize("They refuse to permit us to obtain the refuse permit")
>>> pos_tagged = nltk.pos_tag(text)
>>> pos_tagged
[('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'),
('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')]
>>> nouns = [x for x in pos_tagged if x[1] == 'NN']
>>> nouns
[('refuse', 'NN'), ('permit', 'NN')]

Noun tags start with NN (NN, NNS, NNP, NNPS) and verb tags start with VB (VB, VBD, VBG, VBN, VBP, VBZ), so filter on the tag prefix rather than on one exact tag.
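
A prefix match can be sketched as follows. The tagged list is copied from the output above as a literal so the snippet runs without any NLTK data; words_with_prefix is a hypothetical helper name, not part of NLTK.

```python
def words_with_prefix(tagged, prefix):
    """Return the words whose POS tag starts with prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

# (word, tag) pairs as returned by nltk.pos_tag for the example sentence.
pos_tagged = [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'),
              ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')]

nouns = words_with_prefix(pos_tagged, 'NN')
verbs = words_with_prefix(pos_tagged, 'VB')
print(nouns)  # ['refuse', 'permit']
print(verbs)  # ['refuse', 'permit', 'obtain']
```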

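NOTE:
If you have not yet downloaded the punkt and averaged_perceptron_tagger resources for NLTK, you may have to do that first. Newer NLTK releases may ask for differently named resources (for example punkt_tab); the LookupError message names the one to download.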

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

Extract verbs from sentence in R?

You can get them with the udpipe_annotate function from the udpipe library:

library(udpipe)
ud_model <- udpipe_download_model(language = "english")
ud_model <- udpipe_load_model(ud_model$file_model)
system.time(
  x <- udpipe_annotate(ud_model, x = df$recipe_name, doc_id = df$id)
)
x <- as.data.frame(x)
abc <- c("NN", "VB")
stats <- dplyr::filter(x, grepl(pattern = paste(abc, collapse = "|"), x = xpos, ignore.case = TRUE))

You can also use other word types from this list.

Extracting Nouns and Verbs from Text

Here is an example that extracts the words tagged /VB or /VBx, where x is any single character:

library("openNLP")

acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."

acqTag <- tagPOS(acq)

sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) sub("(^.*\\s)(\\w+$)", "\\2", x))

     [,1]
[1,] "said"
[2,] "sold"
[3,] "engaged"
[4,] "said"
[5,] "is"
[6,] "did"
[7,] " not/RB explain./NN Reuter./."

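OK, my regular expression needs some improvement to get rid of the last row of the result. That row is the text left over after the final verb tag; it does not end in a word character, so sub() finds no match and returns it unchanged.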

EDIT

An alternative is to drop the rows that contain a space character:

sapply(strsplit(acqTag,"[[:punct:]]*/VB.?"),function(x) {res = sub("(^.*\\s)(\\w+$)", "\\2", x); res[!grepl("\\s",res)]} )
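
For comparison, the leftover row does not arise at all if each token is split into its word and tag, instead of splitting the whole string on the verb tags. A minimal Python sketch, assuming the tagger output is a single string of space-separated word/TAG tokens. The short tagged string is hand-written for illustration (it is not actual tagPOS output), and verbs_from_tagged is a hypothetical name.

```python
import string

def verbs_from_tagged(tagged_string):
    """Return the words whose tag starts with VB in 'word/TAG word/TAG ...' text."""
    verbs = []
    for token in tagged_string.split():
        # rpartition splits on the last slash, so a word containing '/' stays whole
        word, _, tag = token.rpartition("/")
        if tag.startswith("VB"):
            # Drop punctuation attached to the word, e.g. a trailing period
            verbs.append(word.strip(string.punctuation))
    return verbs

# Hand-written sample in the assumed word/TAG format.
tagged = "The/DT company/NN said/VBD the/DT sale/NN is/VBZ subject/JJ to/TO adjustments./NNS"
verbs = verbs_from_tagged(tagged)
print(verbs)  # ['said', 'is']
```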

