How to extract noun and adjective pairs including conjunctions
You may wish to try noun_chunks:
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('I got a red candy and an interesting and big book.')

noun_adj_pairs = {}
for chunk in doc.noun_chunks:
    adj = []
    noun = ""
    for tok in chunk:
        # The last NOUN in the chunk becomes the key
        if tok.pos_ == "NOUN":
            noun = tok.text
        if tok.pos_ == "ADJ":
            adj.append(tok.text)
    if noun:
        noun_adj_pairs.update({noun: adj})

print(noun_adj_pairs)
# expected output:
# {'candy': ['red'], 'book': ['interesting', 'big']}
Should you wish to include conjunctions:
noun_adj_pairs = {}
for chunk in doc.noun_chunks:
    adj = []
    noun = ""
    for tok in chunk:
        if tok.pos_ == "NOUN":
            noun = tok.text
        # Keep coordinating conjunctions ("and", "or") alongside the adjectives
        if tok.pos_ in ("ADJ", "CCONJ"):
            adj.append(tok.text)
    if noun:
        noun_adj_pairs[noun] = " ".join(adj)

print(noun_adj_pairs)
# expected output:
# {'candy': 'red', 'book': 'interesting and big'}
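Because both snippets key a dict on the noun text, a noun that occurs in two chunks (say, two different candies) keeps only the adjectives of the last chunk. The sketch below is a minimal alternative that keeps one (noun, modifiers) tuple per chunk instead. So that it runs without a spaCy model, it assumes each chunk has already been turned into (text, pos) pairs, as tok.text and tok.pos_ would give them; the sample chunks and the name chunk_pairs are hypothetical.

```python
from typing import List, Tuple

# Hypothetical stand-in for a spaCy noun chunk: (tok.text, tok.pos_) pairs
Chunk = List[Tuple[str, str]]


def chunk_pairs(chunks: List[Chunk], keep_cconj: bool = False) -> List[Tuple[str, str]]:
    """Return one (noun, modifiers) tuple per chunk that contains a NOUN."""
    wanted = {"ADJ", "CCONJ"} if keep_cconj else {"ADJ"}
    pairs = []
    for chunk in chunks:
        noun = ""
        mods = []
        for text, pos in chunk:
            if pos == "NOUN":
                noun = text  # last NOUN in the chunk wins, as in the answer
            elif pos in wanted:
                mods.append(text)
        if noun:
            # A list keeps repeated nouns instead of overwriting a dict entry
            pairs.append((noun, " ".join(mods)))
    return pairs


chunks = [
    [("a", "DET"), ("red", "ADJ"), ("candy", "NOUN")],
    [("an", "DET"), ("interesting", "ADJ"), ("and", "CCONJ"), ("big", "ADJ"), ("book", "NOUN")],
    [("another", "DET"), ("sweet", "ADJ"), ("candy", "NOUN")],
]
print(chunk_pairs(chunks))
# [('candy', 'red'), ('book', 'interesting big'), ('candy', 'sweet')]
print(chunk_pairs(chunks, keep_cconj=True))
# [('candy', 'red'), ('book', 'interesting and big'), ('candy', 'sweet')]
```

With a real doc, the input could be built as [[(tok.text, tok.pos_) for tok in chunk] for chunk in doc.noun_chunks].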
How to extract all possible noun phrases from text
You may wish to make use of the noun_chunks attribute:
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('Where a shoulder of richer mix is required at these junctions, or at junctions of columns and beams, the items are so described.')

phrases = set()
for nc in doc.noun_chunks:
    # The base noun chunk itself, e.g. "a shoulder"
    phrases.add(nc.text)
    # The full subtree of the chunk's root noun, e.g. "a shoulder of richer mix"
    phrases.add(doc[nc.root.left_edge.i:nc.root.right_edge.i + 1].text)
print(phrases)

Output (order may vary, since phrases is a set):
{'junctions of columns and beams', 'junctions', 'the items', 'a shoulder', 'columns', 'richer mix', 'beams', 'columns and beams', 'a shoulder of richer mix', 'these junctions'}
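The second phrases.add call works because left_edge and right_edge are the leftmost and rightmost tokens of a token's syntactic subtree, so the slice covers the root noun plus everything attached to it (such as the of-phrase in "junctions of columns and beams"), whereas the noun chunk itself ends at the root noun. Here is a plain-Python sketch of that edge computation on a hand-written parse. The heads list is an assumed analysis rather than real spaCy output, subtree_edges is a hypothetical helper, and the slice is only meaningful for a projective parse, where every subtree is a contiguous span.

```python
from typing import List, Tuple


def subtree_edges(heads: List[int], i: int) -> Tuple[int, int]:
    """Return (left_edge, right_edge) token indices of token i's subtree.

    heads[k] is the index of token k's head; the root points to itself,
    mirroring spaCy, where the root token's head is the token itself.
    """
    left = right = i
    stack = [i]
    while stack:
        node = stack.pop()
        for child, head in enumerate(heads):
            if head == node and child != node:
                left = min(left, child)
                right = max(right, child)
                stack.append(child)
    return left, right


tokens = ["junctions", "of", "columns", "and", "beams"]
# Assumed parse: "of" attaches to "junctions", "columns" to "of",
# and "and"/"beams" to "columns" (coordination).
heads = [0, 0, 1, 2, 2]

for i in (0, 2):
    left, right = subtree_edges(heads, i)
    print(" ".join(tokens[left:right + 1]))
# junctions of columns and beams
# columns and beams
```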
Extracting noun+noun or (adj|noun)+noun from text
It is possible. Run the text through a POS tagger, then split the tagged string (acqTag) on spaces:

ll <- strsplit(acqTag, ' ')

From there, iterate over the tokens in ll[[1]] (rather than a hard-coded count such as 1:37) and split each word/TAG token on '/' to get the part-of-speech sequence you're looking for:

for (i in seq_along(ll[[1]])) {
  qq <- strsplit(ll[[1]][i], '/')[[1]]  # qq[1] is the word, qq[2] its tag
}

After splitting on spaces, it is just list processing in R.
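For comparison, here is the same idea sketched in Python: split the tagged string into word/TAG tokens and test the tags of each adjacent pair. It assumes a space-separated "word/TAG" string using Penn Treebank tags (NN, NNS, NNP, NNPS for nouns; JJ, JJR, JJS for adjectives); the function name and the sample sentence are made up for illustration.

```python
from typing import List


def adj_or_noun_noun_pairs(tagged: str) -> List[str]:
    """Find consecutive (ADJ|NOUN) NOUN pairs in a 'word/TAG word/TAG' string."""
    # rsplit on the last '/' so a word that itself contains '/' stays intact
    tokens = [tok.rsplit("/", 1) for tok in tagged.split()]
    pairs = []
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        if t2.startswith("NN") and (t1.startswith("NN") or t1.startswith("JJ")):
            pairs.append(f"{w1} {w2}")
    return pairs


sample = "The/DT quarterly/JJ earnings/NNS report/NN beat/VBD market/NN expectations/NNS ./."
print(adj_or_noun_noun_pairs(sample))
# ['quarterly earnings', 'earnings report', 'market expectations']
```

Overlapping matches such as "quarterly earnings" and "earnings report" are both reported; merge them afterwards if you want longer noun compounds.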
Is there a method to extract noun-adjective pairs from a sentence in French?
I wrote something using stanza for high-quality dependency parsing. It should not be a lot of work to convert this to spaCy if you need that specifically. Recursion is needed to find embedded structures such as chains of coordinated adjectives. The code handles two constructions: copular ones, where the adjective is the parent of the noun (La voiture est belle), and ones where the adjectives modify the noun directly (La femme intelligente et belle).
import stanza

# Requires the French models, e.g. stanza.download("fr")
nlp = stanza.Pipeline("fr")
doc = nlp("La voiture est belle et jolie, et grand. Le tableau qui est juste en dessous est grand. La femme intelligente et belle est grande. Le service est rapide et les plats sont délicieux.")

def recursive_find_adjs(root, sent):
    """Collect conjoined adjectives (deprel "conj") among root's descendants."""
    children = [w for w in sent.words if w.head == root.id]
    if not children:
        return []
    filtered_c = [w for w in children if w.deprel == "conj" and w.upos == "ADJ"]
    # Do not include an adjective if it is the parent of a noun, to prevent
    # attaching another clause's predicate (e.g. "les plats sont délicieux")
    results = [w for w in filtered_c if not any(sub.head == w.id and sub.upos == "NOUN" for sub in sent.words)]
    for w in children:
        results += recursive_find_adjs(w, sent)
    return results

for sent in doc.sentences:
    nouns = [w for w in sent.words if w.upos == "NOUN"]
    noun_adj_pairs = {}
    for noun in nouns:
        # Find constructions in the form of "La voiture est belle"
        # In this scenario, the adjective is the parent of the noun
        adjs = []
        if noun.head > 0:  # head 0 means the noun is the root and has no parent
            cop_root = sent.words[noun.head - 1]
            if cop_root.upos == "ADJ":
                adjs = [cop_root] + recursive_find_adjs(cop_root, sent)
        # Find constructions in the form of "La femme intelligente et belle"
        # Here, the adjectives are descendants of the noun
        mod_adjs = [w for w in sent.words if w.head == noun.id and w.upos == "ADJ"]
        # This should only be one element because conjunctions are hierarchical
        if mod_adjs:
            mod_adj = mod_adjs[0]
            adjs.extend([mod_adj] + recursive_find_adjs(mod_adj, sent))
        if adjs:
            # Deduplicate while keeping first-occurrence order
            unique_adjs = []
            unique_ids = set()
            for adj in adjs:
                if adj.id not in unique_ids:
                    unique_adjs.append(adj)
                    unique_ids.add(adj.id)
            noun_adj_pairs[noun.text] = " ".join(adj.text for adj in unique_adjs)
    print(noun_adj_pairs)
This will output:
{'voiture': 'belle jolie grand'}
{'tableau': 'grand'}
{'femme': 'grande belle intelligente'}
{'service': 'rapide', 'plats': 'délicieux'}
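To see the recursion at work without downloading the stanza models, here is a stanza-free sketch of the same logic applied to a hand-written Universal Dependencies parse of the last example sentence. The Word tuple only mimics the stanza fields the answer uses (id, text, upos, head, deprel), find_noun_adj_pairs is a hypothetical name, and the parse is an assumed analysis, not actual stanza output.

```python
from typing import Dict, List, NamedTuple


class Word(NamedTuple):
    # Mimics the stanza Word fields used above: 1-based id, head 0 = root
    id: int
    text: str
    upos: str
    head: int
    deprel: str


def recursive_find_adjs(root: Word, words: List[Word]) -> List[Word]:
    children = [w for w in words if w.head == root.id]
    # Conjoined adjectives, skipping any that heads its own NOUN
    results = [
        w for w in children
        if w.deprel == "conj" and w.upos == "ADJ"
        and not any(sub.head == w.id and sub.upos == "NOUN" for sub in words)
    ]
    for w in children:
        results += recursive_find_adjs(w, words)
    return results


def find_noun_adj_pairs(words: List[Word]) -> Dict[str, str]:
    pairs = {}
    for noun in (w for w in words if w.upos == "NOUN"):
        adjs = []
        if noun.head > 0:  # copular case: the adjective is the noun's parent
            parent = words[noun.head - 1]
            if parent.upos == "ADJ":
                adjs = [parent] + recursive_find_adjs(parent, words)
        mods = [w for w in words if w.head == noun.id and w.upos == "ADJ"]
        if mods:  # attributive case: the adjective is the noun's child
            adjs += [mods[0]] + recursive_find_adjs(mods[0], words)
        unique = list({w.id: w for w in adjs}.values())  # keeps first-seen order
        if unique:
            pairs[noun.text] = " ".join(w.text for w in unique)
    return pairs


# Assumed parse of "Le service est rapide et les plats sont délicieux ."
words = [
    Word(1, "Le", "DET", 2, "det"),
    Word(2, "service", "NOUN", 4, "nsubj"),
    Word(3, "est", "AUX", 4, "cop"),
    Word(4, "rapide", "ADJ", 0, "root"),
    Word(5, "et", "CCONJ", 9, "cc"),
    Word(6, "les", "DET", 7, "det"),
    Word(7, "plats", "NOUN", 9, "nsubj"),
    Word(8, "sont", "AUX", 9, "cop"),
    Word(9, "délicieux", "ADJ", 4, "conj"),
    Word(10, ".", "PUNCT", 4, "punct"),
]
print(find_noun_adj_pairs(words))
# {'service': 'rapide', 'plats': 'délicieux'}
```

Here délicieux is a conj of rapide, but it heads its own NOUN (plats), so the filter in recursive_find_adjs keeps it away from service.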
How to extract noun adjective pairs from a sentence
spaCy's POS tagging would be a better choice than NLTK's: it is faster and generally more accurate. Here is an example of what you want to do:
import spacy

# The 'en' shortcut was removed in spaCy v3; load the model package by name
nlp = spacy.load('en_core_web_sm')
doc = nlp('Mark and John are sincere employees at Google.')

noun_adj_pairs = []
for i, token in enumerate(doc):
    if token.pos_ not in ('NOUN', 'PROPN'):
        continue
    # Pair the noun with the first adjective that follows it
    for j in range(i + 1, len(doc)):
        if doc[j].pos_ == 'ADJ':
            noun_adj_pairs.append((token, doc[j]))
            break
print(noun_adj_pairs)

Output:
[(Mark, sincere), (John, sincere)]
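The loop above is purely positional: each noun is paired with the nearest adjective to its right, whatever the syntax. The sketch below reimplements the same forward scan on pre-tagged (text, pos) pairs, so it runs without a spaCy model, and shows where that can mislead. The tags are assigned by hand (an assumption, standing in for token.pos_), and next_adj_pairs is a hypothetical name.

```python
from typing import List, Tuple


def next_adj_pairs(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair every NOUN/PROPN with the first ADJ that follows it."""
    pairs = []
    for i, (text, pos) in enumerate(tagged):
        if pos not in ("NOUN", "PROPN"):
            continue
        for later_text, later_pos in tagged[i + 1:]:
            if later_pos == "ADJ":
                pairs.append((text, later_text))
                break
    return pairs


sentence = [("Mark", "PROPN"), ("and", "CCONJ"), ("John", "PROPN"), ("are", "AUX"),
            ("sincere", "ADJ"), ("employees", "NOUN"), ("at", "ADP"),
            ("Google", "PROPN"), (".", "PUNCT")]
print(next_adj_pairs(sentence))  # [('Mark', 'sincere'), ('John', 'sincere')]

# Here the scan pairs "Mark" with "red" and finds nothing for "car"
other = [("Mark", "PROPN"), ("bought", "VERB"), ("a", "DET"),
         ("red", "ADJ"), ("car", "NOUN"), (".", "PUNCT")]
print(next_adj_pairs(other))  # [('Mark', 'red')]
```

For sentences like the second one, the dependency-based approaches above (noun chunks or head/child relations) are more reliable than word order.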