Convert Words Between Verb/Noun/Adjective Forms

Convert words between verb/noun/adjective forms

This is more a heuristic approach. I have just coded it so appologies for the style. It uses the derivationally_related_forms() from wordnet. I have implemented nounify. I guess verbify works analogous. From what I've tested works pretty well:

from nltk.corpus import wordnet as wn

def nounify(verb_word):
""" Transform a verb to the closest noun: die -> death """
verb_synsets = wn.synsets(verb_word, pos="v")

# Word not found
if not verb_synsets:
return []

# Get all verb lemmas of the word
verb_lemmas = [l for s in verb_synsets \
for l in s.lemmas if s.name.split('.')[1] == 'v']

# Get related forms
derivationally_related_forms = [(l, l.derivationally_related_forms()) \
for l in verb_lemmas]

# filter only the nouns
related_noun_lemmas = [l for drf in derivationally_related_forms \
for l in drf[1] if l.synset.name.split('.')[1] == 'n']

# Extract the words from the lemmas
words = [l.name for l in related_noun_lemmas]
len_words = len(words)

# Build the result in the form of a list containing tuples (word, probability)
result = [(w, float(words.count(w))/len_words) for w in set(words)]
result.sort(key=lambda w: -w[1])

# return all the possibilities sorted by probability
return result

How to convert a verb to its (derived) noun form?

You can start exploring the following resources:

  1. Morphosemantic Database from WordNet.

The Excel file contains entries like:

define%2:42:03::    202736778   result  definition%1:07:00::    104702957   show the form or outline of; "The tree w... clarity of outline; "exercise had given ...

  1. CatVar

A Categorial-Variation Database (or Catvar) is a database of clusters of uninflected words (lexemes) and their categorial (i.e. part-of-speech) variants. For example, the words hunger(V), hunger(N), hungry(AJ) and hungriness(N) are different English variants of some underlying concept describing the state of being hungry. Another example is the developing cluster:(develop(V), developer(N), developed(AJ), developing(N), developing(AJ), development(N)).


  1. WordNet's word derivation relationship

Convert words between part of speech, when wordnet doesn't do it

(Asking for software/data recommendations is off-topic for StackOverflow; but I have tried to give a more general "approach" answer.)

  1. Another approach to finding related words would be one of the machine learning approaches. If you are dealing with words in isolation, look at word embeddings such as GloVe or Word2Vec. Spacy and gensim have libraries for working with them, though I'm also getting some search hits for tutorials of working with them in nltk.

2/3. One of the (in my opinion) core reasons for the success of Princeton WordNet was the liberal license they used. That means you can branch the project, add your extra data, and redistribute.

You might also find something useful at http://globalwordnet.org/resources/global-wordnet-grid/ Obviously most of them are not for English, but there are a few multilingual ones in there, that might be worth evaluating?

Another approach would be to create a wrapper function. It first searches a lookup list of fixes and additions you think should be in there. If not found then it searches WordNet as normal. This allows you to add 'succeed', 'success', 'successful', and then other sets of words as end users point out something missing.



Related Topics



Leave a reply



Submit