All Synonyms for Word in Python

How to get synonyms from nltk WordNet Python

If you want the synonyms in the synset (aka the lemmas that make up the set), you can get them with lemma_names():

>>> from nltk.corpus import wordnet as wn
>>> for ss in wn.synsets('small'):
...     print(ss.name(), ss.lemma_names())

small.n.01 ['small']
small.n.02 ['small']
small.a.01 ['small', 'little']
minor.s.10 ['minor', 'modest', 'small', 'small-scale', 'pocket-size', 'pocket-sized']
little.s.03 ['little', 'small']
small.s.04 ['small']
humble.s.01 ['humble', 'low', 'lowly', 'modest', 'small']
...
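To collapse those per-synset lists into one flat, de-duplicated synonym list, a small helper works; the function name `flatten_lemmas` below is mine, and it only assumes the synsets expose the standard `lemma_names()` method:

```python
def flatten_lemmas(synsets):
    """Flatten lemma names from an iterable of synsets into a sorted,
    de-duplicated list. Note WordNet joins multiword lemmas with
    underscores (e.g. 'pocket-size' vs 'turn_down')."""
    return sorted({name for ss in synsets for name in ss.lemma_names()})

# With nltk installed and the wordnet corpus downloaded, usage would be:
#   from nltk.corpus import wordnet as wn
#   flatten_lemmas(wn.synsets('small'))
```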

Synonyms/Join with a string of words in Python

If your string argument consists of words separated by spaces, you can try this:

def str_synonyms(string):
    # `dictionary` is assumed to be an object with a synonym() lookup,
    # e.g. a PyDictionary instance.
    newstring_list = []
    for word in string.split():
        synonyms = dictionary.synonym(word)
        if synonyms:
            newstring_list.extend(synonyms)
    return ', '.join(newstring_list)

I need to find the synonyms for a given word from a sentence. For example:

The easiest way to do this would be to use the split method to break the sentence into single words, then run each word against the library you're using.
An example would be like the one below:

from nltk.corpus import wordnet

sentence = "happy life"
words = sentence.split()

# Map each synonym back to the word it was derived from.
dictSynonyms = {}

for word in words:
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            dictSynonyms[lemma.name()] = word

Find similar/synonyms/context words Python

The other answer, and comments, describe how to get synonyms, but I think you want more than that?

I can suggest two broad approaches: WordNet and word embeddings.

Using nltk and WordNet, you want to explore the adjacent graph nodes. See http://www.nltk.org/howto/wordnet.html for an overview of the functions available. I'd suggest that once you've found your start word in WordNet, follow all its relations, but also go up to the hypernym, and do the same there.
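As a sketch of that "follow its relations, then go up to the hypernym" idea (the helper name `neighborhood_lemmas` is mine; it only relies on the standard synset methods `lemma_names()`, `hypernyms()`, and `hyponyms()`):

```python
def neighborhood_lemmas(synset):
    """Collect lemmas of the synset itself, of its direct hypernyms,
    and of those hypernyms' other hyponyms (the sister terms)."""
    names = set(synset.lemma_names())
    for hyper in synset.hypernyms():
        names.update(hyper.lemma_names())
        for sister in hyper.hyponyms():
            names.update(sister.lemma_names())
    return sorted(names)

# With the wordnet corpus available, pick a sense and expand it:
#   from nltk.corpus import wordnet as wn
#   neighborhood_lemmas(wn.synsets('address')[0])
```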

Finding the start word is not always easy; searching for "postal address", for instance, finds nothing:
http://wordnetweb.princeton.edu/perl/webwn?s=Postal+address&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=

Instead it seems I have to use "address": http://wordnetweb.princeton.edu/perl/webwn?s=address&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=
and then decide which of those is the correct sense here. Then try clicking the hypernym, hyponym, sister term, etc.
To be honest, none of those feels quite right.

Open Multilingual WordNet tries to link different languages. http://compling.hss.ntu.edu.sg/omw/ So you could take your English WordNet code, and move to the French WordNet with it, or vice versa.

The other approach is to use word embeddings. You find the (say) 300-dimensional vector of your source word, and then hunt for the nearest words in that vector space. This returns words that are used in similar contexts, so they could be similar in meaning, or merely similar syntactically.

Spacy has a good implementation, see https://spacy.io/usage/spacy-101#vectors-similarity and https://spacy.io/usage/vectors-similarity

Regarding English and French, normally you would work in the two languages independently. But if you search for "multilingual word embeddings" you will find some papers and projects where the vector stays the same for the same concept in different languages.

Note: the API is geared towards telling you how similar two words are, not towards finding similar words. To find similar words you need to take your vector and compare it with every other word vector, which is O(N) in the size of the vocabulary. So you might want to do this offline, and build your own "synonyms-and-similar" dictionary for each word of interest.
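That brute-force O(N) scan can be sketched in plain Python. The toy 3-dimensional vectors below stand in for real ~300-dimensional embeddings, and the function names are mine:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_words(word, vectors, k=3):
    """Brute-force scan over the whole vocabulary: score every other
    word against the target vector, then keep the k best."""
    target = vectors[word]
    scored = [(cosine(target, vec), w) for w, vec in vectors.items() if w != word]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]

# Toy vectors purely for illustration:
toy = {
    'cat': [0.9, 0.1, 0.0],
    'dog': [0.8, 0.2, 0.0],
    'car': [0.0, 0.1, 0.9],
}
print(nearest_words('cat', toy, k=1))  # → ['dog']
```

For a real vocabulary you would precompute and cache these neighbour lists offline, as suggested above.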

Using WordNet with nltk to find synonyms that make sense

Sounds like you want word synonyms based upon the part of speech of the word (e.g. noun, verb, etc.)

The following generates synonyms for each word in a sentence based upon its part of speech.
References:

  1. Extract Word from Synset using Wordnet in NLTK 3.0
  2. Printing the part of speech along with the synonyms of the word

Code

import nltk; nltk.download('popular')
from nltk.corpus import wordnet as wn

def get_synonyms(word, pos):
    """Yield synonyms of `word` for the given part of speech."""
    for synset in wn.synsets(word, pos=pos_to_wordnet_pos(pos)):
        for lemma in synset.lemmas():
            yield lemma.name()

def pos_to_wordnet_pos(penntag, returnNone=False):
    """Map a Penn Treebank POS tag to a WordNet POS tag."""
    morphy_tag = {'NN': wn.NOUN, 'JJ': wn.ADJ,
                  'VB': wn.VERB, 'RB': wn.ADV}
    try:
        return morphy_tag[penntag[:2]]
    except KeyError:
        return None if returnNone else ''

Example Usage

# Tokenize text
text = nltk.word_tokenize("I refuse to pick up the refuse")

for word, tag in nltk.pos_tag(text):
    print(f'word is {word}, POS is {tag}')

    # Filter for unique synonyms not equal to the word itself, and sort.
    unique = sorted(set(synonym for synonym in get_synonyms(word, tag)
                        if synonym != word))
    for synonym in unique:
        print('\t', synonym)

Output

Note the different sets of synonyms for refuse based upon POS.

word is I, POS is PRP
word is refuse, POS is VBP
decline
defy
deny
pass_up
reject
resist
turn_away
turn_down
word is to, POS is TO
word is pick, POS is VB
beak
blame
break_up
clean
cull
find_fault
foot
nibble
peck
piece
pluck
plunk
word is up, POS is RP
word is the, POS is DT
word is refuse, POS is NN
food_waste
garbage
scraps

How to find the most similar word in a list in python

Use difflib:

difflib.get_close_matches(word, ['car', 'animal', 'house', 'animation'])

As you can see from perusing the source, the "close" matches are sorted from best to worst.

>>> import difflib
>>> difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animation'])
['animal']
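The optional `n` and `cutoff` parameters of `get_close_matches` control how many matches you get back and how similar a candidate must be to count as "close":

```python
import difflib

words = ['car', 'animal', 'house', 'animation']

# The default cutoff is 0.6; lowering it lets weaker matches through,
# and n caps how many are returned (best first).
print(difflib.get_close_matches('anlmal', words, n=2, cutoff=0.5))
# → ['animal', 'animation']
```

With the default `cutoff=0.6`, 'animation' is filtered out, which is why the example above returns only `['animal']`.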

