How to get synonyms from nltk WordNet Python
If you want the synonyms in the synset (aka the lemmas that make up the set), you can get them with lemma_names():
>>> from nltk.corpus import wordnet as wn
>>> for ss in wn.synsets('small'):
...     print(ss.name(), ss.lemma_names())
...
small.n.01 ['small']
small.n.02 ['small']
small.a.01 ['small', 'little']
minor.s.10 ['minor', 'modest', 'small', 'small-scale', 'pocket-size', 'pocket-sized']
little.s.03 ['little', 'small']
small.s.04 ['small']
humble.s.01 ['humble', 'low', 'lowly', 'modest', 'small']
...
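Because each synset lists its own lemmas, the same word can show up in several of them ("modest" appears twice above, and "small" in every synset). If you want one flat list instead, a minimal sketch like the following merges them. The function name unique_synonyms is made up for illustration; it only assumes that the objects it receives have a lemma_names() method, as NLTK synsets do.

```python
from typing import Iterable, List, Optional


def unique_synonyms(synsets: Iterable, word: Optional[str] = None,
                    underscores_to_spaces: bool = True) -> List[str]:
    """Merge the lemma names of several synsets into one de-duplicated list.

    The order of first appearance is kept. If `word` is given, it is left out.
    """
    seen = set()
    result = []
    for synset in synsets:
        for name in synset.lemma_names():
            if underscores_to_spaces:
                # WordNet joins multi-word lemmas with '_' (e.g. 'turn_down')
                name = name.replace('_', ' ')
            if name == word or name in seen:
                continue
            seen.add(name)
            result.append(name)
    return result
```

With the WordNet data installed, unique_synonyms(wn.synsets('small'), word='small') returns the merged list for the example above.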
Synonyms/Join with a string of words in Python
If your string argument consists of words separated by spaces, you can try this:
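# `dictionary` is not defined in the answer; presumably a PyDictionary instance:
# from PyDictionary import PyDictionary; dictionary = PyDictionary()
def str_synonyms(string):
    newstring_list = []
    for word in string.split():
        synonyms = dictionary.synonym(word)  # look up once; may be None/empty
        if synonyms:
            newstring_list.extend(synonyms)
    newstring = ', '.join(newstring_list)
    return newstring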
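The snippet relies on a dictionary object that the answer does not define (its synonym() method is assumed to return a list of synonyms, or nothing for an unknown word). A variant that takes the lookup as a parameter works with any synonym source and is easy to test; it also skips duplicate synonyms. The name synonyms_for_string and the THESAURUS dict are hypothetical stand-ins.

```python
from typing import Callable, Iterable, Optional


def synonyms_for_string(string: str,
                        lookup: Callable[[str], Optional[Iterable[str]]]) -> str:
    """Look up every space-separated word and join all synonyms with ', '.

    `lookup` returns an iterable of synonyms for a word, or None/empty
    when it has none. Duplicates are dropped, keeping the first occurrence.
    """
    seen = set()
    collected = []
    for word in string.split():
        for synonym in lookup(word) or []:
            if synonym not in seen:
                seen.add(synonym)
                collected.append(synonym)
    return ', '.join(collected)


# Hypothetical toy thesaurus, only to show the call
THESAURUS = {'happy': ['glad', 'felicitous'], 'life': ['living', 'existence']}

print(synonyms_for_string('happy life', THESAURUS.get))
# glad, felicitous, living, existence
```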
I need to find the synonyms for a given word from a sentence
The easiest way to do this is to use split() to break the phrase into single words, then look up each word with the library you're using.
For example:
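from nltk.corpus import wordnet  # may need nltk.download('wordnet') first
phrase = "happy life"  # avoid naming this `input`, which shadows the built-in
words = phrase.split()  # ['happy', 'life']
# Map each synonym (lemma name) back to the input word it came from
dictSynonyms = {}
for word in words:
    for syn in wordnet.synsets(word):
        for l in syn.lemmas():
            dictSynonyms[l.name()] = word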
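Note that dictSynonyms maps each synonym back to its source word, so a lemma shared by both words is credited only to the last one. If you would rather have one list per input word, here is a minimal sketch. The function name synonyms_per_word is hypothetical, and the WordNet lookup is passed in (e.g. wordnet.synsets) so the grouping logic does not depend on the corpus.

```python
def synonyms_per_word(phrase, synsets_for):
    """Return {input word: sorted synonyms} for every word in `phrase`.

    `synsets_for` is a callable such as nltk's wordnet.synsets; each synset
    it returns must provide lemmas(), and each lemma must provide name().
    """
    groups = {}
    for word in phrase.split():
        names = groups.setdefault(word, set())  # words without synsets still get an entry
        for syn in synsets_for(word):
            for lemma in syn.lemmas():
                if lemma.name() != word:
                    names.add(lemma.name())
    return {word: sorted(names) for word, names in groups.items()}
```

With NLTK, synonyms_per_word("happy life", wordnet.synsets) gives one sorted list for "happy" and one for "life".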
Find similar/synonyms/context words Python
The other answer, and comments, describe how to get synonyms, but I think you want more than that?
I can suggest two broad approaches: WordNet and word embeddings.
Using nltk and WordNet, you want to explore the adjacent graph nodes. See http://www.nltk.org/howto/wordnet.html for an overview of the functions available. Once you've found your start word in WordNet, I'd suggest following all its relations, and also going up to its hypernym and doing the same there.
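A minimal sketch of that exploration, assuming NLTK Synset objects (it only uses their lemma_names(), hypernyms() and hyponyms() methods). The function name neighbourhood and the grouping into "synonyms", "hyponyms", "hypernyms" and "sister terms" are illustrative choices, not an NLTK API.

```python
def neighbourhood(synset):
    """Group the lemma names around one synset by WordNet relation.

    Covers the synset itself, its hyponyms, its hypernyms, and the other
    hyponyms of those hypernyms (sister terms).
    """
    hypernyms = synset.hypernyms()
    sisters = [s for h in hypernyms for s in h.hyponyms() if s != synset]
    related = {
        'synonyms': [synset],
        'hyponyms': synset.hyponyms(),
        'hypernyms': hypernyms,
        'sister terms': sisters,
    }
    # Flatten each group to lemma names, dropping repeats but keeping order
    return {
        relation: list(dict.fromkeys(name for s in synsets for name in s.lemma_names()))
        for relation, synsets in related.items()
    }
```

To try it on real data, list the candidate senses with for ss in wn.synsets('address', pos=wn.NOUN): print(ss.name(), ss.definition()), pick the one that fits, and pass that synset to neighbourhood().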
Finding the start word is not always easy. Here is a search for "Postal address":
http://wordnetweb.princeton.edu/perl/webwn?s=Postal+address&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=
Instead, it seems I have to use "address": http://wordnetweb.princeton.edu/perl/webwn?s=address&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=
and then decide which of those is the correct sense here. Then try clicking the hypernym, hyponym, sister term, etc.
To be honest, none of those feels quite right.
Open Multilingual WordNet tries to link different languages. http://compling.hss.ntu.edu.sg/omw/ So you could take your English WordNet code, and move to the French WordNet with it, or vice versa.
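In NLTK this goes through shared synsets: with the Open Multilingual WordNet data installed (the 'omw-1.4' package in recent NLTK releases), Synset.lemma_names(lang) returns a synset's lemmas in another language, and wn.synsets() accepts a lang argument. A minimal sketch, assuming that data is available; the helper name lemmas_in_language is hypothetical.

```python
def lemmas_in_language(synsets, lang):
    """Collect the lemma names that `synsets` have in language `lang`.

    `lang` is an ISO 639-3 code as used by NLTK's OMW data, e.g. 'fra' or 'eng'.
    Repeats are dropped; the order of first appearance is kept.
    """
    return list(dict.fromkeys(name for ss in synsets for name in ss.lemma_names(lang)))
```

For example, lemmas_in_language(wn.synsets('address', pos=wn.NOUN), 'fra') goes from English to French, and lemmas_in_language(wn.synsets('adresse', lang='fra'), 'eng') goes back.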
The other approach is to use word embeddings. You look up the vector (say, 300-dimensional) of your source word, and then search for the nearest words in that vector space. This returns words that are used in similar contexts, so they may be similar in meaning or similar syntactically.
spaCy has a good implementation; see https://spacy.io/usage/spacy-101#vectors-similarity and https://spacy.io/usage/vectors-similarity
Regarding English and French, normally you would work in the two languages independently. But if you search for "multilingual word embeddings" you will find some papers and projects where the vector stays the same for the same concept in different languages.
Note: the API is geared towards telling you how two words are similar, not finding similar words. To find similar words you need to take your vector and compare with every other word vector, which is O(N) in the size of the vocabulary. So you might want to do this offline, and build your own "synonyms-and-similar" dictionary for each word of interest.
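The nearest-word search described in the note is a brute-force scan: normalise the vectors, take dot products (cosine similarity), and sort. A minimal NumPy sketch of that scan and of the offline table follows. The function names and toy vectors are hypothetical; with spaCy you would fill words and vectors from the model's vocabulary instead (not shown).

```python
import numpy as np


def _unit_rows(vectors):
    """Scale every row to length 1 so dot products become cosine similarities."""
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)


def nearest_words(query, words, vectors, k=3):
    """Return the k words most similar to `query`, by cosine similarity.

    `vectors` is an (N, d) array whose rows line up with `words`. Every
    vector is compared with the query, so each call is O(N) in the vocabulary.
    """
    unit = _unit_rows(vectors)
    q = words.index(query)
    scores = unit @ unit[q]
    order = np.argsort(-scores)  # highest similarity first
    return [words[i] for i in order if i != q][:k]


def build_neighbour_table(words, vectors, k=3):
    """Precompute neighbours for every word, e.g. as an offline job."""
    unit = _unit_rows(vectors)
    sims = unit @ unit.T  # all pairs at once: an N x N matrix
    table = {}
    for q, word in enumerate(words):
        order = np.argsort(-sims[q])
        table[word] = [words[i] for i in order if i != q][:k]
    return table


# Made-up 3-d vectors for illustration; real embeddings have hundreds of dimensions
words = ['king', 'queen', 'apple', 'pear']
vectors = np.array([[0.90, 0.80, 0.10],
                    [0.85, 0.90, 0.10],
                    [0.10, 0.20, 0.95],
                    [0.15, 0.10, 0.90]])
print(nearest_words('king', words, vectors, k=1))  # ['queen']
print(build_neighbour_table(words, vectors, k=1))
```

The table costs O(N^2) to build, which is why it suits an offline step; lookups afterwards are plain dictionary access.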
Using WordNet with nltk to find synonyms that make sense
Sounds like you want word synonyms based on the part of speech of the word (e.g. noun, verb, etc.).
The following code finds synonyms for each word in a sentence based on its part of speech.
References:
- Extract Word from Synset using Wordnet in NLTK 3.0
- Printing the part of speech along with the synonyms of the word
Code
import nltk; nltk.download('popular')
from nltk.corpus import wordnet as wn
def get_synonyms(word, pos):
    """Yield the lemma names of word's synsets for the given Penn Treebank tag."""
    for synset in wn.synsets(word, pos=pos_to_wordnet_pos(pos)):
        for lemma in synset.lemmas():
            yield lemma.name()
def pos_to_wordnet_pos(penntag, returnNone=False):
    """Map a Penn Treebank POS tag to a WordNet POS tag."""
    morphy_tag = {'NN':wn.NOUN, 'JJ':wn.ADJ,
                  'VB':wn.VERB, 'RB':wn.ADV}
    try:
        return morphy_tag[penntag[:2]]
    except KeyError:
        # No WordNet equivalent (e.g. PRP, DT); pos='' makes wn.synsets() find nothing
        return None if returnNone else ''
Example Usage
# Tokenize text
text = nltk.word_tokenize("I refuse to pick up the refuse")
for word, tag in nltk.pos_tag(text):
    print(f'word is {word}, POS is {tag}')
    # Filter for unique synonyms not equal to word and sort.
    unique = sorted(set(synonym for synonym in get_synonyms(word, tag) if synonym != word))
    for synonym in unique:
        print('\t', synonym)
Output
Note the different sets of synonyms for "refuse" depending on its POS.
word is I, POS is PRP
word is refuse, POS is VBP
decline
defy
deny
pass_up
reject
resist
turn_away
turn_down
word is to, POS is TO
word is pick, POS is VB
beak
blame
break_up
clean
cull
find_fault
foot
nibble
peck
piece
pluck
plunk
word is up, POS is RP
word is the, POS is DT
word is refuse, POS is NN
food_waste
garbage
scraps
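WordNet generally lists a word's senses from most to least frequent, so keeping only the first sense or two is a crude way to drop rarer senses, which are often the source of surprising entries in the output above. A minimal sketch of that filter, which also collects the results per (word, tag) pair. The names PENN_TO_WORDNET, synonyms_by_token and max_senses are hypothetical; 'n', 'a', 'v', 'r' are the values of wn.NOUN, wn.ADJ, wn.VERB and wn.ADV.

```python
# Penn Treebank tag prefix -> WordNet POS code (values of wn.NOUN, wn.ADJ, wn.VERB, wn.ADV)
PENN_TO_WORDNET = {'NN': 'n', 'JJ': 'a', 'VB': 'v', 'RB': 'r'}


def synonyms_by_token(tagged_tokens, synsets_for, max_senses=1):
    """Map each (word, Penn tag) pair to synonyms from its first `max_senses` senses.

    `synsets_for(word, pos)` must return a list of synsets in WordNet order,
    as nltk's wn.synsets(word, pos=pos) does. Tokens whose tag has no WordNet
    equivalent (pronouns, determiners, ...) are skipped.
    """
    result = {}
    for word, tag in tagged_tokens:
        pos = PENN_TO_WORDNET.get(tag[:2])
        if pos is None:
            continue
        names = []
        for synset in synsets_for(word, pos)[:max_senses]:
            for name in synset.lemma_names():
                name = name.replace('_', ' ')  # 'turn_down' -> 'turn down'
                if name.lower() != word.lower() and name not in names:
                    names.append(name)
        result[(word, tag)] = names
    return result
```

With NLTK: synonyms_by_token(nltk.pos_tag(text), lambda w, p: wn.synsets(w, pos=p), max_senses=2). Because keys include the tag, the verb and noun uses of "refuse" stay separate.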
How to find the most similar word in a list in python
Use difflib:
difflib.get_close_matches(word, ['car', 'animal', 'house', 'animation'])
As you can see from perusing the source, the "close" matches are sorted from best to worst.
>>> import difflib
>>> difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animation'])
['animal']
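Keep in mind that difflib compares spelling (the character-sequence ratio from difflib.SequenceMatcher), not meaning, so it catches typos rather than synonyms. get_close_matches also accepts n (maximum number of results, default 3) and cutoff (minimum score from 0 to 1, default 0.6). A small helper, most_similar (a made-up name), returns just the single best match or None:

```python
import difflib


def most_similar(word, candidates, cutoff=0.6):
    """Return the closest candidate by spelling, or None if none reaches `cutoff`."""
    matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


candidates = ['car', 'animal', 'house', 'animation']
print(most_similar('anlmal', candidates))  # animal
# Lowering the cutoff lets weaker matches through, still sorted best first
print(difflib.get_close_matches('anlmal', candidates, n=3, cutoff=0.5))  # ['animal', 'animation']
```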