Looking for Java spell checker library
You should check out Jazzy; it's used in some high-profile Java applications. There are two problems with it:
- It has not been updated since 2005.
- There is only an English dictionary on their SourceForge page.
There are some third-party dictionaries floating around. I had one for French the last time I used Jazzy.
Open source spell checking library for Java
GNU Aspell is an LGPL spelling library that you can use, but it's implemented in C++. Although I haven't used it, there's a Java library called Jazzy that aims to be a Java re-implementation of Aspell. It's a fairly old project, but it looks like it still works.
Edit:
I just discovered that Hunspell is a better project for spell checking. It powers OpenOffice.org, Firefox and Google Chrome. There's also a project that supplies JNA wrappers so you can use it from Java.
Spell checking for base word
The mistake you make here is in this loop:
for (IndexWord word : collection) {
    Synset[] senses = word.getSenses();
    if (senses != null && senses.length > 0
            && senses[0].toString().toLowerCase().contains(token)) {
        return true;
    }
}
The line Synset[] senses = word.getSenses() returns all senses of the word, but you are checking only the first one (index 0). The word may appear in any of the senses, so you need to check all of them.
Something like this:
for (IndexWord word : collection) {
    Synset[] senses = word.getSenses();
    if (senses == null) {
        continue;
    }
    // Check every sense, not just the first one (index 0).
    for (Synset sense : senses) {
        if (sense.getGloss().toLowerCase().contains(token)) {
            return true;
        }
    }
}
Adding on to this, the -ing forms of words may not be available as senses. I'm not sure why you want to search the senses to decide whether it's a valid word.
A check like if (set.getLemma() != null) return true; should be enough for the spell check, right?
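The lemma-based check can be illustrated without WordNet. The following is a minimal sketch under loud assumptions: a plain `Set<String>` stands in for the dictionary of base forms, and a handful of crude suffix rules (-ing, -ed, -es, -s) approximate the morphological processing that a WordNet library performs. `BaseWordChecker` and its methods are hypothetical names, not part of JWNL or any other library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a token is valid if it, or a guessed base form of it,
// is in the lemma dictionary. The suffix rules are crude stand-ins for the
// morphological processing a real WordNet library performs.
final class BaseWordChecker {
    private final Set<String> lemmas;

    BaseWordChecker(Set<String> lemmas) {
        this.lemmas = lemmas;
    }

    boolean isValid(String token) {
        for (String candidate : candidateBaseForms(token.toLowerCase())) {
            if (lemmas.contains(candidate)) {
                return true; // found a lemma, so treat the token as correctly spelled
            }
        }
        return false;
    }

    // The word itself, followed by guesses made by stripping common suffixes.
    private static List<String> candidateBaseForms(String word) {
        List<String> forms = new ArrayList<>();
        forms.add(word);
        if (word.endsWith("ing") && word.length() > 4) {
            String stem = word.substring(0, word.length() - 3);
            forms.add(stem);       // "walking" -> "walk"
            forms.add(stem + "e"); // "making"  -> "make"
            int n = stem.length();
            if (n > 1 && stem.charAt(n - 1) == stem.charAt(n - 2)) {
                forms.add(stem.substring(0, n - 1)); // "running" -> "run"
            }
        }
        if (word.endsWith("ed") && word.length() > 3) {
            String stem = word.substring(0, word.length() - 2);
            forms.add(stem);       // "walked" -> "walk"
            forms.add(stem + "e"); // "baked"  -> "bake"
        }
        if (word.endsWith("es") && word.length() > 3) {
            forms.add(word.substring(0, word.length() - 2)); // "boxes" -> "box"
        }
        if (word.endsWith("s") && word.length() > 2) {
            forms.add(word.substring(0, word.length() - 1)); // "cats" -> "cat"
        }
        return forms;
    }
}
```

Because the rules only guess stems, they can over-accept: with `run` in the dictionary, the misspelling `runing` also passes. A real morphological analyzer avoids most of these cases.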
JTextArea Real Time Spell Checker
I decided to go with Wintertree.
Spell check and/or spell correction in Java
Google's Spell Checker http://code.google.com/p/google-api-spelling-java/
SpellChecker checker = new SpellChecker();
SpellResponse spellResponse = checker.check("helloo worlrd");
for (SpellCorrection sc : spellResponse.getCorrections()) {
    System.out.println(sc.getValue());
}
It works much like Gmail or other Google services (such as translate.google.com or search), which give you alternative suggestions when you make a typo.
What happens in the background?
The SpellChecker class transforms the request into XML and sends it to Google's spell checker service. The response is also XML, which is then deserialized into simple POJOs. The request for the example above looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<spellrequest textalreadyclipped="0" ignoredigits="1"
ignoreallcaps="1" ignoredups="0">
<text>helloo worlrd</text>
</spellrequest>
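For illustration, here is a minimal sketch of how a client could assemble a request like the one above. It only builds the XML string and does not contact any endpoint; the attribute values are copied from the example, and `SpellRequestBuilder` is a hypothetical name, not the library's actual class.

```java
// Hypothetical sketch: build a spell-request document like the example.
// Attribute values are copied from the sample request; the text is escaped
// so arbitrary user input keeps the XML well-formed.
final class SpellRequestBuilder {
    static String build(String text) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
                + "<spellrequest textalreadyclipped=\"0\" ignoredigits=\"1\" "
                + "ignoreallcaps=\"1\" ignoredups=\"0\">\n"
                + "<text>" + escape(text) + "</text>\n"
                + "</spellrequest>\n";
    }

    // Replaces the five XML special characters with their entities.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```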
And the response XML looks like:
<?xml version="1.0" encoding="UTF-8"?>
<spellresult error="0" clipped="0" charschecked="13">
<c o="0" l="6" s="1">hello Helli hell hallo hullo</c>
<c o="7" l="6" s="1">world whorled wold warlord would</c>
</spellresult>
I haven't tried it, though.
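A response like the one shown above could be turned into simple POJOs with the JDK's built-in DOM parser. This is a hypothetical sketch, not the library's own deserializer: `Correction` and `SpellResultParser` are illustrative names, and it assumes `o` is the offset and `l` the length of the misspelled word in the original text (which matches the sample: offset 7, length 6 is `worlrd`). Suggestions are split on whitespace, which covers the space-separated sample.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical POJO for one <c> element of the response.
final class Correction {
    final int offset;               // attribute o: start of the misspelled word
    final int length;               // attribute l: length of the misspelled word
    final List<String> suggestions; // element text, split on whitespace

    Correction(int offset, int length, List<String> suggestions) {
        this.offset = offset;
        this.length = length;
        this.suggestions = suggestions;
    }
}

final class SpellResultParser {
    // Parses a <spellresult> document into a list of corrections.
    static List<Correction> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("c");
            List<Correction> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element c = (Element) nodes.item(i);
                int offset = Integer.parseInt(c.getAttribute("o"));
                int length = Integer.parseInt(c.getAttribute("l"));
                String body = c.getTextContent().trim();
                List<String> suggestions = body.isEmpty()
                        ? List.of()
                        : Arrays.asList(body.split("\\s+"));
                result.add(new Correction(offset, length, suggestions));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed spell result XML", e);
        }
    }
}
```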
UPDATE:
Google might have started charging for this. I haven't had time to write code to check it, so perhaps someone can confirm. It seems that Google has deprecated the old API in favor of a new, paid one.
See the Google Translate API FAQ:
What happened to earlier free versions of Translate API?
Google Translate API v1 is no longer available as of December 1, 2011 and has been replaced by Google Translate API v2. Google Translate API v1 was officially deprecated on May 26, 2011. The decision to deprecate the API and replace it with the paid service was made due to the substantial economic burden caused by extensive abuse.
Spelling correction for data normalization in Java
What you want to implement is not a spelling corrector but a fuzzy search. Peter Norvig's essay is a good starting point for building a fuzzy search from candidates checked against a dictionary.
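The candidate-generation idea from Norvig's essay can be sketched in Java as follows: generate every string one edit away (deletion, adjacent transposition, replacement, insertion), keep only those present in the dictionary, and fall back to two edits if nothing is found. This is a simplified sketch: it assumes lowercase ASCII input, returns the candidate set without Norvig's frequency-based ranking, and returns an empty set when nothing is found (Norvig's version falls back to the word itself). `EditCandidates` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of dictionary-checked candidate generation after Norvig's essay.
final class EditCandidates {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    // All strings one edit away: deletions, adjacent transpositions,
    // replacements and insertions.
    static Set<String> edits1(String word) {
        Set<String> edits = new HashSet<>();
        for (int i = 0; i <= word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (!right.isEmpty()) {
                edits.add(left + right.substring(1)); // deletion
            }
            if (right.length() > 1) {
                edits.add(left + right.charAt(1) + right.charAt(0) + right.substring(2)); // transposition
            }
            for (char c : ALPHABET.toCharArray()) {
                if (!right.isEmpty()) {
                    edits.add(left + c + right.substring(1)); // replacement
                }
                edits.add(left + c + right); // insertion
            }
        }
        return edits;
    }

    // Keeps only the strings that are dictionary words.
    static Set<String> known(Set<String> words, Set<String> dictionary) {
        Set<String> result = new HashSet<>(words);
        result.retainAll(dictionary);
        return result;
    }

    // Prefers the word itself, then known words one edit away, then two edits away.
    static Set<String> candidates(String word, Set<String> dictionary) {
        if (dictionary.contains(word)) {
            return Set.of(word);
        }
        Set<String> oneEdit = edits1(word);
        Set<String> known1 = known(oneEdit, dictionary);
        if (!known1.isEmpty()) {
            return known1;
        }
        Set<String> known2 = new HashSet<>();
        for (String e : oneEdit) {
            known2.addAll(known(edits1(e), dictionary));
        }
        return known2;
    }
}
```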
Alternatively, have a look at BK-trees.
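A BK-tree indexes dictionary words by edit distance so that a fuzzy lookup does not have to compare the query against every word. The sketch below is a minimal version using Levenshtein distance; it relies on the triangle inequality to skip every subtree whose edge label lies outside `[d - maxDistance, d + maxDistance]`. `BkTree` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal BK-tree over Levenshtein distance. Each child edge is labelled
// with the distance between the parent word and the child word.
final class BkTree {
    private static final class Node {
        final String word;
        final Map<Integer, Node> children = new HashMap<>();
        Node(String word) { this.word = word; }
    }

    private Node root;

    void add(String word) {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true) {
            int d = distance(word, node.word);
            if (d == 0) return; // word already present
            Node child = node.children.get(d);
            if (child == null) { node.children.put(d, new Node(word)); return; }
            node = child;
        }
    }

    // Returns all words within maxDistance of the query.
    List<String> search(String query, int maxDistance) {
        List<String> matches = new ArrayList<>();
        if (root != null) search(root, query, maxDistance, matches);
        return matches;
    }

    private void search(Node node, String query, int maxDistance, List<String> out) {
        int d = distance(query, node.word);
        if (d <= maxDistance) out.add(node.word);
        // Triangle inequality: only these subtrees can contain matches.
        for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
            int edge = e.getKey();
            if (edge >= d - maxDistance && edge <= d + maxDistance) {
                search(e.getValue(), query, maxDistance, out);
            }
        }
    }

    // Levenshtein distance computed with two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```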
An n-gram index (as used by Lucene) produces better results for longer words. Generating candidates up to a given edit distance will probably work well enough for words found in normal text, but not for names, addresses and scientific texts. It will increase your index size, though.
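To make the n-gram idea concrete, here is a hypothetical sketch of a character n-gram index. It is not Lucene's implementation: words are padded with `$`, split into overlapping n-grams and stored in an inverted index, and candidates are ranked by the Jaccard similarity of their n-gram sets with the query's. The padding character, the scoring function and the `NGramIndex` name are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical character n-gram index (not Lucene's implementation).
final class NGramIndex {
    private final int n;
    private final Map<String, Set<String>> postings = new HashMap<>();   // n-gram -> words
    private final Map<String, Set<String>> gramsByWord = new HashMap<>(); // word -> its n-grams

    NGramIndex(int n) { this.n = n; }

    void add(String word) {
        Set<String> grams = grams(word);
        gramsByWord.put(word, grams);
        for (String g : grams) {
            postings.computeIfAbsent(g, k -> new HashSet<>()).add(word);
        }
    }

    // Returns up to `limit` words ranked by Jaccard similarity of n-gram sets.
    List<String> suggest(String query, int limit) {
        Set<String> queryGrams = grams(query);
        Map<String, Integer> shared = new HashMap<>(); // word -> number of shared n-grams
        for (String g : queryGrams) {
            for (String w : postings.getOrDefault(g, Set.of())) {
                shared.merge(w, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(shared.keySet());
        ranked.sort(Comparator
                .comparingDouble((String w) -> -jaccard(shared.get(w), queryGrams.size(), gramsByWord.get(w).size()))
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    private static double jaccard(int common, int sizeA, int sizeB) {
        return (double) common / (sizeA + sizeB - common);
    }

    // Pads the word with '$' so prefixes and suffixes produce their own n-grams.
    private Set<String> grams(String word) {
        String padded = "$" + word.toLowerCase() + "$";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }
}
```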
If you have the texts indexed, you already have your text corpus (your dictionary). Only what is in your data can be found anyway, so you don't need an external dictionary.
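A minimal sketch of deriving the dictionary from your own data, assuming the indexed texts are available as plain strings: tokens are lower-cased runs of Unicode letters, and their counts are kept so they can break ties between correction candidates. `CorpusDictionary` is a hypothetical name, and the tokenization rule is an assumption; names, addresses or codes with digits may need a different one.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a word-frequency dictionary built from your own texts.
final class CorpusDictionary {
    private static final Pattern WORD = Pattern.compile("\\p{L}+"); // runs of letters only
    private final Map<String, Integer> counts = new HashMap<>();

    void addDocument(String text) {
        Matcher m = WORD.matcher(text.toLowerCase());
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
    }

    boolean contains(String word) {
        return counts.containsKey(word.toLowerCase());
    }

    int frequency(String word) {
        return counts.getOrDefault(word.toLowerCase(), 0);
    }

    // Picks the candidate that occurs most often in the corpus, or null if none occur.
    String mostFrequent(Collection<String> candidates) {
        String best = null;
        int bestCount = 0;
        for (String c : candidates) {
            int f = frequency(c);
            if (f > bestCount) {
                best = c;
                bestCount = f;
            }
        }
        return best;
    }
}
```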
A good resource is the "Dictionaries and tolerant retrieval" chapter of Introduction to Information Retrieval, which includes a short description of context-sensitive spelling correction.
How do I add spell checking to a JTextArea?
You could implement your own spell checker using a dictionary (which can get quite large, depending on the languages you support) and then calculate distance metrics between the words in the text box and the dictionary entries. Underlining can be done using font styling; there is an applet-based sample here.
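The dictionary-plus-distance approach can be sketched as follows, assuming a lowercase word set as the dictionary: each run of letters in the text is looked up, unknown words are reported with their character offsets, and the suggestion is the dictionary entry with the smallest Levenshtein distance (ties broken alphabetically). `SimpleSpellChecker` is a hypothetical name, and the linear scan over the dictionary is fine for a sketch but slow for large dictionaries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: flag unknown words with their offsets and suggest
// the closest dictionary entry by Levenshtein distance.
final class SimpleSpellChecker {
    static final class Misspelling {
        final int start;          // offset of the first character
        final int end;            // offset just past the last character
        final String word;
        final String suggestion;

        Misspelling(int start, int end, String word, String suggestion) {
            this.start = start;
            this.end = end;
            this.word = word;
            this.suggestion = suggestion;
        }
    }

    private static final Pattern WORD = Pattern.compile("\\p{L}+");
    private final Set<String> dictionary; // assumed to contain lowercase words

    SimpleSpellChecker(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    List<Misspelling> check(String text) {
        List<Misspelling> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String lower = m.group().toLowerCase();
            if (!dictionary.contains(lower)) {
                result.add(new Misspelling(m.start(), m.end(), m.group(), closest(lower)));
            }
        }
        return result;
    }

    // Nearest dictionary word; ties are broken alphabetically for determinism.
    private String closest(String word) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : dictionary) {
            int d = levenshtein(word, candidate);
            if (d < bestDistance || (d == bestDistance && candidate.compareTo(best) < 0)) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best;
    }

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```

In Swing, the reported `start` and `end` offsets could be passed to `textArea.getHighlighter().addHighlight(start, end, painter)` to mark the words. A plain `DefaultHighlighter.DefaultHighlightPainter` paints a background rather than an underline, so a squiggly underline would need a custom `Highlighter.HighlightPainter`.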
Jaspell is a Java implementation of the popular Aspell. The project includes some explanations of the search algorithms it uses.
As mentioned previously, Jazzy is also great, and IBM provides a nice tutorial.