Looking for Java Spell Checker Library

Looking for Java spell checker library

You should check out Jazzy; it's used in some high-profile Java applications. Two problems with it:

  1. It has not been updated since 2005.
  2. There is only an English dictionary on its SourceForge page.

There are some third-party dictionaries floating around; I had one for French the last time I used Jazzy.
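
For reference, basic usage looks roughly like the sketch below. This is a minimal example from memory of Jazzy's API (SpellDictionaryHashMap, SpellChecker, StringWordTokenizer); the dictionary file name is whatever word list you downloaded, so treat the details as assumptions to check against your copy:

import com.swabunga.spell.engine.SpellDictionaryHashMap;
import com.swabunga.spell.event.SpellChecker;
import com.swabunga.spell.event.StringWordTokenizer;

import java.io.File;

public class JazzyExample {
    public static void main(String[] args) throws Exception {
        // english.0 is the English word list from the SourceForge page;
        // swap in a third-party list for other languages.
        SpellDictionaryHashMap dictionary =
                new SpellDictionaryHashMap(new File("english.0"));

        SpellChecker checker = new SpellChecker(dictionary);
        checker.addSpellCheckListener(event ->
                System.out.println(event.getInvalidWord()
                        + " -> " + event.getSuggestions()));

        // The listener fires once for every word not found in the dictionary.
        checker.checkSpelling(new StringWordTokenizer("speling is hrad"));
    }
}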

Open source spell checking library for Java

GNU Aspell is an LGPL spelling library that you can use, but it's implemented in C++. Although I haven't used it, there's a Java library called Jazzy that aims to be a Java re-implementation of Aspell. It's a fairly old project, but it looks like it still works.

Edit:

Just discovered that Hunspell is a better project for spell checking. It powers OpenOffice.org, Firefox, and Google Chrome. There's also a project that supplies JNA wrappers so you can use it from Java.
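
If you go the JNA route, usage ends up looking something like the sketch below. The class and method names (dk.dren.hunspell.Hunspell, misspelled, suggest) are from the HunspellJNA wrapper as I remember it, and the dictionary path is only an example, so double-check both against whichever wrapper you pick:

import dk.dren.hunspell.Hunspell;

import java.util.List;

public class HunspellExample {
    public static void main(String[] args) throws Exception {
        // The path points at en_US.dic / en_US.aff without the extension;
        // adjust it to wherever your Hunspell dictionaries are installed.
        Hunspell.Dictionary dictionary =
                Hunspell.getInstance().getDictionary("/usr/share/hunspell/en_US");

        if (dictionary.misspelled("worlrd")) {
            List<String> suggestions = dictionary.suggest("worlrd");
            System.out.println("Did you mean: " + suggestions);
        }
    }
}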

Spell checking for base word

The mistake you make here is in this loop:

for (IndexWord word : collection) {
    Synset[] senses = word.getSenses();
    if (senses != null && senses.length > 0
            && senses[0].toString().toLowerCase().contains(token)) {
        return true;
    }
}

The line Synset[] senses = word.getSenses() returns all the senses of the word, but you are checking only the first one (index 0). The word may turn up in any of the senses, so check them all.
Something like this:

for (IndexWord word : collection) {
    Synset[] senses = word.getSenses();
    // check every sense, not just the first one
    for (Synset sense : senses) {
        if (sense.getGloss().toLowerCase().contains(token)) {
            return true;
        }
    }
}

Adding on to this, the -ing forms of words may not be available as senses. I'm also not sure why you want to search the senses at all to decide whether a token is a valid word.

Code like

    if (set.getLemma() != null) {
        return true;
    }

should be enough to decide the spell check, right?
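
Put together, the whole check could look like the small helper below (a sketch assuming JWNL, where the IndexWord collection is whatever your WordNet lookup returned for the token; with extJWNL only the package name changes):

import net.didion.jwnl.data.IndexWord;

import java.util.Collection;

public class WordCheck {

    // A token counts as a known word if WordNet returned any IndexWord
    // for it at all; there is no need to scan the sense glosses.
    static boolean isKnownWord(Collection<IndexWord> indexWords) {
        for (IndexWord word : indexWords) {
            if (word.getLemma() != null) {
                return true;
            }
        }
        return false;
    }
}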

JTextArea Real Time Spell Checker

I decided to go with Wintertree.

Spell check and/or spell correction in Java

Google's Spell Checker: http://code.google.com/p/google-api-spelling-java/

SpellChecker checker = new SpellChecker();
SpellResponse spellResponse = checker.check("helloo worlrd");

for (SpellCorrection sc : spellResponse.getCorrections()) {
    System.out.println(sc.getValue());
}

It's much like when you use Gmail or other Google services (like translate.google.com or search) that give you an alternate suggestion when you have a typo.

What happens in the background?

The SpellChecker class transforms the request into XML and sends it to Google's spell checker service. The response is also XML, which is then deserialized into simple POJOs.

The request for the example above looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<spellrequest textalreadyclipped="0" ignoredigits="1" ignoreallcaps="1" ignoredups="0">
  <text>helloo worlrd</text>
</spellrequest>

And the response XML looks like:

<?xml version="1.0" encoding="UTF-8"?>
<spellresult error="0" clipped="0" charschecked="13">
  <c o="0" l="6" s="1">hello Helli hell hallo hullo</c>
  <c o="7" l="6" s="1">world whorled wold warlord would</c>
</spellresult>

I haven't tried it myself, though.


UPDATE:
Google might have started charging for this. I do not have time to write code to verify it, so someone else can confirm. It seems Google has deprecated the old API in favor of a new, paid one.

Refer to the Google Translate API FAQ:

What happened to earlier free versions of Translate API?
Google Translate API v1 is no longer available as of December 1, 2011 and has been replaced by Google Translate API v2. Google Translate API v1 was officially deprecated on May 26, 2011. The decision to deprecate the API and replace it with the paid service was made due to the substantial economic burden caused by extensive abuse.

Spelling correction for data normalization in Java

What you want to implement is not a spelling corrector but a fuzzy search. Peter Norvig's essay is a good starting point for building a fuzzy search from candidates checked against a dictionary.

Alternatively, have a look at BK-trees.
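
To make the BK-tree idea concrete, here is a minimal self-contained sketch (plain Java, Levenshtein distance as the metric; the sample words and the tolerance of 2 are arbitrary):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal BK-tree over Levenshtein distance: a lookup only descends into
// children whose edge label is within the tolerance of the current distance.
class BkTree {
    private String word;                              // word stored at this node
    private final Map<Integer, BkTree> children = new HashMap<>();

    void add(String w) {
        if (word == null) { word = w; return; }
        int d = distance(w, word);
        children.computeIfAbsent(d, k -> new BkTree()).add(w);
    }

    List<String> search(String query, int tolerance) {
        List<String> matches = new ArrayList<>();
        collect(query, tolerance, matches);
        return matches;
    }

    private void collect(String query, int tolerance, List<String> out) {
        if (word == null) return;
        int d = distance(query, word);
        if (d <= tolerance) out.add(word);
        // Triangle inequality: only children in [d - tolerance, d + tolerance] can match.
        for (int i = d - tolerance; i <= d + tolerance; i++) {
            BkTree child = children.get(i);
            if (child != null) child.collect(query, tolerance, out);
        }
    }

    // Standard dynamic-programming Levenshtein distance, two rows at a time.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        BkTree tree = new BkTree();
        for (String w : new String[] {"hello", "help", "hell", "world", "word"}) {
            tree.add(w);
        }
        System.out.println(tree.search("helo", 2)); // [hello, hell, help]
    }
}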

An n-gram index (as used by Lucene) produces better results for longer words. The approach of producing candidates up to a given edit distance will probably work well enough for words found in normal text, but not for names, addresses, and scientific terms. It will increase your index size, though.

If you have the texts indexed, you already have your text corpus (your dictionary). Only what is in your data can be found anyway, so you need not use an external dictionary.

A good resource is Introduction to Information Retrieval, "Dictionaries and tolerant retrieval". It includes a short description of context-sensitive spelling correction.

How do I add spell checking to a JTextArea?

You could implement your own spell checker using a dictionary (which can get quite large depending on the languages you support) and then compute distance metrics from the words in the text box to the dictionary. Underlining can be done with font styling; there is an applet-based sample here.
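
As a rough illustration of that approach, the sketch below checks each word against a tiny HashSet stand-in dictionary and marks unknown words with Swing's DefaultHighlighter (JTextArea holds plain text, so a highlight painter stands in for a real squiggly underline; a distance metric for suggestions would be layered on top):

import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultHighlighter;
import java.awt.Color;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpellCheckTextArea {
    // Tiny stand-in dictionary; load a real word list in practice.
    private static final Set<String> DICTIONARY =
            new HashSet<>(Arrays.asList("this", "is", "a", "test"));
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JTextArea area = new JTextArea(10, 40);
            DefaultHighlighter.DefaultHighlightPainter painter =
                    new DefaultHighlighter.DefaultHighlightPainter(Color.PINK);

            area.getDocument().addDocumentListener(new DocumentListener() {
                public void insertUpdate(DocumentEvent e) { recheck(); }
                public void removeUpdate(DocumentEvent e) { recheck(); }
                public void changedUpdate(DocumentEvent e) { recheck(); }

                private void recheck() {
                    // Defer so highlights are updated after the edit completes.
                    SwingUtilities.invokeLater(() -> {
                        area.getHighlighter().removeAllHighlights();
                        Matcher m = WORD.matcher(area.getText());
                        while (m.find()) {
                            if (!DICTIONARY.contains(m.group().toLowerCase())) {
                                try {
                                    area.getHighlighter().addHighlight(m.start(), m.end(), painter);
                                } catch (BadLocationException ignored) {
                                    // text changed under us; the next event will recheck
                                }
                            }
                        }
                    });
                }
            });

            JFrame frame = new JFrame("Spell check sketch");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new JScrollPane(area));
            frame.pack();
            frame.setVisible(true);
        });
    }
}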

Jaspell is a Java implementation of the popular Aspell. It contains some explanations of the search algorithms used.

As mentioned previously, Jazzy is also great, and IBM provides a nice tutorial.


