Validate Words Against an English Dictionary in Rails

Simplest way to validate an input against a file of words?

The simplest way, though by no means the fastest, is to search the word list each time. If the word list is in an array:

if word_list.include?(word)
  # manipulate word
end
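Array#index (like Array#include?) scans the list linearly, so each lookup costs time proportional to the size of the list. If you check many words, a common alternative (not part of the original answer) is to load the list into a Set once, which gives hash-based lookups. A minimal sketch; the sample `word_list` and the helper name `known_word?` are hypothetical:

```ruby
require "set"

# Hypothetical word list; in practice this would come from your own source.
word_list = %w[apple banana cherry]

# Build the Set once; each later lookup is a hash lookup rather than a linear scan.
WORDS = Set.new(word_list.map(&:downcase)).freeze

# Returns true if +word+ (compared case-insensitively) is in the list.
def known_word?(word)
  WORDS.include?(word.downcase)
end

known_word?("Banana") # => true
known_word?("grape")  # => false
```

Building the Set costs one pass over the list, so this pays off when the same list is queried repeatedly, for example from a model validation.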

If, however, the word list is in a separate file (with each word on its own line), you can use File.foreach to find it:

if File.foreach("word.list") { |x| break x if x.chomp == word }
  # manipulate word
end

Note that foreach does not strip off the trailing newline character(s), so we get rid of them with String#chomp.
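On Ruby 2.4 and later, File.foreach also accepts a `chomp: true` option that strips the line terminator for you. Called without a block it returns an Enumerator, so `any?` stops reading as soon as it finds a match. A minimal sketch; the file name `word.list` comes from the example above, while the helper name `word_in_file?` is hypothetical:

```ruby
require "tempfile"

# Returns true if +word+ appears on its own line in the file at +path+.
# Without a block, File.foreach returns an Enumerator; any? stops at the
# first matching line, so the rest of the file is not read.
def word_in_file?(word, path = "word.list")
  File.foreach(path, chomp: true).any? { |line| line == word }
end

# Usage with a throwaway file (deleted when the block exits).
Tempfile.create("words") do |f|
  f.write("apple\nbanana\ncherry\n")
  f.flush # make sure the contents are on disk before reading via the path
  word_in_file?("banana", f.path) # => true
  word_in_file?("grape", f.path)  # => false
end
```

This still reads the file on every call, so it suits occasional checks; for frequent lookups, loading the file once into memory is usually faster.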

Is there a ruby library for checking whether a string is a valid word?

You can check out raspell, or even invoke aspell manually, with any dictionary you like.

:tsearch english vs simple dictionary performance

If you read the PostgreSQL Full Text Search documentation on dictionaries, you will see there are several dictionary templates:

1) Simple dictionary: the simple dictionary template operates by converting the input token to lower case and checking it against a file of stop words.

2) Synonym dictionary: this dictionary template is used to create dictionaries that replace a word with a synonym. Phrases are not supported (use the thesaurus template (Section 12.6.4) for that). A synonym dictionary can be used to overcome linguistic problems, for example, to prevent an English stemmer dictionary from reducing the word 'Paris' to 'pari'.

3) Thesaurus dictionary: a thesaurus dictionary (sometimes abbreviated as TZ) is a collection of words that includes information about the relationships of words and phrases, i.e., broader terms (BT), narrower terms (NT), preferred terms, non-preferred terms, related terms, etc.

4) Ispell dictionary: the Ispell dictionary template supports morphological dictionaries, which can normalize many different linguistic forms of a word into the same lexeme. For example, an English Ispell dictionary can match all declensions and conjugations of the search term bank, e.g. banking, banked, banks, banks', and bank's.

5) Snowball dictionary: the Snowball dictionary template is based on the project of Martin Porter, inventor of the popular Porter's stemming algorithm for the English language. Snowball now provides stemming algorithms for many languages (see the Snowball site for more information). Each algorithm understands how to reduce common variant forms of words to a base, or stem, spelling within its language.

So with the simple dictionary there is no spell checking, no stemming, no abbreviation handling, and no other language-specific processing.

If you use one of the language dictionaries, such as "english", then variants of words (e.g. "jumping" and "jumped") will match each other.

So English words that share the same stem will match each other.

If you don't want stemming, you should pick the "simple" dictionary which does not do any stemming. If you don't specify a dictionary, the "simple" dictionary will be used.

For example:

class Product < ActiveRecord::Base
  include PgSearch # on pg_search 2.3+, use `include PgSearch::Model` instead

  pg_search_scope(
    :search,
    against: %i(
      description
      manufacturer_name
      name
    ),
    using: {
      tsearch: {
        dictionary: "english",
      }
    }
  )
end

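So with the "english" dictionary, "Milk" is stemmed and other forms of the word match too:

Product.search("Milk") # also returns products whose text contains e.g. "milks", "milked" or "milking"

But if you use dictionary: "simple", no stemming is applied:

Product.search("Milk") # returns only products containing the word "milk" itself (matched case-insensitively)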

I need a way to define words in a rails project

One option is the Wordnik API. You can find your Wordnik API key on your user settings page, Wordnik.com/users/edit (it's all the way at the bottom).

Finding dictionary words within a source text, using Ruby

First let's read the words of a dictionary into an array, chomping, downcasing and removing duplicates (if, for example, the dictionary contains both "A" and "a", as does the dictionary on my Mac that I've used below), then sorting the result.

DICTIONARY = File.readlines("/usr/share/dict/words").map { |w| w.chomp.downcase }.uniq.sort
#=> ["a", "aa", "aal", "aalii",..., "zyzomys", "zyzzogeton"]
DICTIONARY.size
#=> 234371

The following method generates every combination of one or more characters of a given word, preserving their order. For each combination it joins the characters into a string, checks whether that string is in the dictionary and, if it is, adds it to the result array.

To check whether a string is in the dictionary I perform a binary search with Array#bsearch. Binary search only gives correct answers if the array is sorted in the order that String#<=> uses; a system word list is not guaranteed to be in that order (it may be sorted case-insensitively or by locale rules), which is why the array is sorted explicitly after downcasing.

def subwords(word)
  arr = word.chars
  (1..word.size).each_with_object([]) do |n, a|
    arr.combination(n).each do |comb|
      w = comb.join
      a << w if DICTIONARY.bsearch { |dw| w <=> dw }
    end
  end
end
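Since `subwords` depends on a system word file that may not exist on every machine, here is a self-contained sketch of the same technique that takes the raw dictionary lines as an argument. The tiny word list is made up purely for illustration, and the helper names `build_dictionary` and `subwords_in` are hypothetical. Keep in mind that a word of length n has 2^n - 1 ordered letter combinations, so this brute-force approach is only practical for fairly short words.

```ruby
# Normalizes raw dictionary lines the same way as above: chomp, downcase,
# remove duplicates, and sort so that Array#bsearch can be used.
def build_dictionary(lines)
  lines.map { |w| w.chomp.downcase }.uniq.sort
end

# Returns every dictionary word formed by picking one or more letters of
# +word+ in their original order. bsearch runs in find-any mode: the block
# returns the <=> comparison of the candidate against each probed element.
def subwords_in(word, dictionary)
  chars = word.chars
  (1..chars.size).each_with_object([]) do |n, found|
    chars.combination(n).each do |comb|
      candidate = comb.join
      found << candidate if dictionary.bsearch { |dw| candidate <=> dw }
    end
  end.uniq # repeated letters can produce the same string more than once
end

# Hypothetical mini-dictionary, deliberately unsorted and mixed-case.
raw = ["Zed\n", "red\n", "A\n", "a\n", "craze\n", "ad\n", "raze\n", "crazed\n"]
dict = build_dictionary(raw)
# dict == ["a", "ad", "craze", "crazed", "raze", "red", "zed"]
subwords_in("crazed", dict)
# => ["a", "ad", "red", "zed", "raze", "craze", "crazed"]
```

The trailing `uniq` matters only for words with repeated letters; for "crazed" and "importance" every letter is distinct, so it changes nothing.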

subwords "crazed"
# => ["c", "r", "a", "z", "e", "d",
# "ca", "ce", "ra", "re", "ae", "ad", "ed",
# "cad", "rad", "red", "zed",
# "raze", "craze", "crazed"]

Yes, that particular dictionary contains all those strings (such as "z") that don't appear to be English words.

Another example.

subwords "importance"
#=> ["i", "m", "p", "o", "r", "t", "a", "n", "c", "e",
# "io", "it", "in", "ie", "mo", "mr", "ma", "me", "po", "pa", "or",
# "on", "oe", "ra", "re", "ta", "te", "an", "ae", "ne", "ce",
# "imp", "ima", "ion", "ira", "ire", "ita", "ian", "ice", "mor", "mot",
# "mon", "moe", "man", "mac", "mae", "pot", "poa", "pon", "poe", "pan",
# "pac", "ort", "ora", "orc", "ore", "one", "ran", "tan", "tae", "ace",
# "iota", "ione", "iran", "mort", "mora", "morn", "more", "mote",
# "moan", "mone", "mane", "mace", "port", "pore", "pote", "pone",
# "pane", "pace", "once", "rane", "race", "tane",
# "impot", "moran", "morne", "porta", "ponce", "rance",
# "import", "impone", "impane", "prance",
# "portance",
# "importance"]

Checking passwords against word database on server or use a web service?

If you're working on Unix, /usr/share/dict/words contains a massive word list.

Otherwise, there is a Ruby gem for WordNet, which could easily solve your problem and probably includes names of famous cities and people as well.

You should also google 'password analysis' and check out other common bad-password patterns.
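As a rough illustration of what a word-list check might look like, here is a minimal sketch that flags a password if, after lowercasing and stripping leading and trailing non-letters, what remains is a dictionary word. The word set and the helper name `dictionary_password?` are hypothetical, and real password-strength checks look at many more patterns than this one.

```ruby
require "set"

# Hypothetical word set; on Unix you might load /usr/share/dict/words instead.
WORDLIST = Set.new(%w[monkey dragon password sunshine]).freeze

# Returns true if the password is just a dictionary word, optionally
# padded with digits or punctuation at either end (e.g. "Dragon123!").
def dictionary_password?(password)
  core = password.downcase.gsub(/\A[^a-z]+|[^a-z]+\z/, "")
  WORDLIST.include?(core)
end

dictionary_password?("Dragon123!")  # => true
dictionary_password?("dr4g0n-blue") # => false
```

Note that simple character substitutions such as "dr4g0n" slip past this check; undoing common substitutions before the lookup is one of the patterns worth researching.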