Simplest way to validate an input against a file of words?
The simplest way, but by no means the fastest, is to search the word list each time. If the word list is in an array:
if word_list.include?(word)
  # manipulate word
end
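Array#index and Array#include? both scan the list element by element on every call. If you check many words against the same list, you could convert it once into a Set so that each lookup is a hash lookup instead of a linear scan. This is a minimal sketch; WORD_LIST and the two helper names are hypothetical stand-ins for your own data and code:

```ruby
require "set"

# Hypothetical word list standing in for your real data.
WORD_LIST = %w[apple banana cherry].freeze

# Linear scan: compares against each element in turn, O(n) per lookup.
def valid_word_linear?(word)
  WORD_LIST.include?(word)
end

# Build the Set once; each include? is then a hash lookup, O(1) on average.
WORD_SET = WORD_LIST.to_set.freeze

def valid_word_set?(word)
  WORD_SET.include?(word)
end

p valid_word_linear?("banana") # => true
p valid_word_set?("durian")    # => false
```

Building the Set costs one pass over the list, so it only pays off when you do more than a handful of lookups.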
If, however, you have the word list in a separate file (with each word on its own line), you can use File#foreach to find it:
if File.foreach("word.list") { |x| break x if x.chomp == word }
  # manipulate word
end
Note that foreach does not strip off the trailing newline character(s), so we get rid of them with String#chomp.
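A variant that avoids the break trick: without a block, File.foreach returns an enumerator, and Enumerable#any? returns a plain true or false and stops reading as soon as it finds a match. This is a sketch only; the helper name word_in_file? is hypothetical, and the temporary file stands in for "word.list":

```ruby
require "tempfile"

# Returns true if +word+ appears as a whole line in the file at +path+.
# any? stops reading the file at the first matching line.
def word_in_file?(path, word)
  File.foreach(path).any? { |line| line.chomp == word }
end

# Demo with a throwaway file standing in for "word.list".
Tempfile.create("word.list") do |f|
  f.write("apple\nbanana\ncherry\n")
  f.flush
  p word_in_file?(f.path, "banana") # => true
  p word_in_file?(f.path, "durian") # => false
end
```

On Ruby 2.4 and later, File.foreach also accepts a chomp: true keyword that strips the newline for you.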
Is there a ruby library for checking whether a string is a valid word?
You can check out raspell, or even invoke aspell manually, with any dictionary you like.
:tsearch english vs simple dictionary performance
If you read the PostgreSQL Full Text Search documentation on dictionaries, you'll see there are several dictionary templates:
1) Simple dictionary: "The simple dictionary template operates by converting the input token to lower case and checking it against a file of stop words."
2) Synonym dictionary: "This dictionary template is used to create dictionaries that replace a word with a synonym. Phrases are not supported (use the thesaurus template (Section 12.6.4) for that). A synonym dictionary can be used to overcome linguistic problems, for example, to prevent an English stemmer dictionary from reducing the word 'Paris' to 'pari'."
3) Thesaurus dictionary: "A thesaurus dictionary (sometimes abbreviated as TZ) is a collection of words that includes information about the relationships of words and phrases, i.e., broader terms (BT), narrower terms (NT), preferred terms, non-preferred terms, related terms, etc."
4) Ispell dictionary: "The Ispell dictionary template supports morphological dictionaries, which can normalize many different linguistic forms of a word into the same lexeme. For example, an English Ispell dictionary can match all declensions and conjugations of the search term bank, e.g., banking, banked, banks, banks', and bank's."
5) Snowball dictionary: "The Snowball dictionary template is based on the project of Martin Porter, inventor of the popular Porter's stemming algorithm for the English language. Snowball now provides stemming algorithms for many languages (see the Snowball site for more information). Each algorithm understands how to reduce common variant forms of words to a base, or stem, spelling within its language."
So the "simple" dictionary does no spell checking, no stemming, no abbreviation handling, and no other language-specific normalization.
If you use one of the language dictionaries, such as "english", then variants of words (e.g. "jumping" and "jumped") will match each other.
So English words that reduce to the same stem will match each other.
If you don't want stemming, pick the "simple" dictionary, which does not do any stemming. If you don't specify a dictionary, the "simple" dictionary will be used.
For example:
class Product < ActiveRecord::Base
  include PgSearch # in pg_search 2.3 and later: include PgSearch::Model
  pg_search_scope(
    :search,
    against: %i(
      description
      manufacturer_name
      name
    ),
    using: {
      tsearch: {
        dictionary: "english",
      }
    }
  )
end
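So with dictionary: "english", a search also matches other forms of the same word (for instance, a product named "Milks", because "Milks" and "Milk" both stem to "milk"):
Product.search("Milk") # returns products containing "milk", "milks", etc.
But if you use dictionary: "simple"
Product.search("Milk") # returns only products containing "milk" itself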
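To see the stemming difference without a database, here is a deliberately crude Ruby model. It is not PostgreSQL's code: simple_lexeme mimics the simple template (lower-case the token and, if a stop-word file is configured, drop stop words), and toy_stem is a hypothetical suffix stripper standing in for a real stemmer such as Snowball. The stop-word list and the suffix rule are assumptions for illustration only:

```ruby
# Hypothetical, tiny stop-word list for illustration.
STOP_WORDS = %w[a an the and of].freeze

# Mimics the "simple" template: lower-case, then drop stop words.
def simple_lexeme(token)
  word = token.downcase
  STOP_WORDS.include?(word) ? nil : word
end

# Crude stand-in for a stemmer: strip one common English suffix,
# keeping at least three characters. Real stemmers are far more careful.
def toy_stem(token)
  word = simple_lexeme(token)
  return nil if word.nil?
  stem = word.sub(/(ing|ed|s)\z/, "")
  stem.length >= 3 ? stem : word
end

p simple_lexeme("Jumping") == simple_lexeme("jumped") # => false
p toy_stem("Jumping") == toy_stem("jumped")           # => true
p simple_lexeme("The")                                # => nil
```

Two documents match when their normalized lexemes are equal, which is why stemming widens the set of matches and the simple dictionary narrows it.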
I need a way to define words in a rails project
You can find your Wordnik API key on your user settings page: Wordnik.com/users/edit (it's all the way at the bottom)
Finding dictionary words within a source text, using Ruby
First let's read the words of a dictionary into an array, after chomping, downcasing and removing duplicates (if, for example, the dictionary contains both "A" and "a", as does the dictionary on my Mac that I've used below).
DICTIONARY = File.readlines("/usr/share/dict/words").map { |w| w.chomp.downcase }.uniq.sort
#=> ["a", "aa", "aal", "aalii",..., "zyzomys", "zyzzogeton"]
DICTIONARY.size
#=> 234371
The following method generates all combinations of one or more characters of a given word, respecting order, and for each, joins the characters to form a string, checks to see if the string is in the dictionary, and if it is, saves the string to an array.
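To check if a string matches a word in the dictionary I perform a binary search, using the method Array#bsearch. This relies on the array being sorted in the order that String#<=> uses. The dictionary on my Mac is already in alphabetical order, but sorting the array after loading it is a cheap safeguard if you use a different word list.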
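The subwords method relies on Array#bsearch's find-any mode: the block returns a number that is positive when the target sorts after the current element, zero on a match, and negative when it sorts before. That is exactly what target <=> element yields. A small sketch on a hypothetical, already-sorted array (in_sorted? is an illustrative helper name):

```ruby
# A small, already-sorted stand-in for DICTIONARY (hypothetical data).
words = %w[ad cad craze red zed]

# Find-any mode: positive means "search to the right", negative means
# "search to the left", 0 means "found". bsearch returns the element or nil.
def in_sorted?(sorted, target)
  !sorted.bsearch { |element| target <=> element }.nil?
end

p in_sorted?(words, "red") # => true
p in_sorted?(words, "rad") # => false
```

If the array were not sorted, bsearch could silently miss words that are present, which is why the sort order matters.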
def subwords(word)
  arr = word.chars
  (1..word.size).each.with_object([]) do |n, a|
    arr.combination(n).each do |comb|
      w = comb.join
      a << w if DICTIONARY.bsearch { |dw| w <=> dw }
    end
  end
end
subwords "crazed"
# => ["c", "r", "a", "z", "e", "d",
# "ca", "ce", "ra", "re", "ae", "ad", "ed",
# "cad", "rad", "red", "zed",
# "raze", "craze", "crazed"]
Yes, that particular dictionary contains all those strings (such as "z") that don't appear to be English words.
Another example.
subwords "importance"
#=> ["i", "m", "p", "o", "r", "t", "a", "n", "c", "e",
# "io", "it", "in", "ie", "mo", "mr", "ma", "me", "po", "pa", "or",
# "on", "oe", "ra", "re", "ta", "te", "an", "ae", "ne", "ce",
# "imp", "ima", "ion", "ira", "ire", "ita", "ian", "ice", "mor", "mot",
# "mon", "moe", "man", "mac", "mae", "pot", "poa", "pon", "poe", "pan",
# "pac", "ort", "ora", "orc", "ore", "one", "ran", "tan", "tae", "ace",
# "iota", "ione", "iran", "mort", "mora", "morn", "more", "mote",
# "moan", "mone", "mane", "mace", "port", "pore", "pote", "pone",
# "pane", "pace", "once", "rane", "race", "tane",
# "impot", "moran", "morne", "porta", "ponce", "rance",
# "import", "impone", "impane", "prance",
# "portance",
# "importance"]
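If you would rather not depend on the dictionary's sort order, a Set gives constant-time membership tests regardless of ordering. The sketch below is a variant, not the method above: it uses a hypothetical mini dictionary, swaps bsearch for Set#include?, and adds uniq so a word with repeated letters does not produce duplicate entries:

```ruby
require "set"

# Hypothetical mini dictionary; a Set needs no particular ordering.
WORDS = Set.new(%w[c a d ad cad rad red zed raze craze crazed])

# Same combination idea as subwords, but with Set lookups, and uniq
# so that repeated letters do not yield duplicate results.
def subwords_with_set(word, dictionary = WORDS)
  chars = word.chars
  (1..chars.size).flat_map do |n|
    chars.combination(n).map(&:join).select { |w| dictionary.include?(w) }
  end.uniq
end

p subwords_with_set("crazed")
```

The number of combinations still grows as 2 to the power of the word length, so either version is only practical for reasonably short words.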
Checking passwords against word database on server or use a web service?
/usr/share/dict/words contains a massive word list if you're working on Unix.
Otherwise, there is a Ruby gem for WordNet, a lexical database of English, which could easily solve your problem and would probably include names of famous cities and people as well.
You should also Google "password analysis" and check out other common bad-password patterns.
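As a rough sketch of the word-list approach, assuming a one-word-per-line file such as /usr/share/dict/words, the methods below flag a password that is a dictionary word once it is lower-cased and any trailing digits are removed (for example "Sunshine2024"). The helper names and the normalization rule are hypothetical examples of one common bad-password pattern, not a complete policy:

```ruby
require "set"

# Load a one-word-per-line list into a lower-cased Set, e.g.
# load_words("/usr/share/dict/words").
def load_words(path)
  File.foreach(path).map { |line| line.chomp.downcase }.to_set
end

# Hypothetical check: reject a password that is a dictionary word,
# ignoring case and any trailing digits.
def weak_password?(password, words)
  base = password.downcase.sub(/\d+\z/, "")
  words.include?(base)
end

words = Set.new(%w[sunshine dragon monkey]) # small stand-in word list
p weak_password?("Sunshine2024", words) # => true
p weak_password?("v7#Qp!x2", words)     # => false
```

Loading the list once into a Set keeps each check fast, so this can run on the server without calling an external web service.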