Ruby 1.9 Regex as a Hash Key

Ruby 1.9 regex as a hash key

This will not work without some extra code. As written, you are comparing a Regexp object with either an Integer or a String object. They are neither value-equal nor identity-equal. The pattern would match them, but a hash lookup never attempts a match; making it do so requires changes to the Hash class code.

irb(main):001:0> /(\d+)/.class
=> Regexp
irb(main):002:0> 2222.class
=> Fixnum
irb(main):003:0> '2222'.class
=> String
irb(main):004:0> /(\d+)/==2222
=> false
irb(main):007:0> /(\d+)/=='2222'
=> false
irb(main):009:0> /(\d+)/.equal?'2222'
=> false
irb(main):010:0> /(\d+)/.equal?2222
=> false

You would have to iterate over the hash and use =~, with something like:

hash.each do |k,v|
  unless (k=~whatever.to_s).nil?
    puts v
  end
end

Or change the Hash class to try =~ in addition to the normal matching conditions. (I think that last option would be difficult; in MRI the Hash class seems to have a lot of C code.)
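
The iteration above can be wrapped in a small helper so the lookup reads like a normal fetch. This is only a sketch: fetch_by_pattern and the RULES hash are hypothetical names, not part of Ruby's Hash API, and the first matching key in insertion order wins.

```ruby
# Hypothetical helper (not part of Hash): returns the value stored under the
# first Regexp key that matches `candidate`, or nil when no key matches.
def fetch_by_pattern(hash, candidate)
  text = candidate.to_s # lets Integers such as 2222 be matched as "2222"
  hash.each do |pattern, value|
    return value if pattern.is_a?(Regexp) && pattern =~ text
  end
  nil
end

# Illustrative data only.
RULES = { /\A\d+\z/ => :number, /\A[a-z]+\z/ => :word }

fetch_by_pattern(RULES, 2222)   # => :number
fetch_by_pattern(RULES, 'abc')  # => :word
fetch_by_pattern(RULES, 'a1')   # => nil
```

This leaves Hash itself untouched, at the cost of a linear scan over the keys on every lookup.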

Ruby regex key search

I would advise extending Hash with a new method instead of replacing has_key?.

class Hash
  def has_rkey?(search)
    search = Regexp.new(search.to_s) unless search.is_a?(Regexp)
    !!keys.detect{ |key| key =~ search }
  end
end

This will work with strings, symbols or a regexp as arguments.

irb> h = {:test => 1}
 => {:test=>1}  
irb> h.has_rkey?(:te)
=> true
irb> h.has_rkey?("te")
=> true
irb> h.has_rkey?(/te/)
=> true
irb> h.has_rkey?("foo")
=> false
irb> h.has_rkey?(:foo)
=> false
irb> h.has_rkey?(/foo/)
=> false
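
One thing to be aware of: has_rkey? compiles a String or Symbol argument as a pattern, so characters such as . keep their regex meaning. A variant that treats non-Regexp arguments as literal text, and converts every key with to_s so that non-String keys are handled uniformly, could look like the following sketch. keys_matching is a hypothetical standalone method, chosen here to avoid reopening Hash, and SAMPLE is made-up data.

```ruby
# Hypothetical helper: returns every key whose string form matches `search`.
# Non-Regexp arguments are escaped, so they match as literal text.
def keys_matching(hash, search)
  pattern = search.is_a?(Regexp) ? search : Regexp.new(Regexp.escape(search.to_s))
  hash.keys.select { |key| key.to_s =~ pattern }
end

# Illustrative data only.
SAMPLE = { :test => 1, 42 => 2 }

keys_matching(SAMPLE, :te)     # => [:test]
keys_matching(SAMPLE, 't.st')  # => [] (the dot is literal here)
keys_matching(SAMPLE, /\d/)    # => [42]
```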

Ruby 1.9 - Convert hash based off regex

One more option:

Hash[h.keys.grep(/P\d+/).map {|k| [h[k], h[k.tr('P','R')]] }]
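
The one-liner is easier to follow with sample data. The sketch below assumes the hash pairs each "P<n>" key with an "R<n>" key; that input shape, the method name pair_up, and the values are assumptions for illustration, since the original question's data isn't shown here.

```ruby
# Hypothetical wrapper around the one-liner above.
def pair_up(h)
  # For each "P<n>" key, pair its value with the value stored under "R<n>".
  pairs = h.keys.grep(/P\d+/).map { |k| [h[k], h[k.tr('P', 'R')]] }
  Hash[pairs]
end

# Illustrative data only.
SAMPLE_PAIRS = { 'P1' => 'name', 'R1' => 'Alice', 'P2' => 'city', 'R2' => 'Oslo' }

pair_up(SAMPLE_PAIRS)  # => {"name"=>"Alice", "city"=>"Oslo"}
```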

Must a gsub hash key be a string, not a regexp?

This is a related question. If you need to use the hash because many things have to be substituted, this might work:

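# Default block: a missing key that contains whitespace becomes a single space;
# any other missing key maps to itself. Nothing is stored in the hash.
list = Hash.new { |h, k| k =~ /\s+/ ? ' ' : k }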
list['foo'] = 'bar'
list['apple'] = 'banana'

p "appleabc\t \tabc apple foo".gsub(/\w+|\W+/,list)
#=> "appleabc abc banana bar"
p list
#=> {"foo"=>"bar", "apple"=>"banana"}  (no garbage: the default block never stores the missing keys)

Use a regex to match which is stored in a hash

This isn't a Ruby question per se; it's about how to construct a regex pattern that accomplishes what you want.

In "regex-ese", /pattern[:key]/ means:

  1. Find the literal text pattern.
  2. Following it, look for exactly one of the characters :, k, e or y.

Ruby doesn't automatically interpolate variables in strings or regex patterns like Perl does, so, instead, we have to mark where the variable is using #{...} inline.
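
A quick check makes the difference visible. In this sketch the PATTERN hash (a constant standing in for the pattern variable) and the sample strings are made up for illustration:

```ruby
# Illustrative data only.
PATTERN = { :key => 'foo' }

# Without interpolation, [:key] is a character class: one of ':', 'k', 'e', 'y'.
LITERAL = /pattern[:key]/
# With interpolation, the hash value is substituted into the regex.
INTERPOLATED = /#{PATTERN[:key]}/

LITERAL =~ 'patternk'   # => 0
LITERAL =~ 'foo'        # => nil
INTERPOLATED =~ 'foo'   # => 0
INTERPOLATED.source     # => "foo"
```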

If the value of pattern[:key] is the entire pattern, don't bother interpolating it into a regex literal. Instead, take the direct path and let Regexp do it for you:

pattern[:key] = 'foo'
Regexp.new(pattern[:key])
=> /foo/

Which is the same result as:

/#{pattern[:key]}/
=> /foo/

but doesn't waste CPU cycles.

Another of your attempts used ., [ and ], which are reserved characters in patterns, used to help define patterns. If you need to use such characters, you can have Ruby's Regexp.escape add \ escape characters appropriately, preserving their normal/literal meaning in the string:

Regexp.escape('5.7.1 [abc]')
=> "5\\.7\\.1\\ \\[abc\\]"

which, in real life, is "5\.7\.1\ \[abc\]" (IRB displays the inspected string, which doubles each backslash)

To use that in a regex, use:

Regexp.new(Regexp.escape('5.7.1 [abc]'))
=> /5\.7\.1\ \[abc\]/
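
To see why the escaping matters, compare the escaped and unescaped patterns against a string that only resembles the original. The sample strings and constant names here are illustrative:

```ruby
RAW = '5.7.1 [abc]'

# Unescaped: each '.' matches any character and [abc] is a character class.
UNESCAPED = Regexp.new(RAW)
# Escaped: every character is matched literally.
ESCAPED = Regexp.new(Regexp.escape(RAW))

UNESCAPED =~ '5x7y1 a'         # => 0 (a false positive)
ESCAPED =~ '5x7y1 a'           # => nil
ESCAPED =~ 'code 5.7.1 [abc]'  # => 5
```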

Iterate through a ruby hash and select a regex value

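(?-mix:calendar) is the string representation of the regular expression in Ruby, i.e. what Regexp#to_s returns: the pattern wrapped in a group that records its flags. Use Regexp#source to get just the pattern text.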

>> a = /test(er)/
=> /test(er)/
>> print a.source
test(er)=> nil
>> print a
(?-mix:test(er))=> nil

Ruby 1.9 regular expression to match (un)?quoted key-value assignment

It's probably possible to do in one regex pattern, but I am a believer in keeping the patterns simple. Regex can be insidious and hide lots of little errors. Keep it simple to avoid that, then tweak afterwards.

text = <<EOT
RAILS_ENV=production
listen_address = 127.0.0.1 # localhost only by default
PATH="/usr/local/bin"
EOT

text.scan(/^([^=]+)=(.+)/)
# => [["RAILS_ENV", "production"], ["listen_address ", " 127.0.0.1 # localhost only by default"], ["PATH", "\"/usr/local/bin\""]]

To trim off the trailing comment is easy in a subsequent map:

text.scan(/^([^=]+)=(.+)/).map{ |n,v| [ n, v.sub(/#.+/, '') ] }
# => [["RAILS_ENV", "production"], ["listen_address ", " 127.0.0.1 "], ["PATH", "\"/usr/local/bin\""]]

If you want to normalize all your name/values so they have no extraneous spaces you can do that in the map also:

text.scan(/^([^=]+)=(.+)/).map{ |n,v| [ n.strip, v.sub(/#.+/, '').strip ] }
=> [["RAILS_ENV", "production"], ["listen_address", "127.0.0.1"], ["PATH", "\"/usr/local/bin\""]]

What the regex "/^([^=]+)=(.+)/" is doing is:

  1. "^" is "At the beginning of a line", which is the character after a "\n". This is not the same as the start of a string, which would be \A. There is an important difference so if you don't understand the two it is a good idea to learn when and why you'd want to use one over the other. That's one of those places a regex can be insidious.
  2. "([^=]+)" is "Capture one or more characters that are not an equal-sign".
  3. "=" is obviously the equal-sign we were looking for in the previous step.
  4. "(.+)" is going to capture everything after the equal-sign, up to the end of the line.
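
The difference between the line anchors (^, $) and the string anchors (\A, \z) mentioned in step 1 shows up as soon as the input has more than one line. A small sketch with made-up input:

```ruby
# Illustrative data only.
INPUT = "first=1\nsecond=2"

# ^ matches at the start of every line, so scan finds both names.
INPUT.scan(/^(\w+)=/)    # => [["first"], ["second"]]
# \A matches only at the very start of the string.
INPUT.scan(/\A(\w+)=/)   # => [["first"]]

# The same distinction matters for validation: ^...$ accepts a string
# that merely contains a matching line.
"evil\nsafe" =~ /^safe$/     # => 5
"evil\nsafe" =~ /\Asafe\z/   # => nil
```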

I purposely kept the above pattern simple. For production use I'd tighten up the pattern a little using a non-greedy quantifier, along with a trailing "$" anchor:

text.scan(/^([^=]+?)=(.+)$/).map{ |n,v| [ n.strip, v.sub(/#.+/, '').strip ] }
=> [["RAILS_ENV", "production"], ["listen_address", "127.0.0.1"], ["PATH", "\"/usr/local/bin\""]]

  1. +? makes the quantifier non-greedy, so the capture stops at the first '='. That's already implied by the use of [^=], but +? makes my intent even more obvious. I can get away without the ?, but it's more of a self-documentation thing for later maintenance. In your use-case it should be benign, but it's a worthy thing to keep in your Regex Bag o' Tricks.
  2. $ means the end of the line, i.e., the place immediately preceding the EOL, AKA end-of-line or newline ("\n"). This is not the same as the end of the string, which would be \z. It's implied also, since . doesn't match a newline, but inserting it in the pattern makes it more obvious that's what I'm searching for.
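
To see where a non-greedy quantifier actually changes the result, compare [^=] with a plain . on a value that itself contains an equal-sign. The sample line is made up:

```ruby
# Illustrative data only.
LINE = 'OPTS=a=b'

# With [^=], greedy and non-greedy captures agree, because the class
# cannot run past the first '='.
LINE.scan(/^([^=]+)=(.+)$/)    # => [["OPTS", "a=b"]]
LINE.scan(/^([^=]+?)=(.+)$/)   # => [["OPTS", "a=b"]]

# With '.', greediness decides which '=' splits the line.
LINE.scan(/^(.+)=(.+)$/)       # => [["OPTS=a", "b"]]
LINE.scan(/^(.+?)=(.+)$/)      # => [["OPTS", "a=b"]]
```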

EDIT to track the OP's added test:

text = <<EOT
RAILS_ENV=production
listen_address = 127.0.0.1 # localhost only by default
PATH="/usr/local/bin"
HOSTNAME=`cat /etc/hostname`
EOT

text.scan( /^ ( [^=]+? ) = ( .+ ) $/x ).map{ |n,v| [ n.strip, v.sub(/#.+/, '').strip ] }
=> [["RAILS_ENV", "production"], ["listen_address", "127.0.0.1"], ["PATH", "\"/usr/local/bin\""], ["HOSTNAME", "`cat /etc/hostname`"]]

If I were writing this for myself I'd generate a hash for convenience:

Hash[ text.scan( /^ ( [^=]+? ) = ( .+ ) $/x ).map{ |n,v| [ n.strip, v.sub(/#.+/, '').strip ] } ]
=> {"RAILS_ENV"=>"production", "listen_address"=>"127.0.0.1", "PATH"=>"\"/usr/local/bin\"", "HOSTNAME"=>"`cat /etc/hostname`"}
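
The hash above keeps the surrounding double quotes on quoted values. If you want them removed, one more step in the map can do it. The sketch below rests on an assumption about what "unquoted" should mean (drop one pair of surrounding double quotes); parse_assignments is a hypothetical name, and like the code above it would mangle a quoted value that contains a #.

```ruby
# Hypothetical helper: parses NAME=value lines into a Hash, trimming
# whitespace, trailing # comments, and one pair of surrounding double quotes.
def parse_assignments(text)
  pairs = text.scan(/^([^=]+?)=(.+)$/).map do |name, value|
    cleaned = value.sub(/#.*/, '').strip
    # Drop the outer quotes only when the value both starts and ends with one.
    cleaned = cleaned[1..-2] if cleaned =~ /\A".*"\z/
    [name.strip, cleaned]
  end
  Hash[pairs]
end

# Illustrative data only.
CONFIG_TEXT = <<EOT
RAILS_ENV=production
listen_address = 127.0.0.1 # localhost only by default
PATH="/usr/local/bin"
EOT

parse_assignments(CONFIG_TEXT)
# => {"RAILS_ENV"=>"production", "listen_address"=>"127.0.0.1", "PATH"=>"/usr/local/bin"}
```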

Check the string with hash key

OK, hold onto your hat:

HASH_LIST = {
"ruby" => "fun to learn",
"the rails" => "It is a framework"
}

test_string = "I am learning the ruby by myself and also the rails."

keys_regex = /\b (?:#{Regexp.union(HASH_LIST.keys).source}) \b/x # => /\b (?:ruby|the\ rails) \b/x
test_string.gsub(keys_regex, HASH_LIST) # => "I am learning the fun to learn by myself and also It is a framework."

Ruby's got some great tricks up its sleeve, one of which is how we can throw a regular expression and a hash at gsub, and it'll search for every match of the regular expression, look up the matching "hits" as keys in the hash, and substitute the values back into the string:

gsub(pattern, hash) → new_str

...If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string....

Regexp.union(HASH_LIST.keys) # => /ruby|the\ rails/
Regexp.union(HASH_LIST.keys).source # => "ruby|the\\ rails"

Note that the first returns a regular expression and the second returns a string. This is important when we embed them into another regular expression:

/#{Regexp.union(HASH_LIST.keys)}/ # => /(?-mix:ruby|the\ rails)/
/#{Regexp.union(HASH_LIST.keys).source}/ # => /ruby|the\ rails/

The first can quietly destroy what you think is a simple search: the embedded (?-mix:...) group carries the inner pattern's own flag settings, so flags you set on the outer pattern, such as i, do not apply to the embedded part.
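
A case-insensitive search shows the effect. In this sketch the constant names and the test string are made up; the keys are the ones from HASH_LIST:

```ruby
KEYS = ['ruby', 'the rails']
UNION = Regexp.union(KEYS)

# Interpolating the Regexp itself embeds (?-mix:...), which switches
# the i flag back off for the embedded alternatives.
WITH_TO_S = /#{UNION}/i           # => /(?-mix:ruby|the\ rails)/i
# Interpolating only the source lets the outer i flag apply.
WITH_SOURCE = /#{UNION.source}/i  # => /ruby|the\ rails/i

WITH_TO_S =~ 'RUBY'     # => nil
WITH_SOURCE =~ 'RUBY'   # => 0
```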

The Regexp documentation covers all this well.

This capability is the core of a very fast templating routine in Ruby.
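
As a rough idea of what such a templating routine might look like, here is a minimal sketch. The render method, the {{...}} placeholder style, and the sample data are all assumptions for illustration, and no speed claim is made for it.

```ruby
# Hypothetical helper: replaces every key of `values` found in `template`
# with its corresponding value, in a single gsub pass.
def render(template, values)
  # Longer keys first, so a key that is a prefix of another cannot shadow it.
  keys = values.keys.sort_by { |key| -key.length }
  # Regexp.union escapes String keys, so braces are matched literally.
  template.gsub(Regexp.union(keys), values)
end

render('Hello {{name}}, welcome to {{place}}.',
       '{{name}}' => 'Ada', '{{place}}' => 'Ruby')
# => "Hello Ada, welcome to Ruby."
```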


