Do Ruby 1.8 and 1.9 Have the Same Hash Code for a String

Do Ruby 1.8 and 1.9 have the same hash code for a string?

The answer is easy to check, and it is no, they do not:

~$ ruby1.8 -e 'p "hello world".hash'
444332266
~$ ruby1.9 -e 'p "hello world".hash'
-194819219

If you rely on the built-in hash method, I would recommend having a script as part of your build process that generates the necessary hash codes. Note that the values are not guaranteed to be the same even from one machine to the next.

If you need consistent hashing, use something like CRC32 or a digest such as SHA1 or MD5:

>> require 'zlib'
>> Zlib.crc32 "hello world"
=> 222957957
>> require 'digest'
>> Digest::SHA1.hexdigest "hello world"
=> "2aae6c35c94fcfb415dbe95f408b9ce91ee846ed"
>> Digest::MD5.hexdigest "hello world"
=> "5eb63bbbe01eeed093cb22bb8f5acdc3"

They have quite different purposes: CRC32 has the advantage of returning a 32-bit number and being quite fast, while SHA1 produces a 160-bit digest and is far more resistant to collisions. (I'm assuming this is not for cryptographic purposes, but look into SHA-256 if you need that.)
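As a minimal sketch of how such a stable value might be used, the helper below maps a string to a bucket number with Zlib.crc32. The name stable_bucket and the bucket count are made up for illustration; only Zlib.crc32 comes from the answer above.

```ruby
require 'zlib'

# Hypothetical helper: maps a string to one of `buckets` slots using CRC32.
# The result depends only on the bytes of the string, not on the Ruby
# version or on the running process.
def stable_bucket(string, buckets)
  Zlib.crc32(string) % buckets
end

stable_bucket("hello world", 10)  # => 7, because 222957957 % 10 == 7
```

A 32-bit checksum collides far more often than a digest does, so this only suits cases such as sharding or cache keys where an occasional collision is acceptable.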

Ruby: How come the same strings have different hashcodes?

The results of these expressions are not all the same kind of data. In Ruby 1.8, indexing a string with a single integer returns the character's numeric code (an integer), not a one-character string. This was changed in Ruby 1.9, where the same expression returns a string. Even so, slice(0) returns the first character of '@a', which is '@', not 'a'.

In Ruby 1.8 (using irb):

irb(main):001:0> test = 'a'
=> "a"
irb(main):002:0> test2 = '@a'.slice(0)
=> 64
irb(main):003:0> test3 = '@a'[1]
=> 97
irb(main):004:0> test.hash
=> 100
irb(main):005:0> test2.hash
=> 129
irb(main):006:0> test3.hash
=> 195

In Ruby 1.9.1:

irb(main):001:0> test = 'a'
=> "a"
irb(main):002:0> test2 = '@a'.slice(0)
=> "@"
irb(main):003:0> test3 = '@a'[1]
=> "a"
irb(main):004:0> test.hash
=> 1365935838
irb(main):005:0> test2.hash
=> 347394336
irb(main):006:0> test3.hash
=> 1365935838
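If the same code has to behave identically on both versions, one option is to index with a start and a length, which returns a string in 1.8 as well as 1.9. The sketch below assumes that approach; char_at is a made-up helper name.

```ruby
# Hypothetical helper: always returns a one-character string, because
# String#[] with (start, length) yields a substring rather than a
# character code.
def char_at(string, index)
  string[index, 1]
end

char_at('@a', 0)                   # => "@"
char_at('@a', 1)                   # => "a"
char_at('@a', 1).hash == 'a'.hash  # => true within a single process
```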

String length difference between ruby 1.8 and 1.9

This is a Unicode issue. The string you are using contains characters outside the ASCII range, and the UTF-8 encoding that is frequently used encodes those as 2 (or more) bytes.

Ruby 1.8 did not handle Unicode properly, and length simply gives the number of bytes in the string, which results in fun stuff like:

"ą".length
=> 2

Ruby 1.9 has better Unicode handling. This includes length returning the actual number of characters in the string, as long as Ruby knows the encoding:

"ä".length
=> 1

One possible workaround in Ruby 1.8 is using regular expressions, which can be made Unicode aware:

"ą".scan(/./mu).size
=> 1
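To make the difference between bytes and characters explicit on Ruby 1.9 or later, a small sketch like the following may help. The helper name length_report is invented here, and the "\u0105" escape (the letter ą) is used so the example does not depend on the source file's encoding.

```ruby
# Hypothetical helper: reports both the byte count and the character count.
# String#bytesize counts bytes; String#length counts characters on 1.9+.
def length_report(string)
  { :bytes => string.bytesize, :chars => string.length }
end

length_report("\u0105")  # 2 bytes, 1 character in UTF-8
length_report("abc")     # 3 bytes, 3 characters
```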

Ruby make 1.8 Hash#select behave like 1.9 Hash#select

Hash[{1=>2,3=>4}.select{|k,v| v>2}]
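The reason this works: in Ruby 1.8, Hash#select returns an array of [key, value] pairs, and Hash[] turns such an array back into a hash. In 1.9, select already returns a hash and Hash[] simply copies it. A sketch wrapping the idea in a helper (select_as_hash is a made-up name):

```ruby
# Hypothetical helper: returns a Hash whether Hash#select yields an array
# of pairs (Ruby 1.8) or a Hash (Ruby 1.9 and later).
def select_as_hash(hash)
  Hash[hash.select { |key, value| yield(key, value) }]
end

# Keeps only the entry 3 => 4.
select_as_hash({ 1 => 2, 3 => 4 }) { |_key, value| value > 2 }

# Hash[] applied to 1.8-style output, an array of pairs:
Hash[[[3, 4]]]  # a Hash with the single entry 3 => 4
```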

Consistent String#hash based only on the string's content

There is a lot of such functionality in Ruby's digest module: http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html

A simple example:

require 'digest/sha1'
Digest::SHA1.hexdigest("some string")
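If an integer is wanted rather than a hex string, one possible approach is to take a prefix of the digest. This is only a sketch: content_hash is a hypothetical name, and truncating to 32 bits is an arbitrary choice that trades collision resistance for a small number.

```ruby
require 'digest/sha1'

# Hypothetical helper: reads the first 8 hex digits (32 bits) of the SHA1
# digest as an integer. The value depends only on the string's content.
def content_hash(string)
  Digest::SHA1.hexdigest(string)[0, 8].to_i(16)
end

# Equals 0x2aae6c35, the leading digits of the "hello world" digest
# 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed.
content_hash("hello world")
```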

Why do String hashes change?

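The same string does not return the same hash across two Ruby sessions; the value is only stable within a single session, because the interpreter seeds String#hash with a random value at startup.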

➜ tmp pry
[1] pry(main)> "foo".hash
=> -3172192351909719463
[2] pry(main)> exit
➜ tmp pry
[1] pry(main)> "foo".hash
=> 2138900251898429379
[2] pry(main)> "foo".hash
=> 2138900251898429379
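The same behaviour can be observed from a single script by asking a freshly started interpreter for the value. This is a sketch only: hash_in_new_process is a made-up helper, and it assumes the current Ruby executable can be launched via RbConfig.ruby.

```ruby
require 'rbconfig'

# Hypothetical helper: starts a new Ruby process and returns the value of
# String#hash for `string` as computed in that process.
def hash_in_new_process(string)
  command = [RbConfig.ruby, '-e', 'print ARGV[0].hash', string]
  output = IO.popen(command) { |io| io.read }
  Integer(output)
end

"foo".hash == "foo".hash    # always true inside one process
hash_in_new_process("foo")  # very likely differs from "foo".hash here
```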

Allowing for Ruby 1.9's hash syntax?

Even in Ruby < 1.9, you could use symbols for keys. For example:

# Ruby 1.8.7
settings = { :host => "localhost" }
puts settings[:host] #outputs localhost
settings.keys[0].class # => Symbol

Ruby 1.9 adds a second way to write hash literals. It treats the key as a symbol for you, eliminating the need for a hash rocket.

# Ruby 1.9.2
settings = { host: "localhost" }
settings[:host] # => "localhost"
settings.keys[0].class # => Symbol

In both cases, if I try to access settings[:host] with settings["host"], I'm going to get nil. All Ruby 1.9 does is allow for a new way of creating hashes. To answer your question, you cannot, as far as I know, use the new {key: value} syntax if you want backwards compatibility with Ruby 1.8.
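For code that must load on both versions, the hash-rocket form is the safe choice. A small sketch, where default_settings is just an illustrative name:

```ruby
# Hash-rocket literal: parses on Ruby 1.8 and 1.9 alike.
def default_settings
  { :host => "localhost" }
end

default_settings[:host]         # => "localhost"
default_settings["host"]        # => nil, a String key is not a Symbol key
default_settings.keys[0].class  # => Symbol
```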

Library to get a String to behave like in 1.9 in 1.8

Just did it myself...

gem install string19
String19('áßð').size == 3
String19('áßð').index('ð') == 2

and so on. Not all methods are supported yet, but it is easy to add more.

Alternatives to Hash#index that work without warnings in both Ruby 1.8 and 1.9

You could also invert the hash:

{ :hello => :world }.invert[:world]    # ==> :hello

No monkey-patching or external dependencies, but probably less efficient for most purposes.
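Another possibility, sketched here under the assumption that Hash#key is the replacement wherever it exists, is to pick the method at runtime. key_for is a hypothetical helper name.

```ruby
# Hypothetical helper: uses Hash#key where the hash responds to it and
# falls back to the older Hash#index otherwise, so the deprecation
# warning is never triggered.
def key_for(hash, value)
  if hash.respond_to?(:key)
    hash.key(value)
  else
    hash.index(value)
  end
end

key_for({ :hello => :world }, :world)    # => :hello
key_for({ :hello => :world }, :missing)  # => nil
```

Unlike invert, this does not build a second hash, though it still scans the entries one by one.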


