Why Is a String Key for a Hash Frozen

Why is a string key for a hash frozen?

In short it's just Ruby trying to be nice.

When a key is entered in a Hash, a special number is calculated, using the hash method of the key. The Hash object uses this number to retrieve the key. For instance, if you ask what the value of h['a'] is, the Hash calls the hash method of string 'a' and checks if it has a value stored for that number. The problem arises when someone (you) mutates the string object, so the string 'a' is now something else, let's say 'aa'. The Hash would not find a hash number for 'aa'.

The most common types of keys for hashes are strings, symbols and integers. Symbols and integers are immutable, but strings are not. Ruby tries to protect you from the confusing behaviour described above by dupping and freezing string keys. I guess it's not done for other types because there could be nasty performance side effects (think of large arrays).

Why freezing hash literal is not the same as freezing string literal?

It's probably the case that the Ruby optimizer can identify that string as being the same from one loop to the next, but it's unable to identify that hash as being identical so it makes new ones. In the second variant you literally use the same hash so the optimizer can handle it.

For proof, look at this:

pipeline = []
100_000.times { pipeline << 'hello world'.freeze }
pipeline.map(&:object_id).uniq.length
# => 1

That's an array of identical objects, one allocation only.

pipeline = []
100_000.times { pipeline << {hello: 'world'}.freeze }

pipeline.map(&:object_id).uniq.length
# => 100000

That's 100,000 different objects.

Why would one use String keys for a hash over symbols

Because the source of the keys--the query string--is made up of strings, so searching through this string for keys, it is most directly convenient to index the hash via the strings.

Every Symbol that is created in the Ruby runtime is allocated and never released. There is a theoretical (but unlikely) DOS attack available by sending hundreds of thousands of requests with unique query string parameters. If these were symbolized, each request would slowly grow the runtime memory pool.

Strings, on the other hand, may be garbage collected. Thousands of unique strings handled across various requests will eventually go away, with no long-term impact.

Edit: Note that with Sinatra, symbols are also available for accessing the params hash. However, this is done by creating a hash that is indexed by strings, and converting symbols (in your code) to strings when you make a request. Unless you do something like the following:

params.each{ |key,_| key.to_sym }

...you are not at risk of any symbol pseudo-DOS attack.

Hash code, why is string and numeric key not good for memory address

These lecture notes you refer to come directly from Goodrich & Tamassia’s book (both the Algorithm design one and Data Structures). It discusses a variety of hash code functions - such as using the memory address of the object, using an integer cast, component sum, or polynomial accumulation. It notes that using the memory address of an object is “good in general, except for numeric and string keys”.

There are times when the hash code that maps an object to an integer based on its memory address is sufficient, even if that object is a string. However, two objects with equal value (a=‘hello’, b=‘hello’) would not have the same hash code using this method, since they have different memory addresses. The same applies for other objects such as numeric keys (a=10, b=10 are equal in value but not in memory address).

Consider a simple system which stores a password for a user as a hash code. If the user enters in a password, the string they entered in is hashed according to the same hash function and compared against the one which is stored. These two passwords have the same value (the one the user first created and the one they use to log in), so they should produce the same hash to successfully log in. Therefore, we would not want to use the memory address to map the string to an integer in this scenario.

Why are symbols not frozen strings?

This answer drastically different from my original answer, but I ran into a couple interesting threads on the Ruby mailing list. (Both good reads)

So, at one point in 2006, matz implemented the Symbol class as Symbol < String. Then the Symbol class was stripped down to remove any mutability. So a Symbol was in fact a immutable String.

However, it was reverted. The reason given was

Even though it is highly against DuckTyping, people tend to use case
on classes, and Symbol < String often cause serious problems.

So the answer to your question is still: a Symbol is like a String, but it isn't.
The problem isn't that a Symbol shouldn't be String, but instead that it historically wasn't.

Why use symbols as hash keys in Ruby?

TL;DR:

Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.

Ruby Symbols are immutable (can't be changed), which makes looking something up much easier

Short(ish) answer:

Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.

Symbols in Ruby are basically "immutable strings" .. that means that they can not be changed, and it implies that the same symbol when referenced many times throughout your source code, is always stored as the same entity, e.g. has the same object id.

  a = 'name'
a.object_id
=> 557720

b = 'name'
=> 557740

'name'.object_id
=> 1373460

'name'.object_id
=> 1373480 # !! different entity from the one above

# Ruby assumes any string can change at any point in time,
# therefore treating it as a separate entity

# versus:

:name.object_id
=> 71068

:name.object_id
=> 71068

# the symbol :name is a references to the same unique entity

Strings on the other hand are mutable, they can be changed anytime. This implies that Ruby needs to store each string you mention throughout your source code in it's separate entity, e.g. if you have a string "name" multiple times mentioned in your source code, Ruby needs to store these all in separate String objects, because they might change later on (that's the nature of a Ruby string).

If you use a string as a Hash key, Ruby needs to evaluate the string and look at it's contents (and compute a hash function on that) and compare the result against the (hashed) values of the keys which are already stored in the Hash.

If you use a symbol as a Hash key, it's implicit that it's immutable, so Ruby can basically just do a comparison of the (hash function of the) object-id against the (hashed) object-ids of keys which are already stored in the Hash. (much faster)

Downside:
Each symbol consumes a slot in the Ruby interpreter's symbol-table, which is never released.
Symbols are never garbage-collected.
So a corner-case is when you have a large number of symbols (e.g. auto-generated ones). In that case you should evaluate how this affects the size of your Ruby interpreter (e.g. Ruby can run out of memory and blow up if you generate too many symbols programmatically).

Notes:

If you do string comparisons, Ruby can compare symbols just by comparing their object ids, without having to evaluate them. That's much faster than comparing strings, which need to be evaluated.

If you access a hash, Ruby always applies a hash-function to compute a "hash-key" from whatever key you use. You can imagine something like an MD5-hash. And then Ruby compares those "hashed keys" against each other.

Every time you use a string in your code, a new instance is created - string creation is slower than referencing a symbol.

Starting with Ruby 2.1, when you use frozen strings, Ruby will use the same string object. This avoids having to create new copies of the same string, and they are stored in a space that is garbage collected.

Long answers:

https://web.archive.org/web/20180709094450/http://www.reactive.io/tips/2009/01/11/the-difference-between-ruby-symbols-and-strings

http://www.randomhacks.net.s3-website-us-east-1.amazonaws.com/2007/01/20/13-ways-of-looking-at-a-ruby-symbol/

https://www.rubyguides.com/2016/01/ruby-mutability/

String values as hash keys, copy created

From Hash.[]= documentation:

key should not have its value changed while it is in use as a key (an
unfrozen String passed as a key will be duplicated and frozen).

Since by default, strings are not immutable in ruby, theoretically you can change them after you set them as keys in your hash. If you do that - your hash will become invalid, as it will not be able to find those keys properly.

Since string are ubiquitous and are often used by reference, this way Ruby protects its hashes from unexpected bugs, which are very hard to detect.



Related Topics



Leave a reply



Submit