Why Are Symbols Not Frozen Strings

Why are symbols not frozen strings?

This answer is drastically different from my original answer, but I ran into a couple of interesting threads on the Ruby mailing list (both good reads).

So, at one point in 2006, matz implemented the Symbol class as Symbol < String. Then the Symbol class was stripped down to remove any mutability. So a Symbol was in fact an immutable String.

However, it was reverted. The reason given was:

Even though it is highly against DuckTyping, people tend to use case
on classes, and Symbol < String often cause serious problems.

So the answer to your question is still: a Symbol is like a String, but it isn't.
The problem isn't that a Symbol shouldn't be String, but instead that it historically wasn't.
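
To make the quoted concern concrete, here is a minimal sketch of how a `case` on classes behaves when one class inherits from String. `SymbolLike` and `classify` are hypothetical names invented for this illustration; they only stand in for the reverted 2006 design and are not taken from the mailing list threads.

```ruby
# Hypothetical stand-in for the reverted design: a String subclass
# playing the role that Symbol < String would have played.
class SymbolLike < String; end

# Typical "case on classes" code, with the String branch listed first.
def classify(value)
  case value
  when String     then :string
  when SymbolLike then :symbol # unreachable: a SymbolLike is also a String
  else                 :other
  end
end

puts classify("name")                 # string
puts classify(SymbolLike.new("name")) # string, not symbol
puts classify(:name)                  # other (a real Symbol is not a String)
```

Existing code written like this would silently start treating symbols as strings, which is the kind of breakage the quote describes.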

How are Symbols faster than Strings in Hash lookups?

There's no obligation for hash to be equivalent to object_id. Those two things serve entirely different purposes. The point of hash is to be deterministic and yet as random-looking as possible, so that the values you're inserting into your hash are evenly distributed. The point of object_id is to define a unique object identifier, though there's no requirement that these be random or evenly distributed. In fact, randomizing them would be counter-productive: it would just make things slower for no reason.
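
A small sketch of the difference, using only core methods. The strings are built with `String.new` so the result does not depend on whether string literals are frozen.

```ruby
a = String.new("ruby")
b = String.new("ruby")

puts a == b                     # true: same content
puts a.hash == b.hash           # true: equal content must hash identically
puts a.object_id == b.object_id # false: two distinct objects

# Two references to the same symbol are the same object,
# so both the identity and the hash value agree.
puts :ruby.object_id == "ruby".to_sym.object_id # true
puts :ruby.hash == "ruby".to_sym.hash           # true
```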

The reason symbols tend to be faster is because the memory for them is allocated once (garbage collection issues aside) and recycled for all instances of the same symbol. Strings are not like that. They can be constructed in a multitude of ways, and even two strings that are byte-for-byte identical are likely to be different objects. In fact, it's safer to presume they are than otherwise unless you know for certain they're the same object.

Now when it comes to computing hash, the value must be randomly different even if the string changes very little. Since a symbol can't change, computing its hash can be optimized further. You could just compute a hash of the object_id, for example, since that won't change, while a string needs to take its own content into account, which is presumably dynamic.

Try benchmarking things:

require 'benchmark'

count = 100000000

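Benchmark.bm do |bm|
  # Hash the same symbol literal on every iteration.
  bm.report('Symbol:') do
    count.times { :symbol.hash }
  end
  # Each iteration also allocates a fresh String unless string literals are frozen.
  bm.report('String:') do
    count.times { "string".hash }
  end
end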

This gives me results like this:

              user     system      total        real
Symbol:   6.340000   0.020000   6.360000 (  6.420563)
String:  11.380000   0.040000  11.420000 ( 11.454172)

In this most trivial case, symbols are close to 2x faster. Based on some basic testing, the performance of the string code degrades linearly, O(N), as the strings get longer, while the symbol times remain constant.
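
If you want to check the effect of key length yourself, something along these lines should do. It is only a sketch: the iteration count, the lengths and the helper name `time_hash` are arbitrary choices made for this example, and the timings will vary by machine and Ruby version. The keys are built once up front so that allocation is not part of the measurement.

```ruby
require 'benchmark'

ITERATIONS = 20_000 # arbitrary, kept small so the script finishes quickly

# Returns the wall-clock seconds spent calling #hash on one prebuilt key.
def time_hash(key)
  Benchmark.realtime { ITERATIONS.times { key.hash } }
end

[10, 1_000, 10_000].each do |length|
  string = 'x' * length
  symbol = string.to_sym
  puts format('length %6d  String#hash: %.4fs  Symbol#hash: %.4fs',
              length, time_hash(string), time_hash(symbol))
end
```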

Why even use Strings in hashes if symbols exist?

There are different tradeoffs:

  • Symbols are best used for a bounded set of keys that are ideally limited to values found in the source code.
  • Strings are best used for an unbounded set of keys that are taken from user input or other external sources, for example when processing unstructured JSON data.

Why?

Before Ruby 2.2, symbols were not garbage collected, and dealing with an unbounded set of keys obviously led to a memory leak. But even with garbage collection there is still a significant cost to "interning" all string input to turn it into symbols. So it can be smartest to just use string keys if your code consumes strings from text files or the web anyway.
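
As an illustration with the standard library's JSON parser: by default the keys stay strings, and symbolizing them is an explicit opt-in. The payload below is made up for this example.

```ruby
require 'json'

# Made-up payload standing in for external input.
payload = '{"user_id": 42, "unexpected_field": "from the outside world"}'

# Default: keys stay strings, nothing is interned.
as_strings = JSON.parse(payload)
p as_strings.keys # ["user_id", "unexpected_field"]

# Opt-in: every key, expected or not, is turned into a symbol.
as_symbols = JSON.parse(payload, symbolize_names: true)
p as_symbols.keys # [:user_id, :unexpected_field]
```

With symbolized names, whoever controls the input also controls which symbols get created, which is why string keys are the safer default for unbounded data.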

Symbols (storing strings in variables vs symbols)

[...] can't we just store a string inside a variable and freeze it?

Almost. Recent versions of Ruby can optimize frozen strings:

'foo'.freeze.object_id #=> 70313275108080
'foo'.freeze.object_id #=> 70313275108080

But this optimization is limited. It works for string literals (as shown above), but it doesn't work if the string is frozen later on:

a = 'foo'
a.freeze
a.object_id #=> 70313275335500

b = 'foo'
b.freeze
b.object_id #=> 70313275274260

Unless you enable the frozen_string_literal feature:

# frozen_string_literal: true

puts 'foo'.object_id
puts 'foo'.object_id

Output:

$ ruby test.rb
70185151269500
70185151269500

Or, from the command line:

$ ruby --enable-frozen-string-literal -e "puts 'foo'.object_id, 'foo'.object_id"
70102955495340
70102955495340
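
For strings that are only frozen after they were created, one possible workaround is the unary minus operator (String#-@). This sketch assumes Ruby 2.5 or later, where that method returns a frozen, deduplicated copy; it goes beyond what the answer above covers, so treat it as a side note.

```ruby
a = String.new('foo')
b = String.new('foo')

puts a.equal?(b)       # false: two separate objects

# Unary minus returns a frozen string and reuses one shared copy
# for strings with the same content (assumes Ruby 2.5+).
puts (-a).frozen?      # true
puts (-a).equal?(-b)   # true
```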

Why should I use a string and not a symbol when referencing object attributes?

Your instincts are right, IMHO.

Symbols are more appropriate than strings to represent the elements of an enumerated type because they are immutable. While it's true that, before Ruby 2.2, they weren't garbage collected, unlike strings, there is always only one instance of any given symbol, so the impact is minimal for most state transition applications. And, while the performance difference is minimal as well for most applications, symbol comparison is much quicker than string comparison.
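
A minimal sketch of what that can look like for a state machine. The `Order` class, its states and its transitions are hypothetical and only serve to show symbols used as an enumerated type.

```ruby
class Order
  # Hypothetical states and allowed transitions, for illustration only.
  TRANSITIONS = {
    pending:   %i[paid cancelled],
    paid:      %i[shipped],
    shipped:   [],
    cancelled: []
  }.freeze

  attr_reader :state

  def initialize
    @state = :pending
  end

  # Moves to new_state, or raises if the transition is not allowed.
  def transition_to(new_state)
    unless TRANSITIONS.fetch(state).include?(new_state)
      raise ArgumentError, "cannot go from #{state} to #{new_state}"
    end
    @state = new_state
  end
end

order = Order.new
order.transition_to(:paid)
puts order.state # paid
```

The set of states is small, fixed and written out in the source, which is the situation where symbols fit best.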

See also Enums in Ruby

Why use symbols as hash keys in Ruby?

TL;DR:

Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.

Ruby Symbols are immutable (can't be changed), which makes looking something up much easier.

Short(ish) answer:

Symbols in Ruby are basically "immutable strings". That means they cannot be changed, and it implies that the same symbol, when referenced many times throughout your source code, is always stored as the same entity, i.e. it has the same object id.

Strings, on the other hand, are mutable: they can be changed at any time. This implies that Ruby needs to store each string you mention throughout your source code in its own separate entity. E.g. if the string "name" is mentioned multiple times in your source code, Ruby needs to store each occurrence in a separate String object, because they might change later on (that's the nature of a Ruby string).

If you use a string as a Hash key, Ruby needs to evaluate the string and look at its contents (and compute a hash function on that) and compare the result against the (hashed) values of the keys which are already stored in the Hash.

If you use a symbol as a Hash key, it's implicit that it's immutable, so Ruby can basically just do a comparison of the (hash function of the) object-id against the (hashed) object-ids of keys which are already stored in the Hash. (much faster)

Downside:
Each symbol consumes a slot in the Ruby interpreter's symbol table.
Before Ruby 2.2, that slot was never released, because symbols were never garbage-collected. Since Ruby 2.2, dynamically created symbols can be collected.
So a corner case is when you have a large number of symbols (e.g. auto-generated ones). In that case you should evaluate how this affects the size of your Ruby interpreter.
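
A rough way to watch the symbol table grow is `Symbol.all_symbols`. The key names below are made up, and the exact counts depend on the Ruby version and on what has been loaded, so no expected numbers are shown. On Ruby 2.2 and later, generated symbols that are no longer referenced may be collected again; here they are kept alive in an array.

```ruby
before = Symbol.all_symbols.size

# Auto-generate symbols, as might happen when symbolizing external input.
# Holding them in an array keeps them referenced.
generated = Array.new(1_000) { |i| "generated_key_#{i}".to_sym }

after = Symbol.all_symbols.size

puts "symbols before: #{before}"
puts "symbols after:  #{after}"
puts "all generated symbols are in the table: #{(generated - Symbol.all_symbols).empty?}"
```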

Notes:

When comparing keys, Ruby can compare symbols just by comparing their object ids, without having to look at their contents. That's much faster than comparing strings, whose contents need to be examined.

If you access a hash, Ruby always applies a hash-function to compute a "hash-key" from whatever key you use. You can imagine something like an MD5-hash. And then Ruby compares those "hashed keys" against each other.

Every time you use a string in your code, a new instance is created - string creation is slower than referencing a symbol.

Starting with Ruby 2.1, when you freeze a string literal (e.g. 'foo'.freeze), Ruby will reuse the same string object. This avoids having to create new copies of the same string, and they are stored in a space that is garbage collected.
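
To illustrate the note about hash functions: a Hash locates a key by calling `hash` on it and then confirms the match with `eql?`. The `Point` class below is a hypothetical example of a custom key type, not something from the answer above.

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Equal points must produce equal hash values...
  def hash
    [x, y].hash
  end

  # ...and eql? settles the comparison once the hash values match.
  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

lookup = { Point.new(1, 2) => "found" }

p lookup[Point.new(1, 2)] # "found": different object, same hash, eql? is true
p lookup[Point.new(2, 1)] # nil
```

Strings go through the same two steps based on their contents, whereas for symbols both steps can rely on identity.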

Long answers:

https://web.archive.org/web/20180709094450/http://www.reactive.io/tips/2009/01/11/the-difference-between-ruby-symbols-and-strings

http://www.randomhacks.net.s3-website-us-east-1.amazonaws.com/2007/01/20/13-ways-of-looking-at-a-ruby-symbol/

https://www.rubyguides.com/2016/01/ruby-mutability/

Why is it important to create a method as a symbol?

In Ruby, a symbol is just an immutable string:

"hello " + "world" #=> "hello world"
:hello + :world #=> NoMethodError: undefined method `+' for :hello:Symbol

Being immutable makes symbols a safe and reliable reference, for example:

Object.methods #=> [:new, :allocate, :superclass, ...]

If Ruby were to use strings here, users would be able to modify the strings, thus ruining future calls of Object.methods. This could be fixed by making copies of the strings each time the method is called, but that would have a huge memory footprint.
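
A short sketch of the difference, using only core methods:

```ruby
# A string can be changed in place by anyone holding a reference to it.
name = String.new("superclass")
name << "_changed"
puts name                          # superclass_changed

# A symbol offers no way to do that.
puts :superclass.frozen?           # true
puts :superclass.respond_to?(:<<)  # false

# The method list is made up entirely of symbols.
puts Object.methods.all? { |m| m.is_a?(Symbol) } # true
```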

In fact, since Ruby knows symbols are never going to be modified, it saves each symbol only once, no matter how many times you declare it:

"hello".object_id #=> 9504940
"hello".object_id #=> 9565300

:hello.object_id #=> 1167708
:hello.object_id #=> 1167708

This takes the memory-saving potential of symbols even further, allowing you to use symbol literals in your code anywhere and everywhere with little memory overhead.

So, the round-about answer to your question: symbols can't be modified, but they're safer and more memory efficient; therefore, you should use them whenever you have a string that you know shouldn't be modified.

Symbols are used as the keys to hashes because:

  1. You should never modify the key of a hash while it's in the hash.
  2. Hashes require literal referencing a lot, i.e. my_hash[:test], so it's more memory-efficient to use symbols.
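
On the first point, Ruby already protects string keys for you: when an unfrozen string is used as a key, Hash#[]= stores a frozen copy. The sketch below shows that behavior next to a symbol key, which is stored as-is.

```ruby
key = String.new("name")
hash = {}
hash[key] = 1

stored_key = hash.keys.first
puts stored_key.frozen?      # true: the hash holds a frozen copy
puts stored_key.equal?(key)  # false: it is not the original object

# Mutating the original string does not disturb the hash.
key << "_changed"
p hash["name"]               # 1
p hash.key?("name_changed")  # false

# A symbol key needs no copy; the very same object is stored.
hash[:name] = 2
puts hash.keys.last.equal?(:name) # true
```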

As for method references: you can't reference a method directly, i.e. send(my_method()), because Ruby can't tell the difference between passing the method in and executing it. Strings could have been used here, but since a method's name never changes once defined, it makes more sense to represent the name as a symbol.
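
A small sketch of naming a method by symbol. The `Greeter` class is a made-up example.

```ruby
class Greeter
  def hello
    "hello"
  end
end

greeter = Greeter.new

# The method is referred to by name, not called at the point of reference.
puts greeter.send(:hello)           # hello

# A string name is accepted as well.
puts greeter.public_send("hello")   # hello

# Ruby itself reports method names as symbols.
p Greeter.instance_methods(false)   # [:hello]
p greeter.method(:hello).name       # :hello
```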


