Why are symbols not frozen strings?
This answer is drastically different from my original answer, but I ran into a couple of interesting threads on the Ruby mailing list. (Both are good reads.)
So, at one point in 2006, matz implemented the Symbol class as Symbol < String. Then the Symbol class was stripped down to remove any mutability. So a Symbol was in fact an immutable String.
However, it was reverted. The reason given was: "Even though it is highly against DuckTyping, people tend to use case on classes, and Symbol < String often cause serious problems."
So the answer to your question is still: a Symbol is like a String, but it isn't.
The problem isn't that a Symbol shouldn't be a String, but instead that it historically wasn't.
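To see why case on classes is the sticking point, here is a minimal sketch using two hypothetical classes, FakeString and FakeSymbol, that model the reverted 2006 hierarchy. In real Ruby, Symbol is not a subclass of String; the names are illustrative only.

```ruby
# Hypothetical classes modelling the reverted design (Symbol < String).
# In real Ruby, Symbol does NOT inherit from String.
class FakeString; end
class FakeSymbol < FakeString; end

def describe(value)
  case value
  when FakeString then "string" # FakeString === value is also true for a FakeSymbol
  when FakeSymbol then "symbol" # never reached for FakeSymbol instances
  else "other"
  end
end

puts describe(FakeString.new)          # prints "string"
puts describe(FakeSymbol.new)          # prints "string", not "symbol"
puts Symbol.ancestors.include?(String) # prints "false" in real Ruby
```

Reordering the when clauses would fix this one method, but presumably every existing case statement that tested for String first would silently start treating symbols as strings, which is the kind of breakage the quote describes.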
How are Symbols faster than Strings in Hash lookups?
There's no obligation for hash to be equivalent to object_id. Those two things serve entirely different purposes. The point of hash is to be deterministic and yet as random as possible, so that the values you're inserting into your hash are evenly distributed. The point of object_id is to define a unique object identifier, though there's no requirement that these be random or evenly distributed. In fact, randomizing them would be counter-productive; that would just make things slower for no reason.
Symbols tend to be faster because the memory for them is allocated once (garbage collection issues aside) and reused for all instances of the same symbol. Strings are not like that. They can be constructed in a multitude of ways, and even two strings that are byte-for-byte identical are likely to be different objects. In fact, unless you know for certain that they're the same object, it's safer to presume they're different.
Now, when it comes to computing hash, the value must differ unpredictably even if the string changes very little. Since a symbol can't change, computing its hash can be optimized more. For example, you could just compute a hash of the object_id, since that won't change, while a string needs to take its own contents into account, and those are presumably dynamic.
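A quick way to see the difference between hash and object_id is to compare two equal strings with two references to the same symbol. This is a small sketch; String.new is used only to guarantee two distinct String objects regardless of the frozen_string_literal setting.

```ruby
a = String.new("name") # two separate, mutable String objects
b = String.new("name") # with identical contents

p a.equal?(b)      # false: different objects, different object_id
p a.hash == b.hash # true: String#hash is computed from the contents
p a.eql?(b)        # true: this is what Hash uses to confirm a match

p :name.equal?(:name)                 # true: every :name is the same object
p :name.object_id == :name.object_id  # true
```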
Try benchmarking things:
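require 'benchmark'
count = 100_000_000
Benchmark.bm do |bm|
  bm.report('Symbol:') do
    # The same Symbol object is hashed on every iteration.
    count.times { :symbol.hash }
  end
  bm.report('String:') do
    # Without frozen_string_literal, this literal allocates a new String on
    # every iteration, so this timing includes allocation as well as hashing.
    count.times { "string".hash }
  end
end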
This gives me results like this:
              user     system      total        real
Symbol:   6.340000   0.020000   6.360000 (  6.420563)
String:  11.380000   0.040000  11.420000 ( 11.454172)
In this most trivial case, the symbol version is nearly 2x faster. Based on some basic testing, the time taken by the string code grows linearly (O(N)) as the strings get longer, while the symbol times remain constant.
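To check the O(N) claim without also measuring allocation, you could hash one pre-built string of each length repeatedly and compare it against the matching symbol. The lengths, the iteration count, and the hash_timings helper below are arbitrary choices for illustration, and absolute timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

ITERATIONS = 20_000 # arbitrary; raise it for steadier numbers

# Hypothetical helper: returns { length => [string_seconds, symbol_seconds] }.
def hash_timings(lengths)
  lengths.to_h do |len|
    str = ("a" * len).freeze # one String object, reused on every iteration
    sym = str.to_sym         # the equivalent Symbol
    t_str = Benchmark.realtime { ITERATIONS.times { str.hash } }
    t_sym = Benchmark.realtime { ITERATIONS.times { sym.hash } }
    [len, [t_str, t_sym]]
  end
end

hash_timings([10, 100, 1_000, 10_000]).each do |len, (t_str, t_sym)|
  printf("len=%6d  String#hash %.4fs  Symbol#hash %.4fs\n", len, t_str, t_sym)
end
```

If the claim holds, the String column should grow roughly in step with the length while the Symbol column stays flat.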
Why even use Strings in hashes if symbols exist?
There are different tradeoffs:
- Symbols are best used for a bounded set of keys that are ideally limited to values found in the source code.
- Strings are best used for an unbounded set of keys that are taken from user input or other external sources, for example when processing unstructured JSON data.
Why?
Before Ruby 2.2, symbols were not garbage collected, so dealing with an unbounded set of keys obviously led to a memory leak. But even with garbage collection, there is still a significant cost to having to "intern" all string input to turn it into symbols. So it can be smartest to just use string keys if your code consumes strings from text files or the web anyway.
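Ruby's standard json library reflects this tradeoff: JSON.parse returns string keys by default and only interns them as symbols when asked to via the symbolize_names option. A short sketch with a made-up payload:

```ruby
require 'json'

payload = '{"name": "Ada", "role": "admin"}'

# Default: keys stay Strings, so untrusted input never creates new Symbols.
with_strings = JSON.parse(payload)
p with_strings["name"] # "Ada"

# symbolize_names: true interns every key as a Symbol; convenient for a
# small, known set of keys, but every distinct key in the input becomes one.
with_symbols = JSON.parse(payload, symbolize_names: true)
p with_symbols[:name]  # "Ada"
```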
Symbols (storing strings in variables vs symbols)
[...] can't we just store a string inside a variable and freeze it?
Almost. Recent versions of Ruby can optimize frozen strings:
'foo'.freeze.object_id #=> 70313275108080
'foo'.freeze.object_id #=> 70313275108080
But this optimization is limited. It works for string literals (as shown above), but it doesn't work if the string is frozen later on:
a = 'foo'
a.freeze
a.object_id #=> 70313275335500
b = 'foo'
b.freeze
b.object_id #=> 70313275274260
Unless you enable the frozen_string_literal feature:
# frozen_string_literal: true
puts 'foo'.object_id
puts 'foo'.object_id
Output:
$ ruby test.rb
70185151269500
70185151269500
Or, from the command line:
$ ruby --enable-frozen-string-literal -e "puts 'foo'.object_id, 'foo'.object_id"
70102955495340
70102955495340
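For strings built at runtime, which neither approach covers, Ruby 2.5 and later also offer String#-@ (unary minus), which returns a frozen, deduplicated copy. This option is not mentioned in the answer above; it is a sketch assuming plain String objects (not a subclass, no instance variables set).

```ruby
a = "fo" + "o" # built at runtime: a new, unfrozen String
b = "f" + "oo" # another one with the same contents

da = -a # frozen, deduplicated copy of a
db = -b # looks up the same interned String

p a.equal?(b)   # false: two separate objects
p da.equal?(db) # true: both map to one interned frozen String
p da.frozen?    # true
```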
Why should I use a string and not a symbol when referencing object attributes?
Your instincts are right, IMHO.
Symbols are more appropriate than strings to represent the elements of an enumerated type because they are immutable. While it's true that, unlike strings, they weren't garbage collected before Ruby 2.2 (and symbols written literally in your source still never are), there is always only one instance of any given symbol, so the impact is minimal for most state transition applications. And while the performance difference is also minimal for most applications, symbol comparison is much quicker than string comparison.
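As an illustration of symbols used as enum-like states, here is a minimal sketch of a hypothetical order workflow; the state names, the TRANSITIONS table, and the transition method are invented for the example.

```ruby
# Hypothetical workflow: the allowed states form a small, fixed set,
# so each state is a Symbol written directly in the source.
TRANSITIONS = {
  pending:   [:paid, :cancelled],
  paid:      [:shipped, :refunded],
  shipped:   [],
  cancelled: [],
  refunded:  []
}.freeze

def transition(from, to)
  raise ArgumentError, "unknown state: #{from.inspect}" unless TRANSITIONS.key?(from)
  raise ArgumentError, "cannot go from #{from} to #{to}" unless TRANSITIONS[from].include?(to)
  to
end

state = :pending
state = transition(state, :paid)
p state == :paid # true: Symbol#== is an identity check
```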
See also Enums in Ruby
Why use symbols as hash keys in Ruby?
TL;DR:
Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.
Ruby Symbols are immutable (can't be changed), which makes looking something up much easier
Short(ish) answer:
Symbols in Ruby are basically "immutable strings": they cannot be changed, which implies that the same symbol, when referenced many times throughout your source code, is always stored as the same entity, i.e. it has the same object id.
Strings, on the other hand, are mutable; they can be changed at any time. This implies that Ruby needs to store each string you mention throughout your source code as its own separate entity. For example, if the string "name" is mentioned multiple times in your source code, Ruby needs to store each occurrence in a separate String object, because any of them might change later on (that's the nature of a Ruby string).
If you use a string as a Hash key, Ruby needs to evaluate the string, look at its contents (and compute a hash function on them), and compare the result against the (hashed) values of the keys which are already stored in the Hash.
If you use a symbol as a Hash key, it's implicit that it's immutable, so Ruby can basically just compare (a hash function of) the object id against the (hashed) object ids of the keys which are already stored in the Hash, which is much faster.
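One visible consequence of strings being mutable is that Hash#[]= stores a frozen copy of an unfrozen String key, so later changes to your own variable can't corrupt the table, while a Symbol key is stored as-is. A small sketch:

```ruby
key = String.new("name") # a mutable, unfrozen String
h = {}
h[key] = 1

stored = h.keys.first
p stored.equal?(key) # false: the Hash kept its own copy
p stored.frozen?     # true
key << "!"           # mutating the original does not affect the Hash
p h["name"]          # 1: found by contents (hash, then eql?)

s = {}
s[:name] = 1
p s.keys.first.equal?(:name) # true: the Symbol itself is the key
```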
Downside:
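Each symbol consumes a slot in the Ruby interpreter's symbol table. Before Ruby 2.2, that slot was never released, because symbols were never garbage-collected.
Since Ruby 2.2, symbols created dynamically (for example with String#to_sym) can be garbage-collected, but symbols that appear literally in your source code are still kept for the lifetime of the process.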
So a corner-case is when you have a large number of symbols (e.g. auto-generated ones). In that case you should evaluate how this affects the size of your Ruby interpreter.
Notes:
If you compare symbols, Ruby can just compare their object ids, without having to evaluate their contents. That's much faster than comparing strings, whose contents need to be compared.
If you access a hash, Ruby always applies a hash-function to compute a "hash-key" from whatever key you use. You can imagine something like an MD5-hash. And then Ruby compares those "hashed keys" against each other.
Every time a string literal is evaluated in your code, a new instance is created, and string creation is slower than referencing a symbol.
Starting with Ruby 2.1, when you use frozen string literals (e.g. 'foo'.freeze), Ruby will reuse the same string object. This avoids having to create new copies of the same string, and these strings are stored in a space that is garbage collected.
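The lookup described in these notes (hash first, then a comparison) can be observed with a hypothetical key class, LoggedKey, that records when Hash calls its hash and eql? methods. This is a sketch for illustration only; Hash's internals beyond calling these two methods are not shown.

```ruby
# Hypothetical key class that records the calls Hash makes on it.
class LoggedKey
  attr_reader :name

  def initialize(name, log)
    @name = name
    @log = log
  end

  def hash
    @log << :hash
    @name.hash
  end

  def eql?(other)
    @log << :eql?
    other.is_a?(LoggedKey) && other.name == @name
  end
end

log = []
h = {}
h[LoggedKey.new("a", log)] = 1

log.clear
p h[LoggedKey.new("a", log)] # 1: a different object with an equal key
p log                        # includes :hash and :eql?
```

A Symbol key skips the user-level work entirely: its hash never changes, and equal symbols are the same object.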
Long answers:
https://web.archive.org/web/20180709094450/http://www.reactive.io/tips/2009/01/11/the-difference-between-ruby-symbols-and-strings
http://www.randomhacks.net.s3-website-us-east-1.amazonaws.com/2007/01/20/13-ways-of-looking-at-a-ruby-symbol/
https://www.rubyguides.com/2016/01/ruby-mutability/
Why is it important to create a method as a symbol?
In Ruby, a symbol is just an immutable string:
"hello " + "world" #=> "hello world"
:hello_ + :world #=> NoMethodError: undefined method `+' for :hello_:Symbol
Being immutable makes symbols a safe and reliable reference, for example:
Object.methods #=> [:new, :allocate, :superclass, ...]
If Ruby were to use strings here, users would be able to modify the strings, thus ruining future calls of Object.methods. This could be fixed by making copies of the strings each time the method is called, but that would be a huge memory footprint.
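The safety argument can be checked directly: the names returned by Object.methods are Symbols, which are always frozen, and if you need something mutable you work on a String copy instead.

```ruby
names = Object.methods # an Array of Symbols
first = names.first

p first.class         # Symbol
p first.frozen?       # true: Symbols cannot be mutated
copy = first.to_s.dup # a mutable String copy of the name
copy << "_changed"    # safe to modify; the Symbol is unaffected
p Object.methods.include?(first) # true: the original name is untouched
```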
In fact, since Ruby knows symbols are never going to be modified, it saves each symbol only once, no matter how many times you declare it:
"hello".object_id #=> 9504940
"hello".object_id #=> 9565300
:hello.object_id #=> 1167708
:hello.object_id #=> 1167708
This takes the memory-saving potential of symbols even further, allowing you to use symbol literals in your code anywhere and everywhere with little memory overhead.
So, the roundabout answer to your question: symbols can't be modified, which makes them safer and more memory efficient; therefore, you should use them whenever you have a string that you know shouldn't be modified.
Symbols are used as the keys to hashes because:
- You should never modify the key of a hash while it's in the hash.
- Hashes require literal referencing a lot, i.e. my_hash[:test], so it's more memory-efficient to use symbols.
As for method references: you can't reference a method directly, i.e. send(my_method()), because Ruby can't tell the difference between passing the method in and executing it. Strings could have been used here, but since a method's name never changes once defined, it makes more sense to represent the name as a symbol.
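Here is a small sketch, using a hypothetical Greeter class, of how a Symbol lets you name a method without calling it: send calls it by name, method wraps it in a Method object, and def itself evaluates to the method's name as a Symbol (Ruby 2.1 and later), which is why private def works.

```ruby
class Greeter
  # `def` evaluates to :secret here, so `private` receives the Symbol.
  private def secret
    "hidden"
  end

  def hello(name)
    "hello #{name}"
  end
end

g = Greeter.new
p g.send(:hello, "world")         # "hello world": calls the method named by the Symbol
m = g.method(:hello)              # a Method object; nothing is called yet
p m.call("Ruby")                  # "hello Ruby"
p g.respond_to?(:secret)          # false: secret was made private
p Greeter.instance_methods(false) # [:hello]
```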