Why don't more projects use Ruby Symbols instead of Strings?
In Ruby, after parsing, each symbol is represented internally by a unique integer ID. Using symbols as hash keys therefore makes lookups faster, since the main operation, comparison, reduces to comparing those IDs.
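This interning is easy to observe: every mention of the same symbol literal is the same object, while each string literal normally produces a fresh object. A quick sketch:

```ruby
# Each mention of :name is the same interned object; each "name" literal
# is (by default) a separate string object.
a = :name
b = :name
puts a.object_id == b.object_id   # true — one shared object

s1 = "name"
s2 = "name"
puts s1.object_id == s2.object_id # false here — two separate strings
```

Because symbol comparison is identity comparison, it is cheap; string comparison has to walk the characters.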
When to use symbols instead of strings in Ruby?
TL;DR
A simple rule of thumb is to use symbols every time you need internal identifiers. For Ruby < 2.2, only use symbols when they aren't generated dynamically, to avoid memory leaks.
Full answer
The only reason not to use them for dynamically generated identifiers is memory concerns.
This question is very common because many programming languages don't have symbols, only strings, and thus strings are also used as identifiers in your code. You should be worrying about what symbols are meant to be, not only when you should use symbols. Symbols are meant to be identifiers. If you follow this philosophy, chances are that you will do things right.
There are several differences between the implementation of symbols and strings. The most important thing about symbols is that they are immutable: their value will never change. Because of this, symbols are instantiated faster than strings, and some operations, like comparing two symbols, are also faster.
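The immutability is directly observable; a small sketch (using .dup to guarantee a mutable string copy):

```ruby
# Symbols are frozen; plain strings (by default) are not.
puts :foo.frozen?   # true — a symbol can never be mutated
s = "foo".dup       # .dup ensures a mutable copy
s << "bar"          # in-place mutation works on a string
puts s              # foobar
```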
The fact that a symbol is immutable allows Ruby to use the same object every time you reference it, saving memory. So every time the interpreter reads :my_key, it can take it from memory instead of instantiating it again. This is less expensive than initializing a new string every time.
You can get a list of all symbols that are already instantiated with Symbol.all_symbols:
symbols_count = Symbol.all_symbols.count # all_symbols is an array with all
# instantiated symbols.
a = :one
puts a.object_id
# prints 167778
a = :two
puts a.object_id
# prints 167858
a = :one
puts a.object_id
# prints 167778 again - the same object_id from the first time!
puts Symbol.all_symbols.count - symbols_count
# prints 2, the two objects we created.
For Ruby versions before 2.2, once a symbol is instantiated, its memory is never freed again; the only way to reclaim it is to restart the application. So symbols were also a major cause of memory leaks when used incorrectly. The simplest way to generate a leak was calling to_sym on user input data: since this data always changes, a new portion of memory is reserved forever in the running process. Ruby 2.2 introduced a garbage collector for symbols, which frees symbols generated dynamically, so memory leaks caused by creating symbols dynamically are no longer a concern.
Answering your question:
Is it true that I have to use a symbol instead of a string if the same string appears at least twice in my application or script?
If what you are looking for is an identifier to be used internally in your code, you should use symbols. If you are printing output, you should go with strings, even if the same text appears more than once and thus allocates two different objects in memory.
Here's the reasoning:
- Printing symbols is slower than printing strings because each symbol must be cast to a string first.
- Having lots of different symbols increases the overall memory usage of your application, since (before Ruby 2.2) they are never deallocated, whereas strings are garbage collected once you stop using them.
Use case by @AlanDert
@AlanDert: if I use something like %input{type: :checkbox} many times in my HAML code, should I use a symbol for checkbox?
Me: Yes.
@AlanDert: But to print out a symbol on an HTML page, it has to be converted to a string, doesn't it? What's the point of using it then?
What is the type of an input? An identifier of the type of input you want to use or something you want to show to the user?
It is true that it will become HTML code at some point, but at the moment you write that line of code, it is meant to be an identifier: it identifies which kind of input field you need. Thus, it is used over and over again in your code, always with the same sequence of characters, and it won't generate a memory leak.
That said, why don't we evaluate the data to see if strings are faster?
This is a simple benchmark I created for this:
require 'benchmark'
require 'haml'

str = Benchmark.measure do
  10_000.times do
    Haml::Engine.new('%input{type: "checkbox"}').render
  end
end.total

sym = Benchmark.measure do
  10_000.times do
    Haml::Engine.new('%input{type: :checkbox}').render
  end
end.total

puts "String: " + str.to_s
puts "Symbol: " + sym.to_s
Output from three runs:
# first
String: 5.14
Symbol: 5.07
# second
String: 5.29
Symbol: 5.050000000000001
# third
String: 4.7700000000000005
Symbol: 4.68
So using symbols is actually a bit faster than using strings. Why? It depends on how HAML is implemented; I would need to dig into the HAML code to know for sure. But if you keep using symbols as identifiers, your application will stay fast and reliable. When questions strike, benchmark it and get your answers.
Why should I use a string and not a symbol when referencing object attributes?
Your instincts are right, IMHO.
Symbols are more appropriate than strings to represent the elements of an enumerated type because they are immutable. While it's true that they aren't garbage collected (prior to Ruby 2.2), unlike strings, there is always only one instance of any given symbol, so the impact is minimal for most state-transition applications. And while the performance difference is likewise minimal for most applications, symbol comparison is much quicker than string comparison.
See also Enums in Ruby
Ruby Symbols vs. Strings - Performance Loss by switching back and forth?
Unless you're seriously pushing the constraints of your server/system, the benefits or drawbacks of either method are going to be negligible.
When using a library that absolutely requires that you give it a string-keyed hash, it is obviously better to simply use strings, as it keeps your code clear and concise, and eliminates the need for you to cast the keys to strings.
Ruby aims to make programming more enjoyable for the developer, and makes no claim to be the most efficient. When I use Ruby, I take this to heart and use Symbols for the keys in my hashes simply because it makes them easier to read.
When it comes down to it, it's personal preference, and you won't notice a speed increase/decrease either way. Unless you're running into speed/memory constraint issues, you've got nothing to worry about. Other parts of the Ruby standard library will begin to fall apart before this becomes an issue.
"Premature optimization is the root of all evil" -- Donald Knuth
Why are symbols in Ruby not thought of as a type of variable?
Symbols used in accessor methods are not variables. They just represent the name of a variable. Variables hold a reference, so you cannot use the variable itself when defining accessor methods. For example, suppose you wanted to define an accessor method for the variable @foo in a context where its value is "bar". What would happen if Ruby's syntax were like this:
attr_accessor @foo
This would be no different from writing:
attr_accessor "bar"
where you have no access to the name @foo that you are interested in. Therefore, such constructions have to be designed to refer to variable names at a meta level. Symbols are used for this reason. They are not variables themselves; they represent the name of a variable.
And the variables relevant to accessor methods are instance variables.
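Putting this together (the Person class below is made up for illustration):

```ruby
# attr_accessor receives the *name* of the attribute as a symbol and
# defines #name and #name= backed by the instance variable @name.
class Person
  attr_accessor :name
end

person = Person.new
person.name = "bar"
puts person.name                          # bar
puts person.instance_variable_get(:@name) # bar
```

Note that :name refers to the attribute's name at a meta level; it is not the variable @name itself.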
Ruby: Why symbols change to strings when using puts instead of print?
When you call puts, what really gets called is the rb_io_puts C function, which basically works like this:
- If there is no argument, output a newline.
- For each argument, check if it's of type string (T_STRING in Ruby C lingo) and if so, call rb_io_write with it. Also, if the string had length zero or didn't end in a newline, add a \n.
- If the argument is an array, recursively call io_puts_ary on it.
- In any other case, call rb_obj_as_string on the argument, which is basically the low-level equivalent of to_s.
So when you puts [:a, :b, :c], you'll hit the third case and io_puts_ary will take over. Long story short, this does something similar to what I described above: it calls rb_obj_as_string on each element and outputs it followed by a newline.
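You can see the difference by capturing the output (StringIO is used here only to make the result inspectable):

```ruby
require 'stringio'

# puts recurses into the array and converts each symbol with the
# low-level equivalent of to_s; print converts the whole array at once,
# which keeps the inspect-style form with the colons.
captured = StringIO.new
$stdout = captured
puts [:a, :b, :c]
print [:a, :b, :c]
$stdout = STDOUT

puts captured.string.inspect # "a\nb\nc\n[:a, :b, :c]"
```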
Why are symbols not frozen strings?
This answer is drastically different from my original answer, but I ran into a couple of interesting threads on the Ruby mailing list. (Both are good reads.)
So, at one point in 2006, matz implemented the Symbol class as Symbol < String. Then the Symbol class was stripped down to remove any mutability. So a Symbol was in fact an immutable String.
However, it was reverted. The reason given was:
Even though it is highly against DuckTyping, people tend to use case
on classes, and Symbol < String often cause serious problems.
So the answer to your question is still: a Symbol is like a String, but it isn't one. The problem isn't that a Symbol shouldn't be a String, but that historically it wasn't.
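The case-on-class problem from the quoted thread can be sketched like this (describe is a hypothetical helper):

```ruby
# With Symbol and String as separate classes, class-based case dispatch
# can tell them apart. Had Symbol < String stuck, `when String` would
# also have matched symbols, silently taking the wrong branch.
def describe(value)
  case value
  when String then "a string"
  when Symbol then "a symbol"
  else "something else"
  end
end

puts describe("abc") # a string
puts describe(:abc)  # a symbol
```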
When to use symbols in Ruby
Symbols, or "internals" as they're also referred to, are useful for hash keys, common arguments, and other places where the overhead of having many, many duplicate strings with the same value is inefficient.
For example:
params[:name]
my_function(with: { arguments: [ ... ] })
record.state = :completed
These are generally preferable to strings because they will be repeated frequently.
The most common uses are:
- Hash keys
- Arguments to methods
- Option flags or enum-type property values
It's better to use strings when handling user data of unknown composition. Unlike strings, which can be garbage collected, symbols were permanent before Ruby 2.2. Converting arbitrary user data to symbols could fill up the symbol table with junk and possibly crash your application if someone was being malicious.
For example:
user_data = JSON.load(...).symbolize_keys
This would allow an attacker to create JSON data with intentionally long, randomized names that, in time, would bloat your process with all kinds of useless junk.
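The safer pattern is simply to leave untrusted keys as strings; note that symbolize_keys comes from ActiveSupport, not core Ruby, and plain JSON.parse already returns string keys (the key name below is made up):

```ruby
require 'json'

# Plain JSON.parse keeps keys as strings, so nothing attacker-controlled
# gets interned into the symbol table.
data = JSON.parse('{"attacker_chosen_key": true}')
puts data.key?("attacker_chosen_key") # true
puts data.key?(:attacker_chosen_key)  # false — no symbol key was created
```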
In Ruby, are there symbols that cannot be replaced by strings?
Strings and Symbols in Ruby are never directly equal. The difference in class is important in more than one way, and
:my_label != "my_label"
However,
:my_label.to_s == "my_label"
A Ruby Symbol is more efficient than a String in a few ways, including:
- A Symbol hashes and compares faster, which helps when using it as a hash key.
- Multiple uses of the same Symbol do not make copies of the internal data; they are just identical pointers to the same object in memory. This makes them memory-efficient when you have many with the same value.
If a library, such as Selenium::WebDriver, makes use of a Symbol as a parameter, then you cannot always replace it with an equivalent string. Whether or not you can depends on the specific library. It is relatively easy to cast Symbols to Strings and vice versa, so a lot of libraries will do that cast for you. It is very common to have library code that does param = param.to_s when it needs a String param.
Casting from String to Symbol is less commonly found in library code, because for a long while Ruby would not garbage-collect unreferenced Symbol objects, so converting arbitrary String values to equivalent Symbols was a way of getting memory leaks (and a vector for an attacker to crash your program).
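The non-equality is easiest to see with a hash that uses both kinds of key:

```ruby
# A symbol and its string counterpart are different objects and different
# hash keys, even though their textual content matches.
h = { my_label: 1, "my_label" => 2 }
puts h[:my_label]                  # 1
puts h["my_label"]                 # 2
puts(:my_label == "my_label")      # false
puts(:my_label.to_s == "my_label") # true
```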