Using Ruby Symbols

When to use symbols instead of strings in Ruby?

TL;DR

A simple rule of thumb is to use symbols every time you need internal identifiers. For Ruby < 2.2 only use symbols when they aren't generated dynamically, to avoid memory leaks.

Full answer

The only reason not to use them for identifiers that are generated dynamically is because of memory concerns.

This question is very common because many programming languages don't have symbols, only strings, and thus strings are also used as identifiers in your code. You should be worrying about what symbols are meant to be, not only when you should use symbols. Symbols are meant to be identifiers. If you follow this philosophy, chances are that you will do things right.

There are several differences between the implementation of symbols and strings. The most important thing about symbols is that they are immutable. This means that their value will never change. Because of this, symbols are instantiated faster than strings, and some operations, such as comparing two symbols, are also faster.

The fact that a symbol is immutable allows Ruby to use the same object every time you reference the symbol, saving memory. So every time the interpreter reads :my_key it can take it from memory instead of instantiating it again. This is less expensive than initializing a new string every time.

You can get a list of all symbols that are already instantiated with the command Symbol.all_symbols:

symbols_count = Symbol.all_symbols.count # all_symbols is an array with all 
# instantiated symbols.
a = :one
puts a.object_id
# prints 167778

a = :two
puts a.object_id
# prints 167858

a = :one
puts a.object_id
# prints 167778 again - the same object_id from the first time!

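puts Symbol.all_symbols.count - symbols_count
# prints 2 (the two symbols we created) when typed line by line in irb and
# neither symbol existed before; in a script file it can print 0, because
# symbol literals are interned when the file is parsed.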

For Ruby versions before 2.2, once a symbol is instantiated, its memory is never freed. The only way to free the memory is to restart the application, so symbols are also a major cause of memory leaks when used incorrectly. The simplest way to create a memory leak is to call to_sym on user input: since this data always changes, a new portion of memory is used forever in the running process. Ruby 2.2 introduced the symbol garbage collector, which frees dynamically generated symbols, so memory leaks caused by creating symbols dynamically are no longer a concern.
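On any Ruby version, one way to avoid calling to_sym on user input at all is to look the input up in a fixed table. The sketch below assumes a hypothetical list of sort options; SORT_COLUMNS and sort_column are illustrative names, not part of any library.

```ruby
# Hypothetical whitelist of sort options; the names are illustrative only.
SORT_COLUMNS = {
  "name"       => :name,
  "created_at" => :created_at
}.freeze

# Map untrusted input to a known symbol without ever calling to_sym on it.
def sort_column(user_input, default: :name)
  SORT_COLUMNS.fetch(user_input.to_s, default)
end

sort_column("created_at")   # => :created_at
sort_column("x" * 10_000)   # => :name (no new symbol is created)
```

Unknown input falls back to the default, so the set of symbols in the process stays fixed no matter what users send.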

Answering your question:

Is it true I have to use a symbol instead of a string if there is at least two the same strings in my application or script?

If what you are looking for is an identifier to be used internally in your code, you should be using symbols. If you are printing output, you should go with strings, even if the same text appears more than once and allocates two different objects in memory.

Here's the reasoning:

  1. Printing symbols will be slower than printing strings because they are cast to strings.
  2. Having lots of different symbols will increase the overall memory usage of your application since, before Ruby 2.2, they are never deallocated. And you are never using all the strings from your code at the same time.

Use case by @AlanDert

@AlanDert: if I use many times something like %input{type: :checkbox} in haml code, what should I use as checkbox?

Me: Yes.

@AlanDert: But to print out a symbol on html page, it should be converted to string, shouldn't it? what's the point of using it then?

What is the type of an input? An identifier of the type of input you want to use or something you want to show to the user?

It is true that it will become HTML code at some point, but at the moment you are writing that line of your code, it is meant to be an identifier: it identifies what kind of input field you need. Thus, it is used over and over again in your code, always has the same "string" of characters as the identifier, and won't generate a memory leak.

That said, why don't we evaluate the data to see if strings are faster?

This is a simple benchmark I created for this:

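require 'benchmark'
require 'haml'

# Note: Haml::Engine.new(...).render is the API of Haml 5 and earlier.

# Render the template 10,000 times with a string attribute value.
str = Benchmark.measure do
  10_000.times do
    Haml::Engine.new('%input{type: "checkbox"}').render
  end
end.total

# Render the same template with a symbol attribute value.
sym = Benchmark.measure do
  10_000.times do
    Haml::Engine.new('%input{type: :checkbox}').render
  end
end.total

puts "String: " + str.to_s
puts "Symbol: " + sym.to_s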

Three outputs:

# first time
String: 5.14
Symbol: 5.07
#second
String: 5.29
Symbol: 5.050000000000001
#third
String: 4.7700000000000005
Symbol: 4.68

So using symbols is actually a bit faster than using strings. Why is that? It depends on the way HAML is implemented. I would need to hack a bit on HAML code to see, but if you keep using symbols in the concept of an identifier, your application will be faster and more reliable. When questions strike, benchmark it and get your answers.
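In the same spirit, here is a small micro-benchmark sketch that does not need HAML: it times hash lookups with symbol keys against string keys. The iteration count and variable names are arbitrary choices, and the timings depend on the Ruby version and the machine, so no result is claimed here.

```ruby
require 'benchmark'

ITERATIONS = 200_000

symbol_hash = { colour: "blue" }
string_hash = { "colour" => "blue" }
string_key  = "colour"

# Time repeated lookups with a symbol key.
sym_time = Benchmark.realtime do
  ITERATIONS.times { symbol_hash[:colour] }
end

# Time repeated lookups with a string key.
str_time = Benchmark.realtime do
  ITERATIONS.times { string_hash[string_key] }
end

puts "Symbol keys: #{sym_time.round(4)}s"
puts "String keys: #{str_time.round(4)}s"
```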

What is the use of symbols?

Ok, so the misunderstanding probably stems from this:

A symbol is not a variable, it is a value, just like 9 is a value that is a number.

A symbol is a value that is kind of, roughly, a string... it's just not a string that you can change... and because you can't change it, we can use a shortcut: all symbols with the same name/value are stored in the same memory spot (to save space).

You store the symbol in a variable, or use the value somewhere, e.g. as the key of a hash... this last is probably one of the most common uses of a symbol.

You make a hash that contains key-value pairs, e.g.:

thing_attrs = {:name => "My thing", :colour => "blue", :size => 6}
thing_attrs[:colour] # 'blue'

In this hash, the symbols are the keys. You can use any object as a key, but symbols are good to use as they use English words, and it is thus easy to understand what you're storing/fetching... much better than, say, numbers. Imagine you had:

thing_attrs = {0 => "My thing", 1 => "blue", 2 => 6}
thing_attrs[1] # => "blue"

It would be annoying and hard to remember that attribute 1 is the colour... it's much nicer to give names that you can read when you're reading the code. Thus we have two options: a string, or a symbol.

There would be very little difference between the two. A string is definitely usable eg:

thing_attrs = {"name" => "My thing", "colour" => "blue", "size" => 6}
thing_attrs["colour"] # 'blue'

except that, as we know, symbols use less memory. Not a lot less, but enough that in a large program, over time, you will notice it.
So it has become a Ruby convention to use symbols instead.

When to use symbols in Ruby

Symbols, or "interned strings" as they're also called, are useful for hash keys, common arguments, and other places where the overhead of having many, many duplicate strings with the same value is inefficient.

For example:

params[:name]
my_function(with: { arguments: [ ... ] })
record.state = :completed

These are generally preferable to strings because they will be repeated frequently.

The most common uses are:

  • Hash keys
  • Arguments to methods
  • Option flags or enum-type property values
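The three uses listed above can be sketched together in one small class. Task, STATES and the option names are hypothetical and exist only for this illustration.

```ruby
# Hypothetical Task class, used only to illustrate the three uses listed above.
class Task
  # Enum-type property values expressed as symbols.
  STATES = %i[pending running completed].freeze

  attr_reader :state, :options

  # Keyword arguments arrive as symbol keys.
  def initialize(state: :pending, **options)
    raise ArgumentError, "unknown state: #{state.inspect}" unless STATES.include?(state)

    @state = state
    @options = options # a hash with symbol keys
  end

  def completed?
    state == :completed
  end
end

task = Task.new(state: :completed, retries: 3)
task.completed?         # => true
task.options[:retries]  # => 3
```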

It's better to use strings when handling user data of an unknown composition. Unlike strings, which can be garbage collected, symbols were permanent before Ruby 2.2 (which added garbage collection for dynamically created symbols). Converting arbitrary user data to symbols may fill up the symbol table with junk and possibly crash your application if someone's being malicious.

For example:

user_data = JSON.load(...).symbolize_keys

This would allow an attacker to create JSON data with intentionally long, randomized names that, in time, would bloat your process with all kinds of useless junk.
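A minimal sketch of a more defensive approach, assuming the payload is a JSON object and the application only cares about a known set of keys. PERMITTED_KEYS and safe_symbolize are hypothetical names chosen for this example.

```ruby
require 'json'

# Hypothetical list of keys this application actually understands.
PERMITTED_KEYS = %w[name email].freeze

def safe_symbolize(json_text)
  data = JSON.parse(json_text)              # keys stay as strings
  known = data.slice(*PERMITTED_KEYS)       # drop everything unexpected
  known.transform_keys(&:to_sym)            # only known names become symbols
end

payload = '{"name": "Ann", "email": "ann@example.com", "zzz_random_key_123": 1}'
safe_symbolize(payload)
# => { name: "Ann", email: "ann@example.com" }
```

Because only whitelisted names are ever converted, an attacker's random keys never reach the symbol table.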

Using Ruby Symbols

In short, symbols are lightweight strings, but they are also immutable and, before Ruby 2.2, non-garbage-collectable.

You should not use them as immutable strings in your data processing tasks (remember, on those older versions a symbol, once created, can't be destroyed). You typically use symbols for naming things.

# typical use cases

# access hash value
user = User.find(params[:id])

# name something
attr_accessor :first_name

# set hash value in opts parameter
db.collection.update(query, update, multi: true, upsert: true)

Let's take the first example, params[:id]. In a moderately big Rails app there may be hundreds or thousands of those scattered around the codebase. If we accessed that value with a string, params["id"], that would mean a new string allocation each time (and that string needs to be collected afterwards). With a symbol, it's actually the same symbol everywhere. Less work for the memory allocator, the garbage collector and even you (: is faster to type than "").

If you have a simple one-word string that appears often in your code and you don't do something funky to it (interpolation, gsub, upcase, etc), then it's likely a good candidate to be a symbol.

However, does this apply only to text that is used as part of the actual program logic such as naming, not text that you get while actually running the program...such as text from the user/web etc?

I cannot think of a single case where I'd want to turn data from the user/web into a symbol (except for parsing command-line options, maybe). Mainly because of the consequences (before Ruby 2.2, symbols live forever once created).

Also, many editors provide different coloring for symbols, to highlight them in the code.

In Ruby, how to choose whether a symbol or string to be used in a given scenario?

a = :foo
b = :foo

a and b refer to the same object in memory (same identity)

a.object_id # => 898908
b.object_id # => 898908

Strings behave differently

a = 'foo'
b = 'foo'

a.object_id # => 70127643805220
b.object_id # => 70127643805200

So, you use strings to store data and perform manipulations on data (replace characters or whatnot) and you use symbols to name things (keys in a hash or something). Also see this answer for more use cases for symbol.

Ruby Syntax, using numbers in symbols?

If you want to start a symbol with a digit, you need to enclose it in quotes:

:'2grok' => ['Hi']

If you use double quotes, Ruby interpolates the string inside:

:"#{1 + 1}grok"

Also, you can use percent-notation:

%s{2grok}

Finally, you can get the symbol by calling the to_sym method on a String:

'2grok'.to_sym => ['Hi']
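To confirm that the four notations above all produce the same symbol, a short check like the following can be run. The variable names are only for illustration.

```ruby
quoted       = :'2grok'
interpolated = :"#{1 + 1}grok"
percent      = %s{2grok}
converted    = '2grok'.to_sym

# All four expressions evaluate to one and the same symbol.
all_same = [quoted, interpolated, percent, converted].uniq.size == 1  # true

# Used as a hash key, as in the snippets above:
greetings = { :'2grok' => ['Hi'] }
greeting  = greetings['2grok'.to_sym]  # ["Hi"]
```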

Understanding Ruby symbols

The value '/patients/:id' is a normal Ruby string, and although the :id part looks like a symbol, it is not.

When Rails parses the string, it uses the colon to identify a parameter name in the path. When it sets the parameter after receiving a request like GET /patients/1, it does not attempt to alter the symbol value, but does something like the following:

params[:id] = '1'

Note: I'm not 100% certain that it doesn't just use the string "id" as the key here. But either way you can see it does not alter any symbol value; it just uses the name of the symbol so you know which key it will be stored under in the params Hash.

The similarity between ':id' as part of the URL parameter definition and the Symbol literal :id might be confusing, but it is a design choice shared with the Rack path handling engine, so most Ruby web frameworks use the same style.
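To make the idea concrete, here is a rough sketch of how a router could turn a pattern string into named parameters. This is not how Rails or Rack actually implement it; extract_params is a hypothetical helper, and it assumes the literal parts of the pattern contain no regular expression metacharacters.

```ruby
# Hypothetical helper: match a path against a pattern such as '/patients/:id'.
def extract_params(pattern, path)
  # Turn each ":name" segment into a named capture group, e.g. (?<id>[^/]+).
  source = pattern.gsub(/:(\w+)/) { "(?<#{Regexp.last_match(1)}>[^/]+)" }
  match = Regexp.new("\\A#{source}\\z").match(path)
  return nil unless match

  # Parameter names come from the developer's pattern, not from user input,
  # so converting them to symbols creates only a fixed set of symbols.
  match.named_captures.transform_keys(&:to_sym)
end

extract_params('/patients/:id', '/patients/1')  # => { id: "1" }
extract_params('/patients/:id', '/doctors/1')   # => nil
```

The ':id' in the pattern is just text until the helper reads it. Only the resulting hash key is a real Symbol, and the value stays a String.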

How to represent a value with symbol in ruby?

I think you are confused about the use of symbols by some of the conventions in Ruby and Rails.

Symbols are not variables. Variables are used to store values. Symbols are lighter weight versions of strings. They can be used in place of strings in places like hash keys.

hash1 = {'name' => 'Mary', 'age' => 30}
puts hash1['name']
#=> 'Mary'

hash2 = {:name => 'John', :age => 32}
puts hash2[:age]
#=> 32

Ruby 1.9 introduced a new hash notation that makes things cleaner when using symbols as hash keys by eliminating the "hash rocket" (=>):

hash2 = {name: 'John', age: 32}
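A quick check that both notations build the same hash, and that the short form always produces symbol keys. The variable names are arbitrary.

```ruby
rocket = { :name => 'John', :age => 32 }
short  = { name: 'John', age: 32 }

same_hash = (rocket == short)  # true
key_list  = short.keys         # [:name, :age]

# The short form only produces symbol keys; string keys still need the rocket.
string_keys    = { 'name' => 'John' }
mixed_up_equal = (string_keys == { name: 'John' })  # false
```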

To take advantage of the conventions in Rails they came up with "hash with indifferent access". So it is a hash that allows you to use either a string or its symbol version interchangeably in a hash:

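# Outside Rails, load ActiveSupport first (inside Rails this is already done).
require 'active_support'
require 'active_support/core_ext/hash/indifferent_access'

hash2 = ActiveSupport::HashWithIndifferentAccess.new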
hash2['name'] = 'John'
puts hash2[:name]
#=> 'John'
puts hash2['name']
#=> 'John'
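The idea behind indifferent access can be sketched in plain Ruby by normalizing every key to a string before storing or reading it. IndifferentHash below is a hypothetical, deliberately tiny class for illustration; it is not ActiveSupport's implementation and supports only [] and []=.

```ruby
# Hypothetical class for illustration; not ActiveSupport's implementation.
class IndifferentHash
  def initialize
    @store = {}
  end

  def []=(key, value)
    @store[normalize(key)] = value
  end

  def [](key)
    @store[normalize(key)]
  end

  private

  # Symbols are converted to strings so both forms hit the same entry.
  def normalize(key)
    key.is_a?(Symbol) ? key.to_s : key
  end
end

person = IndifferentHash.new
person['name'] = 'John'
person[:age] = 32

person[:name]  # => "John"
person['age']  # => 32
```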

In Rails you assign values to columns in a table using symbols:

p = Person.new(name: 'John', age: '32')

You then access them through methods named after the column names:

puts p.name
#=> 'John'

I think you are missing some real foundations of Ruby. I would study that more and then maybe do one of the tutorials that involves rebuilding Rails so you see how the syntax of Ruby relates to the conventions in Rails.

In ruby, symbols that cannot be replaced by strings?

Strings and Symbols in Ruby are never directly equal. The difference in class matters in more than one way, and a Symbol never compares equal to a String with the same characters:

:my_label != "my_label"

However,

:my_label.to_s == "my_label"
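The same distinction shows up when the two are used as hash keys, which is where it most often causes surprises. A small sketch with made-up names:

```ruby
settings = { my_label: 'on' }

by_symbol    = settings[:my_label]          # "on"
by_string    = settings['my_label']         # nil, a String key is a different key
by_converted = settings['my_label'.to_sym]  # "on"

direct_equal    = (:my_label == 'my_label')       # false
converted_equal = (:my_label.to_s == 'my_label')  # true
```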

A Ruby Symbol is more efficient than a String in a few ways, including:

  • A Symbol hashes and compares faster, which helps when using as hash keys.

  • Multiple uses of the same Symbol do not make copies of the internal data, but are just identical pointers to the same object in memory. This makes them memory efficient when you have a lot with the same value.

If a library, such as Selenium::WebDriver, makes use of a Symbol as a parameter, then you cannot always replace it with an equivalent string. Whether or not you can treat it like that depends on the specific library. It is relatively easy to cast Symbols to Strings and vice versa, so a lot of libraries will do that cast for you. It is very common to have library code that does param = param.to_s when it needs a String param.
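The difference between a forgiving and a strict method can be sketched as follows. Both helpers are hypothetical and are not part of Selenium or any other real library.

```ruby
# Hypothetical helper that normalizes its argument, so it accepts
# either :warning or "warning".
def css_class_for(level)
  name = level.to_s
  "alert alert-#{name}"
end

css_class_for(:warning)   # => "alert alert-warning"
css_class_for("warning")  # => "alert alert-warning"

# Hypothetical stricter helper that compares against symbols only,
# so the String form is rejected.
def strict_level?(level)
  %i[info warning error].include?(level)
end

strict_level?(:warning)   # => true
strict_level?("warning")  # => false
```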

Casting from String to Symbol is less commonly found in library code, because for a long while Ruby would not garbage-collect unreferenced Symbol objects - converting arbitrary String values to equivalent Symbol ones was a way of getting memory leaks (and a vector for an attacker to crash your program).


