How to Understand Symbols in Ruby

Consider this:

x = :sym
y = :sym
(x.__id__ == y.__id__ ) && ( :sym.__id__ == x.__id__) # => true

x = "string"
y = "string"
(x.__id__ == y.__id__ ) || ( "string".__id__ == x.__id__) # => false

So however you create a symbol, as long as its contents are the same, it refers to the same object in memory. Sharing one object is not a problem because a symbol is immutable. Strings are mutable, so two equal strings are generally separate objects.
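To see this directly, here is a minimal sketch using equal?, which compares object identity rather than contents. The .dup calls are my addition; they just guarantee two distinct string objects whatever frozen-string settings are in effect.

```ruby
sym_a = :sym
sym_b = "sym".to_sym            # built from a string at runtime

# Every way of producing :sym yields the very same object.
same_symbol = sym_a.equal?(sym_b)

str_a = "string".dup
str_b = "string".dup

equal_content = (str_a == str_b)     # the contents match...
same_string   = str_a.equal?(str_b)  # ...but these are two objects

# Strings can be changed in place; symbols are always frozen.
str_a << "!"
symbol_frozen = :sym.frozen?
```

After this runs, str_a has changed while str_b has not, which is exactly why two equal strings cannot safely share one object.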


(In response to the comment below)

In the original article, the value is not being stored in a symbol, it is being stored in a hash. Consider this:

hash1 = { "string" => "value"}
hash2 = { "string" => "value"}

Conceptually, this creates six objects in memory -- four string objects and two hash objects (recent Ruby versions may share the frozen key strings).

hash1 = { :symbol => "value"}
hash2 = { :symbol => "value"}

This only creates five objects in memory -- one symbol, two strings and two hash objects.
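A rough way to check the symbol case, assuming nothing beyond core Ruby: compare the keys and the values of the two hashes by identity. The .dup calls are my addition so that the two value strings are guaranteed to be separate objects.

```ruby
hash1 = { :symbol => "value".dup }
hash2 = { :symbol => "value".dup }

key1 = hash1.keys.first
key2 = hash2.keys.first

keys_shared   = key1.equal?(key2)                      # one :symbol object for both hashes
values_shared = hash1[:symbol].equal?(hash2[:symbol])  # the value strings are separate
```

So the five objects are the single :symbol, the two "value" strings and the two hashes.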

Understanding Ruby symbols

The value '/patients/:id' is a normal Ruby string, and although the :id part looks like a symbol, it is not.

When Rails parses the string, it uses the colon to identify a parameter name in the path. When it sets the parameter after receiving a request like GET /patients/1, it does not attempt to alter the symbol value, but does something like the following:

params[:id] = '1'

Note: I'm not 100% certain that it doesn't just use the string "id" as the key here. Either way, you can see that it does not alter any symbol value; it just uses the name of the symbol to decide which key the value will be stored under in the params Hash.

The similarity between ':id' in the URL parameter definition and the Symbol literal :id might be confusing, but it is a design choice shared with the Rack path handling engine, so most Ruby web frameworks use the same style.
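To make the idea concrete, here is a toy sketch of how a path pattern with ':name' placeholders could be turned into a params hash. This is not Rails's router; match_route is a hypothetical helper written only for illustration, and it does not escape regex metacharacters in the pattern.

```ruby
# Hypothetical helper, NOT how Rails actually implements routing.
def match_route(pattern, path)
  names = []

  # Replace each ":name" placeholder with a capture group and remember its name.
  regex_source = pattern.gsub(/:(\w+)/) do
    names << Regexp.last_match(1)
    "([^/]+)"
  end

  match = Regexp.new("\\A#{regex_source}\\z").match(path)
  return nil unless match

  # The placeholder name only decides which key the captured value is stored under.
  names.zip(match.captures).map { |name, value| [name.to_sym, value] }.to_h
end

params = match_route("/patients/:id", "/patients/1")
```

Here params ends up holding the string "1" under the key :id; the symbol :id itself is never modified.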

What is the use of symbols?

Ok, so the misunderstanding probably stems from this:

A symbol is not a variable, it is a value, just like 9 is a value that is a number.

A symbol is a value that is roughly a string... it's just not a string that you can change. And because you can't change it, Ruby can take a shortcut: all symbols with the same name are stored in the same spot in memory (to save space).

You store the symbol in a variable, or use the value somewhere, e.g. as the key of a hash. This last one is probably one of the most common uses of a symbol.

You make a hash that contains key-value pairs, e.g.:

thing_attrs = {:name => "My thing", :colour => "blue", :size => 6}
thing_attrs[:colour] # => "blue"

In this hash, the symbols are the keys. You can use any object as a key, but symbols are good to use because they are English words, so it is easy to understand what you're storing/fetching... much better than, say, numbers. Imagine you had:

thing_attrs = {0 => "My thing", 1 => "blue", 2 => 6}
thing_attrs[1] # => "blue"

It would be annoying and hard to remember that attribute 1 is the colour... it's much nicer to give names that you can read when you're reading the code. Thus we have two options: a string, or a symbol.

There would be very little difference between the two. A string is definitely usable, e.g.:

thing_attrs = {"name" => "My thing", "colour" => "blue", "size" => 6}
thing_attrs["colour"] # => "blue"

except that, as we know, symbols use less memory. Not a lot less, but enough that in a large program, over time, you will notice it.
So it has become a Ruby convention to use symbols instead.
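One practical thing to keep in mind when choosing between the two: in a plain Ruby Hash, a symbol key and a string key with the same name are different keys. A small sketch, reusing the thing_attrs hash from above:

```ruby
thing_attrs = { :name => "My thing", :colour => "blue", :size => 6 }

by_symbol = thing_attrs[:colour]    # found
by_string = thing_attrs["colour"]   # nil, because "colour" is not the key :colour

# The newer literal syntax produces exactly the same symbol keys.
same_hash = (thing_attrs == { name: "My thing", colour: "blue", size: 6 })
```

So whichever style you pick, stick to it consistently within one hash.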

Using Ruby Symbols

In short, symbols are lightweight strings, but they are also immutable and (before Ruby 2.2) were never garbage-collected.

You should not use them as immutable strings in your data processing tasks (remember, before Ruby 2.2 a symbol, once created, could not be destroyed). You typically use symbols for naming things.

# typical use cases

# access hash value
user = User.find(params[:id])

# name something
attr_accessor :first_name

# set hash value in opts parameter
db.collection.update(query, update, multi: true, upsert: true)

Let's take the first example, params[:id]. In a moderately big Rails app there may be hundreds or thousands of those scattered around the codebase. If we accessed that value with a string, params["id"], that would mean a new string allocation each time (and that string needs to be collected afterwards). With a symbol, it's actually the same symbol everywhere. Less work for the memory allocator, the garbage collector and even you (: is faster to type than "").

If you have a simple one-word string that appears often in your code and you don't do something funky to it (interpolation, gsub, upcase, etc), then it's likely a good candidate to be a symbol.

However, does this apply only to text that is used as part of the actual program logic such as naming, not text that you get while actually running the program...such as text from the user/web etc?

I cannot think of a single case where I'd want to turn data from the user/web into a symbol (except for parsing command-line options, maybe). Mainly because of the consequences (symbols, once created, can live forever).
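If you do need a symbol that depends on outside input, one possible approach is to look the input up in a fixed table instead of calling to_sym on it, so no new symbols are ever created from user data. The names SORT_DIRECTIONS and sort_direction below are hypothetical, just for the sketch.

```ruby
# Hypothetical lookup table: the only symbols this code will ever hand out.
SORT_DIRECTIONS = { "asc" => :asc, "desc" => :desc }.freeze

# Map untrusted text to a known symbol, falling back to :asc.
def sort_direction(input)
  SORT_DIRECTIONS.fetch(input.to_s.downcase, :asc)
end

from_user    = sort_direction("DESC")
from_garbage = sort_direction("some-random-attacker-string")
```

Unknown input falls back to :asc here, so arbitrary strings never reach the symbol table.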

Also, many editors color symbols differently from strings, which highlights them in the code.

Need explanation of some Ruby syntax

  1. The colon character (:) is the beginning of the literal syntax for a Ruby "Symbol":

    :abc.class # => Symbol
    "abc".to_sym # => :abc

    Symbols are like strings but they are "interned", meaning the Ruby interpreter only has a single copy of it in memory despite multiple possible references (whereas there can be many equivalent strings in memory at once).

  2. The 'validates' token in your example above is a class method (of something in the class hierarchy of the "Post" class) that is being called with a symbol argument (:name) and a hash argument with a single key/value pair of :presence => true.

  3. The 'create_table' token is a method which is being called with a single argument (the symbol ":posts") and is given a block which takes a single argument "t" (do |t| ... end).
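To show the shape of those two calls without Rails, here is a toy sketch. ToyModel, ToyTable and both method bodies are made up for illustration and are not how Rails implements validates or create_table; only the calling syntax is the point.

```ruby
# Toy stand-in for a model base class (hypothetical, not ActiveRecord).
class ToyModel
  def self.validations
    @validations ||= []
  end

  # A class method taking a symbol and a hash argument.
  def self.validates(attribute, options = {})
    validations << [attribute, options]
  end
end

class Post < ToyModel
  validates :name, :presence => true   # same as validates(:name, { :presence => true })
end

# Toy stand-in for a migration helper (hypothetical).
ToyTable = Struct.new(:name, :columns)

# A method taking one symbol argument and a block that receives "t".
def create_table(name)
  table = ToyTable.new(name, [])
  yield table
  table
end

table = create_table :posts do |t|
  t.columns << :title
end
```

In both cases the symbols are just arguments: plain values naming an attribute or a table.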

When to use symbols in Ruby

Symbols, which are "interned" strings, are useful for hash keys, common arguments, and other places where the overhead of having many, many duplicate strings with the same value is inefficient.

For example:

params[:name]
my_function(with: { arguments: [ ... ] })
record.state = :completed

These are generally preferable to strings because they will be repeated frequently.

The most common uses are:

  • Hash keys
  • Arguments to methods
  • Option flags or enum-type property values
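As an illustration of the last item, here is a small sketch of a symbol used as an enum-type state. Task and describe are hypothetical names; the states other than :completed are invented for the example.

```ruby
# Hypothetical record type with a symbol-valued state.
Task = Struct.new(:state)

def describe(task)
  case task.state
  when :pending   then "waiting to start"
  when :completed then "done"
  else                 "unknown state"
  end
end

record = Task.new(:pending)
record.state = :completed
summary = describe(record)
```

Because the set of states is fixed and written in the source, only a handful of symbols ever exist for it.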

It's better to use strings when handling user data of an unknown composition. Unlike strings, which can be garbage collected, symbols are permanent (at least before Ruby 2.2). Converting arbitrary user data to symbols may fill up the symbol table with junk and possibly crash your application if someone's being malicious.

For example:

user_data = JSON.load(...).symbolize_keys

This would allow an attacker to create JSON data with intentionally long, randomized names that, in time, would bloat your process with all kinds of useless junk.
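One way to avoid that, sketched here with only the standard json library: keep the parsed keys as strings, pick out the keys you expect, and symbolize just those. PERMITTED_KEYS and safe_symbolize are hypothetical names for this example.

```ruby
require "json"

# Hypothetical whitelist of the keys the application actually understands.
PERMITTED_KEYS = %w[name email].freeze

def safe_symbolize(json_text)
  data = JSON.parse(json_text)      # keys stay as ordinary strings
  data.slice(*PERMITTED_KEYS)       # drop everything unexpected
      .transform_keys(&:to_sym)     # only known names become symbols
end

user_data = safe_symbolize('{"name":"Ann","zzz_random_key_123":1}')
```

The attacker-chosen key is discarded while it is still a string, so it never becomes a symbol.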

Understanding Ruby variables and symbols?

Variables starting with @ are instance variables, "properties" in other languages. Whereas 'classic' variables are local to the scope of their method/block, instance variables are local to a specific instance of an object, for example:

class Foo

  def initialize(bar)
    @bar = bar
  end

  def bar
    @bar # the variable is specific to this instance
  end

  def buzz
    buzz = 'buzz' # this variable is not accessible outside of this method
  end

end

You may also see variables starting with @@, which are class variables, and are accessible by every instance of the class and shared with every instance of the subclass. Usage of those variables is usually discouraged, primarily because subclasses share the variable, which can cause a lot of mess.
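A short sketch of why that sharing can be messy; Parent and Child are hypothetical classes made up for the example.

```ruby
class Parent
  @@count = 0   # class variable, shared down the whole hierarchy

  def self.count
    @@count
  end

  def self.increment
    @@count += 1
  end
end

class Child < Parent
end

Child.increment               # changes the one shared @@count
parent_count = Parent.count   # the parent sees the change too
```

Incrementing through the subclass also changes what the parent sees, because there is only one @@count.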

In Ruby everything is an object, classes are objects (instances of class Class), so you can also have class instance variables:

class Foo

  def self.bar
    @bar # we are in class Foo's scope, which is an instance of class Class
  end

  def self.bar=(bar)
    @bar = bar
  end

  def bar
    @bar # Foo.new.bar != Foo.bar
  end

end
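Using that class (repeated here so the snippet runs on its own), a quick sketch shows that the class-level @bar and an instance's @bar are unrelated variables:

```ruby
class Foo
  def self.bar
    @bar            # belongs to the Foo class object itself
  end

  def self.bar=(bar)
    @bar = bar
  end

  def bar
    @bar            # belongs to one particular Foo instance
  end
end

Foo.bar = "class-level"

class_value    = Foo.bar       # the value stored on the class
instance_value = Foo.new.bar   # nil, this instance never set its own @bar
```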

What you call "variables with a colon" are not variables. They are a particular kind of string-like value, called a symbol, that is immutable and optimized for quick identification by the interpreter. In fact, they are stored internally as pointers, so :this == :this is a very quick operation.

This property makes them good candidates for hash keys, because they offer quick retrieval, or for "flags" to pass to a method. Think of them as a sort of loose constant that "stands for" what it says. Their permanence is also dangerous: symbols, once created, never get garbage collected, so it's easy to create a memory leak by creating thousands of symbols. Use them wisely.

UPDATE: since Ruby 2.2, symbols may be garbage-collected in certain cases (dynamically created symbols that nothing references any longer).

In Ruby, how to choose whether a symbol or string to be used in a given scenario?

a = :foo
b = :foo

a and b refer to the same object in memory (same identity)

a.object_id # => 898908
b.object_id # => 898908

Strings behave differently

a = 'foo'
b = 'foo'

a.object_id # => 70127643805220
b.object_id # => 70127643805200

So, you use strings to store data and perform manipulations on data (replace characters or whatnot) and you use symbols to name things (keys in a hash or something). Also see this answer for more use cases for symbols.


