What Are Symbols and How Do We Use Them

What are symbols and how do we use them?

The most basic usage of Symbols is well summarized by the phrase: "A Symbol is a constant integer with a human-readable name" (Wei Lie Sho).

If in C you type this:

#define USER 1
#define ADMIN 2
#define GUEST 3
[...]
user.type = ADMIN;

then in Ruby you just use a Symbol:

user.type = :admin

So, a Symbol in Ruby is just a value whose only important property is its name; put another way: the value of a Symbol is its name.

The main difference between Symbols and Strings (user.type = "admin" is also correct code) is that Symbols are fast to compare. The main difference between Symbols and integers (used as in the C example) is that Symbols are easily readable to the programmer, while bare integers are not.
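
A quick sketch of why they are fast: every occurrence of a Symbol with the same name is the same object, so comparison is essentially a pointer check (an illustrative irb session, not a rigorous benchmark):

require 'benchmark'

:admin.equal?(:admin)    # => true  - one interned object, compared by identity
"admin".equal?("admin")  # => false - two separate String objects

Benchmark.realtime { 1_000_000.times { :admin == :admin } }   # typically faster...
Benchmark.realtime { 1_000_000.times { "admin" == "admin" } } # ...than this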

Other Symbol properties are not crucial for their basic usage.


While there is some integer value associated with Symbols (for example: .object_id), you should not rely on it. In each run of your program the integer value of a given Symbol may be different. However, while your program runs, each (let's call it so) "instance" of the same Symbol will have the same integer value.
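
You can check this in irb (the exact ids will differ on your machine; only the comparisons matter):

:admin.object_id == :admin.object_id    # => true  - the same "instance" every time
"admin".object_id == "admin".object_id  # => false - every String literal is a new object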

Unlike integer constants (as in the C example), Symbols cannot be sorted in Ruby 1.8 - they do not know whether one is greater than another.

So, you can match Symbols (for equality) as quickly as integers are matched, but you cannot sort Symbols directly in Ruby 1.8. Of course, you can sort the String equivalents of Symbols:

if user.type == :admin # OK
if user.type > :guest # will throw an exception.

[:two, :one].sort # will throw an exception
[:two, :one].sort_by {|n| n.to_s} # => [:one, :two]

One of the important properties of Symbols is that once a Symbol is encountered in the program (typed in the source code or created on the fly), it is stored in the interpreter's symbol table for the rest of the program's run. That's the "Symbol table" you mentioned. (Strictly speaking, this is true of Rubies before 2.2; since Ruby 2.2, dynamically created Symbols can be garbage-collected, but the rule of thumb below is still good practice.)
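
You can watch the symbol table grow with Symbol.all_symbols (a rough sketch; the counts will differ between Ruby versions):

before = Symbol.all_symbols.size
"made_up_at_runtime_#{Time.now.to_i}".to_sym  # a Symbol the parser has never seen
Symbol.all_symbols.size                       # => before + 1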

If you create a lot of unique Symbols in your program (I'm talking about millions), it is possible that your program will run out of memory.

So, a rule of thumb is: "Never convert any user-supplied value to a Symbol."

# A Rails-like example:
user.role = params["role"].to_sym # DANGEROUS!
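
A safer pattern (a sketch; the role whitelist is hypothetical) is to convert only values you already expect:

VALID_ROLES = %w[admin user guest].freeze
role = params["role"]
user.role = role.to_sym if VALID_ROLES.include?(role)  # unexpected input is never interned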

I believe this information should be sufficient for using Symbols efficiently in Ruby.


Note that in Ruby 1.9 Symbols include Comparable, and so you can do things like

p :yay if :foo > :bar 

What is the use of symbols?

Ok, so the misunderstanding probably stems from this:

A symbol is not a variable; it is a value, just as 9 is a value that is a number.

A symbol is a value that is, kinda roughly, a string... it's just not a string that you can change... and because you can't change it, we can use a shortcut: all symbols with the same name/value are stored in the same memory spot (to save space).

You store the symbol in a variable, or use the value somewhere, e.g. as the key of a hash... this last is probably one of the most common uses of a symbol.

You make a hash that contains key-value pairs, e.g.:

thing_attrs = {:name => "My thing", :colour => "blue", :size => 6}
thing_attrs[:colour] # 'blue'

In this hash, the symbols are the keys. You can use any object as a key, but symbols are good to use because they are English words, so it's easy to understand what you're storing/fetching... much better than, say, numbers. Imagine you had:

thing_attrs = {0 => "My thing", 1 => "blue", 2 => 6}
thing_attrs[1] # => "blue"

It would be annoying and hard to remember that attribute 1 is the colour... it's much nicer to give names that you can read when you're reading the code. Thus we have two options: a string, or a symbol.

There would be very little difference between the two. A string is definitely usable, e.g.:

thing_attrs = {"name" => "My thing", "colour" => "blue", "size" => 6}
thing_attrs["colour"] # 'blue'

except that, as we know... symbols use less memory. Not a lot less, but enough that in a large program, over time, you will notice it.
So it has become the Ruby standard to use symbols instead.
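
Since Ruby 1.9 there is even a shorthand hash syntax for exactly this symbol-keys case, which is one more reason it became the standard:

thing_attrs = {name: "My thing", colour: "blue", size: 6}  # same as :name => "My thing", ...
thing_attrs[:colour] # 'blue'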

What exactly is a symbol in lisp/scheme?

In Scheme and Racket, a symbol is like an immutable string that happens to be interned so that symbols can be compared with eq? (fast, essentially pointer comparison). Symbols and strings are separate data types.
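
A sketch of that comparison difference (string-copy forces a fresh string, since some implementations intern string literals):

(eq? 'foo 'foo)                     ; => #t - symbols are interned, pointer comparison
(eq? (string-copy "foo") "foo")     ; => #f - two distinct string objects
(equal? (string-copy "foo") "foo")  ; => #t - contents compared element by element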

One use for symbols is lightweight enumerations. For example, one might say a direction is either 'north, 'south, 'east, or 'west. You could of course use strings for the same purpose, but it would be slightly less efficient. Using numbers would be a bad idea; represent information in as obvious and transparent a manner as possible.
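
For instance, a minimal sketch of the directions enumeration (the function name is made up):

(define (opposite dir)
  (case dir
    ((north) 'south)
    ((south) 'north)
    ((east)  'west)
    ((west)  'east)))

(opposite 'east)  ; => west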

For another example, SXML is a representation of XML using lists, symbols, and strings. In particular, strings represent character data and symbols represent element names. Thus the XML <em>hello world</em> would be represented by the value (list 'em "hello world"), which can be more compactly written '(em "hello world").

Another use for symbols is as keys. For example, you could implement a method table as a dictionary mapping symbols to implementation functions. To call a method, you look up the symbol that corresponds to the method name. Lisp/Scheme/Racket makes that really easy, because the language already has a built-in correspondence between identifiers (part of the language's syntax) and symbols (values in the language). That correspondence makes it easy to support macros, which implement user-defined syntactic extensions to the language. For example, one could implement a class system as a macro library, using the implicit correspondence between "method names" (a syntactic notion defined by the class system) and symbols:

(send obj meth arg1 arg2)
=>
(apply (lookup-method obj 'meth) obj (list arg1 arg2))

(In other Lisps, what I've said is mostly truish, but there are additional things to know about, like packages and function vs variable slots, IIRC.)

Practical examples of using symbols in Scala?

Do symbols really fit into Scala?

In the wonderful land of Lisp, code is represented as nested lists of literal objects that denote themselves (strings, numbers, and so on), and symbols, which are used as identifiers for things like classes, functions, and variables. As Lisp code has a very simple structure, Lisp allows the programmer to manipulate it (both at compile-time and run-time). Clearly, when doing this, the programmer will inevitably encounter symbols as data objects.

So symbols are (and need to be) objects in Lisp in any case, so why not use them as hash table keys or as enums as well? It's the natural way of doing things, and it keeps the language simple, as you don't have to define a special enumeration type.

To summarise, symbols are naturally used for code manipulation, enumeration, and keying. But Java people don't use identity as the equivalence relation between hash keys by default (which your traditional Lisp does), so they can just use strings as their keys. Enum types are defined separately in Scala. And finally, code as data isn't supported by the language at all.

So no, my impression is that symbols don't belong in the Scala language. That said, I'm going to keep an eye on the replies to this question. They might still demonstrate a genuine use of symbols in Scala that I can't think of right now.

(Addendum: Depending on the Lisp dialect, Lisp symbols may also be namespace-qualified, which is, of course, an immensely useful feature when manipulating code, and a feature that strings don't have.)

What is the motivation for bringing Symbols to ES6?

The original motivation for introducing symbols to Javascript was to enable private properties.

Unfortunately, they ended up being severely downgraded. They are no longer private, since you can find them via reflection, for example, using Object.getOwnPropertySymbols or proxies.
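
A quick sketch of that reflection escape hatch:

const secret = Symbol("secret");
const obj = { [secret]: 42 };

Object.keys(obj);                  // [] - symbol keys are invisible to the usual listings
Object.getOwnPropertySymbols(obj); // [ Symbol(secret) ] - but not to reflection
obj[secret];                       // 42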

They are now known as unique symbols and their only intended use is to avoid name clashes between properties. For example, ECMAScript itself can now introduce extension hooks via certain methods that you can put on objects (e.g. to define their iteration protocol) without the risk of clashing with user-defined property names.
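
The iteration protocol is the canonical example of such a hook (a sketch):

const bag = { items: ["a", "b"] };
bag[Symbol.iterator] = function* () { yield* this.items; };

[...bag]; // ["a", "b"] - spread looks up the well-known Symbol.iterator key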

Whether that is strong enough a motivation to add symbols to the language is debatable.

Scheme: what are symbols good for, and how do you evaluate them?

It's as if unquote was only available from within quote itself.

Yes, you can only unquote inside a quasiquoted expression, as that is the only place where unquoting makes sense: you are quoting an expression (you don't want to evaluate the items in it), but you want one specific part of it to be evaluated, and that is where you use unquote.

(define a 10)
`(a b ,a) ==> '(a b 10)

In the last expression I want a and b to be unevaluated, but the third element, i.e. ,a, should be evaluated and replaced with its value; hence we use unquote for that.

Your code:

(define f (let ((x 1) (y 2))
            (lambda (h) (eval h))))

The argument passed to f has to be an unevaluated expression, because the parameter h is evaluated using eval.

Hence you need to call it as (f 'a).

NOTE: (f 'y) will not work because eval evaluates in the top-level environment, where the y created by let is not visible.
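
So, with the top-level (define a 10) from above (a sketch; the exact error message depends on your implementation):

(f 'a) ; => 10 - a is bound at the top level, where eval looks
(f 'y) ; error: y is unbound - the let binding is not part of eval's environment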

Purpose of Scala's Symbol?

Symbols are used where you have a closed set of identifiers that you want to compare quickly. When you have two String instances, they are not guaranteed to be interned[1], so to compare them you must often check their contents by comparing lengths and even checking character-by-character whether they are the same. With Symbol instances, comparisons are a simple eq check (i.e. == in Java), so they are constant time (i.e. O(1)) to compare.
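
A quick sketch in the Scala REPL (Symbol("...") is used here instead of the 'foo literal syntax, which is deprecated since Scala 2.13):

val a = Symbol("admin")
val b = Symbol("admin")
a eq b                      // true  - Symbols are interned, reference equality suffices

val s = new String("admin")
s eq "admin"                // false - two distinct String instances
s == "admin"                // true  - contents compared character by character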

This sort of structure tends to be used more in dynamic languages (notably Ruby and Lisp code tends to make a lot of use of symbols) since in statically-typed languages one usually wants to restrict the set of items by type.

Having said that, if you have a key/value store with a restricted set of keys, where it would be unwieldy to use a statically typed object, a Map[Symbol, Data]-style structure might well be good for you.
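
For instance (a sketch; Any stands in here for whatever your Data type would be):

val attrs: Map[Symbol, Any] = Map(
  Symbol("name") -> "My thing",
  Symbol("size") -> 6
)
attrs(Symbol("size"))  // 6 - key comparison is a cheap reference check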

A note about String interning on Java (and hence Scala): Java Strings are interned in some cases anyway; in particular, string literals are automatically interned, and you can call the intern() method on a String instance to return an interned copy. Not all Strings are interned, though, which means that the runtime still has to do the full check unless they are the same instance; interning makes comparing two equal interned strings faster, but does not improve the runtime of comparing different strings. Symbols benefit from being guaranteed to be interned, so a single reference-equality check is sufficient to prove either equality or inequality.

[1] Interning is a process whereby, when you create an object, you check whether an equal one already exists, and use that one if it does. It means that if two objects are equal, they are precisely the same object (i.e. they are reference-equal). The downsides are that looking up the object you need can be costly, and allowing interned objects to be garbage-collected requires a more complex implementation.

What is a symbol in Julia?

Symbols in Julia are the same as in Lisp, Scheme or Ruby. However, the answers to those related questions are not really satisfactory, in my opinion. If you read those answers, it seems that the reason a symbol is different than a string is that strings are mutable while symbols are immutable, and symbols are also "interned" – whatever that means. Strings do happen to be mutable in Ruby and Lisp, but they aren't in Julia, and that difference is actually a red herring. The fact that symbols are interned – i.e. hashed by the language implementation for fast equality comparisons – is also an irrelevant implementation detail. You could have an implementation that doesn't intern symbols and the language would be exactly the same.

So what is a symbol, really? The answer lies in something that Julia and Lisp have in common – the ability to represent the language's code as a data structure in the language itself. Some people call this "homoiconicity" (Wikipedia), but others don't seem to think that alone is sufficient for a language to be homoiconic. But the terminology doesn't really matter. The point is that when a language can represent its own code, it needs a way to represent things like assignments, function calls, things that can be written as literal values, etc. It also needs a way to represent its own variables. I.e., you need a way to represent – as data – the foo on the left hand side of this:

foo == "foo"

Now we're getting to the heart of the matter: the difference between a symbol and a string is the difference between foo on the left hand side of that comparison and "foo" on the right hand side. On the left, foo is an identifier and it evaluates to the value bound to the variable foo in the current scope. On the right, "foo" is a string literal and it evaluates to the string value "foo". A symbol in both Lisp and Julia is how you represent a variable as data. A string just represents itself. You can see the difference by applying eval to them:

julia> eval(:foo)
ERROR: foo not defined

julia> foo = "hello"
"hello"

julia> eval(:foo)
"hello"

julia> eval("foo")
"foo"

What the symbol :foo evaluates to depends on what – if anything – the variable foo is bound to, whereas "foo" always just evaluates to "foo". If you want to construct expressions in Julia that use variables, then you're using symbols (whether you know it or not). For example:

julia> ex = :(foo = "bar")
:(foo = "bar")

julia> dump(ex)
Expr
  head: Symbol =
  args: Array{Any}((2,))
    1: Symbol foo
    2: String "bar"
  typ: Any

What that dumped out stuff shows, among other things, is that there's a :foo symbol object inside of the expression object you get by quoting the code foo = "bar". Here's another example, constructing an expression with the symbol :foo stored in the variable sym:

julia> sym = :foo
:foo

julia> eval(sym)
"hello"

julia> ex = :($sym = "bar"; 1 + 2)
:(begin
      foo = "bar"
      1 + 2
  end)

julia> eval(ex)
3

julia> foo
"bar"

If you try to do this when sym is bound to the string "foo", it won't work:

julia> sym = "foo"
"foo"

julia> ex = :($sym = "bar"; 1 + 2)
:(begin
      "foo" = "bar"
      1 + 2
  end)

julia> eval(ex)
ERROR: syntax: invalid assignment location ""foo""

It's pretty clear why this won't work: if you tried to write "foo" = "bar" by hand, that wouldn't work either.

This is the essence of a symbol: a symbol is used to represent a variable in metaprogramming. Once you have symbols as a data type, of course, it becomes tempting to use them for other things, like as hash keys. But that's an incidental, opportunistic usage of a data type that has another primary purpose.

Note that I stopped talking about Ruby a while back. That's because Ruby isn't homoiconic: Ruby doesn't represent its expressions as Ruby objects. So Ruby's symbol type is kind of a vestigial organ – a leftover adaptation, inherited from Lisp, but no longer used for its original purpose. Ruby symbols have been co-opted for other purposes – as hash keys, to pull methods out of method tables – but symbols in Ruby are not used to represent variables.

As to why symbols are used in DataFrames rather than strings, it's because it's a common pattern in DataFrames to bind column values to variables inside of user-provided expressions. So it's natural for column names to be symbols, since symbols are exactly what you use to represent variables as data. Currently, you have to write df[:foo] to access the foo column, but in the future, you may be able to access it as df.foo instead. When that becomes possible, only columns whose names are valid identifiers will be accessible with this convenient syntax.

See also:

  • https://docs.julialang.org/en/v1/manual/metaprogramming/
  • In what sense are languages like Elixir and Julia homoiconic?

When to use symbols in Ruby

Symbols, or "internals" as they're also referred to as, are useful for hash keys, common arguments, and other places where the overhead of having many, many duplicate strings with the same value is inefficient.

For example:

params[:name]
my_function(with: { arguments: [ ... ] })
record.state = :completed

These are generally preferable to strings because they will be repeated frequently.

The most common uses are:

  • Hash keys
  • Arguments to methods
  • Option flags or enum-type property values

It's better to use strings when handling user data of unknown composition. Unlike strings, which can be garbage-collected, symbols are permanent (before Ruby 2.2, at least, which made dynamically created symbols collectable). Converting arbitrary user data to symbols may fill up the symbol table with junk, and possibly crash your application if someone is being malicious.

For example:

user_data = JSON.load(...).symbolize_keys

This would allow an attacker to create JSON data with intentionally long, randomized names that, in time, would bloat your process with all kinds of useless junk.
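
A safer sketch along the same lines (Hash#slice and Hash#transform_keys need Ruby 2.5+; the key names are made up):

require 'json'

data = JSON.parse(raw_json)             # raw_json: the incoming payload; keys stay Strings
known = data.slice("name", "email")     # keep only the keys you expect
known = known.transform_keys(&:to_sym)  # symbolizing is now bounded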


