Is ||= in Ruby Thread Safe

Is ||= in Ruby thread safe?

It depends on the implementation. Be aware that x ||= y expands to x || x = y, and that x = y is only executed if x is either false or nil.

With that said, the C implementation of the Ruby language should be completely thread safe.

YARV uses native threads in order to implement concurrency, which do run in true parallel. However, in order to maintain backward compatibility, a global, interpreter-wide lock was introduced.

JRuby, however, imposes no internal locking on your code, so you must manually synchronize your calls when needed.

See another answer I've given about the subject for more details. Also, read this excellent answer by Jörg W Mittag for a more in-depth look at the threading models of the various Ruby implementations.

how to know what is NOT thread-safe in ruby?

None of the core data structures are thread safe. The only one I know of that ships with Ruby is the queue implementation in the standard library (require 'thread'; q = Queue.new).

MRI's GIL does not save us from thread safety issues. It only makes sure that two threads cannot run Ruby code at the same time, i.e. on two different CPUs at the exact same time. Threads can still be paused and resumed at any point in your code. If you write code like @n = 0; 3.times { Thread.start { 100.times { @n += 1 } } } e.g. mutating a shared variable from multiple threads, the value of the shared variable afterwards is not deterministic. The GIL is more or less a simulation of a single core system, it does not change the fundamental issues of writing correct concurrent programs.

Even if MRI had been single-threaded like Node.js you would still have to think about concurrency. The example with the incremented variable would work fine, but you can still get race conditions where things happen in non-deterministic order and one callback clobbers the result of another. Single threaded asynchronous systems are easier to reason about, but they are not free from concurrency issues. Just think of an application with multiple users: if two users hit edit on a Stack Overflow post at more or less the same time, spend some time editing the post and then hit save, whose changes will be seen by a third user later when they read that same post?

In Ruby, as in most other concurrent runtimes, anything that is more than one operation is not thread safe. @n += 1 is not thread safe, because it is multiple operations. @n = 1 is thread safe because it is one operation (it's lots of operations under the hood, and I would probably get into trouble if I tried to describe why it's "thread safe" in detail, but in the end you will not get inconsistent results from assignments). @n ||= 1, is not and no other shorthand operation + assignment is either. One mistake I've made many times is writing return unless @started; @started = true, which is not thread safe at all.

I don't know of any authoritative list of thread safe and non-thread safe statements for Ruby, but there is a simple rule of thumb: if an expression only does one (side-effect free) operation it is probably thread safe. For example: a + b is ok, a = b is also ok, and a.foo(b) is ok, if the method foo is side-effect free (since just about anything in Ruby is a method call, even assignment in many cases, this goes for the other examples too). Side-effects in this context means things that change state. def foo(x); @x = x; end is not side-effect free.

One of the hardest things about writing thread safe code in Ruby is that all core data structures, including array, hash and string, are mutable. It's very easy to accidentally leak a piece of your state, and when that piece is mutable things can get really screwed up. Consider the following code:

class Thing
attr_reader :stuff

def initialize(initial_stuff)
@stuff = initial_stuff
@state_lock = Mutex.new
end

def add(item)
@state_lock.synchronize do
@stuff << item
end
end
end

A instance of this class can be shared between threads and they can safely add things to it, but there's a concurrency bug (it's not the only one): the internal state of the object leaks through the stuff accessor. Besides being problematic from the encapsulation perspective, it also opens up a can of concurrency worms. Maybe someone takes that array and passes it on to somewhere else, and that code in turn thinks it now owns that array and can do whatever it wants with it.

Another classic Ruby example is this:

STANDARD_OPTIONS = {:color => 'red', :count => 10}

def find_stuff
@some_service.load_things('stuff', STANDARD_OPTIONS)
end

find_stuff works fine the first time it's used, but returns something else the second time. Why? The load_things method happens to think it owns the options hash passed to it, and does color = options.delete(:color). Now the STANDARD_OPTIONS constant doesn't have the same value anymore. Constants are only constant in what they reference, they do not guarantee the constancy of the data structures they refer to. Just think what would happen if this code was run concurrently.

If you avoid shared mutable state (e.g. instance variables in objects accessed by multiple threads, data structures like hashes and arrays accessed by multiple threads) thread safety isn't so hard. Try to minimize the parts of your application that are accessed concurrently, and focus your efforts there. IIRC, in a Rails application, a new controller object is created for every request, so it is only going to get used by a single thread, and the same goes for any model objects you create from that controller. However, Rails also encourages the use of global variables (User.find(...) uses the global variable User, you may think of it as only a class, and it is a class, but it is also a namespace for global variables), some of these are safe because they are read only, but sometimes you save things in these global variables because it is convenient. Be very careful when you use anything that is globally accessible.

It's been possible to run Rails in threaded environments for quite a while now, so without being a Rails expert I would still go so far as to say that you don't have to worry about thread safety when it comes to Rails itself. You can still create Rails applications that aren't thread safe by doing some of the things I mention above. When it comes other gems assume that they are not thread safe unless they say that they are, and if they say that they are assume that they are not, and look through their code (but just because you see that they go things like @n ||= 1 does not mean that they are not thread safe, that's a perfectly legitimate thing to do in the right context -- you should instead look for things like mutable state in global variables, how it handles mutable objects passed to its methods, and especially how it handles options hashes).

Finally, being thread unsafe is a transitive property. Anything that uses something that is not thread safe is itself not thread safe.

Ruby thread safe thread creation

You need a mutex. Essentially the only thing the GIL protects you from is accessing uninitialized memory. If something in Ruby could be well-defined without being atomic, you should not assume it is atomic.

A simple example to show that your example ordering is possible. I get the "double set" message every time I run it:

$global = nil
$thread = nil

threads = []

threads = Array.new(1000) do
Thread.new do
sleep 1
$thread ||= Thread.new do
if $global
warn "double set!"
else
$global = true
end
end
end
end

threads.each(&:join)

Thread-safety for hashes in Ruby

No, you cannot rely on Hashes being thread safe, because they aren't built to be thread safe, most probably for performance reasons. In order to overcome these limitations of the standard library, Gems have been created which provide thread safe (concurrent-ruby) or immutable (hamster) data structures. These will make accessing the data thread safe, but your code has a different problem in addition to that:

Your output will not be deterministic; in fact, I tried you code a few times and once I got 544988 as result. In your code, a classical race condition can occur because there are separate reading and writing steps involved (i.e. they are not atomic). Consider the expression h[0] ||= 0, which basically translates to h[0] || h[0] = 0. Now, it is easy to construct a case where a race condition occurs:

  • thread 1 reads h[0] and finds it is nil
  • thread 2 reads h[0] and finds it is nil
  • thread 1 sets h[0] = 0 and increments h[0] += 1
  • thread 2 sets h[0] = 0 and increments h[0] += 1
  • the resulting hash is {0=>1} although the correct result would be {0=>2}

If you want to make sure that your data will not be corrupted, you can lock the operation with a mutex:

require 'thread'
semaphore = Mutex.new

h = {}

10.times do
Thread.start do
semaphore.synchronize do
100000.times {h[0] ||= 0; h[0] += 1;}
end
end
end

NOTE: An earlier version of this answer mentioned the 'thread_safe' gem. 'thread_safe' is deprecated since Feb 2017, becoming part of the 'concurrent-ruby' gem. Use that one instead.

Is access to ruby Array thread-safe?

but will Ruby actually guarantee thread safety in this case

Ruby does not have a defined memory model, so there are no guarantees of any kind.

YARV has a Giant VM Lock which prevents multiple Ruby threads from running at the same time, which gives some implicit guarantees, but this is a private, internal implementation detail of YARV. For example, TruffleRuby, JRuby, and Rubinius can run multiple Ruby threads in parallel.

Since there is no specification of what the behavior should be, any Ruby implementation is free to do whatever they want. Most commonly, Ruby implementors try to mimic the behavior of YARV, but even that is not well-defined. In YARV, data structures are generally not thread-safe, so if you want to mimic the behavior of YARV, do you make all your data structures not thread-safe? But in YARV, also multiple threads cannot run at the same time, so in a lot of cases, operations are implicitly thread-safe, so if you want to mimic YARV, should you make your data structures thread-safe?

Or, in order to mimic YARV, should you prevent multiple threads from running at the same time? But, being able to run multiple threads in parallel is actually one of the reasons why people choose, for example JRuby over YARV.

As you can see, this is very much not a trivial question.

The best solution is to verify the behavior of each Ruby implementation separately. Actually, that is the second best solution.

The best solution is to use something like the concurrent-ruby Gem where someone else has already done the work of verifying the behavior of each Ruby implementation for you. The concurrent-ruby maintainers have a close relationship with several Ruby implementations (Chris Seaton, one of the two lead maintainers of concurrent-ruby is also the lead developer of TruffleRuby, a JRuby core developer, and a member of ruby-core, for example), and so you can generally be certain that everything that is in concurrent-ruby is safe on all supported Ruby implementations (currently YARV, JRuby, and TruffleRuby).

Concurrent Ruby has a Concurrent::Array class which is thread-safe. You can see how it is implemented here: https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/array.rb As you can see, for YARV, Concurrent::Array is actually the same as ::Array, but for other implementations, more work is required.

The concurrent-ruby developers are also working on specifying Ruby's memory model, so that in the future, both programmers know what to expect and what not to expect, and implementors know what they are allowed to optimize and what they aren't.

Thread Safety: Class Variables in Ruby

Instance variables are not thread safe (and class variables are even less thread safe)

Example 2 and 3, both with instance variables, are equivalent, and they are NOT thread safe, like @VincentXie stated. However, here is a better example to demonstrate why they are not:

class Foo
def self.bar(message)
@bar ||= message
end
end

t1 = Thread.new do
puts "bar is #{Foo.bar('thread1')}"
end

t2 = Thread.new do
puts "bar is #{Foo.bar('thread2')}"
end

sleep 2

t1.join
t2.join

=> bar is thread1
=> bar is thread1

Because the instance variable is shared amongst all of the threads, like @VincentXie stated in his comment.

PS: Instance variables are sometimes referred to as "class instance variables", depending on the context in which they are used:

When self is a class, they are instance variables of classes(class
instance variables). When self is a object, they are instance
variables of objects(instance variables). - WindorC's answer to a question about this

Safety of Thread.current[] usage in rails

There is not an specific reason to stay away from thread-local variables, the main issues are:

  • it's harder to test them, as you will have to remember to set the thread-local variables when you're testing out code that uses it
  • classes that use thread locals will need knowledge that these objects are not available to them but inside a thread-local variable and this kind of indirection usually breaks the law of demeter
  • not cleaning up thread-locals might be an issue if your framework reuses threads (the thread-local variable would be already initiated and code that relies on ||= calls to initialize variables might fail

So, while it's not completely out of question to use, the best approach is not to use them, but from time to time you hit a wall where a thread local is going to be the simplest possible solution without changing quite a lot of code and you will have to compromise, have a less than perfect object oriented model with the thread local or changing quite a lot of code to do the same.

So, it's mostly a matter of thinking which is going to be the best solution for your case and if you're really going down the thread-local path, I'd surely advise you to do it with blocks that remember to clean up after they are done, like the following:

around_filter :do_with_current_user

def do_with_current_user
Thread.current[:current_user] = self.current_user
begin
yield
ensure
Thread.current[:current_user] = nil
end
end

This ensures the thread local variable is cleaned up before being used if this thread is recycled.



Related Topics



Leave a reply



Submit