How to Know What Is Not Thread-Safe in Ruby

How do I know what is NOT thread-safe in Ruby?

None of the core data structures are thread safe. The only one I know of that ships with Ruby is the queue implementation in the standard library (require 'thread'; q = Queue.new).
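Here is a minimal producer/consumer sketch with Queue, assuming a modern Ruby where Queue is available without an explicit require. The names producer and consumer and the :done sentinel are made up for illustration. Both threads share the queue without any extra Mutex.

```ruby
# Producer/consumer sketch: Queue handles its own locking.
queue = Queue.new

producer = Thread.new do
  5.times { |i| queue << i }   # push work items
  queue << :done               # sentinel telling the consumer to stop
end

consumer = Thread.new do
  received = []
  loop do
    item = queue.pop           # blocks until an item is available
    break if item == :done
    received << item
  end
  received                     # becomes the thread's value
end

producer.join
p consumer.value  # => [0, 1, 2, 3, 4]
```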

MRI's GIL does not save us from thread safety issues. It only makes sure that two threads cannot run Ruby code at the same time, i.e. on two different CPUs at the exact same moment. Threads can still be paused and resumed at any point in your code. If you write code like @n = 0; 3.times { Thread.start { 100.times { @n += 1 } } }, i.e. mutating a shared variable from multiple threads, the value of the shared variable afterwards is not deterministic. The GIL is more or less a simulation of a single-core system; it does not change the fundamental issues of writing correct concurrent programs.
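The shared-counter snippet above can be made deterministic by guarding the increment with a Mutex and joining the threads before reading the result. The original one-liner also never joins its threads, so it may read @n before they have finished. A minimal sketch:

```ruby
@n = 0
lock = Mutex.new

threads = 3.times.map do
  # Each read-add-write of @n happens while holding the lock.
  Thread.start { 100.times { lock.synchronize { @n += 1 } } }
end
threads.each(&:join)   # wait for every thread before reading @n

puts @n  # => 300
```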

Even if MRI had been single-threaded like Node.js, you would still have to think about concurrency. The example with the incremented variable would work fine, but you can still get race conditions where things happen in non-deterministic order and one callback clobbers the result of another. Single-threaded asynchronous systems are easier to reason about, but they are not free from concurrency issues. Just think of an application with multiple users: if two users hit edit on a Stack Overflow post at more or less the same time, spend some time editing the post and then hit save, whose changes will be seen by a third user later when they read that same post?
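The lost-update scenario does not need threads at all. This sketch uses two Fibers on a single thread to play the two users (the post hash and the editor names are made up for illustration). Because fibers only switch where Fiber.yield is called, the interleaving here is fully deterministic, yet one user's change is still silently lost.

```ruby
post = { body: "original" }

# Each "user" reads the post, pauses while editing, then saves what they
# read plus their own change. Everything runs on one thread.
make_editor = lambda do |name|
  Fiber.new do
    draft = post[:body]                        # read
    Fiber.yield                                # pause: the other user runs now
    post[:body] = "#{draft} + #{name}'s edit"  # save, based on a stale read
  end
end

alice = make_editor.call("alice")
bob   = make_editor.call("bob")

alice.resume  # alice opens the post
bob.resume    # bob opens the same version
alice.resume  # alice saves
bob.resume    # bob saves and overwrites alice's change

puts post[:body]  # => original + bob's edit
```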

In Ruby, as in most other concurrent runtimes, anything that is more than one operation is not thread safe. @n += 1 is not thread safe, because it is multiple operations. @n = 1 is thread safe because it is one operation (it's lots of operations under the hood, and I would probably get into trouble if I tried to describe why it's "thread safe" in detail, but in the end you will not get inconsistent results from assignments). @n ||= 1 is not, and neither is any other shorthand operation-plus-assignment. One mistake I've made many times is writing return if @started; @started = true, which is not thread safe at all.
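A sketch of the usual fix for the @started check-then-act pattern, using a hypothetical Service class: the check and the assignment both happen while holding a Mutex, so only one caller can get past the guard. Mutex#synchronize releases the lock even when the block is left early with return.

```ruby
# Hypothetical Service: start should do its one-time work exactly once,
# even when many threads call it at the same time.
class Service
  attr_reader :start_count

  def initialize
    @lock = Mutex.new
    @started = false
    @start_count = 0   # how many times the one-time work actually ran
  end

  def start
    @lock.synchronize do
      return if @started   # check ...
      @started = true      # ... and act, both under the same lock
      @start_count += 1
    end
  end
end

service = Service.new
Array.new(10) { Thread.new { service.start } }.each(&:join)
puts service.start_count  # => 1
```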

I don't know of any authoritative list of thread safe and non-thread safe statements for Ruby, but there is a simple rule of thumb: if an expression only does one (side-effect free) operation, it is probably thread safe. For example: a + b is ok, a = b is also ok, and a.foo(b) is ok if the method foo is side-effect free (since just about anything in Ruby is a method call, even assignment in many cases, this goes for the other examples too). Side effects in this context are things that change state: def foo(x); @x = x; end is not side-effect free.

One of the hardest things about writing thread safe code in Ruby is that all core data structures, including array, hash and string, are mutable. It's very easy to accidentally leak a piece of your state, and when that piece is mutable things can get really screwed up. Consider the following code:

class Thing
  attr_reader :stuff

  def initialize(initial_stuff)
    @stuff = initial_stuff
    @state_lock = Mutex.new
  end

  def add(item)
    @state_lock.synchronize do
      @stuff << item
    end
  end
end

An instance of this class can be shared between threads and they can safely add things to it, but there's a concurrency bug (it's not the only one): the internal state of the object leaks through the stuff accessor. Besides being problematic from the encapsulation perspective, it also opens up a can of concurrency worms. Maybe someone takes that array and passes it on to somewhere else, and that code in turn thinks it now owns that array and can do whatever it wants with it.
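One way to stop the leak, sketched as a hypothetical SaferThing variant of the class above: copy the array on the way in, so the caller's array is not shared, and hand out a frozen snapshot on the way out instead of the internal array.

```ruby
class SaferThing
  def initialize(initial_stuff)
    @stuff = initial_stuff.dup   # our own copy; the caller keeps theirs
    @state_lock = Mutex.new
  end

  def add(item)
    @state_lock.synchronize { @stuff << item }
    self
  end

  # Return a frozen snapshot instead of the internal array.
  def stuff
    @state_lock.synchronize { @stuff.dup.freeze }
  end
end

thing = SaferThing.new([1])
snapshot = thing.stuff
thing.add(2)
p snapshot          # => [1]  (the snapshot did not change)
p thing.stuff       # => [1, 2]
p snapshot.frozen?  # => true, so callers cannot modify it in place
```

Note that dup and freeze are shallow: if the items themselves are mutable (strings, hashes), callers can still modify those objects.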

Another classic Ruby example is this:

STANDARD_OPTIONS = {:color => 'red', :count => 10}

def find_stuff
  @some_service.load_things('stuff', STANDARD_OPTIONS)
end

find_stuff works fine the first time it's used, but returns something else the second time. Why? The load_things method happens to think it owns the options hash passed to it, and does color = options.delete(:color). Now the STANDARD_OPTIONS constant doesn't have the same value anymore. Constants are only constant in what they reference, they do not guarantee the constancy of the data structures they refer to. Just think what would happen if this code was run concurrently.
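Two defensive habits help here, sketched with a hypothetical load_things stand-in (not the real service method from the example): freeze constants that hold mutable data, and have callees copy an options hash before consuming it. With the constant frozen, a careless callee that mutates it fails immediately instead of corrupting the shared value.

```ruby
STANDARD_OPTIONS = { color: 'red', count: 10 }.freeze

# Hypothetical callee that consumes options, working on its own copy.
def load_things(kind, options)
  options = options.dup          # dup of a frozen hash is not frozen
  color = options.delete(:color)
  [kind, color, options]
end

first  = load_things('stuff', STANDARD_OPTIONS)
second = load_things('stuff', STANDARD_OPTIONS)
p first == second           # => true
p STANDARD_OPTIONS[:color]  # => "red"

# Mutating the frozen constant directly is refused.
refused = begin
  STANDARD_OPTIONS.delete(:color)
  false
rescue RuntimeError          # FrozenError (Ruby 2.5+) is a RuntimeError subclass
  true
end
p refused  # => true
```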

If you avoid shared mutable state (e.g. instance variables in objects accessed by multiple threads, or data structures like hashes and arrays accessed by multiple threads), thread safety isn't so hard. Try to minimize the parts of your application that are accessed concurrently, and focus your efforts there. IIRC, in a Rails application a new controller object is created for every request, so it is only going to be used by a single thread, and the same goes for any model objects you create from that controller. However, Rails also encourages the use of global variables: User.find(...) uses the global variable User. You may think of it as only a class, and it is a class, but it is also a namespace for global variables. Some of these are safe because they are read-only, but sometimes you save things in these global variables because it is convenient. Be very careful when you use anything that is globally accessible.
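A minimal sketch of avoiding shared mutable state altogether: each thread fills its own local array and returns it, and the results are only combined after the threads finish. Thread#value waits for the thread to complete and returns the value of its block, so no Mutex is needed.

```ruby
threads = 4.times.map do |t|
  Thread.new do
    local = []                                 # owned by this thread only
    1_000.times { |i| local << t * 1_000 + i }
    local                                      # becomes the thread's value
  end
end

# Combine only after each thread is done; nothing was shared while running.
all = threads.flat_map(&:value)
puts all.size  # => 4000
```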

It's been possible to run Rails in threaded environments for quite a while now, so without being a Rails expert I would still go so far as to say that you don't have to worry about thread safety when it comes to Rails itself. You can still create Rails applications that aren't thread safe by doing some of the things I mention above. When it comes to other gems, assume that they are not thread safe unless they say that they are, and if they say that they are, assume that they are not and look through their code. Just because you see that they do things like @n ||= 1 does not mean that they are not thread safe; that's a perfectly legitimate thing to do in the right context. Instead, look for things like mutable state in global variables, how the gem handles mutable objects passed to its methods, and especially how it handles options hashes.

Finally, being thread unsafe is a transitive property. Anything that uses something that is not thread safe is itself not thread safe.

How can I check for non-threadsafe code?

There's no automatic way to do this. I would write a spec/test for each code block that accesses shared resources - global variables and resources that should be modified with an exclusive lock.
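One way to write such a test, sketched without RSpec so it runs on its own: a hypothetical run_concurrently helper releases many threads at roughly the same moment, runs a block repeatedly in each, and the caller then checks an invariant. A passing run does not prove the code is thread safe, since races depend on timing, but a failing run proves it is not.

```ruby
# Hypothetical helper: run the same block from many threads at once.
def run_concurrently(thread_count: 10, iterations: 1_000, &block)
  gate = Queue.new
  threads = Array.new(thread_count) do
    Thread.new do
      gate.pop                         # wait until all threads exist
      iterations.times { block.call }
    end
  end
  thread_count.times { gate << :go }  # release every thread together
  threads.each(&:join)                # join re-raises a thread's exception
end

counter = 0
lock = Mutex.new
run_concurrently { lock.synchronize { counter += 1 } }
raise "lost updates: #{counter}" unless counter == 10_000
puts counter  # => 10000
```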

If you find yourself suspecting too much code then there are 2 options:

You're doing something wrong and leading to concurrency with no good reason. Refactor ASAP!
You're explicitly specifying in your tests that the code under test is designed and tested for concurrency.

Which is a win-win.

Ruby please give a simple NON-thread safe example

Here is an example that, instead of finding a race condition with addition, concatenation, or something like that, uses a blocking file write.

To summarize the parts:

file_write method performs a blocking write for 2 seconds.
file_read reads the file and assigns it to a global variable to be referenced elsewhere.
NonThreadsafe#test calls these methods in succession, each in its own thread, without a mutex. sleep 0.2 is inserted between the calls to ensure that the blocking file write has begun by the time the read is attempted. join is called on the second thread, so we can be sure it has stored the read value in the global variable. It returns the read value from the global variable.
Threadsafe#test does the same thing, but wraps each method call in a mutex.

Here it is:

module FileMethods
  def file_write(text)
    File.open("asd", "w") do |f|
      f.write text
      sleep 2
    end
  end
  def file_read
    $read_val = File.read "asd"
  end
end

class NonThreadsafe
  include FileMethods
  def test
    File.write("asd", "")  # start from an empty file (portable replacement for rm + touch)
    Thread.new { file_write("hello") }
    sleep 0.2
    Thread.new { file_read }.join
    $read_val
  end
end

class Threadsafe
  include FileMethods
  def test
    File.write("asd", "")  # start from an empty file (portable replacement for rm + touch)
    semaphore = Mutex.new
    Thread.new { semaphore.synchronize { file_write "hello" } }
    sleep 0.2
    Thread.new { semaphore.synchronize { file_read } }.join
    $read_val
  end
end

And tests:

expect(NonThreadsafe.new.test).to be_empty
expect(Threadsafe.new.test).to eq("hello")

As for an explanation: the non-thread-safe version reads the file as empty because the blocking write is still in progress when the read takes place. Opening the file with "w" truncates it, and the "hello" is not written out until the file is closed at the end of the block, after the sleep. When you synchronize on the Mutex, though, the write completes before the read. Note also that the .join in the thread-safe example takes longer than in the non-thread-safe one; that's because the read thread has to wait for the full sleep in the write thread before it can acquire the lock.
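Both classes above rely on sleep 0.2 to make the writer start first. A sketch that removes that timing assumption: the writer signals through a Queue once the file has been closed, and the reader blocks on that Queue before reading. The temporary file name is made up for this example.

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, "thread_demo_#{Process.pid}.txt")  # hypothetical file name
File.write(path, "")
done = Queue.new

writer = Thread.new do
  File.open(path, "w") do |f|
    f.write "hello"
    sleep 0.5              # simulate a slow write
  end                      # the file is closed (and flushed) here
  done << :written         # signal only after the write is complete
end

reader = Thread.new do
  done.pop                 # blocks until the writer has signalled
  File.read(path)
end

result = reader.value
writer.join
File.delete(path)
puts result  # => hello
```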

Is access to ruby Array thread-safe?

but will Ruby actually guarantee thread safety in this case

Ruby does not have a defined memory model, so there are no guarantees of any kind.

YARV has a Giant VM Lock which prevents multiple Ruby threads from running at the same time, which gives some implicit guarantees, but this is a private, internal implementation detail of YARV. For example, TruffleRuby, JRuby, and Rubinius can run multiple Ruby threads in parallel.

Since there is no specification of what the behavior should be, any Ruby implementation is free to do whatever it wants. Most commonly, Ruby implementors try to mimic the behavior of YARV, but even that is not well-defined. In YARV, data structures are generally not thread-safe, so if you want to mimic the behavior of YARV, do you make all your data structures not thread-safe? But in YARV, also multiple threads cannot run at the same time, so in a lot of cases, operations are implicitly thread-safe, so if you want to mimic YARV, should you make your data structures thread-safe?

Or, in order to mimic YARV, should you prevent multiple threads from running at the same time? But being able to run multiple threads in parallel is actually one of the reasons why people choose, for example, JRuby over YARV.

As you can see, this is very much not a trivial question.

The best solution is to verify the behavior of each Ruby implementation separately. Actually, that is the second best solution.

The best solution is to use something like the concurrent-ruby Gem where someone else has already done the work of verifying the behavior of each Ruby implementation for you. The concurrent-ruby maintainers have a close relationship with several Ruby implementations (Chris Seaton, one of the two lead maintainers of concurrent-ruby, is also the lead developer of TruffleRuby, a JRuby core developer, and a member of ruby-core, for example), and so you can generally be certain that everything that is in concurrent-ruby is safe on all supported Ruby implementations (currently YARV, JRuby, and TruffleRuby).

Concurrent Ruby has a Concurrent::Array class which is thread-safe. You can see how it is implemented here: https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/array.rb As you can see, for YARV, Concurrent::Array is actually the same as ::Array, but for other implementations, more work is required.
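If you cannot add the dependency, the general idea of a thread-safe collection can be sketched with a Mutex around every operation. SynchronizedArray is a hypothetical name, this is not how concurrent-ruby implements Concurrent::Array, and it only covers the few methods shown.

```ruby
# Hypothetical SynchronizedArray: serializes every access through one Mutex.
class SynchronizedArray
  def initialize
    @array = []
    @lock = Mutex.new
  end

  def <<(item)
    @lock.synchronize { @array << item }
    self
  end

  def size
    @lock.synchronize { @array.size }
  end

  # Return a copy so callers never touch the internal array directly.
  def to_a
    @lock.synchronize { @array.dup }
  end
end

list = SynchronizedArray.new
Array.new(10) { |t| Thread.new { 1_000.times { |i| list << [t, i] } } }.each(&:join)
puts list.size  # => 10000
```

Each call is atomic on its own, but a compound operation such as "push unless already present" still needs one lock held across both steps.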

The concurrent-ruby developers are also working on specifying Ruby's memory model, so that in the future, both programmers know what to expect and what not to expect, and implementors know what they are allowed to optimize and what they aren't.

Thread-safety for hashes in Ruby

No, you cannot rely on Hashes being thread safe, because they aren't built to be thread safe, most probably for performance reasons. In order to overcome these limitations of the standard library, Gems have been created which provide thread safe (concurrent-ruby) or immutable (hamster) data structures. These will make accessing the data thread safe, but your code has a different problem in addition to that:

Your output will not be deterministic; in fact, I tried your code a few times and once I got 544988 as the result. In your code, a classical race condition can occur because there are separate reading and writing steps involved (i.e. they are not atomic). Consider the expression h[0] ||= 0, which basically translates to h[0] || h[0] = 0. Now, it is easy to construct a case where a race condition occurs:

thread 1 reads h[0] and finds it is nil
thread 2 reads h[0] and finds it is nil
thread 1 sets h[0] = 0 and increments h[0] += 1
thread 2 sets h[0] = 0 and increments h[0] += 1
the resulting hash is {0=>1} although the correct result would be {0=>2}

If you want to make sure that your data will not be corrupted, you can lock the operation with a mutex:

require 'thread'
semaphore = Mutex.new

h = {}

threads = 10.times.map do
  Thread.start do
    semaphore.synchronize do
      100000.times { h[0] ||= 0; h[0] += 1 }
    end
  end
end

threads.each(&:join) # wait for all threads before reading h
puts h[0]            # => 1000000

NOTE: An earlier version of this answer mentioned the 'thread_safe' gem. 'thread_safe' has been deprecated since February 2017 and became part of the 'concurrent-ruby' gem. Use that one instead.

Is ||= in Ruby thread safe?

It depends on the implementation. Be aware that x ||= y expands to x || x = y, and that x = y is only executed if x is either false or nil.
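Because x ||= y is a read followed by a conditional write, memoizing something slow with ||= is a classic race. This sketch, using a hypothetical LazyConfig class, counts how many times the slow loader runs when five threads ask for the value at once, first with a bare ||= and then with the same expression inside Mutex#synchronize. The sleep only widens the race window so it shows up reliably.

```ruby
# Hypothetical LazyConfig: memoizes a slow load with ||=.
class LazyConfig
  def initialize
    @value = nil
    @load_calls = Queue.new   # Queue is thread safe, so it can count calls
    @lock = Mutex.new
  end

  def load_calls
    @load_calls.size
  end

  # Racy: every thread that checks @value while slow_load is sleeping sees nil.
  def racy_value
    @value ||= slow_load
  end

  # Safe: the nil check and the assignment both happen while holding the lock.
  def safe_value
    @lock.synchronize { @value ||= slow_load }
  end

  private

  def slow_load
    @load_calls << :called
    sleep 0.1                 # other threads get scheduled here
    { loaded: true }
  end
end

racy = LazyConfig.new
Array.new(5) { Thread.new { racy.racy_value } }.each(&:join)
puts racy.load_calls  # typically 5; anything above 1 shows the race

safe = LazyConfig.new
Array.new(5) { Thread.new { safe.safe_value } }.each(&:join)
puts safe.load_calls  # => 1
```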

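With that said, in the C implementation of Ruby (MRI/YARV), the global interpreter lock makes a simple x ||= y much less likely to be interrupted halfway, but that is an implementation detail rather than a guarantee: if y is a method call, another thread can still run between the check and the assignment.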

YARV implements Ruby threads as native threads, which the operating system could run in parallel. However, in order to maintain backward compatibility, a global, interpreter-wide lock was introduced, so only one thread executes Ruby code at any given moment.

JRuby, however, imposes no internal locking on your code, so you must manually synchronize your calls when needed.

See another answer I've given about the subject for more details. Also, read this excellent answer by Jörg W Mittag for a more in-depth look at the threading models of the various Ruby implementations.

Is Ruby Thread-Safe by default?

Finally, I found a way to prove that it will not always result in 100000 in irb.

Running the following code gave me the idea:

100.times do
  i = 0
  1000.times do
    Thread.start { 100.times { i += 1 } }
  end
  puts i
end

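I see different values most of the time, mostly ranging from 91k to 100000. Note that the threads are never joined before puts i, so part of this variation comes from printing i before every thread has finished.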

In Ruby, why `while true do i += 1 end` is not thread safe?

According to this post, i += 1 is thread safe in MRI

Not quite. The blog post states that method invocations are effectively thread-safe in MRI.

The abbreviated assignment i += 1 is syntactic sugar for:

i = i + 1

So we have an assignment i = ... and a method call i + 1. According to the blog post, the latter is thread-safe. But it also says that a thread-switch can occur right before returning the method's result, i.e. before the result is re-assigned to i:

i = i + 1
#  ^
# here

Unfortunately, this isn't easy to demonstrate from within Ruby.

We can however hook into Integer#+ and randomly ask the thread scheduler to pass control to another thread:

module Mayhem
  def +(other)
    Thread.pass if rand < 0.5
    super
  end
end

If MRI ensures thread-safety for the whole i += 1 statement, the above shouldn't have any effect. But it does:

Integer.prepend(Mayhem)

10.times do
  i = 0
  Array.new(10) { Thread.new { i += 1 } }.each(&:join)
  puts i
end

If you want thread-safe code, don't rely on implementation details (those can change). In the above example, you could wrap the sensitive part in a Mutex#synchronize call:

Integer.prepend(Mayhem)

m = Mutex.new

10.times do
  i = 0
  Array.new(10) { Thread.new { m.synchronize { i += 1 } } }.each(&:join)
  puts i
end
