How to Deploy a Threadsafe Asynchronous Rails App

Rack concurrency - rack.multithread, async.callback, or both?

Note: I use Thin as a synonym for all web servers implementing the async Rack extension (e.g. Rainbows!, Ebb, future versions of Puma, ...)

Q1. Correct. It will wrap the response generation (aka call) in EventMachine.defer { ... }, which will cause EventMachine to push it onto its built-in thread pool.

Q2. Using async.callback in conjunction with EM.defer does not make much sense, as it would basically use the thread pool, too, ending up with a construct similar to the one described in Q1. Using async.callback makes sense when you only use EventMachine libraries for IO. Thin will send the response to the client once env['async.callback'] is called with a normal Rack response as its argument.

If the body is an EM::Deferrable, Thin will not close the connection until that deferrable succeeds. A rather well-kept secret: if you want more than just long polling (i.e. keeping the connection open after sending a partial response), you can also return an EM::Deferrable as the body object directly, without having to use throw :async or a status code of -1.
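For illustration, here is a minimal sketch of an evented endpoint along those lines (the class name is made up, and the one-second timer stands in for any EventMachine-based IO such as an HTTP or database call):

require 'eventmachine'

class DeferredHello
  def call(env)
    EM.add_timer(1) do
      # Once the evented work is done, hand Thin a normal Rack response.
      env['async.callback'].call([200, {'Content-Type' => 'text/plain'}, ["hello, later\n"]])
    end
    # Tell Thin the response will arrive later via async.callback.
    throw :async # equivalently: [-1, {}, []]
  end
end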

Q3. You're guessing correctly. Threaded serving might improve throughput for an otherwise unchanged Rack application. I see a 20% improvement for simple Sinatra applications on my machine with Ruby 1.9.3, and even more when running on Rubinius or JRuby, where all cores can be utilized. The second approach is useful if you write your application in an evented manner.

You can throw a lot of magic and hacks on top of Rack to have a non-evented application make use of those mechanisms (see em-synchrony or sinatra-synchrony), but that will leave you in debugging and dependency hell.

The async approach makes real sense for applications that are best solved in an evented way, like a web chat. However, I would not recommend the threaded approach for implementing long polling, because every polling connection blocks a thread. This leaves you with either a ton of threads or connections you can't deal with. EM's thread pool has a size of 20 threads by default, limiting you to 20 waiting connections per process.
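If you do go the EM.defer route anyway, the pool size can be adjusted before the reactor starts; note that tuning it is a workaround, not a fix for blocked threads:

require 'eventmachine'

# Default is 20; raise it before EM.run if deferred work keeps the pool saturated.
EM.threadpool_size = 100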

You could use a server that creates a new thread for every incoming connection, but creating threads is expensive (except on MacRuby, but I would not use MacRuby in any production app). Examples are serv and net-http-server. Ideally, what you want is an n:m mapping of requests and threads. But there's no server out there offering that.

If you want to learn more on the topic: I gave a presentation about this at Rocky Mountain Ruby (and a ton of other conferences). A video recording can be found on confreaks.

How to know what is NOT thread-safe in Ruby?

None of the core data structures are thread safe. The only one I know of that ships with Ruby is the queue implementation in the standard library (require 'thread'; q = Queue.new).
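A quick sketch of that thread-safe queue in action:

require 'thread' # Queue ships with the standard library

queue = Queue.new
producer = Thread.new { 5.times { |i| queue.push(i) } }
consumer = Thread.new { 5.times { puts queue.pop } } # pop blocks until an item arrives
[producer, consumer].each(&:join)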

MRI's GIL does not save us from thread safety issues. It only makes sure that two threads cannot run Ruby code at the same time, i.e. on two different CPUs at the exact same time. Threads can still be paused and resumed at any point in your code. If you write code like @n = 0; 3.times { Thread.start { 100.times { @n += 1 } } }, i.e. mutate a shared variable from multiple threads, the value of the shared variable afterwards is not deterministic. The GIL is more or less a simulation of a single-core system; it does not change the fundamental issues of writing correct concurrent programs.
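The same race, blown up a little so it is easier to observe; the final value is not guaranteed to be 1,000,000 on any implementation, because += is a read, an add, and a write:

@n = 0
threads = 10.times.map do
  Thread.new { 100_000.times { @n += 1 } }
end
threads.each(&:join)
puts @n # not guaranteed to be 1_000_000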

Even if MRI had been single-threaded like Node.js you would still have to think about concurrency. The example with the incremented variable would work fine, but you can still get race conditions where things happen in non-deterministic order and one callback clobbers the result of another. Single threaded asynchronous systems are easier to reason about, but they are not free from concurrency issues. Just think of an application with multiple users: if two users hit edit on a Stack Overflow post at more or less the same time, spend some time editing the post and then hit save, whose changes will be seen by a third user later when they read that same post?

In Ruby, as in most other concurrent runtimes, anything that is more than one operation is not thread safe. @n += 1 is not thread safe, because it is multiple operations. @n = 1 is thread safe because it is one operation (it's lots of operations under the hood, and I would probably get into trouble if I tried to describe why it's "thread safe" in detail, but in the end you will not get inconsistent results from assignments). @n ||= 1 is not, and neither is any other shorthand operation combined with assignment. One mistake I've made many times is writing return unless @started; @started = true, which is not thread safe at all.
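As a sketch (the class and method names are made up), here is that check-then-set mistake made atomic with a Mutex, so that only one thread can ever see @started as false:

class Starter
  def initialize
    @started = false
    @start_lock = Mutex.new
  end

  def start
    @start_lock.synchronize do
      return if @started # check ...
      @started = true    # ... and set, now as one atomic step under the lock
    end
    do_work
  end

  def do_work
    # ...
  end
end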

I don't know of any authoritative list of thread safe and non-thread safe statements for Ruby, but there is a simple rule of thumb: if an expression only does one (side-effect free) operation it is probably thread safe. For example: a + b is ok, a = b is also ok, and a.foo(b) is ok, if the method foo is side-effect free (since just about anything in Ruby is a method call, even assignment in many cases, this goes for the other examples too). Side-effects in this context means things that change state. def foo(x); @x = x; end is not side-effect free.

One of the hardest things about writing thread safe code in Ruby is that all core data structures, including array, hash and string, are mutable. It's very easy to accidentally leak a piece of your state, and when that piece is mutable things can get really screwed up. Consider the following code:

class Thing
  attr_reader :stuff

  def initialize(initial_stuff)
    @stuff = initial_stuff
    @state_lock = Mutex.new
  end

  def add(item)
    @state_lock.synchronize do
      @stuff << item
    end
  end
end

An instance of this class can be shared between threads and they can safely add things to it, but there's a concurrency bug (and it's not the only one): the internal state of the object leaks through the stuff accessor. Besides being problematic from the encapsulation perspective, it also opens up a can of concurrency worms. Maybe someone takes that array and passes it on to somewhere else, and that code in turn thinks it now owns that array and can do whatever it wants with it.
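One way to plug that particular leak (just a sketch, not the only possible design) is to hand out a copy of the array, taken under the same lock that guards mutation:

class Thing
  def initialize(initial_stuff)
    @stuff = initial_stuff.dup # defensively copy what the caller handed us, too
    @state_lock = Mutex.new
  end

  def add(item)
    @state_lock.synchronize { @stuff << item }
  end

  def stuff
    @state_lock.synchronize { @stuff.dup } # callers get a snapshot, not the internal array
  end
end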

Another classic Ruby example is this:

STANDARD_OPTIONS = {:color => 'red', :count => 10}

def find_stuff
  @some_service.load_things('stuff', STANDARD_OPTIONS)
end

find_stuff works fine the first time it's used, but returns something else the second time. Why? The load_things method happens to think it owns the options hash passed to it, and does color = options.delete(:color). Now the STANDARD_OPTIONS constant doesn't have the same value anymore. Constants are only constant in what they reference, they do not guarantee the constancy of the data structures they refer to. Just think what would happen if this code was run concurrently.
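A defensive variant of that example: freezing the hash makes the sneaky delete raise immediately instead of silently corrupting the constant, and passing a dup keeps the callee from mutating shared state in the first place:

STANDARD_OPTIONS = {:color => 'red', :count => 10}.freeze

def find_stuff
  # load_things gets its own copy to mutate as it pleases
  @some_service.load_things('stuff', STANDARD_OPTIONS.dup)
end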

If you avoid shared mutable state (e.g. instance variables in objects accessed by multiple threads, or data structures like hashes and arrays accessed by multiple threads), thread safety isn't so hard. Try to minimize the parts of your application that are accessed concurrently, and focus your efforts there. IIRC, in a Rails application a new controller object is created for every request, so it is only going to be used by a single thread, and the same goes for any model objects you create from that controller. However, Rails also encourages the use of global variables (User.find(...) uses the global variable User; you may think of it as only a class, and it is a class, but it is also a namespace for global variables). Some of these are safe because they are read-only, but sometimes you save things in these global variables because it is convenient. Be very careful when you use anything that is globally accessible.
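A sketch of that "convenient global" trap (the cache here is made up, not something Rails does): a class-level instance variable on a model turns the class into shared mutable state.

class User < ActiveRecord::Base
  def self.cached_find(id)
    @cache ||= {}           # lives on the User class itself: shared by every thread
    @cache[id] ||= find(id) # unsynchronized check-then-set on that shared state
  end
end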

It's been possible to run Rails in threaded environments for quite a while now, so without being a Rails expert I would still go so far as to say that you don't have to worry about thread safety when it comes to Rails itself. You can still create Rails applications that aren't thread safe by doing some of the things I mention above. When it comes to other gems, assume that they are not thread safe unless they say that they are; and even if they say that they are, assume that they are not, and look through their code (but just because you see that they do things like @n ||= 1 does not mean that they are not thread safe; that's a perfectly legitimate thing to do in the right context -- instead look for things like mutable state in global variables, how they handle mutable objects passed to their methods, and especially how they handle options hashes).

Finally, being thread unsafe is a transitive property. Anything that uses something that is not thread safe is itself not thread safe.

In what way is Ruby on Rails NOT multithreaded?

Question 1:
You can spawn more Ruby threads within one request if you want, although that is outside the typical Rails use case. It can be useful for certain long-running IO or external operations.
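A sketch of what that can look like in a controller (the service call is hypothetical); the spawned thread should not rely on the request or response objects after the action returns:

class ReportsController < ApplicationController
  def create
    Thread.new do
      # long-running external call that should not hold up the response
      SlowReportService.generate(params[:id])
    end
    head :accepted
  end
end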

Question 2:
The limiting factor for Ruby concurrency in general, not just with Rails, is the Global Interpreter Lock (GIL). This feature of MRI, the default Ruby implementation, prevents more than one thread from executing Ruby code at any given time per process. The lock is released whenever non-Ruby code is executing, such as while waiting for disk IO or SQL responses. You can get around the GIL by using a different Ruby implementation, such as JRuby, but not all of them remove it.
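A small illustration of the lock being released around blocking calls: ten one-second sleeps finish in roughly one second of wall time even on MRI, because sleep (like socket or disk waits) does not hold the lock.

require 'benchmark'

puts Benchmark.realtime {
  10.times.map { Thread.new { sleep 1 } }.each(&:join)
}
# => roughly 1.0, not 10.0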

Phusion Passenger uses process-based concurrency to handle several requests concurrently, so, strictly speaking, it is not "multithreaded", but it is still concurrent.

This talk from Ruby MidWest 2011 has some good thoughts on getting multithreaded Ruby on Rails going.

Is there an Asynchronous Logging Library for Ruby?

I know you shouldn't really answer your own question, but it seems everything is easy in Ruby:

require 'thread'
require 'singleton'
require 'delegate'
require 'monitor'

# A single background worker thread that pops blocks off a queue and runs them in order.
class Async
  include Singleton

  def initialize
    @queue = Queue.new
    Thread.new { loop { @queue.pop.call } }
  end

  def run(&blk)
    @queue.push blk
  end
end

# A future: delegates every call to the eventual result, blocking until it is ready.
class Work < Delegator
  include MonitorMixin

  def initialize(&work)
    super(work)
    @work, @done, @lock = work, false, new_cond
  end

  # Delegator#initialize hands the wrapped object to __setobj__, so define it.
  def __setobj__(obj)
    @work = obj
  end

  def calc
    synchronize {
      @result, @done = @work.call, true
      @lock.signal
    }
  end

  def __getobj__
    synchronize { @lock.wait_while { !@done } }
    @result
  end
end

Module.class.class_exec {
  # Rewrite the named instance methods so they are queued on the Async worker
  # and immediately return a Work proxy instead of the real result.
  def async(*method_names)
    method_names.each do |method_name|
      original_method = instance_method(method_name)
      define_method(method_name) do |*args, &blk|
        work = Work.new { original_method.bind(self).call(*args, &blk) }
        Async.instance.run { work.calc }
        work
      end
    end
  end
}

And for my logging example:

require 'logger'

class Logger
  async :debug
end

log = Logger.new(STDOUT)
log.debug "hello"

As return values work, you can use this for just about anything:

require "test/unit"
class ReturnValues < Test::Unit::TestCase
def do_it
5 + 7
end
async :do_it
def test_simple
assert_equal 10, do_it - 2
end
end

