What Practical Effect Will Different Ruby Threading Models (Ruby VS Jruby) Have on Your Code as a Developer

Ruby threads not good enough?

"Normal" ruby (or mri) has a great big lock that prevents more than one thread from running ruby code at a time (known as the GIL or GVL).

Rubinius and jruby don't have this lock. In ruby 1.8.x the threads were green threads too, but as of ruby 1.9 ruby threads are mapped to native threads. The GVL stops you from gaining much benefit though.

Native extensions can run code outside of the lock so that, for example, multiple MySQL queries can run simultaneously from different threads but they can't call into the regular ruby api when they don't hold the lock

Which is faster: MRI Ruby or JRuby?

The answer depends on many variables.

But in general, Ruby 1.9 is faster than JRuby, but Ruby 1.8 is slower than JRuby.

e.g. according to the Computer Language Benchmarks Game:

  • Ruby 1.9 vs JRuby
  • Ruby 1.8 vs JRuby

Also, if your application is multi-threaded, JRuby may have some advantages over standard Ruby

(a.k.a. MRI)], depending on how many cores you have.

Does ruby have real multithreading?

Updated with Jörg's Sept 2011 comment

You seem to be confusing two very different things here: the
Ruby Programming Language and the specific threading model of one
specific implementation of the Ruby Programming Language. There
are currently around 11 different implementations of the Ruby
Programming Language, with very different and unique threading
models.

(Unfortunately, only two of those 11 implementations are actually
ready for production use, but by the end of the year that number
will probably go up to four or five.) (Update: it's now 5: MRI, JRuby, YARV (the interpreter for Ruby 1.9), Rubinius and IronRuby).

  1. The first implementation doesn't actually have a name, which
    makes it quite awkward to refer to it and is really annoying and
    confusing. It is most often referred to as "Ruby", which is even
    more annoying and confusing than having no name, because it
    leads to endless confusion between the features of the Ruby
    Programming Language and a particular Ruby Implementation.

    It is also sometimes called "MRI" (for "Matz's Ruby
    Implementation"), CRuby or MatzRuby.

    MRI implements Ruby Threads as Green Threads within its
    interpreter. Unfortunately, it doesn't allow those threads
    to be scheduled in parallel, they can only run one thread at a
    time.

    However, any number of C Threads (POSIX Threads etc.) can run
    in parallel to the Ruby Thread, so external C Libraries, or MRI
    C Extensions that create threads of their own can still run in
    parallel.

  2. The second implementation is YARV (short for "Yet
    Another Ruby VM"). YARV implements Ruby Threads as POSIX or
    Windows NT Threads, however, it uses a Global Interpreter
    Lock (GIL) to ensure that only one Ruby Thread can actually be
    scheduled at any one time.

    Like MRI, C Threads can actually run parallel to Ruby Threads.

    In the future, it is possible, that the GIL might get broken
    down into more fine-grained locks, thus allowing more and more
    code to actually run in parallel, but that's so far away, it is
    not even planned yet.

  3. JRuby implements Ruby Threads as Native Threads,
    where "Native Threads" in case of the JVM obviously means "JVM
    Threads". JRuby imposes no additional locking on them. So,
    whether those threads can actually run in parallel depends on
    the JVM: some JVMs implement JVM Threads as OS Threads and some
    as Green Threads. (The mainstream JVMs from Sun/Oracle use exclusively OS threads since JDK 1.3)

  4. XRuby also implements Ruby Threads as JVM Threads. Update: XRuby is dead.

  5. IronRuby implements Ruby Threads as Native Threads,
    where "Native Threads" in case of the CLR obviously means
    "CLR Threads". IronRuby imposes no additional locking on them,
    so, they should run in parallel, as long as your CLR supports
    that.

  6. Ruby.NET also implements Ruby Threads as CLR
    Threads. Update: Ruby.NET is dead.

  7. Rubinius implements Ruby Threads as Green Threads
    within its Virtual Machine. More precisely: the Rubinius
    VM exports a very lightweight, very flexible
    concurrency/parallelism/non-local control-flow construct, called
    a "Task", and all other concurrency constructs (Threads in
    this discussion, but also Continuations, Actors and
    other stuff) are implemented in pure Ruby, using Tasks.

    Rubinius can not (currently) schedule Threads in parallel,
    however, adding that isn't too much of a problem: Rubinius can
    already run several VM instances in several POSIX Threads in
    parallel, within one Rubinius process. Since Threads are
    actually implemented in Ruby, they can, like any other Ruby
    object, be serialized and sent to a different VM in a different
    POSIX Thread. (That's the same model the BEAM Erlang VM
    uses for SMP concurrency. It is already implemented for
    Rubinius Actors.)

    Update: The information about Rubinius in this answer is about the Shotgun VM, which doesn't exist anymore. The "new" C++ VM does not use green threads scheduled across multiple VMs (i.e. Erlang/BEAM style), it uses a more traditional single VM with multiple native OS threads model, just like the one employed by, say, the CLR, Mono, and pretty much every JVM.

  8. MacRuby started out as a port of YARV on top of the
    Objective-C Runtime and CoreFoundation and Cocoa Frameworks. It
    has now significantly diverged from YARV, but AFAIK it currently
    still shares the same Threading Model with YARV.
    Update: MacRuby depends on apples garbage collector which is declared deprecated and will be removed in later versions of MacOSX, MacRuby is undead.

  9. Cardinal is a Ruby Implementation for the Parrot
    Virtual Machine. It doesn't implement threads yet, however,
    when it does, it will probably implement them as Parrot
    Threads. Update: Cardinal seems very inactive/dead.

  10. MagLev is a Ruby Implementation for the GemStone/S
    Smalltalk VM. I have no information what threading model
    GemStone/S uses, what threading model MagLev uses or even if
    threads are even implemented yet (probably not).

  11. HotRuby is not a full Ruby Implementation of its
    own. It is an implementation of a YARV bytecode VM in
    JavaScript. HotRuby doesn't support threads (yet?) and when it
    does, they won't be able to run in parallel, because JavaScript
    has no support for true parallelism. There is an ActionScript
    version of HotRuby, however, and ActionScript might actually
    support parallelism. Update: HotRuby is dead.

Unfortunately, only two of these 11 Ruby Implementations are
actually production-ready: MRI and JRuby.

So, if you want true parallel threads, JRuby is currently your
only choice – not that that's a bad one: JRuby is actually faster
than MRI, and arguably more stable.

Otherwise, the "classical" Ruby solution is to use processes
instead of threads for parallelism. The Ruby Core Library
contains the Process module with the Process.fork
method which makes it dead easy to fork off another Ruby
process. Also, the Ruby Standard Library contains the
Distributed Ruby (dRuby / dRb) library, which allows Ruby
code to be trivially distributed across multiple processes, not
only on the same machine but also across the network.

How reliable is JRuby?

I am doing a lot of jruby work right now and can tell you that rails is certainly a viable option under the jruby interpreter. I've been pretty pleased and in my case have to use many native Java libraries, so jRuby is just such an awesome wrapper around that java code. I will say that I have had some technical challenges some that are worked out, some that are not yet.

  1. unit testing: spooling up the jvm and then spooling up rails takes much longer, so your tests take longer to run - solution, potentially something like nailgun that keeps the jvm running
  2. deployment: i have not gotten warbler to work for me, with my flavor of tomcat. this is a major issue for us.
  3. pick your libraries, while some c extensions work, they are not all equally compatible.

if you are interested I would highly reccomend the jruby book, in beta, from Pragmantic Programmers at http://pragprog.com

Why Module.methods() and respond_to? works differently in irb than in script?

This function - method on "main" object - is defined as private in ruby script.
You can check this easily:

ml = private_methods
def myfun; end
p private_methods - ml #=> [:myfun]
p respond_to?(:myfun, true) #=> true

If you call it explicitly on self you will get an error:

self.myfun
# NoMethodError: private method ‘myfun’ called for main:Object

On the other hand, in IRB your method is defined as public. Under the hood it looks something like this:

class Object
def irb_binding
# this is where your entered code is evaluated
def myfun; :ok; end # you can define methods in other methods
self.myfun # and they are public by default
end
end

p irb_binding # :ok

IRB could easily evaluate at the top level but instead it creates a separate environment with the method so that local variables are not shared:

require "irb"
foo = :ok
IRB.start
#>> foo
# NameError: undefined local variable or method `foo' for main:Object

I think methods being public is just a coincidence due to the implementation and it doesn't matter much. These methods are temporary anyway.

specifying test framework with newgem

I did it manually:

  • mkdir test
  • Add test/test_helper.rb and test/my_class_test.rb
  • Add the following to the Rakefile:

task :default => :test
desc "Run all Unit tests"
task :test do
require 'rake/runtest'
Rake.run_tests 'test/*.rb'
end

Confused, are languages like python, ruby single threaded? unlike say java? (for web apps)

Both Python and Ruby have full support for multi-threading. There are some implementations (e.g. CPython, MRI, YARV) which cannot actually run threads in parallel, but that's a limitation of those specific implementations, not the language. This is similar to Java, where there are also some implementations which cannot run threads in parallel, but that doesn't mean that Java is single-threaded.

Note that in both cases there are lots of implementations which can run threads in parallel: PyPy, IronPython, Jython, IronRuby and JRuby are only few of the examples.

The main difference between Clojure on the one side and Python, Ruby, Java, C#, C++, C, PHP and pretty much every other mainstream and not-so-mainstream language on the other side is that Clojure has a sane concurrency model. All the other languages use threads, which we have known to be a bad concurrency model for at least 40 years. Clojure OTOH has a sane update model which allows it to not only present one but actually multiple sane concurrency models to the programmer: atomic updates, software transactional memory, asynchronous agents, concurrency-aware thread-local global variables, futures, promises, dataflow concurrency and in the future possibly even more.

how to know what is NOT thread-safe in ruby?

None of the core data structures are thread safe. The only one I know of that ships with Ruby is the queue implementation in the standard library (require 'thread'; q = Queue.new).

MRI's GIL does not save us from thread safety issues. It only makes sure that two threads cannot run Ruby code at the same time, i.e. on two different CPUs at the exact same time. Threads can still be paused and resumed at any point in your code. If you write code like @n = 0; 3.times { Thread.start { 100.times { @n += 1 } } } e.g. mutating a shared variable from multiple threads, the value of the shared variable afterwards is not deterministic. The GIL is more or less a simulation of a single core system, it does not change the fundamental issues of writing correct concurrent programs.

Even if MRI had been single-threaded like Node.js you would still have to think about concurrency. The example with the incremented variable would work fine, but you can still get race conditions where things happen in non-deterministic order and one callback clobbers the result of another. Single threaded asynchronous systems are easier to reason about, but they are not free from concurrency issues. Just think of an application with multiple users: if two users hit edit on a Stack Overflow post at more or less the same time, spend some time editing the post and then hit save, whose changes will be seen by a third user later when they read that same post?

In Ruby, as in most other concurrent runtimes, anything that is more than one operation is not thread safe. @n += 1 is not thread safe, because it is multiple operations. @n = 1 is thread safe because it is one operation (it's lots of operations under the hood, and I would probably get into trouble if I tried to describe why it's "thread safe" in detail, but in the end you will not get inconsistent results from assignments). @n ||= 1, is not and no other shorthand operation + assignment is either. One mistake I've made many times is writing return unless @started; @started = true, which is not thread safe at all.

I don't know of any authoritative list of thread safe and non-thread safe statements for Ruby, but there is a simple rule of thumb: if an expression only does one (side-effect free) operation it is probably thread safe. For example: a + b is ok, a = b is also ok, and a.foo(b) is ok, if the method foo is side-effect free (since just about anything in Ruby is a method call, even assignment in many cases, this goes for the other examples too). Side-effects in this context means things that change state. def foo(x); @x = x; end is not side-effect free.

One of the hardest things about writing thread safe code in Ruby is that all core data structures, including array, hash and string, are mutable. It's very easy to accidentally leak a piece of your state, and when that piece is mutable things can get really screwed up. Consider the following code:

class Thing
attr_reader :stuff

def initialize(initial_stuff)
@stuff = initial_stuff
@state_lock = Mutex.new
end

def add(item)
@state_lock.synchronize do
@stuff << item
end
end
end

A instance of this class can be shared between threads and they can safely add things to it, but there's a concurrency bug (it's not the only one): the internal state of the object leaks through the stuff accessor. Besides being problematic from the encapsulation perspective, it also opens up a can of concurrency worms. Maybe someone takes that array and passes it on to somewhere else, and that code in turn thinks it now owns that array and can do whatever it wants with it.

Another classic Ruby example is this:

STANDARD_OPTIONS = {:color => 'red', :count => 10}

def find_stuff
@some_service.load_things('stuff', STANDARD_OPTIONS)
end

find_stuff works fine the first time it's used, but returns something else the second time. Why? The load_things method happens to think it owns the options hash passed to it, and does color = options.delete(:color). Now the STANDARD_OPTIONS constant doesn't have the same value anymore. Constants are only constant in what they reference, they do not guarantee the constancy of the data structures they refer to. Just think what would happen if this code was run concurrently.

If you avoid shared mutable state (e.g. instance variables in objects accessed by multiple threads, data structures like hashes and arrays accessed by multiple threads) thread safety isn't so hard. Try to minimize the parts of your application that are accessed concurrently, and focus your efforts there. IIRC, in a Rails application, a new controller object is created for every request, so it is only going to get used by a single thread, and the same goes for any model objects you create from that controller. However, Rails also encourages the use of global variables (User.find(...) uses the global variable User, you may think of it as only a class, and it is a class, but it is also a namespace for global variables), some of these are safe because they are read only, but sometimes you save things in these global variables because it is convenient. Be very careful when you use anything that is globally accessible.

It's been possible to run Rails in threaded environments for quite a while now, so without being a Rails expert I would still go so far as to say that you don't have to worry about thread safety when it comes to Rails itself. You can still create Rails applications that aren't thread safe by doing some of the things I mention above. When it comes other gems assume that they are not thread safe unless they say that they are, and if they say that they are assume that they are not, and look through their code (but just because you see that they go things like @n ||= 1 does not mean that they are not thread safe, that's a perfectly legitimate thing to do in the right context -- you should instead look for things like mutable state in global variables, how it handles mutable objects passed to its methods, and especially how it handles options hashes).

Finally, being thread unsafe is a transitive property. Anything that uses something that is not thread safe is itself not thread safe.



Related Topics



Leave a reply



Submit