Does Ruby Have Real Multithreading

Does Ruby have real multithreading?

Updated with Jörg's Sept 2011 comment

You seem to be confusing two very different things here: the
Ruby Programming Language and the specific threading model of one
specific implementation of the Ruby Programming Language. There
are currently around 11 different implementations of the Ruby
Programming Language, with very different and unique threading
models.

(Unfortunately, only two of those 11 implementations are actually
ready for production use, but by the end of the year that number
will probably go up to four or five.) (Update: it's now 5: MRI, JRuby, YARV (the interpreter for Ruby 1.9), Rubinius and IronRuby).

The first implementation doesn't actually have a name, which
makes it quite awkward to refer to it and is really annoying and
confusing. It is most often referred to as "Ruby", which is even
more annoying and confusing than having no name, because it
leads to endless confusion between the features of the Ruby
Programming Language and a particular Ruby Implementation.
It is also sometimes called "MRI" (for "Matz's Ruby
Implementation"), CRuby or MatzRuby.
MRI implements Ruby Threads as Green Threads within its
interpreter. Unfortunately, it doesn't allow those threads
to be scheduled in parallel: only one thread can run at a
time.
However, any number of C Threads (POSIX Threads etc.) can run
in parallel to the Ruby Thread, so external C Libraries, or MRI
C Extensions that create threads of their own, can still run in
parallel.

The second implementation is YARV (short for "Yet
Another Ruby VM"). YARV implements Ruby Threads as POSIX or
Windows NT Threads. However, it uses a Global Interpreter
Lock (GIL) to ensure that only one Ruby Thread can actually be
scheduled at any one time.
As with MRI, C Threads can actually run in parallel to Ruby
Threads.
In the future, it is possible that the GIL might get broken
down into more fine-grained locks, thus allowing more and more
code to actually run in parallel, but that's so far away, it is
not even planned yet.

JRuby implements Ruby Threads as Native Threads,
where "Native Threads" in the case of the JVM obviously means
"JVM Threads". JRuby imposes no additional locking on them. So,
whether those threads can actually run in parallel depends on
the JVM: some JVMs implement JVM Threads as OS Threads and some
as Green Threads. (The mainstream JVMs from Sun/Oracle have used
OS threads exclusively since JDK 1.3.)

XRuby also implements Ruby Threads as JVM Threads. Update: XRuby is dead.

IronRuby implements Ruby Threads as Native Threads,
where "Native Threads" in the case of the CLR obviously means
"CLR Threads". IronRuby imposes no additional locking on them,
so they should run in parallel, as long as your CLR supports
that.

Ruby.NET also implements Ruby Threads as CLR
Threads. Update: Ruby.NET is dead.

Rubinius implements Ruby Threads as Green Threads
within its Virtual Machine. More precisely: the Rubinius
VM exports a very lightweight, very flexible
concurrency/parallelism/non-local control-flow construct, called
a "Task", and all other concurrency constructs (Threads in
this discussion, but also Continuations, Actors and
other stuff) are implemented in pure Ruby, using Tasks.
Rubinius cannot (currently) schedule Threads in parallel;
however, adding that isn't too much of a problem: Rubinius can
already run several VM instances in several POSIX Threads in
parallel, within one Rubinius process. Since Threads are
actually implemented in Ruby, they can, like any other Ruby
object, be serialized and sent to a different VM in a different
POSIX Thread. (That's the same model the BEAM Erlang VM
uses for SMP concurrency. It is already implemented for
Rubinius Actors.)
Update: The information about Rubinius in this answer is about the Shotgun VM, which doesn't exist anymore. The "new" C++ VM does not use green threads scheduled across multiple VMs (i.e. Erlang/BEAM style); it uses a more traditional model of a single VM with multiple native OS threads, just like the one employed by, say, the CLR, Mono, and pretty much every JVM.

MacRuby started out as a port of YARV on top of the
Objective-C Runtime and CoreFoundation and Cocoa Frameworks. It
has now significantly diverged from YARV, but AFAIK it currently
still shares the same Threading Model with YARV.
Update: MacRuby depends on Apple's garbage collector, which has been declared deprecated and will be removed in later versions of Mac OS X; MacRuby is undead.

Cardinal is a Ruby Implementation for the Parrot
Virtual Machine. It doesn't implement threads yet; however,
when it does, it will probably implement them as Parrot
Threads. Update: Cardinal seems very inactive/dead.

MagLev is a Ruby Implementation for the GemStone/S
Smalltalk VM. I have no information about what threading model
GemStone/S uses, what threading model MagLev uses or whether
threads are even implemented yet (probably not).

HotRuby is not a full Ruby Implementation of its
own. It is an implementation of a YARV bytecode VM in
JavaScript. HotRuby doesn't support threads (yet?) and when it
does, they won't be able to run in parallel, because JavaScript
has no support for true parallelism. There is an ActionScript
version of HotRuby, however, and ActionScript might actually
support parallelism. Update: HotRuby is dead.

Unfortunately, only two of these 11 Ruby Implementations are
actually production-ready: MRI and JRuby.

So, if you want true parallel threads, JRuby is currently your
only choice – not that that's a bad one: JRuby is actually faster
than MRI, and arguably more stable.

Otherwise, the "classical" Ruby solution is to use processes
instead of threads for parallelism. The Ruby Core Library
contains the Process module with the Process.fork
method which makes it dead easy to fork off another Ruby
process. Also, the Ruby Standard Library contains the
Distributed Ruby (dRuby / dRb) library, which allows Ruby
code to be trivially distributed across multiple processes, not
only on the same machine but also across the network.
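
As a rough sketch of the process-based approach: the code below forks one child per input and reads each result back over a pipe. The helper names slow_square and parallel_map are made up for this example, and Process.fork is not available on Windows or JRuby, so treat this as an illustration rather than a portable recipe.

```ruby
# Sketch: run work in child processes and collect results via pipes.
# `slow_square` and `parallel_map` are hypothetical names for this example.

def slow_square(n)
  # Stand-in for real CPU-bound work.
  (1..50_000).reduce(0) { |acc, _| acc + 1 }
  n * n
end

def parallel_map(inputs)
  children = inputs.map do |input|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close                               # the child only writes
      writer.write(Marshal.dump(yield(input)))   # serialize the result
      writer.close
    end
    writer.close                                 # the parent only reads
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read)           # blocks until the child is done
    reader.close
    Process.wait(pid)
    result
  end
end

results = parallel_map([1, 2, 3, 4]) { |n| slow_square(n) }
puts results.inspect   # prints [1, 4, 9, 16]
```

Each child is a separate process with its own interpreter, so the children can run on different cores regardless of how the implementation schedules threads.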

Multi Threading in Ruby

There are three bugs in your code:

First bug

Your wait calls use a timeout. This means your threads will become de-synchronized from your intended sequence, because the timeout will let each thread slip past your intended wait point.

Solution: change all your wait calls to NOT use a timeout:

@xxxxFlag.wait(@lock)

Second bug

You put your sequence trigger AFTER your Thread.join call at the end. Your join call will never return, and hence the last statement in your code will never be executed, and your thread sequence will never start.

Solution: change the order to signal the sequence start first, and then join the threads:

@firstFlag.signal
@threads.each {|t| t.join}

Third bug

The problem with a wait/signal construction is that it does not buffer the signals.
Therefore you have to ensure all threads are in their wait state before calling signal, otherwise you may encounter a race condition where a thread calls signal before another thread has called wait.

Solution: This is a bit harder to solve, although it is possible to solve with Queue. But I propose a complete rethinking of your code instead. See below for the full solution.
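
To illustrate the lost-signal problem itself, here is a minimal sketch of the usual workaround when you do want to keep a condition variable: record the event in a flag under the lock, and wait in a loop on that flag. The Gate class is a hypothetical helper, not something from your code.

```ruby
# Sketch: guard a ConditionVariable with a flag so an early signal is not lost.
# `Gate` is a hypothetical helper class for this example.
class Gate
  def initialize
    @lock = Mutex.new
    @cond = ConditionVariable.new
    @open = false
  end

  # Record the event under the lock, then wake one waiter.
  def signal
    @lock.synchronize do
      @open = true
      @cond.signal
    end
  end

  # Block until the event has happened; returns at once if it already did.
  def wait
    @lock.synchronize do
      @cond.wait(@lock) until @open
      @open = false   # consume the event so the gate can be reused
    end
  end
end

gate = Gate.new
gate.signal                                  # signal arrives before anyone waits
waiter = Thread.new { gate.wait; :passed }
puts waiter.value                            # prints "passed"
```

With a bare wait/signal pair, the waiter above would block forever, because the signal was sent before the wait started.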

Better solution

I think you need to rethink the whole construction, and instead of condition variables just use Queue for everything. Now the code becomes much less brittle, and because Queue itself is thread safe, you do not need any critical sections any more.

The advantage of Queue is that you can use it like a wait/signal construction, but it buffers the signals, which makes everything much simpler in this case.

Now we can rewrite the code:

redq    = Queue.new
yellowq = Queue.new
greenq  = Queue.new

Then each thread becomes like this:

@threads << Thread.new do
  t = Random.rand(1..3)   # how long this light stays on, in seconds

  @n.times do
    redq.pop              # wait for this light's turn
    puts "red : #{t}s"
    sleep(t)
    yellowq.push(1)       # hand over to the next light
  end
end

And finally to kick off the whole sequence:

redq.push(1)
@threads.each { |t| t.join }
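
Putting the pieces together, a complete version might look roughly like the sketch below. I am assuming the sequence is red, then yellow, then green, then back to red; run_lights and its parameters are names I made up, and the sleeps are shortened so the example finishes quickly.

```ruby
# Sketch: three threads hand a token around red -> yellow -> green.
# `run_lights` and its parameters are hypothetical names for this example.
def run_lights(rounds, pause = 0.01)
  log    = Queue.new
  queues = { red: Queue.new, yellow: Queue.new, green: Queue.new }
  order  = [:red, :yellow, :green]

  threads = order.each_with_index.map do |colour, i|
    successor = queues[order[(i + 1) % order.size]]
    Thread.new do
      rounds.times do
        queues[colour].pop      # block until it is this light's turn
        log << colour
        sleep(pause)
        successor.push(1)       # hand the token to the next light
      end
    end
  end

  queues[:red].push(1)          # kick off the sequence
  threads.each(&:join)
  Array.new(log.size) { log.pop }
end

puts run_lights(2).join(" ")    # prints "red yellow green red yellow green"
```

Because the token is buffered in the queue, it does not matter whether a thread reaches its pop before or after its predecessor pushes.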

Ruby Multithreading timing versus cpu core count

To measure concurrency, you'll need to do some work. Here's an implementation which computes the Fibonacci sequence (in an intentionally slow manner).

require 'thread'
require 'benchmark'

# Deliberately slow recursive Fibonacci, used only to burn CPU time.
def fibonacci(n=33)
  return n if n <= 1
  fibonacci(n-1) + fibonacci(n-2)
end

LOOPS = 5

Benchmark.bm do |x|
  x.report("Single threaded") do
    LOOPS.times { fibonacci }
  end

  x.report("Multithreaded") do
    LOOPS.times.map do
      Thread.new { fibonacci }
    end.each(&:join)
  end

  # fork is not available on JRuby, so skip this report there.
  x.report("Forked") do
    LOOPS.times do
      fork do
        fibonacci
      end
    end
    Process.waitall
  end unless RUBY_PLATFORM == "java"
end

This gives something like:

$ ruby fib.rb
                 user       system     total    real
Single threaded  4.050000   0.000000   4.050000 (  4.054188)
Multithreaded    4.100000   0.000000   4.100000 (  4.114595)
Forked           0.000000   0.000000   4.000000 (  2.054361)

This is expected: MRI runs only one Ruby thread at a time (green threads in 1.8, a Global Interpreter Lock in 1.9 and later), which means that a single Ruby process can't run Ruby code on more than 1 CPU core at a time. My machine has 2 cores, so it runs roughly twice as fast when forking (allowing for forking overhead).

If I run this under JRuby, which does have native threads (i.e., actual in-process parallelism), I get something like:

$ ruby fib.rb
                user        system    total     real
Single threaded 27.850000   0.100000  27.950000 ( 27.812978)
Multithreaded   27.870000   0.060000  27.930000 ( 14.355506)

The in-process threads do halve the runtime of the task (though, yikes, that appears to be one that JRuby is particularly bad at).

You might wonder why Ruby would offer threading if it can't use more than one core per process. It's because you can still do work across threads when waiting on IO! If you have an application that spends a lot of time talking to network sockets (e.g., making database queries), then you will see concurrency gains from multithreading by letting other threads work while you're blocking on a socket.
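
Here's a small sketch of that effect. I'm using sleep as a stand-in for a blocking network call (fake_io is a made-up name), on the assumption that it gives up the interpreter the same way blocking socket IO does; the exact timings will vary by machine.

```ruby
require 'benchmark'

# Sketch: `fake_io` is a hypothetical stand-in for a blocking network call.
def fake_io(seconds = 0.2)
  sleep(seconds)   # a sleeping thread lets other threads run
end

# Run the calls one after another.
def sequential_time(count)
  Benchmark.realtime { count.times { fake_io } }
end

# Run each call in its own thread and wait for all of them.
def threaded_time(count)
  Benchmark.realtime do
    Array.new(count) { Thread.new { fake_io } }.each(&:join)
  end
end

puts format("sequential: %.2fs", sequential_time(4))
puts format("threaded:   %.2fs", threaded_time(4))
```

On MRI I would expect the threaded figure to be close to the time of a single call, since all four threads wait at the same time.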

Multithreading vs Background jobs in Rails

Sounds like you need a thread pool for performing the operation, and a database thread to commit the results.

You can build one of these really simply:

require 'thread'

db_queue = Queue.new

Thread.new do
  while (item = db_queue.pop)
    # ... Deal with item in queue
  end
end

# Example of supplying a job

db_queue.push(api_response)

# When finished
db_queue.push(nil)

Due to the Global Interpreter Lock in the standard Ruby runtime, threads are only really useful for managing many lightly loaded tasks, such as ones that spend most of their time waiting on IO. If you need something more heavy-duty, JRuby might be what you're looking for.
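
For the thread pool half, a minimal sketch could look like this. The name process_all, the pool size, and the doubling "work" are all placeholders I made up; in your app the worker would make the API call and the writer would commit to the database.

```ruby
# Sketch: a small worker pool feeding a single writer thread.
# `process_all` and the doubling "work" are hypothetical placeholders.
def process_all(jobs, pool_size = 4)
  job_queue = Queue.new
  db_queue  = Queue.new
  saved     = []

  writer = Thread.new do
    while (item = db_queue.pop)   # nil ends the loop
      saved << item               # stand-in for a database commit
    end
  end

  workers = Array.new(pool_size) do
    Thread.new do
      while (job = job_queue.pop) # nil ends the loop
        db_queue.push(job * 2)    # stand-in for an API call
      end
    end
  end

  jobs.each { |job| job_queue.push(job) }
  pool_size.times { job_queue.push(nil) }  # one stop marker per worker
  workers.each(&:join)
  db_queue.push(nil)                       # all results are queued; stop the writer
  writer.join
  saved
end

puts process_all([1, 2, 3, 4, 5]).sort.inspect   # prints [2, 4, 6, 8, 10]
```

Only the writer thread touches the saved results, so no mutex is needed around them; the queues do all the synchronisation.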

Confused, are languages like python, ruby single threaded? unlike say java? (for web apps)

Both Python and Ruby have full support for multi-threading. There are some implementations (e.g. CPython, MRI, YARV) which cannot actually run threads in parallel, but that's a limitation of those specific implementations, not the language. This is similar to Java, where there are also some implementations which cannot run threads in parallel, but that doesn't mean that Java is single-threaded.

Note that in both cases there are lots of implementations which can run threads in parallel: PyPy, IronPython, Jython, IronRuby and JRuby are only a few examples.

The main difference between Clojure on the one side and Python, Ruby, Java, C#, C++, C, PHP and pretty much every other mainstream and not-so-mainstream language on the other side is that Clojure has a sane concurrency model. All the other languages use threads, which we have known to be a bad concurrency model for at least 40 years. Clojure OTOH has a sane update model which allows it to not only present one but actually multiple sane concurrency models to the programmer: atomic updates, software transactional memory, asynchronous agents, concurrency-aware thread-local global variables, futures, promises, dataflow concurrency and in the future possibly even more.

Ruby performance with multiple threads vs one thread

I think (but I'm not sure) the problem is that your threads are all reading content from the same disk, so they can't run simultaneously because they are waiting on disk IO.

Some days ago I had to do a similar thing (but fetching data from the network) and the difference between the sequential and threaded versions was huge.

A possible solution could be to load each file's entire content at once instead of reading it line by line, as your code does. If you load all the content and then process it, you should get much better performance (because the threads should not have to wait for IO).
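
Something along these lines, as a sketch only: count_lines_threaded is a made-up helper, counting lines stands in for whatever processing you do, and whether it is actually faster will depend on your disk and file sizes.

```ruby
require 'tmpdir'

# Sketch: read each file in one call, then process the in-memory string.
# `count_lines_threaded` is a hypothetical helper for this example.
def count_lines_threaded(paths)
  paths.map do |path|
    Thread.new do
      content = File.read(path)   # single read instead of line-by-line IO
      content.lines.count         # stand-in for the real processing
    end
  end.map(&:value)                # wait for each thread and collect its result
end

# Create a few throwaway files to run the helper against.
Dir.mktmpdir do |dir|
  paths = 3.times.map do |i|
    path = File.join(dir, "sample#{i}.txt")
    File.write(path, "line\n" * (i + 1))
    path
  end
  puts count_lines_threaded(paths).inspect   # prints [1, 2, 3]
end
```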

Ruby thread safe thread creation

You need a mutex. Essentially the only thing the GIL protects you from is accessing uninitialized memory. If something in Ruby could be well-defined without being atomic, you should not assume it is atomic.

Here is a simple example showing that the ordering in your example is possible. I get the "double set" message every time I run it:

$global = nil
$thread = nil

threads = Array.new(1000) do
  Thread.new do
    sleep 1
    $thread ||= Thread.new do
      if $global
        warn "double set!"
      else
        $global = true
      end
    end
  end
end

threads.each(&:join)
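
For comparison, here is a sketch of the same lazy initialisation guarded by a Mutex. The names start_once, INIT_LOCK, $worker and $starts are made up for this example.

```ruby
# Sketch: lazy thread creation guarded by a Mutex.
# `start_once`, `INIT_LOCK`, `$worker` and `$starts` are hypothetical names.
INIT_LOCK = Mutex.new
$worker   = nil
$starts   = 0

def start_once
  # Only one thread at a time can run the check-and-assign.
  INIT_LOCK.synchronize do
    $worker ||= Thread.new { $starts += 1 }
  end
end

Array.new(100) { Thread.new { start_once } }.each(&:join)
$worker.join
puts $starts   # prints 1
```

The lock makes the check and the assignment one indivisible step, so only the first caller ever creates the thread.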

Can we run multi-threads in parallel in Ruby?

Not with MRI (which offers concurrency only), but yes with JRuby.

See this great article; there are plenty of others on the subject, but this one is pretty recent and provides good advice.

Python, Ruby, Haskell - Do they provide true multithreading?

1) Do Python, Ruby, or Haskell support true multithreading?

This has nothing to do with the language. It is a question of the hardware (if the machine only has 1 CPU, it is simply physically impossible to execute two instructions at the same time), the Operating System (again, if the OS doesn't support true multithreading, there is nothing you can do) and the language implementation / execution engine.

Unless the language specification explicitly forbids or enforces true multithreading, this has absolutely nothing whatsoever to do with the language.

All the languages that you mention, plus all the languages that have been mentioned in the answers so far, have multiple implementations, some of which support true multithreading, some don't, and some are built on top of other execution engines which might or might not support true multithreading.

Take Ruby, for example. Here are just some of its implementations and their threading models:

MRI: green threads, no true multithreading
YARV: OS threads, no true multithreading
Rubinius: OS threads, true multithreading
MacRuby: OS threads, true multithreading
JRuby, XRuby: JVM threads, depends on the JVM (if the JVM supports true multithreading, then JRuby/XRuby does, too, if the JVM doesn't, then there's nothing they can do about it)
IronRuby, Ruby.NET: just like JRuby, XRuby, but on the CLI instead of on the JVM

See also my answer to another similar question about Ruby. (Note that that answer is more than a year old, and some of it is no longer accurate. Rubinius, for example, uses truly concurrent native threads now, instead of truly concurrent green threads. Also, since then, several new Ruby implementations have emerged, such as BlueRuby, tinyrb, Ruby Go Lightly, Red Sun and SmallRuby.)

Similar for Python:

CPython: native threads, no true multithreading
PyPy: native threads, depends on the execution engine (PyPy can run natively, or on top of a JVM, or on top of a CLI, or on top of another Python execution engine. Whenever the underlying platform supports true multithreading, PyPy does, too.)
Unladen Swallow: native threads, currently no true multithreading, but fix is planned
Jython: JVM threads, see JRuby
IronPython: CLI threads, see IronRuby

For Haskell, at least the Glorious Glasgow Haskell Compiler supports true multithreading with native threads. I don't know about UHC, LHC, JHC, YHC, HUGS or all the others.

For Erlang, both BEAM and HiPE support true multithreading with green threads.

2) If a program contains threads, will a Virtual Machine automatically assign work to multiple cores (or to physical CPUs if there is more than 1 CPU on the mainboard)?

Again: this depends on the Virtual Machine, the Operating System and the hardware. Also, some of the implementations mentioned above don't even have Virtual Machines.
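
If you want to see what you are actually running on, a small sketch like the following reports the implementation and the number of processors it sees. runtime_summary is a made-up name, and I'm assuming a Ruby recent enough to provide Etc.nprocessors (2.2 or later).

```ruby
require 'etc'

# Sketch: report which implementation is running and how many cores it sees.
# `runtime_summary` is a hypothetical helper for this example.
def runtime_summary
  engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"
  "#{engine} #{RUBY_VERSION} on #{RUBY_PLATFORM}, " \
    "#{Etc.nprocessors} logical processors"
end

puts runtime_summary
```

This only tells you what is available; whether threads are spread across those processors is still up to the implementation.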
