Ruby - How to Thread Across Cores/Processors


CRuby has a global interpreter lock, so it cannot run Ruby code in threads in parallel. JRuby and some other implementations can, but CRuby will never run Ruby code in parallel across threads. This means that, no matter how smart your OS is, it can never share the load of Ruby code across cores.

This is different from threading in C++. pthreads create real OS threads, and the kernel's scheduler will run them on multiple cores at the same time. Technically CRuby uses pthreads as well, but the GIL prevents them from running Ruby code in parallel.

fork creates a new process, and your OS's scheduler will almost certainly be smart enough to run it on a separate core. If you need parallelism in Ruby, either use an implementation without a GIL, or use fork.
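As a sketch of that advice, here is what a minimal fork-based worker pattern might look like (the per-child workload is just a placeholder):

```ruby
# Minimal sketch: one forked child per chunk of CPU-bound work.
# Each child is a separate OS process, so the kernel is free to
# schedule them on different cores.
pids = 4.times.map do
  fork do
    # placeholder CPU-bound work; runs in the child's own process
    200_000.times { |n| n * n }
  end
end

# Collect the exit status of every child
statuses = pids.map { |pid| Process.wait2(pid).last }
```

On platforms without fork (e.g. Windows), spawning subprocesses via spawn is the closest equivalent.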

Do Ruby threads run on multiple cores?

First off, we have to clearly distinguish between "Ruby Threads" and "Ruby Threads as implemented by YARV". Ruby Threads make no guarantees about how they are scheduled. They might be scheduled concurrently, they might not. They might be scheduled on multiple CPUs, they might not. They might be implemented as native platform threads, as green threads, or as something else.

YARV implements Ruby Threads as native platform threads (e.g. pthreads on POSIX and Windows threads on Windows). However, unlike other Ruby implementations which use native platform threads (e.g. JRuby, IronRuby, Rubinius), YARV has a Giant VM Lock (GVL) which prevents two threads from entering the YARV bytecode interpreter at the same time. This makes it effectively impossible to run Ruby code in multiple threads at the same time.

Note, however, that the GVL only protects the YARV interpreter and runtime. This means that, for example, multiple threads can execute C code at the same time, and at the same time as another thread executing Ruby code. It just means that no two threads can execute Ruby code at the same time on YARV.

Note also that in recent versions of YARV, the "Giant" VM Lock is becoming ever smaller. Sections of code are being moved out from under the lock, and the lock itself is being broken down into smaller, more fine-grained locks. That is a very long process, but it means that in the future more and more Ruby code will be able to run in parallel on YARV.

But all of this has nothing to do with how the platform schedules the threads. Many platforms have some sort of heuristics for thread affinity to CPU cores: e.g. they may try to schedule the same thread to the same core, under the assumption that its working set is still in that core's cache, or they may try to identify threads that operate on shared data and schedule those threads to the same CPU, and so on. Therefore, it is hard, if not impossible, to predict how and where a thread will be scheduled.

Many platforms also provide a way to influence this CPU affinity, e.g. on Linux and Windows, you can set a thread to only be scheduled on one specific or a set of specific cores. However, YARV does not do that by default. (In fact, on some platforms influencing CPU affinity requires elevated privileges, so it would mean that YARV would have to run with elevated privileges, which is not a good idea.)

So, in short: yes, depending on the platform, the hardware, and the environment, YARV threads may and probably will be scheduled on different cores. But, they won't be able to take advantage of that fact, i.e. they won't be able to run faster than on a single core (at least when running Ruby code).

Ruby Multithreading timing versus cpu core count

To measure concurrency, you'll need to do some work. Here's an implementation that computes Fibonacci numbers (in an intentionally slow manner).

require 'thread'
require 'benchmark'

def fibonacci(n = 33)
  return n if n <= 1
  fibonacci(n - 1) + fibonacci(n - 2)
end

LOOPS = 5

Benchmark.bm do |x|
  x.report("Single threaded") do
    LOOPS.times { fibonacci }
  end

  x.report("Multithreaded") do
    LOOPS.times.map do
      Thread.new { fibonacci }
    end.each(&:join)
  end

  x.report("Forked") do
    LOOPS.times do
      fork do
        fibonacci
      end
    end
    Process.waitall
  end unless RUBY_PLATFORM == "java"
end

This gives something like:

$ ruby fib.rb
user system total real
Single threaded 4.050000 0.000000 4.050000 ( 4.054188)
Multithreaded 4.100000 0.000000 4.100000 ( 4.114595)
Forked 0.000000 0.000000 4.000000 ( 2.054361)

This is expected - CRuby's global VM lock means a single Ruby process can't consume more than 1 CPU core at a time when running Ruby code (older MRI 1.8 used green threads, with the same effect). My machine has 2 cores, so it runs roughly twice as fast when forking (allowing for forking overhead).

If I run this under JRuby, which does have native threads (i.e., actual in-process parallelism), I get something like:

$ ruby fib.rb
user system total real
Single threaded 27.850000 0.100000 27.950000 ( 27.812978)
Multithreaded 27.870000 0.060000 27.930000 ( 14.355506)

The in-process threads do halve the runtime of the task (though, yikes, that appears to be one that JRuby is particularly bad at).

You might wonder why Ruby would offer threading if it can't use more than one core per process - it's because threads can still make progress while others wait on IO! If you have an application that spends a lot of time talking to network sockets (e.g. making database queries), then you will see concurrency gains from multithreading, because other threads can work while one is blocked on a socket.
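A small illustration of that IO point: here sleep stands in for a blocking IO call (like a socket read, it releases the GVL), so the threaded version overlaps the waits even on CRuby.

```ruby
require 'benchmark'

# sleep releases the GVL, just as blocking IO does, so the waits overlap
serial = Benchmark.realtime { 5.times { sleep 0.1 } }
threaded = Benchmark.realtime do
  5.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end

# serial comes out around 0.5s; threaded around 0.1s
```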

Multithreaded ruby program only uses 100% cpu

You failed to mention which Ruby implementation you are using. Not all Ruby implementations are capable of scheduling Ruby threads to multiple CPUs.

In particular:

  • MRI implements Ruby threads as green threads inside the interpreter and schedules them itself; it cannot schedule more than one thread at a time and it cannot schedule them to multiple CPUs
  • YARV implements Ruby threads as native OS threads (POSIX threads or Windows threads) and lets the OS schedule them, however it puts a Giant VM Lock (GVL) around them, so that only one Ruby thread can be running at any given time
  • Rubinius implements Ruby threads as native OS threads (POSIX threads or Windows threads) and lets the OS schedule them, however it puts a Global Interpreter Lock (GIL) around them, so that only one Ruby thread can be running at any given time; Rubinius 2.0 is going to have fine-grained locks so that multiple Ruby threads can run at any given time
  • JRuby implements Ruby threads as JVM threads and uses fine-grained locking so that multiple threads can be running; however, whether or not those threads are scheduled to multiple CPUs depends on the JVM being used, some allow this, some don't
  • IronRuby implements Ruby threads as CLI threads and uses fine-grained locking so that multiple threads can be running; however, whether or not those threads are scheduled to multiple CPUs depends on the VES (Virtual Execution System, i.e. the CLI virtual machine) being used, some allow this, some don't
  • MacRuby implements Ruby threads as native OS threads and uses fine-grained locking so that multiple threads can be running on multiple CPUs at the same time

I don't know enough about Topaz, Cardinal, MagLev, MRuby and all the others.
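If you're unsure which implementation you're running, Ruby exposes it at runtime:

```ruby
# Which implementation (and therefore which threading model) is this?
puts RUBY_ENGINE       # "ruby" for MRI/YARV, "jruby" for JRuby, etc.
puts RUBY_DESCRIPTION  # the full version string
```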

Making ruby program run on all processors

Use JRuby and the peach gem, and it couldn't be easier. Just replace an .each with .peach and voila, you're executing in parallel. And there are additional options to control exactly how many threads are spawned, etc. I have used this and it works great.

You get close to n times speedup, where n is the number of CPUs/cores available. I find that the optimal number of threads is slightly more than the number of CPUs/cores.

Ruby: CPU-Load degradation of concurrent/multithreaded task?

It's hard to say why exactly you aren't seeing the benefits of multithreading. But here's my guess.

Let's say you have a really intensive Ruby method that takes 10 seconds to run called do_work. And, even worse, you need to run this method 100 times. Rather than wait 1000 seconds, you might try to multithread it. That could divide the work among your CPU cores, halving or maybe even quartering the runtime:

Array.new(100) { Thread.new { do_work } }.each(&:join)

But no, this is probably still going to take 1000 seconds to finish. Why?

The Global VM Lock

Consider this example:

thread1 = Thread.new { class Foo; end; Foo.new }
thread2 = Thread.new { class Foo; end; Foo.new }

Creating a class in Ruby does a lot of stuff under the hood; for example, it has to create an actual class object and assign that object's pointer to a global constant (in some order). What happens if thread1 registers that global constant, gets halfway through creating the actual class object, and then thread2 starts running, says "Oh, Foo already exists. Let's go ahead and run Foo.new"? What happens, given that the class hasn't been fully defined? Or what if both thread1 and thread2 create a new class object and then both try to register their class as Foo? Which one wins? What about the class object that was created and now doesn't get registered?

The official Ruby solution for this is simple: don't actually run this code in parallel. Instead, there is one single, massive mutex called "the global VM lock" that protects anything that modifies the Ruby VM's state (such as making a class). So while the two threads above may be interleaved in various ways, it's impossible for the VM to end up in an invalid state because each VM operation is essentially atomic.
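Note that "each VM operation is essentially atomic" protects the VM's internal state, not your program's: a compound expression like counter += 1 is still a read, an add, and a write, and a thread switch can land between them. A minimal sketch of the locking user code still needs:

```ruby
counter = 0
lock = Mutex.new

# Without the Mutex, increments from different threads could interleave
# and lose updates even under the GVL.
threads = 10.times.map do
  Thread.new do
    1_000.times { lock.synchronize { counter += 1 } }
  end
end
threads.each(&:join)

counter # => 10_000
```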

Example

This takes about 6 seconds to run on my laptop:

def do_work
  Array.new(100_000_000) { |i| i * i }
end

This takes about 18 seconds, obviously:

3.times { do_work }

But this also takes about 18 seconds, because the GVL prevents the threads from actually running in parallel:

Array.new(3) { Thread.new { do_work } }.each(&:join)

This next method also takes 6 seconds to run:

def do_work2
  sleep 6
end

But now this also takes about 6 seconds to run:

Array.new(3) { Thread.new { do_work2 } }.each(&:join)

Why? If you dig through the Ruby source code, you find that sleep ultimately calls the C function native_sleep, and in there we see:

GVL_UNLOCK_BEGIN(th);
{
    //...
}
GVL_UNLOCK_END(th);

The Ruby devs know that sleep doesn't affect the VM state, so they explicitly unlocked the GVL to allow it to run in parallel. It can be tricky to figure out exactly what locks/unlocks the GVL and when you're going to see the performance benefit of it.

How to fix your code

My guess is that something in your code is hitting the GVL so while some parts of your threads are running in parallel (generally any subprocess/PTY stuff does), there's still contention between them in the Ruby VM causing some parts to serialize.

Your best bet with getting truly parallel Ruby code is to simplify it to something like this:

Array.new(x) { Thread.new { do_work } }

where you're sure that do_work is something simple that definitely unlocks the GVL, such as spawning a subprocess. You could try moving your Truecrypt code into a little shell script so that Ruby doesn't have to interact with it anymore once it gets going.

I recommend starting with a little benchmark that just starts a few subprocesses, and make sure that they are actually running in parallel by comparing the time to running them serially.
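Such a benchmark might look like the following sketch, with an external sleep command standing in for the real subprocess workload:

```ruby
require 'benchmark'

# Run the same external command serially and in parallel subprocesses.
# The OS schedules the children independently of Ruby's GVL.
serial = Benchmark.realtime { 3.times { system("sleep", "0.2") } }
parallel = Benchmark.realtime do
  pids = 3.times.map { spawn("sleep", "0.2") }
  pids.each { |pid| Process.wait(pid) }
end

# parallel should come out close to a third of serial
```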

How does multithreading utilizes multiple cores?

Your confusion lies here:

[...] while one process is running under one CPU core.

[...] threads created by one process should run only under that specific process, which means that it should only run under that very one CPU core.

This is not true. I think what the various explanations you have read meant is that any process has at least one thread (where a 'thread' is a sequence of instructions run by a CPU core).

If you have a multithreaded program, the process will have several threads (sequences of instructions run by a CPU core) that can run concurrently on different CPU cores.

There are many processes executing on your computer at any given time. The Operating System (OS) is the program that allocates the hardware resources (CPU cores) to all these processes and decides which process can use which cores for what amount of time before another process gets to use the CPU. Whether or not a process gets to use multiple cores is not entirely up to the process. More confusing still, multithreaded programs can use more threads than there are cores on the computer's CPU. In that case you can be certain that all your threads do not run in parallel.
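You can see both numbers from Ruby itself:

```ruby
require 'etc'

# Logical CPU cores the OS exposes to this process
puts Etc.nprocessors

# Threads this process currently has (at least the main thread)
puts Thread.list.size
```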

One more thing:

[...] threads utilizes multiple cores and make the whole program executes more effective

I am going to sound very pedantic, but it is more complicated than that. It depends on what you mean by "effective". Are we talking about total computation time, energy consumption, ...?

A sequential (1-thread) program may be very effective in terms of power consumption yet take a very long time to compute. If you are able to use multiple threads, you may be able to reduce that computation time, but it will probably incur new costs (synchronization between threads, additional protection mechanisms against concurrent accesses, ...).

Also, multithreading cannot help for certain tasks that fall outside of the CPU realm. For example, unless you have some very specific hardware support, reading a file from the hard-drive with 2 or more concurrent threads cannot be parallelized efficiently.

Ruby on Rails and multi-core CPUs

It's OS related.

When you run a single-threaded application on a multi-core CPU both cores can be affected because:

  • Your OS (Windows in your case) does its job at the same time.
  • Multiple other processes (and their threads) are constantly requesting CPU time (many times per second). If your OS and the processes have no thread affinity set, this will cause your threads to swap from one CPU to the other and vice versa.

Using Ruby threads, but not seeing a speed up

There are many different implementations of Ruby. The best-known is MRI.

MRI has threads, but unfortunately uses only one CPU core at a time. That means only one thread will actually run at any given time.

If your threads had to wait for IO, there could be a speedup, because while one thread waits, another can run. But your problem needs the CPU all the time.

I would suggest investigating another Ruby implementation, like JRuby, for this kind of problem. JRuby has real threads.

Perhaps you will get a greater speedup if you change your implementation. At the moment you recalculate every max_length over and over again. For example: the sequence length for n = 4 is 3. If you calculate the length for n = 8, you do one step (n / 2) and then have a current value of 4, and you already know that n = 4 has length 3. Therefore length(8) = 1 + length(4) = 1 + 3 = 4. Example:

class CollatzSequence
  def initialize
    @lengths = Hash.new { |h, n| cache_length(h, n) }
  end

  def length(n)
    @lengths[n]
  end

  private

  def cache_length(h, n)
    if n <= 1
      h[n] = 1
    else
      next_in_sequence = n.even? ? (n / 2) : (n * 3 + 1)
      h[n] = 1 + h[next_in_sequence]
    end
  end
end

require 'benchmark'
sequencer = CollatzSequence.new

Benchmark.bm(10) do |bm|
  bm.report('not cached')  { sequencer.length(837799) }
  bm.report('cache hit 1') { sequencer.length(837799) }
  bm.report('cache hit 2') { sequencer.length(837799 * 2) }
end

# user system total real
# not cached 0.000000 0.000000 0.000000 ( 0.001489)
# cache hit 1 0.000000 0.000000 0.000000 ( 0.000007)
# cache hit 2 0.000000 0.000000 0.000000 ( 0.000011)

