Concurrent requests with MRI Ruby

I invite you to read Jesse Storimer's series Nobody Understands the GIL.
It might help you better understand some MRI internals.

I have also found Pragmatic Concurrency with Ruby, which is an interesting read. It has some examples of testing code concurrently.

EDIT:
In addition, I can recommend the article Removing config.threadsafe!
It may not be relevant for Rails 4 anymore, but it explains the configuration options, one of which you can use to allow concurrency.


Let's discuss the answer to your question.

You can have several threads (using MRI), even with Puma. The GIL ensures that only one thread is active at a time; that is the constraint developers dub restrictive (because there is no real parallel execution). Bear in mind that the GIL does not guarantee thread safety.
This does not mean that the other threads are not running; they are waiting for their turn. They can interleave (the articles above explain this in more detail).

Let me clear up two terms: worker process and thread.
A process runs in a separate memory space and can serve several threads.
Threads of the same process run in a shared memory space, namely that of their process. By threads we mean Ruby threads in this context, not CPU threads.

In regards to your question's configuration and the GitHub repo you shared, I think an appropriate configuration (I used Puma) is to set up 4 workers and 1 to 40 threads. The idea is that one worker serves one tab, and each tab sends up to 10 requests.

So let's get started:

I work on Ubuntu in a virtual machine, so I first enabled all 4 cores in my virtual machine's settings (and changed some other settings I thought might help).
I could verify this on my machine, so I went with that.

Linux command: lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 69
Stepping: 1
CPU MHz: 2306.141
BogoMIPS: 4612.28
L1d cache: 32K
L1i cache: 32K
L2 cache: 6144K
NUMA node0 CPU(s): 0-3

I used your shared GitHub project and modified it slightly. I created a Puma configuration file named puma.rb (put it in the config directory) with the following content:

workers Integer(ENV['WEB_CONCURRENCY'] || 1)
threads_count = Integer(ENV['MAX_THREADS'] || 1)
threads 1, threads_count

preload_app!

rackup DefaultRackup
port ENV['PORT'] || 3000
environment ENV['RACK_ENV'] || 'development'

on_worker_boot do
  # Worker-specific setup for Rails 4.1+
  # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
  # ActiveRecord::Base.establish_connection
end

By default, Puma starts with 1 worker and 1 thread. You can use environment variables to modify those parameters. I did so:

export MAX_THREADS=40
export WEB_CONCURRENCY=4

To start Puma with this configuration I typed

bundle exec puma -C config/puma.rb

in the Rails app directory.

I opened the browser with four tabs to call the app's URL.

The first request started around 15:45:05 and the last request finished around 15:49:44. That is an elapsed time of 4 minutes and 39 seconds.
You can also see the request ids in unsorted order in the log file (see below).

Each API call in the GitHub project sleeps for 15 seconds. We have 4 tabs, each with 10 API calls. That makes a maximum elapsed time of 600 seconds, i.e. 10 minutes (in a strictly serial mode).

The ideal result in theory would be everything in parallel and an elapsed time not far from 15 seconds, but I did not expect that at all.
I was not sure exactly what to expect, but I was still positively surprised (considering that I ran on a virtual machine and that MRI is restrained by the GIL, among other factors). The elapsed time of this test was less than half the maximum (strictly serial) elapsed time.

EDIT: I read further about Rack::Lock, which wraps a mutex around each
request (third article above). I found the option
config.allow_concurrency = true to be a time saver. A small caveat
was that the connection pool had to be increased accordingly (even
though the requests do not query the database); the maximum number of
threads is a good default, 40 in this case.
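For reference, the relevant development settings could look like this (a sketch, not copied from the repo; the pool size of 40 is an assumption mirroring the maximum thread count above):

```ruby
# config/environments/development.rb (sketch)
Rails.application.configure do
  config.cache_classes     = false
  config.allow_concurrency = true # disables the Rack::Lock request mutex
end

# config/database.yml would need a matching pool size, e.g.:
#   development:
#     pool: 40
```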

I tested the app with JRuby, and the actual elapsed time was 2 min,
with allow_concurrency=true.

I tested the app with MRI, and the actual elapsed time was 1 min 47 s,
with allow_concurrency=true. This really surprised me, because I expected MRI to be slower than JRuby. It was not, which makes me question the widespread discussion about the speed differences between MRI and JRuby.

Watching the responses in the different tabs is "more random" now. It happens that tab 3 or 4 completes before tab 1, which I requested first.

I think the test seems OK because you don't have race conditions.
However, I am not sure about the application-wide consequences of
setting config.allow_concurrency=true in a real-world application.

Feel free to check it out and let me know any feedback you might have.
I still have the clone on my machine; let me know if you are interested.

To answer your questions in order:

  • I think your example is valid by its result. For concurrency, however, it is better to test with shared resources (as, for example, in the second article).
  • In regards to your statements: as mentioned at the beginning of this
    answer, MRI is multi-threaded but restricted by the GIL to one active
    thread at a time. This raises the question: with MRI, isn't it better
    to test with more processes and fewer threads? I do not really know; a
    first guess would be no, or not much of a difference. Maybe someone can shed light on this.
  • Your example is just fine, I think. It just needed some slight
    modifications.

Appendix

Log file Rails app:

**config.allow_concurrency = false (by default)**
-> Ideally 1 worker per core, each worker serves up to 10 threads.

[3045] Puma starting in cluster mode...
[3045] * Version 2.11.2 (ruby 2.1.5-p273), codename: Intrepid Squirrel
[3045] * Min threads: 1, max threads: 40
[3045] * Environment: development
[3045] * Process workers: 4
[3045] * Preloading application
[3045] * Listening on tcp://0.0.0.0:3000
[3045] Use Ctrl-C to stop
[3045] - Worker 0 (pid: 3075) booted, phase: 0
[3045] - Worker 1 (pid: 3080) booted, phase: 0
[3045] - Worker 2 (pid: 3087) booted, phase: 0
[3045] - Worker 3 (pid: 3098) booted, phase: 0
Started GET "/assets/angular-ui-router/release/angular-ui-router.js?body=1" for 127.0.0.1 at 2015-05-11 15:45:05 +0800
...
...
...
Processing by ApplicationController#api_call as JSON
Parameters: {"t"=>"15?id=9"}
Completed 200 OK in 15002ms (Views: 0.2ms | ActiveRecord: 0.0ms)
[3075] 127.0.0.1 - - [11/May/2015:15:49:44 +0800] "GET /api_call.json?t=15?id=9 HTTP/1.1" 304 - 60.0230

**config.allow_concurrency = true**
-> Ideally 1 worker per core, each worker serves up to 10 threads.

[22802] Puma starting in cluster mode...
[22802] * Version 2.11.2 (ruby 2.2.0-p0), codename: Intrepid Squirrel
[22802] * Min threads: 1, max threads: 40
[22802] * Environment: development
[22802] * Process workers: 4
[22802] * Preloading application
[22802] * Listening on tcp://0.0.0.0:3000
[22802] Use Ctrl-C to stop
[22802] - Worker 0 (pid: 22832) booted, phase: 0
[22802] - Worker 1 (pid: 22835) booted, phase: 0
[22802] - Worker 3 (pid: 22852) booted, phase: 0
[22802] - Worker 2 (pid: 22843) booted, phase: 0
Started GET "/" for 127.0.0.1 at 2015-05-13 17:58:20 +0800
Processing by ApplicationController#index as HTML
Rendered application/index.html.erb within layouts/application (3.6ms)
Completed 200 OK in 216ms (Views: 200.0ms | ActiveRecord: 0.0ms)
[22832] 127.0.0.1 - - [13/May/2015:17:58:20 +0800] "GET / HTTP/1.1" 200 - 0.8190
...
...
...
Completed 200 OK in 15003ms (Views: 0.1ms | ActiveRecord: 0.0ms)
[22852] 127.0.0.1 - - [13/May/2015:18:00:07 +0800] "GET /api_call.json?t=15?id=10 HTTP/1.1" 304 - 15.0103

**config.allow_concurrency = true (by default)**
-> Ideally each thread serves a request.

Puma starting in single mode...
* Version 2.11.2 (jruby 2.2.2), codename: Intrepid Squirrel
* Min threads: 1, max threads: 40
* Environment: development
NOTE: ActiveRecord 4.2 is not (yet) fully supported by AR-JDBC, please help us finish 4.2 support - check http://bit.ly/jruby-42 for starters
* Listening on tcp://0.0.0.0:3000
Use Ctrl-C to stop
Started GET "/" for 127.0.0.1 at 2015-05-13 18:23:04 +0800
Processing by ApplicationController#index as HTML
Rendered application/index.html.erb within layouts/application (35.0ms)
...
...
...
Completed 200 OK in 15020ms (Views: 0.7ms | ActiveRecord: 0.0ms)
127.0.0.1 - - [13/May/2015:18:25:19 +0800] "GET /api_call.json?t=15?id=9 HTTP/1.1" 304 - 15.0640

How can I serve requests concurrently with Rails 4?

I invite you to read about the configuration options of config.threadsafe! in the article Removing config.threadsafe!
It will help you better understand the options of config.threadsafe!, in particular how to allow concurrency.

In Rails 4, the behavior of config.threadsafe! is enabled by default.


Now to the answer

In Rails 4, requests are wrapped in a mutex by the Rack::Lock middleware in the development environment by default.

If you want to enable concurrency, you can set config.allow_concurrency=true. This disables the Rack::Lock middleware. I would not delete it as suggested in another answer to your question; that looks like a hack to me.

Note: If you have config.cache_classes=true, then an assignment to
config.allow_concurrency (the Rack::Lock request mutex) won't take
effect; concurrent requests are allowed by default. If you have
config.cache_classes=false, then you can set
config.allow_concurrency to either true or false. In the development
environment you would want:

config.cache_classes=false
config.allow_concurrency=true

The statement "Which means that if config.cache_classes = false
(which it is by default in dev env) we can't have concurrent requests"
is not correct.

Appendix

You can refer to this answer, which sets up an experiment testing concurrency with MRI and JRuby. The results are surprising: MRI was faster than JRuby.

The experiment with MRI concurrency is on GitHub.
It only tests concurrent requests; there are no race conditions in the controller. However, I think it would not be too difficult to implement the example from the article above to test race conditions in a controller.

Are there still benefits to running JRuby vs. the latest MRI with Puma?

Does the latest version of MRI negate the need to adopt JRuby to
achieve the same benefits that native threads give you?

The answer is no. It does not negate the need, and it depends on your application, as mentioned in other answers.

Also, JRuby does not allow you to run in cluster mode (which relies on fork), but that is not really a problem in regards to your question, because JRuby is multi-threaded and parallel.
Simply run in single mode with as many threads as you need; it should be perfectly fine, if not even more lightweight.


Let me give you some references that give more insight and allow you to dig further.

This answer discusses experiments with MRI and JRuby testing concurrent requests using Puma (up to 40 threads). It is quite comprehensive.

The experiments are available on GitHub, MRI and JRuby.

The caveat is that it only tests concurrent requests and does not have a race condition in the controller. However, I think you could implement the test from the article Removing config.threadsafe! without too much effort.

The difference between JRuby and MRI is that JRuby can execute code in parallel. MRI is limited by the GIL: only one thread can execute at a time. You can read more about the GIL in the article Nobody Understands the GIL.

The results are quite surprising: MRI was faster than JRuby. Feel free to improve the experiments and add race conditions.

Note that both are multi-threaded and neither is thread safe by itself. The difference really is that MRI cannot execute code in parallel and JRuby can.


You might be tempted to ask why I answer "No" if the experiment shows that MRI is faster.

I think we need more experiments and in particular real world applications.

If you believe that JRuby should be faster because it can execute code in parallel, the reasons could be:

  • The experiments should be executed in a highly parallel environment
    to be able to leverage JRuby's potential.
  • It could be the web server itself. Maybe Puma does not leverage the full potential of JRuby. MRI has a GIL, so why is it faster than JRuby at handling requests?
  • Other, more in-depth factors might be relevant that we have not discovered yet.

WebService Concurrency on Rails

You're right: each Puma/Thin/Unicorn/Passenger/WEBrick worker houses a single Rails app instance (or Sinatra app instance, etc.) per Ruby process. So it's 1 web worker = 1 app instance = 1 Ruby process.

Each request blocks the process until the response is ready, so it's usually 1 request per process.

Ruby itself has the so-called GIL (Global Interpreter Lock), which blocks execution of multiple threads, partly because C extensions lack thread-safe controls such as mutexes and semaphores. It means that threads won't run in parallel. In practice, they "can" interleave: I/O operations can block execution while waiting for a response, for example when reading a file or waiting for a reply on a network socket. In this case, Ruby allows another thread to resume until the I/O operation of the previous thread finishes.

Rails also used to have a single block of execution per request, its own lock. But in Rails 3 thread-safe controls were added throughout the Rails code so that it could run on JRuby, for example. And in Rails 4 they decided to have the thread-safe behavior on by default.

In theory this means that more than one request can run concurrently even in MRI (since it supports native threads as of 1.9). In practice, one request can run while another is waiting for a database call to return, for example. So you should see a few requests running concurrently. If your example is CPU bound (more internal processing than I/O blocking), the effect will be as if the requests ran one after the other. If you have more I/O blocking (such as waiting for a large SQL SELECT to return), you will see them running more concurrently (though not completely in parallel).

You will see parallel requests more often if you use a Ruby implementation that has not only native threads but also no Global Interpreter Lock, as is the case with JRuby. So I recommend using JRuby with Puma.

Puma and Passenger are both multi-threaded. Unicorn is fork-based. Thin is EventMachine-based. I'd personally recommend testing Passenger as well.

  • http://tenderlovemaking.com/2012/06/18/removing-config-threadsafe.html

  • https://bearmetal.eu/theden/how-do-i-know-whether-my-rails-app-is-thread-safe-or-not/


