Can Ruby Fibers be Concurrent?
No, you cannot do concurrency with Fibers. Fibers simply aren't a concurrency construct; they are a control-flow construct, like Exceptions. That's the whole point of Fibers: they never run in parallel, they are cooperative, and they are deterministic. Fibers are coroutines. (In fact, I never understood why they aren't simply called Coroutines.)
The only concurrency construct in Ruby is Thread.
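A minimal sketch of what "cooperative and deterministic" means in practice; the log array and its messages are made up for illustration. Control moves between the main program and the fiber only at the explicit resume and Fiber.yield calls, so the entries land in the same order on every run:

```ruby
log = []

fiber = Fiber.new do
  log << "fiber: step 1"
  Fiber.yield            # hand control back to the caller
  log << "fiber: step 2"
end

log << "main: before"
fiber.resume             # runs the fiber up to Fiber.yield
log << "main: between"
fiber.resume             # continues after Fiber.yield until the block ends
log << "main: after"

puts log
```

Nothing here runs at the same time as anything else: the main program is paused whenever the fiber runs, and vice versa.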
How does one achieve parallel tasks with Ruby's Fibers?
There is a lot of confusion regarding fibers in Ruby. Fibers are not a tool with which to implement concurrency; they are merely a way of organizing code in a way that may more clearly represent what is going on.
That the name 'fibers' is similar to 'threads' in my opinion contributes to the confusion.
If you want true concurrency, that is, distributing the CPU load across all available CPUs, you have the following options:
In MRI Ruby:
Run multiple Ruby VMs (i.e. OS processes), using fork, etc. Even with multiple threads in Ruby, the GIL (Global Interpreter Lock) prevents the Ruby runtime from using more than one CPU.
In JRuby:
Unlike MRI Ruby, JRuby will use multiple CPUs when assigning threads, so you can get truly concurrent processing.
If your code is spending most of its time waiting for external resources, then you may not have any need for this true concurrency. MRI threads or some kind of event handling loop will probably work fine for you.
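As a rough illustration of the waiting case, here is a sketch in which sleep stands in for a slow external resource (fake_fetch and its arguments are hypothetical). Threads that are sleeping or blocked on I/O don't hold each other up, so the waits overlap:

```ruby
# Hypothetical stand-in for a network call: it only waits.
def fake_fetch(name, seconds)
  sleep(seconds)
  "#{name} done"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = %w[a b c].map do |name|
  Thread.new { fake_fetch(name, 0.2) }
end
results = threads.map(&:value)   # value joins the thread and returns its result

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts results.inspect
puts format("elapsed: %.2fs", elapsed)
```

On an otherwise idle machine the elapsed time should be close to a single wait rather than the sum of all three, though the exact figure will vary.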
How to use ruby fibers to avoid blocking IO
I'm not up on fibers in 1.9, but regular Threads from 1.8.6 can solve this problem. Try using a Queue: http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html
Looking at the example in the documentation, your consumer is the part that does the upload. It 'consumes' a URL and a file, and uploads the data. The producer is the part of your program that keeps working and finds new files to upload.
If you want to upload multiple files at once, simply launch a new Thread for each file:
t = Thread.new do
  upload_file(param1, param2)
end
@all_threads << t
Then, later on in your 'producer' code (which, remember, doesn't have to be in its own Thread, it could be the main program):
@all_threads.each do |t|
  t.join if t.alive?
end
The Queue can either be a @member_variable or a $global.
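Putting the pieces together, a producer/consumer arrangement along these lines might look like the sketch below. The upload_file lambda, the URL, the file names, and the :done sentinel are all placeholders for illustration; a real consumer would perform the actual upload:

```ruby
queue    = Queue.new
uploaded = []
lock     = Mutex.new

# Placeholder for the real upload; it only records what it was given.
upload_file = lambda do |url, path|
  lock.synchronize { uploaded << [url, path] }
end

# Consumers: each thread pops jobs until it sees the :done sentinel.
consumers = Array.new(2) do
  Thread.new do
    while (job = queue.pop) != :done
      upload_file.call(*job)
    end
  end
end

# Producer: here it is simply the main program.
%w[a.txt b.txt c.txt].each do |path|
  queue << ["http://example.com/upload", path]
end
consumers.size.times { queue << :done }   # one sentinel per consumer

consumers.each(&:join)
puts uploaded.map(&:last).sort.inspect
```

Queue#pop blocks while the queue is empty, so the consumers simply sleep until the producer hands them more work.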
Fibers vs. explicit enumerators
I would use Enumerator; it lets you use take, take_while, and even each if your sequence is finite. Fiber, by contrast, is designed for lightweight concurrency and is pretty limited as an enumerator.
prime_enum.take(ARGV[0].to_i).each { |x| puts x }
or
prime_enum.take_while { |x| x < ARGV[0].to_i }.each { |x| puts x }
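The snippets above assume a prime_enum that isn't shown. One possible definition, as a sketch: it uses trial division purely for brevity, and hard-coded limits replace ARGV[0] so the example runs on its own.

```ruby
# An infinite, lazily evaluated sequence of primes.
prime_enum = Enumerator.new do |yielder|
  found = []
  candidate = 2
  loop do
    # candidate is prime if no previously found prime divides it
    if found.none? { |prime| (candidate % prime).zero? }
      found << candidate
      yielder << candidate
    end
    candidate += 1
  end
end

first_five   = prime_enum.take(5)
below_twenty = prime_enum.take_while { |x| x < 20 }

puts first_five.inspect
puts below_twenty.inspect
```

Both take and take_while stop pulling values as soon as they have what they need, which is why they are safe to call on an infinite sequence.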
Ruby concurrency, Revactor vs Process Forking
Revactor is "single threaded with fibers" (so just one fiber at a time). This is theoretically better than "multi threaded" since it does provide concurrency but just requires one thread, so it can scale to lots of "threads" (fibers).
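To illustrate the "one thread, many fibers" idea in general terms (this is not Revactor's API, just a hand-rolled round-robin loop with made-up worker names), a single thread can interleave several fibers by resuming each in turn:

```ruby
order = []

workers = %w[a b].map do |name|
  Fiber.new do
    3.times do |i|
      order << "#{name}#{i}"
      Fiber.yield          # give the other fibers a turn
    end
    :finished              # final value returned by the last resume
  end
end

# Round-robin scheduler: keep resuming fibers until each reports :finished.
until workers.empty?
  workers.reject! { |fiber| fiber.resume == :finished }
end

puts order.inspect
```

Only one fiber runs at any moment, and a fiber that never yields would starve the others, which is the trade-off of this model.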
How many Ruby Fibers can I use on Heroku?
Like you, I am currently experimenting with Ruby and fibers to increase app performance. From what I have read and remember, there should be almost no limit to the number of fibers you can use.
As far as I remember (please check this yourself), each dyno can use up to 500 MB of RAM. Each fiber adds a few KB (I think 2 KB) to the RAM usage of your app. As long as your app does not use the full 500 MB, you should be fine even with 1000 fibers.
But you should hit a performance or concurrency limit before that, as your app still processes only one fiber at a time. In your case, it will depend on the external service.
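One rough way to check the fiber-count part yourself is to create that many fibers and keep them all suspended at once. This sketch only shows that doing so works; it does not measure memory, which depends on the Ruby version and platform:

```ruby
fibers = Array.new(1_000) do |i|
  Fiber.new do
    Fiber.yield i      # suspend here, keeping this fiber alive
    i * 2
  end
end

first_pass  = fibers.map(&:resume)   # every fiber is now suspended
second_pass = fibers.map(&:resume)   # every fiber runs to completion

puts "suspended #{first_pass.size} fibers at once"
```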
Does ruby have real multithreading?
Updated with Jörg's Sept 2011 comment
You seem to be confusing two very different things here: the
Ruby Programming Language and the specific threading model of one
specific implementation of the Ruby Programming Language. There
are currently around 11 different implementations of the Ruby
Programming Language, with very different and unique threading
models.
(Unfortunately, only two of those 11 implementations are actually
ready for production use, but by the end of the year that number
will probably go up to four or five.) (Update: it's now 5: MRI, JRuby, YARV (the interpreter for Ruby 1.9), Rubinius and IronRuby).
The first implementation doesn't actually have a name, which
makes it quite awkward to refer to it and is really annoying and
confusing. It is most often referred to as "Ruby", which is even
more annoying and confusing than having no name, because it
leads to endless confusion between the features of the Ruby
Programming Language and a particular Ruby Implementation.
It is also sometimes called "MRI" (for "Matz's Ruby
Implementation"), CRuby or MatzRuby.
MRI implements Ruby Threads as Green Threads within its
interpreter. Unfortunately, it doesn't allow those threads
to be scheduled in parallel, they can only run one thread at a
time.
However, any number of C Threads (POSIX Threads etc.) can run
in parallel to the Ruby Thread, so external C Libraries, or MRI
C Extensions that create threads of their own can still run in
parallel.
The second implementation is YARV (short for "Yet
Another Ruby VM"). YARV implements Ruby Threads as POSIX or
Windows NT Threads, however, it uses a Global Interpreter
Lock (GIL) to ensure that only one Ruby Thread can actually be
scheduled at any one time.
Like MRI, C Threads can actually run parallel to Ruby Threads.
In the future, it is possible, that the GIL might get broken
down into more fine-grained locks, thus allowing more and more
code to actually run in parallel, but that's so far away, it is
not even planned yet.
JRuby implements Ruby Threads as Native Threads,
where "Native Threads" in case of the JVM obviously means "JVM
Threads". JRuby imposes no additional locking on them. So,
whether those threads can actually run in parallel depends on
the JVM: some JVMs implement JVM Threads as OS Threads and some
as Green Threads. (The mainstream JVMs from Sun/Oracle have used OS threads exclusively since JDK 1.3.)
XRuby also implements Ruby Threads as JVM Threads. Update: XRuby is dead.
IronRuby implements Ruby Threads as Native Threads,
where "Native Threads" in case of the CLR obviously means
"CLR Threads". IronRuby imposes no additional locking on them,
so, they should run in parallel, as long as your CLR supports
that.
Ruby.NET also implements Ruby Threads as CLR
Threads. Update: Ruby.NET is dead.
Rubinius implements Ruby Threads as Green Threads
within its Virtual Machine. More precisely: the Rubinius
VM exports a very lightweight, very flexible
concurrency/parallelism/non-local control-flow construct, called
a "Task", and all other concurrency constructs (Threads in
this discussion, but also Continuations, Actors and
other stuff) are implemented in pure Ruby, using Tasks.
Rubinius can not (currently) schedule Threads in parallel,
however, adding that isn't too much of a problem: Rubinius can
already run several VM instances in several POSIX Threads in
parallel, within one Rubinius process. Since Threads are
actually implemented in Ruby, they can, like any other Ruby
object, be serialized and sent to a different VM in a different
POSIX Thread. (That's the same model the BEAM Erlang VM
uses for SMP concurrency. It is already implemented for
Rubinius Actors.)
Update: The information about Rubinius in this answer is about the Shotgun VM, which doesn't exist anymore. The "new" C++ VM does not use green threads scheduled across multiple VMs (i.e. Erlang/BEAM style), it uses a more traditional single VM with multiple native OS threads model, just like the one employed by, say, the CLR, Mono, and pretty much every JVM.
MacRuby started out as a port of YARV on top of the
Objective-C Runtime and CoreFoundation and Cocoa Frameworks. It
has now significantly diverged from YARV, but AFAIK it currently
still shares the same Threading Model with YARV.
Update: MacRuby depends on Apple's garbage collector, which has been declared deprecated and will be removed in later versions of Mac OS X; MacRuby is undead.
Cardinal is a Ruby Implementation for the Parrot
Virtual Machine. It doesn't implement threads yet, however,
when it does, it will probably implement them as Parrot
Threads. Update: Cardinal seems very inactive/dead.
MagLev is a Ruby Implementation for the GemStone/S
Smalltalk VM. I have no information what threading model
GemStone/S uses, what threading model MagLev uses or even if
threads are even implemented yet (probably not).
HotRuby is not a full Ruby Implementation of its
own. It is an implementation of a YARV bytecode VM in
JavaScript. HotRuby doesn't support threads (yet?) and when it
does, they won't be able to run in parallel, because JavaScript
has no support for true parallelism. There is an ActionScript
version of HotRuby, however, and ActionScript might actually
support parallelism. Update: HotRuby is dead.
Unfortunately, only two of these 11 Ruby Implementations are
actually production-ready: MRI and JRuby.
So, if you want true parallel threads, JRuby is currently your
only choice – not that that's a bad one: JRuby is actually faster
than MRI, and arguably more stable.
Otherwise, the "classical" Ruby solution is to use processes
instead of threads for parallelism. The Ruby Core Library
contains the Process module with the Process.fork method,
which makes it dead easy to fork off another Ruby
process. Also, the Ruby Standard Library contains the
Distributed Ruby (dRuby / dRb) library, which allows Ruby
code to be trivially distributed across multiple processes, not
only on the same machine but also across the network.
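A sketch of the process-based approach, assuming a platform where Process.fork is available (it is not on Windows or JRuby). The parallel_sums helper and the use of a pipe to return results are illustrative choices, not something the answer prescribes:

```ruby
# Hypothetical helper: sums each range in its own child process.
def parallel_sums(ranges)
  children = ranges.map do |range|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      writer.write(range.sum.to_s)   # send the result back to the parent
      writer.close
      exit!(0)                       # leave the child without running at_exit hooks
    end
    writer.close                     # the parent keeps only the read end
    [pid, reader]
  end

  children.map do |pid, reader|
    result = reader.read.to_i        # blocks until the child closes its end
    reader.close
    Process.wait(pid)                # reap the child process
    result
  end
end

sums = parallel_sums([1..1_000, 1_001..2_000])
puts sums.inspect
```

Because each child is a separate OS process with its own interpreter, the work can be scheduled on different CPUs regardless of the interpreter's threading model.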
Is access to ruby Array thread-safe?
"but will Ruby actually guarantee thread safety in this case?"
Ruby does not have a defined memory model, so there are no guarantees of any kind.
YARV has a Giant VM Lock which prevents multiple Ruby threads from running at the same time, which gives some implicit guarantees, but this is a private, internal implementation detail of YARV. For example, TruffleRuby, JRuby, and Rubinius can run multiple Ruby threads in parallel.
Since there is no specification of what the behavior should be, any Ruby implementation is free to do whatever they want. Most commonly, Ruby implementors try to mimic the behavior of YARV, but even that is not well-defined. In YARV, data structures are generally not thread-safe, so if you want to mimic the behavior of YARV, do you make all your data structures not thread-safe? But in YARV, also multiple threads cannot run at the same time, so in a lot of cases, operations are implicitly thread-safe, so if you want to mimic YARV, should you make your data structures thread-safe?
Or, in order to mimic YARV, should you prevent multiple threads from running at the same time? But, being able to run multiple threads in parallel is actually one of the reasons why people choose, for example JRuby over YARV.
As you can see, this is very much not a trivial question.
The best solution is to verify the behavior of each Ruby implementation separately. Actually, that is the second best solution.
The best solution is to use something like the concurrent-ruby Gem where someone else has already done the work of verifying the behavior of each Ruby implementation for you. The concurrent-ruby maintainers have a close relationship with several Ruby implementations (Chris Seaton, one of the two lead maintainers of concurrent-ruby is also the lead developer of TruffleRuby, a JRuby core developer, and a member of ruby-core, for example), and so you can generally be certain that everything that is in concurrent-ruby is safe on all supported Ruby implementations (currently YARV, JRuby, and TruffleRuby).
Concurrent Ruby has a Concurrent::Array class which is thread-safe. You can see how it is implemented here: https://github.com/ruby-concurrency/concurrent-ruby/blob/master/lib/concurrent-ruby/concurrent/array.rb As you can see, for YARV, Concurrent::Array is actually the same as ::Array, but for other implementations, more work is required.
The concurrent-ruby developers are also working on specifying Ruby's memory model, so that in the future, both programmers know what to expect and what not to expect, and implementors know what they are allowed to optimize and what they aren't.
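If adding the gem isn't an option, one fallback that doesn't rely on any implementation's internal locking is to guard a plain Array with a Mutex from the core library. This is a sketch of that alternative, not of how Concurrent::Array works internally; the worker and item counts are arbitrary:

```ruby
shared = []
lock   = Mutex.new

threads = Array.new(4) do |worker|
  Thread.new do
    250.times do |i|
      # Only one thread at a time may touch the array.
      lock.synchronize { shared << [worker, i] }
    end
  end
end
threads.each(&:join)

puts shared.size
```

Every read and write of the shared array has to go through the same lock for this to hold, which is easy to get wrong by hand and is part of the argument for using a vetted library instead.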