Why Do We Need Fibers


Fibers are something you will probably never use directly in application-level code. They are a flow-control primitive which you can use to build other abstractions, which you then use in higher-level code.

Probably the #1 use of fibers in Ruby is to implement Enumerators, which are a core Ruby class in Ruby 1.9. These are incredibly useful.

In Ruby 1.9, if you call almost any iterator method on the core classes, without passing a block, it will return an Enumerator.

irb(main):001:0> [1,2,3].reverse_each
=> #<Enumerator: [1, 2, 3]:reverse_each>
irb(main):002:0> "abc".chars
=> #<Enumerator: "abc":chars>
irb(main):003:0> 1.upto(10)
=> #<Enumerator: 1:upto(10)>

These Enumerators are Enumerable objects, and their each methods yield the elements which would have been yielded by the original iterator method, had it been called with a block. In the example I just gave, the Enumerator returned by reverse_each has an each method which yields 3, 2, 1. The Enumerator returned by chars yields "a", "b", "c" (and so on). (Since Ruby 2.0, String#chars returns an Array instead; use each_char to get an Enumerator.) BUT, unlike the original iterator method, the Enumerator can also return the elements one by one if you call next on it repeatedly:

irb(main):001:0> e = "abc".chars
=> #<Enumerator: "abc":chars>
irb(main):002:0> e.next
=> "a"
irb(main):003:0> e.next
=> "b"
irb(main):004:0> e.next
=> "c"

You may have heard of "internal iterators" and "external iterators" (a good description of both is given in the "Gang of Four" Design Patterns book). The above example shows that Enumerators can be used to turn an internal iterator into an external one.
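Internal iterators make some tasks awkward, such as walking two sequences in lockstep. Here is a minimal sketch (merge_sorted is just an illustrative name) of merging two sorted collections with external iterators, using Enumerator#peek and Enumerator#next. It relies on Kernel#loop ending quietly when StopIteration is raised:

```ruby
# Merge two already-sorted collections into one sorted Array by pulling
# elements one at a time from two external iterators (Enumerators).
def merge_sorted(a, b)
  ea = a.each  # each with no block returns an Enumerator
  eb = b.each
  result = []
  loop do
    # peek raises StopIteration once an enumerator is exhausted;
    # Kernel#loop rescues StopIteration and simply ends the loop.
    if ea.peek <= eb.peek
      result << ea.next
    else
      result << eb.next
    end
  end
  # One side is exhausted; drain whatever is left on the other side.
  loop { result << ea.next }
  loop { result << eb.next }
  result
end

merged = merge_sorted([1, 4, 9], [2, 3, 10, 11])
p merged  # => [1, 2, 3, 4, 9, 10, 11]
```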

This is one way to make your own enumerators:

class SomeClass
  def an_iterator
    # note the 'return enum_for...' pattern; it's very useful
    # enum_for is an Object method
    # so even for iterators which don't return an Enumerator when called
    #   with no block, you can easily get one by calling 'enum_for'
    return enum_for(:an_iterator) unless block_given?
    yield 1
    yield 2
    yield 3
  end
end

Let's try it:

e = SomeClass.new.an_iterator
e.next  # => 1
e.next  # => 2
e.next  # => 3

Wait a minute... does anything seem strange there? You wrote the yield statements in an_iterator as straight-line code, but the Enumerator can run them one at a time. In between calls to next, the execution of an_iterator is "frozen". Each time you call next, it continues running down to the following yield statement, and then "freezes" again.

Can you guess how this is implemented? The Enumerator wraps the call to an_iterator in a fiber and passes it a block which suspends the fiber. So every time an_iterator yields to the block, the fiber it is running on is suspended, and execution returns to the code that called next. The next time you call next, control passes back to the fiber, the block returns, and an_iterator continues where it left off.
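To make the mechanism concrete, here is a minimal sketch of an external iterator built directly on Fiber. FiberIterator, DONE and Countdown are hypothetical names; in MRI the real Enumerator is implemented in C and handles many more cases, so treat this as an illustration of the idea only:

```ruby
# A toy external iterator built directly on Fiber (illustration only).
class FiberIterator
  DONE = Object.new  # sentinel: the wrapped iterator has finished

  def initialize(obj, meth)
    @finished = false
    @fiber = Fiber.new do
      # The block we pass suspends the fiber on every yield, handing the
      # element back to whoever called resume (i.e. to #next).
      obj.public_send(meth) { |x| Fiber.yield(x) }
      DONE  # the fiber's return value goes to the final resume
    end
  end

  def next
    raise StopIteration, "iteration reached an end" if @finished
    value = @fiber.resume
    if value.equal?(DONE)
      @finished = true
      raise StopIteration, "iteration reached an end"
    end
    value
  end
end

# Straight-line yields, like an_iterator above.
class Countdown
  def each
    yield 3
    yield 2
    yield 1
  end
end

iter = FiberIterator.new(Countdown.new, :each)
first  = iter.next  # => 3
second = iter.next  # => 2
third  = iter.next  # => 1
```

A fourth call to iter.next raises StopIteration, just as Enumerator#next does.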

It would be instructive to think of what would be required to do this without fibers. EVERY class which wanted to provide both internal and external iterators would have to contain explicit code to keep track of state between calls to next. Each call to next would have to check that state, and update it before returning a value. With fibers, we can automatically convert any internal iterator to an external one.
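To see the difference, consider an in-order walk over a binary tree, where the internal iterator is simply recursive. In this sketch (Node and ManualTreeIterator are hypothetical classes), the fiber-free external iterator has to replace the recursion with an explicit stack, whereas enum_for can reuse the recursive each unchanged:

```ruby
# A tiny binary tree whose internal iterator is naturally recursive.
class Node
  attr_reader :value, :left, :right

  def initialize(value, left = nil, right = nil)
    @value, @left, @right = value, left, right
  end

  def each(&block)
    left&.each(&block)
    yield value
    right&.each(&block)
  end
end

# Without fibers: the external iterator must keep its own stack of
# pending nodes, i.e. re-implement the call stack of the recursive each.
class ManualTreeIterator
  def initialize(root)
    @stack = []
    push_left(root)
  end

  def next
    raise StopIteration, "iteration reached an end" if @stack.empty?
    node = @stack.pop
    push_left(node.right)  # the right subtree comes after this node
    node.value
  end

  private

  # Push a node and its chain of left children; the leftmost is visited first.
  def push_left(node)
    while node
      @stack.push(node)
      node = node.left
    end
  end
end

tree = Node.new(2, Node.new(1), Node.new(3))

manual = ManualTreeIterator.new(tree)
from_manual = [manual.next, manual.next, manual.next]  # => [1, 2, 3]

# With fibers (via enum_for), the recursive each is reused as-is.
enum = tree.enum_for(:each)
via_fiber = [enum.next, enum.next, enum.next]  # => [1, 2, 3]
```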

This doesn't have to do with fibers per se, but let me mention one more thing you can do with Enumerators: they allow you to apply higher-order Enumerable methods to iterators other than each. Think about it: normally the Enumerable methods, including map, select, include?, inject, and so on, all work on the elements yielded by each. But what if an object has iterators other than each?

irb(main):001:0> "Hello".chars.select { |c| c =~ /[A-Z]/ }
=> ["H"]
irb(main):002:0> "Hello".bytes.sort
=> [72, 101, 108, 108, 111]

Calling the iterator with no block returns an Enumerator, and then you can call other Enumerable methods on that.
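The same trick works for your own classes: any iterator that returns enum_for(...) when called without a block immediately gains the whole Enumerable toolkit. A small sketch with a hypothetical Sentence class:

```ruby
# A hypothetical class whose iterator is not called each.
class Sentence
  def initialize(text)
    @text = text
  end

  # Yields each word; returns an Enumerator when called without a block.
  def each_word
    return enum_for(:each_word) unless block_given?
    @text.split.each { |w| yield w }
  end
end

s = Sentence.new("the quick brown fox")
long_words = s.each_word.select { |w| w.length > 3 }  # => ["quick", "brown"]
lengths    = s.each_word.map(&:length)                 # => [3, 5, 5, 3]
indexed    = s.each_word.with_index.to_a
# => [["the", 0], ["quick", 1], ["brown", 2], ["fox", 3]]
```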

Getting back to fibers, have you used the take method from Enumerable?

class InfiniteSeries
  include Enumerable
  def each
    i = 0
    loop { yield(i += 1) }
  end
end

If anything calls that each method, it looks like it should never return, right? Check this out:

InfiniteSeries.new.take(10) # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In MRI, take does not actually need a fiber for this: it simply breaks out of the each loop once it has collected enough elements, so the infinite loop is abandoned rather than finished. But fibers can be used to implement infinite lists and lazy evaluation of a series. For some examples of lazy methods defined with Enumerators, see: https://github.com/alexdowad/showcase/blob/master/ruby-core/collections.rb
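Ruby 2.0 later added Enumerable#lazy, which makes this kind of lazy pipeline built-in. A short sketch reusing the InfiniteSeries class from above (repeated so the snippet runs on its own), showing both a lazy chain and fiber-backed external iteration with next:

```ruby
class InfiniteSeries
  include Enumerable
  def each
    i = 0
    loop { yield(i += 1) }
  end
end

# Enumerable#lazy chains operations without building intermediate arrays,
# so an infinite source is fine as long as the chain is eventually bounded
# (here by first).
even_squares = InfiniteSeries.new.lazy.map { |x| x * x }.select(&:even?).first(3)
# => [4, 16, 36]

# External iteration: Enumerator#next runs each on a fiber and suspends it
# after every yield, so the infinite loop never has to finish.
e = InfiniteSeries.new.to_enum
a = e.next  # => 1
b = e.next  # => 2
```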

You can also build a general-purpose coroutine facility using fibers. I've never used coroutines in any of my programs yet, but it's a good concept to know.
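As a taste of what that looks like, here is a minimal producer/consumer pair written as two fibers. The names and the nil end-of-stream convention are assumptions made for this sketch; a real coroutine library would handle termination more carefully:

```ruby
# Producer coroutine: each resume returns the next item.
producer = Fiber.new do
  %w[alpha beta gamma].each { |word| Fiber.yield(word) }
  nil  # signals that there is nothing more to produce
end

# Consumer coroutine: each resume hands it one item to process.
# The value passed to resume becomes the return value of Fiber.yield inside.
received = []
consumer = Fiber.new do |item|
  while item
    received << item.upcase
    item = Fiber.yield  # wait for the next item
  end
  received
end

# Shuttle items from producer to consumer until the producer runs dry.
loop do
  item = producer.resume
  consumer.resume(item)
  break if item.nil?
end

p received  # => ["ALPHA", "BETA", "GAMMA"]
```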

I hope this gives you some idea of the possibilities. As I said at the beginning, fibers are a low-level flow-control primitive. They make it possible to maintain multiple control-flow "positions" within your program (like different "bookmarks" in the pages of a book) and switch between them as desired. Since arbitrary code can run in a fiber, you can call into 3rd-party code on a fiber, and then "freeze" it and continue doing something else when it calls back into code you control.

Imagine something like this: you are writing a server program which will service many clients. A complete interaction with a client involves going through a series of steps, but each connection is transient, and you have to remember state for each client between connections. (Sound like web programming?)

Rather than explicitly storing that state, and checking it each time a client connects (to see what the next "step" they have to do is), you could maintain a fiber for each client. After identifying the client, you would retrieve their fiber and re-start it. Then at the end of each connection, you would suspend the fiber and store it again. This way, you could write straight-line code to implement all the logic for a complete interaction, including all the steps (just as you naturally would if your program was made to run locally).
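Here is a toy, in-memory sketch of that idea. SESSIONS, new_session and handle_request are hypothetical names, and the "connections" are just method calls; a real server would have many more concerns (timeouts, abandoned sessions, multiple processes):

```ruby
require 'fiber'  # Fiber#alive? needs this on older Ruby versions

# Hypothetical in-memory session store: client id => that client's fiber.
SESSIONS = {}

# Each session is written as straight-line code. Fiber.yield hands a reply
# back to the server and suspends until the client's next connection.
def new_session
  Fiber.new do |name|
    age = Fiber.yield("Hello, #{name}! How old are you?")
    color = Fiber.yield("Noted, #{age}. Favorite color?")
    "Thanks, #{name} (#{age}, likes #{color}). All done."
  end
end

# Called once per incoming connection with that connection's data.
def handle_request(client_id, data)
  fiber = (SESSIONS[client_id] ||= new_session)
  reply = fiber.resume(data)
  # Once the fiber has run to the end, the interaction is complete.
  SESSIONS.delete(client_id) unless fiber.alive?
  reply
end

replies = [
  handle_request(:alice, "Alice"),
  handle_request(:bob, "Bob"),  # a second client interleaved with the first
  handle_request(:alice, 30),
  handle_request(:alice, "green")
]
replies.each { |r| puts r }
```

Bob's session stays suspended after his first connection, waiting for his next request, while Alice's runs to completion.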

I'm sure there are many reasons why such a thing may not be practical (at least for now), but again, I'm just trying to show you some of the possibilities. Who knows? Once you get the concept, you may come up with some totally new application which no one else has thought of yet!

Simple parallelism with Fibers?

Fibers by themselves will not let you achieve parallelism, at least not without some sort of callback mechanism such as the EventMachine framework.

What you wrote simply interleaves synchronous execution among code blocks. The reason you do not get the expected sequence is that, while you did simulate the kick-off, you never resumed the fibers after yielding.

You might find the following post useful, particularly the example at the end:

http://schmurfy.github.io/2011/09/25/on_fibers_and_threads.html

Another example showing fibers transferring control to each other:

https://gist.github.com/aprescott/971008

This should give you expected results:

#!/usr/bin/env ruby

require 'fiber'

f1 = Fiber.new do
    puts "Fiber1 starting @ #{Time.new}."
    Fiber.yield
    sleep 2
    puts "Fiber1 done @ #{Time.new}."
    1
end
f2 = Fiber.new do
    puts "Fiber2 starting @ #{Time.new}."
    Fiber.yield
    sleep 2
    puts "Fiber2 done @ #{Time.new}."
    2
end

puts "Waiting @ #{Time.new}."
r1 = f1.resume
puts "f1 back @ #{Time.new} - #{r1}."
r2 = f2.resume
puts "f2 back @ #{Time.new} - #{r2}."

# Resume right after the yield in the fiber block and 
# execute until it encounters another yield or the block ends.
puts "Resuming f1"
f1.resume 
puts "Resuming f2"
f2.resume

sleep 1
puts "Done @ #{Time.now}."

Output:

Waiting @ 2016-06-05 00:35:29 -0700.
Fiber1 starting @ 2016-06-05 00:35:29 -0700.
f1 back @ 2016-06-05 00:35:29 -0700 - .
Fiber2 starting @ 2016-06-05 00:35:29 -0700.
f2 back @ 2016-06-05 00:35:29 -0700 - .
Resuming f1
Fiber1 done @ 2016-06-05 00:35:31 -0700.
Resuming f2
Fiber2 done @ 2016-06-05 00:35:33 -0700.
Done @ 2016-06-05 00:35:34 -0700.
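The timestamps above show that the two sleep 2 calls ran one after the other, not side by side. A minimal sketch contrasting fibers with threads (timings are approximate and assume no Fiber scheduler is installed):

```ruby
# Measure wall-clock time for a block using a monotonic clock.
def elapsed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

# Two fibers that each sleep 1 second: sleep blocks the thread running the
# fiber, so the fibers run one after the other (about 2 s in total).
fiber_elapsed = elapsed do
  fibers = Array.new(2) { Fiber.new { sleep 1 } }
  fibers.each(&:resume)
end

# Two threads doing the same thing: each thread sleeps independently,
# so the sleeps overlap (about 1 s in total).
thread_elapsed = elapsed do
  threads = Array.new(2) { Thread.new { sleep 1 } }
  threads.each(&:join)
end

puts format("fibers: %.1fs, threads: %.1fs", fiber_elapsed, thread_elapsed)
```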

Fibers vs. explicit enumerators

I would use an Enumerator: it lets you use take, take_while, and even each if your sequence is finite. Fiber, by contrast, is designed for lightweight concurrency and is fairly limited as an enumerator.

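prime_enum = Enumerator.new do |y|  # one possible definition: all primes, by trial division
  (2..Float::INFINITY).each { |n| y << n if (2..Math.sqrt(n)).none? { |d| (n % d).zero? } }
end

prime_enum.take(ARGV[0].to_i).each { |x| puts x }
prime_enum.take_while { |x| x < ARGV[0].to_i }.each { |x| puts x }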
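For contrast, here is a sketch of the same kind of generator written both ways (prime?, prime_fiber and this prime_enum definition are illustrative assumptions, not the asker's code). The bare Fiber has none of the Enumerable methods and must be driven by hand:

```ruby
# Trial-division primality test (fine for small numbers).
def prime?(n)
  n >= 2 && (2..Math.sqrt(n)).none? { |d| (n % d).zero? }
end

# A generator written as a bare Fiber: each resume produces the next prime.
prime_fiber = Fiber.new do
  n = 1
  loop { Fiber.yield(n) if prime?(n += 1) }
end

# A Fiber is not Enumerable: there is no take, take_while, select, ...
has_take = prime_fiber.respond_to?(:take)  # => false
# ...so you pull values one resume at a time.
first_five = Array.new(5) { prime_fiber.resume }  # => [2, 3, 5, 7, 11]

# The same logic inside Enumerator.new gets the Enumerable methods for free.
prime_enum = Enumerator.new do |y|
  n = 1
  loop { y << n if prime?(n += 1) }
end
below_20 = prime_enum.take_while { |x| x < 20 }  # => [2, 3, 5, 7, 11, 13, 17, 19]
```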

Why does the implementation of user-level threads (fibers) require a newly allocated stack per fiber?

Suppose a fiber has a function that calls another function that calls another function that then causes that fiber to be switched for another fiber. When we resume this fiber, we need to ensure that all local variables are back the way they were when the fiber switched and we need to ensure that returning from this function goes back to the calling function.
Thus the following rules emerge:

While a fiber is running, it can change its stack.
When a fiber is resumed, the stack must be back where it was when the fiber was switched.

It immediately follows from these two rules that every fiber must have its own stack.
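The rules above can be observed directly in Ruby. In this sketch (the method names are made up), the fiber is suspended three calls deep; when it is resumed, the local variables and return points of all three frames are still intact, which works because those frames live on the fiber's own stack rather than on the caller's:

```ruby
# Three nested method calls; the innermost one suspends the fiber.
def level3(x)
  Fiber.yield("level3 sees #{x}")  # suspend with three frames on the fiber's stack
  x * 10
end

def level2(x)
  local = x + 1           # a local that must survive the suspension
  result = level3(local)
  result + local          # still valid after we are resumed
end

def level1(x)
  level2(x) + 1000
end

f = Fiber.new { level1(1) }
first = f.resume   # runs down to the yield: => "level3 sees 2"
# ...arbitrary other work could happen here, on the main stack...
second = f.resume  # continues inside level3: 20 + 2 + 1000 => 1022
```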

Why does Meteor use fibers rather than promises or async or something else?

Straight from the horse's mouth, lead Meteor developer Geoff Schmidt:

Meteor is focused on giving the best possible experience to the application developer. We've had to make some seemingly unpopular or risky decisions to get there, but that has resulted in a set of tools that are simpler, more powerful, and more fun to use. . . . it turns out that these decisions are not nearly as risky or as unpopular as some people might perceive. It would be better to say that they go against the conventional wisdom in the node.js community. To take just one example, the thread-per-request or process-per-request model is very common in the larger software engineering community, whereas node's continuation passing ("asynchronous") style is sometimes used for chat servers and message busses but is almost never used for business logic. I think that server-side JavaScript usage is going to grow by multiple orders of magnitude in the next few years, and we're going to have a massive influx of new developers. Most of the new code that these developers write will be business logic, and they'll want to write it with the straight-line control flow that they've used in almost every other framework.

And to quote a great article about Fibers in Meteor:

Meteor abstracts Fibers with its APIs, allowing you to write your app without callbacks. The best part is that you can write your code this way and be completely oblivious to Fibers. It just works.

Fibers is the one of the best reasons Meteor is so popular. Since it allows us to write Node.js apps without callbacks, it has attracted many developers who hated Node.js for that reason.

In other words, you the developer can create Meteor apps without ever typing the word "Fiber". It all happens in the background. So most developers for most apps really have no reason to care "why Fibers" versus Promise or something else, because the developers aren't "using" any of those technologies directly anyway. The Meteor team could rewrite Meteor core under the hood to use Promises instead of Fibers and most apps should continue running just as before, oblivious to the change.

As for why the core team preferred Fibers over Promises etc. in Meteor core itself, from what I've read (and as is hinted at in the Geoff Schmidt quote above), it's mostly their personal preference: their aversion to callbacks and to code that is overly conscious of its asynchronous nature. They want the same callback-oblivious experience for themselves that they create for Meteor application developers.

What is the difference between run and yield in node-fibers?

Fibers are not a new invention

Node fibers make it possible to suspend the execution of any function by saving the state of the current execution environment in a platform-dependent way at the lowest level (for example, Windows has a native fiber concept: not widely used, more lightweight than a thread, and not preemptive).

Other libraries simulate co-routines using language features

The other JavaScript libraries implement co-routine continuation using callback functions, storing the execution state in closure-scope variables. This means you end up with a callback pyramid, a promise chain, or async/await (I put decorated generators in the same bucket as async/await).

Fibers are also a possible implementation of co-routines. Fibers should be fast, and integrating them into your code does not require you to write in a different code style or introduce new syntax. A fiber is an execution context (stack, registers, etc.) that you can switch to and from at will, from your own code.

This cannot be done in pure JavaScript; node-fibers uses native code to achieve it!

Node fibers restrict you so you don't block the event loop

The concept specific to node-fibers is this: the JavaScript event loop runs outside of all fibers, so your initial code runs outside any fiber too. If you have a fiber reference, you can pass it the right to run by calling fiber.run(). When you are inside a fiber, you can give up the right to run by calling Fiber.yield() (effectively suspending the currently running code), and the JavaScript event loop will continue. All built-in callbacks (setTimeout, Promise.then, event handlers, HTTP request callbacks) run in the JavaScript event loop, outside any fiber.

See this example

const Fiber = require("fibers");

function findDataAsync(param, callback) {
  setTimeout(() => {
    callback(null, "Async returned data");
  }, 100);
}

function findData( param ) {
  const currentFiber = Fiber.current;
  var response = null;

  findDataAsync(param, function (err, data) {
    response = { err : err, data : data };
    currentFiber.run();
  });
  Fiber.yield();
  if (response.err) {
    throw response.err;
  } else {
    return response.data;
  }
}

function main() {
  console.log("Inside fiber started");
  console.log(findData());
  console.log("Inside fiber finished");
}

console.log("Outside fiber started");
Fiber(main).run();
console.log("Outside fiber finished");

This should output:

Outside fiber started
Inside fiber started
Outside fiber finished
Async returned data
Inside fiber finished

Notice that "Outside fiber finished" is logged immediately after the fiber calls Fiber.yield() for the first time.

As you can see, we had to start a fiber right away to be able to yield. If you try to use fibers inside a third-party library, you have to make sure that the library does not "reset" your current execution context to the JavaScript event loop by calling setTimeout or issuing asynchronous HTTP requests.
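The same pattern can be sketched in Ruby with core fibers. EVENT_QUEUE, find_data_async and find_data are hypothetical stand-ins for an event loop and a callback-based API; the point is that find_data looks synchronous to its caller but suspends its fiber until the callback resumes it, producing the same log order as the JavaScript example:

```ruby
require 'fiber'  # Fiber.current needs this on older Ruby versions

# A toy "event loop": a queue of callbacks the main code runs later.
EVENT_QUEUE = []

# Callback-style API (stands in for setTimeout or an HTTP client).
def find_data_async(param, &callback)
  EVENT_QUEUE << -> { callback.call(nil, "Async returned data for #{param}") }
end

# Synchronous-looking wrapper; must be called from inside a fiber.
def find_data(param)
  current = Fiber.current
  response = nil
  find_data_async(param) do |err, data|
    response = { err: err, data: data }
    current.resume  # plays the role of fiber.run() in node-fibers
  end
  Fiber.yield  # give control back to the "event loop"
  raise response[:err] if response[:err]
  response[:data]
end

log = []
log << "Outside fiber started"
Fiber.new do
  log << "Inside fiber started"
  log << find_data("x")
  log << "Inside fiber finished"
end.resume
log << "Outside fiber finished"

# Run queued callbacks until none are left.
EVENT_QUEUE.shift.call until EVENT_QUEUE.empty?

puts log
```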

What is the difference between a thread and a fiber?

In the simplest terms, threads are generally considered to be preemptive (although this may not always be true, depending on the operating system), while fibers are considered to be lightweight, cooperative threads. Both are separate execution paths for your application.

With threads: the current execution path may be interrupted or preempted at any time (note: this statement is a generalization and may not always hold true depending on OS/threading package/etc.). This means that for threads, data integrity is a big issue because one thread may be stopped in the middle of updating a chunk of data, leaving the integrity of the data in a bad or incomplete state. This also means that the operating system can take advantage of multiple CPUs and CPU cores by running more than one thread at the same time and leaving it up to the developer to guard data access.

With fibers: the current execution path is only interrupted when the fiber yields execution (same note as above). This means that fibers always start and stop in well-defined places, so data integrity is much less of an issue. Also, because fibers are often managed in the user space, expensive context switches and CPU state changes need not be made, making changing from one fiber to the next extremely efficient. On the other hand, since no two fibers can run at exactly the same time, just using fibers alone will not take advantage of multiple CPUs or multiple CPU cores.
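A minimal sketch of that cooperative behavior: a round-robin scheduler over fibers that share a hash. The task setup and scheduler loop are illustrative assumptions; the key property is that a switch can only happen at Fiber.yield, so each two-step update is never seen half-done and the interleaving is fully deterministic:

```ruby
require 'fiber'  # Fiber#alive? needs this on older Ruby versions

account = { balance: 100, history: [] }

# Build a task that applies `amount` to the shared account `times` times,
# yielding control only after each complete update.
def make_task(account, name, amount, times)
  Fiber.new do
    times.times do
      account[:balance] += amount              # step 1 of the update
      account[:history] << "#{name}#{amount}"  # step 2: no switch can happen in between
      Fiber.yield                              # the only point where control changes hands
    end
  end
end

tasks = [make_task(account, "a", 10, 2), make_task(account, "b", -5, 3)]

# Round-robin scheduler: resume every live fiber once per round.
until tasks.empty?
  tasks.each(&:resume)
  tasks.select!(&:alive?)
end

puts account[:balance]  # prints 105
p account[:history]     # prints ["a10", "b-5", "a10", "b-5", "b-5"]
```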

Can Ruby Fibers be Concurrent?

No, you cannot do concurrency with Fibers. Fibers simply aren't a concurrency construct, they are a control-flow construct, like Exceptions. That's the whole point of Fibers: they never run in parallel, they are cooperative and they are deterministic. Fibers are coroutines. (In fact, I never understood why they aren't simply called Coroutines.)

The only concurrency construct in Ruby is Thread.
