What Is the Preferred Way of Performing Non Blocking I/O in Ruby

Is non-blocking I/O really faster than multi-threaded blocking I/O? How?

The biggest advantage of non-blocking or asynchronous I/O is that your thread can continue with other work while the I/O is in progress. Of course, you can also achieve this with an additional thread. As you stated, for best overall (system) performance it is probably better to use asynchronous I/O rather than multiple threads, because that reduces thread switching.
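Since the thread title asks about Ruby, here is a minimal sketch of what "the thread can continue its work" looks like with the standard library alone. It uses IO.pipe as a stand-in for a socket. The method name demo_nonblocking_read is made up for illustration, and the exception: false option assumes Ruby 2.3 or later.

```ruby
# Sketch: non-blocking reads on a pipe (a stand-in for a network socket).
# demo_nonblocking_read is a hypothetical name used only for this example.
def demo_nonblocking_read
  reader, writer = IO.pipe

  # Nothing has been written yet. With exception: false, read_nonblock
  # returns :wait_readable instead of blocking or raising.
  first = reader.read_nonblock(4096, exception: false)

  writer.write("hello")

  # IO.select waits (here at most 1 second) until the pipe is readable,
  # so the following read_nonblock call has data to return.
  ready, = IO.select([reader], nil, nil, 1)
  second = ready ? reader.read_nonblock(4096, exception: false) : nil

  # After the writer is closed, read_nonblock reports end of file as nil.
  writer.close
  third = reader.read_nonblock(4096, exception: false)

  reader.close
  [first, second, third]
end
```

Between the first call and IO.select, the thread is free to do anything else; it only comes back to the pipe once data is ready.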

Let's look at possible implementations of a network server program that shall handle 1000 clients connected in parallel:

1. One thread per connection (can be blocking I/O, but can also be non-blocking I/O).
   Each thread requires memory resources (also kernel memory!), that is a disadvantage. And every additional thread means more work for the scheduler.
2. One thread for all connections.
   This takes load from the system because we have fewer threads. But it also prevents you from using the full performance of your machine, because you might end up driving one processor to 100% and letting all other processors idle around.
3. A few threads where each thread handles some of the connections.
   This takes load from the system because there are fewer threads. And it can use all available processors. On Windows this approach is supported by Thread Pool API.
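The third approach (a few threads sharing the connections) could be sketched in Ruby roughly as follows. This is only an illustration: handle_connection and serve_with_pool are hypothetical names, connections are represented by plain ids, and Queue#close assumes Ruby 2.3 or later. Note also that in MRI the global VM lock keeps Ruby code from running on several cores at once, although threads waiting on blocking I/O do release it.

```ruby
# Sketch: a small fixed pool of worker threads that share one job queue.
# handle_connection is a hypothetical stand-in for real per-client work.
def handle_connection(id)
  "handled #{id}"
end

def serve_with_pool(connection_ids, pool_size: 4)
  jobs = Queue.new
  results = Queue.new

  connection_ids.each { |id| jobs << id }
  jobs.close # once the queue is drained, pop returns nil and workers stop

  workers = Array.new(pool_size) do
    Thread.new do
      while (id = jobs.pop)
        results << handle_connection(id)
      end
    end
  end
  workers.each(&:join)

  # Collect everything the workers produced (order is not guaranteed).
  Array.new(results.size) { results.pop }
end
```

Calling serve_with_pool((1..1000).to_a, pool_size: 4) would process 1000 "connections" with only four threads, instead of one thread each.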

Of course, having more threads is not a problem per se. As you may have noticed, I chose quite a high number of connections/threads. I doubt that you'll see any difference between the three possible implementations if we are talking about only a dozen threads (this is also what Raymond Chen suggests in the MSDN blog post Does Windows have a limit of 2000 threads per process?).

On Windows, using unbuffered file I/O means that writes must be of a size that is a multiple of the volume sector size. I have not tested it, but it sounds like this could also affect write performance positively for buffered synchronous and asynchronous writes.

The steps 1 to 7 you describe give a good idea of how it works. On Windows the operating system will inform you about completion of an asynchronous I/O (WriteFile with OVERLAPPED structure) using an event or a callback. Callback functions are only called while your thread is in an alertable wait, for example when your code calls WaitForMultipleObjectsEx with bAlertable set to true.

Some more reading on the web:

Multiple Threads in the User Interface on MSDN, which also briefly covers the cost of creating threads
Section Threads and Thread Pools says "Although threads are relatively easy to create and use, the operating system allocates a significant amount of time and other resources to manage them."
CreateThread documentation on MSDN says "However, your application will have better performance if you create one thread per processor and build queues of requests for which the application maintains the context information."
Old article Why Too Many Threads Hurts Performance, and What to do About It

How to use ruby fibers to avoid blocking IO

I'm not up on fibers in 1.9, but regular Threads from 1.8.6 can solve this problem. Try using a Queue: http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html

Looking at the example in the documentation, your consumer is the part that does the upload. It 'consumes' a URL and a file, and uploads the data. The producer is the part of your program that keeps working and finds new files to upload.
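A minimal sketch of that producer/consumer split, under a few assumptions: fake_upload and run_uploads are hypothetical names standing in for your real upload code, and the :done sentinel is just one way of telling the consumer to stop.

```ruby
require "thread" # needed for Queue on old Rubies such as 1.8; harmless on newer ones

# fake_upload is a hypothetical stand-in for the real upload call.
def fake_upload(url, file)
  "#{file} -> #{url}"
end

def run_uploads(items)
  queue = Queue.new
  uploaded = []

  # Consumer: takes a URL and a file off the queue and uploads the data.
  consumer = Thread.new do
    # pop blocks until an item is available; :done is a sentinel chosen
    # for this sketch to tell the consumer there is no more work.
    while (item = queue.pop) != :done
      url, file = item
      uploaded << fake_upload(url, file)
    end
  end

  # Producer: the main program keeps working and queues files as it finds them.
  items.each { |url, file| queue << [url, file] }
  queue << :done

  consumer.join
  uploaded
end
```

With a single consumer, uploads happen one at a time in the order they were queued, while the producer never waits for an upload to finish.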

If you want to upload multiple files at once, simply launch a new Thread for each file:

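@all_threads ||= []  # collect the threads so they can be joined later

# upload_file, param1 and param2 stand for your own upload code
t = Thread.new do
  upload_file(param1, param2)
end
@all_threads << t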

Then, later on in your 'producer' code (which, remember, doesn't have to be in its own Thread, it could be the main program):

@all_threads.each do |t|
  t.join if t.alive?
end

The Queue can either be a @member_variable or a $global.

Facebook API - Using multiple requests in Rails Model

Take a look at the answers here: What is the preferred way of performing non blocking I/O in Ruby?

Good options seem to be:

Typhoeus
EventMachine + em-http-request

Better use EM.next_tick or EM.defer for long running calculation with Eventmachine?

Definitely EM.defer (or Thread.new, I suppose). Doing a long-running calculation in EM.next_tick will block your reactor from doing anything else.

As a general rule, you don't want ANY block inside the reactor to run for long, whether or not it blocks on I/O, because the entire app halts while it is running.
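To make the difference concrete without depending on the eventmachine gem, here is a toy reactor written with the standard library only. It is not how EventMachine is implemented; MiniReactor, its next_tick and defer methods, and demo_reactor are made-up names that merely mimic the idea: next_tick blocks run on the loop thread, while defer runs the work on another thread and hands the result back to the loop.

```ruby
# Toy reactor for illustration only (not EventMachine's implementation).
class MiniReactor
  def initialize
    @ticks = Queue.new # thread-safe queue of blocks to run on the loop thread
    @pending = 0       # deferred jobs in flight; only touched on the loop thread
  end

  # Schedule a block to run on the loop thread. A slow block here stalls
  # every other queued block.
  def next_tick(&block)
    @ticks << block
  end

  # Run work on a separate thread, then deliver its result to callback
  # on the loop thread. Call this from the thread that runs the loop.
  def defer(work, callback)
    @pending += 1
    Thread.new do
      result = work.call
      next_tick do
        @pending -= 1
        callback.call(result)
      end
    end
  end

  # Process queued blocks until none are left and no deferred job is running.
  def run
    @ticks.pop.call until @ticks.empty? && @pending.zero?
  end
end

def demo_reactor
  log = []
  reactor = MiniReactor.new
  # The "long-running calculation" goes to a worker thread...
  reactor.defer(-> { sleep 0.1; 21 * 2 }, ->(value) { log << "deferred result #{value}" })
  # ...so these short blocks are not held up behind it.
  reactor.next_tick { log << "tick 1" }
  reactor.next_tick { log << "tick 2" }
  reactor.run
  log
end
```

Had the sleep been placed inside a next_tick block instead, every block queued after it would have had to wait for it to finish.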

Why is an event loop needed for Asynchronous I/O

My question comes down to this: Why is it that in .Net (and from what
I can tell this is true for Java/JVM as well) there's no need for an
event loop, and I can fire off an asynchronous request at any time,
yet in languages like Ruby/Python, I need to resort to
eventmachine/twisted respectively?

I think that's because Ruby/Python (and similarly, Node.js as well) want to make a developer's life easier by imposing a single-threaded model for the application's core loop. With EventMachine, the completion callbacks of the async I/O routines are serialized and queued to be executed on the same thread, so the developer doesn't have to worry about thread safety.

I can't speak for Java, but in .NET we have control over this with the synchronization context. Check Stephen Cleary's "It's All About the SynchronizationContext". It's quite easy to replicate the concept of EventMachine in .NET (in fact, that's automatically done for UI applications). A custom implementation of the serializing synchronization context might look like AsyncPump from Stephen Toub's "Await, SynchronizationContext, and Console Apps". IMO, this would be a direct match to Ruby's EventMachine.