Confusion about Node.js internal asynchronous I/O mechanism

  1. First of all, libuv has removed libeio. It still performs async file I/O with a thread pool, as libeio did and just as you mentioned.

  2. libuv has also removed libev. It performs async network I/O through each platform's native asynchronous I/O interface, such as epoll, kqueue and IOCP, without a thread pool. An event loop runs on libuv's main thread, polls for I/O events and processes them.

  3. The thread pool inside libuv has a fixed size (4 threads on Unix-like systems). It acts as a task queue and keeps the process from exhausting system resources by spawning threads without limit as requests increase.
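The queueing behaviour described in point 3 can be observed from JavaScript. The following is a minimal sketch, not libuv's own code. It assumes that `fs.stat()` requests are served by the thread pool, and it uses `process.execPath` only as a file that is known to exist. The request count of 16 is an arbitrary value chosen to exceed the pool size.

```javascript
const fs = require('fs');

// Arbitrary request count for illustration, chosen to exceed the pool size.
const REQUESTS = 16;
let finished = 0;

for (let i = 0; i < REQUESTS; i++) {
  // Each call only queues work for the thread pool and returns at once.
  fs.stat(process.execPath, (err) => {
    if (err) throw err;
    finished++;
    if (finished === REQUESTS) {
      console.log(`all ${REQUESTS} requests completed`);
    }
  });
}

// Still 0 here: no callback can run before this script returns to the event loop.
console.log(`queued ${REQUESTS} requests, ${finished} finished so far`);
```

All 16 requests complete even though they outnumber the pool's threads, because the surplus requests wait in the queue. The pool size can be changed with the `UV_THREADPOOL_SIZE` environment variable.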

Confusion on Node.js thread pool

There are two main reasons you may want node:

  1. Performance

AFAIK, node leverages low-level non-blocking APIs wherever possible; I see the thread pool as a kind of fallback, used when a non-blocking primitive simply does not exist.

For more info, see:

When is the thread pool used?

Confusion about node.js internal asynchronous I/O mechanism


  2. No multithreading

This is not just about speed. Event-loop-powered asynchronous callbacks / Promises / CSP are a way to write code that 'runs tasks in parallel' without explicit locks (and without the not-so-explicit deadlocks and race conditions). Many people who have tried multithreaded programming come to appreciate these semi-new paradigms.
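A small sketch of the "no explicit locks" point. The names `task` and `counter` are made up for illustration. One hundred tasks are in flight at the same time and each performs an unguarded read-modify-write on shared state, yet no update is lost, because every callback runs to completion on the single JavaScript thread.

```javascript
let counter = 0;

// Hypothetical task: waits a little, then does an unguarded read-modify-write.
function task() {
  return new Promise((resolve) => {
    setTimeout(() => {
      const seen = counter; // read
      counter = seen + 1;   // write; no other callback can run in between
      resolve();
    }, 10);
  });
}

// Start 100 tasks "in parallel" and wait for all of them.
const allDone = Promise.all(Array.from({ length: 100 }, () => task()))
  .then(() => counter);

allDone.then((total) => console.log(total)); // prints 100
```

This only holds while the read and the write stay inside one callback. If they were separated by an asynchronous step, other callbacks could run in between.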

What is non-blocking or asynchronous I/O in Node.js?

Synchronous vs Asynchronous

Synchronous execution usually refers to code executing in sequence. Asynchronous execution refers to execution that doesn't run in the sequence it appears in the code. In the following example, the synchronous operation causes the alerts to fire in sequence. In the async operation, while alert(2) appears to execute second, it doesn't.

Synchronous: 1,2,3

    alert(1);
    alert(2);
    alert(3);
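For comparison, here is a sketch of the asynchronous case. It is an assumed example rather than the original author's code: it uses `setTimeout()` to defer the second call, and `console.log()` in place of `alert()` so that it also runs under Node.js. `show` and `order` are hypothetical helpers.

```javascript
const order = [];

// Hypothetical stand-in for alert() that also records the call order.
function show(n) {
  order.push(n);
  console.log(n);
}

show(1);
setTimeout(() => show(2), 0); // deferred until the current code has finished
show(3);
// Asynchronous: 1,3,2
```

Even with a delay of 0, the callback passed to `setTimeout()` cannot run until the surrounding code has returned, so 3 is shown before 2.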

Why using sync functions in nodejs is known as blocking the event loop?

What is blocking

Why using sync functions in nodejs is known as "blocking the event loop"?

In a nutshell, it's because the event loop can only process the next event when you return from whatever your current Javascript is doing and allow the event loop to look for the next event. A sync function blocks the interpreter until it finishes. So, the entire time a sync function is working and you're waiting for it to return, the interpreter is blocked and control is not returned back to the event loop. This blocks the event loop and also blocks your Javascript from running.

Single Thread

Nodejs runs all of your Javascript in a single thread. Other threads are used internally, but your Javascript itself runs only in a single thread (we're assuming there is no use of worker threads in your code). So, when you make a synchronous function call, the single thread that runs your Javascript is busy and blocked until that call returns, and only then can it continue executing more of your Javascript.

This blocks everything. It blocks running more of your Javascript after the synchronous function call and it blocks getting back to the event loop to run any other event handlers that are pending such as incoming network events, timers, completion events from other things such as disk I/O, etc... So, while this is blocking the event loop, it's also blocking running more of your own code after the function call.

Non-Blocking, Asynchronous Operations

On the other hand, asynchronous functions such as fs.readFile() don't block. They initiate their operation and return immediately. This allows the interpreter to continue running more of your own Javascript after the call to fs.readFile(), and it also allows you to return from whatever event triggered your work in the first place, which returns control to the event loop so it can service other waiting events or events that will trigger in the future. fs.readFile() then does most of its work in native code (behind the scenes) outside of the main thread that runs your Javascript. So, these types of asynchronous functions don't block the event loop. Instead, they cooperate with the event loop so that other things can run while waiting for the completion of the asynchronous operation that was previously initiated. When they complete, they insert an event into the event loop that causes the event loop to call the completion callback at its earliest convenience (when it's not blocked).
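A minimal sketch of this behaviour. The scratch file name and its contents are assumptions made only so the example has something to read. The synchronous write is setup, not part of the point being shown.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, created only so there is something to read.
const file = path.join(os.tmpdir(), `readfile-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello from disk'); // setup only

const events = [];

// Starts the read and returns immediately; the callback runs later.
fs.readFile(file, 'utf8', (err, text) => {
  fs.rmSync(file, { force: true }); // clean up the scratch file
  if (err) throw err;
  events.push(`callback ran with: ${text}`);
  console.log(events.join('\n'));
});

// Reached before the callback above has had any chance to run.
events.push('readFile() returned');
```

The log shows `readFile() returned` first and the callback line second, because the completion callback is only called once the script has handed control back to the event loop.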

Differences in Blocking

It's also worth noting that functions representing synchronous operations and functions representing asynchronous operations both block the execution of your Javascript and block the event loop until they return. The difference is that an asynchronous operation returns from the function nearly immediately, long before the operation itself is complete, and communicates its completion and/or eventual result back via a promise, callback or event (which are all callbacks at the lowest level of the event loop). The synchronous operation does not return until the operation itself is complete. So, the asynchronous operation only blocks for a very short duration while the operation is being initiated, whereas the synchronous operation blocks for the entire duration of the operation (until it completes).
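The three notification styles mentioned above (callback, promise, event) can be sketched side by side. This is an illustration under assumptions: `process.execPath` is used only because it is a file that certainly exists, and the 16-byte range is arbitrary.

```javascript
const fs = require('fs');
const { once } = require('events');

// Any existing file would do; this one is always present.
const target = process.execPath;

// 1. Callback
fs.stat(target, (err, stats) => {
  if (err) throw err;
  console.log('callback:', stats.size, 'bytes');
});

// 2. Promise
const viaPromise = fs.promises.stat(target).then((stats) => stats.size);
viaPromise.then((size) => console.log('promise:', size, 'bytes'));

// 3. Events: read the first 16 bytes as a stream ("end" is inclusive).
let received = 0;
const stream = fs.createReadStream(target, { start: 0, end: 15 });
stream.on('data', (chunk) => { received += chunk.length; });
const viaEvent = once(stream, 'end').then(() => received);
viaEvent.then((n) => console.log('events:', n, 'bytes'));

// Printed first: each call above only initiated its operation.
console.log('all three operations started');
```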

More About the Event Loop

So, in what moment during the event loop is my javascript code executed?

When control returns back to the event loop, it goes through several different phases looking for things to do. When it finds something to do, that "something" results in calling a Javascript callback that starts running some of your Javascript. For example if the "something to do" is a setTimeout() timer that is ready to fire, then it will call the Javascript callback that was passed to setTimeout(). That callback runs to its completion and only when your Javascript returns from that callback does the event loop regain control and get to look for the next event to run and call its callback.
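The run-to-completion rule can be shown with two timers that become due at the same time. This is a sketch with made-up labels (`A`, `B`) and an arbitrary 50 ms busy-wait.

```javascript
const log = [];

// First timer callback: stays busy for about 50 ms once it starts.
setTimeout(() => {
  log.push('A start');
  const start = Date.now();
  while (Date.now() - start < 50) { /* busy-wait */ }
  log.push('A end');
}, 0);

// Second timer is due at the same time but has to wait for A to return.
const finished = new Promise((resolve) => {
  setTimeout(() => {
    log.push('B');
    resolve(log.join(' -> '));
  }, 0);
});

finished.then((line) => console.log(line)); // A start -> A end -> B
```

`B` never appears between `A start` and `A end`: the event loop cannot call the second callback until the first one has returned.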

"it won't be moved to the EventLoop because is not an async operation"

This is not really the correct way to think about things. Things are not really "moved to the event loop".

A synchronous operation is just a blocking function call that returns when it returns and execution of any other Javascript is blocked until that blocking function call returns. Things are blocked because the single threaded interpreter running your Javascript is stuck waiting for this function to finish. It can't do anything else and the event loop is also blocked because it can't do anything until the interpreter returns control back to the event loop.

An asynchronous operation, on the other hand, initiates some operation (let's say it issues an http request to some other host) and then immediately returns, long before it has the result of that http request. Since this asynchronous operation returns before it has its result, it is considered non-blocking, and because it returns quickly, you can then return from whatever event caused your code to run, which returns control back to the event loop. That allows the event loop to look for other events to handle and run their corresponding callbacks. Meanwhile, the asynchronous operation that was previously started has some native code associated with it (which may or may not be running in a native OS thread, depending upon what type of operation it is). Regardless, that native code is configured such that when the asynchronous operation completes, it will insert an event into the appropriate event queue. So, at some future point when nodejs has control back in the event loop, it will find that event and run the Javascript callback associated with it, thus notifying the original Javascript code that the asynchronous operation is now complete and providing some sort of result or error code.

Example

As a simple example, let's say you run this code:

    // timer that wants to fire in 1 second
    setTimeout(function() {
        console.log("timer fired");
    }, 1000);

    // loop that blocks for 5 seconds
    const start = Date.now();
    while (Date.now() - start < 5000) { }
    console.log("blocking loop finished");
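In the code above, "blocking loop finished" is printed first and "timer fired" only afterwards, roughly 5 seconds in rather than after 1 second, because the timer callback cannot run while the loop holds the thread. The sketch below measures that effect with shorter durations. `measureTimerDelay` is a hypothetical helper, and the 100 ms and 500 ms values are arbitrary.

```javascript
// Hypothetical helper: schedules a timer, then blocks the thread with a busy loop.
// Resolves with the number of milliseconds the timer actually took to fire.
function measureTimerDelay(timerMs, blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), timerMs);

    // Same blocking pattern as above, with a configurable duration.
    const start = Date.now();
    while (Date.now() - start < blockMs) { /* busy-wait */ }
  });
}

measureTimerDelay(100, 500).then((actual) => {
  console.log(`timer requested after 100 ms, fired after ${actual} ms`);
});
```

The reported delay is never shorter than the blocking time, however small the requested timer delay is.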

Nodejs Event Loop

I have been personally reading the source code of node.js & v8.

I ran into a similar problem to yours when I tried to understand the node.js architecture in order to write native modules.

What I am posting here is my understanding of node.js and this might be a bit off track as well.

  1. libev is the event loop library that node.js originally ran on internally to perform simple event loop operations. It was originally written for *nix systems. libev provides a simple yet optimized event loop for the process to run on. You can read more about libev here.

  2. libeio is a library for performing input/output asynchronously. It handles file descriptors, data handlers, sockets etc. You can read more about it here.

  3. libuv began as an abstraction layer on top of libeio, libev, c-ares (for DNS) and IOCP (for Windows asynchronous I/O). libuv performs, maintains and manages all the I/O and events in the event pool (in the case of libeio, the thread pool). You should check out Ryan Dahl's tutorial on libuv. It will make clearer how libuv itself works, and then you will understand how node.js works on top of libuv and v8.

To understand just the javascript event loop you should consider watching these videos

  • JS-conference
  • JSConf 2011 (has very irritating sound effects)
  • Understanding event driven programming
  • Understanding the node.js event loop

To see how libeio is used with node.js in order to create async modules you should see this example.

Basically, what happens inside node.js is that the v8 loop runs and handles all the javascript parts as well as the C++ modules [when they are running in the main thread (as per the official documentation, node.js itself is single threaded)]. Outside of the main thread, libev and libeio handle the work in the thread pool, and libev provides the interaction with the main loop. So from my understanding, node.js has 1 permanent event loop: the v8 event loop. To handle C++ async tasks it uses a thread pool [via libeio & libev].

For example:

eio_custom(Task,FLAG,AfterTask,Eio_REQUEST);

This call, which appears in all modules, usually runs the function Task in the thread pool. When Task completes, the AfterTask function is called in the main thread. Eio_REQUEST is the request handler, which can be a structure / object whose purpose is to provide communication between the thread pool and the main thread.
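The same Task / AfterTask split is visible from JavaScript. The sketch below is an analogy, not the eio_custom API. It assumes `crypto.pbkdf2()` as the thread-pool-backed operation; `deriveKey` is a hypothetical wrapper, and the password, salt and iteration count are arbitrary.

```javascript
const crypto = require('crypto');

// Hypothetical wrapper; the salt and iteration count are arbitrary.
function deriveKey(password) {
  return new Promise((resolve, reject) => {
    // "Task": the key derivation itself runs on a thread-pool thread.
    crypto.pbkdf2(password, 'demo-salt', 100000, 32, 'sha256', (err, key) => {
      // "AfterTask": this callback runs back on the main JavaScript thread.
      if (err) return reject(err);
      resolve(key);
    });
  });
}

deriveKey('secret').then((key) => {
  console.log(`derived ${key.length} bytes`); // derived 32 bytes
});
console.log('main thread is free while the key is derived');
```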


