Node.js on Multi-Core Machines

Node.js on multi-core processors

(V8 developer here.)

Yes, in general, running several instances of Node on the same machine can increase the total amount of work done. This would be similar to having several Chrome tabs, which can each do some single-threaded JavaScript work.

That said, it's most probably not as simple as "8 instances on an 8-thread processor gives 8 times the overall throughput", for several reasons:

(1) If you actually mean "8 threads", i.e. 4 cores + hyperthreading, then going from 4 to 8 processes will likely give 20-40% improvement (depending on hardware architecture and specific workload), not 2x.

(2) V8 does use more than one thread for internal purposes (mostly compilation and garbage collection), which is one reason why a single Node instance likely (depending on workload) will use more than one CPU core/thread.

(3) Another reason is that while JavaScript is single-threaded, Node does more than just execute a single thread of JavaScript. The various things happening in the background (that will trigger JS callbacks when they're ready) also need CPU resources.

(4) Finally, the CPU is not necessarily your bottleneck. If your server's performance is capped by e.g. network or disk, then spawning more instances won't help; on the contrary, it might make things significantly worse.

Long story short: it doesn't hurt to try. As a first step, run a typical workload on one instance, and take a look at the current system load (CPU, memory, network, disk). If they all have sufficient idle capacity, try going to two instances, measure whether that increases overall throughput, and check system load again. Then keep adding instances until you notice that it doesn't help any more.

Node.js on multi-core machines for file I/O operations

For anyone interested: one good option is the npm module piscina.

In this gist I explain everything. Node.js is a powerful tool for backend developers, but you must be aware of multi-core processing in order to maximize the potential of your CPU.
This multi-core capability is mostly used for web servers, and Node.js ships with the cluster module for exactly that purpose.
Node.js also ships with the worker_threads module, but it's not so easy to work with.

Let's create a project that runs some CPU-intensive work single-threaded and multi-threaded, writing some random data to files.

Create the project:

mkdir test-threads && cd test-threads
npm init -y

Install dependencies and create the dist/ directory:

npm install async progress piscina command-line-args
mkdir dist

Create the file index.js at the root of the project directory:

const path = require('path')
const async = require('async')
const ProgressBar = require('progress')
const Piscina = require('piscina')
const commandLineArgs = require('command-line-args')

console.time('main')

const worker = require(path.resolve(__dirname, 'worker.js'))
const piscina = new Piscina({
  filename: path.resolve(__dirname, 'worker.js')
})

const argvOptions = commandLineArgs([
  { name: 'multi-thread', type: Boolean },
  { name: 'iterations', alias: 'i', type: Number }
])

const files = []
for (let i = 0; i < (argvOptions.iterations || 1000); i++) {
  files.push(path.join(__dirname, 'dist', i + '.txt'))
}

const bar = new ProgressBar(':bar', { total: files.length, width: 80 });

async.each(files, function (file, cb) {
  (async function () {
    try {
      const err = argvOptions['multi-thread'] ? (await piscina.run(file)) : worker(file)
      bar.tick()
      if (err) cb(Error(err)); else cb()
    } catch (err) {
      cb(Error(err))
    }
  })();
}, (err) => {
  if (err) {
    console.error('There was an error: ', err)
    process.exitCode = 1
  } else {
    bar.terminate()
    console.log('Success')
    console.timeEnd('main')
    process.exitCode = 0
  }
})

Now create worker.js, also at the root of the project directory:

const fs = require('fs')

// Some CPU-intensive busywork; the higher baseNumber is, the longer it takes
function mySlowFunction (baseNumber) {
  let result = 0
  for (let i = Math.pow(baseNumber, 7); i >= 0; i--) {
    result += Math.atan(i) * Math.tan(i)
  }
  return result
}

module.exports = (file) => {
  try {
    mySlowFunction(Math.floor(Math.random() * 10 + 1))
    fs.writeFileSync(file, Math.random().toString())
    return null
  } catch (e) {
    return Error(e)
  }
}

Now run single-threaded and check the time elapsed, for 1000 and 10000 iterations (one iteration equals one round of data processing plus one file creation):

node index.js -i 1000
node index.js -i 10000

Now compare with the multi-threaded version:

node index.js --multi-thread -i 1000
node index.js --multi-thread -i 10000

With the test I did (on a 16-core CPU), the difference is huge: with 1000 iterations it went from 1:27.061 (m:ss.mmm) single-threaded to 8.884 s multi-threaded. Also check the files inside dist/ to be sure they were created correctly.

Why run one Node.js process per core?

This article has an extensive review of the threading mechanism of Node.js; it's worth a read.

In short, the main point is that in plain Node.js only a few kinds of function calls use the thread pool (DNS lookups and file system calls, for example). Your code mostly runs on the event loop alone. So, for example, if you write a web app in which each request takes 100 ms of synchronous work, you are bound to 10 req/s, and the thread pool won't be involved. The only way to increase throughput on a multi-core system is to use the other cores.
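That bound is simple arithmetic, sketched here as a hypothetical helper:

```javascript
// If every request holds the event loop for workMs milliseconds of
// synchronous work, one Node process can serve at most 1000 / workMs
// requests per second, no matter how many cores the machine has.
function maxThroughputPerProcess (workMs) {
  return 1000 / workMs
}

console.log(maxThroughputPerProcess(100)) // 10 req/s
console.log(maxThroughputPerProcess(25)) // 40 req/s
```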

Then there are asynchronous (callback-based) functions. While they give you a sense of parallelization, what really happens is that the event loop works on other function calls while the async operation finishes in the background. Afterwards, the callback code still has to run on the event loop, so all the code you write runs on one and only one event loop, and thus cannot harness a multi-core system's power.

Node.js to utilize all cores on all CPUs

Node is perfect for that; it is actually named Node as a reference to the intended topology of its apps: multiple (distributed) nodes that communicate with each other.

Take a look at the built-in cluster module, and also see this article and that one for reference.

Deno on multi-core machines

In Deno, like in a web browser, you should be able to use Web Workers to utilize 100% of a multi-core CPU.

In a cluster you need a "manager" node (which can itself be a worker too, as needed/appropriate). In a similar fashion, the Web Worker API can be used to create as many dedicated workers as desired. This means the main thread should never block, as it can delegate all potentially blocking tasks to its workers. Tasks that won't block (e.g. simple database or other I/O-bound calls) can be done directly on the main thread as normal.

Deno also supports navigator.hardwareConcurrency so you can query about available hardware and determine the number of desired workers accordingly. You might not need to define any limits though. Spawning a new dedicated worker from the same source as a previously spawned dedicated worker may be fast enough to do so on demand. Even so there may be value in reusing dedicated workers rather than spawning a new one for every request.
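For example, a small hypothetical helper for sizing a pool from that value (the fallback of 4 and the "reserve one thread for the main loop" policy are assumptions, not rules):

```typescript
// Derive a worker-pool size from reported hardware concurrency,
// leaving one thread for the main event loop.
function desiredWorkerCount(hardwareConcurrency: number | undefined): number {
  const threads = hardwareConcurrency ?? 4; // some environments report nothing
  return Math.max(1, threads - 1);
}

// In Deno (or a browser) you would call it as:
// const n = desiredWorkerCount(navigator.hardwareConcurrency);
console.log(desiredWorkerCount(8)); // 7
```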

With Transferable Objects, large data sets can be made available to and from workers without copying the data. This, along with messaging, makes it pretty straightforward to delegate tasks while avoiding the performance bottleneck of copying large data sets.
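A minimal sketch of what "without copying" means in practice, using MessageChannel (which Deno, browsers, and Node all provide):

```typescript
// Posting an ArrayBuffer with a transfer list moves it to the receiving
// side without copying, and detaches it on the sending side.
const { port1, port2 } = new MessageChannel();

const buffer = new ArrayBuffer(1024 * 1024); // stand-in for a large data set
console.log("before transfer:", buffer.byteLength); // 1048576

// The second argument is the transfer list: move, don't copy.
port1.postMessage(buffer, [buffer]);
console.log("after transfer:", buffer.byteLength); // 0 (detached)

port1.close();
port2.close();
```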

Depending on your use cases you might also use a library like Comlink "that removes the mental barrier of thinking about postMessage and hides the fact that you are working with workers."

e.g.

main.ts

import { serve } from "https://deno.land/std@0.133.0/http/server.ts";

import ComlinkRequestHandler from "./ComlinkRequestHandler.ts";

serve(async function handler(request) {
  const worker = new Worker(new URL("./worker.ts", import.meta.url).href, {
    type: "module",
  });

  const handler = ComlinkRequestHandler.wrap(worker);

  return await handler(request);
});

worker.ts

/// <reference no-default-lib="true"/>
/// <reference lib="deno.worker" />

import ComlinkRequestHandler from "./ComlinkRequestHandler.ts";

ComlinkRequestHandler.expose(async (request) => {
  const body = await request.text();
  return new Response(`Hello to ${request.url}\n\nReceived:\n\n${body}\n`);
});

ComlinkRequestHandler.ts

import * as Comlink from "https://cdn.skypack.dev/comlink@4.3.1?dts";

interface RequestMessage extends Omit<RequestInit, "body" | "signal"> {
  url: string;
  headers: Record<string, string>;
  hasBody: boolean;
}

interface ResponseMessage extends ResponseInit {
  headers: Record<string, string>;
  hasBody: boolean;
}

export default class ComlinkRequestHandler {
  #handler: (request: Request) => Promise<Response>;
  #responseBodyReader: ReadableStreamDefaultReader<Uint8Array> | undefined;

  static expose(handler: (request: Request) => Promise<Response>) {
    Comlink.expose(new ComlinkRequestHandler(handler));
  }

  static wrap(worker: Worker) {
    const { handleRequest, nextResponseBodyChunk } =
      Comlink.wrap<ComlinkRequestHandler>(worker);

    return async (request: Request): Promise<Response> => {
      const requestBodyReader = request.body?.getReader();

      const requestMessage: RequestMessage = {
        url: request.url,
        hasBody: requestBodyReader !== undefined,
        cache: request.cache,
        credentials: request.credentials,
        headers: Object.fromEntries(request.headers.entries()),
        integrity: request.integrity,
        keepalive: request.keepalive,
        method: request.method,
        mode: request.mode,
        redirect: request.redirect,
        referrer: request.referrer,
        referrerPolicy: request.referrerPolicy,
      };

      const nextRequestBodyChunk = Comlink.proxy(async () => {
        if (requestBodyReader === undefined) return undefined;
        const { value } = await requestBodyReader.read();
        return value;
      });

      const { hasBody: responseHasBody, ...responseInit } = await handleRequest(
        requestMessage,
        nextRequestBodyChunk
      );

      const responseBodyInit: BodyInit | null = responseHasBody
        ? new ReadableStream({
            start(controller) {
              async function push() {
                const value = await nextResponseBodyChunk();
                if (value === undefined) {
                  controller.close();
                  return;
                }
                controller.enqueue(value);
                push();
              }

              push();
            },
          })
        : null;

      return new Response(responseBodyInit, responseInit);
    };
  }

  constructor(handler: (request: Request) => Promise<Response>) {
    this.#handler = handler;
  }

  async handleRequest(
    { url, hasBody, ...init }: RequestMessage,
    nextRequestBodyChunk: () => Promise<Uint8Array | undefined>
  ): Promise<ResponseMessage> {
    const request = new Request(
      url,
      hasBody
        ? {
            ...init,
            body: new ReadableStream({
              start(controller) {
                async function push() {
                  const value = await nextRequestBodyChunk();
                  if (value === undefined) {
                    controller.close();
                    return;
                  }
                  controller.enqueue(value);
                  push();
                }

                push();
              },
            }),
          }
        : init
    );
    const response = await this.#handler(request);
    this.#responseBodyReader = response.body?.getReader();
    return {
      hasBody: this.#responseBodyReader !== undefined,
      headers: Object.fromEntries(response.headers.entries()),
      status: response.status,
      statusText: response.statusText,
    };
  }

  async nextResponseBodyChunk(): Promise<Uint8Array | undefined> {
    if (this.#responseBodyReader === undefined) return undefined;
    const { value } = await this.#responseBodyReader.read();
    return value;
  }
}

Example usage:

% deno run --allow-net --allow-read main.ts
% curl -X POST --data '{"answer":42}' http://localhost:8000/foo/bar
Hello to http://localhost:8000/foo/bar

Received:

{"answer":42}

There's probably a better way to do this (e.g. via Comlink.transferHandlers and registering transfer handlers for Request, Response, and/or ReadableStream) but the idea is the same and will handle even large request or response payloads as the bodies are streamed via messaging.

Does running multiple Node.js scripts automatically distribute across cores

Indeed, just running multiple instances of a Node app will use multiple processes.

One note, though: the cluster option has one slight advantage, and that's when you're creating something that listens on a TCP/IP port, because the cluster can share a single port. E.g. if you were running a web server on port 80, you could have all cores serving port 80.

What I tend to do, though, is run a reverse proxy on port 80 and have the other processes running on different ports. In other words, the cluster option is great for creating a reverse proxy. :)

The advantage of the above is that a reverse proxy does not really need to keep state, and you can have it handle SSL too, all using cluster. That leaves your Node apps free to keep state, e.g. cached responses in memory. A big advantage of Node here is in-process requests; no special IPC is needed.
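As one concrete shape for that layout (nginx here is just a common stand-in; whether the proxy is nginx or a cluster'd Node app, the topology is the same, and the port numbers and upstream name are assumptions):

```nginx
# Hypothetical sketch: the proxy terminates port 80 (and could do SSL)
# and balances across Node processes, each listening on its own port.
upstream node_app {
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

server {
    listen 80;
    location / {
        proxy_pass http://node_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```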


