Scaling Socket.IO to Multiple Node.js Processes Using Cluster

Some problems when scaling Socket.IO to multiple Node.js processes using cluster

By default, socket.io establishes a connection with several consecutive HTTP requests. It starts in HTTP polling mode and then, after some initial data exchange, switches to the WebSocket transport.

Because of this, a cluster without some form of sticky load balancing will not work. The initial consecutive HTTP requests are all supposed to reach the same server process, but they will probably be sent to different processes in the cluster, and the initial connection will fail.
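The failure can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a Set stands in for each worker's in-memory session table, requests are dispatched strictly round-robin, and the names `dispatch`, `handshake` and `poll` are hypothetical, not socket.io APIs.

```javascript
// Each worker keeps its sessions in its own memory; nothing is shared.
const workers = [new Set(), new Set()];
let next = 0;

// Round-robin dispatch with no stickiness.
function dispatch() {
  const worker = workers[next];
  next = (next + 1) % workers.length;
  return worker;
}

// First polling request: the worker that receives it creates the session.
function handshake(sid) {
  dispatch().add(sid);
}

// Follow-up polling request: succeeds only on the worker that knows the session.
function poll(sid) {
  return dispatch().has(sid) ? 200 : 400;
}

handshake('abc');
console.log(poll('abc')); // 400: this request reached a worker that never saw 'abc'
```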

There are two solutions that I know of:

  1. Implement some sort of sticky load balancing (in the clustering module) so that each client always reaches the same server process, and all the consecutive HTTP requests at the beginning of a connection therefore go to that process.

  2. Configure your clients to use the WebSocket transport immediately and never use HTTP polling. The connection will still start with an HTTP request (since that's how all WebSocket connections start), but that same connection is upgraded to WebSocket, so there is only ever one connection.
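The second option can be sketched as follows. The `transports` option belongs to the socket.io client; the helper name `connectWebSocketOnly` and the URL in the usage comment are illustrative assumptions, and `io` is passed in so the snippet does not depend on how the client library is loaded.

```javascript
// `io` is the connect function provided by the socket.io client
// (for example the global `io` from the browser bundle).
function connectWebSocketOnly(io, url) {
  // Restricting `transports` to WebSocket skips the HTTP polling phase,
  // so the whole connection lives on a single TCP connection.
  return io(url, { transports: ['websocket'] });
}

// Usage in the browser (URL is an example):
// const socket = connectWebSocketOnly(io, 'http://localhost:3000');
```

The trade-off is that clients on networks that block WebSocket can no longer fall back to polling.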

FYI, you will also need to make sure that the reconnect logic in socket.io properly reconnects to the original server process that it was connected to.
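One common way to get this kind of stickiness inside the cluster module is to let the master accept raw TCP connections and hand each one to a worker chosen by hashing the client's IP address, so that polling requests and reconnects from the same address land on the same worker. The sketch below is a minimal illustration of that idea using only Node built-ins, not socket.io's own mechanism. The function names and the 'sticky:connection' message string are hypothetical, and it assumes clients do not all arrive through one proxy address.

```javascript
const net = require('net');

// Map an IP address to a stable worker index (simple string hash, for illustration).
function workerIndexFor(ip, workerCount) {
  let hash = 0;
  for (let i = 0; i < ip.length; i++) {
    hash = (hash * 31 + ip.charCodeAt(i)) >>> 0;
  }
  return hash % workerCount;
}

// Master side: accept raw TCP connections and pass each one to "its" worker.
// `workers` is an array of cluster workers, e.g. collected from cluster.fork().
function startStickyMaster(port, workers) {
  return net.createServer({ pauseOnConnect: true }, (connection) => {
    const index = workerIndexFor(connection.remoteAddress || '', workers.length);
    workers[index].send('sticky:connection', connection);
  }).listen(port);
}

// Worker side: feed connections received from the master into the HTTP server
// that socket.io is attached to.
function acceptStickyConnections(server) {
  process.on('message', (message, connection) => {
    if (message !== 'sticky:connection') return;
    server.emit('connection', connection);
    connection.resume();
  });
}
```

In this arrangement each worker would listen only on an internal port (for example `server.listen(0, 'localhost')`) and call `acceptStickyConnections(server)`, while the master owns the public port.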

Socket.IO supports Node.js clustering in combination with Redis. The socket.io documentation site had been down for several days when this was written, but the setup is covered by the question "Scaling Socket.IO to multiple Node.js processes using cluster" and by the clustering page of the socket.io documentation.

Scaling Socket.IO to multiple Node.js processes using cluster

Edit: In Socket.IO 1.0+, rather than setting a store with multiple Redis clients, a simpler Redis adapter module can now be used.

var io = require('socket.io')(3000);
var redis = require('socket.io-redis');
io.adapter(redis({ host: 'localhost', port: 6379 }));

With the adapter, the older example shown further below would look more like this:

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // we create a HTTP server, but we do not use listen
  // that way, we have a socket.io server that doesn't accept connections
  var server = require('http').createServer();
  var io = require('socket.io').listen(server);
  var redis = require('socket.io-redis');

  io.adapter(redis({ host: 'localhost', port: 6379 }));

  setInterval(function() {
    // all workers will receive this in Redis, and emit
    io.emit('data', 'payload');
  }, 1000);

  for (var i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }

  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
}

if (cluster.isWorker) {
  var express = require('express');
  var app = express();

  var http = require('http');
  var server = http.createServer(app);
  var io = require('socket.io').listen(server);
  var redis = require('socket.io-redis');

  io.adapter(redis({ host: 'localhost', port: 6379 }));
  io.on('connection', function(socket) {
    socket.emit('data', 'connected to worker: ' + cluster.worker.id);
  });

  // listen on the http server that socket.io is attached to;
  // app.listen() would start a second server without socket.io
  server.listen(80);
}

If you have a master node that needs to publish to other Socket.IO processes, but doesn't accept socket connections itself, use socket.io-emitter instead of socket.io-redis.
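For that publish-only master, a minimal sketch might look like this. It assumes the socket.io-emitter package is installed and that Redis runs on localhost:6379 as in the examples here; `startHeartbeat` is a hypothetical helper name, and the emitter is passed in so the timer logic can be exercised without Redis.

```javascript
// Emit once right away, then every `intervalMs` milliseconds.
// Returns a function that stops the timer.
function startHeartbeat(emitter, intervalMs) {
  var beat = function() {
    emitter.emit('data', 'payload');
  };
  beat();
  var timer = setInterval(beat, intervalMs);
  return function stop() {
    clearInterval(timer);
  };
}

// In the master process (needs socket.io-emitter and a running Redis):
// var emitter = require('socket.io-emitter')({ host: 'localhost', port: 6379 });
// var stop = startHeartbeat(emitter, 1000);
```

Because the emitter only publishes to Redis, the master does not need an HTTP server or a socket.io instance of its own.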

If you are having trouble scaling, run your Node applications with DEBUG=*. Socket.IO now uses the debug module, which will also print Redis adapter debug messages. Example output:

socket.io:server initializing namespace / +0ms
socket.io:server creating engine.io instance with opts {"path":"/socket.io"} +2ms
socket.io:server attaching client serving req handler +2ms
socket.io-parser encoding packet {"type":2,"data":["event","payload"],"nsp":"/"} +0ms
socket.io-parser encoded {"type":2,"data":["event","payload"],"nsp":"/"} as 2["event","payload"] +1ms
socket.io-redis ignore same uid +0ms

If your master and child processes display the same parser messages, then your application is scaling properly.


There shouldn't be a problem with your setup if you are emitting from a single worker. What you're doing is emitting from all four workers, and due to Redis publish/subscribe, the messages aren't duplicated, but written four times, as you asked the application to do. Here's a simple diagram of what Redis does:

Client <-- Worker 1 emit -->  Redis
Client <-- Worker 2 <----------|
Client <-- Worker 3 <----------|
Client <-- Worker 4 <----------|

As you can see, when you emit from a worker, it publishes the emit to Redis, and the emit is mirrored by the other workers, which have subscribed to the Redis database. This also means you can use multiple socket servers connected to the same Redis instance, and an emit on one server will be fired on all connected servers.

With cluster, when a client connects, it will connect to one of your four workers, not all four. That also means anything you emit from that worker will only be shown once to the client. So yes, the application is scaling, but the way you're doing it, you're emitting from all four workers, and the Redis database is making it as if you were calling it four times on a single worker. If a client actually connected to all four of your socket instances, they'd be receiving sixteen messages a second, not four.
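The arithmetic above can be checked with a small in-memory model. This is only a sketch: an EventEmitter stands in for the Redis pub/sub channel, each client is modeled as an array of received messages, and `createWorker` is a hypothetical helper, not part of socket.io.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the Redis pub/sub channel (illustration only).
const bus = new EventEmitter();

function createWorker(id) {
  const worker = {
    id,
    clients: [], // each client is an array collecting the messages it received
    deliver(msg) {
      this.clients.forEach((inbox) => inbox.push(msg));
    },
    // Emit to local clients, then publish so every other worker mirrors the emit.
    emit(msg) {
      this.deliver(msg);
      bus.emit('message', { from: id, msg });
    },
  };
  // Mirror emits published by other workers; skip our own publications.
  bus.on('message', ({ from, msg }) => {
    if (from !== id) worker.deliver(msg);
  });
  return worker;
}

// Four workers, each with exactly one connected client.
const workers = [1, 2, 3, 4].map(createWorker);
const inboxes = workers.map((worker) => {
  const inbox = [];
  worker.clients.push(inbox);
  return inbox;
});

// One "tick" in which every worker emits the same payload.
workers.forEach((worker) => worker.emit('payload'));

console.log(inboxes.map((inbox) => inbox.length)); // [ 4, 4, 4, 4 ]
```

Each client is attached to a single worker, yet it ends the tick with four messages, one for every worker that emitted. A client holding a connection to all four workers would receive 4 × 4 = 16.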

The type of socket handling depends on the type of application you're going to have. If you're going to handle clients individually, then you should have no problem, because the connection event will only fire on one worker per client. If you need a global "heartbeat", then you could have a socket handler in your master process. Since workers die when the master process dies, you should offload connection handling from the master process and let the children handle connections. Here's an example:

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // we create a HTTP server, but we do not use listen
  // that way, we have a socket.io server that doesn't accept connections
  var server = require('http').createServer();
  var io = require('socket.io').listen(server);

  var RedisStore = require('socket.io/lib/stores/redis');
  var redis = require('socket.io/node_modules/redis');

  io.set('store', new RedisStore({
    redisPub: redis.createClient(),
    redisSub: redis.createClient(),
    redisClient: redis.createClient()
  }));

  setInterval(function() {
    // all workers will receive this in Redis, and emit
    io.sockets.emit('data', 'payload');
  }, 1000);

  for (var i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }

  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
}

if (cluster.isWorker) {
  var express = require('express');
  var app = express();

  var http = require('http');
  var server = http.createServer(app);
  var io = require('socket.io').listen(server);

  var RedisStore = require('socket.io/lib/stores/redis');
  var redis = require('socket.io/node_modules/redis');

  io.set('store', new RedisStore({
    redisPub: redis.createClient(),
    redisSub: redis.createClient(),
    redisClient: redis.createClient()
  }));

  io.sockets.on('connection', function(socket) {
    socket.emit('data', 'connected to worker: ' + cluster.worker.id);
  });

  // listen on the http server that socket.io is attached to;
  // app.listen() would start a second server without socket.io
  server.listen(80);
}

In the example, on a four-core machine, there are five Socket.IO instances: one in the master and four in the workers. The master's HTTP server never calls listen(), so there is no connection overhead on that process. However, if you call an emit on the master process, it will be published to Redis, and the four worker processes will perform the emit on their clients. This moves the connection load onto the workers, and if a worker were to die, your main application logic in the master would be untouched.

Note that with Redis, all emits, even in a namespace or room will be processed by other worker processes as if you triggered the emit from that process. In other words, if you have two Socket.IO instances with one Redis instance, calling emit() on a socket in the first worker will send the data to its clients, while worker two will do the same as if you called the emit from that worker.

Good way to use socket.io with cluster in multi-core server?

You could use socket.io as normal but with a Redis store in the background. It also supports multiple instances of socket.io without any external library, and it even supports rooms across multiple instances.

Link to how to set up socket.io with Redis: Using Multiple Nodes/Processes with socket.io

Socket.io not working when using multiple nodes through pm2

I think that's because the redis@4 clients must be manually connected first:

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  io.adapter(createAdapter(pubClient, subClient));
  io.listen(3000);
});
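For context, a fuller sketch of that wiring is shown below. It assumes the socket.io, redis (v4) and @socket.io/redis-adapter packages and a Redis server at redis://localhost:6379. The helper names `attachRedisAdapter` and `main` are hypothetical, and the clients and adapter factory are passed in so the connect-before-attach ordering can be checked without a running Redis.

```javascript
// Connect both Redis clients first, then attach the adapter.
async function attachRedisAdapter(io, pubClient, subClient, createAdapter) {
  // redis@4 clients do not connect on creation, so wait for both.
  await Promise.all([pubClient.connect(), subClient.connect()]);
  io.adapter(createAdapter(pubClient, subClient));
  return io;
}

// Real wiring; call main() once in each process started by pm2.
function main() {
  const { Server } = require('socket.io');
  const { createClient } = require('redis');
  const { createAdapter } = require('@socket.io/redis-adapter');

  const io = new Server();
  const pubClient = createClient({ url: 'redis://localhost:6379' });
  const subClient = pubClient.duplicate();

  return attachRedisAdapter(io, pubClient, subClient, createAdapter)
    .then(() => io.listen(3000));
}
```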

@socket.io/redis-adapter README

HTML5 canvas with Node.js, clustering and Socket.IO

I think your CPU cores will end up doing something very similar anyway, but the question is better framed as: how would you spread this work over multiple CPUs, assuming you have at least X paintings, where X is the number of CPUs? You don't assign CPUs to individual paintings directly; the OS scheduler is well optimized for picking the best available core.

See how the workers are set up to listen for socket connections? You can emit the data that you want to emit in each of the workers.

The code below is taken from this SO post; I've changed it slightly.

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // we create a HTTP server, but we do not use listen
  // that way, we have a socket.io server that doesn't accept connections
  var server = require('http').createServer();
  var io = require('socket.io').listen(server);
  var redis = require('socket.io-redis');

  io.adapter(redis({ host: 'localhost', port: 6379 }));

  for (var i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }

  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
}

if (cluster.isWorker) {
  var express = require('express');
  var app = express();

  var http = require('http');
  var server = http.createServer(app);
  var io = require('socket.io').listen(server);
  var redis = require('socket.io-redis');

  io.adapter(redis({ host: 'localhost', port: 6379 }));
  io.on('connection', function(socket) {
    // grandeFasola - emit what you want to emit here.
    socket.emit('data', 'connected to worker: ' + cluster.worker.id);
  });

  // listen on the http server that socket.io is attached to;
  // app.listen() would start a second server without socket.io
  server.listen(80);
}

