How Do Unix Domain Sockets Differentiate Between Multiple Clients

How do Unix Domain Sockets differentiate between multiple clients?

If you create a PF_UNIX socket of type SOCK_STREAM, and accept connections on it, then each time you accept a connection, you get a new file descriptor (as the return value of the accept system call). This file descriptor reads data from and writes data to a file descriptor in the client process. Thus it works just like a TCP/IP connection.
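To make the "one new file descriptor per accepted connection" point concrete, here is a minimal Python sketch. The socket path and the function name `accept_two_clients` are made up for illustration; the only assumption is a Unix-like system where `AF_UNIX` stream sockets are available.

```python
import os
import socket
import tempfile


def accept_two_clients():
    """Accept two clients on one listening socket; return fds and received data."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")  # hypothetical socket path

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(2)

    # Both connects complete against the listen backlog before accept() runs.
    client_a = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client_b = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client_a.connect(path)
    client_b.connect(path)

    conn_a, _ = server.accept()  # new descriptor for the first client
    conn_b, _ = server.accept()  # another new descriptor for the second

    # Data written by a client arrives only on its own accepted socket.
    client_a.sendall(b"from A")
    client_b.sendall(b"from B")
    received = (conn_a.recv(16), conn_b.recv(16))

    fds = (server.fileno(), conn_a.fileno(), conn_b.fileno())
    for s in (client_a, client_b, conn_a, conn_b, server):
        s.close()
    os.unlink(path)
    os.rmdir(tmpdir)
    return fds, received
```

The server tells the clients apart purely by which accepted descriptor the data shows up on; nothing in the data itself identifies the sender.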

There's no “unix domain protocol format”. There doesn't need to be, because a Unix-domain socket can't be connected to a peer over a network connection. In the kernel, the file descriptor representing your end of a SOCK_STREAM Unix-domain socket points to a data structure that tells the kernel which socket is at the other end of the connection. When you write data to your file descriptor, the kernel looks up the socket at the other end of the connection and appends the data to that socket's receive buffer. The kernel doesn't need to put your data inside a packet with a header describing its destination.

For a SOCK_DGRAM socket, you have to tell the kernel the path of the socket that should receive your data, and it uses that to look up the file descriptor for that receiving socket.

If you bind a path to your client socket before you connect to the server socket (or before you send data if you're using SOCK_DGRAM), then the server process can get that path using getpeername (for SOCK_STREAM). For SOCK_DGRAM, the receiving side can use recvfrom to get the path of the sending socket.
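A short Python sketch of both cases, assuming a Unix-like system. The file names and the function `peer_paths` are hypothetical; `getpeername` and `recvfrom` are the calls named above.

```python
import os
import socket
import tempfile


def peer_paths():
    """Return each client path next to what the server side sees for it."""
    tmpdir = tempfile.mkdtemp()
    names = ("stream.srv", "stream.cli", "dgram.srv", "dgram.cli")
    srv_s, cli_s, srv_d, cli_d = (os.path.join(tmpdir, n) for n in names)

    # SOCK_STREAM: the client binds its own path, then connects.
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(srv_s)
    listener.listen(1)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.bind(cli_s)
    client.connect(srv_s)
    conn, _ = listener.accept()
    stream_peer = conn.getpeername()

    # SOCK_DGRAM: the sender binds its own path, then addresses the receiver.
    receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    receiver.bind(srv_d)
    sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sender.bind(cli_d)
    sender.sendto(b"hello", srv_d)
    _, dgram_peer = receiver.recvfrom(64)

    for s in (listener, client, conn, receiver, sender):
        s.close()
    for p in (srv_s, cli_s, srv_d, cli_d):
        os.unlink(p)
    os.rmdir(tmpdir)
    return cli_s, stream_peer, cli_d, dgram_peer
```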

If you don't bind a path, then the receiving process can't get an id that uniquely identifies the peer. At least, not on the Linux kernel I'm running (2.6.18-238.19.1.el5).
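The unbound case can be checked with a sketch like the one below. It assumes Linux behaviour, where the peer address of an unbound client comes back empty; other systems may report something different. The function name and file names are made up for illustration.

```python
import os
import socket
import tempfile


def unbound_peer_addresses():
    """Return what the server side sees when clients never call bind()."""
    tmpdir = tempfile.mkdtemp()
    srv_s = os.path.join(tmpdir, "stream.srv")
    srv_d = os.path.join(tmpdir, "dgram.srv")

    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(srv_s)
    listener.listen(1)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(srv_s)  # no bind() on the client side
    conn, _ = listener.accept()
    stream_peer = conn.getpeername()

    receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    receiver.bind(srv_d)
    sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sender.sendto(b"hello", srv_d)  # no bind() on the sender side
    _, dgram_peer = receiver.recvfrom(64)

    for s in (listener, client, conn, receiver, sender):
        s.close()
    os.unlink(srv_s)
    os.unlink(srv_d)
    os.rmdir(tmpdir)
    return stream_peer, dgram_peer
```

In both cases the returned address carries no path, so there is nothing to reply to (for SOCK_DGRAM) and nothing that names the peer (for SOCK_STREAM).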

What is better practice, create one unix socket with multiple connections or multiple sockets with one connection?

Interesting question. Here are my thoughts around it:

(Around option 1)

If you have heavy traffic flowing through those sockets, then at some point they may become a bottleneck. If that's not the case (low traffic), then option 1 would work.

(Around option 2)

Let N be the number of child processes created in a given timeframe. If N * 3 exceeds the total number of file descriptors available on your machine over that same timeframe, then option 2 is probably not the right fit.

If you can also account for a file descriptor recycling rate, that would give more accuracy to the overall evaluation.
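As a rough way to put numbers on the N * 3 rule of thumb, the sketch below compares it with the per-process descriptor limit reported by `resource.getrlimit`. Treating that soft limit as "the number of file descriptors available" is an assumption made here for illustration (a machine-wide limit also exists), and the function name `exceeds_fd_budget` is hypothetical.

```python
import resource


def exceeds_fd_budget(n_children, fds_per_child=3, limit=None):
    """Return True if n_children * fds_per_child is above the descriptor limit."""
    if limit is None:
        # Assumption: use this process's soft RLIMIT_NOFILE as the budget.
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        limit = soft
    if limit == resource.RLIM_INFINITY:
        return False  # no limit configured
    return n_children * fds_per_child > limit


if __name__ == "__main__":
    # With no explicit limit, the current process limit is used.
    print(exceeds_fd_budget(400))
```

With a limit of 1024, for example, 400 children at 3 descriptors each (1200) would not fit, while 100 children (300) would.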

(Overall)

I would weigh those two tradeoffs and decide based on that. Without some numbers, it is hard to make an informed decision.

How can I differentiate between UNIX socket connections in Node.js?

File descriptors are the "IP+port-equivalent" UNIX socket connection identifiers.

A Node stream corresponding to a UNIX socket connection has stream._handle.fd containing that file descriptor. Note that _handle is an internal, undocumented property, so it may change between Node versions.

How do multiple clients connect simultaneously to one port, say 80, on a server?

First off, a "port" is just a number. All a "connection to a port" really represents is a packet which has that number specified in its "destination port" header field.

Now, there are two answers to your question, one for stateful protocols and one for stateless protocols.

For a stateless protocol (e.g. UDP), there is no problem because "connections" don't exist: multiple people can send packets to the same port, and their packets will arrive in whatever order. Nobody is ever in the "connected" state.

For a stateful protocol (like TCP), a connection is identified by a 4-tuple consisting of source and destination ports and source and destination IP addresses. So, if two different machines connect to the same port on a third machine, there are two distinct connections because the source IPs differ. If the same machine (or two behind NAT or otherwise sharing the same IP address) connects twice to a single remote end, the connections are differentiated by source port (which is generally a random high-numbered port).

Simply put, if I connect to the same web server twice from my client, the two connections will have different source ports from my perspective, which the web server sees as two different remote ports. So there is no ambiguity, even though both connections have the same source and destination IP addresses and the same server port.
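The same effect can be observed on the loopback interface. This is a small Python sketch, not a web server: it listens on an OS-chosen port rather than 80, and the function name is made up for illustration.

```python
import socket


def two_connections_to_one_port():
    """Connect twice to one listening port; return the addresses involved."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(2)
    server_addr = server.getsockname()

    c1 = socket.create_connection(server_addr)
    c2 = socket.create_connection(server_addr)
    conn1, peer1 = server.accept()
    conn2, peer2 = server.accept()

    # (local address, remote address) as seen by each client socket.
    client_views = [(c.getsockname(), c.getpeername()) for c in (c1, c2)]

    for s in (c1, c2, conn1, conn2, server):
        s.close()
    return server_addr, client_views, (peer1, peer2)
```

Both client sockets report the same remote address (the server's IP and port) but different local ports, and those local ports are exactly the peer addresses that `accept` hands to the server.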

Ports are a way to multiplex IP addresses so that different applications can listen on the same IP address/protocol pair. Unless an application defines its own higher-level protocol, there is no way to multiplex a port. If two connections using the same protocol simultaneously have identical source and destination IPs and identical source and destination ports, they must be the same connection.

Linux IPC multiple clients with daemon

Basically I think you need to compromise and have a 2 stage process with a SOCK_STREAM socket as stage 1 and SOCK_DGRAM as stage 2.
So it will be like this:

server:

1. create SOCK_STREAM socket "my.daemon.handshake"
2. accept client
3. send a randomly generated string XXX to the client and close the socket
4. create a SOCK_DGRAM socket "my.daemon.XXX" and start processing it
5. repeat (2)

client:

1. connect to socket "my.daemon.handshake"
2. read to EOF -- get value XXX
3. start communicating with server on socket "my.daemon.XXX"
4. profit!!!!
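The steps above can be sketched in Python roughly as follows. This is an illustration under several assumptions, not a finished daemon: it serves a single client in a background thread instead of looping, it binds the datagram socket just before sending the token (a small reordering so the client cannot send before the socket exists), the client binds its own path "my.daemon.client.XXX" so the server can reply, and the "service" simply upper-cases one message. All function names are hypothetical.

```python
import os
import secrets
import socket
import tempfile
import threading


def serve_one_client(listener, base):
    """Stage 1: hand out a token over SOCK_STREAM. Stage 2: answer on SOCK_DGRAM."""
    conn, _ = listener.accept()
    token = secrets.token_hex(8)  # the random string "XXX"
    dgram_path = base + "." + token
    dgram = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    dgram.bind(dgram_path)  # bound before the token is sent
    conn.sendall(token.encode())
    conn.close()  # the client sees EOF right after the token
    data, reply_to = dgram.recvfrom(1024)
    dgram.sendto(data.upper(), reply_to)  # placeholder for real processing
    dgram.close()
    os.unlink(dgram_path)


def run_client(base, message):
    """Fetch the token from the handshake socket, then talk over SOCK_DGRAM."""
    handshake = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    handshake.connect(base + ".handshake")
    chunks = []
    while True:  # read to EOF
        chunk = handshake.recv(64)
        if not chunk:
            break
        chunks.append(chunk)
    handshake.close()
    token = b"".join(chunks).decode()

    my_path = base + ".client." + token  # assumed naming, so the server can reply
    dgram = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    dgram.bind(my_path)
    dgram.sendto(message, base + "." + token)
    reply, _ = dgram.recvfrom(1024)
    dgram.close()
    os.unlink(my_path)
    return token, reply


def demo(message=b"ping"):
    """Run one server thread and one client in the same process."""
    tmpdir = tempfile.mkdtemp()
    base = os.path.join(tmpdir, "my.daemon")
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(base + ".handshake")
    listener.listen(1)
    server = threading.Thread(target=serve_one_client, args=(listener, base))
    server.start()
    token, reply = run_client(base, message)
    server.join()
    listener.close()
    os.unlink(base + ".handshake")
    os.rmdir(tmpdir)
    return token, reply
```

A real daemon would go back to `accept` after step 4 and process each per-client datagram socket concurrently, for example with one thread per client or a `select`-style loop.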
