Unix vs BSD vs TCP vs Internet Sockets

Unix vs BSD vs TCP vs Internet sockets?

A socket is an abstraction. The tag definition used on SO for a socket is as good as any:

An endpoint of a bidirectional inter-process communication flow. This often refers to a process flow over a network connection, but by no means is limited to such.

So from that, the major distinction is between sockets that (1) use a network and (2) do not.

Unix domain sockets do not use the network. Their API makes it appear to be (mostly) the same to the developer as a network socket but all the communication is done through the kernel and the sockets are limited to talking to processes on the machine upon which they are running.
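To illustrate how similar the two look to the developer, here is a minimal sketch (assuming a Unix-like system and Python's standard `socket` module). The `round_trip` helper and the `demo.sock` path are made up for this example; the point is that only the address family and the form of the address differ between the two calls.

```python
import os
import socket
import tempfile
import threading

def round_trip(family, address, payload):
    """Echo payload through a one-shot server; same code for any stream family."""
    server = socket.socket(family, socket.SOCK_STREAM)
    server.bind(address)
    server.listen(1)
    bound = server.getsockname()  # for TCP this resolves port 0 to the real port

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))

    worker = threading.Thread(target=serve)
    worker.start()
    with socket.socket(family, socket.SOCK_STREAM) as client:
        client.connect(bound)
        client.sendall(payload)
        reply = client.recv(1024)
    worker.join()
    server.close()
    return reply

# Network socket: the address is a (host, port) pair; port 0 lets the OS choose.
tcp_reply = round_trip(socket.AF_INET, ("127.0.0.1", 0), b"ping")

# Unix domain socket: the address is a filesystem path (hypothetical name).
with tempfile.TemporaryDirectory() as tmp:
    unix_reply = round_trip(socket.AF_UNIX, os.path.join(tmp, "demo.sock"), b"ping")
```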

Berkeley sockets are what we know as network sockets on POSIX platforms today. In the past there were different lines of Unix development (e.g. Berkeley or BSD, System V or SysV, etc.). Berkeley sockets essentially won in the marketplace and are effectively synonymous with Unix sockets today.

Strictly speaking, there isn't a TCP socket. There are network sockets that can communicate using the TCP protocol. It's just linguistic shorthand to refer to them as TCP sockets to distinguish them from sockets using another protocol, e.g. UDP, a routing protocol or whatnot.

An "Internet" socket is mostly a meaningless distinction. It's a socket using a network protocol. That eliminates Unix domain sockets, but most network protocols can be used to communicate on a LAN or the Internet, which is just a collection of networks. (Though do note there are protocols specific to local networks as well as those that manage collections of networks.)

What other socket APIs are available? What are the differences between these socket APIs?

Why isn't there a standard for this?

The de facto standard is BSD sockets, upon which the Linux, POSIX and Windows sockets APIs are based.

What other socket APIs are available?

Nothing that's still widely used. Before BSD sockets and their derivatives took over the world, there were many. Most of the ones that remain are probably in the embedded world, and even those are going away as mainstream OSes continue to swallow more and more of the embedded market.

This battle was pretty much fought and over by the mid-'90s. BSD sockets won.

What are the differences between these socket APIs?

There are minor differences among the BSD, Linux and POSIX variants, nothing more serious than any other differences among Unixy operating systems.

The reason they have a Linux/POSIX version of the book probably has more to do with marketing than anything technical. It answers a question the publisher probably saw a lot, "Why do I need a BSD book, I'm running Linux, not BSD!" Or, more commonly these days: "What's BSD?"

From a 10,000-foot view, Winsock is very different from BSD sockets, but because it's a fairly strict superset of BSD sockets, you can still move your knowledge over. Most of the differences are pure extensions to BSD sockets, mostly to do with the differences in the Windows kernel architecture and the way Windows programs are typically built. For instance, the first really big extension was asynchronous sockets, which make it much easier to use sockets in a single-threaded Windows GUI program than pure BSD sockets do. Later extensions support special features available in the NT-derived kernels that have no simple analog in Unixy systems, like event objects and overlapped I/O.

For what it's worth, there are extensions to plain old BSD sockets in some Unixy systems, too, like the aio_*() stuff in Solaris and other systems.

If your program has to be source compatible with many systems, you either ignore these differences and program to the common base shared by all these systems, or you build some kind of translation layer that lets you use platform features transparently. Apache does the latter for instance, making use of the fastest networking features on each platform, while the core web server code doesn't care exactly how the networking gets done. Many other programs choose the portable path, since they're not performance critical, and saving programmer time is therefore more important.

When people say just "Network Programming in C" / "Socket Programming", what exactly are they referring to?

BSD sockets or some variant.

Links for any further information?

The Winsock Programmer's FAQ. Specifically, you might want to look at its resources section, and the FAQ article BSD Sockets Compatibility.

(Disclaimer: I'm the FAQ's maintainer.)

Communication between processes: TCP vs Unix sockets, IPC vs NATS

The question is actually too broad to answer, but one answer for TCP vs unix domain sockets:

Architect your code so that you can easily move between the two if necessary. The programming model for both is basically the same (both are bidirectional streams of data), and the read/write APIs at the OS level as well as in most frameworks are the same. In Node, for example, both kinds of connection are net.Socket objects that implement the Readable and Writable stream interfaces. That means the only code you need to change when switching between them is the listener on the server side, where you call the TCP accept APIs instead of the Unix domain socket accept APIs or the other way around. You can even have your application accept both types of connections and handle them the same way internally.
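As a rough sketch of that layout (in Python rather than Node, and assuming a Unix-like system), the server below listens on both a TCP port and a Unix domain socket and passes every accepted connection to one shared handler. The names `handle`, `make_listener`, `serve`, `request` and the `app.sock` path are invented for the example.

```python
import os
import selectors
import socket
import tempfile
import threading

def handle(conn):
    """Connection logic: identical for TCP and Unix domain connections."""
    with conn:
        conn.sendall(conn.recv(1024).upper())

def make_listener(family, address):
    """The only family-specific code: creating and binding the listener."""
    listener = socket.socket(family, socket.SOCK_STREAM)
    listener.bind(address)
    listener.listen()
    return listener

def serve(listeners, connections_to_handle):
    """Accept from whichever listener is ready and reuse the same handler."""
    sel = selectors.DefaultSelector()
    for listener in listeners:
        sel.register(listener, selectors.EVENT_READ)
    handled = 0
    while handled < connections_to_handle:
        for key, _ in sel.select():
            conn, _ = key.fileobj.accept()
            handle(conn)
            handled += 1
    sel.close()

def request(family, address, payload):
    """Client side: again only the family and the address differ."""
    with socket.socket(family, socket.SOCK_STREAM) as client:
        client.connect(address)
        client.sendall(payload)
        return client.recv(1024)

tmp = tempfile.TemporaryDirectory()
tcp_listener = make_listener(socket.AF_INET, ("127.0.0.1", 0))
unix_listener = make_listener(socket.AF_UNIX, os.path.join(tmp.name, "app.sock"))

server = threading.Thread(target=serve, args=([tcp_listener, unix_listener], 2))
server.start()
tcp_reply = request(socket.AF_INET, tcp_listener.getsockname(), b"via tcp")
unix_reply = request(socket.AF_UNIX, unix_listener.getsockname(), b"via unix")
server.join()

tcp_listener.close()
unix_listener.close()
tmp.cleanup()
```

Switching transports then means changing the arguments to `make_listener` (and the client's connect address), not the handler.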

TCP support is always nice because it gives you some flexibility. In my last measurement the overhead was a little higher (I think 30% versus TCP over loopback), but these are all microbenchmarks and it won't matter for most applications. Unix domain sockets might have an advantage if you require some of their special features, e.g. the ability to send file descriptors across them.
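For the file descriptor case, a minimal sketch using `sendmsg`/`recvmsg` with `SCM_RIGHTS` ancillary data might look like this. It assumes a Unix-like system; the helpers `send_fd` and `recv_fd` are names made up here, and both ends live in one process only to keep the example short (normally they would be separate processes).

```python
import array
import os
import socket

def send_fd(sock, fd):
    """Send one file descriptor as SCM_RIGHTS ancillary data plus a 1-byte payload."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([b"F"], ancillary)

def recv_fd(sock):
    """Receive one file descriptor sent by send_fd()."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            fds.frombytes(data[:fds.itemsize])
    return fds[0]

left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
read_end, write_end = os.pipe()

send_fd(left, write_end)      # pass the pipe's write end across the socket
received = recv_fd(right)     # a new descriptor referring to the same pipe

os.write(received, b"hello")  # writing through the received descriptor...
message = os.read(read_end, 5)  # ...shows up on the original pipe

for fd in (read_end, write_end, received):
    os.close(fd)
left.close()
right.close()
```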

And regarding TCP vs NATS & Co:
If you are not that experienced with network programming and protocol design, it makes sense to use ready-made IPC systems. That could be anything from HTTP to gRPC to Thrift. These are all point-to-point systems. NATS is different, since it's a message broker and not RPC. It also requires an extra component in the middle. Whether this makes sense depends entirely on the application.

Difference between UNIX domain STREAM and DATAGRAM sockets?

Just as the manual page says, Unix sockets are always reliable. The difference between SOCK_STREAM and SOCK_DGRAM is in the semantics of consuming data out of the socket.

A stream socket allows reading an arbitrary number of bytes while still preserving the byte sequence. In other words, a sender might write 4K of data to the socket, and the receiver can consume that data byte by byte. The other way around is true too: the sender can write several small messages to the socket, and the receiver can consume them all in one read. A stream socket does not preserve message boundaries.

A datagram socket, on the other hand, does preserve these boundaries: one write by the sender always corresponds to one read by the receiver (even if the receiver's buffer given to read(2) or recv(2) is smaller than that message, in which case the rest of the message is typically discarded).
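The difference can be seen with a pair of connected Unix domain sockets. This is a small sketch assuming a Unix-like system and Python 3.8+; the variable names are arbitrary.

```python
import socket

# Stream: the receiver reads in whatever chunk size it likes, regardless of
# how the sender split its writes. Only the byte order is preserved.
s_tx, s_rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
s_tx.sendall(b"abc")
s_tx.sendall(b"def")
s_tx.close()  # lets the reader see end-of-stream
stream_chunks = []
while chunk := s_rx.recv(2):
    stream_chunks.append(chunk)
s_rx.close()

# Datagram: each send is delivered as exactly one message per recv.
d_tx, d_rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
d_tx.send(b"abc")
d_tx.send(b"def")
first = d_rx.recv(1024)   # b"abc"
second = d_rx.recv(1024)  # b"def"

# A buffer that is too small still consumes the whole datagram;
# the part that did not fit is dropped, not left for the next read.
d_tx.send(b"too long")
truncated = d_rx.recv(3)
d_tx.send(b"next")
following = d_rx.recv(1024)
d_tx.close()
d_rx.close()
```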

So if your application protocol has small messages with a known upper bound on message size, you are better off with SOCK_DGRAM, since that's easier to manage.

If your protocol calls for arbitrarily long message payloads, or is just an unstructured stream (like raw audio or something), then pick SOCK_STREAM and do the required buffering.

Performance should be the same since both types just go through local in-kernel memory, just the buffer management is different.

How do I determine whether an open socket is TCP or a Unix domain socket?

The struct sockaddr returned by getsockname has an sa_family member; just test that against the symbolic constants (AF_UNIX, AF_INET, AF_INET6). The bug on OS X lets you assume the Unix domain when the returned address structure is zero bytes long; for other platforms and domains, just check the returned structure.
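In C you would inspect sa_family directly. As a hedged illustration in Python (3.7 or later, on a Unix-like system), wrapping a descriptor with `socket.socket(fileno=...)` auto-detects the family and type, which as far as I know relies on the same kind of query. The `socket_kind` helper is a made-up name, and the sketch does not handle the zero-length OS X case mentioned above.

```python
import os
import socket

def socket_kind(fd):
    """Classify an open socket descriptor as 'tcp', 'unix' or 'other'."""
    # Wrap a duplicate so that closing the wrapper leaves the caller's fd open.
    with socket.socket(fileno=os.dup(fd)) as sock:
        if sock.family == socket.AF_UNIX:
            return "unix"
        is_inet = sock.family in (socket.AF_INET, socket.AF_INET6)
        if is_inet and sock.type == socket.SOCK_STREAM:
            return "tcp"
        return "other"

tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
unix_a, unix_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

kinds = [socket_kind(s.fileno()) for s in (tcp, udp, unix_a)]

for s in (tcp, udp, unix_a, unix_b):
    s.close()
```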

What is each type of communication in Unix sockets best suited for?

It really depends on what kind of server you are going to implement.

If message boundaries are important, then SOCK_DGRAM would be the best choice, because recvfrom/recvmsg/select return when a complete message has been received.

With SOCK_STREAM, receiving messages is trickier: one receiving call may return a partial message, parts of two messages, several messages, and so on.
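One common way to recover message boundaries on a stream socket is to prefix each message with its length. The sketch below is one possible framing, not something prescribed by the answer: the 4-byte big-endian header and the helper names `send_msg`, `recv_exact` and `recv_msg` are assumptions made for the example.

```python
import socket
import struct

def send_msg(sock, payload):
    """Send one message as a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return bytes(buf)

def recv_msg(sock):
    """Read the length header, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
send_msg(tx, b"first")
send_msg(tx, b"")                # empty messages survive too
send_msg(tx, b"second message")
messages = [recv_msg(rx) for _ in range(3)]
tx.close()
rx.close()
```

The same helpers work unchanged on a TCP socket, since both are byte streams.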

If message boundaries are not important, then SOCK_STREAM could be the best choice.

SOCK_DGRAM of AF_INET is unreliable UDP. But on most systems, SOCK_DGRAM of AF_UNIX is reliable.
For example, if the receiver's queue is full, the sender blocks until there is space.
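A rough way to observe this backpressure is to make the sender non-blocking, so that the point where it would block shows up as an error instead. This sketch assumes Linux-like behaviour (other systems may report a different error such as ENOBUFS, hence the broad `OSError`); the 1024-byte message size and the safety cap are arbitrary choices.

```python
import socket

tx, rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
tx.setblocking(False)
rx.setblocking(False)

# Fill the queue without reading; each datagram carries its sequence number.
sent = 0
try:
    while sent < 100_000:  # safety cap in case no limit is ever hit
        tx.send(sent.to_bytes(4, "big") + bytes(1020))
        sent += 1
except OSError:
    pass  # a blocking sender would be waiting here until the receiver reads

# Drain the queue: nothing that was accepted by send() has been dropped.
received = []
try:
    while True:
        received.append(int.from_bytes(rx.recv(2048)[:4], "big"))
except BlockingIOError:
    pass  # queue is empty

tx.close()
rx.close()
```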
