How Buffering Works in Sockets on Linux

How does buffering work in sockets on Linux?

For a UDP socket the client will never know: the server side will simply start dropping packets once its receive buffer is full.

TCP, on the other hand, implements flow control. The server's kernel will gradually shrink the advertised window, so the client is allowed to send less and less data. At some point the window drops to zero. The client then fills up its own send buffer, and send(2) either blocks (on a blocking socket) or fails with EAGAIN/EWOULDBLOCK (on a non-blocking socket).
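
A minimal sketch of what the client side sees when this happens, assuming sockfd is an already-connected TCP socket (the name is illustrative): switching the socket to non-blocking mode turns the zero-window condition into an EAGAIN/EWOULDBLOCK error once the local send buffer is full.

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    // Keep sending until the kernel's send buffer is full.
    // sockfd is assumed to be an already-connected TCP socket.
    static void fill_send_buffer(int sockfd)
    {
        char chunk[4096];
        memset(chunk, 'x', sizeof chunk);

        // Non-blocking mode makes a full send buffer show up as an error
        // instead of making send() block.
        fcntl(sockfd, F_SETFL, fcntl(sockfd, F_GETFL, 0) | O_NONBLOCK);

        for (;;) {
            ssize_t n = send(sockfd, chunk, sizeof chunk, 0);
            if (n < 0) {
                if (errno == EAGAIN || errno == EWOULDBLOCK) {
                    printf("send buffer full, waiting for window space\n");
                    break;
                }
                perror("send");
                break;
            }
        }
    }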

Buffered writes for UNIX sockets?

You are doing buffered writes, albeit a bit incorrectly. When you call, for example,

    write(socketfd, packet, 24);

what do you think packet is? It is itself a buffer: you fill it first, then hand it to the kernel in one call.

Now, you can create a larger buffer, say unsigned char buffer[4096], then memcpy() your output into it and eventually write() larger chunks of data at a time. That makes sense only up to a point however, for if you need to also receive responses to the messages you send then there's no advantage to breaking up the transmission other than at message boundaries (unless messages are exceedingly long), and it will complicate your code to buffer more than one message before sending.
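
As a rough sketch of that approach (the names out_buf, queue_message and flush_out are illustrative, not from your code), you might accumulate messages like this:

    #include <string.h>
    #include <unistd.h>

    #define OUT_BUF_SIZE 4096

    static unsigned char out_buf[OUT_BUF_SIZE];
    static size_t out_len = 0;

    // Send whatever has accumulated so far. A robust version would loop on
    // short writes, as discussed below.
    static void flush_out(int out_fd)
    {
        if (out_len > 0) {
            write(out_fd, out_buf, out_len);
            out_len = 0;
        }
    }

    // Queue one message, flushing first if it would not fit.
    // Assumes a single message is never larger than OUT_BUF_SIZE.
    static void queue_message(int out_fd, const unsigned char *msg, size_t len)
    {
        if (out_len + len > OUT_BUF_SIZE)
            flush_out(out_fd);
        memcpy(out_buf + out_len, msg, len);
        out_len += len;
    }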

Be aware, however, that write() does not guarantee to send the whole number of bytes requested. Its return value tells you how many it actually did send, and in the event of a short write you probably need to call write() one or more additional times to send the rest of the bytes.
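
A common way to handle that is a small helper that loops until everything has been written; the name write_all below is just illustrative:

    #include <errno.h>
    #include <unistd.h>

    // Write exactly len bytes, retrying after short writes.
    // Returns 0 on success, -1 on error.
    static int write_all(int fd, const void *buf, size_t len)
    {
        const char *p = buf;
        while (len > 0) {
            ssize_t n = write(fd, p, len);
            if (n < 0) {
                if (errno == EINTR)
                    continue;   // interrupted by a signal: retry
                return -1;      // real error, errno is set
            }
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }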

You do have the option of putting most of that on the C library by wrapping the socket file descriptor in a stream via fdopen(), and using the stream I/O functions. In that case, you can configure buffering details via setvbuf().
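
A minimal sketch of that route, assuming socketfd is your connected socket:

    #include <stdio.h>

    // Wrap an already-connected socket in a stdio stream for buffered writes.
    static FILE *buffered_socket(int socketfd)
    {
        FILE *sockstream = fdopen(socketfd, "w");
        if (sockstream != NULL) {
            // Fully buffered, with a 4096-byte buffer that stdio allocates itself.
            setvbuf(sockstream, NULL, _IOFBF, 4096);
        }
        return sockstream;
    }

Writes then go through fwrite() or fprintf(), and fflush() pushes the buffered bytes down to the kernel whenever you need them sent.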

What are the differences between Kernel Buffer, TCP Socket Buffer and Sliding Window

Linux does not handle TCP's sliding window as a separate buffer, but rather as a set of indices indicating how much has already been received / read. The Linux kernel's packet handling process can be described in many ways and divided into smaller parts as you go deeper, but the general flow is as follows:

  1. The kernel prepares to receive data over a network interface: it allocates SKB (socket buffer, struct sk_buff) data structures and maps them to the interface's Rx DMA buffer ring.
  2. When packets arrive, they fill these preconfigured buffers and the kernel is notified, in interrupt context, of their arrival. In this context the buffers are moved to a receive queue so the network stack can handle them outside of interrupt context.
  3. The network stack retrieves these packets and handles them accordingly, eventually reaching the TCP layer (if they are indeed TCP packets), which in turn manages the window.
  4. See the struct tcp_sock member u32 rcv_wnd, which is then used in tp->rcvq_space.space as the per-connection space left in the window (a way to observe related values from user space is sketched after this list).
  5. The buffer is added to the socket receive queue and is read as stream data in tcp_recvmsg().
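
As a rough way to peek at some of these values from user space, the sketch below queries TCP_INFO on a connected socket; tcpi_rcv_space, tcpi_snd_cwnd and tcpi_rtt are fields of struct tcp_info, though exactly which fields are populated depends on the kernel version.

    #include <stdio.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    // Print a few window-related statistics for a connected TCP socket.
    static void print_tcp_window_info(int sockfd)
    {
        struct tcp_info info;
        socklen_t len = sizeof info;

        if (getsockopt(sockfd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0) {
            printf("rcv_space: %u bytes\n", info.tcpi_rcv_space);
            printf("snd_cwnd : %u segments\n", info.tcpi_snd_cwnd);
            printf("rtt      : %u microseconds\n", info.tcpi_rtt);
        } else {
            perror("getsockopt(TCP_INFO)");
        }
    }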

The important thing to remember here is that copying is the worst thing for performance. Therefore, the kernel will always avoid copies (unless absolutely necessary) and pass pointers around instead.

How to find the socket buffer size on Linux

If you want to see your buffer sizes in the terminal, you can take a look at:

  • /proc/sys/net/ipv4/tcp_rmem (for read)
  • /proc/sys/net/ipv4/tcp_wmem (for write)

They contain three numbers, which are the minimum, default and maximum memory size values (in bytes), respectively.
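
To see what a particular socket actually got, you can also query SO_RCVBUF and SO_SNDBUF on the descriptor itself; a minimal sketch (note that, per socket(7), the kernel doubles a value set with setsockopt(2) and getsockopt(2) reports the doubled value):

    #include <stdio.h>
    #include <sys/socket.h>

    // Print the effective receive and send buffer sizes of a socket.
    static void print_socket_buffers(int sockfd)
    {
        int rcvbuf = 0, sndbuf = 0;
        socklen_t len = sizeof(int);

        getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
        len = sizeof(int);
        getsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);

        printf("receive buffer: %d bytes\n", rcvbuf);
        printf("send buffer   : %d bytes\n", sndbuf);
    }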

Writing directly to socket vs to buffer

Applications buffer writes to a network connection because a single write call with a large buffer is more efficient than multiple write calls with small buffers.

Call SetNoDelay(false) to make the operating system delay packet transmission (Nagle's algorithm) in the hope of reducing the number of packets.
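
SetNoDelay is Go's wrapper around the TCP_NODELAY socket option; at the C level the same knob looks roughly like this:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    // enable = 0 lets Nagle's algorithm coalesce small writes (the
    // SetNoDelay(false) behaviour); enable = 1 sends small segments immediately.
    static void set_no_delay(int sockfd, int enable)
    {
        setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &enable, sizeof enable);
    }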

There is no option to explicitly flush a TCP connection’s buffer.

Before writing your own utilities, take a look at the bufio.Writer type. Many applications use this type to buffer writes to TCP connections and files.

Buffering data from sockets?

The short answer to your question is that I would go with reading a single byte at a time. Unfortunately it's one of those cases where there are pros and cons for both approaches.

In favour of a buffer is the fact that the implementation can be more efficient from the perspective of network I/O. Against the use of a buffer is the fact that the code will be inherently more complex than the single-byte version. So it's an efficiency vs. complexity trade-off. The good news is that you can implement the simple solution first, profile the result and "upgrade" to a buffered approach if testing shows it to be worthwhile.

Also, just to note, as a thought experiment I wrote some pseudo code for a loop that does buffer-based reads of HTTP messages, included below. The complexity of implementing a buffered read doesn't seem too bad. Note, however, that I haven't given much consideration to error handling, or tested whether this will work at all. It should, however, avoid excessive "double handling" of data, which is important since that would reduce the efficiency gains that were the purpose of this approach.

#define CHUNK_SIZE 1024

nextHttpBytesRead = 0;
nextHttp = NULL;
while (1)
{
    size_t httpBytesRead = nextHttpBytesRead;
    size_t thisHttpSize;
    char *http = nextHttp;
    char *temp;
    char *httpTerminator;
    ssize_t bytesRead;

    do
    {
        // Grow the buffer by one chunk, plus one byte for a terminating NUL
        // so that strstr() cannot run past the received data.
        temp = realloc(http, httpBytesRead + CHUNK_SIZE + 1);
        if (NULL == temp)
            ...
        http = temp;

        bytesRead = read(httpSocket, http + httpBytesRead, CHUNK_SIZE);
        if (bytesRead <= 0)
            ...                                         // connection closed or error
        httpBytesRead += (size_t)bytesRead;

        http[httpBytesRead] = '\0';                     // terminate before searching
        httpTerminator = strstr(http, "\r\n\r\n");
    } while (NULL == httpTerminator);

    thisHttpSize = (size_t)(httpTerminator - http) + 4; // Include terminator
    nextHttpBytesRead = httpBytesRead - thisHttpSize;

    // Adding CHUNK_SIZE here means that the first realloc won't have to do any work
    nextHttp = malloc(nextHttpBytesRead + CHUNK_SIZE + 1);
    memcpy(nextHttp, http + thisHttpSize, nextHttpBytesRead);

    http[thisHttpSize] = '\0';                          // terminate the current message
    processHttp(http);
    free(http);
}

What will be done if the socket buffer is full

For the output buffer (i.e. the one holding packets to be sent), sendto() or a similar call will block until space is available in the buffer (assuming a blocking socket; a non-blocking one fails with EAGAIN/EWOULDBLOCK instead).

For the input buffer (i.e. the one holding received packets), it depends on the OS networking stack implementation, strictly speaking. On Linux, new packets are dropped silently. In any case, it is not an error if some packets are lost this way, as UDP does not guarantee their delivery.
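
If silent drops become a problem, one mitigation is to request a larger receive buffer before the application falls behind; a minimal sketch (the 1 MiB figure is arbitrary, and the kernel caps the result at net.core.rmem_max):

    #include <stdio.h>
    #include <sys/socket.h>

    // Request a bigger receive buffer for a UDP socket and report what we got.
    static void grow_udp_rcvbuf(int sockfd)
    {
        int requested = 1024 * 1024;          // 1 MiB, arbitrary
        int actual = 0;
        socklen_t len = sizeof actual;

        setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof requested);
        getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &actual, &len);

        // Per socket(7), the kernel doubles the value set above and
        // getsockopt() reports that doubled value.
        printf("effective receive buffer: %d bytes\n", actual);
    }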

BTW, there is no such thing as a "UDP connection": connect(2) on a UDP socket merely sets a default destination address.

Linux Socket Buffer Imbalance

Suggestions:

  • Take a look at the actual settings on your Ethernet interfaces. "ethtool" is one way to get a thorough look. "ifconfig" tells you something, though less. (Both probably in /usr/sbin/.) Finding kernel messages with "dmesg" might tell you something. Looking at link error rates might reveal something too.
  • Querying your switch for its idea of port state might also reveal what's really going on. (Not relevant if you're just using a CAT5 cable between interfaces, without a switch.)
  • Since one pair of machines works as you expect, while another pair of machines doesn't, I'm thinking about some anomaly with duplex autonegotiation. Half-duplex is unusual for GigE, but perhaps your switch or NIC is causing it. Discovering a half-duplex setting anywhere, or especially a disagreement between a host and its switch about port state, could be a possible cause.

Are there any side effects to increasing the socket buffer size in Linux?

The only side-effect is memory usage. Increase them gradually and monitor the system. As long as you leave enough memory for existing processes you should be golden.


