When Should I Use TCP_NODELAY and When TCP_CORK

When should I use TCP_NODELAY and when TCP_CORK?

First of all, not both of them disable Nagle's algorithm.

Nagle's algorithm reduces the number of small network packets on the wire. The algorithm is: if the data is smaller than a limit (usually the MSS), wait until an ACK arrives for previously sent packets, and in the meantime accumulate data from the user. Then send the accumulated data.

if data >= MSS:
    send(data)
else if previously sent data is still unacknowledged:
    accumulate data in the send buffer until the ACK arrives
    send(accumulated data)
else:
    send(data)

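This helps in applications like telnet. However, waiting for the ACK may increase latency when sending streaming data. Additionally, if the receiver implements the 'delayed ACK' policy, the two sides can end up waiting on each other: the sender holds back small data until an ACK arrives, while the receiver holds back the ACK until its delay timer fires. This causes a temporary deadlock. In such cases, disabling Nagle's algorithm is a better option.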

So TCP_NODELAY is used for disabling Nagle's algorithm.
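As a minimal sketch of what that looks like in practice, here is how the option can be set from Python. The helper name make_nodelay_socket is just an illustration, not something from the question.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value enables TCP_NODELAY, i.e. switches Nagle's algorithm off.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # getsockopt() returns a non-zero integer when the option is set.
    enabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
    print("TCP_NODELAY enabled:", enabled)
    s.close()
```

The option can be set before or after connecting; it only changes how this socket sends from then on.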

TCP_CORK aggressively accumulates data. If TCP_CORK is enabled on a socket, partial frames (data smaller than the MSS) are not sent until the option is cleared again or a 200 ms ceiling expires; full-size segments still go out. Like Nagle's algorithm, it accumulates data from the user, but it waits for the application to uncork rather than for an ACK. This is useful when sending multiple blocks of data, such as a header followed by a file body. But you have to be more careful with TCP_CORK: if you forget to uncork, the tail of your data is delayed.

Before the 2.6 kernel, these two options were mutually exclusive. In later kernels, both can be set together; in that case TCP_CORK takes precedence.
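A rough sketch of the usual cork / write / uncork pattern, in Python. TCP_CORK is Linux-only, so the code looks it up with getattr and simply sends uncorked elsewhere. The names send_corked, loopback_pair and recv_exact are made up for this example.

```python
import socket

# TCP_CORK is Linux-specific, so Python only defines it where the platform has it.
TCP_CORK = getattr(socket, "TCP_CORK", None)


def send_corked(sock, chunks):
    """Queue several chunks while corked, then uncork to flush the remainder."""
    if TCP_CORK is not None:
        # Cork: the kernel holds back partial frames from now on.
        sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 1)
    try:
        for chunk in chunks:
            sock.sendall(chunk)
    finally:
        if TCP_CORK is not None:
            # Uncork: any queued partial frame is sent now.
            sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 0)


def loopback_pair():
    """Hypothetical helper: a connected (client, server) TCP pair on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def recv_exact(sock, n):
    """Hypothetical helper: read until n bytes have arrived or the peer closes."""
    data = b""
    while len(data) < n:
        piece = sock.recv(n - len(data))
        if not piece:
            break
        data += piece
    return data


if __name__ == "__main__":
    client, server = loopback_pair()
    send_corked(client, [b"header\r\n", b"body"])
    print(recv_exact(server, 12))
    client.close()
    server.close()
```

Putting the uncork in a finally block is one way to be "more careful": the cork is removed even if a send fails halfway.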

Ref:

  • http://baus.net/on-tcp_cork/
  • http://ccr.sigcomm.org/archive/2001/jan01/ccr-200101-mogul.pdf

Is there any significant difference between TCP_CORK and TCP_NODELAY in this use-case?

You have two questions:

  1. Is there any significant difference between TCP_CORK and TCP_NODELAY in this use-case?
  2. There must be some reason why they felt it was insufficient, which led them to introduce a new/proprietary TCP_CORK flag instead. Can anybody explain what that reason was?

First, see the answers in this Stack Overflow question. They are related in the sense that that question describes the general difference between the two, without reference to your use case.

  • TCP_NODELAY ON means send the data (partial frames) the moment you get it, regardless of whether you have enough data for a full network packet.
  • TCP_NODELAY OFF means Nagle's algorithm: send the data when it fills a full MSS; otherwise wait for the acknowledgement of previously sent data before sending the smaller data.
  • TCP_CORK ON means don't send any data (partial frames) smaller than the MSS until the application says so or until 200 ms have passed.
  • TCP_CORK OFF means send all the data (partial frames) now.
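The four cases above can be condensed into a toy decision function. This is only a model of the rules as listed here, not kernel code: it ignores the 200 ms cork ceiling and full-size segments (which are always eligible to go out), and it assumes TCP_CORK wins when both options are set. The function name is invented for the example.

```python
def sends_partial_frame_now(nodelay: bool, cork: bool, unacked_in_flight: bool) -> bool:
    """Toy model: is a partial (< MSS) frame queued right now sent immediately?"""
    if cork:
        # TCP_CORK ON: hold partial frames until the application uncorks.
        return False
    if nodelay:
        # TCP_NODELAY ON: send as soon as the data is written.
        return True
    # Plain Nagle: send only if nothing previously sent is still unacknowledged.
    return not unacked_in_flight


if __name__ == "__main__":
    for nodelay in (False, True):
        for cork in (False, True):
            for unacked in (False, True):
                verdict = sends_partial_frame_now(nodelay, cork, unacked)
                print(f"nodelay={nodelay} cork={cork} unacked={unacked} -> {verdict}")
```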

This means that in your given use case, in the first example no partial frames are sent until the end, but in your second example partial frames are sent whenever an acknowledgement is received.

Also, for the final send in your first example, Nagle's algorithm still applies to the partial frames after the uncorking, whereas in the second example it doesn't.

The short version: TCP_NODELAY doesn't accumulate the logical packets before sending them as network packets, Nagle's algorithm accumulates them according to its own rule, and TCP_CORK accumulates them as directed by the application.

A side effect of this is that Nagle's algorithm will send partial frames on an idle connection, whereas TCP_CORK won't.

Additionally, TCP_CORK was introduced into the Linux kernel in 2.2 (specifically 2.1.127), but until 2.5.71 it was mutually exclusive with TCP_NODELAY. For example, in 2.4 kernels you could use one or the other, but in 2.6 you can combine the two, and TCP_CORK takes precedence when it is applied.
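To check this on your own machine, something like the following sketch sets both options on one socket. It assumes a Linux kernel recent enough to allow the combination; on platforms without TCP_CORK, or if the kernel rejects it, the helper (a name chosen for this example) just reports False.

```python
import socket


def set_cork_and_nodelay(sock) -> bool:
    """Set TCP_NODELAY, then try TCP_CORK too. Returns True if both were accepted."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    cork = getattr(socket, "TCP_CORK", None)  # only defined on Linux
    if cork is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 1)
    except OSError:
        # The kernel refused the combination.
        return False
    return True


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("both options set:", set_cork_and_nodelay(s))
    s.close()
```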

Regarding your second question.

To quote Linus Torvalds:

Now, TCP_CORK is basically me telling David Miller that I refuse to play
games to have good packet size distribution, and that I wanted a way for
the application to just tell the OS: I want big packets, please wait until
you get enough data from me that you can make big packets.

Basically, TCP_CORK is a kind of "anti-nagle" flag. It's the reverse of
"no-nagle".

Another quote, also by Linus, regarding the usage of TCP_CORK:

Basically, TCP_CORK is useful whenever the server knows the patterns of
its bulk transfers. Which is just about 100% of the time with any kind of
file serving.

For more quotes, see the Sendfile kernel mailing list discussion in the references below.

In summary, alongside TCP_MAXSEG, the MSG_MORE flag to send(), and gathering writes with writev(), TCP_CORK is another tool which gives the application in userspace more fine-grained control over packet size distribution.
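For comparison, here is a small sketch of the MSG_MORE approach in Python: the flag is passed per call instead of being toggled on the socket. MSG_MORE is Linux-only, so the example falls back to 0 (no flag) where it is missing. The function and helper names are hypothetical.

```python
import socket

# Linux-only flag; 0 means "no extra flag" on platforms that lack it.
MSG_MORE = getattr(socket, "MSG_MORE", 0)


def send_header_and_body(sock, header: bytes, body: bytes) -> None:
    """Send header and body, hinting that the header is not the end of the message."""
    # MSG_MORE tells the kernel more data is coming, so it may hold the partial frame.
    sock.sendall(header, MSG_MORE)
    # No flag on the last piece: whatever is queued can go out now.
    sock.sendall(body)


def loopback_pair():
    """Hypothetical helper: a connected (client, server) TCP pair on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def recv_exact(sock, n):
    """Hypothetical helper: read until n bytes have arrived or the peer closes."""
    data = b""
    while len(data) < n:
        piece = sock.recv(n - len(data))
        if not piece:
            break
        data += piece
    return data


if __name__ == "__main__":
    client, server = loopback_pair()
    send_header_and_body(client, b"LEN 5\n", b"hello")
    print(recv_exact(server, 11))
    client.close()
    server.close()
```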

References and further reading

  • Earthquaky kernel interfaces
  • Sendfile Kernel Mailing Discussion (where the quote comes from)
  • TCP/IP options for high-performance data transmission
  • Rethinking the TCP Nagle Algorithm
  • TCP_CORK: More than you ever wanted to know
  • The C10K problem
  • TCP man page
  • The Linux Programming Interface Page 1262

Set TCP_QUICKACK and TCP_NODELAY

There's no direct relationship between those two options; they serve different purposes.

TCP_NODELAY disables or enables segment buffering so that data can be sent to the peer as quickly as possible; it is typically used to reduce latency rather than to improve network utilisation. TCP_QUICKACK makes the stack send acknowledgements as early as possible instead of delaying them. It is not permanent: subsequent TCP processing (which may happen under the hood) can switch quickack mode off again, depending on internal protocol processing, so the stack's behaviour may diverge from the user's setting.

NOTE: TCP_NODELAY is portable, while TCP_QUICKACK is not (it only works on Linux 2.4.4 and later).
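Because the flag is not permanent, a common workaround is to set it again after each receive. The sketch below shows that idea in Python; it is one possible pattern, not an official recipe, and the helper names are invented. On platforms without TCP_QUICKACK it degrades to a plain recv().

```python
import socket

# Linux-only; Python does not define it elsewhere.
TCP_QUICKACK = getattr(socket, "TCP_QUICKACK", None)


def recv_with_quickack(sock, bufsize: int = 4096) -> bytes:
    """recv() once, then re-arm TCP_QUICKACK since the kernel may have cleared it."""
    data = sock.recv(bufsize)
    if TCP_QUICKACK is not None:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_QUICKACK, 1)
    return data


def loopback_pair():
    """Hypothetical helper: a connected (client, server) TCP pair on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


if __name__ == "__main__":
    client, server = loopback_pair()
    client.sendall(b"ping")
    print(recv_with_quickack(server))
    client.close()
    server.close()
```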

Does setting TCP_NODELAY affect the behaviour of both ends of the socket?

TCP_NODELAY affects the sending of TCP segments only on the host that sets this option on its socket. That is, the peer's sending algorithm is not affected.
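You can see this locally with a quick loopback experiment. The sketch below (function name invented for the example) sets the option on the client end only and then reads it back on both ends of the same connection.

```python
import socket


def nodelay_flags_after_client_sets_it():
    """Return (client_has_nodelay, server_has_nodelay) for one loopback connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        # Only the client asks for TCP_NODELAY.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        client_flag = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
        server_flag = server.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
        return client_flag, server_flag
    finally:
        client.close()
        server.close()
        listener.close()


if __name__ == "__main__":
    print(nodelay_flags_after_client_sets_it())
```

If the server should also send without delay, it has to set the option on its own accepted socket.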

How does TCP_NODELAY affect consecutive write() calls?


My question is: are these 3 byte arrays sent in one packet?

As you have disabled the Nagle algorithm, almost certainly not, but you can't be 100% sure.

I know you don't have much control over how TCP constructs a network packet, but is there any way to tell the socket to (at least try to) pack these byte arrays

Yes. Don't disable the Nagle algorithm.

so network overhead is avoided? Could manually packing the byte arrays and sending them in one call to write help?

Yes, or simpler still just wrap the socket output stream in a BufferedOutputStream and call flush() when you want the data to be sent, as per your present code. You are correct that flush() does nothing on a socket output stream, but it flushes a BufferedOutputStream.
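The same idea, sketched in Python rather than Java: a buffered writer on top of the socket collects the small writes in user space and hands them to the socket on flush(). Here sock.makefile("wb") plays the role of BufferedOutputStream; the helper names are hypothetical. TCP may still split or merge the bytes on the wire, so this only improves the odds of a single packet.

```python
import socket


def send_buffered(sock, parts) -> None:
    """Write several byte strings through a user-space buffer, then flush once."""
    # With a positive buffering size, makefile("wb") returns a buffered writer.
    with sock.makefile("wb", buffering=64 * 1024) as out:
        for part in parts:
            out.write(part)  # stays in the buffer while it fits
        out.flush()  # hands the buffered bytes to the socket
    # Closing the file object does not close the socket itself.


def loopback_pair():
    """Hypothetical helper: a connected (client, server) TCP pair on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def recv_exact(sock, n):
    """Hypothetical helper: read until n bytes have arrived or the peer closes."""
    data = b""
    while len(data) < n:
        piece = sock.recv(n - len(data))
        if not piece:
            break
        data += piece
    return data


if __name__ == "__main__":
    client, server = loopback_pair()
    send_buffered(client, [b"abc", b"def", b"ghi"])
    print(recv_exact(server, 9))
    client.close()
    server.close()
```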


