Is there any significant difference between TCP_CORK and TCP_NODELAY in this use-case?
You have two questions:
- Is there any significant difference between TCP_CORK and TCP_NODELAY in this use-case?
- There must be some reason why they felt it was insufficient, which led them to introduce a new/proprietary TCP_CORK flag instead. Can anybody explain what that reason was?
First, see the answers in this Stack Overflow question. They are related in the sense that that question describes the general difference between the two options without reference to your use case.
- TCP_NODELAY ON means send the data (partial frames) the moment you get it, regardless of whether you have enough data for a full network packet.
- TCP_NODELAY OFF means Nagle's algorithm applies: send the data when it is bigger than the MSS; otherwise wait for the acknowledgement of previously sent data before sending the smaller data.
- TCP_CORK ON means don't send any data (partial frames) smaller than the MSS until the application says so or until 200 ms have passed.
- TCP_CORK OFF means send all the data (partial frames) now.
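As a minimal sketch of how these flags are toggled from an application, the following Python snippet sets and reads back the options on a TCP socket. The helper names `set_flag` and `get_flag` are made up for illustration, and `TCP_CORK` is looked up defensively because it is Linux-specific and may be missing from the `socket` module on other platforms.

```python
import socket

# TCP_CORK is Linux-specific; it is None here when the platform lacks it.
TCP_CORK = getattr(socket, "TCP_CORK", None)

def set_flag(sock, option, enabled):
    """Turn a boolean TCP-level socket option on or off."""
    sock.setsockopt(socket.IPPROTO_TCP, option, 1 if enabled else 0)

def get_flag(sock, option):
    """Read a boolean TCP-level socket option back as True/False."""
    return bool(sock.getsockopt(socket.IPPROTO_TCP, option))
```

For example, `set_flag(sock, socket.TCP_NODELAY, True)` disables Nagle's algorithm on `sock`, and `set_flag(sock, TCP_CORK, False)` uncorks it where `TCP_CORK` is available.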
This means that in your first example no partial frames are sent until the end, whereas in your second example partial frames are sent whenever an acknowledgement is received.
Also, for the final send in your first example, Nagle's algorithm still applies to the partial frames after uncorking, whereas in the second example it doesn't.
The short version is that TCP_NODELAY doesn't accumulate the logical packets before sending them as network packets, Nagle's algorithm accumulates them according to the algorithm, and TCP_CORK accumulates them according to the application setting it.
A side effect of this is that Nagle's algorithm will send partial frames on an idle connection, while TCP_CORK won't.
Additionally, TCP_CORK was introduced into the Linux kernel in 2.2 (specifically 2.1.127, see here), but until 2.5.71 it was mutually exclusive with TCP_NODELAY. For example, in 2.4 kernels you could use one or the other, but in 2.6 you can combine the two, and TCP_CORK takes precedence when it is applied.
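The cork/uncork pattern described above could look roughly like this in Python. This is only a sketch: `send_corked` is a hypothetical helper name, and on platforms without `TCP_CORK` it simply falls back to plain sends.

```python
import socket

# TCP_CORK is Linux-specific; it is None here when the platform lacks it.
TCP_CORK = getattr(socket, "TCP_CORK", None)

def send_corked(sock, chunks):
    """Send all chunks while asking the kernel to hold back partial frames."""
    if TCP_CORK is not None:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 1)  # cork the socket
    try:
        for chunk in chunks:
            sock.sendall(chunk)
    finally:
        if TCP_CORK is not None:
            # Uncorking tells the kernel it may send whatever is still queued.
            sock.setsockopt(socket.IPPROTO_TCP, TCP_CORK, 0)
```

A typical call would be `send_corked(conn, [header_bytes, body_bytes])`, so that a small header and the start of the body can share a packet.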
Regarding your second question, to quote Linus Torvalds:
Now, TCP_CORK is basically me telling David Miller that I refuse to play
games to have good packet size distribution, and that I wanted a way for
the application to just tell the OS: I want big packets, please wait until
you get enough data from me that you can make big packets.
Basically, TCP_CORK is a kind of "anti-nagle" flag. It's the reverse of
Another quote by Linus, regarding the usage of TCP_CORK, is the following:
Basically, TCP_CORK is useful whenever the server knows the patterns of
its bulk transfers. Which is just about 100% of the time with any kind of
For more quotes see the link with Sendfile Mailing List Discussion.
In summary, in addition to TCP_MAXSEG and the MSG_MORE flag (passed to send() or sendmsg()), TCP_CORK is another tool that gives a userspace application more fine-grained control over packet size distribution.
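For comparison, here is a hedged sketch of the per-call alternative: flagging every chunk except the last with MSG_MORE. The helper name `send_with_more` is invented for this example, and the fallback to `0` (no hint at all) on platforms where Python does not expose `MSG_MORE` is an assumption of the sketch.

```python
import socket

# MSG_MORE is Linux-specific; fall back to 0 (no hint) where it is missing.
MSG_MORE = getattr(socket, "MSG_MORE", 0)

def send_with_more(sock, chunks):
    """Send chunks, hinting that more data follows every chunk but the last."""
    chunks = list(chunks)
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        flags = MSG_MORE if i < last else 0
        sock.sendall(chunk, flags)
```

Unlike TCP_CORK, the hint travels with each call, so there is no socket state to remember to reset afterwards.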
References and further reading
- Earthquaky kernel interfaces
- Sendfile Kernel Mailing Discussion (where the quote comes from)
- TCP/IP options for high-performance data transmission
- Rethinking the TCP Nagle Algorithm
- TCP_CORK: More than you ever wanted to know
- The C10K problem
- TCP man page
- The Linux Programming Interface Page 1262
When should I use TCP_NODELAY and when TCP_CORK?
First of all, they do not both disable Nagle's algorithm.
Nagle's algorithm exists to reduce the number of small network packets on the wire. The algorithm is: if the data is smaller than a limit (usually the MSS), wait until an ACK arrives for the previously sent packets and in the meantime accumulate data from the user. Then send the accumulated data.
if [ data >= MSS ]
    send(data)
else
    wait until ACK for previously sent data and accumulate data in send buffer (data)
    and after receiving the ACK send(data)
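The decision rule above can be written as a small pure function. This is a simplified model for illustration, not kernel code: the name `nagle_can_send` and its parameters are made up, and it assumes a full segment means "at least MSS bytes".

```python
def nagle_can_send(queued_bytes, mss, unacked_bytes):
    """Return True if this simplified Nagle rule would transmit now.

    A full segment always goes out; a partial one goes out only when
    nothing previously sent is still waiting for an ACK.
    """
    if queued_bytes <= 0:
        return False  # nothing to send
    if queued_bytes >= mss:
        return True   # enough data for a full segment
    return unacked_bytes == 0  # small data waits for outstanding ACKs
```

With an MSS of 1460, a 10-byte write goes out immediately on an idle connection but is held back while earlier data is still unacknowledged.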
This will help in applications like telnet. However, waiting for the ACK may increase latency when sending streaming data. Additionally, if the receiver implements the 'delayed ACK policy', it will cause a temporary deadlock situation. In such cases, disabling Nagle's algorithm is a better option.
So TCP_NODELAY is used for disabling Nagle's algorithm.
TCP_CORK aggressively accumulates data. If TCP_CORK is enabled on a socket, it will not send data until the buffer fills to a fixed limit. Like Nagle's algorithm, it accumulates data from the user, but it waits for the buffer to fill to a fixed limit rather than for an ACK. This is useful when sending multiple blocks of data, but you have to be more careful when using TCP_CORK.
Before the 2.6 kernel, these two options were mutually exclusive. In later kernels they can be set together, in which case TCP_CORK takes precedence.
Is there an equivalent to TCP_CORK in Winsock?
FWIW I successfully use TCP_NODELAY to get TCP_CORK-style behavior. I do it like this:
- Unset the TCP_NODELAY flag on the socket
- Call send() zero or more times to add your outgoing data into the Nagle-queue
- Set the TCP_NODELAY flag on the socket
- Call send() with the number-of-bytes argument set to zero, to force an immediate send of the Nagle-queued data
That works fine for me under Windows, MacOS/X, and Linux. (Note that under Linux the final zero-byte send() isn't necessary)
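The four steps above might be sketched in Python as follows. The function name `send_batched` is hypothetical, and whether the queued data really leaves in fewer packets depends on the platform's TCP stack.

```python
import socket

def send_batched(sock, chunks):
    """Queue chunks under Nagle, then push them out by setting TCP_NODELAY."""
    # Step 1: Nagle on, so small writes may be held in the send queue.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
    # Step 2: hand the outgoing data to the socket.
    for chunk in chunks:
        sock.sendall(chunk)
    # Step 3: Nagle off again.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Step 4: zero-byte send to force out anything still queued.
    sock.send(b"")
```

The socket is left with TCP_NODELAY set afterwards, so the next batch has to unset it again, which the function does in its first step.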
Does setting TCP_NODELAY affect the behaviour of both ends of the socket?
TCP_NODELAY affects the sending of TCP segments only on the host that sets this option on its socket. That is, the peer's sending algorithm is not affected.
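One way to check this locally is to connect two sockets over loopback, set the option on one end only, and read it back on both. The function names below are invented for this sketch, and it assumes the accepting side starts with the default (option unset).

```python
import socket

def nodelay_enabled(sock):
    """Report whether TCP_NODELAY is set on this socket."""
    return bool(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))

def nodelay_on_both_ends():
    """Set TCP_NODELAY on the client only; return (client_flag, server_flag)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as client:
            server, _ = listener.accept()
            with server:
                client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
                return nodelay_enabled(client), nodelay_enabled(server)
```

The option is local socket state and is not negotiated on the wire, so the accepted socket should still report it as unset.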
What is the difference between Nagle algorithm and 'stop and wait'?
In a stop and wait protocol, one
- sends a message to the peer
- waits for an ack for that message
- sends the next message
(i.e. one cannot send a new message until the previous one has been acknowledged)
Nagle's algorithm as used in TCP is orthogonal to this concept. When the TCP application sends some data, the protocol buffers the data and waits a little while to see if there's more data to be sent instead of sending data to the peer immediately.
If the application has more data to send in this small timeframe, the protocol stack merges that data into the current buffer and can send it as one large message.
This concept could very well be applied to a stop and wait protocol as well. (Note that TCP is not a stop and wait protocol.)