Application Control of TCP Retransmission on Linux

Application control of TCP retransmission on Linux

It looks like this was added in kernel 2.6.37.
The commit from the kernel Git repository, with an excerpt from its changelog, is below:

commit dca43c75e7e545694a9dd6288553f55c53e2a3a3
Author: Jerry Chu
Date: Fri Aug 27 19:13:28 2010 +0000

tcp: Add TCP_USER_TIMEOUT socket option.

This patch provides a "user timeout" support as described in RFC793. The
socket option is also needed for the the local half of RFC5482 "TCP User
Timeout Option".

TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int,
when > 0, to specify the maximum amount of time in ms that transmitted
data may remain unacknowledged before TCP will forcefully close the
corresponding connection and return ETIMEDOUT to the application. If 
0 is given, TCP will continue to use the system default.

Increasing the user timeouts allows a TCP connection to survive extended
periods without end-to-end connectivity. Decreasing the user timeouts
allows applications to "fail fast" if so desired. Otherwise it may take
upto 20 minutes with the current system defaults in a normal WAN
environment.

The socket option can be made during any state of a TCP connection, but
is only effective during the synchronized states of a connection
(ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, or LAST-ACK).
Moreover, when used with the TCP keepalive (SO_KEEPALIVE) option,
TCP_USER_TIMEOUT will overtake keepalive to determine when to close a
connection due to keepalive failure.

The option does not change in anyway when TCP retransmits a packet, nor
when a keepalive probe will be sent.

This option, like many others, will be inherited by an acceptor from its
listener.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
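
As a minimal sketch of how an application might use the option described in the commit message, the following sets TCP_USER_TIMEOUT on a socket and reads it back. It assumes Linux; the helper name set_user_timeout is hypothetical, and the fallback value 18 is the constant from <linux/tcp.h>, used only when the Python build does not expose socket.TCP_USER_TIMEOUT.

```python
import socket

# Assumption: Linux. 18 is TCP_USER_TIMEOUT in <linux/tcp.h>; it is used only
# if this Python build does not export the constant itself.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def set_user_timeout(sock, timeout_ms):
    """Set TCP_USER_TIMEOUT (in milliseconds) and return the value read back.

    A value of 0 leaves the system default in place, as the commit message says.
    """
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # The option may be set in any state, but only takes effect once the
        # connection is in a synchronized state such as ESTABLISHED.
        print(set_user_timeout(s, 30000))
```

With this in place, a send on a connection whose data stays unacknowledged for longer than the timeout should eventually fail with ETIMEDOUT rather than waiting for the system default.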

TCP retransmission different payload

TCP retransmission is not about sending missing packets again but about sending missing data. It might well happen that the retransmitted data get combined with following data in the same packet. Your question only shows a different length of the payload and not (as claimed in the title) a different payload. The resulting application data are likely the same, only they are packetized differently for transport.
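
The point about data versus packets can be illustrated with a toy model. This is not how a real TCP stack is written; segment and reassemble are hypothetical helpers, and the sequence numbers simply count bytes from 0. It shows a lost 6-byte segment being retransmitted as part of a larger segment while the reassembled byte stream stays the same.

```python
def segment(data, start_seq, mss):
    """Split bytes into (sequence_number, payload) pairs of at most mss bytes."""
    return [(start_seq + i, data[i:i + mss]) for i in range(0, len(data), mss)]


def reassemble(segments):
    """Rebuild the byte stream by sequence number, ignoring packet boundaries."""
    stream = {}
    for seq, payload in segments:
        for offset, byte in enumerate(payload):
            stream[seq + offset] = byte
    return bytes(stream[i] for i in sorted(stream))


data = b"abcdefghijklmnopqrstuvwx"

# First transmission: only the first 12 bytes exist yet, sent as two segments.
first_try = segment(data[:12], 0, 6)

# Assume the second segment (seq 6, 6 bytes) is lost. By the time it is
# retransmitted the application has written more data, so the sender packs
# the missing bytes together with the following ones into one segment.
retransmit = segment(data[6:], 6, 18)

received = [first_try[0]] + retransmit
print(len(first_try[1][1]), len(retransmit[0][1]))  # payload lengths differ
print(reassemble(received) == data)                 # the application data does not
```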

TCP Retransmission: how many packets will be re-sent?

This is a very broad question.

No, this is neither basically nor necessarily CUBIC.

Retransmission was first specified in the base TCP specification, RFC 793 (1981), section 3.7 "Data Communication", paragraph "Retransmission Timeout".

Since then, there have been many (really many[*]) enhancements. A very noticeable one is "slow start", most recently specified by RFC 5681, whose roots go back to RFC 2001 (1997), "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms".

There's no "one size fits all" in this domain; there's always a trade-off. "Smart" algorithms also have a larger footprint (code size and CPU use), so depending on the application they may not be used, or may not even be available (think embedded devices). And since these things live in the implementation (i.e. they are not visible in the data exchanged between hosts), you can never know for sure which host uses which. You'll see the TCP window size and scale in the segments, for instance, but you will not know which algorithm manages them.

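As for Linux, the default congestion control algorithm has been CUBIC since 2.6.19, and BIC before that. PRR (Proportional Rate Reduction), which Linux has used since 3.2, is not a congestion control algorithm in the same sense: it governs how quickly the window is reduced during fast recovery and works alongside CUBIC.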

The default is not the only algorithm available, though. On my stock Debian 4.4.0 kernel, the active one is CUBIC:

jbmaillet@sumo:~$ cat /proc/sys/net/ipv4/tcp_congestion_control
cubic

Though Reno is available too:

jbmaillet@sumo:~$ cat /proc/sys/net/ipv4/tcp_allowed_congestion_control
cubic reno

...and there are a dozen more in the "TCP: advanced congestion control" section of the kernel configuration.
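
The same information can also be read per socket rather than system-wide. The sketch below assumes Linux and uses the TCP_CONGESTION socket option; decode_algorithm and socket_congestion_control are hypothetical helper names, and 13 is the option's value in <linux/tcp.h>, used only if the Python build lacks the constant.

```python
import socket

# Assumption: Linux. 13 is TCP_CONGESTION in <linux/tcp.h>.
TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)


def decode_algorithm(raw):
    """Turn the NUL-padded buffer returned by getsockopt into a name."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


def socket_congestion_control(sock):
    """Return the congestion control algorithm this socket is using."""
    # The kernel limits algorithm names to 16 bytes including the NUL.
    raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, 16)
    return decode_algorithm(raw)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(socket_congestion_control(s))
```

This only tells you what the local host uses for that socket; as noted above, the peer's algorithm remains invisible.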

[*] https://en.wikipedia.org/wiki/TCP_congestion-avoidance_algorithm

TCP retransmission on RST - Different socket behaviour on Windows and Linux?

Nice find! According to this, Windows' TCP will retry a connection if it receives a RST/ACK from the remote host after sending a SYN:

... Upon receiving the ACK/RST client from the target host, the client determines that there is indeed no service listening there. In the Microsoft Winsock implementation of TCP, a pending connection will keep attempting to issue SYN packets until a maximum retry value is reached (set in the registry, this value defaults to 3 extra times)...

According to the same article, the value that limits those retries is set in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpMaxConnectRetransmissions. At least in Win10 Pro, it doesn't seem to be present by default.

Although this is a convenience on Windows machines, an application should still define its own criteria for handling a failed connect attempt, IMO (i.e. number of attempts, timeouts, etc.).

Anyhow, as I said, surprising fact! Living and learning I guess ...

Cristian.

TCP ceases retransmissions before reaching the default of 15 attempts (after physical disconnection)

Maybe the entry in the ARP table is expiring, and when the ARP requests are sent again they time out for lack of a response? Did you run arp -a?
Maybe setting gc_timeout is not enough and you also need to set gc_stale_time? I read this entry, which has a great explanation of how it works; its author was trying to do almost the opposite of what you are attempting: Configuring ARP age timeout

There is another thread worth investigating. Maybe you should also change tcp_retries1?
Is it possible to change the Retransmission Timeout (RTO)?

I also looked at the kernel documentation, file ip-sysctl.txt, and found:

tcp_retries1 - INTEGER
This value influences the time, after which TCP decides, that
something is wrong due to unacknowledged RTO retransmissions,
and reports this suspicion to the network layer.
See tcp_retries2 for more details.
RFC 1122 recommends at least 3 retransmissions, which is the
default.

tcp_retries2 - INTEGER
This value influences the timeout of an alive TCP connection,
when RTO retransmissions remain unacknowledged.
Given a value of N, a hypothetical TCP connection following
exponential backoff with an initial RTO of TCP_RTO_MIN would
retransmit N times before killing the connection at the (N+1)th RTO.
The default value of 15 yields a hypothetical timeout of 924.6
seconds and is a lower bound for the effective timeout.
TCP will effectively time out at the first RTO which exceeds the
hypothetical timeout.
RFC 1122 recommends at least 100 seconds for the timeout,
which corresponds to a value of at least 8.
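
The figures quoted above can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that do not appear in the quoted text: TCP_RTO_MIN is 200 ms and the RTO is capped at TCP_RTO_MAX of 120 s, which are the values used by mainline Linux. The function name hypothetical_timeout is made up for illustration.

```python
TCP_RTO_MIN = 0.2    # seconds; assumed Linux value (200 ms)
TCP_RTO_MAX = 120.0  # seconds; assumed Linux cap on the backed-off RTO


def hypothetical_timeout(retries):
    """Sum the RTOs up to and including the (retries + 1)th one.

    The RTO starts at TCP_RTO_MIN and doubles after every expiry until it
    reaches TCP_RTO_MAX, matching the exponential backoff described above.
    """
    total = 0.0
    rto = TCP_RTO_MIN
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, TCP_RTO_MAX)
    return total


if __name__ == "__main__":
    print(round(hypothetical_timeout(15), 1))  # 924.6, the documented default
    print(round(hypothetical_timeout(8), 1))   # 102.2, first value above 100 s
    print(round(hypothetical_timeout(7), 1))   # 51.0, below the RFC 1122 minimum
```

Since a real connection's RTO is derived from the measured round-trip time and is usually above TCP_RTO_MIN, the actual timeout will be longer than these figures, which is why the documentation calls them a lower bound.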

In another thread I read about the socket option TCP_USER_TIMEOUT. I've never used it, but it could be an easy solution.
Application control of TCP retransmission on Linux

I hope one of these options helps.

Disabling of TCP SYN retransmission

My requirement is that if TCP connection is not established with the 1st server within 2s, client needs to establish connection with the 2nd server.

In this case, just do a connect with a timeout of 2 seconds at the client, and if the connect does not succeed, retry with the other server. Once you have closed the socket, the kernel will stop trying to connect to the first server. This is much better, and less tied to a specific platform, than fiddling with the built-in reliability behavior and timing of TCP.
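
The approach described above might look like the following sketch. The function name connect_with_fallback and the server addresses in the usage example are hypothetical; the only library call relied on is socket.create_connection with its timeout argument.

```python
import socket


def connect_with_fallback(servers, timeout=2.0):
    """Try each (host, port) in order and return the first connected socket.

    Each attempt is abandoned after `timeout` seconds. A failed attempt's
    socket is closed by create_connection, so the kernel stops sending SYNs
    to that server before the next one is tried.
    """
    last_error = None
    for host, port in servers:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:  # covers timeouts and refused connections
            last_error = exc
    raise ConnectionError("no server reachable: %s" % last_error)


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    try:
        conn = connect_with_fallback([("192.0.2.10", 5000), ("192.0.2.11", 5000)])
    except ConnectionError as exc:
        print(exc)
    else:
        conn.settimeout(None)  # back to blocking mode once connected
        conn.close()
```

Note that when a host name resolves to several addresses, create_connection applies the timeout to each address in turn, so passing literal IP addresses keeps the 2-second bound per server.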
