Does a TCP Socket Connection Have a "Keep Alive"?

Does a TCP socket connection have a keep alive?

TCP sockets remain open until they are closed.

That said, it's very difficult to detect a broken connection (broken as in a router died, etc., as opposed to cleanly closed) without actually sending data, so most applications perform some sort of ping/pong exchange every so often just to make sure the connection is still actually alive.
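As a minimal sketch of such an application-level liveness check in Python (the `PING`/`PONG` convention here is invented for illustration; a real protocol defines its own messages):

```python
import socket

def ping(sock: socket.socket, timeout: float = 5.0) -> bool:
    """Send an application-level ping and wait briefly for any reply.

    Returns False on timeout, on a connection error, or if the peer
    has closed the connection (recv() returns b'' in that case).
    The b'PING\n' message is a hypothetical protocol convention.
    """
    try:
        sock.settimeout(timeout)
        sock.sendall(b'PING\n')
        return sock.recv(16) != b''
    except (socket.timeout, OSError):
        return False
```

A dead NAT entry or a crashed peer shows up here as a timeout or a send error, which TCP alone would not report on an idle connection.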

java.net.Socket TCP keep-alive usage

As the documentation for setKeepAlive() says, it will enable (or disable) the SO_KEEPALIVE option on the socket.

When the keepalive option is set for a TCP socket and no data has been exchanged across the socket in either direction for 2 hours (NOTE: the actual value is implementation dependent), TCP automatically sends a keepalive probe to the peer. This probe is a TCP segment to which the peer must respond. One of three responses is expected:

  1. The peer responds with the expected ACK. The application is not notified (since everything is OK). TCP will send another probe following another 2 hours of inactivity.
  2. The peer responds with an RST, which tells the local TCP that the peer host has crashed and rebooted. The socket is closed.
  3. There is no response from the peer. The socket is closed.

The purpose of this option is to detect if the peer host crashes. Valid only for TCP socket: SocketImpl

Here is another reference explaining the SO_KEEPALIVE option.
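As an illustration, enabling keepalive from Python is a `setsockopt` call, and on Linux the 2-hour defaults can be shortened per socket. The `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` constants are platform-specific (Linux), hence the guards; the default values chosen below are arbitrary:

```python
import socket

def enable_keepalive(sock: socket.socket,
                     idle: int = 60, interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalive, and on Linux shorten the default timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, 'TCP_KEEPIDLE'):   # seconds of idle time before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, 'TCP_KEEPINTVL'):  # seconds between unanswered probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, 'TCP_KEEPCNT'):    # failed probes before the connection is dropped
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

With the sketch above, a crashed peer would be detected after roughly idle + interval * count seconds instead of the 2-hour default.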


Note that in networking, connections can be lost at any time for a myriad of reasons. If the connection traverses a NAT router, the entry in the NAT table could expire (when the connection is idle) and the connection is lost due to that. The client could cease to function, or be suspended (especially laptops and mobile devices), or a cable could be disconnected, or WiFi (or cellular) signal could be interfered with, or ... the list can go on. Your server needs to be written to cope gracefully with loss of connection.

Keep TCP socket connection alive and read/write coordination

So I finally solved this problem. Many thanks to @bigdataolddriver's offline help. I learned a lot about ncat debugging, among other things.

I basically

  • on server side: gave up on the idea of using Python's socketserver module. For one, I found out that it's synchronous only.
  • on client side: used asio::ip::tcp::socket::read_some / asio::ip::tcp::socket::write_some instead of asio::read / asio::write.

Here is the new server code based on just the socket module.

import socket
import sys
import threading

_dostuff = True

def run_cmd_server():
    global _dostuff
    # Create a TCP/IP socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Bind the socket to the port
    server_address = ('localhost', 1234)
    print('CmdServer: starting up on {} port {}'.format(*server_address))
    sock.bind(server_address)

    # Listen for incoming connections
    sock.listen(1)

    while True:
        # Wait for a connection
        print('CmdServer: waiting for a connection')
        connection, client_address = sock.accept()
        try:
            print('CmdServer: connection from client:', client_address)

            # Receive the data in small chunks and retransmit it
            while True:
                data = connection.recv(1024)
                print('received {!r}'.format(data))
                if not data:
                    print('no data from', client_address)
                    break
                cmd = data.decode('utf-8').strip('\x00')
                if cmd == 'Hello':
                    print('Cmd: {}'.format(cmd))
                    connection.sendall('ACK\x00'.encode('utf-8'))
                elif cmd == 'Stop':
                    print('Cmd: {}'.format(cmd))
                    _dostuff = False
                    print('_dostuff: {}'.format(_dostuff))
                elif cmd == 'Start':
                    _dostuff = True
                    print('_dostuff: {}'.format(_dostuff))
                else:
                    print('Misc: {}'.format(cmd))
                    connection.sendall('ack\x00'.encode('utf-8'))
        except OSError:  # e.g. connection reset by peer; go back to accept()
            continue
        finally:
            # Clean up the connection
            connection.close()

def main():
    t1 = threading.Thread(target=run_cmd_server, name='t_cmd', daemon=True)
    t1.start()
    t1.join()

if __name__ == '__main__':
    main()

And here is the new client code:

virtual bool Connect() override {
    bool isInitialized = false;
    try {
        asio::io_context io_context;
        asio::ip::tcp::resolver resolver(io_context);
        asio::ip::tcp::resolver::query query("127.0.0.1", "1234");
        asio::ip::tcp::resolver::iterator endpoint_iterator = resolver.resolve(query);
        asio::ip::tcp::socket socket(io_context);
        asio::connect(socket, endpoint_iterator);
        while (true) {
            std::array<char, 1024> readBuf{'\0'};
            asio::error_code error;
            // Handshaking:
            // on connection, say hello to cmd-server and wait for ACK.
            if (!isInitialized) {
                debug("CmdClient {}: handshaking ...", m_id.c_str());
                std::string handshake("Hello");
                socket.write_some(asio::buffer(handshake), error);
                if (error == asio::error::eof) {
                    asio::connect(socket, endpoint_iterator);
                    continue; // Connection closed cleanly by peer; keep trying.
                }
                else if (error) {
                    throw asio::system_error(error); // Some other error.
                }
                size_t len = socket.read_some(asio::buffer(readBuf), error);
                if (len == 0) { // read_some returns an unsigned count; 0 means no data
                    debug("CmdClient {}: No response", m_id.c_str());
                }
                std::string received(readBuf.data());
                if (received == "ACK") {
                    debug("CmdClient {}: handshaking ... SUCCESS!", m_id.c_str());
                    isInitialized = true;
                    Notify("ACK");
                }
                else {
                    debug("CmdClient {}: Received: {}", m_id.c_str(), received.c_str());
                }
                continue;
            }
            SendCommand(socket);
        }
    }
    catch (std::exception& e) {
        std::cerr << e.what() << std::endl;
        isInitialized = false;
    }
    return isInitialized; // Only reachable after an exception.
}

void SendCommand(asio::ip::tcp::socket& socket) {
    std::string cmd;
    switch (m_cmd) {
        case NoOp:
            break;
        case Hello:
            cmd = "Hello";
            break;
        case Stop:
            cmd = "Stop";
            break;
        case Start:
            cmd = "Start";
            break;
        default:
            break;
    }
    if (!cmd.empty()) {
        socket.write_some(asio::buffer(cmd));
        m_cmd = NoOp; // Avoid resending on the next frame.
    }
}

I have yet to use Asio's async features (I'm reluctant to, right after this debugging session). But for now the code works as I expected: the server can receive commands from the client normally.

On a side note, since there is only one thread writing into the global variable _dostuff, I removed thread locking.

I'd still appreciate it if anyone knows where exactly my original implementation was faulty.

TCP socket state becomes "persist" after changing IP address, even with keep-alive configured earlier

In short: TCP keepalive is only relevant if the connection is idle, i.e. there is no data to send. If instead there is still data to send but sending is currently impossible due to a missing ACK or a window of 0, then other timeouts apply. This is likely the problem in your case.

For the deeper details see The Cloudflare Blog: When TCP sockets refuse to die.

When is Keep-alive required for TCP Sockets?

What happens when keep-alive option detects a dead socket?

The connection is reset, and any reads or writes get a 'connection reset' error. Note that keepalive is off by default, and when enabled only operates at two-hour intervals by default.

How can I check if connection is alive or dead without actually using the send and recv?

You can't. TCP/IP is deliberately designed not to have a 'dial tone'. It works much better that way. This is a major reason why it has displaced all the prior protocols, such as SNA, that did have one.

If I have to use send and recv functions then what's the point of using keep-alive in the first place?

recv() won't tell you about a broken connection. It may just block forever. You can use read timeouts, but then you have to decide how much time is too much. Or, you can implement an application-level PING.
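A read timeout in Python, for instance, might be sketched like this; the caller still has to decide what a timeout means (perhaps: time to send an application-level PING):

```python
import socket

def recv_with_timeout(sock: socket.socket, timeout: float = 30.0):
    """recv() with a deadline instead of blocking forever.

    Returns the data, b'' if the peer closed the connection cleanly,
    or None if nothing arrived before the timeout expired.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None
```

Note that a timeout (None) and a clean close (b'') are different conditions: only the latter is definitive; the former just means the peer was quiet for too long.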

Do I need to heartbeat to keep a TCP connection open?

The connection should remain open regardless, but yes, it's common to see protocols implement a heartbeat to help detect dead connections; IRC does this with its PING command, for example.

How expensive is maintaining a TCP socket in java

It's more expensive to re-open the connection regularly; there is a three-way handshake on open. Once the socket is open, that cost can be amortized (but only if you leave it open).


