Socket Shutdown and Rebind - How to Avoid Long Wait

Socket shutdown and rebind - How to avoid long wait?

I'm not sure how to do it in Python, but you want to set the SO_REUSEADDR socket option.

How to avoid TIME_WAIT for server sockets?

If you follow the TCP state machine diagram, you will see that it is mandatory for the socket to transition to the TIME-WAIT state if the socket initiated sending the FIN. Using shutdown(sockTX, 2) without waiting for the client's FIN does exactly that.

If you want the server to wait for the client's FIN, you block on recv() waiting for a 0 return value first. Then, you can close().

Note that unless you have duplicated the socket in some way (either with dup*() or with a fork() call), there is no need to call shutdown() if it is being immediately followed by a close(). You can just call close() (if the socket has been duplicated, the FIN will only be sent when the last copy is closed).
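To see the point about duplicated sockets in action, here is a small Python sketch. It assumes a local socket.socketpair() instead of a real TCP connection, and the function name dup_close_demo is made up for illustration. The peer only sees end-of-stream once the last copy of the descriptor has been closed.

```python
import socket

def dup_close_demo():
    a, b = socket.socketpair()
    a_copy = a.dup()              # second descriptor for the same connection
    b.setblocking(False)

    a.close()                     # one copy closed: the connection stays open
    try:
        b.recv(1)
        eof_after_first_close = True
    except BlockingIOError:       # nothing to read and no EOF yet
        eof_after_first_close = False

    a_copy.close()                # last copy closed: now the peer sees EOF
    b.setblocking(True)
    eof_after_last_close = (b.recv(1) == b"")
    b.close()
    return eof_after_first_close, eof_after_last_close
```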

There is no need to shutdown() the accepting socket (sockS) at all.

So, I would change your server side to look something like the following to clean up the sockets:

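char buf[512];
/* Drain incoming data until the client's FIN arrives (recv() returns 0) */
while (recv(sockTX, buf, sizeof(buf), 0) > 0) {}
close(sockTX);

close(sockS);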
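Since the question was about Python, roughly the same cleanup could look like the sketch below. The names drain_then_close and demo are made up for illustration, and the client thread on the loopback interface only exists to make the example self-contained.

```python
import socket
import threading

def drain_then_close(conn):
    """Read until the peer's FIN arrives (recv returns b''), then close."""
    total = 0
    while True:
        chunk = conn.recv(4096)
        if not chunk:              # empty bytes: the peer has sent its FIN
            break
        total += len(chunk)
    conn.close()
    return total

def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def client():
        c = socket.create_connection(("127.0.0.1", port))
        c.sendall(b"hello")
        c.close()                      # the client closes first

    t = threading.Thread(target=client)
    t.start()
    conn, _ = listener.accept()
    received = drain_then_close(conn)  # server waits for the FIN, then closes
    t.join()
    listener.close()
    return received
```

Because the client sends its FIN first here, TIME_WAIT ends up on the client side rather than on the server.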

close vs shutdown socket?

This is explained in Beej's networking guide. shutdown is a flexible way to block communication in one or both directions. When the second parameter is SHUT_RDWR, it will block both sending and receiving (like close). However, close is the way to actually destroy a socket.

With shutdown, you will still be able to receive pending data the peer already sent (thanks to Joey Adams for noting this).
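A minimal Python sketch of that half-close behaviour, assuming a local socket.socketpair() rather than a real TCP connection (half_close_demo is a made-up name). It only shuts down the sending direction with SHUT_WR; data the peer sent earlier can still be read afterwards.

```python
import socket

def half_close_demo():
    a, b = socket.socketpair()
    b.sendall(b"pending")          # the peer has already sent some data
    a.shutdown(socket.SHUT_WR)     # we stop sending; reading still works
    got = a.recv(1024)             # the earlier data is still readable
    eof = b.recv(1024)             # the peer sees end-of-stream from our side
    a.close()                      # close() is what releases the descriptor
    b.close()
    return got, eof
```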

How to forcibly close a socket in TIME_WAIT?

Actually there is a way to kill a connection: killcx. They claim it works in any state of the connection (which I have not verified). You need to know the interface where the communication happens, though; it seems to assume eth0 by default.

UPDATE: another solution is cutter, which comes in some Linux distros' repositories.

bind() fails after using SO_REUSEADDR option for wildcard socket in state TIME_WAIT

Setting the linger time to zero makes your socket discard all unsent data at once instead of waiting for it to be sent. However, it is only guaranteed to avoid the TIME_WAIT state if the other end has already closed its write pipe.

A socket can be seen as two pipes. A read pipe and a write pipe. Your read pipe is connected to the write pipe of the other side and your write pipe is connected to the read pipe of the other side. When you open a socket, both pipes are opened and when you close a socket, both pipes are closed. However, you can close individual pipes using the shutdown() call.

When you use shutdown to close your write pipe (SHUT_WR or SHUT_RDWR), your socket may end up in TIME_WAIT, even if linger time is zero. And when you call close() on a socket, it will implicitly close both pipes, unless already closed, and if it did close the write pipe, it will have to wait, even if it dropped any pending data from the send buffer.

If the other side calls close() first, or at least calls shutdown() with SHUT_WR, and only after that you call close(), socket closing may only be delayed by the linger time to ensure unsent data is sent or data in flight is acknowledged. After all data has been sent and acknowledged or after the linger timeout has been hit, whichever happens first, the socket will close at once and not remain in TIME_WAIT state, as it was the other side that initiated the disconnect.

On some systems, setting the linger time to zero causes sockets to be closed by a reset (RST) instead of a normal close (FIN, ACK). In that case all unsent data is discarded and the socket will not go into TIME_WAIT either, as that is not required after a reset, not even if you closed the socket first. But whether a linger time of zero triggers a reset is system dependent; you cannot rely on it, as there is no standard that defines this behavior. It can also vary depending on whether your sockets are blocking or non-blocking and whether shutdown() has been called prior to close().
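For reference, this is roughly how a linger time of zero can be set from Python. The helper names are made up, and the "ii" struct layout (two C ints for l_onoff and l_linger) is an assumption that matches Linux and the BSDs; Windows uses two unsigned shorts instead. Whether close() then sends a RST is still up to the system, as described above.

```python
import socket
import struct

LINGER_FORMAT = "ii"   # assumed layout: struct linger { int l_onoff; int l_linger; }

def set_linger_zero(sock):
    """Enable SO_LINGER with a timeout of 0 seconds."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FORMAT, 1, 0))

def get_linger(sock):
    """Return (l_onoff, l_linger) as currently configured on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)
```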

However, if your app crashes or is killed in the middle of a TCP transmission, both pipes are open and the system has to close the socket on your behalf. In that case some systems will simply ignore any linger configuration and just fall back to standard behavior which you will also get if linger is disabled completely. This means you may end up in TIME_WAIT even with a linger time of zero on systems that would otherwise support closing a socket by reset. Again, this is system specific but has already bitten me in the past on macOS systems.

As for SO_REUSEADDR, this setting does not necessarily allow reuse across different processes for sockets in TIME_WAIT state. If process X has opened socketA and socketA is now in TIME_WAIT state, then process X can for sure bind socketB to the same address and port as socketA if, and only if, it uses SO_REUSEADDR (on Linux, both the waiting socket and the new one require that flag; on BSD, only the new one requires it). But for security reasons, process Y may not be able to bind a socket to the same address and port as socketA while socketA is still in TIME_WAIT state.

Again, this is system specific, and Linux does not always behave like BSD would or POSIX expects. It may also depend on the port number you are using. Sometimes this limitation only applies to ports below 1024 (most people testing this behavior forget to test both ports above and below 1024). Some systems will additionally restrict reuse to the same user (IIRC Windows has such kinds of restrictions).

So what could you possibly do to work around the issue? SO_REUSEPORT is an option, as it has no restriction regarding using exactly the same address+port combination in different processes, since it has explicitly been introduced to Linux to allow port re-use by different processes for the purpose of load balancing between multiple server processes.
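A hedged Python sketch of the SO_REUSEPORT approach follows. The helper name listen_with_reuseport is made up, and it returns None where the constant is not available (it is not defined on every platform). For brevity both sockets would live in one process here, whereas the load-balancing use case has one socket per server process.

```python
import socket

def listen_with_reuseport(port=0, host="127.0.0.1"):
    """Return a listening socket with SO_REUSEPORT set, or None if unsupported."""
    if not hasattr(socket, "SO_REUSEPORT"):
        return None
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # On Linux, every socket sharing the port must set this before bind()
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock
```

Calling it once with port 0 and then again with the port the first call ended up on gives two listeners on the same address and port.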

Another possibility is to catch any termination of your program (as much as that is possible) and then somehow make the other side close the socket first. As long as the other side initiates the close operation, you will never end up in TIME_WAIT. Of course, pulling this off is tricky and maybe impossible inside a signal handler that is called because your app has crashed, as what you can do in a signal handler is very limited. Usually you work around this by handling signals outside of the handler, but if that was a crash signal, it's not clear which calls you can still safely perform and which ones you cannot, even if you handle signals on a different thread than the one that just crashed. Also note that you cannot catch SIGKILL, and even when killed like this, the system will cleanly close your sockets.

A nice programmatic work-around: Make two processes. One parent process, which does all the socket management and that spawns a child process that then deals with the actual server implementation. If the child process is killed, the parent process still owns all sockets, can still close them cleanly, can re-bind to the same address and port using SO_REUSEADDR and it can even spawn a new child process, so your server continues running.
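A very reduced sketch of that parent/child idea in Python, assuming a Unix system with os.fork(). The function supervise and its parameters are made up for illustration, and this version only keeps the listening socket in the parent; accepted connections would still belong to the child unless you pass them around yourself.

```python
import os

def supervise(listener, serve, restarts=1):
    """Keep `listener` open in the parent and run serve(listener) in children.

    Returns the raw wait statuses of all children that were started.
    """
    statuses = []
    for _ in range(restarts + 1):
        pid = os.fork()
        if pid == 0:
            # Child: inherits the listening socket and runs the actual server
            try:
                serve(listener)
            finally:
                os._exit(0)              # never fall back into the parent's loop
        # Parent: blocks until the child exits or is killed, then starts a
        # new one; the listening socket stays bound the whole time
        _, status = os.waitpid(pid, 0)
        statuses.append(status)
    return statuses
```

A real supervisor would loop forever and probably back off between restarts; the restarts limit is only there to keep the sketch bounded.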

Some references:

https://groups.google.com/g/comp.os.linux.development.system/c/sqxTvgccEzk
https://groups.google.com/g/alt.winsock.programming/c/md6bsoy08Fk
https://www.nybek.com/blog/2015/03/05/cross-platform-testing-of-so_linger/
https://www.nybek.com/blog/2015/04/29/so_linger-on-non-blocking-sockets/

How do I close a server socket while it is waiting for a connection?

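This is where the Python standard library lends you a helping hand. Check out the asyncore module, or Twisted. (Note that asyncore has been deprecated since Python 3.6 and was removed in Python 3.12; asyncio is its standard-library successor.)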

There is usually no reason not to go asynchronous with Python, since it is so easy. As a bonus, your code will be more readable and maintainable.

Your code with asyncore:

import asyncore  # deprecated since Python 3.6, removed in Python 3.12
import socket

HOST = ''    # Symbolic name meaning all available interfaces
PORT = 9992  # Arbitrary non-privileged port

class ExampleHandler(asyncore.dispatcher_with_send):

    def __init__(self, sock):
        asyncore.dispatcher_with_send.__init__(self, sock)
        self.data = b''  # line buffer for partially received input
        # Greet the client once, right after the connection is accepted
        self.send(b'Welcome to the server. Type something and hit enter\r\n')

    def handle_read(self):
        self.data += self.recv(1024)
        lines = self.data.split(b'\n')
        # The last element is an incomplete line (or empty): keep it buffered
        self.data = lines.pop()
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if line == b'CLOSE':
                self.send(b'You have requested to destroy the connection...\r\n')
                self.close()
                # To exit asyncore.loop() immediately,
                # closing all client connections
                raise asyncore.ExitNow()
                # If you want to finish processing the client connections, you
                # could instead close only the server:
                # server.close()
            else:
                self.send(b'OK...' + line + b'\r\n')

class ExampleServer(asyncore.dispatcher):

    def __init__(self, host, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.set_reuse_addr()
        self.bind((host, port))
        self.listen(5)

    def handle_accept(self):
        pair = self.accept()
        if pair is not None:
            sock, addr = pair
            print('Connected with ' + addr[0] + ':' + str(addr[1]))
            ExampleHandler(sock)  # registers itself with the asyncore loop
        else:
            print('socket issue sorry')

server = ExampleServer(HOST, PORT)
try:
    asyncore.loop()
except asyncore.ExitNow:
    pass

Note: I also fixed the string/buffering problem in your original code by adding a line buffer.
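Since asyncore is no longer available from Python 3.12 on, here is a rough equivalent using asyncio instead. This is a swapped-in technique, not the code from the answer above; serve_until_close and the on_ready hook are made-up names, and the default port is simply the one used in the example. Because nothing blocks in accept(), the server can be closed at any time by setting an event.

```python
import asyncio

WELCOME = b"Welcome to the server. Type something and hit enter\r\n"

async def serve_until_close(host="127.0.0.1", port=9992, on_ready=None):
    """Serve line-based clients until one of them sends CLOSE."""
    stop = asyncio.Event()

    async def handle(reader, writer):
        writer.write(WELCOME)
        while not stop.is_set():
            line = await reader.readline()   # b"" means the client hung up
            if not line:
                break
            text = line.strip()
            if text == b"CLOSE":
                writer.write(b"You have requested to destroy the connection...\r\n")
                await writer.drain()
                stop.set()                   # wakes up the coroutine below
            elif text:
                writer.write(b"OK..." + text + b"\r\n")
                await writer.drain()
        writer.close()

    server = await asyncio.start_server(handle, host, port)
    if on_ready is not None:                 # hypothetical hook reporting the bound port
        on_ready(server.sockets[0].getsockname()[1])
    await stop.wait()
    server.close()                           # stop listening for new connections
```

Started with asyncio.run(serve_until_close()), the program ends once a client sends CLOSE; asyncio.run() cancels client handlers that are still running at that point.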

How to close socket after python fail?

The problem here is the unclean socket close that occurs when the script crashes without going through the proper TCP connection shutdown sequence. Thankfully there's a simple solution, which tells the kernel to allow the bind even though the port the socket was bound to still counts as in use:

sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

That's all, add that before the bind call and you're set. Debugging your other errors will be much simpler and less time consuming once that's done ;) See more in the docs https://docs.python.org/2/library/socket.html#socket.socket.setsockopt
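Put together, a listener setup with the option in the right place might look like this sketch (make_listener is a made-up helper name, and the backlog of 5 is arbitrary):

```python
import socket

def make_listener(host, port):
    """Create a listening TCP socket that may rebind a recently used port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); setting it afterwards has no effect on the bind
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock
```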

Proper way to stop listening on a Socket

Correct TCP connection termination involves a four-way handshake: each end tells the other that it is shutting down (FIN), and each end acknowledges the other's shutdown (ACK).

Wikipedia explains the process: http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_termination

This post explains how to make it happen in C#: http://vadmyst.blogspot.com/2008/04/proper-way-to-close-tcp-socket.html
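For comparison, a Python sketch of the same orderly shutdown from the side that wants to stop first. The name graceful_close and the 5 second timeout are assumptions made for illustration.

```python
import socket

def graceful_close(sock, timeout=5.0):
    """Announce our shutdown, wait for the peer's, then release the socket."""
    sock.settimeout(timeout)           # do not wait forever for a silent peer
    sock.shutdown(socket.SHUT_WR)      # tell the peer we are done sending (FIN)
    leftover = b""
    while True:
        chunk = sock.recv(4096)        # keep reading until the peer is done too
        if not chunk:                  # b"": the peer's FIN has arrived
            break
        leftover += chunk
    sock.close()
    return leftover
```

Note that the side calling this sends the first FIN, so it is also the side that ends up in TIME_WAIT.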
