Handling Ssl_Shutdown Correctly

Handling SSL_shutdown correctly

OpenSSL is a bit of a dark art.

First, the page you referenced has HTML-ified the return values badly. Here's what the man page actually says:

  RETURN VALUES

   The following return values can occur:

   0   The shutdown is not yet finished. Call SSL_shutdown() for a second
       time, if a bidirectional shutdown shall be performed.  The output
       of SSL_get_error(3) may be misleading, as an erroneous
       SSL_ERROR_SYSCALL may be flagged even though no error occurred.

   1   The shutdown was successfully completed. The "close notify" alert
       was sent and the peer's "close notify" alert was received.

   -1  The shutdown was not successful because a fatal error occurred
       either at the protocol level or a connection failure occurred. It
       can also occur if action is need to continue the operation for non-
       blocking BIOs.  Call SSL_get_error(3) with the return value ret to
       find out the reason.

If you have blocking BIOs, things are relatively simple. A 0 on the first call means you need to call SSL_shutdown again if you want a full bidirectional shutdown; basically, it means that you sent a close_notify alert but haven't received one back yet. A 1 means you had already received a close_notify alert from the peer, and you're totally done. A -1 means an unrecoverable error. The second call (which you only make if you got a 0 back) completes the bidirectional shutdown, i.e. it waits for the other side to send you their close_notify alert. Logic dictates you can't get a 0 back again (because it's a blocking BIO and will have completed the first step). A -1 indicates an error, and a 1 indicates successful completion.

If you have non-blocking BIOs, the same "possibly 0 then 1" return values apply, save for the fact you need to go through the whole SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE rigmarole as well, i.e.:

   If the underlying BIO is non-blocking, SSL_shutdown() will also return
   when the underlying BIO could not satisfy the needs of SSL_shutdown()
   to continue the handshake. In this case a call to SSL_get_error() with
   the return value of SSL_shutdown() will yield SSL_ERROR_WANT_READ or
   SSL_ERROR_WANT_WRITE. The calling process then must repeat the call
   after taking appropriate action to satisfy the needs of SSL_shutdown().
   The action depends on the underlying BIO. When using a non-blocking
   socket, nothing is to be done, but select() can be used to check for
   the required condition. When using a buffering BIO, like a BIO pair,
   data must be written into or retrieved out of the BIO before being able
   to continue.

So you have two levels of repetition. You call SSL_shutdown the 'first' time, but repeat it if you get SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE, after going around the select() loop in the normal way. You only count the 'first' SSL_shutdown as done if you get a non-SSL_ERROR_WANT_ error code (in which case it failed), or a 0 or 1 return. If you get a 1 return, you're done. If you get a 0 return and you want a bidirectional shutdown, then you have to make the second call, on which you again need to check for SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE and retry after select(); that call should not return 0 again, but may return 1 or an error.

Not simple.

A couple more notes from the docs: after calling SSL_shutdown and getting a 0 back the first time, you can optionally call SSL_read instead of SSL_shutdown (in case the peer is still sending you data on that SSL socket) and, I guess, "hope" that they eventually send you a close_notify from their side, to flush the pipes.

Also, if you're planning on closing the socket after the shutdown completes anyway, you can skip the second call to SSL_shutdown entirely (the "1" of the "0 then 1") and just go ahead and close the socket; the kernel should take care of discarding the now-ignored close_notify alert that the peer is presumably about to send.

SSL_shutdown() returns -1 and errno is 0

A full SSL shutdown consists of two parts:

sending the 'close notify' alert to the peer
receiving the 'close notify' alert from the peer

The first SSL_shutdown returned 0 which means that it did send the 'close notify' to the peer but did not receive anything back yet. The second call of SSL_shutdown fails because the peer did not do a proper SSL shutdown and send a 'close notify' back, but instead just closed the underlying TCP connection.

This behavior is actually very common and you can usually just ignore the error. It does not matter much if the underlying TCP connection is going to be closed anyway. But a proper SSL shutdown is usually needed when you want to continue in plain text on the same TCP connection, as is needed for the CCC command in FTPS connections (but even there various implementations fail to handle this case properly).

SSL Socket free and shutdown

SSL_shutdown only sends a close notify over the socket. If you want to reuse the socket afterwards as a plain socket, you have to make sure that the other side also did an SSL_shutdown. The return code of your SSL_shutdown tells you this: if it is 1, the SSL connection is closed; if it is 0, you should call SSL_shutdown again to wait for the close notify from the peer. Please see the SSL_shutdown documentation for more information.

After this is done you can continue to use the socket as a plain socket. This is what is done in FTP over SSL (FTPS): e.g. with "AUTH TLS" the connection is upgraded to SSL, and with "CCC" it is downgraded to plain text again.

SSL_free only frees the memory associated with the SSL object, it does not change anything on the socket nor does it send/receive any data.

OpenSSL error handling

SSL_get_error:

SSL_get_error() returns a result code (suitable for the C "switch"
statement) for a preceding call to SSL_connect(), SSL_accept(),
SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The
value returned by that TLS/SSL I/O function must be passed to
SSL_get_error() in parameter ret.

ERR_get_error:

ERR_get_error() returns the earliest error code from the thread's
error queue and removes the entry. This function can be called
repeatedly until there are no more error codes to return.

So the latter is for more general use and those shouldn't be used together, because:

The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably.

So you have to read all of the errors using ERR_get_error and handle them (or ignore them by removal as you did in your code sample with ERR_clear_error) and then perform the IO operation. Your approach seems to be correct, although I can't check all aspects of it by myself at the moment.

Refer to this answer and this post for more information.

EDIT: according to this tutorial, BIO_ routines may also generate errors and affect the error queue:

The third field is the name of the package that generated the error,
such as "BIO routines" or "bignum routines".

How to prevent SIGPIPEs (or handle them properly)

You generally want to ignore the SIGPIPE and handle the error directly in your code. This is because signal handlers in C have many restrictions on what they can do.

The most portable way to do this is to set the SIGPIPE handler to SIG_IGN. This will prevent any socket or pipe write from causing a SIGPIPE signal.

To ignore the SIGPIPE signal, use the following code:

signal(SIGPIPE, SIG_IGN);

If you're using the send() call, another option is to use the MSG_NOSIGNAL option, which will turn the SIGPIPE behavior off on a per call basis. Note that not all operating systems support the MSG_NOSIGNAL flag.

Lastly, you may also want to consider the SO_NOSIGPIPE socket option, which can be set with setsockopt() on some operating systems (the BSDs and macOS, for example). This will prevent SIGPIPE from being caused by writes to just the sockets it is set on.
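The three options above can be sketched side by side. The function names are invented for this example, and the #ifdef guards are there because MSG_NOSIGNAL and SO_NOSIGPIPE are each only available on some platforms.

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Option 1: ignore SIGPIPE for the whole process.
 * Returns 0 on success, -1 on failure. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Option 2: suppress SIGPIPE for a single send() call where the
 * platform offers MSG_NOSIGNAL; otherwise fall back to a plain send()
 * and rely on one of the other options. */
ssize_t send_nosignal(int fd, const void *buf, size_t len)
{
#ifdef MSG_NOSIGNAL
    return send(fd, buf, len, MSG_NOSIGNAL);
#else
    return send(fd, buf, len, 0);
#endif
}

/* Option 3: suppress SIGPIPE for one socket where the platform offers
 * SO_NOSIGPIPE. Returns 0 on success, -1 on failure or if unsupported. */
int disable_sigpipe_on_socket(int fd)
{
#ifdef SO_NOSIGPIPE
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof on);
#else
    (void)fd;
    return -1;
#endif
}
```

With any of these in place, a write to a closed connection fails with a return value of -1 and errno set to EPIPE, which can be handled like any other I/O error.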

What is the correct way to handle an OpenSSL error of the type decryption failed or bad record mac?

Since you are the server here, the message indicates that either OpenSSL wasn't able to properly decrypt the TLS records sent by the client or that the MAC check failed, i.e. the client's tag doesn't match the one you computed.

Apart from bugs or network errors (as EJP mentioned), this can also happen if someone actively changed the content sent, the wrong message was sent or only a partial message was sent etc. etc.

In any case, as the server, you don't want to try to rescue such a connection. After all, it could be an attack. The proper way to handle this event is to call SSLSocket#close, which will try to execute an SSL_shutdown, which in turn will attempt to send the "close notify" alert to the client; this is the graceful TLS way of telling the peer that you're about to close this connection.

In order to resume communication with a client where this happened, you would negotiate a new connection started from scratch.
