TCP: When is EPOLLHUP generated?
For this kind of question, use the source! Among other interesting comments, there is this text:
EPOLLHUP is UNMASKABLE event (...). It means that after we received EOF, poll always returns immediately, making impossible poll() on write() in state CLOSE_WAIT. One solution is evident --- to set EPOLLHUP if and only if shutdown has been made in both directions.
And then the only code that sets EPOLLHUP:
    if (sk->sk_shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
        mask |= EPOLLHUP;
where SHUTDOWN_MASK is equal to RCV_SHUTDOWN | SEND_SHUTDOWN.
TL;DR: You are right, this flag is only set when the socket has been shut down for both reading and writing (I reckon that the peer shutting down its write side is equivalent to me shutting down my read side). Or when the connection is closed, of course.
UPDATE: After reading the source code in more detail, these are my conclusions.
About shutdown:
- Doing shutdown(SHUT_WR) sends a FIN and marks the socket with SEND_SHUTDOWN.
- Doing shutdown(SHUT_RD) sends nothing and marks the socket with RCV_SHUTDOWN.
- Receiving a FIN marks the socket with RCV_SHUTDOWN.
And about epoll:
- If the socket is marked with SEND_SHUTDOWN and RCV_SHUTDOWN, poll will return EPOLLHUP.
- If the socket is marked with RCV_SHUTDOWN, poll will return EPOLLRDHUP.
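The two rules above can be observed from user space. The following is a minimal sketch, not taken from the original answer: it uses an AF_UNIX socketpair() as a stand-in for a TCP connection (an assumption made so the example needs no network; another answer on this page states that unix sockets follow the same logic), and poll_once() and hup_demo() are hypothetical helper names. Error handling is reduced to assert() for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Poll one descriptor once and return the reported event mask (0 if none). */
static uint32_t poll_once(int fd, uint32_t interest)
{
    int epfd = epoll_create1(0);
    assert(epfd >= 0);

    struct epoll_event ev = { .events = interest, .data.fd = fd };
    int rc = epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    assert(rc == 0);
    (void)rc;

    struct epoll_event out = { 0 };
    int n = epoll_wait(epfd, &out, 1, 100 /* ms */);
    close(epfd);
    return n == 1 ? out.events : 0;
}

/* Event masks seen on our end of the connection at three points in time. */
struct hup_demo_result {
    uint32_t open;        /* both directions still open */
    uint32_t peer_closed; /* after the peer called shutdown(SHUT_WR) */
    uint32_t both_closed; /* after we also called shutdown(SHUT_WR) */
};

static struct hup_demo_result hup_demo(void)
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv); /* stand-in for TCP */
    assert(rc == 0);
    (void)rc;

    const uint32_t interest = EPOLLIN | EPOLLRDHUP;
    struct hup_demo_result r;

    r.open = poll_once(sv[0], interest);
    shutdown(sv[1], SHUT_WR);            /* peer half-closes */
    r.peer_closed = poll_once(sv[0], interest);
    shutdown(sv[0], SHUT_WR);            /* now shut down in both directions */
    r.both_closed = poll_once(sv[0], interest);

    close(sv[0]);
    close(sv[1]);
    return r;
}
```

After the peer's half-close, peer_closed should contain EPOLLRDHUP but not EPOLLHUP; once the local side has also called shutdown(SHUT_WR), both_closed should contain EPOLLHUP even though it was never requested.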
So the HUP events can be read as:
- EPOLLRDHUP: you have received FIN or you have called shutdown(SHUT_RD). In any case your reading half-socket is hung, that is, you will read no more data.
- EPOLLHUP: you have both half-sockets hung. The reading half-socket is just like the previous point. For the sending half-socket, you did something like shutdown(SHUT_WR).
To complete a graceful shutdown I would do:
- Do shutdown(SHUT_WR) to send a FIN and mark the end of sending data.
- Wait for the peer to do the same by polling until you get an EPOLLRDHUP.
- Now you can close the socket with grace.
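The three steps above could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: graceful_close() is a hypothetical helper name, the timeout applies to each wait rather than to the whole operation, and any data the peer still sends before its FIN is simply discarded.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: half-close, wait for the peer's FIN, then close.
 * Returns 0 if end of stream was seen, -1 on timeout or error.
 * The descriptor is closed in every case.
 */
static int graceful_close(int fd, int timeout_ms)
{
    assert(fd >= 0);
    int result = -1;

    /* Step 1: send our FIN; we will not write any more data. */
    if (shutdown(fd, SHUT_WR) != 0) {
        close(fd);
        return -1;
    }

    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(fd);
        return -1;
    }
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0)
        goto out;

    /* Step 2: wait for the peer, discarding late data, until recv() sees EOF. */
    for (;;) {
        struct epoll_event got;
        int n = epoll_wait(epfd, &got, 1, timeout_ms);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;                      /* timeout or epoll error */

        char buf[4096];
        ssize_t r = recv(fd, buf, sizeof buf, 0);
        if (r > 0)
            continue;                   /* peer was still sending */
        if (r == 0)
            result = 0;                 /* peer's FIN: end of stream */
        break;
    }

out:
    /* Step 3: both directions are finished (or we gave up); release the fd. */
    close(epfd);
    close(fd);
    return result;
}
```

Reading until recv() returns 0, rather than closing on the first EPOLLRDHUP, is a design choice of this sketch: the flag can be reported while unread data is still queued.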
PS: About your comment:
"it's counterintuitive to get writable, as the writing half is closed"
It is actually expected if you understand the output of epoll not as "ready" but as "will not block". That is, if you get EPOLLOUT you have the guarantee that calling write() will not block. And certainly, after shutdown(SHUT_WR), write() will return immediately.
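A small experiment can make the "will not block" reading concrete. This is a sketch under assumptions: it again uses an AF_UNIX socketpair() in place of a TCP connection, write_after_shutdown_demo() is a hypothetical name, and send() with MSG_NOSIGNAL is used instead of write() so that the failure is reported through errno rather than through SIGPIPE.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* What epoll reported, and what send() did, on a socket after SHUT_WR. */
struct write_after_shutdown {
    uint32_t events;     /* mask reported by epoll_wait() */
    ssize_t send_result; /* return value of send() */
    int send_errno;      /* errno right after send() */
};

static struct write_after_shutdown write_after_shutdown_demo(void)
{
    struct write_after_shutdown r = { 0, 0, 0 };
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv); /* stand-in for TCP */
    assert(rc == 0);

    rc = shutdown(sv[0], SHUT_WR);      /* close our writing half */
    assert(rc == 0);

    int epfd = epoll_create1(0);
    assert(epfd >= 0);
    struct epoll_event ev = { .events = EPOLLOUT, .data.fd = sv[0] };
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev);
    assert(rc == 0);
    (void)rc;

    struct epoll_event got = { 0 };
    if (epoll_wait(epfd, &got, 1, 100 /* ms */) == 1)
        r.events = got.events;

    /* The call returns at once; it fails instead of blocking. */
    errno = 0;
    r.send_result = send(sv[0], "x", 1, MSG_NOSIGNAL);
    r.send_errno = errno;

    close(epfd);
    close(sv[0]);
    close(sv[1]);
    return r;
}
```

On Linux I would expect events to include EPOLLOUT and send() to fail immediately with EPIPE, which is consistent with "writable" meaning only that the call does not block.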
How do I use EPOLLHUP
You use EPOLLRDHUP to detect peer shutdown, not EPOLLHUP (which signals an unexpected close of the socket, i.e. usually an internal error).
Using it is really simple, just "or" the flag with any other flags that you are giving to epoll_ctl. So, for example, instead of EPOLLIN write EPOLLIN|EPOLLRDHUP.
After epoll_wait, do an if (my_event.events & EPOLLRDHUP) followed by whatever you want to do if the other side closed the connection (you'll probably want to close the socket).
Note that getting a "zero bytes read" result when reading from a socket also means that the other end has shut down the connection, so you should always check for that too, to avoid nasty surprises: the FIN might arrive after you have woken up from EPOLLIN but before you call read, and if you are in ET mode, you will not get another notification.
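Put together, the registration and the two checks might look like the sketch below. The helper names watch_socket() and peer_closed() are made up for illustration. Note that the EPOLLRDHUP branch follows the advice above literally and does not read first, so any data still queued would be dropped when the caller closes the socket.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Register fd for both readability and peer-shutdown notifications. */
static int watch_socket(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Handle one event returned by epoll_wait().
 * Returns 1 if the peer has shut down its side, 0 otherwise.
 * When data was read, *nread holds the byte count (negative on recv error).
 */
static int peer_closed(const struct epoll_event *ev,
                       char *buf, size_t cap, ssize_t *nread)
{
    assert(ev != NULL && buf != NULL && nread != NULL);
    *nread = 0;

    if (ev->events & EPOLLRDHUP)
        return 1;               /* the kernel has already seen the FIN */

    if (ev->events & EPOLLIN) {
        *nread = recv(ev->data.fd, buf, cap, 0);
        if (*nread == 0)
            return 1;           /* FIN arrived between epoll_wait and recv */
    }
    return 0;
}
```

A caller would loop over the array filled by epoll_wait(), call peer_closed() for each entry, and close the descriptor whenever it returns 1.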
Given any epoll TCP socket event, if EPOLLRDHUP=0 and EPOLLIN=1; is a subsequent call to read()/recv() guaranteed to return a read size unequal to 0?
If you get an event with EPOLLRDHUP=1 then just close the connection right away without reading. If you get an event with EPOLLRDHUP=0 and EPOLLIN=1 then go ahead and read, but you should be prepared to handle the possibility of recv() still returning 0, just in case. Perhaps a FIN arrives after you got EPOLLIN=1 but before you actually call recv().
Why am I getting the EPOLLHUP event on a brand new socket
As my example in the comments shows, it seems you can't poll the socket before it's properly initialized, unless you want to handle EPOLLHUP.
As for the question, no, you won't miss any events. Calling listen() and then waiting with epoll is equivalent to what you would have to do otherwise (listen() followed by a blocking accept()); connections that arrive between those calls are handled by the kernel and stay waiting until your code handles them.
EPOLLRDHUP not reliable
To answer this: EPOLLRDHUP indeed comes if you continue to poll after receiving a zero-byte read.
So from my experiments it looks like either an EPOLLIN with a zero-byte read or an EPOLLRDHUP is a reliable indicator of an orderly shutdown; the only trouble is that they are not always received together. Sometimes (the case that is the subject of this question), EPOLLIN is received, the read yields zero bytes (connection terminated), and on subsequent polling you get to see the EPOLLRDHUP. Other times, it is the other way around: you get the EPOLLRDHUP together with an EPOLLIN that signals actual bytes to be read. Then, on subsequent reads, you get zero bytes.
Epoll and remote 1-way shutdown
I'm answering this myself after doing the heavy lifting to find the answer.
A socket listening for epoll events will typically receive an EPOLLRDHUP (in addition to EPOLLIN) event flag upon the remote peer calling close or shutdown(SHUT_WR). This does not necessarily mean the socket is dead. Subsequent calls to recv() will return any unread data on the socket and eventually "0" will be returned to indicate EOF. It may even be possible to send data back if the remote peer only did a half-close of its socket.
The one notable exception is if the remote peer has the SO_LINGER option enabled on its socket with a linger value of "0". Closing such a socket may result in a TCP RST getting sent instead of a FIN. From what I've read, a connection reset event will generate either an EPOLLHUP or EPOLLERR. (I haven't had time to confirm, but it makes sense).
There is some documentation to suggest there are older Linux implementations that don't support EPOLLRDHUP, so EPOLLHUP gets generated instead.
And for what it is worth, in my particular case, I found that it is not too interesting to have code that special cases EPOLLHUP or EPOLLRDHUP events. Instead, just treat these events the same as EPOLLIN/EPOLLOUT and call recv() (or send() as appropriate). But pay close attention to the return codes from recv() and send().
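One way to "pay close attention to the return codes" is to funnel every recv() through a small classifier. The sketch below is my own illustration: the enum and the function name recv_checked() are hypothetical, and MSG_DONTWAIT is used so the call never blocks regardless of how the descriptor was opened.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

enum recv_status {
    RECV_DATA,  /* *len bytes were stored in buf */
    RECV_EOF,   /* orderly shutdown by the peer (FIN) */
    RECV_AGAIN, /* nothing to read right now; go back to epoll_wait() */
    RECV_ERROR  /* hard error, e.g. a connection reset; errno is set */
};

/* Non-blocking recv() that maps the raw return value to a status. */
static enum recv_status recv_checked(int fd, void *buf, size_t cap, size_t *len)
{
    /* cap must be > 0: a zero-length read also returns 0, like EOF. */
    assert(buf != NULL && len != NULL && cap > 0);

    for (;;) {
        ssize_t n = recv(fd, buf, cap, MSG_DONTWAIT);
        if (n > 0) {
            *len = (size_t)n;
            return RECV_DATA;
        }
        *len = 0;
        if (n == 0)
            return RECV_EOF;
        if (errno == EINTR)
            continue;                   /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return RECV_AGAIN;
        return RECV_ERROR;
    }
}
```

With this in place, the handler for EPOLLIN, EPOLLRDHUP and EPOLLHUP can be the same loop: call recv_checked() until it returns something other than RECV_DATA, and close the socket on RECV_EOF or RECV_ERROR.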
POLLHUP vs. POLLRDHUP?
No, when poll()ing a socket, POLLHUP will signal that the connection was closed in both directions.
POLLRDHUP will be set when the other end has called shutdown(SHUT_WR) or when this end has called shutdown(SHUT_RD), but the connection may still be alive in the other direction.
You can have a look at net/ipv4/tcp.c in the kernel source:
    if (sk->sk_shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
        mask |= EPOLLHUP;
    if (sk->sk_shutdown & RCV_SHUTDOWN)
        mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
SHUTDOWN_MASK is RCV_SHUTDOWN|SEND_SHUTDOWN. RCV_SHUTDOWN is set when a FIN packet is received, and SEND_SHUTDOWN is set when a FIN packet is acknowledged by the other end, and the socket moves to the FIN-WAIT2 state.
[except for the TCP_CLOSE part, that snippet is replicated by all protocols; and the whole thing works similarly for unix sockets, etc]
There are other important differences: POLLRDHUP (unlike POLLHUP) has to be set explicitly in .events in order to be returned in .revents.
And POLLRDHUP only works on sockets, not on fifos/pipes or ttys.
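The .events/.revents difference is easy to check with poll() itself. This is a hedged sketch: revents_for() and poll_demo() are hypothetical names, an AF_UNIX socketpair() stands in for a TCP connection, and the fallback definition of POLLRDHUP uses the value from the generic Linux headers, which is an assumption that does not hold on every architecture.

```c
#define _GNU_SOURCE 1 /* glibc exposes POLLRDHUP in <poll.h> only with this */
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0x2000 /* assumption: asm-generic value, used only as fallback */
#endif

/* poll() a single descriptor without blocking and return its .revents. */
static short revents_for(int fd, short events)
{
    struct pollfd p = { .fd = fd, .events = events, .revents = 0 };
    int n = poll(&p, 1, 0);
    return n > 0 ? p.revents : 0;
}

struct poll_demo_result {
    short rdhup_unrequested; /* peer half-closed, .events = 0 */
    short rdhup_requested;   /* peer half-closed, .events = POLLRDHUP */
    short hup_unrequested;   /* both directions shut down, .events = 0 */
};

static struct poll_demo_result poll_demo(void)
{
    struct poll_demo_result r;
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv); /* stand-in for TCP */
    assert(rc == 0);
    (void)rc;

    shutdown(sv[1], SHUT_WR);                /* peer half-closes */
    r.rdhup_unrequested = revents_for(sv[0], 0);
    r.rdhup_requested = revents_for(sv[0], POLLRDHUP);

    shutdown(sv[0], SHUT_WR);                /* both directions now closed */
    r.hup_unrequested = revents_for(sv[0], 0);

    close(sv[0]);
    close(sv[1]);
    return r;
}
```

With .events left at 0, the half-closed socket should report nothing, while asking for POLLRDHUP should return it; POLLHUP should show up after the second shutdown() without ever being requested.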