Why You Have to Use a Non-Blocking Fd with Edge-Triggered Epoll

Too much data on socket using epoll edge trigger

When using edge-triggered mode, the data should be read in one recv call per socket; otherwise, draining one busy socket in a loop risks starving the other sockets. This issue has been written about in numerous blogs, e.g. Epoll is fundamentally broken.

Make sure that your user-space receive buffer is at least as large as the kernel receive buffer of the socket. This way you can read the entire kernel buffer in one recv call.

Also, you can process ready sockets in a round-robin fashion, so that the control flow does not get stuck in a recv loop for one socket. That works best when the user-space receive buffer is the same size as the kernel one. E.g.:

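auto n = epoll_wait(...);
// Events without EPOLLIN are never read, so count them as dry up front.
int dry = 0;
for(auto i = 0; i < n; i++)
    if(!(events[i].events & EPOLLIN))
        ++dry;
// Cycle over the ready sockets until every one is drained, closed or failed.
while(dry < n) {
    for(auto i = 0; i < n; i++) {
        if(events[i].events & EPOLLIN) {
            // Do only one read call for each ready socket
            // before moving to the next ready socket.
            auto r = recv(...);
            if(-1 == r) {
                if(EAGAIN != errno && EWOULDBLOCK != errno)
                    ; // Handle error.
                // Drained or failed: stop reading this socket.
                events[i].events &= ~EPOLLIN;
                ++dry;
            }
            else if(!r) {
                // Process client disconnect, then stop reading this socket.
                events[i].events &= ~EPOLLIN;
                ++dry;
            }
            else {
                // Process data received so far.
            }
        }
    }
}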

This version can be further improved to avoid scanning the entire events array on each iteration.

In your original post, do {} while(n > 0); is incorrect and leads to an endless loop. I assume it is a typo.

Edge Triggered epoll c

This will indeed work. epoll acts as if all events that happened to the epoll group before you made the epoll_wait call happened at the moment you make the call. epoll is designed to be used this way, so do not worry about this kind of usage. As long as you handle all of the events that had been triggered by the time epoll_wait returns, you need not worry about any that happen between calls to it; they will be caught the next time you call it.

Basically: Your usage is fine, keep going :)
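The behaviour described above can be checked with a small experiment. The sketch below is illustrative only: the function name events_seen_after_early_write is hypothetical, and a pipe stands in for a socket. It writes to the pipe while nobody is waiting and only then calls epoll_wait.

```cpp
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical demo: returns the number of events epoll_wait reports for a
// pipe whose data arrived before epoll_wait was called, or -1 on setup error.
int events_seen_after_early_write() {
    int p[2];
    if (pipe(p) == -1) return -1;
    int ep = epoll_create1(0);
    if (ep == -1) { close(p[0]); close(p[1]); return -1; }

    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = p[0];

    int n = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {     // The edge happens here; nobody is waiting.
        epoll_event out[1];
        n = epoll_wait(ep, out, 1, 0);  // The event is still reported here.
    }
    close(ep);
    close(p[0]);
    close(p[1]);
    return n;
}
```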

Epoll TCP edge-triggered necessity of last read(2) call

Your question is answered in man 7 epoll. As you see, it depends on the socket type (packet/stream):

Q9 Do I need to continuously read/write a file descriptor until EAGAIN when using the EPOLLET flag (edge-triggered behavior) ?

A9 Receiving an event from epoll_wait(2) should suggest to you that such file descriptor is ready for the requested I/O operation. You must consider it ready until the next (nonblocking) read/write yields EAGAIN. When and how you will use the file descriptor is entirely up to you.

For packet/token-oriented files (e.g., datagram socket, terminal in canonical mode), the only way to detect the end of the read/write I/O space is to continue to read/write until EAGAIN.

For stream-oriented files (e.g., pipe, FIFO, stream socket), the condition that the read/write I/O space is exhausted can also be detected by checking the amount of data read from / written to the target file descriptor. For example, if you call read(2) by asking to read a certain amount of data and read(2) returns a lower number of bytes, you can be sure of having exhausted the read I/O space for the file descriptor. The same is true when writing using write(2). (Avoid this latter technique if you cannot guarantee that the monitored file descriptor always refers to a stream-oriented file.)
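The "read until EAGAIN" rule from the man page can be sketched as follows. The helper names set_nonblocking and drain are hypothetical, the 4096-byte buffer is an arbitrary choice, and the data processing is left as a comment.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: put `fd` into non-blocking mode.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: read from a non-blocking `fd` until EAGAIN.
// Returns the number of bytes read, or -1 on a real error.
// Sets `eof` to true if the peer closed its end.
long drain(int fd, bool& eof) {
    char buf[4096];
    long total = 0;
    eof = false;
    for (;;) {
        ssize_t r = read(fd, buf, sizeof buf);
        if (r > 0) { total += r; continue; }   // Process buf[0..r) here.
        if (r == 0) { eof = true; return total; }
        if (errno == EINTR) continue;          // Interrupted; try again.
        if (errno == EAGAIN || errno == EWOULDBLOCK) return total; // Dry.
        return -1;
    }
}
```

This loop is also the reason the descriptor has to be non-blocking: on a blocking descriptor the final read would not fail with EAGAIN but would block until more data arrives, stalling the whole event loop.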

Using edge triggered epoll, should I loop over send?

epoll man page says:

When to rearm epoll with edge mode & oneshot?

Should I rearm before calling a non-blocking operation or after?

Technically, after, but it's not as simple as that.

Regardless of EPOLLONESHOT, once you receive an edge-triggered event signalling read-readiness of a given file descriptor, you must consider that FD to continue to be ready until a read() on it fails with errno set to EAGAIN (and therefore the file must be in non-blocking mode). Over the course of those reads, it may be the case that you read all remaining bytes with one read(), but then more arrive before the next. In that case, if the FD is still armed then a new event for it will be queued (or merged with another event for that FD, as appropriate). This is the case that could result in you receiving an event when in fact the FD is no longer ready.

You should consider just accepting those "phantom" events. Since your file will be in non-blocking mode, they will not cause unwanted stalls, just a little extra work. And your code will be simpler. But if you do use EPOLLONESHOT to avoid receiving phantom events, then you must not re-arm the FD before you determine it to be unready (via a read failing with EAGAIN), else you defeat the purpose.

Thus, the full answer is: after the FD is determined to be unready. That will take at least two read()s, and possibly more. If the file becomes ready after the last read and before the rearming, then the rearming should cause an appropriate event to be queued.