Why is epoll faster than select?
There's a lot of misinformation about this, but the real reason is this:
A typical server might be dealing with, say, 200 connections. It will service every connection that needs to have data written or read and then it will need to wait until there's more work to do. While it's waiting, it needs to be interrupted if data is received on any of those 200 connections.
With select
, the kernel has to add the process to 200 wait lists, one for each connection. To do this, it needs a "thunk" to attach the process to the wait list. When the process finally does wake up, it needs to be removed from all 200 wait lists and all those thunks need to be freed.
By contrast, with epoll
, the epoll
socket itself has a wait list. The process needs to be put on only that one wait list using only one thunk. When the process wakes up, it needs to be removed from only one wait list and only one thunk needs to be freed.
To be clear, with epoll
, the epoll
socket itself has to be attached to each of those 200 connections. But this is done once, for each connection, when it is accepted in the first place. And this is torn down once, for each connection, when it is removed. By contrast, each call to select
that blocks must add the process to every wait queue for every socket being monitored.
Ironically, with select
, the largest cost comes from checking if sockets that have had no activity have had any activity. With epoll
, there is no need to check sockets that have had no activity because if they did have activity, they would have informed the epoll
socket when that activity happened. In a sense, select
polls each socket each time you call select
to see if there's any activity while epoll
rigs it so that the socket activity itself notifies the process.
What are the differences between poll and select?
I think that this answers your question:
From Richard Stevens (rstevens@noao.edu):
The basic difference is that select()'s fd_set is a bit mask and
therefore has some fixed size. It would be possible for the kernel to
not limit this size when the kernel is compiled, allowing the
application to define FD_SETSIZE to whatever it wants (as the comments
in the system header imply today) but it takes more work. 4.4BSD's
kernel and the Solaris library function both have this limit. But I
see that BSD/OS 2.1 has now been coded to avoid this limit, so it's
doable, just a small matter of programming. :-) Someone should file a
Solaris bug report on this, and see if it ever gets fixed.With poll(), however, the user must allocate an array of pollfd
structures, and pass the number of entries in this array, so there's
no fundamental limit. As Casper notes, fewer systems have poll() than
select, so the latter is more portable. Also, with original
implementations (SVR3) you could not set the descriptor to -1 to tell
the kernel to ignore an entry in the pollfd structure, which made it
hard to remove entries from the array; SVR4 gets around this.
Personally, I always use select() and rarely poll(), because I port my
code to BSD environments too. Someone could write an implementation
of poll() that uses select(), for these environments, but I've never
seen one. Both select() and poll() are being standardized by POSIX
1003.1g.
October 2017 Update:
The email referenced above is at least as old as 2001; the poll()
command is now (2017) supported across all modern operating systems - including BSD. In fact, some people believe that select()
should be deprecated. Opinions aside, portability issues around poll()
are no longer a concern on modern systems. Furthermore, epoll()
has since been developed (you can read the man page), and continues to rise in popularity.
For modern development you probably don't want to use select()
, although there's nothing explicitly wrong with it. poll()
, and it's more modern evolution epoll()
, provide the same features (and more) as select()
without suffering from the limitations therein.
poll vs. epoll insight
Always use poll
unless all of the following are satisfied:
- You can ensure you're on a (Linux) system that has
epoll
or you provide a fallback for systems that don't. - You have a huge number of file descriptors active (at least 1000-10000).
- The set of file descriptors you're working with is stable over a long time period (adding/removing file descriptors from the
epoll
list is just as expensive as apoll
operation, since it requires entering/leaving kernelspace).
Caveats of select/poll vs. epoll reactors in Twisted
For very small numbers of sockets (varies depending on your hardware, of course, but we're talking about something on the order of 10 or fewer), select
can beat epoll
in memory usage and runtime speed. Of course, for such small numbers of sockets, both mechanisms are so fast that you don't really care about this difference in the vast majority of cases.
One clarification, though. Both select
and epoll
scale linearly. A big difference, though, is that the userspace-facing APIs have complexities that are based on different things. The cost of a select
call goes roughly with the value of the highest numbered file descriptor you pass it. If you select on a single fd, 100, then that's roughly twice as expensive as selecting on a single fd, 50. Adding more fds below the highest isn't quite free, so it's a little more complicated than this in practice, but this is a good first approximation for most implementations.
The cost of epoll
is closer to the number of file descriptors that actually have events on them. If you're monitoring 200 file descriptors, but only 100 of them have events on them, then you're (very roughly) only paying for those 100 active file descriptors. This is where epoll
tends to offer one of its major advantages over select
. If you have a thousand clients that are mostly idle, then when you use select
you're still paying for all one thousand of them. However, with epoll
, it's like you've only got a few - you're only paying for the ones that are active at any given time.
All this means that epoll
will lead to less CPU usage for most workloads. As far as memory usage goes, it's a bit of a toss up. select
does manage to represent all the necessary information in a highly compact way (one bit per file descriptor). And the FD_SETSIZE
(typically 1024) limitation on how many file descriptors you can use with select
means that you'll never spend more than 128 bytes for each of the three fd sets you can use with select
(read, write, exception). Compared to those 384 bytes max, epoll
is sort of a pig. Each file descriptor is represented by a multi-byte structure. However, in absolute terms, it's still not going to use much memory. You can represent a huge number of file descriptors in a few dozen kilobytes (roughly 20k per 1000 file descriptors, I think). And you can also throw in the fact that you have to spend all 384 of those bytes with select
if you only want to monitor one file descriptor but its value happens to be 1024, wheras with epoll
you'd only spend 20 bytes. Still, all these numbers are pretty small, so it doesn't make much difference.
And there's also that other benefit of epoll
, which perhaps you're already aware of, that it is not limited to FD_SETSIZE
file descriptors. You can use it to monitor as many file descriptors as you have. And if you only have one file descriptor, but its value is greater than FD_SETSIZE
, epoll
works with that too, but select
does not.
Randomly, I've also recently discovered one slight drawback to epoll
as compared to select
or poll
. While none of these three APIs supports normal files (i.e., files on a file system), select
and poll
present this lack of support as reporting such descriptors as always readable and always writable. This makes them unsuitable for any meaningful kind of non-blocking filesystem I/O, a program which uses select
or poll
and happens to encounter a file descriptor from the filesystem will at least continue to operate (or if it fails, it won't be because of select
or poll
), albeit it perhaps not with the best performance.
On the other hand, epoll
will fail fast with an error (EPERM
, apparently) when asked to monitor such a file descriptor. Strictly speaking, this is hardly incorrect. It's merely signalling its lack of support in an explicit way. Normally I would applaud explicit failure conditions, but this one is undocumented (as far as I can tell) and results in a completely broken application, rather than one which merely operates with potentially degraded performance.
In practice, the only place I've seen this come up is when interacting with stdio. A user might redirect stdin or stdout from/to a normal file. Whereas previously stdin and stdout would have been a pipe -- supported by epoll
just fine -- it then becomes a normal file and epoll
fails loudly, breaking the application.
Why poll is not replaced with epoll?
Okay, 7 years later I have a more convincing answer based on this article by Evan Klitzke.
Firstly, the reason I asked the question in the first place is the often mentioned performance advantage of epoll
compared to poll
/select
. The word goes that epoll
is asymptotically more efficient (O(1)) than poll
(O(N)).
What is not as widely known is that only edge-triggered epoll
is truly O(1), while level-trggered epoll
has same asymptotics of O(N). Indeed, level-triggered flavor has to go over the list of watched fds every time it is called to find ones that potentially has still more data pending. Edge-triggered variety can rely on signals in response to new bytes appearing in an fd.
It would be interesting to find out, how exactly a resumed thread finds out which fd woke it up, but it's certainly possible that this datum is passed through during epoll-triggered wake-up.
Obviously, poll
/select
cannot use edge-triggered epoll
as the semantics are different. As we saw, implementing with level-triggered epoll
wouldn't bring asymptotic performance benefits. And possibly, also negatively affect it if constant factors or constant terms are high (as they seem to be based on coarse benchmark that I did and quoted in another comment).
For more information, please read Blocking I/O, Nonblocking I/O, And Epoll.
Why is poll() better than select()?
Active FD means an open file descriptor.
Both select() and poll() are for single-threaded single-process programs to allow them to handle multiple connections at the same time. For instance OpenWRT's uhttpd web-server is like that.
select() and poll() are available on all Unices.
The better scaling O(1) versions are epoll
on Linux and kqueue
on BSDs. Less portable, though. But you can install libkqueue0
on Debian Linux.
Many programs use other approaches. For instance sshd, the SSH daemon, spawns a child process for each connection. Others handle each connection in a thread.
Shall we use poll() or select()?
All remotely modern systems have poll
, and it's a greatly superior interface to select
/pselect
in almost all ways:
poll
allows more fine-grained detection of status thanselect
.poll
does not have limits on the max file descriptor you can use (and more importantly, does not have critical vulnerabilities when you fail to check for file descriptors past theFD_SETSIZE
limit).
The only disadvantages I can think of to using poll
are that:
- unlike
pselect
,poll
cannot atomically unmask/mask signals, so you can't use it for waiting for a set of events that includes both file descriptor activity and signals unless you resort to the self-pipe trick. poll
only has millisecond resolution for the wait timeout, rather than microsecond (select
) or nanosecond (pselect
).
Certainly portability of poll
is not a consideration anymore. Any system old enough to lack poll
is full of so many vulnerabilities it should not be connected to a network.
In summary, unless you have very special needs (tiny timeout intervals, nasty signal interactions, scaling to millions of persistent connections, etc.) I would simply use poll
and be done with it. As others have mentioned, libevent
is also an option, but it's not clean/safe code (its use of select
actually invokes dangerous UB trying to workaround the limitations of select
!) and I find code that uses libevent
is generally a lot more unnecessarily complicated than code that simply uses poll
directly.
Related Topics
What Does 'Set -O Errtrace' Do in a Shell Script
Print Field 'N' to End of Line
Counting Number of Directories in a Specific Directory
How to Convert Iso8859-15 to Utf8
Iterate Over Lines Instead of Words in a for Loop of Shell Script
Find and Replace a Particular Term in Multiple Files
List of All Folders and Sub-Folders
Linux Bash: Setting Iptables Rules to Allow Both Active and Passive Ftp
How to Restart a Service If Its Dependent Service Is Restarted
Hiding Github Token in .Gitconfig
Less Gets Keyboard Input from Stderr
Get a Substring from a File Shell Script
Use Find Command But Exclude Files in Two Directories
Remote Linux Server to Remote Linux Server Dir Copy. How
Changing Location of Core Dump
How to Deploy a Container into a Specific Node in a Docker Swarm