Does the Thundering Herd Problem Exist on Linux Anymore

For years, most Unix/Linux kernels have serialized wakeups for accept(2). In other words, only one thread is woken up when several are blocked in accept(2) on the same open file descriptor.

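On the other hand, many (if not all) kernels still have the thundering herd problem in the select-accept pattern you describe, where several threads or processes wait in select(2) on the same listening socket and then each call accept(2).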

I wrote a simple script ( https://gist.github.com/kazuho/10436253 ) to check for the problem, and found that it exists on Linux 2.6.32 and Darwin 12.5.0 (OS X 10.8.5).

Calling accept() from multiple threads

As mentioned in the StackOverflow answer you linked, having a single thread call accept() is probably the way to go. You mention concerns about locking, but these days lock-free queue implementations are available in Boost.Lockfree, Intel TBB, and elsewhere. You could use one of those if you like, or you could simply use a condition variable to let your worker threads sleep and wake one of them when a new connection is established.
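A minimal sketch of the condition-variable variant might look like this. The class name `ConnectionQueue` and its methods are hypothetical. It uses a plain mutex rather than a lock-free queue, on the assumption that the accept rate is low enough for the lock not to matter.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical hand-off queue: one acceptor thread pushes connected sockets,
// worker threads sleep in pop() until a descriptor is available.
class ConnectionQueue {
 public:
  // Called by the acceptor thread after accept() returns a new descriptor.
  void push(int fd) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      fds_.push(fd);
    }
    cv_.notify_one();  // wake one sleeping worker, not all of them
  }

  // Called once at shutdown; wakes every worker so it can exit.
  void close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until a descriptor is queued and stores it in *fd.
  // Returns false once the queue has been closed and drained.
  bool pop(int* fd) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return closed_ || !fds_.empty(); });
    if (fds_.empty()) return false;
    *fd = fds_.front();
    fds_.pop();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<int> fds_;
  bool closed_ = false;
};
```

The acceptor thread would loop on accept() and call push(); each worker would loop on pop() and serve the returned descriptor until pop() returns false.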

gRPC (C-based) polling engine is built with 'epollex' despite the Linux kernel version being below 4.5


TL;DR

  • The 3.10.x kernel in RHEL 7/CentOS 7 may have EPOLLEXCLUSIVE.
  • The epollex engine no longer exists in the gRPC source.

Details

RHEL (and therefore CentOS) seems to have EPOLLEXCLUSIVE backported into its 3.10.x kernel, available in releases >= 7.3.

  • https://bugzilla.redhat.com/show_bug.cgi?id=1426133

gRPC has feature-detection code that actually tries an epoll system call with the EPOLLEXCLUSIVE flag set. It does not depend on the version number of the Linux kernel.

https://github.com/grpc/grpc/blob/77e2827f3d70650182474624b4de22e053ac01f6/src/core/lib/iomgr/is_epollexclusive_available.cc#L63-L95

/* This polling engine is only relevant on linux kernels supporting epoll() */
bool grpc_is_epollexclusive_available(void) {

  ...

  struct epoll_event ev;
  /* choose events that should cause an error on
     EPOLLEXCLUSIVE enabled kernels - specifically the combination of
     EPOLLONESHOT and EPOLLEXCLUSIVE */
  ev.events =
      static_cast<uint32_t>(EPOLLET | EPOLLIN | EPOLLEXCLUSIVE | EPOLLONESHOT);
  ev.data.ptr = nullptr;
  if (epoll_ctl(fd, EPOLL_CTL_ADD, evfd, &ev) != 0) {
    if (errno != EINVAL) {
      if (!logged_why_not) {
        gpr_log(
            GPR_ERROR,
            "epoll_ctl with EPOLLEXCLUSIVE | EPOLLONESHOT failed with error: "
            "%d. Not using epollex polling engine.",
            errno);
        logged_why_not = true;
      }
      close(fd);
      close(evfd);
      return false;
    }

...
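For readers who want to run the same kind of check outside gRPC, here is a self-contained sketch of the idea. It is not gRPC's code: the function name `kernel_has_epollexclusive` is made up, logging is omitted, and it rests on the assumption that a kernel without EPOLLEXCLUSIVE ignores the unknown bit and accepts the registration, while a kernel with it rejects the EPOLLONESHOT combination with EINVAL.

```cpp
#include <cerrno>
#include <cstdint>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Fallback for old libc headers; 1u << 28 is the flag's value on Linux.
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

// Hypothetical helper (not part of gRPC): returns true when the running
// kernel understands EPOLLEXCLUSIVE, whatever its version string says.
bool kernel_has_epollexclusive() {
  int epfd = epoll_create1(EPOLL_CLOEXEC);
  if (epfd < 0) return false;
  int evfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
  if (evfd < 0) {
    close(epfd);
    return false;
  }

  struct epoll_event ev {};
  // EPOLLEXCLUSIVE may not be combined with EPOLLONESHOT, so a kernel that
  // knows the flag is expected to fail this call with EINVAL.
  ev.events = static_cast<uint32_t>(EPOLLET | EPOLLIN | EPOLLEXCLUSIVE |
                                    EPOLLONESHOT);
  ev.data.ptr = nullptr;

  bool available = false;
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, evfd, &ev) != 0) {
    // EINVAL means the flag was recognized; any other error is inconclusive
    // and is treated as "not available".
    available = (errno == EINVAL);
  }
  // Success means the bit was ignored, i.e. no EPOLLEXCLUSIVE support.
  close(evfd);
  close(epfd);
  return available;
}
```

On a RHEL/CentOS 7.3+ kernel with the backport, a probe like this would be expected to return true even though the reported version is 3.10.x.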


BTW, the epollex polling engine has since been removed from the gRPC repository; I don't know the reason.

  • https://github.com/grpc/grpc/pull/29160
  • https://github.com/grpc/grpc/issues/30328#issuecomment-1189477119

How do you minimize the number of threads used in a tcp server application?

The modern approach is to let the operating system multiplex many network sockets for you, freeing your application to process only the connections that actually have traffic.

Whenever you open a socket, you associate it with a selector. You use a single thread to poll that selector. Whenever data arrives, the selector indicates which socket is active; you hand off that operation to a worker thread and continue polling.

This way you only need a thread for each concurrent operation. Sockets which are open but idle will not tie up a thread.

  • Using the select() and poll() methods
  • Building Highly Scalable Servers with Java NIO

