How to Use AIO and epoll Together in a Single Event Loop

How do you use AIO and epoll together in a single event loop?

Try libevent:

http://www.monkey.org/~provos/libevent/

There are patches that add support for both.

Revisiting: how do you use AIO and epoll together?

Note that you CAN use POSIX AIO with epoll: signalfd(2) creates a file descriptor that you can then add to an epoll-based loop to be notified of signals.
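To make that concrete, here is a minimal sketch (not a drop-in solution) of wiring a POSIX AIO completion signal into an epoll loop via signalfd(2). It assumes Linux, a real-time signal (SIGRTMIN) dedicated to AIO, an arbitrary file such as /etc/hostname, and linking with -lrt on older glibc; error handling is omitted.

    /* Sketch: POSIX AIO completion delivered through epoll via signalfd(2). */
    #define _GNU_SOURCE
    #include <aio.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/signalfd.h>
    #include <unistd.h>

    int main(void)
    {
        /* Block SIGRTMIN so it is delivered only through the signalfd. */
        sigset_t mask;
        sigemptyset(&mask);
        sigaddset(&mask, SIGRTMIN);
        sigprocmask(SIG_BLOCK, &mask, NULL);

        int sfd = signalfd(-1, &mask, SFD_NONBLOCK);
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = sfd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev);

        /* Issue an asynchronous read; completion raises SIGRTMIN. */
        static char buf[4096];
        struct aiocb cb;
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = open("/etc/hostname", O_RDONLY);
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof buf;
        cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
        cb.aio_sigevent.sigev_signo = SIGRTMIN;
        cb.aio_sigevent.sigev_value.sival_ptr = &cb;
        aio_read(&cb);

        /* Ordinary epoll loop: the signalfd becomes readable on completion. */
        for (;;) {
            struct epoll_event out;
            if (epoll_wait(epfd, &out, 1, -1) < 1)
                continue;
            if (out.data.fd == sfd) {
                struct signalfd_siginfo si;
                read(sfd, &si, sizeof si);
                struct aiocb *done = (struct aiocb *)(uintptr_t)si.ssi_ptr;
                printf("aio_read finished: %zd bytes\n", aio_return(done));
                return 0;
            }
        }
    }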

Also, the second AIO API (the kernel's native io_setup()/io_submit() interface) is supposed to eventually be what glibc bases its implementation of POSIX AIO on; it's just not quite there yet... (I don't know if anyone is working on it, either.)
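For that kernel-side ("second") API, the usual way to combine it with epoll is through an eventfd. Below is a hedged sketch using libaio (link with -laio); the file name data.bin, the single 4096-byte read, and the O_DIRECT alignment choices are illustrative assumptions only, and error handling is omitted.

    /* Sketch: Linux native AIO + eventfd, multiplexed with epoll. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/epoll.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int main(void)
    {
        io_context_t ctx = 0;
        io_setup(8, &ctx);                       /* kernel AIO context */

        int efd = eventfd(0, EFD_NONBLOCK);      /* completion counter */
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);

        /* Submit one read; its completion bumps the eventfd. */
        int fd = open("data.bin", O_RDONLY | O_DIRECT);
        void *buf;
        posix_memalign(&buf, 512, 4096);         /* O_DIRECT needs alignment */
        struct iocb cb, *cbs[1] = { &cb };
        io_prep_pread(&cb, fd, buf, 4096, 0);
        io_set_eventfd(&cb, efd);                /* tie completion to efd */
        io_submit(ctx, 1, cbs);

        /* The eventfd is just another descriptor in the epoll loop. */
        struct epoll_event out;
        epoll_wait(epfd, &out, 1, -1);
        uint64_t n_done;
        read(efd, &n_done, sizeof n_done);       /* how many ops completed */

        struct io_event done[1];
        io_getevents(ctx, 1, 1, done, NULL);     /* reap the result */
        printf("%llu op(s) done, res=%ld\n",
               (unsigned long long)n_done, (long)done[0].res);
        io_destroy(ctx);
        return 0;
    }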

What's the difference between event-driven and asynchronous? Between epoll and AIO?

Events are one of the paradigms for achieving asynchronous execution, but not all asynchronous systems use events. That is the semantic relationship between the two terms: one is a superset of the other.

epoll and aio use different metaphors:

epoll is built around a blocking operation (epoll_wait()): you block the thread until some event happens, and then you dispatch that event to different procedures/functions/branches in your code.

In AIO, you pass the address of your callback function (completion routine) to the system and the system calls your function when something happens.

The problem with AIO is that your callback code runs on a system-managed thread, and therefore on that thread's stack rather than your own. That brings a few problems of its own, as you can imagine.
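As an illustration of that callback style, here is a small sketch assuming glibc's POSIX AIO with SIGEV_THREAD notification and an arbitrary file; the crude sleep() merely stands in for real synchronization, and error handling is omitted.

    /* Sketch: the AIO completion routine runs on a thread glibc manages,
     * not on your event-loop thread's stack. */
    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static char buf[4096];

    static void on_complete(union sigval sv)
    {
        struct aiocb *cb = sv.sival_ptr;          /* whatever we stashed below */
        printf("read %zd bytes (on a library thread)\n", aio_return(cb));
    }

    int main(void)
    {
        static struct aiocb cb;                   /* zero-initialized */
        cb.aio_fildes = open("/etc/hostname", O_RDONLY);
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof buf;
        cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
        cb.aio_sigevent.sigev_notify_function = on_complete;
        cb.aio_sigevent.sigev_value.sival_ptr = &cb;
        aio_read(&cb);

        sleep(1);                                 /* crude: wait for the callback */
        return 0;
    }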

Python: retrieve several URLs via select.epoll()


How can I use the requests library (or a different urllib) combined with Linux epoll?

Unfortunately you can't, unless such a library has been built with this integration in mind. epoll, like select/poll/kqueue and others, is an I/O multiplexing system call, and the overall program architecture needs to be built around it.

Simply put, a typical program structure boils down to the following:

  • one needs to have a bunch of file descriptors (sockets in non-blocking mode in your case)
  • a system call (man epoll_wait in case of epoll) blocks until a specified event occurs on one or multiple descriptors
  • information about the descriptors available for I/O is returned

After that, it is the outer code's job to handle these descriptors, i.e. figure out how much data has become available, call some callbacks, etc.

If the library uses regular blocking sockets, the only way to parallelize it is to use threads/processes.
Here's a good article on the subject; the examples use C, which is good because it's easier to understand what's actually happening under the hood.
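Since the structure itself is language-independent, here is a condensed C sketch of those three steps. It assumes a non-blocking listening socket listen_fd already exists; error handling is omitted.

    /* Sketch of the canonical epoll loop: register, wait, dispatch. */
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void event_loop(int listen_fd)
    {
        int epfd = epoll_create1(0);

        /* 1. register the descriptors you care about */
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            /* 2. block until something happens on one or more descriptors */
            struct epoll_event ready[64];
            int n = epoll_wait(epfd, ready, 64, -1);

            /* 3. the ready descriptors are returned; dispatch them */
            for (int i = 0; i < n; i++) {
                int fd = ready[i].data.fd;
                if (fd == listen_fd) {
                    int client = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = client };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
                } else {
                    /* outer code's job: read what's available, run callbacks, ... */
                    char buf[4096];
                    ssize_t got = read(fd, buf, sizeof buf);
                    if (got <= 0)
                        close(fd);
                }
            }
        }
    }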

Async frameworks & requests library

Let's check out what's suggested here:

If you are concerned about the use of blocking IO, there are lots of
projects out there that combine Requests with one of Python's
asynchronicity frameworks. Some excellent examples are
requests-threads, grequests, and requests-futures.

requests-threads - uses threads

grequests - integration with gevent (it’s a different story, see below)

requests-futures - in fact also threads/processes

None of them has anything to do with true asynchronicity.

Should I use select.epoll() or one of the many async frameworks that Python has?

Please note that epoll is a Linux-specific beast; it won't work, e.g., on OS X, which has a different mechanism called kqueue. As you appear to be writing a general-purpose job queue, it doesn't seem to be a good fit.

Now back to Python. You've got the following options:

threads/processes/concurrent.futures - unlikely to be what you're aiming at, as your app is a typical C10K server

epoll/kqueue - you'll have to do everything yourself. In the case of fetching HTTP URLs, you'll need to deal not only with http/ssl but also with asynchronous DNS resolution. Also consider asyncore, which provides some basic infrastructure.

twisted/tornado - callback-based frameworks that already do all the low-level stuff for you

gevent - this is something you might like if you're going to reuse existing blocking libraries (urllib, requests, etc.) and use both Python 2.x and Python 3.x. But this solution is a hack by design. For an app of your size it might be OK, but I wouldn't use it for anything bigger that needs to be rock-solid and run in production.

asyncio

This module provides infrastructure for writing single-threaded
concurrent code using coroutines, multiplexing I/O access over sockets
and other resources, running network clients and servers, and other
related primitives

It has everything you might need.
There's also a bunch of libraries for working with popular RDBMSs and HTTP:
https://github.com/aio-libs

But it lacks support for Python 2.x. There are ports of asyncio to Python 2.x, but I'm not sure how stable they are.

Finally

So, if I could sacrifice Python 2.x, I'd personally go with asyncio and its related libraries.

If you really, really need Python 2.x, use one of the approaches above, depending on the stability required and the assumed peak load.

Problem handling file I/O with libevent2

I needed libevent to read many files according to priorities. The problem was with epoll, not with libevent: epoll doesn't support regular Unix files.

To solve it I forced libevent not to use epoll:

    struct event_config *cfg = event_config_new();
    event_config_avoid_method(cfg, "epoll");      /* skip the epoll backend */
    ev_base = event_base_new_with_config(cfg);    /* falls back to the next method */
    event_config_free(cfg);

The next method on the preference list was poll, which fully supports regular files, just as I wanted.
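If you want to confirm which backend libevent actually picked after ruling out epoll, a small sketch along these lines (using libevent 2's event_base_get_method()) should do:

    /* Sketch: report the backend libevent selected for this event_base. */
    #include <event2/event.h>
    #include <stdio.h>

    void report_backend(struct event_base *ev_base)
    {
        printf("libevent is using the %s backend\n",
               event_base_get_method(ev_base));
    }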

Thank you all for the answers.

C - How to use both aio_read() and aio_write()

If you wish to have separate AIO queues for reads and writes, so that a write issued later can execute before a read issued earlier, then you can use dup() to create a duplicate of the socket, and use one to issue reads and the other to issue writes.
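A rough sketch of that dup() approach, assuming a connected socket sock, linking with -lrt on older glibc, and no error handling:

    /* Sketch of the dup() trick: reads queue on the original descriptor,
     * writes on a duplicate, so a later write need not wait behind an
     * earlier read. */
    #include <aio.h>
    #include <string.h>
    #include <unistd.h>

    static char inbuf[4096];
    static char outbuf[] = "hello\n";

    void issue_io(int sock)
    {
        int wsock = dup(sock);              /* duplicate used only for writes */

        static struct aiocb rd, wr;
        memset(&rd, 0, sizeof rd);
        rd.aio_fildes = sock;               /* reads on the original fd */
        rd.aio_buf = inbuf;
        rd.aio_nbytes = sizeof inbuf;
        aio_read(&rd);

        memset(&wr, 0, sizeof wr);
        wr.aio_fildes = wsock;              /* writes on the duplicate */
        wr.aio_buf = outbuf;
        wr.aio_nbytes = sizeof outbuf - 1;
        aio_write(&wr);
    }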

However, I second the recommendations to avoid AIO entirely and simply use an epoll()-driven event loop with non-blocking sockets. This technique has been shown to scale to high numbers of clients - if you are getting high CPU usage, profile it and find out where that's happening, because the chances are that it's not your event loop that's the culprit.
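For reference, the per-descriptor part of such a loop typically drains the socket until EAGAIN. A hedged sketch, where handle_data() is a hypothetical callback of your own and fd is assumed to have O_NONBLOCK set:

    /* Sketch: drain a non-blocking socket once epoll reports it readable. */
    #include <errno.h>
    #include <stddef.h>
    #include <unistd.h>

    extern void handle_data(const char *buf, size_t len);   /* hypothetical */

    void on_readable(int fd)
    {
        char buf[4096];
        for (;;) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0) {
                handle_data(buf, (size_t)n);
            } else if (n == 0) {
                close(fd);                   /* peer closed the connection */
                return;
            } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
                return;                      /* nothing more to read for now */
            } else {
                close(fd);                   /* real error */
                return;
            }
        }
    }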

AIO support on Linux

AIO support has been included in the Linux kernel proper. That's why the first hit on Google only offers patches to the 2.4 Linux kernel; in 2.6 and 3.0 it's already in there.

If you check out the Linux kernel source code, it's at fs/aio.c.

There's some documentation in the GNU libc manual, but be advised that AIO is not possible for all types of Linux file descriptors. Most of the general "how to" documentation is dated around 2006, which makes sense, since that's when AIO in Linux was making the headlines.

Note that the POSIX.1b and Unix98 standards haven't changed, so could you be a bit more specific about the nature of the "out-of-date"ness of the examples?


