Cancelling Boost Asio Deadline Timer Safely


The cancellation is safe.

It's just not robust. You didn't account for the case where the timer wasn't pending. You cancel it once, but the chain will simply start a new async wait as soon as the completion handler is invoked.

What follows are the detailed steps I took to trace the issue.

SUMMARY TL;DR

Cancelling a timer only cancels asynchronous operations that are still in flight.

If you want to shut down an asynchronous call chain, you'll need additional logic for that. An example is given below.

Handler Tracking

Enable it by defining, before any Asio header is included:

#define BOOST_ASIO_ENABLE_HANDLER_TRACKING 1

This produces output that can be visualized with boost/libs/asio/tools/handlerviz.pl:

A successful trace

[Image: handlerviz graph of the successful trace]

As you can see, the async_wait is in-flight when the cancellation happens.

A "bad" trace

(truncated, because it would run forever)

[Image: handlerviz graph of the "bad" trace]

Note how the completion handler sees cc=system:0, not cc=system:125 (operation_aborted). This is a symptom of the fact that the posted cancel did not actually "take". The only logical explanation (not visible in the diagram) is that the timer had already expired before the cancel was invoked.

Let's compare the raw traces¹

[Image: comparison of the raw handler-tracking traces]

¹ with the noisy differences removed

Detecting It

So, we have a lead. Can we detect it?

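// `timer` is assumed to be a global, hence the empty capture list
timer.get_io_service().post([](){
    std::cerr << "tid: " << std::this_thread::get_id() << ", cancelling in post\n";
    // expires_from_now() turns negative once the expiry has passed, i.e. when
    // the completion handler may already be queued with a success code
    if (timer.expires_from_now() >= std::chrono::steady_clock::duration(0)) {
        timer.cancel();
    } else {
        std::cout << "PANIC\n";
        timer.cancel();
    }
});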

Prints:

tid: 140113177143232, i: 0, waiting for thread to join()
tid: 140113177143232, i: 1, waiting for thread to join()
tid: 140113177143232, i: 2, waiting for thread to join()
tid: 140113177143232, i: 3, waiting for thread to join()
tid: 140113177143232, i: 4, waiting for thread to join()
tid: 140113177143232, i: 5, waiting for thread to join()
tid: 140113177143232, i: 6, waiting for thread to join()
tid: 140113177143232, i: 7, waiting for thread to join()
tid: 140113177143232, i: 8, waiting for thread to join()
tid: 140113177143232, i: 9, waiting for thread to join()
tid: 140113177143232, i: 10, waiting for thread to join()
tid: 140113177143232, i: 11, waiting for thread to join()
tid: 140113177143232, i: 12, waiting for thread to join()
tid: 140113177143232, i: 13, waiting for thread to join()
tid: 140113177143232, i: 14, waiting for thread to join()
tid: 140113177143232, i: 15, waiting for thread to join()
tid: 140113177143232, i: 16, waiting for thread to join()
tid: 140113177143232, i: 17, waiting for thread to join()
tid: 140113177143232, i: 18, waiting for thread to join()
tid: 140113177143232, i: 19, waiting for thread to join()
tid: 140113177143232, i: 20, waiting for thread to join()
tid: 140113177143232, i: 21, waiting for thread to join()
tid: 140113177143232, i: 22, waiting for thread to join()
tid: 140113177143232, i: 23, waiting for thread to join()
tid: 140113177143232, i: 24, waiting for thread to join()
tid: 140113177143232, i: 25, waiting for thread to join()
tid: 140113177143232, i: 26, waiting for thread to join()
PANIC

Could we communicate the "super-cancellation" in another, clearer way? We have ... just the timer object to work with, of course:

Signaling Shutdown

The timer object doesn't have a lot of properties to work with. There's no close() or similar, like on a socket, that can be used to put the timer in some kind of invalid state.

However, there's the expiry timepoint, and we can use a special domain value to signal "invalid" for our application:

timer.get_io_service().post([](){
    std::cerr << "tid: " << std::this_thread::get_id() << ", cancelling in post\n";
    // setting the expiry also cancels any pending async_wait:
    timer.expires_at(Timer::clock_type::time_point::min());
});

This "special value" is easy to handle in the completion handler:

void handle_timeout(const boost::system::error_code& ec)
{
    if (!ec) {
        started = true;
        if (timer.expires_at() != Timer::time_point::min()) {
            timer.expires_from_now(std::chrono::milliseconds(10));
            timer.async_wait(&handle_timeout);
        } else {
            std::cerr << "handle_timeout: detected shutdown\n";
        }
    }
    else if (ec != boost::asio::error::operation_aborted) {
        std::cerr << "tid: " << std::this_thread::get_id() << ", handle_timeout error " << ec.message() << "\n";
    }
}

How do I properly cancel a Boost deadline_timer from a destructor (in a multithreaded environment)?

A common pattern is to hold your Timer through a shared_ptr (by using std::enable_shared_from_this).

That way you can keep the timer alive as long as the handler hasn't been executed (by keeping a copy of the shared pointer bound to the handler).

Other solutions could be:

  • having externally allocated timers (e.g. in a container with reference stability, like a std::list) where you delete them manually when they're no longer needed
  • running a dedicated io_service on your own thread, so you can join the thread to await the work on the io_service.

Depending on your use cases/load patterns, one approach will be better than the others.

Samples:

  • using a std::list to manage the lifetime of objects with service objects that participate in async operations (in this case a Session): How to pass a boost asio tcp socket to a thread for sending heartbeat to client or server
  • using shared_from_this instead: Simple server using Boost.Asio throws an exception
  • Wrapping a timer for recurring callbacks and using local variables for lifetime management: C++ boost asynchronous timer to run in parallel with program

I picked answers that have some contrasting approaches (some not using Boost Asio) so you can see the trade-offs and what changes between approaches.

Atomically cancel asio asynchronous timer from another thread

Actually, neither approach is quite safe, simply because deadline_timer is not thread-safe.

IMO, the simplest and safest way is to post the cancellation:

//...
timer.get_io_service().post([&]{ timer.cancel(); });
//...

NOTE: in real code, one has to ensure that the timer outlives the functor (lambda).

UPDATE: as @sehe mentioned, this solution might not work, because the cancelling handler may land in the io_service queue right before print, at a point when the timer is no longer waiting.

cancel a deadline_timer, callback triggered anyway

First, let's show the problem reproduced:

Live On Coliru (code below)

As you can see, I run it as

./a.out | grep -C5 false

This filters the output for lines printed by C1's completion handler while c1_active is actually false (i.e. when the completion handler wasn't expected to run).

The problem, in a nutshell, is a "logical" race condition.

It's a bit of a mind-bender, because there's only a single thread (visible on the surface). But it's actually not too complicated.

What happens is this:

  • when Clock C1 expires, it posts its completion handler onto the io_service's task queue, which means the handler might not run immediately.

  • imagine that C2 expired too, and its completion handler gets scheduled and executes before the one that C1 just queued. Imagine that, by coincidence, this time C2 decides to call stop() on C1.

  • After C2's completion handler returns, C1's completion handler gets invoked.

    OOPS

    It still has ec saying "no error"... Hence the deadline timer for C1 gets rescheduled. Oops.

Background

For a more in-depth background on the guarantees that Asio (doesn't) make(s) for the order in which completion handlers get executed, see

  • When do handlers for cancelled boost::asio handlers get to run?

Solutions?

The simplest solution is to accept that m_enabled could already be false by the time the completion handler runs. Let's just add the check:

void tick(const boost::system::error_code &ec) {
    if (!ec && m_enabled) {
        m_timer.expires_at(m_timer.expires_at() + m_duration);
        m_timer.async_wait(boost::bind(&Clock::tick, this, _1));

        if (m_callback)
            m_callback(++m_count, *this);
    }
}

On my system it doesn't reproduce the problem any more :)

Reproducer

Live On Coliru

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/date_time/posix_time/posix_time_io.hpp>
#include <cstdio>
#include <functional>
#include <iostream>

static boost::posix_time::time_duration elapsed() {
    using namespace boost::posix_time;
    static ptime const t0 = microsec_clock::local_time();
    return (microsec_clock::local_time() - t0);
}

class Clock {
public:
    using callback_t = std::function<void(int, Clock &)>;
    using duration_t = boost::posix_time::time_duration;

public:
    Clock(boost::asio::io_service &io, callback_t callback = nullptr,
          duration_t duration = boost::posix_time::seconds(1), bool enable = true)
        : m_timer(io), m_duration(duration), m_callback(callback), m_enabled(false), m_count(0ul)
    {
        if (enable)
            start();
    }

    void start() {
        if (!m_enabled) {
            m_enabled = true;
            m_timer.expires_from_now(m_duration);
            m_timer.async_wait(boost::bind(&Clock::tick, this, _1)); // std::bind _1 issue ?
        }
    }

    void stop() {
        if (m_enabled) {
            m_enabled = false;
            size_t c_cnt = m_timer.cancel(); // 0 if the handler was already queued
            (void)c_cnt;                     // only used in DEBUG builds
#ifdef DEBUG
            printf("[DEBUG@%p] timer::stop : %zu ops cancelled\n", static_cast<void *>(this), c_cnt);
#endif
        }
    }

    // The bug being reproduced: only operation_aborted is filtered out, so a
    // handler that was already queued with success re-arms after stop().
    void tick(const boost::system::error_code &ec) {
        if (ec != boost::asio::error::operation_aborted) {
            m_timer.expires_at(m_timer.expires_at() + m_duration);
            m_timer.async_wait(boost::bind(&Clock::tick, this, _1));
            if (m_callback)
                m_callback(++m_count, *this);
        }
    }

    void reset_count() { m_count = 0ul; }
    size_t get_count() const { return m_count; }

    void set_duration(duration_t duration) { m_duration = duration; }
    const duration_t &get_duration() const { return m_duration; }

    void set_callback(callback_t callback) { m_callback = callback; }
    const callback_t &get_callback() const { return m_callback; }

private:
    boost::asio::deadline_timer m_timer;
    duration_t m_duration;
    callback_t m_callback;
    bool m_enabled;
    size_t m_count;
};

int main() {
    boost::asio::io_service ios;

    bool c1_active = true;

    Clock c1(ios, [&](int i, Clock& self)
        {
            std::cout << elapsed() << "\t[C1 - fast] tick" << i << " (c1 active? " << std::boolalpha << c1_active << ")\n";
        },
        boost::posix_time::millisec(1)
    );

#if 1
    Clock c2(ios, [&](int i, Clock& self)
        {
            std::cout << elapsed() << "\t[C2 - slow] tick" << i << "\n";
            c1_active = (i % 2 == 0);

            if (c1_active)
                c1.start();
            else
                c1.stop();
        },
        boost::posix_time::millisec(10)
    );
#endif

    ios.run();
}

Can ASIO timer `cancel()` call a spurious success?

Everything you stated is correct. So in your situation you may need a separate variable to indicate that you don't want to continue the loop. I normally use an atomic_bool, and I don't bother posting a cancel routine: I just set the bool and call cancel() from whatever thread I am on.

UPDATE:

The source for my answer is mainly years of experience using ASIO, and understanding the asio codebase well enough to fix problems and extend parts of it when required.

Yes, the documentation says that a shared deadline_timer instance is not thread-safe, but the documentation is not the best (what documentation is...). If you look at the source for how "cancel" works, we can see:

Boost Asio version 1.69: boost\asio\detail\impl\win_iocp_io_context.hpp

template <typename Time_Traits>
std::size_t win_iocp_io_context::cancel_timer(timer_queue<Time_Traits>& queue,
    typename timer_queue<Time_Traits>::per_timer_data& timer,
    std::size_t max_cancelled)
{
  // If the service has been shut down we silently ignore the cancellation.
  if (::InterlockedExchangeAdd(&shutdown_, 0) != 0)
    return 0;

  mutex::scoped_lock lock(dispatch_mutex_);
  op_queue<win_iocp_operation> ops;
  std::size_t n = queue.cancel_timer(timer, ops, max_cancelled);
  post_deferred_completions(ops);
  return n;
}

You can see that the cancel operation is guarded by a mutex lock, so (at least in this Windows IOCP implementation) the "cancel" operation is thread-safe.

Most of the other deadline_timer operations are not thread-safe, in the sense of calling them concurrently from multiple threads.

Also, I think you are correct about restarting timers in quick succession. I don't normally have a use case for stopping and starting timers that way, so I've never needed to do that.

How to avoid firing already destroyed boost::asio::deadline_timer

According to the reference documentation for deadline_timer::cancel:

If the timer has already expired when cancel() is called, then the handlers for asynchronous wait operations will:

  • have already been invoked; or

  • have been queued for invocation in the near future.

These handlers can no longer be cancelled, and therefore are passed an error code that indicates the successful completion of the wait operation.

So calling cancel() cannot cancel a wait whose handler has already been queued for invocation.

And it seems that deadline_timer doesn't declare its own destructor (there is no destructor in the member list of deadline_timer).

In your code snippet, all timers fire at almost the same time. Considering that asio may use some internal threads, it's quite probable that when one completion handler is called, the others have already been queued.

Is boost asio timer expected to block on `cancel`?

It looks like the problem was due to the fact that we are working in a multi-process environment. All processes share the same memory, and all objects are created in this shared memory, including mutexes, threads, etc. To function properly in such a case, the mutexes used in the system are created with the PTHREAD_PROCESS_SHARED attribute. Obviously, asio's mutexes are not created with that attribute, so I guess this is why the mutexes got stuck in unexpected places. Once the io_context and steady_timer were executed in only one process, it started to work as expected.

Boost asio deadline timer completing immediately (C++)

As mentioned in the comments, deadline_timer is destroyed too soon because it's a local variable, thus canceling the I/O operation.

If we add some error handling, we will see the actual error reported:

void print(const boost::system::error_code& e) {
    if (e.failed())
        std::cout << "error: " << e.message() << std::endl;
    else
        std::cout << "connected!" << std::endl;
}

Prints:

error: The I/O operation has been aborted because of either a thread exit or an application request

A possible fix is to make the deadline_timer a member of WebSocketSession:

class WebSocketSession {
public:
    WebSocketSession(boost::asio::io_context& io_context)
        : io_context_(io_context),
          timer_(io_context, boost::posix_time::seconds(10)) {}

    void connect() {
        timer_.async_wait(&print);
    }

private:
    boost::asio::io_context& io_context_;
    boost::asio::deadline_timer timer_;
};

