C++: Safe to Use Longjmp and Setjmp

Practical usage of setjmp and longjmp in C

Error handling
Suppose there is an error deep down in a function nested in many other functions and error handling makes sense only in the top level function.

It would be very tedious and awkward if all the functions in between had to return normally and evaluate return values or a global error variable to determine that further processing doesn't make sense or even would be bad.

That's a situation where setjmp/longjmp makes sense.
Such situations are similar to those where exceptions make sense in other languages (C++, Java).

Coroutines
Besides error handling, I can think of another situation where you need setjmp/longjmp in C:

It is the case when you need to implement coroutines.

Here is a little demo example.
I hope it satisfies the request from Sivaprasad Palas for some example code and answers TheBlastOne's question of how setjmp/longjmp supports the implementation of coroutines (as far as I can see, it does not rely on any non-standard or new behaviour).

EDIT:
It could be that it actually is undefined behaviour to do a longjmp down the callstack (see comment of MikeMB; though I have not yet had opportunity to verify that).

#include <stdio.h>
#include <setjmp.h>

jmp_buf bufferA, bufferB;

void routineB(); // forward declaration

void routineA()
{
    int r;

    printf("(A1)\n");

    r = setjmp(bufferA);
    if (r == 0) routineB();

    printf("(A2) r=%d\n", r);

    r = setjmp(bufferA);
    if (r == 0) longjmp(bufferB, 20001);

    printf("(A3) r=%d\n", r);

    r = setjmp(bufferA);
    if (r == 0) longjmp(bufferB, 20002);

    printf("(A4) r=%d\n", r);
}

void routineB()
{
    int r;

    printf("(B1)\n");

    r = setjmp(bufferB);
    if (r == 0) longjmp(bufferA, 10001);

    printf("(B2) r=%d\n", r);

    r = setjmp(bufferB);
    if (r == 0) longjmp(bufferA, 10002);

    printf("(B3) r=%d\n", r);

    r = setjmp(bufferB);
    if (r == 0) longjmp(bufferA, 10003);
}

int main(int argc, char **argv)
{
    routineA();
    return 0;
}

The following figure shows the flow of execution:

[Figure: flow of execution]

Warning note
When using setjmp/longjmp, be aware that they affect the validity of local variables, a point that is often overlooked.

Cf. my question about this topic.

C++: Safe to use longjmp and setjmp?

setjmp()/longjmp() completely subvert stack unwinding and therefore exception handling as well as RAII (destructors in general).

From 18.7/4 "Other runtime support" in the standard:

If any automatic objects would be destroyed by a thrown exception transferring
control to another (destination) point in the program, then a call to longjmp(jbuf, val) at the throw point that transfers control to the same (destination) point has undefined behavior.

So the bottom line is that setjmp()/longjmp() do not play well in C++.

Safe usage of `setjmp` and `longjmp`

setjmp and longjmp can be seen as a poor man's exception mechanism. BTW, OCaml exceptions are as quick as setjmp but have much clearer semantics.

Of course a longjmp is much faster than repeatedly returning error codes through intermediate functions, since it pops a possibly large portion of the call stack in one step.

(I am implicitly focusing on Linux)

They are valid and useful as long as no resources are allocated between them, including:

  • heap memory (malloc)
  • fopen-ing FILE* handles
  • opening operating system file descriptors (e.g. for sockets)
  • other operating system resources, such as timers or signal handlers
  • getting some external resource managed by some server, e.g. X11 windows (hence using any widget toolkit like GTK), or database handle or connection...
  • etc...

The main issue is that not leaking resources is a global, whole-program property (or at least global to all functions possibly called between setjmp and longjmp), so it hinders modular software development: any colleague who has to improve code in any function between setjmp and longjmp must be aware of that limitation and follow that discipline.

Hence, if you use setjmp, document that very clearly.

BTW, if you only care about malloc, systematically using Boehm's conservative garbage collector would help a lot; you'll use GC_malloc instead of malloc everywhere and you won't care about free, and practically that is enough; then you can use setjmp without fear (since you could call GC_malloc between setjmp and longjmp).

(Notice that the concepts and the terminology around garbage collection are quite related to exception handling and setjmp, but many people don't know them well enough. Reading the Garbage Collection Handbook should be worthwhile.)

Read also about RAII and learn about C++11 exceptions (and their relation to destructors). Learn a bit about continuations and CPS.

Read setjmp(3), longjmp(3) (and also about sigsetjmp, siglongjmp, and setcontext(3)) and be aware that the compiler has to know about setjmp.

Safe usage of longjmp/setjmp with volatile

u32 *volatile var makes the pointer volatile, while volatile u32 *var tells the compiler that the data at that address is volatile. So since the pointer is not volatile in the latter example, I wouldn't be surprised if your compiler optimized away the default case completely to something like result = NULL;.

It probably doesn't expect the setjmp wizardry, and these are notorious for being even "more spaghetti than goto".

Is it good programming practice to use setjmp and longjmp in C?

You're right in your assertion that jmp-style propagation is essentially the same thing as goto. Read Dijkstra's (famous and controversial) paper about gotos which (I think) provides sensible reasoning for why gotos should rarely be used. Unless you know exactly why you're doing what you're doing (or you're working in very specific fields -- such as embedded programming), you should not touch either goto or longjmp.

What happens when a thread calls longjmp() in C

The POSIX specification for relevant functions can be found at:

  • longjmp()
  • setjmp()
  • siglongjmp()
  • sigsetjmp()

Note that the specification for longjmp() lists some of the restrictions:

The longjmp() function shall restore the environment saved by the most recent invocation of setjmp() in the same process, with the corresponding jmp_buf argument. If the most recent invocation of setjmp() with the corresponding jmp_buf occurred in another thread, or if there is no such invocation, or if the function containing the invocation of setjmp() has terminated execution in the interim, or if the invocation of setjmp() was within the scope of an identifier with variably modified type and execution has left that scope in the interim, the behavior is undefined. [CX] ⌦ It is unspecified whether longjmp() restores the signal mask, leaves the signal mask unchanged, or restores it to its value at the time setjmp() was called. ⌫

For your scenarios:

  1. Should be OK.
  2. Undefined behaviour. If instead the main thread (or the thread that called the setjmp()) does the longjmp(), it should be OK, but it won't kill other threads. You're likely to run afoul of the general restrictions on longjmp() even so.

Overall, be sensible and very conservative. They're fragile functions. Don't use them unless really necessary, and worry about resource management in general.

Why does setjmp/longjmp

TL;DR - you can't jump back into a function you jumped out of.

7.13.2.1 The longjmp function
...

2     The longjmp function restores the environment saved by the most recent invocation of
the setjmp macro in the same invocation of the program with the corresponding
jmp_buf argument. If there has been no such invocation, or if the invocation was from
another thread of execution, or if the function containing the invocation of the setjmp
macro has terminated execution 248) in the interim, or if the invocation of the setjmp
macro was within the scope of an identifier with variably modified type and execution has
left that scope in the interim, the behavior is undefined.

248) For example, by executing a return statement or because another longjmp call has caused a
transfer to a setjmp invocation in a function earlier in the set of nested calls.

C 2011 Online Draft

When you execute longjmp(jump_body, 1); in func, you invalidate jump_ret.

longjmp isn't bidirectional - it unwinds the stack as though none of the function calls between the setjmp and the longjmp had ever happened.

Is there a safe way to use setjmp() and longjmp() in C++?

If you have some really weird requirement that doesn't allow you to control the flow of the program normally, with conditionals/loops/breaks, I would prefer to use an exception over jmp.

There are scenarios where using an exception to control flow is acceptable. I think one of Boost.Graph's search functions throws an exception to quickly return to the caller from deep recursion.

Is longjmp supposed to restore the stack?

That's the expected behavior:

Upon return to the scope of setjmp, all accessible objects,
floating-point status flags, and other components of the abstract
machine have the same values as they had when std::longjmp was
executed, except for the non-volatile local variables in setjmp's
scope, whose values are indeterminate if they have been changed since
the setjmp invocation.

The value of a when longjmp is executed is 15, so that is a value one could expect to see (though in general it is indeterminate). The jmp_buf stores only the point of execution, not the state of every variable in the program.

Using setjmp() and longjmp() to prevent segmentation fault in a program

Long story short: longjmp is (obviously) not an async-signal-safe function, and neither is printf. Therefore calling those functions from a signal handler will cause undefined behavior. Refer to man 7 signal-safety for more information and a list of async-signal-safe functions.

What's most probably happening is that the longjmp(buf, 2) is causing the program to "escape" the signal handler abnormally, and this causes another segmentation fault after executing the second switch case. Since another segmentation fault occurs, the signal handler is called again, and you do another longjmp(buf, 2), getting back to where you were, causing another segfault, and so on and so forth... indefinitely.


EDIT: as suggested by Andrew Henle in the comments below, there also are the two POSIX functions sigsetjmp() and siglongjmp(). I however prefer the approach described below since it looks cleaner to me and safely returns from the signal handler leaving the dirty work to the kernel.

If you want your code to run as expected, you can have your signal handler receive information about the context at the moment of the segfault:

static void signal_handler(int sig, siginfo_t *info, void *ucontext) {
    /* Assuming your architecture is Intel x86_64. */
    ucontext_t *uc = (ucontext_t *)ucontext;
    greg_t *rip = &uc->uc_mcontext.gregs[REG_RIP];

    /* Assign a new value to *rip somehow, which will be where the
       execution will continue after the signal handler returns. */
}

int main(void) {
    struct sigaction sa;
    int err;

    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = signal_handler;

    err = sigemptyset(&sa.sa_mask);
    if (err)
        return 1;

    err = sigaddset(&sa.sa_mask, SIGSEGV);
    if (err)
        return 1;

    err = sigaction(SIGSEGV, &sa, NULL);
    if (err)
        return 1;

    /* ... */

    return 0;
}

This will allow you to resume execution basically anywhere you want, provided that you actually know exactly where to resume. To set rip to the right value, though, you will probably have to use a global label defined with inline asm or some other dirty trick.

Something like this should work (tested on my machine):

/* In main, where you want to resume after SIGSEGV: */
asm volatile ("checkpoint: .global checkpoint" : );

/* In your signal handler: */
asm volatile (
    "movabs $checkpoint, %0"
    : "=r" (*rip)
);

If you are wondering why this isn't easier, it's because it shouldn't be done in the first place: it's basically an abomination that serves no purpose other than maybe having fun discovering how stuff can be broken in the most absurd ways.

You will need at least the following headers and feature test macros for the above to work:

#define _GNU_SOURCE
#define __USE_GNU
#include <signal.h>
#include <ucontext.h>

Note that this is (of course) both architecture and platform dependent.


