How Are Exceptions Implemented Under the Hood

Exceptions are just a specific example of a more general case of advanced non-local flow control constructs. Other examples are:

  • conditions (a generalization of exceptions, originally from some old Lisp object system, now implemented in e.g. Common Lisp and Ioke),
  • continuations (a more structured form of GOTO, popular in high-level, higher-order languages),
  • coroutines (a generalization of subroutines, popular especially in Lua),
  • generators à la Python (essentially a restricted form of coroutines),
  • fibers (cooperative light-weight threads) and of course the already mentioned
  • GOTO.

(I'm sure there are many others I missed.)

An interesting property of these constructs is that they are all roughly equivalent in expressive power: if you have one, you can pretty easily build all the others.

So, how you best implement exceptions depends on what other constructs you have available:

  • Every CPU has GOTO, therefore you can always fall back to that, if you must.
  • C has setjmp/longjmp which are basically MacGyver continuations (built out of duct-tape and toothpicks, not quite the real thing, but will at least get you out of the immediate trouble if you don't have something better available).
  • The JVM and CLI have exceptions of their own, which means that if the exception semantics of your language match Java's/C#'s, you are home free (but if not, then you are screwed).
  • The Parrot VM has both exceptions and continuations.
  • Windows has its own framework for exception handling, which language implementors can use to build their own exceptions on top.

A very interesting case, both of using exceptions and of implementing them, is Microsoft Live Labs' Volta project (now defunct). The goal of Volta was to provide architectural refactoring for Web applications at the push of a button. So, you could turn your one-tier web application into a two- or three-tier application just by putting some [Browser] or [DB] attributes on your .NET code, and the code would then automagically run on the client or in the DB. In order to do that, the .NET code obviously had to be translated to JavaScript source code.

Now, you could just write an entire VM in JavaScript and run the bytecode unmodified. (Basically, port the CLR from C++ to JavaScript.) There are actually projects that do this (e.g. the HotRuby VM), but this is both inefficient and not very interoperable with other JavaScript code.

So, instead, they wrote a compiler which compiles CIL bytecode to JavaScript source code. However, JavaScript lacks certain features that .NET has (generators, threads, and the two exception models aren't 100% compatible either), and more importantly it lacks certain features that compiler writers love (either GOTO or continuations) and that could be used to implement the above-mentioned missing features.

However, JavaScript does have exceptions. So, they used JavaScript Exceptions to implement Volta Continuations and then they used Volta Continuations to implement .NET Exceptions, .NET Generators and even .NET Managed Threads(!!!)

So, to answer your original question:

How are exceptions implemented under the hood?

With Exceptions, ironically! At least in this very specific case, anyway.

Another great example is some of the exception proposals on the Go mailing list, which implement exceptions using goroutines (something like a mixture of concurrent coroutines and CSP processes). Yet another example is Haskell, which uses monads, lazy evaluation, tail call optimization and higher-order functions to implement exceptions. Some modern CPUs also support basic building blocks for exceptions (for example the Vega-3 CPUs that were specifically designed for the Azul Systems Java Compute Accelerators).

Catching and Throwing an Exception: What happens under the hood?

Regarding the inner workings of the exception mechanism: There is plenty of documentation on this.
I'm particularly a fan of this article:
http://www.javaworld.com/article/2076868/learn-java/how-the-java-virtual-machine-handles-exceptions.html

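Ultra-short summary: When an exception is thrown, the JVM searches the current method's exception table for an entry that covers the throwing instruction and matches the exception's type, and continues execution at that entry's handler. If no entry matches, it pops the frame and repeats the search in the caller.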

For the second part of your question:

what is the point of catching an exception in a try-catch block, then throwing it?

I see some reasons for catching an exception and throwing another one:

  • You might want to catch an unchecked exception (because you know "something bad might happen") and throw a checked one, so the caller has to handle it.

  • You want to use a custom exception, maybe with additional information or logic.

  • You're implementing an error facade, e.g. throwing exceptions internally and catching them at the end in the facade.
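The second reason translates to other languages as well. As a minimal C++ sketch (C++ has no checked exceptions, so only the "custom exception with additional information" case applies), the hypothetical `ConfigError` class and `parse_port` function below catch a generic standard-library exception and rethrow one that carries domain context:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical domain-level exception that carries extra context (the key).
class ConfigError : public std::runtime_error {
public:
    ConfigError(const std::string& key, const std::string& cause)
        : std::runtime_error("bad config value for '" + key + "': " + cause),
          key_(key) {}
    const std::string& key() const { return key_; }
private:
    std::string key_;
};

// Parses a port number; low-level parsing failures are translated
// into a ConfigError that the caller can act on.
int parse_port(const std::string& text) {
    try {
        int port = std::stoi(text);  // throws std::invalid_argument or std::out_of_range
        if (port < 1 || port > 65535)
            throw std::out_of_range("port outside 1..65535");
        return port;
    } catch (const std::logic_error& e) {
        // Both exception types above derive from std::logic_error.
        throw ConfigError("port", e.what());
    }
}
```

A caller can then catch `ConfigError` alone and ask it which key was at fault, without knowing which parsing routine was used internally.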

How is the C++ exception handling runtime implemented?

Implementations may differ, but there are some basic ideas that follow from requirements.

The exception object itself is an object created in one function and destroyed in a caller thereof. Hence, it's typically not feasible to create the object on the stack. On the other hand, many exception objects are not very big. Ergo, one can create e.g. a 32-byte buffer and overflow to the heap if a bigger exception object is actually needed.
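The buffer-plus-heap idea can be sketched roughly as follows. The `ExceptionStorage` class and its member names are hypothetical, the 32-byte size is just the figure from the paragraph above, and a real runtime would also need to be thread-safe and may order the two choices differently:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical storage manager for in-flight exception objects:
// small objects use a fixed buffer, anything else overflows to the heap.
class ExceptionStorage {
public:
    static constexpr std::size_t kInlineSize = 32;

    // Returns memory for an exception object of `size` bytes.
    void* allocate(std::size_t size) {
        if (size <= kInlineSize && !inline_in_use_) {
            inline_in_use_ = true;
            return buffer_;
        }
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }

    // Releases memory previously obtained from allocate().
    void release(void* p) {
        if (p == buffer_) inline_in_use_ = false;
        else std::free(p);
    }

    // True if `p` points at the fixed buffer rather than the heap.
    bool uses_inline_buffer(const void* p) const { return p == buffer_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[kInlineSize];
    bool inline_in_use_ = false;
};
```

The point of the design is that the exception object outlives the throwing function's stack frame, so its storage has to live somewhere that survives unwinding.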

As for the actual transfer of control, two strategies exist. One is to record enough information in the stack itself to unwind the stack. This is basically a list of destructors to run and exception handlers that might catch the exception. When an exception happens, walk back up the stack, executing those destructors, until you find a matching catch.

The second strategy moves this information into tables outside the stack. Now, when an exception occurs, the call stack is used to find out which scopes are entered but not exited. Those are then looked up in the static tables to determine where the thrown exception will be handled, and which destructors run in between. This means there is less exception overhead on the stack; return addresses are needed anyway. The tables are extra data, but the compiler can put them in a demand-loaded segment of the program.
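To make the table-based strategy concrete, here is a minimal sketch of the lookup step. The `CallSite` layout and the `find_landing_pad` function are assumptions made for illustration, not the format of any particular compiler: each row maps a range of code offsets to the offset of the handler or cleanup code for that range.

```cpp
#include <cstdint>
#include <vector>

// One row of a hypothetical static call-site table.
struct CallSite {
    std::uint32_t start;        // offset of the region from the function start
    std::uint32_t length;       // size of the region in bytes
    std::uint32_t landing_pad;  // offset of handler/cleanup code, 0 = none
    std::uint32_t action;       // 0 = cleanup only, otherwise index of a catch clause
};

// Returns the landing pad offset for a code offset inside the function,
// or -1 if this frame has nothing to run and unwinding should continue
// in the caller.
std::int64_t find_landing_pad(const std::vector<CallSite>& table,
                              std::uint32_t pc_offset) {
    for (const CallSite& cs : table) {
        if (pc_offset >= cs.start && pc_offset < cs.start + cs.length) {
            if (cs.landing_pad == 0) return -1;  // covered, but no handler here
            return cs.landing_pad;
        }
    }
    return -1;  // offset not covered by any row
}
```

Nothing in this lookup runs on the normal path: the table is only consulted once an exception is actually in flight, using the return addresses that are on the stack anyway.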

How do exceptions work (behind the scenes) in c++

Instead of guessing, I decided to actually look at the generated code with a small piece of C++ code and a somewhat old Linux install.

class MyException
{
public:
    MyException() { }
    ~MyException() { }
};

void my_throwing_function(bool throwit)
{
    if (throwit)
        throw MyException();
}

// Declared but not defined here, so the compiler cannot inline them
// or prove that they never throw.
void another_function();
void log(unsigned count);

void my_catching_function()
{
    log(0);
    try
    {
        log(1);
        another_function();
        log(2);
    }
    catch (const MyException& e)
    {
        log(3);
    }
    log(4);
}

I compiled it with g++ -m32 -W -Wall -O3 -save-temps -c, and looked at the generated assembly file.

    .file   "foo.cpp"
.section .text._ZN11MyExceptionD1Ev,"axG",@progbits,_ZN11MyExceptionD1Ev,comdat
.align 2
.p2align 4,,15
.weak _ZN11MyExceptionD1Ev
.type _ZN11MyExceptionD1Ev, @function
_ZN11MyExceptionD1Ev:
.LFB7:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
popl %ebp
ret
.LFE7:
.size _ZN11MyExceptionD1Ev, .-_ZN11MyExceptionD1Ev

_ZN11MyExceptionD1Ev is MyException::~MyException(), so the compiler decided it needed a non-inline copy of the destructor.

.globl __gxx_personality_v0
.globl _Unwind_Resume
.text
.align 2
.p2align 4,,15
.globl _Z20my_catching_functionv
.type _Z20my_catching_functionv, @function
_Z20my_catching_functionv:
.LFB9:
pushl %ebp
.LCFI2:
movl %esp, %ebp
.LCFI3:
pushl %ebx
.LCFI4:
subl $20, %esp
.LCFI5:
movl $0, (%esp)
.LEHB0:
call _Z3logj
.LEHE0:
movl $1, (%esp)
.LEHB1:
call _Z3logj
call _Z16another_functionv
movl $2, (%esp)
call _Z3logj
.LEHE1:
.L5:
movl $4, (%esp)
.LEHB2:
call _Z3logj
addl $20, %esp
popl %ebx
popl %ebp
ret
.L12:
subl $1, %edx
movl %eax, %ebx
je .L16
.L14:
movl %ebx, (%esp)
call _Unwind_Resume
.LEHE2:
.L16:
.L6:
movl %eax, (%esp)
call __cxa_begin_catch
movl $3, (%esp)
.LEHB3:
call _Z3logj
.LEHE3:
call __cxa_end_catch
.p2align 4,,3
jmp .L5
.L11:
.L8:
movl %eax, %ebx
.p2align 4,,6
call __cxa_end_catch
.p2align 4,,6
jmp .L14
.LFE9:
.size _Z20my_catching_functionv, .-_Z20my_catching_functionv
.section .gcc_except_table,"a",@progbits
.align 4
.LLSDA9:
.byte 0xff
.byte 0x0
.uleb128 .LLSDATT9-.LLSDATTD9
.LLSDATTD9:
.byte 0x1
.uleb128 .LLSDACSE9-.LLSDACSB9
.LLSDACSB9:
.uleb128 .LEHB0-.LFB9
.uleb128 .LEHE0-.LEHB0
.uleb128 0x0
.uleb128 0x0
.uleb128 .LEHB1-.LFB9
.uleb128 .LEHE1-.LEHB1
.uleb128 .L12-.LFB9
.uleb128 0x1
.uleb128 .LEHB2-.LFB9
.uleb128 .LEHE2-.LEHB2
.uleb128 0x0
.uleb128 0x0
.uleb128 .LEHB3-.LFB9
.uleb128 .LEHE3-.LEHB3
.uleb128 .L11-.LFB9
.uleb128 0x0
.LLSDACSE9:
.byte 0x1
.byte 0x0
.align 4
.long _ZTI11MyException
.LLSDATT9:

Surprise! There are no extra instructions at all on the normal code path. The compiler instead generated extra out-of-line fixup code blocks, referenced via a table at the end of the function (which is actually put in a separate section of the executable). All the work is done behind the scenes by the standard library, based on these tables (_ZTI11MyException is the typeinfo for MyException).

OK, that was not actually a surprise for me, I already knew how this compiler did it. Continuing with the assembly output:

    .text
.align 2
.p2align 4,,15
.globl _Z20my_throwing_functionb
.type _Z20my_throwing_functionb, @function
_Z20my_throwing_functionb:
.LFB8:
pushl %ebp
.LCFI6:
movl %esp, %ebp
.LCFI7:
subl $24, %esp
.LCFI8:
cmpb $0, 8(%ebp)
jne .L21
leave
ret
.L21:
movl $1, (%esp)
call __cxa_allocate_exception
movl $_ZN11MyExceptionD1Ev, 8(%esp)
movl $_ZTI11MyException, 4(%esp)
movl %eax, (%esp)
call __cxa_throw
.LFE8:
.size _Z20my_throwing_functionb, .-_Z20my_throwing_functionb

Here we see the code for throwing an exception. While there was no extra overhead simply because an exception might be thrown, there is obviously a lot of overhead in actually throwing and catching an exception. Most of it is hidden within __cxa_throw, which must:

  • Walk the stack with the help of the exception tables until it finds a handler for that exception.
  • Unwind the stack until it gets to that handler.
  • Actually call the handler.
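The three steps above can be modelled with a toy simulation. Everything here is hypothetical (the `Frame` struct, the string-based type matching, the `simulate_throw` function); it only illustrates the order of operations: search first without touching anything, then pop frames and run their cleanups.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one active stack frame.
struct Frame {
    std::string function;              // name, for illustration only
    std::vector<std::string> catches;  // exception types this frame can catch
    std::function<void()> cleanup;     // stands in for pending destructors
};

// Simulates a throw. `stack` is ordered outermost first; the throwing
// frame is the last element. Returns the index of the handling frame,
// or -1 if none exists (where a real C++ runtime calls std::terminate).
int simulate_throw(std::vector<Frame>& stack, const std::string& type) {
    // Step 1: search from the innermost frame outward, changing nothing.
    int handler = -1;
    for (int i = static_cast<int>(stack.size()) - 1; i >= 0 && handler == -1; --i) {
        for (const std::string& c : stack[i].catches) {
            if (c == type) { handler = i; break; }
        }
    }
    if (handler == -1) return -1;  // no handler: the stack is left intact

    // Step 2: pop every frame above the handler, running its cleanup.
    while (static_cast<int>(stack.size()) - 1 > handler) {
        if (stack.back().cleanup) stack.back().cleanup();
        stack.pop_back();
    }

    // Step 3 would transfer control to the handler code in this frame.
    return handler;
}
```

Even in this toy form the cost is visible: the work grows with the number of frames between the throw and the catch, whereas a plain return touches only one frame.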

Compare that with the cost of simply returning a value, and you see why exceptions should be used only for exceptional returns.

To finish, the rest of the assembly file:

    .weak   _ZTI11MyException
.section .rodata._ZTI11MyException,"aG",@progbits,_ZTI11MyException,comdat
.align 4
.type _ZTI11MyException, @object
.size _ZTI11MyException, 8
_ZTI11MyException:
.long _ZTVN10__cxxabiv117__class_type_infoE+8
.long _ZTS11MyException
.weak _ZTS11MyException
.section .rodata._ZTS11MyException,"aG",@progbits,_ZTS11MyException,comdat
.type _ZTS11MyException, @object
.size _ZTS11MyException, 14
_ZTS11MyException:
.string "11MyException"

The typeinfo data.

    .section    .eh_frame,"a",@progbits
.Lframe1:
.long .LECIE1-.LSCIE1
.LSCIE1:
.long 0x0
.byte 0x1
.string "zPL"
.uleb128 0x1
.sleb128 -4
.byte 0x8
.uleb128 0x6
.byte 0x0
.long __gxx_personality_v0
.byte 0x0
.byte 0xc
.uleb128 0x4
.uleb128 0x4
.byte 0x88
.uleb128 0x1
.align 4
.LECIE1:
.LSFDE3:
.long .LEFDE3-.LASFDE3
.LASFDE3:
.long .LASFDE3-.Lframe1
.long .LFB9
.long .LFE9-.LFB9
.uleb128 0x4
.long .LLSDA9
.byte 0x4
.long .LCFI2-.LFB9
.byte 0xe
.uleb128 0x8
.byte 0x85
.uleb128 0x2
.byte 0x4
.long .LCFI3-.LCFI2
.byte 0xd
.uleb128 0x5
.byte 0x4
.long .LCFI5-.LCFI3
.byte 0x83
.uleb128 0x3
.align 4
.LEFDE3:
.LSFDE5:
.long .LEFDE5-.LASFDE5
.LASFDE5:
.long .LASFDE5-.Lframe1
.long .LFB8
.long .LFE8-.LFB8
.uleb128 0x4
.long 0x0
.byte 0x4
.long .LCFI6-.LFB8
.byte 0xe
.uleb128 0x8
.byte 0x85
.uleb128 0x2
.byte 0x4
.long .LCFI7-.LCFI6
.byte 0xd
.uleb128 0x5
.align 4
.LEFDE5:
.ident "GCC: (GNU) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)"
.section .note.GNU-stack,"",@progbits

Even more exception handling tables, and assorted extra information.

So, the conclusion, at least for GCC on Linux: the cost is extra space (for the handlers and tables) whether or not exceptions are thrown, plus the extra cost of parsing the tables and executing the handlers when an exception is thrown. If you use exceptions instead of error codes, and an error is rare, it can be faster, since you do not have the overhead of testing for errors anymore.

In case you want more information, in particular what all the __cxa_ functions do, see the original specification they came from:

  • Itanium C++ ABI

Behavior of c++ exceptions escaping into c program

From what I see about C++ exceptions, in this example which I took from MSDN, GCC seems to include the following assembly in the catch statement:

    call    __cxa_end_catch
jmp .L37
movq %rax, %rbx
call __cxa_end_catch
movq %rbx, %rax
movq %rax, %rdi
call _Unwind_Resume

This makes use of what I can only assume are C++ library calls to functions that deal with exceptions (e.g. _Unwind_Resume). So if the C code links against your library, it will have to provide these symbols/functions, which means that the code is going to be entering the C++ library to deal with the exceptions.

However, I don't yet know what the C++ library requires in order to do its job. I would expect it to be self contained but I'm not certain of it.

Edit: The answer to this question likely lies in the answers to the following three existing questions (and their interpretation):

  1. How is the C++ exception handling runtime implemented?
  2. How are exceptions implemented under the hood?
  3. How do exceptions work (behind the scenes) in c++

Edit 2: From this answer, it seems that __cxa_throw uses tables to keep track of available handlers. I would assume that when the tables are exhausted without a match, which in our case occurs when the unwinder reaches C code, the function would call std::terminate. Hence, the C++ runtime (against which you must have linked) should take care of this for you without you needing to put up a catch-all clause.

Since I'm still uncertain I will write up a test of this theory and update the answer with the results.
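If you would rather not depend on that assumption, a common defensive pattern is to stop every exception at the language boundary and turn it into an error code. The following is a minimal sketch; the names `LibStatus`, `checked_divide` and `lib_divide` are hypothetical and only stand in for a real library API.

```cpp
#include <stdexcept>

// Hypothetical error codes for a C-facing API.
enum LibStatus { LIB_OK = 0, LIB_BAD_ARGUMENT = 1, LIB_FAILURE = 2 };

// Internal C++ implementation that may throw.
static int checked_divide(int a, int b) {
    if (b == 0) throw std::invalid_argument("division by zero");
    return a / b;
}

// C-callable entry point: no exception is allowed to leave this function.
extern "C" int lib_divide(int a, int b, int* result) {
    try {
        if (!result) return LIB_BAD_ARGUMENT;
        *result = checked_divide(a, b);
        return LIB_OK;
    } catch (const std::invalid_argument&) {
        return LIB_BAD_ARGUMENT;
    } catch (...) {
        return LIB_FAILURE;  // catch-all so nothing unwinds into C frames
    }
}
```

With a wrapper like this on each exported function, the C caller only ever sees status codes, whatever the unwinder would have done on reaching C frames.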

Exception handling in pure C

The C language itself has no support for exception handling. However, a means of exception handling for C does exist on a platform- and compiler-specific basis.

On Windows, for example, C programs can use structured exception handling (SEH).

  • http://www.microsoft.com/msj/0197/Exception/Exception.aspx

I've seen arguments in the past that the C function pair setjmp and longjmp is exception handling in C. I consider it closer to the ejection pattern, but it's worth investigating.

  • http://msdn.microsoft.com/en-us/library/aa272905(VS.60).aspx
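For a feel of what the setjmp/longjmp approach looks like, here is a minimal single-level sketch. It is written as C++ but uses only constructs that also exist in C; the names `g_handler`, `checked_sqrt_floor` and `try_sqrt_floor` are hypothetical. Note that in C++ a longjmp must not skip over objects with non-trivial destructors, which is one reason this is not a substitute for real exceptions.

```cpp
#include <csetjmp>

// Hypothetical single-level "try/catch" built on setjmp/longjmp.
static std::jmp_buf g_handler;  // saved context to resume at on error
enum { ERR_NONE = 0, ERR_NEGATIVE = 1 };

// "Throws" by jumping back to the saved context with an error code.
static int checked_sqrt_floor(int x) {
    if (x < 0) std::longjmp(g_handler, ERR_NEGATIVE);
    long long r = 0;
    while ((r + 1) * (r + 1) <= x) ++r;  // largest r with r*r <= x
    return static_cast<int>(r);
}

// Returns floor(sqrt(x)), or -1 if the computation "threw".
int try_sqrt_floor(int x) {
    // setjmp returns 0 when called directly, and the value passed to
    // longjmp when control comes back through the jump.
    if (setjmp(g_handler) == 0) {
        return checked_sqrt_floor(x);  // "try" body
    }
    return -1;                         // "catch" body
}
```

Calling `try_sqrt_floor(-1)` takes the longjmp path and ends up in the "catch" branch, while non-negative inputs return normally without ever jumping.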

Purpose of throws Exception

The link Beri provides explains the technical rules behind declaring exceptions thrown across methods. To answer your question of "why throws Exception":

In a concrete (might put "final" in here, but I won't) method one should almost never need to declare "throws Exception", because a concrete method will know exactly which exceptions it could possibly throw and should list those explicitly.

An abstract method/interface method is different. You have three choices:

  1. Don't declare any thrown exceptions. This means that the only exceptions any implementation may throw are unchecked ones (RuntimeException, Error and their subclasses). This implies no checked exceptions can be thrown and that it should, in almost all cases, be safe to call this method without expecting failure. If it does throw an exception, there's nothing you can do about it.
  2. Throw specific checked exceptions. This can be done, but there are only a rare few cases where an abstract method can correctly predict the exact limited set of checked exceptions that could be thrown. When writing a framework with plugins, this would be a way to specify checked exceptions the framework understands how to handle (e.g. IOException in stream classes, FileNotFoundException). The implication of doing so is that the defined set are the only checked exceptions that could ever occur or would make sense to occur.
  3. Throw Exception. This says that a concrete implementation is allowed to throw any checked exception that makes sense for that implementation, with no restrictions. An implementation might choose to throw fewer (or none), but the caller will be required to handle Exception.

It doesn't add much value. Why not? Because the value of checked exceptions is understanding the specific exceptions that could be thrown so that they can be meaningfully handled by the caller. When you are left with just "Exception", and no indication of what an implementation might throw (or, with multiple implementations, what might vary from one to another), there's no meaningful way to handle that more so than just handling Exception, which is really no more meaningful than just handling RuntimeException.

So the only real value of declaring an abstract method with "throws Exception" is to explicitly say "we require the caller to explicitly handle exceptions that may be thrown by this method, because we can't guarantee if the implementation is likely to throw them or not." So rather than hoping an implementation won't throw an exception, one must assume it does.


