How Are Gcc and G++ Bootstrapped

How are GCC and g++ bootstrapped?

The oldest version of GCC was compiled using another C compiler, since there were others when it was written. The very first C compiler ever (ca. 1973, IIRC) was implemented either in PDP-11 assembly, or in the B programming language which preceded it, but in any case the B compiler was written in assembly. Similarly, the first ever C++ compiler (CPre/Cfront, 1979-1983) were probably first implemented in C, then rewritten in C++.

When you compile GCC or any other self-hosting compiler, the full order of building is:

  1. Build new version of GCC with existing C compiler
  2. re-build new version of GCC with the one you just built
  3. (optional) repeat step 2 for verification purposes.

This process is called bootstrapping. It tests the compiler's capability of compiling itself and makes sure that the resulting compiler is built with all the optimizations that it itself implements.

EDIT: Drew Dormann, in the comments, points to Bjarne Stroustrup's account of the earliest implementation of C++. It was implemented in C++ but translated by what Stroustrup calls a "preprocessor" from C++ to C; not a full compiler by his definition, but still C++ was bootstrapped in C.

How does bootstrapping work for gcc?

Say I buy a new cpu with nothing on it. During the first operation I wish to install an OS, which runs C. What runs the C compiler? Is there a miniature C compiler in the BIOS?

I understand what you're asking... what would happen if we had no C compiler and had to start from scratch?

The answer is you'd have to start from assembly or hardware. That is, you can either build a compiler in software or hardware. If there were no compilers in the whole world, these days you could probably do it faster in assembly; however, back in the day I believe compilers were in fact dedicated pieces of hardware. The wikipedia article is somewhat short and doesn't back me up on that, but never mind.

The next question I guess is what happens today? Well, those compiler writers have been busy writing portable C for years, so the compiler should be able to compile itself. It's worth discussing on a very high level what compilation is. Basically, you take a set of statements and produce assembly from them. That's it. Well, it's actually more complicated than that - you can do all sorts of things with lexers and parsers and I only understand a small subset of it, but essentially, you're looking to map C to assembly.

Under normal operation, the compiler produces assembly code matching your platform, but it doesn't have to. It can produce assembly code for any platform you like, provided it knows how to. So the first step in making C work on your platform is to create a target in an existing compiler, start adding instructions and get basic code working.

Once this is done, in theory, you can now cross compile from one platform to another. The next stages are: building a kernel, bootloader and some basic userland utilities for that platform.

Then, you can have a go at compiling the compiler for that platform (once you've got a working userland and everything you need to run the build process). If that succeeds, you've got basic utilities, a working kernel, userland and a compiler system. You're now well on your way.

Note that in the process of porting the compiler, you probably needed to write an assembler and linker for that platform too. To keep the description simple, I omitted them.

If this is of interest, Linux from Scratch is an interesting read. It doesn't tell you how to create a new target from scratch (which is significantly non trivial) - it assumes you're going to build for an existing known target, but it does show you how you cross compile the essentials and begin building up the system.

Python does not actually assemble to assembly. For a start, the running python program keeps track of counts of references to objects, something that a cpu won't do for you. However, the concept of instruction-based code is at the heart of Python too. Have a play with this:

>>> def hello(x, y, z, q):
... print "Hello, world"
... q()
... return x+y+z
...
>>> import dis
dis.dis(hello)

2 0 LOAD_CONST 1 ('Hello, world')
3 PRINT_ITEM
4 PRINT_NEWLINE

3 5 LOAD_FAST 3 (q)
8 CALL_FUNCTION 0
11 POP_TOP

4 12 LOAD_FAST 0 (x)
15 LOAD_FAST 1 (y)
18 BINARY_ADD
19 LOAD_FAST 2 (z)
22 BINARY_ADD
23 RETURN_VALUE

There you can see how Python thinks of the code you entered. This is python bytecode, i.e. the assembly language of python. It effectively has its own "instruction set" if you like for implementing the language. This is the concept of a virtual machine.

Java has exactly the same kind of idea. I took a class function and ran javap -c class to get this:

invalid.site.ningefingers.main:();
Code:
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_0
1: istore_1
2: iconst_0
3: istore_1
4: iload_1
5: aload_0
6: arraylength
7: if_icmpge 57
10: getstatic #2;
13: new #3;
16: dup
17: invokespecial #4;
20: ldc #5;
22: invokevirtual #6;
25: iload_1
26: invokevirtual #7;
//.......
}

I take it you get the idea. These are the assembly languages of the python and java worlds, i.e. how the python interpreter and java compiler think respectively.

Something else that would be worth reading up on is JonesForth. This is both a working forth interpreter and a tutorial and I can't recommend it enough for thinking about "how things get executed" and how you write a simple, lightweight language.

Why does GCC compile itself 3 times?

  • Stage 2. and 3. are a good test for the compiler itself: If it can compile itself (and usually also some libraries like libgcc and libstdc++-v3) then it can chew non-trivial projects.

  • In stage 2. and 3., you can generate the compiler with different options, for example without optimization (-O0) or with optimization turned on (-O2). As the output / side effects of a program should not depend on the optimization level used, either version of the compiler must produce the same binary for the same source file, even though the two compilers are binary very different. This is yet another (run-time test) for the compiler.

If you prefer non-bootstrap for some reason, configure --disable-bootstrap.

Is LLVM/Clang bootstrapped/compiled by GCC?

No. LLVM can be compiled using any of a large number of compilers. When I cared about such things there were dozens of independent compilers. (Remember that only correctness matters — you can bootstrap using even the slowest compiler, that produces the slowest output and has shitty diagnostics.)

This legend has a basis though: On many systems, LLVM asks GCC about how to invoke the system linker, which can be fiddly.

Building GCC: What are the advantages and disadvantages of bootstrap?

The GCC configuration documentation says:

--disable-bootstrap

For a native build, the default configuration is to perform a 3-stage bootstrap of the compiler when ‘make’ is invoked, testing that GCC can compile itself correctly. If you want to disable this process, you can configure with --disable-bootstrap.

--enable-bootstrap

In special cases, you may want to perform a 3-stage build even if the target and host triplets are different. This is possible when the host can run code compiled for the target (e.g. host is i686-linux, target is i486-linux). Starting from GCC 4.2, to do this you have to configure explicitly with --enable-bootstrap.

On the whole, you're best off leaving this default well alone.

If you use --disable-bootstrap, it does a 1-stage build (per Explorer09 in a comment, but I've still not tried it), which could work when a 3-stage build won't.

If you need to know more, read the documentation.

Building boost with different gcc version

I cross built Boost for an ARM toolchain using something like this:

echo "using gcc : arm-unknown-linux-gnueabi : /usr/local/arm/bin/g++ ; " >> tools/build/v2/user-config.jam

You should be able to do something like this:

boost version 1.59 and above:

echo "using gcc : 4.4 : /usr/bin/g++-4.4 ; " >> tools/build/src/user-config.jam

boost version 1.58 and below:

echo "using gcc : 4.4 : /usr/bin/g++-4.4 ; " >> tools/build/v2/user-config.jam

and then build with

bjam --toolset=gcc-4.4


Related Topics



Leave a reply



Submit