What Is a Symbol Table

What is a symbol table?

The term "symbol table" has two common, related meanings here.

First, there's the symbol table in your object files. Usually, a C or C++ compiler compiles a single source file into an object file with a .obj or .o extension. This contains a collection of executable code and data that the linker can process into a working application or shared library. The object file has a data structure called a symbol table in it that maps the different items in the object file to names that the linker can understand. If you call a function from your code, the compiler doesn't put the final address of the routine in the object file. Instead, it puts a placeholder value into the code and adds a note that tells the linker to look up the reference in the various symbol tables from all the object files it's processing and stick the final location there.
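To make the placeholder-and-lookup step concrete, here is a minimal toy model of what a linker does with the symbol tables of several object files. It is only a sketch: the object names, sizes, offsets, and the `link` function are invented for illustration and do not correspond to any real object file format or linker.

```python
# Toy model of link-time symbol resolution. All names and addresses are invented.

# Each "object file" has a symbol table of definitions (name -> offset within
# the file) and a list of relocations (placeholder slot -> name it refers to).
OBJECTS = {
    "main.o": {
        "size": 0x40,
        "defines": {"main": 0x00},
        "relocations": {0x10: "helper"},  # call site inside main
    },
    "util.o": {
        "size": 0x20,
        "defines": {"helper": 0x00, "counter": 0x18},
        "relocations": {},
    },
}


def link(objects, base=0x1000):
    """Lay the objects out back to back and resolve every reference."""
    # Pass 1: assign each object a load address and build a global symbol table.
    global_symbols = {}
    load_address = {}
    next_free = base
    for name, obj in objects.items():
        load_address[name] = next_free
        for symbol, offset in obj["defines"].items():
            if symbol in global_symbols:
                raise ValueError(f"duplicate symbol: {symbol}")
            global_symbols[symbol] = next_free + offset
        next_free += obj["size"]

    # Pass 2: patch each placeholder with the final address of its target.
    patches = {}
    for name, obj in objects.items():
        for slot, symbol in obj["relocations"].items():
            if symbol not in global_symbols:
                raise KeyError(f"undefined reference to {symbol}")
            patches[load_address[name] + slot] = global_symbols[symbol]
    return global_symbols, patches


symbols, patches = link(OBJECTS)
print({name: hex(addr) for name, addr in symbols.items()})
print({hex(slot): hex(target) for slot, target in patches.items()})
```

The call site in main.o only records the name `helper`. The address is filled in after the linker has seen util.o's symbol table, which is why a missing definition shows up as an "undefined reference" at link time rather than at compile time.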

Second, there's also the symbol table in a shared library or DLL. This is produced by the linker and serves to name all the functions and data items that are visible to users of the library. This allows the system to do run-time linking, resolving open references to those names to the location where the library is loaded in memory.
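A quick way to see run-time lookup by name is Python's standard `ctypes` module, which asks the dynamic loader to find a symbol in an already loaded library. This sketch assumes a Unix-like system, where `ctypes.CDLL(None)` gives a handle to the symbols loaded into the current process (including the C library); on Windows you would load a specific DLL instead.

```python
import ctypes

# Assumption: a Unix-like system where CDLL(None) returns a handle to the
# symbols already loaded into this process, including the C library.
process = ctypes.CDLL(None)

# Attribute access performs a by-name lookup in the dynamic symbol table
# and yields a callable bound to the address where the function is loaded.
c_strlen = process.strlen
c_strlen.argtypes = [ctypes.c_char_p]
c_strlen.restype = ctypes.c_size_t

print(c_strlen(b"symbol"))  # length computed by the C library's strlen
```

If the name is not exported by any loaded library, the lookup fails (ctypes raises `AttributeError`), which is the run-time counterpart of an unresolved symbol.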

If you want to learn more, I suggest John Levine's excellent book "Linkers and Loaders".

Why do symbol tables still exist after compilation

Why are these symbols still needed?

They are not needed for correctness of execution, but they are helpful for debugging.

Some programs can record their own stack trace (e.g. TCMalloc performs allocation sampling), and report it on a crash (or other kinds of errors).

While all such stack traces could be symbolized off-line (given a binary that contains symbols), it is often much more convenient for the program to produce a symbolized stack trace itself, so you don't need to find a matching binary.

Consider a case where you have 1000s of different applications running in the cloud at multiple versions, and you get 100 reports of a crash. Are they the same crash, or are there different causes?

If all you have are bunches of hex numbers, it's hard to tell. You'd have to find a matching binary for each instance, symbolize it, and compare to all the other ones (automation could help here).

But if you have the stack traces in symbolized form, it's pretty easy to tell at a glance.
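As a rough illustration of what "symbolizing" means, the sketch below maps raw return addresses to `function+offset` by searching a sorted symbol table. The addresses and most function names are made up, and a real symbol table also records symbol sizes and types, which this toy version ignores.

```python
import bisect

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [
    (0x401000, "main"),
    (0x401080, "ParseRequest"),
    (0x401200, "CheckLicense"),
    (0x401340, "HandleCrash"),
]
STARTS = [start for start, _ in SYMBOLS]


def symbolize(address):
    """Map a raw address to 'function+offset' using the nearest symbol below it."""
    index = bisect.bisect_right(STARTS, address) - 1
    if index < 0:
        return hex(address)  # below the first symbol: leave it as raw hex
    start, name = SYMBOLS[index]
    return f"{name}+{hex(address - start)}"


raw_trace = [0x401355, 0x40121c, 0x4010a4, 0x401010]
for frame in raw_trace:
    print(hex(frame), "->", symbolize(frame))
```

Two crash reports whose traces both read `HandleCrash`, `CheckLicense`, `ParseRequest`, `main` are easy to group together, whereas the same traces as hex addresses from two different builds would not even match textually.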

This does come with a little bit of cost: your binaries are perhaps 1% larger than they have to be.

why doesn't the linker remove them once done?

You have to remember traditional UNIX roots. In the environment in which UNIX was developed everybody had access to the source for all UNIX utilities (including ld), and debuggability was way more important than keeping things secret. So I am not at all surprised that this default (keep symbols) was chosen.

Compare that to the choice made by Microsoft: keep everything in separate .DBG (later .PDB) files.

aren't they potentially a security risk for hackers to read the source?

They are helpful in reverse engineering, yes. They don't contain the source, so unless the source is already open, they don't add that much.

Still, if your program contains something like CheckLicense(), this helps hackers to concentrate their efforts on bypassing your license checks.

Which is why commercial binaries are often shipped fully-stripped.

Update:

Is this what you mean by keeping symbol tables for debugging release builds?

Yes.

is this the normal way of doing it?

It's one way of doing it.

are there other tools used?

Yes: see best practice below.

P.S. The best practice is to build your binaries with full debug info:

gcc -c -g -O2 foo.c bar.c
gcc -g -o app.dbg foo.o bar.o ...

Then keep the full debug binary app.dbg for when you need to debug crashes, but ship a fully-stripped version app to your customers:

strip app.dbg -o app

P.P.S.

gcc -g is used for gdb. gcc without -g still has symbol tables.

Sooner or later you will find out that you must perform debugging on a binary that is built without -g (such as when the binary built without -g crashes, but one built with -g does not).

When that moment comes, your job will be much easier if the binary still has a symbol table.


