Will Adding the -rdynamic Linker Option to GCC/G++ Impact Performance

Will adding the -rdynamic linker option to gcc/g++ impact performance?

Yes, there is an impact, although it is very specific and normally not a cause for concern.

The -rdynamic option instructs the linker to add to the dynamic symbol table symbols that are not normally needed at run time. That means there are more, possibly many more, symbols that the dynamic linker needs to weed through at run time during symbol resolution.

Specifically, since symbol table lookups in GNU-based systems are implemented using a hash table, having more symbols increases the chance of hash collisions. Since all symbols that collide in the hash table sit in a list, the runtime linker needs to traverse the list and string-compare each symbol name. More symbols colliding in the hash means longer lists, so it takes more time to resolve each dynamic symbol.

This situation is slightly worse for C++ than for C, given the multitude of identically prefixed mangled symbol names due to class names.

In practice, this only affects the very first time a symbol is used, so unless your application is very large and contains a lot of symbols, the difference will not be felt.

In the rare case that your application is that large, tricks like prelinking can be used to overcome the overhead.

Impact/Disadvantages of the -rdynamic gcc option

Q: But does it affect performance considerably?

A: I've used it on a larger project without any degradation.

Q: Does it expose my source code?

A: No, it just exposes function names.

Q: Does it affect total runtime performance or startup time?

A: In my experience, no. Most functions are already exported; this mainly adds symbols that would otherwise not appear in the dynamic symbol table.

Q: What are the disadvantages of -rdynamic?

A: -rdynamic can be used with dlopen() to give the executable a shared/global symbol table, which was a must in my project (dynamic_cast<> will work across shared-object boundaries). The downside is function name collisions between shared objects.

Do `-g -rdynamic` gcc flags slow down application execution (grow performance consumption) notably?

Although introducing debug symbols does not affect performance by itself, your application can still end up far behind in terms of possible performance. What I mean is that it would, in general, be a bad idea to use -g and -O3 simultaneously. Therefore, if your application is performance-critical but at the same time genuinely needs a good level of debuggability, it would be reasonable to find a balance between the two. Recent versions of GCC provide the -Og flag:

Optimize debugging experience. -Og enables optimizations that do not
interfere with debugging. It should be the optimization level of
choice for the standard edit-compile-debug cycle, offering a
reasonable level of optimization while maintaining fast compilation
and a good debugging experience.

I think it would be a good idea to test your application with this flag to see whether the performance is indeed better than with bare -g, while the debugging experience stays intact.

Once again, do not neglect reading the official GCC documentation. LTO is a relatively new feature in GCC, and, as a result, some parts of it are still experimental and not meant for production. For example, this is a direct extract:

Link-time optimization does not work well with generation of debugging
information. Combining -flto with -g is currently experimental and
expected to produce wrong results.

Not so long ago I had mixed experiences with LTO. Sometimes it works well; sometimes the project doesn't even compile, not to mention possible subtle runtime issues. Summarizing, I would not recommend using LTO, especially in your situation.

NOTE: Performance gain from LTO usually varies from 0% to 3%, and it heavily depends on the underlying application. Without profiling, you cannot tell whether it is even reasonable to employ LTO for your situation as it might deliver more troubles than benefits.

Flags like -march and -mtune apply optimizations at a very low level, namely instruction selection for the target processor architecture. Thus, I wouldn't expect them to interfere with debugging. Nevertheless, you are welcome to test this yourself with your application.

What exactly does `-rdynamic` do and when exactly is it needed?

Here is a simple example project to illustrate the use of -rdynamic.

bar.c

extern void foo(void);

void bar(void)
{
    foo();
}

main.c

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

void foo(void)
{
    puts("Hello world");
}

int main(void)
{
    void *dlh = dlopen("./libbar.so", RTLD_NOW);
    if (!dlh) {
        fprintf(stderr, "%s\n", dlerror());
        exit(EXIT_FAILURE);
    }
    void (*bar)(void) = dlsym(dlh, "bar");
    if (!bar) {
        fprintf(stderr, "%s\n", dlerror());
        exit(EXIT_FAILURE);
    }
    bar();
    return 0;
}

Makefile

.PHONY: all clean test

LDEXTRAFLAGS ?=

all: prog

bar.o: bar.c
	gcc -c -Wall -fpic -o $@ $<

libbar.so: bar.o
	gcc -shared -o $@ $<

main.o: main.c
	gcc -c -Wall -o $@ $<

prog: main.o | libbar.so
	gcc $(LDEXTRAFLAGS) -o $@ $< -L. -lbar -ldl

clean:
	rm -f *.o *.so prog

test: prog
	./$<

Here, bar.c becomes a shared library libbar.so and main.c becomes
a program that dlopens libbar and calls bar() from that library.
bar() calls foo(), which is external in bar.c and defined in main.c.

So, without -rdynamic:

$ make test
gcc -c -Wall -o main.o main.c
gcc -c -Wall -fpic -o bar.o bar.c
gcc -shared -o libbar.so bar.o
gcc -o prog main.o -L. -lbar -ldl
./prog
./libbar.so: undefined symbol: foo
Makefile:23: recipe for target 'test' failed

And with -rdynamic:

$ make clean
rm -f *.o *.so prog
$ make test LDEXTRAFLAGS=-rdynamic
gcc -c -Wall -o main.o main.c
gcc -c -Wall -fpic -o bar.o bar.c
gcc -shared -o libbar.so bar.o
gcc -rdynamic -o prog main.o -L. -lbar -ldl
./prog
Hello world

What does gcc linking option / LOCAL_CFLAGS -rdynamic do

Adding -rdynamic to LOCAL_CFLAGS will do nothing, as -rdynamic is a linker flag. You need to add it to LOCAL_LDFLAGS.

For a more thorough explanation of -rdynamic, see https://stackoverflow.com/a/12636790/632035 (I know the question isn't the same, but the answer explains the flag well).

Performance improvements moving from g++/gcc 3.2.3 to 4.2.4

In my experience, 3.4 is where the performance basically peaked; 4.2 is actually slower than 3.4 on my project, with 4.3 being the first to roughly equal 3.4's performance. 4.4 is slightly faster than 3.4.

There are a specific few cases I've found where older versions of gcc generated unbelievably bad code: there was a particular function that went from 128 to 21 clocks from 3.4 to 4.3, but that was obviously a special case (it was a short loop where the addition of just a few unnecessary instructions massively hurt performance).

I personally use 3.4 just because it compiles so much faster, making testing much quicker. I also try to avoid the latest versions because they seem to have a nasty habit of miscompiling code; -march=core2 on recent gcc versions causes segfaults in my program, for example, because it emits autovectorized code that tries to perform aligned accesses on unaligned addresses.

Overall though the differences are rarely large; 3-5% is the absolute most I've seen in terms of performance change.

Now, note this is C; things may be different in C++.

gcc -g vs not -g and strip vs not strip, performance and memory usage?

The ELF loader loads segments, not sections; the mapping from sections to segments is determined by the linker script used for building the executable.

The default linker script does not map debug sections to any segment, so this is omitted.

Symbol information comes in two flavours: static symbols are processed out-of-band and never stored as section data; dynamic symbol tables are generated by the linker and added to a special segment that is loaded along with the executable, as it needs to be accessible to the dynamic linker. The strip command only removes the static symbols, which are never referenced in a segment anyway.
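A quick way to see this is to compare the section list before and after strip (a minimal sketch, assuming GNU binutils; file names are made up):

```shell
# Sketch (toy file names): strip removes the static symbol table (.symtab)
# but leaves the dynamic one (.dynsym), which the dynamic linker needs.
cat > s.c <<'EOF'
int main(void) { return 0; }
EOF
gcc -o s s.c
readelf -S s | grep -E '\.(symtab|dynsym)'   # both sections present
strip s
readelf -S s | grep -E '\.(symtab|dynsym)'   # only .dynsym remains
```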

So, you can use full debug information through the entire process, and this will not affect the size of the executable image in RAM, as it is not loaded. This also means that the information is not included in core dumps, so this does not give you any benefit here either.

The objcopy utility has a special option to copy only the debug information, so you can generate a second ELF file containing this information and use stripped binaries; when analyzing the core dump, you can then load both files into the debugger:

objcopy --only-keep-debug myprogram myprogram.debug
strip myprogram

GCC 4.8: Does -Og imply -g?

Short answer: No, you must still add -g manually.

Long answer:

I have struggled to find a hard answer straight from the source, so I decided to test it out myself using the methods described here: How to check if program was compiled with debug symbols?

I built an executable with the -O3 flag and without -g. Using objdump --syms <file> | grep debug yielded nothing, as expected.

I then built an executable with -g and without any optimization flags. The same objdump command yielded six results such as this:

0000000000000000 l d .debug_info 0000000000000000 .debug_info

I finally built an executable with the -Og flag and without -g. The objdump command yielded nothing. That implies that debug symbols are not present in this case.

While I can't find any explicit documentation from GCC itself, the Gentoo Wiki (as mentioned before by Marco Scannadinari) confirms my assertion that -Og does not imply -g.

Is there any difference between -Xlinker -export-dynamic and -rdynamic?


This is a pretty straightforward question. Is there any difference between -Xlinker -export-dynamic and -rdynamic?

It depends.

If you are using GNU-ld (or gold) as your linker, then -Xlinker --export-dynamic (note: you have a missing dash in your question) is exactly equivalent to -rdynamic.

But on e.g. Solaris, -rdynamic will do the right thing and pass nothing to the linker (Sun ld apparently exports all symbols by default), while the -Xlinker ... variant will result in a link error.


