Detect GCC Compile-Time Flags of a Binary

A quick look at the GCC documentation doesn't turn anything up.

The Boost guys are some of the smartest C++ developers out there, and they resort to naming conventions because this is generally not possible any other way (the executable could have been created in any number of languages, by any number of compiler versions, after all).


(Added much later): Turns out GCC has this feature in 4.3 if asked for when you compile the code:

A new command-line switch -frecord-gcc-switches ... causes the command line that was used to invoke the compiler to be recorded into the object file that is being created. The exact format of this recording is target and binary file format dependent, but it usually takes the form of a note section containing ASCII text.

Get the compiler options from a compiled executable?

gcc has a -frecord-gcc-switches option for that:

   -frecord-gcc-switches
This switch causes the command line that was used to invoke the compiler to
be recorded into the object file that is being created. This switch is only
implemented on some targets and the exact format of the recording is target
and binary file format dependent, but it usually takes the form of a section
containing ASCII text.

Afterwards, the ELF executables will contain a .GCC.command.line section with that information.

$ gcc -O2 -frecord-gcc-switches a.c
$ readelf -p .GCC.command.line a.out

String dump of section '.GCC.command.line':
[ 0] a.c
[ 4] -mtune=generic
[ 13] -march=x86-64
[ 21] -O2
[ 25] -frecord-gcc-switches

Of course, it won't work for executables compiled without that option.
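
If you want to process the recorded switches in a program rather than read the readelf dump, here is a minimal sketch. It assumes the raw section bytes have already been extracted (for example with objcopy --dump-section, if your binutils provides it) and that the section is a sequence of NUL-terminated ASCII strings, which is what the string dump above suggests. The function name split_recorded_switches is made up for this example.

```cpp
#include <string>
#include <vector>

// Split the raw bytes of a .GCC.command.line section into individual switches.
// Assumption: the section is a run of NUL-terminated ASCII strings; the format
// is target dependent, so treat this as a sketch for typical ELF output.
std::vector<std::string> split_recorded_switches(const std::string& raw) {
    std::vector<std::string> switches;
    std::string current;
    for (char c : raw) {
        if (c == '\0') {
            // End of one recorded string; skip empty entries.
            if (!current.empty()) switches.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // Tolerate a missing final NUL terminator.
    if (!current.empty()) switches.push_back(current);
    return switches;
}
```

Each element of the returned vector then corresponds to one line of the readelf string dump.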


For the simple case of optimizations, you could try using a debugger if the file was compiled with debug info. If you step through it a little, you may notice that some variables were 'optimized out'. That suggests that optimization took place.

Is there a way to store clang compile-time flags in the output binary?

As ecatmur already implied in the comments, this feature was not supported at the time of writing, as documented in bug https://llvm.org/bugs/show_bug.cgi?id=16291 .

However, as a workaround while the feature is not available, I would suggest having your build process define a macro inside the program using clang's -D argument. For example, assuming you are invoking this from a bash script (adjust to whatever build tool you use):

CLANG_ARGS='-O3 -c main.c'
clang $CLANG_ARGS -D CLANG_ARGS="\"${CLANG_ARGS}\""

Then in your C or C++ programs you add something along the lines of:

const char clangArgs[] = CLANG_ARGS;

You can then retrieve it using a debugger or some such, or even add some code to print it from your program when invoked with the -V or --version switch.
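
A minimal sketch of what that could look like in the program. The fallback definition and the helper names wants_version and version_text are assumptions added for illustration; only CLANG_ARGS and clangArgs come from the answer above.

```cpp
#include <cstring>
#include <string>

// CLANG_ARGS is expected to come from the build (clang ... -D CLANG_ARGS="\"...\"").
// The fallback is an addition for this sketch so the file still compiles
// when the build does not define the macro.
#ifndef CLANG_ARGS
#define CLANG_ARGS "(compiler arguments not recorded)"
#endif

const char clangArgs[] = CLANG_ARGS;

// Hypothetical helper: true if the user passed -V or --version.
bool wants_version(int argc, const char* const argv[]) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "-V") == 0 ||
            std::strcmp(argv[i], "--version") == 0) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: the text a version handler might print.
std::string version_text() {
    return std::string("built with: ") + clangArgs;
}
```

In main you would call wants_version(argc, argv) and, if it returns true, print version_text() and exit.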

Embedding compile time information into binary

Are __TIME__ and __DATE__ what you are looking for?

If compiling and linking is one step in your scenario you can have your compiler and linker replace those macros with the date and time. If you compile one day and link the other this will not work because the compiler (better: preprocessor) decides which value is inserted.

Have a look at other posts on this topic here on Stack Overflow.
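
As a small sketch of how the two macros can be embedded (the function name build_stamp is made up for this example):

```cpp
#include <string>

// __DATE__ expands to "Mmm dd yyyy" and __TIME__ to "hh:mm:ss". Both are
// substituted by the preprocessor when this translation unit is compiled,
// not when the program is linked.
std::string build_stamp() {
    return std::string(__DATE__) + " " + __TIME__;
}
```

Only the translation unit that contains this function carries the stamp, so it reflects the last time that file was compiled, which may be older than the final link.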

Which GCC optimization flags affect binary size the most?

Most of the extra code-size in an un-optimized build comes from the fact that the default -O0 also means a debug build: nothing is kept in registers across statements, so debugging stays consistent even if you use a GDB j command to jump to a different source line in the same function. -O0 means a huge amount of store/reload compared with even the lightest level of optimization, which is especially disastrous for code-size on a non-CISC ISA that can't use memory source operands. "Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?" applies to GCC equally.

Especially for modern C++, a debug build is disastrous because simple template wrapper functions that normally inline and optimize away to nothing in simple cases (or maybe one instruction) instead compile to actual function calls that have to set up args and run a call instruction. E.g. for a std::vector, the operator[] member function can normally inline to a single ldr instruction, assuming the compiler has the .data() pointer in a register. But without inlining, every call-site takes multiple instructions (see Footnote 1).


Options that affect code-size in the actual .text section the most (for metadata, see the metadata footnote): alignment of branch-targets in general, or just loops, costs some code-size. Other than that:

  • -ftree-vectorize - make SIMD versions of loops, also necessitating scalar cleanup if the compiler can't prove that the iteration count will be a multiple of the vector width. (Or that pointed-to arrays are non-overlapping if you don't use restrict; that may also need a scalar fallback). Enabled at -O3 in GCC11 and earlier. Enabled at -O2 in GCC12 and later, like clang.

  • -funroll-loops / -funroll-all-loops - not enabled by default even at -O3 in modern GCC. Enabled with profile-guided optimization (-fprofile-use), when it has profiling data from a -fprofile-generate build to know which loops are actually hot and worth spending code-size on. (And which are cold and thus should be optimized for size so you get fewer I-cache misses when they do run, and less eviction of other code.) PGO also influences vectorization decisions.

    Related to loop unrolling are heuristics (tuning knobs) that control loop peeling (fully unrolling) and how much to unroll. The normal way to set these is with -march=native, implying -mtune= whatever as well. -mtune=znver3 may favour big unroll factors (at least clang does), compared to -mtune=sandybridge or -mtune=haswell. But there are GCC options to manually adjust individual things, as discussed in comments on gcc: strange asm generated for simple loop and in How to ask GCC to completely unroll this loop (i.e., peel this loop)?

    There are options to override the weights and thresholds for other decision heuristics like inlining, too, but it's very rare you'd want to fine-tune that much unless you're working on refining the defaults, or finding good defaults for a new CPU.

  • -Os - optimize for size and speed, trying not to sacrifice too much speed. A good tradeoff if your code has a lot of I-cache misses, otherwise -O3 is normally faster, or at least that's the design goal for GCC. Can be worth trying different options to see if -O2 or -Os make your code faster than -O3 across some CPUs you care about; sometimes missed-optimizations or quirks of certain microarchitectures make a difference, as in Why does GCC generate 15-20% faster code if I optimize for size instead of speed? which has actual benchmarks from GCC4.6 to 4.8 (current at the time) for a specific small loop in a test program, on quite a few different x86 and ARM CPUs, with and without -march=native to actually tune for them. There's zero reason to expect that to be representative of other code, though, so you need to test yourself for your own codebase. (And for any given loop, small code changes could make a different compile option better on any given CPU.)

    And obviously -Os is very useful if you need your static code-size smaller to fit in some size limit.

  • -Oz optimizing for size only, even at a large cost in speed. GCC only very recently added this to current trunk, so expect it in GCC12 or 13. Presumably what I wrote below about clang's implementation of -Oz being quite aggressive also applies to GCC, but I haven't yet tested it.
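
To make the -ftree-vectorize point above concrete, here is a small loop you could compile with and without vectorization and compare code-size. The function name scale_add is made up, and __restrict is a GCC/clang extension in C++ rather than standard C++.

```cpp
#include <cstddef>

// __restrict (a GCC/clang extension in C++) promises that dst and src do not
// overlap, so the compiler does not need a runtime overlap check or a scalar
// fallback for the aliasing case.
void scale_add(float* __restrict dst, const float* __restrict src,
               float factor, std::size_t n) {
    // n is not known to be a multiple of the vector width, so a vectorizing
    // compiler typically still emits scalar cleanup code for the remainder.
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += factor * src[i];
    }
}
```

Comparing the asm for this at -O2, -O3 and -Os (for example on Godbolt) is a quick way to see how much code the vectorized version plus cleanup adds for your target.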

Clang has similar options, including -Os. It also has a clang -Oz option to optimize only for size, without caring about speed. It's very aggressive, e.g. on x86 using code-golf tricks like push 1; pop rax (3 bytes total) instead of mov eax, 1 (5 bytes).

GCC's -Os unfortunately chooses to use div instead of a multiplicative inverse for division by a constant, costing lots of speed but not saving much if any size. (https://godbolt.org/z/x9h4vx1YG for x86-64). For ARM, GCC -Os still uses an inverse if you don't use a -mcpu= that implies udiv is even available, otherwise it uses udiv: https://godbolt.org/z/f4sa9Wqcj .

Clang's -Os still uses a multiplicative inverse with umull, only using udiv with -Oz. (or a call to __aeabi_uidiv helper function without any -mcpu option). So in that respect, clang -Os makes a better tradeoff than GCC, still spending a little bit of code-size to avoid slow integer division.
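
For readers unfamiliar with the trick, this sketch shows the same division by 10 written both ways. The function names are made up; the constant 0xCCCCCCCD is ceil(2^35 / 10), the usual choice for 32-bit unsigned division by 10.

```cpp
#include <cstdint>

// Plain division by a constant: depending on the options, the compiler emits
// either a div/udiv instruction or a multiply and shift.
std::uint32_t div10_plain(std::uint32_t x) {
    return x / 10;
}

// The same result via a multiplicative inverse, written out by hand:
// multiply by ceil(2^35 / 10) in 64 bits, then shift right by 35.
std::uint32_t div10_inverse(std::uint32_t x) {
    const std::uint64_t product = static_cast<std::uint64_t>(x) * 0xCCCCCCCDu;
    return static_cast<std::uint32_t>(product >> 35);
}
```

You would normally write only div10_plain and let the compiler choose; the hand-written version is just there to show what the multiply-and-shift sequence computes.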



Footnote 1: inlining or not for std::vector

#include <vector>
int foo(std::vector<int> &v) {
    return v[0] + v[1];
}

Godbolt with gcc with the default -O0 vs. -Os for -mcpu=cortex-m7 just to randomly pick something. IDK if it's normal to use dynamic containers like std::vector on an actual microcontroller; probably not.

# -Os (same as -Og for this case, actually, omitting the frame pointer for this leaf function)
foo(std::vector<int, std::allocator<int> >&):
ldr r3, [r0] @ load the _M_start member of the reference arg
ldrd r0, r3, [r3] @ load a pair of words (v[0..1]) from there into r0 and r3
add r0, r0, r3 @ add them into the return-value register
bx lr

vs. a debug build (with name-demangling enabled for the asm)

# GCC -O0 -mcpu=cortex-m7 -mthumb
foo(std::vector<int, std::allocator<int> >&):
push {r4, r7, lr} @ non-leaf function requires saving LR (the return address) as well as some call-preserved registers
sub sp, sp, #12
add r7, sp, #0 @ Use r7 as a frame pointer. -O0 defaults to -fno-omit-frame-pointer
str r0, [r7, #4] @ spill the incoming register arg to the stack

movs r1, #0 @ 2nd arg for operator[]
ldr r0, [r7, #4] @ reload the pointer to the control block as the first arg
bl std::vector<int, std::allocator<int> >::operator[](unsigned int)
mov r3, r0 @ useless copy, but hey we told GCC not to spend any time optimizing.
ldr r4, [r3] @ deref the reference (pointer) it returned, into a call-preserved register that will survive across the next call

movs r1, #1 @ arg for the v[1] operator[]
ldr r0, [r7, #4]
bl std::vector<int, std::allocator<int> >::operator[](unsigned int)
mov r3, r0
ldr r3, [r3] @ deref the returned reference

add r3, r3, r4 @ v[1] + v[0]
mov r0, r3 @ and copy into the return value reg because GCC didn't bother to add into it directly

adds r7, r7, #12 @ tear down the stack frame
mov sp, r7
pop {r4, r7, pc} @ and return by popping saved-LR into PC

@ and there's an actual implementation of the operator[] function
@ it's 15 instructions long.
@ But only one instance of this is needed for each type your program uses (vector<int>, vector<char*>, vector<my_foo>, etc.)
@ so it doesn't add up as much as each call-site
std::vector<int, std::allocator<int> >::operator[](unsigned int):
push {r7}
sub sp, sp, #12
...

As you can see, un-optimized GCC cares more about fast compile-times than even the most simple things like avoiding useless mov reg,reg instructions even within code for evaluating one expression.



Footnote 2: metadata

If you count a whole ELF executable with metadata, not just the .text + .rodata + .data you'd need to burn to flash, then of course -g debug info is very significant for the size of the file, but basically irrelevant because it's not mixed in with the parts that are needed while running, so it just sits there on disk.

Symbol names and debug info can be stripped with gcc -s or strip.

Stack-unwind info is an interesting tradeoff between code-size and metadata. -fno-omit-frame-pointer wastes extra instructions and a register as a frame pointer, leading to larger machine-code size, but smaller .eh_frame stack unwind metadata. (strip does not consider that "debug" info by default, even for C programs not C++ where exception-handling might need it in non-debugging contexts.)

How to remove "noise" from GCC/clang assembly output? mentions how to get the compiler to omit some of that: -fno-asynchronous-unwind-tables omits .cfi directives in the asm output, and thus the metadata that goes into the .eh_frame section. Also -fno-exceptions -fno-rtti with C++ can reduce metadata. (Run-Time Type Information for reflection takes space.)

Linker options that control alignment of sections / ELF segments can also take extra space, relevant for tiny executables but is basically a constant amount of space, not scaling with the size of the program. See also Minimal executable size now 10x larger after linking than 2 years ago, for tiny programs?

Get the compiler options from the program

Because I'm using the qmake build system, I came across this solution:

I added this line to the end of my .pro file:

QMAKE_CXXFLAGS += -DFLAGS=\"$$QMAKE_CXXFLAGS $$QMAKE_CXXFLAGS_RELEASE\"

then retrieved what I want from the FLAGS macro
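
A minimal sketch of using the macro from C++. It assumes FLAGS expands to a string literal, as the escaped quotes in the qmake line suggest; the fallback definition and the helper names build_flags and has_flag are additions for illustration.

```cpp
#include <sstream>
#include <string>

// FLAGS is expected to come from the qmake line above. The empty fallback is
// an assumption added here so the file also compiles without it.
#ifndef FLAGS
#define FLAGS ""
#endif

// Return the recorded flags as one string.
const char* build_flags() {
    return FLAGS;
}

// Hypothetical helper: check whether a whitespace-separated flag string
// contains exactly the given flag (so "-O20" does not match "-O2").
bool has_flag(const std::string& flags, const std::string& wanted) {
    std::istringstream in(flags);
    std::string token;
    while (in >> token) {
        if (token == wanted) return true;
    }
    return false;
}
```

For example, has_flag(build_flags(), "-O2") tells you at run time whether -O2 was among the recorded flags.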

Compiled binary path based on compiler options

Any better approaches / recommendations?

If I understand correctly, your GNU Make build system can build several variants of
your executable, differentiated by preprocessor macros that are defined (or not)
in the compilation commands depending on conditions that are tested in your Makefile
and/or on the arguments that you pass to make. And you want to be able to build
any of these variants independently, without needing a make clean to remove the
artifacts of the previous build, which might well have been a build of a different
variant.

This is one of the basic needs of build systems. The conventional solution is not the
one you're thinking about - to somehow encode differentiations into the name of
the executable. That won't work anyway, unless you do the same thing with the names of
the object files that are linked into the executable. If you don't, then when
you switch from variant X to variant Y, a variant-X object file foo.o
that is not older than foo.cpp will not be recompiled,
even if it should be for variant Y, and that variant-X foo.o will be linked into the
variant-Y executable, no matter what it is called.

The conventional solution is to differentiate, per variant, the place where the compiler
will output the object files and correspondingly the place where the linker
outputs the executable. No doubt all of the C/C++ IDEs you have ever used allow you
to build either a debug variant or a release variant of your project, and
they differentiate the debug object files and executable from the release object
files and executables by generating them in different subdirectories of the
project directory, e.g.

<projdir>/Debug/{obj|bin}
<projdir>/Release/{obj|bin}

or maybe:

<projdir>/obj/{debug|release}
<projdir>/bin/{debug|release}

This approach automatically encodes the variant of an object file or executable
into its absolute pathname, e.g.

<projdir>/Debug/obj/foo.o
<projdir>/bin/release/prog

without any further ado, and the variants can be built independently.

It's straightforward to implement this scheme in a makefile. Most of the IDEs
that use it do implement it in the makefiles that they generate behind the scenes.
And it's also straightforward to extend the scheme to more variants than just debug
and release (although whatever variants you want, you'll certainly want debug
and release variants of those variants).

Here's an illustration for a toy program that we want to build in any of the
variants that we get for combinations of two build-properties that we'll call
TRAIT_A and TRAIT_B:

| TRAIT_A | TRAIT_B |
|---------|---------|
| Y       | Y       |
| Y       | N       |
| N       | Y       |
| N       | N       |

And we want to be able to build any of those variants in debug mode or release
mode. TRAIT_{A|B} might map directly to a preprocessor macro, or to an
arbitrary combination of preprocessor flags, compiler options and/or linkage options.

Our program, prog, is built from just one source file:

main.cpp

#include <string>
#include <cstdlib>

int main(int argc, char * argv[])
{
    std::string cmd{"readelf -p .GCC.command.line "};
    cmd += argv[0];
    return system(cmd.c_str());
}

And all it does is invoke readelf to dump the linkage section .GCC.command.line
within its own executable. That linkage section only exists when we compile or
link with the GCC option -frecord-gcc-switches.
So purely for the purpose of the demo we'll always compile and link with that option.
Here's a makefile that adopts one way of differentiating all the variants:
object files are compiled in ./obj[/trait...]; executables are linked in
./bin[/trait...]:

Makefile

CXX = g++
CXXFLAGS := -frecord-gcc-switches
BINDIR := ./bin
OBJDIR := ./obj

ifdef RELEASE
  ifdef DEBUG
    $(error RELEASE and DEBUG are mutually exclusive)
  endif
  CPPFLAGS := -DNDEBUG
  CXXFLAGS += -O3
  BINDIR := $(BINDIR)/release
  OBJDIR := $(OBJDIR)/release
endif

ifdef DEBUG
  ifdef RELEASE
    $(error RELEASE and DEBUG are mutually exclusive)
  endif
  CXXFLAGS += -O0 -g
  BINDIR := $(BINDIR)/debug
  OBJDIR := $(OBJDIR)/debug
endif

ifdef TRAIT_A
CPPFLAGS += -DTRAIT_A # or whatever
BINDIR := $(BINDIR)/TRAIT_A
OBJDIR := $(OBJDIR)/TRAIT_A
endif

ifdef TRAIT_B
CPPFLAGS += -DTRAIT_B # or whatever
BINDIR := $(BINDIR)/TRAIT_B
OBJDIR := $(OBJDIR)/TRAIT_B
endif

SRCS := main.cpp
OBJS := $(OBJDIR)/$(SRCS:.cpp=.o)
EXE := $(BINDIR)/prog

.PHONY: all clean

all: $(EXE)

$(EXE): $(OBJS) | $(BINDIR)
	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -o $@ $(LDFLAGS) $^ $(LIBS)

$(OBJDIR)/%.o: %.cpp | $(OBJDIR)
	$(CXX) -c -o $@ $(CPPFLAGS) $(CXXFLAGS) $<

$(BINDIR) $(OBJDIR):
	mkdir -p $@

clean:
	$(RM) $(EXE) $(OBJS)

Now let's build, say, two variants in debug mode and two other variants in
release mode, one after the other:

$ make DEBUG=1 TRAIT_A=1
mkdir -p obj/debug/TRAIT_A
g++ -c -o obj/debug/TRAIT_A/main.o -DTRAIT_A -frecord-gcc-switches -O0 -g main.cpp
mkdir -p bin/debug/TRAIT_A
g++ -DTRAIT_A -frecord-gcc-switches -O0 -g -o bin/debug/TRAIT_A/prog obj/debug/TRAIT_A/main.o

$ make DEBUG=1 TRAIT_B=1
mkdir -p obj/debug/TRAIT_B
g++ -c -o obj/debug/TRAIT_B/main.o -DTRAIT_B -frecord-gcc-switches -O0 -g main.cpp
mkdir -p bin/debug/TRAIT_B
g++ -DTRAIT_B -frecord-gcc-switches -O0 -g -o bin/debug/TRAIT_B/prog obj/debug/TRAIT_B/main.o

$ make RELEASE=1 TRAIT_A=1 TRAIT_B=1
mkdir -p obj/release/TRAIT_A/TRAIT_B
g++ -c -o obj/release/TRAIT_A/TRAIT_B/main.o -DNDEBUG -DTRAIT_A -DTRAIT_B -frecord-gcc-switches -O3 main.cpp
mkdir -p bin/release/TRAIT_A/TRAIT_B
g++ -DNDEBUG -DTRAIT_A -DTRAIT_B -frecord-gcc-switches -O3 -o bin/release/TRAIT_A/TRAIT_B/prog obj/release/TRAIT_A/TRAIT_B/main.o

$ make RELEASE=1
g++ -c -o obj/release/main.o -DNDEBUG -frecord-gcc-switches -O3 main.cpp
g++ -DNDEBUG -frecord-gcc-switches -O3 -o bin/release/prog obj/release/main.o

That last one is the release variant with neither TRAIT_A nor TRAIT_B.

We've now built four versions of program prog in different ./bin[/...] subdirectories
of the project, from different object files that are in different ./obj[/...] subdirectories,
and those versions will all tell us how they were differently built. Running in the order
we built them:-

$ bin/debug/TRAIT_A/prog

String dump of section '.GCC.command.line':
[ 0] -imultiarch x86_64-linux-gnu
[ 1d] -D_GNU_SOURCE
[ 2b] -D TRAIT_A
[ 36] main.cpp
[ 3f] -mtune=generic
[ 4e] -march=x86-64
[ 5c] -auxbase-strip obj/debug/TRAIT_A/main.o
[ 84] -g
[ 87] -O0
[ 8b] -frecord-gcc-switches
[ a1] -fstack-protector-strong
[ ba] -Wformat
[ c3] -Wformat-security

$ bin/debug/TRAIT_B/prog

String dump of section '.GCC.command.line':
[ 0] -imultiarch x86_64-linux-gnu
[ 1d] -D_GNU_SOURCE
[ 2b] -D TRAIT_B
[ 36] main.cpp
[ 3f] -mtune=generic
[ 4e] -march=x86-64
[ 5c] -auxbase-strip obj/debug/TRAIT_B/main.o
[ 84] -g
[ 87] -O0
[ 8b] -frecord-gcc-switches
[ a1] -fstack-protector-strong
[ ba] -Wformat
[ c3] -Wformat-security

$ bin/release/TRAIT_A/TRAIT_B/prog

String dump of section '.GCC.command.line':
[ 0] -imultiarch x86_64-linux-gnu
[ 1d] -D_GNU_SOURCE
[ 2b] -D NDEBUG
[ 35] -D TRAIT_A
[ 40] -D TRAIT_B
[ 4b] main.cpp
[ 54] -mtune=generic
[ 63] -march=x86-64
[ 71] -auxbase-strip obj/release/TRAIT_A/TRAIT_B/main.o
[ a3] -O3
[ a7] -frecord-gcc-switches
[ bd] -fstack-protector-strong
[ d6] -Wformat
[ df] -Wformat-security

$ bin/release/prog

String dump of section '.GCC.command.line':
[ 0] -imultiarch x86_64-linux-gnu
[ 1d] -D_GNU_SOURCE
[ 2b] -D NDEBUG
[ 35] main.cpp
[ 3e] -mtune=generic
[ 4d] -march=x86-64
[ 5b] -auxbase-strip obj/release/main.o
[ 7d] -O3
[ 81] -frecord-gcc-switches
[ 97] -fstack-protector-strong
[ b0] -Wformat
[ b9] -Wformat-security

We can clean the first one:

$ make DEBUG=1 TRAIT_A=1 clean
rm -f ./bin/debug/TRAIT_A/prog ./obj/debug/TRAIT_A/main.o

And the last one:

$ make RELEASE=1 clean
rm -f ./bin/release/prog ./obj/release/main.o

The second and third are still there and up to date:

$ make DEBUG=1 TRAIT_B=1
make: Nothing to be done for 'all'.

$ make RELEASE=1 TRAIT_A=1 TRAIT_B=1
make: Nothing to be done for 'all'.

For the exercise, you might consider refining the makefile to let you build, or clean,
all the variants at the same time. Or to default to DEBUG if RELEASE is not defined, or vice versa. Or to fail if no valid combination of traits is selected, for some definition of valid.

BTW, note that preprocessor options are conventionally assigned in the make variable
CPPFLAGS, for either C or C++ compilation; C compiler options are assigned
in CFLAGS and C++ compiler options in CXXFLAGS. GNU Make's built-in
rules assume that you follow these conventions.
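
On the source side, here is a sketch of how code might react to the macros that the makefile passes in CPPFLAGS. The function name variant_description is made up, and treating the absence of NDEBUG as "debug" is an assumption of this example (the makefile's DEBUG branch defines no macro of its own).

```cpp
#include <string>

// Describe the variant this translation unit was compiled as, based on the
// macros the makefile passes via CPPFLAGS (-DNDEBUG, -DTRAIT_A, -DTRAIT_B).
std::string variant_description() {
#ifdef NDEBUG
    std::string description = "release";
#else
    // Assumption: no NDEBUG means a debug (or unspecified) build.
    std::string description = "debug";
#endif
#ifdef TRAIT_A
    description += "/TRAIT_A";
#endif
#ifdef TRAIT_B
    description += "/TRAIT_B";
#endif
    return description;
}
```

Built with make RELEASE=1 TRAIT_A=1, this would return "release/TRAIT_A", mirroring the directory layout the makefile uses.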

Is it possible to determine or set compiler options from within the source code in gcc?

In my experience, no. This is not the way you go about this. Instead, you put compiler/platform/OS specific code in your source, and wrap it with the appropriate ifdef statements. These include:

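#if defined(__MINGW32__)
/* code specific to MinGW compilers; checked first because MinGW is GCC-based and also defines __GNUC__ */
#elif defined(__GNUC__)
/* code for the GNU C compiler (clang defines __GNUC__ as well) */
#elif defined(_MSC_VER)
/* code specific to the MSVC compiler; _MSC_VER holds the version number */
#elif defined(__BORLANDC__)
/* code specific to Borland compilers */
#endif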

Within this, you can have version-specific requirements and code:

#ifdef __GNUC__
#  include <features.h>   /* glibc header that provides __GNUC_PREREQ */
#  if __GNUC_PREREQ(4,0)
     // If gcc_version >= 4.0
#  elif __GNUC_PREREQ(3,2)
     // If gcc_version >= 3.2
#  else
     // Else
#  endif
#else
   // If not gcc
#endif

From there, you have your makefile pass the appropriate compiler flags based on the compiler type, version, etc, and you're all set.
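
One narrow exception worth knowing: GCC documents a few predefined macros that reflect optimization settings, namely __OPTIMIZE__ (defined in optimizing compilations) and __OPTIMIZE_SIZE__ (defined when optimizing for size). The sketch below only reports them; the function name optimization_summary is made up, and this does not recover the full command line.

```cpp
#include <string>

// Report what GCC's predefined macros reveal about the optimization level.
// Only a coarse view is available: the macros do not distinguish -O1/-O2/-O3.
std::string optimization_summary() {
#if defined(__OPTIMIZE_SIZE__)
    return "optimizing for size (e.g. -Os)";
#elif defined(__OPTIMIZE__)
    return "optimizing (some level above -O0)";
#else
    return "not optimizing, or the compiler does not define these macros";
#endif
}
```

This can be handy for a --version style printout, but for anything more detailed the build system still has to pass the information in explicitly.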


