How to Get Assembler Output from C/C++ Source in Gcc

How do you get assembler output from C/C++ source in gcc?

Use the -S option to gcc (or g++), optionally with -fverbose-asm which works well at the default -O0 to attach C names to asm operands as comments. Less well at any optimization level, which you normally want to use to get asm worth looking at.

gcc -S helloworld.c

This will run the preprocessor (cpp) over helloworld.c, perform the initial compilation and then stop before the assembler is run. For useful compiler options to use in that case, see How to remove "noise" from GCC/clang assembly output? (or just look at your code on Matt Godbolt's online Compiler Explorer which filters out directives and stuff, and has highlighting to match up source lines with asm using debug info.)

By default this will output a file helloworld.s. The output file can be still be set by using the -o option, including -o - to write to stdout for pipe into less.

gcc -S -o my_asm_output.s helloworld.c

Of course this only works if you have the original source.
An alternative if you only have the resultant object file is to use objdump, by setting the --disassemble option (or -d for the abbreviated form).

objdump -S --disassemble helloworld > helloworld.dump

-S interleaves source lines with normal disassembly output, so this option works best if debugging option is enabled for the object file (-g at compilation time) and the file hasn't been stripped.

Running file helloworld will give you some indication as to the level of detail that you will get by using objdump.

Other useful objdump options include -rwC (to show symbol relocations, disable line-wrapping of long machine code, and demangle C++ names). And if you don't like AT&T syntax for x86, -Mintel. See the man page.

So for example, objdump -drwC -Mintel -S foo.o | less.

-r is very important with a .o that only has 00 00 00 00 placeholders for symbol references, as opposed to a linked executable.

Using GCC to produce readable assembly?

If you compile with debug symbols (add -g to your GCC command line, even if you're also using -O31),
you can use objdump -S to produce a more readable disassembly interleaved with C source.

>objdump --help
[...]
-S, --source Intermix source code with disassembly
-l, --line-numbers Include line numbers and filenames in output

objdump -drwC -Mintel is nice:

  • -r shows symbol names on relocations (so you'd see puts in the call instruction below)
  • -R shows dynamic-linking relocations / symbol names (useful on shared libraries)
  • -C demangles C++ symbol names
  • -w is "wide" mode: it doesn't line-wrap the machine-code bytes
  • -Mintel: use GAS/binutils MASM-like .intel_syntax noprefix syntax instead of AT&T
  • -S: interleave source lines with disassembly.

You could put something like alias disas="objdump -drwCS -Mintel" in your ~/.bashrc. If not on x86, or if you like AT&T syntax, omit -Mintel.


Example:

> gcc -g -c test.c
> objdump -d -M intel -S test.o

test.o: file format elf32-i386


Disassembly of section .text:

00000000 <main>:
#include <stdio.h>

int main(void)
{
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 83 e4 f0 and esp,0xfffffff0
6: 83 ec 10 sub esp,0x10
puts("test");
9: c7 04 24 00 00 00 00 mov DWORD PTR [esp],0x0
10: e8 fc ff ff ff call 11 <main+0x11>

return 0;
15: b8 00 00 00 00 mov eax,0x0
}
1a: c9 leave
1b: c3 ret

Note that this isn't using -r so the call rel32=-4 isn't annotated with the puts symbol name. And looks like a broken call that jumps into the middle of the call instruction in main. Remember that the rel32 displacement in the call encoding is just a placeholder until the linker fills in a real offset (to a PLT stub in this case, unless you statically link libc).


Footnote 1: Interleaving source can be messy and not very helpful in optimized builds; for that, consider https://godbolt.org/ or other ways of visualizing which instructions go with which source lines. In optimized code there's not always a single source line that accounts for an instruction but the debug info will pick one source line for each asm instruction.

How do I get full assembler output in gcc?

I like objdump for this, but the most useful options are non-obvious - especially if you're using it on an object file which contains relocations, rather than a final binary.

objdump -d some_binary does a reasonable job.

objdump -d some_object.o is less useful because calls to external functions don't get disassembled helpfully:

...
00000005 <foo>:
5: 55 push %ebp
6: 89 e5 mov %esp,%ebp
8: 53 push %ebx
...
29: c7 04 24 00 00 00 00 movl $0x0,(%esp)
30: e8 fc ff ff ff call 31 <foo+0x2c>
35: 89 d8 mov %ebx,%eax
...

The call is actually to printf()... adding the -r flag helps with that; it marks relocations. objdump -dr some_object.o gives:

...
29: c7 04 24 00 00 00 00 movl $0x0,(%esp)
2c: R_386_32 .rodata.str1.1
30: e8 fc ff ff ff call 31 <foo+0x2c>
31: R_386_PC32 printf
...

Then, I find it useful to see each line annotated as <symbol+offset>. objdump has a handy option for that, but it has the annoying side effect of turning off the dump of the actual bytes - objdump --prefix-addresses -dr some_object.o gives:

...
00000005 <foo> push %ebp
00000006 <foo+0x1> mov %esp,%ebp
00000008 <foo+0x3> push %ebx
...

But it turns out that you can undo that by providing another obscure option, finally arriving at my favourite objdump incantation:

objdump --prefix-addresses --show-raw-insn -dr file.o

which gives output like this:

...
00000005 <foo> 55 push %ebp
00000006 <foo+0x1> 89 e5 mov %esp,%ebp
00000008 <foo+0x3> 53 push %ebx
...
00000029 <foo+0x24> c7 04 24 00 00 00 00 movl $0x0,(%esp)
2c: R_386_32 .rodata.str1.1
00000030 <foo+0x2b> e8 fc ff ff ff call 00000031 <foo+0x2c>
31: R_386_PC32 printf
00000035 <foo+0x30> 89 d8 mov %ebx,%eax
...

And if you've built with debugging symbols (i.e. compiled with -g), and you replace the -dr with -Srl, it will attempt to annotate the output with the corresponding source lines.

gcc - compile, generate assembly with inline source without calling objdump

The GNU C compiler (gcc) produces assembly output if you specify the option -S during compilation. Note that this output is not like the output of objdump -S in the source code is not interspersed with the assembly. To get such output, there is currently no way around creating an object file and then disassembling it. Consider filing a bug report if you would like to have such a feature.

Getting assember output from GCC/Clang in LTO mode

For GCC just add -save-temps to linker command:

$ gcc -flto -save-temps ... *.o -o bin/libsortcheck.so
$ ls -1
...
libsortcheck.so.ltrans0.s

For Clang the situation is more complicated. In case you use GNU ld (default or -fuse-ld=ld) or Gold linker (enabled via -fuse-ld=gold), you need to run with -Wl,-plugin-opt=emit-asm:

$ clang tmp.c -flto -Wl,-plugin-opt=emit-asm -o tmp.s

For newer (11+) versions of LLD linker (enabled via -fuse-ld=lld) you can generate asm with -Wl,--lto-emit-asm.

How can I get the source lines inline with the assembly output using GCC?

Maybe debug + a post-process step?

gcc <source.c> -S -g -fverbose-asm

Find .file 1 "1.c" to tell you the file number to file name mapping.
Then add source after the .loc 1 8 0 lines. I'm not sure how to do it with a single shell command, but a short script should be able to do it:

#!/usr/bin/env python

import re
import sys

filename = sys.argv[1]

f = open(filename)
lines = f.readlines()
f.close()

FILE_RE=re.compile(r"\s+\.file (\d+) \"(.*)\"")
LOC_RE =re.compile(r"\s+\.loc (\d+) (\d+)")

output = []
files = {}
for line in lines:
output.append(line)
mo = FILE_RE.match(line)
if mo is not None:
files[mo.group(1)] = open(mo.group(2)).readlines()
print mo.group(1),"=",mo.group(2)
continue
mo = LOC_RE.match(line)
if mo is not None:
print mo.group(1),"=",mo.group(2)
source = files[mo.group(1)][int(mo.group(2))-1]
output.append("\t#"+source)

f = open(filename+".2","w")
f.writelines(output)
f.close()

How to remove noise from GCC/clang assembly output?

Stripping out the .cfi directives, unused labels, and comment lines is a solved problem: the scripts behind Matt Godbolt's compiler explorer are open source on its github project. It can even do colour highlighting to match source lines to asm lines (using the debug info).

You can set it up locally so you can feed it files that are part of your project with all the #include paths and so on (using -I/...). And so you can use it on private source code that you don't want to send out over the Internet.

Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” shows how to use it (it's pretty self-explanatory but has some neat features if you read the docs on github), and also how to read x86 asm, with a gentle introduction to x86 asm itself for total beginners, and to looking at compiler output. He goes on to show some neat compiler optimizations (e.g. for dividing by a constant), and what kind of functions give useful asm output for looking at optimized compiler output (function args, not int a = 123;).

On the Godbolt compiler explorer, it can be useful to use -g0 -fno-asynchronous-unwind-tables if you want to uncheck the filter option for directives, e.g. because you want to see the .section and .p2align stuff in the compiler output. The default is to add -g to your options to get the debug info it uses to colour-highlight matching source and asm lines, but this means .cfi directives for every stack operation, and .loc for every source line, among other things.


With plain gcc/clang (not g++), -fno-asynchronous-unwind-tables avoids .cfi directives. Possibly also useful: -fno-exceptions -fno-rtti -masm=intel. Make sure to omit -g.

Copy/paste this for local use:

g++ -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti -fverbose-asm \
-Wall -Wextra foo.cpp -O3 -masm=intel -S -o- | less

Or -Os can be more readable, e.g. using div for division by non-power-of-2 constants instead of a multiplicative inverse even though that's a lot worse for performance and only a bit smaller, if at all.


But really, I'd recommend just using Godbolt directly (online or set it up locally)! You can quickly flip between versions of gcc and clang to see if old or new compilers do something dumb. (Or what ICC does, or even what MSVC does.) There's even ARM / ARM64 gcc 6.3, and various gcc for PowerPC, MIPS, AVR, MSP430. (It can be interesting to see what happens on a machine where int is wider than a register, or isn't 32-bit. Or on a RISC vs. x86).

For C instead of C++, you can use -xc -std=gnu11 to avoid flipping the language drop-down to C, which resets your source pane and compiler choices, and has a different set of compilers available.


Useful compiler options for making asm for human consumption:

  • Remember, your code only has to compile, not link: passing a pointer to an external function like void ext(void*p) is a good way to stop something from optimizing away. You only need a prototype for it, with no definition so the compiler can't inline it or make any assumptions about what it does. (Or inline asm like Benchmark::DoNotOptimize can force a compiler to materialize a value in a register, or forget about it being a known constant, if you know GNU C inline asm syntax well enough to use constraints to understand the effect you're having on what you're requiring of the compiler.)

  • I'd recommend using -O3 -Wall -Wextra -fverbose-asm -march=haswell for looking at code. (-fverbose-asm can just make the source look noisy, though, when all you get are numbered temporaries as names for the operands.) When you're fiddling with the source to see how it changes the asm, you definitely want compiler warnings enabled. You don't want to waste time scratching your head over the asm when the explanation is that you did something that deserves a warning in the source.

  • To see how the calling convention works, you often want to look at caller and callee without inlining.

    You can use __attribute__((noipa)) foo_t foo(bar_t x) { ... } on a definition, or compile with gcc -O3 -fno-inline-functions -fno-inline-functions-called-once -fno-inline-small-functions to disable inlining. (But those command line options don't disable cloning a function for constant-propagation. noipa = no Inter-Procedural Analysis. It's even stronger than __attribute__((noinline,noclone)).) See From compiler perspective, how is reference for array dealt with, and, why passing by value(not decay) is not allowed? for an example.

    Or if you just want to see how functions pass / receive args of different types, you could use different names but the same prototype so the compiler doesn't have a definition to inline. This works with any compiler. Without a definition, a function is just a black box to the optimizer, governed only by the calling convention / ABI.

  • -ffast-math will get many libm functions to inline, some to a single instruction (esp. with SSE4 available for roundsd). Some will inline with just -fno-math-errno, or other "safer" parts of -ffast-math, without the parts that allow the compiler to round differently. If you have FP code, definitely look at it with/without -ffast-math. If you can't safely enable any of -ffast-math in your regular build, maybe you'll get an idea for a safe change you can make in the source to allow the same optimization without -ffast-math.

  • -O3 -fno-tree-vectorize will optimize without auto-vectorizing, so you can get full optimization without if you want to compare with -O2 (which doesn't enable autovectorization on gcc11 and earlier, but does on all clang).

    -Os (optimize for size and speed) can be helpful to keep the code more compact, which means less code to understand. clang's -Oz optimizes for size even when it hurts speed, even using push 1 / pop rax instead of mov eax, 1, so that's only interesting for code golf.

    Even -Og (minimal optimization) might be what you want to look at, depending on your goals. -O0 is full of store/reload noise, which makes it harder to follow, unless you use register vars. The only upside is that each C statement compiles to a separate block of instructions, and it makes -fverbose-asm able to use the actual C var names.

  • clang unrolls loops by default, so -fno-unroll-loops can be useful in complex functions. You can get a sense of "what the compiler did" without having to wade through the unrolled loops. (gcc enables -funroll-loops with -fprofile-use, but not with -O3). (This is a suggestion for human-readable code, not for code that would run faster.)

  • Definitely enable some level of optimization, unless you specifically want to know what -O0 did. Its "predictable debug behaviour" requirement makes the compiler store/reload everything between every C statement, so you can modify C variables with a debugger and even "jump" to a different source line within the same function, and have execution continue as if you did that in the C source. -O0 output is so noisy with stores/reloads (and so slow) not just from lack of optimization, but forced de-optimization to support debugging. (also related).


To get a mix of source and asm, use gcc -Wa,-adhln -c -g foo.c | less to pass extra options to as. (More discussion of this in a blog post, and another blog.). Note that the output of this isn't valid assembler input, because the C source is there directly, not as an assembler comment. So don't call it a .s. A .lst might make sense if you want to save it to a file.

Godbolt's color highlighting serves a similar purpose, and is great at helping you see when multiple non-contiguous asm instructions come from the same source line. I haven't used that gcc listing command at all, so IDK how well it does, and how easy it is for the eye to see, in that case.

I like the high code density of godbolt's asm pane, so I don't think I'd like having source lines mixed in. At least not for simple functions. Maybe with a function that was too complex to get a handle on the overall structure of what the asm does...


And remember, when you want to just look at the asm, leave out the main() and the compile-time constants. You want to see the code for dealing with a function arg in a register, not for the code after constant-propagation turns it into return 42, or at least optimizes away some stuff.

Removing static and/or inline from functions will produce a stand-alone definition for them, as well as a definition for any callers, so you can just look at that.

Don't put your code in a function called main(). gcc knows that main is special and assumes it will only be called once, so it marks it as "cold" and optimizes it less.


The other thing you can do: If you did make a main(), you can run it and use a debugger. stepi (si) steps by instruction. See the bottom of the x86 tag wiki for instructions. But remember that code might optimize away after inlining into main with compile-time-constant args.

__attribute__((noinline)) may help, on a function that you want to not be inlined. gcc will also make constant-propagation clones of functions, i.e. a special version with one of the args as a constant, for call-sites that know they're passing a constant. The symbol name will be .clone.foo.constprop_1234 or something in the asm output. You can use __attribute__((noclone)) to disable that, too.).



For example

If you want to see how the compiler multiplies two integers: I put the following code on the Godbolt compiler explorer to get the asm (from gcc -O3 -march=haswell -fverbose-asm) for the wrong way and the right way to test this.

// the wrong way, which people often write when they're used to creating a runnable test-case with a main() and a printf
// or worse, people will actually look at the asm for such a main()
int constants() { int a = 10, b = 20; return a * b; }
mov eax, 200 #,
ret # compiles the same as return 200; not interesting

// the right way: compiler doesn't know anything about the inputs
// so we get asm like what would happen when this inlines into a bigger function.
int variables(int a, int b) { return a * b; }
mov eax, edi # D.2345, a
imul eax, esi # D.2345, b
ret

(This mix of asm and C was hand-crafted by copy-pasting the asm output from godbolt into the right place. I find it's a good way to show how a short function compiles in SO answers / compiler bug reports / emails.)

Assembly code compiled from gcc not running

gcc output does not use NASM syntax. You should use as to assemble it into an object file, and then link with the standard library:

> as -o foo.o foo.s
> gcc foo.o # Link
> ./a.out
Hi!!!


Related Topics



Leave a reply



Submit