What Is the Meaning of Each Line of the Assembly Output of a C Hello World

What is the meaning of each line of the assembly output of a C hello world?

Here's how it goes:

        .file   "test.c"

The original source file name (used by debuggers).

        .section        .rodata
.LC0:
        .string "Hello world!"

A zero-terminated string is included in the section ".rodata" ("ro" means "read-only": the application will be able to read the data, but any attempt at writing into it will trigger an exception).

        .text

Now we write things into the ".text" section, which is where code goes.

.globl main
        .type   main, @function
main:

We define a function called "main" and make it globally visible (other object files will be able to invoke it).

        leal    4(%esp), %ecx

We store in register %ecx the value 4+%esp (%esp is the stack pointer).

        andl    $-16, %esp

%esp is slightly modified so that it becomes a multiple of 16. For some data types (the floating-point formats corresponding to C's double and long double), performance is better when memory accesses are at addresses which are multiples of 16. This is not really needed here, but when used without an optimization flag (-O2...), the compiler tends to produce quite a lot of generic useless code (i.e. code which could be useful in some cases but not here).

        pushl   -4(%ecx)

This one is a bit weird: at that point, the word at address -4(%ecx) is the word which was on top of the stack prior to the andl. The code retrieves that word (which should be the return address, by the way) and pushes it again. This kind of emulates what would be obtained with a call from a function which had a 16-byte aligned stack. My guess is that this push is a remnant of an argument-copying sequence. Since the function has adjusted the stack pointer, it must copy the function arguments, which were accessible through the old value of the stack pointer. Here, there is no argument, except the function return address. Note that this word will not be used (yet again, this is code without optimization).

        pushl   %ebp
        movl    %esp, %ebp

This is the standard function prologue: we save %ebp (since we are about to modify it), then set %ebp to point to the stack frame. Thereafter, %ebp will be used to access the function arguments, making %esp free again. (Yes, there is no argument, so this is useless for that function.)

        pushl   %ecx

We save %ecx (we will need it at function exit, to restore %esp to the value it had before the andl).

        subl    $20, %esp

We reserve 20 bytes on the stack (remember that the stack grows "down"). That space will be used to store the arguments to printf() (that's overkill, since there is a single argument, which will use 4 bytes [that's a pointer]).

        movl    $.LC0, (%esp)
        call    printf

We "push" the argument to printf() (i.e. we make sure that %esp points to a word which contains the argument, here $.LC0, which is the address of the constant string in the rodata section). Then we call printf().

        addl    $20, %esp

When printf() returns, we remove the space allocated for the arguments. This addl cancels what the subl above did.

        popl    %ecx

We recover %ecx (pushed above); printf() may have modified it (the calling conventions describe which registers a function may modify without restoring them upon exit; %ecx is one such register).

        popl    %ebp

Function epilogue: this restores %ebp (corresponding to the pushl %ebp above).

        leal    -4(%ecx), %esp

We restore %esp to its initial value. The effect of this opcode is to store in %esp the value %ecx-4. %ecx was set in the first function opcode. This cancels any alteration to %esp, including the andl.

        ret

Function exit.

        .size   main, .-main

This sets the size of the main() function: at any point during assembly, "." is an alias for "the address at which we are adding things right now". If another instruction was added here, it would go at the address specified by ".". Thus, ".-main", here, is the exact size of the code of the function main(). The .size directive instructs the assembler to write that information in the object file.

        .ident  "GCC: (GNU) 4.3.0 20080428 (Red Hat 4.3.0-8)"

GCC just loves to leave traces of its action. This string ends up as a kind of comment in the object file (in the .comment section). It is not loaded into memory when the program runs.

        .section        .note.GNU-stack,"",@progbits

A special section where GCC writes that the code can accommodate a non-executable stack. This is the normal case. Executable stacks are needed for some special usages (not standard C). On modern processors, the kernel can make a non-executable stack (a stack which triggers an exception if someone tries to execute as code some data which is on the stack); this is viewed by some people as a "security feature" because putting code on the stack is a common way to exploit buffer overflows. With this section, the executable will be marked as "compatible with a non-executable stack" which the kernel will happily provide as such.

Understanding Assembly Hello World

I'm trying to understand what each line does.

That would fall under the general category of learning assembly language. There are entire books written about this topic; some of them are probably even pretty good. You should purchase one. To ensure that you get maximum bang for your buck, be sure to select one that focuses on the architecture and operating system you're interested in. x86 assembly language is, of course, always the same, but the programming model differs enough between Windows and Linux that the differences would be confusing to a beginner.

If you're too cheap to buy a book, at least read Matt Pietrek's classic series of articles, "Just Enough Assembly To Get By", from the Microsoft Systems Journal. Start here, and proceed to the follow-up.

The first line is push ebp. I know ebp stands for base pointer. What is its function?
I see that in the second line the value in esp is moved into ebp and searching online I see that these first 2 instructions are very common at the beginning of an assembly program.
I'm new to assembly. Is ebp used for stack frames, so when we have a function in our code and is it optional for a simple program?

To understand this first line in isolation, you just need to know what a PUSH instruction does. It pushes the operand (in this case, a register) onto the top of the stack. EBP is the register that almost always contains the stack base pointer.

That doesn't tell you much about the purpose of this code, though. This line and the next one are part of the standard function prologue. Matt talks about that near the beginning of his very first article, in the "Procedure Entry and Exit" section. First, the stack base pointer from EBP is saved by PUSHing it onto the stack. Then, the second instruction copies the value of ESP into the EBP register. This makes interacting with the stack throughout the function easier. Generally, the prologue section would end with an instruction that reserved an arbitrary amount of space on the stack for temporary variables (e.g., sub esp, 8 to reserve 8 bytes on the stack). This function doesn't need any.

Yes, this prologue code is optional. If you don't need any stack space and/or you use ESP-relative addressing, then you don't need the standard prologue. Optimizing compilers often omit it when possible.

Though are ebp and esp empty at the beginning?

No, of course they are not empty. If they were empty, the code wouldn't bother to save the value of EBP or use the value of ESP.

In fact, no registers are empty at the beginning of a function. Each one contains one of three things: a value that the function's prototype (in conjunction with its calling convention) says it does; a value that you must preserve (that is, it must be the same when your function returns control as it was when your function was first called; these are the callee-save registers, and which ones they are differs depending on the calling convention); or what you can assume to be a garbage value (these are the caller-save registers, and you are free to clobber them in the callee function's code).

Then push offset aHelloWorld; "Hello world\n"
The part after ; is a comment so it doesn't get executed right? The first part instead adds the address containing the string Hello World to the stack, right? But where is the string declared? I'm not sure I understand.

aHelloWorld is a piece of global data declared in the executable image. It was put there at link time, probably because the original code used a string literal. This instruction PUSHes the offset of that global data (that is, its address) onto the stack.

Yes, the part after the semicolon is a comment. The disassembler is adding this comment as a favor to you. It has looked up the value of aHelloWorld, determined that it contains the string Hello world\n, and placed that definition in-line, saving you from having to look up the data's value yourself.

Then call ds:__imp__printf
it seems it's a call to a function, anyway printf is a builtin function right?

Yes, CALL always calls a function. In this case, it is calling the printf function. Is it a "built-in" function? That depends on your definition. From the perspective of assembly language, no: no function is built-in. printf is a function provided by the C standard library. When the original code was compiled and linked, it was also linked with the C run-time library, which provides the C standard library functions, including printf. Since this is MSVC, the __imp__ prefix is a big hint that the function being called is part of either the standard library or the Windows API. These are implicitly linked functions.

Looking up the printf function shows that it takes a variable number of arguments. In the most common x86-32 calling conventions, these arguments are passed on the stack. So that explains why the previous instruction PUSHed the address of string data onto the stack: it's passing that address to the printf function so that string can be printed to the standard output. It could have passed additional arguments to printf, but it didn't, because it didn't need to: it just needed one to print a literal string.

And does ds stand for data segment register? Is it used because we are trying to access a memory operand that isn't on the stack?

Yes, DS is the data segment. Your disassembler is just being verbose here. In Windows, x86-32 uses a flat memory model, so you can basically ignore the segment registers entirely and still understand everything that is going on perfectly well.

then add esp, 4
do we add 4 bytes to esp? Why?

Yes, this adds 4 bytes to the ESP register. Why? To clean up the stack. Recall that before CALLing the printf function, you PUSHed a 4-byte value (the offset of the string data in the executable image) on the stack. The printf function is variadic (takes a variable number of arguments), so the caller is always responsible for cleaning up the stack after calling it.

Here, you can think of adding 4 to ESP as equivalent to popping the stack with a POP instruction. On x86, the stack always grows downwards, so adding is equivalent to popping (and the inverse of pushing).

then move eax, 1234h what is 1234h here?

This instruction MOVes the constant value 0x1234 (the h means hexadecimal) into the EAX register.

Why? Well, I can guess. In all of the x86 calling conventions, the EAX register contains a function's return value. So it is very likely that the function's original code ended with return 0x1234;.

then pop ebx..it was pushed at the beginning. is it necessary to pop it at the end?

Actually, it pops EBP, which is what was actually pushed at the beginning of the function.

And yes. Everything that you PUSH onto the stack has to be POPed off the stack. (Or equivalent, as we saw earlier with ADDing to ESP.) You have to clean up the stack. This is the function epilogue that corresponds to the prologue that we saw at the beginning. Refer back to Matt's article, where it talks about "Procedure Entry and Exit".

then retn ( i knew about ret for returning a value after calling a function). I read that the n in retn refers to the number of pushed arguments by the caller.

This is just an idiosyncrasy of your disassembler again. IDA Pro uses the retn mnemonic. This actually means a near return, but since x86-32 uses a flat (non-segmented) memory model, the near vs. far distinction is not relevant. You can think of retn as simply being equivalent to ret.

Note that this is distinct from the ret instruction that takes an argument, which is what you're thinking of. It doesn't "return" its argument, though. The function returns its result in the EAX register. Rather, ret n (where n is a 16-bit immediate value) returns and pops the specified number of bytes off the stack. This is used only for certain calling conventions (most commonly __stdcall) where the callee is responsible for cleaning up the stack.

See links in the x86 tag wiki and Wikipedia for more information on calling conventions.

It isn't very clear for me.
Can you help me to understand?

Did I mention you should get a book that teaches assembly language programming?

Number of executed Instructions different for Hello World program Nasm Assembly and C

The number of instructions executed in program 1) is high because of linking the program with system library's at runtime?

Yep, dynamic linking plus CRT (C runtime) startup files.

used -static and which reduces the count by a factor of 1/10.

So that just left the CRT start files, which do stuff before calling main, and after.

How can I ensure that the instruction count is only that of the main function in Program 1)?

Measure an empty main, then subtract that number from future measurements.

Unless your instruction counter is smarter, and looks at symbols in the executable for the process it's tracing, it won't be able to tell which code came from where.

and which is how Program 2) is reporting for the debugger.

That's because there is no other code in that program. It's not that you somehow helped the debugger ignore some instructions, it's that you made a program without any instructions you didn't put there yourself.

If you want to see what actually happens when you run the gcc output, gdb a.out, b _start, r, and single-step. Once you get deep in the call tree, you're prob. going to want to use fin to finish execution of the current function, since you don't want to single-step through literally 1 million instructions, or even 10k.

related: How do I determine the number of x86 machine instructions executed in a C program? shows perf stat will count 3 user-space instructions total in a NASM program that does mov eax, 231 / syscall, linked into a static executable.

What is function in assembly?

ELF symbol metadata can be set by some assemblers, e.g. in NASM, global main:function to mark the symbol type as FUNC. (https://nasm.us/doc/nasmdoc8.html#section-8.9.5).

The GAS syntax equivalent (which C compilers emit) is .type main, @function. e.g. put some code on https://godbolt.org and disable filtering to see asm directives in compiler output.

But note this is just metadata for linkers and debuggers to use; the CPU doesn't see that when executing. That's why nobody bothers with it for NASM examples.

Assembly language doesn't truly have functions, just the tools to implement that concept, e.g. jump and store a return address somewhere = call, indirect jump to a return address = ret. On x86, return addresses are pushed and popped on the stack.

The model of execution is purely sequential and local, one instruction at a time (on most ISAs, but some ISAs are VLIW and execute 3 at a time for example, but still local in scope), with each instruction just making a well-defined change to the architectural state. The CPU itself doesn't know or care that it's "in a function" or anything about nesting, other than the return-address predictor stack which optimistically assumes that ret will actually use a return address pushed by a corresponding call. But that's a performance optimization; you do sometimes get mismatched call/ret if code is doing something weird (e.g. a context switch).

A C compiler won't put any instructions outside of functions.

(Technically the _start entry point that indirectly calls main isn't a function; it can't return and has to make an exit system call, but that's written in asm and is part of libc. It's not generated by the C compiler proper, only linked with the C compiler's output to make a working program.) See Linux x86 Program Start Up or - How the heck do we get to main()? for example.

How do you get assembler output from C/C++ source in GCC?

Use the -S option to gcc (or g++), optionally with -fverbose-asm which works well at the default -O0 to attach C names to asm operands as comments. It works less well at any optimization level, which you normally want to use to get asm worth looking at.

gcc -S helloworld.c

This will run the preprocessor (cpp) over helloworld.c, perform the initial compilation and then stop before the assembler is run. For useful compiler options to use in that case, see How to remove "noise" from GCC/clang assembly output? (or just look at your code on Matt Godbolt's online Compiler Explorer which filters out directives and stuff, and has highlighting to match up source lines with asm using debug information.)

By default, this will output the file helloworld.s. The output file can still be set by using the -o option, including -o - to write to standard output for piping into less.

gcc -S -o my_asm_output.s helloworld.c

Of course, this only works if you have the original source.
An alternative if you only have the resultant object file is to use objdump, by setting the --disassemble option (or -d for the abbreviated form).

objdump -S --disassemble helloworld > helloworld.dump

-S interleaves source lines with normal disassembly output, so this option works best if the object file contains debug information (-g at compilation time) and the file hasn't been stripped.

Running file helloworld will give you some indication as to the level of detail that you will get by using objdump.

Other useful objdump options include -rwC (to show symbol relocations, disable line-wrapping of long machine code, and demangle C++ names). And if you don't like AT&T syntax for x86, -Mintel. See the man page.

So for example, objdump -drwC -Mintel -S foo.o | less.
-r is very important with a .o that only has 00 00 00 00 placeholders for symbol references, as opposed to a linked executable.

Printing strings in Assembly, calling that function in C

As was cleared up in comments, any interaction between the world outside and your code is done through system calls. C stdio functions format text into an output buffer, then write it with write(2). Or read(2) into an input buffer, and scanf or read lines from that.

Writing in asm doesn't mean you should avoid libc functions when they're useful, e.g. printf/scanf. Usually it only makes sense to write small parts of a program in asm for speed. e.g. write one function that has a hot loop in asm, and call it from C or whatever other language. Doing the I/O with all the necessary error-checking of system call return values would not be very fun in asm. If you're curious what happens under the hood, read the compiler output and/or single-step the asm. You'll sometimes learn nice tricks from the compiler, and sometimes you'll see it generate less efficient code than you could have written by hand.

This is a problem:

mov     rax,4       ; System call number (sys_write)
int     0x80        ; Call kernel

Although 64-bit processes can use the i386 int 0x80 system call ABI, it is the 32-bit ABI, with only 32-bit pointers and so on. You will have a problem as soon as you go to write(2) a char array that's on the stack, since amd64 Linux processes start with a stack pointer that has the high bits set. (Heap memory, and .data and .rodata memory mapped from the executable, are mapped into the lower 32 bits of address space.)

The native amd64 ABI uses syscall, and the system call numbers aren't the same as the i386 ABI. I found this table of syscalls listing the number and which parameter goes in which register. sys/syscall.h eventually includes /usr/include/x86_64-linux-gnu/asm/unistd_64.h to get the actual #define __NR_write 1 macros, and so on. There are standard rules for mapping arguments in order to registers. (Given in the ABI doc, IIRC).
