What is the meaning of each line of the assembly output of a C hello world?
Here is how it goes:
.file "test.c"
The original source file name (used by debuggers).
.section .rodata
.LC0:
.string "Hello world!"
A zero-terminated string is included in the section ".rodata" ("ro" means "read-only": the application will be able to read the data, but any attempt at writing into it will trigger an exception).
.text
Now we write things into the ".text" section, which is where code goes.
.globl main
.type main, @function
main:
We define a globally visible function called "main" (other object files will be able to invoke it).
leal 4(%esp), %ecx
We store in register %ecx
the value 4+%esp
(%esp
is the stack pointer).
andl $-16, %esp
%esp
is slightly modified so that it becomes a multiple of 16. For some data types (the floating-point format corresponding to C's double
and long double
), performance is better when the memory accesses are at addresses which are multiple of 16. This is not really needed here, but when used without the optimization flag (-O2
...), the compiler tends to produce quite a lot of generic useless code (i.e. code which could be useful in some cases but not here).
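The effect of the andl can be sketched in C. This is a minimal illustration, not compiler output: the function name align_down_16 and the sample stack pointer values are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Round a 32-bit address down to a multiple of 16, as `andl $-16, %esp` does.
   -16 as a 32-bit pattern is 0xFFFFFFF0, so the AND clears the low four bits. */
uint32_t align_down_16(uint32_t addr)
{
    return addr & (uint32_t)-16;
}

/* Self-check with made-up stack pointer values (assumptions, not real output). */
void align_demo(void)
{
    assert(align_down_16(0xBFFFF3ACu) == 0xBFFFF3A0u);
    assert(align_down_16(0xBFFFF3A0u) == 0xBFFFF3A0u); /* already aligned: unchanged */
}
```

Because the AND can only clear bits, the stack pointer moves down (or stays put), which is the safe direction on a stack that grows downward.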
pushl -4(%ecx)
This one is a bit weird: at that point, the word at address -4(%ecx)
is the word which was on top of the stack prior to the andl
. The code retrieves that word (which should be the return address, by the way) and pushes it again. This kind of emulates what would be obtained with a call from a function which had a 16-byte aligned stack. My guess is that this push
is a remnant of an argument-copying sequence. Since the function has adjusted the stack pointer, it must copy the function arguments, which were accessible through the old value of the stack pointer. Here, there is no argument, except the function return address. Note that this word will not be used (yet again, this is code without optimization).
pushl %ebp
movl %esp, %ebp
This is the standard function prologue: we save %ebp
(since we are about to modify it), then set %ebp
to point to the stack frame. Thereafter, %ebp
will be used to access the function arguments, making %esp
free again. (Yes, there is no argument, so this is useless for that function.)
pushl %ecx
We save %ecx
(we will need it at function exit, to restore %esp
at the value it had before the andl
).
subl $20, %esp
We reserve 20 bytes on the stack (remember that the stack grows "down"). That space will be used to store the arguments to printf()
(that's overkill, since there is a single argument, which will use 4 bytes [that's a pointer]).
movl $.LC0, (%esp)
call printf
We "push" the argument to printf()
(i.e. we make sure that %esp
points to a word which contains the argument, here $.LC0
, which is the address of the constant string in the rodata section). Then we call printf()
.
addl $20, %esp
When printf()
returns, we remove the space allocated for the arguments. This addl
cancels what the subl
above did.
popl %ecx
We recover %ecx
(pushed above); printf()
may have modified it (the calling conventions describe which registers a function may modify without restoring them upon exit; %ecx
is one such register).
popl %ebp
Function epilogue: this restores %ebp
(corresponding to the pushl %ebp
above).
leal -4(%ecx), %esp
We restore %esp
to its initial value. The effect of this opcode is to store in %esp
the value %ecx-4
. %ecx
was set in the first function opcode. This cancels any alteration to %esp
, including the andl
.
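To check that the epilogue really undoes the prologue, here is a small C sketch that tracks only the value of %esp through the instructions listed above. The function name trace_main_esp and the entry value are hypothetical. It also shows one thing worth noticing: the three pushes (12 bytes) plus the subl of 20 total 32 bytes, so %esp is again a multiple of 16 at the call.

```c
#include <assert.h>
#include <stdint.h>

/* Tracks only the value of %esp through main's prologue and epilogue.
   entry_esp is a hypothetical stack pointer value on entry to main.
   Stores %esp as it is at `call printf` in *esp_at_call and
   returns %esp as it is just before `ret`. */
uint32_t trace_main_esp(uint32_t entry_esp, uint32_t *esp_at_call)
{
    uint32_t esp = entry_esp;
    uint32_t ecx = esp + 4;      /* leal 4(%esp), %ecx */
    esp &= (uint32_t)-16;        /* andl $-16, %esp */
    esp -= 4;                    /* pushl -4(%ecx) */
    esp -= 4;                    /* pushl %ebp */
    esp -= 4;                    /* pushl %ecx */
    esp -= 20;                   /* subl $20, %esp */
    *esp_at_call = esp;          /* call printf happens here */
    esp += 20;                   /* addl $20, %esp */
    esp += 4;                    /* popl %ecx (gets back the saved value) */
    esp += 4;                    /* popl %ebp */
    esp = ecx - 4;               /* leal -4(%ecx), %esp */
    return esp;
}

/* Self-check with a made-up entry value. */
void trace_demo(void)
{
    uint32_t at_call;
    uint32_t at_ret = trace_main_esp(0xBFFFF3ACu, &at_call);
    assert(at_call % 16 == 0);       /* aligned when printf is called */
    assert(at_ret == 0xBFFFF3ACu);   /* back to the value it had on entry */
}
```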
ret
Function exit.
.size main, .-main
This sets the size of the main()
function: at any point during assembly, ".
" is an alias for "the address at which we are adding things right now". If another instruction was added here, it would go at the address specified by ".
". Thus, ".-main
", here, is the exact size of the code of the function main()
. The .size
directive instructs the assembler to write that information in the object file.
.ident "GCC: (GNU) 4.3.0 20080428 (Red Hat 4.3.0-8)"
GCC just loves to leave traces of its action. This string ends up as a kind of comment in the object file (in the .comment section). It has no effect on how the program runs.
.section .note.GNU-stack,"",@progbits
A special section where GCC writes that the code can accommodate a non-executable stack. This is the normal case. Executable stacks are needed for some special usages (not standard C). On modern processors, the kernel can make a non-executable stack (a stack which triggers an exception if someone tries to execute as code some data which is on the stack); this is viewed by some people as a "security feature" because putting code on the stack is a common way to exploit buffer overflows. With this section, the executable will be marked as "compatible with a non-executable stack" which the kernel will happily provide as such.
Understanding Assembly Hello World
I'm trying to understand what each line does.
That would fall under the general category of learning assembly language. There are entire books written about this topic; some of them are probably even pretty good. You should purchase one. To ensure that you get maximum bang for your buck, be sure to select one that focuses on the architecture and operating system you're interested in. x86 assembly language is, of course, always the same, but the programming model differs enough between Windows and Linux that the differences would be confusing to a beginner.
If you're too cheap to buy a book, at least read Matt Pietrek's classic series of articles, "Just Enough Assembly To Get By", from the Microsoft Systems Journal. Start here, and proceed to the follow-up.
The first line is push ebp. I know ebp stands for base pointer. What is its function?
I see that in the second line the value in esp is moved into ebp, and searching online I see that these first 2 instructions are very common at the beginning of an assembly program.
I'm new to assembly. Is ebp used for stack frames, so when we have a function in our code, and is it optional for a simple program?
To understand this first line in isolation, you just need to know what a PUSH
instruction does. It pushes the operand (in this case, a register) onto the top of the stack. EBP
is the register that almost always contains the stack base pointer.
That doesn't tell you much about the purpose of this code, though. This line and the next one are part of the standard function prologue. Matt talks about that near the beginning of his very first article, in the "Procedure Entry and Exit" section. First, the stack base pointer from EBP
is saved by PUSH
ing it onto the stack. Then, the second instruction copies the value of ESP
into the EBP
register. This makes interacting with the stack throughout the function easier. Generally, the prologue section would end with an instruction that reserved an arbitrary amount of space on the stack for temporary variables (e.g., sub esp, 8
to reserve 8 bytes on the stack). This function doesn't need any.
Yes, this prologue code is optional. If you don't need any stack space and/or you address the stack relative to ESP rather than EBP, then you don't need the standard prologue. Optimizing compilers often omit it when possible.
Though are ebp and esp empty at the beginning?
No, of course they are not empty. If they were empty, the code wouldn't bother to save the value of EBP
or use the value of ESP
.
In fact, no registers are empty at the beginning of a function. They contain one of three things: the values that the function's prototype (in conjunction with its calling convention) says that they do; values that you must preserve (that is, they must still have the same values when your function returns control that they did when your function was first called; these are called callee-save registers, and which ones they are differs depending on the calling convention); or what you can assume to be garbage values (these are the caller-save registers, and you are free to clobber them in the callee function's code).
Then
push offset aHelloWorld; "Hello world\n"
The part after
;
is a comment so it doesn't get executed right? The first part instead adds the address containing the string Hello World to the stack, right? But where is the string declared? I'm not sure I understand.
aHelloWorld
is a piece of global data declared in the executable image. It was put there at link time, probably because the original code used a string literal. This instruction PUSH
es the offset
of that global data (that is, its address) onto the stack.
Yes, the part after the semicolon is a comment. The disassembler is adding this comment as a favor to you. It has looked up the value of aHelloWorld
, determined that it contains the string Hello world\n
, and placed that definition in-line, saving you from having to look up the data's value yourself.
Then
call ds:__imp__printf
it seems it's a call to a function, anyway
printf
is a builtin function right?
Yes, CALL
always calls a function. In this case, it is calling the printf
function. Is it a "built-in" function? That depends on your definition. From the perspective of assembly language, no: no function is built-in. printf
is a function provided by the C standard library. When the original code was compiled and linked, it was also linked with the C run-time library, which provides the C standard library functions, including printf
. Since this is MSVC, the __imp__
prefix is a big hint that the function being called is part of either the standard library or the Windows API. These are implicitly linked functions.
Looking up the printf
function shows that it takes a variable number of arguments. In the most common x86-32 calling conventions, these arguments are passed on the stack. So that explains why the previous instruction PUSH
ed the address of string data onto the stack: it's passing that address to the printf
function so that string can be printed to the standard output. It could have passed additional arguments to printf
, but it didn't, because it didn't need to: it just needed one to print a literal string.
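For readers who have not written a variadic function, here is a minimal C sketch of one. The function sum_ints is hypothetical and has nothing to do with how printf is implemented internally; it only shows the caller passing a variable number of arguments and the callee walking through them.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function: adds up `count` int arguments.
   Like printf, it relies on its first argument to know how many follow. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* fetch the next argument */
    va_end(args);
    return total;
}

/* Self-check. */
void sum_demo(void)
{
    assert(sum_ints(3, 1, 2, 3) == 6);
    assert(sum_ints(0) == 0);
}
```

Only the caller knows how many arguments it actually passed, which is why the caller (not the callee) cleans up the stack for functions like this.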
And does
ds
stand for data segment register? Is it used because we are trying to access a memory operand that isn't on the stack?
Yes, DS is the data segment. Your disassembler is just being verbose here. In Windows, x86-32 uses a flat memory model, so you can basically ignore the segment registers entirely and still understand everything that is going on perfectly well.
then
add esp, 4
do we add 4 bytes to esp? Why?
Yes, this adds 4 bytes to the ESP
register. Why? To clean up the stack. Recall that before CALL
ing the printf
function, you PUSH
ed a 4-byte value (the offset of the string data in the executable image) on the stack. The printf
function is variadic (takes a variable number of arguments), so the caller is always responsible for cleaning up the stack after calling it.
Here, you can think of adding 4 to ESP
as equivalent to popping the stack with a POP
instruction. On x86, the stack always grows downwards, so adding is equivalent to popping (and the inverse of pushing).
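The relationship between PUSH, POP, and adding to ESP can be modeled with a toy stack in C. Everything here (the struct, the function names, the 64-byte size, the sample address) is invented for illustration; it is a sketch of the bookkeeping, not of real memory.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_STACK_BYTES 64

/* Toy model of a downward-growing stack of 32-bit words. */
struct toy_stack {
    uint32_t words[TOY_STACK_BYTES / 4];
    uint32_t esp;                  /* byte offset of the top of the stack */
};

void toy_init(struct toy_stack *s)     { s->esp = TOY_STACK_BYTES; }

void toy_push(struct toy_stack *s, uint32_t value)
{
    s->esp -= 4;                   /* the stack grows toward lower addresses */
    s->words[s->esp / 4] = value;
}

uint32_t toy_pop(struct toy_stack *s)
{
    uint32_t value = s->words[s->esp / 4];
    s->esp += 4;
    return value;
}

/* Equivalent of `add esp, n`: discard n bytes without reading them. */
void toy_discard(struct toy_stack *s, uint32_t nbytes) { s->esp += nbytes; }

/* Self-check: one pushed argument is cleaned up by adding 4. */
void toy_stack_demo(void)
{
    struct toy_stack s;
    toy_init(&s);
    toy_push(&s, 0x00402000u);     /* made-up address standing in for aHelloWorld */
    assert(s.esp == TOY_STACK_BYTES - 4);
    toy_discard(&s, 4);            /* add esp, 4 */
    assert(s.esp == TOY_STACK_BYTES);
}
```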
then
move eax, 1234h
what is 1234h here?
This instruction MOV
es the constant value 0x1234
(the h
means hexadecimal) into the EAX
register.
Why? Well, I can guess. In all of the x86 calling conventions, the EAX
register contains a function's return value. So it is very likely that the function's original code ended with return 0x1234;
.
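Putting the pieces together, the original C source plausibly looked something like the sketch below. This is a guess, not the asker's actual code: the function name say_hello is an assumption (the real function may well have been main), and the comments map each statement to the instructions discussed above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reconstruction of the disassembled function. */
int say_hello(void)
{
    printf("Hello world\n");   /* push offset aHelloWorld / call printf / add esp, 4 */
    return 0x1234;             /* mov eax, 1234h */
}

/* Self-check: the return value is what ended up in EAX. */
void say_hello_demo(void)
{
    assert(say_hello() == 0x1234);
}
```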
then
pop ebx
..it was pushed at the beginning. is it necessary to pop it at the end?
Actually, it pops EBP
, which is what was actually pushed at the beginning of the function.
And yes. Everything that you PUSH
onto the stack has to be POP
ed off the stack. (Or equivalent, as we saw earlier with ADD
ing to ESP
.) You have to clean up the stack. This is the function epilogue that corresponds to the prologue that we saw at the beginning. Refer back to Matt's article, where it talks about "Procedure Entry and Exit".
then
retn
(I knew about ret
for returning a value after calling a function). I read that the n in retn refers to the number of pushed arguments by the caller.
This is just an idiosyncrasy of your disassembler again. IDA Pro uses the retn
mnemonic. This actually means a near return, but since x86-32 uses a flat (non-segmented) memory model, the near vs. far distinction is not relevant. You can think of retn
as simply being equivalent to ret
.
Note that this is distinct from the ret
instruction that takes an argument, which is what you're thinking of. It doesn't "return" its argument, though. The function returns its result in the EAX
register. Rather, ret n
(where n
is a 16-bit immediate value) returns and pops the specified number of bytes off the stack. This is used only for certain calling conventions (most commonly __stdcall
) where the callee is responsible for cleaning up the stack.
See links in the x86 tag wiki and Wikipedia for more information on calling conventions.
It isn't very clear for me.
Can you help me to understand?
Did I mention you should get a book that teaches assembly language programming?
Number of executed Instructions different for Hello World program Nasm Assembly and C
The number of instructions executed in program 1) is high because of linking the program with system libraries at runtime?
Yep, dynamic linking plus CRT (C runtime) startup files.
used
-static
and which reduces the count by a factor of 1/10.
So that just left the CRT start files, which do stuff before calling main
, and after.
How can I ensure that the instruction count is only that of the main function in Program 1)`
Measure an empty main
, then subtract that number from future measurements.
Unless your instruction counter is smarter, and looks at symbols in the executable for the process it's tracing, it won't be able to tell which code came from where.
and which is how Program 2) is reporting for the debugger.
That's because there is no other code in that program. It's not that you somehow helped the debugger ignore some instructions, it's that you made a program without any instructions you didn't put there yourself.
If you want to see what actually happens when you run the gcc output, gdb a.out
, b _start
, r
, and single-step. Once you get deep in the call tree, you're prob. going to want to use fin
to finish execution of the current function, since you don't want to single-step through literally 1 million instructions, or even 10k.
related: How do I determine the number of x86 machine instructions executed in a C program? shows perf stat
will count 3 user-space instructions total in a NASM program that does mov eax, 231
/ syscall
, linked into a static executable.
What is function in assembly?
ELF symbol metadata can be set by some assemblers, e.g. in NASM, global main:function
to mark the symbol type as FUNC. (https://nasm.us/doc/nasmdoc8.html#section-8.9.5).
The GAS syntax equivalent (which C compilers emit) is .type main, @function
. e.g. put some code on https://godbolt.org and disable filtering to see asm directives in compiler output.
But note this is just metadata for linkers and debuggers to use; the CPU doesn't see that when executing. That's why nobody bothers with it for NASM examples.
Assembly language doesn't truly have functions, just the tools to implement that concept, e.g. jump and store a return address somewhere = call
, indirect jump to a return address = ret
. On x86, return addresses are pushed and popped on the stack.
The model of execution is purely sequential and local, one instruction at a time (on most ISAs, but some ISAs are VLIW and execute 3 at a time for example, but still local in scope), with each instruction just making a well-defined change to the architectural state. The CPU itself doesn't know or care that it's "in a function" or anything about nesting, other than the return-address predictor stack which optimistically assumes that ret
will actually use a return address pushed by a corresponding call
. But that's a performance optimization; you do sometimes get mismatched call/ret if code is doing something weird (e.g. a context switch).
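The idea that call and ret are just "push a return address and jump" and "pop an address and jump" can be sketched with a toy machine in C. The struct, the function names, and the addresses are all made up for the illustration; real CPUs keep the stack in memory rather than in a small array.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine state: a program counter and a small downward-growing stack. */
struct toy_cpu {
    uint32_t pc;          /* address of the instruction being executed */
    uint32_t sp;          /* index of the top-of-stack slot */
    uint32_t stack[8];
};

/* call target: push the address of the next instruction, then jump. */
void toy_call(struct toy_cpu *cpu, uint32_t target, uint32_t insn_len)
{
    cpu->sp -= 1;
    cpu->stack[cpu->sp] = cpu->pc + insn_len;   /* return address */
    cpu->pc = target;
}

/* ret: pop an address off the stack and jump to it. */
void toy_ret(struct toy_cpu *cpu)
{
    cpu->pc = cpu->stack[cpu->sp];
    cpu->sp += 1;
}

/* Self-check with made-up addresses: a 5-byte call at 0x1000 to 0x2000. */
void toy_cpu_demo(void)
{
    struct toy_cpu cpu = { .pc = 0x1000, .sp = 8 };
    toy_call(&cpu, 0x2000, 5);
    assert(cpu.pc == 0x2000);
    assert(cpu.stack[cpu.sp] == 0x1005);
    toy_ret(&cpu);
    assert(cpu.pc == 0x1005);
    assert(cpu.sp == 8);
}
```

Nothing in toy_ret checks that the popped value was put there by toy_call, which mirrors the point above: the machine has no notion of being "in a function".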
A C compiler won't put any instructions outside of functions.
(Technically the _start
entry point that indirectly calls main
isn't a function; it can't return and has to make an exit
system call, but that's written in asm and is part of libc. It's not generated by the C compiler proper, only linked with the C compiler's output to make a working program.) See "Linux x86 Program Start Up
or - How the heck do we get to main()?" for example.
How do you get assembler output from C/C++ source in GCC?
Use the -S option to gcc
(or g++
), optionally with -fverbose-asm which works well at the default -O0 to attach C names to asm operands as comments. It works less well at any optimization level, which you normally want to use to get asm worth looking at.
gcc -S helloworld.c
This will run the preprocessor (cpp) over helloworld.c, perform the initial compilation and then stop before the assembler is run. For useful compiler options to use in that case, see How to remove "noise" from GCC/clang assembly output? (or just look at your code on Matt Godbolt's online Compiler Explorer which filters out directives and stuff, and has highlighting to match up source lines with asm using debug information.)
By default, this will output the file helloworld.s
. The output file can still be set by using the -o option, including -o -
to write to standard output for piping into less.
gcc -S -o my_asm_output.s helloworld.c
Of course, this only works if you have the original source.
An alternative if you only have the resultant object file is to use objdump, by setting the --disassemble
option (or -d
for the abbreviated form).
objdump -S --disassemble helloworld > helloworld.dump
-S
interleaves source lines with normal disassembly output, so this option works best if debugging information is enabled for the object file (-g at compilation time) and the file hasn't been stripped.
Running file helloworld
will give you some indication as to the level of detail that you will get by using objdump.
Other useful objdump
options include -rwC
(to show symbol relocations, disable line-wrapping of long machine code, and demangle C++ names). And if you don't like AT&T syntax for x86, -Mintel
. See the man page.
So for example, objdump -drwC -Mintel -S foo.o | less.
-r
is very important with a .o
that only has 00 00 00 00
placeholders for symbol references, as opposed to a linked executable.
Printing strings in Assembly, calling that function in C
As was cleared up in comments, any interaction between the world outside and your code is done through system calls. C stdio functions format text into an output buffer, then write it with write(2)
. Or read(2)
into an input buffer, and scanf or read lines from that.
Writing in asm doesn't mean you should avoid libc
functions when they're useful, e.g. printf/scanf. Usually it only makes sense to write small parts of a program in asm for speed. e.g. write one function that has a hot loop in asm, and call it from C or whatever other language. Doing the I/O with all the necessary error-checking of system call return values would not be very fun in asm. If you're curious what happens under the hood, read the compiler output and/or single-step the asm. You'll sometimes learn nice tricks from the compiler, and sometimes you'll see it generate less efficient code than you could have written by hand.
This is a problem:
mov rax,4 ; System call number (sys_write)
int 0x80 ; Call kernel
Although 64bit processes can use the i386 int 0x80
system call ABI, it is the 32-bit ABI, with only 32-bit pointers and so on. You will have a problem as soon as you go to write(2)
a char array that's on the stack, since amd64 Linux processes start with a stack pointer that has the high bits set. (Heap memory, and .data
and .rodata
memory mapped from the executable, are mapped into the low 32 bits of address space.)
The native amd64 ABI uses syscall
, and the system call numbers aren't the same as the i386 ABI. I found this table of syscalls listing the number and which parameter goes in which register. sys/syscall.h
eventually includes /usr/include/x86_64-linux-gnu/asm/unistd_64.h
to get the actual #define __NR_write 1
macros, and so on. There are standard rules for mapping arguments in order to registers. (Given in the ABI doc, IIRC).
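If you want to see the native system call from C before writing it in asm, the sketch below uses the libc syscall() wrapper with the SYS_write constant from sys/syscall.h. It assumes Linux with glibc; the function name write_stdout is made up for the example. On x86-64 the wrapper ends up executing the syscall instruction with the number from unistd_64.h.

```c
#define _GNU_SOURCE            /* assumption: glibc; exposes syscall() in unistd.h */
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Writes the string s to standard output (file descriptor 1) with a raw
   write system call. Returns the number of bytes written, or -1 on error. */
long write_stdout(const char *s)
{
    return syscall(SYS_write, 1, s, strlen(s));
}

/* Self-check: a 6-byte string should be written in full to a normal stdout. */
void write_demo(void)
{
    assert(write_stdout("Hello\n") == 6);
}
```

Checking the return value matters even in a sketch like this: write may write fewer bytes than requested, and a real program would loop or report the error.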