Why is the value of EDX overwritten when making call to printf?
According to the x86 ABI, EBX, ESI, EDI, and EBP are callee-save registers and EAX, ECX and EDX are caller-save registers.
This means that functions can freely use and destroy the previous values of EAX, ECX, and EDX.
For that reason, save the values of EAX, ECX, and EDX before calling functions if you don't want them to change. That is what "caller-save" means.
Or better, use other registers for values that you're still going to need after a function call. A push/pop of EBX at the start/end of a function is much better than a push/pop of EDX inside a loop that makes a function call. When possible, use call-clobbered registers for temporaries that aren't needed after the call. Values that are already in memory, so they don't need to be written before being re-read, are also cheaper to spill.
Since EBX, ESI, EDI, and EBP are callee-save registers, a function has to restore the original value of any of them that it modifies before returning.
ESP is also callee-saved, but you can't mess this up unless you copy the return address somewhere.
x86: Why does one stack-allocated array overwrite the other?
In several places there is code like this: lea rsi, [names_ptr+rbx*8] or mov rax, [grades_ptr+rbx*8]
That doesn't indirect through a pointer in memory like you want. What it does is index relative to the address of the variable names_ptr, rather than to the pointer stored in that variable.
To fix this, you have to load the pointer into a register and then do the index operation. So you could replace the first one with something like:
mov rsi, [names_ptr]
lea rsi, [rsi+rbx*8]
Even better would be to take advantage of the registers available. Put names_ptr in r14 and grades_ptr in r15. Then you can use lea rsi, [r14+rbx*8] without an additional load each time.
Be sure to push r14 and r15 (and also rbx) at the beginning of the function. It's not strictly necessary since you don't return from the function, but it's a good habit.
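The difference between the two addressing forms can be sketched in C. This is only an analogy under assumptions: names and names_ptr here are hypothetical stand-ins (an array of 8-byte entries and a variable holding a pointer to it), not the question's actual data, and the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: an array of 8-byte entries, and a variable
   that holds a pointer to that array. */
static int64_t names[4];
static int64_t *names_ptr = names;

/* What lea rsi, [names_ptr+rbx*8] computes: the address of the variable
   itself plus i*8. The pointer stored in the variable is never loaded. */
uintptr_t wrong_element_address(size_t i) {
    return (uintptr_t)&names_ptr + i * 8;
}

/* What mov rsi, [names_ptr] followed by lea rsi, [rsi+rbx*8] computes:
   load the stored pointer first, then index from it. */
uintptr_t right_element_address(size_t i) {
    return (uintptr_t)names_ptr + i * 8;
}
```

Only the second function yields the address of names[i]; the first yields an address just past the pointer variable, which is how unrelated data ends up overwritten.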
Why does %rdi still have the right value, after I clobber it and call printf?
You didn't specify which x86-64 ABI you're using, but from your use of %rdi / %rsi for arg passing, I'll assume you're targeting the SysV ABI (everything except Windows). See the x86 wiki for links to docs and stuff.
... clobbering of return values from first two sidefx() calls... In order to fix this problem I need to store the return values either somewhere in the stack or maybe in callee-saved registers.
That's correct. gcc prefers using call-preserved regs, because then you don't have to fiddle with the stack alignment when pushing or popping between calls.
Why is the final printf returning one: 1, two: 2147483641, three: 3? Shouldn't the first number printed also be mangled like what happened to the second number due to the succeeding sidefx calls?
It's just a coincidence that %rdi=1 when you call threeargs(). If you single-step your code, you'd probably find it happens to have that value when printf returns. It's not from saving/restoring, since the original value is destroyed by movq $.string1, %rdi before the call to printf. It just happens that 1 is a common thing to find in a register.
Best guess: 1 is the file-descriptor arg to the write(2) system call, which is the last thing printf needed to do before returning. (Because stdout is line-buffered).
Your C doesn't match your implementation. In the asm, global_a is 8 bytes, but in C you're treating it as a 4 byte integer (printing with %d, not %ld). Your C doesn't declare it at all. I was going to edit a declaration into the question, but you should resolve the ambiguity yourself (between long global_a = 0; or int global_a = 0;). The AMD64 SysV ABI specifies that long is 8 bytes. Use int64_t whenever you're writing portable C, though. There's no harm in writing int64_t when interoperating with asm, even when you do happen to know the sizes of short, int and long in the ABI you're using.
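A minimal sketch of the int64_t option, assuming the variable is meant to be 8 bytes as in the asm. The helper name format_global_a is hypothetical; it formats into a buffer instead of printing so the result can be inspected.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed declaration: the asm reserves 8 bytes for global_a, so the C
   side uses an exact-width 64-bit type. */
int64_t global_a = 0;

/* Formats global_a into buf and returns snprintf's character count.
   PRId64 expands to the matching conversion for int64_t on the current
   platform, so the format string never disagrees with the type. */
int format_global_a(char *buf, size_t size) {
    return snprintf(buf, size, "%" PRId64, global_a);
}
```

With this declaration the format specifier and the object size stay in sync whether long is 4 or 8 bytes in the target ABI.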
Avoid the enter instruction, unless you only care about code size, and not speed. It's horribly slow. leave is ok, maybe slower than mov %rbp, %rsp / pop %rbp, but usually you only need pop %rbp because you either didn't modify %rsp, or you needed to restore rsp anyway with add $something, %rsp before popping some other registers that you saved after %rbp.
Zeroing 64bit registers with xor %eax,%eax (2 bytes) has many advantages beyond code-size over mov $0, %rax (7 bytes: mov $sign-extended-imm32, r64).
Compare your code with compiler output: gcc -fverbose-asm -O3 -fno-inline will actually generate code from your C; all you need is a declaration of a, and to make main return an int, and it compiles just fine as C11. Of course, it mostly uses 32bit operand size because you used int, but the data movement (which thing goes in which register) is the same.
Also, the order of evaluation of argument lists is not specified, so threeargs(sidefx(), sidefx(), sidefx()) can pass the three return values in any order. (Strictly, this is unspecified rather than undefined behaviour: the side effects happen inside function calls, which are indeterminately sequenced rather than unsequenced.) I guess this is why you called it pseudo-code, not C, but it's a poor way to express what you mean.
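One way to pin down the order in C is to make each call its own statement. The sketch below assumes sidefx increments the global, prints it, and returns the new value (which is what the compiler output suggests); the format string and the helper collect_in_order are hypothetical.

```c
#include <stdio.h>

/* Stand-in for the question's global. */
int a = 0;

/* Assumed behaviour: bump the global, print it, return the new value. */
int sidefx(void) {
    ++a;
    printf("a = %d\n", a);
    return a;
}

/* Each call is a separate full expression, so the three calls happen in
   source order regardless of how the compiler evaluates argument lists. */
void collect_in_order(int out[3]) {
    out[0] = sidefx();
    out[1] = sidefx();
    out[2] = sidefx();
}
```

The caller can then write threeargs(out[0], out[1], out[2]) and get the same values in the same positions with any conforming compiler.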
Anyway, here's your code on the Godbolt Compiler Explorer from gcc 5.3 -O3.
threeargs uses a jmp to tail-call printf, instead of call/ret.
The significant differences in main are all about correctly saving the return values from sidefx. Note that a=0 in main is not needed, because it's already initialized to zero by being in the BSS, but even with -fwhole-program, gcc can't optimize it away. (A constructor could modify a before main runs, or maybe after linking a different definition of a could be used, that has a different initializer.)
The implementation of sidefx is noticeably tighter than yours:
sidefx:
subq $8, %rsp # aligns the stack for another function call
movl a(%rip), %eax # a, tmp94 # load `a`
movl $.LC0, %edi #, # the format string
leal 1(%rax), %esi #, D.2311 # esi = a+1
xorl %eax, %eax # # only needed because printf is a varargs function. Your `main` is doing this unnecessarily.
movl %esi, a(%rip) # D.2311, a # store back to the global
call printf #
movl a(%rip), %eax # a, # reload a
addq $8, %rsp #,
ret
IDK why gcc didn't load into %esi in the first place, and inc %esi instead of using lea to add one and store in a different dest. Your version moves an immediate 1 into a register, which is silly. Use immediate operands, and lea. The CPU designers already paid the x86 tax (extra design complexity to support the CISC instruction set), so make sure you get your money's worth by taking full advantage of lea and immediate operands.
Note that it doesn't store/reload a before the call to printf. Your version doesn't need to do that.
Also note that none of the functions waste instructions making stack frames.
Why does the compiler make space on the stack
You should compile your (real) code with gcc -S -fverbose-asm -O if you want to look into the generated .s assembler file.
Notice that recent ABIs and calling conventions require the stack pointer to be at least 16-byte aligned (in particular, for compatibility with SSE or AVX). Read also about the red zone (as suggested by Zang Ming Jie).
But why did compiler put a subq $32, %rsp line here? Why doesn't it appear in the first example, without printf statement?
Probably because without any calls to printf your main has become a leaf routine.
So the compiler doesn't need to update %rsp to be ABI compliant (in the called printf call frame).
Loop with printf in NASM
Several issues as mentioned in comments:
- array resb 10 will reserve space for 10 bytes, but you want to store 10 dwords there (40 bytes). Change to array resd 10.
- (Pointed out by Sep Roland) In _loop you have an off-by-one bug; since the inc is done before the mov you will access the dwords at [array+4], [array+8], ... [array+40], where the last one is out of range. This is like doing int array[10]; for (i=1; i <= 10; i++) array[i]=i; in C, and is incorrect for exactly the same reason. One fix would be to do mov [array + ecx * 4 - 4], ecx instead.
- After _loop2 you have jmp print, which will transfer control to print and never come back. Since you apparently want to call print as a subroutine and continue executing with add ecx, 1 ; cmp ecx, 10, etc, you need to call print instead of jmp. And also uncomment the ret at the end of print so that it will actually return. Subroutines in assembly language don't automatically return unless you actually execute ret; otherwise the CPU will just continue executing whatever garbage happens to be next in memory.
- You have a push ecx to save the value of ecx before the call to printf, which is good since printf will overwrite that register, but you need to pop ecx afterwards to get that value back and put the stack back to where it was. Specifically, the pop ecx should follow the add esp, 8; a stack is a last-in-first-out structure, and the push ecx was before the pushing of the printf arguments, so you need to pop ecx after removing those arguments from the stack.
- The mov eax, 0 as a return value at the end of print is unnecessary since you never use it anywhere else.
With these changes the code works as it should.
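For comparison, the off-by-one fix looks like this in C. It is a sketch of the same indexing idea only, not a translation of the whole program; COUNT and fill_one_based are names chosen for illustration.

```c
#define COUNT 10

/* Fills array[0..COUNT-1] with the values 1..COUNT. The counter i runs
   from 1 to COUNT like ecx in the loop, so the store uses index i - 1,
   which is the C counterpart of mov [array + ecx * 4 - 4], ecx. */
void fill_one_based(int array[COUNT]) {
    for (int i = 1; i <= COUNT; i++) {
        array[i - 1] = i;
    }
}
```

Writing array[i] = i with the same loop bounds would touch array[10], one element past the end, just as the original asm touched [array+40].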