Should %Rsp Be Aligned to 16-Byte Boundary Before Calling a Function in Nasm

Should %rsp be aligned to 16-byte boundary before calling a function in NASM?

The ABI is a set of rules for how functions should behave to be interoperable with each other. Each of the rules on one side are paired with allowed assumptions on the other. In this case, the rule about stack alignment for the caller is an allowed assumption about stack alignment for the callee. Since your inc function doesn't depend on 16-byte stack alignment, it's fine to call that particular function with a stack that's only 8-byte aligned.

If you're wondering why it didn't break when you enabled AC, that's because you're only loading 8-byte values from the stack, and the stack is still 8-byte aligned. If you did sub rsp, 4 or something to break 8-byte alignment too, then you would get a bus error.

Where the ABI becomes important is when the situation isn't one function you wrote yourself in assembly calling another function you wrote yourself in assembly. A function in someone else's library (including the C standard library), or one that you compiled from C instead of writing in assembly, is within its rights to do movaps [rsp - 24], xmm0 or something, which would break if you didn't properly align the stack before calling it.

Side note: the ABI also says how you're supposed to pass parameters (the calling convention), but you're just kind of passing them wherever. Again, fine from your own assembly, but they'll definitely break if you try to call them from C.

What size should I maintain for %rsp when alloc stack in asm, multiple of 16 or multiple of 16 plus 8


But In ABI ,it says rsp must be multiple of 16 in program entry

_start is not a function. It's not called by anything, there is no return address on the stack (just argc and the actual argv[] and envp[] arrays).

Yes, on process entry RSP is already 16-byte aligned, ready for a function call.


I edited Jester's answer again on the question you linked to clarify it.

16-byte aligned before a call is the requirement. You get back to that with an offset of 16 * n + 8 inside your function before another call, including any pushes.

Why linux nasm working even WITHOUT 16 bytes stack alignment

Just luck.

One of the main reasons for requiring stack alignment is so that functions can safely use SSE aligned data instructions such as movaps, which fault if used with unaligned data. However, if puts happens to not use any such instructions, or to do anything else which genuinely requires stack alignment, then no fault will occur (though there may still be a performance penalty).

The compiler is entitled to assume that the stack is aligned and that it can use those instructions if it feels like it. So at any time, if your libc is upgraded or recompiled, it may happen that the new version of puts uses such instructions and your code will mysteriously begin failing.

Obviously you don't want that, so align the darn stack like you're supposed to.

There are relatively few situations in C or assembly programming where violating rules like this is guaranteed to segfault or fail in any other predictable way; instead people say things like "the behavior is undefined" meaning that it could fail in any way you can imagine, or then again it might not. So you really can't draw any conclusions from an experiment where illegal code happens to appear to work. Trial and error is not a good way to learn assembly programming.

Understanding stack alignment

rsp % 16 == 0 at _start - that's the OS entry point. It's not a function (there's no return address on the stack, instead RSP points at argc).
Unlike functions, RSP is aligned by 16 on entry to _start, as specified by the x86-64 System V ABI.

From _start, you're ready to call a function right away, without having to adjust the stack, because the stack should be aligned before call. call itself will add 8B of return address, and you can expect the rsp % 16 == 8 upon entry, one more push away from 16-byte alignment. That's guaranteed upon entry to any function1.

Upon app entry, you can trust the kernel to give you 16-byte RSP alignment, or you could align the stack manually with and rsp, -16 before calling any other code conforming to ABI. (Or if you plan to use C runtime lib, then the entry point of your app code should be main, and let libc's crt startup code code run as _start. main is a normal function like any other, so RSP & 0xF == 0x8 on entry to it when it's eventually called.)

Footnote 1: Unless you build with special options that change the ABI, like -mpreferred-stack-boundary=3 instead of the default 4. But that would make it unsafe to call functions in any code compiled without that. For example glibc scanf Segmentation faults when called from a function that doesn't align RSP



Now, after pushing the content of rsp became 0x7fffffffdce8. Is it a violation of the alignment requirements?

Yes, if you would at that point call some more complex function like for example printf with non trivial arguments (so it would use SSE instruction for implementation), it will highly likely segfault.


About push byte 0xFF:

That's not legal instruction in 64b mode (not even in 16 and 32 bit modes) (not legal in the sense of byte operand target size, byte immediate as source value is legal, but operand size can be only 16, 32 or 64 bits), so the NASM will guess the target size (any from legal ones, naturally picking qword in 64b mode), and use the guessed target size with the imm8 from source.

BTW use -w+all option to make the NASM emit (sort of weird, but at least you can investigate) warning in such case:

warning: signed byte value exceeds bounds

For example legit push word 0xFF would push only two bytes to stack, of word value 0x00FF.


How to align the stack: if you already know initial alignment, just adjust as needed before calling some ABI requiring subroutine (in common 64b code that is usually as simple as either not pushing anything, or doing one more redundant push, like push rbp).

If you are not sure about alignment, use some spare register to store original rsp (often rbp is used, so it also functions as stack frame pointer), and then and rsp,-16 to clear the bottom bits.

Keep in mind, when creating your own ABI conforming subroutines, that stack was aligned before call, so it is -8B upon entry. Again simple push rbp is often enough to resolve several issues at the same time, preserving rbp value (so mov rbp, rsp is possible "for free") and aligning stack for rest of subroutine.


EDIT: about encoding, source size, and immediate size...

Unfortunately I'm not 100% sure about how exactly this is supposed to be defined in NASM, but I think actually the push definition is so complex, that it breaks NASM syntax a bit (exhausting the current syntax to a point where you can't specify whether you mean operand size, or source immediate size, so it is silently assumed the size specifier is operand size mainly and affects immediate in certain cases).

By using push byte 0xFF the NASM will take the byte part ALSO as "operand size", not just as immediate size. And byte is not legal operand size for push, so NASM will instead choose qword as by default in 64b mode. Then it will also consider the byte as immediate size, and sign-extend the 0xFF to qword. I.e. this looks to me as a bit of undefined behaviour. NASM creators probably don't expect you to specify immediate size, because the NASM optimizes for size, so when you do push word -1, it will assemble that as "push word operand imm8". You can override that the other way, to make sure you get imm16 by push strict word -1.

See the machine code produced by the various combinations (in 64b mode) (some of them speaking strictly are worth at least of warning, or even error, like "strict qword" producing only imm32, not imm64 (as imm64 opcode does not exist of course) ... not even mentioning that the dword variants are effectively qword operand sizes, you can't use 32b operand size in 64b mode):

 6 00000000 6AFF                            push    -1
7 00000002 6AFF push strict byte 0xFF
8 ****************** warning: signed byte value exceeds bounds
9 00000004 6AFF push byte 0xFF
10 ****************** warning: signed byte value exceeds bounds
11 00000006 6AFF push strict byte -1
12 00000008 6AFF push byte -1
13 0000000A 6668FF00 push strict word 0xFF
14 0000000E 6668FF00 push word 0xFF
15 00000012 6668FFFF push strict word -1
16 00000016 666AFF push word -1
17 00000019 68FF000000 push strict dword 0xFF
18 0000001E 68FF000000 push dword 0xFF
19 00000023 68FFFFFFFF push strict dword -1
20 00000028 6AFF push dword -1
21 0000002A 68FF000000 push strict qword 0xFF
22 0000002F 68FF000000 push qword 0xFF
23 00000034 68FFFFFFFF push strict qword -1
24 00000039 6AFF push qword -1

Anyway, I guess not too many people are bothered by this, as in 64b mode you usually want qword push (rsp -= 8) with immediate encoded in shortest possible way, so you just write push -1 and let the NASM handle the imm8 optimization itself, expecting rsp to change by -8 of course. And in other case, they probably expect you to know legal operand sizes, and not to use byte at all.

If you think this is not acceptable, I would raise this on the NASM forum/bugzilla/somewhere, how it is supposed to work exactly. As far as I'm personally concerned, the current behaviour is "good enough" for me (makes both sense, plus I give quick look to listing file from time to time to verify there's no nasty surprise in the machine code bytes and it landed as expected). That said, I mostly code size intros, so I know about every byte produced and it's purpose. If the NASM would suddenly produce imm16 instead of expected imm8, I would see it on the binary size and investigate.

After entering _start, is rsp aligned?

The x86-64 ABI explicitly says (3.4.1 Initial Stack and Register State) :

%rsp The stack pointer holds the address of the byte with lowest
address which is part of the stack. It is guaranteed to be 16-byte
aligned at process entry.

Since _start is the first symbol that's called when a process is entered, you can be entirely sure that it is 16-byte aligned when the OS calls _start in your executable.



Related Topics



Leave a reply



Submit