x64 (64-bit) Assembly

Where to learn x64 assembly from?

I think this is a valid question. It can be a bit hard to find up-to-date information on assembly.

  1. Yes. Many printed and online resources describe i386 (32-bit x86) assembly rather than the newer amd64 (x86-64). Many things have changed. For example, function arguments used to be passed on the stack in the common 32-bit calling conventions, but the 64-bit calling conventions pass the first few arguments in registers. This applies to both Unix and Windows.

    Basically, make sure you are reading about 64-bit assembly.

  2. Why not try the online manual? Note that assemblers are not that different when it comes to simple stuff.

  3. I don't have Windows, but if you are having trouble with a specific program, you could ask about it here. You will need to post the actual failing program.

  4. Yes, but you have to link it against 32-bit libraries and use the 32-bit version of the Win32 API. But if you're starting out, why program in the 1980s when you can work with the modern instruction set?

  5. Try this.

Also, if you are going to be programming in assembly, I strongly recommend getting a debugger. I use GDB, which works well and is free. On Windows you get to use the Visual Studio debugger, which is superb.

Basics of 64-bit assembly programming in NASM

  1. Zero.
  2. 10 is not the size of msg. Those are just embedded line feeds (ASCII code 10). The size is calculated as len equ $-msg (the current address minus the address of msg), so it will always match the length of the text you provide.
  3. The sign bit is the most significant bit of the operand size. Since you have qwords, 11234H is 0000000000011234H and is positive. FFFFFFFFH is 00000000FFFFFFFFH, a large positive number. FFFFFFFFFFFFFFFFH is either an even larger number or -1, depending on whether you interpret it as unsigned or signed.
  4. Nothing special, it's just 1. It adds one to the value in rsi, which means 1 byte when rsi is later used as an address.
  5. That's just because your terminal uses ASCII codes and therefore prints 0 for a value of 30h (and so on). Yes, you will need to convert from ASCII to binary if you read text.
  6. That is correct.
  7. You described it correctly, but that is not reversed. Humans write the most significant digit first, so it makes sense that the code puts that digit first and increments rsi to put the other digits after it. Not sure why you think that is reversed.
  8. Because print is invoked for each array element separately, shortening the output applies to each element, not to the whole output. It will be the higher digits, of course, since that's the order they are in memory. See point 7, above.

Is x86 32-bit assembly code valid x86 64-bit assembly code?

A modern x86 CPU has three main operation modes (this description is simplified):

  • In real mode, the CPU executes 16 bit code with paging and protected-mode segmentation disabled. Memory addresses in your code refer to physical addresses: the content of the segment register is shifted left by 4 bits and added to the offset to form the physical address.
  • In protected mode, the CPU executes 16 bit or 32 bit code depending on the segment selector in the CS (code segment) register. Segmentation is enabled, and paging can be (and usually is) enabled. Programs can switch between 16 bit and 32 bit code by far-jumping to an appropriate segment. The CPU can enter the sub-mode virtual 8086 mode to emulate real mode for individual processes from inside a protected mode operating system.
  • In long mode, the CPU executes 64 bit code. Segmentation is mostly disabled, paging is enabled. The CPU can enter the sub-mode compatibility mode to execute 16 bit and 32 bit protected mode code from within an operating system written for long mode. Compatibility mode is entered by far-jumping to a CS selector with the appropriate bits set. Virtual 8086 mode is unavailable.

Wikipedia has a nice table of x86-64 operating modes, including legacy and real modes and all 3 sub-modes of long mode (64-bit mode, plus compatibility mode running 32-bit or 16-bit code). Under a mainstream x86-64 OS, once booted, all CPU cores are always in long mode, switching between its sub-modes depending on whether user space is 32 or 64-bit. (Not counting System Management Mode interrupts...)


Now what is the difference between 16 bit, 32 bit, and 64 bit mode?

16-bit and 32-bit modes are basically the same, except for the following differences:

  • In 16 bit mode, the default address and operand width is 16 bit. You can change these to 32 bit for a single instruction using the 0x67 and 0x66 prefixes, respectively. In 32 bit mode, it's the other way round.
  • In 16 bit mode, the instruction pointer is truncated to 16 bit, so jumping to addresses above 65535 (0xFFFF) can lead to weird results.
  • VEX/EVEX encoded instructions (including those of the AVX, AVX2, BMI, BMI2 and AVX512 instruction sets) aren't decoded in real or Virtual 8086 mode (though they are available in 16 bit protected mode).
  • 16 bit mode has fewer addressing modes than 32 bit mode, though it is possible to override to a 32 bit addressing mode on a per-instruction basis if the need arises.

Now, 64 bit mode is somewhat different. Most instructions behave just like in 32 bit mode, with the following differences:

  • There are eight additional registers named r8, r9, ..., r15. Each register can be used as a byte, word, dword, or qword register. The family of REX prefixes (0x40 to 0x4f) encode whether an operand refers to an old or new register. Eight additional SSE/AVX registers xmm8, xmm9, ..., xmm15 are also available.
  • You can only push/pop 64 bit and 16 bit quantities (though you shouldn't do the latter); 32 bit quantities cannot be pushed/popped.
  • The single-byte inc reg and dec reg instructions are unavailable; their opcodes (0x40 to 0x4f) have been repurposed for the REX prefixes. The two-byte inc r/m and dec r/m forms are still available, so inc reg and dec reg can still be encoded.
  • A new instruction-pointer-relative (RIP-relative) addressing mode exists, using the shorter of the 2 redundant ways 32-bit mode had to encode a [disp32] absolute address. (Absolute [disp32] addressing is still possible through the longer form with a SIB byte.)
  • The default address width is 64 bit; a 32 bit address width can be selected through the 0x67 prefix. 16 bit addressing is unavailable.
  • The default operand width is 32 bit. A width of 16 bit can be selected through the 0x66 prefix, and a 64 bit width through a REX prefix with the W bit set (REX.W), independently of which registers you use.
  • It is not possible to use ah, bh, ch, and dh in an instruction that requires a REX prefix. With a REX prefix, those register numbers instead mean the low 8 bits of sp, bp, si, and di (ah, ch, dh, bh become spl, bpl, sil, dil, respectively).
  • Writing to the low 32 bits of a 64 bit register zero-extends into the full register (clears the upper 32 bits), avoiding false dependencies for out-of-order execution. (Writing 8 or 16-bit partial registers still merges with the old 64-bit value.)
  • As segmentation is nonfunctional, segment overrides are meaningless no-ops, except for the fs and gs overrides (0x64, 0x65), which serve to support thread-local storage (TLS).
  • Also, many instructions that specifically deal with segmentation are unavailable: push/pop seg (except push/pop fs/gs), arpl, call far (only the 0xff encoding is valid), les, lds, and jmp far (only the 0xff encoding is valid).
  • Instructions that deal with decimal arithmetic are unavailable: daa, das, aaa, aas, aam, aad.
  • Additionally, the following instructions are unavailable: bound (rarely used), pusha/popa (not useful with the additional registers), and salc (undocumented).
  • The 0x82 instruction alias for 0x80 is invalid.
  • On early amd64 CPUs, lahf and sahf are unavailable in 64-bit mode.

And that's basically all of it!

Assembly registers in 64-bit architecture

With the old names, all registers remain the same size, just like when x86-16 was extended to x86-32. To access the full 64-bit registers, you use the new names with an R prefix, such as rax, rbx...

Register names don't change, so you still use the byte registers (al, bl, cl, dl, ah, bh, ch, dh) for the low and high bytes of ax, bx, cx, dx, as before.

There are also 8 new registers called r8-r15. You can access their low bytes by adding the suffix b (AMD's documentation uses l instead), for example r8b, r9b... You can also use the low bytes of esi, edi, esp, ebp through the names sil, dil, spl, bpl. These need a REX prefix, so you cannot use them in the same instruction as ah, bh, ch or dh.

Likewise the new registers' lowest word or double word can be accessed through the suffix w or d.

64-bit register   Lower 32 bits   Lower 16 bits   Lower 8 bits
rax               eax             ax              al
rbx               ebx             bx              bl
rcx               ecx             cx              cl
rdx               edx             dx              dl
rsi               esi             si              sil
rdi               edi             di              dil
rbp               ebp             bp              bpl
rsp               esp             sp              spl
r8                r8d             r8w             r8b (r8l)
r9                r9d             r9w             r9b (r9l)
r10               r10d            r10w            r10b (r10l)
r11               r11d            r11w            r11b (r11l)
r12               r12d            r12w            r12b (r12l)
r13               r13d            r13w            r13b (r13l)
r14               r14d            r14w            r14b (r14l)
r15               r15d            r15w            r15b (r15l)

Basic input with x64 assembly code

In your first code section, you have to set the syscall number in rax to 0 for SYS_READ (as briefly mentioned in the other answer).

So check a Linux x86-64 syscall table for the appropriate parameters and try:

section .text
global _start

_start:
mov rax, 0 ; syscall number 0 = SYS_read
sub rsp, 8 ; allocate 8 bytes on the stack as the read buffer
mov rdi, 0 ; fd 0 = stdin
lea rsi, [rsp] ; const char *buf = the 8-byte space on the stack
mov rdx, 1 ; size_t count = 1 (one char)
syscall ; rax = bytes read (0 at EOF) or -errno on error
mov rax, 60 ; syscall number 60 = SYS_exit (the byte read is at [rsp])
mov rdi, 0 ; exit status 0
syscall

Moving 64-bit constants to memory

Looks like you didn't check for asmjit errors. The docs list kErrorInvalidImmediate: "Invalid immediate (out of bounds on X86 and invalid pattern on ARM)."

The only x86-64 instruction that can use a 64-bit immediate is mov-immediate to register. This is the special no-ModRM opcode that gives us the 5-byte mov eax, 12345, or the 10-byte mov rax, 0x0123456789abcdef, where a REX.W prefix changes that opcode to take a 64-bit immediate. See https://www.felixcloutier.com/x86/mov and the Q&A "why we can't move a 64-bit immediate value to memory?"


Your title is a red herring. It has nothing to do with having an m64 operand for and; the constant is the problem. You can verify that by single-stepping the asm with a debugger and checking both operands before the and, including the one in memory. (It's probably -1, from 0xFFFFFFFF used as the sign-extended imm32 of mov m64, imm32, which would explain AND not changing the value in R14.)

Also, a disassembly of the JITed machine code should show you which immediate is actually encoded; again, a debugger can provide that as you single-step through it.


Use your temporary register for the constant (like mov r14, 0xFFFFFFFFFFFF), then and reg,mem to load-and-mask.

Or better, if the target machine you're JITing for has BMI1 andn, construct the inverted constant once outside the loop with mov r13, ~0xFFFFFFFFFFFF. Then inside the loop use andn r14, r13, [r15+32], which does a load+and without destroying the mask. It is all one instruction, which can decode to a single uop on Intel/AMD CPUs.

Or, if you can't reuse a constant register over a loop, maybe use mov reg, imm64, then push reg or mov mem, reg, and use that memory operand in future AND instructions. Or emit some constant data somewhere near enough to reference with a RIP-relative addressing mode, although that takes a bit more code size at every and instruction (ModRM + 4-byte rel32, vs. ModRM + SIB + 0 or 1 displacement bytes for data on the stack close to RSP).


BTW, if you're just truncating instead of sign-extending, you're also assuming this address is in the low half of virtual address space (i.e. user space). That's fine, though. Fun fact: future x86 CPUs (first Sapphire Rapids) will have an optional feature, LAM (Linear Address Masking), that OSes can enable to transparently ignore the high address bits, except for the MSB. See Intel's future-extensions manual.

So if this feature is enabled with 48-bit masking for user space, you can skip the AND masking entirely. (This assumes your code makes sure bit 63 matches bit 47, so you might want to keep the top bit unmodified or 0, letting your code take advantage of LAM when available to save instructions.)


If you were masking to keep the low 32 bits, you could just use mov r14d, [r15+32] to zero-extend the low dword of the value into 64-bit R14. But to keep the low 48 or 57 bits, you need a mask, or BMI2 bzhi with 48 (or 57) in a register.


