Referencing memory operands in .intel_syntax GNU C inline assembly

Compile with gcc -masm=intel and don't try to switch modes inside the asm template string. AFAIK there's no equivalent before clang14. (Note: macOS installs clang as gcc / g++ by default.)

Also, of course you need to use valid GNU C inline asm, using operands to tell the compiler which C objects you want to read and write.

  • Can I use Intel syntax of x86 assembly with GCC? (clang14 supports -masm=intel like GCC.)
  • How to set gcc to use intel syntax permanently? (clang13 and earlier didn't.)


I don't believe Intel syntax uses the percent sign. Perhaps I am missing something?

You're getting mixed up between %operand substitutions into the Extended-Asm template (which use a single %), vs. the final asm that the assembler sees.

You need %% to use a literal % in the final asm. You wouldn't use "mov %%eax, 1" in Intel-syntax inline asm, but you do still use "mov %0, 1" or %[named_operand].

See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html. In Basic asm (no operands), there is no substitution and % isn't special in the template, so you'd write mov $1, %eax in Basic asm vs. mov $1, %%eax in Extended, if for some reason you weren't using an operand like mov $1, %[tmp] or mov $1, %0.
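
As a rough sketch of the difference (the function name load_one is made up for illustration, and hard-coding EAX is only done to show the escaping): in an Extended-asm template, %[out] is substituted by the compiler while %%eax reaches the assembler as a literal %eax. The {att | intel} braces are GNU C dialect alternatives, so this should build with or without -masm=intel; the #else branch is only there so the sketch compiles on non-x86 targets.

```c
/* Illustration only: a hard-coded register is normally a bad idea. */
int load_one(void)
{
    int out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("{movl $1, %%eax | mov eax, 1}\n\t"            /* %% -> literal % (AT&T side) */
             "{movl %%eax, %[out] | mov %[out], eax}"       /* %[out] -> compiler-chosen register */
             : [out] "=r" (out)
             :
             : "eax");                                      /* we wrote EAX behind the compiler's back */
#else
    out = 1;                                                /* non-x86 fallback, no asm */
#endif
    return out;
}
```

Note that the Intel side never needs %% because Intel-syntax register names have no % prefix.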


uint32_t rnds_00_15; is a local with automatic storage. Of course there's no asm symbol with that name.

Use %[rnds_00_15] and compile with -masm=intel. (And remove the .att_syntax at the end; that would break the compiler-generated asm that comes after.)

You also need to remove the DWORD PTR, because the operand-expansion already includes that, e.g. DWORD PTR [rsp - 4], and clang errors on DWORD PTR DWORD PTR [rsp - 4]. (GAS accepts it just fine, but the 2nd one takes precedence so it's pointless and potentially misleading.)

And you'll want a "=m" output operand if you want the compiler to reserve you some scratch space on the stack. You must not modify input-only operands, even if they're unused in the C. Maybe the compiler decides it can overlap something else because it's not written and not initialized (i.e. UB). (I'm not sure if your "memory" clobber makes it safe, but there's no reason not to use an early-clobber output operand here.)

And you'll want to avoid label name conflicts by using %= to get a unique number.

Working example (GCC and ICC, but not clang unfortunately), on the Godbolt compiler explorer (which uses -masm=intel depending on options in the dropdown). You can use "binary mode" (the 11010 button) to prove that it actually assembles after compiling to asm without warnings.

#include <stdint.h>

int libtest_intel()
{
    uint32_t rnds_00_15;
    // Intel syntax operand-size can only be overridden with operand modifiers
    // because the expansion includes an explicit DWORD PTR
    __asm__ __volatile__
    (  // ".intel_syntax noprefix \n\t"
        "mov %[rnds_00_15], 1 \n\t"
        "cmp %[rnds_00_15], 1 \n\t"
        "je .Ldone%= \n\t"          // %= expands to a number unique to this asm statement
        ".Ldone%=: \n\t"
        : [rnds_00_15] "=&m" (rnds_00_15)
        :
        : // no clobbers
    );
    return 0;
}

Compiles (with gcc -O3 -masm=intel) to this asm. Also works with gcc -m32 -masm=intel of course:

libtest_intel:
mov DWORD PTR [rsp-4], 1
cmp DWORD PTR [rsp-4], 1
je .Ldone8
.Ldone8:

xor eax, eax
ret

I couldn't get this to work with clang: It choked on .intel_syntax noprefix when I left that in explicitly.



Operand-size overrides:

You have to use %b[tmp] to get the compiler to substitute in BYTE PTR [rsp-4] to only access the low byte of a dword input operand. I'd recommend AT&T syntax if you want to do much of this.
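
A small sketch of that modifier in use (low_byte is a hypothetical name, not from the question): the Intel side uses %b[v] so the expansion becomes BYTE PTR [...], while the AT&T side gets the size from the movzbl mnemonic instead. It assumes x86 (little-endian, so the low byte is at the operand's address) and falls back to plain C elsewhere.

```c
#include <stdint.h>

/* Zero-extend the low byte of a dword that lives in memory. */
uint32_t low_byte(uint32_t v)
{
    uint32_t out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("{movzbl %[v], %[out] | movzx %[out], %b[v]}"
             : [out] "=r" (out)
             : [v] "m" (v));     /* "m": only read, never written */
#else
    out = v & 0xffu;             /* non-x86 fallback, no asm */
#endif
    return out;
}
```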





Using %[rnds_00_15] results in Error: junk '(%ebp)' after expression.

That's because you switched to Intel syntax without telling the compiler. If you want it to use Intel addressing modes, compile with -masm=intel so the compiler can substitute into the template with the correct syntax.

This is why I avoid that crappy GCC inline assembly at nearly all costs. Man I despise this crappy tool.

You're just using it wrong. It's a bit cumbersome, but makes sense and mostly works well if you understand how it's designed.

Repeat after me: The compiler doesn't parse the asm string at all, except to do text substitutions of %operand. This is why it doesn't notice your .intel_syntax noprefix and keeps substituting AT&T syntax.

It does work better and more easily with AT&T syntax though, e.g. for overriding the operand-size of a memory operand, or adding an offset. (e.g. 4 + %[mem] works in AT&T syntax).



Dialect alternatives:

If you want to write inline asm that doesn't depend on -masm=intel or not, use Dialect alternatives (which makes your code super-ugly; not recommended for anything other than wrapping one or two instructions):

Also demonstrates operand-size overrides

#include <stdint.h>
int libtest_override_operand_size()
{
    uint32_t rnds_00_15;
    // Intel syntax operand-size can only be overridden with operand modifiers
    // because the expansion includes an explicit DWORD PTR
    __asm__ __volatile__
    (
        "{movl $1, %[rnds_00_15] | mov %[rnds_00_15], 1} \n\t"
        "{cmpl $1, %[rnds_00_15] | cmp %k[rnds_00_15], 1} \n\t"
        "{cmpw $1, %[rnds_00_15] | cmp %w[rnds_00_15], 1} \n\t"
        "{cmpb $1, %[rnds_00_15] | cmp %b[rnds_00_15], 1} \n\t"
        "je .Ldone%= \n\t"
        ".Ldone%=: \n\t"
        : [rnds_00_15] "=&m" (rnds_00_15)
    );
    return 0;
}

With Intel syntax, gcc compiles it to:

    mov DWORD PTR [rsp-4], 1
    cmp DWORD PTR [rsp-4], 1
    cmp WORD PTR [rsp-4], 1
    cmp BYTE PTR [rsp-4], 1
    je .Ldone38
.Ldone38:

    xor eax, eax
    ret

With AT&T syntax, compiles to:

    movl $1, -4(%rsp)
    cmpl $1, -4(%rsp)
    cmpw $1, -4(%rsp)
    cmpb $1, -4(%rsp)
    je .Ldone38
.Ldone38:

    xorl %eax, %eax
    ret

How to set a variable in GCC with Intel syntax inline assembly?

You want temp to be an output, not an input, I think. Try:

  __asm__(
".intel_syntax;"
"mov eax, %1;"
"mov %0, eax;"
".att_syntax;"
: "=r"(temp)
: "r"(1)
: "eax");
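
For comparison, a hedged sketch of the same assignment without hard-coding eax and without switching syntax inside the template (set_temp is a made-up wrapper name). It uses dialect alternatives so it should assemble with or without -masm=intel, and lets the compiler pick the register and decide whether the 1 is an immediate.

```c
int set_temp(void)
{
    int temp;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("{movl %[src], %[dst] | mov %[dst], %[src]}"
             : [dst] "=r" (temp)      /* output: written, never read */
             : [src] "ri" (1));       /* register or immediate, compiler's choice */
#else
    temp = 1;                         /* non-x86 fallback, no asm */
#endif
    return temp;
}
```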

How can I run C-code with assembly inserts

All I needed was to add -masm=intel to the compilation command and, of course, remove the -S flag (thanks to fuz). This matters because I am using Intel syntax, so gcc must be made aware of it.

Now it looks like:

gcc -std=c11 -masm=intel main.c -o main

Distinguishing memory from constant in GNU as .intel_syntax

Use mov edx, OFFSET symbol to get the symbol "address" as an immediate, rather than loading from it as an address. This works for actual label addresses as well as symbols you set to an integer with .set.

For the msg address (not the msg_size assemble-time constant) in 64-bit code, you may want lea rdx, [RIP+msg] for a PIE executable where static addresses don't fit in 32 bits. See: How to load address of function or label into register


In GAS .intel_syntax noprefix mode:

  • OFFSET symbol works like AT&T $symbol. This is somewhat like MASM.
  • symbol works like AT&T symbol (i.e. a dereference) for unknown symbols.
  • [symbol] is always an effective-address, never an immediate, in GAS and NASM/YASM. LEA doesn't load from the address but it still uses the memory-operand machine encoding. (That's why lea uses the same syntax).


Interpretation of bare symbol depends on order of declaration

GAS is a one-pass assembler (which goes back and fills in symbol values once they're known).

It decides on the opcode and encoding for mov rdx, symbol when it first encounters that line. An earlier msize= . - msg or .equ / .set will make it choose mov reg, imm32, but a later directive won't be visible yet.

The default assumption for not-yet-defined symbols is that symbol is an address in some section (like you get from defining it with a label like symbol:, or from .set symbol, .). And because GAS .intel_syntax is like MASM not NASM, a bare symbol is treated like [symbol] - a memory operand.

If you put a .set or msg_length=msg_end - msg directive at the top of your file, before the instructions that reference it, they would assemble to mov reg, imm32 mov-immediate. (Unlike in AT&T syntax where you always need a $ for an immediate even for numeric literals like 1234.)

For example: source and disassembly interleaved with objdump -dS:

Assembled with gcc -g -c foo.s and disassembled with objdump -drwC -S -Mintel foo.o (with as --version = GNU assembler (GNU Binutils) 2.34). We get this:

0000000000000000 <l1>:
.intel_syntax noprefix

l1:
mov eax, OFFSET equsym
0: b8 01 00 00 00 mov eax,0x1
mov eax, equsym #### treated as a load
5: 8b 04 25 01 00 00 00 mov eax,DWORD PTR ds:0x1
mov rax, big #### 32-bit sign-extended absolute load address, even though the constant was unsigned positive
c: 48 8b 04 25 aa aa aa aa mov rax,QWORD PTR ds:0xffffffffaaaaaaaa
mov rdi, OFFSET label
14: 48 c7 c7 00 00 00 00 mov rdi,0x0 17: R_X86_64_32S .text+0x1b

000000000000001b <label>:

label:
nop
1b: 90 nop

.equ equsym, . - label # equsym = 1
big = 0xaaaaaaaa

mov eax, OFFSET equsym
1c: b8 01 00 00 00 mov eax,0x1
mov eax, equsym #### treated as an immediate
21: b8 01 00 00 00 mov eax,0x1
mov rax, big #### constant doesn't fit in 32-bit sign extended, assembler can see it when picking encoding so it picks movabs imm64
26: 48 b8 aa aa aa aa 00 00 00 00 movabs rax,0xaaaaaaaa

It's always safe to use mov edx, OFFSET msg_size to treat any symbol (or even a numeric literal) as an immediate regardless of how it was defined. So it's exactly like AT&T $ except that it's optional when GAS already knows the symbol value is just a number, not an address in some section. For consistency it's probably a good idea to always use OFFSET msg_size so your code doesn't change meaning if some future programmer moves code around so the data section and related directives are no longer first. (Including future you who's forgotten these strange details that are unlike most assemblers.)

BTW, .set is a synonym for .equ, and there's also symbol=value syntax for setting a value which is also synonymous to .set.



Operand-size: generally use 32-bit unless a value needs 64

mov rdx, OFFSET symbol will assemble to mov r/m64, sign_extended_imm32. You don't want that for a small length (vastly less than 4GiB) unless it's a negative constant, not an address. You also don't want movabs r64, imm64 for addresses; that's inefficient.

It's safe under GNU/Linux to write mov edx, OFFSET symbol in a position-dependent executable, and in fact you should always do that or use lea rdx, [rip + symbol], never sign-extended 32-bit immediate unless you're writing code that will be loaded into the high 2GB of virtual address space (e.g. a kernel). See: How to load address of function or label into register

See also 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIE executables being the default in modern distros.


Tip: if you know the AT&T syntax, or the NASM syntax, for something, use that to produce the encoding you want and then disassemble with objdump -Mintel to find out the right syntax for .intel_syntax noprefix.

But that doesn't help here because disassembly will just show the numeric literal like mov edx, 123, not mov edx, OFFSET name_not_in_object_file. Looking at gcc -masm=intel compiler output can also help, but again compilers do their own constant-propagation instead of using symbols for assemble-time constants.

BTW, no open-source projects that I'm aware of contain GAS intel_syntax source code. If they use gas, they use AT&T syntax. Otherwise they use NASM/YASM. (You sometimes also see MSVC inline asm in open source projects).



Same effect in AT&T syntax, or for [RIP + symbol]

This is a lot more artificial since you wouldn't normally do this with an integer constant that wasn't an address. I include it here just to show another facet of GAS's behaviour depending on a symbol being defined or not at a point during its 1 pass.

How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? - [RIP + symbol] is interpreted as using relative addressing to reach symbol, not actually adding two addresses. But [RIP + 4] is taken literally, as an offset relative to the end of this instruction.

So again, it matters what GAS knows about a symbol when it reaches an instruction that references it, because it's 1-pass. If undefined, it assumes it's a normal symbol. If defined as a numeric value with no section associated, it works like a literal number.

_start:
foo=4
jmpq *foo(%rip)
jmpq *bar(%rip)
bar=4

That assembles to the first jump being the same as jmp *4(%rip), loading a pointer from 4 bytes past the end of the current instruction. But the 2nd jump uses a symbol relocation for bar, using a RIP-relative addressing mode to reach the absolute address of the symbol bar, whatever that may turn out to be.

0000000000000000 <.text>:
0: ff 25 04 00 00 00 jmp QWORD PTR [rip+0x4] # a <.text+0xa>
6: ff 25 00 00 00 00 jmp QWORD PTR [rip+0x0] # c <bar+0x8> 8: R_X86_64_PC32 *ABS*

After linking with ld foo.o, the executable has:

  401000:       ff 25 04 00 00 00       jmp    *0x4(%rip)        # 40100a <bar+0x401006>
  401006:       ff 25 f8 ef bf ff       jmp    *-0x401008(%rip)  # 4 <bar>

How to set gcc or clang to use Intel syntax permanently for inline asm() statements?

Use -masm=intel and don't use any .att_syntax directives in your inline asm. This works with GCC and I think ICC, and with any constraints you use. Other methods don't. (See Can I use Intel syntax of x86 assembly with GCC? for a simple answer saying that; this answer explores exactly what goes wrong, including with clang 13 and earlier.)

That also works in clang 14 and later; see https://reviews.llvm.org/D113707 for the patch that added it.

Clang 13 and earlier would always use AT&T syntax for inline asm, both in substituting operands and in assembling as op src, dst. But even worse, clang -masm=intel would do that even when taking the Intel side of an asm template using dialect-alternatives like asm ("add {att | intel}" : ... )!

clang -masm=intel did still control how it printed asm after its built-in assembler turned an asm() statement into some internal representation of the instruction, e.g. Godbolt shows clang13 -masm=intel turning add %0, 1 into add dword ptr [1], eax, but clang trunk producing add eax, 1.

Some of the rest of this answer talking about clang hasn't been updated for this new clang patch.

Clang does support Intel-syntax inside MSVC-style asm-blocks, but that's terrible (no constraints so inputs / outputs have to go through memory).

If you were hard-coding register names with clang, -masm=intel would be usable (or the equivalent -mllvm --x86-asm-syntax=intel). But it chokes on mov %eax, 5 in Intel-syntax mode so you can't let %0 expand to an AT&T-syntax register name.


-masm=intel makes the compiler use .intel_syntax noprefix at the top of its asm output file, and use Intel-syntax when generating asm from C outside your inline-asm statement. Using .att_syntax at the bottom of your asm template breaks the compiler's asm, hence the error messages like PTR [rbp-4] looking like junk to the assembler (which is expecting AT&T syntax).

The "too many operands for mov" is because in AT&T syntax, mov eax, ebx is a mov from a memory operand (with symbol name eax) to a memory operand (with symbol name ebx).


Some people suggest using .intel_syntax noprefix and .att_syntax prefix around your asm template. That can sometimes work but it's problematic, and it's incompatible with the preferred method of -masm=intel.

Problems with the "sandwich" method:

When the compiler substitutes operands into your asm template, it will do so according to -masm=. This will always break for memory operands (the addressing-mode syntax is completely different).

It will also break with clang even for registers. Clang's built-in assembler does not accept %eax as a register name in Intel-syntax mode, and doesn't accept .intel_syntax prefix (as opposed to the noprefix that's usually used with Intel-syntax).

Consider this function:

int foo(int x) {
    asm(".intel_syntax noprefix \n\t"
        "add %0, 1 \n\t"
        ".att_syntax"
        : "+r"(x)
        );
    return x;
}

It assembles as follows with GCC (Godbolt):

        movl    %edi, %eax
.intel_syntax noprefix
add %eax, 1 # AT&T register name in Intel syntax
.att_syntax

The sandwich method depends on GAS accepting %eax as a register name even in Intel-syntax mode. GAS from GNU Binutils does, but clang's built-in assembler doesn't.

On a Mac, even using real GCC the asm output has to assemble with an as that's based on clang, not GNU Binutils.

Using clang on that source code complains:

<source>:2:35: error: unknown token in expression
asm(".intel_syntax noprefix \n\t"
^
<inline asm>:2:6: note: instantiated into assembly here
add %eax, 1
^

(The first line of the error message didn't handle the multi-line string literal very well. If you use ; instead of \n\t and put everything on one line the clang error message works better but the source is a mess.)
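
For contrast, here is one way foo might look without the sandwich, as a sketch rather than a drop-in fix (foo_portable is a hypothetical name). The template contains no syntax-switching directives at all; dialect alternatives pick the right operand order, so the compiler's operand substitution and the assembler's mode always agree, whichever -masm= setting is in effect.

```c
int foo_portable(int x)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("{addl $1, %0 | add %0, 1}"
             : "+r" (x)     /* read and written in place */
             :
             : "cc");       /* add modifies FLAGS */
#else
    x += 1;                 /* non-x86 fallback, no asm */
#endif
    return x;
}
```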


I didn't check what happens with "ri" constraints when the compiler picks an immediate; it will still decorate it with $ but IDK if GAS silently ignores that, too, in Intel syntax mode.


PS: your asm statement has a bug: you forgot an early-clobber on your output operand so nothing is stopping the compiler from picking the same register for the %0 output and the %2 input that you don't read until the 2nd instruction. Then mov will destroy an input.

But using mov as the first or last instruction of an asm-template is usually also a missed-optimization bug. In this case you can and should just use lea %0, [%1 + %2] to let the compiler add with the result written to a 3rd register, non-destructively. Or just wrap the add instruction (using a "+r" operand and an "r"), and let the compiler worry about data movement. If it had to load the value from memory anyway, it can put it in the right register so no mov is needed.
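
Both suggestions can be sketched like this (add_lea and add_inplace are illustrative names, and the original asm statement from the question isn't reproduced here). Neither needs an early-clobber: the lea reads both inputs in the same instruction that writes the output, and the add version has only one instruction. Dialect alternatives keep it independent of -masm=.

```c
/* Non-destructive add: result goes to a third register, FLAGS untouched. */
int add_lea(int a, int b)
{
    int sum;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("{leal (%[a],%[b]), %[sum] | lea %[sum], [%[a] + %[b]]}"
             : [sum] "=r" (sum)
             : [a] "r" (a), [b] "r" (b));
#else
    sum = a + b;            /* non-x86 fallback, no asm */
#endif
    return sum;
}

/* Wrap only the add; the compiler handles any data movement around it. */
int add_inplace(int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("{addl %[b], %[a] | add %[a], %[b]}"
             : [a] "+r" (a)
             : [b] "r" (b)
             : "cc");       /* add modifies FLAGS */
#else
    a += b;                 /* non-x86 fallback, no asm */
#endif
    return a;
}
```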


PS: it's possible to write inline asm that works with -masm=intel or att, using GNU C inline asm dialect alternatives. e.g.

void atomic_inc(int *p) {
    asm( "lock add{l $1, %0 | %0, 1}"
         : "+m" (*p)
         :: "memory"
         );
}

compiles with gcc -O2 (-masm=att is the default) to

atomic_inc(int*):
lock addl $1, (%rdi)
ret

Or with -masm=intel to:

atomic_inc(int*):
lock add DWORD PTR [rdi], 1
ret

Notice that the l suffix is required for AT&T, and the dword ptr is required for intel, because memory, immediate doesn't imply an operand-size. And that the compiler filled in valid addressing-mode syntax for both cases.

This works with clang, but only the AT&T version ever gets used.

gcc inline-assembly compilation error when optimization flags is not enabled?

If you remove some of the operands, like for just a 256-bit add, you'll notice that with optimization disabled, GCC wants to put a pointer directly to each memory operand in a separate register, instead of inventing addressing modes for each of them relative to the same base. So it runs out of registers. (See the middle part of Strange 'asm' operand has impossible constraints error for compiler output that demos this.)

You might want __attribute__((optimize("-O3"))) on this function or something so it doesn't stop the rest of your program from compiling.


Also, this doesn't need a "memory" clobber; you don't write any memory, and you only read via "m" operands. It also doesn't need to be volatile: it has no side effects besides writing the "+r" in/output regs. Except for sum7, technically those should be "+&r" early-clobber operands, since you write them before reading all of the input and in-out operands, but there's basically no plausible way for GCC to overlap registers between pointers and integer in/out here.

You could also let the compiler choose "mre" instead of forcing memory source operands even if the source operand was a compile-time constant or in a register. But if that makes it generate worse asm for your actual use-case (e.g. separate mov load into regs instead of memory source for adc), then maybe just "me". (The "e" constraint is like "i" but only allowing constants that fit in a 32-bit signed integer, i.e. safe for use as an immediate with 64-bit operand-size for instructions other than mov.)
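
To make those constraint choices concrete, here is a much smaller two-limb version of the same add/adc pattern. It is not the asker's code: the u128 struct and add128 name are invented for this sketch, and it assumes x86-64 (with a plain-C fallback elsewhere). The low limb is "+&r" because it's written before the high-limb input is read; sources are "rme" so the compiler may pick a register, memory, or a 32-bit sign-extended immediate. No "memory" clobber and no volatile are used, since the only effects are on the listed operands and FLAGS.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;   /* hypothetical helper type */

u128 add128(u128 a, u128 b)
{
#if defined(__x86_64__)
    __asm__ ("{addq %[blo], %[alo] | add %[alo], %[blo]}\n\t"
             "{adcq %[bhi], %[ahi] | adc %[ahi], %[bhi]}"
             : [alo] "+&r" (a.lo),           /* early-clobber: written before bhi is read */
               [ahi] "+r"  (a.hi)
             : [blo] "rme" (b.lo),
               [bhi] "rme" (b.hi)
             : "cc");                        /* add/adc modify FLAGS */
#else
    uint64_t lo = a.lo + b.lo;               /* portable fallback, no asm */
    a.hi += b.hi + (lo < a.lo);              /* carry out of the low limb */
    a.lo = lo;
#endif
    return a;
}
```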


BTW, with clang (but not GCC) you don't need inline asm at all: use typedef unsigned _ExtInt(512) uint512; - see 256-bit arithmetic in Clang (extended integers)

How to use the immediate constraint with Intel syntax in gcc?

You should be able to use %c1 (see modifiers).

Note that if you are using symbolic names (which I find easier to read/maintain), you can use %c[five].

Lastly, I realize this code is just a "for-instance," but you are modifying memory without telling the compiler. This is a "bad thing." Consider either using an output constraint for the memory ("=m") or adding the "memory" clobber.
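
Putting both points together, a minimal sketch (store_five and the operand names are illustrative, not from the question): %c[five] prints the bare constant with no $ decoration, and the destination is declared as an "=m" output so the compiler knows that memory is written. Dialect alternatives are used so it doesn't depend on -masm=; on the AT&T side the $ is written by hand in front of %c.

```c
int store_five(void)
{
    int x = 0;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("{movl $%c[five], %[dst] | mov %[dst], %c[five]}"
             : [dst] "=m" (x)        /* tells the compiler x is written */
             : [five] "i" (5));      /* compile-time constant */
#else
    x = 5;                           /* non-x86 fallback, no asm */
#endif
    return x;
}
```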


