Limitations of Intel Assembly Syntax Compared to AT&T

There is really no advantage to one over the other. I agree though that Intel syntax is much easier to read. Keep in mind that, AFAIK, all GNU tools have the option to use Intel syntax also.

It looks like you can make GDB use Intel syntax with this:


set disassembly-flavor intel

GCC can do Intel syntax with -masm=intel.

NASM (Intel) versus AT&T Syntax: what are the advantages?


  • Focusing on which of these syntaxes is a better idea?

Depends on your projects. Not every compiler allows both kinds of syntax. If you need the code to assemble on other platforms, Intel is probably better. Even after some years of experience with both, I personally like Intel more, but it's a small difference, and it doesn't really matter to me.

  • What are the advantages and disadvantages of these syntaxes?

In AT&T syntax there are slightly too many % signs, even more if you need to use macros. OTOH I prefer the source, destination ordering, but that's personal taste; others may prefer it the other way around because it resembles the writing order of an assignment operator in, say, C (and many other languages).

In Intel syntax there are obscenities like DWORD PTR, where in AT&T a small appended l is sufficient. The exact spelling of the mnemonics differs in many cases; I find AT&T's more logical, while of course Intel's way is the standard. The addressing modes in Intel syntax are somewhat more readable.

  • Which one is more widely used and understood?

I believe AT&T is more used, because of the ubiquity of Linux on embedded platforms, where assembler is used much more often than in other software projects. There are more assemblers that understand Intel's syntax, that much is true, but I believe gcc/gas is more used in the fields where assembler matters or is useful.

What was the original reason for the design of AT&T assembly syntax?

UNIX was for a long time developed on the PDP-11, a 16-bit computer from DEC, which had a fairly simple instruction set. Nearly every instruction has two operands, each of which can have one of the following eight addressing modes, here shown in the MACRO-11 assembly language:

0n  Rn        register
1n  (Rn)      deferred
2n  (Rn)+     autoincrement
3n  @(Rn)+    autoincrement deferred
4n  -(Rn)     autodecrement
5n  @-(Rn)    autodecrement deferred
6n  X(Rn)     index
7n  @X(Rn)    index deferred

Immediates and direct addresses can be encoded by cleverly re-using some addressing modes on R7, the program counter:

27  #imm      immediate
37  @#imm     absolute
67  addr      relative
77  @addr     relative deferred

As the UNIX tty driver used @ and # as control characters, $ was substituted for # and * for @.

The first operand in a PDP-11 instruction word refers to the source operand while the second operand refers to the destination. This is reflected in the assembly language's operand order, which is source, then destination. For example, the opcode

011203

refers to the instruction

mov (R2),R3

which moves the word pointed to by R2 to R3.
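To double-check the encoding described above, here is a minimal Python sketch that packs a double-operand instruction into a 16-bit word: a 4-bit opcode, then a 3-bit mode and 3-bit register for the source, then the same pair for the destination. The names MOV and encode_double_operand are made up for this illustration and do not come from any PDP-11 toolchain.

```python
# Hypothetical helper names; only the bit layout follows the PDP-11 format.
MOV = 0o01  # 4-bit opcode field of the word-sized mov instruction


def encode_double_operand(opcode, src_mode, src_reg, dst_mode, dst_reg):
    """Pack opcode, source (mode, reg) and destination (mode, reg) into one word."""
    for field in (src_mode, src_reg, dst_mode, dst_reg):
        if not 0 <= field <= 7:
            raise ValueError("mode and register fields are 3 bits each")
    return (opcode << 12) | (src_mode << 9) | (src_reg << 6) | (dst_mode << 3) | dst_reg


# mov (R2),R3: source is mode 1 (deferred) on R2, destination is mode 0 (register) on R3
word = encode_double_operand(MOV, 1, 2, 0, 3)
print(f"{word:06o}")  # prints 011203
```

Reading the result in octal makes the fields visible directly: 01 is the opcode, 12 the source, and 03 the destination.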

This syntax was adapted to the 8086 CPU and its addressing modes:

mr0  X(bx,si)  bx + si indexed
mr1  X(bx,di)  bx + di indexed
mr2  X(bp,si)  bp + si indexed
mr3  X(bp,di)  bp + di indexed
mr4  X(si)     si indexed
mr5  X(di)     di indexed
mr6  X(bp)     bp indexed
mr7  X(bx)     bx indexed
3rR  R         register
0r6  addr      direct

Where m is 0 if there is no index, m is 1 if there is a one-byte index, m is 2 if there is a two-byte index and m is 3 if instead of a memory operand, a register is used. If two operands exist, the other operand is always a register and encoded in the r digit. Otherwise, r encodes another three bits of the opcode.
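The three octal digits in the table map directly onto the fields of the 8086 ModRM byte. As a rough sketch (function names are mine, and word-sized registers are assumed for the register case, since the real choice between byte and word registers depends on the opcode), the decoding looks like this in Python:

```python
# Memory operand forms selected by the last octal digit, as in the table above.
RM16 = ["(bx,si)", "(bx,di)", "(bp,si)", "(bp,di)", "(si)", "(di)", "(bp)", "(bx)"]
# Assumption: word-sized registers; byte registers would need the opcode's size bit.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def split_modrm(byte):
    """Return the (m, r, rm) octal digits of a ModRM byte."""
    return (byte >> 6) & 0o3, (byte >> 3) & 0o7, byte & 0o7


def describe_rm16(byte):
    """Describe the r/m operand of a ModRM byte in the notation of the table."""
    m, _r, rm = split_modrm(byte)
    if m == 3:
        return REG16[rm]          # 3rR: register operand
    if m == 0 and rm == 6:
        return "addr"             # 0r6: direct address, no base register
    disp = ("", "disp8", "disp16")[m]
    return disp + RM16[rm]


print(describe_rm16(0o102))  # prints disp8(bp,si)
print(describe_rm16(0o006))  # prints addr
```

The 0r6 special case is why a plain (bp) operand with no displacement has to be encoded as mode 1 with a zero one-byte index.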

Immediates aren't possible in this addressing scheme; all instructions that take immediates encode that fact in their opcode. Immediates are spelled $imm just like in the PDP-11 syntax.

While Intel always used a dst, src operand ordering for its assembler, there was no particularly compelling reason to adopt this convention, and the UNIX assembler was written to use the src, dst operand ordering known from the PDP-11.

The implementation of the 8087 floating point instructions is inconsistent with this ordering in places, possibly because Intel gave the two possible directions of non-commutative floating point instructions different mnemonics which do not match the operand ordering used by AT&T's syntax.

The PDP-11 instructions jmp (jump) and jsr (jump to subroutine) jump to the address of their operand. Thus, jmp foo would jump to foo and jmp *foo would jump to the address stored in the variable foo, similar to how lea works in the 8086.

The syntax for the x86's jmp and call instructions was designed as if these instructions worked like on the PDP-11, which is why jmp foo jumps to foo and jmp *foo jumps to the value at address foo, even though the 8086 doesn't actually have deferred addressing. This has the advantage and convenience of syntactically distinguishing direct jumps from indirect jumps without requiring a $ prefix for every direct jump target, but it doesn't make a lot of sense logically.

The syntax was expanded to specify segment prefixes using a colon:

seg:addr

When the 80386 was introduced, this scheme was adapted to its new SIB addressing modes using a four-part generic addressing mode:

disp(base,index,scale)

where disp is a displacement, base is a base register, index is an index register, and scale is 1, 2, 4, or 8 to scale the index register by one of these amounts. This is equivalent to the Intel syntax:

[disp+base+index*scale]
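The correspondence between the two notations is mechanical enough to sketch in a few lines of Python. This is only an illustration under simplifying assumptions: the function name att_to_intel is made up, it handles just the disp(base,index,scale) form, and it ignores segment prefixes, size keywords, and bare symbols.

```python
import re

# disp(base,index,scale); every part except the parentheses is optional.
ATT_MEM = re.compile(
    r"^(?P<disp>[^()]*)"
    r"\((?P<base>%\w+)?(?:,(?P<index>%\w+)(?:,(?P<scale>[1248]))?)?\)$"
)


def att_to_intel(operand):
    """Rewrite an AT&T memory operand as [disp+base+index*scale]."""
    m = ATT_MEM.match(operand.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a disp(base,index,scale) operand: {operand!r}")
    parts = []
    if m["disp"]:
        parts.append(m["disp"])
    if m["base"]:
        parts.append(m["base"].lstrip("%"))
    if m["index"]:
        scale = m["scale"] or "1"
        # A scale of 1 is implicit and left out of the Intel form
        parts.append(m["index"].lstrip("%") + ("" if scale == "1" else "*" + scale))
    if not parts:
        raise ValueError("empty memory operand")
    return "[" + "+".join(parts) + "]"


print(att_to_intel("16(%ebx,%esi,4)"))  # prints [16+ebx+esi*4]
print(att_to_intel("(,%eax,8)"))        # prints [eax*8]
```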

Another remarkable feature of the PDP-11 is that most instructions are available in a byte and a word variant. Which one you use is indicated by a b or w suffix to the opcode, which directly toggles the first bit of the opcode:

010001  movw r0,r1
110001  movb r0,r1

This was also adapted for AT&T syntax, as most 8086 instructions are indeed also available in a byte mode and a word mode. Later the 80386 and AMD K8 introduced 32-bit instructions (suffixed l for long) and 64-bit instructions (suffixed q for quad), respectively.
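The byte/word toggle is just the top bit of the 16-bit instruction word, which a short Python sketch can show. The names below are my own, and the helper only makes sense for instructions that actually have a byte form.

```python
BYTE_FLAG = 0o100000  # top bit of the 16-bit PDP-11 instruction word


def byte_variant(word_opcode):
    """Return the byte form of a word instruction by setting the top bit."""
    return word_opcode | BYTE_FLAG


# Operand widths, in bits, implied by the AT&T mnemonic suffixes on x86
SUFFIX_BITS = {"b": 8, "w": 16, "l": 32, "q": 64}

print(f"{byte_variant(0o010001):06o}")  # movw r0,r1 -> movb r0,r1, prints 110001
```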

Last but not least, originally the convention was to prefix C language symbols with an underscore (as is still done on Windows) so you can distinguish a C function named ax from the register ax. When Unix System Laboratories developed the ELF binary format, they decided to get rid of this decoration. As there is no way to distinguish a direct address from a register otherwise, a % prefix was added to every register:

mov direct,%eax # move memory at direct to %eax

And that's how we got today's AT&T syntax.

Intel Assembly ljmp syntax from AT&T syntax

You can use jmp 0x08, start32

For some reason, jmp 0x8:start32 only works after .intel_syntax noprefix, even with command line args that should be equivalent. This is the syntax used by Binutils objdump -d -Mintel -mi8086, e.g. ea 16 00 08 00 jmp 0x8:0x16 so it's probably a GAS bug that it's not accepted sometimes.


I edited your question to create a small reproducible example with as 2.35.1 (which I have on Arch GNU/Linux) based on your comments replying to Jester. I included command line options: I assume you must have been using those because there's no .intel_syntax noprefix directive in your file.

That seems to be the problem: -msyntax=intel -mnaked-reg makes other Intel syntax things work, like xor ax,ax, but does not make jmp 0x8:start32 work (or other ways of writing it). Only a .intel_syntax noprefix directive (see footnote 1) makes that syntax for far jmp work.

# .intel_syntax noprefix        # rely on command line options to set this
.code16
movzx ax, al # verify that command-line setting of intel_syntax worked, otherwise this line errors.

ljmp 0x8, start32 # Working before or after a syntax directive, but is basically AT&T syntax
# jmp 0x8:start32 # fails here, works after a directive
jmp 0x8, start32 # Michael Petch's suggested syntax that's still somewhat AT&Tish. works with just cmdline opts.

.att_syntax
ljmp $0x8, $start32 # working everywhere, even with clang
.intel_syntax noprefix
jmp 0x8:start32 # objdump disassembly syntax, but only works after a .intel_syntax noprefix directive

.code32
start32:
nop

I verified that -msyntax=intel -mnaked-reg work for other instructions where their effect is necessary: movzx ax, al works. But without -mnaked-reg we'd get "too many memory references" because "ax" and "al" would be taken as symbol names, and without -msyntax=intel we'd get "operand size mismatch".

A GAS listing from as -32 -msyntax=intel -mmnemonic=intel -mnaked-reg -o foo.o foo.s -al --listing-lhs-width=2 --listing-rhs-width=140

(I'm pretty sure -mmnemonic=intel is irrelevant, and implied by syntax=intel.)

Note that you can see which instructions worked because they have machine code, and which didn't (the first jmp 0x8:start32) because the left-hand column is empty for it. The very first column would normally be addresses, but is ???? because assembly failed. (Because I uncommented the jmp 0x8:start32 to show it failing the first time, working the 2nd time.)

foo.s: Assembler messages:
foo.s:6: Error: junk `:start32' after expression
GAS LISTING foo.s page 1


1 # .intel_syntax noprefix # rely on command line options to set this
2 .code16
3 ???? 0FB6C0 movzx ax, al # verify that command-line setting of intel_syntax worked, otherwise this line errors.
4
5 ???? EA170008 00 ljmp 0x8, start32 # Working before or after a syntax directive, but is basically AT&T syntax
6 jmp 0x8:start32 # fails here, works after a directive
7 ???? EA170008 00 jmp 0x8, start32 # Michael Petch's suggested syntax that's still somewhat AT&Tish. works with just cmdline opts.
8
9 .att_syntax
10 ???? EA170008 00 ljmp $0x8, $start32 # working everywhere, even with clang
11 .intel_syntax noprefix
12 ???? EA170008 00 jmp 0x8:start32 # objdump disassembly syntax, but only works after a .intel_syntax noprefix directive
13
14 .code32
15 start32:
16 ???? 90 nop
17

(GAS does listing field widths for the left column in "words", which apparently means 32-bit chunks. That's why the 00 most-significant byte of the segment selector is separated by a space.)
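The five bytes in the listing are easy to reproduce by hand: opcode EA, then the 16-bit offset, then the 16-bit segment selector, both little-endian. A minimal Python sketch (the function name is mine, and it covers only the 16-bit operand-size form shown here):

```python
import struct


def encode_far_jmp16(selector, offset):
    """Encode a direct far jmp (opcode 0xEA) with a 16-bit offset and selector."""
    # Little-endian: opcode byte, then offset, then segment selector
    return struct.pack("<BHH", 0xEA, offset, selector)


code = encode_far_jmp16(0x8, 0x17)
print(code.hex(" "))  # prints ea 17 00 08 00
```

Note that the offset comes first in the machine code even though the selector is written first in both jmp 0x8:start32 and ljmp $0x8, $start32.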

Putting a label before the jmp 0x8:label didn't help; it's not an issue of forward vs. backward reference. Even jmp 0x8:23 fails to assemble.


Syntax "recommended" by disassemblers, from a working build:

objdump -drwC -Mintel -mi8086 foo.o :

foo.o:     file format elf32-i386

Disassembly of section .text:

00000000 <start32-0x17>:
0: 0f b6 c0 movzx ax,al
3: ea 17 00 08 00 jmp 0x8:0x17 4: R_386_16 .text
8: ea 17 00 08 00 jmp 0x8:0x17 9: R_386_16 .text
d: ea 17 00 08 00 jmp 0x8:0x17 e: R_386_16 .text
12: ea 17 00 08 00 jmp 0x8:0x17 13: R_386_16 .text

00000017 <start32>:
17: 90 nop

llvm-objdump --mattr=+16bit-mode --x86-asm-syntax=intel -d foo.o :

00000000 <.text>:
0: 0f b6 c0 movzx ax, al
3: ea 17 00 08 00 ljmp 8, 23
8: ea 17 00 08 00 ljmp 8, 23
d: ea 17 00 08 00 ljmp 8, 23
12: ea 17 00 08 00 ljmp 8, 23

00000017 <start32>:
17: 90 nop

And BTW, I didn't get clang 11.0 to assemble any Intel-syntax versions of this with a symbol name. ljmp 8, 12 assembles with clang, but not even ljmp 8, start32. Only by switching to AT&T syntax and back could I get clang's built-in assembler (clang -m32 -masm=intel -c) to emit a 16-bit mode far jmp.

.att_syntax
ljmp $0x8, $start32 # working everywhere, even with clang
.intel_syntax noprefix

Keep in mind this direct form of far JMP is not available in 64-bit mode; perhaps that's why LLVM's built-in assembler appears to have spent less effort on it.


Footnote 1: Actually .intel_syntax prefix works, too, but never use that. Nobody wants to see the franken-monster that is mov %eax, [%eax], or especially add %edx, %eax that's using dst, src order, but with AT&T decorated register names.

Can I use Intel syntax of x86 assembly with GCC?

If you are using separate assembly files, gas has a directive to support Intel syntax:

.intel_syntax noprefix      # not recommended for inline asm

which uses Intel syntax and doesn't need the % prefix before register names.

(You can also run as with -msyntax=intel -mnaked-reg to have that as the default instead of att, in case you don't want to put .intel_syntax noprefix at the top of your files.)



Inline asm: compile with -masm=intel

For inline assembly, you can compile your C/C++ sources with gcc -masm=intel (See How to set gcc to use intel syntax permanently? for details.) The compiler's own asm output (which the inline asm is inserted into) will use Intel syntax, and it will substitute operands into asm template strings using Intel syntax like [rdi + 8] instead of 8(%rdi).

This works with GCC itself and ICC, but for clang only clang 14 and later.

(Not released yet, but the patch is in current trunk.)


Using .intel_syntax noprefix at the start of inline asm, and switching back with .att_syntax can work, but will break if you use any m constraints. The memory reference will still be generated in AT&T syntax. It happens to work for registers because GAS accepts %eax as a register name even in intel-noprefix mode.

Using .att_syntax at the end of an asm() statement will also break compilation with -masm=intel; in that case GCC's own asm after (and before) your template will be in Intel syntax. (Clang doesn't have that "problem"; each asm template string is local, unlike GCC where the template string truly becomes part of the text file that GCC sends to as to be assembled separately.)

Related:

  • GCC manual: asm dialect alternatives: writing an asm statement with {att | intel} in the template so it works when compiled with -masm=att or -masm=intel. See an example using lock cmpxchg.
  • https://stackoverflow.com/tags/inline-assembly/info for more about inline assembly in general; it's important to make sure you're accurately describing your asm to the compiler, so it knows what registers and memory are read / written.
  • AT&T syntax: https://stackoverflow.com/tags/att/info
  • Intel syntax: https://stackoverflow.com/tags/intel-syntax/info
  • The x86 tag wiki has links to manuals, optimization guides, and tutorials.
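To make the {att | intel} dialect alternatives mentioned above a bit more concrete, here is a small Python sketch that imitates the selection on a template string. It is not GCC's code and the names are made up; it assumes AT&T is the first alternative and Intel the second, and it ignores GCC's escapes for literal braces and bars.

```python
import re

# {att|intel}: two alternatives separated by "|" inside braces
DIALECT_ALT = re.compile(r"\{([^{}|]*)\|([^{}|]*)\}")


def expand_dialect(template, dialect):
    """Keep the AT&T or the Intel half of every {att|intel} group in template."""
    group = {"att": 1, "intel": 2}[dialect]
    return DIALECT_ALT.sub(lambda m: m.group(group), template)


# Hypothetical template with named operands, in the style of a lock cmpxchg example
template = "lock cmpxchg{l %[new], %[mem]| %[mem], %[new]}"
print(expand_dialect(template, "att"))    # prints lock cmpxchgl %[new], %[mem]
print(expand_dialect(template, "intel"))  # prints lock cmpxchg %[mem], %[new]
```

Writing templates this way means the same asm statement assembles whether the file is compiled with -masm=att or -masm=intel, since the operand order and the size suffix switch together.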

Commenting syntax for x86 AT&T syntax assembly

Comments for the AT&T assembler are:

 # this is a comment
/* this is a comment */

According to the fourth result Google gave me

// and /* */ comments are only supported in .S files because GCC runs the C preprocessor on them before assembling. For .s files, the actual assembler itself (as) only handles # as a comment character, for x86.

For some other ISAs, GAS uses other comment characters, for example @ for ARM.


