Limitations of Intel Assembly Syntax Compared to AT&T
There is really no advantage to one over the other. I agree, though, that Intel syntax is much easier to read. Keep in mind that, AFAIK, all GNU tools also have an option to use Intel syntax.
It looks like you can make GDB use Intel syntax with this:
set disassembly-flavor intel
GCC can do Intel syntax with -masm=intel.
NASM (Intel) versus AT&T Syntax: what are the advantages?
- Which of these syntaxes is the better idea?
It depends on your project. Not every compiler allows both kinds of syntax. If you need to have your code assembled on other platforms, Intel is probably better. Even after some years of experience with both, I personally like Intel more, but it's a small difference, and it doesn't really matter to me.
- What are the advantages and disadvantages of each syntax?
In AT&T syntax there are slightly too many % signs, even more so if you need to use macros. OTOH, I prefer the source, destination ordering, but that's personal taste; others may prefer it the other way around because it resembles the writing order of an assignment operator in, say, C (and many other languages).
In Intel syntax there are obscenities like DWORD PTR, where in AT&T a small appended l is sufficient. The exact spelling of the mnemonics differs in many cases; I find AT&T's more logical, while of course Intel's way is the standard. The addressing modes in Intel syntax are somewhat more readable.
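To illustrate the %/$ decoration and the reversed operand order discussed above, here is a minimal Python sketch (an illustration, not a real converter; the name att_regimm_to_intel is hypothetical). It assumes every operand is either a %register or a $immediate; memory operands, size suffixes and the few instructions GAS does not reverse (the GAS manual mentions enter, for example) are deliberately left out.

```python
def att_regimm_to_intel(insn: str) -> str:
    """Convert e.g. "add $4, %esp" to "add esp, 4".

    Sketch only: every operand must be a %register or a $immediate.
    """
    mnemonic, _, rest = insn.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    # Drop the AT&T decorations: % on registers, $ on immediates.
    plain = [op.lstrip("%$") for op in operands]
    # AT&T lists sources first and the destination last; Intel is the reverse.
    plain.reverse()
    return f"{mnemonic} {', '.join(plain)}" if plain else mnemonic


print(att_regimm_to_intel("add $4, %esp"))         # add esp, 4
print(att_regimm_to_intel("imul $3, %eax, %ebx"))  # imul ebx, eax, 3
```

Reversing the whole operand list (rather than swapping two operands) is what makes the three-operand imul come out right.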
- Which one is more widely used and understood?
I believe AT&T is more used because of the ubiquity of Linux on embedded platforms, where assembler is used much more often than in other software projects. There are more assemblers that understand Intel's syntax, that much is true, but I believe gcc/gas is more used in the fields where assembler matters or is useful.
What was the original reason for the design of AT&T assembly syntax?
UNIX was for a long time developed on the PDP-11, a 16-bit computer from DEC, which had a fairly simple instruction set. Nearly every instruction has two operands, each of which can have one of the following eight addressing modes, here shown in the MACRO-11 assembly language:
0n Rn register
1n (Rn) deferred
2n (Rn)+ autoincrement
3n @(Rn)+ autoincrement deferred
4n -(Rn) autodecrement
5n @-(Rn) autodecrement deferred
6n X(Rn) index
7n @X(Rn) index deferred
Immediates and direct addresses can be encoded by cleverly re-using some addressing modes on R7, the program counter:
27 #imm immediate
37 @#imm absolute
67 addr relative
77 @addr relative deferred
As the UNIX tty driver used @ and # as control characters, $ was substituted for # and * for @.
The first operand in a PDP-11 instruction word refers to the source operand, while the second operand refers to the destination. This is reflected in the assembly language's operand order, which is source, then destination. For example, the opcode
011203
refers to the instruction
mov (R2),R3
which moves the word pointed to by R2 to R3.
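The field layout can be made concrete with a minimal Python sketch (an illustration, not part of the original answer; decode, operand and MODES are hypothetical names). Written in octal, a double-operand word reads as one digit for the byte/word bit, one digit for the opcode (1 is MOV), then a mode/register digit pair for the source and another pair for the destination. Only MOV/MOVB is recognised, the X of modes 6 and 7 is printed as a placeholder instead of being read from the next word, and the R7 (PC) forms are not rewritten as #imm or @#imm.

```python
# Sketch: split a PDP-11 double-operand instruction word (written in octal)
# into its fields and print it in MACRO-11 style. Only MOV/MOVB is handled.

MODES = ["R{r}", "(R{r})", "(R{r})+", "@(R{r})+",
         "-(R{r})", "@-(R{r})", "X(R{r})", "@X(R{r})"]


def operand(field: int) -> str:
    """field is a 6-bit mode/register pair, i.e. two octal digits 'mr'."""
    mode, reg = field >> 3, field & 0o7
    return MODES[mode].format(r=reg)


def decode(word: int) -> str:
    byte_op = word >> 15          # bit 15: 0 = word, 1 = byte variant
    opcode = (word >> 12) & 0o7   # bits 14-12
    src = (word >> 6) & 0o77      # bits 11-6: source mode and register
    dst = word & 0o77             # bits 5-0: destination mode and register
    if opcode != 1:
        raise ValueError("this sketch only decodes MOV/MOVB")
    name = "movb" if byte_op else "mov"
    return f"{name} {operand(src)},{operand(dst)}"


print(decode(0o011203))  # mov (R2),R3
```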
This syntax was adapted to the 8086 CPU and its addressing modes:
mr0 X(bx,si) bx + si indexed
mr1 X(bx,di) bx + di indexed
mr2 X(bp,si) bp + si indexed
mr3 X(bp,di) bp + di indexed
mr4 X(si) si indexed
mr5 X(di) di indexed
mr6 X(bp) bp indexed
mr7 X(bx) bx indexed
3rR R register
0r6 addr direct
Here the three digits are the octal digits of the ModR/M byte: m is 0 if there is no index (displacement X), 1 if there is a one-byte index, 2 if there is a two-byte index, and 3 if a register is used instead of a memory operand. If two operands exist, the other operand is always a register and is encoded in the r digit. Otherwise, r encodes another three bits of the opcode.
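A minimal Python sketch of reading the table above (an illustration; decode_modrm, REG16 and MEM16 are hypothetical names): it splits a ModR/M byte into its three octal digits and renders the r/m operand in the table's X(base,index) notation, without the % prefix that came later. The disp argument stands for the displacement that would follow the ModR/M byte in the instruction stream.

```python
# Sketch: interpret a 16-bit-mode 8086 ModR/M byte.
# mod = bits 7-6, reg = bits 5-3, r/m = bits 2-0 (the three octal digits).

REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
MEM16 = ["(bx,si)", "(bx,di)", "(bp,si)", "(bp,di)",
         "(si)", "(di)", "(bp)", "(bx)"]


def decode_modrm(modrm: int, disp: int = 0):
    """Return (reg operand, r/m operand) for a 16-bit ModR/M byte.

    The reg field is only a register when the instruction has two operands;
    otherwise it holds three more opcode bits, which this sketch ignores.
    """
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0o7, modrm & 0o7
    if mod == 3:                  # 3rR: register operand
        rm_op = REG16[rm]
    elif mod == 0 and rm == 6:    # 0r6: direct address
        rm_op = f"{disp:#x}"
    elif mod == 0:                # no displacement
        rm_op = MEM16[rm]
    else:                         # mod 1 or 2: one- or two-byte displacement
        rm_op = f"{disp:#x}{MEM16[rm]}"
    return REG16[reg], rm_op


print(decode_modrm(0o120, 8))  # ('dx', '0x8(bx,si)')
```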
Immediates aren't possible in this addressing scheme; all instructions that take immediates encode that fact in their opcode. Immediates are spelled $imm, just like in the PDP-11 syntax.
While Intel always used a dst, src operand ordering for its assembler, there was no particularly compelling reason to adopt this convention, and the UNIX assembler was written to use the src, dst operand ordering known from the PDP-11.
They introduced some inconsistencies in this ordering in their implementation of the 8087 floating-point instructions, possibly because Intel gave the two possible directions of non-commutative floating-point instructions different mnemonics, which do not match the operand ordering used by AT&T syntax.
The PDP-11 instructions jmp (jump) and jsr (jump to subroutine) jump to the address of their operand. Thus, jmp foo would jump to foo and jmp *foo would jump to the address stored in the variable foo, similar to how lea works in the 8086.
The syntax for the x86's jmp and call instructions was designed as if these instructions worked like on the PDP-11, which is why jmp foo jumps to foo and jmp *foo jumps to the value at address foo, even though the 8086 doesn't actually have deferred addressing. This has the advantage and convenience of syntactically distinguishing direct jumps from indirect jumps without requiring a $ prefix for every direct jump target, but it doesn't make a lot of sense logically.
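The rule can be summarised in a small Python model (a hypothetical illustration, not how GAS is implemented): a bare operand names the jump target itself, while a leading * means "fetch the target from here". The dictionaries symbols, memory and registers are made-up stand-ins for the symbol table, RAM and the register file.

```python
# Sketch: how an AT&T-syntax jmp/call operand is resolved.
#   jmp foo    -> direct: jump to the address of symbol foo (no $ needed)
#   jmp *foo   -> indirect: jump to the value stored in memory at foo
#   jmp *%eax  -> indirect: jump to the value held in register eax


def jump_target(operand, symbols, memory, registers):
    if not operand.startswith("*"):
        return symbols[operand]          # direct jump
    inner = operand[1:]
    if inner.startswith("%"):
        return registers[inner[1:]]      # indirect through a register
    return memory[symbols[inner]]        # indirect through memory


symbols = {"foo": 0x1000}
memory = {0x1000: 0x2000}                # the variable foo holds 0x2000
registers = {"eax": 0x3000}
print(hex(jump_target("foo", symbols, memory, registers)))   # 0x1000
print(hex(jump_target("*foo", symbols, memory, registers)))  # 0x2000
```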
The syntax was expanded to specify segment prefixes using a colon:
seg:addr
When the 80386 was introduced, this scheme was adapted to its new SIB addressing modes using a four-part generic addressing mode:
disp(base,index,scale)
where disp is a displacement, base is a base register, index is an index register, and scale is 1, 2, 4, or 8 to scale the index register by one of these amounts. This is equivalent to the Intel syntax:
[disp+base+index*scale]
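The equivalence can be shown with a minimal Python sketch (hypothetical helper att_mem_to_intel, handling only numeric displacements; symbolic displacements and segment prefixes are left out). It parses the four-part AT&T form with a regular expression and prints the Intel bracket form; the order of the terms inside the brackets is a cosmetic choice here.

```python
import re

# Sketch: rewrite a simple AT&T memory operand disp(base,index,scale) in
# Intel form. Each of disp, base, index and scale may be omitted.
ATT_MEM = re.compile(
    r"(?P<disp>-?(?:0x[0-9a-fA-F]+|\d+))?"
    r"\((?P<base>%\w+)?(?:,(?P<index>%\w+)(?:,(?P<scale>[1248]))?)?\)"
)


def att_mem_to_intel(operand: str) -> str:
    m = ATT_MEM.fullmatch(operand.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a simple AT&T memory operand: {operand!r}")
    terms = []
    if m["base"]:
        terms.append(m["base"][1:])                  # drop the % prefix
    if m["index"]:
        idx, scale = m["index"][1:], m["scale"] or "1"
        terms.append(idx if scale == "1" else f"{idx}*{scale}")
    if m["disp"]:
        terms.append(m["disp"])
    return "[" + "+".join(terms).replace("+-", "-") + "]"


print(att_mem_to_intel("-8(%rbp)"))       # [rbp-8]
print(att_mem_to_intel("16(,%ecx,8)"))    # [ecx*8+16]
print(att_mem_to_intel("(%eax,%ebx,4)"))  # [eax+ebx*4]
```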
Another remarkable feature of the PDP-11 is that most instructions are available in a byte and a word variant. Which one you use is indicated by a b or w suffix to the opcode, which directly toggles the most significant bit (bit 15) of the instruction word:
010001 movw r0,r1
110001 movb r0,r1
This was also adapted for AT&T syntax, as most 8086 instructions are indeed available in a byte mode and a word mode. Later, the 80386 introduced 32-bit instructions (suffixed l for long), and AMD's x86-64 extension (AMD64) introduced 64-bit instructions (suffixed q for quad).
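These suffixes line up with the size keywords Intel syntax puts on memory operands (the DWORD PTR mentioned earlier corresponds to l). Below is a minimal Python sketch with a hypothetical, deliberately tiny mnemonic list. Naive suffix stripping would be wrong, because some base mnemonics already end in one of these letters, e.g. shl.

```python
# Sketch: split an AT&T mnemonic into its base and size suffix, and map the
# suffix to the Intel-syntax size keyword used on memory operands.
# BASE_MNEMONICS is a tiny, hypothetical list; a real tool needs a full table.

SUFFIXES = {
    "b": ("BYTE PTR", 8),
    "w": ("WORD PTR", 16),
    "l": ("DWORD PTR", 32),
    "q": ("QWORD PTR", 64),
}
BASE_MNEMONICS = {"mov", "add", "sub", "cmp", "inc", "dec", "shl", "push"}


def split_suffix(mnemonic):
    """Return (base, Intel size keyword, width in bits); unsuffixed -> None, None."""
    if mnemonic in BASE_MNEMONICS:
        return mnemonic, None, None
    base, suffix = mnemonic[:-1], mnemonic[-1:]
    if base in BASE_MNEMONICS and suffix in SUFFIXES:
        keyword, bits = SUFFIXES[suffix]
        return base, keyword, bits
    raise ValueError(f"unknown mnemonic: {mnemonic!r}")


print(split_suffix("movl"))  # ('mov', 'DWORD PTR', 32)
print(split_suffix("shl"))   # ('shl', None, None)
```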
Last but not least, originally the convention was to prefix C language symbols with an underscore (as is still done on 32-bit Windows) so you can distinguish a C function named ax from the register ax. When Unix System Laboratories developed the ELF binary format, they decided to get rid of this decoration. As there is then no other way to distinguish a direct address from a register, a % prefix was added to every register:
mov direct,%eax # move memory at direct to %eax
And that's how we got today's AT&T syntax.
Intel Assembly ljmp syntax from AT&T syntax
You can use jmp 0x08, start32
For some reason, jmp 0x8:start32 only works after .intel_syntax noprefix, even with command-line args that should be equivalent. This is the syntax used by Binutils objdump -d -Mintel -mi8086, e.g. ea 16 00 08 00 jmp 0x8:0x16, so it's probably a GAS bug that it's not accepted sometimes.
I edited your question to create a small reproducible example with as 2.35.1 (which I have on Arch GNU/Linux), based on your comments replying to Jester. I included command-line options: I assume you must have been using those, because there's no .intel_syntax noprefix directive in your file.
That seems to be the problem: -msyntax=intel -mnaked-reg makes other Intel-syntax things work, like xor ax,ax, but does not make jmp 0x8:start32 work (or other ways of writing it). Only a .intel_syntax noprefix directive (see footnote 1) makes that syntax for far jmp work.
# .intel_syntax noprefix # rely on command line options to set this
.code16
xor ax, ax # verify that command-line setting of intel_syntax worked, otherwise this line errors.
ljmp 0x8, start32 # Working before or after a syntax directive, but is basically AT&T syntax
# jmp 0x8:start32 # fails here, works after a directive
jmp 0x8, start32 # Michael Petch's suggested syntax that's still somewhat AT&Tish. works with just cmdline opts.
.att_syntax
ljmp $0x8, $start32 # working everywhere, even with clang
.intel_syntax noprefix
jmp 0x8:start32 # objdump disassembly syntax, but only works after a .intel_syntax noprefix directive
.code32
start32:
nop
I verified that -msyntax=intel -mnaked-reg work for other instructions where their effect is necessary: movzx ax, al works. Without -mnaked-reg we'd get "too many memory references", because "ax" and "al" would be taken as symbol names, and without -msyntax=intel we'd get "operand size mismatch".
A GAS listing from as -32 -msyntax=intel -mmnemonic=intel -mnaked-reg -o foo.o foo.s -al --listing-lhs-width=2 --listing-rhs-width=140 (I'm pretty sure -mmnemonic=intel is irrelevant and implied by -msyntax=intel):
Note that you can see which instructions worked, because they have machine code, and which didn't (the first jmp 0x8:start32), because the left-hand column is empty for it. The very first column would normally be addresses, but is ???? because assembly failed. (It failed because I uncommented the first jmp 0x8:start32 to show it failing the first time and working the second time.)
foo.s: Assembler messages:
foo.s:6: Error: junk `:start32' after expression
GAS LISTING foo.s page 1
1 # .intel_syntax noprefix # rely on command line options to set this
2 .code16
3 ???? 0FB6C0 movzx ax, al # verify that command-line setting of intel_syntax worked, otherwise this line errors.
4
5 ???? EA170008 00 ljmp 0x8, start32 # Working before or after a syntax directive, but is basically AT&T syntax
6 jmp 0x8:start32 # fails here, works after a directive
7 ???? EA170008 00 jmp 0x8, start32 # Michael Petch's suggested syntax that's still somewhat AT&Tish. works with just cmdline opts.
8
9 .att_syntax
10 ???? EA170008 00 ljmp $0x8, $start32 # working everywhere, even with clang
11 .intel_syntax noprefix
12 ???? EA170008 00 jmp 0x8:start32 # objdump disassembly syntax, but only works after a .intel_syntax noprefix directive
13
14 .code32
15 start32:
16 ???? 90 nop
17
(GAS does listing field widths for the left column in "words", which apparently means 32-bit chunks. That's why the 00 most-significant byte of the segment selector is separated by a space.)
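To see how those five bytes break down, here is a minimal Python sketch (hypothetical helper decode_far_jmp16) that decodes the 16-bit-mode direct far jmp: opcode EA followed by a little-endian 16-bit offset and then a little-endian 16-bit segment selector (in 32-bit code the offset would be 32 bits wide). In the object file the offset 0x17 is only the section-relative value before relocation; objdump below shows an R_386_16 relocation against .text for it.

```python
import struct

# Sketch: decode the 5-byte 16-bit-mode direct far jmp (opcode EA, ptr16:16).
# After the opcode come a little-endian 16-bit offset, then a little-endian
# 16-bit segment selector, which is why the listing shows "EA170008 00".


def decode_far_jmp16(code: bytes):
    """Return (selector, offset) from opcode EA + offset16 + selector16."""
    if len(code) != 5 or code[0] != 0xEA:
        raise ValueError("expected 5 bytes starting with opcode 0xEA")
    offset, selector = struct.unpack("<HH", code[1:])
    return selector, offset


sel, off = decode_far_jmp16(bytes.fromhex("ea17000800"))
print(f"jmp {sel:#x}:{off:#x}")  # jmp 0x8:0x17
```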
Putting a label before the jmp 0x8:label didn't help; it's not an issue of forward vs. backward reference. Even jmp 0x8:23 fails to assemble.
Syntax "recommended" by disassemblers, from a working build. Output of objdump -drwC -Mintel -mi8086 foo.o:
foo.o: file format elf32-i386
Disassembly of section .text:
00000000 <start32-0x17>:
0: 0f b6 c0 movzx ax,al
3: ea 17 00 08 00 jmp 0x8:0x17 4: R_386_16 .text
8: ea 17 00 08 00 jmp 0x8:0x17 9: R_386_16 .text
d: ea 17 00 08 00 jmp 0x8:0x17 e: R_386_16 .text
12: ea 17 00 08 00 jmp 0x8:0x17 13: R_386_16 .text
00000017 <start32>:
17: 90 nop
Output of llvm-objdump --mattr=+16bit-mode --x86-asm-syntax=intel -d foo.o:
00000000 <.text>:
0: 0f b6 c0 movzx ax, al
3: ea 17 00 08 00 ljmp 8, 23
8: ea 17 00 08 00 ljmp 8, 23
d: ea 17 00 08 00 ljmp 8, 23
12: ea 17 00 08 00 ljmp 8, 23
00000017 <start32>:
17: 90 nop
And BTW, I didn't get clang 11.0 to assemble any Intel-syntax version of this with a symbol name. ljmp 8, 12 assembles with clang, but not even ljmp 8, start32 does. Only by switching to AT&T syntax and back could I get clang's built-in assembler (clang -m32 -masm=intel -c) to emit a 16-bit-mode far jmp:
.att_syntax
ljmp $0x8, $start32 # working everywhere, even with clang
.intel_syntax noprefix
Keep in mind this direct form of far JMP is not available in 64-bit mode; perhaps that's why LLVM's developers appear to have spent less effort on it in the built-in assembler.
Footnote 1: Actually .intel_syntax prefix works, too, but never use that. Nobody wants to see the Franken-monster that is mov %eax, [%eax], or especially add %edx, %eax, which uses dst, src order but with AT&T-decorated register names.
Can I use Intel syntax of x86 assembly with GCC?
If you are using separate assembly files, gas has a directive to support Intel syntax:
.intel_syntax noprefix # not recommended for inline asm
which uses Intel syntax and doesn't need the % prefix before register names.
(You can also run as with -msyntax=intel -mnaked-reg to make that the default instead of att, in case you don't want to put .intel_syntax noprefix at the top of your files.)
Inline asm: compile with -masm=intel
For inline assembly, you can compile your C/C++ sources with gcc -masm=intel. (See How to set gcc to use intel syntax permanently? for details.) The compiler's own asm output (which the inline asm is inserted into) will use Intel syntax, and it will substitute operands into asm template strings using Intel syntax, like [rdi + 8] instead of 8(%rdi).
This works with GCC itself and ICC, but for clang only with clang 14 and later.
Using .intel_syntax noprefix at the start of inline asm and switching back with .att_syntax can work, but will break if you use any m constraints: the memory reference will still be generated in AT&T syntax. It happens to work for registers because GAS accepts %eax as a register name even in intel-noprefix mode.
Using .att_syntax at the end of an asm() statement will also break compilation with -masm=intel; in that case GCC's own asm after (and before) your template will be in Intel syntax. (Clang doesn't have that "problem"; each asm template string is local, unlike GCC, where the template string truly becomes part of the text file that GCC sends to as to be assembled separately.)
Related:
- GCC manual: asm dialect alternatives: writing an asm statement with {att | intel} in the template so it works when compiled with -masm=att or -masm=intel. See an example using lock cmpxchg.
- https://stackoverflow.com/tags/inline-assembly/info for more about inline assembly in general; it's important to make sure you're accurately describing your asm to the compiler, so it knows what registers and memory are read / written.
- AT&T syntax: https://stackoverflow.com/tags/att/info
- Intel syntax: https://stackoverflow.com/tags/intel-syntax/info
- The x86 tag wiki has links to manuals, optimization guides, and tutorials.
Commenting syntax for x86 AT&T syntax assembly
Comments for AT&T-syntax assembly are:
# this is a comment
/* this is a comment */
According to the fourth result Google gave me, // comments are only supported in .S files, because GCC runs the C preprocessor on them before assembling. For .s files, the actual assembler itself (as) handles /* */ block comments on all targets, but for x86 only # as the line-comment character.
For some other ISAs, GAS uses other comment characters, for example @ for ARM.