GCC: putchar(char) in inline assembly
When using GNU C inline asm, use constraints to tell the compiler where you want things, instead of doing it "manually" with instructions inside the asm template.
For writechar and readchar, we only need a "syscall" as the template, with constraints to set up all the inputs in registers (the call number in RAX and the arguments in RDI, RSI and RDX) and the pointed-to char in memory for the write(2) system call, according to the x86-64 Linux system-call convention (which very closely matches the System V ABI's function-calling convention). See "What are the calling conventions for UNIX & Linux system calls on i386 and x86-64".
This also makes it easy to avoid clobbering the red-zone (128 bytes below RSP), where the compiler might be keeping values. You must not clobber it from inline asm (so push / pop aren't usable unless you sub rsp, 128 first: see "Using base pointer register in C++ inline asm" for that and many useful links about GNU C inline asm), and there's no way to tell the compiler you clobber it. You could build with -mno-redzone, but in this case input/output operands are much better.
I'm hesitant to call these putchar and getchar. You can do that if you're implementing your own stdio that doesn't support buffering yet, but some functions require input buffering to implement correctly. For example, scanf has to examine characters to see if they match the format string, and leave them "unread" if they don't. Output buffering is optional, though; I think you could fully implement stdio with functions that either create a private buffer and write() it, or directly write() their input pointer.
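To illustrate the "unread" requirement, here is a minimal sketch of one-byte pushback on top of read(2). The names inbuf, inbuf_getc and inbuf_ungetc are made up for this example, and a real stdio would keep a larger buffer rather than a single slot.

```c
#include <unistd.h>   /* read, ssize_t */

/* Hypothetical input stream with room for exactly one pushed-back byte. */
struct inbuf {
    int fd;                 /* file descriptor to read from */
    int has_pending;        /* nonzero if 'pending' holds an unread byte */
    unsigned char pending;  /* the byte that was pushed back */
};

/* Returns the next byte as 0..255, or -1 on EOF or error. */
static int inbuf_getc(struct inbuf *in)
{
    unsigned char c;
    if (in->has_pending) {          /* hand back the "unread" byte first */
        in->has_pending = 0;
        return in->pending;
    }
    ssize_t n = read(in->fd, &c, 1);
    return n == 1 ? c : -1;
}

/* Pushes one byte back. Returns 0 on success, -1 if the slot is already full. */
static int inbuf_ungetc(struct inbuf *in, unsigned char c)
{
    if (in->has_pending)
        return -1;
    in->pending = c;
    in->has_pending = 1;
    return 0;
}
```

A scanf-style digit parser would call inbuf_getc() until it sees a non-digit, then hand that byte to inbuf_ungetc() so the next conversion can see it.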
writechar():
int writechar(char my_char)
{
    int retval;  // sys_write uses ssize_t, but we only pass len=1
                 // so the return value is either 1 on success or -1..-4095 for error
                 // and thus fits in int
    asm volatile("syscall  #dummy arg picked %[dummy]\n"
            : "=a" (retval)          /* output in EAX */
            /* inputs: ssize_t write(int fd, const void *buf, size_t count); */
            : "a"(1),                // EAX = call number: __NR_write is 1 on x86-64 Linux
              "D"(1),                // RDI = fd=stdout
              "S"(&my_char),         // RSI = buf
              "d"(1)                 // RDX = length
              , [dummy]"m" (my_char) // dummy memory input, otherwise compiler doesn't store the arg
            /* clobbered regs */
            : "rcx", "r11"           // clobbered by syscall
        );
    // It doesn't matter what addressing mode "m"(my_char) picks,
    // as long as it refers to the same memory as &my_char so the compiler actually does a store
    return retval;
}
This compiles very efficiently with gcc -O3, on the Godbolt compiler explorer. The listings here show the setup of the three arguments; the call number also has to be loaded into EAX before every syscall, because the return value overwrites it.
writechar:
    movb    %dil, -4(%rsp)      # store my_char into the red-zone
    movl    $1, %edi
    leaq    -4(%rsp), %rsi
    movl    %edi, %edx          # optimize because fd = len
    syscall                     # dummy arg picked -4(%rsp)
    ret
@nrz's test main inlines it much more efficiently than the unsafe (red-zone clobbering) version in that answer, taking advantage of the fact that syscall leaves most registers unmodified so it can just set them once.
main:
    movl    $97, %r8d           # my_char = 'a'
    leaq    -1(%rsp), %rsi      # rsi = &my_char
    movl    $1, %edx            # len
.L6:                            # do {
    movb    %r8b, -1(%rsp)      # store the char into the buffer
    movl    %edx, %edi          # silly compiler doesn't hoist this out of the loop
    syscall                     #dummy arg picked -1(%rsp)
    addl    $1, %r8d
    cmpb    $123, %r8b
    jne     .L6                 # } while(++my_char < 'z'+1)
    movb    $10, -1(%rsp)
    syscall                     #dummy arg picked -1(%rsp)
    xorl    %eax, %eax          # return 0
    ret
readchar(), done the same way:
int readchar(void)
{
    int retval;
    unsigned char my_char;
    asm volatile("syscall  #dummy arg picked %[dummy]\n"
            /* outputs */
            : "=a" (retval)
             ,[dummy]"=m" (my_char)  // tell the compiler the asm dereferences &my_char
            /* inputs: ssize_t read(int fd, void *buf, size_t count); */
            : "a"(0),                // EAX = call number: __NR_read is 0 on x86-64 Linux
              "D"(0),                // RDI = fd=stdin
              "S" (&my_char),        // RSI = buf
              "d"(1)                 // RDX = length
            : "rcx", "r11"           // clobbered by syscall
        );
    if (retval < 0)   // -1 .. -4095 are -errno values
        return retval;
    if (retval == 0)  // EOF: the kernel stored nothing in my_char
        return -1;    // like stdio's EOF; still caught by a c < 0 check
    return my_char;   // else a 0..255 char / byte
}
Callers can check for error by checking c < 0.
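The same constraint-only approach generalizes to other calls. The following is a sketch of a generic three-argument wrapper, assuming x86-64 Linux and a GNU C compatible compiler; raw_syscall3 and writechar_generic are hypothetical names, not part of the original answer. It uses a "memory" clobber instead of a dummy "m" operand, which is simpler but stops the compiler from keeping any memory values in registers across the asm statement.

```c
#include <sys/syscall.h>   /* SYS_write: the call numbers for this target */

/* Hypothetical helper: a raw 3-argument system call.
 * Assumes x86-64 Linux; returns the kernel's value (-errno on failure). */
static inline long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)                  /* return value in RAX */
                     : "a"(nr),                   /* call number in RAX */
                       "D"(a1), "S"(a2), "d"(a3)  /* args in RDI, RSI, RDX */
                     : "rcx", "r11",              /* clobbered by syscall */
                       "memory");                 /* kernel may read or write pointed-to memory */
    return ret;
}

/* Same contract as writechar() above: 1 on success, -errno on failure. */
static inline int writechar_generic(char c)
{
    return (int)raw_syscall3(SYS_write, 1, (long)&c, 1);
}
```

The dedicated "m" operand in writechar() is the more precise choice when you know exactly which object the kernel reads; the "memory" clobber is the usual fallback for a wrapper that cannot know what its pointer arguments refer to.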
Writing a putchar in Assembly for x86_64 with 64 bit Linux?
From man 2 write, you can see the signature of write is,
ssize_t write(int fd, const void *buf, size_t count);
It takes a pointer (const void *buf) to a buffer in memory. You can't pass it a char by value, so you have to store it to memory and pass a pointer.
(Don't print one char at a time unless you only have one to print; that's really inefficient. Construct a buffer in memory and print that, e.g. this x86-64 Linux NASM function: "How do I print an integer in Assembly Level Programming without printf from the c library?")
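As a rough C illustration of "construct a buffer and print that", here is a sketch that collects characters and hands them to write(2) in one go. The outbuf type, its 256-byte size and the function names are assumptions made for this example; it uses the libc write() wrapper and does not retry on EINTR.

```c
#include <stddef.h>   /* size_t */
#include <unistd.h>   /* write, ssize_t */

/* Hypothetical output buffer: bytes collect here until flushed. */
struct outbuf {
    int fd;           /* destination file descriptor */
    size_t len;       /* number of bytes currently buffered */
    char data[256];   /* arbitrary size chosen for this sketch */
};

/* Writes out everything buffered so far. Returns 0 on success, -1 on error. */
static int outbuf_flush(struct outbuf *b)
{
    size_t off = 0;
    while (off < b->len) {
        /* write() may accept fewer bytes than requested, so loop */
        ssize_t n = write(b->fd, b->data + off, b->len - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    b->len = 0;
    return 0;
}

/* Appends one char, flushing first if the buffer is full. */
static int outbuf_putc(struct outbuf *b, char c)
{
    if (b->len == sizeof b->data && outbuf_flush(b) < 0)
        return -1;
    b->data[b->len++] = c;
    return 0;
}
```

Printing the alphabet this way costs one system call at flush time instead of 26.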
A NASM version of GCC: putchar(char) in inline assembly:
; x86-64 System V calling convention: input = byte in DIL
; clobbers: RDI, RSI, RDX, RCX, R11 (last 2 by syscall itself)
; returns: RAX = write return value: 1 for success, -1..-4095 for error
writechar:
    mov     byte [rsp-4], dil   ; store the char from RDI into the red-zone
    mov     eax, 1              ; EAX = call number: __NR_write = 1
    mov     edi, 1              ; EDI = fd=1 = stdout
    lea     rsi, [rsp-4]        ; RSI = buf
    mov     edx, edi            ; RDX = len = 1
    syscall                     ; rax = write(1, buf, 1)
    ret
If you do pass an invalid pointer in RSI, such as '2' (integer 50), the system call will return -EFAULT (-14) in RAX. (The kernel returns error codes on bad pointers to system calls, instead of delivering a SIGSEGV as it would if you dereferenced the pointer in user-space.)
See also What are the return values of system calls in Assembly?
Instead of writing code to check return values, in toy programs / experiments you should just run them under strace ./a.out. If you're writing your own _start without libc, there won't be any other system calls during startup that you don't make yourself, so it's very easy to read the output. See "How should strace be used?"
How can I call putchar from the C library using ARM Assembly for the Raspberry pi?
There are a few issues with your code actually, and they're all related to what is and is not preserved across a call boundary.
@Jester's comment is spot on in terms of your immediate problem: the PSR (which contains the status flags) is not preserved across a call boundary, so the result of your CMP is clobbered by the BL.
But it's also worth noting that lr is clobbered by the BL too, so when you reach the end of main() the BX lr will branch right back to the line after the BL. Your comment suggests that you know that r0-r3 are call-clobbered. But r12 and lr are too, so they need preserving if you're using them; and main() is a function just like any other, so it needs to conform to the calling conventions by preserving r4-r11.
Currently, main() is clobbering r4 and r5, so these need to be pushed to the stack at the start and popped at the end, along with lr (to avoid the problem of lr being clobbered by the BL). The ARM ABI requires 8-byte stack alignment across call boundaries in different translation units, so you'll have to push and pop one other register too to make it an even number.
So, at the start, you'll want
main:
    PUSH {r4-r6,lr}
and at the end
    POP {r4-r6,lr}
    BX lr
or equivalently
    POP {r4-r6,pc}
where the stacked value of lr is popped directly into the program counter, which causes a branch.
C/C++ Inline assembly [with C loops]
It doesn't work because the lea instruction is intended to get the address of a variable. (+1 to zebarbox for this note.) We need the value of chr, not its address, so we use this instead:
movsx eax,chr
This pseudoinstruction will compile to something like this:
movsx eax,byte ptr [ebp-4]
You can also write putchar(chr), put a breakpoint there, run the application and look in the disassembly window to see how it is compiled.
Note that I use movsx because chr is a char and I need a dword here. If chr were an int, I would simply use the mov instruction.
Also, you are not allowed to use pop ebx, because ebx must not be changed here. Use pop eax or add esp,4 instead.
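The widening that movsx performs is the same thing C does when it converts a signed char to int. A tiny sketch, with function names invented for the example, showing sign extension next to the zero extension you would get from movzx:

```c
/* Sign extension: what movsx does for a signed 8-bit value. */
static int widen_signed(signed char c)
{
    return c;   /* 0xFF (-1) becomes 0xFFFFFFFF (-1) */
}

/* Zero extension: what movzx does for an unsigned 8-bit value. */
static int widen_unsigned(unsigned char c)
{
    return c;   /* 0xFF (255) becomes 0x000000FF (255) */
}
```

For plain ASCII characters both give the same dword; they only differ for bytes with the top bit set.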
Can't see calls to function or constant value in optimized compiler-generated assembly
- Your functions were all inlined, so there aren't any calls to them in main. Instead, their result is used directly.
- The value 17 doesn't appear in your code because the compiler performed the calculations at compile time. All functions return int((17+15)/3.0f), which is 10. And you can see mov esi, 10 three times in the generated assembly, which is used to pass the value to basic_ostream::operator<<.
- 1077936128 is the representation of 3.0f when read as an integer from memory. (See "Understanding GCC's floating point constants in assembly listing output" for details.) It is only used in the three functions and not in main (where constant-propagation resulted in a simple integer at compile-time).
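Both numbers are easy to check yourself. Here is a short sketch in C (the question's code is C++, and the helper names are made up) that reinterprets the bytes of a float as a 32-bit integer and evaluates the same expression the compiler folded. It assumes float is a 32-bit IEEE 754 binary32, which is the case on x86-64.

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of a float as a 32-bit integer.
 * memcpy is the well-defined way to type-pun in C. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The expression every function in the question reduces to:
 * 32 / 3.0f is about 10.67, and conversion to int truncates. */
static int folded_result(void)
{
    return (int)((17 + 15) / 3.0f);
}
```

Here float_bits(3.0f) is 0x40400000, which is 1077936128 in decimal, and folded_result() is 10.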