GCC: putchar(char) in Inline Assembly

GCC: putchar(char) in inline assembly

When using GNU C inline asm, use constraints to tell the compiler where you want things, instead of doing it "manually" with instructions inside the asm template.

For writechar and readchar, the asm template only needs to be a single syscall instruction. Constraints set up everything else according to the x86-64 Linux system-call convention, which very closely matches the System V ABI's function-calling convention:
- all the inputs go in registers, including the system-call number in RAX;
- for the write(2) system call, the pointed-to char must also be in memory.

See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64?

This also makes it easy to avoid clobbering the red zone (the 128 bytes below RSP), where the compiler might be keeping values. You must not clobber it from inline asm, and there's no way to tell the compiler that you clobber it. So push / pop aren't usable unless you sub rsp, 128 first: see Using base pointer register in C++ inline asm for that and many useful links about GNU C inline asm. You could build with -mno-red-zone, but in this case input/output operands are much better.


I'm hesitant to call these putchar and getchar. That's fine if you're implementing your own stdio that doesn't support buffering yet, but some functions need input buffering to be implemented correctly. For example, scanf has to examine characters to see whether they match the format string, and leave them "unread" if they don't. Output buffering is optional, though. I think you could fully implement stdio with functions that either fill a private buffer and write() it, or write() their input pointer directly.
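To make the input-buffering point concrete, here is a small sketch of one-character pushback, the minimum a scanf-like number parser needs. It is a toy built on assumptions:
- an in-memory array stands in for the bytes a read() on the fd would return;
- the names instream, in_getc, in_ungetc and parse_uint are hypothetical.

```c
#include <stddef.h>

/* Hypothetical input stream: data[] stands in for bytes read() would return. */
struct instream {
    const char *data;
    size_t len, pos;
    int pushback;                 /* -1 when the pushback slot is empty */
};

static int in_getc(struct instream *s)
{
    if (s->pushback >= 0) {       /* hand back an "unread" char first */
        int c = s->pushback;
        s->pushback = -1;
        return c;
    }
    if (s->pos >= s->len)
        return -1;                /* end of input */
    return (unsigned char)s->data[s->pos++];
}

static void in_ungetc(struct instream *s, int c)
{
    s->pushback = c;              /* only one character of pushback */
}

/* Like scanf("%u"): consume digits, then push back the first non-digit
   so the next read still sees it. */
static unsigned parse_uint(struct instream *s)
{
    unsigned v = 0;
    int c;
    while ((c = in_getc(s)) >= '0' && c <= '9')
        v = v * 10 + (unsigned)(c - '0');
    if (c >= 0)
        in_ungetc(s, c);
    return v;
}
```

With input "123x", parse_uint returns 123 and the following in_getc still returns 'x'. An unbuffered getchar built directly on read(2) has nowhere to keep that 'x'.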

writechar():

int writechar(char my_char)
{
    int retval;  // sys_write uses ssize_t, but we only pass len=1
                 // so the return value is either 1 on success or -1..-4095 for error
                 // and thus fits in int

    asm volatile("syscall  #dummy arg picked %[dummy]\n"
                 : "=a" (retval)  /* output in EAX */
                 /* inputs: ssize_t write(int fd, const void *buf, size_t count); */
                 : "a"(1),        // RAX = __NR_write (the system-call number)
                   "D"(1),        // RDI = fd=stdout
                   "S"(&my_char), // RSI = buf
                   "d"(1)         // RDX = length
                 , [dummy]"m" (my_char) // dummy memory input, otherwise compiler doesn't store the arg
                 /* clobbered regs */
                 : "rcx", "r11"   // clobbered by syscall
    );
    // It doesn't matter what addressing mode "m"(my_char) picks,
    // as long as it refers to the same memory as &my_char so the compiler actually does a store

    return retval;
}

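With gcc -O3 on the Godbolt compiler explorer, this compiles very efficiently.

Note that nothing in this listing sets EAX to the system-call number. The kernel takes the call number from RAX (1 for write on x86-64), so the asm statement needs an "a"(1) input for it, which adds one mov to the output.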

writechar:
movb %dil, -4(%rsp) # store my_char into the red-zone
movl $1, %edi
leaq -4(%rsp), %rsi
movl %edi, %edx # optimize because fd = len
syscall # dummy arg picked -4(%rsp)

ret

When inlined into @nrz's test main, it compiles much more efficiently than the unsafe (red-zone-clobbering) version in that answer. syscall leaves most registers unmodified, so the compiler can set most of them up just once.

main:
movl $97, %r8d # my_char = 'a'
leaq -1(%rsp), %rsi # rsi = &my_char
movl $1, %edx # len
.L6: # do {
movb %r8b, -1(%rsp) # store the char into the buffer
movl %edx, %edi # silly compiler doesn't hoist this out of the loop
syscall #dummy arg picked -1(%rsp)

addl $1, %r8d
cmpb $123, %r8b
jne .L6 # } while(++my_char < 'z'+1)

movb $10, -1(%rsp)
syscall #dummy arg picked -1(%rsp)

xorl %eax, %eax # return 0
ret

readchar(), done the same way:

int readchar(void)
{
    int retval;
    unsigned char my_char;
    asm volatile("syscall  #dummy arg picked %[dummy]\n"
                 /* outputs */
                 : "=a" (retval)
                 ,[dummy]"=m" (my_char) // tell the compiler the asm dereferences &my_char

                 /* inputs: ssize_t read(int fd, void *buf, size_t count); */
                 : "a"(0),          // RAX = __NR_read (the system-call number)
                   "D"(0),          // RDI = fd=stdin
                   "S" (&my_char),  // RSI = buf
                   "d"(1)           // RDX = length

                 : "rcx", "r11"     // clobbered by syscall
    );
    if (retval < 0)       // -1 .. -4095 are -errno values
        return retval;
    if (retval == 0)      // EOF: read() returned 0 and stored nothing in my_char
        return -4096;     // EOF marker: an arbitrary value outside the -errno range
    return my_char;       // else a 0..255 char / byte
}

Callers can check for error by checking c < 0.

Writing a putchar in Assembly for x86_64 with 64 bit Linux?

From man 2 write, you can see that the signature of write is:

ssize_t write(int fd, const void *buf, size_t count);

It takes a pointer (const void *buf) to a buffer in memory. You can't pass it a char by value, so you have to store it to memory and pass a pointer.
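In C the same rule looks like this. It is a minimal sketch using libc's write(); put_byte is a hypothetical name. The char lives in memory as the local variable c, and its address is what gets passed.

```c
#include <unistd.h>

/* write(2) takes a pointer, so the byte must be stored in memory first. */
static ssize_t put_byte(int fd, char c)
{
    return write(fd, &c, 1);   /* &c: address of the stored char */
}
```

Passing c itself where buf is expected would not compile cleanly in C. In asm nothing stops you, which is how a value like '2' ends up being used as a pointer.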

(Don't print one char at a time unless you only have one to print; that's really inefficient. Construct a buffer in memory and print it with one system call, e.g. like this x86-64 Linux NASM function: How do I print an integer in Assembly Level Programming without printf from the c library?)
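A rough C sketch of that buffer-then-write approach, assuming libc's write(). The digits are generated lowest-first into the end of a small buffer, followed by a single write() of the whole line. The names u64_to_line and print_u64 are hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/* Fill buf from the end with v's decimal digits plus '\n'.
   Returns a pointer to the first digit; *len gets the byte count. */
static char *u64_to_line(unsigned long long v, char buf[21], size_t *len)
{
    char *end = buf + 21;             /* up to 20 digits for 2^64-1, plus '\n' */
    char *p = end;
    *--p = '\n';
    do {
        *--p = (char)('0' + v % 10);  /* lowest digit first, moving left */
        v /= 10;
    } while (v != 0);
    *len = (size_t)(end - p);
    return p;
}

static ssize_t print_u64(unsigned long long v)
{
    char buf[21];
    size_t len;
    char *p = u64_to_line(v, buf, &len);
    return write(1, p, len);          /* one system call for the whole line */
}
```

Filling from the end avoids a separate reversal pass, since division by 10 naturally produces the least-significant digit first.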

A NASM version of writechar from GCC: putchar(char) in inline assembly:

; x86-64 System V calling convention: input = byte in DIL
; clobbers: RDI, RSI, RDX, RCX, R11 (last 2 by syscall itself)
; returns: RAX = write return value: 1 for success, -1..-4095 for error
writechar:
    mov    byte [rsp-4], dil      ; store the char from RDI

    mov    edi, 1                 ; EDI = fd=1 = stdout
    lea    rsi, [rsp-4]           ; RSI = buf
    mov    edx, edi               ; RDX = len = 1
    mov    eax, edi               ; RAX = __NR_write = 1
    syscall                       ; rax = write(1, buf, 1)
    ret

If you do pass an invalid pointer in RSI, such as '2' (integer 50), the system call will return -EFAULT (-14) in RAX. The kernel returns error codes for bad pointers passed to system calls, instead of delivering a SIGSEGV like it would if you dereferenced the pointer in user space.

See also What are the return values of system calls in Assembly?

Instead of writing code to check return values, in toy programs and experiments you should just run them under strace ./a.out. This works especially well if you're writing your own _start without libc: there won't be any other system calls during startup that you don't make yourself, so the output is very easy to read. See How should strace be used?

How can I call putchar from the C library using ARM Assembly for the Raspberry pi?

There are actually a few issues with your code, and they're all related to what is and is not preserved across a call boundary.

@Jester's comment is spot on in terms of your immediate problem: the PSR (which contains the status flags) is not preserved across a call boundary, so the result of your CMP is clobbered by the BL.

But it's also worth noting that lr is clobbered by the BL too, so when you reach the end of main() the BX lr will branch right back to the line after the BL. Your comment suggests that you know that r0-r3 are call-clobbered. But r12 and lr are too, so they need preserving if you're using them; and main() is a function just like any other, so it needs to conform to the calling conventions by preserving r4-r11.

Currently, main() is clobbering r4 and r5, so these need to be pushed to the stack at the start and popped at the end, along with lr (to avoid the problem of lr being clobbered by the BL). The ARM ABI requires 8-byte stack alignment across call boundaries in different translation units, so you'll have to push and pop one other register too to make it an even number.

So, at the start, you'll want

main:
    PUSH {r4-r6,lr}

and at the end

    POP {r4-r6,lr}
    BX lr

or equivalently

    POP {r4-r6,pc}

where the stacked value of lr is popped directly into the program counter, which causes a branch.

C/C++ Inline assembly [with C loops]

It doesn't work because the lea instruction gets the address of a variable. (+1 to zebarbox for this note.) We need the value of chr, not its address, so we use this instead:

movsx eax,chr

The compiler replaces chr with its stack slot, so this compiles to something like this:

movsx eax, byte ptr [ebp-4]

You can also write putchar(chr) in C, put a breakpoint there, run the application and look at the disassembly window to see how it is compiled.

Note that I use movsx because chr is a char and I need a dword here. If chr were an int, I would simply use a mov instruction.

Also, you must not use pop ebx to remove the argument from the stack after the call, because ebx must not be changed here. Use pop eax or add esp,4 instead.

Can't see calls to function or constant value in optimized compiler-generated assembly


  1. Your functions were all inlined, so there aren't any calls to them in main. Instead, their result is used directly.
  2. The value 17 doesn't appear in your code because the compiler performed the calculations at compile time. All the functions return int((17+15)/3.0f), which is 10 (32 / 3.0f is about 10.67, truncated toward zero). You can see mov esi, 10 three times in the generated assembly; it passes the value to std::basic_ostream::operator<<.
  3. 1077936128 is the representation of 3.0f when read as an integer from memory. (See Understanding GCC's floating point constants in assembly listing output for details). It is only used in the three functions and not in main (where constant-propagation resulted in a simple integer at compile-time).
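Both facts are easy to check in plain C. This sketch assumes IEEE-754 single-precision float, as on x86. folded and float_bits are hypothetical names:
- folded is the expression the functions return;
- float_bits reinterprets a float's bytes as an integer with memcpy, giving the integer form in which GCC's listing shows the constant.

```c
#include <stdint.h>
#include <string.h>

/* The expression from the question: constant-folded to 10 at compile time. */
static int folded(void)
{
    return (int)((17 + 15) / 3.0f);   /* 32 / 3.0f = 10.666..., truncated to 10 */
}

/* Reinterpret a float's object representation as a 32-bit integer. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);         /* well-defined, unlike pointer casts */
    return u;
}
```

float_bits(3.0f) is 0x40400000, which is 1077936128 in decimal: sign 0, biased exponent 128, and a mantissa with only the top bit set (3.0 = 1.5 x 2^1).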

