Printing an Integer as a String with AT&T Syntax, with Linux System Calls Instead of printf

Printing an integer as a string with AT&T syntax, with Linux system calls instead of printf

As @ped7g points out, you're doing several things wrong: using the int 0x80 32-bit ABI in 64-bit code, and passing character values instead of pointers to the write() system call.

Here's how to print an integer in x86-64 Linux, the simple and somewhat-efficient way (see Footnote 1), using the same repeated division / modulo by 10.

System calls are expensive (probably thousands of cycles for write(1, buf, 1)), and doing a syscall inside the loop steps on registers so it's inconvenient and clunky as well as inefficient. We should write the characters into a small buffer, in printing order (most-significant digit at the lowest address), and make a single write() system call on that.

But then we need a buffer. The maximum length of a 64-bit integer is only 20 decimal digits, so we can just use some stack space. In x86-64 Linux, we can use stack space below RSP (up to 128B) without "reserving" it by modifying RSP. This is called the red-zone. If you wanted to pass the buffer to another function instead of a syscall, you would have to reserve space with sub $24, %rsp or something.

Instead of hard-coding system-call numbers, using GAS makes it easy to use the constants defined in .h files. Note the mov $__NR_write, %eax near the end of the function. The x86-64 SystemV ABI passes system-call arguments in similar registers to the function-calling convention. (So it's totally different from the 32-bit int 0x80 ABI, which you shouldn't use in 64-bit code.)

// building with  gcc foo.S  will use CPP before GAS so we can use headers
#include <asm/unistd.h> // This is a standard Linux / glibc header file
// includes unistd_64.h or unistd_32.h depending on current mode
// Contains only #define constants (no C prototypes) so we can include it from asm without syntax errors.

.p2align 4
.globl print_uint64 #void print_uint64(uint64_t value)
print_uint64:
lea -1(%rsp), %rsi # We use the 128B red-zone as a buffer to hold the string
# a 64-bit integer is at most 20 digits long in base 10, so it fits.

movb $'\n', (%rsi) # store the trailing newline byte. (Right below the return address).
# If you need a null-terminated string, leave an extra byte of room and store '\n\0'. Or push $'\n'

mov $10, %ecx # same as mov $10, %rcx but 2 bytes shorter
# note that newline (\n) has ASCII code 10, so we could actually have stored the newline with movb %cl, (%rsi) to save code size.

mov %rdi, %rax # function arg arrives in RDI; we need it in RAX for div
.Ltoascii_digit: # do{
xor %edx, %edx
div %rcx # rax = rdx:rax / 10. rdx = remainder

# store digits in MSD-first printing order, working backwards from the end of the string
add $'0', %edx # integer to ASCII. %dl would work, too, since we know this is 0-9
dec %rsi
mov %dl, (%rsi) # *--p = (value%10) + '0';

test %rax, %rax
jnz .Ltoascii_digit # } while(value != 0)
# If we used a loop-counter to print a fixed number of digits, we would get leading zeros
# The do{}while() loop structure means the loop runs at least once, so we get "0\n" for input=0

# Then print the whole string with one system call
mov $__NR_write, %eax # call number from asm/unistd_64.h
mov $1, %edi # fd=1
# %rsi = start of the buffer
mov %rsp, %rdx
sub %rsi, %rdx # length = one_past_end - start
syscall # write(fd=1 /*rdi*/, buf /*rsi*/, length /*rdx*/); 64-bit ABI
# rax = return value (or -errno)
# rcx and r11 = garbage (destroyed by syscall/sysret)
# all other registers = unmodified (saved/restored by the kernel)

# we don't need to restore any registers, and we didn't modify RSP.
ret

To test this function, I put this in the same file to call it and exit:

.p2align 4
.globl _start
_start:
mov $10120123425329922, %rdi
# mov $0, %edi # Yes, it does work with input = 0
call print_uint64

xor %edi, %edi
mov $__NR_exit, %eax
syscall # sys_exit(0)

I built this into a static binary (with no libc):

$ gcc -Wall -static -nostdlib print-integer.S && ./a.out 
10120123425329922
$ strace ./a.out > /dev/null
execve("./a.out", ["./a.out"], 0x7fffcb097340 /* 51 vars */) = 0
write(1, "10120123425329922\n", 18) = 18
exit(0) = ?
+++ exited with 0 +++
$ file ./a.out
./a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=69b865d1e535d5b174004ce08736e78fade37d84, not stripped

Footnote 1: See Why does GCC use multiplication by a strange number in implementing integer division? for avoiding div r64 for division by 10, because that's very slow (21 to 83 cycles on Intel Skylake). A multiplicative inverse would make this function actually efficient, not just "somewhat". (But of course there'd still be room for optimizations...)




Related: Linux x86-32 extended-precision loop that prints 9 decimal digits from each 32-bit "limb": see .toascii_digit: in my Extreme Fibonacci code-golf answer. It's optimized for code-size (even at the expense of speed), but well-commented.

It uses div like you do, because that's smaller than using a fast multiplicative inverse. It uses the loop instruction for the outer loop (over multiple integers for extended precision), again for code-size at the cost of speed.

It uses the 32-bit int 0x80 ABI, and prints into a buffer that was holding the "old" Fibonacci value, not the current.


Another way to get efficient asm is from a C compiler. For just the loop over digits, look at what gcc or clang produce for this C source (which is basically what the asm is doing). The Godbolt Compiler explorer makes it easy to try with different options and different compiler versions.

See gcc7.2 -O3 asm output which is nearly a drop-in replacement for the loop in print_uint64 (because I chose the args to go in the same registers):

void itoa_end(unsigned long val, char *p_end) {
const unsigned base = 10;
do {
*--p_end = (val % base) + '0';
val /= base;
} while(val);

// write(1, p_end, orig-current);
}
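For completeness, here is a minimal C sketch of the whole function rather than just the digit loop: a small stack buffer filled back to front, then a single write(). The name print_uint64_fd and the fd parameter are my own additions for illustration (the asm version hard-codes fd 1); treat it as a sketch of the approach, not as compiler output.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>   /* write(), ssize_t */

/* Hypothetical C counterpart of print_uint64: convert into a small stack
 * buffer, back to front, then hand the whole string to one write() call.
 * Returns whatever write() returned (bytes written, or -1 on error). */
ssize_t print_uint64_fd(int fd, uint64_t value)
{
    char buf[21];                 /* 20 digits max for 64 bits, plus '\n' */
    char *end = buf + sizeof buf; /* one past the last byte */
    char *p = end;

    *--p = '\n';                  /* trailing newline goes in first */
    do {                          /* runs at least once, so 0 prints "0\n" */
        *--p = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    assert(p >= buf);             /* 21 bytes is always enough */
    return write(fd, p, (size_t)(end - p));   /* length = end - start */
}
```

Calling print_uint64_fd(1, 10120123425329922) should produce the same single write of 18 bytes that the strace output above shows for the asm version.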

I tested performance on a Skylake i7-6700k by commenting out the syscall instruction and putting a repeat loop around the function call. The version with mul %rcx / shr $3, %rdx is about 5 times faster than the version with div %rcx for storing a long number-string (10120123425329922) into a buffer. The div version ran at 0.25 instructions per clock, while the mul version ran at 2.65 instructions per clock (although requiring many more instructions).
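To show what the mul / shr $3 pair computes, here is a hedged C sketch of division by 10 with a multiplicative inverse. The helper names div10 and itoa_end_mul are made up for this example, and unsigned __int128 is a GCC/Clang extension that I'm assuming is available; a compiler does the same thing for you when it sees val / 10.

```c
#include <assert.h>
#include <stdint.h>

/* value / 10 without a div instruction: take the high 64 bits of
 * value * 0xCCCCCCCCCCCCCCCD (that constant is ceil(2^67 / 10)),
 * then shift right by 3, for a total shift of 67 bits.
 * unsigned __int128 is a GCC/Clang extension, assumed available here. */
static inline uint64_t div10(uint64_t value)
{
    unsigned __int128 product =
        (unsigned __int128)value * 0xCCCCCCCCCCCCCCCDULL;
    return (uint64_t)(product >> 64) >> 3;
}

/* Same backwards digit loop as itoa_end, using div10 for the quotient.
 * Returns a pointer to the first (most-significant) digit. */
char *itoa_end_mul(uint64_t val, char *p_end)
{
    assert(p_end != 0);
    do {
        uint64_t q = div10(val);
        *--p_end = (char)('0' + (val - q * 10));  /* remainder = val - q*10 */
        val = q;
    } while (val != 0);
    return p_end;
}
```

The remainder comes from a multiply and subtract instead of a second division, which is where the extra instructions in the mul version come from.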

It might be worth unrolling by 2, and doing a divide by 100 and splitting up the remainder of that into 2 digits. That would give a lot better instruction-level parallelism, in case the simpler version bottlenecks on mul + shr latency. The chain of multiply/shift operations that brings val to zero would be half as long, with more work in each short independent dependency chain to handle a 0-99 remainder.
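A rough C sketch of that unroll-by-2 idea, assuming the same "pointer to the end of the buffer" interface as itoa_end (the name itoa_end_by100 is hypothetical). I haven't benchmarked this; it only illustrates the structure. An optimizing compiler will typically turn the constant divisions into multiplies and shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Peel off two decimal digits per iteration: one long-latency divide by 100
 * on the main dependency chain, then cheap independent work on the 0..99
 * remainder. Returns a pointer to the first digit. */
char *itoa_end_by100(uint64_t val, char *p_end)
{
    assert(p_end != 0);
    while (val >= 100) {
        unsigned rem = (unsigned)(val % 100);   /* 0..99 */
        val /= 100;
        *--p_end = (char)('0' + rem % 10);      /* low digit of the pair */
        *--p_end = (char)('0' + rem / 10);      /* high digit of the pair */
    }
    /* One or two digits remain; handle them without a leading zero. */
    if (val >= 10) {
        *--p_end = (char)('0' + val % 10);
        val /= 10;
    }
    *--p_end = (char)('0' + val);
    return p_end;
}
```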


Related:

  • NASM version of this answer, for x86-64 or i386 Linux: How do I print an integer in Assembly Level Programming without printf from the c library?

  • How to convert a binary integer number to a hex string? - Base 16 is a power of 2, conversion is much simpler and doesn't require div.
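As a quick illustration of why base 16 is simpler, here is a minimal C sketch (u64_to_hex is a name I made up, and this is the plain scalar approach, not necessarily what the linked answer does): each 4-bit group maps directly to one digit, so a mask and a shift replace the division.

```c
#include <assert.h>
#include <stdint.h>

/* Writes exactly 16 lowercase hex digits (leading zeros kept) plus a
 * terminating 0 byte into out, which must have room for 17 bytes. */
void u64_to_hex(uint64_t value, char out[17])
{
    static const char digits[] = "0123456789abcdef";
    assert(out != 0);
    for (int i = 15; i >= 0; i--) {
        out[i] = digits[value & 0xF];  /* low 4 bits select one digit */
        value >>= 4;                   /* a shift replaces division by 16 */
    }
    out[16] = '\0';
}
```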

Assembly, syscall not work as expected. Ubuntu Linux x86_64, using AT&T syntax

mov     $output,%rsi     # address of string to output moved to rsi
^^^^^^

Address of string. The value $12 is not the character sequence "12". If you wanted to print the string 12, you would need to store 0x31 and 0x32 ('1' and '2') into the memory area (making it big enough), then use 2 as the length.

For example, movw $0x3231, output or better movw $0x3231, output(%rip) to use RIP-relative addressing for static data, like normal for x86-64. (Unlike NASM, GAS syntax doesn't support $'12' as a way to write the same integer constant.)
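To make the byte order concrete, here is a small C sketch that mimics what that 16-bit store does on x86. store_le16 is a made-up helper (nothing from GAS or libc); it writes the low byte first, the way a little-endian store does, so 0x3231 ends up in memory as the characters '1' then '2'.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates `movw $0x3231, output` on x86: a 16-bit store puts the low byte
 * at the lower address (little-endian), so 0x31 lands before 0x32. */
static void store_le16(unsigned char *dst, uint16_t v)
{
    assert(dst != 0);
    dst[0] = (unsigned char)(v & 0xFF);  /* 0x31 == '1' */
    dst[1] = (unsigned char)(v >> 8);    /* 0x32 == '2' */
}
```

After store_le16(output, 0x3231), passing output with a length of 2 to write() would print 12.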

If you want to print an integer as a string, you'll probably want to manipulate it mathematically so you can do it one digit at a time. (Printing an integer as a string with AT&T syntax, with Linux system calls instead of printf)

x86 assembly, integer to string reversed

Some itoa functions push their digits in one loop, and pop them in another, so they can get them in MSD-first printing order after div generates LSD-first. But there is zero point in pushl %edx / movb (%esp), %cl - you're just reloading it right away and filling up the stack (before movl %ebp, %esp removes it). You might as well have just done movb %dl, (%ebx, %edi, 1).

The better way to handle this is to start at the end of a buffer and decrement a pointer, so your ASCII digits end up in memory in printing order, opposite of the order you generated, without any crappy push/pop loop.

See Printing an integer as a string with AT&T syntax, with Linux system calls instead of printf for a working version (using x86-64, but easy enough to port to 32-bit code.) How do I print an integer in Assembly Level Programming without printf from the c library? is a NASM version, also explaining the digit-ordering logic.


Also, it's silly not to return the length; you can easily do a pointer subtraction so the caller knows how many digits you stored in the buffer. That way they don't have to call strlen. (Or 0-terminate the buffer at all if they're just passing it to a write system call).

If you want to also return the pointer, you can do that too, in a different register. This is assembly language; you can easily return multiple separate things, not constrained by the inconvenience of C calling conventions.

Or if you're returning a pointer to the start of the string data after converting backwards, the caller can do the subtraction since they know what end-of-buffer pointer they passed to your itoa_end function in the first place.
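Here is a hedged C sketch of that interface, for comparison with the asm. The struct digits type and the name utoa_end are hypothetical; C needs a struct to hand back two values where asm would just leave them in two registers (on x86-64 System V a small struct like this does come back in RAX and RDX).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: pointer to the first digit plus the count. */
struct digits {
    char  *start;   /* first (most-significant) digit */
    size_t len;     /* number of digits stored */
};

/* Store digits backwards from p_end so they end up in printing order.
 * The caller passes a pointer one past the end of its buffer. */
struct digits utoa_end(uint32_t val, char *p_end)
{
    char *p = p_end;
    assert(p_end != 0);
    do {
        *--p = (char)('0' + val % 10);
        val /= 10;
    } while (val != 0);
    /* Length is just a pointer subtraction, so no strlen is needed. */
    return (struct digits){ p, (size_t)(p_end - p) };
}
```

A caller that only gets start back can recover the same length as p_end - start, since it knows which p_end it passed in.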

Print return value from a syscall in assembly

I wrote a 200+ line assembly program to do this. It prints the timestamp first and then the formatted date and time. All of its functions follow the x86-64 System V calling convention.

global _start

section .rodata

strings: ; '\n', ' ', '/', ':'
db 0x1, 0xa, 0x1, 0x20
db 0x1, 0x2f, 0x1, 0x3a
weekdays: ; weekday strings
db 0x3, 0x54, 0x68, 0x75
db 0x3, 0x46, 0x72, 0x69
db 0x3, 0x53, 0x61, 0x74
db 0x3, 0x53, 0x75, 0x6e
db 0x3, 0x4d, 0x6f, 0x6e
db 0x3, 0x54, 0x75, 0x65
db 0x3, 0x57, 0x65, 0x64
months: ; length of months
db 0x1f, 0x1c, 0x1f, 0x1e
db 0x1f, 0x1e, 0x1f, 0x1f
db 0x1e, 0x1f, 0x1e, 0x1f

section .text

_start:
push rbx ; align stack

mov rax, 201 ; sys_time
xor rdi, rdi
syscall

mov rbx, rax

; you may uncomment the following line and put an arbitrary timestamp to test it out
; mov rbx, 0

mov rdi, rbx
call print_num ; print unix timestamp

mov rdi, strings
call sys_print ; new line

mov rdi, rbx
call print_time ; print formatted date

pop rbx ; since we are exiting, we don't need this pop actually

mov rax, 60 ; sys_exit
xor rdi, rdi
syscall

leap_year: ; returns rsi + 1 if the year in rdi is a leap year, else rsi
mov rax, rdi
mov rcx, 4
xor rdx, rdx
div rcx
test rdx, rdx ; return 0 if year % 4
jnz func_leap_year_ret_0
mov rax, rdi
mov rcx, 100
xor rdx, rdx
div rcx
test rdx, rdx ; return 1 if year % 100
jnz func_leap_year_ret_1
mov rax, rdi
mov rcx, 400
xor rdx, rdx
div rcx
test rdx, rdx ; return 0 if year % 400
jnz func_leap_year_ret_0
func_leap_year_ret_1:
lea rax, [rsi + 1]
ret
func_leap_year_ret_0:
mov rax, rsi
ret

year_length: ; length of year in rdi
mov rsi, 365
jmp leap_year

month_length: ; length of month (year in rdi, month in rsi)
push r15
push r14
push r13

mov r14, rsi ; back up month in r14, will be used as index
cmp rsi, 1
setz r15b
movzx r13, r15b
xor rsi, rsi
call leap_year
and r13, rax
movzx rax, byte [r14 + months]
add rax, r13

pop r13
pop r14
pop r15
ret

print_time: ; print time_t in rdi
push r15
push r14
push r13
push r12
mov r14, 1970 ; 1970-01-01T00:00:00Z
xor r15, r15

mov rcx, 60
mov rax, rdi
xor rdx, rdx
div rcx
push rdx ; push #5
xor rdx, rdx
div rcx
push rdx ; push #6
mov rcx, 24
xor rdx, rdx
div rcx
push rdx ; push #7, the last one
mov r12, rax
mov r13, rax
func_print_time_loop_1_start:
mov rdi, r14
call year_length
cmp r13, rax
jb func_print_time_loop_2_start
sub r13, rax
inc r14
jmp func_print_time_loop_1_start
func_print_time_loop_2_start:
mov rdi, r14
mov rsi, r15
call month_length
cmp r13, rax
jb func_print_time_loop_end
sub r13, rax
inc r15
jmp func_print_time_loop_2_start
func_print_time_loop_end:
; print time
mov rdi, [rsp]
call print_num
mov rdi, strings + 6
call sys_print
mov rdi, [rsp + 8]
call print_num
mov rdi, strings + 6
call sys_print
mov rdi, [rsp + 16]
call print_num

; print " "
mov rdi, strings + 2
call sys_print

; print weekday
mov rax, r12
mov rcx, 7
xor rdx, rdx
div rcx
lea rdi, [rdx * 4 + weekdays]
call sys_print

; print " "
mov rdi, strings + 2
call sys_print

; print date
mov rdi, r15
inc rdi
call print_num
mov rdi, strings + 4
call sys_print
mov rdi, r13
inc rdi
call print_num
mov rdi, strings + 4
call sys_print
mov rdi, r14
call print_num

; print new line
mov rdi, strings
call sys_print

add rsp, 24
pop r12
pop r13
pop r14
pop r15
ret

print_num: ; print number in rdi
mov r8, rsp
sub rsp, 24 ; 21 bytes for local storage, with extra 3 bytes to keep stack aligned
xor r9, r9
mov rax, rdi
mov rcx, 10
func_print_num_loop_start:
dec r8
xor rdx, rdx
div rcx
add dl, 48
mov [r8], dl
inc r9b
test rax, rax
jnz func_print_num_loop_start
func_print_num_loop_end:
dec r8
mov [r8], r9b
mov rdi, r8
call sys_print
add rsp, 24 ; deallocate local storage, restore rsp
ret

sys_print: ; print a string pointed by rdi
movzx rdx, byte [rdi]
lea rsi, [rdi + 1]
mov rdi, 1 ; stdout
mov rax, 1 ; write
syscall
ret

The print_num function prints the number in register rdi. If you want to know how I print a number, you can look at that function.

print_time is where the date and time are calculated and printed.
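One detail worth spelling out: every string in the program, including the one print_num builds, is stored with a leading length byte, and sys_print reads that byte to get the count for write. A minimal C sketch of the same layout (print_lp and separators are names invented for this example):

```c
#include <assert.h>
#include <unistd.h>   /* write(), ssize_t */

/* Mirrors sys_print: the first byte is the length, the text follows it. */
ssize_t print_lp(int fd, const unsigned char *s)
{
    assert(s != 0);
    return write(fd, s + 1, s[0]);
}

/* Same layout as the `strings` table in the program:
 * {1,'\n'}, {1,' '}, {1,'/'}, {1,':'}, each entry 2 bytes long. */
static const unsigned char separators[] = { 1, '\n', 1, ' ', 1, '/', 1, ':' };
```

With this layout, print_lp(1, separators + 6) prints ":" just like mov rdi, strings + 6 / call sys_print does.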

Here is the output, along with output from a C program that prints the formatted date and time using asctime(gmtime(&t)):

$ ./time && ./ct
1608515228
1:47:8 Mon 12/21/2020
Unix time: 1608515228
C library returns: Mon Dec 21 01:47:08 2020

(The last two lines are from the C program)

You can also put any timestamp in the commented-out mov rbx, 0 line near the top of _start to test it out.

My solution is very naive:

  • Figure out the total number of days first: it is time/60/60/24 (and you get the hour/min/sec from the remainders in this step).
  • Then figure out the year. I did this by subtracting the number of days in each year, year by year. I wrote a helper function that returns the number of days in any year.
  • Find the month of the year and the day of the month. It's almost the same as step 2. I wrote a helper function that returns the number of days in any month of any year.
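The three steps above can be sketched in C like this. It follows the same naive subtract-year-by-year approach as the asm; struct civil, split_time and the helper names are made up for the example, and the weekday index uses the same convention as the weekdays table (0 = Thursday, because 1970-01-01 was a Thursday).

```c
#include <assert.h>
#include <stdint.h>

struct civil { int year, month, day, hour, min, sec, weekday; };

static int is_leap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

static int month_days(int y, int m)   /* m is 0-based: 0 = January */
{
    static const int len[12] = {31,28,31,30,31,30,31,31,30,31,30,31};
    assert(m >= 0 && m < 12);
    return len[m] + (m == 1 && is_leap(y));   /* February gains a day */
}

struct civil split_time(uint64_t t)   /* t = seconds since 1970-01-01 UTC */
{
    struct civil c;
    /* Step 1: peel off seconds, minutes, hours; what is left is days. */
    c.sec  = (int)(t % 60); t /= 60;
    c.min  = (int)(t % 60); t /= 60;
    c.hour = (int)(t % 24); t /= 24;
    c.weekday = (int)(t % 7);         /* 0 = Thu, 1 = Fri, ... 6 = Wed */

    /* Step 2: subtract whole years until fewer than one year remains. */
    int year = 1970;
    for (;;) {
        uint64_t ylen = 365u + (uint64_t)is_leap(year);
        if (t < ylen) break;
        t -= ylen;
        year++;
    }
    /* Step 3: same idea with month lengths inside that year. */
    int month = 0;
    while (t >= (uint64_t)month_days(year, month)) {
        t -= (uint64_t)month_days(year, month);
        month++;
    }
    c.year = year;
    c.month = month + 1;              /* 1-based for printing */
    c.day = (int)t + 1;
    return c;
}
```

For the timestamp in the sample output, split_time(1608515228) gives 2020-12-21 01:47:08 with weekday index 4 (Mon), matching what the program printed.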

Edit:

Pasted the whole program to this answer, as several people asked.

For printing the integer part, I used the implementation from @PeterCordes here:

https://stackoverflow.com/a/46301894


