Printing an Integer as a String With At&T Syntax, With Linux System Calls Instead of Printf

As @ped7g points out, you're doing several things wrong: using the int 0x80 32-bit ABI in 64-bit code, and passing character values instead of pointers to the write() system call.

Here's how to print an integer in x86-64 Linux, the simple and somewhat-efficient¹ way, using the same repeated division / modulo by 10.

System calls are expensive (probably thousands of cycles for write(1, buf, 1)), and doing a syscall inside the loop steps on registers so it's inconvenient and clunky as well as inefficient. We should write the characters into a small buffer, in printing order (most-significant digit at the lowest address), and make a single write() system call on that.

But then we need a buffer. The maximum length of a 64-bit unsigned integer is only 20 decimal digits (18446744073709551615), so we can just use some stack space. In x86-64 Linux, we can use stack space below RSP (up to 128B) without "reserving" it by modifying RSP. This is called the red zone. If you wanted to pass the buffer to another function instead of a syscall, you would have to reserve space with sub $24, %rsp or something similar.

Instead of hard-coding system-call numbers, building with gcc (which runs the C preprocessor on .S files before GAS) makes it easy to use the constants defined in .h files. Note the mov $__NR_write, %eax near the end of the function. The x86-64 System V ABI passes system-call arguments in almost the same registers as the function-calling convention (RDI, RSI, RDX, R10, R8, R9: R10 replaces RCX, because the syscall instruction overwrites RCX). (So it's totally different from the 32-bit int 0x80 ABI, which you shouldn't use in 64-bit code.)

// building with  gcc foo.S  will use CPP before GAS so we can use headers
#include <asm/unistd.h>    // This is a standard Linux / glibc header file
      // includes unistd_64.h or unistd_32.h depending on current mode
      // Contains only #define constants (no C prototypes) so we can include it from asm without syntax errors.

.p2align 4
.globl print_uint64             # void print_uint64(uint64_t value)
print_uint64:
    lea   -1(%rsp), %rsi        # We use the 128B red-zone as a buffer to hold the string
                                # a 64-bit integer is at most 20 digits long in base 10, so it fits.

    movb  $'\n', (%rsi)         # store the trailing newline byte.  (Right below the return address).
    # If you need a null-terminated string, leave an extra byte of room and store '\n\0'.  Or  push $'\n'

    mov    $10, %ecx            # same as  mov $10, %rcx  but 2 bytes shorter
    # note that newline (\n) has ASCII code 10, so we could actually have stored the newline with  movb %cl, (%rsi) to save code size.

    mov    %rdi, %rax           # function arg arrives in RDI; we need it in RAX for div
.Ltoascii_digit:                # do{
    xor    %edx, %edx
    div    %rcx                  #  rax = rdx:rax / 10.  rdx = remainder

                                 # store digits in MSD-first printing order, working backwards from the end of the string
    add    $'0', %edx            # integer to ASCII.  %dl would work, too, since we know this is 0-9
    dec    %rsi
    mov    %dl, (%rsi)           # *--p = (value%10) + '0';

    test   %rax, %rax
    jnz  .Ltoascii_digit        # } while(value != 0)
    # If we used a loop-counter to print a fixed number of digits, we would get leading zeros
    # The do{}while() loop structure means the loop runs at least once, so we get "0\n" for input=0

    # Then print the whole string with one system call
    mov   $__NR_write, %eax     # call number from asm/unistd_64.h
    mov   $1, %edi              # fd=1
    # %rsi = start of the buffer
    mov   %rsp, %rdx
    sub   %rsi, %rdx            # length = one_past_end - start
    syscall                     # write(fd=1 /*rdi*/, buf /*rsi*/, length /*rdx*/); 64-bit ABI
    # rax = return value (or -errno)
    # rcx and r11 = garbage (destroyed by syscall/sysret)
    # all other registers = unmodified (saved/restored by the kernel)

    # we don't need to restore any registers, and we didn't modify RSP.
    ret

To test this function, I put this in the same file to call it and exit:

.p2align 4
.globl _start
_start:
    mov    $10120123425329922, %rdi
#    mov    $0, %edi    # Yes, it does work with input = 0
    call   print_uint64

    xor    %edi, %edi
    mov    $__NR_exit, %eax
    syscall                             # sys_exit(0)

I built this into a static binary (with no libc):

$ gcc -Wall -static -nostdlib print-integer.S && ./a.out 
10120123425329922
$ strace ./a.out  > /dev/null
execve("./a.out", ["./a.out"], 0x7fffcb097340 /* 51 vars */) = 0
write(1, "10120123425329922\n", 18)     = 18
exit(0)                                 = ?
+++ exited with 0 +++
$ file ./a.out 
./a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=69b865d1e535d5b174004ce08736e78fade37d84, not stripped
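The register usage described in the comments above (call number in RAX, arguments in RDI, RSI, RDX; RCX and R11 destroyed; the result or -errno in RAX) can also be poked at from C. Here's a minimal sketch using GNU C inline asm, assuming x86-64 Linux with gcc or clang; raw_write is a hypothetical helper name, not a libc function.

```c
#include <sys/syscall.h>   /* SYS_write: same kernel call numbers as asm/unistd.h */

/* Hypothetical helper, assuming x86-64 Linux and GCC/Clang inline asm.
 * Call number goes in RAX, args in RDI, RSI, RDX.  The syscall
 * instruction overwrites RCX and R11, and the kernel returns the
 * result (or -errno) in RAX. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;   /* bytes written, or -errno; errno is NOT set */
}
```

Unlike glibc's write() wrapper, this doesn't touch errno: for a bad fd you get the raw kernel return value, -9 (-EBADF).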

Footnote 1: See Why does GCC use multiplication by a strange number in implementing integer division? for avoiding div r64 for division by 10, because that's very slow (21 to 83 cycles on Intel Skylake). A multiplicative inverse would make this function actually efficient, not just "somewhat". (But of course there'd still be room for optimizations...)
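To make the footnote concrete, here's a C sketch of division by 10 with a multiplicative inverse, assuming gcc or clang on a 64-bit target (unsigned __int128 is a compiler extension). div10 and itoa_end_mul are hypothetical names. The constant 0xCCCCCCCCCCCCCCCD is ceil(2^67 / 10); taking the high half of the 128-bit product and shifting right by 3 more is the mul / shr $3, %rdx sequence gcc emits for val / 10.

```c
#include <stdint.h>

/* Assumes GCC/Clang (unsigned __int128).  Exact for every 64-bit val. */
static inline uint64_t div10(uint64_t val)
{
    unsigned __int128 prod = (unsigned __int128)val * 0xCCCCCCCCCCCCCCCDULL;
    return (uint64_t)(prod >> 67);   /* high 64 bits (>> 64), then >> 3 */
}

/* The digit loop from itoa_end, using div10 instead of a div instruction */
static char *itoa_end_mul(uint64_t val, char *p_end)
{
    do {
        uint64_t q = div10(val);
        *--p_end = (char)('0' + (val - q * 10));   /* remainder = val - q*10 */
        val = q;
    } while (val);
    return p_end;   /* start of the MSD-first digit string */
}
```

In practice, writing plain val / 10 and val % 10 gets the same code from an optimizing compiler; the explicit version just shows what it's doing.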

Related: Linux x86-32 extended-precision loop that prints 9 decimal digits from each 32-bit "limb": see .toascii_digit: in my Extreme Fibonacci code-golf answer. It's optimized for code-size (even at the expense of speed), but well-commented.

It uses div like you do, because that's smaller than using a fast multiplicative inverse. It uses loop for the outer loop (over multiple integers for extended precision), again for code-size at the cost of speed.

It uses the 32-bit int 0x80 ABI, and prints into a buffer that was holding the "old" Fibonacci value, not the current.

Another way to get efficient asm is from a C compiler. For just the loop over digits, look at what gcc or clang produce for this C source (which is basically what the asm is doing). The Godbolt Compiler Explorer makes it easy to try with different options and different compiler versions.

See the gcc 7.2 -O3 asm output, which is nearly a drop-in replacement for the loop in print_uint64 (because I chose the args to go in the same registers):

void itoa_end(unsigned long val, char *p_end) {
  const unsigned base = 10;
  do {
    *--p_end = (val % base) + '0';
    val /= base;
  } while(val);

  // write(1, p_end, orig-current);
}
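For reference, here's a minimal complete C sketch of the whole print_uint64 function (a hypothetical C port, assuming POSIX write()), including the trailing newline and the single system call. The 21-byte buffer holds 20 digits for the largest uint64_t plus the '\n'.

```c
#include <stdint.h>
#include <unistd.h>   /* write, ssize_t */

/* Hypothetical C port of the asm print_uint64: fill a small buffer
 * backwards (LSD generated first, stored right-to-left), then make
 * one write() call on the whole string. */
ssize_t print_uint64(uint64_t val)
{
    char buf[21];                  /* up to 20 digits + '\n' */
    char *end = buf + sizeof buf;
    char *p = end;
    *--p = '\n';                   /* trailing newline at the very end */
    do {
        *--p = (char)('0' + val % 10);
        val /= 10;
    } while (val);                 /* do{}while: prints "0\n" for val == 0 */
    return write(1, p, (size_t)(end - p));   /* single system call */
}
```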

I tested performance on a Skylake i7-6700k by commenting out the syscall instruction and putting a repeat loop around the function call. The version with mul %rcx / shr $3, %rdx is about 5 times faster than the version with div %rcx for storing a long number-string (10120123425329922) into a buffer. The div version ran at 0.25 instructions per clock, while the mul version ran at 2.65 instructions per clock (although requiring many more instructions).

It might be worth unrolling by 2, and doing a divide by 100 and splitting up the remainder of that into 2 digits. That would give a lot better instruction-level parallelism, in case the simpler version bottlenecks on mul + shr latency. The chain of multiply/shift operations that brings val to zero would be half as long, with more work in each short independent dependency chain to handle a 0-99 remainder.
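A sketch of that unroll-by-2 idea in C (itoa_end_by100 is a hypothetical name). Each iteration peels off two digits with one divide by 100, and splitting the 0..99 remainder into tens and ones is off the dependency chain that brings val to zero. Compilers turn the constant divisions into multiplies.

```c
#include <stdint.h>

/* Hypothetical unrolled version: two digits per divide-by-100 step */
static char *itoa_end_by100(uint64_t val, char *p_end)
{
    while (val >= 100) {
        unsigned r = (unsigned)(val % 100);   /* 0..99 */
        val /= 100;                           /* the only loop-carried chain */
        *--p_end = (char)('0' + r % 10);      /* ones digit of the pair */
        *--p_end = (char)('0' + r / 10);      /* tens digit of the pair */
    }
    /* val is now 0..99: one or two digits left */
    if (val >= 10) {
        *--p_end = (char)('0' + val % 10);
        *--p_end = (char)('0' + val / 10);
    } else {
        *--p_end = (char)('0' + val);         /* also handles val == 0 */
    }
    return p_end;
}
```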

Related:

NASM version of this answer, for x86-64 or i386 Linux: How do I print an integer in Assembly Level Programming without printf from the c library?
How to convert a binary integer number to a hex string? - Base 16 is a power of 2, conversion is much simpler and doesn't require div.

Assembly, syscall not work as expected. Ubuntu Linux x86_64, using AT&T syntax

mov     $output,%rsi     # address of string to output moved to rsi
                                      ^^^^^^

Address of string. The value $12 is not the character sequence "12". If you wanted to print the string 12, you would need to store 0x31 and 0x32 ('1' and '2') into the memory area (making it big enough), then use 2 as the length.

For example, movw $0x3231, output, or better movw $0x3231, output(%rip) to use RIP-relative addressing for static data, like normal for x86-64. (Unlike NASM, GAS syntax doesn't support $'12' as a way to write the same integer constant.)
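A C analogue, assuming a little-endian machine such as x86: the low byte of 0x3231 (0x31, '1') is stored at the lower address, so the two bytes in memory spell "12". store_12 is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Assumes little-endian (x86): like  movw $0x3231, output(%rip) */
static void store_12(char out[2])
{
    uint16_t word = 0x3231;            /* ('2' << 8) | '1' */
    memcpy(out, &word, sizeof word);   /* 2-byte store: low byte '1' first */
}
```

After that, write(1, out, 2) would print 12 (no newline).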

If you want to print an integer as a string, you'll probably want to manipulate it mathematically so you can do it one digit at a time. (Printing an integer as a string with AT&T syntax, with Linux system calls instead of printf)

x86 assembly, integer to string reversed

Some itoa functions push their digits in one loop, and pop them in another, so they can get them in MSD-first printing order after div generates LSD-first. But there is zero point in pushl %edx / movb (%esp), %cl - you're just reloading it right away and filling up the stack (before movl %ebp, %esp removes it). You might as well have just done movb %dl, (%ebx, %edi, 1).

The better way to handle this is to start at the end of a buffer and decrement a pointer, so your ASCII digits end up in memory in printing order, opposite of the order you generated, without any crappy push/pop loop.

See Printing an integer as a string with AT&T syntax, with Linux system calls instead of printf for a working version (using x86-64, but easy enough to port to 32-bit code.) How do I print an integer in Assembly Level Programming without printf from the c library? is a NASM version, also explaining the digit-ordering logic.

Also, it's silly not to return the length; you can easily do a pointer subtraction so the caller knows how many digits you stored in the buffer. That way they don't have to call strlen, or even 0-terminate the buffer at all if they're just passing it to a write system call.

If you want to also return the pointer, you can do that too, in a different register. This is assembly language; you can easily return multiple separate things, not constrained by the inconvenience of C calling conventions.

Or if you're returning a pointer to the start of the string data after converting backwards, the caller can do the subtraction since they know what end-of-buffer pointer they passed to your itoa_end function in the first place.
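A C sketch of that convention, with hypothetical names: digits are stored through a decrementing pointer (so there's no push/pop reversal loop), and the function hands back both the start pointer and the length. Under the x86-64 System V ABI, a 16-byte struct of two integer members is returned in RAX and RDX, so even from C this is just two registers.

```c
#include <stddef.h>

struct digits { char *start; size_t len; };   /* returned in RAX:RDX on x86-64 SysV */

/* Hypothetical itoa variant: convert backwards from the end of the buffer */
static struct digits utoa_end(unsigned long val, char *p_end)
{
    char *p = p_end;
    do {
        *--p = (char)('0' + val % 10);   /* LSD first, stored right-to-left */
        val /= 10;
    } while (val);
    struct digits d = { p, (size_t)(p_end - p) };   /* length by pointer subtraction */
    return d;
}
```

A caller can pass d.start and d.len straight to write(1, d.start, d.len), with no strlen and no terminator.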

Print return value from a syscall in assembly

I wrote a 200+ line assembly program to do this. It prints the timestamp first and then prints the formatted date and time. All its functions follow the System V x86-64 calling convention.

global _start

section .rodata

strings:            ; '\n', ' ', '/', ':'
    db 1, 10, 1, " "
    db 1, "/", 1, ":"
weekdays:           ; weekday strings
    db 3, "Thu"
    db 3, "Fri"
    db 3, "Sat"
    db 3, "Sun"
    db 3, "Mon"
    db 3, "Tue"
    db 3, "Wed"
months:             ; length of months
    db 31, 28, 31, 30
    db 31, 30, 31, 31
    db 30, 31, 30, 31

section .text

_start:
    push rbx        ; not needed for alignment: RSP is already 16-byte aligned at _start, so this push misaligns it (harmless here)

    mov rax, 201    ; sys_time
    xor rdi, rdi
    syscall

    mov rbx, rax

    ; you may uncomment the following line and put an arbitary timestamp to test it out
    ; mov rbx, 0

    mov rdi, rbx
    call print_num  ; print unix timestamp

    mov rdi, strings
    call sys_print  ; new line

    mov rdi, rbx
    call print_time ; print formatted date

    pop rbx         ; since we are exiting, we don't need this pop actually

    mov rax, 60     ; sys_exit
    xor rdi, rdi
    syscall

leap_year:          ; rsi + (year in rdi is leap)
    mov rax, rdi
    mov rcx, 4
    xor rdx, rdx
    div rcx
    test rdx, rdx   ; return 0 if year % 4
    jnz func_leap_year_ret_0
    mov rax, rdi
    mov rcx, 100
    xor rdx, rdx
    div rcx
    test rdx, rdx   ; return 1 if year % 100
    jnz func_leap_year_ret_1
    mov rax, rdi
    mov rcx, 400
    xor rdx, rdx
    div rcx
    test rdx, rdx   ; return 0 if year % 400
    jnz func_leap_year_ret_0
func_leap_year_ret_1:
    lea rax, [rsi + 1]
    ret
func_leap_year_ret_0:
    mov rax, rsi
    ret

year_length:        ; length of year in rdi
    mov rsi, 365
    jmp leap_year

month_length:       ; length of month (year in rdi, month in rsi)
    push r15
    push r14
    push r13

    mov r14, rsi    ; back up month in r14, will be used as index
    cmp rsi, 1
    setz r15b
    movzx r13, r15b
    xor rsi, rsi
    call leap_year
    and r13, rax
    movzx rax, byte [r14 + months]
    add rax, r13

    pop r13
    pop r14
    pop r15
    ret

print_time:         ; print time_t in rdi
    push r15
    push r14
    push r13
    push r12
    mov r14, 1970   ; 1970-01-01T00:00:00Z
    xor r15, r15

    mov rcx, 60
    mov rax, rdi
    xor rdx, rdx
    div rcx
    push rdx        ; push #5
    xor rdx, rdx
    div rcx
    push rdx        ; push #6
    mov rcx, 24
    xor rdx, rdx
    div rcx
    push rdx        ; push #7, the last one
    mov r12, rax
    mov r13, rax
func_print_time_loop_1_start:
    mov rdi, r14
    call year_length
    cmp r13, rax
    jb func_print_time_loop_2_start
    sub r13, rax
    inc r14
    jmp func_print_time_loop_1_start
func_print_time_loop_2_start:
    mov rdi, r14
    mov rsi, r15
    call month_length
    cmp r13, rax
    jb func_print_time_loop_end
    sub r13, rax
    inc r15
    jmp func_print_time_loop_2_start
func_print_time_loop_end:
    ; print time
    mov rdi, [rsp]
    call print_num
    mov rdi, strings + 6
    call sys_print
    mov rdi, [rsp + 8]
    call print_num
    mov rdi, strings + 6
    call sys_print
    mov rdi, [rsp + 16]
    call print_num

    ; print " "
    mov rdi, strings + 2
    call sys_print

    ; print weekday
    mov rax, r12
    mov rcx, 7
    xor rdx, rdx
    div rcx
    lea rdi, [rdx * 4 + weekdays]
    call sys_print

    ; print " "
    mov rdi, strings + 2
    call sys_print

    ; print date
    mov rdi, r15
    inc rdi
    call print_num
    mov rdi, strings + 4
    call sys_print
    mov rdi, r13
    inc rdi
    call print_num
    mov rdi, strings + 4
    call sys_print
    mov rdi, r14
    call print_num

    ; print new line
    mov rdi, strings
    call sys_print

    add rsp, 24
    pop r12
    pop r13
    pop r14
    pop r15
    ret

print_num:          ; print number in rdi
    mov r8, rsp
    sub rsp, 24     ; 21 bytes for local storage, with extra 3 bytes to keep stack aligned
    xor r9, r9
    mov rax, rdi
    mov rcx, 10
func_print_num_loop_start:
    dec r8
    xor rdx, rdx
    div rcx
    add dl, 48
    mov [r8], dl
    inc r9b
    test rax, rax
    jnz func_print_num_loop_start
func_print_num_loop_end:
    dec r8
    mov [r8], r9b
    mov rdi, r8
    call sys_print
    add rsp, 24     ; deallocate local storage, restore rsp
    ret

sys_print:          ; print a string pointed by rdi
    movzx rdx, byte [rdi]
    lea rsi, [rdi + 1]
    mov rdi, 1      ; stdout
    mov rax, 1      ; write
    syscall
    ret

The print_num function prints the unsigned number in register rdi. If you want to know how I print a number, you can look at that function.

print_time is where the date and time are calculated and printed.

Here is the output, along with the output from a C program that prints the formatted date and time using asctime(gmtime(&t)):

$ ./time && ./ct
1608515228
1:47:8 Mon 12/21/2020
Unix time: 1608515228
C library returns: Mon Dec 21 01:47:08 2020

(The last two lines are from the C program)

You can also put any timestamp in the commented-out mov rbx, 0 line in _start to test it out.

My solution is very naive:

1. Figure out the total number of days first; it can be found using time/60/60/24 (and you get your hour/min/sec in this step).
2. Then figure out the year. I did this by subtracting the number of days in each year, year by year. I designed a helper function to figure out the number of days in any year.
3. Find the month of the year and the day of the month. It's almost the same as step 2. I designed a helper function to figure out the number of days in any month of any year.
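The steps above can be sketched in C for cross-checking the asm output (from_unix, is_leap and struct civil are hypothetical names). Like the asm, it assumes a non-negative timestamp and the Gregorian leap-year rule; wday uses 0 = Sunday, with 1970-01-01 being a Thursday.

```c
#include <stdint.h>

struct civil { int year, month, day, hour, min, sec, wday; };   /* month 1-12, wday 0=Sun */

static int is_leap(int64_t y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

static struct civil from_unix(uint64_t t)
{
    static const int mdays[12] = {31,28,31,30,31,30,31,31,30,31,30,31};
    struct civil c;
    /* step 1: peel off sec/min/hour, leaving whole days since the epoch */
    c.sec  = (int)(t % 60);  t /= 60;
    c.min  = (int)(t % 60);  t /= 60;
    c.hour = (int)(t % 24);
    uint64_t days = t / 24;
    c.wday = (int)((days + 4) % 7);   /* 1970-01-01 was a Thursday (4) */

    /* step 2: subtract whole years */
    int64_t y = 1970;
    while (days >= (uint64_t)(365 + is_leap(y))) {
        days -= 365 + is_leap(y);
        y++;
    }
    /* step 3: subtract whole months (February gets the leap day) */
    int m = 0;
    while (days >= (uint64_t)(mdays[m] + (m == 1 && is_leap(y)))) {
        days -= mdays[m] + (m == 1 && is_leap(y));
        m++;
    }
    c.year = (int)y;
    c.month = m + 1;
    c.day = (int)days + 1;
    return c;
}
```

For 1608515228 this gives 2020-12-21 01:47:08 on a Monday, matching the output shown.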

Edit:

Pasted the whole program into this answer, as several people asked.

For printing the integer part, I used the implementation from @PeterCordes here:

https://stackoverflow.com/a/46301894
