Assembly: Read Integer from Stdin, Increment It and Print to Stdout

movl %edi, %ecx    # store input in register %edi
movl $4, %edx # read one byte

This part is all wrong. You can't store the result of read in a register. What that code actually does is copy %edi into %ecx, so read stores its result at the address contained in %edi. Since you haven't set %edi, that is probably somewhere you have no business storing anything. You first need to make room in memory for the string. You're also reading four bytes, not one.

I would replace that with something like this

subl $4, %esp
movl %esp, %ecx
movl $4, %edx

This will make room for 4 bytes on the stack, then use the top of the stack as the address to store the string at. You'll also have to modify the arguments for the write syscall to use this address.

Another problem you'll have to deal with is that stdin and stdout usually carry text, so what you read will probably be a string of characters and not a number. To use it as a number, you'll have to convert it, and then convert the result back to text before you write it out.
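To make the text-to-number round trip concrete, here is a minimal sketch in C. It leans on the C library's strtol and snprintf as stand-ins for the conversions you would have to hand-write in assembly. The helper name increment_text is hypothetical and not part of the original question.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: takes the NUL-terminated text that was read,
 * converts it to a number, adds one, and formats the result back into
 * `out` as text followed by a newline.
 * Returns the number of bytes to write, or -1 if the input does not
 * start with a decimal number or the result does not fit. */
int increment_text(const char *in, char *out, size_t cap)
{
    char *end;
    errno = 0;
    long value = strtol(in, &end, 10);        /* text -> number */
    if (end == in || errno == ERANGE)
        return -1;                            /* no digits, or out of range */
    if (value == LONG_MAX)
        return -1;                            /* value + 1 would overflow */

    int n = snprintf(out, cap, "%ld\n", value + 1);   /* number -> text */
    if (n < 0 || (size_t)n >= cap)
        return -1;                            /* output buffer too small */
    return n;
}
```

Called with the input "41\n" and a 32-byte output buffer, this fills the buffer with "42\n" and returns 3, which is the length you would then pass to write.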

How to read input from STDIN in x86_64 assembly?

First of all: there are no variables in assembly. There are just labels for some kind of data. The data is, by design, untyped - at least in real assemblers, not HLA (e.g. MASM).

Reading from the standard input is achieved by using the system call read. I assume you've already read the post you mentioned and you know how to call system calls in x64 Linux. Assuming that you're using NASM (or something that resembles its syntax), and that you want to store the input from stdin at the address buffer, where you have reserved BUFSIZE bytes of memory, executing the system call would look like this:

xor eax, eax      ; rax <- 0 (syscall number for 'read')
xor edi, edi ; edi <- 0 (stdin file descriptor)
mov rsi, buffer ; rsi <- address of the buffer (alternatively: lea rsi, [rel buffer])
mov edx, BUFSIZE ; rdx <- size of the buffer
syscall ; execute read(0, buffer, BUFSIZE)

Upon returning, rax will contain the result of the syscall. If you want to know more about how it works, please consult man 2 read. Note that the syscall for read on mac is 0x2000003 instead of 0, so that first line would instead be mov rax, 0x2000003.
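The return value deserves a check before the buffer is used. As a rough sketch in C, assuming a POSIX system, the same call and its three possible outcomes look like this. The wrapper name read_some is hypothetical. Note that the C library reports errors as -1 plus errno, whereas the raw Linux syscall leaves a negative error code in rax.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper around read(2).
 * Returns the number of bytes stored in buf (possibly fewer than cap),
 * 0 at end of input, or -1 on error. Retries if a signal interrupts
 * the call before any data arrives. */
ssize_t read_some(int fd, char *buf, size_t cap)
{
    ssize_t n;
    do {
        n = read(fd, buf, cap);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

The count matters because read does not NUL-terminate the data, and for a terminal it usually includes the trailing newline.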

Parsing an integer in assembly language is not that simple, though. Since read only gives you plain binary data that appears on the standard input, you need to convert the integer value yourself. Keep in mind that what you type on the keyboard is sent to the application as ASCII codes (or any other encoding you might be using - I'm assuming ASCII here). Therefore, you need to convert the data from an ASCII-encoded decimal to binary.

A function in C for converting such a structure to a normal unsigned int could look something like this:

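unsigned int parse_ascii_decimal(const char *str, unsigned int len)
{
    unsigned int ret = 0, mul = 1;
    int i = (int)len - 1;            /* start at the last (least significant) digit */
    while (i >= 0)
    {
        ret += (str[i] & 0xf) * mul; /* the low nibble of an ASCII digit is its value */
        mul *= 10;
        --i;
    }
    return ret;
}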

Converting this to assembly (and extending to support signed numbers) is left as an exercise for the reader. :) (Or see NASM Assembly convert input to integer? - a simpler algorithm only has 1 multiply per iteration, with total = total*10 + digit. And you can check for the first non-digit character as you iterate instead of doing strlen separately, if the length isn't already known.)
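The simpler total = total*10 + digit approach mentioned above could be sketched in C roughly as follows. The name parse_decimal and the `used` out-parameter are assumptions made for illustration, and overflow checking is left out to keep it short.

```c
#include <stddef.h>

/* Hypothetical helper: parses an optional '-' followed by decimal digits,
 * stopping at the first non-digit (for example the '\n' that read leaves
 * at the end of a line) or after len bytes.
 * Stores the number of bytes consumed in *used.
 * No overflow checking is done. */
int parse_decimal(const char *s, size_t len, size_t *used)
{
    size_t i = 0;
    int negative = 0;
    unsigned int total = 0;

    if (i < len && s[i] == '-') {
        negative = 1;
        i++;
    }
    while (i < len && s[i] >= '0' && s[i] <= '9') {
        total = total * 10 + (unsigned int)(s[i] - '0');  /* one multiply per digit */
        i++;
    }
    *used = i;
    return negative ? -(int)total : (int)total;
}
```

Because the loop walks the string front to back, it maps directly onto an assembly loop with one pointer register and one accumulator.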


Last but not least - the write syscall requires you to always pass a pointer to a buffer with the data that's supposed to be written to a given file descriptor. Therefore, if you want to output a newline, there is no other way but to create a buffer containing the newline sequence.
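In C terms, assuming a POSIX system, that requirement looks like the sketch below. The helper name write_newline is hypothetical. In assembly the one-byte buffer would typically be a `db 0xa` in the data section.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write(2) takes a pointer and a length, so even a
 * single character has to live in memory before it can be written. */
ssize_t write_newline(int fd)
{
    static const char nl = '\n';   /* one-byte buffer holding the newline */
    return write(fd, &nl, 1);      /* returns 1 on success, -1 on error */
}
```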

How should I work with dynamically-sized input in NASM Assembly?

First of all you are generating a 32-bit program, not a 64-bit program. This is no problem as Linux 64-bit can run 32-bit programs if they are either statically linked (this is the case for you) or the 32-bit shared libraries are installed.

Your program contains a real bug: You are reading and writing the "EAX" register from a 1-byte field in RAM:

mov EAX, [num1]

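This will normally work on little-endian computers (x86). However, if the byte you want to read is at the very end of the last memory page of your program, the 4-byte read crosses into unmapped memory and the program crashes (typically with a segmentation fault on Linux).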

Even more critical is the write command:

mov [result], EAX

This command will overwrite 3 bytes of memory following the "result" variable. If you extend your program by additional bytes:

num1 resb 1
num2 resb 1
result resb 1
newVariable1 resb 1

You'll overwrite these variables! To correct your program you must use the AL (and BL) register instead of the complete EAX register:

mov AL, [num1]
mov BL, [num2]
...
mov [result], AL
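The difference between the two stores can be modelled in C on a plain byte array. This is only an illustrative sketch: the function names are hypothetical, and the array stands in for the .bss layout above (num1 at offset 0, num2 at 1, result at 2, newVariable1 at 3).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models `mov [result], EAX`: stores all four bytes of `value` starting
 * at mem[offset], so the three bytes after mem[offset] are overwritten. */
void store_dword(unsigned char *mem, size_t offset, uint32_t value)
{
    memcpy(mem + offset, &value, sizeof value);
}

/* Models `mov [result], AL`: stores only the low byte of `value`,
 * leaving the neighbouring bytes untouched. */
void store_byte(unsigned char *mem, size_t offset, uint32_t value)
{
    mem[offset] = (unsigned char)(value & 0xff);
}
```

With an 8-byte array, store_dword(mem, 2, 0) clears mem[2] through mem[5], while store_byte(mem, 2, 0) changes mem[2] only.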

Another finding in your program is: You are reading from file handle #1. This is the standard output. Your program should read from file handle #0 (standard input):

mov EAX, 3 ; read
mov EBX, 0 ; standard input
...
int 0x80

But now the answer to the actual question:

The C library functions (e.g. fgets()) use buffered input. Doing it that way would be a bit too complicated for a beginner, so reading one byte at a time is a reasonable alternative.

Think about it this way: "How would I solve this problem using a high-level language like C?" If you don't use libraries in your assembler program, you can only use system calls (section 2 man pages) as functions (e.g. you cannot use "fgets()" but only "read()").

In your case a C program reading a number from standard input could look like this:

int num1;
char c;
...
num1 = 0;
while(1)
{
    if(read(0,&c,1)!=1) break;
    if(c=='\r' || c=='\n') break;
    num1 = 10*num1 + c - '0';
}

Now you may think about the assembler code (I typically use GNU assembler, which has another syntax, so maybe this code contains some bugs):

c resb 1
num1 resb 4

...

; Set "num1" to 0
mov EAX, 0
mov [num1], EAX
; Here our while-loop starts
next_digit:
; Read one character
mov EAX, 3
mov EBX, 0
mov ECX, c
mov EDX, 1
int 0x80
; Check for the end-of-input
cmp EAX, 1
jnz end_of_loop
; This will cause EBX to be 0.
; When modifying the BL register the
; low 8 bits of EBX are modified.
; The high 24 bits remain 0.
; So clearing the EBX register before
; reading an 8-bit number into BL is
; a method for converting an 8-bit
; number to a 32-bit number!
xor EBX, EBX
; Load the character read into BL
; Check for "\r" or "\n" as input
mov BL, [c]
cmp BL, 10
jz end_of_loop
cmp BL, 13
jz end_of_loop
; read "num1" into EAX
mov EAX, [num1]
; Multiply "num1" with 10
mov ECX, 10
mul ECX
; Add one digit
sub EBX, '0'
add EAX, EBX
; write "num1" back
mov [num1], EAX
; Do the while loop again
jmp next_digit
; The end of the loop...
end_of_loop:
; Done

Writing decimal numbers with more digits is more difficult!
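The usual approach is to divide by 10 repeatedly and store the remainders from the end of a buffer backwards, since the digits come out least significant first. A minimal sketch in C, with the hypothetical name format_unsigned and assuming a 32-bit unsigned int:

```c
#include <stddef.h>

/* Hypothetical helper: writes `value` as decimal ASCII digits into the
 * END of buf and returns a pointer to the first digit; *len receives the
 * number of digits. The caller must provide at least 10 bytes, enough
 * for the largest 32-bit value (4294967295). No terminator is added. */
char *format_unsigned(unsigned int value, char *buf, size_t cap, size_t *len)
{
    char *p = buf + cap;                    /* one past the last byte */
    do {
        *--p = (char)('0' + value % 10);    /* remainder is the lowest digit */
        value /= 10;
    } while (value != 0);                   /* do/while so that 0 prints "0" */
    *len = (size_t)(buf + cap - p);
    return p;
}
```

The returned pointer and length are exactly what the write system call needs as its buffer and count arguments.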

will work out how far apart two letters are in the alphabet

If you are still stuck, your question requires that you determine the distance between two characters. That brings with it a number of checks you must implement. Though your question is silent on whether you need to handle both uppercase and lowercase distances, unless you are converting everything to one case or the other, you will need to determine whether both characters are of the same case to make the distance within the alphabet between those two characters valid.

Since two characters are involved, you need a way of saving the case of the first for comparison with the case of the second. Here, and in all cases where a simple state is needed, just using a byte (flag) to store the state is about as simple as anything else. For example, a byte can hold 0 if the ASCII character is not an alpha character, 1 if the character is uppercase and 2 if the character is lowercase (or whatever consistent scheme you like).

That way, when you are done with the comparisons and tests, you can simply compare the two flags for equality. If they are equal, you can subtract one character from the other to get the distance (swapping if necessary) and then convert the number to ASCII digits for output.
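Before writing the assembly, the flag scheme can be sketched in C. The function names classify and letter_distance are hypothetical, the flag values follow the 0/1/2 scheme described above, and ASCII input is assumed.

```c
/* Returns 0 if c is not a letter, 1 if it is uppercase, 2 if lowercase.
 * Assumes ASCII, where each case forms one contiguous range. */
int classify(char c)
{
    if (c >= 'A' && c <= 'Z') return 1;
    if (c >= 'a' && c <= 'z') return 2;
    return 0;
}

/* Returns the distance between two letters of the same case, or -1 if
 * either character is not a letter or the cases differ. */
int letter_distance(char a, char b)
{
    int fa = classify(a);
    int fb = classify(b);
    if (fa == 0 || fa != fb)
        return -1;                    /* not alpha, or not the same case */
    return a > b ? a - b : b - a;     /* subtract the smaller from the larger */
}
```

For example, letter_distance('a', 'e') gives 4 and letter_distance('a', 'Z') gives -1.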

To test if the character is an uppercase character, similar to isupper() in C, a short function is all that is needed:

; check if character isupper()
; parameters:
; ecx - address holding character
; returns:
; eax - 0 (false), 1 (true)
_isupr:

mov eax, 1 ; set return true

cmp byte[ecx], 'A' ; compare with 'A'
jge _chkZ ; char >= 'A'

mov eax, 0 ; set return false
ret

_chkZ:
cmp byte[ecx], 'Z' ; compare with 'Z'
jle _rtnupr ; <= is uppercase

mov eax, 0 ; set return false

_rtnupr:
ret

You can handle the storage for the local arrays and values you need in a couple of ways. You can either subtract from the current stack pointer to create temporary storage on the stack, or in a slightly more readable way, create labels to storage within the uninitialized segment (.bss) and use the labels as variable names. Your initialized variables go in the .data segment. For example, storage for the program could be:

section .bss

buf resb 32 ; general buffer, used by _prnuint32
bufa resb 8 ; storage for first letter line
bufb resb 8 ; storage for second letter line
lena resb 4 ; length of first letter line
lenb resb 4 ; length of second letter line
nch resb 1 ; number of digit characters in _prnuint32
ais resb 1 ; what 1st char is, 0-notalpha, 1-upper, 2-lower
bis resb 1 ; same for 2nd char

Rather than sprinkling numbers through your syscall setups, declaring named constants for, e.g., stdin and stdout instead of using 0 and 1 makes things more readable:

section .data

bufsz: equ 32
babsz: equ 8
tmsg: db "first letter : "
tlen: equ $-tmsg
ymsg: db "second letter: "
ylen: equ $-ymsg
dmsg: db "char distance: "
dlen: equ $-dmsg
emsg: db "error: not alpha or same case", 0xa
elen: equ $-emsg
nl: db 0xa
stdin: equ 0
stdout: equ 1
read: equ 3
write: equ 4
exit: equ 1

Then, to read your character input, you would have, e.g.

    mov     eax, write          ; prompt for 1st letter
mov ebx, stdout
mov ecx, tmsg
mov edx, tlen
int 80h ; __NR_write

mov eax, read ; read 1st letter line
mov ebx, stdin
mov ecx, bufa
mov edx, babsz
int 80h ; __NR_read

mov [lena], eax ; save no. of characters in line

To then check the case of the character input, you could do:

    call    _isupr              ; check if uppercase
cmp eax, 1 ; check return 0-false, 1-true
jne chkalwr ; if not, branch to check lowercase

mov byte[ais], 1 ; set uppercase flag for 1st letter

jmp getb ; branch to get 2nd letter

chkalwr:
call _islwr ; check if lowercase
cmp eax, 1 ; check return
jne notalpha ; 1st letter not alpha char, display error

mov byte[ais], 2 ; set lowercase flag for 1st char

The notalpha: label is just a block that outputs an error when a character isn't an alpha character or the case of the two characters doesn't match:

  notalpha:                     ; show not alpha or not same case error
mov eax, write
mov ebx, stdout
mov ecx, emsg
mov edx, elen
int 80h ; __NR_write

mov ebx, 1 ; set EXIT_FAILURE

After you have completed input and classification of both characters, you need to verify that both characters are of the same case. If so, compute the distance between the characters (swapping if necessary, or using an absolute value) and finally convert that distance from a numeric value to ASCII digits for output. You can do something similar to the following:

  chkboth:
mov al, byte[ais] ; load flags into al, bl
mov bl, byte[bis]
cmp al, bl ; compare flags equal, else not same case
jne notalpha

mov eax, write ; display distance output
mov ebx, stdout
mov ecx, dmsg
mov edx, dlen
int 80h ; __NR_write

mov al, byte[bufa] ; load chars into al, bl
mov bl, byte[bufb]
cmp al, bl ; compare chars: sign flag is set if 1st char < 2nd char

jns getdiff ; 1st char >= 2nd char

push eax ; swap chars
push ebx
pop eax
pop ebx

getdiff:
sub eax, ebx ; subtract 2nd char from 1st char
call _prnuint32 ; output difference

xor ebx, ebx ; set EXIT_SUCCESS
jmp done

Putting it all together, and including the _prnuint32 function below for conversion and output of the numeric distance between characters, you would have:

section .bss

buf resb 32 ; general buffer, used by _prnuint32
bufa resb 8 ; storage for first letter line
bufb resb 8 ; storage for second letter line
lena resb 4 ; length of first letter line
lenb resb 4 ; length of second letter line
nch resb 1 ; number of digit characters in _prnuint32
ais resb 1 ; what 1st char is, 0-notalpha, 1-upper, 2-lower
bis resb 1 ; same for 2nd char

section .data

bufsz: equ 32
babsz: equ 8
tmsg: db "first letter : "
tlen: equ $-tmsg
ymsg: db "second letter: "
ylen: equ $-ymsg
dmsg: db "char distance: "
dlen: equ $-dmsg
emsg: db "error: not alpha or same case", 0xa
elen: equ $-emsg
nl: db 0xa
stdin: equ 0
stdout: equ 1
read: equ 3
write: equ 4
exit: equ 1

section .text

global _start
_start:

mov byte[ais], 0 ; zero flags
mov byte[bis], 0

mov eax, write ; prompt for 1st letter
mov ebx, stdout
mov ecx, tmsg
mov edx, tlen
int 80h ; __NR_write

mov eax, read ; read 1st letter line
mov ebx, stdin
mov ecx, bufa
mov edx, babsz
int 80h ; __NR_read

mov [lena], eax ; save no. of characters in line

call _isupr ; check if uppercase
cmp eax, 1 ; check return 0-false, 1-true
jne chkalwr ; if not, branch to check lowercase

mov byte[ais], 1 ; set uppercase flag for 1st letter

jmp getb ; branch to get 2nd letter

chkalwr:
call _islwr ; check if lowercase
cmp eax, 1 ; check return
jne notalpha ; 1st letter not alpha char, display error

mov byte[ais], 2 ; set lowercase flag for 1st char

getb:
mov eax, write ; prompt for 2nd letter
mov ebx, stdout
mov ecx, ymsg
mov edx, ylen
int 80h ; __NR_write

mov eax, read ; read 2nd letter line
mov ebx, stdin
mov ecx, bufb
mov edx, babsz
int 80h ; __NR_read

mov [lenb], eax ; save no. of characters in line

call _isupr ; same checks for 2nd character
cmp eax, 1
jne chkblwr

mov byte[bis], 1

jmp chkboth

chkblwr:
call _islwr
cmp eax, 1
jne notalpha

mov byte[bis], 2

chkboth:
mov al, byte[ais] ; load flags into al, bl
mov bl, byte[bis]
cmp al, bl ; compare flags equal, else not same case
jne notalpha

mov eax, write ; display distance output
mov ebx, stdout
mov ecx, dmsg
mov edx, dlen
int 80h ; __NR_write

mov al, byte[bufa] ; load chars into al, bl
mov bl, byte[bufb]
cmp al, bl ; compare chars: sign flag is set if 1st char < 2nd char

jns getdiff ; 1st char >= 2nd char

push eax ; swap chars
push ebx
pop eax
pop ebx

getdiff:
sub eax, ebx ; subtract 2nd char from 1st char
call _prnuint32 ; output difference

xor ebx, ebx ; set EXIT_SUCCESS
jmp done

notalpha: ; show not alpha or not same case error
mov eax, write
mov ebx, stdout
mov ecx, emsg
mov edx, elen
int 80h ; __NR_write

mov ebx, 1 ; set EXIT_FAILURE

done:
mov eax, exit ; __NR_exit
int 80h

; print unsigned 32-bit number to stdout
; arguments:
; eax - number to output
; returns:
; none
_prnuint32:
mov byte[nch], 0 ; zero nch counter

mov ecx, 0xa ; base 10 (and newline)
lea esi, [buf + 31] ; load address of last char in buf
mov [esi], cl ; put newline in buf
inc byte[nch] ; increment char count in buf

_todigit: ; do {
xor edx, edx ; zero remainder register
div ecx ; edx=remainder = low digit = 0..9. eax/=10

or edx, '0' ; convert to ASCII
dec esi ; backup to next char in buf
mov [esi], dl ; copy ASCII digit to buf
inc byte[nch] ; increment char count in buf

test eax, eax ; } while (eax);
jnz _todigit

mov eax, 4 ; __NR_write from /usr/include/asm/unistd_32.h
mov ebx, 1 ; fd = STDOUT_FILENO
mov ecx, esi ; copy address in esi to ecx (addr of 1st digit)
movzx edx, byte[nch] ; length, including the \n (zero-extended into edx)
int 80h ; write(1, string, digits + 1)

ret

; check if character islower()
; parameters:
; ecx - address holding character
; returns:
; eax - 0 (false), 1 (true)
_islwr:

mov eax, 1 ; set return true

cmp byte[ecx], 'a' ; compare with 'a'
jge _chkz ; char >= 'a'

mov eax, 0 ; set return false
ret

_chkz:
cmp byte[ecx], 'z' ; compare with 'z'
jle _rtnlwr ; <= is lowercase

mov eax, 0 ; set return false

_rtnlwr:
ret


; check if character isupper()
; parameters:
; ecx - address holding character
; returns:
; eax - 0 (false), 1 (true)
_isupr:

mov eax, 1 ; set return true

cmp byte[ecx], 'A' ; compare with 'A'
jge _chkZ ; char >= 'A'

mov eax, 0 ; set return false
ret

_chkZ:
cmp byte[ecx], 'Z' ; compare with 'Z'
jle _rtnupr ; <= is uppercase

mov eax, 0 ; set return false

_rtnupr:
ret

There are many ways to write the various pieces; this version is intended to be easy to follow rather than the most efficient way it could be written.

Example Use/Output

After you assemble and link the code, e.g.

nasm -f elf -o ./obj/char_dist_32.o char_dist_32.asm
ld -m elf_i386 -o ./bin/char_dist_32 ./obj/char_dist_32.o

You can test with the inputs given in your question and others, e.g.

$ ./bin/char_dist_32
first letter : a
second letter: e
char distance: 4

$ ./bin/char_dist_32
first letter : d
second letter: b
char distance: 2

$ ./bin/char_dist_32
first letter : D
second letter: B
char distance: 2

$ ./bin/char_dist_32
first letter : a
second letter: Z
error: not alpha or same case

Look things over and let me know if you have further questions.

Nasm increment register over 9 can't display

The problem is you are displaying a number as a character.

add ebx, '0'

is a good way to convert a digit to a character for display. It is a bad way to convert a number to a character for display.
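A tiny C sketch shows why the output goes wrong after 9. The helper name add_ascii_zero is hypothetical and simply mirrors the `add ebx, '0'` instruction, assuming ASCII.

```c
/* Hypothetical helper mirroring `add ebx, '0'`: adds the ASCII code of
 * '0' (48) to n. The result is a digit character only for n in 0..9;
 * for larger n it is whichever character follows '9' in the ASCII table. */
char add_ascii_zero(unsigned int n)
{
    return (char)(n + '0');
}
```

add_ascii_zero(9) is '9', but add_ascii_zero(10) is ':' (code 58), because a value of 10 or more needs one character per decimal digit.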

You want the following:

; variable in ebx
itoa:
mov eax, ebx
mov ecx, 10
mov esi, buf + 10
.nxt:
xor edx, edx ; clear the high half of the dividend before every div
div ecx
add dl, '0'
dec esi
mov [esi], dl
or eax, eax
jnz .nxt
mov edx, buf + 10
sub edx, esi
ret

; pointer in esi, length in edx

;... (bss area)
buf resb 10

