Get String Length in Inline GNU Assembler

Short-form to get string length in assembly

The first version determines the length at run-time and the second version sets the length at assembly time.

The . in the second expression represents the current address (in the data segment). Then, the expression

hello_world_len = . - hello_world

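subtracts the starting address of the string .ascii "hello world\n", indicated by the label hello_world:, from the current address (indicated by the .), resulting in the length value hello_world_len. This only works when the expression comes directly after the string, before any further bytes are emitted.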
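As a rough analogy in C (a sketch for comparison only, not the assembler source from the answer), the two approaches correspond to a constant fixed at build time and a length computed by scanning the bytes at run time. The function name hello_world_runtime_len is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The string bytes, like `.ascii "hello world\n"` after the label hello_world. */
static const char hello_world[] = "hello world\n";

/* Fixed at build time, like `hello_world_len = . - hello_world`.
   sizeof also counts the terminating 0 byte that C appends, hence the -1. */
enum { hello_world_len = sizeof hello_world - 1 };

/* The build fails here if the constant is not 12. */
static_assert(hello_world_len == 12, "unexpected string length");

/* Computed at run time by scanning for the terminating 0 byte.
   (Hypothetical helper name, used only in this sketch.) */
size_t hello_world_runtime_len(void)
{
    return strlen(hello_world);
}
```

Note one difference: .ascii stores no terminating 0 byte, so a run-time scan in assembly needs .asciz or an explicit terminator, while the assembly-time subtraction needs neither.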

GAS aarch64 syntax to get .ascii string length

Solution: I needed to pass -fno-integrated-as to clang so that it uses the GNU Assembler directly instead of its own integrated assembler (which is supposed to be a drop-in replacement for GAS, but apparently isn't). I used the following updated command to compile and run my aarch64 program without issue:

clang -nostdlib -fno-integrated-as -target aarch64-linux-gnu -s hello_world.s -o hello_world.out && ./hello_world.out

Thanks to @Jester and @Nate Eldredge who helped me debug in the comments.

How to print the length of a string in assembly


   ...
   mov  edx,len     ;message length

This loads edx with a numeric value, 14 in this case. len is an equ constant, something like a #define in C.

   mov  ecx,msg     ;message to write

This loads ecx with the address of the first character (msg is a label, pointing into memory).

   mov  ebx,1       ;file descriptor (stdout)
   mov  eax,4       ;system call number (sys_write)
   int  0x80        ;call kernel
   ...

msg db 'Hello, world!', 0xa ;our string

This defines 14 bytes of memory, with values 72 ('H'), 101 ('e'), and so on. The msg label points at the first byte (it is that byte's memory address).

    len equ $ - msg              ;length of our string

This defines a constant len that exists only at assembly time. It doesn't define any memory content, so you can't find it in the executable or at runtime (unless it is used, as in mov edx,len, in which case its value is encoded into that particular instruction, of course).

The definition is $ - msg. The $ in this context means "current address", i.e. where the next defined byte will be assembled, so at this point it is equal to msg + 14 (I hope I counted the characters correctly :) ). And ((msg+14) - msg) = 14 = the number of bytes defined in memory between the label msg and the definition of len.

Notice how I avoid words like variable or chars. ASM is more low level, so "labels into memory" and "bytes" is more accurate wording, and I hope it will help you recognize the subtle differences.

Your len2 equ $ - len, placed after len, thus defined len2 as (msg+14) ($ is still there, since the len definition added no new byte to memory) minus len, which is 14, so you effectively defined len2 equal to msg.
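The arithmetic can be replayed in a small C model. This is only a sketch of the bookkeeping described here, not of how NASM is implemented; the struct equ_model, the function model_equ and the address passed in are hypothetical names and values chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the values involved in the two equ definitions. */
typedef struct {
    uint32_t msg;   /* address of the label msg */
    uint32_t here;  /* value of $ right after the 14 bytes of msg */
    uint32_t len;   /* len  equ $ - msg */
    uint32_t len2;  /* len2 equ $ - len */
} equ_model;

/* msg_address is an assumed load address; the real one is chosen by the linker. */
equ_model model_equ(uint32_t msg_address)
{
    equ_model m;
    m.msg  = msg_address;
    m.here = msg_address + 14;  /* db emitted 14 bytes */
    m.len  = m.here - m.msg;    /* 14 */
    m.len2 = m.here - m.len;    /* equ emits no bytes, so $ is unchanged: result is msg */
    return m;
}
```

Whatever address is passed in, len comes out as 14 and len2 comes out equal to that address, which is why the swapped arguments end up as a tiny pointer and a huge length.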

Then:

   mov  edx,len2    ;message length
   mov  ecx,len     ;message to write
   ...

This calls sys_write with a pointer to the string equal to 14 (an invalid memory reference; that area of memory is off limits to ordinary user code) and a length equal to the address msg, which for a statically linked 32-bit Linux executable will very likely be some value like 0x0804a000, i.e. well over a hundred million bytes to output.

sys_write naturally doesn't like that, fails, and returns an error code in eax.

To output anything to the console with sys_write, you first have to write it into memory as an ASCII-encoded string (I think UTF-8 is supported by default in the Ubuntu shell, but I'm too lazy to verify), and then give sys_write the address of that memory and the length in bytes. With a UTF-8 string the difference between bytes and chars is important: sys_write is not aware of characters, it works with binary files and bytes, so the length is the number of bytes.
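To make the bytes-versus-characters point concrete, here is a minimal C sketch. It assumes the input is valid UTF-8, and the function names byte_length and utf8_code_points are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes before the terminating 0: the count a write call expects. */
size_t byte_length(const char *s)
{
    return strlen(s);
}

/* Number of UTF-8 code points, assuming valid UTF-8: count every byte
   that is not a continuation byte (those have the bit pattern 10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For a string such as "héllo", where é is encoded as the two bytes 0xC3 0xA9, the first function gives 6 and the second gives 5. The value to put in edx is the first one.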

I'm not going to write code to output numbers, as that's several lines long (a simplified printf implementation) and SO has several Q+As about this, but I hope my explanation helps you understand what happened and how it works.
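For orientation only, the conversion step that such a routine needs looks roughly like this in C. It is a sketch of the usual repeated-division approach, not the code from any of those Q+As, and u32_to_decimal is a name made up here. An assembly version would do the same thing with div and a small buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the decimal digits of value into buf (no terminating 0) and returns
   the number of bytes written. buf must have room for at least 10 bytes,
   since the largest 32-bit value, 4294967295, has 10 digits. */
size_t u32_to_decimal(uint32_t value, char *buf)
{
    char tmp[10];
    size_t n = 0;

    /* Repeated division by 10 yields the digits least significant first. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Reverse into buf so the most significant digit comes first. */
    for (size_t i = 0; i < n; ++i)
        buf[i] = tmp[n - 1 - i];

    return n;
}
```

The returned count is exactly the byte length that sys_write wants, and buf is the address to pass.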

If you are just learning ASM, consider either linking against the C library to have printf available or, even better, using a debugger to check the values straight in the registers. Don't bother with string output yet; that's a somewhat more advanced topic than the initial arithmetic, basic flow control, and operating the stack. Once you are more comfortable with how basic instructions work and how to debug the code, it will be easier to try to output numbers.

How would I find the length of a string using NASM?

You are comparing the value in ebx with 0, which is not what you want. The value in ebx is the address of a character in memory, so it should be dereferenced like this:

cmp byte[ebx], 0

Also, the last push ebx should be pop ebx.
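The difference between testing the address and testing the byte it points at is easy to see in C. The following is a minimal sketch with a made-up function name, string_length, in which the pointer p stands in for ebx; it assumes the string ends with a 0 byte.

```c
#include <stddef.h>

/* Counts the bytes before the terminating 0 byte. */
size_t string_length(const char *s)
{
    const char *p = s;      /* p plays the role of ebx: it holds an address */
    while (*p != 0)         /* like cmp byte [ebx], 0: test the byte, not the address */
        ++p;                /* like inc ebx */
    return (size_t)(p - s); /* end address minus start address */
}
```

Writing while (p != 0) instead would compare the address itself with 0, which is the mistake in the original loop: the address never becomes 0, so the loop runs past the end of the string.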

Declaring a fixed-length padded string in GNU assembler

I think I'd use a macro with local labels for this. For example:

    .macro padded_string string, max
1:
    .ascii "\string"
2:
    .iflt \max - (2b - 1b)
    .error "String too long"
    .endif

    .ifgt \max - (2b - 1b)
    .zero \max - (2b - 1b)
    .endif

    .endm

...used like this:

my_string:
padded_string "Hello world!", 16

(This has C-like behaviour of not adding the terminating 0 if the string is exactly max characters long. To ensure the string is terminated, just change the .ascii to .asciz.)
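For comparison, the same padding rule can be written as a run-time C function. This is a sketch only: the macro does its check at assembly time, while this hypothetical pad_string helper checks when it is called.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst and zero-fills the rest of the max-byte field.
   Returns 0 on success, or -1 if src is longer than max (the case in which
   the macro reports "String too long"). As with the macro, no terminating 0
   is stored when src is exactly max bytes long. */
int pad_string(char *dst, size_t max, const char *src)
{
    size_t len = strlen(src);
    if (len > max)
        return -1;
    memcpy(dst, src, len);            /* the string bytes, like .ascii */
    memset(dst + len, 0, max - len);  /* the padding, like .zero */
    return 0;
}
```

Called with a 16-byte field and "Hello world!", it stores the 12 string bytes followed by 4 zero bytes.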


