Nasm Gcc Command Error with Subprogram as Separate File

nasm gcc command error with subprogram as separate file

Quoting from the NASM manual:

GLOBAL is the other end of EXTERN: if one module declares a symbol as EXTERN and refers to it, then in order to prevent linker errors, some other module must actually define the symbol and declare it as GLOBAL.

So if get_int is defined in get_int.asm you should put the global get_int in get_int.asm, and declare it as extern in any other files that want to use get_int.

Using a user defined entry point in assembly x86-64 nasm when compiling with gcc

If you are writing in assembly and not using the C runtime library, then you can call your entry point whatever you want. You tell the linker what the name of the entry point is, using either the gcc command line option -Wl,--entry=<symbol> or the ENTRY directive in the linker script. The linker writes the address of this entry point in the executable file.

If you are using the C runtime library, then the entry point in the executable file needs to be the entry point of the C runtime library, so that it can perform initialization. This entry point is typically the _start symbol, provided by the startup object (crt0 or crt1.o). When the startup code finishes initializing, it calls main, so in this case, you cannot change the name.

Mac OS X 32-bit nasm assembly program using main and scanf/printf?

You're asking a lot of questions about your code, and you really don't understand the assembly code that's there.

Firstly, because of the way you're writing your code, the main routine is going to be the entry point of a C-style program. Because of the way Mac OS X linkage works, you have to name it _main to match the symbol the linker looks for as the default program entry point when it pulls in /usr/lib/crt1.o to produce the executable (if you run nm on that file you'll see an entry like: U _main). Similarly, all the library routines start with a leading underscore, so you have to use that prefix if you want to call them.

Secondly, the Mac OS X calling convention requires the stack to be 16-byte aligned at every call, which means you have to ensure that the stack pointer is correctly aligned at each call site. On entry to the main routine you already know the stack is misaligned by 4 bytes, because the call that got you there pushed a 4-byte return address. This means that if you want to make even a single call, you have to move the stack pointer down by at least 12 bytes first (4 + 12 = 16).
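The arithmetic behind the 12-byte figure can be sketched in C. This is only an illustration: frame_size is a hypothetical helper, not part of any ABI or library, and it assumes the stack was 16-byte aligned just before the call that entered the routine.

```c
#include <stddef.h>

/* Hypothetical helper (an illustration, not an ABI or library function).
 * Returns the smallest number of bytes to subtract from esp so that
 *   (a) at least `needed` bytes are available for outgoing arguments, and
 *   (b) esp is 16-byte aligned again at the next call.
 * `already_pushed` is the number of bytes placed on the stack since the
 * last aligned point: 4 for the return address alone, 8 once ebp has
 * been pushed as well. */
size_t frame_size(size_t already_pushed, size_t needed)
{
    size_t total = already_pushed + needed;
    size_t rounded = (total + 15) / 16 * 16;  /* round up to a multiple of 16 */
    return rounded - already_pushed;
}
```

With only the return address on the stack, frame_size(4, 4) and frame_size(4, 8) both give 12, so the same sub esp, 12 leaves room for one or two dword arguments. With ebp pushed too, frame_size(8, 8) gives 8.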

Armed with that piece of information, we're going to omit futzing around with the ebp, and just use esp exclusively for the purposes of the code.

This is assuming a prolog of:

bits 32
extern _printf
global _main

section .data
    message db "Hello world!", 10, 0

section .text
_main:

On entry into _main, realign the stack:

sub esp, 12

Next we store the address of the message into the address pointed to by esp:

mov dword[esp], message

Then we call printf:

call _printf

Then we restore the stack:

add esp, 12

Set the return code for main, and return:

mov eax, 0
ret

The ABI for Mac OS X uses eax for the routine's return value as long as it fits in a register. Once you've assembled and linked the code:

nasm -f macho -o test.o test.asm 
ld -o test -arch i386 test.o -macosx_version_min 10.7 -lc /usr/lib/crt1.o

It runs and prints the message, exiting with a 0.

Next we're going to play around with your scanning and printing example.

Firstly, scanf only scans, you can't have a prompt in there; it's simply not going to work, so you have to split the prompt from the scanning. We've already shown you how to do the print, and now what we need to show is the scanf.

Set up the variables in the data section:

scan_string     db  "%d", 0
limit           dd  0

First declare extern _scanf alongside _printf. Then store the address of scan_string at [esp] and the address of limit at [esp + 4], and call scanf:

mov dword[esp], scan_string
mov dword[esp + 4], limit
call _scanf

We now should have the value that we scanned stored in the limit memory location.

Next, to print this value, add a format string to the data section:

output_string   db  "Value %d", 10, 0

Next we put the address of output_string on the stack:

mov dword[esp], output_string

Read the value stored at limit into the eax register and put it at esp + 4, i.e. the second parameter for printf:

mov eax, [limit]
mov dword[esp + 4], eax
call _printf

Next, we call exit, so we store the exit code on the stack and invoke the _exit function (declare it with extern _exit). This differs from the simple print variant, where we returned from main instead of calling exit.

mov dword[esp], 0
call _exit

As for some of the questions:

Why the alignment?

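Because that's how Mac OS X does it: its ABI requires the stack to be 16-byte aligned at every call, and library code may rely on that (for example when using SSE instructions).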

Why isn't push good enough?

It is, but we aligned the stack at the start of the routine, and an aligned stack is a functioning stack; every push or pop moves esp by 4 bytes and so messes with the alignment. This is one of the purposes behind using the ebp register rather than the esp register.

If we were to use the ebp register, the function prolog would look like:

push ebp
mov ebp, esp
sub esp, 8 ; in this case to obtain alignment

and the function epilog would look like:

add esp, 8
pop ebp

You can put symmetric pusha/popa instructions in there as well, but if you're not using the registers, why complicate the stack?

A better overview of the 32-bit function calling mechanism is in the OS X Developer guide; the ABI function call guide gives far more detail on how parameters are passed and values returned. It's based on the AT&T System V ABI for the i386, with a few listed differences:

Different rules for returning structures
The stack is 16-byte aligned at the point of function calls
Large data types (larger than 4 bytes) are kept at their natural alignment
Most floating-point operations are carried out using the SSE unit instead of the x87 FPU, except when operating on long double values. (The IA-32 environment defaults to 64-bit internal precision for the x87 FPU.)

Running gcc's steps manually, compiling, assembling, linking

These are the different stages using gcc

gcc -E  --> Preprocess, but don't compile
gcc -S  --> Compile but don't assemble
gcc -c  --> Preprocess, compile, and assemble, but don't link
gcc with no switch will link your object files and generate the executable

Why is my assembly code exiting prematurely despite it working in a separate file?

This is expected: the first comparison branches to _true (because Z == 0 and N == V), and execution then continues straight through to the end (_else and _exit).

     .global _start
_start:
     mov r0, #0
     mov r1, #10

     cmp r1, #5
     bgt _true          @ Branches to _true because Z == 0 and N == V

But, to answer your question: you have to jump to _exit after _true and _else (or reach them with branch with link, bl), because otherwise execution continues sequentially into the instructions that follow. In addition, if you don't want to use branch with link (bl), you can put _exit at the top and use b _exit in _true and _else.

e.g.

     .global _start

_exit:
     mov r7, #1
     swi 0

_start:
     mov r0, #0
     mov r1, #10

     cmp r1, #5
     bgt _true

     cmp r1, #10        @ x < 10
     bge _else
     add r0, #3

     cmp r1, #9         @ *here is where the program messes up and exits*
     ble _exit
     cmp r1, #10
     bgt _exit
     mov r0, #7
_true:
     add r0, #1
     b _exit

_else:
     add r0, #5
     b _exit

NASM Subroutine Call Segmentation Fault

Well, you will need a debugger, as there are several problems in your code, and it's a bit too large for me to run accurately in my head (tracking the stack etc. with 100% certainty), so here are only the few things I see:

In the CHAR_CHECK: loop, test the length during the loop, so you don't overwrite .bss memory when somebody gives you too long a string. You can move the length check right under CHAR_LOOP: and stop looping when edi goes out of bounds.

Also store the null character before storing N (swap those two mov lines), as N is stored right after X in memory, so with a 31 (?) character input string you will overwrite N with 0 (this particular case is not exploitable, but the copy of a long string may be).

jl/jg are used in the length check, but length is unsigned, so jb/ja would make more sense to me (not a bug: the signed test >= 1 && <= 30 fails in exactly the same cases as the unsigned one; it just doesn't feel right if you have programming OCD).

Good/bad char test: you can make it a bit shorter by doing only two tests ('0' <= char && char <= '2'), as ['0', '1', '2'] are the values [48, 49, 50].
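The two-test character check can be sketched in C as below. The function names are made up for illustration. The second variant is an extra that the answer does not mention: a common trick that folds the range test into one unsigned comparison, roughly the C analogue of subtracting '0' and then using ja.

```c
#include <stdbool.h>

/* Hypothetical name: accepts only '0', '1' and '2' using two comparisons,
 * relying on the three characters having consecutive codes (48, 49, 50). */
bool is_valid_char(char c)
{
    return c >= '0' && c <= '2';
}

/* Hypothetical name: same result with a single unsigned comparison.
 * Anything below '0' wraps around to a large unsigned value after the
 * subtraction, so it fails the <= 2 test as well. */
bool is_valid_char_one_cmp(char c)
{
    return (unsigned char)(c - '0') <= 2;
}
```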

And now more serious stuff follows.

In the I/J loop you don't reset J, so the logic of your inner loop will be flawed.

push dword [X]: I don't think this does what you think it does. The address of the string is X; [X] is the content of memory (the chars of the string). This will make the sufcmp code segfault early, when it tries to access the "address" '0010', which is not legal.

In the swap, for example mov edx, dword [y + edi] ...: you increment edi by 1, but Y is defined as an array of dwords, so everywhere the indexing should be edi*4.

cmp esi, dword [N-1] ; if i = N-1: uhm, nope. It will compare esi with the dword at address N-1, so if [N] contains 16 and the byte before it is a single zero byte, the cmp will compare esi with the value 4096 (memory at N-1 is: 00 10 00 00 00, so [N] == 0x00000010 and [N-1] == 0x00001000).

mov eax, dword [X] ; move address of X to eax: no, lea (or mov eax, X) would do what the comment says. mov with [X] fetches the content at address X.

add eax, [y + esi]: again using +-1 indexing with a dword array.

And you forgot to call print_string; only the new line routine is called.

You can rewrite that part as:

mov eax,[y + esi*4]   ; eax = Y[i]
lea eax,[X + eax]     ; eax = address X + Y[i]
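Going back to the cmp esi, dword [N-1] item above, the byte-level reading can be checked with a small C sketch. The names load_le32 and memory are made up for this example. The helper assembles the dword by hand in little-endian order, the way x86 does, so the result does not depend on the machine running it.

```c
#include <stdint.h>

/* Hypothetical helper: build a 32-bit value from 4 consecutive bytes in
 * little-endian order, as an x86 dword load does. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* The memory layout from the example: one zero byte at address N-1,
 * followed by N = 16 stored as a little-endian dword. */
const unsigned char memory[5] = { 0x00, 0x10, 0x00, 0x00, 0x00 };
```

Here load_le32(memory + 1) plays the role of [N] and yields 16, while load_le32(memory) plays the role of [N-1] and yields 4096 (0x1000).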

And, as I'm cruel and tired, I kept my biggest worry for the last note.

I don't think this will work at all. Your bubble sort is iterating over the original X string (well, it's not, but once you fix the argument issues with correct addresses, it will), every time. So you keep shuffling the content of the Y array according to the original string in every iteration.

The most important part of my answer is the first sentence. You absolutely need a debugger. If you felt like the language made some sense to you up till now, your source doesn't prove that. Actually I can see glimpses of understanding, so you are basically right, but you would have to be a total prodigy whizz kid to pull this off without a debugger within a reasonable time. I would grade you only as above-average, maybe good, but far from prodigious.

If you still want to go without a debugger, change technique. Don't write so much code without compiling and running it. Work in much, much smaller steps, and keep displaying all sorts of things to be sure your new 3 lines of code do what they should. For example, if you had created an empty stub for sufcmp that just printed the string from the pointer, it would have segfaulted right when trying to access the string.

That would give you a better hint than an almost-final app segfaulting, where instead of hunting the problem in the most recent 10 lines you have 50+ to reason about.

EDIT: algorithm suggestion:

Unless you really must use bubble sort, I would avoid it and do the brute-force dumb "count" variant of sort.

i:[0,N): count[i] = countif(j:[0,N): X[j] < X[i])
i:[0,N): j:[0,N): if (i == count[j]) print X[j]

I hope you will be able to decipher it. It means that I would calculate, for every suffix, how many suffixes are "smaller" lexicographically, i.e. a full O(N^2) loopy loop (which is in reality O(N^3), because comparing strings is another O(N), but who cares with N=30; even N^5 would be bearable).

Then, to print the suffixes in the correct order, you simply search the count array again and again: first for a smaller-count of 0 (that's the smallest one), then for 1, etc., until you have printed all of them.

Actually you may loop through all suffixes, calculate how many are smaller, and put the index of that suffix into sorted[smaller_count], so for printing you will just loop through the sorted array from 0 to N-1, no searching involved.
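As a reference to port to assembly, here is a minimal C sketch of that last variant. The function name sort_suffixes is made up, and the sketch assumes x is a null-terminated string of length n, so that strcmp can stand in for the sufcmp routine.

```c
#include <string.h>

/* Hypothetical name. Fills sorted[0..n-1] with the start indices of the
 * suffixes of x in lexicographic order.
 * For each suffix i, count how many suffixes compare smaller; that count
 * is its final position. Two different suffixes of the same string have
 * different lengths and so never compare equal, which means every slot
 * of sorted[] is written exactly once. */
void sort_suffixes(const char *x, int n, int sorted[])
{
    for (int i = 0; i < n; i++) {
        int smaller = 0;
        for (int j = 0; j < n; j++) {
            if (strcmp(x + j, x + i) < 0)
                smaller++;
        }
        sorted[smaller] = i;
    }
}
```

For the input "0120" the suffixes in order are "0", "0120", "120", "20", so sorted ends up holding the start indices 3, 0, 1, 2. Printing is then a single loop over sorted that prints x + sorted[k].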

How to write hello world in assembler under Windows?

NASM examples.

Calling libc stdio printf, implementing int main(){ return printf(message); }

; ----------------------------------------------------------------------------
; helloworld.asm
;
; This is a Win32 console program that writes "Hello, World" on one line and
; then exits.  It needs to be linked with a C library.
; ----------------------------------------------------------------------------

    global  _main
    extern  _printf

    section .text
_main:
    push    message
    call    _printf
    add     esp, 4
    ret
message:
    db  'Hello, World', 10, 0

Then run

nasm -fwin32 helloworld.asm
gcc helloworld.obj
a

There's also The Clueless Newbies Guide to Hello World in Nasm, which does not use a C library. Then the code would look like this.

16-bit code with MS-DOS system calls: works in DOS emulators or in 32-bit Windows with NTVDM support. Can't be run "directly" (transparently) under any 64-bit Windows, because an x86-64 kernel can't use vm86 mode.

org 100h
mov dx,msg
mov ah,9
int 21h
mov ah,4Ch
int 21h
msg db 'Hello, World!',0Dh,0Ah,'$'

Build this into a .com executable so it will be loaded at cs:100h with all segment registers equal to each other (tiny memory model).

Good luck.

Woes of implementing selection sort in x86 NASM or YASM Assembly

For 64-bit code there are 16 general purpose registers: RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, R8, R9, R10, R11, R12, R13, R14, R15.

Of these, RSP has a special purpose and can only be used for that purpose (current stack address). The RBP register is typically used by compilers for keeping track of the stack frame (excluding "-fomit-frame-pointer" possibilities), but you aren't a compiler and can use it for anything you like.

This means that out of 15 registers that you could be using, you're only using 6 of them.

If you actually did run out of registers, then you could shift some things to stack space. For example:

foo:
    sub rsp,8*5        ;Create space for 5 values
%define .first   rsp
%define .second  rsp+8
%define .third   rsp+8*2
%define .fourth  rsp+8*3
%define .fifth   rsp+8*4

    mov [.first],rax
    mov [.second],rax
    mov rax,[.first]
    add rax,[.second] 
    mov [.third],rax
...
    add rsp,8*5        ;Clean up stack
    ret

Hopefully you can see that you could have hundreds of values on the stack, and use a few registers for (temporarily) holding those values if you need to. Normally you'd work out which values are used most often (e.g. in the inner loop) and try to use registers for them and use the stack for the least frequently used variables. However, for 64-bit code (where there are 8 more registers you can use) it's very rare to run out of registers, and if you do it's probably a sign that you need to split the routine into multiple routines.
