How to Do a 'ret' Instruction from Code at _start in MacOS? Linux?

Can I do `ret` instruction from code at _start in MacOS? Linux?

MacOS Dynamic Executables

When you are using MacOS and link with:

ld foo.o -lc -macosx_version_min 10.12.0 -e _start -o foo

you get a dynamically linked version of your code. _start isn't the true entry point; the dynamic loader is. As one of its last steps, the dynamic loader performs C/C++/Objective-C runtime initialization and then calls the entry point you specified with the -e option. The Apple documentation about Forking and Executing the Process has these paragraphs:

A Mach-O executable file contains a header consisting of a set of load commands. For programs that use shared libraries or frameworks, one of these commands specifies the location of the linker to be used to load the program. If you use Xcode, this is always /usr/lib/dyld, the standard OS X dynamic linker.

When you call the execve routine, the kernel first loads the specified program file and examines the mach_header structure at the start of the file. The kernel verifies that the file appears to be a valid Mach-O file and interprets the load commands stored in the header. The kernel then loads the dynamic linker specified by the load commands into memory and executes the dynamic linker on the program file.

The dynamic linker loads all the shared libraries that the main program links against (the dependent libraries) and binds enough of the symbols to start the program. It then calls the entry point function. At build time, the static linker adds the standard entry point function to the main executable file from the object file /usr/lib/crt1.o. This function sets up the runtime environment state for the kernel and calls static initializers for C++ objects, initializes the Objective-C runtime, and then calls the program’s main function

In your case that is _start. Because you are creating a dynamically linked executable, you can do a ret and have it return to the code that called _start, which then makes an exit system call for you. This is why it doesn't crash. If you review the generated executable with gobjdump -Dx foo you should get:

start address 0x0000000000000000

Idx Name Size VMA LMA File off Algn
0 .text 00000001 0000000000001fff 0000000000001fff 00000fff 2**0
0000000000001000 g 03 ABS 01 0010 __mh_execute_header
0000000000001fff g 0f SECT 01 0000 [.text] _start
0000000000000000 g 01 UND 00 0100 dyld_stub_binder

Disassembly of section .text:

0000000000001fff <_start>:
1fff: c3 retq

Notice that start address is 0. And the code at 0 is dyld_stub_binder. This is the dynamic loader stub that eventually sets up a C runtime environment and then calls your entry point _start. If you don't override the entry point it defaults to main.

MacOS Static Executables

If however you build as a static executable, there is no code executed before your entry point and ret should crash since there is no valid return address on the stack. In the documentation quoted above is this:

For programs that use shared libraries or frameworks, one of these commands specifies the location of the linker to be used to load the program.

A statically built executable doesn't use the dynamic loader dyld with crt1.o embedded in it. (CRT = C runtime, which on MacOS covers C++/Objective-C as well.) No dynamic loading takes place, the C/C++/Objective-C initialization code is not executed, and control is transferred directly to your entry point.

To build statically, drop the -lc (or -lSystem) from the linker command and add the -static option:

ld foo.o -macosx_version_min 10.12.0 -e _start -o foo -static

If you run this version it should produce a segmentation fault. gobjdump -Dx foo produces:

start address 0x0000000000001fff

Idx Name Size VMA LMA File off Algn
0 .text 00000001 0000000000001fff 0000000000001fff 00000fff 2**0
1 LC_THREAD.x86_THREAD_STATE64.0 000000a8 0000000000000000 0000000000000000 00000198 2**0
0000000000001000 g 03 ABS 01 0010 __mh_execute_header
0000000000001fff g 0f SECT 01 0000 [.text] _start

Disassembly of section .text:

0000000000001fff <_start>:
1fff: c3 retq

You should notice that the start address is now 0x1fff. 0x1fff is the entry point you specified (_start). There is no dynamic loader stub as an intermediary.


Linux

Under Linux, when you specify your own entry point, a ret from it will cause a segmentation fault whether you are building a static or a shared executable. There is good information on how ELF executables are run on Linux in this article and the dynamic linker documentation. The key point is that the Linux documentation makes no mention of doing C/C++/Objective-C runtime initialisation, unlike the MacOS dynamic linker documentation.

The key difference between the Linux dynamic loader and the MacOS one (dyld) is that the MacOS dynamic loader performs C/C++/Objective-C startup initialization by including the entry point from crt1.o. The code in crt1.o then transfers control to the entry point you specified with -e (default is main). In Linux the dynamic loader makes no assumption about the type of code that will be run. After the shared objects are processed and initialized, control is transferred directly to the entry point.

Stack Layout at Process Creation

FreeBSD (on which MacOS is based) and Linux share one thing in common. When loading 64-bit executables the layout of the user stack when a process is created is the same. The stack for 32-bit processes is similar but pointers and data are 4 bytes wide, not 8.


Although there isn't a return address on the stack, there is other data representing the number of arguments, the arguments, environment variables, and other information. This layout is not the same as what the main function in C/C++ expects. It is part of the C startup code to convert the stack at process creation to something compatible with the C calling convention and the expectations of the function main (argc, argv, envp).

I wrote more information on this subject in this Stack Overflow answer that shows how a statically linked MacOS executable can traverse the program arguments passed by the kernel at process creation.

Return values in main vs _start

TL;DR: function return values and system-call arguments use separate registers because they're completely unrelated.

When you compile with gcc, it links CRT startup code that defines a _start. That _start (indirectly) calls main, and passes main's return value (which main leaves in EAX) to the exit() library function. (Which eventually makes an exit system call, after doing any necessary libc cleanup like flushing stdio buffers.)

See Return vs Exit from main function in C: this is exactly analogous to what you're doing, except you're using _exit() which bypasses libc cleanup, instead of exit(). See also Syscall implementation of exit().

An int $0x80 system call takes its argument in EBX, as per the 32-bit system-call ABI (which you shouldn't be using in 64-bit code). It's not a return value from a function, it's the process exit status. See Hello, world in assembly language with Linux system calls? for more about system calls.

Note that _start is not a function; it can't return in that sense because there's no return address on the stack. You're taking a casual description like "return to the OS" and conflating that with a function's "return value". You can call exit from main if you want, but you can't ret from _start.

EAX is the return-value register for int-sized values in the function-calling convention. (The high 32 bits of RAX are ignored because main returns int. But also, $? exit status can only get the low 8 bits of the value passed to exit().)
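The 8-bit truncation of the exit status is easy to observe from C. The following is a minimal sketch, assuming a POSIX system; observed_exit_status is a hypothetical helper name introduced here for illustration. It forks a child that calls _exit(value) and reports what the parent sees through waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that terminates with _exit(value) and return the status the
 * parent observes through waitpid(). Returns -1 if fork/waitpid fails or
 * the child did not exit normally. (Hypothetical helper for illustration.) */
int observed_exit_status(int value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(value);            /* raw exit: no libc cleanup in the child */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);  /* only the low 8 bits of value survive */
}
```

Calling observed_exit_status(256 + 7) should report 7, because only the low byte of the value passed to _exit() reaches the parent.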


  • Why am I allowed to exit main using ret?
  • What happens with the return value of main()?
  • where goes the ret instruction of the main
  • What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? explains why you should use syscall, and shows some of what happens inside the kernel after a system call.

Why am I allowed to exit main using ret?

C main is called (indirectly) from CRT startup code, not directly from the kernel.

After main returns, that code calls atexit functions to do stuff like flushing stdio buffers, then passes main's return value to a raw _exit system call (or exit_group, which exits all threads).

You make several wrong assumptions, all I think based on a misunderstanding of how kernels work.

  • The kernel runs at a different privilege level from user-space (ring 0 vs. ring 3 on x86). Even if user-space knew the right address to jump to, it can't jump into kernel code. (And even if it could, it wouldn't be running with kernel privilege level).

    ret isn't magic, it's basically just pop %rip and doesn't let you jump anywhere you couldn't jump to with other instructions. It also doesn't change privilege level (see footnote 1).

  • Kernel addresses aren't mapped / accessible when user-space code is running; those page-table entries are marked as supervisor-only. (Or they're not mapped at all in kernels that mitigate the Meltdown vulnerability, so entering the kernel goes through a "wrapper" block of code that changes CR3.)

    Virtual memory is how the kernel protects itself from user-space. User-space can't modify page tables directly, only by asking the kernel to do it via mmap and mprotect system calls. (And user-space can't execute privileged instructions like mov cr3, rax to install new page tables. That's the purpose of having ring 0 (kernel mode) vs. ring 3 (user mode).)

  • The kernel stack is separate from the user-space stack for a process. (In the kernel, there's also a small kernel stack for each task (aka thread) that's used during system calls / interrupts while that user-space thread is running. At least that's how Linux does it, IDK about others.)

  • The kernel doesn't literally call user-space code; the user-space stack doesn't hold any return address back into the kernel. A kernel->user transition involves swapping stack pointers as well as changing privilege levels, e.g. with an instruction like iret (interrupt-return).

    Plus, leaving a kernel code address anywhere user-space can see it would defeat kernel ASLR.

Footnote 1: (The compiler-generated ret will always be a normal near ret, not a retf that could return through a call gate or something to a privileged cs value. x86 handles privilege levels via the low 2 bits of CS, but never mind that. MacOS / Linux don't set up call gates that user-space can use to call into the kernel; that's done with syscall or int 0x80 instructions.)

In a fresh process (after an execve system call replaced the previous process with this PID with a new one), execution begins at the process entry point (usually labeled _start), not at the C main function directly.

C implementations come with CRT (C RunTime) startup code that has (among other things) a hand-written asm implementation of _start which (indirectly) calls main, passing args to main according to the calling convention.

_start itself is not a function. On process entry, RSP points at argc, and above that on the user-space stack is argv[0], argv[1], etc. (i.e. the char *argv[] array is right there by value, and above that the envp array.) _start loads argc into a register and puts pointers to the argv and envp into registers. (The x86-64 System V ABI that MacOS and Linux both use documents all this, including the process-startup environment and the calling convention.)

If you try to ret from _start, you're just going to pop argc into RIP, and then code-fetch from absolute address 1 or 2 (or other small number) will segfault. For example, Nasm segmentation fault on RET in _start shows an attempt to ret from the process entry point (linked without CRT startup code). It has a hand-written _start that just falls through into main.
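The startup layout described above can be modelled in plain C. This is only a sketch under stated assumptions: struct startup_view and read_startup_block are hypothetical names, and the "stack" is an ordinary array of words laid out the way the text describes (argc, the argv pointers, a zero word, the envp pointers, a zero word). It also makes the failure mode visible: the word a ret would pop is simply argc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the block found at the stack pointer on entry. */
struct startup_view {
    uintptr_t argc;          /* word at sp[0]; this is what `ret` would pop */
    const uintptr_t *argv;   /* sp + 1: argc pointers followed by a 0 word  */
    const uintptr_t *envp;   /* first word after argv's 0 terminator        */
    size_t envc;             /* number of entries before envp's 0 word      */
};

/* Walk a startup block that begins at sp (sp points at the argc word). */
struct startup_view read_startup_block(const uintptr_t *sp)
{
    struct startup_view v;
    v.argc = sp[0];
    v.argv = sp + 1;
    v.envp = v.argv + v.argc + 1;   /* skip argv[0..argc-1] and the 0 word */
    v.envc = 0;
    while (v.envp[v.envc] != 0)
        v.envc++;
    return v;
}
```

Real CRT startup code does the equivalent in assembly and then passes argc, argv and envp to main in registers according to the calling convention.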

When you run gcc main.c, the gcc front-end runs multiple other programs (use gcc -v to show details). This is how the CRT startup code gets linked into your process:

  • gcc preprocesses (CPP) and compiles+assembles main.c to main.o (or a temporary file). On MacOS, the gcc command is actually clang which has a built-in assembler, but real gcc really does compile to asm and then run as on that. (The C preprocessor is built-in to the compiler, though.)
  • gcc runs something like ld -dynamic-linker /lib64/ -pie /usr/lib/Scrt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/crtbeginS.o main.o -lc -lgcc /usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/crtendS.o. That's actually simplified a lot, with some of the CRT files left out, and paths canonicalized to remove ../../lib parts. Also, it doesn't run ld directly, it runs collect2 which is a wrapper for ld. But anyway, that statically links in those .o CRT files that contain _start and some other stuff, and dynamically links libc (-lc) and libgcc (for GCC helper functions like implementing __int128 multiply and divide with 64-bit registers, in case your program uses those).



.global _rbp

mov rax, rbp

This is not allowed, ...

The only reason that doesn't assemble is because you tried to declare .text: as a label, instead of using the .text directive. If you remove the trailing : it does assemble with clang (which treats .intel_syntax the same as .intel_syntax noprefix).

For GCC / GAS to assemble it, you'd also need the noprefix to tell it that register names aren't prefixed by %. (Yes you can have Intel op dst, src order but still with %rsp register names. No you shouldn't do this!) And of course GNU/Linux doesn't use leading underscores.

Not that it would always do what you want if you called it, though! If you compiled main without optimization (so -fno-omit-frame-pointer was in effect), then yes you'd get a pointer to the stack slot below the return address.

And you definitely use the value incorrectly. (*p)-4; loads the saved RBP value (*p) and then offsets by four 8-byte void-pointers. (Because that's how C pointer math works; *p has type void* because p has type void **).

I think you're trying to get your own return address and re-run the call instruction (in main's caller) that reached main, eventually leading to a stack overflow from pushing more return addresses. In GNU C, use void * __builtin_return_address (0) to get your own return address.
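A minimal sketch of that builtin, assuming GCC or Clang (it is a GNU C extension, not ISO C). The function name own_return_address is hypothetical; noinline is used so that a real call, and therefore a real return address, exists.

```c
/* GNU C extension, available in GCC and Clang.
 * noinline keeps this a real call so there is a return address to report. */
__attribute__((noinline))
void *own_return_address(void)
{
    /* Level 0 means "the return address of the current function". */
    return __builtin_return_address(0);
}
```

The returned pointer is the address of the instruction after the call in the caller, so to find the call itself you would still need to know that instruction's length.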

x86 call rel32 instructions are 5 bytes, but the call that called main was probably an indirect call, using a pointer in a register. So it might be a 2-byte call *%rax or a 3-byte call *%r12; you don't know unless you disassemble your caller. (I'd suggest single-stepping by instructions (GDB / LLDB stepi) off the end of main using a debugger in disassembly mode. If it has any symbol info for main's caller, you'll be able to scroll backward and see what the previous instruction was.)

If not, you might have to try and see what looks sane; x86 machine code can't be unambiguously decoded backwards because it's variable-length. You can't tell the difference between a byte within an instruction (like an immediate or ModRM) vs. the start of an instruction. It all depends on where you start disassembling from. If you try a few byte offsets, usually only one will produce anything that looks sane.

asm("movq %rax, 0"); //Exit code is 11, so now it should be 0

This is a store of RAX to absolute address 0, in AT&T syntax. This of course segfaults. Exit code 11 is from SIGSEGV, which is signal 11. (Use kill -l to see signal numbers.)

Perhaps you wanted mov $0, %eax. Although that's still pointless here, you're about to call through your function pointer. In debug mode, the compiler might load it into RAX and step on your value.

Also, writing a register in an asm statement is never safe when you don't tell the compiler which registers you're modifying (using constraints).
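For comparison, here is a sketch of an asm statement that does tell the compiler what it writes, assuming GNU C extended asm on x86 (other targets fall back to plain C so the file still compiles). The function name zero_via_asm is hypothetical.

```c
/* Returns 0, produced by an inline-asm statement on x86. */
int zero_via_asm(void)
{
    int value;
#if defined(__x86_64__) || defined(__i386__)
    /* "=a" declares EAX as an output operand bound to `value`, so the
     * compiler knows EAX is overwritten and won't keep anything live there.
     * With operands present, register names need a doubled %% in AT&T syntax. */
    __asm__ ("movl $0, %%eax" : "=a" (value));
#else
    value = 0;   /* non-x86 fallback: no inline asm */
#endif
    return value;
}
```

Because the output is described through a constraint, the compiler is free to schedule code around the statement without the value being silently stepped on.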

printf("Main: %p\n", main);
printf("&Main: %p\n", &main); //WTF

main and &main are the same thing because main is a function. That's just how C syntax works for function names. main isn't an object that can have its address taken. See also: & operator optional in function pointer assignment.

It's similar for arrays: the bare name of an array can be assigned to a pointer or passed to functions as a pointer arg. &array holds the same address as &array[0], although its type is pointer-to-array rather than pointer-to-element. This is true only for arrays like int array[10], not for pointers like int *ptr; in the latter case the pointer object itself has storage space and can have its own address taken.
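These equivalences can be checked directly. The sketch below uses hypothetical names (sample_function, sample_array and the four helpers). It also shows the one place where array and &array differ: the pointer type, and therefore the size of a "+ 1" step.

```c
static int sample_function(void) { return 0; }
static int sample_array[10];

/* 1 if the bare function name and &function yield the same pointer. */
int function_name_matches_address(void)
{
    int (*a)(void) = sample_function;
    int (*b)(void) = &sample_function;
    return a == b;
}

/* 1 if array, &array and &array[0] all hold the same address. */
int array_addresses_match(void)
{
    return (void *)sample_array == (void *)&sample_array
        && (void *)sample_array == (void *)&sample_array[0];
}

/* Bytes covered by "+ 1" on the decayed pointer (int *). */
long step_of_decayed_array(void)
{
    return (long)((char *)(sample_array + 1) - (char *)sample_array);
}

/* Bytes covered by "+ 1" on the array's address (int (*)[10]). */
long step_of_array_address(void)
{
    return (long)((char *)(&sample_array + 1) - (char *)&sample_array);
}
```

The first step is sizeof(int) and the second is the size of the whole array, even though both pointers start at the same address.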

Nasm segmentation fault on RET in _start

Because ret is NOT the proper way to exit a program in Linux, Windows, or Mac!!!!

_start is not a function, there is no return address on the stack because there is no user-space caller to return to. Execution in user-space started here (in a static executable), at the process entry point. (Or with dynamic linking, it jumped here after the dynamic linker finished, but same result).

On Linux / OS X, the stack pointer is pointing at argc on entry to _start (see the i386 or x86-64 System V ABI doc for more details on the process startup environment); the kernel puts command line args into user-space stack memory before starting user-space. (So if you do try to ret, EIP/RIP = argc = a small integer, not a valid address. If your debugger shows a fault at address 0x00000001 or something, that's why.)

On Windows the proper way is ExitProcess. On Linux it is a system call: int 80H with sys_exit for 32-bit x86, or syscall with call number 60 for 64-bit. Alternatively, call exit from the C library if you are linking to it.

32-bit Linux (i386)

%define  SYS_exit  1   ; call number __NR_exit from <asm/unistd_32.h>

mov eax, SYS_exit ; use the NASM macro we defined earlier
xor ebx, ebx ; ebx = 0 exit status
int 80H ; _exit(0)

64-bit Linux (amd64)

mov     rax, 60        ; SYS_exit aka __NR_exit from asm/unistd_64.h
xor rdi, rdi ; edi = 0 first arg to 64-bit system calls
syscall ; _exit(0)

(In GAS you can actually #include <sys/syscall.h> or <asm/unistd.h> to get the right numbers for the mode you're assembling a .S for, but NASM can't easily use the C preprocessor.
See Polygot include file for nasm/yasm and C for hints.)

32-bit Windows (x86)

push    0
call ExitProcess

Or Windows/Linux linking against the C Library

; pass an int exit_status as appropriate for the calling convention
; push 0 / xor edi,edi / xor ecx,ecx
call exit

(Or for 32-bit x86 Windows, call _exit, because C names get prepended with an underscore, unlike in x86-64 Windows. The POSIX _exit function would be call __exit, if Windows had one.)

Windows x64's calling convention includes shadow space which the caller has to reserve, but exit isn't going to return so it's ok to let it step on that space above its return address. Also, 16-byte stack alignment is required by the calling convention before call exit except for 32-bit Windows, but often won't actually crash for a simple function like exit().

call exit (unlike a raw exit system call or libc _exit) will flush stdio buffers first. If you used printf from _start, use exit to make sure all output is printed before you exit, even if stdout is redirected to a file (making stdout full-buffered, not line-buffered).
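The difference can be demonstrated with a small experiment, sketched here under the assumption of a POSIX system. bytes_delivered is a hypothetical helper: it runs a child whose stdout is a pipe, has it printf a short string without a newline, and then leaves through either exit() or _exit().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns how many bytes of the child's printf output reached the pipe,
 * or -1 on a setup error. (Hypothetical helper for illustration.) */
long bytes_delivered(int use_raw_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(NULL);                 /* don't let the child inherit pending output */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);   /* child's stdout is now the pipe */
        close(fds[1]);
        printf("hello");               /* no newline: stays in the stdio buffer */
        if (use_raw_exit)
            _exit(0);                  /* buffer is discarded with the process */
        exit(0);                       /* buffer is flushed before exiting */
    }

    close(fds[1]);                     /* so read() sees EOF when the child exits */
    long total = 0;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With exit() the five bytes of "hello" arrive in the pipe; with _exit() nothing does, which is the lost-output symptom described above.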

It's generally recommended that if you use libc functions, you write a main function and link with gcc so it's called by the normal CRT start functions which you can ret to.

See also

  • Syscall implementation of exit()
  • How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?

Defining main as something that _start falls through into doesn't make it special, it's just confusing to use a main label if it's not like a C main function called by a _start that's prepared to exit after main returns.

Mac assembly: segfault with libc exit

@fuz is almost certainly correct: you crash because you didn't initialize libc. There's probably a NULL pointer somewhere in the data structures that exit(3) checks before actually exiting. e.g. it flushes stdout if needed, and it runs any functions registered with atexit(3).

If you don't want it to do all that work, then either make the sys_exit system call directly with a syscall instruction, or call the thin _exit(2) libc wrapper function for it. (The basics of the situation will be the same as on Linux, because exit(3) vs. _exit(2) are standardized by POSIX: see Syscall implementation of exit().)

I think the tutorial you're following mostly looks good, but perhaps some older version of OS X allowed libc functions (including printf?!?) to be used without calling any libc init functions. Or else they didn't test their code after an edit to the build commands. (Assuming they tested at all, maybe it was with dynamic linking, which would work.)

OS X prefixes symbol names in assembly with an _, so use call __exit (two underscores) to call _exit(). (e.g. call _printf calls the C printf function).

_exit(2) probably won't crash if you call it without initializing libc, but it's still a bad idea to call any libc functions without having called libc init functions first. Better to make the system call directly (see later in the tutorial), or even better, build it with gcc hello_asm.S -o hello_asm to make sure libc is initialized. Then you can follow the rest of the tutorial, including the printf.

Don't call your Mach-O entry point _main or main in a static executable. CRT startup code hasn't run yet. The usual convention is to call it _start for the process entry point.

(Note that OS X puts the CRT start code in the dynamic linker, so the "entry point" in a dynamically-linked executable is the C main function, unlike in Linux where dynamic executables can avoid the CRT startup code.

libc would be initialized for you if you linked with gcc exit2.o -o exit instead of ld, which you're using to do the equivalent of gcc -static -nostartfiles.)

How can I modify the stack with nasm, x86_64, linux functions (using `ret` keyword)?


Remember that call is technically a push rip, and ret is technically a pop rip, so you pretty much messed up your stack in your example because you inadvertently pop it in the wrong spot.

More of an answer

Although you should probably properly learn how calling conventions work, I'm going to attempt an answer to briefly "soften" the idea, and for the fun of learning.

Abstractly speaking, in order to have functions, you must have something called stack frames, or else you'd have a pretty hard time managing local variables and getting ret to work. On x86_64, a stack frame is pretty much composed of a few things, in order.

  • The function arguments, if there are any [0],
    • If some arguments were passed in registers, this may be omitted.
  • the return address,
    • The call instruction will push this onto the stack.
    • It's on you to make sure the ret instruction will pop this off the stack.
  • optionally a frame pointer,
    • If your stack grows by a dynamic amount, this can keep track of the start of the frame.
    • Otherwise, if you know the stack size ahead of time, it's optional.
  • and then your local state on the stack.

As long as execution stays within your little assembly space, you are technically free to pass arguments however you want [1] as long as you are aware of how instructions like call and ret manipulate the stack. The simplest way, in my opinion, is to make it sort of stack-based, so that your compiler would not need to worry about register allocation as much [2].

To keep things simple, I'd suggest using something like the x86 convention but applied to x86_64, as you seem to be using 64-bit code. That is to say, the caller function would push all of its arguments onto the stack (usually in reverse order), and then call the callee function. For example, for a 3-argument function, your stack would end up looking something like this (beware that the top of the stack is actually on the bottom).

| argument 2 |
| argument 1 |
| argument 0 |
| return address |
| local state |
| ... |
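To make the diagram concrete, here is a small C model of that convention. It is a sketch only: struct machine, push, pop, callee_sum3 and call_sum3 are hypothetical names, the "stack" is an array of 64-bit words that grows downward like rsp, and 0x1234 is a stand-in for a return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

struct machine {
    int64_t stack[STACK_WORDS];
    size_t sp;                    /* index of the top word; grows downward */
};

static void push(struct machine *m, int64_t v) { m->stack[--m->sp] = v; }
static int64_t pop(struct machine *m) { return m->stack[m->sp++]; }

/* Callee view on entry: stack[sp] is the return address,
 * stack[sp + 1] is argument 0, stack[sp + 2] argument 1, and so on. */
static int64_t callee_sum3(const struct machine *m)
{
    return m->stack[m->sp + 1] + m->stack[m->sp + 2] + m->stack[m->sp + 3];
}

int64_t call_sum3(int64_t a0, int64_t a1, int64_t a2)
{
    struct machine m = { .sp = STACK_WORDS };   /* empty stack */
    const int64_t return_address = 0x1234;      /* stand-in value */

    push(&m, a2);                   /* arguments in reverse order */
    push(&m, a1);
    push(&m, a0);
    push(&m, return_address);       /* what `call` does */

    int64_t result = callee_sum3(&m);

    int64_t popped = pop(&m);       /* what `ret` does */
    assert(popped == return_address);   /* pushes and pops must match */
    m.sp += 3;                      /* caller removes its three arguments */
    assert(m.sp == STACK_WORDS);    /* stack is balanced again */
    return result;
}
```

If the callee left an extra word on the stack, the pop standing in for ret would fetch that word instead of the return address, which is the mismatch described at the top of this answer.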

Also, I noticed that you never really made use of the rsp register. Depending on the design of your compiler, you technically could get away with this. Stack machines like the JVM rely solely on pushes and pops, anyway, I believe. As long as your pushes and pops match (especially call and ret, which act as a special push and pop), you should be fine.

[0] Windows actually allocates at least an extra 32 bytes here for argument spilling, but you can probably ignore that in this case.

[1] There are specific calling conventions that dictate how parameters are passed from caller to callee and back. Beyond your programming exercise, I highly recommend reading about how they work, so that your compiler can output code that can easily be called by and easily call functions that weren't emitted by your compiler, or go the Forth way as Nate mentioned.

[2] goto [1]

How do I return to mainline code from a signal handler in assembler?

A simple ret will return so as to reattempt the faulting instruction. When using sigaction to register the signal handler with the flag SA_SIGINFO, the third argument is a pointer to a ucontext_t that contains the saved state, which may be altered.
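The registration step looks like this in C. This is a portable sketch, assuming a POSIX system; on_signal and handler_receives_context are hypothetical names. It uses raise(SIGUSR1) rather than a real fault, and it only checks that the handler is handed a context pointer. Editing the saved registers inside the ucontext_t is platform-specific and is deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t seen_signal;
static volatile sig_atomic_t context_present;

/* Three-argument handler form selected by SA_SIGINFO. The third argument
 * points at a ucontext_t holding the interrupted thread's saved state. */
static void on_signal(int signo, siginfo_t *info, void *context)
{
    (void)info;
    seen_signal = signo;
    context_present = (context != NULL);
}   /* a plain return resumes execution from the saved state */

/* Returns 1 if the handler ran for SIGUSR1 and was given a context,
 * 0 if not, and -1 if the handler could not be installed. */
int handler_receives_context(void)
{
    struct sigaction action, previous;
    memset(&action, 0, sizeof action);
    action.sa_sigaction = on_signal;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGUSR1, &action, &previous) != 0)
        return -1;

    seen_signal = 0;
    context_present = 0;
    raise(SIGUSR1);                       /* handler runs, then we continue here */
    sigaction(SIGUSR1, &previous, NULL);  /* restore the old disposition */

    return seen_signal == SIGUSR1 && context_present;
}
```

For a synchronous fault such as SIGSEGV, the saved instruction pointer refers to the faulting instruction, which is why returning without changing the context retries it.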

What is the first variables of my stack program?

You're targeting modern MacOS, hence ld will emit a dyld-assisted LC_MAIN load command for entry point handling.
The value at [rsp] is the return address into the epilogue of the libdyld _start function:

mov        edi, eax ; pass your process return code as 1st argument under System V 64bit ABI
call exit ;from libSystem

This means you don't need to exit your process through a system call like you do in:

; return (0)
mov rax, 0x2000001
mov rdi, 0x0


xor eax,eax

followed by a ret is enough (and that's what compilers will emit, by the way).

Your buffer will also get flushed in the ret / libdyld approach. That's irrelevant for the write system call you are making, but could matter for a printf, for instance.

Here's a great article that describes lots of details.

Capture input in assembly arm 64 bit mac os

First you need to move msg to a writeable segment:

msg: .ds 4 //memory buffer for keyboard input

.text // keep everything else in __TEXT
