sys_execve system call from Assembly
The execve system call is being called, but you are indeed passing it bad parameters. (You can see this by running your executable using strace.)
There are three problems:
1. .ascii does not 0-terminate the string. (You might get lucky, as there is nothing following it in your .data section in this example, but that's not guaranteed...) Add a 0, or use .asciz (or .string) instead.
2. movl file_to_run, %edi moves the value pointed to by the file_to_run symbol into %edi, i.e. the first 4 bytes of the string (0x6e69622f). The address of the string is just the value of the symbol itself, so you need to use the $ prefix for literal values: movl $file_to_run, %edi. Similarly, you need to say movl $file_to_run, %ebx a few lines further down. (This is a common source of confusion between AT&T syntax and Intel syntax!)
3. The parameters are placed on the stack in the wrong order: -0x8(%ebp) is a lower address than -0x4(%ebp). So the address of the command string should be written to -0x8(%ebp), the 0 should be written to -0x4(%ebp), and the leal instruction should be leal -8(%ebp), %ecx.
Fixed code:
.section .data
file_to_run:
.asciz "/bin/sh"
.section .text
.globl main
main:
pushl %ebp
movl %esp, %ebp
subl $0x8, %esp # array of two pointers. array[0] = file_to_run array[1] = 0
movl $file_to_run, %edi
movl %edi, -0x8(%ebp)
movl $0, -0x4(%ebp)
movl $11, %eax # sys_execve
movl $file_to_run, %ebx # file to execute
leal -8(%ebp), %ecx # command line parameters
movl $0, %edx # environment block
int $0x80
leave
ret
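For comparison, the same argument layout can be sketched in C. This is only an illustration and not part of the original answer: run_and_wait is a hypothetical helper name, a POSIX system is assumed, and an empty environment array is passed where the assembly passes NULL. The point is that argv is an array of pointers that ends with NULL and has argv[0] at the lowest address, which is what the two stack slots at -0x8(%ebp) and -0x4(%ebp) build by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` in a child process.
 * Returns the child's exit status, 127 if execve itself failed,
 * or -1 if fork/waitpid failed or the child did not exit normally. */
int run_and_wait(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };   /* empty environment (an assumption) */
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);    /* only returns on failure */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with an array such as { "/bin/sh", NULL } corresponds to the two-element array in the assembly: one pointer to the string, then a terminating 0.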
Execute system command in x64 bit assembly?
Huge thanks to @PeterCordes.
On the 64-bit architecture, the system call numbers are listed in unistd_64.h; for execve the number is 59. strace helped a lot. With a bit of debugging, I found that rdi must point to the path of the file to execute (/bin//ls) and rsi must point to the argument array (pointers to /bin//ls and ./, followed by a NULL pointer).
Complete working code is below:
SECTION .data
SECTION .text
global main
main:
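xor rax, rax                    ; rax = 0, pushed later as argv's NULL terminator
xor rdx, rdx                    ; rdx = 0: envp = NULL
push rdx                        ; NUL terminator for the path string
mov rcx, 0x736c2f2f6e69622f     ; "/bin//ls" (bytes are stored little-endian)
push rcx
mov rdi, rsp                    ; rdi -> "/bin//ls"
;push rdx
mov rcx, 0x2f2e                 ; "./" (upper bytes are zero, so it is NUL-terminated)
push rcx
mov rsi, rsp                    ; rsi -> "./"
push rax                        ; argv[2] = NULL
push rsi                        ; argv[1] -> "./"
push rdi                        ; argv[0] -> "/bin//ls"
mov rsi, rsp                    ; rsi -> argv
mov rax, 59                     ; sys_execve
syscall
mov rax, 60                     ; sys_exit, reached only if execve fails
syscall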
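The 64-bit immediates in the code above are easier to check with a small C sketch. It is not from the original answer, and imm64_to_string is a hypothetical helper name. It writes the bytes of a value lowest byte first, which is the order in which push stores a register on x86-64, so the result is the string the kernel actually sees.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store the 8 bytes of `imm` in little-endian order
 * (lowest byte first), the way `push rcx` lays them out in memory,
 * then add a terminating NUL so the result can be read as a C string. */
void imm64_to_string(uint64_t imm, char out[9])
{
    int i;

    assert(out != 0);
    for (i = 0; i < 8; i++)
        out[i] = (char)((imm >> (8 * i)) & 0xff);
    out[8] = '\0';
}
```

Passing 0x736c2f2f6e69622f gives the path string, and passing 0x2f2e gives the short argument followed by zero bytes, which is why no separate terminator push is needed for it.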
Linux Sys_execve wont run in assembly
You're getting errno=EFAULT (0xfffffff2 = -14, 14 = EFAULT), indicating that you're passing a bad address to the syscall.
SYS_execve takes 3 arguments, but the second and third are NULL-terminated arrays of pointers to argument/environment strings, not a single string of nul-separated components. When the string is interpreted as an array of pointers, its first 4 bytes are taken as the address of the first string, but that is not a valid address, hence EFAULT.
SYSCALL_DEFINE3(execve,
const char __user *, filename,
const char __user *const __user *, argv,
const char __user *const __user *, envp)
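The arithmetic behind "0xfffffff2 = -14" can be sketched in C as follows. This is an illustration rather than kernel code, and raw_ret_to_errno is a hypothetical name. It relies on the Linux convention that a raw system call reports failure by returning a value in the range -4095 to -1.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: given the raw 32-bit value left in eax after a
 * system call, return the errno code it encodes, or 0 if the value is
 * not in the error range [-4095, -1]. */
int raw_ret_to_errno(uint32_t raw)
{
    if (raw >= 0xfffff001u)          /* (uint32_t)-4095 .. (uint32_t)-1 */
        return (int)(0u - raw);      /* two's-complement negation */
    return 0;
}
```

For the value in the question this yields 14, which <errno.h> on Linux names EFAULT.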
Confusion with system call
Try to use objdump -DS or objdump -sS to include the address 0x80e99f0 in your dump.
Local example:
0806bf70 <__execve>:
...
806bf82: ff 15 10 a3 0e 08 call *0x80ea310
At address 0x80ea310 (shown with objdump -sS):
80ea310 10ea0608 60a60908 07000000 7f030000
10ea0608 is the address 0x806ea10 stored little-endian in memory.
You will then see that _dl_sysinfo_int80 is located at that address:
0806ea10 <_dl_sysinfo_int80>:
806ea10: cd 80 int $0x80
806ea12: c3 ret
which raises the software interrupt 0x80 (executing the syscall) and then returns to the caller.
call *0x80ea310 is therefore really calling 0x806ea10 (it dereferences a pointer).
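Reading the pointer out of the hex dump can be reproduced with a few lines of C. This is a sketch with a hypothetical helper name, load_le32. It combines four bytes in the order they appear in memory into the 32-bit value an x86 CPU loads from them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine four bytes, in memory order, into the
 * 32-bit little-endian value they represent. */
uint32_t load_le32(const uint8_t bytes[4])
{
    assert(bytes != 0);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Applied to the first four bytes of the dump line (10 ea 06 08), it gives the call target 0x0806ea10.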
Why do these syscalls do nothing?
A Linux syscall reads the syscall number from the EAX register under x86. By moving C9 into AL, you're only setting the bottom 8 bits of EAX, leaving garbage in the rest. That's why strace calls what you did "syscall_4294957257": it's FFFFD8C9 in hex (note that it ends with C9). It's also why you see errno 38, which translates to ENOSYS, "function not implemented", or in other words, "I don't know what you're asking me to do".
The solution is to clear eax before setting al, like so:
31 c0 xor eax,eax
b0 c9 mov al,0xc9
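The effect of a partial register write can be modelled in C. This is a sketch only: mov_al and xor_then_mov_al are hypothetical names, and the stale upper bits used in the usage note (0xffffd8..) are simply chosen to match the number strace printed.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `mov al, imm8`: replace the low 8 bits of eax and keep
 * whatever was in the upper 24 bits. */
uint32_t mov_al(uint32_t eax, uint8_t imm8)
{
    return (eax & 0xffffff00u) | imm8;
}

/* Model of `xor eax, eax` followed by `mov al, imm8`. */
uint32_t xor_then_mov_al(uint8_t imm8)
{
    return mov_al(0u, imm8);
}
```

With stale upper bits of 0xffffd800, mov_al produces 4294957257, the number strace reported; with the register cleared first, the result is just 0xc9.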
How to invoke a system call via syscall or sysenter in inline assembly?
First of all, you can't safely use GNU C Basic asm(""); syntax for this (without input/output/clobber constraints). You need Extended asm to tell the compiler about registers you modify. See the inline asm in the GNU C manual and the inline-assembly tag wiki for links to other guides for details on what things like "D"(1) mean as part of an asm() statement.
You also need asm volatile because that's not implicit for Extended asm statements with 1 or more output operands.
I'm going to show you how to execute system calls by writing a program that writes Hello World! to standard output by using the write() system call. Here's the source of the program without an implementation of the actual system call:
#include <sys/types.h>
ssize_t my_write(int fd, const void *buf, size_t size);
int main(void)
{
const char hello[] = "Hello world!\n";
my_write(1, hello, sizeof(hello) - 1); // don't write the terminating NUL
return 0;
}
You can see that I named my custom system call function as my_write in order to avoid name clashes with the "normal" write, provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.
i386
System calls in i386 Linux are implemented using the 128th interrupt vector, i.e. by calling int 0x80 in your assembly code, having set the parameters accordingly beforehand, of course. It is possible to do the same via SYSENTER, but actually executing this instruction is achieved by the VDSO virtually mapped to each running process. Since SYSENTER was never meant as a direct replacement of the int 0x80 API, it's never directly executed by userland applications - instead, when an application needs to access some kernel code, it calls the virtually mapped routine in the VDSO (that's what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There's quite a lot of it because of how the instruction actually works.
If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO. See also The Definitive Guide to (x86) Linux System Calls - some system calls like getpid and clock_gettime are so simple the kernel can export code + data that runs in user-space so the VDSO never needs to enter the kernel, making it much faster even than sysenter could be.
It's much easier to use the slower int $0x80 to invoke the 32-bit ABI.
// i386 Linux
#include <asm/unistd.h> // compile with -m32 for 32 bit call numbers
//#define __NR_write 4
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"int $0x80"
: "=a" (ret)
: "0"(__NR_write), "b"(fd), "c"(buf), "d"(size)
: "memory" // the kernel dereferences pointer args
);
return ret;
}
As you can see, using the int 0x80 API is relatively simple. The number of the syscall goes to the eax register, while the parameters needed for the syscall go into ebx, ecx, edx, esi, edi, and ebp, in that order. System call numbers can be obtained by reading the file /usr/include/asm/unistd_32.h.
Prototypes and descriptions of the functions are available in the 2nd section of the manual, so in this case write(2).
The kernel saves/restores all the registers (except EAX) so we can use them as input-only operands to the inline asm. See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64?
Keep in mind that the clobber list also contains the memory parameter, which means that the instruction listed in the instruction list references memory (via the buf parameter). (A pointer input to inline asm does not imply that the pointed-to memory is also an input. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
amd64
Things look different on the AMD64 architecture, which sports a new instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications - it really resembles a normal CALL, actually, and adapting the old int 0x80 to the new SYSCALL is pretty much trivial. (Except it uses RCX and R11 instead of the kernel stack to save the user-space RIP and RFLAGS so the kernel knows where to return.)
In this case, the number of the system call is still passed in the register rax, but the registers used to hold the arguments now nearly match the function calling convention: rdi, rsi, rdx, r10, r8 and r9 in that order. (syscall itself destroys rcx so r10 is used instead of rcx, letting libc wrapper functions just use mov r10, rcx / syscall.)
// x86-64 Linux
#include <asm/unistd.h> // compile without -m32 for 64 bit call numbers
// #define __NR_write 1
ssize_t my_write(int fd, const void *buf, size_t size)
{
ssize_t ret;
asm volatile
(
"syscall"
: "=a" (ret)
// EDI RSI RDX
: "0"(__NR_write), "D"(fd), "S"(buf), "d"(size)
: "rcx", "r11", "memory"
);
return ret;
}
Do notice how practically the only thing that needed changing were the register names, and the actual instruction used for making the call. This is mostly thanks to the input/output lists provided by gcc's extended inline assembly syntax, which automagically provides appropriate move instructions needed for executing the instruction list.
The "0"(callnum) matching constraint could be written as "a" because operand 0 (the "=a"(ret) output) only has one register to pick from; we know it will pick EAX. Use whichever you find more clear.
Note that non-Linux OSes, like MacOS, use different call numbers, and even different arg-passing conventions for 32-bit.
Create an arg array for execve on the stack
You can put the argv array onto the stack and load the address of it into rsi. The first member of argv is a pointer to the program name, so we can use the same address that we load into rdi.
xor edx, edx ; Load NULL to be used both as the third
; parameter to execve as well as
; to push 0 onto the stack later.
push "-aal" ; Put second argument string onto the stack.
mov rax, rsp ; Load the address of the second argument.
mov rcx, "/bin//ls" ; Load the file name string
push rdx ; and place a null character
push rcx ; and the string onto the stack.
mov rdi, rsp ; Load the address of "/bin//ls". This is
; used as both the first member of argv
; and as the first parameter to execve.
; Now create argv.
push rdx ; argv must be terminated by a NULL pointer.
push rax ; Second arg is a pointer to "-aal".
push rdi ; First arg is a pointer to "/bin//ls"
mov rsi, rsp ; Load the address of argv into the second
; parameter to execve.
This also corrects a couple of other problems with the code in the question. It uses an 8-byte push for the file name, since x86-64 doesn't support 4-byte push, and it makes sure that the file name has a null terminator.
This code does use a 64-bit push with a 4-byte immediate to push "-aal" since the string fits in 4 bytes. This also makes it null terminated without needing a null byte in the code.
I used strings with doubled characters as they are in the question to avoid null bytes in the code, but my preference would be this:
mov ecx, "X-al" ; Load second argument string,
shr ecx, 8 ; shift out the dummy character,
push rcx ; and write the string to the stack.
mov rax, rsp ; Load the address of the second argument.
mov rcx, "X/bin/ls" ; Load file name string,
shr rcx, 8 ; shift out the dummy character,
push rcx ; and write the string onto the stack.
Note that the file name string gets a null terminator via the shift, avoiding the extra push. This pattern works with strings where a doubled character wouldn't work, and it can be used with shorter strings, too.
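The shift trick can be checked with a small C model. It is a sketch, not part of the original answer; shift_out_dummy is a hypothetical name, and the 8-byte input stands for the quoted string that NASM packs into the register with its first character in the lowest byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:  mov rcx, "X/bin/ls" / shr rcx, 8 / push rcx
 * `padded` holds the 8 bytes of the quoted string; `out` receives the
 * 8 bytes that end up on the stack. */
void shift_out_dummy(const char padded[8], char out[8])
{
    uint64_t reg = 0;
    int i;

    assert(padded != 0 && out != 0);
    for (i = 0; i < 8; i++)          /* mov: first character in the lowest byte */
        reg |= (uint64_t)(unsigned char)padded[i] << (8 * i);
    reg >>= 8;                       /* shr: drop the dummy, zero-fill the top byte */
    for (i = 0; i < 8; i++)          /* push: bytes stored lowest first */
        out[i] = (char)((reg >> (8 * i)) & 0xff);
}
```

The dummy character falls off the low end and a zero byte enters at the high end, so the bytes written to the stack are the string followed by its terminator.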