How to Read a File Using Shellcode Without Explicitly Mentioning Syscall (0X0F05) with Write Permissions Disabled

How can I make a system call in shellcode without using `syscall` or `sysenter` on Linux x86-64, avoiding 0x0F bytes?

After some work on this, I found one solution.
The idea was to construct shellcode that can modify itself: it rewrites some bytes of its own machine code before they execute.

So what I did was load rip into a register and place some placeholder bytes after the code. Then I changed those bytes to \x0f\x05, and in this way I finally executed my shellcode. I could have used a RIP-relative store instead of a RIP-relative LEA, after getting the desired bytes into a register (with mov + xor or shift, or various other ways).
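The patching step can be illustrated without executing anything. The following C sketch is only a model of the idea, not the original shellcode: it assumes the two syscall bytes are stored incremented by one (0x10 0x06), checks that the stored image contains no 0x0f byte, and then decrements them in place, which is roughly what a self-modifying stub would do at run time. The helper names and the +1 encoding are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: each hidden byte is stored as (byte + 1). */
enum { ENCODE_DELTA = 1 };

/* Return 1 if buf contains the byte `bad`, else 0. */
int contains_byte(const uint8_t *buf, size_t len, uint8_t bad)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == bad)
            return 1;
    }
    return 0;
}

/* Decode `count` bytes starting at buf[offset] in place, as a
 * self-modifying stub would do just before reaching them. */
void patch_in_place(uint8_t *buf, size_t len, size_t offset, size_t count)
{
    assert(offset <= len && count <= len - offset);
    for (size_t i = 0; i < count; i++)
        buf[offset + i] = (uint8_t)(buf[offset + i] - ENCODE_DELTA);
}
```

For an image that ends in 0x10 0x06, contains_byte(image, len, 0x0f) is 0 before patching, and after patch_in_place(image, len, len - 2, 2) the image ends in 0x0f 0x05. In real shellcode the page holding those bytes must be writable for this to work.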

Linux Shellcode Hello, World!

When you inject this shellcode, you don't know what is at message:

mov ecx, message

in the injected process, it can be anything but it will not be "Hello world!\r\n" since it is in the data section while you are dumping only the text section. You can see that your shellcode doesn't have "Hello world!\r\n":

"\xb8\x04\x00\x00\x00"
"\xbb\x01\x00\x00\x00"
"\xb9\x00\x00\x00\x00"
"\xba\x0f\x00\x00\x00"
"\xcd\x80\xb8\x01\x00"
"\x00\x00\xbb\x00\x00"
"\x00\x00\xcd\x80";

This is a common problem in shellcode development. The usual workaround is the jmp/call/pop trick:

global _start

section .text

_start:
    jmp MESSAGE      ; 1) let's jump to MESSAGE

GOBACK:
    mov eax, 0x4
    mov ebx, 0x1
    pop ecx          ; 3) we are popping into `ecx`, so now we have the
                     ; address of "Hello, World!\r\n"
    mov edx, 0xF
    int 0x80

    mov eax, 0x1
    mov ebx, 0x0
    int 0x80

MESSAGE:
    call GOBACK      ; 2) we are going back; since we used `call`, the
                     ; return address, which in this case is the address
                     ; of "Hello, World!\r\n", is pushed onto the stack.
    db "Hello, World!", 0dh, 0ah

section .data

Now dump the text section:

$ nasm -f elf shellcode.asm
$ ld shellcode.o -o shellcode
$ ./shellcode
Hello, World!
$ objdump -d shellcode

shellcode: file format elf32-i386

Disassembly of section .text:

08048060 <_start>:
8048060: e9 1e 00 00 00 jmp 8048083 <MESSAGE>

08048065 <GOBACK>:
8048065: b8 04 00 00 00 mov $0x4,%eax
804806a: bb 01 00 00 00 mov $0x1,%ebx
804806f: 59 pop %ecx
8048070: ba 0f 00 00 00 mov $0xf,%edx
8048075: cd 80 int $0x80
8048077: b8 01 00 00 00 mov $0x1,%eax
804807c: bb 00 00 00 00 mov $0x0,%ebx
8048081: cd 80 int $0x80

08048083 <MESSAGE>:
8048083: e8 dd ff ff ff call 8048065 <GOBACK>
8048088: 48 dec %eax <-+
8048089: 65 gs |
804808a: 6c insb (%dx),%es:(%edi) |
804808b: 6c insb (%dx),%es:(%edi) |
804808c: 6f outsl %ds:(%esi),(%dx) |
804808d: 2c 20 sub $0x20,%al |
804808f: 57 push %edi |
8048090: 6f outsl %ds:(%esi),(%dx) |
8048091: 72 6c jb 80480ff <MESSAGE+0x7c> |
8048093: 64 fs |
8048094: 21 .byte 0x21 |
8048095: 0d .byte 0xd |
8048096: 0a .byte 0xa <-+

$

The lines I marked are our "Hello, World!\r\n" string:

$ printf "\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21\x0d\x0a"
Hello, World!

$
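The two relative displacements in the dump (e9 1e 00 00 00 and e8 dd ff ff ff) can be checked by hand: a rel32 jmp or call encodes the distance from the end of the instruction to the target. A minimal C sketch of that arithmetic follows; the function names are illustrative and not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of `jmp rel32` (0xE9) and `call rel32` (0xE8): opcode + 4 bytes. */
enum { REL32_INSN_LEN = 5 };

/* Displacement encoded in a 5-byte jmp/call located at insn_addr that
 * transfers control to target. It is relative to the next instruction. */
int32_t rel32(uint32_t insn_addr, uint32_t target)
{
    return (int32_t)(target - (insn_addr + REL32_INSN_LEN));
}

/* Store a 32-bit value little-endian, as x86 encodes immediates. */
void put_le32(uint8_t out[4], uint32_t value)
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}
```

With the addresses from the dump, rel32(0x8048060, 0x8048083) is 0x1e for the jmp, and rel32(0x8048083, 0x8048065) is -0x23 for the call, which put_le32 stores as dd ff ff ff. Because both displacements are relative, the bytes stay valid wherever the shellcode is loaded.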

So our C wrapper will be:

char code[] = 

"\xe9\x1e\x00\x00\x00" // jmp (relative) <MESSAGE>
"\xb8\x04\x00\x00\x00" // mov $0x4,%eax
"\xbb\x01\x00\x00\x00" // mov $0x1,%ebx
"\x59" // pop %ecx
"\xba\x0f\x00\x00\x00" // mov $0xf,%edx
"\xcd\x80" // int $0x80
"\xb8\x01\x00\x00\x00" // mov $0x1,%eax
"\xbb\x00\x00\x00\x00" // mov $0x0,%ebx
"\xcd\x80" // int $0x80
"\xe8\xdd\xff\xff\xff" // call (relative) <GOBACK>
"Hello, World!\r\n"; // OR "\x48\x65\x6c\x6c\x6f\x2c\x20\x57"
// "\x6f\x72\x6c\x64\x21\x0d\x0a"

int main(int argc, char **argv)
{
    (*(void(*)())code)();

    return 0;
}

Let's test it, using -z execstack to enable read-implies-exec (process-wide, despite "stack" in the name) so we can execute code in the .data or .rodata sections:

$ gcc -m32 test.c -z execstack -o test
$ ./test
Hello, World!

It works. (-m32 is necessary, too, on 64-bit systems. The int $0x80 32-bit ABI doesn't work with 64-bit addresses like .rodata in a PIE executable. Also, the machine code was assembled for 32-bit. It happens that the same sequence of bytes would decode to equivalent instructions in 64-bit mode but that's not always the case.)

Modern GNU ld puts .rodata in a separate segment from .text, so it can be non-executable. It used to be sufficient to use const char code[] to put executable code in a page of read-only data. At least for shellcode that doesn't want to modify itself.

Why does the amount of NOPs seem to impact whether shellcode is executed successfully?

Jester's guess was correct: in my second example, the shellcode's push operations overwrite the instructions at the far end of the shellcode.

Checking the current instruction after receiving the SIGILL (with set disassemble-next-line on in gdb, and repeating the second example) yields

Program received signal SIGILL, Illegal instruction.
0x00007fffffffd8ea in ?? ()
=> 0x00007fffffffd8ea: ff (bad)

The NOP (90) which was at this address previously has been overwritten by ff.

How does this happen? Repeat the second example again and additionally set a breakpoint with b 8. At this point, the buffer has not been overflowed yet.

(gdb) info frame 0
[...]
Saved registers:
rbp at 0x7fffffffd8f0, rip at 0x7fffffffd8f8

The bytes starting at 0x7fffffffd8f8 hold the address that execution returns to after leaving the function function. Once ret has popped that address, the stack pointer is 8 bytes higher, at 0x7fffffffd900, so 0x7fffffffd8f8 is also where the stack starts growing again (the first 8 bytes pushed are stored there). Indeed, continuing in gdb with the si command shows that just before the first push instruction of the shellcode the stack pointer points to 0x7fffffffd900:

(gdb) si
0x00007fffffffd8da in ?? ()
=> 0x00007fffffffd8da: 53 push %rbx
(gdb) x/8x $sp
0x7fffffffd900: 0xf8 0xd9 0xff 0xff 0xff 0x7f 0x00 0x00

... and when the push instruction is executed the bytes are stored at address 0x7fffffffd8f8:

(gdb) si
0x00007fffffffd8db in ?? ()
=> 0x00007fffffffd8db: 48 89 e7 mov %rsp,%rdi
(gdb) x/8bx $sp
0x7fffffffd8f8: 0x2f 0x62 0x69 0x6e 0x2f 0x73 0x68 0x00

Continuing like this, one can see that the last push instruction of the shellcode pushes the content of rdi onto the stack at address 0x7fffffffd8e8:

0x00007fffffffd8e3 in ?? ()
=> 0x00007fffffffd8e3: 57 push %rdi
0x00007fffffffd8e4 in ?? ()
=> 0x00007fffffffd8e4: 48 89 e6 mov %rsp,%rsi
(gdb) x/8bx $sp
0x7fffffffd8e8: 0xf8 0xd8 0xff 0xff 0xff 0x7f 0x00 0x00

However, this is also where the last byte of the syscall instruction is stored (see the x/80bx buff output in the question for the second example). The push overwrites it, so the syscall, and thus the shellcode, cannot execute successfully. This doesn't happen in the first example, because there the bytes pushed onto the stack extend exactly up to the end of the shellcode without overwriting a byte of it: 8 bytes for the 8 NOPs ("\x90"x8) + 8 bytes for the saved base pointer + 8 bytes for the return address provide enough space for the 3 push operations.
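The overlap can be reproduced with plain address arithmetic. This C sketch assumes 8-byte pushes and shellcode that sits directly below the stack pointer, as in the gdb session above; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { PUSH_SIZE = 8 };  /* bytes written by one 64-bit push */

/* Lowest address written after n pushes starting from stack pointer rsp. */
uint64_t lowest_pushed_addr(uint64_t rsp, unsigned n_pushes)
{
    assert(rsp >= (uint64_t)n_pushes * PUSH_SIZE);
    return rsp - (uint64_t)n_pushes * PUSH_SIZE;
}

/* 1 if the pushes overwrite any byte of code whose last byte is at
 * code_last (the code is assumed to lie below rsp), else 0. */
int pushes_clobber_code(uint64_t rsp, unsigned n_pushes, uint64_t code_last)
{
    return lowest_pushed_addr(rsp, n_pushes) <= code_last;
}
```

With rsp = 0x7fffffffd900 and three pushes, the lowest written address is 0x7fffffffd8e8, which is the address of the last syscall byte in the second example, so the code is clobbered. If the shellcode ends 24 bytes or more below rsp, as in the first example, the same three pushes leave it intact.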

How do I compile C without anything but my code in the binary?

In short:

  1. strip -s does not remove the sections but only overwrites them with 0 (and thus the file size remains the same).
  2. There are a lot of program headers that we do not need in this case (they exist to handle exceptions etc.).
  3. There is a default page alignment (0x1000) in the binary, which puts the start of .text at file offset 0x1000 or later (and we do not need it).

Detailed

First, we can improve it slightly if we compile the binary statically:

$ gcc -nostdlib -static nolib.c -o static_output
$ strip -s static_output # strip -s in order to strip all (not helping here)
$ ls -lh static_output
-rwxrwxrwx 1 graul graul 8.7K Jan 17 22:59 static_output

Let's look at our ELF header now:
$ readelf -h static_output
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401000
Start of program headers: 64 (bytes into file)
Start of section headers: 8368 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 7
Size of section headers: 64 (bytes)
Number of section headers: 7
Section header string table index: 6

It looks like there are more than 8 KB before the start of the section headers!
Let's look at what this is made of:

$ readelf -e static_output
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401000
Start of program headers: 64 (bytes into file)
Start of section headers: 8368 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 7
Size of section headers: 64 (bytes)
Number of section headers: 7
Section header string table index: 6

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .note.gnu.propert NOTE 00000000004001c8 000001c8
0000000000000020 0000000000000000 A 0 0 8
[ 2] .note.gnu.build-i NOTE 00000000004001e8 000001e8
0000000000000024 0000000000000000 A 0 0 4
[ 3] .text PROGBITS 0000000000401000 00001000
000000000000001b 0000000000000000 AX 0 0 1
[ 4] .eh_frame PROGBITS 0000000000402000 00002000
0000000000000038 0000000000000000 A 0 0 8
[ 5] .comment PROGBITS 0000000000000000 00002038
000000000000002a 0000000000000001 MS 0 0 1
[ 6] .shstrtab STRTAB 0000000000000000 00002062
000000000000004a 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000020c 0x000000000000020c R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x000000000000001b 0x000000000000001b R E 0x1000
LOAD 0x0000000000002000 0x0000000000402000 0x0000000000402000
0x0000000000000038 0x0000000000000038 R 0x1000
NOTE 0x00000000000001c8 0x00000000004001c8 0x00000000004001c8
0x0000000000000020 0x0000000000000020 R 0x8
NOTE 0x00000000000001e8 0x00000000004001e8 0x00000000004001e8
0x0000000000000024 0x0000000000000024 R 0x4
GNU_PROPERTY 0x00000000000001c8 0x00000000004001c8 0x00000000004001c8
0x0000000000000020 0x0000000000000020 R 0x8
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10

Section to Segment mapping:
Segment Sections...
00 .note.gnu.property .note.gnu.build-id
01 .text
02 .eh_frame
03 .note.gnu.property
04 .note.gnu.build-id
05 .note.gnu.property
06
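The numbers in this dump show where the size comes from: the contents are tiny, but each LOAD segment starts on a 0x1000 boundary. The C sketch below redoes that arithmetic with values copied from the readelf output. It assumes the section header table is the last thing in the file and is 8-byte aligned, which fits this dump but is not guaranteed in general; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
uint64_t align_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* File size when the section header table is the last thing in the file. */
uint64_t elf_file_size(uint64_t e_shoff, uint64_t e_shnum, uint64_t e_shentsize)
{
    return e_shoff + e_shnum * e_shentsize;
}
```

The first LOAD segment needs only 0x20c bytes, but align_up(0x20c, 0x1000) is 0x1000, the file offset of .text. The 0x1b bytes of .text are then padded to 0x2000, where .eh_frame starts. .shstrtab ends at 0x2062 + 0x4a = 0x20ac, and align_up(0x20ac, 8) is 0x20b0 = 8368, the reported start of the section headers. elf_file_size(8368, 7, 64) gives 8816 bytes, consistent with the 8.7K shown by ls -lh.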

This is weird, since we ran strip, which was supposed to remove these sections from our ELF. If we look at the answer in https://unix.stackexchange.com/questions/267070/why-doesnt-strip-remove-section-headers-from-elf-executables
we can see that, though it is not mentioned specifically, strip does not remove these parts from our binary but only removes their content (which does not help in our case).

We can use strip -R to remove these sections completely. The biggest one here is the ".eh_frame" section (which is not needed in our case; see "Why GCC compiled C program needs .eh_frame section?" for details).

$ strip -R .eh_frame static_output
$ ls -lh static_output
-rwxrwxrwx 1 graul graul 4.6K Jan 17 23:22 static_output*

Just to be clear, there is no reason to not strip the rest of the unwanted sections as well:

$ strip -R .eh_frame -R .note.gnu.property -R .note.gnu.build-id static_output
$ ls -lh static_output
-rwxrwxrwx 1 graul graul 4.4K Jan 17 23:31 static_output

Half the size! But still not good enough. It looks like there is a big program header we need to remove.

It looks like gcc inserts these sections without our asking:

$ gcc -c -nostdlib -static nolib.c -o nolib.o
$ ls -l nolib.o
-rwxrwxrwx 1 graul graul 1376 Jan 17 23:40 nolib.o
$ strip -R .data -R .bss -R .comment -R .note.GNU-stack -R .note.GNU-stack -R .note.gnu.propery -R .eh_frame -R .real.eh_frame -R .symtab -R.strtab -R.shstrtab nolib.o
$ ls -l nolib.o
-rwxrwxrwx 1 graul graul 424 Jan 17 23:41 nolib.o

But this is only an object file, not an executable yet. If we now link it:

$ ld nolib.o -o ld_output
$ ls -l ld_output
-rwxrwxrwx 1 graul graul 4760 Jan 17 23:55 ld_output

ld has a flag, -n, that turns off page alignment of sections (that padding accounts for almost all of our file size).

$ ld -n -static nolib.o -o ld_output
$ ls -l ld_output
-rwxrwxrwx 1 graul graul 928 Jan 17 23:57 ld_output
$ strip -R .note.gnu.property ld_output
$ ls -l ld_output
-rwxrwxrwx 1 graul graul 472 Jan 17 23:58 ld_output

This is a drastic improvement (though of course a lot more work could be done).

Why do x86-64 Linux system calls modify RCX, and what does the value mean?

The system call return value is in rax, as always. See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64.

Note that sys_brk has a slightly different interface than the brk / sbrk POSIX functions; see the C library/kernel differences section of the Linux brk(2) man page. Specifically, Linux sys_brk sets the program break; the arg and return value are both pointers. See Assembly x86 brk() call use. That answer needs upvotes because it's the only good one on that question.


The other interesting part of your question is:

I do not quite understand the value in the rcx register in this case

You're seeing the mechanics of how the syscall / sysret instructions are designed to allow the kernel to resume user-space execution but still be fast.

syscall doesn't do any loads or stores, it only modifies registers. Instead of using special registers to save a return address, it simply uses regular integer registers.

It's not a coincidence that RCX=RIP and R11=RFLAGS after the kernel returns to your user-space code. The only way for this not to be the case is if a ptrace system call modified the process's saved rcx or r11 value while it was inside the kernel. (ptrace is the system call gdb uses). In that case, Linux would use iret instead of sysret to return to user space, because the slower general-case iret can do that. (See What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for some walk-through of Linux's system-call entry points. Mostly the entry points from 32-bit processes, not from syscall in a 64-bit process, though.)


Instead of pushing a return address onto the kernel stack (like int 0x80 does), syscall:

  • sets RCX=RIP, R11=RFLAGS (so it's impossible for the kernel to even see the original values of those regs before you executed syscall).

  • masks RFLAGS with a pre-configured mask from a config register (the IA32_FMASK MSR). This lets the kernel disable interrupts (IF) until it's done swapgs and setting rsp to point to the kernel stack. Even with cli as the first instruction at the entry point, there'd be a window of vulnerability. You also get cld for free by masking off DF so rep movs / stos go upward even if user-space had used std.

    Fun fact: AMD's first proposed syscall / swapgs design didn't mask RFLAGS, but they changed it after feedback from kernel developers on the amd64 mailing list (in ~2000, a couple years before the first silicon).

  • jumps to the configured syscall entry point (setting RIP = IA32_LSTAR, with the new CS selector taken from IA32_STAR). The old CS value isn't saved anywhere, I think.

  • It doesn't do anything else. The kernel has to use swapgs to get access to an info block where it saved the kernel stack pointer, because rsp still has its value from user-space.

So the design of syscall requires a system-call ABI that clobbers registers, and that's why the values are what they are.
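The register effects listed above can be summarized in a toy model. This C sketch is not kernel or CPU code. It only mirrors the bullet points, with lstar and fmask standing in for the IA32_LSTAR and IA32_FMASK values and a hypothetical struct cpu holding the four registers involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the registers touched by the syscall instruction. */
struct cpu {
    uint64_t rip;     /* address of the syscall instruction itself */
    uint64_t rflags;
    uint64_t rcx;
    uint64_t r11;
};

enum { SYSCALL_INSN_LEN = 2 };            /* 0x0f 0x05 */
#define RFLAGS_IF (UINT64_C(1) << 9)      /* interrupt enable flag */
#define RFLAGS_DF (UINT64_C(1) << 10)     /* direction flag */

/* Apply the register effects described in the list above. */
void model_syscall(struct cpu *c, uint64_t lstar, uint64_t fmask)
{
    assert(c != NULL);
    c->rcx = c->rip + SYSCALL_INSN_LEN;   /* user-space return address */
    c->r11 = c->rflags;                   /* user-space flags, unmasked */
    c->rflags &= ~fmask;                  /* e.g. clears IF and DF */
    c->rip = lstar;                       /* kernel entry point */
}
```

After model_syscall runs, rcx holds the address of the instruction following syscall and r11 holds the original flags, which is why user space sees RCX=RIP and R11=RFLAGS once the kernel returns with sysret. Nothing in the model touches rsp, matching the point that the kernel has to find its own stack.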


