Is it possible to use both 64-bit and 32-bit instructions in the same executable on 64-bit Linux?
Switching between long mode and compatibility mode is done by changing CS. User mode code cannot modify the descriptor table, but it can perform a far jump or far call to a code segment that is already present in the descriptor table. I think that in Linux (for example) the required compatibility mode descriptor is present.
Here is sample code for Linux (Ubuntu). Build with
$ gcc -no-pie switch_mode.c switch_cs.s
switch_mode.c:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>

extern bool switch_cs(int cs, bool (*f)());
extern bool check_mode();

int main(int argc, char **argv)
{
    int cs = 0x23;
    if (argc > 1)
        cs = strtoull(argv[1], 0, 16);

    printf("switch to CS=%02x\n", cs);

    bool r = switch_cs(cs, check_mode);
    if (r)
        printf("cs=%02x: 64-bit mode\n", cs);
    else
        printf("cs=%02x: 32-bit mode\n", cs);
    return 0;
}
switch_cs.s:
        .intel_syntax noprefix
        .code64
        .text
        .globl switch_cs
switch_cs:
        push rbx
        push rbp
        mov rbp, rsp
        sub rsp, 0x18
        mov rbx, rsp
        movq [rbx], offset .L1
        mov [rbx+4], edi
        // Before the lcall, switch to a stack below 4GB.
        // This assumes that the data segment is below 4GB.
        mov rsp, offset stack+0xf0
        lcall [rbx]
        // restore rsp to the original stack
        leave
        pop rbx
        ret

        .code32
.L1:
        call esi
        lret

        .code64
        .globl check_mode
// returns false for 32-bit mode; true for 64-bit mode
check_mode:
        xor eax, eax
        // In 32-bit mode, this instruction is executed as
        // dec eax; test eax, eax (the REX.W prefix 48 decodes as dec eax)
        test rax, rax
        setz al
        ret

        .data
        .align 16
stack:  .space 0x100
How does the 64-bit Linux kernel kick off a 32-bit process from an ELF?
If the execveat system call is used to start a new process, we first enter the SYSCALL_DEFINEx(execveat, ...) function in fs/exec.c in the kernel source.
This one then calls, in turn:
- do_execveat(..)
- do_execveat_common(..)
- exec_binprm(..)
- search_binary_handler(..)
The search_binary_handler iterates over the various binary handlers. In a 64 bit Linux kernel, there will be one handler for 64 bit ELFs and one for 32 bit ELFs. Both handlers are ultimately built from the same source fs/binfmt_elf.c. However, the 32 bit handler is built via fs/compat_binfmt_elf.c which redefines a number of macros before including the source file binfmt_elf.c itself.
Inside binfmt_elf.c, elf_check_arch is called. This is a macro defined in arch/x86/include/asm/elf.h and defined differently in the 64-bit handler vs the 32-bit handler. For 64-bit, it compares with EM_X86_64 (62, defined in include/uapi/linux/elf-em.h). For 32-bit, it compares with EM_386 (3) or EM_486 (6), defined in the same file. If the comparison fails, the binary handler gives up, so we end up with only one of the handlers taking care of the ELF parsing and execution - depending on whether the ELF is 64-bit or 32-bit.
All differences on parsing 32 bit ELFs vs 64 bit ELFs in 64 bit Linux should therefore be found in the file fs/compat_binfmt_elf.c.
The main clue seems to be compat_start_thread. start_thread is redefined to compat_start_thread. This function definition is found in arch/x86/kernel/process_64.c. compat_start_thread then calls start_thread_common with these arguments:
start_thread_common(regs, new_ip, new_sp,
test_thread_flag(TIF_X32)
? __USER_CS : __USER32_CS,
__USER_DS, __USER_DS);
while the normal start_thread function calls start_thread_common with these arguments:
start_thread_common(regs, new_ip, new_sp,
__USER_CS, __USER_DS, 0);
Here we already see the architecture dependent code doing something with CS differently for 64 bit ELFs vs 32 bit ELFs.
Then we have the definitions for __USER_CS and __USER32_CS in arch/x86/include/asm/segment.h:
#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8 + 3)
#define __USER32_CS (GDT_ENTRY_DEFAULT_USER32_CS*8 + 3)
and:
#define GDT_ENTRY_DEFAULT_USER_CS 6
#define GDT_ENTRY_DEFAULT_USER32_CS 4
So __USER_CS is 6*8 + 3 = 51 = 0x33, and __USER32_CS is 4*8 + 3 = 35 = 0x23.
These numbers match what is used for CS in these examples:
- For going from 64 bit mode to 32 bit in the middle of a process
- For going from 32 bit mode to 64 bit in the middle of a process
Since the CPU is not running in real mode, the segment register is not filled with the segment itself but with a 16-bit selector:
From Wikipedia (Protected mode):
In protected mode, the segment_part is replaced by a 16-bit selector, in which the 13 upper bits (bit 3 to bit 15) contain the index of an entry inside a descriptor table. The next bit (bit 2) specifies whether the operation is used with the GDT or the LDT. The lowest two bits (bit 1 and bit 0) of the selector are combined to define the privilege of the request, where the values of 0 and 3 represent the highest and the lowest privilege, respectively.
With the CS value 0x23, bits 1 and 0 are 3, meaning "lowest privilege". Bit 2 is 0, meaning GDT, and bits 3 to 15 hold 4, meaning we get entry 4 of the global descriptor table (GDT).
This is how far I have been able to dig so far.
Am I guaranteed to not encounter non-64-bit instructions if there are no compatibility mode switches in x86-64?
Every sequence of bytes of machine code either decodes as instructions or raises a #UD illegal-instruction exception. With the CPU in 64-bit mode, that means they're decoded as 64-bit mode instructions if they don't fault. See also Is x86 32-bit assembly code valid x86 64-bit assembly code? (no, not in general).
If it's a normal program emitted by a compiler, it's unlikely there are any illegal instructions in its machine code, unless someone used inline asm, or you used your program to disassemble a non-code section. Or it's an obfuscated program that puts partial instructions ahead of the actual jump target, so simple disassemblers get confused and decode with instruction boundaries different from how the code will actually run. x86 machine code is a byte stream that is not self-synchronizing.
TL:DR: in a normal program, yes, every sequence of bytes you encounter when disassembling is valid 64-bit-mode instructions.
66 and 67 prefixes do not switch modes; they merely override the operand size (or address size) for that one instruction. e.g. in the sequence 66 40 90, the 40 is still a REX prefix in 64-bit mode (applying to the NOP that follows). So it's just a nop (xchg ax,ax), not decoded as it would be in 32-bit mode, where 66 40 / 90 means inc ax / xchg eax,eax.
Try assembling and then disassembling db 0x66, 0x40, 0x90 with nasm -felf32 and then with nasm -felf64 to see how that same byte sequence decodes in 64-bit mode, not like it would in 32-bit mode.
Many instruction encodings are the same in both 32 and 64-bit mode, since they share the same default operand-size (for non-stack instructions). e.g. b8 39 30 00 00 (mov eax,0x3039) is the code for mov eax, 12345 in either 32 or 64-bit mode.
(When you say "64-bit instruction", I hope you don't mean 64-bit operand-size, because that's not the case. All operand-sizes from 8 to 64-bit are encodeable in 64-bit mode for most instructions.)
And yes, it's safe to assume that user-space programs don't switch modes by doing a far jmp. Unless you're on Windows, where the WOW64 DLLs do that for some reason instead of directly calling into the kernel. (On Linux, 32-bit user-space uses sysenter or other direct system-call mechanisms.)
Inline 64bit Assembly in 32bit GCC C Program
No, this isn't possible. You can't run 64-bit assembly from a 32-bit binary, as the processor will not be in long mode while running your program.
Copying 64-bit code to an executable page will result in that code being interpreted incorrectly as 32-bit code, which will have unpredictable and undesirable results.
Run 64 bit assembly code on a 32 bit operating system
The main thing you need to get right is the processor mode transitions. You need to do some basic work to transition from 32-bit (protected) mode into 64-bit mode (also called long mode). The biggest issue is setting up the descriptor table correctly. Some more info is here:
http://www.codeproject.com/Articles/45788/The-Real-Protected-Long-mode-assembly-tutorial-for
Hope this helps.
Can ptrace tell if an x86 system call used the 64-bit or 32-bit ABI?
Interesting, I hadn't realized that there wasn't an obvious smarter way that strace could use to correctly decode int 0x80 from 64-bit processes. (This is being worked on; see this answer for links to a proposed kernel patch to add PTRACE_GET_SYSCALL_INFO to the ptrace API. strace 4.26 already supports it on patched kernels.)
Update: strace now supports per-syscall detection. I don't know which mainline kernel version added the feature; I tested on Arch Linux with kernel version 5.5 and strace version 5.5.
e.g. this NASM source assembled into a static executable:
mov eax, 4
int 0x80
mov eax, 60
syscall
gives this trace when built and run with nasm -felf64 foo.asm && ld -o foo foo.o && strace ./foo:
execve("./foo", ["./foo"], 0x7ffcdc233180 /* 51 vars */) = 0
strace: [ Process PID=1262249 runs in 32 bit mode. ]
write(0, NULL, 0) = 0
strace: [ Process PID=1262249 runs in 64 bit mode. ]
exit(0) = ?
+++ exited with 0 +++
strace prints a message every time a system call uses a different ABI bitness than the previous one. Note that the "runs in 32 bit mode" message is completely wrong: the process is merely using the 32-bit ABI from 64-bit mode. "Mode" has a specific technical meaning for x86-64, and this is not it.
With older kernels
As a workaround, I think you could disassemble the code at RIP and check whether it was the syscall instruction (0F 05) or not, because ptrace does let you read the target process's memory.
But for a security use-case like disallowing some system calls, this would be vulnerable to a race condition: another thread in the traced process could rewrite the syscall bytes to int 0x80 after they execute, but before you can peek at them with ptrace.
You only need to do that if the process is running in 64-bit mode; otherwise only the 32-bit ABI is available, so there's nothing to check. (The vdso page can potentially use the 32-bit-mode syscall instruction on AMD CPUs that support it but not sysenter. Not checking in the first place for 32-bit processes avoids this corner case.) I think you're saying you have a reliable way to detect that, at least.
(I haven't used the ptrace API directly, just tools like strace that use it, so I hope this answer makes sense.)
Need Interpretation of 64bit Assembly Instruction As Opposed to 32bit
Probably easiest to just build 32-bit executables so you can follow the book more closely, with gcc -m32. Don't try to port a tutorial to another OS or ISA while you're learning from it; that rarely goes well. (e.g. different calling conventions, not just different sizes.)
And in GDB, use set disassembly-flavor intel to get GAS .intel_syntax noprefix disassembly like your book shows, instead of the default AT&T syntax. For objdump, use objdump -drwC -Mintel. See also How to remove "noise" from GCC/clang assembly output? for more about looking at GCC output.
(See https://stackoverflow.com/tags/att/info vs. https://stackoverflow.com/tags/intel_syntax/info.)
Both instructions are a dword store of an immediate 0, at an offset of -4 relative to where the frame pointer is pointing. (This is how it implements i=0, because you compiled with optimization disabled.)
How can I instruct the MSVC compiler to use a 64bit/32bit division instead of the slower 128bit/64bit division?
No current compilers (gcc/clang/ICC/MSVC) will do this optimization from portable ISO C source, even if you let them prove that b < a so the quotient will fit in 32 bits. (For example, with GNU C: if(b>=a) __builtin_unreachable(); on Godbolt.) This is a missed optimization; until it's fixed, you have to work around it with intrinsics or inline asm.
(Or use a GPU or SIMD instead; if you have the same divisor for many elements see https://libdivide.com/ for SIMD to compute a multiplicative inverse once and apply it repeatedly.)
_udiv64 is available starting in Visual Studio 2019 RTM.
In C mode (-TC) it's apparently always defined. In C++ mode, you need to #include <immintrin.h> (as per the Microsoft docs) or intrin.h.
https://godbolt.org/z/vVZ25L (Or on Godbolt.ms, because recent MSVC on the main Godbolt site is not working; see footnote 1.)
#include <stdint.h>
#include <immintrin.h>  // defines the prototype

// pre-condition: a > b, else 64/32-bit division overflows
uint32_t ScaledDiv(uint32_t a, uint32_t b)
{
    uint32_t remainder;
    uint64_t d = ((uint64_t) b) << 32;
    return _udiv64(d, a, &remainder);
}

int main() {
    uint32_t c = ScaledDiv(5, 4);
    return c;
}
_udiv64 produces a 64/32-bit div instruction. The shift left followed by shift right in the generated asm is a missed optimization.
;; MSVC 19.20 -O2 -TC
a$ = 8
b$ = 16
ScaledDiv PROC                          ; COMDAT
        mov     edx, edx
        shl     rdx, 32                 ; 00000020H
        mov     rax, rdx
        shr     rdx, 32                 ; 00000020H
        div     ecx
        ret     0
ScaledDiv ENDP

main PROC                               ; COMDAT
        xor     eax, eax
        mov     edx, 4
        mov     ecx, 5
        div     ecx
        ret     0
main ENDP
So we can see that MSVC doesn't do constant-propagation through _udiv64, even though in this case it doesn't overflow and it could have compiled main to just mov eax, 0ccccccccH / ret.
UPDATE #2: https://godbolt.org/z/n3Dyp- adds a solution for the Intel C++ Compiler, but it is less efficient and will defeat constant-propagation because it's inline asm.
#include <stdio.h>
#include <stdint.h>

__declspec(regcall, naked) uint32_t ScaledDiv(uint32_t a, uint32_t b)
{
    __asm mov edx, eax
    __asm xor eax, eax
    __asm div ecx
    __asm ret
    // implicit return of EAX is supported by MSVC, and hopefully ICC
    // even when inlining + optimizing
}

int main()
{
    uint32_t a = 3, b = 4, c = ScaledDiv(a, b);
    printf( "(%u << 32) / %u = %u\n", a, b, c);
    uint32_t d = ((uint64_t)a << 32) / b;
    printf( "(%u << 32) / %u = %u\n", a, b, d);
    return c != d;
}
Footnote 1: Matt Godbolt's main site's non-WINE MSVC compilers are temporarily(?) gone. Microsoft runs https://www.godbolt.ms/ to host the recent MSVC compilers on real Windows, and normally the main godbolt.org site relays to it for MSVC.
It seems godbolt.ms will generate short links, but not expand them again! Full links are better anyway for their resistance to link-rot.