.text section address range of position independent executable
Is there any other way to get the .text section address range during runtime (from the running program)?
Yes: you need to use dl_iterate_phdr and info->dlpi_addr to locate the PIE binary in memory at runtime. The very first call to your callback will be for the main executable.
How does the Linux kernel determine ld.so's load address?
Read more about ELF, in particular elf(5), and about the execve(2) syscall.
An ELF file may contain an interpreter. elf(5) mentions:
PT_INTERP
The array element specifies the location and
size of a null-terminated pathname to invoke
as an interpreter. This segment type is
meaningful only for executable files (though
it may occur for shared objects). However it
may not occur more than once in a file. If
it is present, it must precede any loadable
segment entry.
In practice that interpreter is almost always ld-linux(8) (e.g. with GNU glibc); more precisely, on my Debian/Sid it is /lib64/ld-linux-x86-64.so.2. If you compile musl-libc and then build some software with it, you'll get a different interpreter, /lib/ld-musl-x86_64.so.1. That ELF interpreter is the dynamic linker.
The execve(2) syscall uses that interpreter:
If the executable is a dynamically linked ELF executable, the
interpreter named in the PT_INTERP
segment is used to load the needed
shared libraries. This interpreter is typically /lib/ld-linux.so.2
for binaries linked with glibc.
See also Levine's book Linkers and Loaders, and Drepper's paper How To Write Shared Libraries.
Notice that execve also handles the shebang (i.e. a first line starting with #!); see the Interpreter scripts section of execve(2). BTW, for ELF binaries, execve does the equivalent of mmap(2) on some segments.
Read also about vdso(7), proc(5) & ASLR. Type cat /proc/self/maps in your shell.
(I guess, but I am not sure, that the 0x555555554000 address is in the ELF program header of your executable, or perhaps of ld-linux.so; it might also come from the kernel, since 0x55555555 seems to appear in the kernel source code.)
Entry point address of a PIE program
The entry point of a program is always available to it as the address of the symbol _start.
main.c
#include <stdio.h>
extern char _start;
int main()
{
    printf("&_start = %p\n",&_start);
    return 0;
}
Compile and link with -no-pie:
$ gcc -no-pie main.c
Then we see:
$ nm a.out | grep '_start'
0000000000601030 B __bss_start
0000000000601020 D __data_start
0000000000601020 W data_start
w __gmon_start__
0000000000600e10 t __init_array_start
U __libc_start_main@@GLIBC_2.2.5
0000000000400400 T _start
^^^^^^^^^^^^^^^
and:
$ readelf -h a.out | grep Entry
Entry point address: 0x400400
and:
$ ./a.out
&_start = 0x400400
Compile and link with -pie:
$ gcc -pie main.c
Then we see:
$ nm a.out | grep '_start'
0000000000201010 B __bss_start
0000000000201000 D __data_start
0000000000201000 W data_start
w __gmon_start__
0000000000200db8 t __init_array_start
U __libc_start_main@@GLIBC_2.2.5
0000000000000540 T _start
^^^^^^^^^^^^
and:
$ readelf -h a.out | grep Entry
Entry point address: 0x540
and:
$ ./a.out
&_start = 0x560a8dc5e540
                     ^^^
So the PIE program is entered at its nominal entry point 0x540 plus 0x560a8dc5e000, the load base chosen for this run.
What is the -fPIE option for position-independent executables in gcc and ld?
PIE is to support address space layout randomization (ASLR) in executable files.
Before PIE mode was created, the program's executable could not be placed at a random address in memory; only position-independent code (PIC) in dynamic libraries could be relocated to a random offset. PIE works very much like PIC does for dynamic libraries. The difference is that symbols defined in the executable itself are not routed through the Procedure Linkage Table (PLT) or the GOT; PC-relative addressing is used for them instead.
After enabling PIE support in gcc and the linker, the body of the program is compiled and linked as position-independent code. The dynamic linker does full relocation processing on the program module, just as it does for dynamic libraries. Global data that is not defined in the executable is typically accessed via the Global Offset Table (GOT), as in PIC, and GOT relocations are added.
PIE is well described in this OpenBSD PIE presentation.
Changes to functions are shown in this slide (PIE vs PIC).
x86 pic vs pie
Local global variables and functions are optimized in pie
External global variables and functions are same as pic
and in this slide (PIE vs old-style linking)
x86 pie vs no-flags (fixed)
Local global variables and functions are similar to fixed
External global variables and functions are same as pic
Note that PIE may be incompatible with -static.
Why .text section is not near by Entry point address ?
The entry point is near the address of the .text section.
The entry point that you see with $ readelf -h a.out is the nominal address statically assigned by the linker, before the program is loaded and relocated.
The .text section does not start at the address of main, as your program assumes. The loaded image that contains .text starts at the symbol __executable_start. Furthermore, what the program prints at runtime is not the nominal address assigned by the linker but the virtual address after the program is loaded and relocated. See:
$ cat main.c
#include <stdio.h>
extern char __executable_start;
extern char _start;
int main(void)
{
    printf("%p: address of `.text` section\n", &__executable_start);
    printf("%p: address of `_start` \n", &_start);
    printf("%p: address of `main` \n", &main);
    return 0;
}
$ gcc -Wall main.c
$ readelf -s a.out | egrep -w '(main|_start|__executable_start)'
34: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.c
49: 0000000000000000 0 NOTYPE GLOBAL DEFAULT ABS __executable_start
57: 0000000000000540 43 FUNC GLOBAL DEFAULT 14 _start
59: 000000000000064a 83 FUNC GLOBAL DEFAULT 14 main
The nominal base address of the image, __executable_start, is 0000000000000000. The entry point is the address of _start, at offset 0x540 bytes from that base, and main is at offset 0x64a. The entry point as reported by:
$ readelf -h a.out | grep 'Entry point'
Entry point address: 0x540
is the same. And running the program:
$ ./a.out
0x564c5d350000: address of `.text` section
0x564c5d350540: address of `_start`
0x564c5d35064a: address of `main`
shows the symbols at the same offsets from virtual base address 0x564c5d350000.
Access .data section in Position Independent Code
The effective address format only allows for 32 bit displacement that is sign extended to 64 bit. According to the error message, you need full 64 bits. You can add it via a register, such as:
mov rax, last_tok wrt ..gotoff
mov [rbx + rax], rdi
Also, the call .get_GOT trick is a 32-bit solution; in 64-bit mode you have RIP-relative addressing, which you can use there. The above may assemble, but I am not sure it will work. Luckily, the simple solution is to use the mentioned RIP-relative addressing to access your variable:
SECTION .data
GLOBAL last_tok
last_tok: dq 0 ; Define a QWORD
SECTION .text
GLOBAL strtok:function
strtok:
    mov rcx, [rel last_tok wrt ..gotpc] ; load the address from the GOT
    mov rax, [rcx] ; load the old dq value from there
    ; and/or
    mov [rcx], rdi ; store arg at that address
    ret
Note that for a private (static) variable you can just use [rel last_tok] without having to mess with the GOT at all.
In a PIE executable, compilers use (the equivalent of) [rel symbol] to access even global variables, on the assumption that the main executable doesn't need or want symbol interposition for its own symbols.
(Symbol interposition, or symbols defined in other shared libraries, is the only reason to load symbol addresses from the GOT on x86-64. But even something like mov rdx, [rel stdin] is safe in a PIE executable: https://godbolt.org/z/eTf87e - the linker creates a definition of the variable in the executable so it's within range and at a link-time-constant offset for RIP-relative addressing.)