Create .SO files on Linux without using PIC (position independent code) (x86 32bit)
What happens if I just drop the -fPIC when compiling a .so file?
The resulting shared object ELF file would (very probably) be dynamically loaded at semi-random (i.e. unpredictable) page addresses (e.g. because the mmap syscall is subject to ASLR).
And the linker would produce a large number of relocations. So the dynamic linker (ld.so) would have to process all of them at load time, which is slow, and it would have to rewrite your text segment (which then can't be efficiently shared read-only with other processes using the same .so file).
So in practice forgetting the -fPIC on a shared object (i.e. dynamically linked library) is most often a bad idea, even if it is possible.
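One way to see the effect is to build a tiny library both ways and look for the TEXTREL entry in its dynamic section. The sketch below is a hypothetical counter.c (the file and function names are illustrative, not from the question). The commands in the comment assume a GCC toolchain with 32-bit support installed; some linkers refuse text relocations by default, so the non-PIC build may only produce a warning or may fail outright.

```c
/* counter.c: hypothetical minimal library, used only for illustration.
 *
 * Assumed build commands (GCC with 32-bit multilib installed):
 *   gcc -m32 -fno-pic -shared -o libcounter_nopic.so counter.c
 *   gcc -m32 -fPIC    -shared -o libcounter_pic.so   counter.c
 *
 * Then compare the dynamic sections:
 *   readelf -d libcounter_nopic.so | grep TEXTREL   (expected: one match)
 *   readelf -d libcounter_pic.so   | grep TEXTREL   (expected: no match)
 */

/* A global that the code below must address somehow. */
int counter = 0;

/* Without PIC, the address of `counter` is patched into the code itself,
 * which is what forces the dynamic linker to rewrite the text segment. */
int next_id(void)
{
    counter += 1;
    return counter;
}
```

The -fno-pic flag is spelled out because many distributions configure GCC to emit position-independent code by default, in which case simply omitting -fPIC may not change anything.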
Read Drepper's How To Write Shared Libraries paper and Wheeler's Program Library HOWTO.
BTW, position independent code is much more costly on x86 (32 bits) than on x86-64. But it is worth the effort (probably, PIC code is at most 5 to 10% slower than non-PIC on x86 32 bits).
Difference in position-independent code: x86 vs x86-64
I have found a nice and detailed explanation, which boils down to:
- x86-64 uses an IP-relative offset to load global data; x86-32 cannot, so it dereferences an absolute global address.
- An IP-relative offset does not work for global data in shared libraries, because global symbols can be overridden, so x86-64 code cannot be linked into a shared library unless it is built with PIC.
- If x86-64 code is built with PIC, the IP-relative dereference now yields a pointer to a GOT entry, which is then dereferenced.
- x86-32, however, already dereferences a global address, so with PIC that access is turned into a GOT entry lookup directly.
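As a rough illustration of the points above, consider a function that reads a global. The instruction sequences in the comment are what GCC typically emits; they are an approximation, and the exact output depends on compiler version and flags. The names shared_value and read_shared are made up for this sketch.

```c
/* Hypothetical example: how one global load is typically compiled.
 *
 * x86-64, -fno-pic:  movl shared_value(%rip), %eax
 *
 * x86-64, -fPIC:     movq shared_value@GOTPCREL(%rip), %rax   (load GOT entry)
 *                    movl (%rax), %eax                        (dereference it)
 *
 * x86-32, -fno-pic:  movl shared_value, %eax                  (absolute address)
 *
 * x86-32, -fPIC:     call __x86.get_pc_thunk.ax               (get EIP into %eax)
 *                    addl $_GLOBAL_OFFSET_TABLE_, %eax        (%eax = GOT base)
 *                    movl shared_value@GOT(%eax), %eax        (load GOT entry)
 *                    movl (%eax), %eax                        (dereference it)
 */

/* Default visibility, so a PIC build has to assume it can be interposed. */
int shared_value = 42;

int read_shared(void)
{
    return shared_value;
}
```

Compiling this file with gcc -S under each set of flags is a quick way to check what your own toolchain produces.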
How does Windows handle multiple DLLs loaded in memory without position independent code?
Windows PE (.exe/.dll) files contain relocation data that allows the loader to adjust addresses as required if the code is loaded at an address other than the intended base address.
The relocation table is essentially just a list of offsets within the binary that need to be adjusted, such that e.g. if a .dll with a base address of 0x100000 is instead loaded at 0x300000, each of the addresses included in the relocation table will have (0x300000 - 0x100000) = 0x200000 added to them.
Further details on the format of the relocation data within the PE file, and the structure of such files generally, can be found here: https://docs.microsoft.com/en-us/previous-versions/ms809762(v=msdn.10)#pe-file-base-relocations
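The adjustment itself can be sketched in a few lines of C. This is a simplified model, not a PE parser: it assumes a flat array of byte offsets that each point at a 32-bit absolute address, whereas real PE files group the entries into per-page blocks with type codes. The function name apply_relocations and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adds (actual_base - preferred_base) to every 32-bit address listed
 * in `offsets`. Simplified model of what a loader does on rebasing. */
void apply_relocations(uint8_t *image, size_t image_size,
                       const uint32_t *offsets, size_t count,
                       uint32_t preferred_base, uint32_t actual_base)
{
    /* Unsigned arithmetic wraps, so this also works when loading lower. */
    uint32_t delta = actual_base - preferred_base;

    for (size_t i = 0; i < count; i++) {
        uint32_t value;

        /* Each patched slot must lie entirely inside the image. */
        assert(image_size >= sizeof value &&
               offsets[i] <= image_size - sizeof value);

        memcpy(&value, image + offsets[i], sizeof value); /* unaligned-safe */
        value += delta;
        memcpy(image + offsets[i], &value, sizeof value);
    }
}
```

With the numbers from the answer, a slot holding 0x100010 in an image built for base 0x100000 and loaded at 0x300000 ends up holding 0x300010.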
linux g++ linking 64 bit shared library code to static libraries
Ok the answer is described in detail here: http://www.technovelty.org/code/c/amd64-pic.html.
The basic gist of the explanation is that on i386, non-PIC code placed in a shared object is simply fixed up at load time with text relocations, which works but costs startup time and page sharing. On x86-64, code compiled without -fPIC uses 32-bit absolute relocations (such as R_X86_64_32), which cannot hold an arbitrary 64-bit load address, so the linker refuses to put such code into a shared object.
The consequence from a linking perspective is that unless the 64-bit code in the static libraries is explicitly compiled as position independent code (-fPIC), it is hard-coded for one execution context and cannot be linked into a shared library.
This is an imperfect explanation of the content in the linked page but it suffices for my purposes.
How to access data from Position Independent Code (PIC) in ARM Assembly?
The ELF data section follows the text section, so the offset from code to data is known at link time. To access the data, you add the PC to the offset between a known location and the data. See the ARM ELF specification and chapter 8 of Linkers and Loaders by John Levine.
Obviously, if this is hosted by an OS and loader, you will need to use the conventions of the platform. The following is written for a bare metal system or places where you have the option to choose.
For example,
    .global hwa_reset_cb_attach
hwa_reset_cb_attach:
    adr r2, 1f                  @ PC-relative address of label '1'
    ldr r1, [r2]                @ offset between label and data, into r1
    add r1, r1, r2              @ address of label '1' + offset stored there
    str r0, [r1]                @ store to the corrected address
    bx lr
1:  .word hwa_ext_reset_hnd-.   @ offset from here to data, fixed at link time
This works for PIC code with data following immediately. If you have many occurrences, you can create a macro to access the data. If there are a lot of references, it may be easier to keep a register loaded with the beginning of the .data section. This is the static base, selected with the compiler options -msingle-pic-base and -mpic-register=reg; the static base is typically r9. So the load-time start of data is put in r9 once, and every access then only uses str rx,[r9, #hwa_ext_reset-start_of_data]. This is a tactic used by u-boot and you can even relocate the data sections (moving from iram to SDRAM, etc). However, it consumes an extra register.
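The static-base idea can be mimicked in C for illustration: keep one base pointer and reach every datum through a fixed offset from it. This is only an analogy under stated assumptions, not what -msingle-pic-base generates. The struct layout and the names data_block, set_static_base, hwa_set_reset_cb and default_reset_handler are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*reset_cb)(void);

/* Hypothetical layout of the data section; offsets are fixed at build time. */
struct data_block {
    int      boot_count;
    reset_cb hwa_ext_reset_hnd;
};

/* Plays the role of r9: set once, when the data's final location is known. */
static unsigned char *static_base;

void set_static_base(void *base)
{
    static_base = base;
}

/* Placeholder handler so the sketch has something to register. */
void default_reset_handler(void)
{
}

void hwa_set_reset_cb(reset_cb cb)
{
    assert(static_base != NULL);
    /* In spirit: str r0, [r9, #offset_of_handler] */
    memcpy(static_base + offsetof(struct data_block, hwa_ext_reset_hnd),
           &cb, sizeof cb);
}
```

Moving the data (say from iram to SDRAM) then only requires copying the block and calling set_static_base again; the code that uses the offsets does not change.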
What are the differences comparing PIE, PIC code and executable on 64-bit x86 platform?
I am confused about the concept Position Independent Executable (PIE) and Position Independent code (PIC), and I guess they are not orthogonal.
The only real difference between PIE and PIC is that you are allowed to interpose symbols in PIC, but not in PIE. Except for that, they are pretty much equivalent.
You can read about symbol interposition here.
C. a_pie.out contains syntax-identical instructions comparing with a_pic.out. However, the memory addresses of a_pie.out's .text section range from 0x630 to 0xa57, while the same section of a_pic.out ranges from 0x400410 to 0x400817.
It's hard to understand what you find surprising about this.
The PIE binary is linked just as a shared library, and so its default load address (the p_vaddr of the first LOAD segment) is zero. The expectation is that something will relocate this binary away from the zero page, and load it at some random address.
On the other hand, a non-PIE executable is always loaded at its linked-at address. On Linux, the default address for x86_64 binaries is 0x400000, and so the .text ends up not far from there.
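The two kinds of binary can also be told apart from the ELF header: a PIE, like a shared library, has type ET_DYN, while a non-PIE executable has type ET_EXEC. The helper below is a minimal sketch that reads the e_type field from the first bytes of a file; the function name elf_type is hypothetical, and the field offsets and values are the ones given in the ELF specification.

```c
#include <stddef.h>

/* Values of e_type from the ELF specification. */
enum { ELF_TYPE_EXEC = 2, ELF_TYPE_DYN = 3 };

/* Returns the e_type field, or -1 if the buffer is not an ELF header.
 * e_type is a 16-bit field located right after the 16-byte e_ident array. */
int elf_type(const unsigned char *hdr, size_t len)
{
    if (len < 18 || hdr[0] != 0x7f || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F')
        return -1;

    /* e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian. */
    if (hdr[5] == 2)
        return (hdr[16] << 8) | hdr[17];
    return hdr[16] | (hdr[17] << 8);
}
```

Feeding it the first 18 bytes of a_pie.out should yield ELF_TYPE_DYN, and a non-PIE binary should yield ELF_TYPE_EXEC. Running readelf -h on each file shows the same field on its Type line.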