Create .so files on Linux without using PIC (position independent code) (x86 32bit)

What happens if I just drop the -fPIC when compiling a .so-file?

The resulting shared object ELF file would (very probably) be dynamically loaded at semi-random (i.e. unpredictable) page addresses (e.g. because the mmap syscall will encounter ASLR).

The linker would also emit a large number of relocations. The dynamic linker (ld.so) would then have to process all of them at load time, which is slow, and it would have to rewrite your text segment, so the text could no longer be shared read-only with other processes using the same .so file.
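To get a feel for why text relocations hurt sharing, here is a rough sketch that counts how many pages of the text segment the dynamic linker would have to write to (each written page becomes a private copy in that process). The 4096-byte page size, the 4-byte relocation width and the offsets are assumptions chosen for illustration, not values read from a real library.

```python
PAGE_SIZE = 4096  # assumed page size (typical on x86)

def pages_dirtied(reloc_offsets, width=4, page_size=PAGE_SIZE):
    """Count the distinct text pages touched by patching `width` bytes at each offset."""
    pages = set()
    for off in reloc_offsets:
        pages.add(off // page_size)                # page holding the first patched byte
        pages.add((off + width - 1) // page_size)  # page holding the last patched byte
    return len(pages)

# Hypothetical relocation offsets inside a text segment; the last one straddles a page boundary.
offsets = [0x10, 0x20, 0x1004, 0x2FFE]
print(pages_dirtied(offsets))  # 4
```

With PIC the same fix-ups land in the GOT, which lives in a data page, so the text pages stay untouched and shareable.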

So in practice forgetting the -fPIC on a shared object (i.e. dynamically linked library) is most often a bad idea, even if it is possible.

Read Drepper's paper How To Write Shared Libraries and Wheeler's Program Library HOWTO.

BTW, position independent code is considerably more costly on 32-bit x86 than on x86-64. But it is worth the cost (PIC code is probably at most 5 to 10% slower than non-PIC code on 32-bit x86).

Difference in position-independent code: x86 vs x86-64

I have found a nice and detailed explanation, which boils down to:

x86-64 uses an IP-relative offset to load global data; x86-32 cannot, so it dereferences a global offset.
An IP-relative offset does not work for shared libraries, because global symbols can be overridden, so x86-64 breaks down when not built with PIC.
If x86-64 code is built with PIC, the IP-relative dereference yields a pointer to a GOT entry, which is then dereferenced.
x86-32, however, already dereferences a global offset, so that access is turned directly into a GOT entry access.
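The point about overridable symbols can be illustrated with a toy model of how a GOT gets filled. This is only a sketch: the module names, symbol names and addresses are made up, and the real dynamic linker does considerably more (versioning, visibility, lazy binding). It only shows that the first definition in the search order wins, which is why the code has to go through a GOT entry instead of a fixed IP-relative offset.

```python
def fill_got(needed_symbols, search_order):
    """Toy model of the dynamic linker filling one module's GOT.

    search_order is a list of (module_name, {symbol: address}) pairs;
    the first module that defines a symbol wins.
    """
    got = {}
    for name in needed_symbols:
        for module_name, exports in search_order:
            if name in exports:
                got[name] = (module_name, exports[name])
                break
        else:
            raise LookupError(f"undefined symbol: {name}")
    return got

# Hypothetical addresses: the executable and libfoo.so both define `counter`.
search_order = [
    ("a.out", {"counter": 0x601040}),
    ("libfoo.so", {"counter": 0x7F0000201030, "helper": 0x7F0000000650}),
]
got = fill_got(["counter", "helper"], search_order)
print(got["counter"][0])  # a.out
print(got["helper"][0])   # libfoo.so
```

Even libfoo.so's own references to `counter` end up at the executable's copy, because the library reads the address from its GOT rather than assuming its own definition.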

How does Windows handle multiple DLLs loaded in memory without position independent code?

Windows PE (.exe/.dll) files contain relocation data that allows the loader to adjust addresses as required if the code is loaded at an address other than the intended base address.

The relocation table is essentially just a list of offsets within the binary that need to be adjusted. For example, if a .dll with a base address of 0x100000 is instead loaded at 0x300000, each of the addresses listed in the relocation table will have (0x300000 - 0x100000) = 0x200000 added to it.
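The adjustment described above can be sketched in a few lines. This is a simplified model, not the real loader: it treats the relocation table as a flat list of offsets to 32-bit little-endian addresses, whereas the actual PE format groups entries into per-page blocks with a type field. The function name and the sample image are made up for illustration.

```python
import struct

def apply_base_relocations(image, reloc_offsets, preferred_base, actual_base):
    """Add the load delta to every 32-bit little-endian address listed in reloc_offsets."""
    delta = actual_base - preferred_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)
    return delta

# Hypothetical 12-byte image holding one absolute address at offset 4.
image = bytearray(12)
struct.pack_into("<I", image, 4, 0x100000 + 0x1234)

delta = apply_base_relocations(image, [4], preferred_base=0x100000, actual_base=0x300000)
patched = struct.unpack_from("<I", image, 4)[0]
print(hex(delta), hex(patched))  # 0x200000 0x301234
```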

Further details on the format of the relocation data within the PE file, and on the structure of such files generally, can be found here: https://docs.microsoft.com/en-us/previous-versions/ms809762(v=msdn.10)#pe-file-base-relocations

linux g++ linking 64 bit shared library code to static libraries

OK, the answer is described in detail here: http://www.technovelty.org/code/c/amd64-pic.html

The basic gist is that on i386, non-PIC code refers to global data through absolute 32-bit addresses, and the dynamic linker can patch any such address at load time (as a text relocation), so non-PIC objects can still be linked into a shared library. On x86-64, non-PIC code built with the default code model also embeds 32-bit absolute addresses, but a shared library may be loaded anywhere in the 64-bit address space, so those 32-bit fields cannot be relocated reliably.

The consequence from a linking perspective is that unless 64-bit code is explicitly compiled as position independent code, it contains addresses hard-coded for its execution context, and the linker refuses to put it into a shared library.

This is an imperfect explanation of the content in the linked page but it suffices for my purposes.

How to access data from Position Independent Code (PIC) in ARM Assembly?

In an ELF image the data follows the text, and the offset between them is known at link time. To access the data, add the PC to the offset between a known location and the data. See ARM ELF and Linkers and Loaders, chapter 8, by John Levine.

Obviously, if this is hosted by an OS and loader, you will need to use the conventions of the platform. The following is written for a bare metal system or places where you have the option to choose.

For example,

   .global hwa_reset_cb_attach
hwa_reset_cb_attach:
   adr r2, 1f                @ pc relative value of label '1'
   ldr r1, [r2]              @ offset between label and data to r1
   add r1, r1, r2            @ pc relative '1' label + offset there
   str r0, [r1]              @ store to corrected address.
   bx lr
1: .word hwa_ext_reset_hnd-. @ offset from here to data fixed at link.

This works for PIC code with the data following immediately. If you have many occurrences, you can create a macro to access the data. If there are a lot of references, it may be easier to keep a register loaded with the beginning of the .data section. This is the static base, selected with the compiler options -msingle-pic-base and -mpic-register=reg; the static base register is typically r9. The load-time start of the data is put in r9 once, and each access is then just str rx,[r9, #hwa_ext_reset-start_of_data]. This is a tactic used by u-boot, and it even lets you relocate the data sections (moving from IRAM to SDRAM, etc.). However, it consumes an extra register.
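Why the `.word hwa_ext_reset_hnd-.` trick is position independent can be checked with plain arithmetic. The sketch below models the four instructions of the routine; the offsets and load bases are hypothetical, and it assumes text and data are moved together so that the distance between them stays fixed.

```python
def resolve_data_address(load_base, label_offset, data_offset):
    """Model the adr/ldr/add sequence for an image loaded at load_base."""
    stored_word = data_offset - label_offset  # what the linker stores at label '1'
    r2 = load_base + label_offset             # adr r2, 1f   (run-time address of the label)
    r1 = stored_word                          # ldr r1, [r2]
    r1 = r1 + r2                              # add r1, r1, r2
    return r1                                 # address used by str r0, [r1]

# Hypothetical link-time offsets of the label and of the data word.
LABEL_OFFSET = 0x0120
DATA_OFFSET = 0x2000

for base in (0x00000000, 0x40000000, 0x80008000):
    assert resolve_data_address(base, LABEL_OFFSET, DATA_OFFSET) == base + DATA_OFFSET

print(hex(resolve_data_address(0x40000000, LABEL_OFFSET, DATA_OFFSET)))  # 0x40002000
```

The stored word never depends on the load base, so nothing needs patching when the image moves.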

What are the differences comparing PIE, PIC code and executable on 64-bit x86 platform?

I am confused about the concepts of Position Independent Executable (PIE) and Position Independent Code (PIC), and I guess they are not orthogonal.

The only real difference between PIE and PIC is that you are allowed to interpose symbols in PIC, but not in PIE. Except for that, they are pretty much equivalent.

You can read about symbol interposition here.

C. a_pie.out contains syntactically identical instructions compared with a_pic.out. However, the memory addresses of a_pie.out's .text section range from 0x630 to 0xa57, while the same section of a_pic.out ranges from 0x400410 to 0x400817.

It's hard to understand what you find surprising about this.

The PIE binary is linked just like a shared library, and so its default load address (the .p_vaddr of the first LOAD segment) is zero. The expectation is that something will relocate this binary away from the zero page and load it at some random address.

On the other hand, a non-PIE executable is always loaded at its linked-at address. On Linux, the default address for x86_64 binaries is 0x400000, and so the .text ends up not far from there.
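One way to see the difference without a disassembler is to look at the e_type field of the ELF header: a PIE is marked ET_DYN, like a shared library, while a fixed-address executable is marked ET_EXEC. The sketch below is a minimal reader for that one field; the function names are made up, and the demo uses a hand-built 18-byte header rather than a real binary.

```python
import struct

ET_EXEC = 2  # executable linked at a fixed address
ET_DYN = 3   # shared object, which is also how a PIE is marked

def elf_kind(header):
    """Classify an ELF file from its first 18 bytes (e_ident plus e_type)."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    if e_type == ET_EXEC:
        return "ET_EXEC (non-PIE, fixed address)"
    if e_type == ET_DYN:
        return "ET_DYN (PIE or shared object)"
    return f"other (e_type={e_type})"

def elf_kind_of_file(path):
    """Convenience wrapper: read just enough of a file to classify it."""
    with open(path, "rb") as f:
        return elf_kind(f.read(18))

# Hand-built header: magic, ELFCLASS64, little-endian, version 1, padding, e_type.
demo_pie = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<H", ET_DYN)
print(elf_kind(demo_pie))  # ET_DYN (PIE or shared object)
```

Calling elf_kind_of_file on a_pie.out and a_pic.out from the question should report ET_DYN for the first and ET_EXEC for the second, assuming they were built as described.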
