Resolve Relative Relocations in Partial Link

Prelinking only has effect on relative relocations

OK, it turned out the problem was that some libraries were not correctly prelinked, as seen in my original question, in which e.g. libc.so wasn't loaded at the correct load address.

Seems like prelinking is an all-or-nothing approach: if one of the dependencies of an executable isn't correctly prelinked or can't be loaded at its preferred address, then neither the executable nor the libraries can take advantage of prelinked symbol relocations; they only benefit from prelinked relative relocations.

Whether a library was correctly prelinked should, in addition to the above, be checked with:

# readelf --dynamic usr/lib/someLibrary.so 
[..]
0x6ffffdf5 (GNU_PRELINKED)              2014-12-15T14:16:56
[..]

# readelf --program-headers usr/lib/someLibrary.so
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  [..]
  LOAD           0x000000 0x44bf0000 0x44bf0000 0xb56d4 0xb56d4 R E 0x8000
  [..]

The addresses reported by prelink --verbose, readelf --program-headers and cat /proc/PID/maps need to match.
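The comparison between the readelf address and the runtime mapping can be scripted. The following is only a minimal sketch: the helper names and the sample /proc/PID/maps text are made up for illustration (only the LOAD line is taken from the readelf output above), and it assumes the library path contains no spaces.

```python
def load_base_from_maps(maps_text, library_name):
    """Return the lowest mapped start address for library_name, or None."""
    starts = []
    for line in maps_text.splitlines():
        fields = line.split()
        # File-backed mappings have at least 6 fields; the path is the last one.
        if len(fields) >= 6 and fields[-1].endswith(library_name):
            start_hex = fields[0].split("-")[0]
            starts.append(int(start_hex, 16))
    return min(starts) if starts else None


def first_load_vaddr(readelf_text):
    """Return VirtAddr of the first LOAD line in readelf --program-headers output."""
    for line in readelf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "LOAD":
            return int(fields[2], 16)  # columns: Type, Offset, VirtAddr, ...
    return None


# Hypothetical sample data, not captured from a real device.
SAMPLE_MAPS = (
    "44bf0000-44ca6000 r-xp 00000000 b3:02 1234  /usr/lib/someLibrary.so\n"
    "44cad000-44cb0000 rw-p 000b5000 b3:02 1234  /usr/lib/someLibrary.so\n"
)
SAMPLE_READELF = "  LOAD           0x000000 0x44bf0000 0x44bf0000 0xb56d4 0xb56d4 R E 0x8000\n"

if __name__ == "__main__":
    runtime = load_base_from_maps(SAMPLE_MAPS, "someLibrary.so")
    preferred = first_load_vaddr(SAMPLE_READELF)
    print("match" if runtime == preferred else "MISMATCH", hex(runtime), hex(preferred))
```

If the two addresses differ, the library was not loaded at its prelinked address.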

My mistake was that I didn't check readelf - if I had done so, I would have realized that some libraries on the target device were not prelinked, because an error in the buildsystem caused the prelinked versions to be overwritten with the non-prelinked ones...

After fixing my buildsystem problem, the normal relocations indeed went down to 0:

# LD_DEBUG=statistics /path/to/binary
  5089:                      number of relocations: 0
  5089:           number of relocations from cache: 19477
  5089:             number of relative relocations: 0

How are PE Base Relocations built up?

Neither of the options you indicated is entirely correct.

This excellent tutorial on How to inject code in a PE file shows that the actual IMAGE_BASE_RELOCATION structure is:

typedef struct _IMAGE_BASE_RELOCATION {
  DWORD   VirtualAddress;
  DWORD   SizeOfBlock;
} IMAGE_BASE_RELOCATION, *PIMAGE_BASE_RELOCATION;

Section 5.2 of this Microsoft Portable Executable and Common Object File Format Specification describes the structure. SizeOfBlock is the size of the whole block in bytes, including the 8-byte header, so (SizeOfBlock - 8) / 2 is the number of WORD TypeOffset entries that follow VirtualAddress and SizeOfBlock.

I think you would also be interested in Table 7 of the tutorial, which shows the structure of the blocks from the relocation table. I'll copy-paste the table here for quick-reference.

[Image: Table 7 from the tutorial, showing the structure of the blocks in the relocation table]
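To make the block layout concrete, here is a minimal sketch that walks one relocation block. It assumes the layout described in the PE specification: an 8-byte header followed by 2-byte entries whose high 4 bits are the relocation type and whose low 12 bits are an offset from VirtualAddress. The function name and the sample bytes are made up for illustration.

```python
import struct


def parse_base_reloc_block(data, offset=0):
    """Parse one IMAGE_BASE_RELOCATION block that starts at data[offset].

    Returns (list of (type, rva) tuples, offset of the next block).
    """
    virtual_address, size_of_block = struct.unpack_from("<II", data, offset)
    # The header is 8 bytes; every TypeOffset entry is a 2-byte WORD.
    entry_count = (size_of_block - 8) // 2
    entries = struct.unpack_from("<%dH" % entry_count, data, offset + 8)
    relocs = []
    for entry in entries:
        reloc_type = entry >> 12       # high 4 bits
        page_offset = entry & 0x0FFF   # low 12 bits
        relocs.append((reloc_type, virtual_address + page_offset))
    return relocs, offset + size_of_block


# Hypothetical block: page RVA 0x1000, one type-3 (HIGHLOW) entry at offset 4,
# then one type-0 (ABSOLUTE) padding entry.
SAMPLE_BLOCK = struct.pack("<IIHH", 0x1000, 12, 0x3004, 0x0000)

if __name__ == "__main__":
    relocs, next_offset = parse_base_reloc_block(SAMPLE_BLOCK)
    for reloc_type, rva in relocs:
        print("type", reloc_type, "rva", hex(rva))
```

Calling the function repeatedly with the returned offset walks the whole relocation directory, one block per 4 KiB page.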

Absolute addressing using PC-relative addressing in a relocatable program. What would the modification record be like?

I think you'd want a relocation record that gives the absolute address, leaving it up to the dynamic linker / runtime-fixup-applier to calculate the right relative displacement to reach that absolute address from the location where this relocation applies.

It might not be that simple. For example, x86-64 RIP-relative addressing is relative to the end of the instruction, but an instruction such as mov [RIP+rel32], imm32 is encoded with the immediate after the rel32 part of the addressing mode. When there is no immediate, the rel32 is usually at the end of the instruction. So the point the addressing mode is relative to might not be at a fixed position relative to the position where you have to apply the relocation.

But we can leave that up to the assembler, and let it add some offset to account for that difference in base so the relocation will wind up targeting the right absolute address.

This keeps the relocation record compact, the same size as a "normal" one: just a location where to apply it, a type, and a 32-bit absolute address or whatever width the machine uses. (You could even just encode the absolute address into the spot where the PC-relative offset goes, if that's always wide enough.)

Alternatively, the record can hold the right relative offset to reach the desired absolute address relative to some base, e.g. 0. That's what GNU/Linux ELF .o files use, and so do PIE executables. That also solves the problem of biasing the relocation to account for any variable distance between where it's stored and where it's relative to.

So for example to relocate the whole image from 0 to 0x10000, you just subtract 0x10000 from every absolute-target relative relocation.
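As a worked example of that arithmetic, here is a small sketch. It assumes a 32-bit displacement measured from the end of the instruction, as for an x86 call rel32; the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def rel32_for(target, next_insn_addr):
    """Displacement that reaches an absolute target from the end of the instruction."""
    return (target - next_insn_addr) & MASK32


def rebase_rel32(rel32, old_base, new_base):
    """Adjust a displacement to a fixed absolute target after moving the image."""
    # Moving the code up by N bytes shrinks the distance to a fixed target by N.
    return (rel32 - (new_base - old_base)) & MASK32


if __name__ == "__main__":
    # A 5-byte call at image offset 2 ends at offset 7.
    before = rel32_for(0x123456, 0x0 + 7)
    after = rebase_rel32(before, 0x0, 0x10000)
    print(hex(before), hex(after))
    # The rebased displacement still reaches the same absolute target.
    assert (0x10000 + 7 + after) & MASK32 == 0x123456
```

With the image at 0 the displacement is 0x12344f; after moving it to 0x10000 it becomes 0x11344f, which is exactly 0x10000 less.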

BTW, you can do this in practice for i386 relative call instructions on GNU/Linux with GAS. Near calls on x86 always use a call rel32 encoding, but good assemblers on platforms that support the necessary relocations let you write an absolute target and will emit the right relocations for the linker. (Call an absolute pointer in x86 machine code)

# foo.s
.globl _start
_start:
nop                 # some padding so the base of the call address doesn't start at 0
nop
call 0x123456       # relative call to that absolute address

Build with gcc -m32 -c foo.s, disassemble with objdump -drwC -Mintel

foo.o:     file format elf32-i386

Disassembly of section .text:

00000000 <_start>:
   0:   90                      nop
   1:   90                      nop
   2:   e8 52 34 12 00          call   123459 <_start+0x123459> 3: R_386_PC32   *ABS*

readelf -a foo.o shows the relocations section as follows:

Relocation section '.rel.text' at offset 0x94 contains 1 entry:
 Offset     Info    Type            Sym.Value  Sym. Name
00000003  00000002 R_386_PC32

The target address isn't part of this relocation record; it's encoded into the existing machine code. This works for i386 but maybe not always for x86-64. Building without -m32 gives us:

Relocation section '.rela.text' at offset 0xb0 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000003  000000000002 R_X86_64_PC32                        123452

Either way, note that "offset" is where to apply it (2 NOP bytes plus the opcode byte of "call" means the rel32 starts 3 bytes into the section, starting from a base of 0). The 0x123452 in the x86-64 relocation is the real target 0x123456 minus the length of the rel32 (4 bytes).
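To see how a linker turns this record into final bytes, here is a minimal sketch of the PC32 calculation. It assumes the psABI formula S + A - P for R_386_PC32 and R_X86_64_PC32, with S = 0 for the *ABS* target; the function name is made up for illustration. In the i386 .o the addend A is the 0x123452 already stored in the instruction bytes (52 34 12 00); in the x86-64 .o it is in the record itself.

```python
import struct


def apply_pc32(symbol_value, addend, place):
    """Compute the 4 little-endian bytes a PC32 relocation writes at 'place'.

    symbol_value (S): address of the symbol, 0 for an absolute target.
    addend (A):       implicit (i386 REL) or explicit (x86-64 RELA) addend.
    place (P):        final address of the rel32 field being patched.
    """
    value = (symbol_value + addend - place) & 0xFFFFFFFF
    return struct.pack("<I", value)


if __name__ == "__main__":
    # If .text ends up at 0x1000, the rel32 field at section offset 3 is at 0x1003.
    patched = apply_pc32(0, 0x123452, 0x1000 + 3)
    print(patched.hex(" "))
```

With .text at 0x1000 this gives 4f 24 12 00, the same rel32 bytes that show up in the linked builds quoted in this answer.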

The "name + addend" column header makes sense for relocations you'd get from targeting a symbol name with an offset. e.g. mov eax, [global_array + 12] needs to load from 12 bytes past wherever the linker puts the start of global_array.

Also note that we're looking at the un-linked .o file, not a linked executable. x86-64 ELF shared objects don't allow runtime fixups for 32-bit absolute targets; the whole object might be randomly located more than +-2GiB away. (That's why I used -m32).

It seems that 32-bit PIE executables don't properly support this either. Probably because "position-independent" is right in the name. :P Building with gcc -m32 -pie -nostdlib foo.s gave us an encoding of e8 4f 24 12 00 which works for an image base of 0x1000. (Even from within GDB after setting a breakpoint and starting the PIE executable to let relocations be applied.)

But if we build with gcc -m32 -shared -nostdlib foo.s to make a shared library, text relocations are still allowed:

$ gcc -m32 -shared -nostdlib foo.s && objdump -drwC -Mintel a.out

a.out:     file format elf32-i386

Disassembly of section .text:

00001000 <_start>:
    1000:       90                      nop
    1001:       90                      nop
    1002:       e8 4f 24 12 00          call   123456 <_DYNAMIC+0x1204ae>

Note that disassembly used the relocation info to compute a correct final call target.

But I think that's actually broken because readelf output doesn't show any relocations. Executing it still fails (to even jump to the right address); we get 0xf7ffc002 <+2>: e8 4f 24 12 00 call 0xf811e456
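The failing target is easy to reproduce by decoding the instruction by hand. This sketch assumes a 5-byte e8 call rel32 in 32-bit mode; the function name is made up for illustration.

```python
import struct


def call_target(insn_addr, insn_bytes):
    """Return the absolute target of a 5-byte 'call rel32' located at insn_addr."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("expected a 5-byte call rel32 (opcode e8)")
    # rel32 is a signed little-endian displacement from the end of the instruction.
    (rel32,) = struct.unpack_from("<i", insn_bytes, 1)
    return (insn_addr + 5 + rel32) & 0xFFFFFFFF


if __name__ == "__main__":
    call = bytes.fromhex("e84f241200")
    print(hex(call_target(0x1002, call)))       # address in the file on disk
    print(hex(call_target(0xF7FFC002, call)))   # address after loading
```

The same unmodified bytes give 0x123456 at the on-disk address 0x1002 but 0xf811e456 once the library is mapped at 0xf7ffc000, which shows that no runtime fixup was applied to the rel32.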

Anyway, failures of runtime relocation on GNU/Linux are just because I'm abusing text relocations, I think. The relocation records for .o object files do totally work.

How to reverse R_X86_64_JUMP_SLOT relocations?

You can sidestep the problem by compiling with -fno-plt so you don't have any PLT entries at all, and the associated lazy-binding machinery doesn't come into play.

GCC and clang will use call *printf@GOTPCREL(%rip) which forces early binding: resolving the GOT entries on process startup. This makes each call more efficient, and some distros (e.g. Arch GNU/Linux) are compiling their packages this way already.

TL;DR: This is generally a good option, it's just not on by default (yet) in current GCC and clang distro configs.
