Differencebetween .Got and .Got.Plt Section

What is the difference between .got and .got.plt section?

My previous comment turns to be right:

I think .got is for relocations regarding global 'variables' while .got.plt is a auxiliary section to act together with .plt when resolving procedures absolute addresses.

The example below makes things a bit clear.

These are the relocations for my 32 bits i686-linux /lib/libm.so

Relocation section '.rel.dyn' at offset 0x32b8 contains 8 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00025030  00000008 R_386_RELATIVE   
00024fd8  00005706 R_386_GLOB_DAT    00025034   _LIB_VERSION
00024fdc  00000406 R_386_GLOB_DAT    00000000   __gmon_start__
00024fe0  00000506 R_386_GLOB_DAT    00000000   _Jv_RegisterClasses
00024fe4  00000806 R_386_GLOB_DAT    00000000   _rtld_global_ro
00024fe8  00000906 R_386_GLOB_DAT    00000000   stderr
00024fec  00013006 R_386_GLOB_DAT    0002507c   signgam
00024ff0  00000e06 R_386_GLOB_DAT    00000000   __cxa_finalize

Relocation section '.rel.plt' at offset 0x32f8 contains 12 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00025000  00000107 R_386_JUMP_SLOT   00000000   fputs
00025004  00000207 R_386_JUMP_SLOT   00000000   __errno_location
00025008  00000307 R_386_JUMP_SLOT   00000000   sprintf
0002500c  00000407 R_386_JUMP_SLOT   00000000   __gmon_start__
00025010  00000607 R_386_JUMP_SLOT   00000000   strtod
00025014  00000707 R_386_JUMP_SLOT   00000000   __assert_fail
00025018  00000a07 R_386_JUMP_SLOT   00000000   strlen
0002501c  00000b07 R_386_JUMP_SLOT   00000000   strtof
00025020  00000c07 R_386_JUMP_SLOT   00000000   fwrite
00025024  00000d07 R_386_JUMP_SLOT   00000000   strtold
00025028  00005e07 R_386_JUMP_SLOT   00005970   matherr
0002502c  00000e07 R_386_JUMP_SLOT   00000000   __cxa_finalize

Look that as you noted there are two relocation sections, namely .rel.dyn and .rel.plt. You can see that all relocations for .rel.plt are of type R_386_JUMP_SLOT which means that they are branch relocations on the other hand almost all relocations in .rel.dyn are R_386_GLOB_DAT which means relocation for global variables.

Another subtle difference exist between .symtab and .dynsym. While the first contain references for all symbols used during static link editing the later contain only those symbols needed for dynamic linking. Thus, the relocations mentioned above refer only to .dynsym section.

Why does linux use two GOT sections in x64? .GOT vs .got.plt

The split is due to security reasons. By default (used to be only under -Wl,-z,relro in the past) .got section is remapped as read-only once dynamic loader resolved all data relocations at startup (i.e. before entering main function) to prevent some types of exploits. .got.plt can not be remapped because of lazy symbol binding (unless LD_BIND_NOW or -Wl,-z,now were used in which case lazy binding is turned off and .got.plt is remapped as well).

.plt .plt.got what is different?

The difference between .plt and .plt.got is that .plt uses lazy binding and .plt.got uses non-lazy binding.

Lazy binding is possible when all uses of a function are simple function calls. However, if anything requires the address of the function, then non-lazy binding must be used, since binding can only occur when the function is called, and we may need to know the address before the first call. Note that when obtaining the address, the GOT entry is accessed directly; only the function calls go via .plt and .plt.got.
If the -fno-plt compiler option is used, then neither .plt nor .plt.got are emitted, and function calls also directly access the GOT entry.

In the following examples, objdump -d is used for disassembly, and readelf -r is used to list relocations.

`.plt`

Using x64-64 as an example, .plt will contain entries such as:

0000000000014050 <_Unwind_Resume@plt>:
   14050:       ff 25 3a e6 0e 00       jmpq   *0xee63a(%rip)        # 102690 <_Unwind_Resume@GCC_3.0>
   14056:       68 02 00 00 00          pushq  $0x2
   1405b:       e9 c0 ff ff ff          jmpq   14020 <.plt>

The first jmpq is to the GOT entry, and the second jmpq performs the lazy binding if the GOT entry hasn't been bound yet.

The relocations for .plt's associated GOT entries are in the .rela.plt section and use R_X86_64_JUMP_SLOT, which lets the dynamic linker know these are lazy.

0000000000102690  0000004600000007 R_X86_64_JUMP_SLOT     0000000000000000 _Unwind_Resume@GCC_3.0 + 0

`.plt.got`

.plt.got contains entries that only need a single jmpq since they aren't lazy:

0000000000014060 <memset@plt>:
   14060:       ff 25 5a ea 0e 00       jmpq   *0xeea5a(%rip)        # 102ac0 <memset@GLIBC_2.2.5>
   14066:       66 90                   xchg   %ax,%ax

The relocations for .plt.got's associated GOT entries are in the .rela.dyn section (along with the rest of the GOT relocations), which the dynamic linker binds immediately:

0000000000102ac0  0000004b00000006 R_X86_64_GLOB_DAT      0000000000000000 memset@GLIBC_2.2.5 + 0

Why does the PLT exist in addition to the GOT, instead of just using the GOT?

The problem is that replacing call printf@PLT with call [printf@GOTPLT] requires that the compiler knows that the function printf exists in a shared library and not a static library (or even in just a plain object file). The linker can change call printf into call printf@PLT, jmp printf into jmp printf@PLT or even mov eax, printf into mov eax, printf@PLT because all it's doing it changing a relocation based on the symbol printf into relocation based on the symbol printf@PLT. The linker can't change call printf into call [printf@GOTPLT] because it doesn't know from the relocation whether it's a CALL or JMP instruction or something else entirely. Without knowing whether it's a CALL instruction or not, it doesn't know whether it should change the opcode from a direct CALL to a indirect CALL.

However even if there was a special relocation type that indicated that the instruction was a CALL, you still have the problem that a direct call instruction is a 5 bytes long but a indirect call instruction is 6 bytes long. The compiler would have to emit code like nop; call printf@CALL to give the linker room to insert the additional byte needed and it would have to do it for all calls to any global function. It would probably end up being a net performance loss because of all the extra and not actually necessary NOP instructions.

Another problem is that on 32-bit x86 targets the PLT entries are relocated at runtime. The indirect jmp [xxx@GOTPLT] instructions in the PLT don't use relative addressing like the direct CALL and JMP instructions, and since the address of xxx@GOTPLT depends on where the image was loaded in memory the instruction needs to be fixed up to use the correct address. By having all these indirect JMP instructions grouped together in one .plt section means that much smaller number of virtual memory pages need to be modified. Each 4K page that's modified can no longer be shared with other processes, when the instructions that need to modified are scattered all over memory it requires that a much larger part the image to be unshared.

Note that this later issue is only a problem with shared libraries and position independent executables on 32-bit x86 targets. Traditional executables can't be relocated, so there's no need to fix the @GOTPLT references, while on 64-bit x86 targets RIP relative addressing is used to access the @GOTPLT entries.

Because of that last point new versions of a GCC (6.1 or later) support the -fno-plt flag. On 64-bit x86 targets this option causes the compiler to generate call printf@GOTPCREL[rip] instructions instead of call printf instructions. However it appears to do this for any call to a function that isn't defined in the same compilation unit. That is any function it doesn't know for sure isn't defined in shared library. That would mean that indirect jumps would also be used for calls to functions defined in other object files or static libraries. On 32-bit x86 targets the -fno-plt option is ignored unless compiling position independent code (-fpic or -fpie) where it results in call printf@GOT[ebx] instructions being emitted. In addition to generating unnecessary indirect jumps, this also has the disadvantage of requiring the allocation of a register for the GOT pointer though most functions would need it allocated anyways.

Finally, Windows is able to do what you suggest by declaring symbols in header files with the "dllimport" attribute, indicating that they exist in DLLs. This way the compiler knows whether or not to generate direct or indirect call instruction when calling the function. The disadvantage of this is that the symbol has to exist in a DLL, so if this attribute used is you can't decide after compilation to link with a static library instead.

Read also Drepper's How to write a shared library paper, it explains that quite well in details (for Linux).

Difference between GOT and GOTOFF

symbol@GOTOFF addresses the variable itself, relative to the GOT base (as a convenient but arbitrary choice of anchor). lea of that gives you symbol address, mov would give you data at the symbol. (The first few bytes of the string in this case.)

symbol@GOT gives you offset (within the GOT) of the GOT entry, for that symbol. A mov load from there gives you the address of the symbol. (GOT entries are filled in by the dynamic linker).

Why use the Global Offset Table for symbols defined in the shared library itself? has an example of accessing an extern variable that does result in getting its address from the GOT and then dereferencing that.

BTW, this is position-independent code. Your GCC is configured that way by default. If you used -fno-pie -no-pie to make a traditional position-dependent executable, you'd just get a normal efficient pushl $.LC0. (32-bit is missing RIP-relative addressing so it's quite inefficient.)

In a non-PIE (or in 64-bit PIE), the GOT barely gets used at all. The main executable defines space for symbols so it can access them without going through the GOT. libc code uses the GOT anyway (mostly because of symbol interposition in 64-bit code) so letting the main executable provide the symbol doesn't cost anything and makes the non-PIE executable faster.

We can get a non-PIE executable to use the GOT directly for shared library function addresses with -fno-plt, instead of calling into the PLT and having it use the GOT.

#include <stdio.h>
void foo() { putchar('\n'); }

gcc9.2 -O3 -m32 -fno-plt on Godbolt (-fno-pie is the default on the Godbolt compiler explorer, unlike your system.)

foo():
        sub     esp, 20                  # gcc loves to waste an extra 16 bytes of stack 
        push    DWORD PTR stdout         # [disp32] absolute address
        push    10
        call    [DWORD PTR _IO_putc@GOT]
        add     esp, 28
        ret

Both push and call have a memory operand using a 32-bit absolute address. push is loading the FILE* value of stdout from a known (link-time-constant) address. (There isn't a text relocation for it.)

call is loading the function pointer saved by the dynamic linker from the GOT. (And loading it directly into EIP.)

Why there are no .rel.dyn/.got.plt section in dynamic ELF files?

.rel.dyn and .got.plt can be present in a shared library .elf it depends on the structure of the functions in the library, this is in https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html

.rela.dyn is present if on extern function is included :

$ cat test.c
extern int foo;

int function(void) {
    return foo;
}
$ gcc -shared -fPIC -o libtest.so test.c

Also see https://eli.thegreenplace.net/2011/08/25/load-time-relocation-of-shared-libraries/ and
https://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries

https://blog.ramdoot.in/how-can-i-link-a-static-library-to-a-dynamic-library-e1f25c8095ef (linking static libraries into a dynamic library)

x86_64: Is it possible to in-line substitute PLT/GOT references?

This optimization has since been implemented in GCC. It can be enabled with the -fno-plt option and the noplt function attribute:

Do not use the PLT for external function calls in position-independent code. Instead, load the callee address at call sites from the GOT and branch to it. This leads to more efficient code by eliminating PLT stubs and exposing GOT loads to optimizations. On architectures such as 32-bit x86 where PLT stubs expect the GOT pointer in a specific register, this gives more register allocation freedom to the compiler. Lazy binding requires use of the PLT; with -fno-plt all external symbols are resolved at load time.

Alternatively, the function attribute noplt can be used to avoid calls through the PLT for specific external functions.

In position-dependent code, a few targets also convert calls to functions that are marked to not use the PLT to use the GOT instead.