Difference Between .Dynamic .Dynsym and .Dynstr in an Elf Executable

Difference between .dynamic .dynsym and .dynstr in an ELF executable


Are my statements above correct?

The .dynsym section contains a set of fixed-length records of type Elf32_Sym or Elf64_Sym.

Since these are fixed length entries, they can not by themselves describe arbitrary length symbols (strings) that the binary exports or imports.

Therefore these entries don't contain the strings. Instead, they contain an offset into .dynstr (in the .st_name field), and the symbol name is found at this offset.

So it's not true that ".dynsym contains setsockopt@GLIBC_2.0" and that ".dynstr contains strings of function requirements" (whatever that last statement means).

The .dynsym contains an Elf32_Sym or an Elf64_sym describing an imported symbol setsockopt, and referencing the offset of "setsockopt" string in the .dynstr section.

Likewise, ".dynamic contains libraries that the executable needs to load" is false -- the section doesn't contain any libraries.

It contains fixed length entries of Elf64_Dyn or Elf32_Dyn, some of which (such as ones with .d_tag == DT_NEEDED or DT_RPATH) may reference strings from .dynstr via their offsets. The dynamic loader interprets these entries in certain way -- for DT_NEEDED as "must load this other library", for DT_RPATH as "must search these colon-separated paths", etc.

how to dump the .dynsym section of an ELF file?


This however is SEGFAULTing for some reason,

It SIGSEGVs because symbol->st_name is not a pointer to a string, it's an offset into .dynstr section where the actual string resides.

In order to print the name, you must read the contents of .dynstr into a buffer (or mmap the .dynstr section), and use st_name as offset into that buffer.

How to understand the difference between Offset and VirAddr in Program Headers in elf?

Generally speaking-

p_offset - offset within the elf file

p_vaddr - address of section after loaded to memory (say, after c runtime initialization finished)

They will not always be the same, those addresses can be configured using linker script for example. Refer to this.

As for the shared library addresses after library loaded into a process address space - this depends on process addresses space, ASLR, and more, but its safe to say that the dynamic loader will set new addresses (p_vaddr, aka execution address)

What's the difference of section and segment in ELF file format


But what's difference between section and segment?

Exactly what you quoted: the segments contain information needed at runtime, while the sections contain information needed during linking.

does a segment contain one or more sections?

A segment can contain 0 or more sections. Example:

readelf -l /bin/date

Elf file type is EXEC (Executable file)
Entry point 0x402000
There are 9 program headers, starting at offset 64

Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000400040 0x0000000000400040
0x00000000000001f8 0x00000000000001f8 R E 8
INTERP 0x0000000000000238 0x0000000000400238 0x0000000000400238
0x000000000000001c 0x000000000000001c R 1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x000000000000d5ac 0x000000000000d5ac R E 200000
LOAD 0x000000000000de10 0x000000000060de10 0x000000000060de10
0x0000000000000440 0x0000000000000610 RW 200000
DYNAMIC 0x000000000000de38 0x000000000060de38 0x000000000060de38
0x00000000000001a0 0x00000000000001a0 RW 8
NOTE 0x0000000000000254 0x0000000000400254 0x0000000000400254
0x0000000000000044 0x0000000000000044 R 4
GNU_EH_FRAME 0x000000000000c700 0x000000000040c700 0x000000000040c700
0x00000000000002a4 0x00000000000002a4 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 8
GNU_RELRO 0x000000000000de10 0x000000000060de10 0x000000000060de10
0x00000000000001f0 0x00000000000001f0 R 1

Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .ctors .dtors .jcr .dynamic .got

Here, PHDR segment contains 0 sections, INTERP segment contains .interp section, and the first LOAD segment contains a whole bunch of sections.

Further reading with a nice illustration:

Sample Image

How do I remove defined symbols from the .dynsym table in my executable?


I see that my executable has defined symbols (size>0) and undefined symbols (size=0, coming from library).

The size of the symbol has nothing to do with whether the symbol is defined or not.

I don't need or want to export any symbols from my executable

Normally (in the absence of --export-dynamic or -rdynamic flags) the linker exports symbols from an executable only if the symbol is referenced by some shared library that you are linking, and the library will not work correctly if you manage to remove such symbol.

If you have 3000 exported symbols, it's likely that you have the -rdynamic flag. Some binaries will not work correctly without this flag (usually binaries that load plugins at runtime, but don't link against the plugins directly). Often there are better solutions for such binaries, such as explicitly exporting only the symbols that are required by the plugins.

relationship between VMA and ELF segments

A VMA is a homogeneous region of virtual memory with:

  • the same permissions (PROT_EXEC, etc.);

  • the same type (MAP_SHARED/MAP_PRIVATE);

  • the same backing file (if any);

  • a consistent offset within the file.

For example, if you have a VMA which is RW and you mprotect PROT_READ (you remove the permission to write) a part in the middle of the VMA, the kernel will split the VMA in three VMAs (the first one being RW, the second R and the last RW).

Let's look at a typical VMA from an executable:

$ cat /proc/$$/maps
00400000-004f2000 r-xp 00000000 08:01 524453 /bin/bash
006f1000-006f2000 r--p 000f1000 08:01 524453 /bin/bash
006f2000-006fb000 rw-p 000f2000 08:01 524453 /bin/bash
006fb000-00702000 rw-p 00000000 00:00 0
[...]

The first VMA is the text segment. The second, third and fourth VMAs are the data segment.

Anonymous mapping for .bss

At the beginning of the process, you will have something like this:

$ cat /proc/$$/maps
00400000-004f2000 r-xp 00000000 08:01 524453 /bin/bash
006f1000-006fb000 rw-p 000f1000 08:01 524453 /bin/bash
006fb000-00702000 rw-p 00000000 00:00 0
[...]
  • 006f1000-006fb000 is the part of the text segment which comes from the executable file.

  • 006fb000-00702000 is not present in the executable file because it is initially filled with zeroes. The non-initialized variables of the process are all grouped together (in the .bss segment) and are not represented in the executable file in order to save space (1).

This come from the PT_LOAD entries of the program header table of the executable file (readelf -l) which describe the segments to map into memory:


Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
[...]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000f1a74 0x00000000000f1a74 R E 200000
LOAD 0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0
0x0000000000009068 0x000000000000f298 RW 200000
[...]

If you look at the corresponding PT_LOAD entry, you will notice that a part of the the segment is not represented in the file (because the file size is smaller than the memory size).

The part of the data segment which is not in the executable file is initialized with zeros: the dynamic linker uses a MAP_ANONYMOUS mapping for this part of the data segment. This is why is appears as a separate VMA (it does not have the same backing file).

Relocation protection (PT_GNU_RELRO)

When the dynamic, linker has finished doing the relocations (2), it might mark some part of the data segment (the .got section among others) as read-only in order to avoid GOT-poisoning attacks or bugs. The section of the data segment which should be protected after the relocations in described by the PT_GNU_RELRO entry of the program header table: the dynamic linker mprotect(addr, len, PROT_READ) the given region after finishing the relocations (3). This mprotect call splits the second VMA in two VMAs (the first one R and the second one RW).


Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
[...]
GNU_RELRO 0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0
0x0000000000000220 0x0000000000000220 R
[...]

Summary

The VMAs


00400000-004f2000 r-xp 00000000 08:01 524453 /bin/bash
006f1000-006f2000 r--p 000f1000 08:01 524453 /bin/bash
006f2000-006fb000 rw-p 000f2000 08:01 524453 /bin/bash
006fb000-00702000 rw-p 00000000 00:00 0

are derived from the VirtAddr, MemSiz and Flags fields of the PT_LOAD and PT_GNU_RELRO entries:


Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
[...]
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000f1a74 0x00000000000f1a74 R E 200000
LOAD 0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0
0x0000000000009068 0x000000000000f298 RW 200000
[...]
GNU_RELRO 0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0
0x0000000000000220 0x0000000000000220 R
[...]
  1. First all PT_LOAD entries are processes. Each of them triggers the creation of one VMA by using a mmap(). In addition, if MemSiz > FileSiz, it might create an additional anonymous VMA.

  2. Then all (well there is only once in pratice) PT_GNU_RELRO are processes. Each of them triggers a mprotect() call which might split an existing VMA into different VMAs.

In order to do what you want, the correct way is probably to simulate the mmap and mprotect calls:

// Virtual Memory Area:
struct Vma {
std::uint64_t addr, length;
std::string file_name;
int prot;
int flags;
std::uint64_t offset;
};

// Virtual Address Space:
class Vas {
private:
std::list<Vma> vmas_;
public:
Vma& mmap(
std::uint64_t addr, std::uint64_t length, int prot,
int flags, int fd, off_t offset);
int mprotect(std::uint64_t addr, std::uint64_t len, int prot);
std::list<Vma> const& vmas() const { return vmas_; }
};

for (Elf32_Phdr const& h : phdrs)
if (h.p_type == PT_LOAD) {
vas.mmap(...);
if (anon_size)
vas.mmap(...);
}
for (Elf32_Phdr const& h : phdrs)
if (h.p_type == PT_GNU_RELRO)
vas.mprotect(...);

Some examples of computations

The addresses are slightly different because the VMAs are page-aligned (3) (using 4Kio = 0x1000 pages for x86 and x86_64):

The first VMA is describes by the first PT_LOAD entry:

vma[0].start = page_floor(load[0].virt_addr)
= 0x400000

vma[0].end = page_ceil(load[1].virt_addr + load[1].phys_size)
= page_ceil(0x400000 + 0xf1a74)
= page_ceil(0x4f1a74)
= 0x4f2000

The next VMA is the part of the data segment which as been protected and is described by PT_GNU_RELRO:

vma[1].start = page_floor(relro[0].virt_addr)
= page_floor(0xf1de0)
= 0x6f1000

vma[1].end = page_ceil(relro[0].virt_addr + relo[0].mem_size)
= page_ceil(0x6f1de0 + 0x220)
= page_ceil(0x6f2000)
= 0x6f2000

[...]

Correspondence with the sections

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274
0000000000000024 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000400298 00000298
0000000000004894 0000000000000000 A 5 0 8
[ 5] .dynsym DYNSYM 0000000000404b30 00004b30
000000000000d6c8 0000000000000018 A 6 1 8
[ 6] .dynstr STRTAB 00000000004121f8 000121f8
0000000000008c25 0000000000000000 A 0 0 1
[ 7] .gnu.version VERSYM 000000000041ae1e 0001ae1e
00000000000011e6 0000000000000002 A 5 0 2
[ 8] .gnu.version_r VERNEED 000000000041c008 0001c008
00000000000000b0 0000000000000000 A 6 2 8
[ 9] .rela.dyn RELA 000000000041c0b8 0001c0b8
00000000000000c0 0000000000000018 A 5 0 8
[10] .rela.plt RELA 000000000041c178 0001c178
00000000000013f8 0000000000000018 AI 5 12 8
[11] .init PROGBITS 000000000041d570 0001d570
000000000000001a 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 000000000041d590 0001d590
0000000000000d60 0000000000000010 AX 0 0 16
[13] .text PROGBITS 000000000041e2f0 0001e2f0
0000000000099c42 0000000000000000 AX 0 0 16
[14] .fini PROGBITS 00000000004b7f34 000b7f34
0000000000000009 0000000000000000 AX 0 0 4
[15] .rodata PROGBITS 00000000004b7f40 000b7f40
000000000001ebb0 0000000000000000 A 0 0 64
[16] .eh_frame_hdr PROGBITS 00000000004d6af0 000d6af0
000000000000407c 0000000000000000 A 0 0 4
[17] .eh_frame PROGBITS 00000000004dab70 000dab70
0000000000016f04 0000000000000000 A 0 0 8
[18] .init_array INIT_ARRAY 00000000006f1de0 000f1de0
0000000000000008 0000000000000000 WA 0 0 8
[19] .fini_array FINI_ARRAY 00000000006f1de8 000f1de8
0000000000000008 0000000000000000 WA 0 0 8
[20] .jcr PROGBITS 00000000006f1df0 000f1df0
0000000000000008 0000000000000000 WA 0 0 8
[21] .dynamic DYNAMIC 00000000006f1df8 000f1df8
0000000000000200 0000000000000010 WA 6 0 8
[22] .got PROGBITS 00000000006f1ff8 000f1ff8
0000000000000008 0000000000000008 WA 0 0 8
[23] .got.plt PROGBITS 00000000006f2000 000f2000
00000000000006c0 0000000000000008 WA 0 0 8
[24] .data PROGBITS 00000000006f26c0 000f26c0
0000000000008788 0000000000000000 WA 0 0 64
[25] .bss NOBITS 00000000006fae80 000fae48
00000000000061f8 0000000000000000 WA 0 0 64
[26] .shstrtab STRTAB 0000000000000000 000fae48
00000000000000ef 0000000000000000 0 0 1

It you compare the Address of the sections (readelf -S) with the ranges of the VMAs, you find the mappings:


00400000-004f2000 r-xp /bin/bash : .interp, .note.ABI-tag, .note.gnu.build-id, .gnu.hash, .dynsym, .dynstr, .gnu.version, .gnu.version_r, .rela.dyn, .rela.plt, .init, .plt, .text, .fini, .rodata.eh_frame_hdr, .eh_frame
006f1000-006f2000 r--p /bin/bash : .init_array, .fini_array, .jcr, .dynamic, .got
006f2000-006fb000 rw-p /bin/bash : .got.plt, .data, beginning of .bss
006fb000-00702000 rw-p - : rest of .bss

Notes

(1): In fact, its more complicated: a part of the .bss section might be represented in the executable file for page alignment reasons.

(2): In fact, when it has finished doing the non-lazy relocations.

(3): MMU operations are using the page-granularity so the memory ranges of mmap(), mprotect(), munmap() calls are extended to cover full-pages.



Related Topics



Leave a reply



Submit