Elf Program Header Virtual Address and File Offset


Why does it start at 0x1000 bytes AFTER THE END of the first LOAD segment?

If it didn't, it would have to start at 0x08048154, but it can't: the two LOAD segments have different flags specified for their mapping (the first is mapped with PROT_READ|PROT_EXEC, the second with PROT_READ|PROT_WRITE). Protections, being part of the page table, can only apply to whole pages, not to parts of a page. Therefore, mappings with different protections must belong to different pages.

virtual address mod page alignment == file offset mod page alignment

But I don't know why this relationship must be satisfied.

The LOAD segments are mmapped directly from the file. The actual mapping of the second LOAD segment performed for your example will look something like this (you can run your program under strace and see that it does):

mmap(0x08049000, 0x158, PROT_READ|PROT_WRITE, MAP_PRIVATE, $fd, 0)

If you try to make the virtual address or the offset non-page-aligned, mmap will fail with EINVAL. The only way to make file data appear in virtual memory at the desired address is to make VirtAddr congruent to Offset modulo Align, and that is exactly what the static linker does.
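To make the rounding concrete, here is a minimal Python sketch of how page-aligned mmap arguments could be derived from a segment's header fields. The helper name mmap_args is hypothetical, and the segment values (p_vaddr 0x08049154, p_offset 0x154, p_filesz 0x4) are assumptions chosen so that the result matches the mmap call shown above; this is not the kernel's actual code.

```python
def mmap_args(p_vaddr, p_offset, p_filesz, p_align=0x1000):
    """Return (address, length, file_offset) for a page-aligned mapping.

    Hypothetical helper: it mirrors the rounding a loader has to do,
    not the kernel's real implementation.
    """
    if p_vaddr % p_align != p_offset % p_align:
        raise ValueError("p_vaddr and p_offset are not congruent modulo p_align")
    slack = p_vaddr % p_align      # bytes between the page start and the segment start
    return (p_vaddr - slack,       # mapping address, rounded down to a page boundary
            p_filesz + slack,      # the length grows by the same amount
            p_offset - slack)      # file offset, rounded down identically


# Assumed values for the second LOAD segment of the example.
addr, length, offset = mmap_args(0x08049154, 0x154, 0x4)
print(hex(addr), hex(length), hex(offset))
```

Because address and offset are rounded down by the same amount, the byte at file offset 0x154 lands at 0x08049154 only when the two values are congruent modulo the alignment; otherwise the sketch raises ValueError instead of producing a mapping that puts the data at the wrong address.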

Note that for such a small first LOAD segment, the entire first segment also appears at the beginning of the second mapping (with the wrong protections). But the program is not supposed to access anything in the [0x08049000,0x08049154) range. In general, it is almost always the case that there is some "junk" before the start of actual data in the second LOAD segment (unless you get really lucky and the first LOAD segment ends on a page boundary).

ELF Program Headers and Virtual Address

I see virtual address value in Program Headers is actually from kernel Virtual address space.

No, you do not see that. None of the addresses in your output have anything to do with the kernel.

What you are looking at is a Position Independent executable, which can be loaded anywhere in memory.

I'm not able to find out how loader calculates the Base Address Value?

The loader doesn't load the main executable (the kernel does), and doesn't decide the load address.

Given that the file type is ET_DYN, the kernel performs an equivalent of mmap(0, ...) (without MAP_FIXED flag), and selects a suitable virtual address, which is then communicated to the loader in the aux vector.
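Whether a file is ET_DYN can be checked by reading e_type from the ELF header. The following is a minimal sketch that assumes a little-endian file; the function name elf_type is hypothetical, while the constants ET_EXEC = 2 and ET_DYN = 3 come from the ELF specification.

```python
import struct

ET_EXEC = 2  # executable linked at a fixed address
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header: bytes) -> int:
    """Return e_type from the first bytes of an ELF file (little-endian assumed)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_type is a 16-bit field stored right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from("<H", header, 16)
    return e_type
```

For a file on disk, passing the first 18 bytes (for example, the result of f.read(18) on a file opened in binary mode) is enough; a result equal to ET_DYN means the p_vaddr values are relative to a base address chosen at load time.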

But I see 4 PT_LOAD program headers in my example. What are two PT_LOAD program headers used for?

See this answer.

Getting difference between virtual address and Offset in an ELF file

how is this calculated?

You just calculated it yourself: 0x400238 - 0x238 == 0x400000. Your question is probably "why is this particular address selected?".

This is the default link-at address for Linux x86_64 position-dependent binaries. You can change that address with the -Ttext=... linker flag. The default is different for ix86 (32-bit) binaries: it's 0x8048000.

I am not sure why these particular defaults were chosen.

Is there a programmatic way of determining this?

Sure: read the Elf64_Ehdr from the start of the file. It will tell you the offset of the start of the program headers (.e_phoff). Seek to that offset, and read the Elf64_Phdrs. Now iterate over them: their .p_vaddr and .p_offset hold the same values that readelf prints, and subtracting one from the other gives you the difference you computed by hand.
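The steps above can be sketched in Python with the struct module. This is a minimal illustration that assumes a little-endian ELF64 file and does no validation; the function name load_segments is hypothetical, and the struct layouts follow the Elf64_Ehdr and Elf64_Phdr definitions.

```python
import struct

PT_LOAD = 1
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian, 64 bytes
PHDR64 = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, little-endian, 56 bytes


def load_segments(data: bytes):
    """Yield (p_offset, p_vaddr) for every PT_LOAD header in an ELF64 image."""
    ehdr = EHDR64.unpack_from(data, 0)
    # Field order: e_ident, e_type, e_machine, e_version, e_entry, e_phoff,
    # e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, ...
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    for i in range(e_phnum):
        p_type, _p_flags, p_offset, p_vaddr, *_rest = PHDR64.unpack_from(
            data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            yield p_offset, p_vaddr
```

Given the bytes of a binary read in binary mode, printing hex(p_vaddr - p_offset) for each yielded pair shows the difference per LOAD segment.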

P.S. You are looking at sections, which are not used at runtime and are not guaranteed to be present in a fully-linked binary. You should be looking at program segments instead. Use readelf -Wl a.out to examine them.

Calculate the entry point of an ELF file as a physical address (offset from 0)

My question is: what is the actual "formula" of how exactly you get the entry point address of the _start procedure as an offset from byte 0?

First, forget about sections. Only segments matter at runtime.

Second, use readelf -Wl to look at segments. They tell you exactly which chunk of file ([.p_offset, .p_offset + .p_filesz)) goes into which in-memory region ([.p_vaddr, .p_vaddr + .p_memsz)).

The exact calculation of "at which offset in the file does _start reside" is:

Find Elf32_Phdr which "covers" the address contained in Elf32_Ehdr.e_entry.
Using that phdr, file offset of _start is: ehdr->e_entry - phdr->p_vaddr + phdr->p_offset.

Update:

So, am I always looking for the 1st program header?

No.

Also by "covers" you mean that the 1st phdr->p_vaddr is always equal to e_entry?

No.

You are looking for the program header (describing the relationship between in-memory and on-file data) which overlaps ehdr->e_entry in memory. That is, you are looking for the segment for which phdr->p_vaddr <= ehdr->e_entry && ehdr->e_entry < phdr->p_vaddr + phdr->p_memsz. This segment is often the first, but that is in no way guaranteed. See also this answer.
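The lookup described above can be sketched as a small Python function. The name entry_file_offset and the tuple layout are hypothetical, and the segment values in the example are made up for illustration; only the covering condition and the formula come from the answer.

```python
def entry_file_offset(e_entry, phdrs):
    """Return the file offset of the entry point.

    phdrs is an iterable of (p_vaddr, p_offset, p_memsz) tuples, one per
    PT_LOAD segment (a hypothetical layout chosen for this sketch).
    """
    for p_vaddr, p_offset, p_memsz in phdrs:
        # "Covers" means the entry address falls inside the segment in memory.
        if p_vaddr <= e_entry < p_vaddr + p_memsz:
            return e_entry - p_vaddr + p_offset
    raise ValueError("no segment covers e_entry")


# Made-up segments: here the entry point lies in the second one, not the first.
segments = [(0x400000, 0x0, 0x800), (0x401800, 0x800, 0x500)]
print(hex(entry_file_offset(0x401810, segments)))
```

With these made-up values the first segment does not cover 0x401810, so the function moves on to the second one, which is the case the "No." answers above are warning about.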

How to understand the difference between Offset and VirtAddr in Program Headers in ELF?

Generally speaking:

p_offset - offset of the segment within the ELF file

p_vaddr - virtual address at which the segment is placed once it is loaded into memory (that is, by the time the program starts running)

They will not always be the same, those addresses can be configured using linker script for example. Refer to this.

As for shared library addresses once the library is loaded into a process address space: these depend on the layout of that address space, ASLR, and more, but it's safe to say that the dynamic loader will choose new addresses, so the runtime (execution) address generally differs from the p_vaddr stored in the file.

ELF program header offset

The ELF specification requires that for demand paged executables, the file offset and virtual address for a segment must indeed match in the lower-order bits.

These restrictions are the same that the mmap() system call places on mappings -- it only accepts mappings at an offset in the file that is a multiple of the page size. When mapping an ELF file, the segments are extended to the nearest page boundary, so the lower-order bits are effectively ignored except for the segment size calculation.
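As a rough illustration of "extended to the nearest page boundary", the following Python sketch computes the page-rounded memory range that a segment occupies. The function name page_extent is hypothetical, a power-of-two page size of 0x1000 is assumed, and the example values are illustrative only.

```python
def page_extent(p_vaddr, p_memsz, page_size=0x1000):
    """Return the [start, end) range of whole pages covering a segment.

    Assumes page_size is a power of two.
    """
    mask = page_size - 1
    start = p_vaddr & ~mask                     # round the start down
    end = (p_vaddr + p_memsz + mask) & ~mask    # round the end up
    return start, end


start, end = page_extent(0x08049154, 0x4)
print(hex(start), hex(end))
```

The low-order bits of p_vaddr drop out of the start address but still affect where the data sits inside the page and how large the mapping has to be.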

One possible rationale for this is that the underlying device may already be memory-mapped -- such as a frame buffer, or flash memory -- in which case it would impose a substantial overhead to create a mapping with an offset that is not page aligned.

Why are the virtual address of the LOAD program header and the runtime virtual address shown by gdb different?

VirtAddr of LOAD header should be the virtual address of the loaded segment.

This is only true for ELF images of type ET_EXEC.

But you have an ELF image of type ET_DYN (probably a position independent executable), and these are relocated at runtime to a different virtual address.
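The relationship between the two addresses is a single constant offset, often called the load bias. The sketch below uses hypothetical function names and made-up addresses (a first LOAD segment with p_vaddr 0 observed at 0x555555554000); real values vary from run to run when ASLR is enabled.

```python
def load_bias(runtime_addr, p_vaddr):
    """Offset added to every link-time address of a relocated ET_DYN image."""
    return runtime_addr - p_vaddr


def runtime_address(bias, link_time_addr):
    """Where a link-time address ends up in the running process."""
    return bias + link_time_addr


# Made-up numbers for illustration only.
bias = load_bias(0x555555554000, 0x0)
print(hex(runtime_address(bias, 0x1139)))
```

For an ET_EXEC image the bias is zero, which is why VirtAddr and the address seen in gdb coincide there.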
