ELF program header virtual address and file offset
Why does it start at 0x1000 bytes AFTER THE END of the first LOAD segment?
If it didn't, it would have to start at 0x08048154, but it can't: the two LOAD segments have different flags specified for their mapping (the first is mapped with PROT_READ|PROT_EXEC, the second with PROT_READ|PROT_WRITE). Protections (being part of the page table) can only apply to whole pages, not parts of a page. Therefore, mappings with different protections must belong to different pages.
virtual address mod page alignment == file offset mod page alignment
But I don't know why this relationship must be satisfied.
The LOAD segments are directly mmaped from the file. The actual mapping of the second LOAD segment performed for your example will look something like this (you can run your program under strace and see that it does):
mmap(0x08049000, 0x158, PROT_READ|PROT_WRITE, MAP_PRIVATE, $fd, 0)
If you try to make the virtual address or the offset non-page-aligned, mmap will fail with EINVAL. The only way to make file data appear in virtual memory at the desired address is to make VirtAddr congruent to Offset modulo Align, and that is exactly what the static linker does.
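The congruence requirement can be checked with a few lines of arithmetic. This is a minimal sketch, not anything the linker actually runs: the function name is_congruent is made up for illustration, and the sample values are taken from the example above (second LOAD segment at 0x08049154, assumed file offset 0x154, alignment 0x1000).

```python
def is_congruent(vaddr: int, offset: int, align: int) -> bool:
    """Return True if vaddr and offset are congruent modulo align."""
    if align <= 1:
        # In the ELF spec, a p_align of 0 or 1 means no alignment is required.
        return True
    return vaddr % align == offset % align


# The second LOAD segment from the example: both values end in 0x154.
print(is_congruent(0x08049154, 0x154, 0x1000))

# Placing the same file data at the start of a page would break the rule,
# because 0x000 != 0x154 modulo 0x1000.
print(is_congruent(0x08049000, 0x154, 0x1000))
```

The first call reports that the segment can be mapped directly from the file; the second shows a placement that no page-aligned mmap could produce.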
Note that for such a small first LOAD segment, the entire first segment also appears at the beginning of the second mapping (with the wrong protections). But the program is not supposed to access anything in the [0x08049000,0x08049154) range. In general, it is almost always the case that there is some "junk" before the start of actual data in the second LOAD segment (unless you get really lucky and the first LOAD segment ends on a page boundary).
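To see where the "junk" comes from, here is a rough sketch of how a segment's address, offset, and length get rounded down to a page boundary before mapping. The helper page_aligned_mapping is hypothetical, 4 KiB pages are assumed, and the file size of 0x4 is inferred from the mmap call above (0x158 - 0x154) rather than read from the real binary.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def page_aligned_mapping(vaddr, offset, filesz, page_size=PAGE_SIZE):
    """Round a segment down to a page boundary.

    Returns (map_addr, map_offset, map_len, junk), where junk is the number
    of bytes in the mapping that precede the segment's real data.
    """
    if vaddr % page_size != offset % page_size:
        raise ValueError("vaddr and offset are not congruent modulo the page size")
    junk = vaddr % page_size
    return vaddr - junk, offset - junk, filesz + junk, junk


addr, off, length, junk = page_aligned_mapping(0x08049154, 0x154, 0x4)
print(hex(addr), hex(off), hex(length), hex(junk))
# prints: 0x8049000 0x0 0x158 0x154
```

The result matches the mmap call shown earlier: address 0x08049000, length 0x158, file offset 0, with 0x154 bytes of junk (the first segment's contents) ahead of the real data.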
See also mmap man page.
ELF Program Headers and Virtual Address
I see virtual address value in Program Headers is actually from kernel Virtual address space.
No, you do not see that. None of the addresses in your output have anything to do with the kernel.
What you are looking at is a Position Independent executable, which can be loaded anywhere in memory.
I'm not able to find out how loader calculates the Base Address Value?
The loader doesn't load the main executable (the kernel does), and doesn't decide the load address.
Given that the file type is ET_DYN, the kernel performs an equivalent of mmap(0, ...) (without the MAP_FIXED flag), and selects a suitable virtual address, which is then communicated to the loader in the aux vector.
But I see 4 PT_LOAD program headers in my example. What are two PT_LOAD program headers used for?
See this answer.
Getting difference between virtual address and Offset in an ELF file
how is this calculated?
You just calculated it yourself: 0x400238 - 0x238 == 0x400000. Your question is probably "why is this particular address selected?".
This is the default link-at address for Linux x86_64 position dependent binaries. You can change that address with the -Ttext=... linker flag. The default is different for ix86 (32-bit) binaries: it's 0x8048000.
I am not sure why these particular defaults were chosen.
Is there a programmatic way of determining this?
Sure: read the Elf64_Ehdr from the start of the file. It will tell you the offset to the start of the program headers (.e_phoff). Seek to that offset, and read the Elf64_Phdrs. Now iterate over them: their .p_vaddr and .p_offset fields hold the same values that readelf displays, and .p_vaddr - .p_offset is the difference you computed by hand.
P.S. You are looking at sections, which are not used at runtime and are not guaranteed to be present in a fully-linked binary. You should be looking at segments instead. Use readelf -Wl a.out to examine them.
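The steps above can be sketched in Python with the struct module. This is only an illustration under stated assumptions: it handles little-endian ELF64 files only, and the names EHDR64, PHDR64, and load_deltas are made up here; the field layouts follow the Elf64_Ehdr and Elf64_Phdr definitions.

```python
import struct

# Elf64_Ehdr: e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
# e_shstrndx (little-endian, 64 bytes).
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")
# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz,
# p_memsz, p_align (little-endian, 56 bytes).
PHDR64 = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1


def load_deltas(data: bytes):
    """Return p_vaddr - p_offset for every PT_LOAD program header."""
    fields = EHDR64.unpack_from(data, 0)
    ident, e_phoff, e_phentsize, e_phnum = fields[0], fields[5], fields[9], fields[10]
    # ident[4] == 2 is ELFCLASS64, ident[5] == 1 is ELFDATA2LSB.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    deltas = []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, _filesz, _memsz, _align) = PHDR64.unpack_from(
            data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            deltas.append(p_vaddr - p_offset)
    return deltas
```

Calling load_deltas(open("a.out", "rb").read()) on a position dependent x86_64 binary linked at the default address would be expected to yield 0x400000 for the first LOAD segment, matching the subtraction above.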
Calculate the entry point of an ELF file as a physical address (offset from 0)
My question is: what is the actual "formula" of how exactly you get the entry point address of the _start procedure as an offset from byte 0?
First, forget about sections. Only segments matter at runtime.
Second, use readelf -Wl to look at segments. They tell you exactly which chunk of the file ([.p_offset, .p_offset + .p_filesz)) goes into which in-memory region ([.p_vaddr, .p_vaddr + .p_memsz)).
The exact calculation of "at which offset in the file does _start reside" is:
- Find the Elf32_Phdr which "covers" the address contained in Elf32_Ehdr.e_entry.
- Using that phdr, the file offset of _start is: ehdr->e_entry - phdr->p_vaddr + phdr->p_offset.
Update:
So, am I always looking for the 1st program header?
No.
Also by "covers" you mean that the 1st phdr->p_vaddr is always equal to e_entry?
No.
You are looking for the program header (describing the relationship between in-memory and on-file data) which overlaps ehdr->e_entry in memory. That is, you are looking for the segment for which phdr->p_vaddr <= ehdr->e_entry && ehdr->e_entry < phdr->p_vaddr + phdr->p_memsz. This segment is often the first, but that is in no way guaranteed. See also this answer.
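As a sketch of that lookup, the function below walks a list of segments and applies the formula e_entry - p_vaddr + p_offset to the one that covers the entry point. The function name, the tuple layout, and the sample segment values are all hypothetical. The extra check against p_filesz is an added guard: an address inside p_memsz but beyond p_filesz has no bytes in the file.

```python
PT_LOAD = 1


def entry_file_offset(e_entry, segments):
    """Map e_entry to a file offset.

    segments is a list of (p_type, p_offset, p_vaddr, p_filesz, p_memsz).
    """
    for p_type, p_offset, p_vaddr, p_filesz, p_memsz in segments:
        if p_type != PT_LOAD:
            continue
        if p_vaddr <= e_entry < p_vaddr + p_memsz:
            delta = e_entry - p_vaddr
            if delta >= p_filesz:
                raise ValueError("entry point is not backed by file data")
            return delta + p_offset
    raise ValueError("no PT_LOAD segment covers the entry point")


# Hypothetical 32-bit layout, loosely modeled on the two LOAD segments
# discussed earlier; the entry address 0x08048080 is invented.
segments = [
    (PT_LOAD, 0x000, 0x08048000, 0x154, 0x154),
    (PT_LOAD, 0x154, 0x08049154, 0x004, 0x004),
]
print(hex(entry_file_offset(0x08048080, segments)))  # prints: 0x80
```

Note that the loop does not assume the covering segment is the first one, in line with the point made above.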
How to understand the difference between Offset and VirAddr in Program Headers in elf?
Generally speaking:
- p_offset: offset of the segment within the ELF file
- p_vaddr: virtual address of the segment once it is loaded into memory (that is, by the time the program's code runs)
They will not always be the same; those addresses can be configured using a linker script, for example. Refer to this.
As for shared library addresses after the library is loaded into a process address space: this depends on the process address space, ASLR, and more, but it's safe to say that the dynamic loader will choose new runtime addresses, so p_vaddr (aka execution address) is only relative to the load base.
ELF program header offset
The ELF specification requires that for demand paged executables, the file offset and virtual address for a segment must indeed match in the lower-order bits.
These restrictions are the same that the mmap() system call places on mappings -- it only accepts mappings at an offset in the file that is a multiple of the page size. When mapping an ELF file, the segments are extended to the nearest page boundary, so the lower-order bits are effectively ignored except for the segment size calculation.
One possible rationale for this is that the underlying device may already be memory-mapped -- such as a frame buffer, or flash memory -- in which case it would impose a substantial overhead to create a mapping with an offset that is not page aligned.
why virtual address of LOAD program header and runtime virtual address shown by gdb is different?
VirtAddr of LOAD header should be the virtual address of the loaded segment.
This is only true for ELF images of type ET_EXEC.
But you have an ELF image of type ET_DYN (probably a position independent executable), and these are relocated at runtime to a different virtual address.
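For an ET_DYN image, the address gdb shows is the segment's p_vaddr shifted by whatever base the image was loaded at. A tiny sketch of that relationship follows; the function name is made up, and both numbers are hypothetical, chosen only to resemble what a 64-bit Linux session might show.

```python
def runtime_address(load_base: int, p_vaddr: int) -> int:
    """Runtime address of a segment in a relocated (ET_DYN) image."""
    return load_base + p_vaddr


# Hypothetical values: a load base picked at runtime and a LOAD segment
# whose program header says VirtAddr 0x1000.
print(hex(runtime_address(0x555555554000, 0x1000)))  # prints: 0x555555555000
```

For an ET_EXEC image the load base is effectively zero, which is why VirtAddr and the runtime address coincide there.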