Base Address of ELF

ELF Program Headers and Virtual Address

I see virtual address value in Program Headers is actually from kernel Virtual address space.

No, you do not see that. None of the addresses in your output have anything to do with the kernel.

What you are looking at is a Position Independent executable, which can be loaded anywhere in memory.

I'm not able to find out how loader calculates the Base Address Value?

The loader doesn't load the main executable (the kernel does), and doesn't decide the load address.

Given that the file type is ET_DYN, the kernel performs the equivalent of mmap(0, ...) (without the MAP_FIXED flag) and selects a suitable virtual address, which is then communicated to the loader in the aux vector.
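
To illustrate how the chosen address can be recovered from the aux vector, here is a minimal Python sketch. It assumes a 64-bit little-endian Linux process, where the aux vector is a sequence of (type, value) pairs of 8-byte unsigned integers terminated by AT_NULL (on Linux the raw bytes can be read from /proc/self/auxv). The helper names parse_auxv and main_load_base are invented for this example. The base is taken as AT_PHDR minus the p_vaddr of the PT_PHDR segment, because AT_PHDR is where the program headers actually ended up in memory.

```python
import struct

# Aux vector entry types, as defined in <elf.h>.
AT_NULL = 0   # end of vector
AT_PHDR = 3   # runtime address of the program headers
AT_ENTRY = 9  # runtime address of the entry point


def parse_auxv(raw: bytes) -> dict:
    """Decode raw aux vector bytes (assumed 64-bit little-endian) into a dict."""
    entries = {}
    for a_type, a_val in struct.iter_unpack("<QQ", raw):
        if a_type == AT_NULL:
            break  # terminator: nothing meaningful follows
        entries[a_type] = a_val
    return entries


def main_load_base(auxv: dict, phdr_vaddr: int) -> int:
    """Base = where the program headers are, minus where the file says they are."""
    return auxv[AT_PHDR] - phdr_vaddr
```

For example, with a PT_PHDR p_vaddr of 0x40 and a hypothetical AT_PHDR value of 0x555555554040, main_load_base returns 0x555555554000. Note that AT_BASE in the same vector is the base of the interpreter (ld.so), not of the main executable.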

But I see 4 PT_LOAD program headers in my example. What are two PT_LOAD program headers used for?

See this answer.

In an ELF file, how does the address for _start get determined?

The _start symbol may be defined in any object file. Normally it is supplied automatically by the C runtime startup code, which in turn calls main in a C program. You can also define it yourself, for instance in an assembler source file:

.globl _start
_start:
    // assembly here

When the linker has processed all object files, it looks for the _start symbol and puts its value in the e_entry field of the ELF header. The loader takes the address from this field and transfers control to it after it has finished loading all segments into memory and is ready to execute the file.
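
To see the value the linker stored, you can read e_entry straight out of the ELF header. This is a minimal sketch, not a full parser: it relies only on the fixed layout of the header (e_ident is 16 bytes, followed by e_type, e_machine and e_version, so e_entry starts at byte 24), and the function name read_entry is made up for this example.

```python
import struct


def read_entry(header: bytes) -> int:
    """Return e_entry from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2            # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    fmt = endian + ("Q" if is_64bit else "I")
    (entry,) = struct.unpack_from(fmt, header, 24)
    return entry
```

Passing it the first 64 bytes of a binary, read in binary mode, should give the same number that readelf -h reports as the entry point address.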

Why can I get the elf entry without the help of a base address?

The linker, as a rule, doesn't really care where in memory it places anything. Its primary job is to make sure that all memory references are consistent, no matter how memory is laid out. The purpose of a linker script is to tell the linker how to lay out the memory. If you don't provide a linker script, it will use its own defaults. In other words, the linker doesn't know or care whether you have something already loaded at 0x4000. It's your job to know how your memory is laid out, and to provide a linker script if you want it laid out in a specific way.

As for the -no-pie bit of the question, that comes down to how position-independent and non-position-independent executables are loaded. Your UEFI bootloader is, among other things, a loader. The e_type field in the ELF header tells the loader which kind of file it has: ET_EXEC for a fixed-address executable, ET_DYN for a PIE. If it is not a PIE, the loader has to use the exact addresses encoded in the file, and the elf->entry pointer is already correct. If it is a PIE, the loader can place it at whatever memory address it likes, and the elf->entry pointer is relative to the address at which the executable is loaded. That's why you need to use base + elf->entry when you don't provide the -no-pie flag.
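
The rule above boils down to a couple of lines. Here is a sketch in Python; runtime_entry is a hypothetical helper name, and load_base is assumed to be the difference between where the loader placed the image and the lowest PT_LOAD p_vaddr in the file (usually 0 for a PIE).

```python
ET_EXEC = 2  # fixed-address executable (e.g. linked with -no-pie)
ET_DYN = 3   # position-independent executable or shared object


def runtime_entry(e_type: int, e_entry: int, load_base: int) -> int:
    """Return the address the loader should jump to."""
    if e_type == ET_EXEC:
        return e_entry              # addresses in the file are absolute
    if e_type == ET_DYN:
        return load_base + e_entry  # addresses in the file are offsets from the base
    raise ValueError(f"unsupported e_type: {e_type}")
```

A loader that checks e_type this way works for both kinds of binaries, instead of hard-coding one of the two cases.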

Elegant way to set base address of ELF image with Linux binutils?

The ELF entry point can be set in the linker script, which can be passed to ld with -T.

Doing a bogus link with --verbose will show you the default linker script (which might be system specific, but in reality it is not that bad: one per arch per OS for the most part).

Note that there might be additional constraints (like the entry point residing in a text/code segment).

For a practical example of lugging along custom linker files, see the Free Pascal project, which does this to implement resources.

How to get the heap start address of an ELF binary

Question is how can i get heap & stack base address of a live process from within it

Note that neither the heap nor the stack has anything to do with the ELF format, or libelf.

There is no such thing as "heap base address" -- most modern heap allocators will perform multiple mmap calls to obtain memory from the OS, then "dole" it out to various malloc requests.

i did also try at the very start in main call sbrk(0)

"Legacy" malloc used to obtain memory using sbrk(), but few modern ones do. If the malloc you are using does use sbrk, then calling sbrk(0) near the start of main is a usable approximation.

For the main thread stack, you would want to do the same. A good first approximation is taking &argc and rounding it up to a page boundary.

If you want a better approximation, you could use the fact that on Linux (and possibly other ELF platforms) the kernel puts specific values on the stack before invoking the entry point. Iterating through the __environ values looking for the highest address will give a better approximation.
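
The arithmetic behind both approximations is the same: take the highest stack address you know about and round it up to a page boundary. Here is a small sketch of just that arithmetic in Python. The function names are invented for this example, the addresses would come from your own process (for instance &argc and the environment string pointers), and a page size of 4096 is assumed (it must be a power of two for the masking trick to work).

```python
def round_up_to_page(addr: int, page_size: int = 4096) -> int:
    """Round addr up to the next multiple of page_size (a power of two)."""
    return (addr + page_size - 1) & ~(page_size - 1)


def approx_stack_top(known_stack_addrs, page_size: int = 4096) -> int:
    """Estimate the top of the stack from addresses known to live on it."""
    return round_up_to_page(max(known_stack_addrs), page_size)
```

The more addresses near the top of the stack you feed in, the closer the estimate gets, which is why scanning the environment pointers improves on using &argc alone.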

In an ELF binary, is there any simple method to get the memory address from an offset?

ELF programs have a program header, which lists PT_LOAD segments (struct Elf32_Phdr or struct Elf64_Phdr). These have both a file offset and length (p_offset and p_filesz members) and a virtual address and length (p_vaddr and p_memsz). The point is that the region identified by the file offset and length becomes available at run time at the specified virtual address. The virtual address is relative to the base address of the object in memory.

You can view the program headers using readelf -l:

Elf file type is DYN (Shared object file)
Entry point 0x1670
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000001f8 0x00000000000001f8  R E    0x8
  INTERP         0x0000000000000238 0x0000000000000238 0x0000000000000238
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x000000000000627c 0x000000000000627c  R E    0x200000
  LOAD           0x0000000000006d68 0x0000000000206d68 0x0000000000206d68
                 0x00000000000004b8 0x0000000000000658  RW     0x200000
…

In this case, there are two load segments, one readable and executable (the program code), and one readable and writable (data and relocations).

Not all parts of the binary are covered by PT_LOAD segments and thus mapped by the loader at run time. If the data is in an unallocated section, it will just not be in memory (unless you read it from disk by other means).

But if the data is allocated, then it will fall into one of the load segments, and once you have the base address, you can use the information in the load segment to compute the virtual address from the file offset.
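
That computation can be sketched as follows. This is an illustration in Python rather than a complete tool: the LoadSegment type and offset_to_vaddr function are names made up for this example, and the segment values are copied by hand from the readelf output above instead of being parsed from the file.

```python
from typing import NamedTuple, Optional, Sequence


class LoadSegment(NamedTuple):
    p_offset: int  # where the segment starts in the file
    p_vaddr: int   # where it is mapped, relative to the base address
    p_filesz: int  # how many bytes of the file it covers


def offset_to_vaddr(segments: Sequence[LoadSegment],
                    file_offset: int, base: int = 0) -> Optional[int]:
    """Map a file offset to a virtual address, or None if it is not loaded."""
    for seg in segments:
        if seg.p_offset <= file_offset < seg.p_offset + seg.p_filesz:
            return base + seg.p_vaddr + (file_offset - seg.p_offset)
    return None  # offset is not covered by any PT_LOAD segment


# The two PT_LOAD segments from the readelf -l output above.
SEGMENTS = [
    LoadSegment(p_offset=0x0, p_vaddr=0x0, p_filesz=0x627c),
    LoadSegment(p_offset=0x6d68, p_vaddr=0x206d68, p_filesz=0x4b8),
]
```

With base left at 0 the result is the link-time address; pass the real base of the loaded object to get the run-time address. An offset such as 0x6500 falls between the two segments, so the function returns None for it.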