Why Linux/GNU Linker Chose Address 0x400000

Why Linux/gnu linker chose address 0x400000?

The start address is usually set by a linker script.

For example, on GNU/Linux, looking at /usr/lib/ldscripts/elf_x86_64.x we see:

...
PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000));
. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;

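The value 0x400000 is the default passed to the SEGMENT_START() function on this platform. The linker uses it unless an explicit address for the text segment is given on the command line (for example with -Ttext-segment).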
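To make the arithmetic in that script line concrete, here is a small Python model of how SEGMENT_START() picks its result. It is only an illustration: the names segment_start, initial_location and overrides are made up for this sketch and are not part of ld, and the header sizes assume a 64-bit ELF output.

```python
# Illustrative model only; none of these names exist in GNU ld itself.
ELF64_EHDR_SIZE = 64   # size of the ELF64 file header in bytes
ELF64_PHDR_SIZE = 56   # size of one ELF64 program header in bytes


def segment_start(name, default, overrides):
    """Return the explicit address given for `name`, else `default`."""
    return overrides.get(name, default)


def initial_location(overrides, phnum):
    """Model `. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS`."""
    base = segment_start("text-segment", 0x400000, overrides)
    # Assumption: SIZEOF_HEADERS covers the ELF header plus `phnum` program headers.
    return base + ELF64_EHDR_SIZE + phnum * ELF64_PHDR_SIZE
```

With no override and two program headers, initial_location({}, 2) gives 0x4000b0, which is why the first section of such an executable sits just past 0x400000 rather than exactly on it.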

You can find out more about linker scripts by browsing the linker manual:

% info ld Scripts

Why does my x64 process base address not start at 0x400000?

I learned from this link: "Why is address 0x400000 chosen as a start of text segment in x86_64"

That address is used for executables (ELF type ET_EXEC).

I only found that my bash process starts from a very high base address (0x55971cea6000). Does anyone know why?

Because your bash is a (newer) position-independent executable (ELF type ET_DYN). It behaves much like a shared library, and is relocated to a random address at runtime.

The 0x55971cea6000 address you found will vary from one execution to another. In contrast, ET_EXEC executables can only run correctly when loaded at their "linked at" address (typically 0x400000).
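If you want to check which kind of executable you have without readelf, the type can be read straight from the ELF header. The following is a minimal sketch; the function name elf_type is my own, and it only looks at the first 18 bytes of the file.

```python
import struct

ET_NAMES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(header: bytes) -> str:
    """Return the ELF type name from the leading bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] is EI_DATA: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, hex(e_type))
```

Something like elf_type(open("/bin/bash", "rb").read(18)) should report ET_DYN on a system where bash is built as a PIE, but the result depends on how your distribution built the binary.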

How does the dynamic linker choose the start address for a 64-bit process?

The dynamic linker doesn't choose the start address of the executable -- the kernel does (by the time the dynamic linker starts running, the executable has already been mmaped into memory).

The kernel looks at the .e_type in the ELF header and .p_vaddr field of the first program header and goes from there. IFF .e_type == ET_EXEC, then the kernel maps executable segments at their .p_vaddr addresses. For ET_DYN, if ASLR is in effect, the kernel performs mmaps at a random address.
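The decision described above can be mimicked in user space. This is a rough sketch, not the kernel's actual code: load_hint is a hypothetical helper, it assumes a little-endian 64-bit ELF image, and it looks at the first PT_LOAD entry rather than literally the first program header.

```python
import struct

PT_LOAD = 1
ET_EXEC, ET_DYN = 2, 3


def load_hint(image: bytes) -> str:
    """Describe where the first PT_LOAD segment would be mapped."""
    # ELF64 header offsets: e_type 16, e_phoff 32, e_phentsize 54, e_phnum 56.
    (e_type,) = struct.unpack_from("<H", image, 16)
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    for i in range(e_phnum):
        # Elf64_Phdr starts with p_type, p_flags, p_offset, p_vaddr.
        p_type, _flags, _offset, p_vaddr = struct.unpack_from(
            "<IIQQ", image, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        if e_type == ET_EXEC:
            return f"fixed at {p_vaddr:#x}"
        if e_type == ET_DYN:
            return f"kernel-chosen base + {p_vaddr:#x}"
        break
    return "not a loadable executable"
```

For a classic ET_EXEC binary this reports its "linked at" address, while for a PIE it only reports an offset, since the base is picked at exec time.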

Why does the entry point address not start at 0x400000?

The binary is relocatable, and the segments have been packed against each other. The .text section starts at 0x1050 in the file, and the entry point is relative to its location in the file, not the location it will ultimately be loaded at.

For a non-relocatable file try readelf -h /usr/lib/klibc/bin/sh. This file is simpler in a number of ways, including not using an interpreter but actually being loaded as-is by the kernel.

We're not in x86 anymore. Relocatable binaries are the default for everything now, not just shared libraries. It's no longer a great pain to do relocatable stuff, and in a certain way it's cheaper than non-relocatable now: we have to emit the vector table anyway, because the processor doesn't have 64-bit displacements.

As Nate Eldredge points out, compiling with -no-pie yields a non-relocatable binary, and I have verified the expected start address appears.

What is the address of the entry point of a file linked by the GNU linker?

It depends on the binary format you use. For ELF, the e_entry member of the main header is what you want.
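As an illustration for the ELF case, e_entry can be pulled out of the header by hand. This is a sketch under the usual ELF layout (e_entry at byte offset 24, 4 bytes wide for 32-bit files and 8 bytes for 64-bit ones); the function name entry_point is my own.

```python
import struct


def entry_point(header: bytes) -> int:
    """Return e_entry from the leading bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    # e_entry follows e_ident (16), e_type (2), e_machine (2), e_version (4).
    fmt = endian + ("Q" if is_64bit else "I")
    (e_entry,) = struct.unpack_from(fmt, header, 24)
    return e_entry
```

Reading the first 32 bytes of the file is enough for either class. The value should match the "Entry point address" line printed by readelf -h.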

Why do kernels make use of the high logical addresses?

Before PIC (position-independent code) was popular, there were lots of statically linked programs that could only be loaded at a specified address, likely 0x400000.

To remain compatible with these programs, the kernel must not occupy that part of the address space. So the kernel is located in the high 1G of the address space.

Why does the `--oformat binary' option of the GNU linker place the `.data' segment at 0x0200000?

1) The linker probably does this because it felt like it (e.g. possibly for alignment with "2 MiB pages" on 80x86), and because you didn't provide a linker script that says to do anything other than "whatever the linker feels like".

2) I'd assume all output formats do "whatever the linker feels like" (unless the linker is told otherwise).

Note: The actual behaviour may be determined by a default linker script hidden somewhere, and may be "whatever the distribution of the OS felt like" rather than merely "whatever the linker felt like".

In any case, if you want the linker to do something specific, you need to tell the linker specifically what you want by writing a linker script. If you actually have written a "less specific than necessary" linker script then you'd need to make that script more specific.

Why doesn't Linux cache object and/or .so files when using GNU Linker?

It looks like an incorrect setting of vm.vfs_cache_pressure = 1000 was causing this misbehaviour. Setting it to 70 fixed the problem and restored good cache performance.

And the documentation explicitly recommends against increasing the value beyond 100. Unfortunately, the Internet is full of examples with insane values like 1000.

Some questions about ELF file format

xxd shows the offset of the bytes within the file on disk. objdump -D shows (tentatively) the address in memory where those bytes will be loaded when the program is run. It is common for them to differ by a round number. In particular, 0x400000 may correspond to one higher-level page table entry; see "Why Linux/gnu linker chose address 0x400000?", which is for x86-64, but I think ARM64 is similar (haven't checked). It doesn't have anything to do with the fact that 0x40 is ASCII @; that's just a coincidence.
Note that if ASLR is in use, the actual memory address will be randomly chosen every time the program is run, and will not match what objdump shows you, though the difference will still be a multiple of the page size.
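The relationship between the xxd offset and the objdump address can be written down as a small calculation. This is a sketch: file_offset_to_vaddr is a hypothetical helper, and the segment tuples are assumed to come from the PT_LOAD entries that readelf -l prints.

```python
def file_offset_to_vaddr(segments, offset):
    """Map a file offset to its link-time virtual address.

    `segments` is a list of (p_offset, p_vaddr, p_filesz) tuples taken from
    the PT_LOAD program headers.
    """
    for p_offset, p_vaddr, p_filesz in segments:
        if p_offset <= offset < p_offset + p_filesz:
            return p_vaddr + (offset - p_offset)
    return None  # offset is not part of any loadable segment
```

For a segment with p_offset 0 and p_vaddr 0x400000, file offset 0x78 maps to 0x400078. Under ASLR the run-time address is this value plus whatever load base the kernel picked.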
