What Is Segment 00 in My Linux Executable Program (64 Bits)

What is segment 00 in my Linux executable program (64 bits)?

Since it starts at file offset zero, it is probably a "padding" segment introduced to make loading the ELF more efficient.
With it in place, the .text segment is already aligned in the file the same way it must be aligned in memory.

You can force ld not to align sections both in memory and in the file with -n. You can also strip the symbols with -s.

This will reduce the size to about 352 bytes.

Now the ELF contains:

The ELF header (Needed)
The program header table (Needed)
The code (Needed)
The string table (Possibly unneeded)
The section table (Possibly unneeded)

The string table can be removed, but apparently strip can't do that.
I've removed the .shstrtab section data and all the section headers manually to shrink the size down to 144 bytes.
Consider that 64 bytes come from the ELF header, 56 from the single program header and 12 from your code, for a total of 132 bytes.

The extra 12 bytes are padding: 4 bytes at the end of the code section (easy to remove) and 8 at the end of the program header (which requires a bit of patching).
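The size accounting can be double-checked with a short Python sketch. It assumes the standard little-endian ELF64 layouts, in which a program header entry occupies 56 bytes (the readelf output further down this page reports the same entry size). The 12-byte code size and the 144-byte file size are taken from the answer; the variable names are only illustrative.

```python
import struct

# Little-endian ELF64 structure layouts ("<" disables implicit padding).
ELF64_EHDR = "<16sHHIQQQIHHHHHH"   # Elf64_Ehdr
ELF64_PHDR = "<IIQQQQQQ"           # Elf64_Phdr

ehdr_size = struct.calcsize(ELF64_EHDR)
phdr_size = struct.calcsize(ELF64_PHDR)
code_size = 12     # size of the code, as stated in the answer
file_size = 144    # file size after the manual stripping

payload = ehdr_size + phdr_size + code_size
padding = file_size - payload
print(ehdr_size, phdr_size, payload, padding)
```

Whatever remains after subtracting the two headers and the code from the file size is padding.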

Why is the text-segment of an executable offset (nonzero)?

I'd recommend one more resource to read:
https://learn.microsoft.com/en-us/windows/desktop/Debug/pe-format
which describes COFF-based executables used on MS Windows.

The memory assigned to an executable image in the process's virtual address space starts at the VA specified by IMAGE_OPTIONAL_HEADER.ImageBase,
which in most linkers defaults to 0x0040_0000.
The OS loads the headers and segments from the executable file starting at ImageBase. The starting VA of each division is rounded up to IMAGE_OPTIONAL_HEADER.SectionAlignment, which is usually 0x0000_1000.

The first division holds the headers. Windows loads the headers of the EXE file here, i.e. the DOS stub, COFF_FILE_HEADER, IMAGE_OPTIONAL_HEADER, and SECTION_HEADERs.
If the total size of those headers does not exceed 4 KB, the next available aligned VA is 0x0040_1000, and the first segment (usually .text) is loaded there.
The next segment (AKA section in MS terminology) is .data, which is loaded at 0x0040_2000, and so on.
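The rounding described above can be sketched in a few lines of Python. This is only an illustration, not what the Windows loader literally executes: the header and section sizes are made-up values, and the function names are hypothetical.

```python
IMAGE_BASE = 0x0040_0000         # common linker default, per the text
SECTION_ALIGNMENT = 0x1000       # usual alignment, per the text


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def section_starts(header_size: int, section_sizes: list[int]) -> list[int]:
    """Return the starting VA of each section placed after the headers."""
    va = IMAGE_BASE + align_up(header_size, SECTION_ALIGNMENT)
    starts = []
    for size in section_sizes:
        starts.append(va)
        va = align_up(va + size, SECTION_ALIGNMENT)
    return starts


# Hypothetical sizes: 0x400 bytes of headers, .text of 0x200, .data of 0x100.
print([hex(va) for va in section_starts(0x400, [0x200, 0x100])])
```

With headers smaller than 4 KB and small sections, this yields 0x401000 for the first section and 0x402000 for the second, matching the addresses quoted above.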

Segment-starting virtual addresses were chosen more or less arbitrarily. Notice that they are round numbers, which makes it easy to convert the segment-relative addresses shown in a listing into the virtual addresses shown in debuggers.

The virtual address space assigned to an executable image begins at IMAGE_OPTIONAL_HEADER.ImageBase, and its rounded-up size is stored in IMAGE_OPTIONAL_HEADER.SizeOfImage. Anything below or above this range can be used by the OS for other purposes: stack, heap, dynamically loaded libraries, file-memory mapping.

Why does the VirtAddr of the LOAD segment in my ELF binary show as 0x0000000000000000?

You're seeing 0 for LOAD because your ELF is position independent.

Modern versions of GCC generate Position Independent Executables by default (unless configured otherwise). If the executable is PIE, the base virtual address in the ELF headers is set to 0. When you run your program under GDB, it temporarily disables address randomization and loads your program at the default address 0x0000555555554000.

If you want to compile a non-PIE executable you can use the -no-pie -fno-pie compilation flags.
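To check which kind of executable you have without readelf, the e_type field can be read directly from the ELF header. The following is a minimal Python sketch; the function name elf_kind is hypothetical, and it only distinguishes the two types discussed here.

```python
import struct

ET_EXEC, ET_DYN = 2, 3   # e_type values from the ELF specification


def elf_kind(header: bytes) -> str:
    """Classify an ELF file from its first 18 (or more) bytes."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little endian, 2 = big endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    if e_type == ET_EXEC:
        return "ET_EXEC (fixed load address)"
    if e_type == ET_DYN:
        return "ET_DYN (PIE or shared object)"
    return f"other e_type ({e_type})"


# Usage (path is only an example):
#   with open("/bin/ls", "rb") as f:
#       print(elf_kind(f.read(64)))
```

A binary built with -no-pie should report ET_EXEC, while a default build on a PIE-by-default toolchain should report ET_DYN.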

In ELF, why do the headers need to be in one segment?

But I don't understand why Linux needs those headers to be loaded at run time.

It doesn't.

What are they used for? If they are needed for the process to run, couldn't Linux load them by itself?

To answer all of these questions, you need to look at the Linux kernel source.

In the source, you can see that in fact program headers do not need to be a part of any PT_LOAD segment, and that the kernel will read them all on its own.
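As a rough illustration of that point, the program header table can be located purely from fields in the ELF header (e_phoff, e_phentsize, e_phnum), with no reference to any PT_LOAD segment. The Python sketch below assumes a little-endian ELF64 image; both function names are hypothetical, and it is not a transcription of the kernel code.

```python
import struct

PT_LOAD = 1
PHDR_FIELDS = ("p_type", "p_flags", "p_offset", "p_vaddr",
               "p_paddr", "p_filesz", "p_memsz", "p_align")


def read_program_headers(image: bytes) -> list[dict]:
    """Find program headers using only ELF header fields."""
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    headers = []
    for i in range(e_phnum):
        values = struct.unpack_from("<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        headers.append(dict(zip(PHDR_FIELDS, values)))
    return headers


def headers_covered_by_load(image: bytes) -> bool:
    """True if some PT_LOAD file range contains the whole header table."""
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    table_end = e_phoff + e_phentsize * e_phnum
    return any(
        ph["p_type"] == PT_LOAD
        and ph["p_offset"] <= e_phoff
        and table_end <= ph["p_offset"] + ph["p_filesz"]
        for ph in read_program_headers(image)
    )
```

For a file whose only PT_LOAD starts at file offset 0x78, headers_covered_by_load returns False, yet read_program_headers still finds the table.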

Changing your original program like so:

diff -u exe.asm.orig exe.asm
--- exe.asm.orig        2021-02-07 18:54:34.449336515 -0800
+++ exe.asm     2021-02-07 18:53:19.773532451 -0800
@@ -24,9 +24,9 @@
 programHeader:
     dd  1                         ; p_type
     dd  7                         ; p_flags
-    dq  0                         ; p_offset
-    dq  $$                        ; p_vaddr
-    dq  $$                        ; p_paddr
+    dq  _start - $$               ; p_offset
+    dq  _start                    ; p_vaddr
+    dq  _start                    ; p_paddr
     dq  fileSize                  ; p_filesz
     dq  fileSize                  ; p_memsz
     dq  0x1000                    ; p_align

produces a program which runs fine, but in which the program header is not in the PT_LOAD segment:

 eu-readelf --all exe
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           AMD x86-64
  Version:                           1 (current)
  Entry point address:               0x8048078
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:
  Size of this header:               64 (bytes)
  Size of program header entries:    56 (bytes)
  Number of program headers entries: 1
  Size of section header entries:    0 (bytes)
  Number of section headers entries: 0 ([0] not available)
  Section header string table index: 0

Section Headers:
[Nr] Name                 Type         Addr             Off      Size     ES Flags Lk Inf Al

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000078 0x0000000008048078 0x0000000008048078 0x000081 0x000081 RWE 0x1000

I have tried adding padding

You didn't do that correctly. Using your "with padding" source results in the following exe-padding:

...
  Entry point address:               0x8049000
...
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x001000 0x0000000008048000 0x0000000008048000 0x000009 0x000009 RWE 0x1000

This binary is started by the kernel, and immediately jumps to the start address 0x8049000, which isn't mapped (since it's not covered by the PT_LOAD segment), resulting in immediate SIGSEGV.

To fix this, you need to adjust the entry address:

diff -u exe-padding.asm.orig exe-padding.asm
--- exe-padding.asm.orig        2021-02-07 18:57:31.800871195 -0800
+++ exe-padding.asm     2021-02-07 19:34:27.303071700 -0800
@@ -8,7 +8,7 @@
     dw  2                         ; e_type
     dw  62                        ; e_machine
     dd  1                         ; e_version
-    dq  _start                    ; e_entry
+    dq  _start - 0x1000           ; e_entry
     dq  programHeader - $$        ; e_phoff
     dq  0                         ; e_shoff
     dd  0                         ; e_flags

This again produces a working executable. For the record:

eu-readelf --all exe-padding
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           AMD x86-64
  Version:                           1 (current)
  Entry point address:               0x8048000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             
  Size of this header:               64 (bytes)
  Size of program header entries:    56 (bytes)
  Number of program headers entries: 1
  Size of section header entries:    0 (bytes)
  Number of section headers entries: 0 ([0] not available)
  Section header string table index: 0

Section Headers:
[Nr] Name                 Type         Addr             Off      Size     ES Flags Lk Inf Al

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x001000 0x0000000008048000 0x0000000008048000 0x000009 0x000009 RWE 0x1000
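The difference between the crashing and the working variant comes down to whether e_entry lies inside the memory covered by the PT_LOAD segment. A small Python sketch of that check follows, using the values from the two readelf outputs above. It assumes 4 KiB pages and that the mapping is rounded out to page boundaries; the function names are hypothetical.

```python
PAGE = 0x1000  # assumed page size


def page_down(addr: int) -> int:
    return addr & ~(PAGE - 1)


def page_up(addr: int) -> int:
    return (addr + PAGE - 1) & ~(PAGE - 1)


def entry_is_mapped(e_entry: int, p_vaddr: int, p_memsz: int) -> bool:
    """True if e_entry falls in the page-rounded range of the segment."""
    start = page_down(p_vaddr)
    end = page_up(p_vaddr + p_memsz)
    return start <= e_entry < end


# "with padding" variant before the fix: entry is one page past the segment
print(entry_is_mapped(0x8049000, 0x8048000, 0x9))   # False
# after adjusting e_entry by 0x1000
print(entry_is_mapped(0x8048000, 0x8048000, 0x9))   # True
```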

P.S. You are linking your 64-bit program at 0x08048000, which is the traditional load address for i*86 (32-bit) executables. x86_64 binaries traditionally start at 0x400000.

Update:

About the first example, p_filesz is still fileSize, I think that should get outside of the boundaries of the file.

That is correct: p_filesz and p_memsz should be reduced by the size of headers (0x78 here). Note that both of these will be rounded up to page size (after adding p_offset), so for this example there is no practical difference.
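To see why the two sizes make no practical difference here, the rounding can be worked through in Python. This is a simplified model of the rounding just described, assuming 4 KiB pages; mapped_length is a hypothetical helper, not a kernel function.

```python
PAGE = 0x1000  # assumed page size


def mapped_length(p_offset: int, p_filesz: int, page: int = PAGE) -> int:
    """Bytes actually mapped: the in-page offset plus the size, rounded up to a page."""
    total = (p_offset % page) + p_filesz
    return (total + page - 1) // page * page


original = mapped_length(0x78, 0x81)          # p_filesz = fileSize
reduced = mapped_length(0x78, 0x81 - 0x78)    # p_filesz minus the header size
print(hex(original), hex(reduced))
```

Both calls yield a single page, so the kernel maps the same range either way.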

Update 2:

pastebin.ubuntu.com/p/rgfVMrbcmJ

This results in the following LOAD segment:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000078 0x0000000008048000 0x0000000008048000 0x000081 0x000081 RWE 0x1000

This binary will not run (kernel will reject it), because it is asking the kernel to do the impossible: to mmap bytes at offset 0x78 to page start.

If the application performed equivalent mmap call, it would have gotten EINVAL error, because mmap requires that (offset % pagesize) == (addr % pagesize).
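The congruence requirement is easy to test for any program header before trying to run the binary. Here is a minimal Python sketch using the three LOAD segments shown on this page; it assumes a 4 KiB page size, and can_mmap_segment is a hypothetical name.

```python
def can_mmap_segment(p_offset: int, p_vaddr: int, page_size: int = 0x1000) -> bool:
    """File offset and virtual address must agree modulo the page size."""
    return p_offset % page_size == p_vaddr % page_size


# Headers excluded, vaddr shifted to match the offset: loadable
print(can_mmap_segment(0x78, 0x8048078))     # True
# Padded variant, both page-aligned: loadable
print(can_mmap_segment(0x1000, 0x8048000))   # True
# Offset 0x78 mapped to a page start: rejected
print(can_mmap_segment(0x78, 0x8048000))     # False
```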

Linux default behavior of executable .data section changed between 5.4 and 5.9?

This is only a guess: I think the culprit is the READ_IMPLIES_EXEC personality that was being set automatically in the absence of a PT_GNU_STACK segment.

In the 5.4 kernel source we can find this piece of code:

SET_PERSONALITY2(loc->elf_ex, &arch_state);
if (elf_read_implies_exec(loc->elf_ex, executable_stack))
    current->personality |= READ_IMPLIES_EXEC;

That's the only thing that can transform an RW section into an RWX one. No other use of PROT_EXEC seemed to have changed or to be relevant to this question, as far as I could tell.

The executable_stack is set here:

for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
    switch (elf_ppnt->p_type) {
    case PT_GNU_STACK:
        if (elf_ppnt->p_flags & PF_X)
            executable_stack = EXSTACK_ENABLE_X;
        else
            executable_stack = EXSTACK_DISABLE_X;
        break;

But if the PT_GNU_STACK segment is not present, that variable retains its default value:

int executable_stack = EXSTACK_DEFAULT;

This workflow is identical in both 5.4 and the latest kernel source; what changed is the definition of elf_read_implies_exec:

Linux 5.4:

/*
 * An executable for which elf_read_implies_exec() returns TRUE will
 * have the READ_IMPLIES_EXEC personality flag set automatically.
 */
#define elf_read_implies_exec(ex, executable_stack) \
    (executable_stack != EXSTACK_DISABLE_X)

Latest Linux:

/*
 * An executable for which elf_read_implies_exec() returns TRUE will
 * have the READ_IMPLIES_EXEC personality flag set automatically.
 *
 * The decision process for determining the results are:
 *
 *                 CPU: | lacks NX*  | has NX, ia32     | has NX, x86_64 |
 * ELF:                 |            |                  |                |
 * ---------------------|------------|------------------|----------------|
 * missing PT_GNU_STACK | exec-all   | exec-all         | exec-none      |
 * PT_GNU_STACK == RWX  | exec-stack | exec-stack       | exec-stack     |
 * PT_GNU_STACK == RW   | exec-none  | exec-none        | exec-none      |
 *
 *  exec-all  : all PROT_READ user mappings are executable, except when
 *              backed by files on a noexec-filesystem.
 *  exec-none : only PROT_EXEC user mappings are executable.
 *  exec-stack: only the stack and PROT_EXEC user mappings are executable.
 *
 *  *this column has no architectural effect: NX markings are ignored by
 *   hardware, but may have behavioral effects when "wants X" collides with
 *   "cannot be X" constraints in memory permission flags, as in
 *   https://lkml.kernel.org/r/20190418055759.GA3155@mellanox.com
 *
 */
#define elf_read_implies_exec(ex, executable_stack) \
    (mmap_is_ia32() && executable_stack == EXSTACK_DEFAULT)

Note how in the 5.4 version the elf_read_implies_exec returned a true value if the stack was not explicitly marked as not executable (via the PT_GNU_STACK segment).

In the latest source, the check is more restrictive: elf_read_implies_exec is true only for 32-bit executables, and only when no PT_GNU_STACK segment was found in the ELF binary.
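The two macro definitions can be modeled side by side in Python to make the behavioral difference explicit. This is only a model of the two expressions quoted above: the enum member values are illustrative, and is_ia32 stands in for the result of mmap_is_ia32().

```python
from enum import Enum


class ExStack(Enum):
    DEFAULT = 0      # no PT_GNU_STACK segment found
    DISABLE_X = 1    # PT_GNU_STACK present without PF_X
    ENABLE_X = 2     # PT_GNU_STACK present with PF_X


def read_implies_exec_5_4(stack: ExStack) -> bool:
    # (executable_stack != EXSTACK_DISABLE_X)
    return stack is not ExStack.DISABLE_X


def read_implies_exec_latest(stack: ExStack, is_ia32: bool) -> bool:
    # (mmap_is_ia32() && executable_stack == EXSTACK_DEFAULT)
    return is_ia32 and stack is ExStack.DEFAULT


# A 64-bit binary, for each possible PT_GNU_STACK state
for stack in ExStack:
    print(stack.name,
          read_implies_exec_5_4(stack),
          read_implies_exec_latest(stack, is_ia32=False))
```

For a 64-bit binary with no PT_GNU_STACK segment, the 5.4 expression is true and the newer one is false, which is consistent with the change in behavior described in the question.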

I assembled your program, linked it, and found no PT_GNU_STACK segment, so this may be the reason.

If this is indeed the issue, and if I followed the code correctly, then marking the stack as non-executable in the binary should stop its data section from being mapped executable, even on Linux 5.4.

Why does my x64 process base address not start from 0x400000?

I learned from this link Why is address 0x400000 chosen as a start of text segment in x86_64

That address is used for executables (ELF type ET_EXEC).

I only found my bash process starts from a very high base address (0x55971cea6000). Any one knows why?

Because your bash is a (newer) position-independent executable (ELF type ET_DYN). It behaves much like a shared library, and is relocated to a random address at runtime.

The 0x55971cea6000 address you found will vary from one execution to another. In contrast, ET_EXEC executables can only run correctly when loaded at their "linked at" address (typically 0x400000).

how does dynamic linker choose the start address for a 64-bit process?

The dynamic linker doesn't choose the start address of the executable -- the kernel does (by the time the dynamic linker starts running, the executable has already been mmaped into memory).

The kernel looks at the .e_type in the ELF header and .p_vaddr field of the first program header and goes from there. IFF .e_type == ET_EXEC, then the kernel maps executable segments at their .p_vaddr addresses. For ET_DYN, if ASLR is in effect, the kernel performs mmaps at a random address.
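The decision just described can be summarized as a small Python sketch. It is a deliberate simplification and not the kernel's algorithm: the non-randomized base is the address mentioned earlier for GDB runs, while the randomization range and the function name are made up for illustration.

```python
import random

ET_EXEC, ET_DYN = 2, 3
PAGE = 0x1000
DEFAULT_PIE_BASE = 0x0000_5555_5555_4000   # address seen when ASLR is disabled


def choose_load_bias(e_type: int, aslr_enabled: bool, rng=random) -> int:
    """Return the offset added to every p_vaddr when mapping segments."""
    if e_type == ET_EXEC:
        return 0                      # segments go exactly at their p_vaddr
    if e_type == ET_DYN:
        if not aslr_enabled:
            return DEFAULT_PIE_BASE
        # Hypothetical range: the real kernel uses its own entropy source
        # and limits; only the page alignment is the point here.
        return DEFAULT_PIE_BASE + rng.randrange(0, 1 << 28) * PAGE
    raise ValueError(f"unsupported e_type {e_type}")


print(hex(choose_load_bias(ET_EXEC, aslr_enabled=True)))
print(hex(choose_load_bias(ET_DYN, aslr_enabled=False)))
```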

On x64, how does the Linux kernel access the data segment? Does it use -mcmodel=large during compilation?

I think the confusion is between the GCC code model (-mcmodel) and the 64-bit CPU's MMU. The kernel code model (-mcmodel=kernel) generates code that uses signed 32-bit offsets, which means all symbols in the kernel must fit in the top 2GB of the address space. This does not change the fact that virtual address pointers in the kernel are 64-bit, of which 48 or so bits are significant, allowing anything in the kernel or in the current user space to be accessed indirectly via the page tables and MMU.
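The "signed 32-bit offset" constraint can be illustrated with a short Python sketch that tests whether a 64-bit address equals the sign extension of its low 32 bits. The function names are hypothetical, and the sample addresses are only illustrative values from the top and middle of the upper half of the address space.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFF_FFFF
    return value - (1 << 32) if value & 0x8000_0000 else value


def fits_signed_32(address: int) -> bool:
    """True if the 64-bit address is the sign extension of its low 32 bits."""
    return (sign_extend_32(address) & MASK64) == address


print(fits_signed_32(0xFFFF_FFFF_8000_0000))   # start of the top 2GB
print(fits_signed_32(0xFFFF_FFFF_7FFF_FFFF))   # one byte below it
print(fits_signed_32(0xFFFF_8880_0000_0000))   # far outside the top 2GB
```

Addresses outside that window are still reachable through ordinary 64-bit pointers held in registers; the restriction applies only to addresses encoded directly as 32-bit fields in instructions.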
