How to Change Alignment of Code Segment in Elf

Required alignment of .text versus .data

The ELF ABI does not require that VirtAddr or PhysAddr be page-aligned. (I believe) it only requires that

({Virt,Phys}Addr - Offset) % PageSize == 0

That is true for both working binaries, and false for the non-working one.

Update:

I don't see how this fails for the latter.

We have: VirtAddr == 0x08048100 and Offset == 0x80 (and PageSize == 4096 == 0x1000).

(0x08048100 - 0x80) % 0x1000 == 0x80 != 0

has to agree when align == 0x10, doesn't it?

No: it has to agree with the page size (as I said earlier), or the kernel will not be able to mmap the segment.

ELF program header segments sizes and offsets

The readelf output displays the program header table. It contains the list of segments (which may be loadable or non-loadable) in the ELF file. It is common for a segment to contain other segments, as seen here.

I find the PHDR segment just after the ELF header and having the size
of this entire program header. It has an alignment of 8 bytes and is
readable/executable. [!]I don't understand why executable.

If you read the readelf output carefully, you will notice that PHDR is actually a part of the code segment (notice the VirtAddr and the MemSiz fields). That explains why it shares the same permissions as the code segment (RX).

Now I have a segment that is readable and executable, which [!]I
suppose is the code segment. I don't understand why does it start at
0x0000000000000000. Shouldn't this start where the entry point is
located? Why does it have a size of 0xafc bytes? Isn't the size only
the size of the code? How much of the file is executable? Also, I
don't understand why the alignment is 0x200000 bytes. Is that how much
space is reserved for a LOAD segment in memory?. This is where this
segment ends and an amout of 764 0x0 bytes follows it:

Yes, this is the code segment. It begins at the beginning of the file (i.e. offset 0) and extends upto 0xafc bytes in the file. The header specifies that this part of the file is mapped to 0x0000000000400000 in memory when the ELF is loaded. The segment not only consists of the main( ) from the C++ file, some other executable stuff is also added by the compiler. Alignment only specifies where should the next segment begin, not the size of the segment. Loadable segments should have congruent values of VirtAddr and PhysAddr fields modulo page size (or Align field, if Align!=0 && Align!=1). That explains why VirtAddr for data segment is 0x0000000000600df8 (0x0000000000600df8 - 0x0000000000000df8 % 0x200000 == 0). The region in file between the text segment and the data segment (i.e. between 0xafc and 0xdf8) is filled with zeroes.

The next one (readable and writable) [!]I suppose is a segment where
variables are stored. It ends just where something like the sections
header might be starting.

Correct, this is the data segment that stores the global and static variables (among other stuff). It ends just before the section headers.

Now the next one is a DYNAMIC header. It starts at 0xe18, which is
inside the one above. [!]I thought this was a segment where references
to external functions and variables are stored but I am not sure. It
is readable and writable. I just don't know what segment is this and
why it is "inside" the LOAD segment above

Just like the PHDR segment is a part of the code segment, DYNAMIC segment is a part of the data segment. That's why the same permissions (RW). It contains .dynamic section which contains an array of structures such as addresses of symbol and string tables.

GNU specific segments, one of them having any offsets and sizes equal
to 0x0000000000000000, others interfering with other segments, which I
don't get, either.

GNU_EH_FRAME is a part of code segment and GNU_RELRO is a part of data segment (See the VirtAddr and MemSiz fields). GNU_STACK is just an program header which tells the system how to control the stack when the ELF is loaded into memory. (FileSiz and MemSiz are 0).

References:

ELF File format specification
Linkers and Loaders, by John R. Levine

How to separate .text section into a 4K aligned address in Linux ELF

The tool responsible of the issue you post is ld(1) (the linker) It has a complete manual describing the scriptable language used to align pages and create the final program. Just have a read to that manual (I'm referring to the ld manual, not the manual page)

Once you have read it, you can create a miniscript file that, based on the standard linux script, forces *(.text) segments to be page aligned and use it to link your program.

How to map the section to the segment from an ELF output file?

Two serious issues cause most of the problems are:

You load the second sector of the disk to 0x0000:0x8000 when all of the code expect the kernel to be loaded after the bootloader at 0x0000:0x7e00
You compile your kernel.c straight to an executable name kernel.o. You should compile it to a proper object file so it can go through the expected linking phase when you run ld.

To fix the problem with the kernel being loaded into memory at the wrong memory location, change:

mov bx, 0x8000           ; load into es:bx segment :offset of buffer

to:

mov bx, 0x7e00           ; load into es:bx segment :offset of buffer

To fix the issue of compiling kernel.cto an executable ELF file named kernel.o remove the -e kernel_main -Ttext 0x0 and replace it with -c. -c option forces GCC to produce an object file that can be properly linked with LD. Change:

/home/rakesh/Desktop/cross-compiler/i686-elf-4.9.1-Linux-x86_64/bin/i686-elf-gcc -m32 kernel.c -o kernel.o -e kernel_main -Ttext 0x0 -nostdlib -ffreestanding -std=gnu99 -mno-red-zone -fno-exceptions -nostdlib  -Wall -Wextra

to:

/home/rakesh/Desktop/cross-compiler/i686-elf-4.9.1-Linux-x86_64/bin/i686-elf-gcc -m32 -c kernel.c -o kernel.o -nostdlib -ffreestanding -std=gnu99 -mno-red-zone -fno-exceptions -Wall -Wextra

Reason for Failure with Longer Strings

The reason the string with less than 64 bytes worked is because the compiler generated code in a position independent way by initializing the array on the stack with immediate values. When the size reached 64 bytes the compiler placed the string into the .rodata section and then initialized the array on the stack by copying it from the .rodata. This made your code position dependent. Your code was loaded at the wrong offsets and had incorrect origin points yielding code referencing incorrect addresses, so it failed.

Other Observations

You should initialize your BSS (.bss) section to 0 before calling kernel_main. This can be done in assembly by iterating through all the bytes from offset _bss_start to offset _bss_end.
The .comment section will be emitted into your binary file wasting bytes as a result. You should put it in the /DISCARD/ section.
You should place the BSS section in your linker script after all the others so it doesn't take up space in kernel.bin
In boot.asm you should set SS:SP (stack pointer) near the beginning before reading disk sectors. It should be set to a place that won't interfere with your code. This is especially important when reading data into memory from disk since you don't know where the BIOS placed the current stack. You don't want to read on top of the current stack area. Setting it just below the bootloader at 0x0000:0x7c00 should work.
Before calling into C code you should clear the direction flag to ensure string instructions use forward movement. You can do this by using the CLD instruction.
In boot.asmyou can make your code more generic by using the boot drive number passed by the BIOS in the DL register rather than hard coding it to the value 0x80 (0x80 being the first hard drive)
You might consider turning on optimization with -O3, or using optimization level -Os to optimize for code size.
Your linker script doesn't quite work the way you expect although it produces the correct results. You never declared .boot section in your NASM file so nothing actually gets placed in the .boot1 output section in the linker script. It works because it gets included in the .text section in the .kernel output section.
It is preferable to remove the padding and boot signature from the assembly file and move it to the linker script
Instead of having your linker script output a binary file directly, it is more useful to output to the default ELF executable format. You can then use OBJCOPY to convert the ELF file to a binary file. This allows you to build with debug information which will appear as part of the ELF executable. The ELF executable can be used to symbolically debug your binary kernel in QEMU.
Rather than use LD directly for linking, use GCC. This has the advantage that the libgcc library can be added without specifying the full path to the library. libgcc is a set of routines that may be needed for C code generation with GCC

Revised source code, linker script and build commands with the observations above taken into account:

boot.asm:

bits 16

section .boot

extern kernel_main
extern _bss_start
extern _bss_len

global boot

    jmp 0x0000:boot
boot:
    ; Place realmode stack pointer below bootloader where it doesn't
    ; get in our way
    xor ax, ax
    mov ss, ax
    mov sp, 0x7c00

    mov ah, 0x02             ; load second stage to memory
    mov al, 1                ; numbers of sectors to read into memory

;   Remove this, DL is already set by BIOS to current boot drive number
;    mov dl, 0x80             ; sector read from fixed/usb disk ;0 for floppy; 0x80 for hd
    mov ch, 0                ; cylinder number
    mov dh, 0                ; head number
    mov cl, 2                ; sector number
    mov bx, 0x7e00           ; load into es:bx segment :offset of buffer
    int 0x13                 ; disk I/O interrupt

    mov ax, 0x2401
    int 0x15 ; enable A20 bit
    mov ax, 0x3
    int 0x10 ; set vga text mode 3

    cli

    lgdt [gdt_pointer] ; load the gdt table
    mov eax, cr0
    or eax,0x1 ; set the protected mode bit on special CPU reg cr0
    mov cr0, eax
    jmp CODE_SEG:boot2 ; long jump to the code segment

gdt_start:
    dq 0x0
gdt_code:
    dw 0xFFFF
    dw 0x0
    db 0x0
    db 10011010b
    db 11001111b
    db 0x0
gdt_data:
    dw 0xFFFF
    dw 0x0
    db 0x0
    db 10010010b
    db 11001111b
    db 0x0
gdt_end:

gdt_pointer:
    dw gdt_end - gdt_start
    dd gdt_start
CODE_SEG equ gdt_code - gdt_start
DATA_SEG equ gdt_data - gdt_start

bits 32
boot2:
    mov ax, DATA_SEG
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax

    ; Zero out the BSS area
    cld
    mov edi, _bss_start
    mov ecx, _bss_len
    xor eax, eax
    rep stosb

    mov esp,kernel_stack_top
    call kernel_main

    cli
    hlt

section .bss
align 4
kernel_stack_bottom: equ $
    resb 16384 ; 16 KB
kernel_stack_top:

kernel.c:

void kernel_main(void){
  const char string[] = "01234567890123456789012345678901234567890123456789012345678901234";
  volatile unsigned char* vid_mem = (unsigned char*) 0xb8000;
  int j=0;
  while(string[j]!='\0'){

    *vid_mem++ = (unsigned char) string[j++];
    *vid_mem++ = 0x09;
  }

for(;;);

}

linker3.ld:

ENTRY(boot)

SECTIONS{
  . = 0x7c00;

  .boot1 : {
    *(.boot);
  }

  .sig : AT(0x7dfe){
     SHORT(0xaa55);
  }

  . = 0x7e00;
  .kernel : AT(0x7e00){
    *(.text);
    *(.rodata*);
    *(.data);
    _bss_start = .;
    *(.bss);
    *(COMMON);
    _bss_end = .;
    _bss_len = _bss_end - _bss_start;
  }
  /DISCARD/ : {
    *(.eh_frame);
    *(.comment);
  }

}

Commands to build this bootloader and kernel:

nasm -g -F dwarf -f elf32 boot.asm -o boot.o
i686-elf-gcc -g -O3 -m32 kernel.c -c -o kernel.o -ffreestanding -std=gnu99 \
    -mno-red-zone -fno-exceptions -Wall -Wextra    
i686-elf-gcc -nostdlib -Wl,--build-id=none -T linker3.ld boot.o kernel.o \
    -lgcc -o kernel.elf
objcopy -O binary kernel.elf kernel.bin

To symbolically debug the 32-bit kernel with QEMU you can launch QEMU this way:

qemu-system-i386 -fda kernel.bin -S -s &
gdb kernel.elf \
        -ex 'target remote localhost:1234' \
        -ex 'break *kernel_main' \
        -ex 'layout src' \
        -ex 'continue'

This will start up your kernel.bin file in QEMU and then remotely connect the GDB debugger. The layout should show the source code and break on kernel_main.

32bit executable session is aligned by 4kb, is it part of elf format?

The ELF 1.2 specifications describes entities that reside at a specific offset in the file, often denoted as p_offset, and that will be loaded at a specific address in memory (often denoted as p_vaddr).

The specification doesn't mandate any alignment of a segment directly.

However it requires that

Loadable process segments must have congruent values for p_vaddr and
p_offset, modulo the page size.
This member [p_align] gives the value to which the
segments are aligned in memory and in the file.

Values 0 and 1 mean that no
alignment is required. Otherwise, p_align should be a positive, integral power of
2, and p_addr should equal p_offset, modulo p_align.

The terminology is a bit off in my opinion (segments are not aligned in the usual sense, they don't start at a multiple of p_align).

The rationale behind the quote is that the system must be able to quickly load a segment, it is thus necessary to avoid shifting it in memory to match its loading address.

The file when loaded are made of one or mode "unit" of memory, called pages.

Pages has generally fixed size, thus they all start at address that are multiple of their size.

For 32-bit x86 system, this size is 4KiB, imagine then a sequence of pages and their starting addresses:

Page 0  Page 1  Page 2  ... Page 4 ... Page 100 ... Page K 
  0      4096    8192        16384      409600      K*4096

The point is that is possible to change the address of a page very quickly, without copying any byte, this is called remapping.

Once the file is loaded the OS remaps the page of the file so that each segment is at its address specified in p_vaddr.

Sections loaded in memory

Now if start of the segment in the file doesn't satisfy the conditions stated in the quote and p_alignis not a multiple of 4KiB, this "trick" won't work and the OS needs to revert to shifting the segment once loaded.

To make things easy and don't waste memory, segments are usually aligned to 4KiB in the file.

What's the difference of section and segment in ELF file format

But what's difference between section and segment?

Exactly what you quoted: the segments contain information needed at runtime, while the sections contain information needed during linking.

does a segment contain one or more sections?

A segment can contain 0 or more sections. Example:

readelf -l /bin/date

Elf file type is EXEC (Executable file)
Entry point 0x402000
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000d5ac 0x000000000000d5ac  R E    200000
  LOAD           0x000000000000de10 0x000000000060de10 0x000000000060de10
                 0x0000000000000440 0x0000000000000610  RW     200000
  DYNAMIC        0x000000000000de38 0x000000000060de38 0x000000000060de38
                 0x00000000000001a0 0x00000000000001a0  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x000000000000c700 0x000000000040c700 0x000000000040c700
                 0x00000000000002a4 0x00000000000002a4  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8
  GNU_RELRO      0x000000000000de10 0x000000000060de10 0x000000000060de10
                 0x00000000000001f0 0x00000000000001f0  R      1

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame 
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss 
   04     .dynamic 
   05     .note.ABI-tag .note.gnu.build-id 
   06     .eh_frame_hdr 
   07     
   08     .ctors .dtors .jcr .dynamic .got

Here, PHDR segment contains 0 sections, INTERP segment contains .interp section, and the first LOAD segment contains a whole bunch of sections.

Further reading with a nice illustration:

Sample Image

which part of ELF file must be loaded into the memory?

Sections and segments are two different concepts completely. Sections pertain the the semantics of the data stored there (i.e. what it will be used for) and are actually irrelevant once a program or shared library is linked except for debugging purposes. You could even remove the section headers entirely (or overwrite them with random garbage) and a program would still work.

Segments (i.e. program header load directives) are what the kernel and/or dynamic linker actually look at when loading a program. For example, in your case you have two load directives. The first one causes the first 4k (1 page) of the file to be mapped at address 0x08048000, and indicates that only the first 0x4b8 bytes of this mapping are actually to be used (the rest is alignment). The second causes the first 8k (2 pages) of the file to be mapped at address 0x08049000. The vast majority of that is alignment. The first 0xf14 bytes are not part of the load directive (just alignment) and will be wasted. Beginning at 0x08049f14, 0x108 bytes mapped from the file are actually used, and another 0x10 bytes (to reach the MemSize of 0x118) are zero-filled by the loader (kernel or dynamic linker). This spans up to 0x0804a02c (in the second mapped page). The rest of the second mapped page is unused/wasted (but malloc might be able to recover it for use as part of the heap).

Finally, while the section headers will not be used at all, the contents of many different sections may be used by your program while it's running. Note that the address ranges of .ctors and .dtors lie in the beginning of the second load mapping, so they are mapped and accessible by the program at runtime (the runtime startup/exit code will use them to run global constructors and destructors, if C++ or "GNU C" code with ctor/dtor attribute was used). Also note that .data starts at address 0x0804a00c, in the second mapped page. This allows the first page to be protected read-only after relocations are applied (the RELRO directive in the program header).