What's the Difference of Section and Segment in Elf File Format

What's the difference of section and segment in ELF file format

But what's the difference between a section and a segment?

Exactly what you quoted: the segments contain information needed at runtime, while the sections contain information needed during linking.

does a segment contain one or more sections?

A segment can contain 0 or more sections. Example:

readelf -l /bin/date

Elf file type is EXEC (Executable file)
Entry point 0x402000
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000d5ac 0x000000000000d5ac  R E    200000
  LOAD           0x000000000000de10 0x000000000060de10 0x000000000060de10
                 0x0000000000000440 0x0000000000000610  RW     200000
  DYNAMIC        0x000000000000de38 0x000000000060de38 0x000000000060de38
                 0x00000000000001a0 0x00000000000001a0  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x000000000000c700 0x000000000040c700 0x000000000040c700
                 0x00000000000002a4 0x00000000000002a4  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8
  GNU_RELRO      0x000000000000de10 0x000000000060de10 0x000000000060de10
                 0x00000000000001f0 0x00000000000001f0  R      1

Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .ctors .dtors .jcr .dynamic .got

Here, the PHDR segment contains 0 sections, the INTERP segment contains the .interp section, and the first LOAD segment contains a whole bunch of sections.
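The "Section to Segment mapping" above boils down to a containment test between a section header and a program header. Here is a minimal sketch of that test, assuming a 64-bit ELF and the glibc <elf.h> definitions. The function name section_in_segment is made up for illustration, and readelf's real rule handles more corner cases (TLS, zero-sized sections) than this does.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does section `sh` lie entirely inside segment `ph`?
 * Simplified on purpose; readelf applies additional rules. */
bool section_in_segment(const Elf64_Shdr *sh, const Elf64_Phdr *ph)
{
    assert(sh != NULL && ph != NULL);

    if (sh->sh_type == SHT_NULL)
        return false;                       /* the null section belongs nowhere */

    if (sh->sh_type == SHT_NOBITS) {
        /* .bss-like sections occupy no file bytes, so compare the
         * in-memory extent against p_vaddr/p_memsz instead. */
        return (sh->sh_flags & SHF_ALLOC) != 0 &&
               sh->sh_addr >= ph->p_vaddr &&
               sh->sh_addr + sh->sh_size <= ph->p_vaddr + ph->p_memsz;
    }

    /* Everything else: compare the file extent against p_offset/p_filesz. */
    return sh->sh_offset >= ph->p_offset &&
           sh->sh_offset + sh->sh_size <= ph->p_offset + ph->p_filesz;
}
```

With the numbers from the listing, .interp (offset 0x238, size 0x1c) falls inside both the INTERP segment and the first LOAD segment, but not inside PHDR, which ends exactly at 0x238.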

Section vs. segment?

In the context of ELF, they are two different but related things.

  • Segments are described in the program header. Loosely, each segment describes a chunk of the file to be loaded into memory when an executable is run.

  • Sections are described in the section header. Loosely, each section describes a chunk of data relevant to the program.

So both sections and segments are chunks of the file, described as an offset and size (though in both cases the size may be 0, in which case the offset is ignored). Any given ELF file might have only segments, or only sections, or both segments and sections. In order to be executable it must have segments to load. In order to be linkable, it must have sections describing what is where.

A dynamically linked executable must have segments, but sections are still optional: there is a PT_DYNAMIC segment that covers the same data as the .dynamic section. Through it, the dynamic linker can still find the address of the symbol table .dynsym without consulting any section header.

In general, segments do not overlap each other and sections do not overlap each other, but sections may describe data that is part (or all) of a segment. That is not a strict requirement of the format, but it would be strange to violate. It would also be very strange for a section to describe data in two different segments. There are also (generally) sections that are not part of any segment.
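Whether a given file has segments, sections, or both can be read straight off the ELF header: e_phoff is zero when there is no program header table, and e_shoff is zero when there is no section header table. A rough sketch, assuming a 64-bit ELF in the host's byte order and the glibc <elf.h> definitions (struct elf_tables and elf64_tables are names invented for this example):

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of which tables an ELF64 image declares. */
struct elf_tables {
    bool has_segments;   /* program header table present */
    bool has_sections;   /* section header table present */
};

/* Returns 0 on success, -1 if `image` is not a 64-bit ELF image.
 * Assumes the file's byte order matches the host's. */
int elf64_tables(const unsigned char *image, size_t size, struct elf_tables *out)
{
    Elf64_Ehdr eh;

    assert(image != NULL && out != NULL);
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);          /* avoids unaligned access */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    out->has_segments = eh.e_phoff != 0;
    out->has_sections = eh.e_shoff != 0;
    return 0;
}
```

A typical .o file reports sections only, a stripped-down executable may report segments only, and an ordinary linked executable or shared library reports both.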

ELF files - What is a section and why do we need it?

  1. How are ELF files generated? Is it the compiler's responsibility?

    They can be generated by a compiler, an assembler, or any other tool that can generate them. Even your own program you wrote for generating ELF files ;) They're just streams of bytes after all, so they can be generated by just writing bytes into a file in binary mode. You can do that too.

  2. What are sections and why do we need them?

    ELF files are subdivided into sections. Sections are the smallest contiguous regions in the file. You can think of them as pages in an organizer, each with its own name and type that describes what it contains. Linkers use this information to combine different parts of the program coming from different modules into one executable file or library, by merging sections of the same type (gluing pages together, if you will).

    In executable files, sections are optional, but they're usually there to describe what's in the file, where it begins, and how many bytes it takes.

  3. What are program headers and why do we need them?

    They're mostly for making executable files. In order to run a program, sections aren't enough, because you have to specify not only what's in the file, but also where it should be loaded into memory in the running process. Program headers are just for that purpose: they describe segments, which are regions of memory in the running process, with different access privileges & stuff.

    Each program header describes one segment. It tells the loader where it should load a certain region of the file into memory and what permissions it should set for that region (e.g. should it be allowed to execute code from it? should it be writable or just for reading?)

    Segments can be further subdivided into sections. For example, you can specify that your code segment is further subdivided into code and static read-only strings for the messages the program displays, or that your data segment is subdivided into funky data and hardcore data :J It's for you to decide.

    In executable files, sections are optional, but it's nice to have them, because they describe what's in the file and allow for dumping selected parts of it (e.g. with the objdump tool). Sometimes they are needed, though, for storing dynamic linking information, symbol tables, debugging information, stuff like that.

  4. Inside program headers, what's the meaning of the fields p_vaddr and p_paddr?

    Those are the addresses at which the data in the file will be loaded. They map the contents of the file into their corresponding memory locations. The first one is a virtual address, the second one is a physical address.

    Physical addresses are the "raw" memory addresses. On modern operating systems, those are no longer used in the userland. Instead, userland programs use virtual addresses. The operating system deceives the userland program that it is alone in memory, and that the entire address space is available for it. Under the hood, the operating system maps those virtual addresses to physical ones in the actual memory, and it does it transparently to the program.

    Of course, not every address in the virtual address space is available at the same time. There are limitations imposed by the actual physical memory available. So the operating system just maps the memory for the segments the program actually uses (here's where the "segments" part from the ELF file's program headers comes into play). If the process tries to access some unmapped memory, the operating system steps in and says, "sorry, chap, this memory doesn't belong to you". (The program can address it, but it cannot access it.)

  5. Does each section have its own section header?

    Yes. If it doesn't have an entry in the Section Headers Table, it's not a section :q The only way to tell whether some part of the file is a section is by looking in the Section Headers Table, which tells you what sections are defined in the file and where you can find them.

    You can think of the Section Headers Table as a table of contents in a book. Without the table of contents, there aren't any chapters after all, because they're not listed anywhere. The book may have headings, but the content is not subdivided into logical chapters that can be found through the table of contents. Same goes with sections in ELF files: there can be some regions of data, but you can't tell without the "table of contents" which is the SHT.
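Points 3 and 4 above can be made concrete with two small helpers. This is only a sketch, assuming a 64-bit ELF and the glibc <elf.h> definitions, and the names phdr_flags and vaddr_to_offset are invented for the example. The first renders p_flags the way readelf prints them, the second uses the PT_LOAD entries to translate a virtual address back into a file offset.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Render p_flags in readelf style ("R E", "RW ", ...). buf must hold 4 bytes. */
void phdr_flags(uint32_t p_flags, char buf[4])
{
    buf[0] = (p_flags & PF_R) ? 'R' : ' ';
    buf[1] = (p_flags & PF_W) ? 'W' : ' ';
    buf[2] = (p_flags & PF_X) ? 'E' : ' ';
    buf[3] = '\0';
}

/* Translate a virtual address into a file offset using the PT_LOAD entries.
 * Returns 0 and stores the offset in *off on success. Returns -1 when the
 * address is not mapped by any PT_LOAD entry, or when it lies in the
 * zero-filled tail of a segment (past p_filesz), which has no file bytes. */
int vaddr_to_offset(const Elf64_Phdr *ph, size_t n, uint64_t vaddr, uint64_t *off)
{
    assert(ph != NULL && off != NULL);

    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD || vaddr < ph[i].p_vaddr)
            continue;
        uint64_t delta = vaddr - ph[i].p_vaddr;
        if (delta < ph[i].p_filesz) {
            *off = ph[i].p_offset + delta;
            return 0;
        }
    }
    return -1;
}
```

For the /bin/date listing earlier on this page, the entry point 0x402000 lands in the first LOAD segment (p_vaddr 0x400000, p_offset 0), so its file offset is 0x2000.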

What is the format of special section in ELF

There is no special format for .text and .data.

When the static linker links several .o files,

  1. it simply concatenates the .text and .data sections (while resolving relocations)
  2. and places them in the final .so or executable file according to the linker script (see gcc -Wl,-verbose /dev/null).

The .data section simply contains the initial values of the instantiated global variables.

The .text section simply contains the machine code of the routines/functions.

Let's take this simple C file:

char x[5] = {0xba, 0xbb, 0xbc, 0xbd, 0xbe};

char f(int i) {
return x[i];
}

Let's compile it:

$ gcc -c -o test.o test.c

Let's dump the .data section, using elfcat:

$ elfcat test.o --section-name .data | xxd
00000000: babb bcbd be .....

The content of the .data section is easy to explain: it is exactly the five initializer bytes of x.

Let's dump the .text section:

$ elfcat test.o --section-name .text | xxd
00000000: 5548 89e5 897d fc8b 45fc 4898 488d 1500 UH...}..E.H.H...
00000010: 0000 000f b604 105d c3

Let's disassemble this:

$ elfcat test.o --section-name .text > test.text
$ r2 -a x86 -b 64 -qc pd test.text
0x00000000 55 push rbp
0x00000001 4889e5 mov rbp, rsp
0x00000004 897dfc mov dword [rbp - 4], edi
0x00000007 8b45fc mov eax, dword [rbp - 4]
0x0000000a 4898 cdqe
0x0000000c 488d15000000. lea rdx, qword [0x00000013] ; 19
0x00000013 0fb60410 movzx eax, byte [rax + rdx]
0x00000017 5d pop rbp
0x00000018 c3 ret

Again, there is nothing special in the .text section: it only contains the machine code of the routines/functions of my program.

Notice however the relocation and symbol information in other sections:

$ readelf -a test.o
[ ... ]

Relocation section '.rela.text' at offset 0x1b8 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000f 000800000002 R_X86_64_PC32 0000000000000000 x - 4

Relocation section '.rela.eh_frame' at offset 0x1d0 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0

[...]

Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 6
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 5
8: 0000000000000000 5 OBJECT GLOBAL DEFAULT 3 x
9: 0000000000000000 25 FUNC GLOBAL DEFAULT 1 f
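The .rela.text entry above (offset 0xf, type R_X86_64_PC32, symbol x, addend -4) tells the linker to patch the four zero bytes of the lea instruction. For this relocation type the x86-64 psABI defines the stored value as S + A - P, where S is the final address of the symbol, A the addend and P the address of the field being patched. A minimal sketch of that step follows; apply_pc32 is a name made up for the example and the addresses in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Apply one R_X86_64_PC32 relocation: store S + A - P as a 32-bit
 * little-endian value at section[r_offset].
 *   section_addr  final virtual address of the section being patched
 *   sym_addr      final virtual address of the symbol (S)
 *   addend        r_addend from the relocation entry (A) */
void apply_pc32(unsigned char *section, uint64_t section_addr,
                uint64_t r_offset, uint64_t sym_addr, int64_t addend)
{
    uint64_t place = section_addr + r_offset;                 /* P */
    int64_t value = (int64_t)(sym_addr + (uint64_t)addend - place);

    /* The field is only 32 bits wide, so the result must fit. */
    assert(value >= INT32_MIN && value <= INT32_MAX);

    uint32_t field = (uint32_t)(int32_t)value;
    for (int i = 0; i < 4; i++)                               /* little-endian */
        section[r_offset + i] = (unsigned char)(field >> (8 * i));
}
```

Suppose, purely for illustration, that .text ends up at 0x401000 and x at 0x404010. Then P = 0x40100f and the stored value is 0x404010 - 4 - 0x40100f = 0x2ffd. The addend is -4 because the CPU adds the displacement to the address of the next instruction (0x401013), which is 4 bytes past the field: 0x401013 + 0x2ffd = 0x404010.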

Some beginner questions about elf files, sections headers and how they work in general when we run an application

There are multiple inaccuracies in your description, and it's unclear whether you are imprecise in understanding the processes involved, or in describing them.

When you write a C application and you compile it(say with gcc) it gets translated into machine instructions which represent code and data.

This isn't entirely accurate: there is a difference between "machine instructions" and "machine code".

When you compile a .c file, some compilers will translate it into machine instructions (assembly), and then pass it to the assembler to produce machine code (GCC does that). Other compilers have an integrated assembler, and effectively skip the assembly generation step (Clang does that).

The output of invoking the compiler is an elf file.

On some but not all systems, the result of compilation is a relocatable ELF file. Other systems produce object files in a different format, such as XCOFF or Mach-O.

The elf file contains(among other things) a section header which is basically a series of Elf64_Shdr each for every section your compiled application contains.

The application is not built yet, so this is inaccurate. Also, Elf64_Shdr only applies to 64-bit ELF platforms; on 32-bit machines it's Elf32_Shdr.

When we run the make command

The make command has nothing to do with anything. It just invokes compiler and linker (or other tools) as appropriate. You can replace it with a shell script, or just type the commands by hand.

and pass it the elf file

The link step involves one or more (usually more) relocatable ELF object files, archive libraries and dynamic libraries.

the linker comes into play and looks at all sections created by the compiler, at their names and attributes and groups them into 'segments' following the rules of a ld script file

To understand what the linker does you can read this series of blog posts.

Your description trivializes what the linker does. The linker is much more complicated, and performs relocation resolution which you didn't mention, and many other tasks.

and creates the executable file which we can run.

Usually true.

You can ask linker to combine several relocatable object files into a combined object file (with ld -r foo.o bar.o -o combined.o), and in that case the result will not be an executable file.

You can also ask the linker to link a shared library instead of linking an executable.

So basically segments are nothing more than sections of same attributes grouped together in a common section with a specific name.

False. There is a lot more to linking than grouping sections together.

Then when we actually run the created executable the loader comes into play

The loader only comes into play for dynamically-linked executables. Fully-static executables do not have a loader, and are started directly by the kernel itself.

and looks at these segments created by the linker and by reading this information which they contain it maps the machine instructions to various memory locations so the process can run. This is what is called(in my understanding) a memory image.

Mostly correct. Some parts of the memory image do not come from disk at all (e.g. thread-local storage and the contents of the combined .bss sections).
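The part of the image that does not come from disk shows up in the program header as the difference between p_memsz and p_filesz. As a rough illustration (a real loader uses mmap and page-aligned mappings, not memcpy), here is a sketch of populating one PT_LOAD segment; load_segment is a made-up name and the code assumes a 64-bit ELF with the glibc <elf.h> definitions.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Copy one PT_LOAD segment from the file image into `dst` and zero-fill
 * the tail that has no bytes in the file (this is where .bss lives).
 * Returns the number of zero-filled bytes, or -1 on invalid input. */
long load_segment(unsigned char *dst, size_t dst_size,
                  const unsigned char *image, size_t image_size,
                  const Elf64_Phdr *ph)
{
    assert(dst != NULL && image != NULL && ph != NULL);

    if (ph->p_type != PT_LOAD || ph->p_filesz > ph->p_memsz)
        return -1;
    if (ph->p_memsz > dst_size)
        return -1;
    if (ph->p_offset > image_size || ph->p_filesz > image_size - ph->p_offset)
        return -1;

    memcpy(dst, image + ph->p_offset, ph->p_filesz);               /* from disk */
    memset(dst + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);     /* not on disk */
    return (long)(ph->p_memsz - ph->p_filesz);
}
```

In the /bin/date listing near the top of this page, the second LOAD segment has FileSiz 0x440 and MemSiz 0x610, so 0x1d0 (464) bytes of it exist only in memory.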

Increase size of the BSS section in the elf and other file formats

I want to add a 4KB space to the bss section of an executable elf file. How can this be done?

Assuming you want to do that to an already linked ELF executable, note that sections are not used at all by anything (other than perhaps debugging tools) after the link is done; you want to modify the corresponding segment.

What you are asking for is impossible in general, but might be possible for your particular executable. In particular, it should be possible if the LOAD segment "covering" the .bss section is the last one, or if there is an in-memory gap between that segment and the next LOAD segment.

If there is space (in memory) to extend the LOAD segment in question, then all you have to do is patch its program header's .p_memsz and increment it by 4096.

You would need to understand the output from readelf -Wl a.out in detail.

Update:

assuming that bss occurs last, is there a tool to change .p_memsz of the last segment in a line or two ?

I don't know of any tool to do this, but it's pretty trivial: the program headers are a table of fixed-size records starting at file offset ehdr->e_phoff. The table contains ehdr->e_phnum records. You read each record until you find the last PT_LOAD segment, update the .p_memsz member, seek back and write the updated record on top of what was there.

The libelf or elfio libraries may (or may not) make writing this tool easier.
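For what it's worth, here is a rough sketch of the patching step described above. It assumes a 64-bit ELF in the host's byte order and the glibc <elf.h> definitions, and it works on an in-memory copy of the file (reading the file in and writing it back is left out). The name grow_last_load is invented for this example, and the code does not check whether there is room in the address space after the segment.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Increase p_memsz of the last PT_LOAD program header by `extra` bytes.
 * `image` is a writable in-memory copy of the whole ELF file.
 * Returns the index of the patched program header, or -1 on failure. */
int grow_last_load(unsigned char *image, size_t size, uint64_t extra)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    int last = -1;

    assert(image != NULL);
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof ph)
        return -1;
    /* The whole program header table must lie inside the buffer. */
    if (eh.e_phoff > size || (size - eh.e_phoff) / sizeof ph < eh.e_phnum)
        return -1;

    /* Walk the table and remember the last PT_LOAD record. */
    for (int i = 0; i < eh.e_phnum; i++) {
        memcpy(&ph, image + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_LOAD)
            last = i;
    }
    if (last < 0)
        return -1;

    /* Re-read that record, bump p_memsz, and write it back in place. */
    unsigned char *slot = image + eh.e_phoff + (size_t)last * sizeof ph;
    memcpy(&ph, slot, sizeof ph);
    ph.p_memsz += extra;
    memcpy(slot, &ph, sizeof ph);
    return last;
}
```

Calling grow_last_load(image, size, 4096) on a copy of the executable and writing the buffer back would add 4KB of zero-initialized memory to the end of the last LOAD segment. Check the readelf -Wl output before and after to confirm that only MemSiz changed.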

I guess to make the elf conformant, we also need to change the section size of bss accordingly ?

No, you don't:

  • there is no requirement that sizes of .bss and the load segment match
  • in fact, there is no requirement that any sections are present at all,
  • like I said, nothing cares about any sections after the binary is linked.

