ELF Label Address

ELF label address

You answered your own question: after linking.

Here is a good article:

Linkers and Loaders

In particular, note the section about "symbol relocation":

Relocation. Compilers and assemblers generate the object code for each
input module with a starting address of zero. Relocation is the
process of assigning load addresses to different parts of the program
by merging all sections of the same type into one section. The code
and data section also are adjusted so they point to the correct
runtime addresses.

There's no way to know the program address of "afterjmp" when a single object file is assembled. Only when all the object files are linked into an executable image can the addresses (relative to offset "0") be computed. That's one of the jobs of the linker: to keep track of symbol references (like "afterjmp") and compute their final machine addresses ("symbol relocation").
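To make the arithmetic concrete, here is a minimal sketch of what the linker does for a label, under simplifying assumptions: each object file contributes one .text chunk, chunks are merged back to back with no alignment padding, and the names object_text and label_address are made up for this illustration (they are not part of any real linker).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one object file's .text contribution. */
struct object_text {
    uint64_t size;         /* bytes of code in this object file */
    uint64_t label_offset; /* offset of a label (say "afterjmp") from the
                              start of this file's .text; the assembler
                              only knows this zero-based value */
};

/*
 * Merge the .text chunks of objs[0..count) back to back, starting at
 * load_base, and return the final address of the label in objs[which].
 * Alignment padding between chunks is ignored in this sketch.
 */
uint64_t label_address(const struct object_text *objs, size_t count,
                       size_t which, uint64_t load_base)
{
    assert(which < count);
    assert(objs[which].label_offset <= objs[which].size);

    uint64_t base = load_base;
    for (size_t i = 0; i < which; ++i)
        base += objs[i].size; /* earlier objects push this one further up */

    return base + objs[which].label_offset;
}
```

With two objects of sizes 0x40 and 0x20 and a load base of 0x400000, a label at offset 0x08 in the second object ends up at 0x400048. The assembler could not have known that value, because it depends on the size of the first object and on the base address chosen at link time.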

How do you get the start and end addresses of a custom ELF section?

As long as the section name results in a valid C variable name, gcc (ld, rather) generates two magic variables: __start_SECTION and __stop_SECTION. Those can be used to retrieve the start and end addresses of a section, like so:

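/**
 * Assuming you've tagged some stuff earlier with:
 * __attribute__((__section__("my_custom_section")))
 */

/* The linker defines these symbols; declare them so the compiler knows their type. */
extern struct thing __start_my_custom_section;
extern struct thing __stop_my_custom_section;

struct thing *iter = &__start_my_custom_section;

for ( ; iter < &__stop_my_custom_section; ++iter) {
    /* do something with *iter */
}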

I couldn’t find any formal documentation for this feature, only a few obscure mailing list references. If you know where the docs are, drop a comment!

If you're using your own linker script (as the Linux kernel does) you'll have to add the magic variables yourself (see vmlinux.lds.[Sh] and this SO answer).

See here for another example of using custom ELF sections.
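For completeness, here is a self-contained sketch of the same technique, including the tagging step. It assumes GCC or Clang targeting ELF with the default linker script. The struct layout, the THING macro, and the two helper functions are invented for this example. The "used" attribute is there so the compiler does not discard the otherwise unreferenced objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical payload type; all entries in the section share it. */
struct thing {
    int id;
    int weight;
};

/* Define one entry and place it in the custom section. */
#define THING(name, id_, weight_)                                   \
    static const struct thing name                                  \
    __attribute__((used, section("my_custom_section"))) = { id_, weight_ }

THING(thing_a, 1, 10);
THING(thing_b, 2, 20);
THING(thing_c, 3, 30);

/* Defined by the linker because the section name is a valid C identifier. */
extern const struct thing __start_my_custom_section[];
extern const struct thing __stop_my_custom_section[];

/* Number of entries the linker collected into the section. */
size_t thing_count(void)
{
    assert(__stop_my_custom_section >= __start_my_custom_section);
    return (size_t)(__stop_my_custom_section - __start_my_custom_section);
}

/* Walk the section from start to stop and add up the weights. */
int total_weight(void)
{
    int total = 0;
    for (const struct thing *it = __start_my_custom_section;
         it < __stop_my_custom_section; ++it)
        total += it->weight;
    return total;
}
```

The order of the entries inside the section is decided by the linker, so code like this should not depend on it. Treating the section as an array also only works if every entry has the same type and the compiler inserts no padding between entries.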

ELF: Using the section size to calculate an address span

I was wondering about the size of a section in relation to the address space the section occupies.

A section does not normally occupy any address -- a segment does.

ELF stands for executable and linkable format, and serves dual purpose: (static) linking and execution.

During the linking phase, the linker operates on sections, and assigns them to 0 or more segments (but usually to at most 1 loadable segment). Some sections, such as .note or .comment, usually don't have the SHF_ALLOC flag set, and do not end up in any loadable segment.

Note that sections are not needed after static link, and can be completely stripped out.

During the execution phase, loadable segments are mmapped into the address space. If a section had size 100, had the SHF_ALLOC flag set, and got assigned to some PT_LOAD segment, then that section will occupy 100 bytes of the address space.
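As a rough illustration of that rule, the following sketch computes the address range of a section from its section header. It assumes a 64-bit ELF file and the Elf64_Shdr type and SHF_ALLOC constant from <elf.h> as shipped with glibc on Linux. The addr_span struct and the function name are made up here.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct addr_span {
    uint64_t start;
    uint64_t end;
};

/*
 * Return true and fill *out if the section takes up address space at
 * run time, i.e. it has SHF_ALLOC set and a non-zero size.
 * Sections without SHF_ALLOC (typically .comment) return false.
 */
bool section_address_span(const Elf64_Shdr *sh, struct addr_span *out)
{
    assert(sh != NULL && out != NULL);

    if (!(sh->sh_flags & SHF_ALLOC) || sh->sh_size == 0)
        return false;

    out->start = sh->sh_addr;
    out->end = sh->sh_addr + sh->sh_size;
    return true;
}
```

For a section with sh_addr 0x401000 and sh_size 100 this yields [0x401000, 0x401064), which is exactly 100 bytes.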

(I am not assuming dynamic loading or MMUs here)

Dynamic linking and MMUs are completely orthogonal to what's happening here. By mentioning them, you only muddy the waters.

For instance say a section size is 100 bytes long and starts at address 0. Naively I would assume that the address space taken by this section would be from 0 to 100.

As stated above, your view of the world is not entirely accurate, and the section is very unlikely to actually occupy the [0, 100) address range.

Assuming however that there are symbols there at address 0, 1, 2 and 3 which have a size of 0 but do have an address associated with them then the actual address space would be 0-103 with 0-3 as being empty?

Symbols are merely labels attached to certain addresses. They don't occupy any address space themselves. They can also be completely stripped after (static) link, though usually they are left in to simplify debugging. The presence of these symbols / labels is what allows the debugger to tell you that your program crashes in e.g. fscanf called from foo, which was called from main.
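The "labels attached to addresses" idea can be sketched as a lookup that maps an address back to the nearest preceding label, which is roughly how a tool turns a crash address into a function name. This is a simplified model: the label struct and function name are invented, and real debuggers also use symbol sizes and debug information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A symbol is just a name attached to an address; it has no bytes of its own. */
struct label {
    const char *name;
    uint64_t addr;
};

/*
 * Return the name of the label with the greatest address that is still
 * <= pc, or NULL if pc lies below every label. The table may be unsorted.
 */
const char *label_for_address(const struct label *labels, size_t count,
                              uint64_t pc)
{
    assert(labels != NULL || count == 0);

    const struct label *best = NULL;
    for (size_t i = 0; i < count; ++i) {
        if (labels[i].addr <= pc &&
            (best == NULL || labels[i].addr > best->addr))
            best = &labels[i];
    }
    return best ? best->name : NULL;
}
```

Stripping the symbol table removes only this lookup data. The code and data bytes, and therefore the address range occupied, stay the same.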

Relocatable symbols in ELF format (assembly language)

ELF doesn't know about instructions, per se. It knows about particular encodings of symbol offsets within instructions. In the assembler, you would need to output two relocation records, each with the corresponding [address,type,symbol] triplet to properly patch that portion of the instruction. The linker wouldn't necessarily even know that these two records point to the same instruction.

The ELF relocation types are completely CPU-dependent (or, to be more precise, ISA-dependent), so you are free to define whatever relocations you need for a new architecture.

It's hard to be more specific without details of the instruction encoding.
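Purely as an illustration, assume a made-up encoding in which one instruction carries a 32-bit symbol address split into two little-endian 16-bit fields. The relocation types R_DEMO_LO16 and R_DEMO_HI16, the reloc struct, and apply_relocs below are all hypothetical and not taken from any real ABI. The sketch shows a linker-side loop that patches each record independently, without knowing that both records belong to one instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation types for a made-up ISA. */
enum reloc_type {
    R_DEMO_LO16, /* store bits 0..15 of the symbol address */
    R_DEMO_HI16  /* store bits 16..31 of the symbol address */
};

/* One relocation record: the [address, type, symbol] triplet. */
struct reloc {
    size_t offset;        /* where in the section to patch */
    enum reloc_type type; /* how to encode the value there */
    size_t symbol;        /* index into the symbol address table */
};

/* Write a 16-bit value in little-endian order at section[offset]. */
static void put16(uint8_t *section, size_t offset, uint16_t value)
{
    section[offset] = (uint8_t)(value & 0xff);
    section[offset + 1] = (uint8_t)(value >> 8);
}

/* Apply every record on its own; the loop has no notion of instructions. */
void apply_relocs(uint8_t *section, size_t section_size,
                  const struct reloc *relocs, size_t reloc_count,
                  const uint32_t *symbol_addrs)
{
    for (size_t i = 0; i < reloc_count; ++i) {
        const struct reloc *r = &relocs[i];
        uint32_t addr = symbol_addrs[r->symbol];

        assert(r->offset + 2 <= section_size);
        if (r->type == R_DEMO_LO16)
            put16(section, r->offset, (uint16_t)(addr & 0xffff));
        else
            put16(section, r->offset, (uint16_t)(addr >> 16));
    }
}
```

For a 6-byte instruction with its fields at offsets 2 and 4, the assembler would emit two records, {2, R_DEMO_LO16, sym} and {4, R_DEMO_HI16, sym}. A real port would define its own relocation types to match the actual instruction encoding.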

How to get the address of a label at compile time

Ah, you can put this bit of magic in your source file (crazy nasm...)

[map all myfile.map]

