Accessing Data Appended to an ELF Binary

Assuming you have access to the file when the application starts executing, opening a handle to it should prevent the operating system from obliterating the file on disk until the last reference to it has been closed. This lets you seek through the file to your heart's content using that file handle, without worry.

Create a global variable:

int app_fd;

The process is largely the same in each case: in the main routine, simply issue:

app_fd = open(argv[0], O_RDONLY);

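at the start of execution. When you reach the point where you need to access the zip file, use the file descriptor rather than the filename. Note that argv[0] is not guaranteed to be a usable path (for example, when the program was found via PATH); on Linux, /proc/self/exe is a more reliable way to open the running executable.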
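Once the descriptor is open, the remaining question is where the appended data starts. The following is a minimal sketch, not part of the original answer: it assumes a 64-bit ELF file in the host's byte order on Linux (using the definitions from <elf.h>), and it assumes the payload was appended directly after a normally linked image. The helper name elf64_image_end is hypothetical. It returns the offset just past the last byte that the ELF headers account for; anything from that offset to the end of the file would be the appended data.

```c
#define _POSIX_C_SOURCE 200809L
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the file offset just past the last byte the
 * ELF headers account for, or -1 on error. ELF64, host byte order only. */
static off_t elf64_image_end(int fd)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    Elf64_Shdr sh;
    uint64_t end, e;
    unsigned i;

    if (pread(fd, &eh, sizeof eh, 0) != (ssize_t)sizeof eh ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    /* The header tables themselves are part of the image. */
    end = eh.e_ehsize;
    e = eh.e_phoff + (uint64_t)eh.e_phnum * eh.e_phentsize;
    if (e > end) end = e;
    e = eh.e_shoff + (uint64_t)eh.e_shnum * eh.e_shentsize;
    if (e > end) end = e;

    /* Furthest byte covered by any segment. */
    for (i = 0; i < eh.e_phnum; i++) {
        off_t at = (off_t)(eh.e_phoff + (uint64_t)i * eh.e_phentsize);
        if (pread(fd, &ph, sizeof ph, at) != (ssize_t)sizeof ph)
            return -1;
        e = ph.p_offset + ph.p_filesz;
        if (e > end) end = e;
    }

    /* Furthest byte covered by any section that occupies file space. */
    for (i = 0; i < eh.e_shnum; i++) {
        off_t at = (off_t)(eh.e_shoff + (uint64_t)i * eh.e_shentsize);
        if (pread(fd, &sh, sizeof sh, at) != (ssize_t)sizeof sh)
            return -1;
        if (sh.sh_type == SHT_NOBITS)   /* e.g. .bss: no bytes in the file */
            continue;
        e = sh.sh_offset + sh.sh_size;
        if (e > end) end = e;
    }
    return (off_t)end;
}
```

With the global descriptor from above, elf64_image_end(app_fd) would give the offset to pass to lseek() or pread() before handing the data to a zip reader. If the result equals the file size, nothing has been appended.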

At run-time, if you don't have some form of handle to the original contents of the application, then you will probably not be able to access the content of the zip file. This is due to the loader only mapping in the sections of the file that are expected. The content at the end of the binary would be considered garbage and not mapped in.

To accomplish the mapping of the zip file into memory, you would need to follow a different tack. You would need to embed the .zip into a section of the ELF (Linux), COFF (Windows), or Mach-O (Mac OS X) binary that has its properties set such that it is guaranteed to be mapped into the application (this requires a lot of pre-work in the app, and a lot more post-work in the processing). It's not trivial, and probably involves quite a bit of coding to get it right for each of the platforms.

As an aside, it is not trivial to delete an application from a Windows system while that application is running (I think you can move it if it resides on NTFS, though).

Does appending arbitrary data to an ELF file violate the ELF spec?

The specification does not really say anything about it, so one could argue for "it's undefined behavior to have trailing data". On the other hand, the ELF specification is rather clear about its expectations: “sections and segments have no specified order. Only the ELF header has a fixed position in the file.”, which gives sufficient room to embed data one way or another, using a section, or doing without one [this is then unreferenced data!].

This "data freedom" has been exploited since at least the end of the 1980s; consider "self-extracting archives" where a generic unpacking code stub is let loose on a trailing data portion.

In fact, you can find such an implicit feature even in non-executable data formats, such as RIFF and PNG. Not all formats allow this, of course; in particular, not those where data is defined to run until EOF rather than for a fixed length stored in some header. (Consider ZIP: appending data is not possible, but prepending is, which is what leads to EXE-ZIPs being readable by both (unmodified) unzip programs and operating systems.)

There is just one drawback to using unreferenced data like this: when reading and saving a file, you can lose this data.

Packing a file into an ELF executable

You could add the file to the ELF file as a special section with objcopy(1):

objcopy --add-section sname=file oldelf newelf

will add the file to oldelf and write the results to newelf (oldelf won't be modified).
You can then use libbfd to read the ELF file and extract the section by name, or just roll your own code that reads the section table and finds your section. Make sure to use a section name that doesn't collide with anything the system is expecting -- as long as your name doesn't start with a ., you should be fine.
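As a rough illustration of the "roll your own" route, here is a minimal sketch that walks the section header table and looks a section up by name. It is an assumption-laden example rather than a complete reader: it handles only 64-bit ELF files in the host's byte order, relies on <elf.h> as found on Linux, ignores files with more than 65279 sections, and the helper name find_section64 is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: finds the section called `name` in an ELF64 file.
 * On success stores its file offset and size and returns 0; else -1. */
static int find_section64(int fd, const char *name,
                          uint64_t *off, uint64_t *size)
{
    Elf64_Ehdr eh;
    Elf64_Shdr strtab, sh;
    char buf[64];
    size_t want = strlen(name) + 1;      /* compare including the NUL */
    unsigned i;

    if (want > sizeof buf)
        return -1;
    if (pread(fd, &eh, sizeof eh, 0) != (ssize_t)sizeof eh ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shoff == 0 || eh.e_shentsize != sizeof sh ||
        eh.e_shstrndx >= eh.e_shnum)
        return -1;

    /* Header of the string table that holds the section names. */
    if (pread(fd, &strtab, sizeof strtab,
              (off_t)(eh.e_shoff + (uint64_t)eh.e_shstrndx * sizeof sh))
        != (ssize_t)sizeof strtab)
        return -1;

    for (i = 1; i < eh.e_shnum; i++) {   /* index 0 is the null section */
        if (pread(fd, &sh, sizeof sh,
                  (off_t)(eh.e_shoff + (uint64_t)i * sizeof sh))
            != (ssize_t)sizeof sh)
            return -1;
        if ((uint64_t)sh.sh_name + want > strtab.sh_size)
            continue;                    /* name would run off the table */
        if (pread(fd, buf, want, (off_t)(strtab.sh_offset + sh.sh_name))
            != (ssize_t)want)
            return -1;
        if (memcmp(buf, name, want) == 0) {
            *off = sh.sh_offset;
            *size = sh.sh_size;
            return 0;
        }
    }
    return -1;
}
```

For the objcopy command above, calling find_section64(fd, "sname", &off, &size) on newelf and then reading size bytes at offset off with pread() should return the contents of the embedded file.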

How can I examine contents of a data section of an ELF file on Linux?


objdump -s -j .rodata exefile

gives a side-by-side hex/printable ASCII dump of the contents of the .rodata section, like:

Contents of section .rodata:
0000 67452301 efcdab89 67452301 efcdab89 gE#.....gE#.....
0010 64636261 68676665 64636261 68676665 dcbahgfedcbahgfe

It doesn't look like there's anything in there to control formatting, but it's a start. You could always undump the hex and feed it to od, I suppose :)
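If you need control over the formatting, one option is to read the section bytes yourself and print them in whatever layout you like. The sketch below is only an illustration: format_dump_line is a hypothetical helper, and the layout is modelled on the dump shown above (offset, four groups of four bytes, then printable ASCII) rather than taken from objdump's source.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats up to 16 bytes as one dump line,
 * " OFFS xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx  ascii".
 * Returns the line length, or -1 if the arguments are unusable. */
static int format_dump_line(char *out, size_t outsz, unsigned long offset,
                            const unsigned char *p, size_t n)
{
    size_t i;
    int pos;

    if (outsz < 80 || n > 16)            /* 80 bytes is enough for any line */
        return -1;

    pos = snprintf(out, outsz, " %04lx", offset);
    for (i = 0; i < 16; i++) {
        if (i % 4 == 0)
            out[pos++] = ' ';            /* gap between 4-byte groups */
        if (i < n)
            pos += snprintf(out + pos, outsz - (size_t)pos, "%02x",
                            (unsigned)p[i]);
        else
            pos += snprintf(out + pos, outsz - (size_t)pos, "  ");
    }
    memcpy(out + pos, "  ", 2);          /* separator before the ASCII column */
    pos += 2;
    for (i = 0; i < n; i++)
        out[pos++] = isprint(p[i]) ? (char)p[i] : '.';
    out[pos] = '\0';
    return pos;
}
```

To dump a whole buffer, call it once per 16-byte chunk, advancing the offset by 16 each time, and print each line with puts(). Changing the group width or the offset format is then a one-line edit.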

Identify whether an ELF binary is built with optimizations


However, sometimes I want to use scripts to decide whether an ELF binary was built with optimizations, so I can decide what to do next.

The problem with your question is that your binary is very likely to contain some code built with optimizations (e.g. crt0.o -- part of GLIBC).

Your final binary is composed of a bunch of object files, and these files do not have to have consistent optimization flags.

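Most likely you only care about the code you wrote (as opposed to code linked from other libraries). You could use -frecord-gcc-switches for that: it records the compiler command line in the object file (on ELF targets, in a section named .GCC.command.line), which a script can then inspect. See this answer.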

Embedding custom data into ELF file that will not get mmap'ed by ld.so


Is there a place in the ELF format to embed such a large amount of data in a way that it won't get mapped by the OS automatically?

This is trivial to do with ELF -- just put the data into a non-allocated section.

It's easiest to use objcopy to convert an arbitrary data file into a .o that can be linked in:

objcopy -I binary -B i386 -O elf64-x86-64 \
--rename-section .data=.mydata,readonly,contents src dst.o

Embedding binary into elf with objcopy may cause alignment issues?

To answer my own question, I'd assert that objcopy is broken in this instance. I believe that using assembly with GNU as is likely the best way to go here. Unfortunately I'm now Linux machine-less so can't test this properly, but I'll put this answer here in case someone finds it or wants to check:

    .section ".rodata"
    .align 4 # which either means 4 or 2**4 depending on arch!

    .global _binary_file_bin_start
    .type _binary_file_bin_start, @object
_binary_file_bin_start:
    .incbin "file.bin" # GNU as expects the file name in quotes

    .align 4 # any padding added here is counted in end - start
    .global _binary_file_bin_end
_binary_file_bin_end:

The underscores are the traditional way to annoy yourself with C/asm interoperability. In other words they vanish with MS/Borland compilers under Windows.
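On the C side, start/end labels like these are usually declared as extern arrays, and the size is the difference between the two addresses. The sketch below is a self-contained stand-in, not the code from the answer: it assumes GCC or Clang on an ELF target, uses a top-level asm block with a literal .ascii payload in place of .incbin so that no external file is needed, and the demo_blob_* names are hypothetical. Here the end label is placed directly after the payload, so the computed size excludes any alignment padding.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the separate assembly file: same start/end label pattern,
 * but with a literal payload instead of .incbin (assumption for the demo). */
__asm__(
    ".pushsection .rodata\n"
    ".balign 16\n"                 /* .balign always means bytes */
    ".global demo_blob_start\n"
    "demo_blob_start:\n"
    ".ascii \"hello\"\n"
    ".global demo_blob_end\n"
    "demo_blob_end:\n"             /* directly after the data: exact size */
    ".popsection\n");

/* The labels are addresses, so declare them as arrays of unknown length. */
extern const unsigned char demo_blob_start[];
extern const unsigned char demo_blob_end[];

/* Size of the embedded data in bytes. */
static size_t demo_blob_size(void)
{
    return (size_t)(demo_blob_end - demo_blob_start);
}
```

With a real assembly file, only the two extern declarations and the size computation would be needed in C, using whatever label names the assembly exports.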

C program to open binary elf files, read from them, and print them out (like objcopy)

Here is how you read a file. The question asks about open() and read(); I used fopen() and fread() instead because I am currently working on a Windows machine. However, the approach is the same for either.


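#include <stdio.h>

int main(void)
{
    /* Binary mode, so that no bytes are translated on Windows */
    FILE *file = fopen("input.txt", "rb");
    unsigned char buffer[2048];
    size_t n;

    if (file)
    {
        /* fread returns the number of bytes read; 0 means end of file or error */
        while ((n = fread(buffer, 1, sizeof buffer, file)) > 0)
        {
            /* Write exactly n bytes: the data is not NUL-terminated, so "%s" is unsafe */
            fwrite(buffer, 1, n, stdout);
        }
        fclose(file);
    }
    return 0;
}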

Update: For interpreting ELF files specifically, check out the following code snippet. It shows how to list the section names of an ELF file with libelf.

#include <fcntl.h>
#include <libelf.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void failure(void);

/* Handles 32-bit ELF files only; for 64-bit files use the elf64_*
   (or the class-independent gelf_*) equivalents. */
int main(int argc, char **argv)
{
    Elf32_Shdr *shdr;
    Elf32_Ehdr *ehdr;
    Elf *elf;
    Elf_Scn *scn;
    Elf_Data *data;
    int fd;
    unsigned int cnt;

    if (argc < 2)
    {
        (void)fprintf(stderr, "usage: %s elf-file\n", argv[0]);
        exit(1);
    }

    /* Open the input file */
    if ((fd = open(argv[1], O_RDONLY)) == -1)
        exit(1);

    /* Obtain the ELF descriptor */
    (void)elf_version(EV_CURRENT);
    if ((elf = elf_begin(fd, ELF_C_READ, NULL)) == NULL)
        failure();

    /* Obtain the .shstrtab data buffer */
    if (((ehdr = elf32_getehdr(elf)) == NULL) ||
        ((scn = elf_getscn(elf, ehdr->e_shstrndx)) == NULL) ||
        ((data = elf_getdata(scn, NULL)) == NULL))
        failure();

    /* Traverse the input file, printing each section name */
    for (cnt = 1, scn = NULL; (scn = elf_nextscn(elf, scn)) != NULL; cnt++)
    {
        if ((shdr = elf32_getshdr(scn)) == NULL)
            failure();
        (void)printf("[%u] %s\n", cnt,
                     (char *)data->d_buf + shdr->sh_name);
    }

    /* Release the ELF descriptor and the file */
    (void)elf_end(elf);
    (void)close(fd);
    return 0;
} /* end main */

static void
failure(void)
{
    (void)fprintf(stderr, "%s\n", elf_errmsg(elf_errno()));
    exit(1);
}

I would also recommend checking out the elfutils library, which can be found here.


