How to Extract Only the Raw Contents of an ELF Section

How to extract only the raw contents of an ELF section?

Here is a rather inelegant hack around objdump and dd:

IN_F=/bin/echo
OUT_F=./tmp1.bin
SECTION=.text

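# objdump -h columns: Idx Name Size VMA LMA File-off Algn
objdump -h "$IN_F" |
awk -v in_f="$IN_F" -v out_f="$OUT_F" -v s="$SECTION" \
    '$2 == s {print "dd if=" in_f " of=" out_f " bs=1 count=$((0x" $3 ")) skip=$((0x" $6 "))"}' |
bash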

objdump -h produces predictable output that includes each section's size and its offset within the ELF file. Since dd does not accept hexadecimal numbers, I used awk to generate a dd command in which shell arithmetic converts the hex values to decimal, and then fed that command to the shell.

In the past I did all of this manually, without writing any scripts, since it is rarely needed.
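
The same idea can also be written without generating a command string and piping it to a shell. The following is only a sketch, not part of the original answer: `section_range` is a hypothetical helper name, and it assumes the usual `objdump -h` column order (Idx, Name, Size, VMA, LMA, File off, Algn).

```shell
# Hypothetical helper: read `objdump -h` output on stdin and print
# "<file offset> <size>" in decimal for the section named in $1.
section_range() {
    # Field 2 is the section name, field 3 its size, field 6 its file offset.
    set -- $(awk -v s="$1" '$2 == s { print $6, $3; exit }')
    [ "$#" -eq 2 ] || return 1      # section not found
    echo "$((0x$1)) $((0x$2))"      # shell arithmetic converts hex to decimal
}

# Usage (assumes binutils is installed):
#   set -- $(objdump -h /bin/echo | section_range .text)
#   dd if=/bin/echo of=./tmp1.bin bs=1 skip="$1" count="$2"
```

Matching on the exact section name avoids picking up sections such as .text.startup, which a plain grep for .text would also match.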

How can I examine contents of a data section of an ELF file on Linux?


objdump -s -j .rodata exefile

gives a side-by-side hex/printable ASCII dump of the contents of the .rodata section, like:

Contents of section .rodata:
0000 67452301 efcdab89 67452301 efcdab89 gE#.....gE#.....
0010 64636261 68676665 64636261 68676665 dcbahgfedcbahgfe

It doesn't look like there's anything in there to control formatting, but it's a start. You could always undump the hex and feed it to od, I suppose :)
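
If more control over the formatting is wanted, one option is to copy the section out as raw bytes and let od format it. This is a minimal sketch, assuming GNU objcopy is installed; `dump_section` is a hypothetical helper name, and the section is assumed to be an allocated one such as .rodata, since `-O binary` only writes allocated sections.

```shell
# Hypothetical helper: dump one section of an ELF file as a byte-wise hex listing.
dump_section() {
    file=$1 section=$2
    tmp=$(mktemp) || return 1
    # -O binary writes only the section contents, with no ELF headers.
    if objcopy -O binary --only-section="$section" "$file" "$tmp"; then
        od -A x -t x1 -v "$tmp"     # change the od options to change the format
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage: dump_section exefile .rodata
```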

Extract read-only data sections from an archive/lib (ELF, I guess?) for compression

Eureka! I figured it out!

While I fully acknowledge and appreciate the advice given in the comments and other answers, it still bothered me, because I suspected it should be relatively easy to "play the linker" and extract blobs of hardcoded data from, e.g., an object file.

Well, it turns out to be relatively easy (for ELF-format objects, anyway), just as expected, by using readelf.

To dump a symbol, I used two steps:

  1. Find the index of the section that holds the symbol (the Ndx column) by listing the symbols in the object:

    $ readelf --syms company_logo.o

    Symbol table '.symtab' contains 17 entries:
       Num:    Value  Size Type    Bind   Vis      Ndx Name
         0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 00000000     0 FILE    LOCAL  DEFAULT  ABS company_logo.c
         2: 00000000     0 SECTION LOCAL  DEFAULT    1
         3: 00000000     0 SECTION LOCAL  DEFAULT    2
         4: 00000000     0 SECTION LOCAL  DEFAULT    3
         5: 00000000     0 SECTION LOCAL  DEFAULT    4
         6: 00000000     0 NOTYPE  LOCAL  DEFAULT    4 $d
         7: 00000000     0 SECTION LOCAL  DEFAULT    6
         8: 00000000     0 SECTION LOCAL  DEFAULT    7
         9: 00000000     0 SECTION LOCAL  DEFAULT    9
        10: 00000000     0 SECTION LOCAL  DEFAULT   10
        11: 00000000     0 SECTION LOCAL  DEFAULT   12
        12: 00000000     0 SECTION LOCAL  DEFAULT   13
        13: 00000000     0 SECTION LOCAL  DEFAULT   14
        14: 00000000     0 SECTION LOCAL  DEFAULT   15
        15: 00000000    12 OBJECT  GLOBAL DEFAULT    4 company_logo
        16: 00000000 21879 OBJECT  GLOBAL DEFAULT    6 company_logo_map

  2. Dump the contents of the symbol.

Now company_logo_map was my target, so use its section index (the Ndx value, 6) as follows:

    $ readelf --hex-dump=6 company_logo.o

    Hex dump of section '.rodata.company_logo_map':
      0x00000000 00000000 00000000 00000000 00000000 ................
      0x00000010 00000000 00000000 00000000 00000000 ................
      0x00000020 00000000 00000000 00000000 00000000 ................
      0x00000030 00000000 00000000 00000000 00000000 ................
      ... lots more data here
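
The two steps can be chained so that the index does not have to be read off by eye. This is a sketch under the assumption that `readelf --syms` prints the eight columns shown above (Num, Value, Size, Type, Bind, Vis, Ndx, Name); `symbol_section` is a hypothetical helper name.

```shell
# Hypothetical helper: read `readelf --syms` output on stdin and print the
# Ndx (section index) of the symbol named in $1; fails if it is not found.
symbol_section() {
    awk -v sym="$1" '
        $8 == sym { print $7; found = 1; exit }
        END       { exit !found }
    '
}

# Usage:
#   idx=$(readelf --syms company_logo.o | symbol_section company_logo_map) &&
#   readelf --hex-dump="$idx" company_logo.o
```

For symbols whose Ndx is UND, ABS or COM there is no section to dump, so the second command only makes sense when the helper prints a number.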

How do I get the real position of a section in an ELF archive file?


the File Offset column appears to be relative to the object file position in the archive

The file offset you get from objdump is relative to the beginning of the individual object file. You can think of an archive library as a bookshelf, and the output of objdump -h as the index within each individual book. You wouldn't expect the index to change depending on which other books are on the shelf, or when you take the book off the shelf. Similarly, the object file itself (and the output of objdump -h) does not change when you put it into the library or extract it again (you get a bit-identical copy).

I expected I could use dd to extract binary information from the archive file

You could use dd, but you'd have to first find the position of each individual object file in the archive. That's not too difficult: the format of UNIX archive files is documented. But the format can change depending on which UNIX variant you use, and it's not really necessary for the task you want to perform.

How do I do this with an archive?

If you know that .mysection has identical contents in all object files in libmylib.a (as would be the case for the objcopy --add-section command you gave), then extract one object from the archive and extract the section from it:

firstobj=$(ar t libmylib.a | grep '\.o$' | head -n 1)
ar x libmylib.a "$firstobj"
# Use objdump -h and dd to extract the section contents,
# or use: readelf -p .mysection "$firstobj"
rm -f "$firstobj"

If the contents of .mysection may be different in different object files, extract them to a temporary directory:

mkdir tmp.$$ && cd tmp.$$ && ar x ../libmylib.a
for obj in *; do
    # Extract .mysection from "$obj"; as one example, dump it as strings.
    readelf -p .mysection "$obj"
done
cd .. && rm -rf tmp.$$

objdump to extract contents of text segment to a binary format

We have to specify the input file format explicitly using -I:


objcopy -I <input file format> -j <ELF section to copy> -O <output format: binary, etc.> <input file> <output file>

For example:

objcopy -I elf32-little -j .text -O binary firmware.ko content.bin

C#: capture raw data from a GNU objcopy process that dumps to a file

On Windows there is a device named "CON" that you might be able to use as the output file:

objcopy file "someFile" --dump-section .text=CON

I did not test this, because I do not have objcopy, but the same trick worked with OpenSSL, so it should write everything to the console.

How to see the GNU debuglink value of an ELF file?

Something like this should work:

objcopy --output-target=binary --set-section-flags .gnu_debuglink=alloc \
--only-section=.gnu_debuglink helloworld helloworld.dbg

--output-target=binary avoids adding ELF headers. --set-section-flags .gnu_debuglink=alloc is needed because, with the binary output target, objcopy only writes allocated sections by default. Finally, --only-section=.gnu_debuglink selects the section that holds the answer. See this earlier answer.

Note that the generated file may have a trailing NUL byte and four bytes of CRC, so some post-processing is needed to extract everything up to the first NUL byte (perhaps using head -z -n 1 helloworld.dbg | tr -d '\0' or something similar).
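
The post-processing step can be sketched as a small helper that does not depend on GNU-specific options. `debuglink_name` is a hypothetical name, and the sketch assumes the dumped file starts with the NUL-terminated file name, with any padding and the CRC after it.

```shell
# Hypothetical helper: print the file name stored at the start of a dumped
# .gnu_debuglink section, i.e. everything before the first NUL byte.
debuglink_name() {
    # Turn NUL bytes into newlines, then keep only the first line.
    tr '\0' '\n' < "$1" | head -n 1
}

# Usage: debuglink_name helloworld.dbg
```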

DT_USED entry in .dynamic section of ELF file

In general, when looking at Solaris dynamic linker features, you can often find more information in the public Illumos sources (which were derived from OpenSolaris). In this case, it seems that DT_USED is always treated like DT_NEEDED, so they are essentially the same thing. One of the header files, usr/src/uts/common/sys/link.h, also contains this:

/*
* DT_* entries between DT_HIPROC and DT_LOPROC are reserved for processor
* specific semantics.
*
* DT_* encoding rules apply to all tag values larger than DT_LOPROC.
*/
#define DT_LOPROC 0x70000000 /* processor specific range */
#define DT_AUXILIARY 0x7ffffffd /* shared library auxiliary name */
#define DT_USED 0x7ffffffe /* ignored - same as needed */
#define DT_FILTER 0x7fffffff /* shared library filter name */
#define DT_HIPROC 0x7fffffff

Something may have been planned here, but it doesn't seem to be implemented (or it used to be and no longer is).


