How to Dump Part of Binary File

In a single pipe:

xxd -c1 -p file |
  awk -v b="ffd8ffd0" -v e="aaffd9" '
    found == 1 {
      print $0
      str = str $0
      if (str == e) {found = 0; exit}
      if (length(str) == length(e)) str = substr(str, 3)}
    found == 0 {
      str = str $0
      if (str == b) {found = 1; print str; str = ""}
      if (length(str) == length(b)) str = substr(str, 3)}
    END{ exit found }' |
  xxd -r -p > new_file
test ${PIPESTATUS[1]} -eq 0 || rm new_file

The idea is to use awk between two xxd calls to select the part of the file that is needed. Once the 1st pattern is found, awk prints the bytes until the 2nd pattern is found, then exits.

The case where the 1st pattern is found but the 2nd is not must be taken into account. It is handled in the END part of the awk script, which returns a non-zero exit status. This is caught through bash's ${PIPESTATUS[1]}, where I decided to delete the new file.

Note that an empty file also means that nothing has been found.
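
The same selection can be sketched in C on a buffer that is already in memory. This is only an illustration of the idea, not a replacement for the pipeline: find_bytes and extract_between are hypothetical helper names, and the sketch assumes the whole file fits in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of pat in buf at or after
   position from, or (size_t)-1 if there is none. */
size_t find_bytes(const unsigned char *buf, size_t len, size_t from,
                  const unsigned char *pat, size_t patlen)
{
    assert(patlen > 0);
    for (size_t i = from; i + patlen <= len; i++)
        if (memcmp(buf + i, pat, patlen) == 0)
            return i;
    return (size_t)-1;
}

/* Locate the span that starts with the begin marker b and ends with the
   end marker e (both included). The end marker is searched for only after
   the begin marker, as in the awk script. Returns 1 and fills *start and
   *count on success, 0 if either marker is missing. */
int extract_between(const unsigned char *buf, size_t len,
                    const unsigned char *b, size_t blen,
                    const unsigned char *e, size_t elen,
                    size_t *start, size_t *count)
{
    size_t s = find_bytes(buf, len, 0, b, blen);
    if (s == (size_t)-1)
        return 0;
    size_t t = find_bytes(buf, len, s + blen, e, elen);
    if (t == (size_t)-1)
        return 0;
    *start = s;
    *count = t + elen - s;
    return 1;
}
```

A return value of 0 covers both failure cases (no 1st pattern, or 1st pattern without the 2nd), so the caller can skip writing the output file instead of deleting it afterwards.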

How to get a hex dump from a binary file into C

if(fp == NULL) printf("File loading error number %i\n", errno);

When you detect an error, do not just print a message. Either exit the program or do something to correct for the error.

char buffer[10];

Use unsigned char for working with raw data. char may be signed, which can cause undesired effects.

fread(buffer, strlen(buffer)+1, 1, fp);

buffer has not been initialized at this point, so the behavior of strlen(buffer) is not defined by the C standard. In any case, you do not want to use the length of the string currently in buffer as the size for fread. You want the size of the array. So use sizeof buffer (without the +1).

for(int i=0;i<10;i++)

Do not iterate to ten. Iterate to the number of bytes put into the buffer by fread. fread returns a size_t value that is the number of items read. If you use it as size_t n = fread(buffer, 1, sizeof buffer, fp);, the number of items (in n) will be the number of bytes read, since having 1 for the second argument says each item to read is one byte.

printf("%x\n", buffer[i]);

To print an unsigned char, use %hhx. Because your buffer had signed char elements, some of them were negative. When used in this printf, they were promoted to negative int values. Then, because of the %x, printf attempted to print them as unsigned int values. All the extra bits from the negative values in two’s complement form showed up.
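
Putting these points together, a minimal sketch could look like the following. The function name dump_hex is made up for illustration, and unlike the original snippet it keeps reading until the end of the file rather than reading a single buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Read fp to the end and print one two-digit hex value per line to out.
   Returns the total number of bytes read. */
size_t dump_hex(FILE *fp, FILE *out)
{
    unsigned char buffer[10];   /* unsigned char: raw bytes, never negative */
    size_t n, total = 0;

    assert(fp != NULL && out != NULL);

    /* The item size is 1, so the return value n is a byte count. */
    while ((n = fread(buffer, 1, sizeof buffer, fp)) > 0) {
        for (size_t i = 0; i < n; i++)   /* only the bytes actually read */
            fprintf(out, "%02hhx\n", buffer[i]);
        total += n;
    }
    return total;
}
```

The caller is still responsible for checking that fopen succeeded before passing the stream in, for example by printing the error and returning a failure status from main.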

How can I extract the text portion of a binary file in Linux/Bash?

Use the strings utility - that's exactly what it's designed for.
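
To give a rough idea of what strings does, here is a simplified C sketch that prints every run of at least minlen printable ASCII characters found in a buffer. It is an approximation only: print_strings is a hypothetical name, the real utility has more options and encodings, and its default minimum run length is 4.

```c
#include <assert.h>
#include <stdio.h>

/* Print each run of at least minlen printable ASCII characters (or tabs)
   in buf on its own line. Returns the number of runs printed. */
size_t print_strings(const unsigned char *buf, size_t len, size_t minlen,
                     FILE *out)
{
    size_t found = 0, run = 0;

    assert(buf != NULL && out != NULL && minlen > 0);

    /* Iterate one position past the end so a run that reaches the end of
       the buffer is flushed as well. */
    for (size_t i = 0; i <= len; i++) {
        int printable = i < len &&
            ((buf[i] >= 0x20 && buf[i] <= 0x7e) || buf[i] == '\t');
        if (printable) {
            run++;
        } else {
            if (run >= minlen) {
                fwrite(buf + i - run, 1, run, out);
                fputc('\n', out);
                found++;
            }
            run = 0;
        }
    }
    return found;
}
```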

How do I extract a single chunk of bytes from within a file?

Try dd:

dd skip=102567 count=253 if=input.binary of=output.binary bs=1

The option bs=1 sets the block size, making dd read and write one byte at a time. The default block size is 512 bytes.

The value of bs also affects the behavior of skip and count since the numbers in skip and count are the numbers of blocks that dd will skip and read/write, respectively.
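
If you need the same thing from inside a program, the dd call above corresponds roughly to a seek followed by a bounded copy. The sketch below assumes a seekable input stream; copy_range is a hypothetical helper name, with skip and count given in bytes (as with bs=1).

```c
#include <assert.h>
#include <stdio.h>

/* Copy up to count bytes from in, starting at byte offset skip, to out.
   Returns the number of bytes copied, which is less than count if the
   input ends early or an I/O error occurs. */
size_t copy_range(FILE *in, FILE *out, long skip, size_t count)
{
    unsigned char buf[4096];
    size_t copied = 0;

    assert(in != NULL && out != NULL && skip >= 0);

    if (fseek(in, skip, SEEK_SET) != 0)
        return 0;

    while (copied < count) {
        size_t want = count - copied;
        if (want > sizeof buf)
            want = sizeof buf;
        size_t n = fread(buf, 1, want, in);
        if (n == 0)
            break;                  /* end of file or read error */
        if (fwrite(buf, 1, n, out) != n)
            break;                  /* write error */
        copied += n;
    }
    return copied;
}
```

With both files opened in binary mode ("rb" and "wb"), copy_range(in, out, 102567, 253) would select the same bytes as the dd command.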

How to interpret a dump of binary file?

If you compiled with debug info, use objdump -d -S to interleave source lines with asm. gdb can use external debug symbols (like from linux-image-4.16.0-1-amd64-dbg), but I don't think that's useful for disassembling kernel modules.

I'm not sure how to tell objdump to look for / use them. See https://www.technovelty.org/code/separate-debug-info.html for some more info about separate debug info, but it doesn't say anything about objdump -S using them, only gdb.

Otherwise, ACM_CTRL_DTR | ACM_CTRL_RTS is 0x3, and there's a test/je that skips over a couple of mov $3, %eax / mov $3, %ecx instructions when %esi is zero, so that's the if(raise) branch. The 2nd integer function arg is passed in RSI/ESI in the x86-64 System V calling convention, so that's raise unless it's been clobbered by an earlier call or some other thing you hid with ....

The two mov instructions are probably explained by assigning val to something else, but I haven't tried to fully follow the logic.

Hexdump text and binary files in C

You have a couple of errors in logic. First as specified in the comment, all characters are equally important within a binary file. There is no need (and you shouldn't) test if(buff[i] >= 33 && buff[i] <= 255 || buff[i] != 00) for your binary output.

The proper declarations for main are int main (void) and int main (int argc, char **argv) (which you will see written with the equivalent char *argv[]). See: C11 Standard §5.1.2.2.1 Program startup p1 (draft n1570). See also: What should main() return in C and C++?

Next, with your binary output, you are attempting to print an unsigned value with %02x, but you are passing a signed character. If the char value is negative, you are trying to output the sign-extended value, which outputs the full width of the unsigned value (02x will pad the field to 2 characters, but does not prevent more than two characters from printing). You have a couple of options: first, use the hh length modifier to limit the type to 1 byte, and second, simply cast the value to (unsigned char), e.g.

            printf("%02hhx ", (unsigned char)buff[i]);

Your logic is also a bit cumbersome. You should use if ... else if ... else to handle your binary cases. Further, you are outputting two spaces when either i >= read || buff[i] == 0, so you may as well combine the test.

A short rewrite could look something like the following (which will read from the file given as the 1st argument -- or from stdin if no argument is given)

#include <stdio.h>

#define OFFSET 16

int main (int argc, char *argv[])
{
    char buff[OFFSET] = "";
    int read, address = 0, i;
    FILE *fp = argc > 1 ? fopen (argv[1], "rb") : stdin;

    if (!fp) {
        perror ("fopen");
        return 1;
    }

    while ((read = fread(buff, 1, sizeof buff, fp)) > 0) {
        printf("%08x ", address);
        address += OFFSET;

        for (i = 0; i < OFFSET; i++)    /* print hex values */
            if (i >= read || buff[i] == 0)
                printf("   ");
            else
                printf("%02hhx ", (unsigned char)buff[i]);

        fputs ("| ", stdout); /* optional separator before ASCII */

        for (i = 0; i < read; i++)      /* print ascii values (only bytes read) */
            printf("%c", (buff[i] >= ' ' && buff[i] <= '~' ? buff[i] : ' '));
        putchar ('\n'); /* use putchar to output single character */
    }

    if (fp != stdin)
        fclose (fp);
}

(note: if your compiler does not support the hh length modifier, the cast itself will suffice)

Look things over and let me know if you have further questions.

C++ Writing Binary Dump and Writing to Binary from Dump

As far as I can tell, the problem is the bit order in the dump of the bitset. The disk dump from the first fragment of code:

11001010

means that the first bit is the most significant of the byte, so to decode this you should do in the loop something like this (tested):

unsigned char *p = memblock;
//For each character until end of file:
for (int i=0; i<size/8; i++)  
{
    uint8_t byte = 0;
    for (int j = 7; j >= 0; j--) 
        if (*p++ == '1')
            byte |= (1 << j);

    output.write((char*) &byte, 1);
}

Note that the outer loop runs size/8 times, since every 8 chars encode one byte, and that the inner loop iterates from 7 down to 0 because the first bit is the most significant.
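
For reference, the same decoding can be written as a standalone function in plain C, which makes it easy to check against the 11001010 example. This is a sketch under the same assumptions (one character per bit, most significant bit first); decode_bits is a hypothetical name, and the caller has to provide an output buffer of at least nbits/8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Decode nbits characters of '0'/'1' text, most significant bit first,
   into out. Returns the number of bytes written (nbits / 8); leftover
   characters that do not fill a whole byte are ignored. */
size_t decode_bits(const char *bits, size_t nbits, unsigned char *out)
{
    assert(bits != NULL && out != NULL);

    size_t nbytes = nbits / 8;
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char byte = 0;
        for (int j = 7; j >= 0; j--)     /* first character is bit 7 */
            if (*bits++ == '1')
                byte |= (unsigned char)(1u << j);
        out[i] = byte;
    }
    return nbytes;
}
```

Decoding the string 11001010 this way gives the single byte 0xCA.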

Interesting binary dump of executable file

When you compile a program into an executable on Linux (and a number of other unix systems), it is written in the ELF format. The ELF format has a number of sections, which you can examine with readelf or objdump:

readelf -a bindump | less

For example, section .text contains CPU instructions, .data global variables, .bss uninitialized global variables (it is actually empty in the ELF file itself, but is created in the main memory when the program is executed), .plt and .got which are jump tables, debugging information, etc.

Btw. it is much more convenient to examine the binary content of files with hexdump:

hexdump -C bindata | less

There you can see that starting with offset 0x850 (approx. line 171 in your dump) there are a lot of zeros, and you can also see the ASCII representation on the right.

Let us look at which sections correspond to the block of your interest between 0x850 and 0x1160 (the field Off – offset in the file is important here):

> readelf -a bindata
...
Section Headers:
[Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
...
[28] .shstrtab         STRTAB          00000000 00074c 000106 00      0   0  1
[29] .symtab           SYMTAB          00000000 000d2c 000440 10     30  45  4
...

You can examine the content of an individual section with -x:

> readelf -x .symtab bindump | less
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 34810408 00000000 03000100 ....4...........
0x00000020 00000000 48810408 00000000 03000200 ....H...........
0x00000030 00000000 68810408 00000000 03000300 ....h...........
0x00000040 00000000 8c810408 00000000 03000400 ................
0x00000050 00000000 b8810408 00000000 03000500 ................
0x00000060 00000000 d8810408 00000000 03000600 ................

You would see that there are many zeros. The section is composed of 16-byte values (= one line in the -x output) defining symbols. From readelf -a you can see that it has 68 entries, and the first 27 of them (excl. the very first one) are of type SECTION:

Symbol table '.symtab' contains 68 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 08048134     0 SECTION LOCAL  DEFAULT    1 
     2: 08048148     0 SECTION LOCAL  DEFAULT    2 
     3: 08048168     0 SECTION LOCAL  DEFAULT    3 
     4: 0804818c     0 SECTION LOCAL  DEFAULT    4 
     ...

According to the specification (page 1-18), each entry has the following format:

typedef struct {
    Elf32_Word st_name;
    Elf32_Addr st_value;
    Elf32_Word st_size;
    unsigned char st_info;
    unsigned char st_other;
    Elf32_Half st_shndx;
} Elf32_Sym;

Without going into too much detail, I think what matters here is that st_name and st_size are both zeros for these SECTION entries. Both are 32-bit numbers, which means lots of zeros in this particular section.
