How to dump part of a binary file
In a single pipe:
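xxd -c1 -p file |
  awk -v b="ffd8ffd0" -v e="aaffd9" '
    # inside the wanted part: print each byte, watch for the end marker
    found == 1 {
      print $0
      str = str $0
      if (str == e) {found = 0; exit}     # exit still runs END
      if (length(str) == length(e)) str = substr(str, 3)
    }
    # before the wanted part: slide a window over the bytes looking for b
    found == 0 {
      str = str $0
      if (str == b) {found = 1; print str; str = ""}
      if (length(str) == length(b)) str = substr(str, 3)
    }
    # non-zero only if b was found but e was not
    END { exit found }' |
  xxd -r -p > new_file
test ${PIPESTATUS[1]} -eq 0 || rm new_file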
The idea is to use awk between two xxd commands to select the part of the file that is needed. Once the 1st pattern is found, awk prints the bytes until the 2nd pattern is found and then exits.
The case where the 1st pattern is found but the 2nd is not must be taken into account. It is handled in the END part of the awk script, which returns a non-zero exit status in that case (an exit in the main rules still runs END, so found is back to 0 there when the 2nd pattern was seen). This status is caught by bash's ${PIPESTATUS[1]}, and in that case I decided to delete the new file.
Note that an empty file also means that nothing has been found.
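For comparison, the same selection can be sketched in C, assuming the whole file has already been read into memory (the pipe above streams instead). The names find_bytes and extract_between are hypothetical. Like the awk script, the result includes both markers, the search for the 2nd pattern only starts after the whole 1st pattern, and the return value separates "1st pattern missing" from "2nd pattern missing":

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first occurrence of pat[0..plen) in
   data[from..len), or (size_t)-1 if there is none. */
static size_t find_bytes(const unsigned char *data, size_t len, size_t from,
                         const unsigned char *pat, size_t plen)
{
    if (plen == 0 || len < plen)
        return (size_t)-1;
    for (size_t i = from; i + plen <= len; i++)
        if (memcmp(data + i, pat, plen) == 0)
            return i;
    return (size_t)-1;
}

/* Locate the span that starts with marker b and ends with marker e,
   both markers included.
   Returns 0 and sets *start / *count on success,
   -1 if b is not found (the awk version produces an empty file),
    1 if b is found but e is not (the case where new_file is removed). */
int extract_between(const unsigned char *data, size_t len,
                    const unsigned char *b, size_t blen,
                    const unsigned char *e, size_t elen,
                    size_t *start, size_t *count)
{
    size_t s = find_bytes(data, len, 0, b, blen);
    if (s == (size_t)-1)
        return -1;
    /* Look for e only after the complete begin marker. */
    size_t t = find_bytes(data, len, s + blen, e, elen);
    if (t == (size_t)-1)
        return 1;
    *start = s;
    *count = t + elen - s;
    return 0;
}
```

After a 0 return, fwrite(data + start, 1, count, out) writes the same bytes that the pipeline puts into new_file.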
How to get a hex dump from a binary file into C
if(fp == NULL) printf("File loading error number %i\n", errno);
When you detect an error, do not just print a message. Either exit the program or do something to correct for the error.
char buffer[10];
Use unsigned char for working with raw data. char may be signed, which can cause undesired effects.
fread(buffer, strlen(buffer)+1, 1, fp);
buffer has not been initialized at this point, so the behavior of strlen(buffer) is not defined by the C standard. In any case, you do not want to use the length of the string currently in buffer as the size for fread. You want the size of the array. So use sizeof buffer (without the +1).
for(int i=0;i<10;i++)
Do not iterate to ten. Iterate to the number of bytes put into the buffer by fread. fread returns a size_t value that is the number of items read. If you call it as size_t n = fread(buffer, 1, sizeof buffer, fp); then n holds the number of bytes read, since having 1 for the second argument says each item to read is one byte.
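Putting these points together, a corrected read loop might look like the sketch below. The names hexdump_stream and open_or_exit are made up for illustration, and the output format (one byte per line) is an assumption based on the "%x\n" in the question:

```c
#include <stdio.h>
#include <stdlib.h>

/* Open a file for binary reading; on failure report why and exit,
   instead of printing a message and carrying on with a NULL pointer. */
FILE *open_or_exit(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return fp;
}

/* Print every byte of `in` as two hex digits, one byte per line.
   Returns 0 on success, -1 on a read or write error. */
int hexdump_stream(FILE *in, FILE *out)
{
    unsigned char buffer[10];   /* unsigned char: raw bytes, no negative values */
    size_t n;

    /* Item size 1 and item count sizeof buffer, so n counts bytes. */
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        for (size_t i = 0; i < n; i++)      /* only the bytes actually read */
            if (fprintf(out, "%02hhx\n", buffer[i]) < 0)
                return -1;
    }
    return ferror(in) ? -1 : 0;   /* fread returning 0 may mean EOF or error */
}
```

Typical use would be hexdump_stream(open_or_exit(argv[1]), stdout), followed by fclose on the opened file.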
printf("%x\n", buffer[i]);
To print an unsigned char, use %hhx. Because your buffer had signed char elements, some of them were negative. When used in this printf, they were promoted to negative int values. Then, because of the %x, printf attempted to print them as unsigned int values, so all the extra bits from the negative values in two’s complement form showed up (with a 32-bit int, the byte 0xd8 prints as ffffffd8).
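A small sketch of the difference. The function name show_byte is hypothetical; it formats the same byte once with %x after conversion to unsigned int and once with %hhx after a cast to unsigned char. It takes a signed char explicitly so that the result does not depend on whether plain char is signed on your platform:

```c
#include <stdio.h>

/* Format byte c two ways into caller-provided buffers. */
void show_byte(signed char c, char *wide, size_t wide_sz,
               char *narrow, size_t narrow_sz)
{
    /* A negative c becomes a negative int; converting that to
       unsigned int sets all the upper bits. */
    snprintf(wide, wide_sz, "%x", (unsigned int)c);
    /* Casting to unsigned char (and using %hhx) keeps only the low 8 bits. */
    snprintf(narrow, narrow_sz, "%hhx", (unsigned char)c);
}
```

With a 32-bit int, show_byte(-1, ...) leaves ffffffff in wide and ff in narrow.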
How can I extract the text portion of a binary file in Linux/Bash?
Use the strings utility. That's exactly what it's designed for.
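If you need the same idea inside your own program, a much simplified version of what strings does could look like this. It is a sketch, not the utility's implementation: print_strings is a hypothetical name, it only treats isprint() characters (in the default C locale) as text, and the real tool has more options. The usage note uses 4 as the minimum run length, which is the default of strings:

```c
#include <ctype.h>
#include <stdio.h>

/* Print each run of at least min_len (>= 1) printable characters
   found in data, one run per line. Returns the number of runs printed. */
size_t print_strings(const unsigned char *data, size_t len,
                     size_t min_len, FILE *out)
{
    size_t runs = 0, start = 0;

    for (size_t i = 0; i <= len; i++) {
        /* i == len acts as a sentinel that closes a trailing run. */
        if (i < len && isprint(data[i]))
            continue;
        if (i - start >= min_len) {
            fwrite(data + start, 1, i - start, out);
            fputc('\n', out);
            runs++;
        }
        start = i + 1;   /* the next run can begin after this byte */
    }
    return runs;
}
```

For example, print_strings(data, len, 4, stdout) prints the text runs of a buffer you have already read from the binary file.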
How do I extract a single chunk of bytes from within a file?
Try dd:
dd skip=102567 count=253 if=input.binary of=output.binary bs=1
The option bs=1 sets the block size, making dd read and write one byte at a time. The default block size is 512 bytes.
The value of bs also affects the behavior of skip and count, since the numbers in skip and count are the numbers of blocks that dd will skip and read/write, respectively.
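Inside a C program the same extraction is an fseek to the offset followed by a bounded copy. In the sketch below, copy_range is a hypothetical name; copy_range(in, out, 102567, 253) selects the same bytes as the dd command above. It copies in 4096-byte chunks, but because the offset and length are always given in bytes, the chunk size does not change what is selected (unlike bs for dd):

```c
#include <stdio.h>

/* Copy `count` bytes starting at byte offset `skip` of `in` to `out`.
   Returns the number of bytes copied (less than count if EOF came
   first), or -1 if seeking or writing failed. */
long copy_range(FILE *in, FILE *out, long skip, long count)
{
    unsigned char chunk[4096];
    long copied = 0;

    if (fseek(in, skip, SEEK_SET) != 0)
        return -1;
    while (copied < count) {
        size_t want = sizeof chunk;
        if ((long)want > count - copied)
            want = (size_t)(count - copied);
        size_t got = fread(chunk, 1, want, in);
        if (got == 0)
            break;                          /* EOF or read error */
        if (fwrite(chunk, 1, got, out) != got)
            return -1;
        copied += (long)got;
    }
    return copied;
}
```

Both streams should be opened in binary mode ("rb" and "wb") so no newline translation happens on platforms that do it.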
How to interpret a dump of binary file?
If you compiled with debug info, then use objdump -d -S to interleave source lines with asm. gdb can use external debug symbols (like from linux-image-4.16.0-1-amd64-dbg), but I don't think that's useful for disassembling kernel modules. I'm not sure how to tell objdump to look for / use them. See https://www.technovelty.org/code/separate-debug-info.html for some more info about separate debug info, but it doesn't say anything about objdump -S using them, only gdb.
Otherwise, ACM_CTRL_DTR | ACM_CTRL_RTS is 0x3, and there's a test/je that skips over a couple of mov $3, %eax / mov $3, %ecx instructions when %esi is zero, so that's the if(raise) branch. The 2nd integer function arg is passed in RSI/ESI in the x86-64 System V calling convention, so that's raise, unless it's been clobbered by an earlier call or some other thing you hid with ....
The two mov instructions are probably explained by assigning val to something else, but I haven't tried to fully follow the logic.
Hexdump text and binary files in C
You have a couple of errors in logic. First, as specified in the comment, all characters are equally important within a binary file. There is no need to (and you shouldn't) test if(buff[i] >= 33 && buff[i] <= 255 || buff[i] != 00) for your binary output.
The proper declarations for main are int main (void) and int main (int argc, char **argv) (which you will see written with the equivalent char *argv[]). See: C11 Standard §5.1.2.2.1 Program startup p1 (draft n1570). See also: What should main() return in C and C++?
Next, with your binary output you are attempting to print an unsigned value with %02x, but you are passing a signed character. If the char value is negative, you are trying to output the sign-extended value, which prints the full width of the unsigned value (%02x will pad the field to 2 characters, but does not prevent more than two characters from printing). You have a couple of options: first, use the hh length modifier to limit the type to 1 byte, and second, simply cast the value to (unsigned char), e.g.
printf("%02hhx ", (unsigned char)buff[i]);
Your logic is also a bit cumbersome. You should use if ... else if ... else to handle your binary cases. Further, you are outputting two spaces when either i >= read || buff[i] == 0, so you may as well combine the test.
A short rewrite could look something like the following (it reads from the file given as the 1st argument, or from stdin if no argument is given):
#include <stdio.h>

#define OFFSET 16

int main (int argc, char const *argv[])
{
    char buff[OFFSET] = "";
    int read, i;
    unsigned address = 0;
    FILE *fp = argc > 1 ? fopen (argv[1], "rb") : stdin;

    if (!fp) {  /* validate file is open for reading */
        perror ("fopen");
        return 1;
    }

    while ((read = fread(buff, 1, sizeof buff, fp)) > 0) {
        printf("%08x ", address);
        address += OFFSET;

        for (i = 0; i < OFFSET; i++)            /* print hex values */
            if (i >= read || buff[i] == 0)
                printf("   ");                  /* 3 blanks, same width as "%02hhx " */
            else
                printf("%02hhx ", (unsigned char)buff[i]);

        fputs ("| ", stdout);                   /* optional separator before ASCII */

        for (i = 0; i < read; i++)              /* ascii, only for bytes actually read */
            putchar (buff[i] >= ' ' && buff[i] <= '~' ? buff[i] : ' ');

        putchar ('\n');                         /* use putchar to output single character */
    }

    if (fp != stdin)
        fclose (fp);

    return 0;
}
(note: if your compiler does not support the hh length modifier, the cast by itself will suffice)
Look things over and let me know if you have further questions.
C++ Writing Binary Dump and Writing to Binary from Dump
As far as I can tell, the problem is the bit order in the dump of the bitset. The disk dump from the first fragment of code,
11001010
means that the first bit is the most significant bit of the byte, so to decode it you should do something like this in the loop (tested):
unsigned char *p = memblock;
// For each character until end of file:
for (int i = 0; i < size / 8; i++)
{
    uint8_t byte = 0;
    // 8 characters -> 1 byte, first character is the most significant bit
    for (int j = 7; j >= 0; j--)
        if (*p++ == '1')
            byte |= (1 << j);
    output.write((char*) &byte, 1);
}
Note that the outer loop runs size/8 times because every 8 characters encode one byte, and that the inner loop counts down from 7 to 0 because the first character is the most significant bit.
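The conversion in both directions can be sketched in C (the function names are hypothetical). bits_to_byte mirrors the decoding loop above, and byte_to_bits produces the same most-significant-bit-first text as the dump shown above, so writing and reading agree on the bit order:

```c
#include <stdint.h>

/* Decode 8 '0'/'1' characters into one byte, first character = MSB. */
uint8_t bits_to_byte(const char bits[8])
{
    uint8_t byte = 0;
    for (int j = 7; j >= 0; j--)
        if (*bits++ == '1')
            byte |= (uint8_t)(1u << j);
    return byte;
}

/* Inverse: write one byte as 8 characters, MSB first, plus a '\0'. */
void byte_to_bits(uint8_t byte, char out[9])
{
    for (int j = 7; j >= 0; j--)
        *out++ = ((byte >> j) & 1) ? '1' : '0';
    *out = '\0';
}
```

For example, bits_to_byte("11001010") gives 0xCA (202), and byte_to_bits(0xCA, s) writes "11001010" back into s.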
Interesting binary dump of executable file
When you compile a program into an executable on Linux (and a number of other unix systems), it is written in the ELF format. The ELF format has a number of sections, which you can examine with readelf or objdump:
readelf -a bindump | less
For example, section .text contains CPU instructions, .data global variables, .bss uninitialized global variables (it is actually empty in the ELF file itself, but is created in main memory when the program is executed), .plt and .got, which are jump tables, debugging information, etc.
By the way, it is much more convenient to examine the binary content of files with hexdump:
hexdump -C bindata | less
There you can see that starting at offset 0x850 (approx. line 171 in your dump) there are a lot of zeros, and you can also see the ASCII representation on the right.
Let us look at which sections correspond to the block of interest between 0x850 and 0x1160 (the field Off, the offset in the file, is what matters here):
> readelf -a bindata
...
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
...
  [28] .shstrtab         STRTAB          00000000 00074c 000106 00      0   0  1
  [29] .symtab           SYMTAB          00000000 000d2c 000440 10     30  45  4
...
You can examine the content of an individual section with -x:
> readelf -x .symtab bindump | less
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 34810408 00000000 03000100 ....4...........
0x00000020 00000000 48810408 00000000 03000200 ....H...........
0x00000030 00000000 68810408 00000000 03000300 ....h...........
0x00000040 00000000 8c810408 00000000 03000400 ................
0x00000050 00000000 b8810408 00000000 03000500 ................
0x00000060 00000000 d8810408 00000000 03000600 ................
You can see that there are many zeros. The section is composed of 16-byte entries (= one line in the -x output), each defining a symbol; the ES column of the section header above shows the same entry size, 0x10. From readelf -a you can see that it has 68 entries (0x440 / 0x10 = 68), and the first 27 of them (excluding the very first one) are of type SECTION:
Symbol table '.symtab' contains 68 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 08048134     0 SECTION LOCAL  DEFAULT    1
     2: 08048148     0 SECTION LOCAL  DEFAULT    2
     3: 08048168     0 SECTION LOCAL  DEFAULT    3
     4: 0804818c     0 SECTION LOCAL  DEFAULT    4
...
According to the specification (page 1-18), each entry has the following format:
typedef struct {
    Elf32_Word    st_name;
    Elf32_Addr    st_value;
    Elf32_Word    st_size;
    unsigned char st_info;
    unsigned char st_other;
    Elf32_Half    st_shndx;
} Elf32_Sym;
Without going into too much detail, I think what matters here is that st_name and st_size are both zero for these SECTION entries. Both are 32-bit fields, so each of these 16-byte entries contains at least 8 zero bytes, which explains the many zeros in this section.
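To check this reading against the dump, here is a sketch that decodes one 16-byte entry by hand. It assumes a little-endian 32-bit ELF file, which matches the byte order visible in the -x output (34810408 is the address 0x08048134). The Sym32 type and the function names are made up here; on Linux, <elf.h> already provides Elf32_Sym and the ELF32_ST_TYPE macro:

```c
#include <stdint.h>

/* Same fields as Elf32_Sym, with fixed-width types
   (Elf32_Word and Elf32_Addr are 32 bits, Elf32_Half is 16). */
typedef struct {
    uint32_t st_name;
    uint32_t st_value;
    uint32_t st_size;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t st_shndx;
} Sym32;

/* Read a little-endian 32-bit value. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode one 16-byte little-endian symbol table entry. */
Sym32 decode_sym32(const unsigned char e[16])
{
    Sym32 s;
    s.st_name  = le32(e);
    s.st_value = le32(e + 4);
    s.st_size  = le32(e + 8);
    s.st_info  = e[12];
    s.st_other = e[13];
    s.st_shndx = (uint16_t)(e[14] | e[15] << 8);
    return s;
}
```

Fed the entry at offset 0x10 (00000000 34810408 00000000 03000100), it yields st_value 0x08048134, st_info 3 (STT_SECTION in the low four bits) and st_shndx 1, matching symbol 1 in the table above.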