How to Grep for Presence of Specific Hex Bytes in Files

Check the post again. FrOsT is not including the '<' and '>' in his actual grep command. He only used the angle brackets to enclose an example statement. His actual statement looks like this:

"\x01\x02"

not:

"<\x01\x02>"

I have a C source file on my computer that begins with the line:

#include <stdio.h>

When I run

grep -obUaP '\x69\x6E\x63\x6C\x75\x64\x65' io.c

I get

1:include

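That is, the byte offset of the match (from -b) followed by only the string matching the pattern (from -o). It is not a line number: "include" starts at byte 1, right after the '#' at byte 0.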

You may want to run

man grep

and find out what all those options mean.
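As a rough guide to those options, here is a minimal sketch that rebuilds the example on a temporary file. It assumes GNU grep built with PCRE support (needed for -P); the variable names tmp and result are only for illustration.

```shell
# Recreate a file that starts with the same line as io.c.
tmp=$(mktemp)
printf '#include <stdio.h>\n' > "$tmp"

# -o  print only the part of the line that matched
# -b  prefix each match with its 0-based byte offset in the file
# -U  treat the file as binary (matters mostly on DOS/Windows ports)
# -a  process a binary file as if it were text
# -P  use Perl-compatible regexps, which understand \xHH escapes
result=$(LC_ALL=C grep -obUaP '\x69\x6E\x63\x6C\x75\x64\x65' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

LC_ALL=C is set here only so that grep works byte by byte regardless of the caller's locale.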

Using grep to search for hex strings in a file

We tried several things before arriving at an acceptable solution:

xxd -u /usr/bin/xxd | grep 'DF'
00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003 @.........S.....

root# grep -ibH "df" /usr/bin/xxd
Binary file /usr/bin/xxd matches
xxd -u /usr/bin/xxd | grep -H 'DF'
(standard input):00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003 @.........S.....

Then found we could get usable results with

xxd -u /usr/bin/xxd > /tmp/xxd.hex ; grep -H 'DF' /tmp/xxd.hex

Note that using a simple search target like 'DF' will incorrectly match characters that span across byte boundaries, i.e.

xxd -u /usr/bin/xxd | grep 'DF'
00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003 @.........S.....
--------------------^^

So we use an ORed regexp to search for ' DF' OR 'DF ' (the searchTarget preceded or followed by a space char).

The final result seems to be

xxd -u DumpFile > DumpFile.hex
grep -E ' DF|DF ' DumpFile.hex

0001020: 0089 0424 8D95 D8F5 FFFF 89F0 E8DF F6FF ...$............
-----------------------------------------^^
0001220: 0C24 E871 0B00 0083 F8FF 89C3 0F84 DF03 .$.q............
--------------------------------------------^^
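Another way to avoid matches that straddle a byte boundary is to dump one byte per line and match whole lines only. The following is a minimal sketch, not the method used above: it swaps in od for xxd (assuming od is available, as it is on POSIX systems), and the helper name find_byte is made up for illustration.

```shell
# find_byte FILE HEX: print the 0-based decimal offset of every byte in FILE
# that equals HEX (two hex digits). Hypothetical helper, not a standard tool.
find_byte() {
    od -An -v -tx1 "$1" |       # plain hex dump without the offset column
        tr -s ' ' '\n' |        # put every byte on its own line
        grep -v '^$' |          # drop the empty line left by leading blanks
        grep -n -i -x "$2" |    # whole-line match, so "0D FF" cannot match DF
        cut -d: -f1 |           # keep only the line number
        awk '{ print $1 - 1 }'  # line number -> byte offset
}

f=$(mktemp)
printf '\015\377\337\101' > "$f"    # bytes 0D FF DF 41 (written in octal)
offsets=$(find_byte "$f" DF)
echo "$offsets"
rm -f "$f"
```

In the sample data, the hex text "0DFF" contains "DF" across two bytes, but only the real DF byte at offset 2 is reported.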

How do I grep for special characters (control characters) using hex representation

byte

What you need to do first is to create inside a variable the exact byte that you want to search.

Something like any of this:

a=$(echo -e '\xc0')
a=$'\xc0'
a=$(printf '\xc0')
a=$(echo -e '\300') # 300 is 0xC0 in octal
a=$'\300'
a=$(printf '\300')
a=$(echo "c0" | xxd -r -p)

I could try to come up with some other ways, but I hope you get the idea.
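To check that an assignment like these really produced the single byte 0xC0, you can dump the variable with od. A minimal sketch, using the printf-with-octal form because it also works in plain sh; the variable name hex is only for illustration:

```shell
a=$(printf '\300')                                   # octal 300 = 0xC0
hex=$(printf '%s' "$a" | od -An -tx1 | tr -d ' \n')  # hex dump of the variable
echo "$hex"                                          # prints c0
```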

Then, you could try to search for the byte with grep:

echo $'Testing this: \xC0 byte' |  grep "$a"

If you use a UTF-8 locale (the most common case), that will fail, because a lone 0xC0 byte is not a valid UTF-8 sequence.
If you change to an ISO-8859-1 locale, it will work:

LC_ALL=en_US.iso88591 echo $'Testing this: \xC0 byte' |
LC_ALL=en_US.iso88591 grep -P "$a"

Or, if you don't mind starting a new bash instance:

$ bash
$ export LC_ALL=en_US.iso88591
$ echo $'Testing this: \xC0 byte' | grep -P "$a"

And just return to the old bash environment by executing exit.

This might work or not depending on your system.
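If the ISO-8859-1 locale is not installed, the C locale is another thing to try, since it is a single-byte locale. This is a minimal sketch assuming GNU grep; -a is added so that input that looks binary is still treated as text, and the variable name count is only for illustration.

```shell
a=$(printf '\300')    # the raw byte 0xC0
# Count the lines that contain the byte, with grep running in the C locale.
count=$(printf 'Testing this: \300 byte\n' | LC_ALL=C grep -a -c "$a")
echo "$count"
```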

Let's explore the other side: characters.

character

There is a very very important twist that you should understand.

A byte is not a character. Well, sometimes, by sheer luck, it is.

But apart from the 128 ASCII characters, for which a byte is a character (though not in UTF-16 or UTF-32, and let's also forget about EBCDIC), every one of the remaining Unicode code points (there are 1,114,112 in total, 17 × 65,536) takes more than one byte.

In that case, you should ask for the UNICODE code point of hex 0xC0.

In modern bash, like this:

$ printf '\U00C0'
À

Which is this character: LATIN CAPITAL LETTER A WITH GRAVE

That will be encoded as one byte if the locale is ISO-8859-1 (and ISO-8859-15, at least) and as two bytes if the locale is utf-8.

$ a=$(printf '\UC0')
$ printf 'Testing \U00C0 character' | grep -P "$a"
Testing À character

It will also work if you change the LC_ALL variable. That is, grep will detect the character, but the printed line may fail to render the character correctly because of the changed locale.

If the file contains this character and the file's encoding matches the locale, grep will work with the value of the character in a variable.
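To see the byte versus character distinction directly, you can count the bytes of À as encoded in UTF-8. This sketch writes the two UTF-8 bytes explicitly in octal (\303\200 is 0xC3 0x80), so it does not depend on the current locale; the variable names are only for illustration.

```shell
bytes=$(printf '\303\200' | wc -c | tr -d ' ')         # number of bytes
hex=$(printf '\303\200' | od -An -tx1 | tr -d ' \n')   # their hex values
echo "$bytes bytes: $hex"
```

One character, two bytes: that is why searching for the single byte 0xC0 does not find À in UTF-8 text.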

How to grep a text file which contains some binary data?

You could run the data file through cat -v, e.g.:

$ cat -v tmp/test.log | grep re
line1 re ^@^M
line3 re^M

which could be then further post-processed to remove the junk; this is most analogous to your query about using tr for the task.

-v simply tells cat to display non-printing characters.
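A small self-contained sketch of the same idea, with made-up sample data standing in for tmp/test.log (a NUL byte and carriage returns mixed into text lines):

```shell
# Three lines: the first has a NUL and a CR, the third has a CR.
result=$(printf 'line1 re \000\r\nline2\nline3 re\r\n' | cat -v | grep re)
printf '%s\n' "$result"
```

cat -v shows the NUL as ^@ and the carriage return as ^M, so grep sees plain text and prints the matching lines instead of "Binary file ... matches".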

Adding Bytes to file using Hex Editor

It likely has to be the same length or shorter (e.g. padded with nulls) because of pointers within the file itself. If a game file is expecting a structure or function at index XXXX, and you shift everything by five bytes, then it's not going to work. How to fix it? You would need intimate knowledge of the game file format. Then you could go about revising what else needs to be revised.

As an aside, Windows DLLs keep their strings and dialogs in a separate resource area, and are surprisingly easy to revise using a resource editor!
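The same-length rule can also be followed from the shell rather than a hex editor. This is only a sketch under assumptions: the file layout is invented, and dd with conv=notrunc stands in for the hex editor, overwriting bytes in place without shifting anything after them.

```shell
f=$(mktemp)
# Made-up "game file": 6-byte header, a 12-byte string, a NUL, then a trailer.
printf 'HEADERHello, world\000TRAILER' > "$f"
before=$(wc -c < "$f" | tr -d ' ')

# Replace "Hello, world" (12 bytes at offset 6) with "Hi" padded with NULs
# to the same 12 bytes, so every later offset stays where it was.
printf 'Hi\000\000\000\000\000\000\000\000\000\000' |
    dd of="$f" bs=1 seek=6 conv=notrunc 2>/dev/null

after=$(wc -c < "$f" | tr -d ' ')
tail_part=$(tail -c 7 "$f")    # the trailer should be untouched
echo "$before $after $tail_part"
rm -f "$f"
```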

Match two strings in one line with grep

You can use

grep 'string1' filename | grep 'string2'

Or

grep 'string1.*string2\|string2.*string1' filename
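Note that \| inside a basic regexp is a GNU extension. A minimal sketch comparing the piped form with the extended-regexp spelling (grep -E), which is the more portable way to write the alternation; the sample file contents are made up:

```shell
f=$(mktemp)
printf 'string1 then string2\nstring2 then string1\nonly string1\nonly string2\n' > "$f"

piped=$(grep 'string1' "$f" | grep 'string2')            # two greps in a pipe
ere=$(grep -E 'string1.*string2|string2.*string1' "$f")  # one ERE with alternation

printf '%s\n' "$ere"
rm -f "$f"
```

Both forms print the first two lines, in either order of the two strings.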

Portable way to get file size (in bytes) in the shell

wc -c < filename (short for word count, -c prints the byte count) is a portable, POSIX solution. Only the output format might not be uniform across platforms as some spaces may be prepended (which is the case for Solaris).

Do not omit the input redirection. When the file is passed as an argument, the file name is printed after the byte count.

I was worried it wouldn't work for binary files, but it works OK on both Linux and Solaris. You can try it with wc -c < /usr/bin/wc. Moreover, POSIX utilities are guaranteed to handle binary files, unless specified otherwise explicitly.
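To get a clean number regardless of the padding mentioned above, the spaces can be stripped with tr. A minimal sketch on a small binary test file (the file contents and variable names are only for illustration):

```shell
f=$(mktemp)
printf 'hello\000world' > "$f"            # 11 bytes, including a NUL
size=$(wc -c < "$f" | tr -d ' ')          # strip any leading spaces
echo "$size"
rm -f "$f"
```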


