Binary grep on Linux?
One-Liner Input
Here’s the shorter one-liner version:
% perl -ln0e 'print tell' < inputfile
And here's a slightly longer one-liner:
% perl -e '($/,$\) = ("\0","\n"); print tell while <STDIN>' < inputfile
The way to connect those two one-liners is by deparsing the first one’s program:
% perl -MO=Deparse,-p -ln0e 'print tell'
BEGIN { $/ = "\000"; $\ = "\n"; }
LINE: while (defined(($_ = <ARGV>))) {
    chomp($_);
    print(tell);
}
Programmed Input
If you want to put that in a file instead of calling it from the command line, here’s a somewhat more explicit version:
#!/usr/bin/env perl
use English qw[ -no_match_vars ];
$RS = "\0"; # input separator for readline, chomp
$ORS = "\n"; # output separator for print
while (<STDIN>) {
    print tell();
}
And here’s the really long version:
#!/usr/bin/env perl
use strict;
use autodie; # for perl5.10 or better
use warnings qw[ FATAL all ];
use IO::Handle;
IO::Handle->input_record_separator("\0");
IO::Handle->output_record_separator("\n");
binmode(STDIN); # just in case
while (my $null_terminated = readline(STDIN)) {
    # this is just *past* the null we just read:
    my $seek_offset = tell(STDIN);
    print STDOUT $seek_offset;
}
close(STDIN);
close(STDOUT);
One-Liner Output
BTW, to create the test input file, I didn’t use your big, long Python script; I just used this simple Perl one-liner:
% perl -e 'print 0.0.0.0.2.4.6.8.0.1.3.0.5.20' > inputfile
You’ll find that Perl often winds up being 2-3 times shorter than Python to do the same job. And you don’t have to compromise on clarity; what could be simpler than the one-liner above?
Programmed Output
I know, I know. If you don’t already know the language, this might be clearer:
#!/usr/bin/env perl
@values = (
    0, 0, 0, 0, 2,
    4, 6, 8, 0, 1,
    3, 0, 5, 20,
);
print pack("C*", @values);
although this works, too:
print chr for @values;
as does
print map { chr } @values;
Although for those who like everything all rigorous and careful and all, this might be more what you would see:
#!/usr/bin/env perl
use strict;
use warnings qw[ FATAL all ];
use autodie;
binmode(STDOUT);
my @octet_list = (
    0, 0, 0, 0, 2,
    4, 6, 8, 0, 1,
    3, 0, 5, 20,
);
my $binary = pack("C*", @octet_list);
print STDOUT $binary;
close(STDOUT);
TMTOWTDI
Perl supports more than one way to do things so that you can pick the one that you’re most comfortable with. If this were something I planned to check in as a school or work project, I would certainly select the longer, more careful versions, or at least put a comment in the shell script if I were using the one-liners.
You can find documentation for Perl on your own system. Just type
% man perl
% man perlrun
% man perlvar
% man perlfunc
etc. at your shell prompt. If you want pretty-ish versions on the web instead, get the manpages for perl, perlrun, perlvar, and perlfunc from http://perldoc.perl.org.
Grep 'binary file matches'. How to get normal grep output?
Try:
grep --text
or
grep -a
for short. This is equivalent to --binary-files=text, and it should show the matches in binary files.
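A small sketch of the difference, using a made-up file (sample.bin) and a hypothetical wrapper name (grep_as_text):

```shell
# Sample file: the NUL byte makes grep classify it as binary.
printf 'alpha\000beta\nneedle in line two\n' > sample.bin

# Hypothetical wrapper around grep -a.
grep_as_text() {
    grep -a -- "$1" "$2"
}

grep needle sample.bin          # GNU grep typically reports only that a binary file matches
grep_as_text needle sample.bin  # prints the matching line itself
```

The exact wording of the "binary file matches" message varies between grep versions, so treat the first command's output as illustrative.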
How to grep a text file which contains some binary data?
You could run the data file through cat -v, e.g.
$ cat -v tmp/test.log | grep re
line1 re ^@^M
line3 re^M
which could then be further post-processed to remove the junk; this is most analogous to your query about using tr for the task.
-v simply tells cat to display non-printing characters.
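Here is a sketch of both steps on a made-up log file. The clean-up step with tr is one possible post-processing choice, not something the original answer spells out, and grep_cleaned is a hypothetical name.

```shell
# Sample log with a NUL and carriage returns, similar to the output shown above.
printf 'line1 re \000\r\nline2\nline3 re\r\n' > test.log

# Make the junk visible: NUL shows up as ^@, carriage return as ^M.
cat -v test.log | grep re

# Hypothetical clean-up: delete NUL and CR bytes first, then search.
grep_cleaned() {
    tr -d '\000\r' < "$2" | grep -- "$1"
}
grep_cleaned re test.log
```

Deleting bytes changes the data, so this only makes sense when the NULs and carriage returns really are junk.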
Using grep to extract very specific strings from binary file
If your grep supports the -P option, please try:
grep -a -Po "7[[:alnum:]]+(?=M-)" file
- The -a option forces grep to read the input as a text file.
- The -P option enables Perl-compatible regex.
- The -o option tells grep to print only the matched substring(s).
- The pattern (?=M-) is a zero-width lookahead assertion (introduced in Perl): it requires M- to follow the match without including it in the result.
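To see the lookahead at work, here is a sketch on made-up data. The file ids.bin and the function extract_ids are hypothetical, and LC_ALL=C is an added assumption so that stray bytes are never treated as invalid multibyte text.

```shell
# Made-up sample: two IDs starting with 7, each followed by M-, with control bytes around them.
printf 'junk\001:7abc123M-\002:7xyzM-\003\n' > ids.bin

# Hypothetical wrapper around the command above.
extract_ids() {
    LC_ALL=C grep -a -Po '7[[:alnum:]]+(?=M-)' "$1"
}

extract_ids ids.bin
```

This should print 7abc123 and 7xyz on separate lines. The M- is required to be there, but it is not part of either match.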
Alternatively, you can do it with sed:
sed 's/M-/\n/g' file | sed -n 's/.*\(7[[:alnum:]]\+\).*/\1/p'
- The first sed command splits the input file into multiple lines by replacing the substring M- with a newline. It has two benefits: it breaks the lines to allow multiple matches with sed, and it excludes the unnecessary portion M- from the input.
- The next sed command extracts the desired pattern from the input.
It assumes your sed accepts \n in the replacement, which is a GNU extension (not POSIX compliant). Otherwise please try (in case you are working in bash):
sed 's/M-/\'$'\n''/g' file | sed -n 's/.*\(7[[:alnum:]]\+\).*/\1/p'
[UPDATE]
(The OP has updated the requirement; the following solutions address it.)
Let me assume the string which starts with 7 and ends with M- is always followed by another (no more and no less than one) string which starts with 5x and ends with ^ (the ASCII caret character), with junk in between.
Then please try the following:
grep -aPo "7[[:alnum:]]+M-.*?5x[[:alnum:]]+\^" file | grep -aPo "7[[:alnum:]]+(?=M-)|5x[[:alnum:]]+(?=\^)"
- It executes the task in two steps (two cascaded greps).
- The 1st grep narrows down the input data to the candidate substring, which includes the two desired sequences and the junk in between.
- The regex .*? in between matches any (ASCII or binary) characters except for a newline character. The trailing ? enables the shortest match, which avoids the overrun caused by the greedy nature of regex. The regex is intended to match the junk in between.
- The 2nd grep includes two regexes merged with a pipe |, meaning logical OR. It then extracts the two desired sequences.
A potential problem with the grep solution is that grep is a line-oriented command and cannot include the newline character in the matched string. If a newline character is included in the junk in between (I'm not sure about the possibility), the above solution will fail.
As a workaround, perl provides flexible manipulation of binary data.
perl -0777 -ne '
while (/(7[[:alnum:]]+)M-.*?(5x[[:alnum:]]+)\^/sg) {
printf("%s\n%s\n", $1, $2);
}
' file
- The regex is mostly the same as that of grep, because the -P option of grep means Perl-compatible.
- It can capture multiple patterns at once in the variables $1 and $2, hence just one regex is enough.
- The -0777 option to the perl command tells perl to slurp all data at once.
- The s option at the end of the regex makes a dot match a newline character.
- The g option enables the global (multiple) match.
[UPDATE2]
In order to make the regex match either 5x or 6x, replace 5x with (5|6)x.
Namely:
grep -aPo "7[[:alnum:]]+M-.*?(5|6)x[[:alnum:]]+\^" file | grep -aPo "7[[:alnum:]]+(?=M-)|(5|6)x[[:alnum:]]+(?=\^)"
As mentioned before, the pipe | means OR. The OR operator has the lowest precedence in a regex, hence you need to enclose the alternatives in parens in this case.
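A tiny sketch of what the parentheses change, using a made-up input line and hypothetical function names:

```shell
# With parentheses: the alternation covers only the leading digit.
with_parens() {
    printf '5xab 6xcd\n' | grep -Po '(5|6)x[[:alnum:]]+'
}

# Without parentheses: the pattern means "5" OR "6x followed by alphanumerics".
without_parens() {
    printf '5xab 6xcd\n' | grep -Po '5|6x[[:alnum:]]+'
}

with_parens      # prints 5xab and 6xcd
without_parens   # prints 5 and 6xcd
```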
If any digit other than 5 or 6 may appear, it is safer to use [[:digit:]] instead, which matches any one digit between 0 and 9:
grep -aPo "7[[:alnum:]]+M-.*?[[:digit:]]x[[:alnum:]]+\^" file | grep -aPo "7[[:alnum:]]+(?=M-)|[[:digit:]]x[[:alnum:]]+(?=\^)"
[UPDATE3]
(Answering the OP's requirement on March 9th)
Let me start with perl code, whose regex is relatively easier to explain.
perl -0777 -ne 'while (/(1(.{3}).+)k([AB].*)[\013 ]\2/g){print "$1 $3\n"}' file
Output:
1pppsx9YPar8Rvs75tJYWZq3eo8Pgwbc B4m4zT7Yg042KIDYUE82e893hY
1zzzsx9YPkr8Rvs75tJYWZq3eo8Pgwbc A2m4zT7Yg042KIDYUE82e893hY
[Explanation of regex]
(1(.{3}).+)k([AB].*)[\013 ]\2
(          start of the 1st capture group, referred to by $1 later
1          literal "1"
(          start of the 2nd capture group, referred to by \2 later
.{3}       any three characters (such as ppp or zzz in the sample data)
)          end of the 2nd capture group
.+         followed by any characters with a "greedy" match, which may include the 1st "k"
)          end of the 1st capture group
k          literal "k"
(          start of the 3rd capture group, referred to by $3 later
[AB].*     the character "A" or "B" followed by any characters
)          end of the 3rd capture group
[\013 ]    followed by ^K or a space
\2         followed by the text captured by group 2
When implementing it with grep, we encounter a limitation of grep.
Although we want to extract multiple patterns from the input file, the -e option (which can specify multiple search patterns) does not work with the -P option. So we need to split the regex into two patterns, such as:
grep -Po "(1(.{3}).+)(?=k([AB].*)[\013 ]\2)" file
grep -Po "(1(.{3}).+)k\K([AB].*)(?=[\013 ]\2)" file
And the result will be:
1pppsx9YPar8Rvs75tJYWZq3eo8Pgwbc
1zzzsx9YPkr8Rvs75tJYWZq3eo8Pgwbc
B4m4zT7Yg042KIDYUE82e893hY
A2m4zT7Yg042KIDYUE82e893hY
Please note that the order of the output is not the same as the order of appearance in the original file.
Another option is to use ripgrep, or rg, which is a fast and versatile grep-like tool. You may need to install ripgrep with sudo apt install ripgrep or another package manager.
An advantage of ripgrep is that it supports the -r (replace) option, in which you can make use of the backreferences:
rg -N -Po "(1(.{3}).+)k([AB].*)[\013 ]\2" -r '$1 $3' file
The -r '$1 $3' option prints the 1st and the 3rd capture groups, and the result will be the same as with perl.
grep offset of ascii string from binary file
You could use strings for this:
strings -a -t x filename | grep foobar
Tested with GNU binutils.
For example, to find where --help occurs in /bin/ls:
strings -a -t x /bin/ls | grep -- --help
Output:
14938 Try `%s --help' for more information.
162f0 --help display this help and exit
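The first column is a hexadecimal byte offset. Here is a sketch that checks such an offset by reading the bytes back. It uses a made-up file (blob.bin) and a hypothetical helper (bytes_at) built on dd, which is not part of the original answer.

```shell
# Made-up sample: eight NUL bytes, then a printable string, then a NUL.
printf '\000\000\000\000\000\000\000\000foobar-demo\000' > blob.bin

# Hypothetical helper: read COUNT bytes starting at a hexadecimal OFFSET.
bytes_at() {
    dd if="$1" bs=1 skip="$((0x$2))" count="$3" 2>/dev/null
}

strings -a -t x blob.bin | grep foobar   # the first column should be 8
bytes_at blob.bin 8 6                    # reads the six bytes at offset 0x8
```

Note that strings only reports runs of at least four printable characters by default, so shorter matches need its -n option.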
How to suppress binary file matching results in grep
There are three options that you can use. -I excludes binary files from grep's results; the others add line numbers and file names.
grep -I -n -H
-I -- process a binary file as if it did not contain matching data;
-n -- prefix each line of output with the 1-based line number within its input file
-H -- print the file name for each match
So this might be a way to run grep:
grep -InH your-word *
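A sketch of the effect on a made-up directory holding one text file and one binary file. The directory grep_demo and the wrapper search_text_only are hypothetical.

```shell
# Made-up files: one plain text file and one containing a NUL byte.
mkdir -p grep_demo
printf 'first line\nyour-word here\n' > grep_demo/notes.txt
printf 'your-word\000binary\n' > grep_demo/data.bin

# Hypothetical wrapper: search the given files, skipping binary ones.
search_text_only() {
    pattern=$1
    shift
    grep -InH -- "$pattern" "$@"
}

search_text_only your-word grep_demo/*
```

Only the text file should be reported, as grep_demo/notes.txt:2:your-word here. The binary file produces neither a match line nor a "binary file matches" message.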
Diff command along with Grep gives Binary file (standard input) matches
From man grep:
-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
--binary-files=TYPE
If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is
binary, and grep normally outputs either a one-line message saying
that a binary file matches, or no message if there is no match. If
TYPE is without-match, grep assumes that a binary file does not match;
this is equivalent to the -I option. If TYPE is text, grep processes a
binary file as if it were text; this is equivalent to the -a option.
Warning: grep --binary-files=text might output binary garbage, which
can have nasty side effects if the output is a terminal and if the
terminal driver interprets some of it as commands.
grep scans the file, and if it finds any unreadable characters, it assumes the file is binary. Add the -a switch to grep to make it treat the file as readable text. Most probably your input files contain some unreadable characters.
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) | grep "^<"
Wouldn't comm -13 <(...) <(...) be faster and simpler?
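For comparison, here is a sketch of comm next to the diff pipeline, on made-up files that are already sorted. Which comm flags you want depends on the order of the two arguments; the helper names are hypothetical.

```shell
# Made-up inputs, already sorted (comm requires sorted input).
printf 'a\nb\nc\n' > today.sorted
printf 'b\nc\nd\n' > yesterday.sorted

only_in_first()  { comm -23 "$1" "$2"; }   # lines unique to the first file
only_in_second() { comm -13 "$1" "$2"; }   # lines unique to the second file

diff today.sorted yesterday.sorted | grep '^<'   # prints "< a"
only_in_first today.sorted yesterday.sorted      # prints "a"
only_in_second today.sorted yesterday.sorted     # prints "d"
```

Unlike the diff pipeline, comm prints the lines without a "< " prefix, so nothing has to be stripped afterwards.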