Byte[] Array Pattern Search

Most efficient way to find pattern in byte array

You want to be using a for loop to check your array. The reason why your code is slow is rather simple.

Decompilation shows why:

public static IEnumerable<TSource> Skip<TSource>(this IEnumerable<TSource> source, int count)
{
if (source == null)
throw Error.ArgumentNull("source");
else
return Enumerable.SkipIterator<TSource>(source, count);
}

private static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count)
{
using (IEnumerator<TSource> enumerator = source.GetEnumerator())
{
while (count > 0 && enumerator.MoveNext())
--count;
if (count <= 0)
{
while (enumerator.MoveNext())
yield return enumerator.Current;
}
}
}

For each for you're looping you're performing a skip, basically unnecessairily iterating over your array again.

SOME Linq operations contain optimizations to use indexers when possible - skip is not one of them unfortunately.

PS:

If i was you i'd change your code to something like

var search = new byte[] {0xff, 0xd8};
var current = new byte[2];
var maxSearchRange = bytes.Length -1;
for (var index = 0; index < maxSearchRange; index++)
{
current[0] = bytes[index];
current[1] = bytes[index+1];

if((search).SequenceEqual(current))
...

Find byte sequence within a byte array

You can use the Boyer-Moore algorithm to efficiently search for a sequence of bytes in an array of bytes.

Here's a C# version I converted from the Java version from the Wikipedia entry on Boyer-Moore.

public sealed class BoyerMoore
{
readonly byte[] needle;
readonly int[] charTable;
readonly int[] offsetTable;

public BoyerMoore(byte[] needle)
{
this.needle = needle;
this.charTable = makeByteTable(needle);
this.offsetTable = makeOffsetTable(needle);
}

public IEnumerable<int> Search(byte[] haystack)
{
if (needle.Length == 0)
yield break;

for (int i = needle.Length - 1; i < haystack.Length;)
{
int j;

for (j = needle.Length - 1; needle[j] == haystack[i]; --i, --j)
{
if (j != 0)
continue;

yield return i;
i += needle.Length - 1;
break;
}

i += Math.Max(offsetTable[needle.Length - 1 - j], charTable[haystack[i]]);
}
}

static int[] makeByteTable(byte[] needle)
{
const int ALPHABET_SIZE = 256;
int[] table = new int[ALPHABET_SIZE];

for (int i = 0; i < table.Length; ++i)
table[i] = needle.Length;

for (int i = 0; i < needle.Length - 1; ++i)
table[needle[i]] = needle.Length - 1 - i;

return table;
}

static int[] makeOffsetTable(byte[] needle)
{
int[] table = new int[needle.Length];
int lastPrefixPosition = needle.Length;

for (int i = needle.Length - 1; i >= 0; --i)
{
if (isPrefix(needle, i + 1))
lastPrefixPosition = i + 1;

table[needle.Length - 1 - i] = lastPrefixPosition - i + needle.Length - 1;
}

for (int i = 0; i < needle.Length - 1; ++i)
{
int slen = suffixLength(needle, i);
table[slen] = needle.Length - 1 - i + slen;
}

return table;
}

static bool isPrefix(byte[] needle, int p)
{
for (int i = p, j = 0; i < needle.Length; ++i, ++j)
if (needle[i] != needle[j])
return false;

return true;
}

static int suffixLength(byte[] needle, int p)
{
int len = 0;

for (int i = p, j = needle.Length - 1; i >= 0 && needle[i] == needle[j]; --i, --j)
++len;

return len;
}
}

Here's some console app test code for it:

public static void Main()
{
byte[] haystack = new byte[10000];
byte[] needle = { 0x00, 0x69, 0x73, 0x6F, 0x6D };

// Put a few copies of the needle into the haystack.

for (int i = 1000; i <= 9000; i += 1000)
Array.Copy(needle, 0, haystack, i, needle.Length);

var searcher = new BoyerMoore(needle);

foreach (int index in searcher.Search(haystack))
Console.WriteLine(index);
}

Note how the Search() method returns the indices of all the locations of the start of needle inside haystack.

If you just wanted the count, you could just do:

int count = new BoyerMoore(needle).Search(haystack).Count();

For your second question: I assume you are asking about finding the longest repeated sequence of bytes?

That's a much more complicated - and very different - question. If you want an answer for that, you should ask a separate question for it, but you should read the Wikipedia entry on the "longest repeated substring problem".

How to search from a string array in a byte array?

Not sure if I understood your intentions correctly. But that is what comes to my mind

//String array with some of the patterns I'm looking for in the byte array
string[] patterns = { "05805A6C", "0580306C", "05801B6C" };

foreach (string p in patterns)
{
int i=0;
int indice = 0;

// teminate loop when no more occurrence is found;
// using a for loop with i++ is probably wrong since
// it skips one additional character after a found pattern
while (indice!=-1)
{
// index if the pattern is found AFTER i position, -1 if not
indice = hex.IndexOf(p, i);

//Do some calculations to get the offset I desire to register
i = indice+ 8; // skip the pattern occurrence itself
int index = (i / 2);

//Transform the index into hexadecimal
string outputHex = int.Parse(index.ToString()).ToString("X");

//Output the index as an hexadecimal offset address
MessageBox.Show("0x" + outputHex);
}
}

By dealing with the patterns separately you also get a more orderly output. Plus you can define a dedicated method for the single pattern search.

Edit: as to your question about ordering (I suppose you mean reorder from greatest to least, right?), change the code like so

//String array with some of the patterns I'm looking for in the byte array
string[] patterns = { "05805A6C", "0580306C", "05801B6C" };

foreach (string p in patterns)
{
List<int> allIndices = new List<int>();

int i=0;
int indice = 0;

// teminate loop when no more occurrence is found;
// using a for loop with i++ is probably wrong since
// it skips one additional character after a found pattern
while (indice!=-1)
{
// index if the pattern is found AFTER i position, -1 if not
indice = hex.IndexOf(p, i);

i = indice+ 8; // skip the pattern occurrence itself

// temporarily store the occured indices
if (indice != -1) allIndices.Add(i);
}

// does what it says :-)
allIndices.Reverse();

// separate loop for the output
foreach (int j in allIndices)
{
//Do some calculations to get the offset I desire to register
int index = (j / 2);

//Transform the index into hexadecimal
string outputHex = int.Parse(index.ToString()).ToString("X");

//Output the index as an hexadecimal offset address
MessageBox.Show("0x" + outputHex);
}
}

How to search in a BYTE array for a pattern?

Since you're in C++, do it the C++ way:

char a[] = { 0, 0, 0, 0xFC };
char Buffer[20000] = ...

std::string needle(a, a + 4);
std::string haystack(Buffer, Buffer + 20000); // or "+ sizeof Buffer"

std::size_t n = haystack.find(needle);

if (n == std::string::npos)
{
// not found
}
else
{
// position is n
}

You can also use an algorithm to search the array directly:

#include <algorithm>
#include <iterator>

auto it = std::search(
std::begin(Buffer), std::end(Buffer),
std::begin(a), std::end(a));

if (it == std::end(Buffer))
{
// not found
}
else
{
// subrange found at std::distance(std::begin(Buffer), it)
}

Or, in C++17, you can use a string view:

std::string_view sv(std::begin(Buffer), std::end(Buffer));

if (std::size_t n = sv.find(needle); n != sv.npos)
{
// found at position n
}
else
{
// not found
}

How do I find Byte pattern in a byte array in C#?

You said you're not a good programmer. I'll make some basic suggestions then.

First, for any problem, break it down. Try to break it up into simpler sub-problems that you can solve, or think will be easier to solve. You said you wanted to check for patterns in a file, and then maybe modify some of the data and write it back out. As a first step, I'd say these are the major sub-problems you need to solve in order to get to a complete solution.

  • read the file
  • search for patterns in the binary data
  • make the modifications
  • write the modified data out.

You focused mostly on "search for patterns" part, so I will focus further comments on that. All those other pieces can also be tricky, but you can either solve them yourself or look for hints on stackoverflow or in other places for how other people solved them.

ok, now, regarding "search for patterns" . In my reading of your description,

It seems to me the first byte in your byte array is the text length. In C# code, a data array that exhibits your pattern could be declared explicitly like this:

byte[] data = new byte[] { 9, 0x49, 0x6E, 0x76, 0x65, 0x6E, 0x74,
0x6F, 0x72, 0x79, 0x0A, 0, 0, 0, 2, 1, 0 } ;

of course, you won't be declaring the array explicitly. You'll be creating or filling array via the file-read operation. The file-read is still an open problem at this point, but that's ok - you can solve it separately. And, the effect is the same - when you have successfully read the file, you will have a byte array that looks like the array that is explicitly declared above.

ok, now how do you manipulate that data? The first byte is the length of string data. The next N bytes are the string data. Then you have some other stuff.

you didn't describe exactly what "modifications" you want to make, but with the information you have provided, it should be easy to walk through the data.

If you want to get a "slice" of the data in an array, you can do this .

int length = (int) data[0];
byte[] s = new byte[length];
Array.Copy(data, 0, s, 0, length);
// now, s contains N bytes, representing the string

The next thing you might want to do is get an actual string out of those bytes you've sliced out. To do that, use the Encoding class:

String word = System.Text.Encoding.UTF8.GetString(s);

I don't understand the pattern for the data that follows the string data. In some cases it's a 00, in some cases it's 0A.

But maybe you know the pattern. Follow along the same way and you can walk through all the data for one "record". When you get to the end of the record, then start again and process the next one. The key to getting to the next "Record" is knowing how large a slice of data to take out of the data you have read from the file.


This brings us back to the "reading the file" sub-problem. One way to get the right amount of data from the file is to read only the first byte, then read N bytes further in the file. (See System.IO.Stream.Read).

byte[] size = new byte[1];
var fileStream = File.OpenRead(path);
int offset = 0;
int n;
n = fileStream.Read(size, offset, 1);
// size[0] now contains the byte of data indicating the number of bytes to follow

At this point you can read the next N bytes:

int length = (int)size[0];
byte[] stringData = new byte[length];
n = fileStream.Read(stringData, offset, length);

Now you can get the string.

String word = System.Text.Encoding.UTF8.GetString(stringData);

...and so on. Read all the trailing bytes. At that point the cursor that is implicitly held by the fileStream points to the next record in the file.

Find indexOf a byte array within another byte array

Java strings are composed of 16-bit chars, not of 8-bit bytes. A char can hold a byte, so you can always make your byte arrays into strings, and use indexOf: ASCII characters, control characters, and even zero characters will work fine.

Here is a demo:

byte[] big = new byte[] {1,2,3,0,4,5,6,7,0,8,9,0,0,1,2,3,4};
byte[] small = new byte[] {7,0,8,9,0,0,1};
String bigStr = new String(big, StandardCharsets.UTF_8);
String smallStr = new String(small, StandardCharsets.UTF_8);
System.out.println(bigStr.indexOf(smallStr));

This prints 7.

However, considering that your large array could be up to 10,000 bytes, and the small array is only ten bytes, this solution may not be the most efficient, for two reasons:

  • It requires copying your big array into an array that is twice as large (same capacity, but with char instead of byte). This triples your memory requirements.
  • String search algorithm of Java is not the fastest one available. You may get sufficiently faster if you implement one of the advanced algorithms, for example, the Knuth–Morris–Pratt one. This could potentially bring the execution speed down by a factor of up to ten (the length of the small string), and will require additional memory that is proportional to the length of the small string, not the big string.

C# How do I find a byte[] pattern at a specific offset

You will need a third parameter offset.

  • Start searching at offset
  • You can short-circuit the search as shown below, if the needle is too long compared to the haystack and offset.
  • Instead of copying arrays, which will take up more memory and probably also time, just search inside the arrays themselves.
  • This is still O(m * n) worst-case time complexity. You can make it O(m) by using a more sophisticated algorithm like Knuth-Morris-Pratt (but it is much more complicated and the overhead will not be worth it, for small inputs).
        public static int Search(byte[] needle, byte[] haystack, int offset){
if (haystack.Length - offset < needle.Length) {
return -1; // can't possibly find b/c needle is too long
}
for (int i = offset; i < haystack.Length; ++i) {
for (int j = 0; j < needle.Length; ++j) {
if ((i + j) >= haystack.Length || needle[j] != haystack[i + j]) {
break; // found non-match for attempt
}
if (j == needle.Length - 1) {
return i; // matched all
}
}
}
return -1;
}

The above returns 4 for your example input.



Related Topics



Leave a reply



Submit