How to Check If One Two-Dimensional Numpy Array Contains a Specific Pattern of Values Inside It

How can I check if one two-dimensional NumPy array contains a specific pattern of values inside it?

Approach #1

This approach derives from a solution to Implement Matlab's im2col 'sliding' in python that was designed to rearrange sliding blocks from a 2D array into columns. Thus, to solve our case here, those sliding blocks from field_array could be stacked as columns and compared against column vector version of match_array.

Here's a formal definition of the function for the rearrangement/stacking -

def im2col(A,BLKSZ):   

# Parameters
M,N = A.shape
col_extent = N - BLKSZ[1] + 1
row_extent = M - BLKSZ[0] + 1

# Get Starting block indices
start_idx = np.arange(BLKSZ[0])[:,None]*N + np.arange(BLKSZ[1])

# Get offsetted indices across the height and width of input array
offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)

# Get all actual indices & index into input array for final output
return np.take (A,start_idx.ravel()[:,None] + offset_idx.ravel())

To solve our case, here's the implementation based on im2col -

# Get sliding blocks of shape same as match_array from field_array into columns
# Then, compare them with a column vector version of match array.
col_match = im2col(field_array,match_array.shape) == match_array.ravel()[:,None]

# Shape of output array that has field_array compared against a sliding match_array
out_shape = np.asarray(field_array.shape) - np.asarray(match_array.shape) + 1

# Now, see if all elements in a column are ONES and reshape to out_shape.
# Finally, find the position of TRUE indices
R,C = np.where(col_match.all(0).reshape(out_shape))

The output for the given sample in the question would be -

In [151]: R,C
Out[151]: (array([6]), array([3]))

Approach #2

Given that opencv already has template matching function that does square of differences, you can employ that and look for zero differences, which would be your matching positions. So, if you have access to cv2 (opencv module), the implementation would look something like this -

import cv2
from cv2 import matchTemplate as cv2m

M = cv2m(field_array.astype('uint8'),match_array.astype('uint8'),cv2.TM_SQDIFF)
R,C = np.where(M==0)

giving us -

In [204]: R,C
Out[204]: (array([6]), array([3]))

Benchmarking

This section compares runtimes for all the approaches suggested to solve the question. The credit for the various methods listed in this section goes to their contributors.

Method definitions -

def seek_array(search_in, search_for, return_coords = False):
si_x, si_y = search_in.shape
sf_x, sf_y = search_for.shape
for y in xrange(si_y-sf_y+1):
for x in xrange(si_x-sf_x+1):
if numpy.array_equal(search_for, search_in[x:x+sf_x, y:y+sf_y]):
return (x,y) if return_coords else True
return None if return_coords else False

def skimage_based(field_array,match_array):
windows = view_as_windows(field_array, match_array.shape)
return (windows == match_array).all(axis=(2,3)).nonzero()

def im2col_based(field_array,match_array):
col_match = im2col(field_array,match_array.shape)==match_array.ravel()[:,None]
out_shape = np.asarray(field_array.shape) - np.asarray(match_array.shape) + 1
return np.where(col_match.all(0).reshape(out_shape))

def cv2_based(field_array,match_array):
M = cv2m(field_array.astype('uint8'),match_array.astype('uint8'),cv2.TM_SQDIFF)
return np.where(M==0)

Runtime tests -

Case # 1 (Sample data from question):

In [11]: field_array
Out[11]:
array([[ 24, 25, 26, 27, 28, 29, 30, 31, 23],
[ 33, 34, 35, 36, 37, 38, 39, 40, 32],
[-39, -38, -37, -36, -35, -34, -33, -32, -40],
[-30, -29, -28, -27, -26, -25, -24, -23, -31],
[-21, -20, -19, -18, -17, -16, -15, -14, -22],
[-12, -11, -10, -9, -8, -7, -6, -5, -13],
[ -3, -2, -1, 0, 1, 2, 3, 4, -4],
[ 6, 7, 8, 4, 5, 6, 7, 13, 5],
[ 15, 16, 17, 8, 9, 10, 11, 22, 14]])

In [12]: match_array
Out[12]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])

In [13]: %timeit seek_array(field_array, match_array, return_coords = False)
1000 loops, best of 3: 465 µs per loop

In [14]: %timeit skimage_based(field_array,match_array)
10000 loops, best of 3: 97.9 µs per loop

In [15]: %timeit im2col_based(field_array,match_array)
10000 loops, best of 3: 74.3 µs per loop

In [16]: %timeit cv2_based(field_array,match_array)
10000 loops, best of 3: 30 µs per loop

Case #2 (Bigger random data):

In [17]: field_array = np.random.randint(0,4,(256,256))

In [18]: match_array = field_array[100:116,100:116].copy()

In [19]: %timeit seek_array(field_array, match_array, return_coords = False)
1 loops, best of 3: 400 ms per loop

In [20]: %timeit skimage_based(field_array,match_array)
10 loops, best of 3: 54.3 ms per loop

In [21]: %timeit im2col_based(field_array,match_array)
10 loops, best of 3: 125 ms per loop

In [22]: %timeit cv2_based(field_array,match_array)
100 loops, best of 3: 4.08 ms per loop

Find matching rows in 2 dimensional numpy array

You need the np.where function to get the indexes:

>>> np.where((vals == (0, 1)).all(axis=1))
(array([ 3, 15]),)

Or, as the documentation states:

If only condition is given, return condition.nonzero()

You could directly call .nonzero() on the array returned by .all:

>>> (vals == (0, 1)).all(axis=1).nonzero()
(array([ 3, 15]),)

To dissassemble that:

>>> vals == (0, 1)
array([[ True, False],
[False, False],
...
[ True, False],
[False, False],
[False, False]], dtype=bool)

and calling the .all method on that array (with axis=1) gives you True where both are True:

>>> (vals == (0, 1)).all(axis=1)
array([False, False, False, True, False, False, False, False, False,
False, False, False, False, False, False, True, False, False,
False, False, False, False, False, False], dtype=bool)

and to get which indexes are True:

>>> np.where((vals == (0, 1)).all(axis=1))
(array([ 3, 15]),)

or

>>> (vals == (0, 1)).all(axis=1).nonzero()
(array([ 3, 15]),)

I find my solution a bit more readable, but as unutbu points out, the following may be faster, and returns the same value as (vals == (0, 1)).all(axis=1):

>>> (vals[:, 0] == 0) & (vals[:, 1] == 1)

Finding Patterns in a Numpy Array

Couldn't you simply use np.where (assuming this is the optimal way to find an element) and then only check pattens which satisfy the first condition.

import numpy as np
values = np.array([0,1,2,1,2,4,5,6,1,2,1])
searchval = [1,2]
N = len(searchval)
possibles = np.where(values == searchval[0])[0]

solns = []
for p in possibles:
check = values[p:p+N]
if np.all(check == searchval):
solns.append(p)

print(solns)

Two Dimensional Pattern Matching with Wildcards in Numpy Arrays

This function takes an input_array, a pattern and a function that lets you identify the wildcard. Here I have used np.nan as wildcard but it could be anything, giving that you can make your own wildcard_function.

It works for arrays of any dimension (1 or more). I have tested it for your example and it seems ok.

from itertools import product
import numpy as np

def match_pattern(input_array, pattern, wildcard_function=np.isnan):

pattern_shape = pattern.shape
input_shape = input_array.shape

is_wildcard = wildcard_function(pattern) # This gets a boolean N-dim array

if len(pattern_shape) != len(input_shape):
raise ValueError("Input array and pattern must have the same dimension")

shape_difference = [i_s - p_s for i_s, p_s in zip(input_shape, pattern_shape)]

if any((diff < -1 for diff in shape_difference)):
raise ValueError("Input array cannot be smaller than pattern in any dimension")

dimension_iterators = [range(0, s_diff + 1) for s_diff in shape_difference]

# This loop will iterate over every possible "window" given the shape of the pattern
for start_indexes in product(*dimension_iterators):
range_indexes = [slice(start_i, start_i + p_s) for start_i, p_s in zip(start_indexes, pattern_shape)]
input_match_candidate = input_array[range_indexes]

# This checks that for the current "window" - the candidate - every element is equal
# to the pattern OR the element in the pattern is a wildcard
if np.all(
np.logical_or(
is_wildcard, (input_match_candidate == pattern)
)
):
return True

return False

how to find indices of a 2d numpy array occuring in another 2d array

What you want is:

sum([row in small_array for row in big_array])

Example:

import numpy as np
big_array = np.array([[1., 2., 1.2], [5., 3., 0.12], [-1., 14., 0.], [-9., 0., 13.]])
small_array= np.array([[5., 3., 0.12], [-1., 14., 0.]])

result = sum([row in small_array for row in big_array])
print(result)

2


Edit (after clarifications):

A pythonic solution:

[i for i, brow in enumerate(big_array) for srow in small_array if all(srow == brow)]

Example:

big_array = np.array([[1., 2., 1.2], [5., 3., 0.12], [-1., 14., 0.], [-9., 0., 13.]])
small_array= np.array([[5., 3., 0.12], [-1., 14., 0.]])

result = [i for i, brow in enumerate(big_array) for srow in small_array if all(srow == brow)]

print(result)

[1, 2]

Note: you could probably do something better with np.where, if you have huge arrays you should look it up

Finding entries containing a substring in a numpy array?

We can use np.core.defchararray.find to find the position of foo string in each element of bar, which would return -1 if not found. Thus, it could be used to detect whether foo is present in each element or not by checking for -1 on the output from find. Finally, we would use np.flatnonzero to get the indices of matches. So, we would have an implementation, like so -

np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)

Sample run -

In [91]: bar
Out[91]:
array(['aaa', 'aab', 'aca'],
dtype='|S3')

In [92]: foo
Out[92]: 'aa'

In [93]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
Out[93]: array([0, 1])

In [94]: bar[2] = 'jaa'

In [95]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
Out[95]: array([0, 1, 2])

How to check if a subarray exists in a 2D array in python?

I think you want to perform a 2d cross-correlation
(Here is the scipy cross-correlation documentation)

It returns an array with each value being the similitude between the patern and the image at the coordinates of this value. If it's 1, that means the image is exatly the same as the pattern.

Check if 2d array exists in 3d array in Python?

There is a way in numpy , you can do with np.all

a = np.random.rand(3, 1, 2)
b = a[1][0]
np.all(np.all(a == b, 1), 1)
Out[612]: array([False, True, False])

Solution from bnaecker

np.all(a == b, axis=(1, 2))

If only want to check exit or not

np.any(np.all(a == b, axis=(1, 2)))


Related Topics



Leave a reply



Submit