Fastest 2D Convolution or Image Filter in Python

It really depends on what you want to do. Much of the time, you don't need a fully generic (read: slower) 2D convolution. For example, if the filter is separable, you can use two 1D convolutions instead. This is why the various scipy.ndimage.gaussian_filter and scipy.ndimage.uniform_filter functions are much faster than the same thing implemented as a generic n-D convolution.
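As a sketch of the separable case (the kernel here is my own illustration, not from the question): a 2D kernel that factors into an outer product of two 1D kernels can be applied as two 1D passes, costing O(K) + O(K) per pixel instead of O(K²):

```python
import numpy as np
from scipy import ndimage

# A separable kernel: the outer product of two 1D kernels.
k1d = np.array([1.0, 2.0, 1.0])
kernel_2d = np.outer(k1d, k1d)  # 3x3

x = np.random.random((256, 256))

# Generic 2D convolution: K*K multiplies per output pixel.
full = ndimage.convolve(x, kernel_2d, mode='constant')

# Two 1D passes give the same result with 2*K multiplies per pixel.
rows = ndimage.convolve1d(x, k1d, axis=0, mode='constant')
sep = ndimage.convolve1d(rows, k1d, axis=1, mode='constant')

print(np.allclose(full, sep))  # the two results agree
```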

At any rate, as a point of comparison:

import timeit

t = timeit.timeit(stmt='ndimage.convolve(x, y, output=x)', number=1,
                  setup="""
import numpy as np
from scipy import ndimage
x = np.random.random((2048, 2048)).astype(np.float32)
y = np.random.random((32, 32)).astype(np.float32)
""")
print(t)

This takes 6.9 sec on my machine...

Compare this with fftconvolve

import timeit

t = timeit.timeit(stmt="signal.fftconvolve(x, y, mode='same')", number=1,
                  setup="""
import numpy as np
from scipy import signal
x = np.random.random((2048, 2048)).astype(np.float32)
y = np.random.random((32, 32)).astype(np.float32)
""")
print(t)

This takes about 10.8 secs. However, with different input sizes, using FFTs to do a convolution can be considerably faster; roughly speaking, the FFT approach wins once the kernel is no longer small compared to the image.
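For instance (sizes here are my own choice to make the crossover visible): with a 63×63 kernel on a 256×256 image, fftconvolve should come out well ahead of direct convolution, and both compute the same zero-padded result:

```python
import timeit
import numpy as np
from scipy import ndimage, signal

x = np.random.random((256, 256))
y = np.random.random((63, 63))   # odd-sized, so 'same' centering matches ndimage

# mode='constant' (zero boundary) matches fftconvolve's implicit zero padding.
t_direct = timeit.timeit(lambda: ndimage.convolve(x, y, mode='constant'), number=1)
t_fft = timeit.timeit(lambda: signal.fftconvolve(x, y, mode='same'), number=1)
print(f"direct: {t_direct:.3f}s, fft: {t_fft:.3f}s")

# Both are true convolutions of the zero-padded input, so they agree:
direct = ndimage.convolve(x, y, mode='constant')
fft = signal.fftconvolve(x, y, mode='same')
print(np.allclose(direct, fft))
```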

Python image convolution using NumPy only

Here is a faster method using strides (note that view_as_windows uses NumPy strides under the hood; if you have to use NumPy only, use np.lib.stride_tricks.as_strided directly. You can refer to @Divakar's answer for an equivalent implementation of view_as_windows in NumPy):

import numpy as np
from skimage.util import view_as_windows

# new_img: the padded 2D input from the question; H: the (Hk, Wk) kernel
out = view_as_windows(new_img, (Hk, Wk))
out = np.einsum('ijkl,kl->ij', out, H)

output:

out:
[[  0  -5 -10 -19]
 [-10  -7  -2 -18]
 [  5  -7  11   7]
 [-12  -9  11 -18]]

timing for your input:

#Method 1
[*] process time : 0.000615

#Method 2
[*] process time : 0.000475

#proposed method in this answer
[*] process time : 0.000348

timing for larger arrays using benchit:

(Method 2 has errors for larger arrays)

Method 3 is orders of magnitude faster than Method 1.

Fast 2D Convolution in C

If you're running on x86 only then consider using SSE or AVX SIMD optimisation. For double data the throughput improvement will be modest, but if you can switch to float then you may be able to get to around 4x improvement with SSE or 8x with AVX. There are a number of questions and answers about this very topic on StackOverflow already from which you may be able to get some ideas on the implementation. Alternatively there are also many libraries available which include high performance 2D convolution (filtering) routines, and these typically exploit SIMD for performance, e.g. Intel's IPP (commercial), or OpenCV (free).

Another possibility is to exploit multiple cores - split your image into blocks and run each block in its own thread. E.g. if you have a 4 core CPU then split your image into 4 blocks. (See pthreads).

You can of course combine both of the above ideas, if you really want to fully optimise this operation.
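The block-splitting idea can be sketched in Python (names and sizes here are my own illustration; in C you would do the same partitioning with pthreads). Each strip takes a halo of kernel_height − 1 extra rows from a zero-padded copy of the image, so the strips concatenate into the exact 'same'-mode result. Note that this assumes an odd-sized kernel, and that Python threads only speed things up if the underlying routine releases the GIL:

```python
import numpy as np
from scipy import signal
from concurrent.futures import ThreadPoolExecutor

def threaded_convolve(img, kernel, n_workers=4):
    """'same'-mode zero-padded 2D convolution, computed strip by strip."""
    ph, pw = kernel.shape[0] // 2, kernel.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    bounds = np.linspace(0, img.shape[0], n_workers + 1, dtype=int)

    def one_strip(r0, r1):
        # The strip plus its halo rows, convolved in 'valid' mode,
        # yields exactly output rows r0..r1-1 of the full result.
        return signal.convolve2d(padded[r0:r1 + 2 * ph], kernel, mode='valid')

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        strips = pool.map(one_strip, bounds[:-1], bounds[1:])
    return np.vstack(list(strips))

img = np.random.random((512, 512))
kernel = np.random.random((5, 5))
out = threaded_convolve(img, kernel)
print(np.allclose(out, signal.convolve2d(img, kernel, mode='same')))
```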


Some small optimisations which you can apply to your current code, and to any future implementations (e.g. SIMD):

  • if your kernels are symmetric (or odd-symmetric) then you can reduce the number of operations by adding (subtracting) symmetric input values and performing one multiply rather than two

  • for the separable case, rather than allocating a full frame temporary buffer, consider using a "strip-mining" approach - allocate a smaller buffer, which is full width, but a relatively small number of rows, then process your image in "strips", alternately applying the horizontal kernel and the vertical kernel. The advantage of this is that you have a much more cache-friendly access pattern and a smaller memory footprint.
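The symmetric-kernel trick in the first bullet can be sketched for the 1D case (NumPy used for clarity; the same indexing applies in C). Adding the mirrored samples first means each pair of symmetric taps costs one multiply instead of two:

```python
import numpy as np

def symmetric_filter_1d(x, coeffs):
    """coeffs = [c0, c1, ..., cK] for the symmetric kernel [cK..c1, c0, c1..cK].
    Adds mirrored samples before multiplying: one multiply per tap pair."""
    K = len(coeffs) - 1
    xp = np.pad(x, K)                      # zero-pad the boundaries
    y = coeffs[0] * xp[K:len(xp) - K]      # center tap
    for k in range(1, K + 1):
        # x[n-k] + x[n+k], then a single multiply by the shared coefficient
        y += coeffs[k] * (xp[K - k:len(xp) - K - k] + xp[K + k:len(xp) - K + k])
    return y

x = np.arange(8, dtype=float)
coeffs = [4.0, 2.0, 1.0]                   # kernel [1, 2, 4, 2, 1]
ref = np.convolve(x, [1.0, 2.0, 4.0, 2.0, 1.0], mode='same')
print(np.allclose(symmetric_filter_1d(x, coeffs), ref))
```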


A few comments on coding style:

  • the register keyword has been redundant for many years, and modern compilers will emit a warning if you try to use it - save yourself some noise (and some typing) by ditching it

  • casting the result of malloc in C is frowned upon - it's redundant and potentially dangerous.

  • make any input parameters const (i.e. read-only) and use restrict for any parameters which can never be aliased (e.g. a and result) - this can not only help to avoid programming errors (at least in the case of const), but in some cases it can help the compiler to generate better optimised code (particularly in the case of potentially aliased pointers).

Convolve2d just by using Numpy

You could generate the subarrays using as_strided:

import numpy as np

a = np.array([[ 0,  1,  2,  3,  4],
              [ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14],
              [15, 16, 17, 18, 19],
              [20, 21, 22, 23, 24]])

sub_shape = (3,3)
view_shape = tuple(np.subtract(a.shape, sub_shape) + 1) + sub_shape
strides = a.strides + a.strides

sub_matrices = np.lib.stride_tricks.as_strided(a,view_shape,strides)

To get rid of your second "ugly" sum, alter your einsum so that the output subscripts are only k and l; the summation over i and j is then implied.

conv_filter = np.array([[0,-1,0],[-1,5,-1],[0,-1,0]])
m = np.einsum('ij,ijkl->kl',conv_filter,sub_matrices)

# [[ 6  7  8]
#  [11 12 13]
#  [16 17 18]]
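One caveat worth checking: 'ij,ijkl->kl' applies the kernel without flipping it, which is correlation rather than convolution. The kernel above happens to be symmetric, so the result also matches a true 'valid'-mode convolution, which can be verified against scipy (a self-contained restatement of the snippet above):

```python
import numpy as np
from scipy import signal

a = np.arange(25, dtype=float).reshape(5, 5)
conv_filter = np.array([[0., -1., 0.], [-1., 5., -1.], [0., -1., 0.]])

sub_shape = (3, 3)
view_shape = tuple(np.subtract(a.shape, sub_shape) + 1) + sub_shape
sub_matrices = np.lib.stride_tricks.as_strided(a, view_shape, a.strides * 2)

m = np.einsum('ij,ijkl->kl', conv_filter, sub_matrices)

# No kernel flip in the einsum, but this kernel is symmetric,
# so the result equals a true 'valid' convolution:
print(np.allclose(m, signal.convolve2d(a, conv_filter, mode='valid')))
```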

FFT-based 2D convolution and correlation in Python

I found scipy.signal.fftconvolve, as also pointed out by magnus, but didn't realize at the time that it's n-dimensional. Since it's built-in and produces the right values, it seems like the ideal solution.

From Example of 2D Convolution:

In [1]: a = asarray([[ 1, 2, 3],
   ...:             [ 4, 5, 6],
   ...:             [ 7, 8, 9]])

In [2]: b = asarray([[-1,-2,-1],
   ...:             [ 0, 0, 0],
   ...:             [ 1, 2, 1]])

In [3]: scipy.signal.fftconvolve(a, b, mode='same')
Out[3]:
array([[-13., -20., -17.],
       [-18., -24., -18.],
       [ 13.,  20.,  17.]])

Correct! The STSCI version, on the other hand, requires some extra work to get the boundaries right:

In [4]: stsci.convolve2d(a, b, fft=True)
Out[4]:
array([[-12., -12., -12.],
       [-24., -24., -24.],
       [-12., -12., -12.]])

(The STSCI method also requires compiling, which I was unsuccessful with (I just commented out the non-python parts), and has some bugs, such as modifying its inputs ([1, 2] becomes [[1, 2]]), etc. So I changed my accepted answer to the built-in fftconvolve() function.)

Correlation, of course, is the same thing as convolution, but with one input reversed:

In [5]: a
Out[5]:
array([[3, 0, 0],
       [2, 0, 0],
       [1, 0, 0]])

In [6]: b
Out[6]:
array([[3, 2, 1],
       [0, 0, 0],
       [0, 0, 0]])

In [7]: scipy.signal.fftconvolve(a, b[::-1, ::-1])
Out[7]:
array([[ 0., -0.,  0.,  0.,  0.],
       [ 0., -0.,  0.,  0.,  0.],
       [ 3.,  6.,  9.,  0.,  0.],
       [ 2.,  4.,  6.,  0.,  0.],
       [ 1.,  2.,  3.,  0.,  0.]])

In [8]: scipy.signal.correlate2d(a, b)
Out[8]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [3, 6, 9, 0, 0],
       [2, 4, 6, 0, 0],
       [1, 2, 3, 0, 0]])

The latest revision has been sped up by using power-of-two sizes internally (and I then sped it up further by using a real FFT for real input, and 5-smooth lengths instead of powers of 2 :D ).


