Implement Matlab's Im2Col 'Sliding' in Python

Implement MATLAB's im2col 'sliding' in Python

Approach #1

We could use some broadcasting here to get all the indices of all those sliding windows in one go and thus with indexing achieve a vectorized solution. This is inspired by Efficient Implementation of im2col and col2im.

Here's the implementation -

def im2col_sliding_broadcasting(A, BSZ, stepsize=1):
    # Parameters
    M,N = A.shape
    col_extent = N - BSZ[1] + 1
    row_extent = M - BSZ[0] + 1
    
    # Get Starting block indices
    start_idx = np.arange(BSZ[0])[:,None]*N + np.arange(BSZ[1])
    
    # Get offsetted indices across the height and width of input array
    offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)
    
    # Get all actual indices & index into input array for final output
    return np.take (A,start_idx.ravel()[:,None] + offset_idx.ravel()[::stepsize])

Approach #2

Using newly gained knowledge of NumPy array strides that lets us create such sliding windows, we would have another efficient solution -

def im2col_sliding_strided(A, BSZ, stepsize=1):
    # Parameters
    m,n = A.shape
    s0, s1 = A.strides    
    nrows = m-BSZ[0]+1
    ncols = n-BSZ[1]+1
    shp = BSZ[0],BSZ[1],nrows,ncols
    strd = s0,s1,s0,s1
    
    out_view = np.lib.stride_tricks.as_strided(A, shape=shp, strides=strd)
    return out_view.reshape(BSZ[0]*BSZ[1],-1)[:,::stepsize]

Approach #3

The strided method listed in the previous approach has been incorporated into scikit-image module for a less messier, like so -

from skimage.util import view_as_windows as viewW

def im2col_sliding_strided_v2(A, BSZ, stepsize=1):
    return viewW(A, (BSZ[0],BSZ[1])).reshape(-1,BSZ[0]*BSZ[1]).T[:,::stepsize]

Sample runs -

In [106]: a      # Input array
Out[106]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [107]: im2col_sliding_broadcasting(a, (2,3))
Out[107]: 
array([[ 0,  1,  2,  5,  6,  7, 10, 11, 12],
       [ 1,  2,  3,  6,  7,  8, 11, 12, 13],
       [ 2,  3,  4,  7,  8,  9, 12, 13, 14],
       [ 5,  6,  7, 10, 11, 12, 15, 16, 17],
       [ 6,  7,  8, 11, 12, 13, 16, 17, 18],
       [ 7,  8,  9, 12, 13, 14, 17, 18, 19]])

In [108]: im2col_sliding_broadcasting(a, (2,3), stepsize=2)
Out[108]: 
array([[ 0,  2,  6, 10, 12],
       [ 1,  3,  7, 11, 13],
       [ 2,  4,  8, 12, 14],
       [ 5,  7, 11, 15, 17],
       [ 6,  8, 12, 16, 18],
       [ 7,  9, 13, 17, 19]])

Runtime test

In [183]: a = np.random.randint(0,255,(1024,1024))

In [184]: %timeit im2col_sliding(img, (8,8), skip=1)
     ...: %timeit im2col_sliding_broadcasting(img, (8,8), stepsize=1)
     ...: %timeit im2col_sliding_strided(img, (8,8), stepsize=1)
     ...: %timeit im2col_sliding_strided_v2(img, (8,8), stepsize=1)
     ...: 
1 loops, best of 3: 1.29 s per loop
1 loops, best of 3: 226 ms per loop
10 loops, best of 3: 84.5 ms per loop
10 loops, best of 3: 111 ms per loop

In [185]: %timeit im2col_sliding(img, (8,8), skip=4)
     ...: %timeit im2col_sliding_broadcasting(img, (8,8), stepsize=4)
     ...: %timeit im2col_sliding_strided(img, (8,8), stepsize=4)
     ...: %timeit im2col_sliding_strided_v2(img, (8,8), stepsize=4)
     ...: 
1 loops, best of 3: 1.31 s per loop
10 loops, best of 3: 104 ms per loop
10 loops, best of 3: 84.4 ms per loop
10 loops, best of 3: 109 ms per loop

Around 16x speedup there with the strided method over the original loopy version!

Implement MATLAB's col2im 'sliding' in Python

For im2col 'sliding'version, we already have few efficient implementations as posted here.

For col2im 'sliding'version, it would just be a reshape away -

def col2im_sliding(B, block_size, image_size):
    m,n = block_size
    mm,nn = image_size
    return B.reshape(nn-n+1,mm-m+1).T 
      # Or simply B.reshape(nn-n+1,-1).T
      # Or B.reshape(mm-m+1,nn-n+1,order='F')

Test runs

MATLAB run -

>> arr = 1:15;
>> arr
arr =
     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
>> col2im(arr, [3,3], [7,5],'sliding')
ans =                
     1     6    11  %// (7-3+1, 5-3+1) shaped array
     2     7    12
     3     8    13
     4     9    14
     5    10    15

NumPy/Python run -

In [116]: arr
Out[116]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [117]: col2im_sliding(arr, [3,3], [7,5])
Out[117]: 
array([[ 1,  6, 11],  # (7-3+1, 5-3+1) shaped array
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14],
       [ 5, 10, 15]])

Setup sliding windows as columns (IM2COL from MATLAB) in multi-dimensional array - Python

We can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to get sliding windows. More info on use of as_strided based view_as_windows.

from skimage.util.shape import view_as_windows

W1,W2 = 2,2 # window size

# create sliding windows along last two axes1
w = view_as_windows(arr,(1,1,W1,W2))[...,0,0,:,:]

# Merge the window axes (tha last two axes) and 
# merge the axes along which those windows were created (3rd and 4th axes)
outshp = arr.shape[:-2] + (W1*W2,) + ((arr.shape[-2]-W1+1)*(arr.shape[-1]-W2+1),)
out = w.transpose(0,1,4,5,2,3).reshape(outshp)

The last step forces a copy. So, skip it if possible.

Efficient Implementation of `im2col` and `col2im`

I can only hope that Mathworks guys don't sue you or me or Stackoverflow for that matter, trying to create vectorized implementations of their IP toolbox functions, as they have put price on that toolbox. But in any case, forgetting those issues, here are the implementations.

Replacement for im2col with 'sliding' option

I wasn't able to vectorize this until I sat down to write solution to another problem on Stackoverflow. So, I would strongly encourage to look into it too.

function out = im2col_sliding(A,blocksize)

nrows = blocksize(1);
ncols = blocksize(2);

%// Get sizes for later usages
[m,n] = size(A);

%// Start indices for each block
start_ind = reshape(bsxfun(@plus,[1:m-nrows+1]',[0:n-ncols]*m),[],1); %//'

%// Row indices
lin_row = permute(bsxfun(@plus,start_ind,[0:nrows-1])',[1 3 2]);  %//'

%// Get linear indices based on row and col indices and get desired output
out = A(reshape(bsxfun(@plus,lin_row,[0:ncols-1]*m),nrows*ncols,[]));

return;

Replacement for im2col with 'distinct' option

function out = im2col_distinct(A,blocksize)

nrows = blocksize(1);
ncols = blocksize(2);
nele = nrows*ncols;

row_ext = mod(size(A,1),nrows);
col_ext = mod(size(A,2),ncols);

padrowlen = (row_ext~=0)*(nrows - row_ext);
padcollen = (col_ext~=0)*(ncols - col_ext);

A1 = zeros(size(A,1)+padrowlen,size(A,2)+padcollen);
A1(1:size(A,1),1:size(A,2)) = A;

t1 = reshape(A1,nrows,size(A1,1)/nrows,[]);
t2 = reshape(permute(t1,[1 3 2]),size(t1,1)*size(t1,3),[]);
t3 =  permute(reshape(t2,nele,size(t2,1)/nele,[]),[1 3 2]);
out = reshape(t3,nele,[]);

return;

Some quick tests show that both these implementations particularly sliding one for small to decent sized input data and distinct for all datasizes perform much better than the in-built MATLAB function implementations in terms of runtime performance.

How to use

With in-built MATLAB function - 
B = im2col(A,[nrows ncols],'sliding')

With our custom function - 
B = im2col_sliding(A,[nrows ncols])

%// ------------------------------------

With in-built MATLAB function - 
B = im2col(A,[nrows ncols],'distinct')

With our custom function - 
B = im2col_distinct(A,[nrows ncols])

Overlapping sliding window over an image using blockproc or im2col?

OK, this is quite a complex question. I'll try and break this up into separate parts and will answer each question separately.

Question #1

blockproc can be used to implement my function on a sliding window using the BorderSize and TrimBorder arguments.
B = blockproc(A,[64,64],fun,'BorderSize',[5,5], 'TrimBorder', 'false');
I realize that this creates a block of [64 + 2*5, 64 + 2*5] and applies the function @fun on each block. But since I cannot go into my function @fun in debugging to verify proper operation I cannot be sure this is what I need. Is my above code correct for what I need? I know that I get a concatenated result in B but it should be on a overlapping sliding block. Will this achieve what I need?

After experimenting around with blockproc, this is indeed correct where you can use it to get sliding neighbourhood processing working. However, you're going to need an additional flag, which is PadPartialBlocks. The purpose of this flag is so that if you are extracting a block where you're at the outer edges of the image and you can't make a block of a specified size, this will zero-pad this partial block so that it conforms to the same size. Here's a small example to get this working with sliding windows. Supposing we had a matrix such that:

>> A = reshape(1:25,5,5)

A =

     1     6    11    16    21
     2     7    12    17    22
     3     8    13    18    23
     4     9    14    19    24
     5    10    15    20    25

Let's say we wanted to take the average of each 3 x 3 overlapping neighbourhood in the matrix above, and zero-padding those elements that go beyond the borders of the matrix. You would do this with blockproc:

B = blockproc(A, [1 1], @(x) mean(x.data(:)), 'BorderSize', [1 1], 'TrimBorder', false, 'PadPartialBlocks', true);

What's important to note is that the block size, which is 1 x 1 in this case and BorderSize which is 1 x 1 as well are set differently than what you'd expect for a 3 x 3 block. To go into why this is the case, we need some further insight on how BorderSize works. For a given centre of a block, BorderSize allows you to capture values / pixels beyond the dimensions of the originally sized block. For those locations that go beyond the borders of the matrix, we would pad these locations as zero by default. BorderSize allows us to capture 2M + 2N pixels more, where M and N are the horizontal and vertical border size you want. This would allow us to capture M more pixels both above and below the original block and N more pixels to the left and right of the original block.

Therefore, for the value of 1 in A, if the block size is 1 x 1, this means that the element consists of only 1, and if our BorderSize was 1 x 1. This means our final block would be:

0  0  0
0  1  6
0  2  7

Because our block size is 1, the next block would be centred at 6, and we would get a 3 x 3 grid of pixels and so on. It is also important that TrimBorder is set to false so that we can keep those pixels that were originally captured upon expansion of the block. The default is set to true. Finally, PadPartialBlocks is true to ensure that all blocks are the same size. When you run the above code, the result we get is:

B =

    1.7778    4.3333    7.6667   11.0000    8.4444
    3.0000    7.0000   12.0000   17.0000   13.0000
    3.6667    8.0000   13.0000   18.0000   13.6667
    4.3333    9.0000   14.0000   19.0000   14.3333
    3.1111    6.3333    9.6667   13.0000    9.7778

You can verify that we get the same result using nlfilter where we can apply the mean to 3 x 3 sliding neighbourhoods:

C = nlfilter(A, [3 3], @(x) mean(x(:)))

C =

    1.7778    4.3333    7.6667   11.0000    8.4444
    3.0000    7.0000   12.0000   17.0000   13.0000
    3.6667    8.0000   13.0000   18.0000   13.6667
    4.3333    9.0000   14.0000   19.0000   14.3333
    3.1111    6.3333    9.6667   13.0000    9.7778

As such, if you want to properly use blockproc for sliding operations, you need to be careful on how you set the block size and border size respectively. In this case, the general rule is to always set your block size to be 1 x 1, and allow BorderSize to specify the size of each block you want. Specifically, for a block of size K x K, you would set BorderSize to be floor(K/2) x floor(K/2) respectively. It would make things easy if K was odd.

For example, if you wanted a 5 x 5 mean filtering operation on a sliding window basis, you would set BorderSize to [2 2], as K = 5 and floor(K/2) = 2. Therefore, you would do this:

B = blockproc(A, [1 1], @(x) mean(x.data(:)), 'BorderSize', [2 2], 'TrimBorder', false, 'PadPartialBlocks', true)

B =

    2.5200    4.5600    7.2000    6.9600    6.1200
    3.6000    6.4000   10.0000    9.6000    8.4000
    4.8000    8.4000   13.0000   12.4000   10.8000
    4.0800    7.0400   10.8000   10.2400    8.8800
    3.2400    5.5200    8.4000    7.9200    6.8400

Replicating this with nlfilter with a size of 5 x 5 also gives:

C = nlfilter(A, [5 5], @(x) mean(x(:)))

C =

    2.5200    4.5600    7.2000    6.9600    6.1200
    3.6000    6.4000   10.0000    9.6000    8.4000
    4.8000    8.4000   13.0000   12.4000   10.8000
    4.0800    7.0400   10.8000   10.2400    8.8800
    3.2400    5.5200    8.4000    7.9200    6.8400

I was doing some timing tests, and it seems that blockproc used in this context is faster than nlfilter.

Question #2

The second is im2col. im2col(A,[m n],block_type) will divide the block into m by n blocks and arrange them in columns, so each column is a block? If so, how is the overlapping controlled? And if each block is a column can I successfully apply the dct2 function on each column? Because I doubt it will take vectors as input?

You are correct in that im2col transforms each pixel neighbourhood or block into a single column and the concatenation of these columns forms the output matrix. You can control whether the blocks are overlapping or are distinct by the block_type parameter. Specify distinct or sliding (which is default) to control this. You can also control the size of each neighbourhood with m and n.

However, if it is your goal to apply dct2 with the output of im2col, then you will not get what you desire. Specifically, dct2 takes the spatial location of each data point within your 2D data into account and is used as part of the transform. By transforming each pixel neighbourhood into a single column, the 2D spatial relationships that were originally there for each block are now gone. dct2 expects 2D spatial data, but you would be specifying 1D data instead. As such, im2col is probably not what you're looking for. If I understand what you want correctly, you'll want to use blockproc instead.

Hope this helps!

The im2col algorithm for ND input

Let me just start by saying that im2col is only intended for 2D matrices. The fact that it sometimes worked (and by that I mean returned a result without throwing an error) is just a happy coincidence.

Now I took a peek at edit im2col.m, and without studying the code too much, the first line of each of the distinct and sliding methods should give you an intuition of what's happening:

...
if strcmp(kind, 'distinct')
    [m,n] = size(a);
    ...
elseif strcmp(kind,'sliding')
    [ma,na] = size(a);
    ...
end
...

First recall that [s1,s2] = size(arr) where arr is a 3d array will collapse the size of 2nd and 3rd dimension into one size. Here's the relevant doc size:

[d1,d2,d3,...,dn] = size(X) returns the sizes of the dimensions of the array X, provided the number of output arguments n equals ndims(X). If n < ndims(X), di equals the size of the ith dimension of X for 0<i<n, but dn equals the product of the sizes of the remaining dimensions of X, that is, dimensions n through ndims(X).

So basically for an array of size M-by-N-by-P, the function instead thinks it's a matrix of size M-by-(N*P). Now MATLAB has some quirky indexing rules that lets you do things like:

>> x = reshape(1:4*3*2,4,3,2)
x(:,:,1) =
     1     5     9
     2     6    10
     3     7    11
     4     8    12
x(:,:,2) =
    13    17    21
    14    18    22
    15    19    23
    16    20    24
>> x(:,:)
ans =
     1     5     9    13    17    21
     2     6    10    14    18    22
     3     7    11    15    19    23
     4     8    12    16    20    24

which is what I think ended up happening here. Here is an example to confirm the behavior of im2col on an RGB image:

% normal case (grayscale image)
>> M = magic(5);
>> B1 = im2col(M, [3 3], 'sliding');

% (RGB image)
>> MM = cat(3, M, M+50, M+100);
>> B2 = im2col(MM, [3 3], 'sliding');
>> B3 = im2col(reshape(MM, [5 5*3]), [3 3], 'sliding');
>> assert(isequal(B2,B3))

Note that B2 and B3 are equal, so basically think of the result of im2col on an array arr = cat(3,R,G,B) to be the same as that of arr = cat(2,R,G,B) (concatenated horizontally).

Interestingly, you won't get so lucky with "distinct" blocks method:

>> B1 = im2col(M, [3 3], 'distinct')    % works
% ..snip..

>> B2 = im2col(MM, [3 3], 'distinct')   % errors
Subscripted assignment dimension mismatch.
Error in im2col (line 59)
    aa(1:m,1:n) = a;

Now that we understand what was happening, let's think how to do this properly for 3D arrays.

In my opinion to implement im2col for color images, I would just run it on each color channel separately (each being a 2d matrix), and concatenate the result along the third dimension. So something like this wrapper function:

function B = im2col_rgb(img, sz, varargin)
    B = cell(1,size(img,3));
    for i=1:size(img,3)
        B{i} = im2col(img(:,:,i), sz, varargin{:});
    end
    B = cat(3, B{:});
end

Implement Matlab's Im2Col 'Sliding' in Python