Difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of TensorFlow

What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of TensorFlow?

I'll give an example to make it clearer:

  • x: input image of shape [2, 3], 1 channel
  • valid_pad: max pool with 2x2 kernel, stride 2 and VALID padding.
  • same_pad: max pool with 2x2 kernel, stride 2 and SAME padding (this is the classic way to go)

The output shapes are:

  • valid_pad: here, no padding so the output shape is [1, 1]
  • same_pad: here, we pad the image to shape [2, 4] with -inf, then apply max pool, so the output shape is [1, 2]

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

x = tf.reshape(x, [1, 2, 3, 1])  # give a shape accepted by tf.nn.max_pool

valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

valid_pad.get_shape() == [1, 1, 1, 1]  # valid_pad is [5.]
same_pad.get_shape() == [1, 1, 2, 1]   # same_pad is [5., 6.]
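The snippet above uses the TF 1.x session-style API. For TensorFlow 2.x (eager execution), the same comparison can be sketched with tf.nn.max_pool2d, no session needed:

```python
import tensorflow as tf

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])
x = tf.reshape(x, [1, 2, 3, 1])  # NHWC layout expected by tf.nn.max_pool2d

valid_pad = tf.nn.max_pool2d(x, ksize=2, strides=2, padding='VALID')
same_pad = tf.nn.max_pool2d(x, ksize=2, strides=2, padding='SAME')

print(valid_pad.shape)  # (1, 1, 1, 1), value 5.
print(same_pad.shape)   # (1, 1, 2, 1), values 5. and 6.
```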

How does tf.keras.layers.Conv2D with padding='same' and strides 1 behave?

Keras uses the TensorFlow implementation of padding. All the details are available in the TensorFlow documentation:

First, consider the 'SAME' padding scheme. A detailed explanation of
the reasoning behind it is given in these notes. Here, we summarize
the mechanics of this padding scheme. When using 'SAME', the output
height and width are computed as:

out_height = ceil(float(in_height) / float(strides[1]))
out_width = ceil(float(in_width) / float(strides[2]))

The total padding applied along the height and width is computed as:

if (in_height % strides[1] == 0):
  pad_along_height = max(filter_height - strides[1], 0)
else:
  pad_along_height = max(filter_height - (in_height % strides[1]), 0)
if (in_width % strides[2] == 0):
  pad_along_width = max(filter_width - strides[2], 0)
else:
  pad_along_width = max(filter_width - (in_width % strides[2]), 0)

Finally, the padding on the top, bottom, left and right are:

pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left
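The formulas above can be collected into a small plain-Python helper (the function name is mine; it just mirrors the documented arithmetic for one spatial dimension):

```python
import math

def same_padding(in_size, filter_size, stride):
    """Return (out_size, pad_before, pad_after) for 'SAME' padding
    along one spatial dimension, following the formulas above."""
    out_size = math.ceil(in_size / stride)
    if in_size % stride == 0:
        pad_along = max(filter_size - stride, 0)
    else:
        pad_along = max(filter_size - (in_size % stride), 0)
    pad_before = pad_along // 2          # top or left
    pad_after = pad_along - pad_before   # bottom or right gets the extra pixel
    return out_size, pad_before, pad_after

# Width 3, 2x2 kernel, stride 2 (the example from the top of this page):
print(same_padding(3, 2, 2))  # (2, 0, 1) -> one column padded on the right

# The pad_along_height == 5 case mentioned below, e.g. filter 7, stride 2:
print(same_padding(8, 7, 2))  # (4, 2, 3) -> 2 pixels on top, 3 on the bottom
```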

Note that the division by 2 means that there might be cases when the
padding on both sides (top vs bottom, right vs left) are off by one.
In this case, the bottom and right sides always get the one additional
padded pixel. For example, when pad_along_height is 5, we pad 2 pixels
at the top and 3 pixels at the bottom. Note that this is different
from existing libraries such as cuDNN and Caffe, which explicitly
specify the number of padded pixels and always pad the same number of
pixels on both sides.

For the 'VALID' scheme, the output height and width are computed as:

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))

and no padding is used.
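As a quick check of the 'VALID' formula (plain Python; the helper name is mine):

```python
import math

def valid_out_size(in_size, filter_size, stride):
    # Only full windows count; no padding is added.
    return math.ceil((in_size - filter_size + 1) / stride)

# Width 3, 2-wide kernel, stride 2: only one full window fits
print(valid_out_size(3, 2, 2))   # 1

# A 3x3 convolution at stride 1 loses one pixel on each side
print(valid_out_size(28, 3, 1))  # 26
```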

Why use same padding with max pooling?

From https://keras.io/layers/convolutional/

"same" results in padding the input such that the output has the same
length as the original input.

From https://keras.io/layers/pooling/

pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.

So, first let's ask: why use padding at all? In the convolutional kernel context it matters because we want every pixel, including those at the edges, to be able to sit at the "center" of the kernel. There could be important features at the edges/corners of the image that a kernel is looking for, so we pad around the edges for Conv2D, and as a result it returns an output of the same size as the input.

However, in the case of the MaxPooling2D layer we pad for similar reasons, but now the stride defaults to the pool size. Since your pool size is 2, the spatial dimensions of your image will be halved (rounding up) each time it passes through a pooling layer.

from keras.layers import Input, Conv2D, MaxPooling2D

input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# at this point the representation is (4, 4, 8) i.e. 128-dimensional

So in the case of your tutorial example, your image dimensions will go from 28 -> 14 -> 7 -> 4, with each arrow representing a pooling layer.
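The 28 -> 14 -> 7 -> 4 progression is just repeated ceil(n / 2), which is exactly what 'same' pooling with pool size 2 computes:

```python
import math

size = 28
sizes = [size]
for _ in range(3):  # three MaxPooling2D((2, 2), padding='same') layers
    size = math.ceil(size / 2)  # 'same' padding: out = ceil(in / stride)
    sizes.append(size)
print(sizes)  # [28, 14, 7, 4]
```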

What does padding do in keras

With pad_sequences, it indicates whether sequences shorter than the specified max length (or, if unspecified, the length of the longest sequence in xtrain) are filled with the padding value at the beginning ('pre') or at the end ('post') of the sequence, so as to reach the desired length.
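A minimal pure-Python sketch of that behaviour (not the Keras implementation itself; truncation of over-long sequences is left out):

```python
def pad_sequences_sketch(sequences, maxlen=None, padding='pre', value=0):
    # Pad every sequence to maxlen (default: length of the longest one),
    # either at the beginning ('pre') or the end ('post').
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        fill = [value] * (maxlen - len(s))
        padded.append(fill + list(s) if padding == 'pre' else list(s) + fill)
    return padded

print(pad_sequences_sketch([[1, 2], [3]], padding='pre'))   # [[1, 2], [0, 3]]
print(pad_sequences_sketch([[1, 2], [3]], padding='post'))  # [[1, 2], [3, 0]]
```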

With Convolution2D it indicates whether to pad (i.e. extend the tensor, e.g. a pixel matrix or feature map, with additional rows and columns outside its borders) or not.

I suggest looking into the respective documentations.

Tensorflow: tf.nn.avg_pool() with 'SAME' padding does not average over padded pixels

Yes, pixels that are padded are not taken into account in the average. So with a 4x4 pooling, results computed in the middle of the image are averaged over 16 values, but values in the corner could use only 9 values if two edges are padded.
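To make the counting concrete, here is a plain-Python sketch of the "exclude padding" average for single windows (4x4 kernel, stride 1, SAME padding over a 4x4 image of ones; positions outside the image are skipped rather than counted as zeros):

```python
# 4x4 image of ones; SAME padding for a 4x4 kernel at stride 1 pads
# 1 row/column before and 2 after (total padding of 3 per dimension).
img = [[1.0] * 4 for _ in range(4)]
pad_before = 1

def avg_at(r, c, k=4):
    """Average over the k x k window at output position (r, c),
    excluding padded (out-of-bounds) positions from the count."""
    vals = [img[i][j]
            for i in range(r - pad_before, r - pad_before + k)
            for j in range(c - pad_before, c - pad_before + k)
            if 0 <= i < 4 and 0 <= j < 4]
    return sum(vals) / len(vals), len(vals)

print(avg_at(0, 0))  # (1.0, 9): corner window sees only 9 in-bounds pixels
print(avg_at(1, 1))  # (1.0, 16): interior window covers all 16 pixels
```

If padded pixels were included as zeros (cuDNN's COUNT_INCLUDE_PADDING mode), the corner would instead give 9/16 = 0.5625.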

You can see it for example in the TensorFlow source, in the call to cuDNN, where the option CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING is selected for average pooling. cuDNN also offers CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING, which would take padded pixels into account in the average, but TensorFlow does not expose this option.

This could be a way in which average pooling behaves differently from (strided) convolution, especially for layers with a small spatial extent.

Note that the situation is similar with max pooling: padded pixels are ignored (or equivalently, virtually set to a value of -inf).

import tensorflow as tf

x = -tf.ones((1, 4, 4, 1))
max_pool = tf.nn.max_pool(x, (1, 4, 4, 1), (1, 1, 1, 1), 'SAME')
sess = tf.InteractiveSession()
print(max_pool.eval().squeeze())
# [[-1. -1. -1. -1.]
# [-1. -1. -1. -1.]
# [-1. -1. -1. -1.]
# [-1. -1. -1. -1.]]

Clearly the documentation could be more explicit about it.


