What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?
I'll give an example to make it clearer:
x: input image of shape [2, 3], 1 channel
valid_pad: max pool with 2x2 kernel, stride 2 and VALID padding
same_pad: max pool with 2x2 kernel, stride 2 and SAME padding (this is the classic way to go)
The output shapes are:
valid_pad: here, no padding, so the output shape is [1, 1]
same_pad: here, we pad the image to the shape [2, 4] (with -inf and then apply max pool), so the output shape is [1, 2]
import tensorflow as tf

x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])
x = tf.reshape(x, [1, 2, 3, 1]) # give a shape accepted by tf.nn.max_pool
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
valid_pad.get_shape() == [1, 1, 1, 1] # valid_pad is [5.]
same_pad.get_shape() == [1, 1, 2, 1] # same_pad is [5., 6.]
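If the snippet above is run under TF 2.x eager execution (an assumption; it is written in the TF 1.x style, where you would evaluate the tensors in a session instead), the values can be inspected directly:
print(valid_pad.numpy().squeeze())   # 5.0
print(same_pad.numpy().squeeze())    # [5. 6.]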
How does tf.keras.layers.Conv2D with padding='same' and strides 1 behave?
Keras uses the TensorFlow implementation of padding. All the details are available in the TensorFlow documentation, quoted below:
First, consider the 'SAME' padding scheme. A detailed explanation of
the reasoning behind it is given in these notes. Here, we summarize
the mechanics of this padding scheme. When using 'SAME', the output
height and width are computed as:

out_height = ceil(float(in_height) / float(strides[1]))
out_width = ceil(float(in_width) / float(strides[2]))

The total padding applied along the height and width is computed as:
if (in_height % strides[1] == 0):
  pad_along_height = max(filter_height - strides[1], 0)
else:
  pad_along_height = max(filter_height - (in_height % strides[1]), 0)
if (in_width % strides[2] == 0):
  pad_along_width = max(filter_width - strides[2], 0)
else:
  pad_along_width = max(filter_width - (in_width % strides[2]), 0)

Finally, the padding on the top, bottom, left and right are:
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left

Note that the division by 2 means that there might be cases when the
padding on both sides (top vs bottom, right vs left) are off by one.
In this case, the bottom and right sides always get the one additional
padded pixel. For example, when pad_along_height is 5, we pad 2 pixels
at the top and 3 pixels at the bottom. Note that this is different
from existing libraries such as cuDNN and Caffe, which explicitly
specify the number of padded pixels and always pad the same number of
pixels on both sides.

For the 'VALID' scheme, the output height and width are computed as:
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))

and no padding is used.
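A small sketch of these formulas in plain Python (the helper names below are mine, not part of TensorFlow), checked against the 2x3 example from the first answer:

import math

def same_out_size(in_size, stride):
    # 'SAME': out = ceil(in / stride)
    return math.ceil(in_size / stride)

def valid_out_size(in_size, filter_size, stride):
    # 'VALID': out = ceil((in - filter + 1) / stride)
    return math.ceil((in_size - filter_size + 1) / stride)

def same_pad_amounts(in_size, filter_size, stride):
    # Total padding along one dimension under the 'SAME' rule quoted above;
    # the extra pixel (if any) goes to the bottom/right.
    if in_size % stride == 0:
        pad = max(filter_size - stride, 0)
    else:
        pad = max(filter_size - (in_size % stride), 0)
    return pad // 2, pad - pad // 2

# 2x3 input, 2x2 kernel, stride 2:
print(same_out_size(2, 2), same_out_size(3, 2))          # 1 2 -> SAME output [1, 2]
print(valid_out_size(2, 2, 2), valid_out_size(3, 2, 2))  # 1 1 -> VALID output [1, 1]
print(same_pad_amounts(3, 2, 2))                         # (0, 1): one column padded on the right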
Why use same padding with max pooling?
From https://keras.io/layers/convolutional/
"same" results in padding the input such that the output has the same
length as the original input.
From https://keras.io/layers/pooling/
pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.
So, first let's start by asking why use padding at all? In the convolutional kernel context it is important because we don't want to skip any pixel's turn at the "center" of the kernel: there could be important behavior at the edges/corners of the image that a kernel is looking for. So we pad around the edges for Conv2D, and as a result it returns an output the same size as the input.
However, in the case of the MaxPooling2D layer we are padding for similar reasons, but here the stride is set by your choice of pooling size. Since your pooling size is 2, your image will be halved each time it goes through a pooling layer.
from keras.layers import Input, Conv2D, MaxPooling2D

input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8) i.e. 128-dimensional
So in the case of your tutorial example, your image dimensions will go from 28 -> 14 -> 7 -> 4, with each arrow representing a pooling layer.
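A quick sketch of that progression, using the 'same' output-size rule quoted earlier (out = ceil(in / stride)):

import math

size = 28
for _ in range(3):               # three MaxPooling2D((2, 2), padding='same') layers
    size = math.ceil(size / 2)   # 'same' padding: out = ceil(in / stride)
    print(size)                  # prints 14, 7, 4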
What does padding do in keras
With pad_sequences it simply indicates whether sequences that are shorter than the specified max length (or, if unspecified, the length of the longest sequence in xtrain) are filled with the padding value at the beginning or at the end of the sequence, so as to reach the desired length.
With Convolution2D it indicates whether to pad or not, i.e. whether to extend the tensor (e.g. a pixel matrix or feature map) with additional rows and columns outside its border.
I suggest looking into the respective documentation.
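For illustration, a minimal pad_sequences sketch (using the tf.keras import path, which is an assumption; the keras.io docs quoted here describe the standalone keras package):

from tensorflow.keras.preprocessing.sequence import pad_sequences

seqs = [[1, 2, 3], [4, 5]]
print(pad_sequences(seqs, maxlen=4, padding='pre'))   # pad at the beginning (default)
# [[0 1 2 3]
#  [0 0 4 5]]
print(pad_sequences(seqs, maxlen=4, padding='post'))  # pad at the end
# [[1 2 3 0]
#  [4 5 0 0]]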
Tensorflow: tf.nn.avg_pool() with 'SAME' padding does not average over padded pixels
Yes, padded pixels are not taken into account in the average. So with 4x4 pooling, results computed in the middle of the image are averaged over 16 values, but values in the corner could use only 9 values if two edges are padded.
You can see this in the TensorFlow source, in the call to cuDNN, where the option CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING is selected for average pooling. cuDNN also offers CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING, which would take padded pixels into account in the average, but TensorFlow does not expose this option.
This could be a way in which average pooling behaves differently from (strided) convolution, especially for layers with a small spatial extent.
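For instance, with an all-ones input you can check that the padded zeros are not counted (this sketch assumes TF 2.x eager execution; the snippet further down uses the TF 1.x session API instead):

import tensorflow as tf

x = tf.ones((1, 4, 4, 1))
avg_pool = tf.nn.avg_pool2d(x, ksize=3, strides=1, padding='SAME')
print(avg_pool[0, :, :, 0].numpy())
# Every entry is 1.0, including the corners where the 3x3 window hangs over
# the border. If the padded zeros were averaged in, a corner value would be
# 4/9 ≈ 0.44 instead.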
Note that the situation is similar with max pooling: padded pixels are ignored (or, equivalently, virtually set to a value of -inf).
import tensorflow as tf
x = -tf.ones((1, 4, 4, 1))
max_pool = tf.nn.max_pool(x, (1, 4, 4, 1), (1, 1, 1, 1), 'SAME')
sess = tf.InteractiveSession()
print(max_pool.eval().squeeze())
# [[-1. -1. -1. -1.]
# [-1. -1. -1. -1.]
# [-1. -1. -1. -1.]
# [-1. -1. -1. -1.]]
Clearly the documentation could be more explicit about it.