Numpy: Fix Array with Rows of Different Lengths by Filling the Empty Elements with Zeros

Here is one approach:

import numpy as np

def numpy_fillna(data):
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])

    # Mask of valid places in each row
    mask = np.arange(lens.max()) < lens[:,None]

    # Setup output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)
    out[mask] = np.concatenate(data)
    return out

Sample input and output:

In [222]: # Input object dtype array (recent NumPy requires dtype=object for ragged rows)
     ...: data = np.array([[1, 2, 3, 4],
     ...:                  [2, 3, 1],
     ...:                  [5, 5, 5, 5, 8, 9, 5],
     ...:                  [1, 1]], dtype=object)

In [223]: numpy_fillna(data)
Out[223]: 
array([[1, 2, 3, 4, 0, 0, 0],
       [2, 3, 1, 0, 0, 0, 0],
       [5, 5, 5, 5, 8, 9, 5],
       [1, 1, 0, 0, 0, 0, 0]], dtype=object)
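
The function above reads data.dtype, so it expects a NumPy array as input. As a rough sketch of the same masking idea for a plain list of lists, something like the following should work. The name pad_rows and its dtype parameter are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def pad_rows(rows, dtype=int):
    # Hypothetical helper: same mask trick, but for a plain list of lists
    lens = np.array([len(r) for r in rows])
    # mask[i, j] is True where row i has a real element in column j
    mask = np.arange(lens.max()) < lens[:, None]
    out = np.zeros(mask.shape, dtype=dtype)
    # Boolean assignment fills the True positions in row-major order,
    # which matches the order of the flattened rows
    out[mask] = np.concatenate([np.asarray(r, dtype=dtype) for r in rows])
    return out

padded = pad_rows([[1, 2, 3, 4], [2, 3, 1], [5, 5, 5, 5, 8, 9, 5], [1, 1]])
print(padded)
```

Unlike the object-dtype version, this returns a regular integer array, so vectorized arithmetic works on the result directly.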

Fill array with rows of different lengths Python

I think this should do it (note that it pads with zeros on the left):

def fill(a):
    length = max([len(i) for i in a])
    return [[0]*(length-len(i)) + i for i in a]

fill(mylist)
#[[0,0,1], [0,1,2], [1,2,3]]
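
If the zeros should go at the end of each row instead, one possible variant uses itertools.zip_longest from the standard library. This is only a sketch; the name fill_right and the sample input are assumptions made for illustration.

```python
from itertools import zip_longest

def fill_right(rows, fillvalue=0):
    # zip_longest pads the shorter rows while transposing,
    # so a second zip(*...) transposes back to the original orientation
    columns = zip_longest(*rows, fillvalue=fillvalue)
    return [list(row) for row in zip(*columns)]

print(fill_right([[1], [1, 2], [1, 2, 3]]))
# [[1, 0, 0], [1, 2, 0], [1, 2, 3]]
```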

python: padding with zero in the end of every array in Numpy array of arrays

For a numpy.pad solution, we first need to make sure the input is exactly as you have it, i.e. a list of 1-D arrays. Then it is just:

a=[
      np.asarray([1,2,3,4]),
      np.asarray([3,56]),
      np.asarray([8,4,8,4,9,33,55])
 ]

max_len = max([len(x) for x in a])

output = [np.pad(x, (0, max_len - len(x)), 'constant') for x in a]

print(output)

>>> [
     array([1, 2, 3, 4, 0, 0, 0]), 
     array([ 3, 56,  0,  0,  0,  0,  0]), 
     array([ 8,  4,  8,  4,  9, 33, 55])
    ]
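
The result above is still a list of separate arrays. If a single 2-D array is wanted, the padded rows can be stacked, since they now all have the same length. A minimal sketch, reusing the same input (the names padded and matrix are illustrative):

```python
import numpy as np

a = [np.asarray([1, 2, 3, 4]),
     np.asarray([3, 56]),
     np.asarray([8, 4, 8, 4, 9, 33, 55])]

max_len = max(len(x) for x in a)
# mode='constant' pads with zeros by default
padded = [np.pad(x, (0, max_len - len(x)), mode='constant') for x in a]
# Every row now has length max_len, so the rows stack into one 2-D array
matrix = np.stack(padded)
print(matrix.shape)
```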

python how to pad numpy array with zeros

This is simple: create an array of zeros with the shape of the reference array b (a is the array to be padded):

result = np.zeros(b.shape)
# actually you can also use result = np.zeros_like(b) 
# but that also copies the dtype not only the shape

and then insert the array where you need it:

result[:a.shape[0],:a.shape[1]] = a

and voila you have padded it:

result
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

You can also make it a bit more general by defining where the upper left element should be inserted:

result = np.zeros_like(b)
x_offset = 1  # 0 would be what you wanted
y_offset = 1  # 0 in your case
result[x_offset:a.shape[0]+x_offset,y_offset:a.shape[1]+y_offset] = a
result

array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.]])

But be careful that the offsets are not bigger than allowed. With x_offset = 2, for example, this will fail, because the rows of a no longer fit into result.
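
To make the failure concrete, here is a small sketch. It assumes a has shape (3, 5) and b has shape (4, 6), which matches the outputs shown above but is not stated explicitly in the answer.

```python
import numpy as np

a = np.ones((3, 5))  # assumed shape of the array to pad
b = np.ones((4, 6))  # assumed shape of the reference array
result = np.zeros_like(b)
x_offset = 2

try:
    # The target slice result[2:5, 0:5] is clipped to 2 rows, but a has 3 rows
    result[x_offset:a.shape[0] + x_offset, :a.shape[1]] = a
    failed = False
except ValueError as err:
    failed = True
    print("offset too large:", err)
```

NumPy silently clips a slice that runs past the end of an axis, so the error only appears when the shapes of the two sides of the assignment no longer match.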

If you have an arbitrary number of dimensions, you can define one slice per dimension to insert the original array. I found it interesting to play around a bit and created a padding function that can pad (with offsets) an arbitrarily shaped array, as long as the array and the reference have the same number of dimensions and the offsets are not too big.

def pad(array, reference, offsets):
    """
    array: Array to be padded
    reference: Reference array with the desired shape
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    """
    # Create an array of zeros with the reference shape
    result = np.zeros(reference.shape)
    # Create a tuple of slices from offset to offset + shape in each dimension
    # (recent NumPy versions reject a list of slices as an index)
    insertHere = tuple(slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim))
    # Insert the array in the result at the specified offsets
    result[insertHere] = array
    return result

And some test cases:

import numpy as np

# 1 Dimension
a = np.ones(2)
b = np.ones(5)
offset = [3]
pad(a, b, offset)

# 3 Dimensions

a = np.ones((3,3,3))
b = np.ones((5,4,3))
offset = [1,0,0]
pad(a, b, offset)
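
The same padding with offsets can also be expressed through np.pad by turning each offset into a (before, after) pair of pad widths. The following is a sketch under that assumption; pad_with_np_pad is a hypothetical name, and it takes the reference shape rather than a reference array.

```python
import numpy as np

def pad_with_np_pad(array, reference_shape, offsets):
    # Hypothetical helper: zeros before = offset, zeros after = what is left
    widths = []
    for size, ref, off in zip(array.shape, reference_shape, offsets):
        after = ref - off - size
        if off < 0 or after < 0:
            raise ValueError("array does not fit at the given offsets")
        widths.append((off, after))
    return np.pad(array, widths, mode='constant')

out = pad_with_np_pad(np.ones((3, 3, 3)), (5, 4, 3), [1, 0, 0])
print(out.shape)
```

The explicit check turns an offset that is too large into a clear error message instead of a shape mismatch.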

Filling empty list with zero vector using numpy

Edit: I didn't realize that all of the non-empty features were the same length. If that is the case, then you can just use the length of the first non-empty one. I added a function that does that.

import numpy as np

f0 = [0,1,2]
f1 = []
f2 = [4,5,6]

features = [f0, f1, f2]

def get_nonempty_len(features):
    """
    returns the length of the first non-empty element
        of features.     
    """
    for f in features:
        if len(f) > 0:
            return len(f)
    return 0

def generate_matrix(features):
    rows = len(features)
    cols = get_nonempty_len(features)
    m = np.zeros((rows, cols))
    for i, f in enumerate(features):
        m[i,:len(f)]=f
    return m

print(generate_matrix(features))

Output looks like:

[[ 0.  1.  2.]
 [ 0.  0.  0.]
 [ 4.  5.  6.]]

Zero pad numpy array

For your use case you can use the resize() method:

A = np.array([1,2,3,4,5])
A.resize(8)

This resizes A in place. If there are other references to A, NumPy raises a ValueError, because the referenced value would be changed as well. To allow the resize anyway, pass refcheck=False.

The documentation states that missing values will be 0:

Enlarging an array: as above, but missing entries are filled with zeros
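
A short sketch of the difference between the resize() method and the np.resize() function, which are easy to confuse. The variable B is introduced here only for illustration.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])
# refcheck=False skips the reference check; this is only safe if no other
# array or view shares A's memory
A.resize(8, refcheck=False)
print(A)  # [1 2 3 4 5 0 0 0]

# np.resize (the function) behaves differently: it returns a new array
# and fills the extra entries with repeated copies of the input
B = np.resize(np.array([1, 2, 3, 4, 5]), 8)
print(B)  # [1 2 3 4 5 1 2 3]
```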

How to make a multidimension numpy array with a varying row size?

While Numpy knows about arrays of arbitrary objects, it's optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, better use a nested list. But depending on the intended use of your data, different data structures might be even better, e.g. a masked array if you have some invalid data points.

If you really want flexible Numpy arrays, use something like this:

numpy.array([[0,1,2,3], [2,3,4]], dtype=object)

However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
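
For the masked array option mentioned above, one possible sketch is to pad the rows to a rectangle and mask the padding, so that reductions ignore it. The variable names are illustrative assumptions.

```python
import numpy as np

rows = [[0, 1, 2, 3], [2, 3, 4]]
lens = np.array([len(r) for r in rows])
# valid[i, j] is True where row i has a real element in column j
valid = np.arange(lens.max()) < lens[:, None]
padded = np.zeros(valid.shape, dtype=int)
padded[valid] = np.concatenate(rows)

# In a masked array, True in the mask means "ignore this entry"
masked = np.ma.masked_array(padded, mask=~valid)
print(masked.mean(axis=1))
```

The row means are computed over the real elements only, so the padded zero in the second row does not pull its mean down.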

NumPy array initialization (fill with identical values)

NumPy 1.8 introduced np.full(), which is a more direct method than empty() followed by fill() for creating an array filled with a certain value. In current NumPy the dtype is inferred from the fill value unless dtype is given (early versions returned floats by default):

>>> np.full((3, 5), 7)
array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

>>> np.full((3, 5), 7, dtype=float)
array([[7., 7., 7., 7., 7.],
       [7., 7., 7., 7., 7.],
       [7., 7., 7., 7., 7.]])

This is arguably the way of creating an array filled with certain values, because it explicitly describes what is being achieved (and it can in principle be very efficient since it performs a very specific task).
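
Applied to the padding problem, np.full() makes it easy to pad with a value other than zero. A minimal sketch, assuming -1 is used as the sentinel and the rows are plain lists:

```python
import numpy as np

rows = [[1, 2, 3, 4], [2, 3, 1], [1, 1]]
width = max(len(r) for r in rows)

# Start from an array filled with the sentinel, then overwrite the real data
out = np.full((len(rows), width), -1, dtype=int)
for i, r in enumerate(rows):
    out[i, :len(r)] = r
print(out)
```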
