How to make a multidimensional numpy array with a varying row size?
While Numpy knows about arrays of arbitrary objects, it's optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, it's better to use a nested list. But depending on the intended use of your data, a different data structure might be even better, e.g. a masked array if you have some invalid data points.
If you really want flexible Numpy arrays, use something like this:
numpy.array([[0,1,2,3], [2,3,4]], dtype=object)
However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
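Since the answer mentions masked arrays as an alternative, here is a minimal sketch (with made-up data) of that route: pad the ragged rows into a rectangle and mask the padding, so vectorized operations still work while the padding slots are ignored.

```python
import numpy as np

# Hypothetical ragged data; pad to a rectangle and mask the padding.
rows = [[0, 1, 2, 3], [2, 3, 4]]
width = max(len(r) for r in rows)
padded = np.zeros((len(rows), width))
mask = np.ones((len(rows), width), dtype=bool)  # True = masked (invalid)
for i, r in enumerate(rows):
    padded[i, :len(r)] = r
    mask[i, :len(r)] = False  # mark real entries as valid

m = np.ma.masked_array(padded, mask=mask)
print(m.sum(axis=1))  # padding is ignored in the sums
```

Unlike the object-dtype array, this keeps the data in one contiguous block, so slicing and reductions stay fast.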
numpy array containing multi-dimensional numpy arrays with variable shape
np.array(alist) will make an object-dtype array if the arrays in the list differ in their first dimension. But in your case they differ in the 3rd, producing this error. In effect, np.array can't unambiguously determine where the containing dimensions end and where the objects begin.
In [270]: alist = [np.ones((10,4,4,20),int), np.zeros((10,4,6,20),int)]
In [271]: arr = np.array(alist)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-271-3fd8e9bd05a9> in <module>
----> 1 arr = np.array(alist)
ValueError: could not broadcast input array from shape (10,4,4,20) into shape (10,4)
Instead we need to make an object array of the right size, and copy the list to it. Sometimes this copy still produces broadcasting errors, but here it seems to be ok:
In [272]: arr = np.empty(2, object)
In [273]: arr
Out[273]: array([None, None], dtype=object)
In [274]: arr[:] = alist
In [275]: arr
Out[275]:
array([array([[[[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 1, 1, 1],
...
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]]])], dtype=object)
In [276]: arr[0].shape
Out[276]: (10, 4, 4, 20)
In [277]: arr[1].shape
Out[277]: (10, 4, 6, 20)
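The empty-then-assign trick above can be wrapped in a small reusable helper (the function name is my own):

```python
import numpy as np

def to_object_array(arrays):
    # Build an object array of the right length first, then copy the
    # list in; this sidesteps np.array's broadcasting attempt.
    out = np.empty(len(arrays), dtype=object)
    out[:] = arrays
    return out

alist = [np.ones((10, 4, 4, 20), int), np.zeros((10, 4, 6, 20), int)]
arr = to_object_array(alist)
print(arr.shape, arr[0].shape, arr[1].shape)
```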
Storing multidimensional variable length array with h5py
The essence of your code is:
phn_mfccs = []
<loop several layers>
    phn_mfcc = <some sort of array expanded by one dimension>
    phn_mfccs.append(phn_mfcc)
At the end of the loops, phn_mfccs is a list of arrays. I can't tell from the code what the dtype and shape are, or whether they differ for each element of the list.
I'm not entirely sure what create_dataset does when given a list of arrays. It may wrap it in np.array.
mfccs_out.create_dataset('phn_mfccs', data=phn_mfccs, dtype=dt)
What does np.array(phn_mfccs) produce? What shape and dtype? If all the elements are arrays of the same shape and dtype, it will produce a higher-dimensional array. If they differ in shape, it will produce a 1d array with object dtype. Given the error message, I suspect the latter.
I've answered a few vlen questions but haven't worked with it a lot: http://docs.h5py.org/en/latest/special.html
I vaguely recall that the 'ragged' dimension of an h5 array can only be 1d. So a phn_mfccs object array that contains 1d float arrays of varying lengths might work.
I might come up with a simple example, and I suggest you construct a simpler problem that we can copy-n-paste and experiment with. We don't need to know how you read the data from your directory; we just need to understand the content of the array (or list) that you are trying to write.
A 2015 post on vlen arrays: Inexplicable behavior when using vlen with h5py
See also: H5PY - How to store many 2D arrays of different dimensions
1d ragged arrays example
In [24]: f = h5py.File('vlen.h5','w')
In [25]: dt = h5py.special_dtype(vlen=np.dtype('float64'))
In [26]: dataset = f.create_dataset('vlen',(4,), dtype=dt)
In [27]: dataset.value
Out[27]:
array([array([], dtype=float64), array([], dtype=float64),
array([], dtype=float64), array([], dtype=float64)], dtype=object)
In [28]: for i in range(4):
...: dataset[i]=np.arange(i+3)
In [29]: dataset.value
Out[29]:
array([array([ 0., 1., 2.]), array([ 0., 1., 2., 3.]),
array([ 0., 1., 2., 3., 4.]),
array([ 0., 1., 2., 3., 4., 5.])], dtype=object)
If I try to write 2d arrays to dataset I get an error:
OSError: Can't prepare for writing data (Src and dest data spaces have different sizes)
The dataset itself may be multidimensional, but the vlen object has to be a 1d array of floats.
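Given that restriction, one common workaround is to flatten each 2d array before writing and keep its shape alongside (e.g. in a second dataset) so it can be reconstructed after reading back. Here is a plain-NumPy sketch of that round trip (the h5py calls themselves are omitted; the arrays are made up):

```python
import numpy as np

# Hypothetical ragged 2d arrays of differing shapes.
arrays = [np.arange(6.0).reshape(2, 3), np.arange(8.0).reshape(4, 2)]

flat = [a.ravel() for a in arrays]    # 1d views: what you'd write to the vlen dataset
shapes = [a.shape for a in arrays]    # store these too, e.g. in a second dataset

# On read-back, restore each original 2d array from its flat data and shape.
restored = [f.reshape(s) for f, s in zip(flat, shapes)]
print(all(np.array_equal(a, r) for a, r in zip(arrays, restored)))
```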
Convert list of lists with different lengths to a numpy array
You could make a NumPy array with np.zeros and fill it with your list elements, as shown below.
import numpy as np

a = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
b = np.zeros([len(a), max(len(row) for row in a)])
for i, row in enumerate(a):
    b[i, :len(row)] = row
results in
[[ 1. 2. 3. 0.]
[ 4. 5. 0. 0.]
[ 6. 7. 8. 9.]]
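If real zeros can occur in your data, a variation (my suggestion, not part of the original answer) is to pad with NaN instead, so padding can't be confused with actual values; this requires a float array:

```python
import numpy as np

a = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
# Pad with NaN instead of 0 so padding is distinguishable from real zeros.
b = np.full((len(a), max(len(row) for row in a)), np.nan)
for i, row in enumerate(a):
    b[i, :len(row)] = row
print(b)
```

Functions like np.nansum and np.nanmean then skip the padding automatically.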
Getting indices of different lengths to slice a multidimensional numpy array
I think this should work:
m = np.arange(F.shape[1]) < K
Rnew = R.copy()
Rnew[np.nonzero(m)[0], np.argsort(F)[m]] += 1
Since the first line uses broadcasting, np.tile() is not needed.
Notice that there is a possible ambiguity in the results: since each row of F has values that are repeated several times (e.g. 0.1 in the first row and -0.4 in the second), np.argsort() may give different orderings of the elements of F, depending on how these equal values get sorted. This may change which entries of the matrix R get incremented. For example, instead of incrementing R[0, 7], R[0, 8], and R[1, 7], the code may increment R[0, 2], R[0, 9] and R[1, 1]. To get unambiguous results, you can specify that np.argsort() must use a stable sorting algorithm, which will preserve the relative order of elements with equal values:
m = np.arange(F.shape[1]) < K
Rnew = R.copy()
Rnew[np.nonzero(m)[0], np.argsort(F, kind="stable")[m]] += 1
In this particular example this will increment the entries R[0, 2], R[0, 7] and R[1, 1]. You need to decide if this is the result that meets your needs.
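Since the asker's F and R aren't shown, here is a self-contained toy version (shapes and values are my own; note K must be a per-row column vector so the comparison in the first line broadcasts to a 2d mask):

```python
import numpy as np

# Toy data: mark the K[i] smallest entries of each row of F in R.
F = np.array([[0.1, 0.3, 0.1, 0.2],
              [0.5, 0.2, 0.2, 0.4]])
R = np.zeros(F.shape, dtype=int)
K = np.array([[2], [1]])  # per-row counts, shaped (2, 1) for broadcasting

m = np.arange(F.shape[1]) < K  # broadcasts to a (2, 4) boolean mask
Rnew = R.copy()
# Stable sort keeps the tie 0.1/0.1 in index order, so index 0 wins over 2.
Rnew[np.nonzero(m)[0], np.argsort(F, kind="stable")[m]] += 1
print(Rnew)
# [[1 0 1 0]
#  [0 1 0 0]]
```

Row 0 has a tie between indices 0 and 2 (both 0.1); the stable sort guarantees index 0 is listed first, making the result reproducible.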
How to access specific row of multidimensional NumPy array with different dimension?
It does not make sense to have different numbers of elements in different rows of the same matrix. To work around this, it is better to first fill the missing elements in each row with 0 or NA, so that all rows have the same number of elements.
Please also look at the answers in Numpy: Fix array with rows of different lengths by filling the empty elements with zeros. Below is an implementation of one of the best solutions mentioned there, applied to your problem.
import numpy as np

def numpy_fillna(data):
    # Pad a sequence of 1d sequences with zeros into a rectangular array.
    lens = np.array([len(i) for i in data])
    mask = np.arange(lens.max()) < lens[:, None]
    flat = np.concatenate([np.asarray(row) for row in data])
    out = np.zeros(mask.shape, dtype=flat.dtype)
    out[mask] = flat
    return out

a = [range(1, 50), range(50, 150)]
data = numpy_fillna(a)
print(data[1, :])