Convert Structured Array to Regular Numpy Array

Convert structured array to regular NumPy array

[~]
|5> x = np.array([(1.0, 4.0,), (2.0, -1.0)], dtype=[('f0', '<f8'), ('f1', '<f8')])

[~]
|6> x.view(np.float64).reshape(x.shape + (-1,))
array([[ 1.,  4.],
       [ 2., -1.]])
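
The same conversion can be written as a standalone script for running outside an interactive session. This is a minimal sketch: the name `regular` is introduced here for illustration, and the approach assumes every field shares one dtype (float64 in this case).

```python
import numpy as np

# Structured array with two float64 fields.
x = np.array([(1.0, 4.0), (2.0, -1.0)], dtype=[('f0', '<f8'), ('f1', '<f8')])

# Reinterpret the same buffer as plain float64, then add a trailing axis
# so that each record becomes one row of the result.
regular = x.view(np.float64).reshape(x.shape + (-1,))

print(regular.shape)  # (2, 2)
```

Because this is a view, `regular` shares memory with `x`; writing to one changes the other.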

Convert a slice of a structured array to regular NumPy array in NumPy 1.14

The 1d array does convert with view:

In [270]: arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b','f4'), ('c', 'f4'), ('d', 'f4')])
In [271]: arr
Out[271]:
array([(105., 34., 145., 217.)],
dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4')])
In [272]: arr.view('<f4')
Out[272]: array([105., 34., 145., 217.], dtype=float32)

It is when we try to convert a single element that we get this error:

In [273]: arr[0].view('<f4')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-273-70fbab8f61ba> in <module>()
----> 1 arr[0].view('<f4')

ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

In earlier versions, view often required a tweak in the dimensions. I suspect this error is a result, intentional or not, of recent changes to the handling of structured arrays (most evident when indexing several fields at once).

In the whole-array case, view changed the 1d, 4-field array into a 1d, 4-element array: shape (1,) became (4,). Viewing a single element, however, would have to go from shape () to (4,).

In the past I have recommended tolist as the surest way around problems with view (and astype):

In [274]: arr[0].tolist()
Out[274]: (105.0, 34.0, 145.0, 217.0)
In [279]: list(arr[0].tolist())
Out[279]: [105.0, 34.0, 145.0, 217.0]
In [280]: np.array(arr[0].tolist())
Out[280]: array([105., 34., 145., 217.])

item is also a good way of pulling an element out of its numpy structure:

In [281]: arr[0].item()
Out[281]: (105.0, 34.0, 145.0, 217.0)

The result from tolist and item is a tuple.
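
As a runnable sketch of the tuple route: the names `record`, `as_tuple`, and `as_array` are illustrative only. Passing `dtype=np.float32` is an added choice here, since `np.array` on a tuple of Python floats would otherwise produce float64.

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

record = arr[0]             # a single record, with shape ()
as_tuple = record.item()    # plain Python tuple of floats

# Rebuild a regular 1d array from the tuple, keeping the original precision.
as_array = np.array(as_tuple, dtype=np.float32)

print(as_array.shape)  # (4,)
```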

You worry about speed. But you are just converting one element. It's one thing to worry about the speed when using tolist on a 1000 item array, quite another when working with 1 element.

In [283]: timeit arr[0]
131 ns ± 1.31 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [284]: timeit arr[0].tolist()
1.25 µs ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [285]: timeit arr[0].item()
1.27 µs ± 2.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [286]: timeit arr.tolist()
493 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [287]: timeit arr.view('f4')
1.74 µs ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

You could index the element in a way that doesn't reduce the dimension to 0 (not that it helps much with speed):

In [288]: arr[[0]].view('f4')
Out[288]: array([105., 34., 145., 217.], dtype=float32)
In [289]: timeit arr[[0]].view('f4')
6.54 µs ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [290]: timeit arr[0:1].view('f4')
2.63 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [298]: timeit arr[0][None].view('f4')
4.28 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

view still requires a change in shape; consider a big array:

In [299]: arrs = np.repeat(arr, 10000)
In [301]: arrs.view('f4')
Out[301]: array([105., 34., 145., ..., 34., 145., 217.], dtype=float32)
In [303]: arrs.shape
Out[303]: (10000,)
In [304]: arrs.view('f4').shape
Out[304]: (40000,)

The view is still 1d, whereas we'd probably want a (10000, 4) shaped 2d array.

A better view change:

In [306]: arrs.view(('f4',4))
Out[306]:
array([[105., 34., 145., 217.],
[105., 34., 145., 217.],
[105., 34., 145., 217.],
...,
[105., 34., 145., 217.],
[105., 34., 145., 217.],
[105., 34., 145., 217.]], dtype=float32)
In [307]: _.shape
Out[307]: (10000, 4)

This works with the 1 element array, whether 1d or 0d:

In [308]: arr.view(('f4',4))
Out[308]: array([[105., 34., 145., 217.]], dtype=float32)
In [309]: _.shape
Out[309]: (1, 4)
In [310]: arr[0].view(('f4',4))
Out[310]: array([105., 34., 145., 217.], dtype=float32)
In [311]: _.shape
Out[311]: (4,)
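
The subarray-dtype view can be wrapped in a small helper. This is only a sketch: `fields_as_columns` is a hypothetical name, not a NumPy function, and the two checks are added assumptions about when such a view is meaningful (one shared field dtype, no padding).

```python
import numpy as np

def fields_as_columns(a):
    """View a packed, single-dtype structured array as a regular array."""
    names = a.dtype.names
    base = a.dtype[names[0]]
    if any(a.dtype[name] != base for name in names):
        raise TypeError("all fields must share one dtype")
    if a.dtype.itemsize != base.itemsize * len(names):
        raise ValueError("dtype has padding; a plain view would misread it")
    # (base, n) is a subarray dtype: each record becomes a row of n values.
    return a.view((base, len(names)))

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])
arrs = np.repeat(arr, 3)

table = fields_as_columns(arrs)
print(table.shape)  # (3, 4)
```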

This was suggested in one of the answers in your link: https://stackoverflow.com/a/10171321/901925

Contrary to your comment there, it works for me:

In [312]: arr[0].view((np.float32, len(arr.dtype.names)))
Out[312]: array([105., 34., 145., 217.], dtype=float32)
In [313]: np.__version__
Out[313]: '1.14.0'

With the edit:

In [84]: arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b','f4'), ('c', 'f4'), ('d', 'f4')])
In [85]: arr2 = arr[['a', 'b']]
In [86]: arr2
Out[86]:
array([(105., 34.)],
dtype={'names':['a','b'], 'formats':['<f4','<f4'], 'offsets':[0,4], 'itemsize':16})

In [87]: arr2.view(('f4',2))
...
ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged

Note that the arr2 dtype includes an offsets value. In a recent numpy version, multiple field selection has changed. It is now a true view, preserving the original data - all of it, not just the selected fields. The itemsize is unchanged:

In [93]: arr.itemsize
Out[93]: 16
In [94]: arr2.itemsize
Out[94]: 16

arr.view(('f4',4)) and arr2.view(('f4',4)) produce the same thing.

So you can't view (change dtype) a partial set of the fields. You have to first take the view of the whole array, and then select rows/columns, or work with tolist.
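
A sketch of the "view everything, then select columns" route follows. The names `full`, `cols`, and `subset` are illustrative; the column positions are looked up from the field names rather than hard-coded.

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

names = arr.dtype.names

# View all four fields as one (n_records, n_fields) float32 array.
full = arr.view((np.float32, len(names)))

# Pick the columns that correspond to the wanted fields.
cols = [names.index(name) for name in ('a', 'b')]
subset = full[:, cols]   # list indexing makes a copy

print(subset.shape)  # (1, 2)
```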

I'm using 1.14.0. The release notes for 1.14.1 say:

The change in 1.14.0 that multi-field indexing of structured arrays returns a
view instead of a copy has been reverted but remains on track for NumPy 1.15.
Affected users should read the 1.14.1 Numpy User Guide section
"basics/structured arrays/accessing multiple fields" for advice on how to
manage this transition.

https://docs.scipy.org/doc/numpy-1.14.2/user/basics.rec.html#accessing-multiple-fields

This is still under development. That doc mentions a repack_fields function, but that doesn't exist yet.
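
Later NumPy releases do provide `repack_fields` in `numpy.lib.recfunctions`. The following is a short sketch of how it can be combined with view, assuming a NumPy version that ships it; the names `packed` and `regular` are illustrative.

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

arr2 = arr[['a', 'b']]

# repack_fields returns an array whose dtype holds only the selected
# fields, with no padding between or after them.
packed = repack_fields(arr2)
print(packed.itemsize)  # 8

# With the padding gone, the subarray view has a matching itemsize.
regular = packed.view(('f4', 2))
print(regular.shape)  # (1, 2)
```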

Convert structured array to numpy array for use with Scikit-Learn

Add a .copy() to data[features]:

X = data[features].copy()
X = X.view((float, len(X.dtype.names)))

and the FutureWarning message is gone.

This should be more efficient than converting to a list first.
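
An alternative sketch that does not depend on how the selected fields are laid out in memory uses `structured_to_unstructured` from `numpy.lib.recfunctions`, which is available in later NumPy releases. The `data` array, its field names, and the `features` list below are hypothetical stand-ins for the variables in the question.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# Hypothetical structured data: two feature fields and one label field.
data = np.array([(1.0, 2.0, 0.0), (3.0, 4.0, 1.0)],
                dtype=[('height', 'f8'), ('weight', 'f8'), ('label', 'f8')])
features = ['height', 'weight']

# One row per record, one column per selected field.
X = structured_to_unstructured(data[features])

print(X.shape)  # (2, 2)
```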

Converting numpy array to structured array

There are special helper functions for this:

>>> from numpy.lib.recfunctions import unstructured_to_structured

So,

>>> import numpy as np
>>> arr = np.array([[1,2], [3,4]], dtype='u1')
>>> unstructured_to_structured(arr, dtype=np.dtype([('a', 'u1'), ('b', 'u1')]))
array([(1, 2), (3, 4)], dtype=[('a', 'u1'), ('b', 'u1')])

You can also create a view:

>>> arr.ravel().view(dtype=np.dtype([('a', 'u1'), ('b', 'u1')]))
array([(1, 2), (3, 4)], dtype=[('a', 'u1'), ('b', 'u1')])

In this simple case that is fine, but if you choose to use a view you sometimes have to worry about how the array is packed. Note that a view doesn't copy the underlying buffer, which can make it much more efficient if you are working with large arrays.
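
A small sketch makes the shared buffer visible. The name `viewed` and the itemsize check are additions for illustration; the check is one simple way to confirm that a row of the plain array is exactly one record wide.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]], dtype='u1')
dt = np.dtype([('a', 'u1'), ('b', 'u1')])

# The view is only meaningful if one row holds exactly one record.
if dt.itemsize != arr.shape[1] * arr.itemsize:
    raise ValueError("row width does not match the record size")

viewed = arr.ravel().view(dt)

# No data was copied: a write through the original shows up in the view.
arr[0, 0] = 9
print(viewed['a'][0])  # 9
```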

Convert a numpy array to a structured array

In [222]: x = np.array([[ 0,  2,  3,  4,  5], [ 0, 12, 13, 14, 15]])
In [223]: dt = np.dtype([('checksum','u2'), ('word', 'B', (3,))])

I know from past use that genfromtxt can handle relatively complex dtypes:

In [224]: np.savetxt('temp', x[:,1:], fmt='%d')
In [225]: cat temp
2 3 4 5
12 13 14 15
In [226]: data = np.genfromtxt('temp', dtype=dt)
In [227]: data
Out[227]:
array([( 2, [ 3, 4, 5]), (12, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

But I haven't dug into its code to see how it maps the flat row data on to the dtypes.

But it turns out the unstructured_to_structured that I mentioned in a comment can handle your dtype:

In [228]: import numpy.lib.recfunctions as rf
In [229]: rf.unstructured_to_structured(x[:,1:],dtype=dt)
Out[229]:
array([( 2, [ 3, 4, 5]), (12, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

For simpler dtypes, I and others have often recommended turning the list of lists into a list of tuples.

In [230]: [tuple(row) for row in x[:,1:]]
Out[230]: [(2, 3, 4, 5), (12, 13, 14, 15)]
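
The list of tuples can then be passed to `np.array` together with a structured dtype, because NumPy reads each tuple as one record. A minimal sketch, assuming a hypothetical flat dtype `simple_dt` with four integer fields (not the nested dtype from the question):

```python
import numpy as np

x = np.array([[0, 2, 3, 4, 5], [0, 12, 13, 14, 15]])

# Hypothetical simple dtype: one scalar field per column.
simple_dt = np.dtype([('a', 'i4'), ('b', 'i4'), ('c', 'i4'), ('d', 'i4')])

# Each tuple becomes one record of the structured array.
rows = [tuple(row) for row in x[:, 1:]]
res = np.array(rows, dtype=simple_dt)

print(res.shape)  # (2,)
```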

Many of the recfunctions use a field-by-field copy:

In [231]: res = np.zeros(2, dtype=dt)
In [232]: res
Out[232]:
array([(0, [0, 0, 0]), (0, [0, 0, 0])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
In [233]: res['checksum']= x[:,1]
In [234]: res['word']
Out[234]:
array([[0, 0, 0],
[0, 0, 0]], dtype=uint8)
In [235]: res['word'] = x[:,2:]
In [236]: res
Out[236]:
array([( 2, [ 3, 4, 5]), (12, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

byte view

I missed the fact that you wanted to repack bytes. My above answer treats the input line as 4 numbers/ints that will be assigned to the 4 slots in the compound dtype. But with uint8 input, and u2 and u1 slots, you want to view the 5 bytes with the new dtype, not make a new array.

In [332]: dt
Out[332]: dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
In [333]: arr = np.array([(1,2,3,4,5),
...: (11,12,13,14,15)], dtype = np.uint8)
In [334]: arr.view(dt)
Out[334]:
array([[( 513, [ 3, 4, 5])],
[(3083, [13, 14, 15])]],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

view adds a dimension that we need to remove:

In [335]: _.shape
Out[335]: (2, 1)
In [336]: arr.view(dt).reshape(2)
Out[336]:
array([( 513, [ 3, 4, 5]), (3083, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])

and changing the endianness of the u2 field:

In [337]: dt = np.dtype([('checksum','>u2'), ('word', 'B', (3,))])
In [338]: arr.view(dt).reshape(2)
Out[338]:
array([( 258, [ 3, 4, 5]), (2828, [13, 14, 15])],
dtype=[('checksum', '>u2'), ('word', 'u1', (3,))])
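
The two checksum values follow directly from the byte order, as this sketch works through. The names `little`, `big`, `le`, and `be` are illustrative.

```python
import numpy as np

arr = np.array([(1, 2, 3, 4, 5),
                (11, 12, 13, 14, 15)], dtype=np.uint8)

little = np.dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
big = np.dtype([('checksum', '>u2'), ('word', 'u1', (3,))])

# Each 5-byte row is reinterpreted as one record; reshape drops the
# length-1 axis that view leaves behind.
le = arr.view(little).reshape(-1)
be = arr.view(big).reshape(-1)

# Little-endian: the first byte is the low byte, so 1 + 2*256 = 513.
print(le['checksum'][0])
# Big-endian: the first byte is the high byte, so 1*256 + 2 = 258.
print(be['checksum'][0])
```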

Converting a 2D numpy array to a structured array

You can "create a record array from a (flat) list of arrays" using numpy.core.records.fromarrays as follows:

>>> import numpy as np
>>> myarray = np.array([("Hello",2.5,3),("World",3.6,2)])
>>> print(myarray)
[['Hello' '2.5' '3']
['World' '3.6' '2']]

>>> newrecarray = np.core.records.fromarrays(myarray.transpose(),
...                                          names='col1, col2, col3',
...                                          formats='S8, f8, i8')

>>> print(newrecarray)
[('Hello', 2.5, 3) ('World', 3.5999999046325684, 2)]

I was trying to do something similar. I found that when numpy created a structured array from an existing 2D array (using np.core.records.fromarrays), it considered each column (instead of each row) in the 2-D array as a record. So you have to transpose it. This behavior of numpy does not seem very intuitive, but perhaps there is a good reason for it.
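
The columns-as-fields behaviour is easier to see with a purely numeric array. This sketch uses `np.rec.fromarrays`, the public alias for the same function; the array contents and field names are made up for illustration.

```python
import numpy as np

# Two rows (records), three columns (fields).
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# fromarrays expects one array per field, so pass the columns:
# iterating over data.T yields [1, 4], [2, 5], [3, 6].
rec = np.rec.fromarrays(data.T, names='a, b, c')

print(rec.a)    # values of the first column
print(rec[0])   # the first row as a record
```

One way to read the design: `fromarrays` takes a list of arrays, one per field, and a 2D array iterates over its rows, so the rows must be the fields.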

Convert structured array with various numeric data types to regular array

You can do it easily with Pandas:

>>> import pandas as pd
>>> pd.DataFrame(my_data).values
array([[ 17. , 182.1000061],
[ 19. , 175.6000061]], dtype=float32)

How to convert numpy.recarray to numpy.array?

By "normal array" I take it you mean a NumPy array of homogeneous dtype. Given a recarray, such as:

>>> a = np.array([(0, 1, 2),
...               (3, 4, 5)], [('x', int), ('y', float), ('z', int)]).view(np.recarray)
>>> a
rec.array([(0, 1.0, 2), (3, 4.0, 5)],
          dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

we must first make each column have the same dtype. We can then convert it to a "normal array" by viewing the data as that same dtype:

>>> a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
array([ 0., 1., 2., 3., 4., 5.])

astype returns a new numpy array, so the above requires additional memory in an amount proportional to the size of a. Each row of a requires 4+8+4=16 bytes, while each row of a.astype(...) requires 8*3=24 bytes. Calling view requires no new memory, since view just changes how the underlying data is interpreted.
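
The byte counts can be checked directly. This sketch uses explicit `'<i4'` fields so the sizes match the figures above on any platform, and adds a reshape (not part of the original answer) to get one row per record; the names `wide`, `flat`, and `table` are illustrative.

```python
import numpy as np

a = np.array([(0, 1, 2), (3, 4, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

# astype allocates a new array in which every field is float64.
wide = a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
print(a.dtype.itemsize, wide.dtype.itemsize)  # 16 24

# view and reshape only reinterpret the buffer that astype created.
flat = wide.view('<f8')
table = flat.reshape(len(a), -1)
print(table.shape)  # (2, 3)
```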

a.tolist() returns a new Python list. Each Python number is an object which requires more bytes than its equivalent representation in a numpy array. So a.tolist() requires more memory than a.astype(...).

Calling a.astype(...).view(...) is also faster than np.array(a.tolist()):

In [8]: a = np.array(list(zip(*[iter(range(300))]*3)), [('x', int), ('y', float), ('z', int)]).view(np.recarray)

In [9]: %timeit a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
10000 loops, best of 3: 165 us per loop

In [10]: %timeit np.array(a.tolist())
1000 loops, best of 3: 683 us per loop

