Convert structured array to regular NumPy array
[~]
|5> x = np.array([(1.0, 4.0,), (2.0, -1.0)], dtype=[('f0', '<f8'), ('f1', '<f8')])
[~]
|6> x.view(np.float64).reshape(x.shape + (-1,))
array([[ 1., 4.],
[ 2., -1.]])
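The view-and-reshape idiom above can be checked with a small script. This is a minimal sketch, assuming every field has the same dtype and the array is contiguous; the name `plain` is only illustrative. It shows that the result shares memory with the structured array rather than copying it.

```python
import numpy as np

# Structured array whose fields all share one dtype (float64 here).
x = np.array([(1.0, 4.0), (2.0, -1.0)],
             dtype=[('f0', '<f8'), ('f1', '<f8')])

# Reinterpret the same buffer as plain float64 and add a trailing axis
# with one column per field.
plain = x.view(np.float64).reshape(x.shape + (-1,))

print(plain.shape)                 # (2, 2)
print(np.shares_memory(x, plain))  # True: no data was copied

# Because it is a view, writing through it changes the structured array too.
plain[0, 1] = 40.0
print(x['f1'][0])                  # 40.0
```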
Convert a slice of a structured array to regular NumPy array in NumPy 1.14
The 1d array does convert with view:
In [270]: arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b','f4'), ('c', 'f4'), ('d', 'f4')])
In [271]: arr
Out[271]:
array([(105., 34., 145., 217.)],
dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4')])
In [272]: arr.view('<f4')
Out[272]: array([105., 34., 145., 217.], dtype=float32)
It's when we try to convert a single element that we get this error:
In [273]: arr[0].view('<f4')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-273-70fbab8f61ba> in <module>()
----> 1 arr[0].view('<f4')
ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged
Earlier, view often required a tweak in the dimensions. I suspect this error is a result, intentional or not, of recent changes to the handling of structured arrays (most evident when indexing several fields at once).
In the whole-array case, view changed the 1d, 4-field array into a 1d, 4-element array: shape (1,) became (4,). Changing a single element would have to go from shape () to (4,).
In the past I have recommended tolist as the surest way around problems with view (and astype):
In [274]: arr[0].tolist()
Out[274]: (105.0, 34.0, 145.0, 217.0)
In [279]: list(arr[0].tolist())
Out[279]: [105.0, 34.0, 145.0, 217.0]
In [280]: np.array(arr[0].tolist())
Out[280]: array([105., 34., 145., 217.])
item is also a good way of pulling an element out of its numpy structure:
In [281]: arr[0].item()
Out[281]: (105.0, 34.0, 145.0, 217.0)
The result from tolist and item is a tuple.
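As a small sketch of the tuple route: the transcript above gets a float64 array back from np.array(arr[0].tolist()). Passing the field dtype explicitly keeps float32. The variable names here are illustrative only.

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

record = arr[0]            # a single record (0d element of the array)
as_tuple = record.item()   # plain Python tuple of Python floats

# Rebuild a 1-D array, reusing the dtype of field 'a' so it stays float32.
as_array = np.array(as_tuple, dtype=arr.dtype['a'])

print(type(as_tuple).__name__)         # tuple
print(as_array.shape, as_array.dtype)  # (4,) float32
```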
You worry about speed, but you are just converting one element. It's one thing to worry about speed when using tolist on a 1000-item array, quite another when working with 1 element.
In [283]: timeit arr[0]
131 ns ± 1.31 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [284]: timeit arr[0].tolist()
1.25 µs ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [285]: timeit arr[0].item()
1.27 µs ± 2.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [286]: timeit arr.tolist()
493 ns ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [287]: timeit arr.view('f4')
1.74 µs ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
You could index the element in a way that doesn't reduce the dimension to 0 (not that it helps much with speed):
In [288]: arr[[0]].view('f4')
Out[288]: array([105., 34., 145., 217.], dtype=float32)
In [289]: timeit arr[[0]].view('f4')
6.54 µs ± 15.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [290]: timeit arr[0:1].view('f4')
2.63 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [298]: timeit arr[0][None].view('f4')
4.28 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
view still requires a change in shape; consider a big array:
In [299]: arrs = np.repeat(arr, 10000)
In [301]: arrs.view('f4')
Out[301]: array([105., 34., 145., ..., 34., 145., 217.], dtype=float32)
In [303]: arrs.shape
Out[303]: (10000,)
In [304]: arrs.view('f4').shape
Out[304]: (40000,)
The view is still 1d, whereas we'd probably want a 2d array of shape (10000, 4).
A better view change:
In [306]: arrs.view(('f4',4))
Out[306]:
array([[105., 34., 145., 217.],
[105., 34., 145., 217.],
[105., 34., 145., 217.],
...,
[105., 34., 145., 217.],
[105., 34., 145., 217.],
[105., 34., 145., 217.]], dtype=float32)
In [307]: _.shape
Out[307]: (10000, 4)
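The two routes to a 2d result can be compared side by side. This is a sketch, assuming all fields are float32 and the dtype has no padding; `two_d` and `two_d_alt` are illustrative names.

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])
arrs = np.repeat(arr, 10000)

n_fields = len(arrs.dtype.names)

# Option 1: view as a subarray dtype; the field count becomes a new axis.
two_d = arrs.view((np.float32, n_fields))

# Option 2: view as the scalar dtype, then reshape explicitly.
two_d_alt = arrs.view(np.float32).reshape(len(arrs), n_fields)

print(two_d.shape)                       # (10000, 4)
print(np.array_equal(two_d, two_d_alt))  # True
```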
This works with the 1 element array, whether 1d or 0d:
In [308]: arr.view(('f4',4))
Out[308]: array([[105., 34., 145., 217.]], dtype=float32)
In [309]: _.shape
Out[309]: (1, 4)
In [310]: arr[0].view(('f4',4))
Out[310]: array([105., 34., 145., 217.], dtype=float32)
In [311]: _.shape
Out[311]: (4,)
This was suggested in one of the answers in your link: https://stackoverflow.com/a/10171321/901925
Contrary to your comment there, it works for me:
In [312]: arr[0].view((np.float32, len(arr.dtype.names)))
Out[312]: array([105., 34., 145., 217.], dtype=float32)
In [313]: np.__version__
Out[313]: '1.14.0'
With the edit:
In [84]: arr = np.array([(105.0, 34.0, 145.0, 217.0)], dtype=[('a', 'f4'), ('b','f4'), ('c', 'f4'), ('d', 'f4')])
In [85]: arr2 = arr[['a', 'b']]
In [86]: arr2
Out[86]:
array([(105., 34.)],
dtype={'names':['a','b'], 'formats':['<f4','<f4'], 'offsets':[0,4], 'itemsize':16})
In [87]: arr2.view(('f4',2))
...
ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged
Note that the arr2 dtype includes an offsets value. In a recent numpy version, multiple field selection has changed. It is now a true view, preserving the original data - all of it, not just the selected fields. The itemsize is unchanged:
In [93]: arr.itemsize
Out[93]: 16
In [94]: arr2.itemsize
Out[94]: 16
arr.view(('f4',4)) and arr2.view(('f4',4)) produce the same thing.
So you can't view (change dtype) a partial set of the fields. You have to first take the view of the whole array, and then select rows/columns, or work with tolist.
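Taking the view of the whole array first and then selecting columns might look like the following. It is a sketch only; `wanted`, `full` and `subset` are names I made up, and it assumes all fields share the float32 dtype.

```python
import numpy as np

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

names = list(arr.dtype.names)
wanted = ['a', 'b']

# View every field as one float32 column, then pick columns by position.
full = arr.view((np.float32, len(names)))
cols = [names.index(name) for name in wanted]
subset = full[:, cols]   # indexing with a list makes a copy

print(subset.shape)      # (1, 2)
print(subset.tolist())   # [[105.0, 34.0]]
```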
I'm using 1.14.0. The release notes for 1.14.1 say:
The change in 1.14.0 that multi-field indexing of structured arrays returns a
view instead of a copy has been reverted but remains on track for NumPy 1.15.
Affected users should read the 1.14.1 Numpy User Guide section
"basics/structured arrays/accessing multiple fields" for advice on how to
manage this transition.
https://docs.scipy.org/doc/numpy-1.14.2/user/basics.rec.html#accessing-multiple-fields
This is still under development. That doc mentions a repack_fields function, but that doesn't exist yet.
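For readers on a later release: as far as I know, repack_fields did ship in numpy.lib.recfunctions after this was written. A hedged sketch, assuming such a version is installed, of how it removes the unused bytes so that the subarray view works:

```python
import numpy as np
from numpy.lib import recfunctions as rf

arr = np.array([(105.0, 34.0, 145.0, 217.0)],
               dtype=[('a', 'f4'), ('b', 'f4'), ('c', 'f4'), ('d', 'f4')])

arr2 = arr[['a', 'b']]

# repack_fields returns a copy whose dtype has no gaps between fields.
packed = rf.repack_fields(arr2)
print(packed.dtype.itemsize)      # 8

# With a packed itemsize of 8 bytes, viewing as two float32 values is allowed.
flat = packed.view((np.float32, 2))
print(flat.shape)                 # (1, 2)
```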
Convert structured array to numpy array for use with Scikit-Learn
Add a .copy() to data[features]:
X = data[features].copy()
X = X.view((float, len(X.dtype.names)))
and the FutureWarning message is gone.
This should be more efficient than converting to a list first.
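On newer NumPy versions (1.16 and later, to my knowledge) an alternative is structured_to_unstructured from numpy.lib.recfunctions, which accepts a multi-field selection directly. The data set and field names below are hypothetical; this is a sketch, not a replacement for the answer above.

```python
import numpy as np
from numpy.lib import recfunctions as rf

# Hypothetical data set; the field names are made up for illustration.
data = np.array([(1.0, 2.0, 0), (3.0, 4.0, 1)],
                dtype=[('height', 'f8'), ('width', 'f8'), ('label', 'i4')])
features = ['height', 'width']

# Gather the selected fields into an (n_samples, n_features) float array.
X = rf.structured_to_unstructured(data[features], dtype=float)
y = data['label']

print(X.shape)  # (2, 2)
print(X.dtype)  # float64
```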
Converting numpy array to structured array
There are special helper functions for this:
>>> from numpy.lib.recfunctions import unstructured_to_structured
So,
>>> import numpy as np
>>> arr = np.array([[1,2], [3,4]], dtype='u1')
>>> unstructured_to_structured(arr, dtype=np.dtype([('a', 'u1'), ('b', 'u1')]))
array([(1, 2), (3, 4)], dtype=[('a', 'u1'), ('b', 'u1')])
You can also create a view:
>>> arr.ravel().view(dtype=np.dtype([('a', 'u1'), ('b', 'u1')]))
array([(1, 2), (3, 4)], dtype=[('a', 'u1'), ('b', 'u1')])
In this simple case that is fine, but if you choose to use a view you sometimes have to worry about how the array is packed. Note that a view doesn't copy the underlying buffer, which can make it much more efficient if you are working with large arrays.
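A sketch of what "worrying about packing" can mean in practice: the checks below are my own assumptions about what is worth verifying before taking the view, not something the helper functions require. The example also shows that the view writes through to the original buffer.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]], dtype='u1')
dt = np.dtype([('a', 'u1'), ('b', 'u1')])

# The view is only meaningful when one row occupies exactly dt.itemsize bytes
# and the rows are laid out contiguously.
assert arr.flags['C_CONTIGUOUS']
assert arr.shape[1] * arr.itemsize == dt.itemsize

rec = arr.view(dt).reshape(-1)     # the raw view has shape (2, 1); flatten it
print(np.shares_memory(arr, rec))  # True

rec['b'][0] = 99                   # writes through to the original buffer
print(arr[0, 1])                   # 99
```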
Convert a numpy array to a structured array
In [222]: x = np.array([[ 0, 2, 3, 4, 5], [ 0, 12, 13, 14, 15]])
In [223]: dt = np.dtype([('checksum','u2'), ('word', 'B', (3,))])
I know from past use that genfromtxt can handle relatively complex dtypes:
In [224]: np.savetxt('temp', x[:,1:], fmt='%d')
In [225]: cat temp
2 3 4 5
12 13 14 15
In [226]: data = np.genfromtxt('temp', dtype=dt)
In [227]: data
Out[227]:
array([( 2, [ 3, 4, 5]), (12, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
I haven't dug into its code to see how it maps the flat row data onto the dtypes.
It turns out that the unstructured_to_structured function I mentioned in a comment can handle your dtype:
In [228]: import numpy.lib.recfunctions as rf
In [229]: rf.unstructured_to_structured(x[:,1:],dtype=dt)
Out[229]:
array([( 2, [ 3, 4, 5]), (12, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
But for simpler dtype, I and others have often recommended turning the list of lists into a list of tuples.
In [230]: [tuple(row) for row in x[:,1:]]
Out[230]: [(2, 3, 4, 5), (12, 13, 14, 15)]
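For a simpler dtype, the list of tuples can be passed straight to np.array. This is a sketch with a flat dtype I made up for illustration (one scalar field per column); it is not the nested dtype from the question.

```python
import numpy as np

x = np.array([[0, 2, 3, 4, 5], [0, 12, 13, 14, 15]])

# A flat dtype with one scalar field per column (field names are illustrative).
simple_dt = np.dtype([('checksum', 'u2'), ('w0', 'u1'), ('w1', 'u1'), ('w2', 'u1')])

# np.array treats each tuple as one record.
records = np.array([tuple(row) for row in x[:, 1:]], dtype=simple_dt)

print(records.shape)                 # (2,)
print(records['checksum'].tolist())  # [2, 12]
print(records['w2'].tolist())        # [5, 15]
```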
Many of the recfunctions use a field-by-field copy:
In [231]: res = np.zeros(2, dtype=dt)
In [232]: res
Out[232]:
array([(0, [0, 0, 0]), (0, [0, 0, 0])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
In [233]: res['checksum']= x[:,1]
In [234]: res['word']
Out[234]:
array([[0, 0, 0],
[0, 0, 0]], dtype=uint8)
In [235]: res['word'] = x[:,2:]
In [236]: res
Out[236]:
array([( 2, [ 3, 4, 5]), (12, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
byte view
I missed the fact that you wanted to repack bytes. My answer above treats each input row as 4 integers that are assigned to the 4 slots in the compound dtype. But with uint8 input, and u2 and u1 slots, you want to view the 5 bytes with the new dtype, not make a new array.
In [332]: dt
Out[332]: dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
In [333]: arr = np.array([(1,2,3,4,5),
...: (11,12,13,14,15)], dtype = np.uint8)
In [334]: arr.view(dt)
Out[334]:
array([[( 513, [ 3, 4, 5])],
[(3083, [13, 14, 15])]],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
view adds a dimension that we need to remove:
In [335]: _.shape
Out[335]: (2, 1)
In [336]: arr.view(dt).reshape(2)
Out[336]:
array([( 513, [ 3, 4, 5]), (3083, [13, 14, 15])],
dtype=[('checksum', '<u2'), ('word', 'u1', (3,))])
and changing the endianness of the u2 field:
In [337]: dt = np.dtype([('checksum','>u2'), ('word', 'B', (3,))])
In [338]: arr.view(dt).reshape(2)
Out[338]:
array([( 258, [ 3, 4, 5]), (2828, [13, 14, 15])],
dtype=[('checksum', '>u2'), ('word', 'u1', (3,))])
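The two checksum values can be reproduced by hand from the first two bytes of each row, which makes the effect of the byte order explicit. A minimal sketch; `first_byte` and `second_byte` are illustrative names.

```python
import numpy as np

arr = np.array([(1, 2, 3, 4, 5),
                (11, 12, 13, 14, 15)], dtype=np.uint8)

little = np.dtype([('checksum', '<u2'), ('word', 'u1', (3,))])
big = np.dtype([('checksum', '>u2'), ('word', 'u1', (3,))])

first_byte = arr[:, 0].astype(np.uint16)
second_byte = arr[:, 1].astype(np.uint16)

le = arr.view(little).reshape(-1)
be = arr.view(big).reshape(-1)

# Little-endian: the first byte is the least significant one.
print(le['checksum'].tolist())                 # [513, 3083]
print((first_byte + 256 * second_byte).tolist())  # [513, 3083]

# Big-endian: the first byte is the most significant one.
print(be['checksum'].tolist())                 # [258, 2828]
print((256 * first_byte + second_byte).tolist())  # [258, 2828]
```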
Converting a 2D numpy array to a structured array
You can "create a record array from a (flat) list of arrays" using numpy.core.records.fromarrays as follows:
>>> import numpy as np
>>> myarray = np.array([("Hello",2.5,3),("World",3.6,2)])
>>> print myarray
[['Hello' '2.5' '3']
['World' '3.6' '2']]
>>> newrecarray = np.core.records.fromarrays(myarray.transpose(),
names='col1, col2, col3',
formats = 'S8, f8, i8')
>>> print newrecarray
[('Hello', 2.5, 3) ('World', 3.5999999046325684, 2)]
I was trying to do something similar. I found that when numpy created a structured array from an existing 2D array (using np.core.records.fromarrays), it considered each column (instead of each row) in the 2-D array as a record. So you have to transpose it. This behavior of numpy does not seem very intuitive, but perhaps there is a good reason for it.
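The column-per-field behaviour is easier to see when the columns are built separately. This sketch uses np.rec.fromarrays, the public name for the same function, on Python 3; the variable names are mine, and the field dtypes are inferred from the input arrays rather than given through formats.

```python
import numpy as np

labels = np.array(['Hello', 'World'])
values = np.array([2.5, 3.6])
counts = np.array([3, 2])

# fromarrays takes one array per field (column), not one per record.
rec = np.rec.fromarrays([labels, values, counts], names='col1, col2, col3')

print(rec.shape)     # (2,)
print(rec.col2[1])   # 3.6
print(rec[0].col1)   # Hello
```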
Convert structured array with various numeric data types to regular array
You can do it easily with Pandas:
>>> import pandas as pd
>>> pd.DataFrame(my_data).values
array([[ 17. , 182.1000061],
[ 19. , 175.6000061]], dtype=float32)
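Without pandas, one option is to stack the fields as columns and let NumPy promote them to a common dtype. The array below is a hypothetical stand-in for my_data (the original definition is not shown), so the field names and dtypes are assumptions.

```python
import numpy as np

# Hypothetical stand-in for my_data: an integer field and a float32 field.
my_data = np.array([(17, 182.1), (19, 175.6)],
                   dtype=[('age', 'i2'), ('height', 'f4')])

# Stack the fields as columns; NumPy promotes them to a common dtype.
plain = np.column_stack([my_data[name] for name in my_data.dtype.names])

print(plain.shape)  # (2, 2)
print(plain.dtype)  # float32
```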
How to convert numpy.recarray to numpy.array?
By "normal array" I take it you mean a NumPy array of homogeneous dtype. Given a recarray, such as:
>>> a = np.array([(0, 1, 2),
(3, 4, 5)],[('x', int), ('y', float), ('z', int)]).view(np.recarray)
rec.array([(0, 1.0, 2), (3, 4.0, 5)],
dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])
we must first make each column have the same dtype. We can then convert it to a "normal array" by viewing the data by the same dtype:
>>> a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
array([ 0., 1., 2., 3., 4., 5.])
astype returns a new numpy array. So the above requires additional memory in an amount proportional to the size of a. Each row of a requires 4+8+4=16 bytes, while a.astype(...) requires 8*3=24 bytes. Calling view requires no new memory, since view just changes how the underlying data is interpreted.
a.tolist() returns a new Python list. Each Python number is an object which requires more bytes than its equivalent representation in a numpy array. So a.tolist() requires more memory than a.astype(...).
Calling a.astype(...).view(...) is also faster than np.array(a.tolist()):
In [8]: a = np.array(zip(*[iter(xrange(300))]*3),[('x', int), ('y', float), ('z', int)]).view(np.recarray)
In [9]: %timeit a.astype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')]).view('<f8')
10000 loops, best of 3: 165 us per loop
In [10]: %timeit np.array(a.tolist())
1000 loops, best of 3: 683 us per loop
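The timing session above is Python 2 (zip, xrange). As a Python 3 sketch of the same astype-then-view steps, using a plain structured array with explicit field sizes so the per-row byte counts match the ones quoted earlier:

```python
import numpy as np

a = np.array([(0, 1.0, 2), (3, 4.0, 5)],
             dtype=[('x', '<i4'), ('y', '<f8'), ('z', '<i4')])

# Step 1: cast every field to float64 (this allocates a new array).
same = a.astype([(name, '<f8') for name in a.dtype.names])

# Step 2: reinterpret the now-homogeneous buffer and restore the 2-D shape.
plain = same.view('<f8').reshape(len(a), -1)

print(a.dtype.itemsize, same.dtype.itemsize)  # 16 24
print(plain.tolist())  # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
```

The reshape is my addition; the original one-liner returns the flat 6-element array.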