Converting NumPy Dtypes to Native Python Types

Converting NumPy dtypes to native Python types

Use val.item() to convert most NumPy values to a native Python type:

import numpy as np

# for example, numpy.float32 -> python float
val = np.float32(0)
pyval = val.item()
print(type(pyval)) # <class 'float'>

# and similar...
type(np.float64(0).item()) # <class 'float'>
type(np.uint32(0).item()) # <class 'int'>
type(np.int16(0).item()) # <class 'int'>
type(np.complex128(0).item()) # <class 'complex'> (np.cfloat in older NumPy versions)
type(np.datetime64(0, 'D').item()) # <class 'datetime.date'>
type(np.datetime64('2001-01-01 00:00:00').item()) # <class 'datetime.datetime'>
type(np.timedelta64(0, 'D').item()) # <class 'datetime.timedelta'>
...

(Another method is np.asscalar(val), but it has been deprecated since NumPy 1.16; prefer .item().)


For the curious, to build a table of conversions of NumPy array scalars for your system:

for name in dir(np):
    obj = getattr(np, name)
    if hasattr(obj, 'dtype'):
        try:
            if 'time' in name:
                npn = obj(0, 'D')
            else:
                npn = obj(0)
            nat = npn.item()
            print('{0} ({1!r}) -> {2}'.format(name, npn.dtype.char, type(nat)))
        except Exception:
            pass

There are a few NumPy types that have no native Python equivalent on some systems, including: clongdouble, clongfloat, complex192, complex256, float128, longcomplex, longdouble and longfloat. These need to be converted to their nearest NumPy equivalent before using .item().
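For example (a minimal sketch; the exact set of such types is platform-dependent), you can cast an extended-precision value down to float64 first, accepting any precision loss, and then call .item():

```python
import numpy as np

# On platforms where longdouble has no native Python equivalent, .item()
# may return the NumPy scalar unchanged; casting to float64 first (losing
# any extra precision) guarantees a native Python float.
val = np.longdouble(1.25)
pyval = np.float64(val).item()
print(type(pyval))  # <class 'float'>
```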

Converting native python types to numpy dtypes

numpy.float was just an alias for the regular Python float type, not a NumPy dtype, and it's almost certainly not what you need (the alias was deprecated in NumPy 1.20 and removed in 1.24):

>>> import numpy
>>> numpy.float is float
True

If you want the dtype NumPy would coerce your scalar to, just make an array and get its dtype:

>>> numpy.array(7.7).dtype
dtype('float64')

If you want the type NumPy uses for scalars of this dtype, access the dtype's type attribute:

>>> numpy.array(7.7).dtype.type
<class 'numpy.float64'>

Easier way of converting numpy datatypes to native python datatypes

Calling .item() on a NumPy scalar returns the closest native Python type:

import numpy as np

numpyNum = np.float64(1.2)
pythonNum = numpyNum.item()
type(pythonNum)  # <class 'float'>

For a list of NumPy scalars, a list comprehension does the conversion in one pass:

pythonNativeTypeValues = [v.item() for v in a]

When your list mixes NumPy and non-NumPy values, you need to check whether each element is a NumPy type before converting, so the code becomes:

import numpy as np 
import datetime
a = [np.float64(1.2), np.int64(123), 'blablabla', datetime.datetime.now()]
native = []


for val in a:
    if type(val).__module__ == np.__name__:
        val = val.item()
    native.append(val)


for val in native:
    print(type(val))
#<class 'float'>
#<class 'int'>
#<class 'str'>
#<class 'datetime.datetime'>

If you prefer a list comprehension, the whole thing fits on one line:

native =[val.item() if type(val).__module__ == np.__name__ else val for val in a ]

Cannot convert numpy dtypes to its native python types (int64 to int)

It seems that Amazon S3 is a bit sensitive to dtypes, so to make the column compatible you can first cast to int and then to object:

avg_Credit_Bal['No. of transactions'] = sum_Credit_Bal['No. of transactions'].astype(int).astype(object)

If you then check an individual element, it is a plain Python int, and the column's dtype reports object, indicating generic Python objects:

type(avg_Credit_Bal['No. of transactions'][0]) # <class 'int'>
avg_Credit_Bal['No. of transactions'].dtype # dtype('O')
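A minimal, self-contained sketch of the same cast (the column name and values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# toy frame standing in for the sum_Credit_Bal frame above
df = pd.DataFrame({'No. of transactions': np.array([3, 7, 2], dtype=np.int64)})

# cast int64 -> int -> object so each element becomes a plain Python int
df['No. of transactions'] = df['No. of transactions'].astype(int).astype(object)

print(df['No. of transactions'].dtype)     # object
print(type(df['No. of transactions'][0]))  # <class 'int'>
```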

Convert list of numpy.float64 to float in Python quickly

The tolist() method should do what you want. If you have a numpy array, just call tolist():

In [17]: a
Out[17]:
array([ 0. , 0.14285714, 0.28571429, 0.42857143, 0.57142857,
0.71428571, 0.85714286, 1. , 1.14285714, 1.28571429,
1.42857143, 1.57142857, 1.71428571, 1.85714286, 2. ])

In [18]: a.dtype
Out[18]: dtype('float64')

In [19]: b = a.tolist()

In [20]: b
Out[20]:
[0.0,
0.14285714285714285,
0.2857142857142857,
0.42857142857142855,
0.5714285714285714,
0.7142857142857142,
0.8571428571428571,
1.0,
1.1428571428571428,
1.2857142857142856,
1.4285714285714284,
1.5714285714285714,
1.7142857142857142,
1.857142857142857,
2.0]

In [21]: type(b)
Out[21]: list

In [22]: type(b[0])
Out[22]: float

If, in fact, you really have python list of numpy.float64 objects, then @Alexander's answer is great, or you could convert the list to an array and then use the tolist() method. E.g.

In [46]: c
Out[46]:
[0.0,
0.33333333333333331,
0.66666666666666663,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]

In [47]: type(c)
Out[47]: list

In [48]: type(c[0])
Out[48]: numpy.float64

@Alexander's suggestion, a list comprehension:

In [49]: [float(v) for v in c]
Out[49]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]

Or, convert to an array and then use the tolist() method.

In [50]: np.array(c).tolist()
Out[50]:
[0.0,
0.3333333333333333,
0.6666666666666666,
1.0,
1.3333333333333333,
1.6666666666666665,
2.0]

If you are concerned with the speed, here's a comparison. The input, x, is a python list of numpy.float64 objects:

In [8]: type(x)
Out[8]: list

In [9]: len(x)
Out[9]: 1000

In [10]: type(x[0])
Out[10]: numpy.float64

Timing for the list comprehension:

In [11]: %timeit list1 = [float(v) for v in x]
10000 loops, best of 3: 109 µs per loop

Timing for conversion to numpy array and then tolist():

In [12]: %timeit list2 = np.array(x).tolist()
10000 loops, best of 3: 70.5 µs per loop

So it is faster to convert the list to an array and then call tolist().
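The same comparison can be reproduced outside IPython with the timeit module (exact timings vary by machine; the two approaches produce identical lists):

```python
import timeit

import numpy as np

# a Python list of 1000 numpy.float64 scalars, as in the benchmark above
x = list(np.linspace(0.0, 1.0, 1000))

t_comp = timeit.timeit(lambda: [float(v) for v in x], number=1000)
t_arr = timeit.timeit(lambda: np.array(x).tolist(), number=1000)
print(f'list comprehension: {t_comp:.4f} s')
print(f'array + tolist():   {t_arr:.4f} s')
```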

Convert numpy elements to numpy dtypes

If I add to your code a __repr__ method and some prints:

import numpy as np
from operator import attrgetter

class myobj():
    def __init__(self, value):
        self.myattr = value
    def __repr__(self):
        return self.myattr.__repr__()

obj_array = np.empty((3, 3), dtype='object')
for i in range(obj_array.shape[0]):
    for j in range(obj_array.shape[1]):
        obj_array[i, j] = myobj(i + j)

native_type_array = np.frompyfunc(attrgetter('myattr'), 1, 1)(obj_array)
print(native_type_array.shape)
print(native_type_array.dtype)
print(native_type_array)
print(obj_array)

I get

1011:~/mypy$ python3 stack38332556.py 
(3, 3)
object
[[0 1 2]
[1 2 3]
[2 3 4]]
[[0 1 2]
[1 2 3]
[2 3 4]]

native_type_array is also an object dtype array - that's what the doc for frompyfunc says it does. But since the elements are numbers, the display looks nice.
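Since frompyfunc always returns an object-dtype array, a follow-up astype is needed if you actually want a numeric dtype (a small sketch, assuming the extracted attributes are plain numbers):

```python
import numpy as np

# an object-dtype array like the one frompyfunc produces above
native_type_array = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4]], dtype=object)

# cast to a concrete numeric dtype to get a "real" numeric array
numeric = native_type_array.astype(np.int64)
print(numeric.dtype)  # int64
```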

And by giving myobj a similar repr, I get the same display for both arrays. If I change the repr:

def __repr__(self):
    return '<%s>' % self.myattr

I get:

[[<0> <1> <2>]
[<1> <2> <3>]
[<2> <3> <4>]]

This applies to lists as well. print([myobj(10),myobj(11)]) produces [<10>, <11>]

Numpy dtype 'h' as dtype

Indeed, the NumPy docs can be hard to navigate; the main dtype reference page doesn't call out 'h' directly.

So to probe it experimentally:

import numpy as np
np.dtype('h')

--> dtype('int16')

It's a 16-bit signed integer.
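You can also decode single-character type codes programmatically; np.typecodes groups them, and 'h' appears among the integer codes:

```python
import numpy as np

dt = np.dtype('h')
print(dt.name)      # int16
print(dt.itemsize)  # 2 (bytes)
print('h' in np.typecodes['AllInteger'])  # True
```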

Numpy Array get datatype by cell?

import numpy as np
arr = np.array([1, "sd", 3.6])

You'll notice that the values in this array are not a mix of numbers and strings; they are all strings.

>>> arr
array(['1', 'sd', '3.6'], dtype='<U32')

You'll also note that they're not python strings. There is a reason for this but it isn't important here.

>>> type(arr[1])
<class 'numpy.str_'>

>>> type(arr[1]) == str
False

You should not try to mix data types like this; use a list instead. The distinction between the data types in your input list is lost when you turn it into an array. I note that you're calling an array element a 'cell' - it isn't one; arrays don't work like spreadsheets.

That said, if you absolutely must do this:

arr = np.array([1, "sd", 3.6], dtype=object)

>>> arr
array([1, 'sd', 3.6], dtype=object)

This will keep all the array elements as python objects instead of using numpy dtypes.

>>> np.array([type(x) == str for x in arr])
array([False, True, False])

Then you can test the type of each element accordingly.

h5py: convert numpy data to native python types

Not sure if it would make sense for your use-case, but you might think about storing single scalars or strings as attributes:

http://www.h5py.org/docs/intro/quick.html#attributes


