Store Different Datatypes in One Numpy Array

Store different datatypes in one NumPy array?

One approach might be to use a record array. The "columns" won't be like the columns of standard numpy arrays, but for most use cases, this is sufficient:

>>> import numpy
>>> a = numpy.array(['a', 'b', 'c', 'd', 'e'])
>>> b = numpy.arange(5)
>>> records = numpy.rec.fromarrays((a, b), names=('keys', 'data'))
>>> records
rec.array([('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4)],
dtype=[('keys', '|S1'), ('data', '<i8')])
>>> records['keys']
rec.array(['a', 'b', 'c', 'd', 'e'],
dtype='|S1')
>>> records['data']
array([0, 1, 2, 3, 4])

Note that you can also do something similar with a standard array by specifying the datatype of the array. This is known as a "structured array":

>>> arr = numpy.array([('a', 0), ('b', 1)], 
dtype=([('keys', '|S1'), ('data', 'i8')]))
>>> arr
array([('a', 0), ('b', 1)],
dtype=[('keys', '|S1'), ('data', '<i8')])

The difference is that record arrays also allow attribute access to individual data fields. Standard structured arrays do not.

>>> records.keys
chararray(['a', 'b', 'c', 'd', 'e'],
dtype='|S1')
>>> arr.keys
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'keys'
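If attribute access is wanted on an existing structured array, one option is to view it as a record array. The following is a minimal sketch, assuming a recent NumPy on Python 3, where `'U1'` is used in place of the `'|S1'` byte strings shown above. The variable names are illustrative only.

```python
import numpy as np

# Structured array: fields are accessed by name with item syntax.
arr = np.array([('a', 0), ('b', 1)],
               dtype=[('keys', 'U1'), ('data', 'i8')])
keys_by_item = arr['keys']

# Viewing the same memory as a record array adds attribute access.
rec = arr.view(np.recarray)
keys_by_attr = rec.keys

# Caveat: a field whose name collides with an existing ndarray attribute
# (such as 'data') is not reachable as an attribute, because the ndarray
# attribute wins. Item syntax always works.
data_by_item = rec['data']
```

Because attribute lookup falls back to field names only when no regular attribute exists, item syntax (`rec['data']`) is the safer habit for fields with common names.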

Numpy array with different data types

Those are numpy records:

  • https://numpy.org/doc/stable/user/basics.rec.html

NumPy provides two kinds of arrays: homogeneous arrays and structured (aka record) arrays. The latter, which is what you just stumbled across, not only lets you mix data types (float, int, str, etc.) but also provides handy ways to access them, for instance through field labels.

Storing different datatypes in the same numpy array

Numpy is a package for scientific computing and most useful to manipulate matrices. If you create the feature array only to print to the console, it would be much easier to use a pandas dataframe or python lists.

That being said:
Your numpy array has a dtype of <U1. This is a Unicode string of length one. So it is effectively a character array, which is why it will only store the first character of every string you assign.

Numpy structured arrays are intended to hold values of different datatypes. But you could also use dtype object to store both floats and arbitrarily long strings in the same matrix (the np.object alias has been removed from recent NumPy releases, so use the builtin object):

np.full((4,7), 0, dtype=object)

Alternatively you can specify the maximum length string you will need: dtype='<U256' specifies that strings of up to 256 characters can be stored for example.
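The three options above can be compared side by side. This is a small sketch with made-up shapes and values, not the asker's original array:

```python
import numpy as np

# '<U1' keeps only the first character of each assigned string.
short = np.full((2, 2), '', dtype='<U1')
short[0, 0] = 'hello'

# A wider fixed-length dtype keeps up to 256 characters per element.
wide = np.full((2, 2), '', dtype='<U256')
wide[0, 0] = 'hello'

# dtype=object stores references to arbitrary Python objects,
# so strings and floats can sit next to each other.
mixed = np.full((2, 2), 0, dtype=object)
mixed[0, 0] = 'hello'
mixed[0, 1] = 3.14
```

Object arrays give up most of NumPy's vectorized speed, so a fixed-width string dtype or a structured array is usually preferable when the layout is known in advance.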

Your code example is very long and most lines are not immediately relevant to the problem. It is better to only show the minimum of code necessary to reproduce the problem. This will also help you understand the problem and narrow down where the bug is.

How to handle mixed data types in numpy arrays

The error comes from the string data in your array, which forces the dtype of the whole array to be a Unicode string (U11 means an 11-character Unicode string).
If you want to keep the numeric data in a numeric format, use structured arrays.
However, if you only need the maximum of the numerical column, use

print(a[:, 1].astype(int).max())
# 33

You may choose another numerical dtype, such as float in place of int, depending on the data in that column. (The np.int and np.float aliases have been removed from recent NumPy releases.)
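To make the conversion concrete, here is a self-contained sketch. The array `a` below is hypothetical, since the original data is not shown; only the second column is assumed to contain numbers.

```python
import numpy as np

# Hypothetical data: mixing strings and numbers in a plain array
# turns every element into a string.
a = np.array([['apple', 12], ['kiwi', 33], ['plum', 7]])
kind = a.dtype.kind  # 'U' for Unicode string

# Convert only the numeric column before computing on it.
col_max = a[:, 1].astype(int).max()
```

Calling `.max()` directly on the string column would compare the values lexicographically rather than numerically, which is why the cast comes first.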

Numpy with different data type

Maybe you want to check this one out
Store different datatypes in one NumPy array?

There you will find the exact same question as the one you asked: how to store different datatypes in a numpy array (correct me if I'm wrong).

Basically, you could either do "Record array" or "Structured array".

Edit:
I don't know about the keras parameters, or whether this kind of structure is supported by keras. But if it's about storing two different datatypes in a single numpy array, I guess you could use this. :)

Hope this could help.

assigning different data types for different columns in a numpy array

I thought it would be relatively easy to break up the array into floats and ints and then use a combination of zip and np.savetxt to put it all back together in the csv. But the question "Support zip input in savetxt in Python 3" suggests that way lies madness.

However, being stuck on the zip idea, I just moved the work to the standard csv module. Since numpy data needs to be converted to python types, it may be a bit slower. But we're talking about csv writing here, so hopefully it's just lost in the noise.

First, generate the test array

>>> import numpy as np
>>> array = np.arange(0., 18.*5, 5., dtype=np.float32).reshape((3,6))
>>> array
array([[ 0., 5., 10., 15., 20., 25.],
[ 30., 35., 40., 45., 50., 55.],
[ 60., 65., 70., 75., 80., 85.]], dtype=float32)

Split out the final column and recast as uint8

>>> floats, ints, _after = np.hsplit(array, (5,6))
>>> ints=ints.astype(np.uint8)
>>> floats
array([[ 0., 5., 10., 15., 20.],
[ 30., 35., 40., 45., 50.],
[ 60., 65., 70., 75., 80.]], dtype=float32)
>>> ints
array([[25],
[55],
[85]], dtype=uint8)

Use the python csv module to do the writes. You need to cast the zipped array rows to tuples and add them together to go from np.array to python data types.

>>> import csv
>>> with open('test.csv', 'w', newline='') as fh:
...     writer = csv.writer(fh)
...     writer.writerows(tuple(f) + tuple(i) for f, i in zip(floats, ints))
...
>>> print(open('test.csv').read())
0.0,5.0,10.0,15.0,20.0,25
30.0,35.0,40.0,45.0,50.0,55
60.0,65.0,70.0,75.0,80.0,85
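As an alternative to splitting the array and going through the csv module, `np.savetxt` accepts a sequence of format strings, one per column. This is a different technique from the one used above, sketched under the assumption that formatting the last column as an integer on output is all that is needed. The temporary path is illustrative.

```python
import os
import tempfile

import numpy as np

array = np.arange(0., 18. * 5, 5., dtype=np.float32).reshape((3, 6))

# Write to a throwaway directory so the example leaves no files behind.
path = os.path.join(tempfile.mkdtemp(), 'test.csv')

# One format per column: five floats, then the last column as an integer.
np.savetxt(path, array, fmt=['%.1f'] * 5 + ['%d'], delimiter=',')

with open(path) as fh:
    text = fh.read()
```

This only changes how the values are printed. The array itself still has a single float32 dtype in memory.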

numpy different data type of multidimensional array

A plain (non-structured) numpy array cannot hold different data types. You could instead keep one sequence per column, each with its own data type, and combine them in a pandas DataFrame.

e.g. like this:

import numpy as np
import pandas as pd

names = ["asd", "shd", "wdf"]
ages = np.array([12, 35, 23])

# Each DataFrame column keeps its own dtype.
d = {'name': names, 'age': ages}
df = pd.DataFrame(data=d)

NumPy array/matrix of mixed types

Your problem is in the data. Try this:

res = np.array(("TEXT", 1, 1), dtype='|S4, i4, i4')

or

res = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='|S4, i4, i4')

The data has to be a tuple or a list of tuples. Not quite evident from the error message, is it?

Also, please note that the length of the text field has to be specified for the text data to really be saved. If you want to save the text as objects (only references in the array), then:

res = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='object, i4, i4')

This is often quite useful, as well.
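When a structured dtype is given as a comma-separated string, NumPy assigns default field names `f0`, `f1`, and so on. The sketch below assumes Python 3 and therefore uses `U4` (Unicode) instead of `|S4` (bytes) for the text field; the values are the ones from the answer above.

```python
import numpy as np

# A list of tuples, one tuple per record.
res = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='U4, i4, i4')

# Fields get default names because none were supplied.
names = res.dtype.names

# Each field can be pulled out as an ordinary homogeneous array.
texts = res['f0']
total = res['f1'].sum()
```

Passing explicit names, as in `dtype=[('label', 'U4'), ('x', 'i4'), ('y', 'i4')]`, tends to make later field access easier to read; those particular names are only an illustration.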


