Translate Every Element in Numpy Array According to Key

This may not be the most efficient option, but you can apply np.vectorize to the dictionary's .get method:

>>> a = np.array([[1,2,3],
...               [3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
       [36, 34, 45]])
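
One caveat with this approach: dict.get returns None for any element that is not a key, and None generally will not fit an integer output array. A minimal sketch of one workaround, assuming unmapped values should simply pass through unchanged (the helper name translate_keep_missing is made up for illustration):

```python
import numpy as np

def translate_keep_missing(arr, mapping):
    """Map each element through `mapping`; leave unmapped elements as they are."""
    # mapping.get(x, x) falls back to the element itself when x is not a key
    return np.vectorize(lambda x: mapping.get(x, x))(arr)

a = np.array([[1, 2, 3],
              [3, 2, 9]])                # 9 is deliberately not a key
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}

result = translate_keep_missing(a, my_dict)
print(result)
```

Whether pass-through is the right fallback depends on the data; a sentinel such as -1 could be supplied as the default instead.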

Efficiently replace elements in array based on dictionary - NumPy / Python

Approach #1 : Loopy one with array data

One approach is to extract the keys and values into arrays and then loop over them in the same way as the original dictionary loop -

k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))

out = np.zeros_like(input_array)
for key,val in zip(k,v):
    out[input_array==key] = val

The benefit over the original dictionary loop is that the keys and values sit in contiguous arrays, so the iterations fetch their data with better spatial locality.

Also, since you mentioned having thousands of large np.arrays: if the mapping dictionary stays the same, building the array versions k and v is a one-time setup step.
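
To illustrate the one-time setup, here is a minimal sketch that builds k and v once and reuses them for several inputs. The mapping, the sample arrays, and the function name replace_loop are assumptions chosen for the example, not part of the original question:

```python
import numpy as np

mapping = {1: 23, 2: 34, 3: 36, 4: 45}   # example mapping (assumed)

# One-time setup: array versions of the keys and values
k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))

def replace_loop(input_array, k, v):
    """Approach #1: one boolean-mask assignment per key; unmapped elements stay 0."""
    out = np.zeros_like(input_array)
    for key, val in zip(k, v):
        out[input_array == key] = val
    return out

# Reuse the same k and v for every array
arrays = [np.array([[1, 2, 3], [3, 2, 4]]), np.array([4, 4, 1, 7])]
results = [replace_loop(arr, k, v) for arr in arrays]
print(results)
```

The loop runs once per key, so this suits mappings with few keys and large arrays.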

Approach #2 : Vectorized one with searchsorted

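A vectorized version can be built on np.searchsorted -

sidx = k.argsort() #k,v from approach #1

# Sort keys and values together so that searchsorted can be used
k = k[sidx]
v = v[sidx]

# Position of each input element among the sorted keys
idx = np.searchsorted(k,input_array.ravel()).reshape(input_array.shape)
idx[idx==len(k)] = 0 # keep indices in range for elements larger than every key
mask = k[idx] == input_array # True only where the element really is a key
out = np.where(mask, v[idx], 0) # elements without a matching key become 0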
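
As a self-contained sketch, the same steps can be wrapped in a function and checked on a small input. The function name replace_searchsorted, the fill parameter, and the sample data are assumptions made for the example:

```python
import numpy as np

def replace_searchsorted(input_array, mapping, fill=0):
    """Approach #2: look every element up with one searchsorted call."""
    k = np.array(list(mapping.keys()))
    v = np.array(list(mapping.values()))

    # Sort keys and values together
    sidx = k.argsort()
    k, v = k[sidx], v[sidx]

    idx = np.searchsorted(k, input_array)
    idx[idx == len(k)] = 0            # elements larger than every key
    mask = k[idx] == input_array      # exact matches only
    return np.where(mask, v[idx], fill)

mapping = {1: 10, 5: 50, 3: 30}
input_array = np.array([[1, 3, 7],
                        [5, 2, 1]])   # 7 and 2 are not keys
out = replace_searchsorted(input_array, mapping)
print(out)
```

Here np.searchsorted is applied to the 2-D array directly, which gives the same indices as the ravel/reshape form.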

Approach #3 : Vectorized one with mapping-array for integer keys

For non-negative integer keys, a vectorized version can use a mapping array: indexing it with the input array yields the final output directly -

mapping_ar = np.zeros(k.max()+1,dtype=v.dtype) #k,v from approach #1
mapping_ar[k] = v
out = mapping_ar[input_array]
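
A minimal runnable sketch of the mapping-array idea, with made-up sample data. It assumes the keys are non-negative integers and that every input element lies between 0 and the largest key; a larger element would raise an IndexError, and a negative one would silently index from the end:

```python
import numpy as np

mapping = {1: 23, 2: 34, 3: 36, 4: 45}   # example mapping (assumed)
k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))

# Lookup table: position i holds the value for key i, 0 where i is not a key
mapping_ar = np.zeros(k.max() + 1, dtype=v.dtype)
mapping_ar[k] = v

input_array = np.array([[1, 2, 3],
                        [0, 2, 4]])      # 0 is in range but not a key

# Guard against out-of-range elements before indexing
assert input_array.min() >= 0 and input_array.max() <= k.max()

out = mapping_ar[input_array]
print(out)
```

The table costs memory proportional to the largest key, so this fits dense, small key ranges best.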

How to use a dictionary to translate/replace elements of an array?

Will this do? Sometimes plain Python is a good, direct way to handle such things. The code below builds a list of translations (easily converted back to a numpy array) and prints them joined by spaces.

import numpy as np
abc_array = np.array(['B', 'D', 'A', 'F', 'H', 'I', 'Z', 'J'])

transdict = {'A': 'Adelaide',
             'B': 'Bombay',
             'C': 'Cologne',
             'D': 'Dresden',
             'E': 'Erlangen',
             'F': 'Formosa',
             'G': 'Gdansk',
             'H': 'Hague',
             'I': 'Inchon',
             'J': 'Jakarta',
             'Z': 'Zambia'
             }

# Look up each letter in turn; a letter missing from transdict raises KeyError
phonetic = [transdict[letter] for letter in abc_array]
print(' '.join(phonetic))

The output from this is:

Bombay Dresden Adelaide Formosa Hague Inchon Zambia Jakarta
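
Converting the list back to a numpy array is a single np.array call. The sketch below also assumes that letters without a translation should be kept as they are, which the original answer does not specify; the shortened dictionary and the 'Q' entry are made up for the example:

```python
import numpy as np

abc_array = np.array(['B', 'D', 'A', 'Q'])    # 'Q' has no translation
transdict = {'A': 'Adelaide', 'B': 'Bombay', 'D': 'Dresden'}

# .get(letter, letter) keeps unknown letters instead of raising KeyError
translated = np.array([transdict.get(letter, letter)
                       for letter in abc_array.tolist()])
print(translated)
```

The resulting array gets a string dtype wide enough for the longest translation.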

change 2d array by a dictionary translation in python

You can use np.vectorize():

x = np.array([[2,3,4],[4,4,2]])
y = {2:7,3:5,4:6}
np.vectorize(y.get)(x)

array([[7, 5, 6],
       [6, 6, 7]])

Is there a Numpy equivalent to string `translate`?

This translate is a string operation. np.char has a bunch of functions that apply such methods to all elements of a string dtype array:

In [7]: s = "abcdef"
In [8]: arr = np.array([[s,s,s],[s,s,s]])
In [9]: arr
Out[9]:
array([['abcdef', 'abcdef', 'abcdef'],
       ['abcdef', 'abcdef', 'abcdef']], dtype='<U6')
In [10]: np.char.translate(arr, str.maketrans("abc", "xyz"))
Out[10]:
array([['xyzdef', 'xyzdef', 'xyzdef'],
       ['xyzdef', 'xyzdef', 'xyzdef']], dtype='<U6')

However, because it calls string methods, it is not particularly fast. Past tests have shown the functions to be about the same speed as explicit loops.

If there were only a limited number of such replacements, you could use one of the mapping methods in the proposed duplicate. But if you want the full power of str.translate, this, or some iteration, is the best you can do. np.char.translate does not run in compiled code; it calls the Python string method on each element.

frompyfunc is a good way of applying a function to all elements of an array. It tends to be modestly faster than more explicit loops:

In [11]: np.frompyfunc(lambda s: s.translate(str.maketrans("abc", "xyz")),1,1)(arr)
Out[11]:
array([['xyzdef', 'xyzdef', 'xyzdef'],
       ['xyzdef', 'xyzdef', 'xyzdef']], dtype=object)
In [12]: _.astype('U6')
Out[12]:
array([['xyzdef', 'xyzdef', 'xyzdef'],
       ['xyzdef', 'xyzdef', 'xyzdef']], dtype='<U6')
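
A small variation on the frompyfunc call, as a sketch: the translation table is built once outside the function instead of on every element. The sample strings are made up, and casting back to the input dtype assumes a one-to-one character mapping, so no string gets longer:

```python
import numpy as np

arr = np.array([["abcdef", "abcdef"],
                ["cabbage", "xyz"]])

# Build the translation table once, not once per element
table = str.maketrans("abc", "xyz")
translate_all = np.frompyfunc(lambda s: s.translate(table), 1, 1)

# frompyfunc returns an object array; cast back to the original string dtype.
# Safe here because each character maps to exactly one character.
out = translate_all(arr).astype(arr.dtype)
print(out)
```

If the table deleted characters or mapped one character to several, the output width would need to be chosen separately.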

NumPy adds a dot after each element of an array which I can’t strip

The decimal points appear because final_array = np.array([]) creates a float array (float64 by default). When you combine it with the integer array waveform_integers using np.append, the result is promoted to float.

To fix this, create it with final_array = np.array([], dtype='int16'), so that both arrays passed to np.append are int16 and the result is int16 as well.
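
The difference is easy to check on a small example. The sample values below are made up; only the names final_array and waveform_integers come from the question:

```python
import numpy as np

waveform_integers = np.array([1, 2, 3], dtype='int16')

# Empty array with the default dtype (float64): the result is promoted to float
as_float = np.append(np.array([]), waveform_integers)

# Empty array created as int16: the result stays int16
as_int = np.append(np.array([], dtype='int16'), waveform_integers)

print(as_float.dtype, as_float)   # float64 [1. 2. 3.]
print(as_int.dtype, as_int)       # int16 [1 2 3]
```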

Numpy : Translating elements increases size of file by a lot (factor of 8)

You're likely getting a factor of 8 because the translated array is written back as int64, whereas the original array was stored as uint8. You could try:

array=np.vectorize(my_dict.get)(array).astype(np.uint8)

and then save to h5.

As @Jaime points out, you can avoid an extra array copy by telling vectorize the output dtype up front:

array=np.vectorize(my_dict.get, otypes=[np.uint8])(array)
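
A minimal sketch for comparing the memory footprint of the two variants. The dictionary and array are made up, and the dtype np.vectorize infers by default depends on the platform and NumPy version, so it is only printed here:

```python
import numpy as np

my_dict = {0: 10, 1: 20, 2: 30}                    # values must fit in uint8
array = np.array([[0, 1], [2, 1]], dtype=np.uint8)

# Output dtype inferred from the first result (a Python int)
default = np.vectorize(my_dict.get)(array)

# Output dtype requested explicitly
compact = np.vectorize(my_dict.get, otypes=[np.uint8])(array)

print(default.dtype, default.nbytes)
print(compact.dtype, compact.nbytes)
```

Comparing nbytes before writing is a quick way to confirm the saved file will not grow.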

How to translate / shift a numpy array?

To manage the edges simply, you can embed your array in a larger one:

import numpy as np

square = np.array([[0, 2, 2, 0],
                   [0, 2, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]], dtype=np.int64)

n,m=square.shape
bigsquare=np.zeros((3*n,3*m),square.dtype)
bigsquare[n:2*n,m:2*m]=square

Then a shift is just a view:

def shift(dx,dy):
    # Top-left corner of the window inside bigsquare
    x=n-dx
    y=m-dy
    return bigsquare[x:x+n,y:y+m]

print(shift(1,1))

#[[0 0 0 0]
# [0 0 2 2]
# [0 0 2 0]
# [0 0 0 0]]
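
The same idea can be packaged without module-level variables. This is a sketch only; the function name shift_zero_fill is an assumption, and it is valid for shifts no larger than the array itself (|dx| <= n and |dy| <= m):

```python
import numpy as np

def shift_zero_fill(arr, dx, dy):
    """Shift a 2-D array by dx rows and dy columns, filling vacated cells with 0."""
    n, m = arr.shape
    # Embed the array in the centre of a zero array three times as large
    big = np.zeros((3 * n, 3 * m), arr.dtype)
    big[n:2 * n, m:2 * m] = arr
    # The shifted result is a window into the larger array
    x, y = n - dx, m - dy
    return big[x:x + n, y:y + m]

square = np.array([[0, 2, 2, 0],
                   [0, 2, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])

shifted = shift_zero_fill(square, 1, 1)
print(shifted)
```

Unlike the version above, this rebuilds the large array on every call, trading speed for a self-contained function.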

Use numpy to translate huge array of 2-byte strings to corresponding 1-byte strings according to a fixed mapping

For such search operations, NumPy has np.searchsorted, so allow me to suggest an approach with it -

def search_dic(dic, search_keys):
    # Extract out keys and values as arrays (list() is needed on Python 3,
    # where dict.keys() and dict.values() return views)
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Use searchsorted to locate the indices
    sidx = np.argsort(k)
    idx = np.searchsorted(k,search_keys, sorter=sidx)

    # Finally index and extract out the corresponding values
    return np.take(v,sidx[idx])

Sample run -

In [46]: translation_dict = {'AC': '2', 'AG': '3', 'AT': '4',
...: 'CA': '5', 'CG': '6', 'CT': '7',
...: 'GA': '8', 'GC': '9', 'GT': 'a',
...: 'TA': 'b', 'TC': 'c', 'TG': 'd'}

In [47]: s = np.char.array(['CA', 'CA', 'GC', 'TC', 'AT', 'GT', 'AG', 'CT'])

In [48]: search_dic(translation_dict, s)
Out[48]:
array(['5', '5', '9', 'c', '4', 'a', '3', '7'],
dtype='|S1')
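
Note that np.searchsorted only returns insertion positions, so a key that is absent from the dictionary would silently map to a neighbouring entry. A hedged sketch of a variant that checks for this; the name search_dic_checked and the choice to raise KeyError are assumptions, not part of the original answer:

```python
import numpy as np

def search_dic_checked(dic, search_keys):
    """Like search_dic, but raise KeyError if any search key is not in dic."""
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))
    search_keys = np.atleast_1d(search_keys)

    sidx = np.argsort(k)
    idx = np.searchsorted(k, search_keys, sorter=sidx)
    idx = np.minimum(idx, len(k) - 1)        # keep indices in range

    # An exact match means the key found at that position equals the search key
    found = k[sidx[idx]] == search_keys
    if not found.all():
        missing = np.unique(search_keys[~found]).tolist()
        raise KeyError(f"keys not in mapping: {missing}")
    return v[sidx[idx]]

translation_dict = {'AC': '2', 'AG': '3', 'AT': '4',
                    'CA': '5', 'CG': '6', 'CT': '7',
                    'GA': '8', 'GC': '9', 'GT': 'a',
                    'TA': 'b', 'TC': 'c', 'TG': 'd'}
s = np.array(['CA', 'CA', 'GC', 'TC', 'AT', 'GT', 'AG', 'CT'])
result = search_dic_checked(translation_dict, s)
print(result)
```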

