Translate every element in numpy array according to key
I don't know about efficient, but you could use np.vectorize on the .get method of dictionaries:
>>> a = np.array([[1,2,3],
...               [3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
       [36, 34, 45]])
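One caveat: my_dict.get returns None for any value that has no entry in the dictionary, and np.vectorize may not handle that gracefully. Here is a minimal sketch, assuming you want unmapped values passed through unchanged. The name translate and the sample value 9 are illustrative and not part of the original answer.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [3, 2, 9]])   # 9 deliberately has no entry in my_dict
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}

# Fall back to the original value when a key is missing. Passing otypes fixes
# the output dtype so NumPy does not infer it from the first result.
translate = np.vectorize(lambda key: my_dict.get(key, key), otypes=[np.int64])
out = translate(a)
print(out)
```

Whether pass-through is the right fallback depends on your data. A sentinel such as -1 would work the same way.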
Efficiently replace elements in array based on dictionary - NumPy / Python
Approach #1 : Loopy one with array data
One approach is to extract the keys and values into arrays and then use a similar loop:
k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))
out = np.zeros_like(input_array)
for key,val in zip(k,v):
    out[input_array==key] = val
The benefit of this approach over the original is the spatial locality of the array data, which makes data fetching inside the iterations more efficient.
Also, since you mentioned "thousand large np.arrays": if the mapping dictionary stays the same, the step that builds the array versions, k and v, is a one-time setup cost.
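To make the one-time setup concrete, here is a small sketch that builds k and v once and reuses them across several arrays. The function name replace_loop and the sample data are made up for illustration.

```python
import numpy as np

mapping = {1: 23, 2: 34, 3: 36, 4: 45}           # illustrative mapping

# One-time setup: array versions of the dictionary keys and values
k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))

def replace_loop(input_array, k, v):
    """Loop over the mapping entries; values without an entry become 0."""
    out = np.zeros_like(input_array)
    for key, val in zip(k, v):
        out[input_array == key] = val
    return out

# Reuse the same k and v for every array
arrays = [np.array([[1, 2, 3], [3, 2, 4]]), np.array([4, 4, 1, 7])]
results = [replace_loop(arr, k, v) for arr in arrays]
```

The loop runs once per dictionary entry, so this sketch suits small mappings applied to large arrays.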
Approach #2 : Vectorized one with searchsorted
A vectorized approach is possible with np.searchsorted:
sidx = k.argsort() #k,v from approach #1
k = k[sidx]
v = v[sidx]
idx = np.searchsorted(k,input_array.ravel()).reshape(input_array.shape)
idx[idx==len(k)] = 0
mask = k[idx] == input_array
out = np.where(mask, v[idx], 0)
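The steps above can be wrapped into a self-contained function. This is only a sketch: the name replace_searchsorted, the fill parameter, and the sample data are assumptions added for illustration.

```python
import numpy as np

def replace_searchsorted(input_array, mapping, fill=0):
    """Translate input_array through mapping; unmapped values become fill."""
    k = np.array(list(mapping.keys()))
    v = np.array(list(mapping.values()))
    sidx = k.argsort()
    k, v = k[sidx], v[sidx]                      # keys must be sorted for searchsorted
    idx = np.searchsorted(k, input_array.ravel()).reshape(input_array.shape)
    idx[idx == len(k)] = 0                       # clamp positions past the last key
    mask = k[idx] == input_array                 # True only where the key really exists
    return np.where(mask, v[idx], fill)

mapping = {10: 1, 30: 3, 20: 2}
arr = np.array([[10, 20, 30],
                [30, 25, 99]])                   # 25 and 99 are not keys
out = replace_searchsorted(arr, mapping)
print(out)
```

The mask is what keeps values such as 25 from picking up the translation of the next larger key.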
Approach #3 : Vectorized one with mapping-array for integer keys
For non-negative integer keys, a vectorized approach can use a mapping array which, when indexed by the input array, leads directly to the final output:
mapping_ar = np.zeros(k.max()+1,dtype=v.dtype) #k,v from approach #1
mapping_ar[k] = v
out = mapping_ar[input_array]
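The mapping array only works when every input value is a valid index into it. Below is a hedged sketch that adds explicit range checks. The function name replace_lookup and the checks are additions for illustration, not part of the original approach.

```python
import numpy as np

def replace_lookup(input_array, mapping):
    """Translate non-negative integers through a dense lookup array."""
    k = np.array(list(mapping.keys()))
    v = np.array(list(mapping.values()))
    if k.min() < 0:
        raise ValueError("keys must be non-negative integers")
    if input_array.min() < 0 or input_array.max() > k.max():
        raise ValueError("input values fall outside the lookup array")
    mapping_ar = np.zeros(k.max() + 1, dtype=v.dtype)   # unmapped slots stay 0
    mapping_ar[k] = v
    return mapping_ar[input_array]

mapping = {1: 23, 2: 34, 3: 36, 4: 45}
input_array = np.array([[1, 2, 3],
                        [3, 2, 4]])
out = replace_lookup(input_array, mapping)
print(out)
```

Memory use grows with the largest key rather than with the number of keys, so this fits best when the keys are small and dense.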
How to use a dictionary to translate/replace elements of an array?
Will this do? Sometimes, plain Python is a good, direct way to handle such things. The below builds a list of translations (easily converted back to a numpy array) and the joined output.
import numpy as np
abc_array = np.array(['B', 'D', 'A', 'F', 'H', 'I', 'Z', 'J'])
transdict = {'A': 'Adelaide',
             'B': 'Bombay',
             'C': 'Cologne',
             'D': 'Dresden',
             'E': 'Erlangen',
             'F': 'Formosa',
             'G': 'Gdansk',
             'H': 'Hague',
             'I': 'Inchon',
             'J': 'Jakarta',
             'Z': 'Zambia'
            }
phonetic = [transdict[letter] for letter in abc_array]
print(' '.join(phonetic))
The output from this is:
Bombay Dresden Adelaide Formosa Hague Inchon Zambia Jakarta
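If you need the result back as a numpy array, or the input may contain letters missing from the dictionary, a variant along these lines may help. The '?' placeholder and the shortened dictionary are assumptions for illustration.

```python
import numpy as np

abc_array = np.array(['B', 'D', 'A', 'Q'])        # 'Q' is deliberately unmapped
transdict = {'A': 'Adelaide', 'B': 'Bombay', 'D': 'Dresden'}

# dict.get with a default avoids a KeyError for letters that have no entry;
# wrapping the list in np.array converts it back to a numpy string array.
translated = np.array([transdict.get(letter, '?') for letter in abc_array])
print(' '.join(translated))
```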
change 2d array by a dictionary translation in python
You can use np.vectorize():
x = np.array([[2,3,4],[4,4,2]])
y = {2:7,3:5,4:6}
np.vectorize(y.get)(x)
array([[7, 5, 6],
[6, 6, 7]])
Is there a Numpy equivalent to string `translate`?
This translate is a string operation. np.char has a bunch of functions that apply such methods to all elements of a string dtype array:
In [7]: s = "abcdef"
In [8]: arr = np.array([[s,s,s],[s,s,s]])
In [9]: arr
Out[9]:
array([['abcdef', 'abcdef', 'abcdef'],
['abcdef', 'abcdef', 'abcdef']], dtype='<U6')
In [10]: np.char.translate(arr, str.maketrans("abc", "xyz"))
Out[10]:
array([['xyzdef', 'xyzdef', 'xyzdef'],
['xyzdef', 'xyzdef', 'xyzdef']], dtype='<U6')
However, because it calls string methods, it is not particularly fast. Past tests have shown the functions to be about the same speed as explicit loops.
If there were a limited number of such replacements, you could use one of the mapping methods in the proposed duplicate. But if you want the full power of str.translate, this, or some iteration, is the best you can do. numpy does not implement string operations in compiled code.
frompyfunc is a good way of applying a function to all elements of an array. It tends to be modestly faster than more explicit loops:
In [11]: np.frompyfunc(lambda s: s.translate(str.maketrans("abc", "xyz")),1,1)(arr)
Out[11]:
array([['xyzdef', 'xyzdef', 'xyzdef'],
['xyzdef', 'xyzdef', 'xyzdef']], dtype=object)
In [12]: _.astype('U6')
Out[12]:
array([['xyzdef', 'xyzdef', 'xyzdef'],
['xyzdef', 'xyzdef', 'xyzdef']], dtype='<U6')
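When the array holds many repeated strings, one possible shortcut is to translate each distinct string once and rebuild the array by indexing. This is a sketch under that assumption, not something measured here. The sample strings are made up.

```python
import numpy as np

arr = np.array([["abcdef", "cabbage", "abcdef"],
                ["zzz", "cabbage", "abcdef"]])
table = str.maketrans("abc", "xyz")

# Find the distinct strings and, for each element, the index of its distinct value.
uniq, inverse = np.unique(arr.ravel(), return_inverse=True)

# Call str.translate once per distinct string instead of once per element.
translated_uniq = np.array([u.translate(table) for u in uniq])

# Expand back to the original shape by plain integer indexing.
out = translated_uniq[inverse].reshape(arr.shape)
print(out)
```

If nearly every string is unique, this adds the cost of np.unique without saving any translate calls.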
NumPy adds a dot after each element of an array which I can’t strip
The reason you are getting the decimal places is that final_array = np.array([]) creates a float array. When you append your integer array waveform_integers to the float array final_array, you get a float array because final_array is set to use floats.
To fix this, you can use final_array = np.array([], dtype='int16'), which makes both arrays passed to np.append int16 arrays, so the result is also an int16 array.
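A quick way to see the difference is to compare the dtypes that np.append produces in the two cases. The contents of waveform_integers below are stand-in values, since the original data is not shown.

```python
import numpy as np

waveform_integers = np.array([1, -2, 3], dtype='int16')   # stand-in data

# An empty array with no dtype defaults to float64, so the result is promoted to float.
float_result = np.append(np.array([]), waveform_integers)

# Starting from an empty int16 array keeps the result as int16.
int_result = np.append(np.array([], dtype='int16'), waveform_integers)

print(float_result.dtype, float_result)
print(int_result.dtype, int_result)
```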
Numpy : Translating elements increases size of file by a lot (factor of 8)
You're likely getting a factor of 8 by writing back your array as int64 where the original array is stored as uint8. You could try:
array=np.vectorize(my_dict.get)(array).astype(np.uint8)
and then saving to h5...
As @Jaime points out, you save an array copy by telling vectorize what datatype you want straight off:
array=np.vectorize(my_dict.get, otypes=[np.uint8])(array)
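To check the size difference yourself, you can compare dtype and nbytes with and without otypes. The dictionary and array below are small made-up stand-ins. The inferred dtype depends on the platform's default integer, typically int64.

```python
import numpy as np

my_dict = {0: 10, 1: 20, 2: 30}                            # illustrative mapping
array = np.array([[0, 1, 2], [2, 1, 0]], dtype=np.uint8)

# Without otypes, the output dtype is inferred from the Python ints that .get returns.
default = np.vectorize(my_dict.get)(array)

# With otypes, the result is created as uint8 directly.
compact = np.vectorize(my_dict.get, otypes=[np.uint8])(array)

print(default.dtype, default.nbytes)
print(compact.dtype, compact.nbytes)
```

Note that uint8 only holds values from 0 to 255, so this assumes every translated value fits in that range.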
How to translate / shift a numpy array?
To handle the edges simply, you can embed your array in a larger one:
import numpy as np
square = np.array([[0, 2, 2, 0],
                   [0, 2, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]], dtype=np.int64)
n,m=square.shape
bigsquare=np.zeros((3*n,3*m),square.dtype)
bigsquare[n:2*n,m:2*m]=square
Then a shift is just a view:
def shift(dx,dy):
    # valid for |dx| <= n and |dy| <= m
    x=n-dx
    y=m-dy
    return bigsquare[x:x+n,y:y+m]
print(shift(1,1))
#[[0 0 0 0]
# [0 0 2 2]
# [0 0 2 0]
# [0 0 0 0]]
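The same idea can be written as a function that does not rely on module-level variables. This sketch uses np.pad to build the zero border. The name shift_padded and the bounds check are additions for illustration.

```python
import numpy as np

def shift_padded(arr, dx, dy):
    """Shift a 2-D array by dx rows and dy columns, filling vacated cells with 0."""
    n, m = arr.shape
    if abs(dx) > n or abs(dy) > m:
        raise ValueError("shift is larger than the array")
    # Surround the array with a border of zeros as large as the array itself.
    padded = np.pad(arr, ((n, n), (m, m)))
    # Slice a window of the original shape, offset by the requested shift.
    return padded[n - dx:2 * n - dx, m - dy:2 * m - dy].copy()

square = np.array([[0, 2, 2, 0],
                   [0, 2, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
shifted = shift_padded(square, 1, 1)
print(shifted)
```

Unlike the view-based version, this pads on every call, which is simpler but costs a copy each time.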
Use numpy to translate huge array of 2-byte strings to corresponding 1-byte strings according to a fixed mapping
For such search operations, NumPy has np.searchsorted, so allow me to suggest an approach with it:
def search_dic(dic, search_keys):
    # Extract the keys and values as arrays (list() is needed on Python 3)
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))
    # Use searchsorted to locate the indices (assumes every search key is in dic)
    sidx = np.argsort(k)
    idx = np.searchsorted(k, search_keys, sorter=sidx)
    # Finally index and extract out the corresponding values
    return np.take(v, sidx[idx])
Sample run -
In [46]: translation_dict = {'AC': '2', 'AG': '3', 'AT': '4',
...: 'CA': '5', 'CG': '6', 'CT': '7',
...: 'GA': '8', 'GC': '9', 'GT': 'a',
...: 'TA': 'b', 'TC': 'c', 'TG': 'd'}
In [47]: s = np.char.array(['CA', 'CA', 'GC', 'TC', 'AT', 'GT', 'AG', 'CT'])
In [48]: search_dic(translation_dict, s)
Out[48]:
array(['5', '5', '9', 'c', '4', 'a', '3', '7'],
dtype='|S1')
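Since the keys here are exactly 2 bytes each, another option worth sketching is to view the byte strings as 16-bit integers and index a lookup table, similar in spirit to a mapping array for integer keys. This assumes the data is stored as an 'S2' byte-string array. The lookup-table layout is an illustration and has not been benchmarked here.

```python
import numpy as np

translation_dict = {'AC': '2', 'AG': '3', 'AT': '4',
                    'CA': '5', 'CG': '6', 'CT': '7',
                    'GA': '8', 'GC': '9', 'GT': 'a',
                    'TA': 'b', 'TC': 'c', 'TG': 'd'}

# Lookup table with one slot per possible 2-byte value; unmapped slots stay b''.
lut = np.zeros(2 ** 16, dtype='S1')
for key, val in translation_dict.items():
    # Reinterpret the 2-byte key as a single uint16 code (native byte order).
    code = np.array([key.encode('ascii')], dtype='S2').view(np.uint16)[0]
    lut[code] = val.encode('ascii')

s = np.array([b'CA', b'CA', b'GC', b'TC', b'AT', b'GT', b'AG', b'CT'], dtype='S2')

# Viewing the data the same way turns translation into a single indexing operation.
out = lut[s.view(np.uint16)]
print(out)
```

The table costs 65536 bytes regardless of dictionary size, and the view requires the input array to be contiguous.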