Fast replacement of values in a numpy array
I believe there's an even more efficient method, but for now, try:
from numpy import copy
newArray = copy(theArray)
for k, v in d.items():
    newArray[theArray == k] = v  # test the original array, so replaced values are never re-matched
Microbenchmark and test for correctness:
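#!/usr/bin/env python2.7
from numpy import copy, random, arange
random.seed(0)
data = random.randint(30, size=10**5)
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
dk = d.keys()
dv = d.values()
def f1(a, d):
    # one boolean mask per key, always tested against the untouched input a
    b = copy(a)
    for k, v in d.iteritems():
        b[a==k] = v
    return b
def f2(a, d):
    # pure-Python element loop; modifies a in place
    for i in xrange(len(a)):
        a[i] = d.get(a[i], a[i])
    return a
def f3(a, dk, dv):
    # lookup table: identity mapping, then overwrite the entries for the keys
    # (assumes every key is <= max(a); the builtin max() iterates in Python)
    mp = arange(0, max(a)+1)
    mp[dk] = dv
    return mp[a]
a = copy(data)
res = f2(a, d)
assert (f1(data, d) == res).all()
assert (f3(data, dk, dv) == res).all()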
Result (with the script above saved as w.py):
$ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f1(data,d)'
100 loops, best of 3: 6.15 msec per loop
$ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f3(data,dk,dv)'
100 loops, best of 3: 19.6 msec per loop
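The script above is Python 2.7 (iteritems, xrange, and d.keys() returning a list). A minimal Python 3 sketch of the same lookup-table idea as f3 might look like the following. It assumes non-negative integer data and keys; the function name replace_lut is hypothetical. It converts the keys and values to arrays explicitly, uses the array's own .max() instead of the builtin max(), and sizes the table so keys larger than the data maximum cannot raise IndexError. It has not been benchmarked here.

```python
import numpy as np


def replace_lut(a, d):
    """Hypothetical helper: replace values of a non-negative integer array via a table.

    Values of `a` that are not keys of `d` are left unchanged.
    """
    keys = np.fromiter(d.keys(), dtype=np.intp, count=len(d))
    vals = np.fromiter(d.values(), dtype=a.dtype, count=len(d))
    # Size the table so that every data value *and* every key is a valid index.
    size = max(int(a.max()), int(keys.max())) + 1
    lut = np.arange(size, dtype=a.dtype)  # identity mapping by default
    lut[keys] = vals                      # overwrite only the mapped entries
    return lut[a]                         # one vectorized gather


rng = np.random.default_rng(0)
data = rng.integers(30, size=10**5)
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}

# Reference result: one boolean mask per key, tested against the original array.
expected = data.copy()
for k, v in d.items():
    expected[data == k] = v

result = replace_lut(data, d)
```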
Fastest way to replace values in a numpy array with a list
As you "know the size of the list and it is invariable", you can set up an array first:
b = np.zeros((7,))
Assigning into the preallocated array is then slightly faster:
%timeit b[:] = a
1000000 loops, best of 3: 1.41 µs per loop
vs
%timeit b = np.array(a)
1000000 loops, best of 3: 1.67 µs per loop
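The likely reason for the difference: b[:] = a writes into memory that already exists, while np.array(a) allocates a new array on every call. The sketch below checks this with np.shares_memory; the list a is a made-up example of length 7. Note that b keeps its float64 dtype, so the list's values are cast to it.

```python
import numpy as np

a = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]   # hypothetical list of fixed length 7

# Preallocate once; keep a view so we can tell whether the buffer changes.
b = np.zeros((7,))
b_view = b[:]
b[:] = a                      # copies the list's values into b's existing memory
reused = np.shares_memory(b_view, b)

# Rebinding the name instead allocates a brand-new array.
c = np.zeros((7,))
c_view = c[:]
c = np.array(a)               # c now refers to a new array; c_view keeps the old one
rebound = not np.shares_memory(c_view, c)
```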
Fast in-place replacement of some values in a numpy array
The following will do it (elevation must have a floating-point dtype, because integer arrays cannot hold NaN):
elevation[elevation > 0] = numpy.nan
See Indexing with Boolean Arrays in the NumPy tutorial.
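A small self-contained sketch with made-up elevation values: the boolean-mask assignment, np.putmask as an equivalent in-place alternative, and converting an integer array to float first, since NaN is a floating-point value.

```python
import numpy as np

# NaN is a floating-point value, so the array must have a float dtype.
elevation = np.array([[-3.0, 0.0, 12.5],
                      [7.0, -1.5, 0.0]])
elevation[elevation > 0] = np.nan          # in place, via a boolean mask

# Equivalent in-place alternative with np.putmask.
other = np.array([[-3.0, 0.0, 12.5],
                  [7.0, -1.5, 0.0]])
np.putmask(other, other > 0, np.nan)

# Integer arrays cannot hold NaN, so convert to float before assigning.
ints = np.array([5, -2, 0, 9])
as_float = ints.astype(float)
as_float[as_float > 0] = np.nan
```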
Faster way to iteratively replace values in relatively large NumPy array
Here you go:
import numpy as np
VEG_TYPE = ['Shrub (S)', 'Grass (G)', 'Moss (M)', 'Grass (G)']
OBJECTID = [1, 2, 3, 4]
mapping = dict(zip(OBJECTID, VEG_TYPE))
input_array = np.random.randint(1, 5, (10, 10))
out = np.empty(input_array.shape, dtype=np.dtype('U100'))
for key, val in mapping.items():
    out[input_array == key] = val
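When the keys are small non-negative integers, as OBJECTID is here, the per-key loop can be swapped for a lookup table indexed by the input array. This is a sketch under that assumption; lut is an illustrative name, index 0 of the table is unused, and the original loop is kept for comparison.

```python
import numpy as np

VEG_TYPE = ['Shrub (S)', 'Grass (G)', 'Moss (M)', 'Grass (G)']
OBJECTID = [1, 2, 3, 4]
mapping = dict(zip(OBJECTID, VEG_TYPE))

input_array = np.random.randint(1, 5, (10, 10))

# Lookup table indexed by OBJECTID; unused slots (here index 0) stay ''.
lut = np.full(max(mapping) + 1, '', dtype='U100')
for key, val in mapping.items():
    lut[key] = val

out = lut[input_array]   # one vectorized gather instead of one mask per key

# For comparison: the mask-per-key loop from the answer.
ref = np.empty(input_array.shape, dtype='U100')
for key, val in mapping.items():
    ref[input_array == key] = val
```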
Efficiently replace elements in array based on dictionary - NumPy / Python
Approach #1: Loopy one with array data
One approach is to extract the keys and values into arrays and then use a similar loop:
k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))
out = np.zeros_like(input_array)
for key, val in zip(k, v):
    out[input_array == key] = val
The benefit over the original loop is that the keys and values are read from contiguous arrays during the iterations, which gives better spatial locality when fetching data.
Also, since you mentioned having a thousand large np.arrays: if the mapping dictionary stays the same, building the array versions k and v is a one-time setup step.
Approach #2: Vectorized one with searchsorted
A vectorized approach can use np.searchsorted:
sidx = k.argsort() #k,v from approach #1
k = k[sidx]
v = v[sidx]
idx = np.searchsorted(k,input_array.ravel()).reshape(input_array.shape)
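idx[idx==len(k)] = 0  # positions past the last key would be out of bounds
mask = k[idx] == input_array  # True only where the value really is a key
out = np.where(mask, v[idx], 0)  # unmatched values become 0, as in approach #1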
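Two small variations on the searchsorted approach, sketched on a hypothetical mapping: np.searchsorted accepts an N-d array of values and returns indices of the same shape, so the ravel/reshape step can be dropped; and passing input_array as the third argument of np.where leaves unmatched values unchanged instead of setting them to 0.

```python
import numpy as np

mapping = {7: 70, 1: 10, 5: 50}              # hypothetical example mapping
input_array = np.array([[1, 2, 5],
                        [7, 9, 1]])

k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))

sidx = k.argsort()                             # searchsorted needs sorted keys
k, v = k[sidx], v[sidx]

idx = np.searchsorted(k, input_array)          # works on N-d input directly
idx[idx == len(k)] = 0                         # clamp out-of-range positions
mask = k[idx] == input_array                   # True where a key really matched

out_zero = np.where(mask, v[idx], 0)           # unmatched -> 0
out_keep = np.where(mask, v[idx], input_array) # unmatched -> original value
```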
Approach #3: Vectorized one with mapping-array for integer keys
For integer keys, a vectorized alternative is a mapping array which, when indexed by the input array, gives the final output directly. This assumes non-negative keys and input values no larger than k.max(); otherwise, size the array as max(k.max(), input_array.max())+1:
mapping_ar = np.zeros(k.max()+1,dtype=v.dtype) #k,v from approach #1
mapping_ar[k] = v
out = mapping_ar[input_array]
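A quick consistency check of the three approaches, as a sketch on a hypothetical mapping and random non-negative integer input. All three map unmatched values to 0, so they should agree. The input here deliberately contains values above k.max(), so the mapping array is sized to cover the input as well as the keys.

```python
import numpy as np

rng = np.random.default_rng(0)
mapping = {3: 30, 8: 80, 1: 10, 12: 120}   # hypothetical mapping with int keys
input_array = rng.integers(0, 15, size=(100, 100))

k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))

# Approach #1: one boolean mask per key.
out1 = np.zeros_like(input_array)
for key, val in zip(k, v):
    out1[input_array == key] = val

# Approach #2: searchsorted on sorted keys.
sidx = k.argsort()
ks, vs = k[sidx], v[sidx]
idx = np.searchsorted(ks, input_array)
idx[idx == len(ks)] = 0
out2 = np.where(ks[idx] == input_array, vs[idx], 0)

# Approach #3: mapping array, sized to cover every input value and every key.
mapping_ar = np.zeros(max(k.max(), input_array.max()) + 1, dtype=v.dtype)
mapping_ar[k] = v
out3 = mapping_ar[input_array]
```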
What's the most efficient way to replace some given indices of a NumPy array?
Use zip to separate the x and y indices, then cast to tuple and assign:
>>> values[tuple(zip(*indices))] = replace_values
>>> values
array([[[140, 150, 160],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],
       [[ 20,  30,  40],
        [  0,   0,   0],
        [  0,   0,   0],
        [100, 110, 120],
        [  0,   0,   0]],
       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],
       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],
       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]]])
where tuple(zip(*indices)) returns:
((0, 1, 1), (0, 0, 3))
Since your indices is already an np.array, you can drop zip and use its transpose instead, as pointed out by @MadPhysicist:
>>> values[tuple(indices.T)] = replace_values
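A self-contained check that the zip and transpose versions do the same thing. The starting array (zeros of shape (5, 5, 3)), indices, and replace_values are assumptions reconstructed from the output shown in the answer. tuple(indices.T) gives one index array per axis, (array([0, 1, 1]), array([0, 0, 3])), which NumPy pairs up element-wise.

```python
import numpy as np

values = np.zeros((5, 5, 3), dtype=int)           # assumed starting array
indices = np.array([[0, 0], [1, 0], [1, 3]])      # (row, col) pairs
replace_values = np.array([[140, 150, 160],
                           [20, 30, 40],
                           [100, 110, 120]])

a = values.copy()
a[tuple(zip(*indices))] = replace_values           # zip-based version

b = values.copy()
b[tuple(indices.T)] = replace_values               # transpose-based version
```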
Fast replace in numpy array
Use a combination of numpy.tile() and numpy.hstack(), as follows:
A = np.array([1,2,3])
A_counts = np.array([3,3,3])
A_powers = np.array([[3],[4],[5]])
B_nodup = np.power(A, A_powers)
B_list = [ np.transpose( np.tile( B_nodup[:,i], (A_counts[i], 1) ) ) for i in range(A.shape[0]) ]
B = np.hstack( B_list )
The transpose and stack may be reversed, which may be faster:
B_list = [ np.tile( B_nodup[:,i], (A_counts[i], 1) ) for i in range(A.shape[0]) ]
B = np.transpose( np.vstack( B_list ) )
This is likely only worth doing if the function you are calculating is quite expensive, or it is duplicated many, many times (more than 10); doing a tile and stack to prevent calculating the power function an extra 10 times is likely not worth it. Please benchmark and let us know.
EDIT: Or, you could just use broadcasting to get rid of the list comprehension:
>>> A=np.array([1,1,1,2,2,2,3,3,3])
>>> B = np.power(A,[[3],[4],[5]])
>>> B
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])
This is probably pretty fast, but it doesn't actually do what you asked: the power is still computed separately for every duplicated element.
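Another option, swapped in here and not part of the original answer: np.repeat accepts a per-element array of repeat counts, so it can replace the tile/stack list comprehension while still computing each power only once per distinct value. A sketch using the same A, A_counts, and A_powers:

```python
import numpy as np

A = np.array([1, 2, 3])
A_counts = np.array([3, 3, 3])
A_powers = np.array([[3], [4], [5]])

# Compute each power once per distinct value: shape (3 powers, 3 values).
B_nodup = np.power(A, A_powers)

# Repeat column i of B_nodup A_counts[i] times along axis 1.
B = np.repeat(B_nodup, A_counts, axis=1)
```

Unequal counts work the same way, e.g. A_counts = np.array([1, 4, 2]) would give 7 columns.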
Replace values in Python numpy array based on value from dictionary
Here is code that does what you've asked:
import numpy as np
a = [[ 0., -1.,  1.,  1.],
     [ 0.,  1., -2., -3.],
     [-1.,  1.,  1., -5.],
     [-3., -1., -1.,  2.],
     [-5.,  2., -4., -2.],
     [-1., -3., -1.,  2.],
     [ 0.,  1., -3.,  1.],
     [-2., -3.,  0., -2.],
     [-2., -2.,  1., -6.],
     [-0., -2.,  2., -0.]]
d = {-13: 13.0,
     -12: 9.375,
     -11: 9.4,
     -10: 8.6,
     -9: 8.3,
     -8: 7.8,
     -7: 7.1,
     -6: 6.4,
     -5: 5.8,
     -4: 5.2,
     -3: 4.6,
     -2: 4.0,
     -1: 3.6,
     0: 3.2,
     1: 2.8,
     2: 2.5,
     3: 2.2,
     4: 2.0,
     5: 1.8,
     6: 1.6}
x = np.array(a)
y = np.copy(x)  # test against this untouched copy, not against x
for k, v in d.items():
    x[y == k] = v
print(x)
I have replaced dict from the question with d to avoid using the name of the built-in dict type as a variable name; shadowing it can cause problems elsewhere in the same module.
Here is sample output:
[[3.2 3.6 2.8 2.8]
 [3.2 2.8 4.  4.6]
 [3.6 2.8 2.8 5.8]
 [4.6 3.6 3.6 2.5]
 [5.8 2.5 5.2 4. ]
 [3.6 4.6 3.6 2.5]
 [3.2 2.8 4.6 2.8]
 [4.  4.6 3.2 4. ]
 [4.  4.  2.8 6.4]
 [3.2 4.  2.5 3.2]]
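Two notes on the loop above, as a sketch on a subset of the question's dictionary. First, testing the untouched copy y matters: -2 maps to 4.0 and 4 is itself a key, so testing x directly would turn an original -2 into 2.0 on a later pass. Second, for many integer keys, a lookup table shifted by the smallest key (an offset trick swapped in here, not part of the original answer) replaces the loop with a single gather. It assumes the data hold whole numbers within the key range; lut and offset are illustrative names, and NaN marks table slots with no mapping.

```python
import numpy as np

# A subset of the question's dictionary (-2 maps to 4.0, and 4 is also a key).
d = {-3: 4.6, -2: 4.0, -1: 3.6, 0: 3.2, 1: 2.8, 2: 2.5, 4: 2.0}
x = np.array([[0., -1., 1., 4.],
              [-2., 2., -3., -0.]])

# Loop from the answer: test the untouched copy y.
y = x.copy()
ref = x.copy()
for k, v in d.items():
    ref[y == k] = v

# Pitfall: testing the array being modified lets replaced values be remapped.
chained = x.copy()
for k, v in d.items():
    chained[chained == k] = v   # -2 -> 4.0, then 4 -> 2.0 on a later pass

# Offset lookup table: shift keys so the smallest one lands at index 0.
keys = np.array(list(d.keys()))
vals = np.array(list(d.values()))
offset = keys.min()
lut = np.full(keys.max() - offset + 1, np.nan)   # NaN marks "no mapping"
lut[keys - offset] = vals
out = lut[x.astype(np.intp) - offset]            # whole-number floats -> indices
```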
Replace all elements of Python NumPy Array that are greater than some value
I think both the fastest and most concise way to do this is NumPy's built-in boolean (advanced) indexing. If you have an ndarray named arr, you can replace all elements >255 with a value x as follows:
arr[arr > 255] = x
I ran this on my machine with a 500 x 500 random matrix, replacing all values >0.5 with 5, and it took 7.59 ms per loop (best of 3).
In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)
In [3]: timeit A[A > 0.5] = 5
100 loops, best of 3: 7.59 ms per loop
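For comparison, a sketch of two related forms on made-up integer data (the replacement value x = 0 is hypothetical): np.where builds a new array and leaves arr untouched, and when the replacement value is the threshold itself, the operation is simply clipping from above with np.minimum.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 400, size=(500, 500))
x = 0   # hypothetical replacement value

# In place, as in the answer: boolean-mask assignment.
a = arr.copy()
a[a > 255] = x

# Out of place: np.where returns a new array and leaves arr unchanged.
b = np.where(arr > 255, x, arr)

# Special case: replacing with the threshold itself is just clipping from above.
c = np.minimum(arr, 255)
```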