Fast Replacement of Values in a Numpy Array

I believe there's an even more efficient method, but for now, try:

from numpy import copy

newArray = copy(theArray)
for k, v in d.iteritems(): newArray[theArray==k] = v

Microbenchmark and test for correctness:

#!/usr/bin/env python2.7

from numpy import copy, random, arange

random.seed(0)
data = random.randint(30, size=10**5)

d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
dk = d.keys()
dv = d.values()

def f1(a, d):
    b = copy(a)
    for k, v in d.iteritems():
        b[a==k] = v
    return b

def f2(a, d):
    for i in xrange(len(a)):
        a[i] = d.get(a[i], a[i])
    return a

def f3(a, dk, dv):
    mp = arange(0, max(a)+1)
    mp[dk] = dv
    return mp[a]

a = copy(data)
res = f2(a, d)

assert (f1(data, d) == res).all()
assert (f3(data, dk, dv) == res).all()

Result:

$ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f1(data,d)'
100 loops, best of 3: 6.15 msec per loop

$ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f3(data,dk,dv)'
100 loops, best of 3: 19.6 msec per loop
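For readers on Python 3, here is a minimal sketch of the lookup-table idea behind f3. It is not the benchmarked code above: the function name replace_with_lookup is made up for illustration, it uses the array's own .max() method rather than the built-in max, and it assumes the array holds non-negative integers only. No timings are claimed for it.

```python
import numpy as np

def replace_with_lookup(a, mapping):
    """Return a copy of `a` with values remapped through `mapping`.

    Hypothetical helper; assumes `a` is a non-empty array of
    non-negative integers and `mapping` has non-negative integer keys.
    """
    keys = np.fromiter(mapping.keys(), dtype=np.intp)
    vals = np.fromiter(mapping.values(), dtype=a.dtype)
    # Identity table: table[i] == i, large enough for every value and key.
    size = max(int(a.max()), int(keys.max())) + 1
    table = np.arange(size, dtype=a.dtype)
    # Overwrite only the entries that should be replaced.
    table[keys] = vals
    # Indexing the table with `a` performs all replacements at once.
    return table[a]

rng = np.random.default_rng(0)
data = rng.integers(0, 30, size=10**5)
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}

result = replace_with_lookup(data, d)

# Cross-check against a plain Python dictionary lookup.
expected = np.array([d.get(int(x), int(x)) for x in data])
```

Because the table starts out as the identity, values that are not keys of the dictionary pass through unchanged.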

Fastest way to replace values in a numpy array with a list

As you "know the size of the list and it is invariable", you can set up an array first:

b = np.zeros((7,))

This then works faster:

%timeit b[:] = a
1000000 loops, best of 3: 1.41 µs per loop

%timeit b = np.array(a)
1000000 loops, best of 3: 1.67 µs per loop
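To make the difference between the two statements concrete, here is a small sketch. The list contents are made up for illustration; only the length of 7 comes from the answer above. Slice assignment writes into the existing buffer, while np.array(a) allocates a new array each time.

```python
import numpy as np

a = [1, 2, 3, 4, 5, 6, 7]   # illustrative list of fixed length 7
b = np.zeros((7,))          # preallocated once, dtype float64

same_buffer = b             # second reference to the same array
b[:] = a                    # copies the list into the existing buffer

# The preallocated array keeps its identity and dtype.
still_same = same_buffer is b
dtype_name = b.dtype.name

# A list of the wrong length cannot be broadcast into the buffer.
try:
    b[:] = [1, 2, 3]
    shape_error = None
except ValueError as exc:
    shape_error = str(exc)
```

Note that b[:] = a casts the values to the dtype of b, so this only suits cases where that dtype is what you want.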

Fast in-place replacement of some values in a numpy array

The following will do it:

elevation[elevation > 0] = numpy.nan

See Indexing with Boolean Arrays in the NumPy tutorial.
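A short sketch of the same assignment on made-up data (the elevation values here are hypothetical). One thing worth keeping in mind is that NaN is a floating-point value, so the array needs a float dtype; an integer array would have to be converted first.

```python
import numpy as np

# Hypothetical elevation grid; float dtype so it can hold NaN.
elevation = np.array([[-5.0, 12.5],
                      [0.0, 300.0]])

mask = elevation > 0        # boolean array, same shape as elevation
elevation[mask] = np.nan    # in-place: only the masked cells change

# An integer array must be converted to float before NaN can be stored.
depth = np.array([1, -2, 3])
depth_f = depth.astype(float)
depth_f[depth_f > 0] = np.nan
```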

Faster way to iteratively replace values in relatively large NumPy array

Here you go:

import numpy as np
import pandas as pd

VEG_TYPE = ['Shrub (S)','Grass (G)','Moss  (M)','Grass (G)']
OBJECTID = [1 ,2 ,3 ,4]

mapping= {k:v for k,v in zip(OBJECTID, VEG_TYPE)}

input_array = np.random.randint(1,5, (10,10))

out = np.empty(input_array.shape, dtype=np.dtype('U100'))
for key,val in mapping.items():
    out[input_array==key] = val

Efficiently replace elements in array based on dictionary - NumPy / Python

Approach #1 : Loopy one with array data

One approach is to extract the keys and values into arrays and then use a similar loop -

k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))

out = np.zeros_like(input_array)
for key,val in zip(k,v):
    out[input_array==key] = val

The benefit over the original loop is spatial locality: the keys and values are stored in contiguous arrays, so they can be fetched efficiently during the iterations.

Also, you mentioned thousands of large np.arrays. If the mapping dictionary stays the same, building the array versions k and v is a one-time setup step.

Approach #2 : Vectorized one with searchsorted

A vectorized one could be suggested using np.searchsorted -

sidx = k.argsort() #k,v from approach #1

k = k[sidx]
v = v[sidx]

idx = np.searchsorted(k,input_array.ravel()).reshape(input_array.shape)
idx[idx==len(k)] = 0
mask = k[idx] == input_array
out = np.where(mask, v[idx], 0)
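Here is a small worked example of the searchsorted approach, with a made-up mapping and input that includes values missing from the mapping. One deliberate variation from the snippet above: unmatched values are kept as they are instead of being set to 0, which is just a different choice for the last argument of np.where.

```python
import numpy as np

# Hypothetical mapping and input; 30, 50 and 5 are not keys.
mapping = {10: 1, 20: 2, 40: 4}
input_array = np.array([[10, 20, 30],
                        [40, 50, 5]])

k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))

# searchsorted needs sorted keys; reorder the values the same way.
sidx = k.argsort()
k, v = k[sidx], v[sidx]

# Position where each input value would be inserted into k.
idx = np.searchsorted(k, input_array)
# Values larger than every key get index len(k); clamp to a valid index.
idx[idx == len(k)] = 0
# A value is a real key only if the key at that position equals it.
mask = k[idx] == input_array
# Replace matches, leave everything else unchanged.
out = np.where(mask, v[idx], input_array)
```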

Approach #3 : Vectorized one with mapping-array for integer keys

A vectorized one could be suggested using a mapping array for integer keys, which when indexed by the input array would lead us directly to the final output -

mapping_ar = np.zeros(k.max()+1,dtype=v.dtype) #k,v from approach #1
mapping_ar[k] = v
out = mapping_ar[input_array]
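The mapping-array approach is not limited to numeric values. As a sketch, here it is with string values borrowed from the vegetation example earlier on this page; pairing that data with this approach is my own assumption. It relies on the keys being small non-negative integers and on every input value being at most k.max().

```python
import numpy as np

# Integer keys mapped to string labels (data reused from the earlier example).
mapping = {1: 'Shrub (S)', 2: 'Grass (G)', 3: 'Moss  (M)', 4: 'Grass (G)'}

rng = np.random.default_rng(0)
input_array = rng.integers(1, 5, size=(10, 10))   # values in 1..4

k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))              # fixed-width unicode dtype

# Slot i holds the label for key i; unused slots stay as empty strings.
mapping_ar = np.zeros(k.max() + 1, dtype=v.dtype)
mapping_ar[k] = v
out = mapping_ar[input_array]

# Cross-check against the loop-based version.
expected = np.empty(input_array.shape, dtype=v.dtype)
for key, val in mapping.items():
    expected[input_array == key] = val
```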

What's the most efficient way to replace some given indices of a NumPy array?

Use zip to separate x and y indices, then cast to tuple and assign:

>>> values[tuple(zip(*indices))] = replace_values
>>> values

array([[[140, 150, 160],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[ 20,  30,  40],
        [  0,   0,   0],
        [  0,   0,   0],
        [100, 110, 120],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]]])

Where tuple(zip(*indices)) returns:

((0, 1, 1), (0, 0, 3))

Since your indices is itself an np.array, you can drop zip and use the transpose instead, as pointed out by @MadPhysicist:

>>> values[tuple(indices.T)]
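For a self-contained version, the inputs below are reconstructed from the output shown above, so treat them as an assumption rather than the exact arrays from the question. The sketch checks that the zip form and the transpose form address the same cells.

```python
import numpy as np

# Inputs inferred from the printed result above (assumed, not quoted).
values = np.zeros((5, 5, 3), dtype=int)
indices = np.array([[0, 0], [1, 0], [1, 3]])
replace_values = np.array([[140, 150, 160],
                           [20, 30, 40],
                           [100, 110, 120]])

# Transposing gives one array per axis: rows (0, 1, 1) and columns (0, 0, 3).
index_tuple = tuple(indices.T)
values[index_tuple] = replace_values

# The zip-based form builds the same per-axis index tuple.
via_zip = np.zeros((5, 5, 3), dtype=int)
via_zip[tuple(zip(*indices))] = replace_values
```

The tuple matters here: indexing with a tuple of arrays picks one (row, column) pair per entry, whereas indexing with the 2-D array directly would select whole rows along the first axis.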

Fast replace in numpy array

Use a combination of numpy.tile() and numpy.hstack(), as follows:

A = np.array([1,2,3])
A_counts = np.array([3,3,3])
A_powers = np.array([[3],[4],[5]])
B_nodup = np.power(A, A_powers)
B_list = [ np.transpose( np.tile( B_nodup[:,i], (A_counts[i], 1) ) ) for i in range(A.shape[0]) ]
B = np.hstack( B_list )

The transpose and stack may be reversed, this may be faster:

B_list = [ np.tile( B_nodup[:,i], (A_counts[i], 1) ) for i in range(A.shape[0]) ]
B = np.transpose( np.vstack( B_list ) )

This is likely only worth doing if the function you are calculating is quite expensive, or it is duplicated many, many times (more than 10); doing a tile and stack to prevent calculating the power function an extra 10 times is likely not worth it. Please benchmark and let us know.

EDIT: Or, you could just use broadcasting to get rid of the list comprehension:

>>> A=np.array([1,1,1,2,2,2,3,3,3])
>>> B = np.power(A,[[3],[4],[5]])
>>> B
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])

This is probably pretty fast, but doesn't actually do what you asked.
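As a possible alternative to the tile-and-stack step (a swapped-in technique, not part of the answer above), np.repeat can duplicate each column of B_nodup according to A_counts in a single call. A minimal sketch using the same inputs as the first snippet:

```python
import numpy as np

A = np.array([1, 2, 3])
A_counts = np.array([3, 3, 3])
A_powers = np.array([[3], [4], [5]])

# Compute each power once per distinct value, as in the answer above.
B_nodup = np.power(A, A_powers)

# Repeat column i of B_nodup A_counts[i] times along axis 1.
B = np.repeat(B_nodup, A_counts, axis=1)

# Reference result built with the tile/hstack approach.
B_list = [np.transpose(np.tile(B_nodup[:, i], (A_counts[i], 1)))
          for i in range(A.shape[0])]
B_ref = np.hstack(B_list)
```

I have not benchmarked this against the tile-based versions, so whether it is faster for your sizes is something to measure.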

Replace values in Python numpy array based on value from dictionary

Here is code that does what you've asked:

import numpy as np
a = [[ 0., -1.,  1.,  1.],
     [ 0.,  1., -2., -3.],
     [-1.,  1.,  1., -5.],
     [-3., -1., -1.,  2.],
     [-5.,  2., -4., -2.],
     [-1., -3., -1.,  2.],
     [ 0.,  1., -3.,  1.],
     [-2., -3.,  0., -2.],
     [-2., -2.,  1., -6.],
     [-0., -2.,  2., -0.]]
d = {-13: 13.0,
     -12: 9.375,
     -11: 9.4,
     -10: 8.6,
     -9: 8.3,
     -8: 7.8,
     -7: 7.1,
     -6: 6.4,
     -5: 5.8,
     -4: 5.2,
     -3: 4.6,
     -2: 4.0,
     -1: 3.6,
     0: 3.2,
     1: 2.8,
     2: 2.5,
     3: 2.2,
     4: 2.0,
     5: 1.8,
     6: 1.6}
x = np.array(a)
y = np.copy(x)
for k, v in d.items():
    x[y == k] = v
print(x)

I have replaced dict from the question with d to avoid using the name of the dict built-in datatype as a variable name, which can cause problems elsewhere in the same module.

Here is sample output:

[[3.2 3.6 2.8 2.8]
 [3.2 2.8 4.  4.6]
 [3.6 2.8 2.8 5.8]
 [4.6 3.6 3.6 2.5]
 [5.8 2.5 5.2 4. ]
 [3.6 4.6 3.6 2.5]
 [3.2 2.8 4.6 2.8]
 [4.  4.6 3.2 4. ]
 [4.  4.  2.8 6.4]
 [3.2 4.  2.5 3.2]]
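If the loop becomes a bottleneck, a lookup table shifted by the smallest key is one way to handle negative keys without looping over the dictionary. This is only a sketch on a trimmed-down dictionary and array: it assumes the keys form a contiguous integer range and that every element of the array is an integer-valued float that appears as a key.

```python
import numpy as np

# Shortened, illustrative versions of the dictionary and array above.
d = {-3: 4.6, -2: 4.0, -1: 3.6, 0: 3.2, 1: 2.8, 2: 2.5}
a = np.array([[0., -1., 1.],
              [-2., -3., 2.]])

lo, hi = min(d), max(d)
# table[i] holds the value for key (lo + i); needs contiguous keys.
table = np.array([d[key] for key in range(lo, hi + 1)])

# Shift the values so the smallest key maps to index 0, then look up.
x = table[a.astype(np.intp) - lo]
```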

Replace all elements of Python NumPy Array that are greater than some value

I think the fastest and most concise way to do this is NumPy's built-in boolean mask indexing (a form of fancy indexing). If you have an ndarray named arr, you can replace all elements > 255 with a value x as follows:

arr[arr > 255] = x

I ran this on my machine with a 500 x 500 random matrix, replacing all values > 0.5 with 5, and it took an average of 7.59 ms.

In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)
In [3]: timeit A[A > 0.5] = 5
100 loops, best of 3: 7.59 ms per loop
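The mask assignment modifies arr in place. If you need to keep the original, np.where returns a new array instead, and when the replacement value is the threshold itself, np.minimum does the capping directly. A small sketch on made-up data, with no timing claims:

```python
import numpy as np

arr = np.array([[10, 300],
                [255, 1000]])
x = 0

# New array: x where arr > 255, the original value elsewhere.
replaced = np.where(arr > 255, x, arr)

# Special case: cap values at 255 rather than replacing with another value.
capped = np.minimum(arr, 255)
```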
