Most Efficient Way to Map Function Over Numpy Array

Most efficient way to apply operation on each element of Numpy array

I think map is the most appropriate function for this

map() a function with a numpy array and lists as arguments

Check what map is feeding your func:

In [31]: def func(arr, a, b):
    ...:     print(arr, a, b)
    ...:     return 1
    ...:
In [32]: a = numpy.array([0,1,2])
    ...: b = numpy.array([0,2,0])
    ...: arr = numpy.array([[0,2,3],[4,4,0]])
    ...:
    ...: out = map(func, arr, a, b)
    ...: list(out)
[0 2 3] 0 0
[4 4 0] 1 2
Out[32]: [1, 1]

Transpose arr so its shape is (3,2):

In [33]: out = map(func, arr.T, a, b)
    ...: list(out)
[0 4] 0 0
[2 4] 1 2
[3 0] 2 0
Out[33]: [1, 1, 1]

map iterates over all of its arguments, not just a and b, and it stops as soon as the shortest one is exhausted.

It's the same sort of iteration that we get from zip:

In [34]: list(zip(arr,a,b))
Out[34]: [(array([0, 2, 3]), 0, 0), (array([4, 4, 0]), 1, 2)]
In [35]: list(zip(arr.T,a,b))
Out[35]: [(array([0, 4]), 0, 0), (array([2, 4]), 1, 2), (array([3, 0]), 2, 0)]
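
To make the "stops at the shortest" behaviour concrete, here is a minimal sketch that reuses the arrays above. The helper name describe is made up for illustration; it only records what map passes in at each step.

```python
import numpy as np

arr = np.array([[0, 2, 3], [4, 4, 0]])  # 2 rows
a = np.array([0, 1, 2])                  # 3 elements
b = np.array([0, 2, 0])                  # 3 elements

def describe(row, x, y):
    # Hypothetical helper: report the arguments received for one step.
    return (row.tolist(), int(x), int(y))

# Iterating a 2-D array yields its rows, so arr contributes only 2 items
# and map makes 2 calls even though a and b have 3 elements each.
calls = list(map(describe, arr, a, b))

# The same pairing written explicitly with zip.
calls_zip = [describe(row, x, y) for row, x, y in zip(arr, a, b)]

# With the transpose there are 3 rows, so all 3 elements of a and b are used.
calls_t = list(map(describe, arr.T, a, b))
```

The third element of a and b is silently dropped in the first call, which is easy to miss when the function does not print its arguments.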

Leave arr outside of the map, taking it as a global:

In [36]: def func(a, b):
    ...:     sub = arr[arr[:,a] > b]
    ...:     mean = numpy.mean(sub, axis=0)
    ...:     return mean
    ...:
In [37]: list(map(func,a,b))
Out[37]: [array([4., 4., 0.]), array([4., 4., 0.]), array([0., 2., 3.])]
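
If relying on a global feels fragile, one possible alternative (not part of the original answer) is to bind arr with functools.partial, so that map still iterates only over a and b. The function name masked_mean is an assumption made for this sketch.

```python
from functools import partial
import numpy as np

def masked_mean(arr, a, b):
    # Keep the rows whose value in column a exceeds b, then average column-wise.
    sub = arr[arr[:, a] > b]
    return np.mean(sub, axis=0)

arr = np.array([[0, 2, 3], [4, 4, 0]])
a = np.array([0, 1, 2])
b = np.array([0, 2, 0])

# partial fixes the first positional argument; map supplies a and b pairwise.
means = list(map(partial(masked_mean, arr), a, b))
```

This should give the same three arrays as the global version, while keeping the function usable with other arrays.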

map docs:

map(func, *iterables) --> map object

Make an iterator that computes the function using arguments from
each of the iterables. Stops when the shortest iterable is exhausted.

Let's add a print to get a clearer idea of what your func is doing:

In [56]: def func(a, b):
    ...:     sub = arr[arr[:,a] > b]
    ...:     print(a,b,sub)
    ...:     mean = numpy.mean(sub, axis=0)
    ...:     return mean
    ...:
In [57]: list(map(func,a,b))
0 0 [[4 4 0]]
1 2 [[4 4 0]]
2 0 [[0 2 3]]
Out[57]: [array([4., 4., 0.]), array([4., 4., 0.]), array([0., 2., 3.])]

With that indexing sub is a (1,3) array, so the mean over axis 0 doesn't do anything interesting; it just returns the single row.

Drop the axis argument and the result is more interesting:

In [59]: def func(a, b):
    ...:     sub = arr[arr[:,a] > b]
    ...:     print(a,b,sub)
    ...:     mean = numpy.mean(sub)
    ...:     return mean
    ...:
In [60]: list(map(func,a,b))
0 0 [[4 4 0]]
1 2 [[4 4 0]]
2 0 [[0 2 3]]
Out[60]: [2.6666666666666665, 2.6666666666666665, 1.6666666666666667]

This indexing of arr selects whole rows: in this case the second row twice and the first row once.
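
A small sketch of the row selection for the first pair (a=0, b=0) may help; it only restates the indexing used in func, with the intermediate mask given a name.

```python
import numpy as np

arr = np.array([[0, 2, 3], [4, 4, 0]])

mask = arr[:, 0] > 0   # column 0 is [0, 4], compared elementwise with 0
sub = arr[mask]        # a boolean mask on the first axis keeps whole rows

col_mean = np.mean(sub, axis=0)  # one row in, the same row out (as floats)
all_mean = np.mean(sub)          # average over every selected element
```

Because mask has one entry per row, it can only ever pick complete rows, never individual elements.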

How to map function over numpy with condition on each variable?

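This raises a ValueError, because x % 2 == 0 produces an array of booleans and the conditional expression needs a single truth value: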

a = np.array([1, 2, 3, 4, 5])
g = lambda x: 0 if x % 2 == 0 else 1
g(a)

A lambda is essentially just an unnamed function, which you happen to be naming here, so you might as well:

def g(x):
    return 0 if x % 2 == 0 else 1

But that's still a bit odd, since taking an integer modulo 2 already is 0 or 1, so this would be the same (when applied to integers, which is what you're looking to do):

def g(x):
    return x % 2

At which point you have to wonder if a function is needed at all. And it isn't, this works:

a = np.array([1, 2, 3, 4, 5])
a % 2

However, note that f = lambda x: x ** 2 followed by f(a) does not work because Python applies the function to each element. It applies the operation to the array as a whole, and the array supports spreading the power operation over its elements, just as it does for the modulo operator, which is why a % 2 works.

Result of a % 2:

array([1, 0, 1, 0, 1], dtype=int32)

Note that this type of spreading isn't something that generally works: you shouldn't expect Python to spread an operation over the elements of any data type (like a list or set). It's a feature of numpy's implementation of arrays, where the operators are defined on the array and implemented to apply to each of its elements.
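
The points above can be checked with a short sketch. It shows the scalar-style lambda failing on an array, two elementwise alternatives, and a plain list rejecting the % operator. The flag names are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
g = lambda x: 0 if x % 2 == 0 else 1

# x % 2 == 0 is an array of booleans here, and the conditional expression
# cannot reduce several booleans to one, so NumPy raises ValueError.
try:
    g(a)
    scalar_branch_failed = False
except ValueError:
    scalar_branch_failed = True

# Elementwise alternatives that stay inside NumPy.
parity = a % 2
parity_where = np.where(a % 2 == 0, 0, 1)

# A plain list has no elementwise % operator.
try:
    [1, 2, 3, 4, 5] % 2
    list_mod_failed = False
except TypeError:
    list_mod_failed = True
```

np.where is the more general tool here: it also covers conditions whose two branches are not simply the remainder itself.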

How to map a function over numpy array

Use numpy's own functions, which apply to every element of an array at once.

Example:

import numpy as np
# x and y are assumed to be numeric arrays of the same shape
np.sqrt(np.square(x) + np.square(y))
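
As a runnable sketch of the same idea, with made-up sample values for x and y: the expression is built entirely from elementwise numpy functions, and for this particular formula np.hypot should give the same result in one call.

```python
import numpy as np

# Illustrative inputs; the original question's x and y are not shown.
x = np.array([3.0, 5.0, 8.0])
y = np.array([4.0, 12.0, 15.0])

# Each function here operates on whole arrays, element by element.
r_explicit = np.sqrt(np.square(x) + np.square(y))

# np.hypot computes sqrt(x**2 + y**2) elementwise in a single ufunc call.
r_hypot = np.hypot(x, y)
```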

Map function with array based on integers on 2D numpy array

There are a couple of ways of doing this. I would probably use the where and out arguments to np.subtract:

np.subtract(np.ones(len(b)), a, out=np.broadcast_to(a, b.shape).copy(), where=b.astype(bool))

Going with @hpaulj's solution to use the 3-arg version of np.where is probably much cleaner in this case:

np.where(b, a, 1 - a)
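
A minimal sketch of the three-argument np.where call, using hypothetical inputs since the question's data is not shown here: a is assumed to hold one value per column and b to be a 2D array of 0s and 1s.

```python
import numpy as np

a = np.array([0.2, 0.7, 0.9])   # hypothetical values, one per column
b = np.array([[1, 0, 1],
              [0, 0, 1]])        # hypothetical 0/1 selector

# Where b is nonzero take a, elsewhere take 1 - a.
# a and 1 - a are broadcast against b's shape (2, 3).
result = np.where(b, a, 1 - a)
```

The result has the shape of b, and no explicit out buffer or boolean cast is needed.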

Applying a function along a numpy array

numpy.apply_along_axis is not a good fit for this purpose.
Try numpy.vectorize to vectorize your function: https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
It defines a vectorized function that takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays as output.

import numpy as np
import math

# custom function
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# define vectorized sigmoid
sigmoid_v = np.vectorize(sigmoid)

# test
scores = np.array([ -0.54761371, 17.04850603, 4.86054302])
print(sigmoid_v(scores))

Output: [ 0.36641822 0.99999996 0.99231327]
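
For comparison, here is a sketch of the same function written directly with np.exp, which already works elementwise, so no np.vectorize wrapper is needed. The names sigmoid_scalar and sigmoid_array are chosen for this example only.

```python
import math
import numpy as np

def sigmoid_scalar(x):
    # Works on one Python float at a time.
    return 1 / (1 + math.exp(-x))

def sigmoid_array(x):
    # np.exp is a ufunc, so the whole expression is evaluated elementwise.
    return 1 / (1 + np.exp(-x))

scores = np.array([-0.54761371, 17.04850603, 4.86054302])

via_vectorize = np.vectorize(sigmoid_scalar)(scores)
via_ufunc = sigmoid_array(scores)
```

Both versions should agree numerically. np.vectorize is mainly a convenience wrapper around a Python-level loop, which is consistent with it being the slowest option in the timings.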

A performance test shows that scipy.special.expit is the fastest way to calculate the logistic function, and that the np.vectorize variant is the slowest:

import numpy as np
import math
import timeit

# scalar implementation wrapped with np.vectorize
def sigmoid_(x):
    return 1 / (1 + math.exp(-x))
sigmoidv = np.vectorize(sigmoid_)

# array implementation using numpy's elementwise exp
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv, np; scores = np.random.randn(100)", number=25),
      timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid, np; scores = np.random.randn(100)", number=25),
      timeit.timeit("expit(scores)", "from scipy.special import expit; import numpy as np; scores = np.random.randn(100)", number=25))

print(timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv, np; scores = np.random.randn(1000)", number=25),
      timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid, np; scores = np.random.randn(1000)", number=25),
      timeit.timeit("expit(scores)", "from scipy.special import expit; import numpy as np; scores = np.random.randn(1000)", number=25))

print(timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv, np; scores = np.random.randn(10000)", number=25),
      timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid, np; scores = np.random.randn(10000)", number=25),
      timeit.timeit("expit(scores)", "from scipy.special import expit; import numpy as np; scores = np.random.randn(10000)", number=25))

Results (seconds, total over 25 runs):

size       vectorized         numpy               expit
N=100      0.00179314613342   0.000460863113403   0.000132083892822
N=1000     0.0122890472412    0.00084114074707    0.000464916229248
N=10000    0.109477043152     0.00530695915222    0.00424313545227

