Most Efficient Way to Map Function Over Numpy Array

Most efficient way to apply operation on each element of Numpy array

I think map is the most appropriate function for this
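
As a rough sketch of what that looks like, map can be run over the elements of a 1-D array and the results collected back into an array. The function square and the sample values are made up for illustration; for simple arithmetic like this, the plain array expression gives the same result without a Python-level loop.

```python
import numpy as np

def square(x):  # hypothetical elementwise function, not from the question
    return x ** 2

a = np.array([1, 2, 3, 4, 5])

# map yields one result per element; list() and np.array() collect them
mapped = np.array(list(map(square, a)))

# the same operation written directly on the array
direct = a ** 2
```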

map() a function with a numpy array and lists as arguments

Check what map is feeding your func:

In [31]: def func(arr, a, b):
    ...:    print(arr,a,b)
    ...:    return 1
    ...: 
    ...: 
In [32]: a = numpy.array([0,1,2])
    ...: b = numpy.array([0,2,0])
    ...: arr = numpy.array([[0,2,3],[4,4,0]])
    ...: 
    ...: out = map(func, arr, a, b)
    ...: list(out)
[0 2 3] 0 0
[4 4 0] 1 2
Out[32]: [1, 1]

Transpose arr so that its shape is (3,2):

In [33]: out = map(func, arr.T, a, b)
    ...: list(out)
[0 4] 0 0
[2 4] 1 2
[3 0] 2 0
Out[33]: [1, 1, 1]

map iterates over all of its arguments, arr included, not just a and b, and it stops when the shortest one runs out.

It's the same sort of iteration that we get from zip:

In [34]: list(zip(arr,a,b))
Out[34]: [(array([0, 2, 3]), 0, 0), (array([4, 4, 0]), 1, 2)]
In [35]: list(zip(arr.T,a,b))
Out[35]: [(array([0, 4]), 0, 0), (array([2, 4]), 1, 2), (array([3, 0]), 2, 0)]

Leave arr outside of the map, taking it as a global:

In [36]: def func(a, b):
    ...:     sub = arr[arr[:,a] > b]
    ...:     mean = numpy.mean(sub, axis=0)
    ...:     return mean
    ...: 
In [37]: list(map(func,a,b))
Out[37]: [array([4., 4., 0.]), array([4., 4., 0.]), array([0., 2., 3.])]
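
If relying on a global feels fragile, one possible alternative is to bind arr as a fixed first argument with functools.partial, so map only iterates over a and b. This is a sketch under that assumption; the name row_mean is made up here and the data is the same as above.

```python
from functools import partial
import numpy as np

def row_mean(arr, a, b):
    # select the rows of arr whose value in column a exceeds b
    sub = arr[arr[:, a] > b]
    return np.mean(sub, axis=0)

arr = np.array([[0, 2, 3], [4, 4, 0]])
a = np.array([0, 1, 2])
b = np.array([0, 2, 0])

# partial fixes arr, so map pairs up only the elements of a and b
out = list(map(partial(row_mean, arr), a, b))
```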

map docs:

map(func, *iterables) --> map object

Make an iterator that computes the function using arguments from
each of the iterables.  Stops when the shortest iterable is exhausted.
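
A small sketch of the "shortest iterable" rule, using made-up lists and a hypothetical add function; the zip version is shown only to make the pairing explicit.

```python
def add(x, y):  # hypothetical two-argument function
    return x + y

longer = [1, 2, 3, 4]
shorter = [10, 20]

# map stops after two calls because shorter has only two items
result = list(map(add, longer, shorter))

# the same pairing spelled out with zip
zipped = [add(x, y) for x, y in zip(longer, shorter)]
```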

Let's add a print to get a clearer idea of what your func is doing:

In [56]: def func(a, b):
    ...:     sub = arr[arr[:,a] > b]
    ...:     print(a,b,sub)
    ...:     mean = numpy.mean(sub, axis=0)
    ...:     return mean
    ...: 
In [57]: list(map(func,a,b))
0 0 [[4 4 0]]
1 2 [[4 4 0]]
2 0 [[0 2 3]]
Out[57]: [array([4., 4., 0.]), array([4., 4., 0.]), array([0., 2., 3.])]

With that indexing, sub is a (1,3) array, so the mean over axis 0 doesn't do anything interesting.

Drop the axis argument and it's more interesting:

In [59]: def func(a, b):
    ...:     sub = arr[arr[:,a] > b]
    ...:     print(a,b,sub)
    ...:     mean = numpy.mean(sub)
    ...:     return mean
    ...: 
    ...: 
In [60]: list(map(func,a,b))
0 0 [[4 4 0]]
1 2 [[4 4 0]]
2 0 [[0 2 3]]
Out[60]: [2.6666666666666665, 2.6666666666666665, 1.6666666666666667]

This indexing of arr selects whole rows, in this case the second row twice and the first row once.
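
To see why whole rows come back, it may help to look at the boolean mask on its own. A minimal sketch for the first (a=0, b=0) and last (a=2, b=0) calls, using the same arr:

```python
import numpy as np

arr = np.array([[0, 2, 3], [4, 4, 0]])

# a=0, b=0: compare column 0 against 0, giving one boolean per row
mask_first = arr[:, 0] > 0
sub_first = arr[mask_first]   # keeps only the rows where the mask is True

# a=2, b=0: compare column 2 against 0
mask_last = arr[:, 2] > 0
sub_last = arr[mask_last]
```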

How to map function over numpy with condition on each variable?

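This raises a ValueError, because x % 2 == 0 is a whole boolean array and the conditional expression needs a single truth value: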

a = np.array([1, 2, 3, 4, 5])
g = lambda x: 0 if x % 2 == 0 else 1
g(a)

A lambda is essentially just an unnamed function, which you happen to be naming here, so you might as well:

def g(x):
    return 0 if x % 2 == 0 else 1

But that's still a bit odd, since taking an integer modulo 2 already is 0 or 1, so this would be the same (when applied to integers, which is what you're looking to do):

def g(x):
    return x % 2

At which point you have to wonder if a function is needed at all. And it isn't, this works:

a = np.array([1, 2, 3, 4, 5])
a % 2

However, note that the mistake you made is in assuming why f = lambda x: x ** 2 followed by f(a) works. It works not because the function is applied to each element; the operation is applied to the array as a whole, and the array supports spreading that operation over its elements. That holds for raising to a power just like it does for the modulo operator, which is why a % 2 works.

Result of a % 2:

array([1, 0, 1, 0, 1], dtype=int32)

Note that this type of spreading isn't something that generally works - you shouldn't expect Python to just do the spreading when needed for any data type (like a list or set). It's a feature of numpy's implementation of arrays: the operations have been defined on the array and implemented to spread over the elements.
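
A short sketch of that difference, with np.where added as one possible way to express a real per-element condition. The variable names are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
plain_list = [1, 2, 3, 4, 5]

# the array spreads the modulo over its elements
elementwise = a % 2

# a plain list does not define % with an int, so this raises TypeError
try:
    plain_list % 2
    list_supports_mod = True
except TypeError:
    list_supports_mod = False

# for a genuine "0 if even else 1" condition, np.where picks per element
flags = np.where(a % 2 == 0, 0, 1)
```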

How to map a function over numpy array

Use NumPy's own vectorized functions, which apply the operation to every element of an array without an explicit loop.

Example:

import numpy as np
np.sqrt(np.square(x) + np.square(y))
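
The snippet above leaves x and y undefined, so here is a runnable sketch with made-up sample arrays. It also compares against np.hypot, which computes the same quantity in one call.

```python
import numpy as np

# sample inputs, assumed for illustration
x = np.array([3.0, 6.0, 5.0])
y = np.array([4.0, 8.0, 12.0])

# each ufunc works on the whole array at once
dist = np.sqrt(np.square(x) + np.square(y))

# np.hypot is the built-in equivalent of sqrt(x**2 + y**2)
dist_hypot = np.hypot(x, y)
```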

Map function with array based on integers on 2D numpy array

There are a couple of ways of doing this. I would probably use the where and out arguments to np.subtract:

np.subtract(np.ones(len(b)), a, out=np.broadcast_to(a, b.shape).copy(), where=b.astype(bool))

Going with @hpaulj's solution to use the 3-arg version of np.where is probably much cleaner in this case:

np.where(b, a, 1 - a)
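
A minimal sketch of how that call behaves, assuming a is a 1-D float array and b is a 2-D array of 0/1 integers whose last dimension matches a (the shapes and values here are made up, since the question's data isn't shown):

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3])          # assumed 1-D values
b = np.array([[1, 0, 1],
              [0, 0, 1]])              # assumed 2-D flags of 0 and 1

# where b is nonzero take a, otherwise take 1 - a;
# a and 1 - a are broadcast across the rows of b
result = np.where(b, a, 1 - a)
```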

Applying a function along a numpy array

numpy.apply_along_axis is not a good fit for this purpose.
Try numpy.vectorize to vectorize your function instead: https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
It defines a vectorized function that takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays as output.

import numpy as np
import math

# custom function
def sigmoid(x):
  return 1 / (1 + math.exp(-x))

# define vectorized sigmoid
sigmoid_v = np.vectorize(sigmoid)

# test
scores = np.array([ -0.54761371,  17.04850603,   4.86054302])
print(sigmoid_v(scores))

Output: [ 0.36641822 0.99999996 0.99231327]
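
For comparison, the same function can usually be written with np.exp so that it accepts arrays directly, without np.vectorize. A sketch under that assumption; sigmoid_np is a name introduced here, not part of the answer above.

```python
import math
import numpy as np

def sigmoid(x):
    # scalar version: math.exp only accepts a single number
    return 1 / (1 + math.exp(-x))

sigmoid_v = np.vectorize(sigmoid)

def sigmoid_np(x):
    # array version: np.exp works elementwise on the whole array
    return 1 / (1 + np.exp(-x))

scores = np.array([-0.54761371, 17.04850603, 4.86054302])
via_vectorize = sigmoid_v(scores)
via_numpy = sigmoid_np(scores)
```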

Performance test showing that scipy.special.expit is the fastest way to calculate the logistic function and that the np.vectorize variant is the slowest:

import numpy as np
import math
import timeit

def sigmoid_(x):
  return 1 / (1 + math.exp(-x))
sigmoidv = np.vectorize(sigmoid_)

def sigmoid(x):
  return 1 / (1 + np.exp(-x))  # note the minus sign, matching sigmoid_ above

print(timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv, np; scores = np.random.randn(100)", number=25),
      timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid, np; scores = np.random.randn(100)", number=25),
      timeit.timeit("expit(scores)", "from scipy.special import expit; import numpy as np; scores = np.random.randn(100)", number=25))

print(timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv, np; scores = np.random.randn(1000)", number=25),
      timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid, np; scores = np.random.randn(1000)", number=25),
      timeit.timeit("expit(scores)", "from scipy.special import expit; import numpy as np; scores = np.random.randn(1000)", number=25))

print(timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv, np; scores = np.random.randn(10000)", number=25),
      timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid, np; scores = np.random.randn(10000)", number=25),
      timeit.timeit("expit(scores)", "from scipy.special import expit; import numpy as np; scores = np.random.randn(10000)", number=25))

Results (total seconds for 25 calls):

size       vectorized         numpy               expit
N=100:     0.00179314613342   0.000460863113403   0.000132083892822
N=1000:    0.0122890472412    0.00084114074707    0.000464916229248
N=10000:   0.109477043152     0.00530695915222    0.00424313545227