Most efficient way to apply operation on each element of Numpy array
I think map is the most appropriate function for this
map() a function with a numpy array and lists as arguments
Check what map is feeding your func:
In [31]: def func(arr, a, b):
...:     print(arr,a,b)
...:     return 1
...:
In [32]: a = numpy.array([0,1,2])
...: b = numpy.array([0,2,0])
...: arr = numpy.array([[0,2,3],[4,4,0]])
...:
...: out = map(func, arr, a, b)
...: list(out)
[0 2 3] 0 0
[4 4 0] 1 2
Out[32]: [1, 1]
Transpose arr so it's (3,2):
In [33]: out = map(func, arr.T, a, b)
...: list(out)
[0 4] 0 0
[2 4] 1 2
[3 0] 2 0
Out[33]: [1, 1, 1]
It's iterating over all the arguments, not just a and b, and it stops when the shortest one is exhausted.
It's the same sort of iteration that we get from zip:
In [34]: list(zip(arr,a,b))
Out[34]: [(array([0, 2, 3]), 0, 0), (array([4, 4, 0]), 1, 2)]
In [35]: list(zip(arr.T,a,b))
Out[35]: [(array([0, 4]), 0, 0), (array([2, 4]), 1, 2), (array([3, 0]), 2, 0)]
Leave arr outside of the map, taking it as a global:
In [36]: def func(a, b):
...:     sub = arr[arr[:,a] > b]
...:     mean = numpy.mean(sub, axis=0)
...:     return mean
...:
In [37]: list(map(func,a,b))
Out[37]: [array([4., 4., 0.]), array([4., 4., 0.]), array([0., 2., 3.])]
map docs:
map(func, *iterables) --> map object
Make an iterator that computes the function using arguments from
each of the iterables. Stops when the shortest iterable is exhausted.
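The following small illustration of the "shortest iterable" rule uses plain lists with arbitrary values. itertools.zip_longest appears only as the contrasting standard-library option, which pads instead of stopping:

```python
from itertools import zip_longest

xs = [1, 2, 3]
ys = [10, 20]

# map stops as soon as the shorter iterable (ys) is exhausted
shortest = list(map(lambda x, y: x + y, xs, ys))
print(shortest)  # [11, 22]

# zip_longest pads the shorter iterable instead, here with 0
padded = [x + y for x, y in zip_longest(xs, ys, fillvalue=0)]
print(padded)  # [11, 22, 3]
```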
Let's add a print to get a clearer idea of what your func is doing:
In [56]: def func(a, b):
...:     sub = arr[arr[:,a] > b]
...:     print(a,b,sub)
...:     mean = numpy.mean(sub, axis=0)
...:     return mean
...:
In [57]: list(map(func,a,b))
0 0 [[4 4 0]]
1 2 [[4 4 0]]
2 0 [[0 2 3]]
Out[57]: [array([4., 4., 0.]), array([4., 4., 0.]), array([0., 2., 3.])]
With that indexing, sub is a (1,3) array, so the mean along axis 0 doesn't do anything interesting.
Drop the axis, and it's more interesting:
In [59]: def func(a, b):
...:     sub = arr[arr[:,a] > b]
...:     print(a,b,sub)
...:     mean = numpy.mean(sub)
...:     return mean
...:
In [60]: list(map(func,a,b))
0 0 [[4 4 0]]
1 2 [[4 4 0]]
2 0 [[0 2 3]]
Out[60]: [2.6666666666666665, 2.6666666666666665, 1.6666666666666667]
This indexing of arr selects whole rows: in this case the 2nd row twice and the 1st row once.
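Each (a, b) pair only selects rows, so the whole computation can also be done without map by building all the row masks at once. This is a sketch assuming the same arr, a and b as above. The names mask, weights, counts and so on are illustrative. A pair that selects no rows would produce a division by zero (nan with a RuntimeWarning), much as numpy.mean of an empty array does.

```python
import numpy as np

arr = np.array([[0, 2, 3], [4, 4, 0]])
a = np.array([0, 1, 2])
b = np.array([0, 2, 0])

# mask[i, k] is True when row i is selected for the k-th (a, b) pair,
# i.e. the same test as in func: arr[i, a[k]] > b[k]
mask = arr[:, a] > b               # shape (n_rows, n_pairs) == (2, 3)
weights = mask.astype(float)
counts = weights.sum(axis=0)       # number of rows selected for each pair

# sum of the selected rows for each pair, shape (n_pairs, n_cols)
row_sums = weights.T @ arr

# like numpy.mean(sub, axis=0): column-wise mean of the selected rows
col_means = row_sums / counts[:, None]

# like numpy.mean(sub): mean over every element of the selected rows
flat_means = row_sums.sum(axis=1) / (counts * arr.shape[1])

print(col_means)
print(flat_means)
```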
How to map function over numpy with condition on each variable?
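This has problems:
import numpy as np
a = np.array([1, 2, 3, 4, 5])
g = lambda x: 0 if x % 2 == 0 else 1
g(a)  # ValueError: the truth value of an array with more than one element is ambiguous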
A lambda is essentially just an unnamed function, which you happen to be naming here, so you might as well write:
def g(x):
    return 0 if x % 2 == 0 else 1
But that's still a bit odd, since an integer modulo 2 is already 0 or 1, so this would be the same (when applied to integers, which is what you're looking to do):
def g(x):
    return x % 2
At which point you have to wonder whether a function is needed at all. It isn't; this works:
a = np.array([1, 2, 3, 4, 5])
a % 2
However, note where the reasoning went wrong: f = lambda x: x ** 2 followed by f(a) works, but not because the lambda is applied to each element. It applies the operation to the whole array, and the array spreads (broadcasts) that operation over its elements for exponentiation. It does the same for the modulo operator, which is why a % 2 works.
Result:
array([1, 0, 1, 0, 1], dtype=int32)
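Suppose the per-element rule were something numpy doesn't already compute directly. In that case the if/else from the lambda can usually be written with np.where, which evaluates the condition on the whole array and then picks a value per element. Here is a minimal sketch using the same a:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# a % 2 == 0 is evaluated for the whole array at once (a boolean array),
# then np.where picks 0 or 1 element by element
parity = np.where(a % 2 == 0, 0, 1)
print(parity)

# for integers this matches a % 2
print((parity == a % 2).all())
```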
Note that this kind of spreading isn't something that works in general. You shouldn't expect Python to spread an operation over the elements of any data type (like a list or set). It's a feature of numpy's implementation of arrays: the operations have been defined on the array and implemented to apply element by element.
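Here is a short sketch of the difference, with arbitrary values. On a list, * means repetition and % isn't defined at all. A numpy array applies both element by element.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# on a list, * means repetition, not element-wise multiplication
print(values * 2)   # [1, 2, 3, 1, 2, 3]

# numpy arrays define * and ** element-wise
print(arr * 2)
print(arr ** 2)

# lists don't define % at all, so this raises TypeError
try:
    values % 2
except TypeError as exc:
    print("TypeError:", exc)

# a plain Python list needs an explicit loop (or comprehension)
print([v % 2 for v in values])  # [1, 0, 1]
```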
How to map a function over numpy array
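Where numpy already provides the operation, use numpy's own functions (ufuncs such as np.sqrt and np.square). These act on every element of an array at once, so there is no need to map a Python function over it.
Example:
import numpy as np
x = np.array([3.0, 5.0])   # example inputs
y = np.array([4.0, 12.0])
np.sqrt(np.square(x) + np.square(y))  # element-wise distance from the origin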
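To make the contrast concrete, the following sketch (with made-up x and y) compares the whole-array expression with an explicit Python loop. np.hypot is numpy's single ufunc for this particular formula.

```python
import math

import numpy as np

# made-up example inputs
x = np.array([3.0, 5.0, 8.0])
y = np.array([4.0, 12.0, 15.0])

# whole-array expression: each ufunc processes every element in compiled code
r = np.sqrt(np.square(x) + np.square(y))

# the same values computed one element at a time in a Python loop
r_loop = np.array([math.sqrt(xi ** 2 + yi ** 2) for xi, yi in zip(x, y)])

# numpy also provides this formula as a single ufunc
r_hypot = np.hypot(x, y)

print(r, r_loop, r_hypot)
```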
Map function with array based on integers on 2D numpy array
There are a couple of ways of doing this. I would probably use the where and out arguments to np.subtract:
# start from a copy of a broadcast to b's shape, then write 1 - a wherever b is 0
np.subtract(1, a, out=np.broadcast_to(a, b.shape).copy(), where=(b == 0))
Going with @hpaulj's solution of using the 3-argument form of np.where is probably much cleaner in this case:
np.where(b, a, 1 - a)
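The sketch below shows what that expression computes. It assumes, as the np.where call implies, that b is a 2-D 0/1 array and that a has one value per column of b. The data is made up, and the explicit loop is there only to spell out the rule.

```python
import numpy as np

# made-up data: a has one value per column, b is a 0/1 mask
a = np.array([0.1, 0.7, 0.4])
b = np.array([[1, 0, 1],
              [0, 0, 1]])

# a is broadcast across the rows of b: take a[j] where b[i, j] is nonzero,
# otherwise 1 - a[j]
c = np.where(b, a, 1 - a)

# the same rule written as an explicit loop, for comparison
c_loop = np.empty(b.shape)
for i in range(b.shape[0]):
    for j in range(b.shape[1]):
        c_loop[i, j] = a[j] if b[i, j] else 1 - a[j]

print(c)
```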
Applying a function along a numpy array
numpy.apply_along_axis is not a good fit for this purpose: it calls the function on 1-D slices along an axis, not on individual elements.
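A quick way to see this is to record what the function receives. In this sketch, record and m are illustrative names:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

seen = []

def record(v):
    # apply_along_axis calls this once per 1-D slice, not once per element
    seen.append(v.shape)
    return v.sum()

# axis=1: each call receives one whole row of m
row_sums = np.apply_along_axis(record, 1, m)
print(row_sums)
print(seen)  # [(3,), (3,)]
```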
Try numpy.vectorize to vectorize your function: https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
It defines a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays as output. Note that np.vectorize is provided mainly for convenience, not performance: internally it is essentially a Python for loop.
import math
import numpy as np
# custom scalar function
def sigmoid(x):
    return 1 / (1 + math.exp(-x))
# define vectorized sigmoid
sigmoid_v = np.vectorize(sigmoid)
# test
scores = np.array([-0.54761371, 17.04850603, 4.86054302])
print(sigmoid_v(scores))
Output: [ 0.36641822 0.99999996 0.99231327]
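For comparison, here is a sketch of a sigmoid written directly with numpy ufuncs, checked against the np.vectorize version on the same scores. otypes=[float] is an optional np.vectorize parameter that fixes the output dtype instead of inferring it from the first call.

```python
import math

import numpy as np

def sigmoid(x):
    # scalar-only: math.exp cannot take a multi-element array
    return 1 / (1 + math.exp(-x))

# np.vectorize still calls sigmoid once per element in a Python loop
sigmoid_v = np.vectorize(sigmoid, otypes=[float])

def sigmoid_np(x):
    # built from ufuncs (np.exp, arithmetic), so it works on whole arrays
    return 1 / (1 + np.exp(-x))

scores = np.array([-0.54761371, 17.04850603, 4.86054302])
print(sigmoid_v(scores))
print(sigmoid_np(scores))
```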
A performance test shows that scipy.special.expit is the best way to compute the logistic function, and that the vectorized variant is the slowest:
import math
import timeit
import numpy as np
def sigmoid_(x):
    # scalar logistic function (math.exp only accepts scalars)
    return 1 / (1 + math.exp(-x))
sigmoidv = np.vectorize(sigmoid_)
def sigmoid(x):
    # logistic function built from numpy ufuncs, works on whole arrays
    return 1 / (1 + np.exp(-x))
# time 25 calls of each variant for several array sizes
for n in (100, 1000, 10000):
    setup = f"import numpy as np; scores = np.random.randn({n})"
    print(n,
          timeit.timeit("sigmoidv(scores)", "from __main__ import sigmoidv; " + setup, number=25),
          timeit.timeit("sigmoid(scores)", "from __main__ import sigmoid; " + setup, number=25),
          timeit.timeit("expit(scores)", "from scipy.special import expit; " + setup, number=25))
Results (total seconds for 25 calls each):
size      vectorized        numpy              expit
N=100     0.00179314613342  0.000460863113403  0.000132083892822
N=1000    0.0122890472412   0.00084114074707   0.000464916229248
N=10000   0.109477043152    0.00530695915222   0.00424313545227