numpy.vectorize of function that returns an array
In [237]: f1 = np.vectorize(f, signature='(),()->(n)')
In [238]: f1(np.arange(5),3)
Out[238]:
array([[ 3, -3, 0, 3],
[ 4, -2, 1, 3],
[ 5, -1, 2, 3],
[ 6, 0, 3, 3],
[ 7, 1, 4, 3]])
In [241]: f1(np.arange(5),np.ones((4,5))).shape
Out[241]: (4, 5, 4)
In [242]: f1(np.arange(5),np.ones((1,5))).shape
Out[242]: (1, 5, 4)
frompyfunc
returns a object dtype array:
In [336]: f2 = np.frompyfunc(f,2,1)
In [337]: f2(np.arange(5), 3)
Out[337]:
array([array([ 3, -3, 0, 3]), array([ 4, -2, 1, 3]),
array([ 5, -1, 2, 3]), array([6, 0, 3, 3]), array([7, 1, 4, 3])],
dtype=object)
In [338]: _.shape
Out[338]: (5,)
np.vectorize
, without the signature
, uses frompyfunc
, but adds its own dtype
conversion.
In [340]: f1(np.arange(5), np.arange(3))
ValueError: shape mismatch: objects cannot be broadcast to a single shape
This fails for the same reason the following addition fails:
In [341]: np.arange(5)+np.arange(3)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-341-fb1c4f4372da> in <module>
----> 1 np.arange(5)+np.arange(3)
ValueError: operands could not be broadcast together with shapes (5,) (3,)
To get a (5,3) result we need to make the first argument (5,1) shape:
In [342]: np.arange(5)[:,None]+np.arange(3)
Out[342]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
Vectorize a function with a condition
The statement np.where(x < 1.1, 1, np.arcsin(1 / x))
is equivalent to
mask = x < 1.1
a = 1
b = np.arcsin(1 / x)
np.where(mask, a, b)
Notice that you're calling np.arcsin
on all the elements of x
, regardless of whether 1 / x <= 1
or not. Your basic plan is correct. You can do the operations in-place on an output array using the where
keyword of np.arcsin
and np.reciprocal
, without having to recombine anything:
def myfx(x):
mask = (x >= 1.1)
out = np.ones(x.shape)
np.reciprocal(x, where=mask, out=out) # >= 1.1 implies != 0
return np.arcsin(out, where=mask, out=out)
Using np.ones
ensures that the unmasked elements of out
are initialized correctly. An equivalent method would be
out = np.empty(x.shape)
out[~mask] = 1
numpy vectorize a function to accepts vectors of different lengths and return the tensor result
You can chain with np.ix_
:
>>> import functools
>>>
>>> def tensorize(f):
... fv = np.vectorize(f)
... @functools.wraps(f)
... def ft(*args):
... return fv(*np.ix_(*map(np.ravel, args)))
... return ft
...
>>> tester = tensorize(tester)
>>> tester(np.arange(3), np.arange(2))
array([[0., 0.],
[0., 1.],
[0., 4.]])
How to use np.vectorize?
The problem is the vectorize
-call inside the function.
import numpy as np
# first define the function
def vector_function(x, y):
if y >= x:
return x * y
else:
return x / y
# vectorize it
vfunc = np.vectorize(vector_function)
# validation
print(vfunc([1, 2, 3, 4], 2.5)) # [2.5 5. 1.2 1.6]
Note, however, from numpy.vectorize: The vectorize
function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
Numpy vectorize function with non-scalar output
frompyfunc
might be better:
In [525]: def fun(x):
...: return x+.1, x+.2, x+.3
...:
I specify 1 input, 3 output values. It returns dtype object:
In [526]: np.frompyfunc(fun,1,3)(np.arange(5))
Out[526]:
(array([0.1, 1.1, 2.1, 3.1, 4.1], dtype=object),
array([0.2, 1.2, 2.2, 3.2, 4.2], dtype=object),
array([0.3, 1.3, 2.3, 3.3, 4.3], dtype=object))
That's a tuple of 3 arrays. They can be turned into one 2d array with stack
:
In [527]: np.stack(_, 1)
Out[527]:
array([[0.1, 0.2, 0.3],
[1.1, 1.2, 1.3],
[2.1, 2.2, 2.3],
[3.1, 3.2, 3.3],
[4.1, 4.2, 4.3]], dtype=object)
I could take it a further step with a astype(float)
.
I assume, of course, that this is a toy func. For something this simple there's no need to use vectorize
.
In [528]: fun(np.arange(5))
Out[528]:
(array([ 0.1, 1.1, 2.1, 3.1, 4.1]),
array([ 0.2, 1.2, 2.2, 3.2, 4.2]),
array([ 0.3, 1.3, 2.3, 3.3, 4.3]))
All that vectorize
needs is the otypes
parameter:
In [536]: np.vectorize(fun, otypes='ddd')(np.arange(5))
Out[536]:
(array([ 0.1, 1.1, 2.1, 3.1, 4.1]),
array([ 0.2, 1.2, 2.2, 3.2, 4.2]),
array([ 0.3, 1.3, 2.3, 3.3, 4.3]))
If the function returns an array instead of a tuple or list, we could use signature
:
In [546]: def fun(x):
...: return np.array([x+.1, x+.2, x+.3])
In [547]: np.vectorize(fun, signature='()->(n)')(np.arange(5))
Out[547]:
array([[ 0.1, 0.2, 0.3],
[ 1.1, 1.2, 1.3],
[ 2.1, 2.2, 2.3],
[ 3.1, 3.2, 3.3],
[ 4.1, 4.2, 4.3]])
Or with the original tuple/list case, wrap it in a lambda, np.vectorize(lambda x:np.array(fun(x)), signature='()->(n)')
Experience suggests that the frompyfunc
approach is fastest. vectorize
with otypes is a bit slower (but it uses frompyfunc
). signature
is newer method, using different code, and somewhat slower.
With your new func
, the signature
approach still works. I added excluded
so it doesn't try to broadcast
the n
argument:
In [553]: np.vectorize(lambda x,n:np.array(func(x,n)), signature='()->(n)',excluded=[1])(np.arange(5),3)
Out[553]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
In [554]: np.vectorize(lambda x,n:np.array(func(x,n)), signature='()->(n)',excluded=[1])(np.arange(5),7)
Out[554]:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 1, 2, 3, 4, 5, 6, 7],
[ 2, 3, 4, 5, 6, 7, 8],
[ 3, 4, 5, 6, 7, 8, 9],
[ 4, 5, 6, 7, 8, 9, 10]])
NumPy vectorization without the use of numpy.vectorize
"vectorization" can mean be different things depending on context. Use of low level C code with BLAS or SIMD is just one.
In physics 101, a vector represents a point or velocity whose numeric representation can vary with coordinate system. Thus I think of "vectorization", broadly speaking, as performing math operations on the "whole" array, without explicit control over numerical elements.
numpy
basically adds a ndarray
class to python. It has a large number of methods (and operators and ufunc
) that do indexing and math in compiled code (not necessarily using processor specific SIMD). The big gain in speed, relative to Python level iteration, is the use of compiled code optimized for the ndarray
data structure. Python level iteration (interpreted code) on arrays is actually slower than on equivalent lists.
I don't think numpy
formally defines "vectorization". There isn't a "vector" class. I haven't searched the documentation for those terms. Here, and possibly on other forums, it just means, writing code that makes optimal use of ndarray
methods. Generally that means avoiding python level iteration.
np.vectorize
is a tool for applying arrays to functions that only accept scalar inputs. It doesn't not compile or otherwise "look inside" that function. But it does accept and apply arguments in a fully broadcasted
sense, such as in:
In [162]: vfunc(np.arange(3)[:,None],np.arange(4))
Out[162]:
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 1, 4, 5]])
Speedwise np.vectorize
is slower than the equivalent list comprehension, at least for smaller sample cases. Recent testing shows that it scales better, so for large inputs it may be better. But still the performance is nothing like your myfunc_2
.
myfunc
is not "vectorized" simply because expressions like if a > b
do not work with arrays.
np.where(a > b, a - b, a + b)
is "vectorized" because all arguments to the where
work with arrays, and where
itself uses them with full broadcasting
powers.
In [163]: a,b = np.arange(3)[:,None], np.arange(4)
In [164]: np.where(a>b, a-b, a+b)
Out[164]:
array([[0, 1, 2, 3],
[1, 2, 3, 4],
[2, 1, 4, 5]])
myfunc_2
is "vectorized", at least in a
:
In [168]: myfunc_2(a,2)
Out[168]:
array([[4],
[1],
[2]])
It does not work when b
is array; it's trickier to match the a[cond]
shape with anything but a scalar:
In [169]: myfunc_2(a,b)
Traceback (most recent call last):
Input In [169] in <cell line: 1>
myfunc_2(a,b)
Input In [159] in myfunc_2
a[cond] -= b
IndexError: boolean index did not match indexed array along dimension 1; dimension is 1 but corresponding boolean dimension is 4
===
- What are the requirements to say a function is "vectorized" if not converted via numpy.vectorize? Just broadcastable in its entirety?
In your examples, my_func
is not "vectorized" because it only works with scalars. vfunc
is full "vectorized", but not faster. where
is also "vectorized" and (probably) faster, though this may be scale dependent. my_func2
is only "vectorized" in a
.
- How does NumPy determine if a function is "vectorized"/broadcastable?
numpy
doesn't determine anything like this. numpy
is a ndarray
class with many methods. It's just the use of those methods that makes a block of code "vectorized".
- Why isn't this form of vectorization documented anywhere? (i.e., why doesn't NumPy have a "How to write a vectorized function" page?
Keep in mind the distinction between "vectorization" as a performance strategy, and the basic idea of operating on whole arrays.
Truly vectorize function for numpy array in python
here 3 posible solutions, the last one, is a little caothic but uses a vectorization to calculate n
q vs one. Also is the fastests
from itertools import combinations
import numpy as np
import numpy.linalg as lin
def coulomb(q_i, q_j, c1=0.8, c2=0.2, k=0.5):
"""
Parameters
----------
q_i : numpy.ndarray
q_j : numpy.ndarray
c1 : float
c2 : float
k : float
Returns
-------
numpy.ndarray
"""
q_ij = q_i - q_j
q_ij_norm = lin.norm(q_ij)
f_q_i = (k * c1 * c2 / q_ij_norm ** 3) * q_ij
return f_q_i
def coulomb2(q_i, q_j, c1=0.8, c2=0.2, k=0.5):
"""
Parameters
----------
q_i : numpy.ndarray
q_j : numpy.ndarray
c1 : float
c2 : float
k : float
Returns
-------
numpy.ndarray
"""
q_ij = q_i - q_j
q_ij_norm = lin.norm(q_ij,axis=1).reshape(-1,1)
f_q_i = (k * c1 * c2 / q_ij_norm ** 3) * q_ij
return f_q_i
q = np.random.randn(500,10)
from itertools import combinations
from time import time
t1= time()
v = []
for i in range(q.shape[0]):
for j in range(i+1,q.shape[0]):
v.append([coulomb(q[i], q[j])])
t2= time()
combs = combinations(range(len(q)), 2)
vv =[]
for i,j in combs:
vv.append([coulomb(q[i], q[j])])
t3 = time()
vvv = []
for i in range(q.shape[0]):
vvv += list(coulomb2(q[i], q[i+1:]))
t4 = time()
print(t2-t1)
print(t3-t2)
print(t4-t3)
#0.9133327007293701
#1.0843684673309326
#0.04461050033569336
``
Related Topics
How to Make the Width of the Title Box Span the Entire Plot
How to Check If an Object Is a List or Tuple (But Not String)
Pandas: Resample Timeseries with Groupby
Format String Unused Named Arguments
Split String Based on a Regular Expression
Python: How to Remove Empty Lists from a List
How to Install Xgboost Package in Python (Windows Platform)
How to Fix Character Constantly Accelerating in Both Directions After Deceleration Pygame
How to Install the Yaml Package for Python
List All Base Classes in a Hierarchy of Given Class
Python 2.X Gotchas and Landmines
Flask-Sqlalchemy Update a Row's Information
How Would I Make a Method Which Is Run Every Time a Frame Is Shown in Tkinter