Understanding tensordot
The idea behind tensordot is pretty simple: we input the arrays and the respective axes along which the sum-reductions are intended. The axes that take part in the sum-reduction are removed from the output, and all remaining axes from the input arrays are spread out as separate axes in the output, preserving the order in which the input arrays were fed.
Let's look at a few sample cases with one and two axes of sum-reduction, and also swap the input places to see how the order is kept in the output.
I. One axis of sum-reduction
Inputs :
In [7]: A = np.random.randint(2, size=(2, 6, 5))
...: B = np.random.randint(2, size=(3, 2, 4))
...:
Case #1:
In [9]: np.tensordot(A, B, axes=((0),(1))).shape
Out[9]: (6, 5, 3, 4)
A : (2, 6, 5) -> reduction of axis=0
B : (3, 2, 4) -> reduction of axis=1
Output : `(2, 6, 5)`, `(3, 2, 4)` ===(2 gone)==> `(6,5)` + `(3,4)` => `(6,5,3,4)`
Case #2 (same as case #1 but the inputs are fed swapped):
In [8]: np.tensordot(B, A, axes=((1),(0))).shape
Out[8]: (3, 4, 6, 5)
B : (3, 2, 4) -> reduction of axis=1
A : (2, 6, 5) -> reduction of axis=0
Output : `(3, 2, 4)`, `(2, 6, 5)` ===(2 gone)==> `(3,4)` + `(6,5)` => `(3,4,6,5)`.
II. Two axes of sum-reduction
Inputs :
In [11]: A = np.random.randint(2, size=(2, 3, 5))
...: B = np.random.randint(2, size=(3, 2, 4))
...:
Case #1:
In [12]: np.tensordot(A, B, axes=((0,1),(1,0))).shape
Out[12]: (5, 4)
A : (2, 3, 5) -> reduction of axis=(0,1)
B : (3, 2, 4) -> reduction of axis=(1,0)
Output : `(2, 3, 5)`, `(3, 2, 4)` ===(2,3 gone)==> `(5)` + `(4)` => `(5,4)`
Case #2:
In [14]: np.tensordot(B, A, axes=((1,0),(0,1))).shape
Out[14]: (4, 5)
B : (3, 2, 4) -> reduction of axis=(1,0)
A : (2, 3, 5) -> reduction of axis=(0,1)
Output : `(3, 2, 4)`, `(2, 3, 5)` ===(2,3 gone)==> `(4)` + `(5)` => `(4,5)`
We can extend this to as many axes as possible.
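The same rule extends beyond two axes. A sketch with three sum-reduction axes (the shapes here are chosen arbitrarily for illustration):

```python
import numpy as np

A = np.random.randint(2, size=(2, 3, 4, 5))
B = np.random.randint(2, size=(4, 3, 2, 6))

# Contract A's axes (0, 1, 2) (sizes 2, 3, 4) against B's axes (2, 1, 0) (same sizes)
out = np.tensordot(A, B, axes=((0, 1, 2), (2, 1, 0)))
print(out.shape)  # (5, 6): only the uncontracted axes survive, A's first
```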
How does the numpy.tensordot function work, step by step?
Edit: The initial focus of this answer was on the case where axes is a tuple, specifying one or more axes for each argument. This usage lets us perform variations on the conventional dot product, especially for arrays larger than 2d (see also my answer in the linked question, https://stackoverflow.com/a/41870980/901925). axes as a scalar is a special case that gets translated into the tuple version, so at its core it is still a dot product.
axes as tuple
In [235]: a=[1,1]; b=[2,2]
`a` and `b` are lists; `tensordot` turns them into arrays.
In [236]: np.tensordot(a,b,(0,0))
Out[236]: array(4)
Since they are both 1d arrays, we specify the axis values as 0.
If we try to specify 1:
In [237]: np.tensordot(a,b,(0,1))
---------------------------------------------------------------------------
1282 else:
1283 for k in range(na):
-> 1284 if as_[axes_a[k]] != bs[axes_b[k]]:
1285 equal = False
1286 break
IndexError: tuple index out of range
It is checking whether the size of axis 0 of `a` matches the size of axis 1 of `b`. But since `b` is 1d, it can't check that:
In [239]: np.array(a).shape[0]
Out[239]: 2
In [240]: np.array(b).shape[1]
IndexError: tuple index out of range
Your second example is 2d arrays:
In [242]: a=np.array([[1,1],[1,1]]); b=np.array([[2,2],[2,2]])
Specifying the last axis of `a` and the first axis of `b` (its second-to-last) produces the conventional matrix (dot) product:
In [243]: np.tensordot(a,b,(1,0))
Out[243]:
array([[4, 4],
[4, 4]])
In [244]: a.dot(b)
Out[244]:
array([[4, 4],
[4, 4]])
Better diagnostic values:
In [250]: a=np.array([[1,2],[3,4]]); b=np.array([[2,3],[2,1]])
In [251]: np.tensordot(a,b,(1,0))
Out[251]:
array([[ 6, 5],
[14, 13]])
In [252]: np.dot(a,b)
Out[252]:
array([[ 6, 5],
[14, 13]])
In [253]: np.tensordot(a,b,(0,1))
Out[253]:
array([[11, 5],
[16, 8]])
In [254]: np.dot(b,a) # same numbers, different layout
Out[254]:
array([[11, 16],
[ 5, 8]])
In [255]: np.dot(b,a).T
Out[255]:
array([[11, 5],
[16, 8]])
Another pairing:
In [256]: np.tensordot(a,b,(0,0))
In [257]: np.dot(a.T,b)
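The outputs of those last two lines weren't shown; with the same `a` and `b` as In [250] they match:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 3], [2, 1]])

# Contracting axis 0 of a with axis 0 of b is the same as a.T @ b
print(np.tensordot(a, b, (0, 0)))
print(a.T.dot(b))  # both print [[ 8  6] [12 10]]
```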
(0,1,2) for axes is plain wrong. The axes parameter should be 2 numbers, or 2 tuples, one for each of the 2 arguments.
The basic processing in `tensordot` is to transpose and reshape the inputs so it can then pass the results to `np.dot` for a conventional (last of a, second-to-last of b) matrix product.
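Here is a minimal sketch of that reduction for the Case #1 arrays from section I (the exact transpose orders are my reading of the code, not a quote from it):

```python
import numpy as np

A = np.random.randint(2, size=(2, 6, 5))
B = np.random.randint(2, size=(3, 2, 4))

# np.tensordot(A, B, axes=((0,), (1,))) internally does roughly:
At = A.transpose(1, 2, 0).reshape(30, 2)  # kept axes first, summed axis last
Bt = B.transpose(1, 0, 2).reshape(2, 12)  # summed axis first, kept axes last
res = At.dot(Bt).reshape(6, 5, 3, 4)      # plain 2d dot, then restore the shape

print(np.allclose(res, np.tensordot(A, B, axes=((0,), (1,)))))  # True
```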
axes as scalar
If my reading of the `tensordot` code is right, the `axes` parameter is converted into two lists with:
def foo(axes):
    try:
        iter(axes)
    except Exception:
        axes_a = list(range(-axes, 0))
        axes_b = list(range(0, axes))
    else:
        axes_a, axes_b = axes
    try:
        na = len(axes_a)
        axes_a = list(axes_a)
    except TypeError:
        axes_a = [axes_a]
        na = 1
    try:
        nb = len(axes_b)
        axes_b = list(axes_b)
    except TypeError:
        axes_b = [axes_b]
        nb = 1
    return axes_a, axes_b
For scalar values, 0,1,2 the results are:
In [281]: foo(0)
Out[281]: ([], [])
In [282]: foo(1)
Out[282]: ([-1], [0])
In [283]: foo(2)
Out[283]: ([-2, -1], [0, 1])
`axes=1` is the same as specifying it in a tuple:
In [284]: foo((-1,0))
Out[284]: ([-1], [0])
And for 2:
In [285]: foo(((-2,-1),(0,1)))
Out[285]: ([-2, -1], [0, 1])
With my latest example, `axes=2` is the same as specifying a `dot` over all axes of the 2 arrays:
In [287]: np.tensordot(a,b,axes=2)
Out[287]: array(18)
In [288]: np.tensordot(a,b,axes=((0,1),(0,1)))
Out[288]: array(18)
This is the same as doing `dot` on the flattened, 1d, views of the arrays:
In [289]: np.dot(a.ravel(), b.ravel())
Out[289]: 18
I already demonstrated the conventional dot product for these arrays, the `axes=1` case.
`axes=0` is the same as `axes=((),())`, i.e. no summation axes for the 2 arrays:
In [292]: foo(((),()))
Out[292]: ([], [])
`np.tensordot(a,b,((),()))` is the same as `np.tensordot(a,b,axes=0)`.
It's the `-2` in the `foo(2)` translation that's giving you problems when the input arrays are 1d. `axes=1` is the 'contraction' for 1d arrays. In other words, don't take the word descriptions in the documentation too literally. They just attempt to describe the action of the code; they aren't a formal specification.
einsum equivalents
I think the axes specifications for `einsum` are clearer and more powerful. Here are the equivalents for 0, 1, 2:
In [295]: np.einsum('ij,kl',a,b)
Out[295]:
array([[[[ 2, 3],
[ 2, 1]],
[[ 4, 6],
[ 4, 2]]],
[[[ 6, 9],
[ 6, 3]],
[[ 8, 12],
[ 8, 4]]]])
In [296]: np.einsum('ij,jk',a,b)
Out[296]:
array([[ 6, 5],
[14, 13]])
In [297]: np.einsum('ij,ij',a,b)
Out[297]: 18
The axes=0 case is equivalent to:
np.dot(a[:,:,None],b[:,None,:])
It adds a new last axis and new 2nd to last axis, and does a conventional dot product summing over those. But we usually do this sort of 'outer' multiplication with broadcasting:
a[:,:,None,None]*b[None,None,:,:]
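A quick check (with the `a` and `b` from In [250]) that `axes=0`, the `dot` trick, and plain broadcasting all produce the same 'outer' result:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 3], [2, 1]])

outer1 = np.tensordot(a, b, axes=0)
outer2 = np.dot(a[:, :, None], b[:, None, :])       # dot over the new size-1 axes
outer3 = a[:, :, None, None] * b[None, None, :, :]  # broadcasting

print(outer1.shape, np.array_equal(outer1, outer2), np.array_equal(outer1, outer3))
# (2, 2, 2, 2) True True
```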
While the use of 0,1,2 for axes is interesting, it really doesn't add new calculation power. The tuple form of axes is more powerful and useful.
code summary (big steps)
1 - translate `axes` into `axes_a` and `axes_b`, as excerpted in the above `foo` function
2 - make `a` and `b` into arrays, and get the shape and ndim
3 - check for matching size on axes that will be summed (contracted)
4 - construct a `newshape_a` and `newaxes_a`; same for `b` (complex step)
5 - `at = a.transpose(newaxes_a).reshape(newshape_a)`; same for `b`
6 - `res = dot(at, bt)`
7 - reshape the `res` to the desired return shape
5 and 6 are the calculation core. 4 is conceptually the most complex step. For all `axes` values the calculation is the same, a `dot` product, but the setup varies.
beyond 0,1,2
While the documentation only mentions 0, 1, 2 for scalar axes, the code isn't restricted to those values:
In [331]: foo(3)
Out[331]: ([-3, -2, -1], [0, 1, 2])
If the inputs are 3d, axes=3 should work:
In [330]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=3)
Out[330]: array(8.)
or more generally:
In [325]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=0).shape
Out[325]: (2, 2, 2, 2, 2, 2)
In [326]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=1).shape
Out[326]: (2, 2, 2, 2)
In [327]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=2).shape
Out[327]: (2, 2)
In [328]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=3).shape
Out[328]: ()
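The pattern above suggests a general rule for scalar `axes=n`: contract the last `n` axes of the first array with the first `n` axes of the second, so the output shape is `a.shape[:a.ndim-n] + b.shape[n:]`. A quick check of that reading:

```python
import numpy as np

a = np.ones((2, 2, 2))
b = np.ones((2, 2, 2))
for n in range(4):
    out = np.tensordot(a, b, axes=n)
    expected = a.shape[:a.ndim - n] + b.shape[n:]
    print(n, out.shape, out.shape == expected)
```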
and if the inputs are 0d, axes=0 works (axes = 1 does not):
In [335]: np.tensordot(2,3, axes=0)
Out[335]: array(6)
Can you explain this?
In [363]: np.tensordot(np.ones((4,2,3)),np.ones((2,3,4)),axes=2).shape
Out[363]: (4, 4)
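An explanation of that last result: with `axes=2` the last two axes of the first array (sizes 2, 3) are contracted with the first two of the second (also 2, 3), leaving the leading 4 of each. A check of that reading via `einsum`:

```python
import numpy as np

a = np.ones((4, 2, 3))
b = np.ones((2, 3, 4))

# axes=2 pairs a's axes (-2, -1) with b's axes (0, 1)
r1 = np.tensordot(a, b, axes=2)
r2 = np.einsum('ijk,jkl->il', a, b)  # the same contraction, spelled out
print(r1.shape, np.allclose(r1, r2))  # (4, 4) True
```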
I've played around with other scalar axes values for 3d arrays. While it is possible to come up with pairs of shapes that work, the more explicit tuple axes values are easier to work with. The 0, 1, 2 options are shortcuts that only work for special cases. The tuple approach is much easier to use - though I still prefer the `einsum` notation.
understanding numpy np.tensordot
You forgot to show the arrays:
In [87]: arr1
Out[87]:
array([[0, 1],
[2, 3],
[4, 5],
[6, 7]])
In [88]: arr2
Out[88]:
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [89]: ans
Out[89]:
array([[ 8, 9, 10, 11],
[ 32, 37, 42, 47],
[ 56, 65, 74, 83],
[ 80, 93, 106, 119]])
In [90]: ans2
Out[90]:
array([[ 76, 124],
[ 98, 162]])
In [91]: ans3
Out[91]: array(238)
`ans` is just the regular dot, matrix product:
In [92]: np.dot(arr1,arr2)
Out[92]:
array([[ 8, 9, 10, 11],
[ 32, 37, 42, 47],
[ 56, 65, 74, 83],
[ 80, 93, 106, 119]])
The `dot` sum-of-products is performed on `([1],[0])`: axis 1 of `arr1` and axis 0 of `arr2` (the conventional across-the-columns, down-the-rows). With 2d arrays the 'sum across ...' phrasing can be confusing. It's clearer when dealing with 1d or 3d arrays. Here the matching size-2 dimensions are summed, leaving the (4,4).
`ans2` reverses them, summing on the 4's, producing a (2,2):
In [94]: np.dot(arr2,arr1)
Out[94]:
array([[ 76, 98],
[124, 162]])
`tensordot` has just transposed the 2 arrays and performed a regular `dot`:
In [95]: np.dot(arr1.T,arr2.T)
Out[95]:
array([[ 76, 124],
[ 98, 162]])
`ans3` uses a transpose and reshape (`ravel`) to sum over both axes:
In [98]: np.dot(arr1.ravel(),arr2.T.ravel())
Out[98]: 238
In general, `tensordot` uses a mix of transpose and reshape to reduce the problem to a 2d `np.dot` problem. It may then reshape and transpose the result.
I find the dimensions control of `einsum` to be clearer:
In [99]: np.einsum('ij,jk->ik',arr1,arr2)
Out[99]:
array([[ 8, 9, 10, 11],
[ 32, 37, 42, 47],
[ 56, 65, 74, 83],
[ 80, 93, 106, 119]])
In [100]: np.einsum('ji,kj->ik',arr1,arr2)
Out[100]:
array([[ 76, 124],
[ 98, 162]])
In [101]: np.einsum('ij,ji',arr1,arr2)
Out[101]: 238
With the development of `einsum` and `matmul`/`@`, `tensordot` has become less necessary. It's harder to understand, and doesn't have any speed or flexibility advantages. Don't worry about understanding it.
`ans3` is the trace (sum of the diagonal) of the other 2 answers:
In [103]: np.trace(ans)
Out[103]: 238
In [104]: np.trace(ans2)
Out[104]: 238
How does the numpy.tensordot command work, and what is the meaning of summing over an axis in this command?
In [432]: a=np.array([[1,2],[3,4]]); b=np.array([[0,5],[-1,20]])
In [433]: np.tensordot(a,b,axes=(1,0))
Out[433]:
array([[-2, 45],
[-4, 95]])
The (1,0) means axis 1 of `a` and axis 0 of `b` are the sum-of-products axes. That's just the normal `np.dot` pairing:
In [434]: np.dot(a,b)
Out[434]:
array([[-2, 45],
[-4, 95]])
I find the `einsum` notation to be clearer:
In [435]: np.einsum('ij,jk->ik',a,b)
Out[435]:
array([[-2, 45],
[-4, 95]])
In any case this is the matrix product we learned in school - run your finger across the rows of `a`, and then down the columns of `b`:
[[1*0+2*-1, 1*5+2*20], ...]
Yet another expression - expanding from the `einsum` one:
In [440]: (a[:,:,None]*b[None,:,:]).sum(axis=1)
Out[440]:
array([[-2, 45],
[-4, 95]])
`tensordot` reshapes and transposes axes, aiming to reduce the problem to a simple call to `np.dot`. It then reshapes/transposes back as needed. The details depend on the `axes` parameters. In your case no reshaping is needed, since your specification matches the default `dot` action.
A tuple axes parameter is relatively easy to explain. There is also a scalar axes case (0, 1, 2, etc.) that's a bit trickier. I've explored that in another post.
what is the difference between matrix multiplication methods and functions in tensorflow?
Let us understand this with the example below; I have taken two matrices `a` and `b` to perform these functions:
import tensorflow as tf
a = tf.constant([[1, 2],
[3, 4]])
b = tf.constant([[1, 1],
[1, 1]]) # or `tf.ones([2,2])`
`tf.matmul(a,b)` and `(a @ b)` - both perform matrix multiplication:
print(tf.matmul(a, b), "\n") # matrix - multiplication
Output:
tf.Tensor(
[[3 3]
[7 7]], shape=(2, 2), dtype=int32)
You can see the same output here as well for same matrix:
print(a @ b, "\n") # @ used as matrix_multiplication operator
Output:
tf.Tensor(
[[3 3]
[7 7]], shape=(2, 2), dtype=int32)
`tf.tensordot()` - Tensordot (also known as tensor contraction) sums the product of elements from `a` and `b` over the indices specified by `axes`.
If we take `axes=0` (scalar, no axes):
print(tf.tensordot(a, b, axes=0), "\n")
# Each element (scalar) of the first matrix multiplies every element of the second matrix; the output keeps a separate block for each element's multiplication.
Output:
tf.Tensor(
[[[[1 1]
[1 1]]
[[2 2]
[2 2]]]
[[[3 3]
[3 3]]
[[4 4]
[4 4]]]], shape=(2, 2, 2, 2), dtype=int32)
If we change to `axes=1`:
print(tf.tensordot(a, b, axes=1), "\n")
# performs matrix-multiplication
Output:
tf.Tensor(
[[3 3]
[7 7]], shape=(2, 2), dtype=int32)
And for `axes=2`:
print(tf.tensordot(a, b, axes=2), "\n")
# performs element-wise multiplication, then sums the result into a scalar
Output:
tf.Tensor(10, shape=(), dtype=int32)
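Since `tf.tensordot` follows the same `axes` convention as `np.tensordot`, the three scalar cases above can be cross-checked in plain NumPy with the same matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 1], [1, 1]])

print(np.tensordot(a, b, axes=0).shape)  # (2, 2, 2, 2), the 'outer' result
print(np.tensordot(a, b, axes=1))        # matrix product: [[3 3] [7 7]]
print(np.tensordot(a, b, axes=2))        # sum of element-wise products: 10
```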
You can explore more about tf.tensordot() and the basics of axes in the given links.
How to select numpy tensordot axes
The problem is likely much simpler than you are making it. If you apply np.tensordot
to a pair of arrays of shape (w, h, 2)
along the last axis, you will get a result of shape (w, h, w, h)
. This is not what you want. There are three simple approaches here. In addition to showing the options, I've shown a few tips and tricks for making the code simpler without changing any of the basic functionality:
1 - Do the sum-reduction manually (using `+` and `*`):

def average_angular_error(estimated_oc: np.ndarray, target_oc: np.ndarray):
    # If you want to do in-place normalization, do x /= ... instead of x = x / ...
    estimated_oc = estimated_oc / np.linalg.norm(estimated_oc, axis=-1, keepdims=True)
    target_oc = target_oc / np.linalg.norm(target_oc, axis=-1, keepdims=True)
    # Use plain element-wise multiplication
    dots = np.sum(estimated_oc * target_oc, axis=-1)
    return np.arccos(dots).mean()

2 - Use `np.matmul` (a.k.a. `@`) with properly broadcast dimensions:

def average_angular_error(estimated_oc: np.ndarray, target_oc: np.ndarray):
    estimated_oc = estimated_oc / np.linalg.norm(estimated_oc, axis=-1, keepdims=True)
    target_oc = target_oc / np.linalg.norm(target_oc, axis=-1, keepdims=True)
    # Matrix multiplication needs two dimensions to operate on
    dots = estimated_oc[..., None, :] @ target_oc[..., :, None]
    return np.arccos(dots).mean()

`np.matmul` and `np.dot` both require the last dimension of the first array to match the second-to-last of the second, as with normal matrix multiplication. `None` is an alias for `np.newaxis`, which introduces a new axis of size 1 at the location of your choice. In this case, I made the first array `(w, h, 1, 2)` and the second `(w, h, 2, 1)`. That ensures that the last two dimensions are multiplied as a transposed vector and a regular vector at every corresponding element.

3 - Use `np.einsum`:

def average_angular_error(estimated_oc: np.ndarray, target_oc: np.ndarray):
    estimated_oc = estimated_oc / np.linalg.norm(estimated_oc, axis=-1, keepdims=True)
    target_oc = target_oc / np.linalg.norm(target_oc, axis=-1, keepdims=True)
    # Sum over the last axis at each (i, j) position
    dots = np.einsum('ijk,ijk->ij', estimated_oc, target_oc)
    return np.arccos(dots).mean()
You can't use `np.dot` or `np.tensordot` for this. `dot` and `tensordot` keep the untouched dimensions of both arrays, as explained earlier. `matmul` broadcasts them together, which is what you want.
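A consistency check of the three approaches on random unit vectors (a sketch; the array shapes and the 'ijk,ijk->ij' subscripts are my assumptions for (w, h, 2) inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 2))
y = rng.normal(size=(4, 5, 2))
x /= np.linalg.norm(x, axis=-1, keepdims=True)
y /= np.linalg.norm(y, axis=-1, keepdims=True)

d1 = np.sum(x * y, axis=-1)                          # manual sum-reduction
d2 = (x[..., None, :] @ y[..., :, None])[..., 0, 0]  # matmul over size-2 vectors
d3 = np.einsum('ijk,ijk->ij', x, y)                  # einsum over the last axis
print(np.allclose(d1, d2), np.allclose(d1, d3))  # True True
```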
What is the difference between tensordot and einsum in numpy?
They are different approaches to similar problems. `einsum` is more general. Speed can be similar, though you need to check individual cases.
`tensordot` works by reshaping and transposing axes, reducing the problem to one that `np.dot` can solve. Its code, up to the `dot` call, is Python, so you can read it for yourself.
`einsum` is built from the ground up to work with the 'Einstein notation' used in physics (it was written by a scientist to fit his own needs and usage). The documentation covers that. It is C code, so it is a little harder to study. Basically it parses the indexing string and builds an `nditer` object that will iterate over the input arrays, performing some sort of sum-of-products calculation. It can take shortcuts in cases where you just want indexing, the diagonal, etc.
There have been a number of questions asking about either of these functions, or suggesting their use in the answers.
In newer versions there is also `np.matmul`, which generalizes `dot` in a different way. It is linked to the new `@` operator in Python 3.5.
NumPy tensordot grouped calculation
You can use np.diag(np.tensordot(a, b, axes=((1, 2), (1, 2)))) to get the result you want. However, using `np.tensordot` or a matrix multiplication is not a good idea in your case, as they do much more work than needed. The fact that they are efficiently implemented does not balance the fact that they compute much more than necessary (only the diagonal is useful here).
np.einsum('ijk,ijk->i',a,b) does not compute more than needed in your case. You can try `optimize=True` or even `optimize='optimal'`, since the `optimize` parameter is set to `False` by default.
If this is not fast enough, you can try NumExpr to compute np.sum(a*b,axis=(1, 2)) more efficiently (probably in parallel). Alternatively, you can use Numba or Cython too. Both support fast parallel loops.
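A quick correctness check that these expressions agree (the array shapes here are illustrative assumptions):

```python
import numpy as np

a = np.random.rand(6, 3, 4)
b = np.random.rand(6, 3, 4)

r1 = np.diag(np.tensordot(a, b, axes=((1, 2), (1, 2))))  # computes a full 6x6, keeps the diagonal
r2 = np.einsum('ijk,ijk->i', a, b)                        # computes only the 6 needed values
r3 = np.sum(a * b, axis=(1, 2))                           # the NumExpr-friendly form
print(np.allclose(r1, r2), np.allclose(r2, r3))  # True True
```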