Python 3: Multiply a vector by a matrix without NumPy
The Numpythonic approach: (using numpy.dot
in order to get the dot product of two matrices)
In [1]: import numpy as np
In [3]: np.dot([1,0,0,1,0,0], [[0,1],[1,1],[1,0],[1,0],[1,1],[0,1]])
Out[3]: array([1, 1])
The Pythonic approach:
The length of your second for
loop is len(v)
and you attempt to indexing v
based on that so you got index Error . As a more pythonic way you can use zip
function to get the columns of a list then use starmap
and mul
within a list comprehension:
In [13]: first,second=[1,0,0,1,0,0], [[0,1],[1,1],[1,0],[1,0],[1,1],[0,1]]
In [14]: from itertools import starmap
In [15]: from operator import mul
In [16]: [sum(starmap(mul, zip(first, col))) for col in zip(*second)]
Out[16]: [1, 1]
matrix multiplication with scalar without numpy
Since the OP says that he is not allowed to use imports, the following alternative has its own implementation of deepcopy:
class Matrix:
def __init__(self, rows):
self.rows = rows[:]
def deepcopy(self):
rows = [[elem for elem in row]
for row in self.rows
]
return Matrix(rows)
def scale(self, w):
copy = self.deepcopy()
copyr = copy.rows
for i in range(len(copyr)):
for j in range(len(copyr)):
copyr[i][j] = copyr[i][j] * w
return copyr
c = Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(c.scale(10))
print(c.scale(10))
3d Matrix multiplication in numpy
For array of dimension 3 (or, a rank-3 tensor) that you have, you can use np.einsum
doc for more complex matrix multiplications. In your particular case, you can use the following
>>> import numpy as np
>>> x = np.random.randint(0, 3, (3, 3, 3)) # shape (3, 3, 3)
>>> y = np.random.randint(0, 3, (3, 3, 3)) # shape (3, 3, 3)
>>> np.einsum('ijk,ikl->ijl', x, y) # still shape (3, 3, 3)
In particular, the einsum
expression 'ijk,ikl->ijl'
means for each i
th matrix, do a regular matrix multiplication jk,kl->jl
and put the result in the i
th entry in the resulting tensor (ndarray). A more general form of this process could be
np.einsum('...jk,...kl->...jl', x, y)
where you can have arbitrary number of dimensions in front of each tensor (ndarray) that you have.
See the following for a complete example:
>>> import numpy as np
>>> x = np.random.randint(0, 3, (3, 3, 3)) # shape (3, 3, 3)
>>> x
array([[[0, 0, 1],
[2, 2, 1],
[2, 1, 1]],
[[2, 0, 2],
[2, 2, 1],
[2, 2, 2]],
[[2, 2, 2],
[1, 1, 2],
[0, 2, 2]]])
>>> y = np.random.randint(0, 3, (3, 3, 3)) # shape (3, 3, 3)
>>> y
array([[[0, 0, 1],
[2, 1, 0],
[0, 0, 2]],
[[1, 2, 0],
[2, 0, 1],
[2, 2, 1]],
[[0, 2, 1],
[0, 1, 0],
[0, 2, 1]]])
>>> np.einsum('ijk,ikl->ijl', x, y)
array([[[ 0, 0, 2],
[ 4, 2, 4],
[ 2, 1, 4]],
[[ 6, 8, 2],
[ 8, 6, 3],
[10, 8, 4]],
[[ 0, 10, 4],
[ 0, 7, 3],
[ 0, 6, 2]]])
>>> np.einsum('...ij,...jk->...ik', x, y)
array([[[ 0, 0, 2],
[ 4, 2, 4],
[ 2, 1, 4]],
[[ 6, 8, 2],
[ 8, 6, 3],
[10, 8, 4]],
[[ 0, 10, 4],
[ 0, 7, 3],
[ 0, 6, 2]]])
Why is matrix multiplication faster with numpy than with ctypes in Python?
I'm not too familiar with Numpy, but the source is on Github. Part of dot products are implemented in https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/arraytypes.c.src, which I'm assuming is translated into specific C implementations for each datatype. For example:
/**begin repeat
*
* #name = BYTE, UBYTE, SHORT, USHORT, INT, UINT,
* LONG, ULONG, LONGLONG, ULONGLONG,
* FLOAT, DOUBLE, LONGDOUBLE,
* DATETIME, TIMEDELTA#
* #type = npy_byte, npy_ubyte, npy_short, npy_ushort, npy_int, npy_uint,
* npy_long, npy_ulong, npy_longlong, npy_ulonglong,
* npy_float, npy_double, npy_longdouble,
* npy_datetime, npy_timedelta#
* #out = npy_long, npy_ulong, npy_long, npy_ulong, npy_long, npy_ulong,
* npy_long, npy_ulong, npy_longlong, npy_ulonglong,
* npy_float, npy_double, npy_longdouble,
* npy_datetime, npy_timedelta#
*/
static void
@name@_dot(char *ip1, npy_intp is1, char *ip2, npy_intp is2, char *op, npy_intp n,
void *NPY_UNUSED(ignore))
{
@out@ tmp = (@out@)0;
npy_intp i;
for (i = 0; i < n; i++, ip1 += is1, ip2 += is2) {
tmp += (@out@)(*((@type@ *)ip1)) *
(@out@)(*((@type@ *)ip2));
}
*((@type@ *)op) = (@type@) tmp;
}
/**end repeat**/
This appears to compute one-dimensional dot products, i.e. on vectors. In my few minutes of Github browsing I was unable to find the source for matrices, but it's possible that it uses one call to FLOAT_dot
for each element in the result matrix. That means the loop in this function corresponds to your inner-most loop.
One difference between them is that the "stride" -- the difference between successive elements in the inputs -- is explicitly computed once before calling the function. In your case there is no stride, and the offset of each input is computed each time, e.g. a[i * n + k]
. I would have expected a good compiler to optimise that away to something similar to the Numpy stride, but perhaps it can't prove that the step is a constant (or it's not being optimised).
Numpy may also be doing something smart with cache effects in the higher-level code that calls this function. A common trick is to think about whether each row is contiguous, or each column -- and try to iterate over each contiguous part first. It seems difficult to be perfectly optimal, for each dot product, one input matrix must be traversed by rows and the other by columns (unless they happened to be stored in different major order). But it can at least do that for the result elements.
Numpy also contains code to choose the implementation of certain operations, including "dot", from different basic implementations. For instance it can use a BLAS library. From discussion above it sounds like CBLAS is used. This was translated from Fortran into C. I think the implementation used in your test would be the one found in here: http://www.netlib.org/clapack/cblas/sdot.c.
Note that this program was written by a machine for another machine to read. But you can see at the bottom that it's using an unrolled loop to process 5 elements at a time:
for (i = mp1; i <= *n; i += 5) {
stemp = stemp + SX(i) * SY(i) + SX(i + 1) * SY(i + 1) + SX(i + 2) *
SY(i + 2) + SX(i + 3) * SY(i + 3) + SX(i + 4) * SY(i + 4);
}
This unrolling factor is likely to have been picked after profiling several. But one theoretical advantage of it is that more arithmetical operations are done between each branch point, and the compiler and CPU have more choice about how to optimally schedule them to get as much instruction pipelining as possible.
Matrix multiplication dimensons
You can use broadcast:
ret = (M @ v[...,None]).reshape(v.shape)
### check
loop = np.array([M[i] @ v[i] for i in range(n)])
(ret==loop).all()
# True
Most efficient way to multiply a small matrix with a scalar in numpy
large_dimension = 1000
a = np.random.random((1,))
B = np.random.random((1, large_dimension))
%timeit np.matmul(a, B)
5.43 µs ± 22 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit a[0] * B
5.11 µs ± 6.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Use just float
%timeit float(a[0]) * B
3.48 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
To avoid memory allocation use "buffer"
buffer = np.empty_like(B)
%timeit np.multiply(float(a[0]), B, buffer)
2.96 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
To avoid unnecessary getting attribute use "alias"
mul = np.multiply
%timeit mul(float(a[0]), B, buffer)
2.73 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And I don't recommend using numpy scalars at all,
because if you avoid it, computation will be faster
a_float = float(a[0])
%timeit mul(a_float, B, buffer)
1.94 µs ± 5.74 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Furthermore, if it's possible then initialize buffer out of loop once (of course, if you have something like loop :)
rng = range(1000)
%%timeit
for i in rng:
pass
24.4 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
for i in rng:
mul(a_float, B, buffer)
1.91 ms ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So,
"best_iteration_time" = (1.91 - 0.02) / 1000 => 1.89 (µs)
"speedup" = 5.43 / 1.89 = 2.87
Multiple matrix multiplication
Use np.einsum
-
np.einsum('ijk,ik->ij',matrices,vectors)
Steps :
1) Keep the first axes aligned.
2) Sum-reduce the last axes from the input arrays against each other.
3) Let the remainining axes(second axis from matrices
) be element-wise multiplied.
Related Topics
Prevent Python from Caching the Imported Modules
How to Change the Datetime Tick Label Frequency for Matplotlib Plots
Python Slice How-To, I Know the Python Slice But How to Use Built-In Slice Object for It
How to Do N-D Distance and Nearest Neighbor Calculations on Numpy Arrays
Error: Pg_Config Executable Not Found When Installing Psycopg2 on Alpine in Docker
Scaling of Tkinter Gui in 4K (3840*2160) Resolution
How to Convert a List to a List of Tuples
How to Plot Multi-Color Line If X-Axis Is Date Time Index of Pandas
Python How to Read N Number of Lines at a Time
Why Is the Empty Dictionary a Dangerous Default Value in Python
How to Merge Two Lists into a Single List
In Python, How to Escape Newline Characters When Printing a String