Calling Dot Products and Linear Algebra Operations in Cython

Calling dot products and linear algebra operations in Cython?

Calling the BLAS bundled with Scipy is "fairly" straightforward; here's one example of calling DGEMM to compute a matrix multiplication: https://gist.github.com/pv/5437087 Note that BLAS and LAPACK expect all arrays to be Fortran-contiguous (modulo the lda/ldb/ldc parameters), hence the order="F" and double[::1,:] declarations, which are required for correct functioning.
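
Since SciPy 0.16 the same call can also be written without extracting the function pointer by hand, via scipy.linalg.cython_blas. A minimal sketch (the function name matmul and the convention that the caller preallocates C are just for illustration):

from scipy.linalg.cython_blas cimport dgemm

cpdef void matmul(double[::1, :] A, double[::1, :] B, double[::1, :] C):
    # C := A @ B for Fortran-contiguous (order="F") arrays; C must be
    # preallocated by the caller with shape (A.shape[0], B.shape[1]).
    cdef:
        char transa = b'N'      # use A as stored (no transpose)
        char transb = b'N'      # use B as stored (no transpose)
        int m = A.shape[0], k = A.shape[1], n = B.shape[1]
        int lda = m, ldb = k, ldc = m
        double alpha = 1.0, beta = 0.0
    dgemm(&transa, &transb, &m, &n, &k, &alpha,
          &A[0, 0], &lda, &B[0, 0], &ldb,
          &beta, &C[0, 0], &ldc)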

Computing inverses can be done similarly by applying the LAPACK function dgesv to the identity matrix. For the signature, see here. All this requires dropping down to rather low-level coding: you need to allocate temporary work arrays yourself, and so on. However, these details can be encapsulated into your own convenience functions, or you can reuse the code from tokyo by replacing the lib_* functions with function pointers obtained from Scipy in the way shown above.
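
A minimal sketch of that approach, assuming the scipy.linalg.cython_lapack interface from SciPy 0.16+ (with older versions you would extract the dgesv pointer the same way as in the gist above); the name inverse is just for illustration:

import numpy as np
from scipy.linalg.cython_lapack cimport dgesv

cpdef inverse(double[::1, :] A):
    # Invert A by solving A X = I; dgesv overwrites A with its LU factors,
    # so pass a copy if the original matrix is still needed.
    cdef int n = A.shape[0], nrhs = A.shape[0], info = 0
    cdef int[::1] ipiv = np.zeros(n, dtype=np.intc)
    B_arr = np.asfortranarray(np.eye(n))   # right-hand side, becomes A^-1
    cdef double[::1, :] B = B_arr
    dgesv(&n, &nrhs, &A[0, 0], &n, &ipiv[0], &B[0, 0], &n, &info)
    if info != 0:
        raise ValueError("dgesv failed (info=%d)" % info)
    return B_arr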

If you use Cython's memoryview syntax (double[::1,:]), the transpose is the usual x.T. Alternatively, you can compute the transpose by writing a function of your own that swaps the elements of the array across the diagonal. Numpy doesn't actually contain such an operation: x.T only changes the strides of the array and doesn't move the data around.
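
A hypothetical copying transpose over a 2-D memoryview might look like this (a sketch only; it is not a NumPy or Cython built-in):

import numpy as np

cpdef transpose_copy(double[:, :] x):
    # explicitly move the data, unlike x.T which only swaps strides
    cdef Py_ssize_t i, j
    cdef Py_ssize_t n = x.shape[0], m = x.shape[1]
    out_arr = np.empty((m, n))
    cdef double[:, ::1] out = out_arr
    for i in range(n):
        for j in range(m):
            out[j, i] = x[i, j]
    return out_arr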

It would probably be possible to rewrite the tokyo module to use the BLAS/LAPACK exported by Scipy and bundle it in scipy.linalg, so that you could just do from scipy.linalg.blas cimport dgemm. Pull requests are accepted if someone wants to get down to it.


As you can see, it all boils down to passing function pointers around. As alluded to above, Cython does in fact provide its own protocol for exchanging function pointers. For an example, consider from scipy.spatial import qhull; print(qhull.__pyx_capi__) --- those functions could be accessed via from scipy.spatial.qhull cimport XXXX in Cython (they're private though, so don't do that).
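
The protocol looks roughly like this, with entirely hypothetical module and function names: a cdef function declared in a .pxd file is exported through the module's __pyx_capi__ dictionary as a capsule, and cimport picks up the pointer at import time.

# A cdef function whose signature is declared in fastmath.pxd:
#
#     cdef double clip(double x, double lo, double hi)
#
# and defined in fastmath.pyx:
cdef double clip(double x, double lo, double hi):
    # plain C-level function; it shows up in fastmath.__pyx_capi__ as a
    # capsule wrapping the function pointer
    return lo if x < lo else (hi if x > hi else x)

# Any other Cython module can then call it at C speed:
#
#     from fastmath cimport clip
#     cdef double y = clip(1.5, 0.0, 1.0)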

However, at present, scipy.special does not offer this C-API. It would, however, be quite simple to provide it, given that the interface module in scipy.special is written in Cython.

I don't think there is at the moment any sane and portable way to access the function doing the heavy lifting for gamln (you could snoop around the UFunc object, but that's not a sane solution :), so at the moment it's probably best to just grab the relevant part of the source code from scipy.special and bundle it with your project, or use e.g. GSL.
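
If you take the GSL route, wrapping the relevant special function is only a few lines of Cython. A sketch, assuming the GSL headers are installed and you link against libgsl (e.g. libraries=["gsl", "gslcblas"] in setup.py):

cdef extern from "gsl/gsl_sf_gamma.h":
    double gsl_sf_lngamma(double x) nogil

cpdef double lngamma(double x):
    # log-Gamma via GSL, as a stand-in for scipy.special.gamln
    return gsl_sf_lngamma(x)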

Fast basic linear algebra in Cython for recurrent calls

The answer you link to is still a good way to call a BLAS function from Cython. It is not really a Python wrapper: Python is merely used to get the C pointer to the function, and this can be done at initialization time, so you should get essentially C-like speed. I could be wrong, but I think the upcoming Scipy 0.16 release will provide a convenient BLAS Cython API based on this approach; it will not change things performance-wise.

If you didn't see any speed-up after porting repeatedly called BLAS functions to Cython, then either the Python overhead of doing this through numpy doesn't matter (e.g. the computation itself is the most expensive part) or you are doing something wrong (unnecessary memory copies, etc.).

I would say that this approach should be faster and easier to maintain than using GSL, provided of course that you compiled scipy with an optimized BLAS (OpenBLAS, ATLAS, MKL, etc.).

Efficient matrix operations in cython with no python objects

A few small points with respect to your optimized Cython/BLAS functions:

ipiv_a = np.zeros(n).astype(np.int32)
cdef int[::1] ipiv = ipiv_a

can have two easy improvements: it doesn't have to go through a temporary variable, and you can directly create an array of type np.int32 rather than creating an array of a different type and then casting:

cdef int[::1] ipiv = np.zeros(n, dtype=np.int32)

Similarly, in both functions you can initialize B in fewer steps by doing

cdef double[:, ::1] B = A.copy()

rather than creating a zero array and copying into it.


The second (more significant) change would be to use C arrays for temporary variables such as the Fortran workspaces. I'd still keep things like return values as numpy arrays, since the reference counting and the ability to send them back to Python are very useful.

from libc.stdlib cimport malloc, free

cdef double* work = <double*> malloc(n * n * sizeof(double))
if work == NULL:
    raise MemoryError()
try:
    # ... rest of function ...
finally:
    free(work)

You need to cimport malloc and free from libc.stdlib. The try: ... finally: ensures the memory is correctly deallocated. Don't go too over the top with this - for example, if it isn't obvious where to deallocate a C array then just use numpy.
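
Putting the pieces together, here is a sketch of an LU-based inverse (dgetrf followed by dgetri) that keeps the LAPACK work array in a malloc'd C buffer rather than a numpy array; it assumes scipy.linalg.cython_lapack (SciPy 0.16+), and the name inv_lu is just for illustration:

import numpy as np
from libc.stdlib cimport malloc, free
from scipy.linalg.cython_lapack cimport dgetrf, dgetri

cpdef inv_lu(double[:, ::1] A):
    cdef int n = A.shape[0], lwork = A.shape[0] * A.shape[0], info = 0
    cdef int[::1] ipiv = np.zeros(n, dtype=np.intc)
    # Work on a copy so the caller's A is untouched.  The copy is row-major,
    # so LAPACK sees A.T; inverting that and reading the buffer back
    # row-major gives inv(A), because inv(A.T) == inv(A).T.
    B_arr = np.array(A, copy=True)
    cdef double[:, ::1] B = B_arr
    cdef double* work = <double*> malloc(lwork * sizeof(double))
    if work == NULL:
        raise MemoryError()
    try:
        dgetrf(&n, &n, &B[0, 0], &n, &ipiv[0], &info)
        if info != 0:
            raise ValueError("dgetrf failed (info=%d)" % info)
        dgetri(&n, &B[0, 0], &n, &ipiv[0], work, &lwork, &info)
        if info != 0:
            raise ValueError("dgetri failed (info=%d)" % info)
    finally:
        free(work)
    return B_arr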


The final option to look at is to not have a return value but to modify an input:

cdef void inv_c(double[:, ::1] A, double[:, ::1] B):
    # check that A and B are the right size, then just write into B
    # ...

The advantage of this is that, if you need to call it in a loop with identically sized inputs, you only need to do one allocation for the whole loop. You could extend this to include the working arrays too, although that might be a little more complicated.
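
A sketch of that calling pattern, reusing the inv_c sketched above (invert_all is a hypothetical driver defined in the same module):

import numpy as np

def invert_all(list matrices):
    # all matrices are assumed square and of identical shape
    cdef Py_ssize_t n = matrices[0].shape[0]
    cdef double[:, ::1] B = np.empty((n, n))
    out = []
    for A in matrices:
        inv_c(A, B)                       # writes the result into B
        out.append(np.asarray(B).copy())  # snapshot before B is reused
    return out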

Cython: when using typed memoryviews, are Cython users supposed to implement their own library of vector functions?

If your code spends a lot of time in linear algebra functions or vectorized operations, it's probably not a very good candidate for Cythonizing.

Dot products between numpy arrays are usually performed using BLAS library calls, which are already very highly optimized, and you are pretty much guaranteed to do worse if you try to reinvent the wheel in Cython*. Similarly, basic vectorized operations such as adding two vectors are already pretty efficient for numpy arrays, although in certain circumstances it might be possible to do a bit better in Cython (e.g. by exploiting parallelization, or by avoiding intermediate array allocation).
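
As an illustration of that last point, here is a sketch of a fused a*x + y loop that avoids the temporary array the equivalent NumPy expression would allocate (axpy_inplace is a hypothetical name):

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef void axpy_inplace(double a, double[::1] x, double[::1] y):
    # y := a*x + y in a single pass, with no intermediate array; the loop
    # could also be swapped for cython.parallel.prange for parallelism
    cdef Py_ssize_t i
    for i in range(x.shape[0]):
        y[i] = a * x[i] + y[i]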

Cython is most useful for speeding up operations that can't be easily vectorized, where you would otherwise have to resort to Python for loops etc. The best approach is usually to identify and Cythonize just these bottlenecks, rather than attempting to re-write everything in Cython.


*Having said that, it is possible to call a BLAS or LAPACK function directly on a typed memoryview by passing a pointer to the first element in the array (see here and here for some examples).


