Numpy 'Smart' Symmetric Matrix

If you can afford to symmetrize the matrix just before doing calculations, the following should be reasonably fast:

import numpy

def symmetrize(a):
    """
    Return a symmetrized version of NumPy array a.

    Values 0 are replaced by the array value at the symmetric
    position (with respect to the diagonal), i.e. if a_ij = 0,
    then the returned array a' is such that a'_ij = a_ji.

    Diagonal values are left untouched.

    a -- square NumPy array, such that a_ij = 0 or a_ji = 0, 
    for i != j.
    """
    return a + a.T - numpy.diag(a.diagonal())

This works under reasonable assumptions (such as not doing both a[0, 1] = 42 and the contradictory a[1, 0] = 123 before running symmetrize).
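As a quick illustration of that assumption, here is a small sketch (the sample values are made up) that fills only the upper triangle, symmetrizes, and then shows what happens when both a[0, 1] and a[1, 0] were set beforehand: nothing is detected, the two values are simply added.

```python
import numpy

def symmetrize(a):
    # Same formula as above: mirror across the diagonal, count the diagonal once.
    return a + a.T - numpy.diag(a.diagonal())

# Only the upper triangle (and the diagonal) is filled in.
a = numpy.zeros((3, 3))
a[0, 1] = 42
a[2, 2] = 7
s = symmetrize(a)
print(s[1, 0], s[2, 2])  # mirrored value and untouched diagonal value

# Contradictory assignments are not detected: the two values are summed.
b = numpy.zeros((3, 3))
b[0, 1] = 42
b[1, 0] = 123
t = symmetrize(b)
print(t[0, 1], t[1, 0])
```

So the "a_ij = 0 or a_ji = 0" precondition in the docstring really is needed for the result to be meaningful.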

If you really need a transparent symmetrization, you might consider subclassing numpy.ndarray and simply redefining __setitem__:

class SymNDArray(numpy.ndarray):
    """
    NumPy array subclass for symmetric matrices.

    A SymNDArray arr is such that doing arr[i,j] = value
    automatically does arr[j,i] = value, so that array
    updates remain symmetrical.
    """

    def __setitem__(self, (i, j), value):
        super(SymNDArray, self).__setitem__((i, j), value)
        super(SymNDArray, self).__setitem__((j, i), value)

def symarray(input_array):
    """
    Return a symmetrized version of the array-like input_array.

    The returned array has class SymNDArray. Further assignments to the array
    are thus automatically symmetrized.
    """
    return symmetrize(numpy.asarray(input_array)).view(SymNDArray)

# Example:
a = symarray(numpy.zeros((3, 3)))
a[0, 1] = 42
print a  # a[1, 0] == 42 too!

(or the equivalent with matrices instead of arrays, depending on your needs). This approach even handles more complicated assignments, like a[:, 1] = -1, which correctly sets a[1, :] elements.

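Note that Python 3 removed tuple unpacking in function signatures (def …(…, (i, j), …)), so the code has to be adapted slightly before it runs under Python 3: declare def __setitem__(self, indexes, value): and start the body with (i, j) = indexes. The print statement in the example likewise becomes print(a).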
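Put together, a Python 3 version might look like the sketch below. It is the same idea as the code above with only the signature, the super() calls and the print adapted; it still assumes every assignment uses a two-element index.

```python
import numpy

def symmetrize(a):
    return a + a.T - numpy.diag(a.diagonal())

class SymNDArray(numpy.ndarray):
    """Array subclass whose two-index assignments are mirrored."""

    def __setitem__(self, indexes, value):
        # Assumes the index is a pair (i, j); a[0] = ... would fail to unpack.
        (i, j) = indexes
        super().__setitem__((i, j), value)
        super().__setitem__((j, i), value)

def symarray(input_array):
    return symmetrize(numpy.asarray(input_array)).view(SymNDArray)

a = symarray(numpy.zeros((3, 3)))
a[0, 1] = 42
print(a)  # a[1, 0] is 42 too

b = symarray(numpy.zeros((3, 3)))
b[:, 1] = -1  # also sets b[1, :]
print(b)
```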

Checking if a matrix is symmetric in Numpy

You can simply compare it to its transpose using numpy.allclose:

import numpy

def check_symmetric(a, rtol=1e-05, atol=1e-08):
    return numpy.allclose(a, a.T, rtol=rtol, atol=atol)
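One thing to watch: numpy.allclose broadcasts its arguments, so a 1×3 array of equal values compares as "close" to its 3×1 transpose. The sketch below adds a shape guard for that case; is_symmetric is a hypothetical name, not a NumPy function, and the sample arrays are made up.

```python
import numpy

def is_symmetric(a, rtol=1e-05, atol=1e-08):
    # Hypothetical variant of check_symmetric that rejects non-square input first.
    a = numpy.asarray(a)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        return False
    return bool(numpy.allclose(a, a.T, rtol=rtol, atol=atol))

exact = numpy.array([[1.0, 2.0], [2.0, 1.0]])
noisy = exact.copy()
noisy[0, 1] += 1e-10  # perturbation well below the default tolerances

print(is_symmetric(exact))
print(is_symmetric(noisy))                # still accepted: within tolerance
print((noisy == noisy.T).all())           # exact comparison rejects it
print(is_symmetric(numpy.ones((1, 3))))   # not square, so rejected
```

Whether a tolerance-based or an exact comparison is the right one depends on how the matrix was produced.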

Making a numpy ndarray matrix symmetric

I found the following solution, which works for me:

import numpy as np
W = np.maximum( A, A.transpose() )
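For illustration, here is how that behaves on a small upper-triangular array (values made up). Since np.maximum keeps the larger of A[i, j] and A[j, i], it only reconstructs the matrix when the missing side is never larger than the filled side, for example zeros against non-negative entries; the second part shows a negative entry being lost.

```python
import numpy as np

# Upper-triangular input with non-negative entries.
A = np.triu(np.array([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]]))
W = np.maximum(A, A.transpose())
print(W)

# With a negative entry, the zero on the other side wins and the value is lost.
B = np.array([[0, -2],
              [0,  0]])
print(np.maximum(B, B.T))
```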

Numpy dot too clever about symmetric multiplications

This behaviour is the result of a change introduced for NumPy 1.11.0, in pull request #6932. From the release notes for 1.11.0:

Previously, gemm BLAS operations were used for all matrix products.
Now, if the matrix product is between a matrix and its transpose, it
will use syrk BLAS operations for a performance boost. This
optimization has been extended to @, numpy.dot, numpy.inner, and
numpy.matmul.

In the changes for that PR, one finds this comment:

/*
 * Use syrk if we have a case of a matrix times its transpose.
 * Otherwise, use gemm for all other cases.
 */

So NumPy is making an explicit check for the case of a matrix times its transpose, and calling a different underlying BLAS function in that case. As @hpaulj notes in a comment, such a check is cheap for NumPy, since a transposed 2d array is simply a view on the original array, with inverted shape and strides, so it suffices to check a few pieces of metadata on the arrays (rather than having to compare the actual array data).
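To see the kind of metadata involved, the sketch below inspects a transposed array from Python. It only illustrates the view relationship described above; it is not the actual check performed in NumPy's C code.

```python
import numpy as np

A = np.zeros((10, 5))
At = A.T

print(At.base is A)                      # the transpose is a view on A
print(A.shape, At.shape)                 # shapes are reversed
print(A.strides, At.strides)             # strides are reversed
print(np.shares_memory(A, At))           # same underlying buffer
print(np.shares_memory(A, A.T.copy()))   # a copy has its own buffer
```

A copy of the transpose no longer shares a buffer with A, so a check based on this sort of metadata cannot recognise it as "A transposed".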

Here's a slightly simpler case that shows the discrepancy. Note that calling .copy() on one of the arguments to dot is enough to defeat NumPy's special-casing.

import numpy as np
random = np.random.RandomState(12345)
A = random.uniform(size=(10, 5))
Sym1 = A.dot(A.T)
Sym2 = A.dot(A.T.copy())
print(abs(Sym1 - Sym2).max())

I guess one advantage of this special-casing, beyond the obvious potential for speed-up, is that you're guaranteed (I'd hope, but in practice it'll depend on the BLAS implementation) to get a perfectly symmetric result when syrk is used, rather than a matrix which is merely symmetric up to numerical error. As an (admittedly not very good) test for this, I tried:

import numpy as np
random = np.random.RandomState(12345)
A = random.uniform(size=(100, 50))
Sym1 = A.dot(A.T)
Sym2 = A.dot(A.T.copy())
print("Sym1 symmetric: ", (Sym1 == Sym1.T).all())
print("Sym2 symmetric: ", (Sym2 == Sym2.T).all())

Results on my machine:

Sym1 symmetric:  True
Sym2 symmetric:  False
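If exact symmetry matters and you'd rather not depend on which BLAS routine gets called, one common workaround (my addition, not something NumPy does for you) is to average the product with its own transpose. A minimal sketch, reusing the setup from the test above:

```python
import numpy as np

random = np.random.RandomState(12345)
A = random.uniform(size=(100, 50))
Sym2 = A.dot(A.T.copy())  # general product; may be symmetric only up to rounding

# x + y == y + x in floating point and halving is exact here,
# so S equals its transpose exactly.
S = (Sym2 + Sym2.T) / 2
print("S symmetric: ", (S == S.T).all())
print("S close to Sym2: ", np.allclose(S, Sym2))
```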

Symmetric matrices in numpy?

I don't think it's feasible to try to work with that kind of triangular array.

So here is for example a straightforward implementation of (squared) pairwise Euclidean distances:

import numpy as np

def pdista(X):
    """Squared pairwise distances between all columns of X."""
    B = np.dot(X.T, X)       # Gram matrix of the columns
    q = np.diag(B)[:, None]  # squared norms, as a column vector
    return q + q.T - 2 * B

Performance-wise, it's hard to beat (at the Python level). What would be the main advantage of not using this approach?
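As a sanity check, the sketch below compares pdista against an explicit double loop on a small random input (sizes and seed are arbitrary) and confirms the result is the full symmetric matrix.

```python
import numpy as np

def pdista(X):
    """Squared pairwise distances between all columns of X."""
    B = np.dot(X.T, X)
    q = np.diag(B)[:, None]
    return q + q.T - 2 * B

rng = np.random.RandomState(0)
X = rng.uniform(size=(3, 6))  # 6 points in 3 dimensions, one per column

D = pdista(X)

# Brute-force reference: explicit loop over all pairs of columns.
n = X.shape[1]
D_ref = np.array([[np.sum((X[:, i] - X[:, j]) ** 2) for j in range(n)]
                  for i in range(n)])

print(np.allclose(D, D_ref))
print(np.allclose(D, D.T))
```

Because of cancellation in q + q.T - 2 * B, entries that should be exactly zero can come out as tiny positive or negative numbers, so clamp with np.maximum(D, 0) before taking square roots.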

Numpy symmetric matrix becomes asymmetric when I applied min-max scaling

Looking at its description, MinMaxScaler appears to work column by column, so, naturally, you can't expect it to preserve symmetry.

What's best to do in your case depends a bit on what you are trying to achieve, really. If having the values between 0 and 1 is all you require you can rescale by hand:

mn, mx = dist.min(), dist.max()
dist01 = (dist - mn) / (mx - mn)

but depending on your ultimate problem this may be too simplistic...
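To make the difference concrete, here is a small sketch with a made-up symmetric distance matrix. The per-column scaling is written by hand to mimic what a column-by-column min-max scaler does (it does not call scikit-learn), and is compared with the global rescaling above.

```python
import numpy as np

dist = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 2.0],
                 [4.0, 2.0, 0.0]])

# Column-by-column scaling: each column gets its own min and max.
col_min = dist.min(axis=0)
col_max = dist.max(axis=0)
per_column = (dist - col_min) / (col_max - col_min)

# Global scaling: a single min and max for the whole matrix.
mn, mx = dist.min(), dist.max()
dist01 = (dist - mn) / (mx - mn)

print(np.array_equal(per_column, per_column.T))  # symmetry is broken
print(np.array_equal(dist01, dist01.T))          # symmetry is preserved
```

Here per_column[0, 1] is 1/2 while per_column[1, 0] is 1/4, because columns 0 and 1 have different maxima.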

Efficient Way to Permutate a Symmetric Square Matrix in Numpy

In [703]: N=10000
In [704]: a=np.arange(N*N).reshape(N,N);a=np.maximum(a, a.T)
In [705]: perm=np.random.permutation(N)

One indexing step is quite a bit faster:

In [706]: timeit a[perm[:,None],perm]   # same as `np.ix_...`
1 loop, best of 3: 1.88 s per loop

In [707]: timeit a[perm,:][:,perm]
1 loop, best of 3: 8.88 s per loop

In [708]: timeit np.take(np.take(a,perm,0),perm,1)
1 loop, best of 3: 1.41 s per loop

a[perm, perm[:, None]] (the same indices with the broadcast order swapped) also lands in the 8 s category.
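For reference, the sketch below checks on a small matrix that all of these expressions produce the same permuted array (N and the seed are arbitrary; no timings are implied). The swapped-order variant returns the transpose of the others in general and matches here only because a is symmetric.

```python
import numpy as np

N = 6
a = np.arange(N * N).reshape(N, N)
a = np.maximum(a, a.T)  # symmetric test matrix, built as in the session above
perm = np.random.RandomState(0).permutation(N)

p1 = a[perm[:, None], perm]
p2 = a[np.ix_(perm, perm)]
p3 = a[perm, :][:, perm]
p4 = np.take(np.take(a, perm, 0), perm, 1)
p5 = a[perm, perm[:, None]]  # transposed index order

print(all(np.array_equal(p1, p) for p in (p2, p3, p4, p5)))
print(np.array_equal(p1, p1.T))  # the permuted matrix is still symmetric
```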

Creating symmetric matrix indexes-values in Python

The CSR representation of the input here is highly convenient. As you consider each row, you of course learn about a column of the symmetric matrix. By the time you reach a given row, you already know the contents of all the columns omitted from its upper-triangular form. You even learn about those values in the order they appear in that row!

It’s then just a simple matter of programming:

def sym(up):
    """Symmetrize 'up' in place; each row of 'up' is a (columns, values) pair of lists."""
    pfx = [([], []) for _ in up]  # entries to be prepended to each row
    for r, ((cc, vv), (pc, pv)) in enumerate(zip(up, pfx)):
        for c, v in zip(cc, vv):
            if c > r:  # store off-diagonal entry for the later row c
                cr, cv = pfx[c]
                cr.append(r)
                cv.append(v)
        # prepend to preserve column order
        cc[:0] = pc
        vv[:0] = pv
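Here is a short usage sketch. It assumes, as the function does, that the input is a list with one (columns, values) pair of lists per row, holding only the diagonal and upper-triangular entries in increasing column order; the sample data is made up.

```python
def sym(up):
    """Symmetrize 'up' in place; each row is a (columns, values) pair of lists."""
    pfx = [([], []) for _ in up]
    for r, ((cc, vv), (pc, pv)) in enumerate(zip(up, pfx)):
        for c, v in zip(cc, vv):
            if c > r:
                cr, cv = pfx[c]
                cr.append(r)
                cv.append(v)
        cc[:0] = pc
        vv[:0] = pv

# Upper triangle of [[1, 0, 3],
#                    [0, 4, 5],
#                    [0, 0, 6]] with zeros left out.
up = [([0, 2], [1, 3]),
      ([1, 2], [4, 5]),
      ([2],    [6])]
sym(up)
for cols, vals in up:
    print(cols, vals)
```

Row 2 gains the entries for columns 0 and 1, placed ahead of its own diagonal entry, so each row stays sorted by column.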
