Numpy ‘smart’ symmetric matrix
If you can afford to symmetrize the matrix just before doing calculations, the following should be reasonably fast:
import numpy

def symmetrize(a):
    """
    Return a symmetrized version of NumPy array a.

    Values 0 are replaced by the array value at the symmetric
    position (with respect to the diagonal), i.e. if a_ij = 0,
    then the returned array a' is such that a'_ij = a_ji.

    Diagonal values are left untouched.

    a -- square NumPy array, such that a_ij = 0 or a_ji = 0,
    for i != j.
    """
    return a + a.T - numpy.diag(a.diagonal())
This works under reasonable assumptions (such as not doing both a[0, 1] = 42 and the contradictory a[1, 0] = 123 before running symmetrize).
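To make that assumption concrete, here is a small sketch with made-up sample values. It fills only the upper triangle, symmetrizes, and then shows what happens when both mirrored entries are set: the two values are simply added together.

```python
import numpy

def symmetrize(a):
    """Same function as above: mirror entries across the diagonal."""
    return a + a.T - numpy.diag(a.diagonal())

# Only the diagonal and the upper triangle are filled in.
a = numpy.zeros((3, 3))
a[0, 0] = 7
a[0, 1] = 42
a[1, 2] = 5
s = symmetrize(a)
print(s)

# Contradictory input: both a[0, 1] and a[1, 0] are set.
b = numpy.zeros((3, 3))
b[0, 1] = 42
b[1, 0] = 123
clash = symmetrize(b)
print(clash[0, 1], clash[1, 0])  # both entries hold 42 + 123
```

The result is still symmetric in the contradictory case, but neither of the two original values survives.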
If you really need a transparent symmetrization, you might consider subclassing numpy.ndarray and simply redefining __setitem__:
class SymNDArray(numpy.ndarray):
    """
    NumPy array subclass for symmetric matrices.

    A SymNDArray arr is such that doing arr[i,j] = value
    automatically does arr[j,i] = value, so that array
    updates remain symmetrical.
    """

    # Tuple parameter (i, j): Python 2 syntax only.
    def __setitem__(self, (i, j), value):
        super(SymNDArray, self).__setitem__((i, j), value)
        super(SymNDArray, self).__setitem__((j, i), value)

def symarray(input_array):
    """
    Return a symmetrized version of the array-like input_array.

    The returned array has class SymNDArray. Further assignments to the array
    are thus automatically symmetrized.
    """
    return symmetrize(numpy.asarray(input_array)).view(SymNDArray)

# Example:
a = symarray(numpy.zeros((3, 3)))
a[0, 1] = 42
print(a)  # a[1, 0] == 42 too!
(or the equivalent with matrices instead of arrays, depending on your needs). This approach even handles more complicated assignments, like a[:, 1] = -1, which correctly sets a[1, :] elements.
Note that Python 3 removed the possibility of writing def …(…, (i, j),…), so the code has to be slightly adapted before running with Python 3:
def __setitem__(self, indexes, value):
    (i, j) = indexes
    …
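Putting the pieces together, a Python 3 version might look like the sketch below. It assumes the array is always indexed with exactly two indices (anything else fails at the unpacking step) and it repeats symmetrize so that the block runs on its own.

```python
import numpy

def symmetrize(a):
    """Mirror entries across the diagonal (same function as earlier)."""
    return a + a.T - numpy.diag(a.diagonal())

class SymNDArray(numpy.ndarray):
    """ndarray subclass whose two-index assignments are mirrored."""

    def __setitem__(self, indexes, value):
        # Python 3: unpack the index tuple inside the method body.
        i, j = indexes
        super().__setitem__((i, j), value)
        super().__setitem__((j, i), value)

def symarray(input_array):
    """Return a symmetrized SymNDArray view of input_array."""
    return symmetrize(numpy.asarray(input_array)).view(SymNDArray)

a = symarray(numpy.zeros((3, 3)))
a[0, 1] = 42
mirrored = a[1, 0]   # set automatically by the assignment above
a[:, 1] = -1         # a slice assignment also sets row 1
print(mirrored)
print(a)
```

Only direct item assignment goes through __setitem__; in-place operations such as a += b are not mirrored by this sketch.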
Checking if a matrix is symmetric in Numpy
You can simply compare it to its transpose using allclose:
import numpy

def check_symmetric(a, rtol=1e-05, atol=1e-08):
    return numpy.allclose(a, a.T, rtol=rtol, atol=atol)
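As a rough illustration of how the tolerances behave, the sketch below runs check_symmetric on three made-up matrices: an exactly symmetric one, one with a tiny perturbation of the size rounding errors might produce, and one that is clearly asymmetric.

```python
import numpy

def check_symmetric(a, rtol=1e-05, atol=1e-08):
    return numpy.allclose(a, a.T, rtol=rtol, atol=atol)

exact = numpy.array([[1.0, 2.0], [2.0, 3.0]])

noisy = exact.copy()
noisy[0, 1] += 1e-10      # tiny, rounding-sized perturbation

skewed = exact.copy()
skewed[0, 1] += 0.5       # clearly asymmetric

exact_ok = check_symmetric(exact)
noisy_ok = check_symmetric(noisy)                       # within default tolerances
noisy_strict = check_symmetric(noisy, rtol=0, atol=0)   # exact comparison
skewed_ok = check_symmetric(skewed)
print(exact_ok, noisy_ok, noisy_strict, skewed_ok)
```

Setting both tolerances to zero turns the check into an exact comparison, which is usually too strict for matrices produced by floating-point arithmetic.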
Making a numpy ndarray matrix symmetric
I found the following solution, which works for me:
import numpy as np
W = np.maximum(A, A.transpose())
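For a concrete picture, here is a small sketch with made-up matrices. Since np.maximum takes the elementwise maximum of A and its transpose, it fills in the empty triangle when the missing entries are zero and the stored values are non-negative. A stored negative value loses against the zero on the other side, as the second example shows.

```python
import numpy as np

# Upper-triangular input with non-negative entries.
A = np.array([[1, 2, 3],
              [0, 4, 5],
              [0, 0, 6]])
W = np.maximum(A, A.T)
print(W)

# A negative off-diagonal entry is overwritten by the zero opposite it.
N = np.array([[0, -2],
              [0,  0]])
V = np.maximum(N, N.T)
print(V)
```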
Numpy dot too clever about symmetric multiplications
This behaviour is the result of a change introduced for NumPy 1.11.0, in pull request #6932. From the release notes for 1.11.0:
Previously, gemm BLAS operations were used for all matrix products.
Now, if the matrix product is between a matrix and its transpose, it
will use syrk BLAS operations for a performance boost. This
optimization has been extended to @, numpy.dot, numpy.inner, and
numpy.matmul.
In the changes for that PR, one finds this comment:
/*
* Use syrk if we have a case of a matrix times its transpose.
* Otherwise, use gemm for all other cases.
*/
So NumPy is making an explicit check for the case of a matrix times its transpose, and calling a different underlying BLAS function in that case. As @hpaulj notes in a comment, such a check is cheap for NumPy, since a transposed 2d array is simply a view on the original array, with inverted shape and strides, so it suffices to check a few pieces of metadata on the arrays (rather than having to compare the actual array data).
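The metadata in question can be inspected from Python. The sketch below is not NumPy's internal check, only an illustration of the information such a check can rely on: the transpose shares memory with the original and has reversed shape and strides, while a copy of the transpose has its own buffer.

```python
import numpy as np

A = np.zeros((10, 5))
At = A.T            # a view: no data is copied
C = A.T.copy()      # an independent array with the same values

print(A.shape, A.strides)
print(At.shape, At.strides)      # shape and strides are reversed
print(np.shares_memory(A, At))   # the view shares A's buffer
print(np.shares_memory(A, C))    # the copy does not
```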
Here's a slightly simpler case that shows the discrepancy. Note that using a .copy on one of the arguments to dot is enough to defeat NumPy's special-casing.
import numpy as np
random = np.random.RandomState(12345)
A = random.uniform(size=(10, 5))
Sym1 = A.dot(A.T)
Sym2 = A.dot(A.T.copy())
print(abs(Sym1 - Sym2).max())
I guess one advantage of this special-casing, beyond the obvious potential for speed-up, is that you're guaranteed (I'd hope, but in practice it'll depend on the BLAS implementation) to get a perfectly symmetric result when syrk is used, rather than a matrix which is merely symmetric up to numerical error. As an (admittedly not very good) test for this, I tried:
import numpy as np
random = np.random.RandomState(12345)
A = random.uniform(size=(100, 50))
Sym1 = A.dot(A.T)
Sym2 = A.dot(A.T.copy())
print("Sym1 symmetric: ", (Sym1 == Sym1.T).all())
print("Sym2 symmetric: ", (Sym2 == Sym2.T).all())
Results on my machine:
Sym1 symmetric: True
Sym2 symmetric: False
Symmetric matrices in numpy?
I don't think it's feasible to try to work with that kind of triangular array.
So here is, for example, a straightforward implementation of (squared) pairwise Euclidean distances:
import numpy as np

def pdista(X):
    """Squared pairwise distances between all columns of X."""
    B = np.dot(X.T, X)          # Gram matrix of the columns
    q = np.diag(B)[:, None]     # squared norms as a column vector
    return q + q.T - 2 * B
Performance-wise, it's hard to beat (at the Python level). What would be the main advantage of not using this approach?
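As a sanity check, the sketch below compares pdista against an explicit double loop on random data (the sizes and the seed are arbitrary) and confirms that the full result is symmetric.

```python
import numpy as np

def pdista(X):
    """Squared pairwise distances between all columns of X."""
    B = np.dot(X.T, X)
    q = np.diag(B)[:, None]
    return q + q.T - 2 * B

rng = np.random.RandomState(0)
X = rng.uniform(size=(3, 6))   # 6 points in 3 dimensions, one per column
D = pdista(X)

# Reference result: squared distance for every pair of columns.
n = X.shape[1]
D_loop = np.empty((n, n))
for i in range(n):
    for j in range(n):
        diff = X[:, i] - X[:, j]
        D_loop[i, j] = diff.dot(diff)

matches_loop = np.allclose(D, D_loop)
is_symmetric = np.allclose(D, D.T)
print(matches_loop, is_symmetric)
```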
Numpy symmetric matrix becomes asymmetric when I applied min-max scaling
Looking at this description, the minmaxscaler appears to work column-by-column, so, naturally, you can't expect it to preserve symmetry.
What's best to do in your case depends a bit on what you are trying to achieve, really. If having the values between 0 and 1 is all you require, you can rescale by hand:
mn, mx = dist.min(), dist.max()
dist01 = (dist - mn) / (mx - mn)
but depending on your ultimate problem this may be too simplistic...
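The difference can be seen on a small made-up distance matrix. The sketch below uses plain NumPy only; the column-wise variant is a hand-written stand-in for what a column-by-column scaler does, not a call into any scaling library.

```python
import numpy as np

dist = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 2.0],
                 [4.0, 2.0, 0.0]])

# Global rescaling: one minimum and one maximum for the whole matrix.
mn, mx = dist.min(), dist.max()
dist01 = (dist - mn) / (mx - mn)

# Column-wise rescaling: each column gets its own minimum and maximum.
col_min = dist.min(axis=0)
col_max = dist.max(axis=0)
by_column = (dist - col_min) / (col_max - col_min)

print(np.array_equal(dist01, dist01.T))     # global scaling keeps symmetry
print(np.allclose(by_column, by_column.T))  # column-wise scaling breaks it
```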
Efficient Way to Permutate a Symmetric Square Matrix in Numpy
In [703]: N=10000
In [704]: a=np.arange(N*N).reshape(N,N);a=np.maximum(a, a.T)
In [705]: perm=np.random.permutation(N)
One indexing step is quite a bit faster:
In [706]: timeit a[perm[:,None],perm] # same as `np.ix_...`
1 loop, best of 3: 1.88 s per loop
In [707]: timeit a[perm,:][:,perm]
1 loop, best of 3: 8.88 s per loop
In [708]: timeit np.take(np.take(a,perm,0),perm,1)
1 loop, best of 3: 1.41 s per loop
a[perm,perm[:,None]] is in the 8s category.
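The variants timed above all compute the same permuted matrix, which can be checked on a small case. This sketch uses a much smaller N and a fixed seed (both arbitrary choices) and makes no claim about timings.

```python
import numpy as np

N = 6
a = np.arange(N * N).reshape(N, N)
a = np.maximum(a, a.T)                 # symmetric test matrix
rng = np.random.RandomState(0)
perm = rng.permutation(N)

one_step = a[perm[:, None], perm]      # single broadcasted indexing step
with_ix = a[np.ix_(perm, perm)]        # same thing via np.ix_
two_steps = a[perm, :][:, perm]        # rows first, then columns
with_take = np.take(np.take(a, perm, 0), perm, 1)
swapped = a[perm, perm[:, None]]       # index arrays the other way round

print(np.array_equal(one_step, with_ix))
print(np.array_equal(one_step, two_steps))
print(np.array_equal(one_step, with_take))
print(np.array_equal(one_step, swapped))
```

In general a[perm, perm[:, None]] is the transpose of a[perm[:, None], perm]; the two agree here only because a is symmetric.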
Creating symmetric matrix indexes-values in Python
The CSR presentation of the input here is highly convenient. As you consider each row, you of course learn about a column of the symmetric matrix. When you reach each row, you already know the contents of all the columns omitted from its upper-triangular form. You even learn about those values in the order they appear in that row!
It’s then just a simple matter of programming:
def sym(up):  # alters 'up' in place
    pfx=[([],[]) for _ in up]  # to be added to each row
    for r,((cc,vv),(pc,pv)) in enumerate(zip(up,pfx)):
        for c,v in zip(cc,vv):
            if c>r:  # store off-diagonal for later row
                cr,cv=pfx[c]
                cr.append(r); cv.append(v)
        cc[:0]=pc; vv[:0]=pv  # prepend to preserve order
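Here is a small usage sketch. The input layout is an assumption, since the question's data is not shown here: up is taken to be a list with one (column indexes, values) pair of lists per row, holding only the diagonal and upper-triangular entries in increasing column order.

```python
def sym(up):  # alters 'up' in place
    pfx = [([], []) for _ in up]  # entries to be added to each row
    for r, ((cc, vv), (pc, pv)) in enumerate(zip(up, pfx)):
        for c, v in zip(cc, vv):
            if c > r:  # store off-diagonal entry for the later row c
                cr, cv = pfx[c]
                cr.append(r)
                cv.append(v)
        cc[:0] = pc  # prepend to preserve column order
        vv[:0] = pv

# Assumed layout: one (columns, values) pair per row, upper triangle only.
# This encodes the matrix [[1, 2, 3], [2, 4, 5], [3, 5, 6]].
up = [([0, 1, 2], [1, 2, 3]),
      ([1, 2], [4, 5]),
      ([2], [6])]
sym(up)
for cols, vals in up:
    print(cols, vals)
```

After the call, every row lists its full set of columns, with the lower-triangular entries placed ahead of the original ones.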