Reconstruct Symmetric Matrix from Values in Long-Form


An igraph solution: read the data frame in as a graph, with the value column treated as edge weights, and then convert the graph to an adjacency matrix.

dat <- read.table(header=T, text=" one   two   value
  a     b     30
  a     c     40
  a     d     20
  b     c     10
  b     d     05
  c     d     30")

library(igraph)

# Make the graph undirected so that the adjacency matrix is symmetric
g <- graph_from_data_frame(dat, directed=FALSE)

# Use the "value" edge attribute as the matrix entries
as_adjacency_matrix(g, attr="value", sparse=FALSE)
#   a  b  c  d
#a  0 30 40 20
#b 30  0 10  5
#c 40 10  0 30
#d 20  5 30  0
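For comparison, the same long-form to symmetric-matrix step can be sketched in plain Python without igraph. This is a minimal sketch, assuming the rows are (one, two, value) triples as in dat above, that each unordered pair appears at most once, and that missing pairs and the diagonal should be 0; the function name long_to_symmetric is hypothetical.

```python
def long_to_symmetric(rows):
    """Build a dense symmetric matrix from (one, two, value) triples.

    Labels are sorted to fix the row/column order; absent pairs stay 0.
    """
    labels = sorted({name for one, two, _ in rows for name in (one, two)})
    pos = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    mat = [[0] * n for _ in range(n)]
    for one, two, value in rows:
        i, j = pos[one], pos[two]
        # Write both (i, j) and (j, i) so the result is symmetric,
        # which is what directed=FALSE achieves in the igraph version.
        mat[i][j] = value
        mat[j][i] = value
    return labels, mat


dat = [("a", "b", 30), ("a", "c", 40), ("a", "d", 20),
       ("b", "c", 10), ("b", "d", 5), ("c", "d", 30)]
labels, mat = long_to_symmetric(dat)
for label, row in zip(labels, mat):
    print(label, row)
```

Unlike the igraph route, this keeps the labels in sorted order rather than in order of first appearance; for this data the two coincide.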

Net changes in network using dplyr

Using dplyr:

library(dplyr)
mutate(data,new_value=apply(data,1,function(vec){ max(data[data$source==vec[3] & data$target==vec[2],"value"],0)})-value)

Using data.table:

library(data.table)
setDT(data)
data[,new_value:=apply(data,1,function(vec){ max(data[data$source==vec[3] & data$target==vec[2]]$value,0)})-value]

If you want to replace the original values and keep only the final result:

mutate(data,value=apply(data,1,function(vec){ max(data[data$source==vec[3] & data$target==vec[2],"value"],0)})-value)[,c(3,2,1)]
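The same net-change idea can be sketched in plain Python: for every row, look up the value on the reverse edge (target to source), treat a missing reverse edge as 0, and subtract the row's own value. This assumes non-negative values and at most one row per directed edge; the (source, target, value) layout mirrors the column names in the R filters, while the function name net_changes and the sample rows are hypothetical.

```python
def net_changes(rows):
    """Return (source, target, new_value) with new_value = reverse value - value."""
    # Index every directed edge so the reverse lookup is a dict access.
    value_of = {(src, tgt): val for src, tgt, val in rows}
    result = []
    for src, tgt, val in rows:
        reverse = value_of.get((tgt, src), 0)  # 0 if there is no reverse edge
        result.append((src, tgt, reverse - val))
    return result


# Hypothetical flows: A sends 10 to B, B sends 4 back, A sends 7 to C.
rows = [("A", "B", 10), ("B", "A", 4), ("A", "C", 7)]
print(net_changes(rows))
```

Building the dictionary once avoids re-scanning the whole table for every row, which is what the apply-based R code does.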

Create square matrix of pairwise values from a dataframe in R

I think you can try xtabs:

xtabs(Mean_Market_Fare~.,df)

which gives

> xtabs(Mean_Market_Fare~.,df)
               State_2
State_1          Alabama  Arizona Arkansas California Colorado Connecticut Wisconsin  Wyoming
  Alabama       263.3752 320.5036 288.9775   352.6983 282.6864    266.9601    0.0000   0.0000
  Washington      0.0000   0.0000   0.0000     0.0000   0.0000      0.0000    0.0000 286.9314
  West Virginia   0.0000   0.0000   0.0000     0.0000   0.0000      0.0000  302.7769 493.2000
  Wisconsin       0.0000   0.0000   0.0000     0.0000   0.0000      0.0000  251.3333 285.3015
  Wyoming         0.0000   0.0000   0.0000     0.0000   0.0000      0.0000    0.0000 275.9800
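A rough plain-Python analogue of xtabs(Mean_Market_Fare~.,df) for comparison: rows and columns get their own sorted label sets, values for repeated (State_1, State_2) pairs are summed as xtabs does, and empty cells are 0. The function name cross_tab is hypothetical, and the sample uses only three rows of the data.

```python
from collections import defaultdict


def cross_tab(rows):
    """Sum values by (row_label, col_label) into a dense table."""
    totals = defaultdict(float)
    for r, c, value in rows:
        totals[r, c] += value  # repeated pairs are summed
    row_labels = sorted({r for r, _ in totals})
    col_labels = sorted({c for _, c in totals})
    # Cells with no observations are filled with 0.0
    table = [[totals.get((r, c), 0.0) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


rows = [("Alabama", "Arizona", 320.5036),
        ("Wisconsin", "Wyoming", 285.3015),
        ("Wyoming", "Wyoming", 275.98)]
row_labels, col_labels, table = cross_tab(rows)
print(col_labels)
for label, row in zip(row_labels, table):
    print(label, row)
```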

DATA

df <- structure(list(State_1 = c("Alabama", "Alabama", "Alabama", "Alabama", 
"Alabama", "Alabama", "Washington", "West Virginia", "West Virginia", 
"Wisconsin", "Wisconsin", "Wyoming"), State_2 = c("Alabama", 
"Arizona", "Arkansas", "California", "Colorado", "Connecticut", 
"Wyoming", "Wisconsin", "Wyoming", "Wisconsin", "Wyoming", "Wyoming"
), Mean_Market_Fare = c(263.3752, 320.5036, 288.9775, 352.6983, 
282.6864, 266.9601, 286.9314, 302.7769, 493.2, 251.3333, 285.3015, 
275.98)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12"))

R matrix NA remove from cells

Data frames are by definition rectangular, so you can't remove individual cells and still keep the result as a data frame. In the comments you indicate that moving the NAs to the end of the row would be an acceptable solution.

You can move the non-NA values to the start of each row with the following code:

df <- data.frame(V1=c(1, NA, 2.1, 3.4), V2=c(2,1.1,1,NA), V3=c(NA,NA,NA,5))

df[] <- t(apply(df,1,function(x) c(x[!is.na(x)],x[is.na(x)])))

#Show output
df
#    V1 V2 V3
# 1 1.0  2 NA
# 2 1.1 NA NA
# 3 2.1  1 NA
# 4 3.4  5 NA
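The same row-wise reordering can be sketched in Python. Here None stands in for R's NA (an assumption of this sketch). Python's sorted is stable, so sorting each row on "is this cell missing?" keeps the non-missing values in their original left-to-right order, just like c(x[!is.na(x)],x[is.na(x)]). The function name push_missing_right is hypothetical.

```python
def push_missing_right(table):
    """Move None entries to the end of each row, keeping other values in order."""
    # sorted() is stable: False (present) sorts before True (missing).
    return [sorted(row, key=lambda x: x is None) for row in table]


df = [[1, 2, None],
      [None, 1.1, None],
      [2.1, 1, None],
      [3.4, None, 5]]
for row in push_missing_right(df):
    print(row)
```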

Numpy ‘smart’ symmetric matrix

If you can afford to symmetrize the matrix just before doing calculations, the following should be reasonably fast:

import numpy

def symmetrize(a):
    """
    Return a symmetrized version of NumPy array a.

    Values 0 are replaced by the array value at the symmetric
    position (with respect to the diagonal), i.e. if a_ij = 0,
    then the returned array a' is such that a'_ij = a_ji.

    Diagonal values are left untouched.

    a -- square NumPy array, such that a_ij = 0 or a_ji = 0, 
    for i != j.
    """
    return a + a.T - numpy.diag(a.diagonal())

This works under reasonable assumptions (such as not doing both a[0, 1] = 42 and the contradictory a[1, 0] = 123 before running symmetrize).
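To see why the contradictory case is excluded, here is a small check of the same formula (repeated so the snippet runs on its own, with numpy imported as np). When only one of a_ij and a_ji is non-zero the result is symmetric, but when both are set the two values are added rather than one of them winning.

```python
import numpy as np


def symmetrize(a):
    """Same formula as above: a + a.T, minus the diagonal that was counted twice."""
    return a + a.T - np.diag(a.diagonal())


ok = np.zeros((3, 3))
ok[0, 1] = 42                      # only the upper entry is set
print(symmetrize(ok)[1, 0])

bad = np.zeros((3, 3))
bad[0, 1] = 42
bad[1, 0] = 123                    # contradictory lower entry
print(symmetrize(bad)[0, 1], symmetrize(bad)[1, 0])  # both entries become 42 + 123
```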

If you really need a transparent symmetrization, you might consider subclassing numpy.ndarray and simply redefining __setitem__:

class SymNDArray(numpy.ndarray):
    """
    NumPy array subclass for symmetric matrices.

    A SymNDArray arr is such that doing arr[i,j] = value
    automatically does arr[j,i] = value, so that array
    updates remain symmetrical.
    """

    def __setitem__(self, indexes, value):
        (i, j) = indexes  # only 2-D (row, column) indexing is supported
        super().__setitem__((i, j), value)
        super().__setitem__((j, i), value)

def symarray(input_array):
    """
    Return a symmetrized version of the array-like input_array.

    The returned array has class SymNDArray. Further assignments to the array
    are thus automatically symmetrized.
    """
    return symmetrize(numpy.asarray(input_array)).view(SymNDArray)

# Example:
a = symarray(numpy.zeros((3, 3)))
a[0, 1] = 42
print(a)  # a[1, 0] == 42 too!

(or the equivalent with matrices instead of arrays, depending on your needs). This approach even handles more complicated assignments, like a[:, 1] = -1, which correctly sets a[1, :] elements.

Note that the original Python 2 version of this code used the signature def __setitem__(self, (i, j), value). Python 3 removed tuple unpacking in parameter lists, so the code above takes a single indexes argument and unpacks it inside the method body instead.

Transform the upper/lower triangular part of a symmetric matrix (2D array) into a 1D array and return it to the 2D format

A fast and simple way to put a vector back into a 2D symmetric array is the following:

Case 1: No offset (k=0), i.e. the upper triangular part includes the diagonal

import numpy as np

X = np.array([[1,2,3],[4,5,6],[7,8,9]])
#array([[1, 2, 3],
#       [4, 5, 6],
#       [7, 8, 9]])

#get the upper triangular part of this matrix
v = X[np.triu_indices(X.shape[0], k = 0)]
print(v)
# [1 2 3 5 6 9]

# put it back into a 2D symmetric array
size_X = 3
X = np.zeros((size_X,size_X))
X[np.triu_indices(X.shape[0], k = 0)] = v
X = X + X.T - np.diag(np.diag(X))
#array([[1., 2., 3.],
#       [2., 5., 6.],
#       [3., 6., 9.]])

The above also works if you use numpy.matrix instead of numpy.array.

Case 2: With offset (k=1), i.e. the upper triangular part does NOT include the diagonal

import numpy as np

X = np.array([[1,2,3],[4,5,6],[7,8,9]])
#array([[1, 2, 3],
#       [4, 5, 6],
#       [7, 8, 9]])

#get the upper triangular part of this matrix
v = X[np.triu_indices(X.shape[0], k = 1)] # offset
print(v)
# [2 3 6]

# put it back into a 2D symmetric array
size_X = 3
X = np.zeros((size_X,size_X))
X[np.triu_indices(X.shape[0], k = 1)] = v
X = X + X.T
#array([[0., 2., 3.],
#       [2., 0., 6.],
#       [3., 6., 0.]])
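In both cases above size_X is hard-coded. If only the vector is available, the size can be recovered by solving len(v) = n(n+1)/2 (k=0) or len(v) = n(n-1)/2 (k=1) for n. A minimal sketch, assuming v came from np.triu_indices with k equal to 0 or 1; the function name vector_to_symmetric is hypothetical, and math.isqrt (Python 3.8+) keeps the square root exact for integers.

```python
import math

import numpy as np


def vector_to_symmetric(v, k=0):
    """Rebuild a symmetric matrix from its upper triangle taken with offset k (0 or 1)."""
    if k not in (0, 1):
        raise ValueError("this sketch only handles k=0 or k=1")
    m = len(v)
    s = math.isqrt(1 + 8 * m)
    if s * s != 1 + 8 * m:
        raise ValueError(f"length {m} is not a triangular number")
    # Positive root of n(n+1)/2 = m (k=0) or n(n-1)/2 = m (k=1).
    n = (s - 1) // 2 + k
    X = np.zeros((n, n))
    X[np.triu_indices(n, k=k)] = v
    # Mirror the upper triangle; the diagonal (all zeros when k=1) is counted once.
    return X + X.T - np.diag(np.diag(X))


print(vector_to_symmetric(np.array([1, 2, 3, 5, 6, 9])))   # k=0, 3x3
print(vector_to_symmetric(np.array([2, 3, 6]), k=1))       # k=1, 3x3
```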

How to store data of a symmetric matrix table?

The most compact representation of all, which I don't recommend unless you really are short of space, is that of a triangular matrix in a single array.

function triMatrix(n) { // not really needed
    return new Array(n * (n + 1) / 2);
}

function trindex(row, col) {
    if (col > row) {
        var tmp = row; row = col; col = tmp;
    }
    return row * (row + 1) / 2 + col;
}

function triStore(tri, indexOfKey, rowKey, colKey, value) {
    tri[trindex(indexOfKey[rowKey], indexOfKey[colKey])] = value;
}

function triGet(tri, indexOfKey, rowKey, colKey) {
    return tri[trindex(indexOfKey[rowKey], indexOfKey[colKey])];
}

const keyOfIndex = ['Y', 'R', 'W'];
const indexOfKey = {'Y': 0, 'R': 1, 'W': 2}; // can be calculated
const N = keyOfIndex.length;
var tri = triMatrix(N); // could also be var tri = [];
triStore(tri, indexOfKey, 'Y', 'Y', 'Y');
triStore(tri, indexOfKey, 'Y', 'R', 'Y');
triStore(tri, indexOfKey, 'Y', 'W', 'Y');
triStore(tri, indexOfKey, 'R', 'R', 'R');
triStore(tri, indexOfKey, 'R', 'W', 'P');
triStore(tri, indexOfKey, 'W', 'W', 'W');
tri; // => [ "Y", "Y", "R", "Y", "P", "W" ]
triGet(tri, indexOfKey, 'R', 'W'); // => "P"
triGet(tri, indexOfKey, 'W', 'R'); // => "P"

The point is: your matrix is symmetric, so you only need either its upper or its lower triangular part (including the diagonal). In your proposal you store the upper triangular part; in mine I store the lower one because the index calculation is much simpler. The array contains the 1st row (1 element), then the 2nd (2 elements), then the 3rd (3 elements), etc. Just remember that 1+2+...+n = n(n+1)/2 and you'll understand how the array index is calculated: row r starts at index r(r+1)/2, so element (r, c) with c ≤ r is stored at r(r+1)/2 + c.

M₀₀ M₀₁ M₀₂      M₀₀              T₀
M₁₀ M₁₁ M₁₂  =>  M₁₀ M₁₁      =>  T₁  T₂      =>  T₀ T₁ T₂ T₃ T₄ T₅
M₂₀ M₂₁ M₂₂      M₂₀ M₂₁ M₂₂      T₃  T₄  T₅

The matrix is easily extended by one row/column; no reindexing of the array is necessary:

M₀₀ M₀₁ M₀₂ M₀₃      M₀₀                  T₀
M₁₀ M₁₁ M₁₂ M₁₃      M₁₀ M₁₁              T₁  T₂
M₂₀ M₂₁ M₂₂ M₂₃  =>  M₂₀ M₂₁ M₂₂      =>  T₃  T₄  T₅      => T₀ ... T₆ T₇ T₈ T₉
M₃₀ M₃₁ M₃₂ M₃₃      M₃₀ M₃₁ M₃₂ M₃₃      T₆  T₇  T₈  T₉
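The index formula can be checked mechanically. In this Python sketch, trindex mirrors the JavaScript function above, while layout is a hypothetical helper that visits the lower triangle row by row. The indices come out as exactly 0, 1, ..., n(n+1)/2 - 1 in order, so adding row n only appends new slots at the end.

```python
def trindex(row, col):
    """Index of (row, col) in a row-major lower-triangular array."""
    if col > row:
        row, col = col, row  # symmetric: (r, c) and (c, r) share one slot
    return row * (row + 1) // 2 + col


def layout(n):
    """Triangular indices of an n x n matrix, visited row by row."""
    return [trindex(r, c) for r in range(n) for c in range(r + 1)]


print(layout(3))  # 3x3 matrix: 6 slots
print(layout(4))  # 4x4 matrix: the same 6 slots plus 4 new ones at the end
print(trindex(2, 1), trindex(1, 2))  # M21 and M12 share slot T4
```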

As an exercise I loosely translated the above to PHP, which is your target language. Even though I started by not recommending the “triangular matrix in a single array” approach, you are welcome to use the following class as a black box if it matches your needs (and if it doesn't, maybe I can help).

class SymmetricMatrix
{
    private $n = 0;
    private $triangular = [];
    private $key_of_index = [];
    private $index_of_key = [];

    private function add_key_if_necessary($key) {
        if ( !isset($this->index_of_key[$key])) {
            $index = $this->n++;
            $this->index_of_key[$key] = $index;
            $this->key_of_index[$index] = $key;
            for ($i = 0; $i < $this->n; $i++) {
                $this->triangular[] = false; // avoid "jumping" index & init to "absent"
            }
        }
    }

    private static function trindex($row, $col) {
        if ($col > $row) {
            $tmp = $row; $row = $col; $col = $tmp;
        }
        return $row * ($row + 1) / 2 + $col;
    }

    public function put($key1, $key2, $value) {
        $this->add_key_if_necessary($key1);
        $this->add_key_if_necessary($key2);
        $trindex = self::trindex($this->index_of_key[$key1], $this->index_of_key[$key2]);
        $this->triangular[$trindex] = $value;
    }

    public function get($key1, $key2) {
        if (!isset($this->index_of_key[$key1]) || !isset($this->index_of_key[$key2])) {
            return false;
        }
        $trindex = self::trindex($this->index_of_key[$key1], $this->index_of_key[$key2]);
        return $this->triangular[$trindex];
    }

    public function find_first($value) { // $value !== false
        for ($row = 0; $row < $this->n; $row++) {
            for ($col = 0; $col <= $row; $col++) {
                $trindex = self::trindex($row, $col); // static method, not a global function
                if ($this->triangular[$trindex] === $value) {
                    return [$this->key_of_index[$row], $this->key_of_index[$col]];
                }
            }
        }
        return false;
    }

    public function get_keys() {
        return $this->key_of_index;
    }

    public function dump() {
        var_export($this);
        echo "\n";
    }
}

$m = new SymmetricMatrix();
$m->put('Y', 'Y', 'Y');
$m->put('Y', 'R', 'Y');
$m->put('Y', 'W', 'Y');
$m->put('R', 'R', 'R');
$m->put('R', 'W', 'P');
$m->put('W', 'W', 'W');
$m->dump();
echo "keys: ", implode(', ', $m->get_keys()), "\n";
echo "m[R][W]: ", $m->get('R', 'W'), "\n";
echo "m[W][R]: ", $m->get('W', 'R'), "\n";

Symmetric matrix checking: how does tolerance work?

@BenBolker told you to look at the help page for isSymmetric which directs you to all.equal. You built this matrix:

> mat1
            (Intercept)     outcome2     outcome3   treatment2   treatment3
(Intercept)         150 4.000000e+01 4.700000e+01 5.000000e+01 5.000000e+01
outcome2             40 4.000000e+01 2.237910e-14 1.333333e+01 1.333333e+01
outcome3             47 2.476834e-14 4.700000e+01 1.566667e+01 1.566667e+01
treatment2           50 1.333333e+01 1.566667e+01 5.000000e+01 2.710506e-14
treatment3           50 1.333333e+01 1.566667e+01 2.818926e-14 5.000000e+01

The default in all.equal is tolerance = .Machine$double.eps ^ 0.5 (about 1.5e-8), so none of your tests actually used the same tolerance as the call without an argument. (For very small numbers the square root is quite a bit bigger than the number itself: .Machine$double.eps is about 2.2e-16, while its square root is about 1.5e-8.) Notice there is an additional test regarding equality of row and column names, which your example would have satisfied.

If you look at the help page, you should become suspicious about your understanding of what all.equal is doing when you see that it tests ‘near equality’ and then refers you to the Details section, where it says:

 Numerical comparisons for scale = NULL (the default) are done by first computing the mean 
 absolute difference of the two numerical vectors.

In the code you can see that it is not the individual differences that are tested against the tolerance, but the mean absolute difference.
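A simplified Python illustration of why this matters, assuming only the rule quoted above (the mean absolute difference is compared with the tolerance). It deliberately omits the relative scaling and other details of R's all.equal, and the matrix is hypothetical: one off-diagonal pair differs by 3e-8, which is larger than the default tolerance, yet the mean over all nine entries is not.

```python
import sys

TOL = sys.float_info.epsilon ** 0.5  # same value as .Machine$double.eps ^ 0.5


def transpose(m):
    return [list(col) for col in zip(*m)]


def abs_diffs(m):
    """Absolute differences between m and its transpose, flattened."""
    t = transpose(m)
    return [abs(a - b) for row_m, row_t in zip(m, t) for a, b in zip(row_m, row_t)]


m = [[1.0, 2.0, 3.0],
     [2.0, 5.0, 6.0],
     [3.0, 6.0 + 3e-8, 9.0]]
diffs = abs_diffs(m)
max_ok = max(diffs) <= TOL                 # every element within tolerance?
mean_ok = sum(diffs) / len(diffs) <= TOL   # mean difference within tolerance?
print(max_ok, mean_ok)
```

An elementwise check rejects this matrix, while a mean-based check accepts it, because the two non-zero differences are averaged with seven zeros.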

How to obtain symmetrical matrix from dictionary in Python

One option is to complete the dictionary so that it holds every cell of the full matrix (both orientations of each pair plus a zero diagonal) and then pivot it with pandas:

import pandas as pd
mydict={('A', 'E'): 23972,
 ('A', 'D'): 10730,
 ('A', 'B'): 14748,
 ('A', 'C'): 3424,
 ('E', 'D'): 3294,
 ('E', 'B'): 16016,
 ('E', 'C'): 3373,
 ('D', 'B'): 69734,
 ('D', 'C'): 4662,
 ('B', 'C'): 159161}
 
 
# construct the full dictionary
newdict = {}

for (k1, k2), v in mydict.items():
    newdict[k1, k2] = v
    newdict[k2, k1] = v
    newdict[k1, k1] = 0
    newdict[k2, k2] = 0

# pivot the result from long to wide
pd.Series(newdict).reset_index().pivot(index='level_0', columns='level_1', values=0)

#level_1      A       B       C      D      E
#level_0                                     
#A            0   14748    3424  10730  23972
#B        14748       0  159161  69734  16016
#C         3424  159161       0   4662   3373
#D        10730   69734    4662      0   3294
#E        23972   16016    3373   3294      0

Or as commented by @Ch3steR, you can also just do pd.Series(newdict).unstack() for the pivot.
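If you would rather skip pandas, NumPy fancy indexing can fill both triangles in two assignments. This is a sketch, assuming the keys are (label, label) tuples as in mydict and that the diagonal should be 0; labels are sorted so the order matches the pivoted table.

```python
import numpy as np

mydict = {('A', 'E'): 23972, ('A', 'D'): 10730, ('A', 'B'): 14748,
          ('A', 'C'): 3424, ('E', 'D'): 3294, ('E', 'B'): 16016,
          ('E', 'C'): 3373, ('D', 'B'): 69734, ('D', 'C'): 4662,
          ('B', 'C'): 159161}

labels = sorted({name for pair in mydict for name in pair})
pos = {label: i for i, label in enumerate(labels)}

# Row/column position of every (k1, k2) key, aligned with the values.
rows = np.array([pos[k1] for k1, _ in mydict])
cols = np.array([pos[k2] for _, k2 in mydict])
vals = np.array(list(mydict.values()))

mat = np.zeros((len(labels), len(labels)), dtype=vals.dtype)  # zero diagonal
mat[rows, cols] = vals  # the entries as given
mat[cols, rows] = vals  # their mirror images
print(labels)
print(mat)
```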

