Reconstruct Symmetric Matrix from Values in Long-Form

An igraph solution: read in the data frame as an edge list, with the value column treated as an edge attribute, then convert the graph to an adjacency matrix.

dat <- read.table(header=TRUE, text=" one   two   value
a b 30
a c 40
a d 20
b c 10
b d 05
c d 30")

library(igraph)

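# Make undirected so that graph matrix will be symmetric
# (graph_from_data_frame was called graph.data.frame in older igraph versions)
g <- graph_from_data_frame(dat, directed=FALSE)

# Fill the adjacency matrix with the "value" edge attribute
# (as_adjacency_matrix was called get.adjacency in older igraph versions)
as_adjacency_matrix(g, attr="value", sparse=FALSE)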
#    a  b  c  d
# a  0 30 40 20
# b 30  0 10  5
# c 40 10  0 30
# d 20  5 30  0
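
The same reconstruction can be sketched without a graph library: for every long-form row, write the value at both (i, j) and (j, i). The Python sketch below is only an illustration. The names rows and symmetric_from_long are made up for this example, and pairs that are not listed (including the diagonal) are assumed to be 0, as in the igraph output.

```python
def symmetric_from_long(rows, fill=0):
    """Build (labels, matrix) from (label_a, label_b, value) triples."""
    # Sort the labels so that rows and columns share one ordering.
    labels = sorted({label for a, b, _ in rows for label in (a, b)})
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[fill] * n for _ in range(n)]
    for a, b, value in rows:
        i, j = index[a], index[b]
        matrix[i][j] = value
        matrix[j][i] = value  # mirror across the diagonal
    return labels, matrix

# Same data as the read.table call in the answer.
rows = [("a", "b", 30), ("a", "c", 40), ("a", "d", 20),
        ("b", "c", 10), ("b", "d", 5), ("c", "d", 30)]
labels, matrix = symmetric_from_long(rows)
for label, row in zip(labels, matrix):
    print(label, row)
```

If a pair appears twice in the input, the later row simply overwrites the earlier one in this sketch.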

Net changes in network using dplyr

Using dplyr:

mutate(data,new_value=apply(data,1,function(vec){ max(data[data$source==vec[3] & data$target==vec[2],"value"],0)})-value)

Using data.table:

setDT(data)
data[,new_value:=apply(data,1,function(vec){ max(data[data$source==vec[3] & data$target==vec[2]]$value,0)})-value]

If you want to remove the previous values and have a final result:

mutate(data,value=apply(data,1,function(vec){ max(data[data$source==vec[3] & data$target==vec[2],"value"],0)})-value)[,c(3,2,1)]
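
As I read the snippets, each edge gets the value of its reverse edge (or 0 if there is none) minus its own value. Below is a minimal Python sketch of that reading. The edge data and the function name net_changes are hypothetical, since the original data set is not shown here.

```python
def net_changes(edges):
    """edges maps (source, target) -> value.

    Returns a dict mapping each (source, target) to the reverse edge's
    value (0 if absent, and never below 0) minus the edge's own value.
    """
    result = {}
    for (source, target), value in edges.items():
        reverse_value = max(edges.get((target, source), 0), 0)
        result[(source, target)] = reverse_value - value
    return result

# Hypothetical data, not taken from the original question.
edges = {("A", "B"): 10, ("B", "A"): 4, ("A", "C"): 7}
changes = net_changes(edges)
print(changes)
```

Looking up the reverse edge in a dictionary avoids rescanning the whole table for every row, which is what the apply-based versions do.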

Create square matrix of pairwise values from a dataframe in R

You can try xtabs, as below:

xtabs(Mean_Market_Fare~.,df)

which gives

> xtabs(Mean_Market_Fare~.,df)
               State_2
State_1          Alabama  Arizona Arkansas California Colorado Connecticut Wisconsin  Wyoming
  Alabama       263.3752 320.5036 288.9775   352.6983 282.6864    266.9601    0.0000   0.0000
  Washington      0.0000   0.0000   0.0000     0.0000   0.0000      0.0000    0.0000 286.9314
  West Virginia   0.0000   0.0000   0.0000     0.0000   0.0000      0.0000  302.7769 493.2000
  Wisconsin       0.0000   0.0000   0.0000     0.0000   0.0000      0.0000  251.3333 285.3015
  Wyoming         0.0000   0.0000   0.0000     0.0000   0.0000      0.0000    0.0000 275.9800

DATA

df <- structure(list(State_1 = c("Alabama", "Alabama", "Alabama", "Alabama", 
"Alabama", "Alabama", "Washington", "West Virginia", "West Virginia",
"Wisconsin", "Wisconsin", "Wyoming"), State_2 = c("Alabama",
"Arizona", "Arkansas", "California", "Colorado", "Connecticut",
"Wyoming", "Wisconsin", "Wyoming", "Wisconsin", "Wyoming", "Wyoming"
), Mean_Market_Fare = c(263.3752, 320.5036, 288.9775, 352.6983,
282.6864, 266.9601, 286.9314, 302.7769, 493.2, 251.3333, 285.3015,
275.98)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12"))

R matrix NA remove from cells

Data frames are by definition rectangular, so you can't remove individual cells and still keep the result as a data frame. In the comments you indicate that moving the NAs to the end of each row would be an acceptable solution.

You can move the non-NA values to the start of each row with the following code:

df <- data.frame(V1=c(1, NA, 2.1, 3.4), V2=c(2,1.1,1,NA), V3=c(NA,NA,NA,5))

df[] <- t(apply(df,1,function(x) c(x[!is.na(x)],x[is.na(x)])))

#Show output
df
#    V1 V2 V3
# 1 1.0  2 NA
# 2 1.1 NA NA
# 3 2.1  1 NA
# 4 3.4  5 NA
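
The row-wise shift is easy to state on its own: keep the non-missing values in their original order and pad the rest of the row with missing markers. Here is a small Python sketch of that step, using None as a stand-in for NA. The function name push_missing_right is made up for the example.

```python
def push_missing_right(row, missing=None):
    """Return row with non-missing values first and missing values last."""
    present = [x for x in row if x is not missing]
    return present + [missing] * (len(row) - len(present))

# Same values as the data frame above, one inner list per row.
table = [[1, 2, None], [None, 1.1, None], [2.1, 1, None], [3.4, None, 5]]
shifted = [push_missing_right(row) for row in table]
for row in shifted:
    print(row)
```

Note that after the shift a column no longer holds values from a single original variable, which is also true of the R result.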

Numpy ‘smart’ symmetric matrix

If you can afford to symmetrize the matrix just before doing calculations, the following should be reasonably fast:

import numpy

def symmetrize(a):
    """
    Return a symmetrized version of NumPy array a.

    Values 0 are replaced by the array value at the symmetric
    position (with respect to the diagonal), i.e. if a_ij = 0,
    then the returned array a' is such that a'_ij = a_ji.

    Diagonal values are left untouched.

    a -- square NumPy array, such that a_ij = 0 or a_ji = 0,
    for i != j.
    """
    return a + a.T - numpy.diag(a.diagonal())

This works under reasonable assumptions (such as not doing both a[0, 1] = 42 and the contradictory a[1, 0] = 123 before running symmetrize).
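
To make that assumption concrete, here is a short sketch (with made-up 2×2 inputs) of what happens when it holds and when it does not. Because symmetrize adds the array to its transpose, two contradictory off-diagonal assignments end up summed rather than one winning over the other.

```python
import numpy

def symmetrize(a):
    # Same formula as in the answer, repeated so this snippet runs on its own.
    return a + a.T - numpy.diag(a.diagonal())

# Assumption respected: only one of a[0, 1] and a[1, 0] is set.
ok = numpy.zeros((2, 2))
ok[0, 1] = 42
ok_sym = symmetrize(ok)

# Assumption violated: both positions are set, to different values.
bad = numpy.zeros((2, 2))
bad[0, 1] = 42
bad[1, 0] = 123
bad_sym = symmetrize(bad)

print(ok_sym)   # 42 appears at both off-diagonal positions
print(bad_sym)  # both off-diagonal positions hold 42 + 123 = 165
```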

If you really need a transparent symmetrization, you might consider subclassing numpy.ndarray and simply redefining __setitem__:

class SymNDArray(numpy.ndarray):
    """
    NumPy array subclass for symmetric matrices.

    A SymNDArray arr is such that doing arr[i,j] = value
    automatically does arr[j,i] = value, so that array
    updates remain symmetrical.
    """

    def __setitem__(self, indexes, value):
        (i, j) = indexes
        super().__setitem__((i, j), value)
        super().__setitem__((j, i), value)

def symarray(input_array):
    """
    Return a symmetrized version of the array-like input_array.

    The returned array has class SymNDArray. Further assignments to the array
    are thus automatically symmetrized.
    """
    return symmetrize(numpy.asarray(input_array)).view(SymNDArray)

# Example:
a = symarray(numpy.zeros((3, 3)))
a[0, 1] = 42
print(a)  # a[1, 0] == 42 too!

(or the equivalent with matrices instead of arrays, depending on your needs). This approach even handles more complicated assignments, like a[:, 1] = -1, which correctly sets a[1, :] elements.

Note that the code above is written for Python 3. Python 2 also allowed the shorter signature def __setitem__(self, (i, j), value), but Python 3 removed tuple parameters, so the indexes are unpacked inside the method instead.

Transform the upper/lower triangular part of a symmetric matrix (2D array) into a 1D array and return it to the 2D format

A concise way to put a vector back into a 2D symmetric array is to fill one triangle by index and then mirror it:


Case 1: No offset (k=0) i.e. upper triangle part includes the diagonal

import numpy as np

X = np.array([[1,2,3],[4,5,6],[7,8,9]])
#array([[1, 2, 3],
#       [4, 5, 6],
#       [7, 8, 9]])

#get the upper triangular part of this matrix
v = X[np.triu_indices(X.shape[0], k = 0)]
print(v)
# [1 2 3 5 6 9]

# put it back into a 2D symmetric array
size_X = 3
X = np.zeros((size_X,size_X))
X[np.triu_indices(X.shape[0], k = 0)] = v
X = X + X.T - np.diag(np.diag(X))
#array([[1., 2., 3.],
#       [2., 5., 6.],
#       [3., 6., 9.]])

The above will work fine even if instead of numpy.array you use numpy.matrix.


Case 2: With offset (k=1) i.e. upper triangle part does NOT include the diagonal

import numpy as np

X = np.array([[1,2,3],[4,5,6],[7,8,9]])
#array([[1, 2, 3],
#       [4, 5, 6],
#       [7, 8, 9]])

#get the upper triangular part of this matrix
v = X[np.triu_indices(X.shape[0], k = 1)] # offset
print(v)
# [2 3 6]

# put it back into a 2D symmetric array
size_X = 3
X = np.zeros((size_X,size_X))
X[np.triu_indices(X.shape[0], k = 1)] = v
X = X + X.T
#array([[0., 2., 3.],
#       [2., 0., 6.],
#       [3., 6., 0.]])
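
Both cases hard-code size_X = 3. If only the vector is available, the side length can be recovered from its length, since a triangle with the diagonal holds n(n+1)/2 values and one without it holds n(n-1)/2. The helper below is a sketch of that arithmetic. The name matrix_size is made up for the example, and it handles only the two offsets shown here.

```python
import math

def matrix_size(vector_length, k=0):
    """Side length n of the square matrix whose upper triangle
    (offset k = 0 or 1) holds vector_length values."""
    if k not in (0, 1):
        raise ValueError("this sketch only handles k=0 and k=1")
    # Solve m = t(t+1)/2 for t; the discriminant 8m + 1 must be a perfect square.
    discriminant = 8 * vector_length + 1
    root = math.isqrt(discriminant)
    if root * root != discriminant:
        raise ValueError("vector length is not a triangular number")
    # With k=1 the diagonal is excluded, so the matrix is one row larger.
    return (root - 1) // 2 + k

print(matrix_size(6, k=0))  # vector [1 2 3 5 6 9] from Case 1
print(matrix_size(3, k=1))  # vector [2 3 6] from Case 2
```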

How to store data of a symmetric matrix table?

The most compact representation of all, which I don't recommend unless you really are short of space, is that of a triangular matrix in a single array.

function triMatrix(n) { // not really needed
    return new Array(n * (n + 1) / 2);
}

function trindex(row, col) {
    if (col > row) {
        var tmp = row; row = col; col = tmp;
    }
    return row * (row + 1) / 2 + col;
}

function triStore(tri, indexOfKey, rowKey, colKey, value) {
    tri[trindex(indexOfKey[rowKey], indexOfKey[colKey])] = value;
}

function triGet(tri, indexOfKey, rowKey, colKey) {
    return tri[trindex(indexOfKey[rowKey], indexOfKey[colKey])];
}

const keyOfIndex = ['Y', 'R', 'W'];
const indexOfKey = {'Y': 0, 'R': 1, 'W': 2}; // can be calculated
const N = keyOfIndex.length;
var tri = triMatrix(N); // could also be var tri = [];
triStore(tri, indexOfKey, 'Y', 'Y', 'Y');
triStore(tri, indexOfKey, 'Y', 'R', 'Y');
triStore(tri, indexOfKey, 'Y', 'W', 'Y');
triStore(tri, indexOfKey, 'R', 'R', 'R');
triStore(tri, indexOfKey, 'R', 'W', 'P');
triStore(tri, indexOfKey, 'W', 'W', 'W');
tri; // => [ "Y", "Y", "R", "Y", "P", "W" ]
triGet(tri, indexOfKey, 'R', 'W'); // => "P"
triGet(tri, indexOfKey, 'W', 'R'); // => "P"

The point is: your matrix is symmetric, so you only need either its upper or its lower triangular matrix (including the diagonal). In your proposal you store the upper triangular matrix; in mine I store the lower one, because the index calculation is much simpler. The array will contain the 1st row of 1 element, the 2nd of 2, the 3rd of 3, etc. Just remember that 1+2+...+n=n(n+1)/2 and you'll understand how the array index is calculated: row r is preceded by r(r+1)/2 elements, so element (r, c) lands at index r(r+1)/2 + c.

M₀₀ M₀₁ M₀₂      M₀₀              T₀
M₁₀ M₁₁ M₁₂  =>  M₁₀ M₁₁      =>  T₁ T₂     =>  T₀ T₁ T₂ T₃ T₄ T₅
M₂₀ M₂₁ M₂₂      M₂₀ M₂₁ M₂₂      T₃ T₄ T₅

The matrix is easily extended by 1 row/column, no reindexing of the array is necessary:

M₀₀ M₀₁ M₀₂ M₀₃      M₀₀                  T₀
M₁₀ M₁₁ M₁₂ M₁₃      M₁₀ M₁₁              T₁ T₂
M₂₀ M₂₁ M₂₂ M₂₃  =>  M₂₀ M₂₁ M₂₂      =>  T₃ T₄ T₅     =>  T₀ ... T₆ T₇ T₈ T₉
M₃₀ M₃₁ M₃₂ M₃₃      M₃₀ M₃₁ M₃₂ M₃₃      T₆ T₇ T₈ T₉
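
The "no reindexing" property can be checked mechanically. The Python sketch below restates the trindex formula from the code above and compares the index maps of a 3×3 and a 4×4 matrix. The helper index_map is made up for this check.

```python
def trindex(row, col):
    """Index of (row, col) in the flattened lower triangle."""
    if col > row:
        row, col = col, row  # symmetric: (row, col) and (col, row) share a slot
    return row * (row + 1) // 2 + col

def index_map(n):
    """Map every lower-triangle position of an n x n matrix to its slot."""
    return {(r, c): trindex(r, c) for r in range(n) for c in range(r + 1)}

small = index_map(3)
large = index_map(4)

# Every slot 0..5 is used exactly once by the 3 x 3 matrix.
assert sorted(small.values()) == list(range(6))
# Growing to 4 x 4 leaves all existing positions where they were.
assert all(large[pos] == slot for pos, slot in small.items())
# The new row only appends slots at the end of the array.
new_slots = sorted(set(large.values()) - set(small.values()))
print(new_slots)  # [6, 7, 8, 9], i.e. T₆ to T₉ in the diagram
```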



As an exercise I loosely translated the above to PHP, which is your target language. Even though I started by not recommending the “triangular matrix in a single array” approach, you are welcome to use the following class as a black box, if it matches your needs (and if it doesn't, maybe I can help).

class SymmetricMatrix
{
    private $n = 0;
    private $triangular = [];
    private $key_of_index = [];
    private $index_of_key = [];

    private function add_key_if_necessary($key) {
        if ( !isset($this->index_of_key[$key])) {
            $index = $this->n++;
            $this->index_of_key[$key] = $index;
            $this->key_of_index[$index] = $key;
            // append one new row of $this->n elements
            for ($i = 0; $i < $this->n; $i++) {
                $this->triangular[] = false; // avoid "jumping" index & init to "absent"
            }
        }
    }

    private static function trindex($row, $col) {
        if ($col > $row) {
            $tmp = $row; $row = $col; $col = $tmp;
        }
        return $row * ($row + 1) / 2 + $col;
    }

    public function put($key1, $key2, $value) {
        $this->add_key_if_necessary($key1);
        $this->add_key_if_necessary($key2);
        $trindex = self::trindex($this->index_of_key[$key1], $this->index_of_key[$key2]);
        $this->triangular[$trindex] = $value;
    }

    public function get($key1, $key2) {
        if (!isset($this->index_of_key[$key1]) || !isset($this->index_of_key[$key2])) {
            return false;
        }
        $trindex = self::trindex($this->index_of_key[$key1], $this->index_of_key[$key2]);
        return $this->triangular[$trindex];
    }

    public function find_first($value) { // $value !== false
        for ($row = 0; $row < $this->n; $row++) {
            for ($col = 0; $col <= $row; $col++) {
                // trindex is a static method, so it must be called via self::
                $trindex = self::trindex($row, $col);
                if ($this->triangular[$trindex] === $value) {
                    return [$this->key_of_index[$row], $this->key_of_index[$col]];
                }
            }
        }
        return false;
    }

    public function get_keys() {
        return $this->key_of_index;
    }

    public function dump() {
        var_export($this);
        echo "\n";
    }
}

$m = new SymmetricMatrix();
$m->put('Y', 'Y', 'Y');
$m->put('Y', 'R', 'Y');
$m->put('Y', 'W', 'Y');
$m->put('R', 'R', 'R');
$m->put('R', 'W', 'P');
$m->put('W', 'W', 'W');
$m->dump();
echo "keys: ", implode(', ', $m->get_keys()), "\n";
echo "m[R][W]: ", $m->get('R', 'W'), "\n";
echo "m[W][R]: ", $m->get('W', 'R'), "\n";

Symmetric matrix checking: how does tolerance work?

@BenBolker told you to look at the help page for isSymmetric which directs you to all.equal. You built this matrix:

> mat1
            (Intercept)     outcome2     outcome3   treatment2   treatment3
(Intercept)         150 4.000000e+01 4.700000e+01 5.000000e+01 5.000000e+01
outcome2             40 4.000000e+01 2.237910e-14 1.333333e+01 1.333333e+01
outcome3             47 2.476834e-14 4.700000e+01 1.566667e+01 1.566667e+01
treatment2           50 1.333333e+01 1.566667e+01 5.000000e+01 2.710506e-14
treatment3           50 1.333333e+01 1.566667e+01 2.818926e-14 5.000000e+01

The default in all.equal is tolerance = .Machine$double.eps ^ 0.5, so none of your tests were actually the same as the one without an argument. (For a number as small as .Machine$double.eps, the square root is considerably larger than the number itself.) Notice there is an additional test regarding equality of row and column names, which your example would have satisfied.

If you look at the help page, you should become suspicious about your understanding of what all.equal might be doing when you see that it is testing ‘near equality’ and then refers you to the Details, where it says:

 Numerical comparisons for scale = NULL (the default) are done by first computing the mean absolute difference of the two numerical vectors.

In the code you can see that it is not individual differences being tested relative to the tolerance but the mean absolute differences.
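
The difference between the two kinds of test can be illustrated with the off-diagonal values of mat1. The Python sketch below is a simplified model, not R's actual implementation: it assumes the mean absolute difference is divided by the mean absolute value of the target, and it uses the square root of machine epsilon as the tolerance. The function names are made up for the example.

```python
import sys

def mean_relative_difference(target, current):
    """Mean absolute difference, scaled by the mean absolute target (simplified)."""
    n = len(target)
    mean_abs_diff = sum(abs(t - c) for t, c in zip(target, current)) / n
    mean_abs_target = sum(abs(t) for t in target) / n
    return mean_abs_diff / mean_abs_target

def elementwise_close(target, current, tolerance):
    """Stricter alternative: every pair must agree within a relative tolerance."""
    return all(abs(t - c) <= tolerance * abs(t) for t, c in zip(target, current))

tolerance = sys.float_info.epsilon ** 0.5

# Upper-triangle entries of mat1 (row by row) and their mirrored counterparts,
# copied from the printed matrix.
upper = [40, 47, 50, 50, 2.237910e-14, 13.33333, 13.33333,
         15.66667, 15.66667, 2.710506e-14]
lower = [40, 47, 50, 50, 2.476834e-14, 13.33333, 13.33333,
         15.66667, 15.66667, 2.818926e-14]

aggregate_ok = mean_relative_difference(upper, lower) < tolerance
per_element_ok = elementwise_close(upper, lower, tolerance)
print(aggregate_ok, per_element_ok)
```

The aggregate test passes because the two tiny mismatches are averaged together with large, identical entries, while the per-element test fails on the pairs near 1e-14, whose relative difference is roughly 10%.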

How to obtain symmetrical matrix from dictionary in Python

One option is to reconstruct the dictionary in full matrix format and then pivot it with pandas:

import pandas as pd
mydict = {('A', 'E'): 23972,
          ('A', 'D'): 10730,
          ('A', 'B'): 14748,
          ('A', 'C'): 3424,
          ('E', 'D'): 3294,
          ('E', 'B'): 16016,
          ('E', 'C'): 3373,
          ('D', 'B'): 69734,
          ('D', 'C'): 4662,
          ('B', 'C'): 159161}

# construct the full dictionary
newdict = {}

for (k1, k2), v in mydict.items():
    newdict[k1, k2] = v
    newdict[k2, k1] = v
    newdict[k1, k1] = 0
    newdict[k2, k2] = 0

# pivot the result from long to wide
pd.Series(newdict).reset_index().pivot(index='level_0', columns='level_1', values=0)

#level_1      A       B       C      D      E
#level_0
#A            0   14748    3424  10730  23972
#B        14748       0  159161  69734  16016
#C         3424  159161       0   4662   3373
#D        10730   69734    4662      0   3294
#E        23972   16016    3373   3294      0

Or as commented by @Ch3steR, you can also just do pd.Series(newdict).unstack() for the pivot.
