Reconstruct symmetric matrix from values in long-form
An igraph solution: read in the data frame as an edge list, using the value column as the edge weight, then convert the graph to an adjacency matrix.
dat <- read.table(header=TRUE, text=" one two value
a b 30
a c 40
a d 20
b c 10
b d 05
c d 30")
library(igraph)
# Make the graph undirected so that the adjacency matrix is symmetric
g <- graph.data.frame(dat, directed=FALSE)
# Use the "value" edge attribute as the matrix entries
get.adjacency(g, attr="value", sparse=FALSE)
# a b c d
#a 0 30 40 20
#b 30 0 10 5
#c 40 10 0 30
#d 20 5 30 0
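For comparison, here is a minimal Python sketch of the same idea without igraph. It assumes the long-form data is a list of (one, two, value) rows with no duplicate pairs; each row fills both (one, two) and (two, one), which is what treating the graph as undirected achieves. The function name long_to_symmetric is hypothetical.

```python
def long_to_symmetric(rows):
    """Build a labelled symmetric matrix from (one, two, value) rows.

    Pairs that never appear, and the diagonal, default to 0,
    mirroring the igraph adjacency output above.
    """
    labels = sorted({r[0] for r in rows} | {r[1] for r in rows})
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for one, two, value in rows:
        i, j = index[one], index[two]
        # write both triangles so the result is symmetric
        matrix[i][j] = value
        matrix[j][i] = value
    return labels, matrix


dat = [("a", "b", 30), ("a", "c", 40), ("a", "d", 20),
       ("b", "c", 10), ("b", "d", 5), ("c", "d", 30)]
labels, m = long_to_symmetric(dat)
for label, row in zip(labels, m):
    print(label, row)
```

The printed rows match the igraph matrix above, for example a [0, 30, 40, 20].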
Net changes in network using dplyr
Using dplyr:
library(dplyr)
mutate(data,new_value=apply(data,1,function(vec){ max(data[data$source==vec[3] & data$target==vec[2],"value"],0)})-value)
Using data.table:
library(data.table)
setDT(data)
data[,new_value:=apply(data,1,function(vec){ max(data[data$source==vec[3] & data$target==vec[2]]$value,0)})-value]
If you want to replace the original values with the net changes instead of adding a column (the trailing [,c(3,2,1)] reorders the columns):
mutate(data,value=apply(data,1,function(vec){ max(data[data$source==vec[3] & data$target==vec[2],"value"],0)})-value)[,c(3,2,1)]
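Reading the R code, vec[2] and vec[3] appear to hold each row's source and target, so every row looks up the reverse edge (target back to source) and computes max(reverse value, 0) minus its own value; a missing reverse edge counts as 0. A Python sketch of that logic, assuming rows of (source, target, value) with at most one row per directed pair; net_changes and the example edges are made up for illustration:

```python
def net_changes(rows):
    """Return (source, target, value, new_value) for each row, where
    new_value = max(value of the reverse edge, 0) - value."""
    # lookup table keyed by the directed pair (source, target)
    value_of = {(source, target): value for source, target, value in rows}
    result = []
    for source, target, value in rows:
        reverse = value_of.get((target, source), 0)  # 0 if no reverse edge
        result.append((source, target, value, max(reverse, 0) - value))
    return result


edges = [("A", "B", 10), ("B", "A", 4), ("A", "C", 7)]  # hypothetical data
for row in net_changes(edges):
    print(row)
```

A dictionary lookup replaces the per-row scan of the whole table that the apply version performs, so the sketch stays linear in the number of edges.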
Create square matrix of pairwise values from a dataframe in R
You can use xtabs:
xtabs(Mean_Market_Fare~.,df)
which gives:
> xtabs(Mean_Market_Fare~.,df)
State_2
State_1 Alabama Arizona Arkansas California Colorado Connecticut Wisconsin Wyoming
Alabama 263.3752 320.5036 288.9775 352.6983 282.6864 266.9601 0.0000 0.0000
Washington 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 286.9314
West Virginia 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 302.7769 493.2000
Wisconsin 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 251.3333 285.3015
Wyoming 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 275.9800
DATA
df <- structure(list(State_1 = c("Alabama", "Alabama", "Alabama", "Alabama",
"Alabama", "Alabama", "Washington", "West Virginia", "West Virginia",
"Wisconsin", "Wisconsin", "Wyoming"), State_2 = c("Alabama",
"Arizona", "Arkansas", "California", "Colorado", "Connecticut",
"Wyoming", "Wisconsin", "Wyoming", "Wisconsin", "Wyoming", "Wyoming"
), Mean_Market_Fare = c(263.3752, 320.5036, 288.9775, 352.6983,
282.6864, 266.9601, 286.9314, 302.7769, 493.2, 251.3333, 285.3015,
275.98)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12"))
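Note that xtabs takes its rows from the State_1 values and its columns from the State_2 values, so the table above is 5 × 8 rather than square. In R, converting both columns to factors with the same levels before calling xtabs should give a square table, since xtabs keeps unused factor levels by default. The Python sketch below shows the same cross-tabulation with one shared, sorted label set for both axes: like xtabs, it sums the values of repeated pairs and fills missing pairs with 0. The name square_crosstab is hypothetical, and the rows are a subset of the DATA above.

```python
from collections import defaultdict


def square_crosstab(rows):
    """Sum values into a square table indexed by the union of both label columns."""
    totals = defaultdict(float)
    for row_label, col_label, value in rows:
        totals[row_label, col_label] += value  # repeated pairs are summed
    labels = sorted({r for r, _, _ in rows} | {c for _, c, _ in rows})
    # pairs that never occur are filled with 0.0
    table = [[totals.get((r, c), 0.0) for c in labels] for r in labels]
    return labels, table


rows = [("Alabama", "Alabama", 263.3752), ("Alabama", "Arizona", 320.5036),
        ("Washington", "Wyoming", 286.9314), ("Wyoming", "Wyoming", 275.98)]
labels, table = square_crosstab(rows)
for label, row in zip(labels, table):
    print(label, row)
```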
R matrix NA remove from cells
Data frames are by definition rectangular, so you can't remove individual cells and still keep the object as a data frame. In the comments you indicate that moving the NAs to the end of each row would be an acceptable solution.
You can move the non-NA values to the start of each row with the following code:
df <- data.frame(V1=c(1, NA, 2.1, 3.4), V2=c(2,1.1,1,NA), V3=c(NA,NA,NA,5))
df[] <- t(apply(df,1,function(x) c(x[!is.na(x)],x[is.na(x)])))
#Show output
df
# V1 V2 V3
# 1 1.0 2 NA
# 2 1.1 NA NA
# 3 2.1 1 NA
# 4 3.4 5 NA
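The apply call concatenates each row's non-missing values with its missing ones. The same reordering as a Python sketch, assuming rows are plain lists with None standing in for NA; push_missing_to_end is a hypothetical name:

```python
def push_missing_to_end(rows):
    """Move None entries to the end of each row, keeping the order of the others."""
    return [[x for x in row if x is not None] + [x for x in row if x is None]
            for row in rows]


df = [[1, 2, None], [None, 1.1, None], [2.1, 1, None], [3.4, None, 5]]
result = push_missing_to_end(df)
print(result)
```

The rows come out in the same order as the R output above.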
Numpy ‘smart’ symmetric matrix
If you can afford to symmetrize the matrix just before doing calculations, the following should be reasonably fast:
import numpy
def symmetrize(a):
"""
Return a symmetrized version of NumPy array a.
Values 0 are replaced by the array value at the symmetric
position (with respect to the diagonal), i.e. if a_ij = 0,
then the returned array a' is such that a'_ij = a_ji.
Diagonal values are left untouched.
a -- square NumPy array, such that a_ij = 0 or a_ji = 0,
for i != j.
"""
return a + a.T - numpy.diag(a.diagonal())
This works under reasonable assumptions (such as not doing both a[0, 1] = 42 and the contradictory a[1, 0] = 123 before running symmetrize).
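A quick check of symmetrize, including the contradictory case just mentioned. The block repeats the one-line formula so it runs on its own, and the example arrays are made up. Because the formula adds the two mirrored entries, contradictory values are summed rather than one of them winning.

```python
import numpy


def symmetrize(a):
    # same formula as above: off-diagonal zeros take the mirrored value
    return a + a.T - numpy.diag(a.diagonal())


a = numpy.zeros((3, 3))
a[0, 1] = 42          # only the upper entry is set
s = symmetrize(a)
print(s)              # s[1, 0] is 42 as well

b = numpy.zeros((3, 3))
b[0, 1] = 42
b[1, 0] = 123         # violates the assumption
print(symmetrize(b)[0, 1])  # 165.0: the two entries are added
```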
If you really need a transparent symmetrization, you might consider subclassing numpy.ndarray and simply redefining __setitem__:
class SymNDArray(numpy.ndarray):
"""
NumPy array subclass for symmetric matrices.
A SymNDArray arr is such that doing arr[i,j] = value
automatically does arr[j,i] = value, so that array
updates remain symmetrical.
"""
def __setitem__(self, (i, j), value):
super(SymNDArray, self).__setitem__((i, j), value)
super(SymNDArray, self).__setitem__((j, i), value)
def symarray(input_array):
"""
Return a symmetrized version of the array-like input_array.
The returned array has class SymNDArray. Further assignments to the array
are thus automatically symmetrized.
"""
return symmetrize(numpy.asarray(input_array)).view(SymNDArray)
# Example:
a = symarray(numpy.zeros((3, 3)))
a[0, 1] = 42
print a # a[1, 0] == 42 too!
(or the equivalent with matrices instead of arrays, depending on your needs). This approach even handles more complicated assignments, like a[:, 1] = -1, which correctly sets the a[1, :] elements.
Note that Python 3 removed the possibility of writing def …(…, (i, j),…), so the code has to be slightly adapted before running with Python 3: def __setitem__(self, indexes, value): (i, j) = indexes …
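A Python 3 sketch of the subclass with that adaptation applied. It keeps the original's assumption that assignments always use a two-part index such as a[i, j] or a[:, j]; a single index like a[0] = ... would fail to unpack. Here symarray inlines the symmetrize formula so the block runs on its own.

```python
import numpy


class SymNDArray(numpy.ndarray):
    """ndarray subclass whose two-part index assignments are mirrored across the diagonal."""

    def __setitem__(self, indexes, value):
        (i, j) = indexes  # Python 3: unpack in the body, not in the signature
        super().__setitem__((i, j), value)
        super().__setitem__((j, i), value)


def symarray(input_array):
    """Return a symmetrized SymNDArray view of input_array."""
    a = numpy.asarray(input_array)
    return (a + a.T - numpy.diag(a.diagonal())).view(SymNDArray)


a = symarray(numpy.zeros((3, 3)))
a[0, 1] = 42      # also sets a[1, 0]
a[:, 2] = -1      # also sets row 2
print(a)
```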
transform the upper/lower triangular part of a symmetric matrix (2D array) into a 1D array and return it to the 2D format
A fast and simple way to put a vector back into a 2D symmetric array is:
Case 1: No offset (k=0) i.e. upper triangle part includes the diagonal
import numpy as np
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
#array([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]])
#get the upper triangular part of this matrix
v = X[np.triu_indices(X.shape[0], k = 0)]
print(v)
# [1 2 3 5 6 9]
# put it back into a 2D symmetric array
size_X = 3
X = np.zeros((size_X,size_X))
X[np.triu_indices(X.shape[0], k = 0)] = v
X = X + X.T - np.diag(np.diag(X))
#array([[1., 2., 3.],
# [2., 5., 6.],
# [3., 6., 9.]])
The above works just as well if you use numpy.matrix instead of numpy.array.
Case 2: With offset (k=1) i.e. upper triangle part does NOT include the diagonal
import numpy as np
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
#array([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]])
#get the upper triangular part of this matrix
v = X[np.triu_indices(X.shape[0], k = 1)] # offset
print(v)
# [2 3 6]
# put it back into a 2D symmetric array
size_X = 3
X = np.zeros((size_X,size_X))
X[np.triu_indices(X.shape[0], k = 1)] = v
X = X + X.T
#array([[0., 2., 3.],
# [2., 0., 6.],
# [3., 6., 0.]])
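If you only need individual entries, the packed vector can also be read directly instead of being unpacked. A sketch for the k=0 case, assuming the row-major order that np.triu_indices produces; packed_index is a hypothetical helper, not a NumPy function:

```python
def packed_index(i, j, n):
    """Position of element (i, j) in the row-major upper triangle (k=0)
    of an n x n symmetric matrix."""
    if i > j:  # symmetric: use the mirrored upper-triangle element
        i, j = j, i
    # rows 0..i-1 contribute n + (n-1) + ... + (n-i+1) = i*n - i*(i-1)/2 elements
    return i * n - i * (i - 1) // 2 + (j - i)


v = [1, 2, 3, 5, 6, 9]  # the packed vector from Case 1
n = 3
rebuilt = [[v[packed_index(i, j, n)] for j in range(n)] for i in range(n)]
print(rebuilt)  # [[1, 2, 3], [2, 5, 6], [3, 6, 9]]
```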
How to store data of a symmetric matrix table?
The most compact representation of all, which I don't recommend unless you really are short of space, is that of a triangular matrix in a single array.
function triMatrix(n) { // not really needed
return new Array(n * (n + 1) / 2);
}
function trindex(row, col) {
if (col > row) {
var tmp = row; row = col; col = tmp;
}
return row * (row + 1) / 2 + col;
}
function triStore(tri, indexOfKey, rowKey, colKey, value) {
tri[trindex(indexOfKey[rowKey], indexOfKey[colKey])] = value;
}
function triGet(tri, indexOfKey, rowKey, colKey) {
return tri[trindex(indexOfKey[rowKey], indexOfKey[colKey])];
}
const keyOfIndex = ['Y', 'R', 'W'];
const indexOfKey = {'Y': 0, 'R': 1, 'W': 2}; // can be calculated
const N = keyOfIndex.length;
var tri = triMatrix(N); // could also be var tri = [];
triStore(tri, indexOfKey, 'Y', 'Y', 'Y');
triStore(tri, indexOfKey, 'Y', 'R', 'Y');
triStore(tri, indexOfKey, 'Y', 'W', 'Y');
triStore(tri, indexOfKey, 'R', 'R', 'R');
triStore(tri, indexOfKey, 'R', 'W', 'P');
triStore(tri, indexOfKey, 'W', 'W', 'W');
tri; // => [ "Y", "Y", "R", "Y", "P", "W" ]
triGet(tri, indexOfKey, 'R', 'W'); // => "P"
triGet(tri, indexOfKey, 'W', 'R'); // => "P"
The point is: your matrix is symmetric, so you only need either its upper or its lower triangle (including the diagonal). Your proposal stores the upper triangle; mine stores the lower one because the index calculation is simpler. The array holds row 0 (1 element), then row 1 (2 elements), then row 2 (3 elements), and so on. Since 1+2+...+n = n(n+1)/2, row r starts at index r(r+1)/2, so element (r, c) with c ≤ r is stored at r(r+1)/2 + c, which is exactly what trindex computes.
M₀₀ M₀₁ M₀₂      M₀₀              T₀
M₁₀ M₁₁ M₁₂  =>  M₁₀ M₁₁      =>  T₁ T₂     =>  T₀ T₁ T₂ T₃ T₄ T₅
M₂₀ M₂₁ M₂₂      M₂₀ M₂₁ M₂₂      T₃ T₄ T₅
The matrix is easily extended by one row/column without reindexing the array; the new row's elements are simply appended at the end:
M₀₀ M₀₁ M₀₂ M₀₃      M₀₀                  T₀
M₁₀ M₁₁ M₁₂ M₁₃      M₁₀ M₁₁              T₁ T₂
M₂₀ M₂₁ M₂₂ M₂₃  =>  M₂₀ M₂₁ M₂₂      =>  T₃ T₄ T₅     =>  T₀ ... T₆ T₇ T₈ T₉
M₃₀ M₃₁ M₃₂ M₃₃      M₃₀ M₃₁ M₃₂ M₃₃      T₆ T₇ T₈ T₉
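A Python port of trindex that reproduces the JavaScript example, plus a check of the extension property: because row r starts at r(r+1)/2, the four slots of a new row 3 land at indices 6 to 9 and nothing before them moves. It is a sketch of the same scheme, not a separate library.

```python
def trindex(row, col):
    """Index of (row, col) in a lower-triangular matrix stored row by row."""
    if col > row:  # symmetric: swap into the lower triangle
        row, col = col, row
    return row * (row + 1) // 2 + col


key_of_index = ["Y", "R", "W"]
index_of_key = {key: i for i, key in enumerate(key_of_index)}
n = len(key_of_index)
tri = [None] * (n * (n + 1) // 2)

for k1, k2, value in [("Y", "Y", "Y"), ("Y", "R", "Y"), ("Y", "W", "Y"),
                      ("R", "R", "R"), ("R", "W", "P"), ("W", "W", "W")]:
    tri[trindex(index_of_key[k1], index_of_key[k2])] = value

print(tri)  # ['Y', 'Y', 'R', 'Y', 'P', 'W']
print(tri[trindex(index_of_key["W"], index_of_key["R"])])  # P

# Growing to 4 keys: earlier positions are unchanged, the new row takes 6..9
old_positions = [trindex(r, c) for r in range(n) for c in range(r + 1)]
new_row = [trindex(n, c) for c in range(n + 1)]
print(old_positions, new_row)  # [0, 1, 2, 3, 4, 5] [6, 7, 8, 9]
```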
As an exercise I loosely translated the above to PHP, which is your target language. Even though I started by not recommending the “triangular matrix in a single array” approach, you are welcome to use the following class as a black box, if it matches your needs (and if it doesn't, maybe I can help).
class SymmetricMatrix
{
private $n = 0;
private $triangular = [];
private $key_of_index = [];
private $index_of_key = [];
private function add_key_if_necessary($key) {
if ( !isset($this->index_of_key[$key])) {
$index = $this->n++;
$this->index_of_key[$key] = $index;
$this->key_of_index[$index] = $key;
for ($i = 0; $i < $this->n; $i++) {
$this->triangular[] = false; // avoid "jumping" index & init to "absent"
}
}
}
private static function trindex($row, $col) {
if ($col > $row) {
$tmp = $row; $row = $col; $col = $tmp;
}
return $row * ($row + 1) / 2 + $col;
}
public function put($key1, $key2, $value) {
$this->add_key_if_necessary($key1);
$this->add_key_if_necessary($key2);
$trindex = self::trindex($this->index_of_key[$key1], $this->index_of_key[$key2]);
$this->triangular[$trindex] = $value;
}
public function get($key1, $key2) {
if (!isset($this->index_of_key[$key1]) || !isset($this->index_of_key[$key2])) {
return false;
}
$trindex = self::trindex($this->index_of_key[$key1], $this->index_of_key[$key2]);
return $this->triangular[$trindex];
}
public function find_first($value) { // $value !== false
for ($row = 0; $row < $this->n; $row++) {
for ($col = 0; $col <= $row; $col++) {
$trindex = self::trindex($row, $col);
if ($this->triangular[$trindex] === $value) {
return [$this->key_of_index[$row], $this->key_of_index[$col]];
}
}
}
return false;
}
public function get_keys() {
return $this->key_of_index;
}
public function dump() {
var_export($this);
echo "\n";
}
}
$m = new SymmetricMatrix();
$m->put('Y', 'Y', 'Y');
$m->put('Y', 'R', 'Y');
$m->put('Y', 'W', 'Y');
$m->put('R', 'R', 'R');
$m->put('R', 'W', 'P');
$m->put('W', 'W', 'W');
$m->dump();
echo "keys: ", implode(', ', $m->get_keys()), "\n";
echo "m[R][W]: ", $m->get('R', 'W'), "\n";
echo "m[W][R]: ", $m->get('W', 'R'), "\n";
Symmetric matrix checking: how does tolerance work?
@BenBolker told you to look at the help page for isSymmetric, which directs you to all.equal. You built this matrix:
> mat1
(Intercept) outcome2 outcome3 treatment2 treatment3
(Intercept) 150 4.000000e+01 4.700000e+01 5.000000e+01 5.000000e+01
outcome2 40 4.000000e+01 2.237910e-14 1.333333e+01 1.333333e+01
outcome3 47 2.476834e-14 4.700000e+01 1.566667e+01 1.566667e+01
treatment2 50 1.333333e+01 1.566667e+01 5.000000e+01 2.710506e-14
treatment3 50 1.333333e+01 1.566667e+01 2.818926e-14 5.000000e+01
The default test in all.equal uses tolerance = .Machine$double.eps ^ 0.5, so none of your tests were actually the same as the one without an argument. (.Machine$double.eps is about 2.2e-16, so its square root, about 1.5e-8, is a much bigger tolerance.) Notice there is an additional test regarding equality of row and column names, which your example would have satisfied.
If you look at the help page, you should become suspicious about your understanding of what all.equal is doing when you see that it tests ‘near equality’ and then refers you to the Details section, which says:
Numerical comparisons for scale = NULL (the default) are done by first computing the mean absolute difference of the two numerical vectors.
The code confirms that it is not the individual differences that are tested against the tolerance, but the mean absolute difference.
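To see why testing the mean matters, here is a rough Python model of the comparison. It assumes, based on my reading of all.equal rather than on anything quoted above, that the mean absolute difference is divided by the mean absolute target value whenever that value exceeds the tolerance. It ignores NA handling, attributes and the other options of the real all.equal, and mean_difference is a hypothetical name. In the example every element differs, and one of them is off by far more than the tolerance relative to its own size, yet the mean relative difference is tiny.

```python
import sys

# all.equal's default tolerance: sqrt(.Machine$double.eps), about 1.49e-8
TOLERANCE = sys.float_info.epsilon ** 0.5


def mean_difference(target, current, tolerance=TOLERANCE):
    """Rough model (assumption): mean absolute difference, made relative
    when the mean absolute target value exceeds the tolerance."""
    n = len(target)
    mean_abs_diff = sum(abs(t - c) for t, c in zip(target, current)) / n
    mean_abs_target = sum(abs(t) for t in target) / n
    if mean_abs_target > tolerance:
        return mean_abs_diff / mean_abs_target  # relative difference
    return mean_abs_diff  # absolute difference


target = [1.0, 1e6]
current = [1.0 + 1e-6, 1e6 + 1e-6]

print(abs(target[0] - current[0]) / abs(target[0]))  # about 1e-6, far above TOLERANCE
print(mean_difference(target, current) <= TOLERANCE)  # True
```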
How to obtain symmetrical matrix from dictionary in Python
One option is to reconstruct the dictionary in full matrix format and then pivot it with pandas:
import pandas as pd
mydict={('A', 'E'): 23972,
('A', 'D'): 10730,
('A', 'B'): 14748,
('A', 'C'): 3424,
('E', 'D'): 3294,
('E', 'B'): 16016,
('E', 'C'): 3373,
('D', 'B'): 69734,
('D', 'C'): 4662,
('B', 'C'): 159161}
# construct the full dictionary
newdict = {}
for (k1, k2), v in mydict.items():
newdict[k1, k2] = v
newdict[k2, k1] = v
newdict[k1, k1] = 0
newdict[k2, k2] = 0
# pivot the result from long to wide
pd.Series(newdict).reset_index().pivot(index='level_0', columns='level_1', values=0)
#level_1 A B C D E
#level_0
#A 0 14748 3424 10730 23972
#B 14748 0 159161 69734 16016
#C 3424 159161 0 4662 3373
#D 10730 69734 4662 0 3294
#E 23972 16016 3373 3294 0
Or, as @Ch3steR commented, you can simply use pd.Series(newdict).unstack() for the pivot.
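A self-contained run of the unstack variant, on a shortened copy of mydict (three of its pairs). pandas turns the tuple keys into a two-level index, and unstack() moves the second level into the columns, giving the same wide, symmetric table as the pivot.

```python
import pandas as pd

mydict = {('A', 'B'): 14748, ('A', 'C'): 3424, ('B', 'C'): 159161}

newdict = {}
for (k1, k2), v in mydict.items():
    newdict[k1, k2] = v   # original direction
    newdict[k2, k1] = v   # mirrored entry
    newdict[k1, k1] = 0   # zero diagonal
    newdict[k2, k2] = 0

wide = pd.Series(newdict).unstack()
print(wide)
```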