Computing the correlation coefficient between two multi-dimensional arrays
Correlation (default 'valid' case) between two 2D arrays:
You can simply use matrix multiplication with np.dot, like so -
out = np.dot(arr_one, arr_two.T)
The correlation in the default "valid" mode for each pairwise combination of rows (row1, row2) from the two input arrays is the value at position (row1, row2) of the multiplication result.
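As a quick sanity check of that claim, the sketch below compares the np.dot result against np.correlate called row by row in "valid" mode. The small input arrays are made up for illustration and are not from the original answer.

```python
import numpy as np

# Illustrative inputs (assumed values): both arrays must share the same number of columns.
arr_one = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])
arr_two = np.array([[1.0, 0.0, -1.0],
                    [2.0, 2.0, 2.0],
                    [0.5, 1.5, 2.5]])

# One matrix multiplication gives every (row1, row2) pairing at once.
out = np.dot(arr_one, arr_two.T)

# For equal-length 1D inputs, np.correlate in 'valid' mode returns a single value.
expected = np.array([[np.correlate(r1, r2, mode="valid")[0] for r2 in arr_two]
                     for r1 in arr_one])

print(np.allclose(out, expected))
```

The result has one row per row of arr_one and one column per row of arr_two.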
Row-wise Correlation Coefficient calculation for two 2D arrays:
import numpy as np

def corr2_coeff(A, B):
    # Row-wise mean of the input arrays, subtracted from the arrays themselves
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    # Sum of squares across rows
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    # Finally get the correlation coefficients, one per (row of A, row of B) pair
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))
This is based on this solution to "How to apply corr2 functions in Multidimentional arrays in MATLAB".
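One way to check the function is to compare it with np.corrcoef, which stacks the rows of both inputs and returns the full correlation matrix. The following is a minimal sketch; the random inputs, the seed, and the shapes are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def corr2_coeff(A, B):
    # Same function as above, repeated so this example is self-contained.
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

rng = np.random.default_rng(0)   # arbitrary seed
A = rng.random((4, 6))           # 4 rows, 6 columns
B = rng.random((3, 6))           # 3 rows, same number of columns

result = corr2_coeff(A, B)

# np.corrcoef(A, B) is a (4+3) x (4+3) matrix; the top-right block holds A-vs-B.
reference = np.corrcoef(A, B)[:A.shape[0], A.shape[0]:]

print(result.shape, np.allclose(result, reference))
```

np.corrcoef also computes the A-vs-A and B-vs-B blocks, which corr2_coeff skips.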
Benchmarking
This section compares the runtime of the proposed approach against generate_correlation_map and the loopy pearsonr-based approach listed in the other answer (taken from the function test_generate_correlation_map(), without the value-correctness verification code at the end of it). Please note that the timings for the proposed approach also include a check at the start for an equal number of columns in the two input arrays, as is also done in that other answer. The runtimes are listed next.
Case #1:
In [106]: A = np.random.rand(1000, 100)
In [107]: B = np.random.rand(1000, 100)
In [108]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15 ms per loop
In [109]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.6 ms per loop
Case #2:
In [110]: A = np.random.rand(5000, 100)
In [111]: B = np.random.rand(5000, 100)
In [112]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 368 ms per loop
In [113]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 493 ms per loop
Case #3:
In [114]: A = np.random.rand(10000, 10)
In [115]: B = np.random.rand(10000, 10)
In [116]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 1.29 s per loop
In [117]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 1.83 s per loop
The other loopy pearsonr-based approach seemed too slow, but here are the runtimes for one small data size -
In [118]: A = np.random.rand(1000, 100)
In [119]: B = np.random.rand(1000, 100)
In [120]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15.3 ms per loop
In [121]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.7 ms per loop
In [122]: %timeit pearsonr_based(A, B)
1 loops, best of 3: 33 s per loop
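The benchmark description mentions a check for an equal number of columns, but the answer does not show it. A minimal sketch of such a check might look like the following; the wrapper name corr2_coeff_checked and the error messages are hypothetical, not taken from the original answer.

```python
import numpy as np

def corr2_coeff_checked(A, B):
    # Hypothetical wrapper: validate shapes first, then run the same computation.
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.ndim != 2 or B.ndim != 2:
        raise ValueError("both inputs must be 2D arrays")
    if A.shape[1] != B.shape[1]:
        raise ValueError("inputs must have the same number of columns")

    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

rng = np.random.default_rng(0)  # arbitrary seed
ok = corr2_coeff_checked(rng.random((5, 8)), rng.random((2, 8)))
print(ok.shape)
```

The check itself is two attribute lookups and a comparison, so it should not materially affect the timings above.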
Correlation coefficient between a 2D and a 3D array - NumPy/Python
We could use corr2_coeff from this post after reshaping the inputs to 2D versions, such that the first input is reshaped to a single-row array and the second one has as many columns as the product of the lengths of its last two axes, like so -
corr2_coeff(A.reshape(1,-1),B.reshape(B.shape[0],-1)).ravel()
Sample run -
In [143]: from scipy.stats import pearsonr
...:
...: A = np.random.random([5,5])
...: B = np.random.random([3,5,5])
...: C = []
...: for i in B:
...:     C.append(pearsonr(A.flatten(), i.flatten())[0])
...:
...: C = np.array(C)
...:
In [144]: C
Out[144]: array([ 0.05637413, -0.26749579, -0.08957621])
In [145]: corr2_coeff(A.reshape(1,-1),B.reshape(B.shape[0],-1)).ravel()
Out[145]: array([ 0.05637413, -0.26749579, -0.08957621])
For really huge arrays, we might need to resort to a single loop, like so -
[corr2_coeff(A.reshape(1,-1), i.reshape(1,-1)) for i in B]
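Each call in that loop returns a 1x1 array, so the list comprehension yields a list of small 2D arrays rather than a flat vector. The sketch below pulls out the scalar with [0, 0] and checks the looped result against the fully vectorized call; the random inputs and seed are assumptions for illustration.

```python
import numpy as np

def corr2_coeff(A, B):
    # Same function as earlier in this post, repeated for self-containment.
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

rng = np.random.default_rng(0)  # arbitrary seed
A = rng.random((5, 5))
B = rng.random((3, 5, 5))

a_row = A.reshape(1, -1)  # flatten A once, outside the loop

# One slice of B at a time, so only one flattened slice is centered per step.
looped = np.array([corr2_coeff(a_row, b.reshape(1, -1))[0, 0] for b in B])

# Fully vectorized version, for comparison.
vectorized = corr2_coeff(a_row, B.reshape(B.shape[0], -1)).ravel()

print(np.allclose(looped, vectorized))
```

The loop trades speed for a smaller peak memory footprint, since the mean-subtracted copy of B is never materialized in full.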
Computing row-wise correlation coefficients between two 2d arrays in Python
I think I'd just use a list-comprehension and a module for calculating the coefficient:
from scipy.stats import pearsonr
import numpy as np
M = 10
T = 4
A = np.random.rand(M*T).reshape((M, T))
B = np.random.rand(M*T).reshape((M, T))
diag_pear_coef = [pearsonr(A[i, :], B[i, :])[0] for i in range(M)]
Does that work for you? Note that pearsonr returns more than just the correlation coefficient, hence the [0] indexing.
Good luck!
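If the list comprehension becomes a bottleneck for large M, the same row-by-row coefficients can probably be computed without a Python loop. This is an alternative sketch, not the answer's method; the function name rowwise_corr is hypothetical, and it assumes both arrays have the same shape and no constant rows.

```python
import numpy as np

def rowwise_corr(A, B):
    # Hypothetical helper: Pearson r between A[i] and B[i] for every row i.
    A_c = A - A.mean(axis=1, keepdims=True)
    B_c = B - B.mean(axis=1, keepdims=True)
    # einsum "ij,ij->i" sums elementwise products along each row.
    num = np.einsum("ij,ij->i", A_c, B_c)
    den = np.sqrt(np.einsum("ij,ij->i", A_c, A_c) *
                  np.einsum("ij,ij->i", B_c, B_c))
    return num / den

M, T = 10, 4
rng = np.random.default_rng(0)  # arbitrary seed
A = rng.random((M, T))
B = rng.random((M, T))

coeffs = rowwise_corr(A, B)

# Reference values, computed one row at a time with np.corrcoef.
reference = np.array([np.corrcoef(A[i], B[i])[0, 1] for i in range(M)])
print(np.allclose(coeffs, reference))
```

Unlike pearsonr, this returns only the coefficients and no p-values.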
compute array of correlations between two multidimensional arrays in R
Using abind we may combine these two arrays into a four-dimensional one and then employ apply across the first two dimensions:
library(abind)
apply(abind(X, Y, along = 4), 1:2, function(Z) cor(Z[, 1], Z[, 2]))
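For readers working in NumPy rather than R, a rough analogue of the apply call is sketched below. It assumes X and Y are 3D arrays of identical shape and that the correlation is taken along the third dimension, which is what the R call implies; the function name corr_last_axis is hypothetical.

```python
import numpy as np

def corr_last_axis(X, Y):
    # Hypothetical helper: Pearson r between X[i, j, :] and Y[i, j, :] for all i, j.
    X_c = X - X.mean(axis=-1, keepdims=True)
    Y_c = Y - Y.mean(axis=-1, keepdims=True)
    num = (X_c * Y_c).sum(axis=-1)
    den = np.sqrt((X_c**2).sum(axis=-1) * (Y_c**2).sum(axis=-1))
    return num / den

rng = np.random.default_rng(0)  # arbitrary seed
X = rng.random((2, 3, 7))
Y = rng.random((2, 3, 7))

corr_map = corr_last_axis(X, Y)

# Element-by-element reference using np.corrcoef.
reference = np.array([[np.corrcoef(X[i, j], Y[i, j])[0, 1]
                       for j in range(X.shape[1])]
                      for i in range(X.shape[0])])
print(np.allclose(corr_map, reference))
```

The output has the shape of the first two dimensions, matching the matrix that apply returns over 1:2.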
correlation coefficient between columns of 2 dataframes
I think you need something like this,
import itertools

a = df1.columns.values
b = df2.columns.values
print([df1[u].corr(df2[v]) for u, v in itertools.product(a, b)])
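The flat list loses track of which pair each value belongs to. One possible variation, sketched here with made-up frames and column names, collects the same Series.corr results into a labeled table with df1's columns as rows and df2's columns as columns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
# Illustrative frames (assumed data); both must share the same index.
df1 = pd.DataFrame(rng.random((20, 2)), columns=["a", "b"])
df2 = pd.DataFrame(rng.random((20, 3)), columns=["x", "y", "z"])

# Rows are df1's columns, columns are df2's columns.
corr_table = pd.DataFrame(
    [[df1[u].corr(df2[v]) for v in df2.columns] for u in df1.columns],
    index=df1.columns,
    columns=df2.columns,
)
print(corr_table)
```

Series.corr uses Pearson correlation by default and aligns the two Series on their index before computing.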