Display Correlation Tables as Descending List

Here's one of many ways I could think of to do this. I used the reshape package because the melt() syntax is easy for me to remember, but the melt() step could easily be done with base R commands:

library(reshape)  # provides melt()
## set up dummy data
set.seed(1)
a <- rnorm(100)
b <- a + rnorm(100, 0, 2)
c <- a + b + rnorm(100) / 10
df <- data.frame(a, b, c)
cm <- cor(df)
## cm is the correlation matrix (named cm so it doesn't overwrite the vector c)

## keep only the lower triangle by
## filling the upper triangle and the diagonal with NA
cm[upper.tri(cm, diag = TRUE)] <- NA

m <- melt(cm)

## sort by descending absolute correlation
m <- m[order(-abs(m$value)), ]

## omit the NA values
dfOut <- na.omit(m)

## if you really want a list and not a data.frame
listOut <- split(dfOut, seq_len(nrow(dfOut)))
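If you need the same "lower triangle, sorted by absolute correlation" list in Python, here is a rough pandas/NumPy analogue. It is a sketch, not a port of reshape: instead of blanking the upper triangle and melting, it pulls the strictly lower-triangle cells directly with np.tril_indices. The dummy data mimics the R example but uses NumPy's generator, so the numbers will differ; the column names X1/X2 just echo what melt() produces.

```python
import numpy as np
import pandas as pd

# Dummy data shaped like the R example (different RNG, so different values)
rng = np.random.default_rng(1)
a = rng.normal(size=100)
b = a + rng.normal(0, 2, size=100)
c = a + b + rng.normal(size=100) / 10
df = pd.DataFrame({"a": a, "b": b, "c": c})

cm = df.corr()

# Strictly lower triangle (k=-1 skips the diagonal): the same cells
# the R code keeps after blanking upper.tri(..., diag = TRUE)
rows, cols = np.tril_indices(len(cm), k=-1)
pairs = pd.DataFrame({
    "X1": cm.index[rows],
    "X2": cm.columns[cols],
    "value": cm.to_numpy()[rows, cols],
})

# Sort by descending absolute correlation
pairs = pairs.sort_values("value", key=lambda s: s.abs(),
                          ascending=False, ignore_index=True)
print(pairs)

# Rough analogue of split(): one dict per pair
records = pairs.to_dict("records")
```

Pulling the triangle by index avoids creating and then dropping NA cells, which matters little for three variables but saves memory on wide matrices.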

Show correlations as an ordered list, not as a large matrix

I always use the following, where z is a correlation matrix with row and column names:

zdf <- as.data.frame(as.table(z))
zdf
#    Var1 Var2     Freq
# 1     a    a  1.00000
# 2     b    a -0.99669
# 3     c    a -0.14063
# 4     d    a -0.28061
# 5     e    a  0.80519

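Then use subset(zdf, abs(Freq) > 0.5) to keep only the pairs whose absolute correlation exceeds 0.5. Note that this is a magnitude cutoff, not a test of statistical significance.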
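For comparison, a pandas sketch of the same idea: reshape the matrix to one row per cell (like as.data.frame(as.table(z))), apply the magnitude cutoff, and then order by descending absolute value so the result really is an ordered list. The data below is made up: five columns a to e, with b deliberately built to be almost exactly -a. The Var1/Var2/Freq names are only copied from the R output.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five columns a..e, with b constructed to be nearly -a
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 5)), columns=list("abcde"))
df["b"] = -df["a"] + rng.normal(scale=0.1, size=30)

z = df.corr()

# Long format, one row per matrix cell, like as.data.frame(as.table(z))
zdf = (
    z.rename_axis("Var1")
    .reset_index()
    .melt(id_vars="Var1", var_name="Var2", value_name="Freq")
)

# Magnitude cutoff (not a significance test), then strongest first
strong = zdf[zdf["Freq"].abs() > 0.5].sort_values(
    "Freq", key=lambda s: s.abs(), ascending=False
)
print(strong)
```

As in the R version, the diagonal (each variable with itself, Freq = 1) passes the cutoff too. Filter it out with zdf["Var1"] != zdf["Var2"] if you don't want it.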

Create table showing the sorted absolute correlation of various variables with another series

The answer by @Jon Spring is perfect. Here is the same approach in base R:

res1 <- c(0, 5, 2, 7, 1)
data2 <- data.frame(x1 = 1:5,                # uncorrelated
                    x2 = 14:10,              # uncorrelated and wrong direction
                    x3 = c(0, 5, 1, 6, 0),   # very similar
                    x4 = c(0, 0, 2, 7, 1))   # somewhat similar

## cor() of a data.frame with a vector returns a one-column matrix;
## [, 1] turns it into a vector named by the columns of data2
correlation <- cor(data2, res1, method = "pearson")[, 1]
out <- data.frame(X_var = names(correlation),
                  abs_cor = abs(correlation),
                  cor = correlation,
                  row.names = NULL)
out[order(out$abs_cor, decreasing = TRUE), ]
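The pandas counterpart is DataFrame.corrwith(), which correlates every column with a single Series (aligned on the index). This is a sketch reusing the same toy numbers as the R code; the out/X_var/abs_cor names simply mirror the R data.frame.

```python
import pandas as pd

# Same toy data as the R example
res1 = pd.Series([0, 5, 2, 7, 1])
data2 = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],        # uncorrelated
    "x2": [14, 13, 12, 11, 10],   # uncorrelated and wrong direction
    "x3": [0, 5, 1, 6, 0],        # very similar
    "x4": [0, 0, 2, 7, 1],        # somewhat similar
})

# Pearson r of each column of data2 with res1
cors = data2.corrwith(res1)

out = pd.DataFrame({
    "X_var": cors.index,
    "abs_cor": cors.abs().to_numpy(),
    "cor": cors.to_numpy(),
})
out = out.sort_values("abs_cor", ascending=False, ignore_index=True)
print(out)
```

By hand, x4 gives r = 24/34 (about 0.706) and x1 gives 4/sqrt(340) (about 0.217), so x3 and x4 come out on top. x1 and x2 tie on absolute value, so their relative order is not meaningful.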

List Highest Correlation Pairs from a Large Correlation Matrix in Pandas?

You can use DataFrame.values (or DataFrame.to_numpy()) to get a NumPy array of the data and then use NumPy functions such as argsort() to find the most correlated pairs.

But if you want to do this in pandas, you can unstack the correlation matrix and sort the resulting Series:

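import pandas as pd
import numpy as np

shape = (50, 4460)

data = np.random.normal(size=shape)

# plant one genuinely correlated pair: column 1000 now depends on column 2000
data[:, 1000] += data[:, 2000]

df = pd.DataFrame(data)

c = df.corr().abs()

# unstack into a Series indexed by (column, column) pairs, then sort ascending
s = c.unstack()
so = s.sort_values(kind="quicksort")

# the last 4460 entries are the diagonal self-correlations (1.0), so the
# 10 entries just before them are the 5 strongest pairs, each listed twice
print(so.iloc[-4470:-4460])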

Here is the output:

2192  1522    0.636198
1522  2192    0.636198
3677  2027    0.641817
2027  3677    0.641817
242   130     0.646760
130   242     0.646760
1171  2733    0.670048
2733  1171    0.670048
1000  2000    0.742340
2000  1000    0.742340
dtype: float64
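The NumPy route mentioned at the start of this answer could look roughly like this. It is a sketch, not the answer's own code: top_pairs is a hypothetical helper, and it looks only at the strictly upper triangle, so each pair appears once and the diagonal never has to be sliced away. The data is made up, with column 100 built as a near copy of column 150.

```python
import numpy as np
import pandas as pd


def top_pairs(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Hypothetical helper: the n column pairs with the largest |r|, each listed once."""
    corr = np.abs(np.corrcoef(df.to_numpy(), rowvar=False))  # columns are variables
    rows, cols = np.triu_indices_from(corr, k=1)  # upper triangle, diagonal excluded
    vals = corr[rows, cols]
    order = np.argsort(vals)[::-1][:n]  # positions of the n largest values
    return pd.DataFrame({
        "col1": df.columns[rows[order]],
        "col2": df.columns[cols[order]],
        "abs_corr": vals[order],
    })


# Made-up data: column 100 is a noisy copy of column 150
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 200))
data[:, 100] = data[:, 150] + 0.1 * rng.normal(size=50)
df = pd.DataFrame(data)
print(top_pairs(df))
```

Note that the full correlation matrix is still computed; for thousands of columns, it is the memory used by that matrix, not the sort, that usually limits you.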

Is there a cleaner way to subset correlation matrices?

A better option is to store the cor() output in a temporary object:

tmp <- cor(numericData)

then use that object to get the row/column indices of the matching cells and look up their names:

rc <- which(tmp < 1 & tmp > 0.8, arr.ind = TRUE)
out <- data.frame(rn = row.names(tmp)[rc[,1]], cn = colnames(tmp)[rc[,2]])

and finally remove tmp:

rm(tmp)
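In NumPy, np.argwhere() plays roughly the role of which(..., arr.ind = TRUE), returning (row, column) positions. The sketch below uses made-up data (numeric_data with columns u, v, w, where v is built from u). One deliberate difference from the R code: it excludes the diagonal with an explicit mask instead of tmp < 1, so an off-diagonal pair with r exactly 1 is not silently dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: v is strongly related to u, w is unrelated
rng = np.random.default_rng(0)
base = rng.normal(size=200)
numeric_data = pd.DataFrame({
    "u": base,
    "v": base + rng.normal(scale=0.3, size=200),
    "w": rng.normal(size=200),
})

tmp = numeric_data.corr().to_numpy()
off_diag = ~np.eye(len(tmp), dtype=bool)  # True everywhere except the diagonal

# (row, column) positions of cells with r > 0.8, like which(..., arr.ind = TRUE)
rc = np.argwhere((tmp > 0.8) & off_diag)
out = pd.DataFrame({
    "rn": numeric_data.columns[rc[:, 0]],
    "cn": numeric_data.columns[rc[:, 1]],
})
print(out)
```
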

Another option, which avoids creating any temporary object, is to treat the correlation matrix as a table, convert it to a data.frame with as.data.frame.table(), and subset that data.frame on the values in the 'Freq' column:

subset(as.data.frame.table(cor(numericData)), Freq < 1 & Freq > 0.8)

A reproducible example with mtcars:

subset(as.data.frame.table(cor(mtcars)), Freq < 1 & Freq > 0.8)
#   Var1 Var2      Freq
#14 disp  cyl 0.9020329
#15   hp  cyl 0.8324475
#24  cyl disp 0.9020329
#28   wt disp 0.8879799
#35  cyl   hp 0.8324475
#58 disp   wt 0.8879799

Or with data.table's between(), using incbounds = FALSE to exclude both bounds:

library(dplyr)
as.data.frame.table(cor(mtcars)) %>% 
     filter(data.table::between(Freq, 0.8, 1, incbounds = FALSE))
# Var1 Var2      Freq
#1 disp  cyl 0.9020329
#2   hp  cyl 0.8324475
#3  cyl disp 0.9020329
#4   wt disp 0.8879799
#5  cyl   hp 0.8324475
#6 disp   wt 0.8879799
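A pandas counterpart of the between() filter is Series.between() with inclusive="neither", which excludes both bounds much as incbounds = FALSE does. Since mtcars isn't bundled with pandas, this sketch uses made-up columns p, q and s, where q is built from p.

```python
import numpy as np
import pandas as pd

# Hypothetical data: q is strongly related to p, s is unrelated
rng = np.random.default_rng(1)
base = rng.normal(size=200)
df = pd.DataFrame({
    "p": base,
    "q": base + rng.normal(scale=0.3, size=200),
    "s": rng.normal(size=200),
})

# One row per cell, like as.data.frame.table(cor(...))
long = (
    df.corr()
    .rename_axis("Var1")
    .reset_index()
    .melt(id_vars="Var1", var_name="Var2", value_name="Freq")
)

# Exclusive bounds: the exact-1 diagonal is dropped, as with incbounds = FALSE
strong = long[long["Freq"].between(0.8, 1, inclusive="neither")]
print(strong)
```
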

Is it possible to filter a corrplot/cormatrix in R?

When given a single argument (a matrix or a data.frame), cor(x) computes correlations between all pairs of columns. However, the same function also accepts two arguments, cor(x, y), in which case it computes only the correlations between each column of x and each column of y.

So in your case you can provide all your group variables as x, and the response variable as y, and then plot the result (assuming "response" is in the last column):

cors <- cor(dat[,-ncol(dat)], dat[,ncol(dat)])
corrplot::corrplot(cors)
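To make the two-argument form concrete, here is a pandas sketch of what cor(x, y) computes: only the x-by-y block of correlations, without building the full matrix. cross_corr is a hypothetical helper, the data is made up (group columns g1 to g3 plus a last column called response), and it assumes there are no missing values, since it standardises each column and takes a scaled matrix product.

```python
import numpy as np
import pandas as pd


def cross_corr(x: pd.DataFrame, y: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: Pearson r between each column of x and each column of y.

    Assumes x and y share the same row index and contain no missing values.
    """
    xs = (x - x.mean()) / x.std(ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    # For standardised columns, r_ij = sum(xs_i * ys_j) / (n - 1)
    return (xs.T @ ys) / (len(x) - 1)


# Made-up data with the response in the last column
rng = np.random.default_rng(3)
dat = pd.DataFrame(rng.normal(size=(40, 4)),
                   columns=["g1", "g2", "g3", "response"])

# Like cor(dat[,-ncol(dat)], dat[,ncol(dat)]): a 3 x 1 result
cors = cross_corr(dat.iloc[:, :-1], dat.iloc[:, [-1]])
print(cors)
```
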

Sorting correlation matrix

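import pandas as pd

# cor is the correlation DataFrame (e.g. cor = df.corr()) with columns Ply_1 ... Ply_6
pd.concat([cor[col_name].sort_values(ascending=False)
                        .rename_axis(col_name.replace('Ply', 'index'))
                        .reset_index()
           for col_name in cor],
          axis=1)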

Explanation:

pd.concat([df_1, ..., df_6], axis=1) concatenates 6 DataFrames side by side (each one is already sorted and has 2 columns: 'index_i' and 'Ply_i').
[cor[col_name] for col_name in cor] creates a list of 6 Series, one for each column of cor.
ser.sort_values(ascending=False) sorts the values of a Series ser in descending order (the index labels move together with their values).
col_name.replace('Ply', 'index') creates a new string from col_name by replacing 'Ply' with 'index'.
ser.rename_axis(name).reset_index() renames the index axis and then moves the index (with its new name) into a column, converting the Series into a DataFrame. The new index of this DataFrame is the default range index (from 0 to 5).

Result:

(with my randomly generated numbers)

	index_1	Ply_1	index_2	Ply_2	index_3	Ply_3	index_4	Ply_4	index_5	Ply_5	index_6	Ply_6
0	Ply_1	1	Ply_2	1	Ply_3	1	Ply_4	1	Ply_5	1	Ply_6	1
1	Ply_2	0.387854	Ply_1	0.387854	Ply_1	0.258825	Ply_1	0.337613	Ply_4	0.0618012	Ply_1	0.058282
2	Ply_4	0.337613	Ply_4	0.293496	Ply_4	0.0552454	Ply_2	0.293496	Ply_2	0.060881	Ply_3	-0.207621
3	Ply_3	0.258825	Ply_5	0.060881	Ply_2	-0.0900126	Ply_5	0.0618012	Ply_3	-0.110885	Ply_2	-0.22012
4	Ply_6	0.058282	Ply_3	-0.0900126	Ply_5	-0.110885	Ply_3	0.0552454	Ply_1	-0.390893	Ply_4	-0.291842
5	Ply_5	-0.390893	Ply_6	-0.22012	Ply_6	-0.207621	Ply_6	-0.291842	Ply_6	-0.394074	Ply_5	-0.394074
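A possible variant, if the first row of 1s is just noise for you: drop each column's self-correlation before sorting, so every column is ranked only against the others. This is my own tweak on the code above, not part of the original answer. The data is random and made up, using the same Ply_1 ... Ply_6 names.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the Ply_1 ... Ply_6 column names used above
rng = np.random.default_rng(42)
cols = [f"Ply_{i}" for i in range(1, 7)]
cor = pd.DataFrame(rng.normal(size=(20, 6)), columns=cols).corr()

ranked = pd.concat(
    [
        cor[col]
        .drop(col)  # remove the trivial self-correlation of 1
        .sort_values(ascending=False)
        .rename_axis(col.replace("Ply", "index"))
        .reset_index()
        for col in cor
    ],
    axis=1,
)
print(ranked)
```

Each block now has 5 rows (index 0 to 4) instead of 6.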

Output for large correlation matrices in R

Is there anything wrong with simply writing the matrix to a file?

z <- matrix(rnorm(10000),100)
write.csv(cor(z),file="cortmp.csv")

View(cor(z)) also works for me, although I don't know whether its contents can be copied and pasted.

For psych::corr.test, write each component of the result to its own file:

dimnames(z) <- list(1:100,1:100)
z[1,2] <- NA  ## unbalance to induce sample size matrix
ct <- psych::corr.test(z)
write.csv(ct$n,file="ntmp.csv")  ## sample sizes
write.csv(ct$t,file="ttmp.csv")  ## t statistics
write.csv(ct$p,file="ptmp.csv")  ## p-values

et cetera. (See str(ct).)
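If the same tables are needed from Python, here is a rough pandas sketch of the sample-size and t-statistic matrices. It is an approximation of what corr.test reports, not a reimplementation of psych. DataFrame.corr() already uses pairwise-complete observations. The pairwise n comes from counting rows where both columns are present. The t statistic is the usual r * sqrt((n - 2) / (1 - r^2)) for testing r = 0, and I haven't checked that psych computes it identically. Converting t to p-values would need a t-distribution CDF (for example from SciPy), which this sketch leaves out. The temporary output directory is just a placeholder.

```python
import os
import tempfile

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
z = pd.DataFrame(rng.normal(size=(100, 100)))
z.iloc[0, 1] = np.nan  # unbalance, so pairwise sample sizes differ

r = z.corr()  # Pearson r using pairwise-complete observations

# n[i, j] = number of rows where both column i and column j are observed
present = z.notna().astype(int)
n = present.T @ present

# t statistic for r = 0; the diagonal is set to NaN (r = 1 would divide by zero)
r_off = r.where(~np.eye(len(r), dtype=bool))
t = r_off * np.sqrt((n - 2) / (1 - r_off**2))

out_dir = tempfile.mkdtemp()  # placeholder location for the CSV files
r.to_csv(os.path.join(out_dir, "cortmp.csv"))  # correlations
n.to_csv(os.path.join(out_dir, "ntmp.csv"))    # sample sizes
t.to_csv(os.path.join(out_dir, "ttmp.csv"))    # t statistics
```
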

R's paradigm is that if you want to transfer information to another program, you write it to a file rather than copying and pasting it from the console.
