## Use .corr to get the correlation between two columns

Without actual data it is hard to answer the question but I guess you are looking for something like this:

`Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])`

That calculates the correlation between your two columns `'Citable docs per Capita'` and `'Energy Supply per Capita'`.

To give an example:

    import pandas as pd

    df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})

       A  B
    0  0  0
    1  1  2
    2  2  4
    3  3  6

Then `df['A'].corr(df['B'])` gives `1` as expected.

Now, if you change a value, e.g.

    df.loc[2, 'B'] = 4.5

       A    B
    0  0  0.0
    1  1  2.0
    2  2  4.5
    3  3  6.0

the command `df['A'].corr(df['B'])` returns `0.99586`, which is still close to 1, as expected.

If you apply `.corr` directly to your dataframe, it will return all pairwise correlations between your columns; that's why you then observe `1`s on the diagonal of your matrix (each column is perfectly correlated with itself). `df.corr()` will therefore return

              A         B
    A  1.000000  0.995862
    B  0.995862  1.000000

In the graphic you show, only the upper left corner of the correlation matrix is represented (I assume).

There can be cases where you get `NaN`s in your result; check this post for an example.
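One common way to end up with `NaN` is a column with zero variance. That is an assumption made here for illustration, not necessarily the case covered in the linked post. A minimal sketch with made-up data (`nan_demo` and its columns are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'C' is constant, so its standard deviation is 0.
nan_demo = pd.DataFrame({
    'A': [0, 1, 2, 3],
    'B': [0.0, 2.0, 4.5, 6.0],
    'C': [5, 5, 5, 5],
})

corr_matrix = nan_demo.corr()

# Pearson correlation divides by the product of the standard deviations,
# so the entries pairing 'A' or 'B' with the constant column come out as NaN.
print(corr_matrix)

# Constant columns can be spotted up front via their number of distinct values.
constant_columns = nan_demo.columns[nan_demo.nunique() <= 1].tolist()
print(constant_columns)
```

Dropping or fixing such columns before calling `.corr()` is one way to keep the matrix free of `NaN`s.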

If you want to filter entries above/below a certain threshold, you can check this question.
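Since the linked question is not reproduced here, the following is only a sketch of one possible way to filter a correlation matrix by a threshold. The data, the name `demo`, and the cutoff of 0.8 are all made up for illustration; the upper-triangle mask is there so that each pair shows up once and the diagonal is ignored.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'B' is exactly 2 * 'A', 'C' is only loosely related.
demo = pd.DataFrame({
    'A': [0, 1, 2, 3, 4],
    'B': [0, 2, 4, 6, 8],
    'C': [4, 1, 3, 0, 2],
})
threshold = 0.8  # arbitrary cutoff chosen for this example

corr = demo.corr()

# Boolean mask that is True strictly above the diagonal.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Keep the upper triangle, reshape to one row per column pair,
# then select the pairs whose absolute correlation exceeds the cutoff.
pairs = corr.where(upper).stack()
strong = pairs[pairs.abs() > threshold]
print(strong)
```

Flipping the comparison (`pairs.abs() < threshold`) would select the weakly correlated pairs instead.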

If you want to plot a heatmap of the correlation coefficients, you can check this answer; if you then run into the issue of overlapping axis labels, check the following post.

## Correlation coefficient of two columns in pandas dataframe with .corr()

Calling `.corr()` on the entire DataFrame gives you a full correlation matrix:

    >>> table.corr()
             Group     Age
    Group   1.0000 -0.1533
    Age    -0.1533  1.0000

You can use the separate Series instead:

    >>> table['Group'].corr(table['Age'])
    -0.15330486289034567

This should be faster than computing the full matrix and indexing it (with `table.corr().at['Group', 'Age']`; note that `.iat` only accepts integer positions, so it cannot be used with labels here). Also, this should work whether `Group` is bool or int dtype.
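To make the comparison concrete, here is a small sketch. The contents of `table` are not given in the answer, so the values below are invented stand-ins; only the column names `Group` and `Age` come from the text.

```python
import pandas as pd

# Hypothetical stand-in for `table`; the values are made up.
table = pd.DataFrame({
    'Group': [0, 1, 0, 1, 1, 0],
    'Age': [23, 35, 41, 29, 52, 38],
})

# Correlation of the two Series directly.
series_r = table['Group'].corr(table['Age'])

# Same coefficient looked up by label in the full correlation matrix.
matrix_r = table.corr().at['Group', 'Age']

# The Series route with `Group` stored as bool instead of int.
bool_r = table['Group'].astype(bool).corr(table['Age'])

print(series_r, matrix_r, bool_r)
```

All three values should agree up to floating-point rounding; the Series call simply avoids computing the pairs you do not need.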

## Calculate correlation between columns of strings

You can convert the string columns to the categorical dtype, replace them with their integer codes, and then compute the correlation:

    df['profession'] = df['profession'].astype('category').cat.codes
    df['media'] = df['media'].astype('category').cat.codes
    df.corr()
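A runnable sketch of the same idea, using invented values for `profession` and `media` (the original data is not shown). Keep in mind that the codes are just integer labels assigned in sorted order of the category names, so the coefficient depends on that ordering and is mainly meaningful for binary or naturally ordered categories.

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer above.
people = pd.DataFrame({
    'profession': ['doctor', 'teacher', 'doctor', 'engineer'],
    'media': ['tv', 'radio', 'tv', 'web'],
})

# Encode every column as integer category codes without touching `people`.
encoded = people.apply(lambda col: col.astype('category').cat.codes)
print(encoded)

# Pearson correlation of the encoded columns.
print(encoded.corr())
```

Here `doctor`, `engineer`, `teacher` map to 0, 1, 2 and `radio`, `tv`, `web` map to 0, 1, 2, purely because of alphabetical order.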

## Python pandas correlation one column vs all

The most efficient method is to use `corrwith`.

Example:

    df.corrwith(df['A'])

Setup of example data:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(10, size=(5, 5)), columns=list('ABCDE'))

    #    A  B  C  D  E
    # 0  7  2  0  0  0
    # 1  4  4  1  7  2
    # 2  6  2  0  6  6
    # 3  9  8  0  2  1
    # 4  6  0  9  7  7

output:

    A    1.000000
    B    0.526317
    C   -0.209734
    D   -0.720400
    E   -0.326986
    dtype: float64
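Because the setup uses `np.random.randint` without a seed, rerunning it produces different numbers. The sketch below hard-codes the table printed above so the result is reproducible, and cross-checks `corrwith` against the matching column of the full matrix. The names `fixed`, `one_vs_all`, `from_matrix`, and `others_only` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# The same values as the printed example, written out explicitly.
fixed = pd.DataFrame(
    [[7, 2, 0, 0, 0],
     [4, 4, 1, 7, 2],
     [6, 2, 0, 6, 6],
     [9, 8, 0, 2, 1],
     [6, 0, 9, 7, 7]],
    columns=list('ABCDE'),
)

# Correlation of every column with column 'A'.
one_vs_all = fixed.corrwith(fixed['A'])

# The same numbers taken from the full pairwise matrix.
from_matrix = fixed.corr()['A']

# Dropping 'A' first leaves out the trivial self-correlation of 1.
others_only = fixed.drop(columns='A').corrwith(fixed['A'])

print(one_vs_all)
print(others_only)
```

Both routes give the same coefficients; `corrwith` just skips the pairs that do not involve `A`.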

## Calculate correlation between two columns based on column names

You can create a function like this:

    cor_f <- function(x) {
      cor(test[,names(test)[grepl(x, names(test))]])[2]
    }

    cor_f('Obs1') #correlation between Obs1_grp1 and Obs1_grp2
    #0.3159908

In case you need a loop, one way would be:

    vars <- c('Obs1', 'Obs2')
    sapply(vars, function(i) cor_f(i))
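The function above is R. For readers working in pandas, a rough analogue might look like the sketch below. `cor_by_name`, the variable `vars_`, and the sample values are hypothetical; only the column naming scheme (`Obs1_grp1`, `Obs1_grp2`) comes from the answer. Unlike the R version, it raises an error when the pattern does not match exactly two columns.

```python
import pandas as pd

# Hypothetical data following the Obs<k>_grp<j> naming scheme.
test = pd.DataFrame({
    'Obs1_grp1': [1, 2, 3, 4],
    'Obs1_grp2': [2, 4, 6, 8],
    'Obs2_grp1': [1, 2, 3, 4],
    'Obs2_grp2': [4, 3, 2, 1],
})


def cor_by_name(frame, pattern):
    """Correlate the two columns whose names contain `pattern`."""
    matched = frame.filter(like=pattern)
    if matched.shape[1] != 2:
        raise ValueError(
            f"expected exactly 2 columns containing {pattern!r}, "
            f"found {matched.shape[1]}"
        )
    first, second = matched.columns
    return matched[first].corr(matched[second])


# Loop over several name patterns, similar in spirit to sapply.
vars_ = ['Obs1', 'Obs2']
results = {name: cor_by_name(test, name) for name in vars_}
print(results)
```

With this made-up data the `Obs1` pair is perfectly positively correlated and the `Obs2` pair perfectly negatively correlated.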

