## Use .corr to get the correlation between two columns

Without actual data it is hard to answer the question, but I guess you are looking for something like this:

`Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])`

That calculates the correlation between your two columns `'Citable docs per Capita'` and `'Energy Supply per Capita'`.

To give an example:

```python
import pandas as pd

df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})
```

```
   A  B
0  0  0
1  1  2
2  2  4
3  3  6
```

Then `df['A'].corr(df['B'])` gives `1`, as expected.

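Now, if you change a value, e.g. `df.loc[2, 'B'] = 4.5` (on pandas 2.1 and later, cast the column first with `df['B'] = df['B'].astype(float)`, because silently upcasting an integer column on assignment is deprecated),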

```
   A    B
0  0  0.0
1  1  2.0
2  2  4.5
3  3  6.0
```

the command `df['A'].corr(df['B'])` returns `0.99586`, which is still close to 1, as expected.
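By default `.corr` computes the Pearson correlation coefficient. As a sanity check, this sketch recomputes it by hand with NumPy for the modified data (building `B` as floats directly instead of assigning into an int column) and compares it with the pandas result:

```python
import numpy as np
import pandas as pd

# The modified example data, with 'B' created as floats from the start
df = pd.DataFrame({'A': range(4), 'B': [0.0, 2.0, 4.5, 6.0]})

a = df['A'].to_numpy(dtype=float)
b = df['B'].to_numpy(dtype=float)

# Pearson r = sum(da * db) / sqrt(sum(da**2) * sum(db**2)), with da, db the deviations from the mean
da = a - a.mean()
db = b - b.mean()
r_manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

r_pandas = df['A'].corr(df['B'])
print(round(r_manual, 6), round(r_pandas, 6))  # 0.995862 0.995862
```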

If you apply `.corr` directly to your DataFrame, it returns all pairwise correlations between your columns; that's why you observe `1`s on the diagonal of your matrix (each column is perfectly correlated with itself).

`df.corr()` will therefore return

```
          A         B
A  1.000000  0.995862
B  0.995862  1.000000
```

In the graphic you show, only the upper left corner of the correlation matrix is represented (I assume).
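A small sketch of the points above: the matrix from `df.corr()` is symmetric, has 1s on the diagonal, and each off-diagonal entry matches what `Series.corr` gives for that pair. One caveat: since pandas 2.0, `DataFrame.corr` no longer silently drops non-numeric columns, so a frame with a text column needs `numeric_only=True`. The `'label'` column below is a made-up example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': range(4),
    'B': [0.0, 2.0, 4.5, 6.0],
    'label': ['w', 'x', 'y', 'z'],  # hypothetical non-numeric column
})

# numeric_only=True skips 'label'; without it, pandas >= 2.0 raises on the string column
corr = df.corr(numeric_only=True)

print(corr.columns.tolist())                                      # ['A', 'B']
print(np.allclose(np.diag(corr.to_numpy()), 1.0))                 # True: 1s on the diagonal
print(np.allclose(corr.to_numpy(), corr.T.to_numpy()))            # True: symmetric
print(np.isclose(corr.loc['A', 'B'], df['A'].corr(df['B'])))      # True: same as the pairwise call
```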

There can be cases where you get `NaN`s in your result; check this post for an example.
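Two common ways to end up with `NaN` are sketched below with made-up data (this is not the case from the linked post): one column is constant, so its standard deviation is zero and the coefficient is undefined, or the two columns share no rows where both values are present.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0],
    'const': [5.0, 5.0, 5.0, 5.0],       # zero variance
    'y': [1.0, np.nan, 2.0, np.nan],
    'z': [np.nan, 3.0, np.nan, 4.0],     # never non-missing in the same row as 'y'
})

with np.errstate(invalid='ignore', divide='ignore'):  # NumPy may warn about 0/0 for the constant column
    r_const = df['x'].corr(df['const'])
r_disjoint = df['y'].corr(df['z'])

print(r_const, r_disjoint)  # nan nan
```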

If you want to filter entries above/below a certain threshold, you can check this question.
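Independent of the linked question, one simple way to pick out strongly correlated pairs is to walk the upper triangle of the matrix and keep entries whose absolute value passes a threshold. The extra column `C` and the cut-off of 0.9 are hypothetical; flip the comparison to keep entries below a threshold instead.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    'A': [0, 1, 2, 3],
    'B': [0.0, 2.0, 4.5, 6.0],
    'C': [3, 1, 2, 0],  # hypothetical extra column
})
corr = df.corr()
threshold = 0.9  # hypothetical cut-off

# combinations() visits each unordered pair once, so the diagonal and the
# mirrored lower triangle are skipped automatically
strong = [
    (a, b, corr.loc[a, b])
    for a, b in itertools.combinations(corr.columns, 2)
    if abs(corr.loc[a, b]) >= threshold
]
print(strong)  # only the ('A', 'B') pair passes
```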

If you want to plot a heatmap of the correlation coefficients, you can check this answer, and if you then run into the issue of overlapping axis labels, check the following post.

## Correlation coefficient of two columns in pandas dataframe with .corr()

Calling `.corr()` on the entire DataFrame gives you a full correlation matrix:

```python
>>> table.corr()
        Group     Age
Group  1.0000 -0.1533
Age   -0.1533  1.0000
```

You can use the separate Series instead:

```python
>>> table['Group'].corr(table['Age'])
-0.15330486289034567
```

This should be faster than computing the full matrix and indexing into it (with `table.corr().at['Group', 'Age']`; note that `.iat` takes integer positions, not labels). It should also work whether `Group` is of bool or int dtype.
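A quick sketch with hypothetical data showing that the single-pair call and the label lookup into the full matrix agree, and that a boolean `Group` gives the same result as its 0/1 integer version:

```python
import math

import pandas as pd

# Hypothetical data; 'Group' is boolean here
table = pd.DataFrame({
    'Group': [True, False, True, False, True, False],
    'Age': [23, 35, 31, 42, 28, 39],
})

r_series = table['Group'].corr(table['Age'])                   # only this pair
r_matrix = table.corr().at['Group', 'Age']                     # full matrix, then label lookup
r_int = table['Group'].astype(int).corr(table['Age'])          # bool cast to 0/1 first

print(math.isclose(r_series, r_matrix), math.isclose(r_series, r_int))  # True True
```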

## Calculate correlation between columns of strings

You can convert each column to the categorical dtype, replace it with its integer category codes, and then compute the correlation:

```python
df['profession'] = df['profession'].astype('category').cat.codes
df['media'] = df['media'].astype('category').cat.codes
df.corr()
```
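A self-contained sketch with made-up strings. Keep in mind that for plain string columns pandas sorts the categories, so the codes follow alphabetical order; renaming a category can therefore change the correlation, and the number is hard to interpret when the categories have no natural order.

```python
import pandas as pd

# Hypothetical string data
df = pd.DataFrame({
    'profession': ['doctor', 'teacher', 'doctor', 'engineer'],
    'media': ['tv', 'radio', 'tv', 'web'],
})

# Categories are sorted, so the codes are alphabetical:
# doctor=0, engineer=1, teacher=2 and radio=0, tv=1, web=2
codes = df.apply(lambda s: s.astype('category').cat.codes)
print(codes['profession'].tolist())  # [0, 2, 0, 1]
print(codes['media'].tolist())       # [1, 0, 1, 2]
print(codes.corr())
```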

## Python Pandas correlation one column vs all

The most efficient method is to use `corrwith`.

Example:

`df.corrwith(df['A'])`

Setup of example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(5, 5)), columns=list('ABCDE'))
# random data, so your values will differ:
#    A  B  C  D  E
# 0  7  2  0  0  0
# 1  4  4  1  7  2
# 2  6  2  0  6  6
# 3  9  8  0  2  1
# 4  6  0  9  7  7
```

Output:

```
A    1.000000
B    0.526317
C   -0.209734
D   -0.720400
E   -0.326986
dtype: float64
```
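To make the example reproducible, this sketch hard-codes the random values printed above. `corrwith` gives the same numbers as the `'A'` column of the full matrix; the full matrix, however, computes every pair, which is where the extra work comes from.

```python
import pandas as pd

# The values printed in the example above, fixed so the result is reproducible
df = pd.DataFrame({
    'A': [7, 4, 6, 9, 6],
    'B': [2, 4, 2, 8, 0],
    'C': [0, 1, 0, 0, 9],
    'D': [0, 7, 6, 2, 7],
    'E': [0, 2, 6, 1, 7],
})

# Correlation of every column with 'A'
with_a = df.corrwith(df['A'])

# Same numbers, but computed from the whole 5x5 matrix
from_matrix = df.corr()['A']

print(with_a.round(6))
```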

## Calculate correlation between two columns based on column names

You can create a function like this:

```r
cor_f <- function(x) {
  cor(test[, names(test)[grepl(x, names(test))]])[2]
}

cor_f('Obs1')  # correlation between Obs1_grp1 and Obs1_grp2
# 0.3159908
```

In case you need a loop, one way would be:

```r
vars <- c('Obs1', 'Obs2')
sapply(vars, function(i) cor_f(i))
```
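For readers doing the same in pandas, a rough equivalent is sketched below. The random `test` frame only mimics the column layout implied by the R code, and `cor_f` here is a hypothetical helper. Like `grepl`, `DataFrame.filter(regex=...)` matches anywhere in the name, so a pattern such as `'Obs1'` would also catch an `Obs10_...` column; anchor it (e.g. `'^Obs1_'`) if that matters.

```python
import numpy as np
import pandas as pd

# Hypothetical data laid out like the question's `test` data frame
rng = np.random.default_rng(0)
test = pd.DataFrame(rng.normal(size=(20, 4)),
                    columns=['Obs1_grp1', 'Obs1_grp2', 'Obs2_grp1', 'Obs2_grp2'])


def cor_f(df, pattern):
    """Correlation between the first two columns whose names match `pattern` (a regex)."""
    matched = df.filter(regex=pattern)
    return matched.corr().iloc[0, 1]


# Loop over the prefixes, like sapply() in the R version
result = {v: cor_f(test, v) for v in ['Obs1', 'Obs2']}
print(result)
```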
