How to Make a Heatmap with a Large Matrix

Python : Plot heatmap for large matrix

I can think of two options using NumPy arrays.

  1. If your data is mostly greater than zero but contains many zeros, set vmin slightly above zero so the color scale spans only the nonzero values:

    vmin = some_value_higher_than_zero
    plt.matshow(k,aspect='auto',vmin=vmin)
  2. Set all zeros to NaN (k must be a float array); NaN cells are automatically left out of the plot:

    k[k==0.0]=np.nan
    plt.matshow(k,aspect='auto')

NB: both imshow and matshow work here.
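
A minimal self-contained sketch of the second option. The helper name zeros_to_nan and the small integer array are made up for illustration. The helper works on a float copy, because assigning NaN into an integer array raises an error.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in Jupyter
import matplotlib.pyplot as plt


def zeros_to_nan(k):
    """Return a float copy of k with exact zeros replaced by NaN."""
    out = np.array(k, dtype=float)  # copy; NaN cannot be stored in an integer array
    out[out == 0.0] = np.nan
    return out


k = np.array([[0, 3, 0], [5, 0, 7]])  # made-up integer data
masked = zeros_to_nan(k)

# NaN cells are left blank; the color scale covers only the remaining values
plt.matshow(masked, aspect='auto')
```

The original array k is left unchanged, so it can still be used for other plots.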

Another option, when your matrix is really sparse, is to use a scatter plot:

rows, cols = k.nonzero()
plt.scatter(cols, rows, s=100, c=k[rows, cols])  # columns on x, rows on y; color by the values in k

How to render a heatmap for a large array

  • The original code didn't generate a plot for me.
  • Changing fig, ax = plt.subplots() to plt.figure(figsize=(14, 14)) made the plot render.
    • At figsize=(10, 10), the figure didn't render in Jupyter, but the correct image was saved to the file.
    • Any figure smaller than figsize=(14, 14) wouldn't render in Jupyter.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# create matrix
size = 10000
similarity_matrix = np.random.rand(size, size)

# plot matrix

# create figure and set size
plt.figure(figsize=(14, 14))

# add heatmap
sns.heatmap(similarity_matrix, vmin=0, vmax=1)

# save the figure
plt.savefig('test.png', dpi=600)

# show the figure; this was slow
plt.show()
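
As a rough check on how much detail the saved file can hold: when bbox_inches is not set, the image written by savefig is figsize times dpi pixels on each side. A small sketch of that arithmetic (the helper name figure_pixels is hypothetical):

```python
def figure_pixels(figsize, dpi):
    """Width and height, in pixels, of a figure saved at the given dpi."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


width_px, height_px = figure_pixels((14, 14), 600)
print(width_px, height_px)  # 8400 8400
```

The whole figure is 8400 pixels wide and the heatmap axes take up only part of it, so a 10000 x 10000 matrix cannot get one pixel per cell at these settings.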


Heatmap for a large matrix and get the clear labels

Since you stated that you need all the labels, the only way I see is to reduce the font size. You can do this by setting the cexRow and cexCol parameters in your call to heatmap(), for example:

heatmap(as.matrix(iris[,1:3]), cexRow = 0.1, cexCol = 0.1)

How to plot a heatmap of a big matrix with matplotlib (45K * 446)

I solved it by downsampling the matrix to a smaller one.
I tried two approaches:

  • Subsampling: to reduce 45k rows to 1k rows, keep one row out of every 45.
  • Averaging: to reduce 45k rows to 1k rows, split the rows into 1k groups of 45 adjacent rows and use the mean of each group as its representative row.
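
Both approaches can be sketched with NumPy as follows. The function names and the random stand-in data are assumptions for illustration, not the code I actually ran.

```python
import numpy as np


def subsample_rows(matrix, step):
    """Keep every step-th row (rows 0, step, 2*step, ...)."""
    return matrix[::step]


def block_mean_rows(matrix, step):
    """Average each group of `step` adjacent rows.

    Trailing rows that do not fill a complete group are dropped.
    """
    n_groups = matrix.shape[0] // step
    trimmed = matrix[: n_groups * step]
    return trimmed.reshape(n_groups, step, matrix.shape[1]).mean(axis=1)


rng = np.random.default_rng(0)
data = rng.random((45_000, 446), dtype=np.float32)  # stand-in for the real matrix

picked = subsample_rows(data, 45)
averaged = block_mean_rows(data, 45)
print(picked.shape, averaged.shape)  # (1000, 446) (1000, 446)
```

Subsampling is cheaper but can miss narrow features that fall between the kept rows. Averaging keeps a trace of every row at the cost of smoothing out sharp peaks.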

Hope it helps.

How to make clustered heatmap of a large dataset look nicer?

The problem is your vmax = 1 argument. The maximum value in the whole dataset, new_matrix.max().max(), is about 0.17, so most of the color range goes unused.
So either remove vmax or set it to a lower value.
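
A minimal sketch of taking the upper color limit from the data instead of hard-coding it. The matrix here is random stand-in data scaled to stay below 0.17, and plain matplotlib imshow is used as an assumption to keep the example small. The same vmax value can be passed to seaborn's heatmap or clustermap.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in Jupyter
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
new_matrix = rng.random((50, 50)) * 0.17  # stand-in data; values lie in [0, 0.17)

# Use the largest value actually present as the top of the color scale
vmax = float(np.nanmax(new_matrix))

fig, ax = plt.subplots()
image = ax.imshow(new_matrix, vmin=0, vmax=vmax, aspect='auto')
fig.colorbar(image, ax=ax)
```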


