Heatmap-Like Plot, But for Categorical Variables

Heatmap-like plot, but for categorical variables

I decided it would be easiest to approach this with ggplot2 (for me, anyway):

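# Recreate a data set: 50 people, each with three categorical variables (X1 to X3)
dat <- data.frame(
  person = factor(paste0("id#", 1:50), levels = rev(paste0("id#", 1:50))),
  matrix(sample(LETTERS[1:3], 150, TRUE), ncol = 3)
)

library(ggplot2); library(reshape2)

# Reshape to long format: one row per person/variable pair
dat3 <- melt(dat, id.vars = "person")

# One tile per person/variable pair, filled by category
ggplot(dat3, aes(variable, person)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_manual(values = c("red", "blue", "black"))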
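
The melt step is what makes the tile plot possible: it turns the wide table into one record per person/variable pair. As a rough illustration only, here is that reshaping in plain Python. The sample rows and the `melt` helper are hypothetical stand-ins, not part of reshape2, and the records come out grouped by person rather than by variable, which does not matter for plotting.

```python
# Hypothetical wide-format rows; X1..X3 mirror R's default matrix column names.
wide = [
    {"person": "id#1", "X1": "A", "X2": "C", "X3": "B"},
    {"person": "id#2", "X1": "B", "X2": "B", "X3": "A"},
]


def melt(rows, id_var):
    """Return one {id, variable, value} record per non-id column of each row."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name != id_var:
                long_rows.append(
                    {id_var: row[id_var], "variable": name, "value": value}
                )
    return long_rows


long = melt(wide, "person")
for record in long:
    print(record)
```

Each printed record corresponds to one tile: `variable` gives the x position, `person` the y position, and `value` the fill colour.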

Write values in heatmap-like plot, but for categorical variables in seaborn

You can build the error texts and annotate manually:

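import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# df, df_condition1 and df_condition2 are assumed to be existing pandas
# DataFrames of the same shape; the condition frames hold error text or NaN
c1, c2 = df_condition1.notna(), df_condition2.notna()
df_condition1, df_condition2 = df_condition1.fillna(''), df_condition2.fillna('')

# Both errors present: stack them on two lines; otherwise use whichever exists
errors = np.select((c1 & c2, c1, c2),
                   (df_condition1 + '\n' + df_condition2, df_condition1, df_condition2),
                   '')

fig, ax = plt.subplots(figsize=(12, 10))
cmap = ['#b3e6b3','#66cc66','#2d862d','#ffc299','#ff944d','#ff6600','#ccddff','#99bbff','#4d88ff','#0044cc','#002b80']
ax = sns.heatmap(df, cmap=cmap, linewidths=0.005, annot=False)

# Write each label at the centre of its cell
for r in range(errors.shape[0]):
    for c in range(errors.shape[1]):
        ax.text(c + 0.5, r + 0.5, errors[r, c],
                va='center', ha='center',
                fontweight='bold')

plt.show()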
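
To make the np.select priority order easier to follow, here is the same labelling logic spelled out in plain Python. This is only a sketch: the two small grids are made-up data, `None` stands in for NaN, and `combine` is a hypothetical helper name.

```python
# Hypothetical error grids; None marks "no error" (the role NaN plays in pandas).
condition1 = [["E1", None], [None, None]]
condition2 = [["E2", "E3"], [None, None]]


def combine(a, b):
    """Same priority as the np.select call: both, then only a, then only b, else ''."""
    if a is not None and b is not None:
        return a + "\n" + b
    if a is not None:
        return a
    if b is not None:
        return b
    return ""


labels = [
    [combine(a, b) for a, b in zip(row1, row2)]
    for row1, row2 in zip(condition1, condition2)
]

# Cell (r, c) of the heatmap is centred at x = c + 0.5, y = r + 0.5.
# Empty labels are skipped here; drawing them would add nothing visible.
placements = [
    (c + 0.5, r + 0.5, text)
    for r, row in enumerate(labels)
    for c, text in enumerate(row)
    if text
]
print(placements)
```

The order of the conditions matters: the "both present" case has to be tested first, otherwise a cell with two errors would be caught by the single-error branch.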

Heatmap-like plot for three categorical variables

One option: for each Color and Shape combination, keep the Size with the highest Freq and use it as the tile fill:

library(dplyr)
library(ggplot2)

df_max <- df %>%
  group_by(Color, Shape) %>%
  slice(which.max(Freq))

head(df_max)
# Source: local data frame [4 x 4]
# Groups: Color, Shape [4]
#
#    Color    Shape   Size  Freq
#    (chr)    (chr)  (chr) (int)
# 1    Red   Square Medium     6
# 2    Red Triangle    Big    12
# 3 Yellow   Square    Big    10
# 4 Yellow Triangle  Small     8

ggplot(df_max, aes(x = Color, y = Shape, fill = Size)) +
  geom_tile()
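
The key step is the group-wise selection: one row survives per Color/Shape pair, namely the one with the largest Freq. A minimal plain-Python sketch of that idea follows. The input rows are invented for illustration (chosen so the winners match the df_max printout), since the original df is not shown.

```python
# Hypothetical (Color, Shape, Size, Freq) rows standing in for df.
rows = [
    ("Red", "Square", "Small", 2),
    ("Red", "Square", "Medium", 6),
    ("Red", "Triangle", "Big", 12),
    ("Red", "Triangle", "Small", 5),
    ("Yellow", "Square", "Big", 10),
    ("Yellow", "Square", "Medium", 4),
    ("Yellow", "Triangle", "Small", 8),
    ("Yellow", "Triangle", "Big", 3),
]

best = {}
for color, shape, size, freq in rows:
    key = (color, shape)
    # Strict ">" keeps the first row on ties, as which.max() does.
    if key not in best or freq > best[key][1]:
        best[key] = (size, freq)

for (color, shape), (size, freq) in best.items():
    print(color, shape, size, freq)
```

Each entry of `best` corresponds to one tile, with the winning Size deciding its colour.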

Creating a heatmap based on extracted proportions out of a categorical data

After some data manipulation, you can use geom_tile:

library(tidyverse)

# Data wrangling: proportion of "yes" per marker within each AgeGroup/Sex group
df1 <-
  df1 %>%
  group_by(AgeGroup, Sex) %>%
  summarise(across(starts_with("marker"),
                   ~ sum(.x == "yes") / n())) %>%
  ungroup() %>%
  mutate(gp = paste0(AgeGroup, "_", Sex), .keep = "unused") %>%
  pivot_longer(-gp)

# A tibble: 12 × 3
#   gp    name    value
#   <chr> <chr>   <dbl>
# 1 O_f   marker1   0.5
# 2 O_f   marker2   1
# 3 O_f   marker3   0
# 4 O_m   marker1   0.5
# 5 O_m   marker2   0.5
# 6 O_m   marker3   0
# ...

# Plot: one tile per group/marker pair, filled by the proportion of "yes"
df1 %>%
  ggplot() +
  aes(x = name, y = gp, fill = value) +
  geom_tile() +
  theme_minimal()
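
The wrangling step boils down to: group the rows, compute the share of "yes" per marker, and emit one (group, marker, value) record per tile. Here is a rough plain-Python sketch of that computation. The four input records are hypothetical, picked so that the results match the first six rows of the tibble shown above.

```python
from collections import defaultdict

# Hypothetical records standing in for the original df1.
records = [
    {"AgeGroup": "O", "Sex": "f", "marker1": "yes", "marker2": "yes", "marker3": "no"},
    {"AgeGroup": "O", "Sex": "f", "marker1": "no", "marker2": "yes", "marker3": "no"},
    {"AgeGroup": "O", "Sex": "m", "marker1": "yes", "marker2": "yes", "marker3": "no"},
    {"AgeGroup": "O", "Sex": "m", "marker1": "no", "marker2": "no", "marker3": "no"},
]
markers = ["marker1", "marker2", "marker3"]

# Group rows under a combined "AgeGroup_Sex" label, like the gp column.
groups = defaultdict(list)
for rec in records:
    groups[f"{rec['AgeGroup']}_{rec['Sex']}"].append(rec)

# Long format: one (gp, name, value) tuple per group/marker pair.
long_rows = [
    (gp, m, sum(r[m] == "yes" for r in members) / len(members))
    for gp, members in groups.items()
    for m in markers
]

for row in long_rows:
    print(row)
```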

Plot a heatmap for binary categorical variables in R

You're on the right track with heatmap(). Turn the "yes"/"no" columns of your df into a matrix of 0s and 1s, and disable some of the defaults, such as scaling and reordering.

mat1 <- 1*(df1[,-1]=="yes")

> mat1
     var1 var2 var3
[1,]    1    0    1
[2,]    1    1    0
[3,]    0    0    0
[4,]    1    1    1
[5,]    0    0    1

# You only need this step if you want the IDs to be shown beside the plot

rownames(mat1) <- rownames(df1)

> mat1
  var1 var2 var3
1    1    0    1
2    1    1    0
3    0    0    0
4    1    1    1
5    0    0    1

# reorder the matrix by rowSums before plotting

heatmap(mat1[order(rowSums(mat1)),], scale = "none", Rowv = NA, Colv = NA)

You can change the colour scheme by specifying the col parameter, for example:

heatmap(mat1[order(rowSums(mat1)),], scale = "none", Rowv = NA, Colv = NA, col=c("lightgrey", "tomato"))

If you would prefer the plot to read left-to-right (one column per ID), just transpose the matrix:

heatmap(t(mat1[order(rowSums(mat1)),]), scale = "none", Rowv = NA, Colv = NA)
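
For readers who want to check the matrix steps by hand, here they are in plain Python: the yes/no to 0/1 conversion, the ordering by row sums, and the transpose. This is just a sketch; the yes/no answers are reconstructed from the mat1 printout above, and the variable names are made up for the example.

```python
# Answers reconstructed from the mat1 printout; keys are the row names 1..5.
answers = {
    1: ["yes", "no", "yes"],
    2: ["yes", "yes", "no"],
    3: ["no", "no", "no"],
    4: ["yes", "yes", "yes"],
    5: ["no", "no", "yes"],
}

# Equivalent of 1 * (df1[, -1] == "yes"): True/False become 1/0.
mat = {rid: [int(v == "yes") for v in row] for rid, row in answers.items()}

# Equivalent of order(rowSums(mat1)); sorted() is stable, so ties keep their order.
ordered_ids = sorted(mat, key=lambda rid: sum(mat[rid]))
ordered = [mat[rid] for rid in ordered_ids]

# Equivalent of t(...): one column per ID instead of one row per ID.
transposed = [list(col) for col in zip(*ordered)]

print(ordered_ids)
print(ordered)
print(transposed)
```

Sorting by row sum puts the all-"no" IDs at one end and the all-"yes" IDs at the other, which is what makes the resulting heatmap easy to read.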

