Sorting of Categorical Variables in Ggplot

Sorting of categorical variables in ggplot

I'm pretty sure stat_sort does not exist, so it's not surprising that it doesn't work as you expect. Luckily, there's the reorder() function, which reorders the levels of a categorical variable according to the values of a second variable. I think this should do what you want:

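# Note: qplot() is deprecated as of ggplot2 3.4.0; the equivalent is
# ggplot(trial, aes(x = numbers, y = reorder(letters, numbers))) + geom_point()
trial.plot <- qplot(x = numbers, y = reorder(letters, numbers), data = trial)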
trial.plot
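
To see what reorder() does without drawing anything, here is a minimal sketch in base R. The `trial` data frame below is made up for illustration (the original data is not shown in the question), so treat the column values as assumptions:

```r
# Hypothetical stand-in for the asker's data
trial <- data.frame(
  letters = c("a", "b", "c", "d"),
  numbers = c(3, 1, 4, 2)
)

# Levels are sorted by the associated number, not alphabetically
by_number <- reorder(trial$letters, trial$numbers)
levels(by_number)
```

ggplot2 draws discrete axes in level order, so the letter with the smallest number ends up first on the axis.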


sort columns with categorical variables by numerical variables in stacked barplot

The issue here is that all your percentages for a given category (name) in fact add up to 100%. So sorting by percentage, which is normally achieved via aes(x = reorder(name, percentage), y = percentage), won’t work here.

Instead, you probably want to order by the percentage of the data that has class = 1 (or class = -1). Doing this requires some trickery: Use ifelse to select the percentage for the rows where class == 1. For all other rows, select the value 0:

ggplot(df, aes(x = reorder(name, ifelse(class == 1, percentage, 0)), y = percentage, fill = factor(class))) +
  geom_bar(stat = "identity") +
  scale_fill_discrete(name = "Class") +
  xlab('Names')

You might want to execute just the reorder instruction to see what’s going on:

reorder(df$name, ifelse(df$class == 1, df$percentage, 0))
# [1] A A B B C C D D
# attr(,"scores")
#      A      B      C      D
# 44.055 48.720 47.020 46.630
# Levels: A D C B

As you can see, your names got reordered based on the mean percentage for each category (by default, reorder uses the mean; see its manual page for more details). But the “mean” we calculated was between each name’s percentage for class = 1, and the value 0 (for class ≠ 1).
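
The same arithmetic can be checked on a small example. This is only a sketch: the data frame below is invented, with two rows per name (one per class), which is an assumption about the shape of the original data:

```r
# Hypothetical data: two rows per name, one for each class
df <- data.frame(
  name       = factor(rep(c("A", "B", "C"), each = 2)),
  class      = rep(c(1, -1), times = 3),
  percentage = c(80, 20, 30, 70, 55, 45)
)

# Keep the class == 1 percentage, replace everything else with 0
score <- ifelse(df$class == 1, df$percentage, 0)

# reorder() averages the scores per name: A = 40, B = 15, C = 27.5
ordered_name <- reorder(df$name, score)
levels(ordered_name)
```

Because every name has the same number of rows, halving each class == 1 percentage does not change the ranking, so the levels come out sorted by the class == 1 share.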

R - ggplot2 re-order of categorical variable (issue with reorder func)

Relevel the variable using the sorted levels of the 'value' variable (I created a new one for comparison purposes):

data$value2 <- factor(data$value, levels = sort(levels(data$value)))

ggplot(data = data, aes(x = value2, y = mean, fill = value2)) +
  geom_bar(stat = "identity") +
  theme_minimal()


How to reorder categorical variables in x axis in ggplot2?

We could use fct_relevel from the forcats package, which is part of the tidyverse:

library(tidyverse)

S1_sub %>%
  filter(season == "season1", variable == "rainfall") %>%
  filter(str_detect(source, 'Mada|IDW2')) %>%
  # Put the aggregation periods in the desired display order
  mutate(aggregation_period = fct_relevel(aggregation_period, c("Mar", "Apr", "May", "Jun", "Jul", "AMJ", "MAMJJ"))) %>%
  ggplot(aes(x = aggregation_period, y = value)) +
  geom_boxplot(width = 0.3, col = "black") +
  facet_grid(cols = vars(source), rows = vars(season), scales = "free_y") +
  scale_y_continuous(breaks = seq(0, 1350, 175)) +
  xlab("Rainfall Indices") +
  ylab("Rainfall (mm)") +
  theme_bw() +
  theme(axis.title = element_text(size = 12), # all titles
        axis.text = element_text(colour = "black"),
        axis.text.x = element_text(angle = 90, hjust = 1,
                                   size = 10, color = "black"),
        # axis.text.y = element_text(size = 10),
        # in ggplot2 >= 3.4.0, use linewidth = .5 instead of size
        panel.border = element_rect(color = "black",
                                    size = .5))


ggplot reorders categorical variables

sort(xaxis)
# [1] "100" "80"  "90"

Character vectors are sorted on a character-by-character basis, i.e. the sort doesn't understand the numerical meaning of the data.

ggplot2 converts character variables to factors, and by default factors sort their levels:

factor(xaxis)
# [1] 80  90  100
# Levels: 100 80 90
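
One possible way around this, sketched in base R, is to set the levels explicitly in numeric order before plotting. The `xaxis` values are taken from the output above; the helper name `numeric_order` is made up for this example:

```r
xaxis <- c("80", "90", "100")

# Order the labels by their numeric value rather than as text
numeric_order <- xaxis[order(as.numeric(xaxis))]

# Supplying levels explicitly overrides the default alphabetical sort
xaxis_f <- factor(xaxis, levels = numeric_order)
levels(xaxis_f)
```

If the values really are numbers, converting the column with as.numeric() and using a continuous axis may be the simpler choice.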

Order categorical data in a stacked bar plot with ggplot2

I see that you have an order column in your data frame, which I take to be your intended ordering. In that case you can simply do:

p0 = qplot(factor(kclust), fill = reorder(hhDomMil, order), position = 'fill',
           data = df1)

Here are the elements of this code that take care of your questions:

  1. How do I plot such an ordered plot? Use reorder.
  2. How do I set up x so that each bar is "on" one number? Use factor(kclust).
  3. How do I separate the bars?
  4. How do I print all kclust numbers in x? Use factor(kclust).

I remember from a previous question of yours that the hhDomMil corresponded to different groups, and I suspect your ordering follows the grouping. In that case, you might want to use that information to choose a color palette that makes it simpler to follow the graph. Here is one way to do it.

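library(RColorBrewer)
# brewer.pal() needs n >= 3, so take the first two colours of a 3-colour palette
mycols = c(brewer.pal(3, 'Oranges'), brewer.pal(3, 'Greens'),
           brewer.pal(3, 'Blues')[1:2], brewer.pal(3, 'PuRd')[1:2])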

p0 + scale_fill_manual(values = mycols)


reorder x-axis variables by sorting a subset of the data

df$name2 <- factor(df$name, levels = xvals)
ggplot(df, aes(x = name2, y = percent, fill = cat)) +
geom_bar(stat = "identity", position = "fill")
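
The snippet above does not show where `xvals` comes from. A plausible reading, given the question title, is that it holds the names sorted by their value within one category. The sketch below follows that assumption; the data and the choice of category "x" are invented for illustration:

```r
# Hypothetical data with the same column names as the snippet above
df <- data.frame(
  name    = rep(c("a", "b", "c"), each = 2),
  cat     = rep(c("x", "y"), times = 3),
  percent = c(30, 70, 10, 90, 20, 80)
)

# Sort the names using only the rows of one category (an assumption)
sub   <- df[df$cat == "x", ]
xvals <- sub$name[order(sub$percent)]

# Apply that order to the full data via the factor levels
df$name2 <- factor(df$name, levels = xvals)
levels(df$name2)
```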


Data frame variable order for ggplot

First off, you can set the order of factor levels for columns like region in your original dataframe. Then you don't end up with all these different slightly modified versions of the same data. Then sort the dataframe how you want it, and use forcats::fct_inorder to reassign the factor levels for my_name based on their current order in the dataframe:

library(tidyverse)
library(ggplot2)
library(forcats)

set.seed(1)
num_rows <- 12
sample_names <- do.call(paste0, replicate(5, sample(letters, num_rows, TRUE), FALSE))
df1 <- data.frame(region = sample(c("N", "S", "E", "W"), num_rows, replace = TRUE),
                  sub_region = sample(c("High", "Medium", "Low"), num_rows, replace = TRUE),
                  my_order = seq(1, num_rows),
                  my_name = sample_names,
                  var_1 = sample(100, num_rows, replace = TRUE))

df1$region <- factor(df1$region, levels = c("N","E","S","W"))
df1$sub_region <- factor(df1$sub_region, levels = c("High","Medium","Low"))
df1 <- df1[order(df1$region, df1$sub_region, df1$my_order, decreasing = TRUE), ]
# Order my_name levels based on current order
df1$my_name = fct_inorder(df1$my_name)
df1 %>% ggplot() + geom_point(aes( x = var_1, y = my_name, color=sub_region))

Note that I had to use decreasing = TRUE in the order() call to get the order going top to bottom.

For categorical variables like my_name, it's the order of the factor levels that determines the order ggplot plots them in, not their current order in the dataframe, which is what you were changing in your example code. This makes the tools in forcats very useful when you need to control the order in a plot.
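
The difference between row order and level order can be seen in a small base R sketch. The data frame here is invented, and `factor(x, levels = unique(x))` is used as a base R stand-in for what fct_inorder does with a character vector:

```r
# Hypothetical data, sorted by var_1 so the rows are no longer alphabetical
df <- data.frame(my_name = c("b", "c", "a"), var_1 = c(3, 1, 2))
df <- df[order(df$var_1), ]

# Default factor(): levels are alphabetical, regardless of row order
default_levels <- levels(factor(df$my_name))

# Levels taken in order of first appearance, i.e. the current row order
inorder_levels <- levels(factor(df$my_name, levels = unique(df$my_name)))

default_levels
inorder_levels
```

Only the second version carries the sorted row order into the plot.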


