Ggplot2 Log Transformation for Data and Scales

Transform only one axis to log10 scale with ggplot2

The simplest approach is to pass the name of the desired log function to the 'trans' (formerly 'formatter') argument of either scale_x_continuous or scale_y_continuous:

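library(ggplot2)  # which formerly required pkg:plyr
# define the base plot first; the diamonds data set ships with ggplot2
m <- ggplot(diamonds, aes(y = price, x = color))
m + geom_boxplot() + scale_y_continuous(trans='log10')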
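As a rough sketch of what the name 'log10' stands for: it refers to a transformation object from the scales package, which pairs a forward function with its inverse. The example below inspects that object and then uses scale_y_log10(), the shorthand for the same scale. Newer ggplot2 releases (3.5.0 onward, as far as I know) prefer the argument name transform over trans, so treat the exact spelling as version-dependent.

```r
library(ggplot2)
library(scales)

# The transformation behind the name "log10"
tr <- log10_trans()
fwd <- tr$transform(c(1, 10, 1000))  # data values -> positions on the axis
inv <- tr$inverse(c(0, 1, 3))        # positions on the axis -> data values

# scale_y_log10() is shorthand for a continuous y scale with this transformation
p <- ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot() +
  scale_y_log10()
```

The axis labels stay in the original price units; only the spacing changes.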

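EDIT:
Or, if you don't like that, these were also tried. Note that ggplot() documents its extra arguments as unused, so log= and log10= here are likely ignored and both calls draw the untransformed plot: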

m <- ggplot(diamonds, aes(y = price, x = color), log="y")
m + geom_boxplot() 
m <- ggplot(diamonds, aes(y = price, x = color), log10="y")
m + geom_boxplot()

EDIT2 & 3:
Further experiments (after discarding one that tried unsuccessfully to put "$" signs in front of the logged values):

# Need a function that accepts an x argument
# wrap desired formatting around numeric result
fmtExpLg10 <- function(x) paste(plyr::round_any(10^x/1000, 0.01) , "K $", sep="")

ggplot(diamonds, aes(color, log10(price))) + 
  geom_boxplot() + 
  # a formatting function belongs in 'labels'; 'trans' expects a transformation
  scale_y_continuous("Price, log10-scaling", labels = fmtExpLg10)

[Figure: boxplots of log10(price) by color, with y-axis labels formatted as "K $"]
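An alternative sketch, under the assumption that you would rather keep price in its original units: let the scale do the log transformation and hand a formatting function to labels. The helper name fmt_k_dollars is hypothetical, and base round() is used here in place of plyr::round_any.

```r
library(ggplot2)

# Hypothetical helper: format a price in thousands of dollars
fmt_k_dollars <- function(v) paste0(round(v / 1000, 2), "K $")

example_labels <- fmt_k_dollars(c(1000, 2500, 10000))

# The scale transforms the axis; the label function sees untransformed prices
p <- ggplot(diamonds, aes(color, price)) +
  geom_boxplot() +
  scale_y_log10("Price, log10-scaling", labels = fmt_k_dollars)
```

With this arrangement the formatter does not need to undo the logarithm itself, because the labels function receives the break values in data units.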

Note added mid 2017 in comment about package syntax change:

scale_y_continuous(formatter = 'log10') is now scale_y_continuous(trans = 'log10') (ggplot2 v2.2.1)

How to set x-axes to the same scale after log-transformation with ggplot

I think the reason you can't set identical scales is that the lower limit is invalid in log-space; e.g., log2(-100) evaluates to NaN. That said, have you considered faceting the data instead?

library(ggplot2)

set.seed(123); g1 <- data.frame(rlnorm(1000, 1, 3))
set.seed(123); g2 <- data.frame(rlnorm(2000, 0.4, 1.2))
colnames(g1) <- "value"; colnames(g2) <- "value"

df <- rbind(
  cbind(g1, name = "G1"),
  cbind(g2, name = "G2")
)

ggplot(df, aes(value)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 0.5) +
  geom_density() +
  scale_x_continuous(
    trans = "log2",
    labels = scales::number_format(accuracy = 0.01, decimal.mark = '.'),
    breaks = c(0, 0.01, 0.1, 1, 10, 100, 10000), limits=c(1e-3, 20000)) +
  facet_wrap(~ name)
#> Warning: Removed 4 rows containing non-finite values (stat_bin).
#> Warning: Removed 4 rows containing non-finite values (stat_density).
#> Warning: Removed 4 rows containing missing values (geom_bar).

[Figure: faceted histograms with density curves for G1 and G2 on a shared log2 x-axis]

Created on 2021-03-20 by the reprex package (v1.0.0)
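The point about limits being invalid in log-space can be checked directly. The sketch below tests a few candidate limits and then derives shared limits from the strictly positive values only; positive_range is a hypothetical helper, not part of ggplot2.

```r
# Candidate limits: negative and zero bounds cannot be placed on a log2 axis
candidates <- c(-100, 0, 1e-3, 20000)
on_log2 <- suppressWarnings(log2(candidates))  # NaN, -Inf, then finite values
usable <- is.finite(on_log2)

# Hypothetical helper: limits taken from the strictly positive values only
positive_range <- function(v) range(v[v > 0])

set.seed(123)
g1 <- rlnorm(1000, 1, 3)
g2 <- rlnorm(2000, 0.4, 1.2)
shared_limits <- positive_range(c(g1, g2))
```

Passing shared_limits as the limits of both plots would be one way to give separate plots the same x-axis, assuming all values of interest are positive.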

Difference in ggplot scaling with log transformation

It may be easier to see if you use scale_x_log10 instead:

ggplot(data.frame(x), aes(x)) + 
  geom_histogram(bins = 100) + 
  scale_x_log10()

[Figure: histogram of x on a log10-scaled x-axis, labeled in the original units]

Then, we can do a few things to compare. First, we can change the labels:

myBreaks <-
  10^c(-61, -43, -25, -7)

ggplot(data.frame(x), aes(x)) + 
  geom_histogram(bins = 100) + 
  scale_x_log10(breaks = myBreaks,
                labels = log10(myBreaks))

[Figure: the same histogram with the breaks labeled by their log10 values]

We can also get the same plot by transforming x before plotting it:

ggplot(data.frame(x = log10(x)), aes(x)) + 
  geom_histogram(bins = 100)

[Figure: histogram of log10(x) on a linear x-axis]

And we can compare all of these to the summary of log10(x):

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-74.1065  -5.5416  -2.5300  -3.8340  -0.7579   1.6531

See how that matches up with the graphs above pretty closely?

scale_x_log10 and scale_x_continuous(trans = "log") are not actually changing the data -- they are changing the scaling of the axis, but leaving the labels in the original units.
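To make the distinction concrete, here is a small sketch with simulated data (x below is a hypothetical stand-in, not the data from the question). It builds the histogram once with a log10 scale and once with pre-transformed data, then compares the bins that ggplot2 computes for each.

```r
library(ggplot2)

set.seed(42)
x <- 10^rnorm(500, mean = -3, sd = 2)  # hypothetical positive, skewed data

# Log scale: data untouched, axis transformed
p_scale <- ggplot(data.frame(x), aes(x)) +
  geom_histogram(bins = 30) +
  scale_x_log10()

# Transformed data: values converted up front, axis left linear
p_data <- ggplot(data.frame(x = log10(x)), aes(x)) +
  geom_histogram(bins = 30)

# layer_data() returns the computed bins; both sets live in log10 space
bins_scale <- layer_data(p_scale)
bins_data  <- layer_data(p_data)
range(bins_scale$x)
range(bins_data$x)
```

The bin positions should agree between the two plots, which suggests that the visible difference comes down to how the axis is labeled.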

Bringing it back to your original values: log(5.8e-62), the natural log, is about -141, which is the value you would expect to see if the plot showed the converted data.
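A quick check of that arithmetic, which also shows why the base matters: R's log() is the natural logarithm, while a log10 axis corresponds to log10().

```r
v <- 5.8e-62

natural <- log(v)    # natural log, about -141
base10  <- log10(v)  # base-10 log, about -61.2
```

So a value near -141 would point to natural-log-converted data, whereas positions on a log10 scale sit near -61.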

If you really must have the log-values displayed, you can also accomplish that within the mapping, with the added advantage that the axis-label defaults to a meaningful value as well:

ggplot(data.frame(x = x), aes(log10(x))) + 
  geom_histogram(bins = 100)

[Figure: histogram with log10(x) as the default x-axis label]

ggplot2 log transformation for data and scales

I have finally figured out the issues, removed my previous answer, and am providing my latest solution below. The only thing I haven't solved is the legend for the components, which doesn't appear for some reason; for an EDA meant to demonstrate the presence of a mixture distribution, I think it is good enough. The complete reproducible solution follows. Thanks to everybody on SO who helped with this directly or indirectly.

library(ggplot2)
library(scales)
library(RColorBrewer)
library(mixtools)

NUM_COMPONENTS <- 2

set.seed(12345) # for reproducibility

data(diamonds, package='ggplot2')  # use built-in data
myData <- diamonds$price

calc.components <- function(x, mix, comp.number) {

  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

overlayHistDensity <- function(data, calc.comp.fun) {

  # extract 'k' components from mixed distribution 'data'
  mix.info <- normalmixEM(data, k = NUM_COMPONENTS,
                          maxit = 100, epsilon = 0.01)
  summary(mix.info)

  numComponents <- length(mix.info$sigma)
  message("Extracted number of component distributions: ",
          numComponents)

  DISTRIB_COLORS <- 
    suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))

  # create (plot) histogram and ...
  g <- ggplot(as.data.frame(data), aes(x = data)) +
    geom_histogram(aes(y = after_stat(density)),  # replaces deprecated ..density..
                   binwidth = 0.01, alpha = 0.5) +
    theme(legend.position = 'top', legend.direction = 'horizontal')

  comp.labels <- lapply(seq(numComponents),
                        function (i) paste("Component", i))

  # ... fitted densities of components
  distComps <- lapply(seq(numComponents), function (i)
    stat_function(fun = calc.comp.fun,
                  args = list(mix = mix.info, comp.number = i),
                  linewidth = 2, color = DISTRIB_COLORS[i]))  # 'size' is deprecated for lines

  legend <- list(scale_colour_manual(name = "Legend:",
                                     values = DISTRIB_COLORS,
                                     labels = unlist(comp.labels)))

  return (g + distComps + legend)
}

overlayPlot <- overlayHistDensity(log10(myData), 'calc.components')
print(overlayPlot)

[Figure: histogram of log10(price) with the fitted component densities overlaid]
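On the missing legend, one possible explanation is that the colours are set as constants rather than mapped inside aes(), so scale_colour_manual() has nothing to build a legend from. The sketch below illustrates that idea with hypothetical mixture parameters standing in for the normalmixEM() output; it is an assumption about the cause, not a verified fix for the code above.

```r
library(ggplot2)

# Hypothetical mixture parameters in place of a fitted normalmixEM() object
mix <- list(lambda = c(0.4, 0.6), mu = c(0, 3), sigma = c(1, 0.5))

calc.components <- function(x, mix, comp.number) {
  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

# Each weighted component should integrate to its mixing proportion
area1 <- integrate(calc.components, -10, 10, mix = mix, comp.number = 1)$value
area2 <- integrate(calc.components, -10, 10, mix = mix, comp.number = 2)$value

# Mapping colour inside aes() gives the manual scale something to show
p <- ggplot(data.frame(x = c(-4, 6)), aes(x)) +
  stat_function(aes(colour = "Component 1"), fun = calc.components,
                args = list(mix = mix, comp.number = 1)) +
  stat_function(aes(colour = "Component 2"), fun = calc.components,
                args = list(mix = mix, comp.number = 2)) +
  scale_colour_manual(name = "Legend:",
                      values = c("Component 1" = "red", "Component 2" = "blue"))
```

The integration check is only there to confirm that the component function scales each density by its weight.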

Plotting of the mean in boxplot before axis log transformation in R

stat_summary computes on the transformed values, so the mean it plots is the mean of log10(value), not of value.
Below I define a function my_mean that back-transforms the values, averages them, and re-applies log10, so the point lands at the arithmetic mean.

library(ggplot2)
library(dplyr)
library(tibble)
library(scales)

# as_tibble() replaces the deprecated as.tibble(); the column is named 'value'
df <- c(2e-05, 0.38, 0.63, 0.98, 0.04, 0.1, 0.16, 
0.83, 0.17, 0.09, 0.48, 4.36, 0.83, 0.2, 0.32, 0.44, 
0.22, 0.23, 0.89, 0.23, 1.1, 0.62, 5, 340, 47) %>% as_tibble()

# Define the mean function   
my_mean <- function(x) {
   log10(mean(10^x))
}

df %>%
  ggplot(aes(x = 0, y = value)) +
  geom_boxplot(width = .12, outlier.color = NA) +
  stat_summary(fun=my_mean, geom="point", shape=21, size=3, color="black", fill="grey") +
  labs(
    x = "",
    y = "Particle counts (P/kg)"
  ) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), 
                labels = trans_format("log10", math_format(10^.x)))

[Figure: boxplot on a log10 y-axis with the mean marked as a point]
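The difference between the two means is easy to see on a small made-up vector (the values below are hypothetical, chosen so the arithmetic is simple). Averaging the logs and back-transforming gives the geometric mean, while my_mean recovers the arithmetic mean.

```r
value <- c(0.1, 1, 10, 1000)  # hypothetical counts
y <- log10(value)             # what the summary function receives under scale_y_log10()

mean_of_logs <- mean(y)            # 0.75
geometric    <- 10^mean_of_logs    # geometric mean, roughly 5.6

# Same definition as in the answer: undo the log, average, re-apply the log
my_mean <- function(x) log10(mean(10^x))
arithmetic <- 10^my_mean(y)        # equals mean(value)
```

On skewed data like this the two can differ by orders of magnitude, which is why the plotted point moves so much.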

R ggplot2: custom y-axis tick labels for log-transformed data?

You can just use math_format() with its default arguments:

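library(ggplot2)
library(scales)  # provides math_format()

# test_df is the data frame from the question
test_plt <- ggplot(data = test_df,aes(x = xval,y = yval)) +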
    geom_point() + 
    scale_y_continuous(breaks = seq(1,3,by = 1),
                       limits = c(1,3),
                       labels = math_format())

print(test_plt)

[Figure: scatter plot with y-axis ticks labeled as powers of ten]

From help("math_format"):

Usage
... [Some content omitted]...

math_format(expr = 10^.x, format = force)

This is exactly the formatting you want: 10^.x applied to values that are already on the log10 scale, rather than 10^.x applied after a log10 transformation, which is what you get when you call it within trans_format("log10", math_format(10^.x)).
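A short sketch of the two cases, calling the label functions directly instead of through a scale. The input vectors are made up for illustration: one already holds exponents, the other holds values in original units.

```r
library(scales)

# Case 1: values are already log10 exponents, so format them as-is
already_logged <- c(1, 2, 3)
labs_direct <- math_format()(already_logged)

# Case 2: values are in original units, so take log10 first, then format
original_units <- c(10, 100, 1000)
labs_via_log <- trans_format("log10", math_format(10^.x))(original_units)

# Both results are plotmath expressions of the form 10^1, 10^2, 10^3
direct_values  <- sapply(labs_direct, eval)
via_log_values <- sapply(labs_via_log, eval)
```

Both routes should end in the same labels; which one applies depends on whether the data or the scale carries the logarithm.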
