R/Ggplot2: Collapse or Remove Segment of Y-Axis from Scatter-Plot

You can do this by defining a coordinate transformation. A standard example is a logarithmic axis, which ggplot2 provides through scale_y_log10().

But you can also define custom transformation functions by supplying the trans argument to scale_y_continuous() (and similarly for scale_x_continuous()). To this end, you use the function trans_new() from the scales package. It takes as arguments the transformation function and its inverse.
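As a minimal sketch of how trans_new() is called, the following builds a signed cube-root transformation. The name "cube_root" and the choice of function are illustrative assumptions, not part of the original answer; the only requirement is that the two functions are inverses of each other.

```r
library(scales)

# Illustrative only: a signed cube-root axis ("cube_root" is a made-up name).
cube_root <- trans_new(
  name      = "cube_root",
  transform = function(x) sign(x) * abs(x)^(1 / 3),  # data -> axis position
  inverse   = function(x) x^3                        # axis position -> data
)

# The returned object carries both functions, so they can be tried directly
cube_root$transform(c(-8, 0, 27))
cube_root$inverse(c(-2, 0, 3))
```

Such an object would then be passed as scale_y_continuous(trans = cube_root).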

I first discuss a solution specific to the OP's example and then show how it can be generalised.

OP's example

The OP wants to shrink the interval between -2 and 2. The following defines a function (and its inverse) that shrinks this interval by a factor of 4:

library(scales)
trans <- function(x) {
  ifelse(x > 2, x - 1.5, ifelse(x < -2, x + 1.5, x / 4))
}
inv <- function(x) {
  ifelse(x > 0.5, x + 1.5, ifelse(x < -0.5, x - 1.5, x * 4))
}
my_trans <- trans_new("my_trans", trans, inv)

This defines the transformation. To see it in action, I define some sample data:

x_val <- 0:250
y_val <- c(-6:-2, 2:6)
set.seed(1234)
data <- data.frame(x = sample(x_val, 30, replace = TRUE),
                   y = sample(y_val, 30, replace = TRUE))

I first plot it without transformation:

library(ggplot2)
p <- ggplot(data, aes(x, y)) + geom_point()
p + scale_y_continuous(breaks = seq(-6, 6, by = 2))

[Image: scatter plot with the untransformed y-axis]

Now I use scale_y_continuous() with the transformation:

p + scale_y_continuous(trans = my_trans,
                       breaks = seq(-6, 6, by = 2))

[Image: the same scatter plot with the interval between -2 and 2 compressed]
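To see why the middle band shrinks, it can help to evaluate the forward mapping on a few y values. This sketch copies trans() from the definition above and uses only base R; the vector y is just a handful of illustrative values.

```r
# Forward mapping copied from the definition above
trans <- function(x) {
  ifelse(x > 2, x - 1.5, ifelse(x < -2, x + 1.5, x / 4))
}

y <- c(-6, -2, 0, 2, 6)
trans(y)
# -4.5 -0.5  0.0  0.5  4.5
```

The distance between -2 and 2 drops from 4 to 1 on the axis, while the distance between 2 and 6 stays at 4.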

If you want another transformation, you have to change the definitions of trans() and inv() and run trans_new() again. You have to make sure that inv() is indeed the inverse of trans(). I checked this as follows:

x <- runif(100, -100, 100)
identical(x, trans(inv(x)))
## [1] TRUE
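For other transformations, exact equality may fail because of floating-point rounding, so a tolerance-based comparison in both directions is a safer check. The following is a sketch using base R's all.equal(); the grid of test values is an arbitrary choice.

```r
# Definitions copied from the OP's example above
trans <- function(x) {
  ifelse(x > 2, x - 1.5, ifelse(x < -2, x + 1.5, x / 4))
}
inv <- function(x) {
  ifelse(x > 0.5, x + 1.5, ifelse(x < -0.5, x - 1.5, x * 4))
}

# Arbitrary test grid that includes the boundaries -2 and 2
x <- seq(-10, 10, by = 0.25)

# Round trip in both directions, compared up to numerical tolerance
ok_forward  <- isTRUE(all.equal(x, inv(trans(x))))
ok_backward <- isTRUE(all.equal(x, trans(inv(x))))
c(ok_forward, ok_backward)
```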

General solution

The function below defines a transformation where you can choose the lower and upper end of the region to be squished, as well as the factor to be used. It directly returns the trans object that can be used inside scale_y_continuous:

library(scales)
squish_trans <- function(from, to, factor) {

  trans <- function(x) {

    if (any(is.na(x))) return(x)

    # get indices for the relevant regions
    isq <- x > from & x < to
    ito <- x >= to

    # apply transformation
    x[isq] <- from + (x[isq] - from) / factor
    x[ito] <- from + (to - from) / factor + (x[ito] - to)

    return(x)
  }

  inv <- function(x) {

    if (any(is.na(x))) return(x)

    # get indices for the relevant regions
    isq <- x > from & x < from + (to - from) / factor
    ito <- x >= from + (to - from) / factor

    # apply transformation
    x[isq] <- from + (x[isq] - from) * factor
    x[ito] <- to + (x[ito] - (from + (to - from) / factor))

    return(x)
  }

  # return the transformation
  return(trans_new("squished", trans, inv))
}

The first line in trans() and inv() handles the case where the transformation is called with x = c(NA, NA). (This apparently did not happen with the version of ggplot2 that was current when I originally wrote this answer. Unfortunately, I don't know in which version it started.)
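Note that the early return leaves the whole vector untransformed as soon as a single element is NA. One possible alternative, which is not from the original answer, is to select the regions with which(), so that NA entries are skipped element-wise. The helper name squish_forward() is hypothetical, and only the forward mapping is sketched here; the inverse could be handled analogously.

```r
# Hypothetical helper (not part of scales or ggplot2): builds only the
# forward mapping and treats NA entries element-wise.
squish_forward <- function(from, to, factor) {
  function(x) {
    # which() drops positions where the comparison is NA,
    # so NA entries are left untouched
    isq <- which(x > from & x < to)
    ito <- which(x >= to)

    x[isq] <- from + (x[isq] - from) / factor
    x[ito] <- from + (to - from) / factor + (x[ito] - to)
    x
  }
}

f <- squish_forward(-2, 2, 4)
f(c(-6, NA, 0, 6))
# -6.0   NA -1.5  3.0
```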

This function can now be used to conveniently redo the plot from the first section:

p + scale_y_continuous(trans = squish_trans(-2, 2, 4),
                       breaks = seq(-6, 6, by = 2))

The following example shows that you can squish the scale at an arbitrary position and that this also works for geoms other than points:

df <- data.frame(class = LETTERS[1:4],
                 val = c(1, 2, 101, 102))
ggplot(df, aes(x = class, y = val)) + geom_bar(stat = "identity") +
  scale_y_continuous(trans = squish_trans(3, 100, 50),
                     breaks = c(0, 1, 2, 3, 50, 100, 101, 102))

[Image: bar chart with the y-axis squished between 3 and 100]

Let me close by stressing what others have already mentioned in the comments: this kind of plot can be misleading and should be used with care!

ggplot with 2 y axes on each side and different scales

Sometimes a client wants two y scales. Giving them the "flawed" speech is often pointless. But I do like the ggplot2 insistence on doing things the right way. I am sure that ggplot is in fact educating the average user about proper visualization techniques.

Maybe you can use faceting with free scales (scales = "free") to compare the two data series? For an example, look here: https://github.com/hadley/ggplot2/wiki/Align-two-plots-on-a-page
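A minimal sketch of the faceting idea, assuming two series on very different scales that share an x variable. The data frame and its column names are made up for illustration.

```r
library(ggplot2)

# Made-up data: two series whose values differ by three orders of magnitude
df <- data.frame(
  x      = rep(1:10, times = 2),
  value  = c(1:10, (1:10) * 1000),
  series = rep(c("small", "large"), each = 10)
)

# One panel per series; scales = "free_y" gives each panel its own y-axis
p <- ggplot(df, aes(x, value)) +
  geom_line() +
  facet_wrap(~ series, ncol = 1, scales = "free_y")
p
```

Each panel keeps an honest, linear axis, and stacking the panels keeps the shared x-axis aligned.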

Using ggplot2, can I insert a break in the axis?

As noted elsewhere, this isn't something that ggplot2 will handle well, since broken axes are generally considered questionable.

Other strategies are often considered better solutions to this problem. Brian mentioned a few (faceting, two plots focusing on different sets of values). One other option that people too often overlook, particularly for bar charts, is to make a table:

[Image: the data shown as a table]

Looking at the actual values, the 500 doesn't obscure the differences in the other values! For some reason tables don't get enough respect as a data visualization technique. You might object that your data has many, many categories, which would make a table unwieldy. If so, it's likely that your bar chart will have too many bars to be sensible as well.

And I'm not arguing for tables all the time. But they are definitely something to consider if you are making bar charts with relatively few bars. And if you're making bar charts with tons of bars, you might need to rethink that anyway.

Finally, there is also the axis.break function in the plotrix package which implements broken axes. However, from what I gather you'll have to specify the axis labels and positions yourself, by hand.

Scatter plot in ggplot, one numeric variable across two groups

You've over-tidied. Tidying data isn't just the mechanism of making it as long as possible; it's making it as wide as necessary.

For example, if you had location as X and Y for animal sightings, you wouldn't have two rows, one with "X" in a "label" column and the X coordinate in a "value" column, and another with "Y" in the "label" column and the Y coordinate in the "value" column - unless you really were storing the data in a key-value store, but that's another story...

Widen your data and put the test scores for male and female into test_score_male and test_score_female; these then become the x and y aesthetics for your scatter plot.
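A sketch of the widening step using base R's reshape(). The long data frame, the pairing key pair_id, and the score values are assumptions made for illustration; the actual key depends on what links one male score to one female score in your data.

```r
# Made-up long data: one row per (pair, sex) combination
long <- data.frame(
  pair_id    = rep(1:3, each = 2),
  sex        = rep(c("male", "female"), times = 3),
  test_score = c(70, 75, 82, 80, 91, 88)
)

# One row per pair, one score column per sex
wide <- reshape(long,
                idvar     = "pair_id",
                timevar   = "sex",
                direction = "wide",
                sep       = "_")
names(wide)
# "pair_id" "test_score_male" "test_score_female"
```

The scatter plot is then ggplot(wide, aes(test_score_male, test_score_female)) + geom_point().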

Linear Regression in ggplot2

You can use geom_smooth for the regression line and geom_text for the labels.

ggplot(df_TB_d, aes(x = Bevoelkerung, y = Studierende)) +
  geom_text(aes(label = Land)) +
  geom_smooth(method = "lm", se = FALSE)

Result:

[Image: text-labelled scatter plot with the fitted regression line]

Plotting data and manipulating y-axis

The scale you are drawing is not linear (the difference between 0 and 1 is not equal to the difference between 1 and 10, but the lines are equally far apart).

Therefore, you need to transform your data. In your case, you are looking for a log10 transform, since the distance between 0.1 and 1 on a log-scale equals the distance between 1 and 10 (note that 0 is not valid on a log-scale):

ggplot(data, aes(x, y)) + geom_point() + scale_y_log10()

Note that scale_y_log10 is the same as scale_y_continuous(trans = "log10").
This transforms your points to log-scale, while preserving the y-axis labels to be on the original scale.

Compare with

ggplot(data, aes(x, log10(y))) + geom_point()

which transforms your points on a log-scale and also transforms the y-axis labels.
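The equal-spacing claim can be checked numerically with base R. In this small sketch the three values are just the tick positions mentioned above.

```r
ticks <- c(0.1, 1, 10)

# Gaps on the original scale are unequal
gaps_linear <- diff(ticks)

# Gaps after a log10 transform are equal
gaps_log10 <- diff(log10(ticks))

gaps_linear
gaps_log10

# Zero has no finite position on a log scale
log10(0)
# -Inf
```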
