R/ggplot2: Collapse or remove segment of y-axis from scatter-plot
You could do this by defining a coordinate transformation. A standard example is logarithmic coordinates, which you get in ggplot by using scale_y_log10().
But you can also define custom transformation functions by supplying the trans argument to scale_y_continuous() (and similarly for scale_x_continuous()). To this end, you use the function trans_new() from the scales package. It takes as arguments a name for the transformation, the transformation function, and its inverse.
I first discuss a special solution for the OP's example and then show how it can be generalised.
OP's example
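The OP wants to shrink the interval between -2 and 2. The following defines a function (and its inverse) that shrinks this interval by a factor of 4:
library(ggplot2)  # needed for the plots further down
library(scales)
# forward transformation: compress [-2, 2] by 4, shift everything else inwards
trans <- function(x) {
  ifelse(x > 2, x - 1.5, ifelse(x < -2, x + 1.5, x/4))
}
# inverse transformation: undo the shift and the compression
inv <- function(x) {
  ifelse(x > 0.5, x + 1.5, ifelse(x < -0.5, x - 1.5, x*4))
}
my_trans <- trans_new("my_trans", trans, inv)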
This defines the transformation. To see it in action, I define some sample data:
x_val <- 0:250
y_val <- c(-6:-2, 2:6)
set.seed(1234)
data <- data.frame(x = sample(x_val, 30, replace = TRUE),
                   y = sample(y_val, 30, replace = TRUE))
I first plot it without transformation:
p <- ggplot(data, aes(x, y)) + geom_point()
p + scale_y_continuous(breaks = seq(-6, 6, by = 2))
Now I use scale_y_continuous() with the transformation:
p + scale_y_continuous(trans = my_trans,
                       breaks = seq(-6, 6, by = 2))
If you want another transformation, you have to change the definitions of trans() and inv() and run trans_new() again. You have to make sure that inv() is indeed the inverse of trans(). I checked this as follows:
x <- runif(100, -100, 100)
identical(x, trans(inv(x)))
## [1] TRUE
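A slightly more thorough check, as a sketch: it compares the round trip in both directions up to floating-point tolerance (all.equal() instead of identical()) and also confirms that the transformation is strictly increasing. The grid and the names round_trip_ok and monotone_ok are my own choices for illustration.

```r
# Same piecewise functions as defined for the OP's example
trans <- function(x) ifelse(x > 2, x - 1.5, ifelse(x < -2, x + 1.5, x / 4))
inv   <- function(x) ifelse(x > 0.5, x + 1.5, ifelse(x < -0.5, x - 1.5, x * 4))

# Test grid that includes the breakpoints -2 and 2
grid <- seq(-10, 10, by = 0.25)

# Round trip in both directions, compared up to floating-point tolerance
round_trip_ok <- isTRUE(all.equal(grid, inv(trans(grid)))) &&
  isTRUE(all.equal(grid, trans(inv(grid))))

# A scale transformation should be strictly increasing
monotone_ok <- all(diff(trans(grid)) > 0)
```

If either flag is FALSE after you change trans() or inv(), the two functions no longer match.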
General solution
The function below defines a transformation where you can choose the lower and upper end of the region to be squished, as well as the factor to be used. It directly returns the trans object that can be used inside scale_y_continuous():
library(scales)
squish_trans <- function(from, to, factor) {
  trans <- function(x) {
    if (any(is.na(x))) return(x)
    # get indices for the relevant regions
    isq <- x > from & x < to
    ito <- x >= to
    # apply transformation
    x[isq] <- from + (x[isq] - from)/factor
    x[ito] <- from + (to - from)/factor + (x[ito] - to)
    return(x)
  }
  inv <- function(x) {
    if (any(is.na(x))) return(x)
    # get indices for the relevant regions
    isq <- x > from & x < from + (to - from)/factor
    ito <- x >= from + (to - from)/factor
    # apply transformation
    x[isq] <- from + (x[isq] - from) * factor
    x[ito] <- to + (x[ito] - (from + (to - from)/factor))
    return(x)
  }
  # return the transformation
  return(trans_new("squished", trans, inv))
}
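The first line in trans() and inv() handles the case when the transformation is called with x = c(NA, NA). (It seems that this did not happen with the version of ggplot2 I used when I originally wrote this answer. Unfortunately, I don't know in which version this started.) Note that this check returns the whole vector unchanged as soon as any element is NA, so it is only a guard against all-NA input, not a general treatment of missing values.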
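If your data can contain missing values mixed with real ones, one possible variant is to exclude NA elements from the index vectors instead of returning early. This is only a sketch: the name squish_trans_na and the helper variable width are hypothetical and not part of the original answer.

```r
library(scales)

# Hypothetical variant of squish_trans(): NA elements are skipped
# individually, all other elements are still transformed
squish_trans_na <- function(from, to, factor) {
  width <- (to - from) / factor  # width of the squished region after transformation
  trans <- function(x) {
    isq <- !is.na(x) & x > from & x < to
    ito <- !is.na(x) & x >= to
    x[isq] <- from + (x[isq] - from) / factor
    x[ito] <- from + width + (x[ito] - to)
    x
  }
  inv <- function(x) {
    isq <- !is.na(x) & x > from & x < from + width
    ito <- !is.na(x) & x >= from + width
    x[isq] <- from + (x[isq] - from) * factor
    x[ito] <- to + (x[ito] - (from + width))
    x
  }
  trans_new("squished_na", trans, inv)
}

tr <- squish_trans_na(-2, 2, 4)
squished <- tr$transform(c(-6, 0, NA, 2, 6))  # NA stays NA, the rest is squished
restored <- tr$inverse(squished)              # back on the original scale
```

The design choice here is that `!is.na(x) & ...` yields FALSE for missing elements, so they are never used as an index in the assignments.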
This function can now be used to conveniently redo the plot from the first section:
p + scale_y_continuous(trans = squish_trans(-2, 2, 4),
                       breaks = seq(-6, 6, by = 2))
The following example shows that you can squish the scale at an arbitrary position and that this also works for geoms other than points:
df <- data.frame(class = LETTERS[1:4],
                 val = c(1, 2, 101, 102))
ggplot(df, aes(x = class, y = val)) + geom_bar(stat = "identity") +
  scale_y_continuous(trans = squish_trans(3, 100, 50),
                     breaks = c(0, 1, 2, 3, 50, 100, 101, 102))
Let me close by stressing what others have already mentioned in the comments: this kind of plot can be misleading and should be used with care!
ggplot with 2 y axes on each side and different scales
Sometimes a client wants two y scales. Giving them the "flawed" speech is often pointless. But I do like the ggplot2 insistence on doing things the right way. I am sure that ggplot is in fact educating the average user about proper visualization techniques.
Maybe you can use faceting with free scales (scales = "free") to compare the two data series instead? See, for example, https://github.com/hadley/ggplot2/wiki/Align-two-plots-on-a-page
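As a rough sketch of the faceting idea, the following puts two series with very different magnitudes into stacked panels, each with its own y-axis. The data frame series_df and its columns are made up for illustration.

```r
library(ggplot2)

# Hypothetical data: two series on very different scales
set.seed(1)
series_df <- data.frame(
  t = rep(1:20, times = 2),
  series = rep(c("temperature", "sales"), each = 20),
  value = c(rnorm(20, mean = 15, sd = 3), rnorm(20, mean = 5000, sd = 800))
)

# One panel per series; scales = "free_y" gives each panel its own y-axis
p_facets <- ggplot(series_df, aes(t, value)) +
  geom_line() +
  facet_wrap(~ series, ncol = 1, scales = "free_y")
```

Printing p_facets draws the two panels on top of each other with a shared x-axis, which keeps each series readable without a second y-axis.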
Using ggplot2, can I insert a break in the axis?
As noted elsewhere, this isn't something that ggplot2 will handle well, since broken axes are generally considered questionable.
Other strategies are often considered better solutions to this problem. Brian mentioned a few (faceting, two plots focusing on different sets of values). One other option that people too often overlook, particularly for barcharts, is to make a table:
Looking at the actual values, the 500 doesn't obscure the differences in the other values! For some reason tables don't get enough respect as a data visualization technique. You might object that your data has many, many categories, which becomes unwieldy in a table. If so, it's likely that your bar chart will have too many bars to be sensible as well.
And I'm not arguing for tables all the time. But they are definitely something to consider if you are making barcharts with relatively few bars. And if you're making barcharts with tons of bars, you might need to rethink that anyway.
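A minimal sketch of the table approach in base R, assuming a small data set in which one value dwarfs the rest. The data frame sales and its values are invented for illustration and are not the original poster's data.

```r
# Hypothetical values: one category is far larger than the others
sales <- data.frame(
  category = c("A", "B", "C", "D"),
  value = c(12, 500, 9, 15)
)

# Sort by value (largest first) and print as a plain table
sales_table <- sales[order(-sales$value), ]
print(sales_table, row.names = FALSE)
```

In the printed table the small values remain easy to compare, whereas in a bar chart they would all be flattened by the largest bar.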
Finally, there is also the axis.break function in the plotrix package, which implements broken axes. However, from what I gather, you'll have to specify the axis labels and positions yourself, by hand.
Scatter plot in ggplot, one numeric variable across two groups
You've over-tidied. Tidying data isn't just the mechanism of making it as long as possible, it's making it as wide as necessary.
For example, if you had location as X and Y for animal sightings, you wouldn't have two rows, one with a "label" column containing "X" and the X coordinate in a "value" column, and another with "Y" in the "label" column and the Y coordinate in the "value" column (unless you really were storing the data in a key-value store, but that's another story).
Widen your data and put the test scores for male and female into test_score_male and test_score_female; then they are the x and y aesthetics for your scatter plot.
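A sketch of the widening step using tidyr::pivot_wider(). The long data frame and its columns school, sex and test_score are assumptions for illustration; your data needs some identifier (here school) that pairs each male score with a female score.

```r
library(tidyr)
library(ggplot2)

# Hypothetical over-tidied data: one row per school and sex
long <- data.frame(
  school = rep(c("s1", "s2", "s3"), each = 2),
  sex = rep(c("male", "female"), times = 3),
  test_score = c(71, 78, 64, 69, 88, 85)
)

# One row per school, with separate columns for male and female scores
wide <- pivot_wider(long,
                    names_from = sex,
                    values_from = test_score,
                    names_prefix = "test_score_")

# The two new columns become the x and y aesthetics
p_scores <- ggplot(wide, aes(x = test_score_male, y = test_score_female)) +
  geom_point()
```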
Linear Regression in ggplot2
You can use geom_smooth for the regression line and geom_text for the labels.
ggplot(df_TB_d, aes(x = Bevoelkerung, y = Studierende)) +
  geom_text(aes(label = Land)) +
  geom_smooth(method = "lm", se = FALSE)
Plotting data and manipulating y-axis
The scale you are drawing is not linear (the difference between 0 and 1 is not equal to the difference between 1 and 10, but the lines are equally far apart).
Therefore, you need to transform your data. In your case, you are looking for a log10 transform, since the distance between 0.1 and 1 on a log-scale equals the distance between 1 and 10 (note that 0 is not valid on a log-scale):
ggplot(data, aes(x, y)) + geom_point() + scale_y_log10()
Note that scale_y_log10() is the same as scale_y_continuous(trans = "log10").
This transforms your points to a log-scale, while keeping the y-axis labels on the original scale.
Compare with
ggplot(data, aes(x, log10(y))) + geom_point()
which transforms your points to a log-scale and also transforms the y-axis labels.
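To see the difference between the two approaches concretely, here is a small sketch with made-up data (the data frame d is hypothetical). It uses layer_data() to extract the y positions that ggplot2 actually draws in each plot.

```r
library(ggplot2)

# Hypothetical data spanning several orders of magnitude (no zeros)
d <- data.frame(x = 1:5, y = c(0.1, 1, 10, 100, 1000))

# Log-transformed scale: points are moved, labels stay on the original scale
p_scale <- ggplot(d, aes(x, y)) + geom_point() + scale_y_log10()

# Log-transformed data: points are moved, labels show log10(y)
p_manual <- ggplot(d, aes(x, log10(y))) + geom_point()

# Both plots place the points at the same positions; only the labels differ
same_positions <- isTRUE(all.equal(layer_data(p_scale)$y,
                                   layer_data(p_manual)$y))
```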