ggplot2: geom_smooth confidence band does not extend to edge of graph, even with fullrange=TRUE

You probably need to add coord_cartesian in addition to scale_x/y_continuous. Limits set in scale_x/y_continuous remove points that fall outside those limits, whereas coord_cartesian only zooms the view and keeps all of the data, even the parts that are not visible in the graph. In your plot, the confidence band for the red points ends where the top of the band exceeds the y-range of the graph.

There's no actual "data" in the extended range of your graph, but geom_smooth treats the points it generates for plotting the confidence bands as "data" for the purposes of deciding what to plot.

Take a look at the examples below. The first plot uses only scale_x/y_continuous. The second adds coord_cartesian, but note that the confidence bands are still cut off, because the scale limits are unchanged. In the third plot, we still use coord_cartesian, but we expand the scale_y_continuous range downward so that points in the confidence band below zero are included in the y-range. coord_cartesian then determines the range that is actually displayed, and because it only zooms, it does not exclude anything itself.

I actually find this behavior confusing. I would have thought that you could just use coord_cartesian alone with the desired x and y ranges and still have the confidence bands and regression lines plotted all the way to the edges of the graph. In any case, hopefully this will get you what you're looking for.

library(ggplot2)

# Plot 1: limits on the scales only
p1 = ggplot(mtcars, aes(wt, mpg, colour=factor(am))) +
  geom_smooth(fullrange=TRUE, method="lm") +
  scale_x_continuous(expand=c(0,0), limits=c(0,10)) +
  scale_y_continuous(expand=c(0,0), limits=c(0,100)) +
  ggtitle("scale_x/y_continuous")

# Plot 2: same scale limits, plus coord_cartesian
p2 = ggplot(mtcars, aes(wt, mpg, colour=factor(am))) +
  geom_smooth(fullrange=TRUE, method="lm") +
  scale_x_continuous(expand=c(0,0), limits=c(0,10)) +
  scale_y_continuous(expand=c(0,0), limits=c(0,100)) +
  coord_cartesian(xlim=c(0,10), ylim=c(0,100)) +
  ggtitle("Add coord_cartesian; same y-range")

# Plot 3: y scale limits widened to -50, view still zoomed to 0..100
p3 = ggplot(mtcars, aes(wt, mpg, colour=factor(am))) +
  geom_smooth(fullrange=TRUE, method="lm") +
  scale_x_continuous(expand=c(0,0), limits=c(0,10)) +
  scale_y_continuous(expand=c(0,0), limits=c(-50,100)) +
  coord_cartesian(xlim=c(0,10), ylim=c(0,100)) +
  ggtitle("Add coord_cartesian; expanded y-range")

gridExtra::grid.arrange(p1, p2, p3)
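
A variation that may sidestep the problem, sketched here under the assumption that only the view (not the fit) should be restricted in y: keep the x limits on the scale so that fullrange=TRUE has a range to extend over, but leave the y scale without limits and zoom with coord_cartesian(ylim = ...). The names p4 and band are made up for this illustration, and formula = y ~ x is spelled out only to avoid the message that newer ggplot2 versions print.

```r
library(ggplot2)

p4 <- ggplot(mtcars, aes(wt, mpg, colour = factor(am))) +
  geom_smooth(fullrange = TRUE, method = "lm", formula = y ~ x) +
  # x limits stay on the scale: fullrange = TRUE extends the fit to them
  scale_x_continuous(expand = c(0, 0), limits = c(0, 10)) +
  # no limits on the y scale, so no part of the band is set to NA
  scale_y_continuous(expand = c(0, 0)) +
  # zoom the view only
  coord_cartesian(ylim = c(0, 100)) +
  ggtitle("x limits on the scale, y limits on the coord")

# Inspect what the smooth layer computed
band <- layer_data(p4)
range(band$x)     # should span the x limits
anyNA(band$ymin)  # FALSE if nothing was censored
```

layer_data() returns the data frame that the layer will draw, which makes it possible to check for censored values without looking at the picture.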


ggplot: geom_smooth() lines don't extend to left edge at x=0 with log10 transformation

Try setting the expand argument in both the x and y scales:

scale_x_continuous(trans='log10', expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0))

geom_smooth is not spanning the whole range of data

Your problem is with geom_jitter. Looking at the mpg dataset, it appears there are only two years, 1999 and 2008. geom_jitter makes the range appear much wider than it is, but geom_smooth only draws a line within the range of the data. For example, using

ggplot(mpg, aes(year, cty)) + geom_point() + geom_smooth(method = "lm", se = TRUE, span=3, fullrange=TRUE)

gives us a plot in which the points line up at just those two years.

geom_jitter jitters not just the y values (cty) but also the x values (year), which makes the year range of the data appear wider than it actually is. Since geom_smooth only fits within the range of the data, it does not span the whole plot as you want.
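
To check this without relying on the picture, a minimal sketch along these lines compares the x values each layer ends up with. The names p, jittered_x and smooth_x are made up for this example, and the seed is only there so the jitter is reproducible.

```r
library(ggplot2)

set.seed(1)  # jitter is random; the seed is only here for reproducibility
p <- ggplot(mpg, aes(year, cty)) +
  geom_jitter() +
  geom_smooth(method = "lm", formula = y ~ x)

range(mpg$year)                   # 1999 2008
jittered_x <- layer_data(p, 1)$x  # x positions after jittering
smooth_x <- layer_data(p, 2)$x    # x positions of the fitted line
range(jittered_x)                 # wider than the raw years
range(smooth_x)                   # stays within the raw years
```

If the wider look is not wanted, geom_jitter(width = 0) should restrict the jitter to the y direction.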

geom_smooth moves based on scale_y_continuous limits

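Limits set on a scale (for example scale_y_continuous(limits = ...), xlim() or ylim()) first set the values outside of the limits to missing and then calculate the geom.

Limits set on a coord (for example coord_cartesian(ylim = ...)) first calculate the geoms and then plot only the part that falls within the limits.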

See http://rpubs.com/INBOstats/zoom_in for some reproducible examples.
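
As a rough illustration of the difference (not taken from the linked page), the sketch below counts how many mpg values survive each kind of limit. The limits c(12, 25) and all object names are arbitrary choices for this example.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Limits on the scale: mpg values outside [12, 25] are set to NA
p_scale <- base + scale_y_continuous(limits = c(12, 25))
# Limits on the coord: every row is kept, only the view is cropped
p_coord <- base + coord_cartesian(ylim = c(12, 25))

kept_scale <- sum(!is.na(layer_data(p_scale)$y))
kept_coord <- sum(!is.na(layer_data(p_coord)$y))
c(scale = kept_scale, coord = kept_coord)
```

Any stat placed on top of p_scale (a smoother, for instance) would be fitted to the reduced data, which is why the fitted line moves when the scale limits change.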

fullrange = TRUE ignored in stat_smooth

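You have to add + xlim(0, 200). With fullrange = TRUE the fit is drawn across the full range of the x scale, and by default that scale only covers the data, so the limits have to be widened explicitly.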
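
A small sketch of the effect, using mtcars rather than the asker's data (the limits 0 and 400 and the object names are assumptions chosen to cover the hp column):

```r
library(ggplot2)

base <- ggplot(mtcars, aes(hp, mpg)) +
  geom_smooth(method = "lm", formula = y ~ x, fullrange = TRUE)

# x values of the fitted line without and with explicit limits
x_default <- layer_data(base)$x
x_widened <- layer_data(base + xlim(0, 400))$x

range(x_default)  # limited by the data in hp
range(x_widened)  # reaches the xlim() values
```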

extend geom_smooth in a single direction

Internally, stat_smooth calls predictdf to create the smoothed line. The difficulty is that predictdf is an S3 generic that is not exported, and it does not take ... parameters, so it is hard to extend. Note that the approach below relies on ggplot2 internals (it calls ggplot2:::predictdf.default), so it may break between ggplot2 versions.

The idea is to create two new dummy classes, lm_right and lm_left, whose constructors call the default lm method.

## decorate lm object with a new class lm_right
lm_right <- function(formula, data, ...) {
  mod <- lm(formula, data)
  class(mod) <- c('lm_right', class(mod))
  mod
}

## decorate lm object with a new class lm_left
lm_left <- function(formula, data, ...) {
  mod <- lm(formula, data)
  class(mod) <- c('lm_left', class(mod))
  mod
}

Then for each class we create a predictdf method that truncates the x values on the opposite side.

predictdf.lm_right <- function(model, xseq, se, level) {
  ## main step: drop x values left of the data, so the line only extends to the right
  init_range <- range(model$model$x)
  xseq <- xseq[xseq >= init_range[1]]
  ggplot2:::predictdf.default(model, xseq[-length(xseq)], se, level)
}

The same for the left extension:

predictdf.lm_left <- function(model, xseq, se, level) {
  init_range <- range(model$model$x)
  ## main step: drop x values right of the data, so the line only extends to the left
  xseq <- xseq[xseq <= init_range[2]]
  ggplot2:::predictdf.default(model, xseq[-length(xseq)], se, level)
}

Finally, a usage example:

library(ggplot2)
library(gridExtra)
## set fullrange = TRUE so that the fit is extended beyond the data
p1 <- ggplot(mtcars, aes(y=wt, x=mpg)) + xlim(0,50) + geom_point() +
  stat_smooth(method="lm_left", fullrange=TRUE, col='green')
p2 <- ggplot(mtcars, aes(y=wt, x=mpg)) + xlim(0,50) + geom_point() +
  stat_smooth(method="lm_right", fullrange=TRUE, col='red')

grid.arrange(p1, p2)


How to prevent line to extend across whole graph

You could use geom_segment instead of geom_abline if you want to manually define the line. If your slope is derived from the dataset you are plotting from, the easiest thing to do is use stat_smooth with method = "lm".

Here is an example with some toy data:

set.seed(16)
x = runif(100, 1, 9)
y = -8.3 + (1/1.415)*x + rnorm(100)

dat = data.frame(x, y)

Estimate intercept and slope:

coef(lm(y~x))

(Intercept)           x
 -8.3218990   0.7036189

First make the plot with geom_abline for comparison:

ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_abline(intercept = -8.32, slope = 0.704) +
  xlim(1, 9)

With geom_segment you have to define the start and end of the line for both x and y. Here the line is truncated to run between 1 and 9 on the x axis.

ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_segment(aes(x = 1, xend = 9, y = -8.32 + .704, yend = -8.32 + .704*9)) +
  xlim(1, 9)

With stat_smooth, the line is drawn only within the range of the explanatory variable by default.

ggplot(dat, aes(x, y)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, color = "black") +
  xlim(1, 9)

R : confidence interval being partially displayed with ggplot2 (using geom_smooth())

For the first three segments of the confidence interval, the top end of the range is at least partially out of bounds (the bounds being [-1, 1], not the slightly expanded range on the axes). By default, ggplot replaces values that fall outside the scale limits with NA, so those parts of the band are not drawn. You can fix this by adding oob=scales::rescale_none to scale_y_continuous:

library(scales)
graph <- ggplot(df.m, aes(group=1,disciplines,value,colour=variable,shape=variable)) +
  geom_point() +
  geom_smooth(stat="smooth", method=loess, level=0.95) +
  scale_x_discrete(name="Disciplines") +
  scale_y_continuous(limits=c(-1,1), name="Measurement", oob=rescale_none)
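
Since df.m is not available here, the following sketch reproduces the effect on mtcars instead. The limits and object names are assumptions picked so that the raw data lie inside the limits while the extended band does not. Recent versions of the scales package also offer oob_keep, which as far as I know is the same function under a newer name.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg, colour = factor(am))) +
  geom_smooth(fullrange = TRUE, method = "lm", formula = y ~ x) +
  scale_x_continuous(limits = c(0, 10))

# Default oob: band values outside the y limits become NA
band_censor <- layer_data(base + scale_y_continuous(limits = c(0, 100)))
# oob = rescale_none: out-of-bounds values are passed through unchanged
band_keep <- layer_data(
  base + scale_y_continuous(limits = c(0, 100), oob = scales::rescale_none)
)

# Number of usable lower-band values under each setting
n_censor <- sum(!is.na(band_censor$ymin))
n_keep <- sum(!is.na(band_keep$ymin))
c(censor = n_censor, keep = n_keep)
```

With the values kept, the band is computed in full and simply runs past the edge of the panel.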

