Messy Plot When Plotting Predictions of a Polynomial Regression Using Lm() in R

You need order():

P <- predict(quadratic.model)
plot(y~x)
reorder <- order(x)
lines(x[reorder], P[reorder])
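
To see why indexing both vectors by order(x) works, here is a minimal sketch with made-up numbers (the values of x and P below are purely illustrative, not from the question). order() returns the positions that would sort x, and applying those same positions to P keeps every prediction paired with its own x value:

```r
x <- c(3, 1, 2)      # unsorted x (illustrative values)
P <- c(30, 10, 20)   # hypothetical predictions paired with x

o <- order(x)        # positions that sort x: 2 3 1
x[o]                 # 1 2 3
P[o]                 # 10 20 30, still matched to the sorted x
```

Sorting x alone with sort(x) and leaving P untouched would break that pairing.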

My answer here is related: Problems displaying LOESS regression line and confidence interval

Plot polynomial regression curve in R

Try:

lines(sort(hp), fitted(fit)[order(hp)], col='red', type='b') 

The rows in your dataset are not sorted by hp, so lines() joins the points in row order and the result is a mess.
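
An alternative that sidesteps sorting is to predict on an evenly spaced grid of x values. The sketch below assumes the built-in mtcars data and a quadratic fit of mpg on hp as a stand-in; the original question's data and model are not shown here, so treat those names as placeholders:

```r
# Stand-in model: quadratic in hp, fitted on the built-in mtcars data.
fit <- lm(mpg ~ poly(hp, 2), data = mtcars)

# A grid of hp values that is sorted by construction.
grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 200))
grid$mpg.hat <- predict(fit, newdata = grid)

plot(mpg ~ hp, data = mtcars)
lines(grid$hp, grid$mpg.hat, col = "red")
```

Because the grid is dense, the curve also looks smooth where the observed hp values are sparse.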

R: generate plot for multiple regression model with interaction between polynomial numeric predictor and factor

The predict function handles all the messy calculations with the orthogonal polynomials:

x.two <- df$x2
lines(x = sort(x.two),
      y = predict(mod, newdata = data.frame(x1 = factor("1"), x2 = sort(x.two))),
      col = "red")
lines(x = sort(x.two),
      y = predict(mod, newdata = data.frame(x1 = factor("2"), x2 = sort(x.two))),
      col = "green")
lines(x = sort(x.two),
      y = predict(mod, newdata = data.frame(x1 = factor("3"), x2 = sort(x.two))),
      col = "orange")
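
The three calls above can also be written as a loop over the factor levels. This is only a sketch: the data frame df is simulated here, and the model formula y ~ x1 * poly(x2, 2) is an assumption about what mod looks like, since the original data and fit are not shown:

```r
# Simulated stand-in data: a 3-level factor x1 and a numeric x2.
set.seed(42)
df <- data.frame(x1 = factor(rep(1:3, each = 30)), x2 = runif(90, 0, 10))
df$y <- with(df, 2 * as.integer(x1) + 0.5 * x2 -
                 0.1 * as.integer(x1) * x2^2 + rnorm(90))

# Assumed model: factor interacting with an orthogonal quadratic in x2.
mod <- lm(y ~ x1 * poly(x2, 2), data = df)

plot(y ~ x2, data = df, col = as.integer(df$x1))
x2.grid <- seq(min(df$x2), max(df$x2), length.out = 100)
cols <- c("red", "green", "orange")

# One fitted curve per factor level; predict() rebuilds the polynomial basis.
for (i in seq_along(levels(df$x1))) {
  nd <- data.frame(x1 = factor(levels(df$x1)[i], levels = levels(df$x1)),
                   x2 = x2.grid)
  lines(x2.grid, predict(mod, newdata = nd), col = cols[i])
}
```

Passing the full set of levels to factor() keeps newdata consistent with the factor used in the fit.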

Problems displaying LOESS regression line and confidence interval

correct code

You wrote: "I searched around and read that this issue could be due to the points needing to be ordered, so I proceeded."

No. The ordering issue is not related to the error you see. To fix the error, you need to replace

lines(animals$X15p5, animals.lo, col="red") 

with

lines(animals$Period, animals.lo$fitted, col="red") 

Here are reasons:

  1. loess returns a list of objects, not a single vector. See str(animals.lo) or names(animals.lo).
  2. Why use animals$X15p5 on the x-axis? You fit the model X15p5 ~ Period, so the x-axis should be Period.
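
The first point can be checked on any loess fit. The sketch below uses the built-in cars data as a stand-in, because the animals data is not available here:

```r
# Stand-in fit on built-in data (the animals data frame is not shown).
lo <- loess(dist ~ speed, data = cars)

is.list(lo)             # TRUE: the fit is a list of components
class(lo)               # "loess"
names(lo)               # includes "fitted", "residuals", ...

# The fitted values are one numeric component, one value per observation.
is.numeric(lo$fitted)
length(lo$fitted) == nrow(cars)
```

Passing the whole object to lines() therefore fails, while lo$fitted (or fitted(lo)) gives the vector to plot.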

about reordering

You need to reorder because lines() (and plot() with type = "l") connects points in the order they appear in the data, not in increasing order of x. Take this as an example:

set.seed(0); x <- runif(100, 0, 10)  ## x is not in order
y <- sqrt(x)  ## points on the curve y = sqrt(x); no seed needed, nothing random here
par(mfrow = c(1,2))
plot(x, y, type = "l") ## this is a mess!!
reorder <- order(x)
plot(x[reorder], y[reorder], type = "l") ## this is nice

Similarly, do:

a <- order(animals$Period)    
lines(animals$Period[a], animals.lo$fitted[a], col="red", lwd=3)

follow-up on confidence interval

Try this:

plot(X15p5 ~ Period, animals)
animals.lo <- loess(X15p5 ~ Period, animals)
pred <- predict(animals.lo, se = TRUE)
a <- order(animals$Period)
lines(animals$Period[a], pred$fit[a], col="red", lwd=3)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df) * pred$se.fit[a], lty = 2)  ## lower limit
lines(animals$Period[a], pred$fit[a] + qt(0.975, pred$df) * pred$se.fit[a], lty = 2)  ## upper limit

You forgot about reordering again. You need to reorder both the fitted values and the standard errors.

Now, the dist ~ speed model for the cars data needs no reordering, because:

is.unsorted(cars$speed)  ## FALSE

Yes, the data are already sorted there.

Note I have made two other changes to your code:

  1. I have separated the loess call from the predict call. You may not need to do this, but it is generally a good habit to separate model fitting from model prediction and keep a copy of both objects.
  2. I have changed loess(animals$X15p5 ~ animals$Period) to loess(X15p5 ~ Period, animals). It is a bad habit to use $ when specifying a model formula. I have another answer at https://stackoverflow.com/a/37307270/4891738 showing the drawback of that style; see the "update" section there. It uses glm as the example, but the same applies to lm, glm and loess.
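
A short sketch of that drawback, using lm on the built-in cars data as a stand-in (the object names are illustrative). With $ in the formula, predict() cannot match the columns of newdata to the model variables:

```r
# Same model, specified two ways.
fit.dollar <- lm(cars$dist ~ cars$speed)      # variables hard-wired via $
fit.data   <- lm(dist ~ speed, data = cars)   # variables looked up in `data`

new <- data.frame(speed = c(10, 20))

# newdata is used as intended: one prediction per row of `new`.
p.data <- predict(fit.data, newdata = new)
length(p.data)

# newdata has no column named `cars$speed`, so R evaluates cars$speed from
# the original data instead, warns, and returns one value per original row.
p.dollar <- suppressWarnings(predict(fit.dollar, newdata = new))
length(p.dollar)
```

The warning is suppressed here only to keep the output short; in interactive use it is the main hint that newdata was ignored.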

How do I change colours of confidence interval lines when using `matlines` for prediction plot?

col, lty and lwd are vectorized. You can use

R6cl <- lm(log(y) ~ x, data = R6)  ## don't use $ in formula
pR6cl <- predict(R6cl, interval = "confidence")
plot(log(y) ~ x, data = R6) ## Read `?plot.formula`
matlines(R6$x, pR6cl, lwd = 2, lty = c(1, 2, 2), col = c(1, 2, 2))

You can check the last figure in "Piecewise regression with a quadratic polynomial and a straight line joining smoothly at a break point" for what this code would produce.

If you are unclear why I advise against the use of $ in a model formula, read "Predict() - Maybe I'm not understanding it".


A side note for other readers

OP has a dataset where x is sorted. If your x is not sorted, make sure you sort it first. See Messy plot when plotting predictions of a polynomial regression using lm() in R for more.
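
For unsorted x, one way to do this is to reorder the rows of the prediction matrix together with x before calling matlines. The data below is simulated purely for illustration and is not the R6 data from the question:

```r
# Simulated stand-in data with x in random order.
set.seed(1)
dat <- data.frame(x = runif(50, 0, 10))
dat$y <- exp(0.3 * dat$x + rnorm(50, sd = 0.2))

fit  <- lm(log(y) ~ x, data = dat)
pred <- predict(fit, interval = "confidence")   # matrix with fit, lwr, upr

o <- order(dat$x)                               # positions that sort x
plot(log(y) ~ x, data = dat)
# Reorder x and the rows of pred with the same index.
matlines(dat$x[o], pred[o, ], lwd = 2, lty = c(1, 2, 2), col = c(1, 2, 2))
```

Sorting the data frame itself with dat[order(dat$x), ] before fitting would work just as well.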

in R, plot a nonlinear curve

lines plots the data in whatever order it happens to be in. As a result, if you don't sort by the x-value first, you'll get a mess of lines going back and forth as the x-value jumps back and forth from one row to the next. Try this, for example:

plot(c(1,3,2,0), c(1,9,4,0), type="l", lwd=7)
lines(0:3, c(0,1,4,9), col='red', lwd=4)

To get a nice curve, sort by horsepower first:

curve.dat = data.frame(x=Auto$horsepower, y=predict(lm.fit2))
curve.dat = curve.dat[order(curve.dat$x),]

lines(curve.dat, col=4)

Whereas, if you don't sort by horsepower, lines() joins the fitted points in row order and you get a tangle of segments zigzagging back and forth across the plot.


