how to plot a linear regression in R
As already mentioned in the comments, you are not calling the plot function correctly. plot(x, y) plots y against x. The arguments "xlab" and "ylab" merely label the axes of the plot.
I think what you want to do is:
plot(yft_tuna$length, yft_tuna$weight)
This, however, plots only the data, not the results of your linear regression.
EDIT:
What I guess you want is to plot the data and then add a regression line. You can do this with:
plot(yft_tuna$length, yft_tuna$weight)
abline(a=lm1$coefficients[1], b=lm1$coefficients[2])
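For a self-contained version of the same idea, here is a minimal sketch using simulated data. The data frame `fish`, its values, and the model name `fit_fish` are made up for illustration and only stand in for yft_tuna and lm1. Passing the fitted model straight to abline should be equivalent to supplying the intercept and slope by hand.

```r
# Simulated stand-in for the tuna data (hypothetical values, not the real data set)
set.seed(1)
fish <- data.frame(length = runif(50, 40, 150))
fish$weight <- -20 + 0.6 * fish$length + rnorm(50, sd = 5)

# Fit weight as a linear function of length
fit_fish <- lm(weight ~ length, data = fish)

# Scatter plot of the data, then the fitted line drawn on top
plot(fish$length, fish$weight, xlab = "Length", ylab = "Weight")
abline(fit_fish, col = "red")
```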
how to plot the linear regression in R?
Are you looking for the predict function?
For example, lines(predict(fit)) adds the fitted values as a line to an existing plot.
You could also use it to predict future values from the fitted coefficients, e.g.:
# plot the existing data with space for the predicted line
plot(c(cpi,rep(NA,12)),xaxt="n",ylab="CPI",xlab="",ylim=c(162,190))
# plot the future predictions as a line using the next 3 year periods
lines(13:24,
      predict(
        fit,
        newdata=data.frame(year=rep(c(2011,2012,2013),each=4),quarter=rep(1:4,3))
      )
)
year<-rep(2008:2013,each=4)
axis(1,labels=paste(year,quarter,sep="C"),at=1:24,las=3)
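The snippet above relies on cpi, quarter and fit already existing in the session. As a rough, self-contained sketch of the same predict-with-newdata pattern, the following uses a simulated quarterly series; the numbers are invented and are not real CPI figures, and the name `future` is only illustrative.

```r
# Hypothetical quarterly series for 2008-2010 (simulated, not real CPI figures)
set.seed(42)
year    <- rep(2008:2010, each = 4)
quarter <- rep(1:4, times = 3)
cpi     <- 160 + 3 * (year - 2008) + 0.8 * quarter + rnorm(12, sd = 0.3)

# Fit the series on year and quarter
fit <- lm(cpi ~ year + quarter)

# newdata must use the same column names as the predictors in the formula
future <- data.frame(year = rep(2011:2013, each = 4),
                     quarter = rep(1:4, times = 3))
pred <- predict(fit, newdata = future)

# One prediction per row of newdata
length(pred)
```

The key point is that predict matches the columns of newdata to the predictor names by name, so a misspelled column would cause an error or silently pick up a variable of the same name from the workspace.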
Plotting linear regression line of a calculation
It sounds like you want the predicted values of food for 10 and 15 dogs per room. You can do that with predict. First I'll turn the matrix into a data frame to make things a little easier:
# Turn your matrix into a data frame.
df <- data.frame(dogs = v[,1], food = v[,2])
I can then compute my model and predictions based on the model:
# Compute the linear model.
lmout <- lm(food ~ dogs, df)
# Create a dataframe with new values of `dogs`.
df_new <- data.frame(dogs = c(10, 15))
# Use `predict` with your model and the new data.
df_new$food <- predict(lmout, newdata = df_new)
#### PREDICTIONS OUTPUT ####
  dogs      food
1   10  8.096774
2   15 11.040323
Now I can plot the data and new data with the regression line.
plot(df$dogs, df$food, pch = 21)
abline(lmout, lty="solid", col="royalblue")
points(df_new$dogs, df_new$food, pch = 3, col = "red")
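To see what predict is doing here, it may help to reproduce its result by hand from the coefficients. The sketch below uses a small made-up data frame `toy` in place of the original matrix `v` (whose values are not shown), so the numbers will differ from the output above.

```r
# Hypothetical dogs/food values standing in for the matrix `v`
toy <- data.frame(dogs = c(1, 2, 4, 6, 8),
                  food = c(2.1, 3.0, 4.4, 5.9, 7.2))
toy_fit <- lm(food ~ dogs, data = toy)

# Prediction via predict()
by_predict <- predict(toy_fit, newdata = data.frame(dogs = c(10, 15)))

# The same prediction computed by hand: intercept + slope * dogs
b <- coef(toy_fit)
by_hand <- b[["(Intercept)"]] + b[["dogs"]] * c(10, 15)

# Compare the two results
all.equal(unname(by_predict), by_hand)
```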
Plotting and Linear Regression models with R
The t() function is creating a matrix with more than two columns. The error says that you are supplying only two column names, but the matrix needs more than two.
Is there a reason you are using header = F? If not, then the following may work:
Current_Data <- read.csv("nst-est2018-alldata.csv", header=T,stringsAsFactors=FALSE)
head(Current_Data)
x <- c("Name", "Washington")
CTD <- subset(Current_Data, grepl(paste(x, collapse = "|"), Current_Data$NAME))
CTD
# the data are in wide format, but you seem to want them in long format
C_T_D <- stack(CTD)[-1,]
C_T_D
# looks like our columns are switched
C_T_D2 <- C_T_D[,c(2,1)]
colnames(C_T_D2) <- c("Year","Population")
# to make the data easier to work with (str_extract_all comes from the stringr package)
library(stringr)
C_T_D2$Year <- as.numeric(str_extract_all(C_T_D2$Year, "[0-9]+"))
C_T_D2$Population <- as.numeric(C_T_D2$Population)
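The wide-to-long step can be tried in isolation on a tiny table. This is only a sketch: the data frame `wide`, its column names, and its values are invented, and it swaps in base R's gsub for the year extraction instead of str_extract_all so that no extra package is needed.

```r
# Hypothetical one-row wide table: one column per year (values are made up)
wide <- data.frame(POP2016 = 7.29, POP2017 = 7.42, POP2018 = 7.54)

# stack() returns a data frame with columns `values` and `ind`
long <- stack(wide)

# Pull the year out of the old column names by dropping every non-digit
long$Year <- as.numeric(gsub("[^0-9]", "", as.character(long$ind)))

# Keep year first, then the stacked values, and rename
long <- long[, c("Year", "values")]
colnames(long) <- c("Year", "Population")
long
```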
Put two linear regression lines into one plot
Try
ggplot(data = merge1,aes(x=intp.trust,y=confidence, group = countryname))+
geom_point(size=0.5)+
geom_smooth(method = "lm",formula = y~x)
facet_wrap puts your plots in different panels, one per countryname.
If you want to differentiate by countryname, add color to your aes: aes(...,color = countryname).
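As a runnable illustration of the color mapping, here is a minimal sketch on the built-in iris data, since merge1 is not available. Species is only a stand-in for countryname, and the plot object name `p` is arbitrary.

```r
library(ggplot2)

# Built-in iris data used as a stand-in; Species plays the role of countryname
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x)  # one fitted line per Species

print(p)
```

Because color is mapped inside aes, it also defines the grouping, so geom_smooth fits a separate regression line for each level in a single panel.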
Trying to graph different linear regression models with ggplot and equation labels
If you are regressing Y on both X and Z, and these are both numerical variables (as they are in your example), then a linear regression without an interaction term represents a 2D plane in 3D space, not a line in 2D space. Adding an interaction term means that your regression represents a curved surface in 3D space. This can be difficult to represent in a simple plot, though there are some ways to do it: the colored lines in the smoking / cycling example you show are slices through the regression plane at various (arbitrary) values of the Z variable, which is a reasonable way to display this type of model.
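To make the idea of slices concrete: in a model Y ~ X * Z, the slope of Y on X at a fixed value of Z is the X coefficient plus the interaction coefficient times Z. The sketch below checks this on simulated data; the data frame `sim`, its coefficients, and the helper `slope_at` are all hypothetical and chosen only for illustration.

```r
# Simulated data with a known interaction (all values are made up)
set.seed(7)
sim <- data.frame(X = runif(200, 0, 10), Z = runif(200, 0, 5))
sim$Y <- 1 + 2 * sim$X + 0.5 * sim$Z - 0.3 * sim$X * sim$Z +
  rnorm(200, sd = 0.5)

sim_fit <- lm(Y ~ X * Z, data = sim)
b <- coef(sim_fit)

# Slope of Y on X along a slice at a fixed Z: b_X + b_XZ * Z
slope_at <- function(z) b[["X"]] + b[["X:Z"]] * z

# Slopes of three slices; they differ because of the interaction term
sapply(c(0, 2.5, 5), slope_at)
```

Each slice is a straight line, but the lines are not parallel, which is why plotting a handful of them at chosen Z values conveys the shape of the surface.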
Although ggplot has some great shortcuts for plotting simple models, I find people often tie themselves in knots because they try to do all their modelling inside ggplot. The best thing to do when you have a more complex model to plot is work out what exactly you want to plot using the right tools for the job, then plot it with ggplot.
For example, if you make a prediction data frame for your interaction model:
model2 <- lm(Y ~ X * Z, data = hw_data)
predictions <- expand.grid(X = seq(min(hw_data$X), max(hw_data$X), length.out = 5),
Z = seq(min(hw_data$Z), max(hw_data$Z), length.out = 5))
predictions$Y <- predict(model2, newdata = predictions)
Then you can plot your interaction model very simply:
ggplot(hw_data, aes(X, Y)) +
geom_point() +
geom_line(data = predictions, aes(color = factor(Z))) +
labs(color = "Z")
You can easily work out the formula from the coefficients table and stick it together with paste:
labs <- trimws(format(coef(model2), digits = 2))
form <- paste("Y =", labs[1], "+", labs[2], "* X +",
              labs[3], "* Z + (", labs[4], "* X * Z)")
form
#> [1] "Y = -69.07 + 5.58 * X + 2.00 * Z + ( -0.13 * X * Z)"
This can be added as an annotation to your plot using geom_text or annotate.
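A minimal sketch of the annotate route, using the built-in mtcars data and a placeholder label; `eq_label` and the coordinates are hypothetical, and in practice the label would be the form string built from the model coefficients.

```r
library(ggplot2)

# Hypothetical label; in practice this would be the string built with paste
eq_label <- "Y = 1.0 + 2.0 * X"

p_eq <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # annotate() draws the label once, at the given data coordinates
  annotate("text", x = 4.5, y = 30, label = eq_label, fontface = "italic")

print(p_eq)
```

One reason to prefer annotate for a single fixed label is that geom_text with a constant label draws one copy per row of the data, which is why check_overlap = TRUE is needed when geom_text is used this way.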
Update
As a complete solution, if you wanted only 3 levels for Z, effectively "high", "medium" and "low", you could do something like:
library(ggplot2)
model2 <- lm(Y ~ X * Z, data = hw_data)
predictions <- expand.grid(X = quantile(hw_data$X, c(0, 0.5, 1)),
Z = quantile(hw_data$Z, c(0.1, 0.5, 0.9)))
predictions$Y <- predict(model2, newdata = predictions)
labs <- trimws(format(coef(model2), digits = 2))
form <- paste("Y =", labs[1], "+", labs[2], "* X +",
              labs[3], "* Z + (", labs[4], "* X * Z)")
form <- paste(form, " R\u00B2 =",
format(summary(model2)$r.squared, digits = 2))
ggplot(hw_data, aes(X, Y)) +
geom_point() +
geom_line(data = predictions, aes(color = factor(Z))) +
geom_text(x = 15, y = 25, label = form, check_overlap = TRUE,
fontface = "italic") +
labs(color = "Z")