Plotly Regression Line R

R plotly(): Adding regression line to a correlation scatter plot

I don't think plotly has a ready-made function like ggscatter, so you most likely have to do it manually: fit the linear model first, then add the fitted values to a data.frame.

I made a data.frame that's like your data:

set.seed(111)
df.dataCorrelation = data.frame(prod1=runif(50,20,60))
df.dataCorrelation$prod2 = df.dataCorrelation$prod1 + rnorm(50,10,5)

fit = lm(prod2 ~ prod1,data=df.dataCorrelation)
fitdata = data.frame(prod1=20:60)
prediction = predict(fit,fitdata,se.fit=TRUE)
fitdata$fitted = prediction$fit

The upper and lower bounds of the band are the fitted values plus or minus 1.96 times the standard error of the fit, which gives an approximate 95% confidence band:

fitdata$ymin = fitdata$fitted - 1.96*prediction$se.fit
fitdata$ymax = fitdata$fitted + 1.96*prediction$se.fit
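For readers who want to see what goes into that band, here is a minimal Python sketch of the same calculation using only the standard library. The function name `ols_with_band` and the toy data are made up for illustration. It assumes a simple one-predictor model and uses the normal quantile 1.96, as above, rather than the t quantile that R uses for `interval = "confidence"`.

```python
import math

def ols_with_band(xs, ys, grid, z=1.96):
    """Fit y = a + b*x by least squares; return (x, fit, lower, upper) per grid point."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # residual standard error with n - 2 degrees of freedom
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(rss / (n - 2))
    rows = []
    for x0 in grid:
        fit = intercept + slope * x0
        # standard error of the fitted mean at x0 (the quantity se.fit reports)
        se = sigma * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)
        rows.append((x0, fit, fit - z * se, fit + z * se))
    return rows

# toy data, invented for this sketch
xs = [20, 30, 40, 50, 60]
ys = [31, 38, 52, 59, 71]
band = ols_with_band(xs, ys, grid=range(20, 61, 10))
```

The band is narrowest at the mean of x and widens towards the edges, which is why the ribbon in the plot has a slight hourglass shape.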

We calculate correlation:

COR = cor.test(df.dataCorrelation$prod1,df.dataCorrelation$prod2)[c("estimate","p.value")]
COR_text = paste(c("R=","p="),signif(as.numeric(COR),3),collapse=" ")
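As a rough illustration of what cor.test computes by default (Pearson correlation), here is a small standard-library Python sketch. The helper names `pearson_r` and `t_statistic` and the toy data are assumptions for this example. It stops at the t statistic, because turning that into a p-value needs a t distribution, which the standard library does not provide.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom used to test r against zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# toy data, invented for this sketch
xs = [20, 30, 40, 50, 60]
ys = [31, 38, 52, 59, 71]
r = pearson_r(xs, ys)
label = f"R= {r:.3g}"  # three significant digits, similar to signif(x, 3)
```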

And put it into plotly:

library(plotly)

df.dataCorrelation %>%
plot_ly(x = ~prod1) %>%
add_markers(x=~prod1, y = ~prod2) %>%
add_trace(data=fitdata,x= ~prod1, y = ~fitted,
mode = "lines",type="scatter",line=list(color="#8d93ab")) %>%
add_ribbons(data=fitdata, ymin = ~ ymin, ymax = ~ ymax,
line=list(color="#F1F3F8E6"),fillcolor ="#F1F3F880" ) %>%
layout(
showlegend = FALSE,
annotations = list(x = 50, y = 50,
text = COR_text,showarrow =FALSE)
)


R Plotly - Plotting Multiple Regression Lines

Try this:

library(plotly)
df <- as.data.frame(1:19)

df$CATEGORY <- c("C","C","A","A","A","B","B","A","B","B","A","C","B","B","A","B","C","B","B")
df$x <- c(126,40,12,42,17,150,54,35,21,71,52,115,52,40,22,73,98,35,196)
df$y <- c(92,62,4,23,60,60,49,41,50,76,52,24,9,78,71,25,21,22,25)

df[,1] <- NULL

df$fv <- df %>%
filter(!is.na(x)) %>%
lm(y ~ x*CATEGORY,.) %>%
fitted.values()

p <- plot_ly(data = df,
x = ~x,
y = ~y,
color = ~CATEGORY,
type = "scatter",
mode = "markers"
) %>%
add_trace(x = ~x, y = ~fv, mode = "lines")

p
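The formula y ~ x*CATEGORY gives every category its own intercept and slope, so its fitted values are the same as those from fitting a separate straight line within each category. A minimal standard-library Python sketch of that per-group view, reusing the values from the data frame above (the helper `fit_line` is a made-up name):

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

# same values as in the data frame above
category = ["C","C","A","A","A","B","B","A","B","B","A","C","B","B","A","B","C","B","B"]
x = [126,40,12,42,17,150,54,35,21,71,52,115,52,40,22,73,98,35,196]
y = [92,62,4,23,60,60,49,41,50,76,52,24,9,78,71,25,21,22,25]

# collect the (x, y) points of each category
groups = defaultdict(list)
for c, xi, yi in zip(category, x, y):
    groups[c].append((xi, yi))

# one (intercept, slope) pair per category
coefs = {c: fit_line([p[0] for p in pts], [p[1] for p in pts])
         for c, pts in groups.items()}

# fitted value for every row, taken from the line of that row's category
fv = [coefs[c][0] + coefs[c][1] * xi for c, xi in zip(category, x)]
```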


R Plotly how to remove the grouping on the added linear regression line

@AmadeusNing you are basically there. Plotly lets you "configure" every trace and the legend that comes with it (including having it combined with other traces).
You can easily fix this with the showlegend parameter in the trace call. To remove the legend: showlegend = FALSE.

I do not have your data, so I fall back on the classic mpg dataset (shipped with ggplot2) and fit a simple linear regression for the line. I also exaggerate the line styling for presentation purposes in my example graph.

library(plotly)
library(ggplot2)  # provides the mpg dataset

# define a regression line
lr <- lm(hwy ~ displ, mpg)

# draw the base plotly plot; create the grouping by setting cyl as a factor
fig1 <- plot_ly(data = mpg, x = ~displ, y = ~hwy, color = ~as.factor(cyl)
, type = "scatter", mode = "markers")

# add regression line
fig1 <- fig1 %>%
add_lines(data=mpg %>% group_by(cyl), x = ~displ, y = fitted(lr)
, line = list(width = 2, dash = "dot", color="red")
#----------------- remove legend for line - comment out to see it displayed
, showlegend = FALSE
)

fig1


Plotly - how to add multiple regressions for each color using R?

Your lm object did not fit the model by group (I guess the grouping variable is continent).

If your goal is to plot the points together with one regression line per group, you could try ggplot2.

As you did not provide your data, I use iris as an example.

library(dplyr)
library(ggplot2)

iris %>%
ggplot(aes(Sepal.Width, Sepal.Length, group = Species, color = Species)) +
geom_smooth(method = "lm") +
geom_point()


Plotly: How to embed data like regression results into legend?

With your setup and some synthetic data you can retrieve px OLS estimates using:

model = px.get_trendline_results(fig)
alpha = model.iloc[0]["px_fit_results"].params[0]
beta = model.iloc[0]["px_fit_results"].params[1]

And then include those findings in your legend and make the necessary layout adjustments directly using:

fig.data[0].name = 'observations'
fig.data[0].showlegend = True
fig.data[1].name = fig.data[1].name + ' y = ' + str(round(alpha, 2)) + ' + ' + str(round(beta, 2)) + 'x'
fig.data[1].showlegend = True
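One small wrinkle with plain string concatenation is that a negative slope shows up as "+ -0.5x" in the legend. A hypothetical helper such as `format_equation` below (not part of plotly, just a sketch) handles the sign explicitly; its result could be appended to the trace name in the same way as above.

```python
def format_equation(alpha, beta, digits=2):
    """Build a legend label like 'y = 1.23 - 0.50x' from intercept and slope."""
    sign = "+" if beta >= 0 else "-"
    return f"y = {alpha:.{digits}f} {sign} {abs(beta):.{digits}f}x"

# example values, invented for this sketch
label_pos = format_equation(1.234, 0.5)
label_neg = format_equation(1.234, -0.5)
```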


Edit: R-squared

Following up on your comment, I'll show you how to include other values of interest from the regression analysis. However, it does not make as much sense anymore to keep including estimates in the legend. Nevertheless, that's exactly what the following addition does:

rsq = model.iloc[0]["px_fit_results"].rsquared
fig.add_trace(go.Scatter(x=[100], y=[100],
name = "R-squared" + ' = ' + str(round(rsq, 2)),
showlegend=True,
mode='markers',
marker=dict(color='rgba(0,0,0,0)')
))
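For context on the number being added to the legend: for an OLS fit with an intercept, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. A minimal standard-library sketch, where the function name `r_squared` and the toy values are assumptions for illustration:

```python
def r_squared(ys, fitted):
    """R-squared as 1 - RSS/TSS, with TSS measured around the mean of ys."""
    mean_y = sum(ys) / len(ys)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1 - rss / tss

# toy observations and fitted values, invented for this sketch
ys = [2.0, 4.1, 5.9, 8.2]
fitted = [2.0, 4.0, 6.0, 8.0]
rsq_value = r_squared(ys, fitted)
legend_label = "R-squared = " + str(round(rsq_value, 2))
```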

Plot 2: R-squared included in legend


Complete code with synthetic data:

import plotly.graph_objects as go
import plotly.express as px
import statsmodels.api as sm
import pandas as pd
import numpy as np
import datetime

# data
np.random.seed(123)
numdays=20

X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()

df_linear = pd.DataFrame({'Days_ct': X, 'Conf_ct':Y})

# Plotting the graph
fig = px.scatter(df_linear, x="Days_ct", y="Conf_ct", trendline="ols")
fig.update_traces(name = "OLS trendline")

fig.update_layout(template="ggplot2",title_text = '<b>Linear Regression Model</b>',
font=dict(family="Arial, Balto, Courier New, Droid Sans",color='black'), showlegend=True)
fig.update_layout(
legend=dict(
x=0.01,
y=.98,
traceorder="normal",
font=dict(
family="sans-serif",
size=12,
color="Black"
),
bgcolor="LightSteelBlue",
bordercolor="dimgray",
borderwidth=2
))

# retrieve model estimates
model = px.get_trendline_results(fig)
alpha = model.iloc[0]["px_fit_results"].params[0]
beta = model.iloc[0]["px_fit_results"].params[1]

# restyle figure
fig.data[0].name = 'observations'
fig.data[0].showlegend = True
fig.data[1].name = fig.data[1].name + ' y = ' + str(round(alpha, 2)) + ' + ' + str(round(beta, 2)) + 'x'
fig.data[1].showlegend = True

# addition for r-squared
rsq = model.iloc[0]["px_fit_results"].rsquared
fig.add_trace(go.Scatter(x=[100], y=[100],
name = "R-squared" + ' = ' + str(round(rsq, 2)),
showlegend=True,
mode='markers',
marker=dict(color='rgba(0,0,0,0)')
))

fig.show()

Plotly: How to plot a regression line using plotly and plotly express?

Update 1:

Now that Plotly Express handles data of both long and wide format (the latter in your case) like a breeze, the only thing you need to plot a regression line is:

fig = px.scatter(df, x='X', y='Y', trendline="ols")

A complete code snippet for wide data is at the end of this answer.


If you'd like the regression line to stand out, you can specify trendline_color_override in:

fig = px.scatter([...], trendline_color_override='red')

Or include the line color after building your figure through:

fig.data[1].line.color = 'red'


You can access regression parameters like alpha and beta through:

model = px.get_trendline_results(fig)
alpha = model.iloc[0]["px_fit_results"].params[0]
beta = model.iloc[0]["px_fit_results"].params[1]

And you can even request a non-linear fit through:

fig = px.scatter(df, x='X', y='Y', trendline="lowess")
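To give a feel for what a lowess trendline does, here is a simplified standard-library sketch of locally weighted linear regression with tricube weights. It is not the routine Plotly Express calls and it leaves out the robustness iterations of full LOWESS. The name `local_linear_smooth` and the `frac` default are assumptions for this example.

```python
def local_linear_smooth(xs, ys, frac=0.6):
    """Smooth ys by fitting a tricube-weighted straight line around each x."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbours used per local fit
    smoothed = []
    for x0 in xs:
        # bandwidth: distance to the k-th nearest point (x0 itself counts)
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        # tricube weights: 1 at x0, falling to 0 at distance h and beyond
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # value of the local weighted line at x0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed

# toy data, invented for this sketch: an exact line is returned unchanged
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
smooth = local_linear_smooth(xs, ys)
```

Because each point gets its own local line, the result can bend to follow curvature in the data, which a single OLS line cannot do.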


And what about those long formats? That's where Plotly Express reveals some of its real powers. If you take the built-in dataset px.data.gapminder as an example, you can get an individual trendline for each continent by specifying color="continent":


Complete snippet for long format

import plotly.express as px

df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", color="continent", trendline="lowess")
fig.show()

And if you'd like even more flexibility with regards to model choice and output, you can always resort to my original answer to this post below. But first, here's a complete snippet for those examples at the start of my updated answer:

Complete snippet for wide data

import plotly.graph_objects as go
import plotly.express as px
import statsmodels.api as sm
import pandas as pd
import numpy as np
import datetime

# data
np.random.seed(123)
numdays=20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})

# figure with regression
# fig = px.scatter(df, x='X', y='Y', trendline="ols")
fig = px.scatter(df, x='X', y='Y', trendline="lowess")

# make the regression line stand out
fig.data[1].line.color = 'red'

# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

fig.show()


Original answer:

For regression analysis I like to use statsmodels.api or sklearn.linear_model. I also like to organize both the data and regression results in a pandas dataframe. Here's one way to do what you're looking for in a clean and organized way:

Plot using sklearn or statsmodels:


Code using sklearn:

from sklearn.linear_model import LinearRegression
import plotly.graph_objects as go
import pandas as pd
import numpy as np
import datetime

# data
np.random.seed(123)
numdays=20

X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})

# regression
reg = LinearRegression().fit(np.vstack(df['X']), Y)
df['bestfit'] = reg.predict(np.vstack(df['X']))

# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))

# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

fig.show()

Code using statsmodels:

import plotly.graph_objects as go
import statsmodels.api as sm
import pandas as pd
import numpy as np
import datetime

# data
np.random.seed(123)
numdays=20

X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()

df = pd.DataFrame({'X': X, 'Y':Y})

# regression
df['bestfit'] = sm.OLS(df['Y'],sm.add_constant(df['X'])).fit().fittedvalues

# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))

# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

fig.show()

R Plotly - Plotting Multiple Polynomial Regression Lines

Your formula is incorrect. Try:

df %>%
filter(!is.na(x)) %>%
lm(y ~ poly(x,2, raw=TRUE)*CATEGORY, data=.) %>%
fitted.values()
  • df1 is not defined in your sample code, so I am assuming df here.
  • Use data = . to reference the piped data as the data source.
  • Inside the poly function, define the degree of the polynomial, in this case 2.
  • Move CATEGORY outside the poly function, so that the formula expands to x*CATEGORY + x^2*CATEGORY and so on.
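To illustrate what raw = TRUE means, the sketch below fits a quadratic with the plain columns x and x^2 (plus an intercept) by solving the normal equations. It is a standard-library Python illustration for a single group, not a translation of the lm call above. The function name `fit_quadratic` and the toy data are assumptions.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 using raw powers of x."""
    # design-matrix rows [1, x, x^2]: an intercept plus the two raw polynomial columns
    rows = [[1.0, x, x * x] for x in xs]
    # normal equations (X'X) c = X'y, stored as an augmented 3x4 matrix
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

# toy data generated from y = 1 + 2x + 3x^2, invented for this sketch
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x * x for x in xs]
c0, c1, c2 = fit_quadratic(xs, ys)
```

Multiplying the polynomial by CATEGORY in the formula then amounts to estimating one such set of coefficients per category.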

Plotly in R - Diagonal AB line

A line shape could be used to achieve this:

library(plotly)

fig <- plot_ly(data = iris, x = ~Sepal.Length, y = ~Petal.Length)
fig %>%
layout(shapes = list(list(
type = "line",
x0 = 0,
x1 = ~max(Sepal.Length, Petal.Length),
xref = "x",
y0 = 0,
y1 = ~max(Sepal.Length, Petal.Length),
yref = "y",
line = list(color = "black")
)))


Also see this related answer.

By the way, with xref = "paper" and yref = "paper" the endpoints can be given as fractions of the plot area (0 to 1) instead of data values; however, the line is then no longer aligned with the axes.
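The endpoint logic of the shape (start at the origin, end at the larger of the two column maxima) can be sketched in plain Python as a dictionary with the same attribute names. The helper name `identity_line_shape` and the sample values are made up for illustration.

```python
def identity_line_shape(xs, ys, color="black"):
    """Describe a diagonal from (0, 0) to (m, m), where m is the largest value in xs or ys."""
    upper = max(max(xs), max(ys))
    return {
        "type": "line",
        "xref": "x", "yref": "y",  # endpoints are in data coordinates
        "x0": 0, "y0": 0,
        "x1": upper, "y1": upper,
        "line": {"color": color},
    }

# sample values, invented for this sketch
shape = identity_line_shape([5.1, 4.9, 7.9], [1.4, 6.9, 5.0])
```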


