﻿ Plotly Regression Line R - ITCodar

# Plotly Regression Line R

## R plotly(): Adding regression line to a correlation scatter plot

I don't think there's a ready function like ggscatter, most likely you have to do it manually, like first fitting the linear model and adding the values to the data.frame.

``set.seed(111)df.dataCorrelation = data.frame(prod1=runif(50,20,60))df.dataCorrelation\$prod2 = df.dataCorrelation\$prod1 + rnorm(50,10,5)fit = lm(prod2 ~ prod1,data=df.dataCorrelation)fitdata = data.frame(prod1=20:60)prediction = predict(fit,fitdata,se.fit=TRUE)fitdata\$fitted = prediction\$fit``

The upper and lower bounds of the line are simply 1.96* standard error of prediction:

``fitdata\$ymin = fitdata\$fitted - 1.96*prediction\$se.fitfitdata\$ymax = fitdata\$fitted + 1.96*prediction\$se.fit``

We calculate correlation:

``COR = cor.test(df.dataCorrelation\$prod1,df.dataCorrelation\$prod2)[c("estimate","p.value")]COR_text = paste(c("R=","p="),signif(as.numeric(COR,3),3),collapse=" ")``

And put it into plotly:

``library(plotly)df.dataCorrelation %>%plot_ly(x = ~prod1) %>%add_markers(x=~prod1, y = ~prod2) %>%add_trace(data=fitdata,x= ~prod1, y = ~fitted, mode = "lines",type="scatter",line=list(color="#8d93ab")) %>%add_ribbons(data=fitdata, ymin = ~ ymin, ymax = ~ ymax,line=list(color="#F1F3F8E6"),fillcolor ="#F1F3F880" ) %>%layout(    showlegend = F,    annotations = list(x = 50, y = 50,    text = COR_text,showarrow =FALSE))``

## R Plotly - Plotting Multiple Regression Lines

Try this:

``library(plotly)df <-  as.data.frame(1:19)df\$CATEGORY <- c("C","C","A","A","A","B","B","A","B","B","A","C","B","B","A","B","C","B","B")df\$x <- c(126,40,12,42,17,150,54,35,21,71,52,115,52,40,22,73,98,35,196)df\$y <- c(92,62,4,23,60,60,49,41,50,76,52,24,9,78,71,25,21,22,25)df[,1] <- NULLdf\$fv <- df %>%  filter(!is.na(x)) %>%  lm(y ~ x*CATEGORY,.) %>%  fitted.values()p <- plot_ly(data = df,         x = ~x,         y = ~y,         color = ~CATEGORY,         type = "scatter",         mode = "markers") %>%  add_trace(x = ~x, y = ~fv, mode = "lines")p``

## R Plotly how to remove the grouping on the added linear regression line

@AmadeusNing you are basically there. Plotly lets you "configure" every trace and the legend that comes with it (including having it combined with other traces).
You can easily fix this with the showlegend parameter in the trace call. To remove the legend: showlegend = FALSE.

I do not have your data, so I revert back to the classical mpg dataset and define a simple linear regression for the line. I also stress the line width for presentation purposes in my example graph.

``    # define a regression line    lr <- lm(hwy ~ displ, mpg)    # draw basis plotly plot, create grouping by setting cyl as factor    fig1 <- plot_ly(data = mpg, x = ~displ, y = ~hwy, color = ~as.factor(cyl)                    , type = "scatter", mode = "markers")    # add regression line    fig1 <- fig1 %>%        add_lines(data=mpg %>% group_by(cyl), x = ~displ, y = fitted(lr)                 , line = list(width = 2, dash = "dot", color="red")           #----------------- remove legend for line - comment out to see it displayed                 , showlegend = FALSE     )``

## Plotly - how to add multiple regressions for each color using R?

Your `lm` object did not fit model by groups(I guess it's `continent`)

If your purpose is to plot the points and regression lines,

you may try using `ggplot2`.

As you did not provide your data, I use `iris` as an example.

``library(dplyr)library(ggplot2)iris %>%  ggplot(aes(Sepal.Width, Sepal.Length, group = Species, color = Species)) +  geom_smooth(method = "lm") +  geom_point()``

## Plotly: How to embed data like regression results into legend?

With your setup and some synthetic data you can retrieve px OLS estimates using:

``model = px.get_trendline_results(fig)alpha = model.iloc[0]["px_fit_results"].params[0]beta = model.iloc[0]["px_fit_results"].params[1]``

And then include those findings in your legend and make the necessary layout adjustments direclty using:

``fig.data[0].name = 'observations'fig.data[0].showlegend = Truefig.data[1].name = fig.data[1].name  + ' y = ' + str(round(alpha, 2)) + ' + ' + str(round(beta, 2)) + 'x'fig.data[1].showlegend = True``

### Edit: R-squared

Following up on your comment, I'll show you how to include other values of interest from the regression analysis. However, it does not make as much sense anymore to keep including estimates in the legend. Nevertheless, that's exactly what the following addition does:

``rsq = model.iloc[0]["px_fit_results"].rsquaredfig.add_trace(go.Scatter(x=[100], y=[100],                         name = "R-squared" + ' = ' + str(round(rsq, 2)),                         showlegend=True,                         mode='markers',                         marker=dict(color='rgba(0,0,0,0)')                         ))``

### Complete code with synthetic data:

``import plotly.graph_objects as goimport plotly.express as pximport statsmodels.api as smimport pandas as pdimport numpy as npimport datetime# datanp.random.seed(123)numdays=20X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()df_linear = pd.DataFrame({'Days_ct': X, 'Conf_ct':Y})#Ploting the graphfig = px.scatter(df_linear, x="Days_ct", y="Conf_ct", trendline="ols")fig.update_traces(name = "OLS trendline")fig.update_layout(template="ggplot2",title_text = '<b>Linear Regression Model</b>',                  font=dict(family="Arial, Balto, Courier New, Droid Sans",color='black'), showlegend=True)fig.update_layout(    legend=dict(        x=0.01,        y=.98,        traceorder="normal",        font=dict(            family="sans-serif",            size=12,            color="Black"        ),        bgcolor="LightSteelBlue",        bordercolor="dimgray",        borderwidth=2    ))# retrieve model estimatesmodel = px.get_trendline_results(fig)alpha = model.iloc[0]["px_fit_results"].params[0]beta = model.iloc[0]["px_fit_results"].params[1]# restyle figurefig.data[0].name = 'observations'fig.data[0].showlegend = Truefig.data[1].name = fig.data[1].name  + ' y = ' + str(round(alpha, 2)) + ' + ' + str(round(beta, 2)) + 'x'fig.data[1].showlegend = True# addition for r-squaredrsq = model.iloc[0]["px_fit_results"].rsquaredfig.add_trace(go.Scatter(x=[100], y=[100],                         name = "R-squared" + ' = ' + str(round(rsq, 2)),                         showlegend=True,                         mode='markers',                         marker=dict(color='rgba(0,0,0,0)')                         ))fig.show()``

## Plotly: How to plot a regression line using plotly and plotly express?

### Update 1:

Now that Plotly Express handles data of both long and wide format (the latter in your case) like a breeze, the only thing you need to plot a regression line is:

``fig = px.scatter(df, x='X', y='Y', trendline="ols")``

Complete code snippet for wide data at the end of the question

If you'd like the regression line to stand out, you can specify `trendline_color_override` in:

``fig = `px.scatter([...], trendline_color_override = 'red') ``

Or include the line color after building your figure through:

``fig.data[1].line.color = 'red'``

You can access regression parameters like `alpha` and beta `through`:

``model = px.get_trendline_results(fig)alpha = model.iloc[0]["px_fit_results"].params[0]beta = model.iloc[0]["px_fit_results"].params[1]``

And you can even request a non-linear fit through:

``fig = px.scatter(df, x='X', y='Y', trendline="lowess")``

And what about those long formats? That's where Plotly Express reveals some of its real powers. If you take the built-in dataset `px.data.gapminder` as an example, you can trigger individual lines for an array of countries by specifying `color="continent"`:

### Complete snippet for long format

``import plotly.express as pxdf = px.data.gapminder().query("year == 2007")fig = px.scatter(df, x="gdpPercap", y="lifeExp", color="continent", trendline="lowess")fig.show()``

And if you'd like even more flexibility with regards to model choice and output, you can always resort to my original answer to this post below. But first, here's a complete snippet for those examples at the start of my updated answer:

### Complete snippet for wide data

``import plotly.graph_objects as goimport plotly.express as pximport statsmodels.api as smimport pandas as pdimport numpy as npimport datetime# datanp.random.seed(123)numdays=20X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()df = pd.DataFrame({'X': X, 'Y':Y})# figure with regression# fig = px.scatter(df, x='X', y='Y', trendline="ols")fig = px.scatter(df, x='X', y='Y', trendline="lowess")# make the regression line stand outfig.data[1].line.color = 'red'# plotly figure layoutfig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')fig.show()``

For regression analysis I like to use `statsmodels.api` or `sklearn.linear_model`. I also like to organize both the data and regression results in a pandas dataframe. Here's one way to do what you're looking for in a clean and organized way:

Plot using sklearn or statsmodels:

Code using sklearn:

``from sklearn.linear_model import LinearRegressionimport plotly.graph_objects as goimport pandas as pdimport numpy as npimport datetime# datanp.random.seed(123)numdays=20X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()df = pd.DataFrame({'X': X, 'Y':Y})# regressionreg = LinearRegression().fit(np.vstack(df['X']), Y)df['bestfit'] = reg.predict(np.vstack(df['X']))# plotly figure setupfig=go.Figure()fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))# plotly figure layoutfig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')fig.show()``

Code using statsmodels:

``import plotly.graph_objects as goimport statsmodels.api as smimport pandas as pdimport numpy as npimport datetime# datanp.random.seed(123)numdays=20X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()df = pd.DataFrame({'X': X, 'Y':Y})# regressiondf['bestfit'] = sm.OLS(df['Y'],sm.add_constant(df['X'])).fit().fittedvalues# plotly figure setupfig=go.Figure()fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))# plotly figure layoutfig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')fig.show()``

## R Plotly - Plotting Multiple Polynomial Regression Lines

``df %>%   filter(!is.na(x)) %>%   lm(y ~ poly(x,2, raw=TRUE)*CATEGORY, data=.) %>%   fitted.values()``
• df1 is not defined in your sample code, so assuming df here.
• Use the data=. to reference the data source.
• Inside the poly function define the power, in this case 2.
• Move CATEGORY outside the poly function. x* CATEGORY + x^2* CATEGORY etc.

## Plotly in R - Diagonal AB line

A line shape could be used to achive this:

``library(plotly)fig <- plot_ly(data = iris, x = ~Sepal.Length, y = ~Petal.Length)fig %>%  layout(shapes = list(list(    type = "line",     x0 = 0,     x1 = ~max(Sepal.Length, Petal.Length),     xref = "x",    y0 = 0,     y1 = ~max(Sepal.Length, Petal.Length),    yref = "y",    line = list(color = "black")  )))``

Btw. via `xref = "paper"` we don't need to specify start and end points for the line, however the line is no longer aligned with the axes.