Regression Line for the Entire Data Set Together with Regression Lines Based on Groups

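Put the colour, shape and linetype aesthetics in the individual layers rather than in the original call to ggplot(). That way the grouping only applies to the layers that ask for it.

You can then add the overall line as a separate, ungrouped layer with a different colour: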

set.seed(1)
library(plyr)
library(ggplot2)
# simulate 5 groups of 10 points, each group with its own random slope
alldata <- ddply(data.frame(group = letters[1:5], x = rnorm(50)), 'group',
                 mutate, y = runif(1, -1, 1) * x + rnorm(10))

ggplot(alldata, aes(y = y, x = x)) +
  geom_point(aes(colour = group, shape = group), size = 3, alpha = .8) +
  # one regression line per group
  geom_smooth(method = "lm", se = FALSE, size = 1,
              aes(linetype = group, group = group)) +
  # overall regression line across all groups
  geom_smooth(method = "lm", size = 1, colour = 'black', se = FALSE) +
  theme_bw()
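
To see which numbers sit behind the two kinds of lines, the fits can also be computed directly with lm(). This is a minimal base R sketch on simulated data of the same shape; the data frame `demo` and the vector `slopes` are illustrative names, not part of the original answer.

```r
set.seed(1)
# Simulated data in the same spirit as the example: 5 groups, 10 points each,
# each group with its own random slope (all names here are illustrative).
demo <- data.frame(group = rep(letters[1:5], each = 10), x = rnorm(50))
slopes <- runif(5, -1, 1)
demo$y <- slopes[match(demo$group, letters[1:5])] * demo$x + rnorm(50)

# One fit per group: the kind of line geom_smooth(aes(group = group)) draws
per_group <- t(sapply(split(demo, demo$group),
                      function(d) coef(lm(y ~ x, data = d))))

# One fit on all rows: the kind of line the ungrouped geom_smooth() draws
overall <- coef(lm(y ~ x, data = demo))

per_group  # 5 rows (groups) by 2 columns (intercept, slope)
overall
```

The overall slope is generally not the average of the group slopes, which is why it is worth drawing both.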


100 samples of 20 from the dataset and drawing regression lines along with population regression line

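Using a loop (g is assumed to be the existing ggplot object that already shows the population regression line, and grades is the data frame from the question):

n <- 100
for (i in 1:n) {
  # draw 20 rows at random and add their regression line to the plot
  df <- grades[sample(1:nrow(grades), 20), ]
  # alpha = 0 makes the confidence band invisible
  g <- g + geom_smooth(method = lm, data = df, color = "red", size = 0.5, alpha = 0)
}
plot(g)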
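
The same resampling can be done numerically, which makes it easy to inspect how much the sampled lines vary. This is only a sketch: the `grades` data is not shown in the question, so `grades_demo` and its columns `hours` and `score` are hypothetical stand-ins.

```r
set.seed(42)
# Hypothetical stand-in for the question's `grades` data (column names assumed)
grades_demo <- data.frame(hours = runif(200, 0, 10))
grades_demo$score <- 50 + 3 * grades_demo$hours + rnorm(200, sd = 8)

# Population line: fit on all rows
pop_coef <- coef(lm(score ~ hours, data = grades_demo))

# 100 samples of 20 rows each; keep intercept and slope of every fit
sample_coefs <- t(replicate(100, {
  df <- grades_demo[sample(nrow(grades_demo), 20), ]
  coef(lm(score ~ hours, data = df))
}))

pop_coef
summary(sample_coefs[, "hours"])  # spread of the 100 sampled slopes
```

Each row of `sample_coefs` holds one intercept and slope, so the lines could also be drawn with geom_abline() instead of stacking 100 geom_smooth() layers.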


I encourage you to experiment with the aesthetics, for example adding a dashed line via linetype = "dashed" in geom_smooth().

How to make regression based on grouped rows and loop over columns?

1) Use lmList in nlme (which comes with R so you don't have to install it).

library(nlme)
regs <- lmList(cbind(y1, y2, y3) ~ x | group, dat)

giving an lmList object having a component for each group. We show the component for group a and the other groups are similar.

> regs$a

Call:
lm(formula = object, data = dat, na.action = na.action)

Coefficients:
                 y1       y2       y3
(Intercept)  0.2943   0.1395   0.4539
x            0.3721  -0.2206  -0.2255

2) Another approach is to perform one overall lm giving an lm object having the same coefficients as above.

lm(cbind(y1, y2, y3) ~ group + x:group + 0, dat)
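
A quick way to convince yourself that the single model reproduces the per-group fits is to compare coefficients directly. The sketch below uses only base R and the same simulated `dat` as in the Note; the object names `pooled` and `separate` are illustrative.

```r
set.seed(123)
dat <- data.frame(group = c(rep("a", 10), rep("b", 10), rep("c", 10)),
                  x = rnorm(30), y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30))

# One overall model: rows 1-3 are the group intercepts, rows 4-6 the group slopes
pooled <- coef(lm(cbind(y1, y2, y3) ~ group + x:group + 0, dat))

# Separate fits, one per group
separate <- lapply(split(dat, dat$group),
                   function(d) coef(lm(cbind(y1, y2, y3) ~ x, d)))

# Intercept and slope for group "a" from the overall model (rows 1 and 4)
a_pooled <- pooled[c(1, 4), ]

all.equal(unname(a_pooled), unname(separate$a))
```

The coefficients agree because every group gets its own intercept and its own slope in the overall model. Standard errors can still differ, since the overall model pools the residual variance across groups.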

3) We could also use one of several list comprehension packages. This gives a list of 9 components. The names of the components identify the combination used, as does the call component (shown in the Call: line of the output) within each main component. Note that the current CRAN version is 0.1.0, but the code below relies on listcompr 0.1.1, which can be obtained from GitHub until it is put on CRAN.

# remotes::install_github("patrickroocks/listcompr")  # requires the remotes package
library(listcompr)
packageVersion("listcompr") # need version 0.1.1 or later

regs <- gen.named.list("{y}.{g}",
  do.call("lm",
    list(reformulate("x", y), quote(dat), subset = bquote(dat$group == .(g)))
  ), y = c("y1", "y2", "y3"), g = unique(dat$group)
)

If you don't mind that the Call: line in the output is less descriptive then it can be simplified to:

gen.named.list("{y}.{g}", lm(reformulate("x", y), dat, subset = group == g),
y = c("y1", "y2", "y3"), g = unique(dat$group))
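
If listcompr is not available, the same nine fits can be produced with base R alone. This is a sketch rather than a drop-in replacement: `combos` and `regs_base` are illustrative names, and the Call: line of each fit will again be less descriptive.

```r
set.seed(123)
dat <- data.frame(group = c(rep("a", 10), rep("b", 10), rep("c", 10)),
                  x = rnorm(30), y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30))

# All response/group combinations (3 responses x 3 groups = 9 rows)
combos <- expand.grid(y = c("y1", "y2", "y3"), g = unique(dat$group),
                      stringsAsFactors = FALSE)

# Fit y ~ x on the rows of one group at a time
regs_base <- Map(function(y, g) lm(reformulate("x", y), data = dat[dat$group == g, ]),
                 combos$y, combos$g)

# Name the components like "y1.a", "y2.a", ...
names(regs_base) <- paste(combos$y, combos$g, sep = ".")

coef(regs_base[["y1.a"]])
```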

Note

The input below is corrected from the question, which had two y2 columns.

set.seed(123)
dat <- data.frame(group=c(rep("a",10), rep("b",10), rep("c",10)),
x=rnorm(30), y1=rnorm(30), y2=rnorm(30), y3=rnorm(30))

ggplot2; single regression line when colour is coded for by a variable?

data.male <- read.table(header=TRUE,text="
mid_year mean_tc survey_type
2000 4 Community
2001 5 National
2002 5.1 Subnational
2003 4.3 National
2004 4.5 Community
2005 5.2 Subnational
2006 4.4 National")
  • Use aes(group=1) in the geom_smooth() specification to ignore the grouping by survey type induced by assigning the colour mapping to survey type. (Alternatively, you can put the colour mapping into geom_point() rather than the overall ggplot() specification.)
  • If you want to specify colour you need to give it as the name of a variable in your data frame (i.e., survey_type); if you want to change the name in the legend to condition you can do that in the colour scale specification (example below).
library(ggplot2); theme_set(theme_bw())
ggplot(data = data.male, aes(x = mid_year, y = mean_tc, colour = survey_type)) +
  geom_point(shape = 1) +
  ## use aes(group=1) for single regression line across groups;
  ## don't need to re-specify data argument
  ## set colour to black (from default blue) to avoid confusion
  ## with national (blue) points
  geom_smooth(method = lm, na.rm = TRUE, fullrange = TRUE,
              aes(group = 1), colour = "black") +
  scale_colour_manual(name = "condition",
                      values = c("red", "blue", "green"))
## in factor level order; probably better to
## specify 'breaks' explicitly ...
  • Out of courtesy to colour-blind people I would suggest not using primary red/green/blue as your colour specifications (try scale_colour_brewer(palette="Dark2") instead).
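
The alternative mentioned in the first bullet, mapping colour only inside geom_point(), can be sketched as follows. It assumes the same `data.male` as above and swaps in the Brewer palette from the last bullet; the object name `p` is illustrative.

```r
library(ggplot2)

data.male <- read.table(header = TRUE, text = "
mid_year mean_tc survey_type
2000 4 Community
2001 5 National
2002 5.1 Subnational
2003 4.3 National
2004 4.5 Community
2005 5.2 Subnational
2006 4.4 National")

p <- ggplot(data.male, aes(x = mid_year, y = mean_tc)) +
  # colour is mapped only in this layer, so it does not split the smoother
  geom_point(aes(colour = survey_type), shape = 1) +
  # no grouping aesthetic reaches this layer: a single regression line
  geom_smooth(method = "lm", formula = y ~ x, colour = "black") +
  scale_colour_brewer(name = "condition", palette = "Dark2")

p
```

Because the smoother never sees the colour mapping, aes(group=1) is not needed in this variant.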
