Labelling Points with Ggplot2 and Directlabels

Labelling points with ggplot2 and directlabels

One way to avoid overlapping (to some degree at least) would be to offset each label by an amount which is determined by the closest point to it. So for example if a point's closest neighbouring point is directly to the right of it, its label would be placed to the left, etc.

# centre and normalise variables
test$yy <- (test$y - min(test$y)) / (max(test$y) - min(test$y))
test$xx <- (test$x - min(test$x)) / (max(test$x) - min(test$x))
test$angle <- NA
for (i in 1:nrow(test)) {
dx <- test[-i, ]$xx - test[i, ]$xx
dy <- test[-i, ]$yy - test[i, ]$yy
j <- which.min(dx ^ 2 + dy ^ 2)
theta <- atan2((test[-i, ]$yy[j] - test[i, ]$yy), (test[-i, ]$xx[j] - test[i, ]$xx))
test[i, ]$angle <- theta + pi
}
sc <- 0.5
test$nudge.x <- cos(test$angle) * sc
test$nudge.y <- sin(test$angle) * sc

ggplot(test, aes(x=x, y=y)) +
geom_point(aes(colour=group)) +
geom_text(aes(x = x + nudge.x, y = y + nudge.y, label = ID), size = 3, show.legend = FALSE)

Sample Image

You can try playing around with the scaling parameter sc (the larger it is, the further away the labels will be from the points) to avoid overlapping labels. (I guess it may happen that not the same sc can be applied to all points to avoid overlaps - in that case you need to change the scaling parameter for each point maybe by defining sc using dx and dy).

How to add a label with directlabels when using multiple geoms?

Try using ggrepel as an alternative to directlabels.

(Updated approach following revised question)

Note it might be more elegant to include the average data line and label in the test data adapted for labelling. This approach requires some manual tweaking for the "Average" label.
There are other geom_text_repel() arguments not used which might allow improvement of positioning.

library(dplyr)
library(ggplot2)
library(tidyr)
library(ggrepel)

set.seed(1)
test <- tibble(year = as.factor(rep(1990:2000, 4)),
label = rep(replicate(4, paste0(sample(letters, 20), collapse = "")), each =11), #create long random labels
value = rnorm(44))
test[which(test$year==2000),]$value <- seq(0,0.1, length.out = 4) # make final values very similar

average <- test %>%
group_by(year) %>%
summarize(value = mean(value)) %>%
bind_cols(label = "average")

# initial plot with labels for lines
# For fuller description of possible arguments to repel function, see:
# https://ggrepel.slowkow.com/articles/examples.html
p <-
ggplot(test, aes(x = year, y = value, group = label, color = label)) +
geom_line() +
geom_smooth(data = average,
mapping = aes(x = year, y = value, group = label, color = label),
inherit.aes = F, col = "black") +
geom_text_repel(data = filter(test, year == 2000),
aes(label = label,
color = label),
direction = "y",
vjust = 1.6,
hjust = 0.5,
segment.size = 0.5,
segment.linetype = "solid",
box.padding = 0.4,
seed = 123) +
coord_cartesian(clip = 'off')+
scale_x_discrete(expand = expansion(mult = c(0.06, 0.0)))+
theme(legend.position = "none",
plot.margin = unit(c(5, 50, 5, 5), "mm"))

# find coordinates for last point of geom_smooth line, by inspection of ggplot_buildt

lab_avg <-
slice_tail(ggplot_build(p)$data[[2]], n = 1) %>%
mutate(label = "Average")

# plot with label for geom_smooth line
# positioning of the Average label achieved manually varying vjust and hjust,
# there is probably a better way of doing this

p1 <-
p +
geom_text_repel(data = lab_avg,
aes(x = x, y = y, label = label),
colour = "black",
direction = "y",
vjust = 3.5,
hjust = -7,
segment.size = 0.5,
segment.linetype = "solid",
segment.angle = 10,
box.padding = 0.4,
seed = 123)
p1

Sample Image

Created on 2021-08-22 by the reprex package (v2.0.0)

Initial answer to original question.

You could try with geom_text() using data from the average dataset and adjusting the location of "Average" using hjust and vjust.

Use scale_x_discrete(expand...) to create a bit of extra space for the text label.


ggplot(test, aes(x = year, y = value, group = label, color = label)) +
geom_line() +
geom_smooth(data = average,
mapping = aes(x = year, y = value, group = label, color = label),
inherit.aes = F, col = "black") +
geom_dl(aes(label = label,
color = label),
method = list(dl.combine("last.bumpup"))) +
scale_x_discrete(expand = expansion(mult = c(0.06, 0.2)))+
geom_text(data = slice_tail(average, n = 1),
aes(x = year, y = value, label = "Average"),
colour = "black",
hjust = -0.2,
vjust = 1.5)+
theme(legend.position = "none")

Sample Image

Created on 2021-08-21 by the reprex package (v2.0.0)

Selecting factor for labels (ggplot2, directlabels)

Inspecting the code of direct.label.ggplot() shows that geom_dl() is called in the end. This function expects an aesthetic mapping and a positioning method. The positioning method used by default is the return value of default.picker("ggplot"), which uses call stack examination and in your case is equivalent to calling defaultpf.ggplot("point",,,). The following works for me:

p1 <- ggplot(df, aes(x=value1, y=value2)) +
geom_point(aes(colour=group)) +
geom_dl(aes(label = id), method = defaultpf.ggplot("point",,,))
p1

Plot

(Note that you don't need to call direct.label() anymore.)

The documentation of the directlabels package is indeed a bit scarce.

Properly aligning directlabels with ggplot2

a) Is there anyway I can place labels on top of linegraphs and toward
the end
of the linegraph [...]?

For example,

direct.label(d, list('last.qp', cex=.75, hjust = 1, vjust = 0))

gives you

Sample Image

b) [...] Is there anyway I can modify the label names?

For example

d + geom_dl(
aes(label = letters[1:5][p[,1]]),
method = list('last.qp', cex=.75, hjust = 1, vjust = 0)
)

gives you

Sample Image

The mapping is

cbind(levels(p[,1]), letters[1:5])
# [,1] [,2]
# [1,] "American Gymnastics" "a"
# [2,] "American Swimmers" "b"
# [3,] "Boxing" "c"
# [4,] "European Gymnastics" "d"
# [5,] "Running" "e"

You can adjust it as you wish.

Rearanging labels of ggplot scatterplot with the direct labels library in R

From your comments, it sounds a bit more like a clustering exercise. So, let's go ahead and actually do so:

set.seed(9234970)
d <- data.frame(Name=mytable$Name,
x=mytable$Consensus.length,
y=mytable$Average.coverage)
d$kmeans <- as.factor(kmeans(d[-1],20)$cluster)
ggplot(d, aes(x, y, color=kmeans)) +
geom_point() +
theme(legend.position="bottom")

kmeans clusters
ggplot(d, aes(x, x, label=Name)) +
geom_text(aes(x,y)) +
facet_wrap(~kmeans, scales="free")

Cluster Breakout

I chose 20 clusters at random

You could also use heirarchical clustering to see a dendogram.

plot(hclust(dist(d[-3]))) # -3 drops kmeans column

I'd recommend playing around with the cluster package in general as it may provide a more useful solution to your problem.

direct.label in ggplot (scatterplot) not working

directlabels was never really intended for labelling points in a scatterplot. A lot of the time, directlabels does a reasonable job, but the new package ggrepel might be better suited to the labelling of points in scatterplots. (requires ggplot2 v2.0.0)

library(ggplot2)
library(ggrepel)

p <- ggplot(d, aes(x = ILE2, y = TE, label = CA)) +
geom_point(shape = 20, size = 5) +
geom_smooth(method = lm, se = F) +
scale_colour_hue(l = 50) +
ggtitle("Tasa de Empleo según Índice de Libertad Económica") +
labs(x = "Índice de Libertad Económica", y = "Tasa de Empleo")

p + geom_text_repel(aes(label = CA), segment.color = "black",
box.padding = unit(0.45, "lines"))

Sample Image

How to use direct.label() with simple (two variables) ggplot2 chart

Maria, you initially need to structure your data frame by "stacking data". I like to use the melt function of the reshape2 package. This will allow you to use only one geom_line.

Later you need to generate an object from ggplot2. And this object you must apply the directlabels package.

library(ggplot2)
library(directlabels)
library(tibble)
library(dplyr)
library(reshape2)
set.seed(1)
df <- tibble::tibble(number = 1:10,
var1 = runif(10)*10,
var2 = runif(10)*10)

df <- df %>%
reshape2::melt(id.vars = "number")
p <- ggplot2::ggplot(df) +
geom_line(aes(x = number, y = value, col = variable), show.legend = F) +
scale_color_manual(values = c("red", "blue"))
p

Sample Image

directlabels::direct.label(p, 'last.points') 

Sample Image

Positioning of label text in ggplot2 in R

The issue is that geom_dl uses the value of gedu in your dataset and not the fitted values computed by geom_smooth.

I tried

  1. passing stat="smooth" to geom_dl which resulted in no labels appearing on the plot

  2. adding labels at the endpoints manually using stat_smooth with geom="text" which however did not work

Therefore the only solution I could figure out is to compute the fitted values manually which can then be mapped on y in geom_dl. For computing the fittes values I

  1. tidyr::nest the df by Country and Region
  2. use purrr::map and broom::augment to do the estimation and to get the fitted values
  3. tidyr::unnest and keep only the desired columns
library(ggplot2)
library(directlabels)

library(purrr)
library(broom)
library(tidyr)
library(dplyr)

gedu5 %>%
# Manually compute the fitted values
nest(data = -c(Region, Country)) %>%
mutate(mod = map(data, ~ lm(gedu ~ poly(Year, 3), data = .x)),
mod = map2(mod, data, augment)) %>%
unnest(mod) %>%
select(Country, Region, Year, gedu, .fitted) %>%
ggplot(aes(x = Year, y = gedu, colour = Country, group = Country)) +
geom_smooth(method = "lm", formula = y ~ poly(x, 3), se = FALSE) +
geom_dl(aes(y = .fitted, label = Country), method = list(dl.combine("first.points", "last.points"), cex = 1)) +
geom_point(stat = "identity") +
scale_colour_discrete(guide = "none") +
labs(title = "Government expenditure on education, total (% of GDP)", cex.main = 2.5) +
theme(axis.title = element_blank()) +
facet_wrap(.~Region, 2, scales="free")

Sample Image



Related Topics



Leave a reply



Submit