How to Give Color to Each Class in Scatter Plot in R

How to give color to a class in scatter plot in R?

Using the following example data

txt <- "ACTIVITY     LAT            LONG
Resting 21.14169444 70.79052778
Feeding 21.14158333 70.79313889
Resting 21.14158333 70.79313889
Walking 21.14163889 70.79266667
Walking 21.14180556 70.79222222
Sleeping 21.14180556 70.79222222"
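# stringsAsFactors = TRUE is required in R >= 4.0.0 so that ACTIVITY is read as a factor
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)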

One option is to index into a vector of colours of length nlevels(ACTIVITY) using the ACTIVITY variable as the index.

cols <- c("red","green","blue","orange")
plot(LAT ~ LONG, data = dat, col = cols[dat$ACTIVITY], pch = 19)
legend("topleft", legend = levels(dat$ACTIVITY), col = cols, pch = 19, bty = "n")

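This produces a scatter plot of LAT against LONG in which each activity is drawn in its own colour, with a matching legend in the top-left corner.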

To see why this works, cols is expanded to

> cols[dat$ACTIVITY]
[1] "green"  "red"    "green"  "orange" "orange" "blue"

because ACTIVITY is a factor: its values are stored internally as the integer codes 1, 2, ..., n (one per level), and those codes are what index into cols.
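To make the factor-as-index idea concrete, here is a minimal sketch that rebuilds the activity column on its own and prints its integer codes. The names `activity`, `codes` and `point_cols` are assumptions for illustration only; the colours are the ones used above.

```r
# Hypothetical stand-alone copy of the ACTIVITY column from the example data
activity <- factor(c("Resting", "Feeding", "Resting",
                     "Walking", "Walking", "Sleeping"))
cols <- c("red", "green", "blue", "orange")

levels(activity)               # levels are sorted alphabetically by default
codes <- as.integer(activity)  # the integer code stored for each observation
codes

point_cols <- cols[codes]      # gives the same result as cols[activity]
point_cols
```

Because the levels are sorted alphabetically, "Feeding" gets the first colour and "Walking" the last, regardless of the order in which they appear in the data.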

Higher-level solutions are also available; the ggplot2 package, for example, makes it simple to create the same plot.

library("ggplot2")
plt <- ggplot(dat, aes(x = LONG, y = LAT, colour = ACTIVITY)) +
  geom_point()
plt

which produces the same scatter plot, with a colour legend for ACTIVITY added automatically.
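If you want the ggplot2 version to use specific colours rather than the default palette, one option is scale_colour_manual() with a named vector. The following is a sketch under the assumption that the data look like the example above; the data frame is rebuilt here so the block runs on its own, and the name `activity_cols` is hypothetical.

```r
library("ggplot2")

# Example data rebuilt inline so that this block is self-contained
dat <- data.frame(
  ACTIVITY = factor(c("Resting", "Feeding", "Resting",
                      "Walking", "Walking", "Sleeping")),
  LAT  = c(21.14169444, 21.14158333, 21.14158333,
           21.14163889, 21.14180556, 21.14180556),
  LONG = c(70.79052778, 70.79313889, 70.79313889,
           70.79266667, 70.79222222, 70.79222222)
)

# Named vector: the names must match the levels of ACTIVITY exactly
activity_cols <- c(Feeding = "red", Resting = "green",
                   Sleeping = "blue", Walking = "orange")

plt <- ggplot(dat, aes(x = LONG, y = LAT, colour = ACTIVITY)) +
  geom_point() +
  scale_colour_manual(values = activity_cols)
plt
```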

How do you color points in a scatterplot by factors in one of the data's columns?

If you read ?pairs, its examples include a pairs(.) plot much like this one. That example uses

pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species",
      pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])

which seems fairly similar to yours. You might try adding unclass(.) to your own code, but it doesn't work. Why? Because in the iris data set that ships with base R (there seems to be no need to download your UCI version, btw), Species is a factor, so internally it is stored as integer codes and the strings are looked up from those codes.

str(iris)
# 'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

str(iris.uci)
# 'data.frame': 150 obs. of 5 variables:
# $ sepal.length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ sepal.width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ petal.length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ petal.width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ species : chr "setosa" "setosa" "setosa" "setosa" ...

However, as of R 4.0.0, read.csv changed its default behavior from stringsAsFactors=TRUE (which had been in place for decades?) to stringsAsFactors=FALSE, so you get simple strings. If you look at what you're trying to pass to bg=, you'll see it is uninteresting:

head( c("black", "grey", "white")[iris.uci$species] )
# [1] NA NA NA NA NA NA
table( c("black", "grey", "white")[iris.uci$species], useNA = "always" )
# <NA>
# 150

That's because you are indexing your vector of three colors with strings, and since the vector is not named, the lookup finds nothing and returns NA for everything.
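The difference between indexing an unnamed and a named vector by strings can be seen in a few lines. This is a small sketch; the vector names `species`, `unnamed` and `named` are made up for illustration.

```r
species <- c("setosa", "virginica", "versicolor")  # plain strings, not a factor

unnamed <- c("black", "grey", "white")
named   <- c(setosa = "black", versicolor = "grey", virginica = "white")

unnamed[species]        # all NA: there are no names to match against
unname(named[species])  # colours looked up by name
```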

There are two ways to work around this:

  1. Name your colors:

    pairs(iris.uci[,1:4], lower.panel=NULL, cex=2, pch=21,
          bg = c(setosa="black", versicolor="grey", virginica="white")[iris.uci$species])

    This has the advantage of you tightly controlling which color goes with each species.

  2. factor and then unclass:

    pairs(iris.uci[,1:4], lower.panel=NULL, cex=2, pch=21,
          bg = c("black", "grey", "white")[unclass(factor(iris.uci$species))])

    (Notice that here the colors are assigned by the order of the factor levels, which is alphabetical by default, rather than by name, so doing it this way gives up some of the control that option 1 provides. There are ways to use factors and still control the color, but I find option 1's explicit color assignment to be clearer.)

(FYI, in the original iris dataset and in the iris.uci download, it's spelled virginica.)
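One way to keep using a factor and still control which color goes with which species is to set the factor levels explicitly, so that the level order matches the order of the color vector. The sketch below uses a short made-up `species` vector rather than the full data set; `species_order` and `f` are hypothetical names.

```r
species <- c("virginica", "setosa", "versicolor", "setosa")  # hypothetical sample

# Colors are listed in the order we choose, not in alphabetical order
species_order <- c("virginica", "setosa", "versicolor")
cols <- c("white", "black", "grey")

# levels = fixes the integer codes: virginica = 1, setosa = 2, versicolor = 3
f <- factor(species, levels = species_order)
cols[f]
```

The same `cols[f]` expression could then be passed to bg= in pairs().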

Coloring the points by category in R

The trick is to use df_prob1$y as an index into the colors vector, c("red", "blue"). This can easily be done if the column y is coerced to a factor, since factors are coded internally as consecutive integers starting at 1. The following code uses a subset of the built-in data set iris, prepared under "Test data" at the end of this answer.

clrs <- c("red", "blue")[factor(df_prob1$y)]
plot(df_prob1$x1, df_prob1$x2, pch = df_prob1$y, col = clrs)

Test data.

set.seed(1234)
df_prob1 <- subset(iris[c(1, 2, 5)], Species != "virginica")
df_prob1 <- df_prob1[sample(nrow(df_prob1), 50), ]
df_prob1[[3]] <- as.numeric(df_prob1[[3]] == "setosa")
names(df_prob1) <- c("x1", "x2", "y")
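The coercion to a factor matters because y takes the values 0 and 1, and 0 is not a valid position in a vector. A short sketch with a made-up label vector `y` shows the difference; the name `dropped` is illustrative only.

```r
y <- c(0, 1, 1, 0)                   # hypothetical 0/1 class labels

as.integer(factor(y))                # codes start at 1: 0 -> 1, 1 -> 2
clrs <- c("red", "blue")[factor(y)]  # one color per observation
clrs

# Indexing with the raw numbers silently drops the zeros
dropped <- c("red", "blue")[y]
length(dropped)                      # shorter than y
```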

How to make a scatterplot in R with category-specific colored text labels?

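A <- "red"; B <- "blue"
# text() draws onto an existing plot; df[,2] must be a factor so that
# as.numeric() returns its integer level codes
text(x = matrix[,1], y = matrix[,2], labels = df[,1],
     col = c(A, B, "black")[as.numeric(df[,2])])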

The basic practice is to build a color vector and then index it with a selection vector using "[".
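For a version that runs on its own, here is a minimal sketch with made-up data. The objects `pts` and `info` are hypothetical stand-ins for the coordinate matrix and data frame in the question, and the group names are invented.

```r
# Hypothetical data: coordinates in a matrix, labels and groups in a data frame
pts  <- cbind(x = c(1, 2, 3, 4), y = c(2, 4, 1, 3))
info <- data.frame(label = c("a", "b", "c", "d"),
                   group = factor(c("g1", "g2", "g1", "g3")))

# One color per label, selected by the integer code of its group
label_cols <- c("red", "blue", "black")[as.integer(info$group)]

plot(pts, type = "n")  # empty frame; text() needs an existing plot
text(x = pts[, 1], y = pts[, 2], labels = info$label, col = label_cols)
```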

In R, how can I use different colors for each range in my scatterplot?

You could provide a colour vector if the data are already ordered.

mydata <- runif(120)
plot(mydata, col = rep(rainbow(3), each = 40))

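rainbow(3) makes a colour vector of 3 colours, and rep with each = 40 repeats each colour 40 times, so points 1 to 40 get the first colour, points 41 to 80 the second, and points 81 to 120 the third.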
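If the ranges are ranges of values rather than of positions, and the data are not ordered, one possible approach is to bin the values with cut() and use the resulting factor as the colour index. This is a sketch of a different technique from the one above; the break points and the name `bins` are assumptions chosen for illustration.

```r
set.seed(1)
mydata <- runif(120)

# Three equal-width value ranges on [0, 1];
# include.lowest = TRUE keeps a value of exactly 0 in the first bin
bins <- cut(mydata, breaks = c(0, 1/3, 2/3, 1), include.lowest = TRUE)

plot(mydata, col = rainbow(3)[bins])
legend("topright", legend = levels(bins), col = rainbow(3), pch = 1)
```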

Colouring plot by factor in R

data <- iris
plot(data$Sepal.Length, data$Sepal.Width, col = data$Species)
# one legend entry and one colour per factor level
legend(7, 4.3, levels(data$Species), col = seq_len(nlevels(data$Species)), pch = 1)

That should do it for you, but I prefer ggplot2 and would suggest it for better graphics in R.
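Passing the factor directly to col works because the factor's integer codes are used as colour numbers, which are looked up in the current palette(). The sketch below writes that lookup out explicitly; which colours appear depends on the palette of your R session, so none are stated here. The name `codes` is illustrative.

```r
# Integer codes behind the factor: 1 = setosa, 2 = versicolor, 3 = virginica
codes <- as.integer(iris$Species)

# The colours that col = 1, 2, 3 resolve to in the current session
palette()[sort(unique(codes))]

# The same plot with the palette lookup written out explicitly
plot(iris$Sepal.Length, iris$Sepal.Width, col = palette()[codes])
legend("topright", legend = levels(iris$Species),
       col = palette()[seq_len(nlevels(iris$Species))], pch = 1)
```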


