Converting Factors to Binary in R

Converting factors to binary in R

In base R, you could use sapply() over the factor levels, using == to test each row against a level and as.integer() to coerce the logical result to 0/1.

cbind(df[1:2], sapply(levels(df$c), function(x) as.integer(x == df$c)), df[4])
#   a b Pink Red Rose d
# 1 1 1    0   0    1 2
# 2 2 1    1   0    0 3
# 3 3 2    0   1    0 4
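The original df is not shown, so here is a minimal self-contained sketch of the same base R approach. The data frame below is an assumption, reconstructed from the printed output above; the names dummies and res are only illustrative.

```r
# Hypothetical input, reconstructed from the printed output (not the asker's real data)
df <- data.frame(a = 1:3,
                 b = c(1, 1, 2),
                 c = factor(c("Rose", "Pink", "Red")),
                 d = 2:4)

# One 0/1 column per factor level; sapply() names the columns after the levels
dummies <- sapply(levels(df$c), function(x) as.integer(x == df$c))

# Put the indicator columns where the factor column used to be
res <- cbind(df[1:2], dummies, df[4])
names(res)
```

Because levels() returns the levels in sorted order, the indicator columns come out as Pink, Red, Rose regardless of the order in which the values appear in the data.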

But since you have a million rows, you may want to go with data.table.

library(data.table)
# add one 0/1 column per level and drop the original column c in the same step
setDT(df)[, c(levels(df$c), "c") :=
            c(lapply(levels(c), function(x) as.integer(x == c)), .(NULL))]

which gives

df
#    a b d Pink Red Rose
# 1: 1 1 2    0   0    1
# 2: 2 1 3    1   0    0
# 3: 3 2 4    0   1    0

And you can reset the column order if you need to with setcolorder(df, c(1, 2, 4:6, 3)).

R: Converting multiple binary columns into one factor variable whose factors are binary columns

I am guessing that you want to revert a "one-hot encoding" of a variable. Here is a quick way to do it.

apply(df, 1, \(x) names(which(x == "yes"))) |>
  purrr::map_chr(~ ifelse(length(.x) == 0, NA_character_, .x))

#+ [1] "v1" "v1" "v3" "v1" "v1" "v2" "v1" "v2" "v3" NA
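For comparison, the same reversal can be sketched in base R without purrr. This assumes each row has at most one "yes"; the data frame onehot and its values are made up for illustration and are not the asker's data.

```r
# Hypothetical one-hot data: three yes/no columns, last row has no "yes"
onehot <- data.frame(v1 = c("yes", "no",  "no",  "no"),
                     v2 = c("no",  "yes", "no",  "no"),
                     v3 = c("no",  "no",  "yes", "no"))

hit <- onehot == "yes"                         # logical matrix, one row per observation
idx <- max.col(hit * 1, ties.method = "first") # column index of the first "yes" in each row

# Rows without any "yes" become NA instead of defaulting to the first column
label <- ifelse(rowSums(hit) == 0, NA_character_, names(onehot)[idx])
result <- factor(label, levels = names(onehot))
result
```

If a row could contain several "yes" values, this sketch keeps only the first one, so the data would need checking before relying on it.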

A tidyverse approach would be:

library(dplyr)
library(tidyr)

df |>
  mutate(ID = row_number()) |>
  pivot_longer(cols = c(v1, v2, v3), names_to = "var") |>
  filter(value == "yes")

##>      ID var   value
##>   <int> <chr> <chr>
##> 1     1 v1    yes
##> 2     2 v1    yes
##> 3     3 v3    yes
##> 4     4 v1    yes
##> 5     5 v1    yes
##> 6     6 v2    yes
##> 7     7 v1    yes
##> 8     8 v2    yes
##> 9     9 v3    yes

Automatically code binary variables as factors?

Below we assume that a column is regarded as binary as long as

  • it is not all NA and
  • aside from NAs it is made up only of numeric 0 and 1 values.

Note that a column which is entirely 0 and NA, or entirely 1 and NA, is regarded as binary. If that is undesirable, the code can be changed to require that binary columns contain both 0 and 1.

First, define a function is_binary that determines whether a column is to be regarded as binary. This function can be changed if you want a different definition of binary. In particular, change 1:2 to 2 in the code below if a column must contain both 0 and 1 to count as binary. Other definitions are possible if needed.

Next, apply is_binary to each column, returning a logical vector ok with one component per column that is TRUE if that column is binary and FALSE otherwise.

In the line computing the answer DF2, we apply factor to each binary column with the argument levels = 0:1, which ensures that columns containing only 0's or only 1's still have both levels.

No packages are used.

DF <- data.frame(a = c(0:1, NA), b = 1:3, c = NA, d = 0) # test data frame

is_binary <- function(x) {
  x0 <- na.omit(x)
  is.numeric(x) && length(unique(x0)) %in% 1:2 && all(x0 %in% 0:1)
}
ok <- sapply(DF, is_binary)
DF2 <- replace(DF, ok, lapply(DF[ok], factor, levels = 0:1))

str(DF2)
## 'data.frame':   3 obs. of  4 variables:
##  $ a: Factor w/ 2 levels "0","1": 1 2 NA
##  $ b: int  1 2 3
##  $ c: logi  NA NA NA
##  $ d: Factor w/ 2 levels "0","1": 1 1 1

We could alternatively use dplyr with is_binary like this:

library(dplyr)
DF %>% mutate(across(where(is_binary), ~ factor(., levels = 0:1)))
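The stricter definition mentioned earlier, where a column must contain both 0 and 1, could look like the sketch below. The name is_binary_strict is made up here to keep it apart from is_binary; the test data frame is the same one used above.

```r
DF <- data.frame(a = c(0:1, NA), b = 1:3, c = NA, d = 0) # same test data frame

# Strict variant: exactly two distinct non-NA values, both drawn from 0 and 1
is_binary_strict <- function(x) {
  x0 <- na.omit(x)
  is.numeric(x) && length(unique(x0)) == 2 && all(x0 %in% 0:1)
}

ok_strict <- sapply(DF, is_binary_strict)
DF2_strict <- replace(DF, ok_strict, lapply(DF[ok_strict], factor, levels = 0:1))
str(DF2_strict)
```

With this variant only column a is converted; column d, which is all 0, stays numeric.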

How to transform a factor to a numeric binary variable?

A dplyr approach using case_when(), which is handy when there are more than two if-else conditions:

library(dplyr)

df <- read.table(stringsAsFactors = TRUE, header = TRUE, text = "Localisation
A
A
B
A
B
B")

df %>% mutate(Binom = case_when(Localisation == "A" ~ 1,   # condition 1
                                Localisation == "B" ~ 0))  # condition 2
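When there really are only two levels, a comparison in base R may be enough. This is a minimal sketch, not part of the original answer; it rebuilds the same six-row example and treats "A" as 1 and everything else as 0.

```r
# Same example data as above, built directly as a factor column
df <- data.frame(Localisation = factor(c("A", "A", "B", "A", "B", "B")))

# The comparison yields TRUE/FALSE; as.integer() turns that into 1/0
df$Binom <- as.integer(df$Localisation == "A")
df
```

Unlike case_when(), this maps every level other than "A" to 0, so it only fits a genuinely two-level factor.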

Converting R Factors into Binary Matrix Values

Assuming dat is your data frame:

cbind(dat, model.matrix( ~ 0 + C1, dat))

#   C1 C2 C3 C1A C1B
# 1  A  3  5   1   0
# 2  B  3  4   0   1
# 3  A  1  1   1   0

This solution works with any number of factor levels and without manually specifying column names.

If you want to exclude the column C1, you could use this command:

cbind(dat[-1], model.matrix( ~ 0 + C1, dat))
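To try this end to end, here is a small self-contained sketch. The data frame dat is an assumption, reconstructed from the printed output above; the names with_c1 and without_c1 are only illustrative.

```r
# Hypothetical input, reconstructed from the printed output
dat <- data.frame(C1 = factor(c("A", "B", "A")),
                  C2 = c(3, 3, 1),
                  C3 = c(5, 4, 1))

# "~ 0 + C1" drops the intercept, so every level of C1 gets its own 0/1 column
with_c1    <- cbind(dat,     model.matrix(~ 0 + C1, dat))
without_c1 <- cbind(dat[-1], model.matrix(~ 0 + C1, dat))

names(with_c1)
names(without_c1)
```

The indicator columns are named by pasting the variable name and the level together, which is why they appear as C1A and C1B.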

