Converting factors to binary in R
In base R, you could use sapply() on the levels, using == to check for presence and as.integer() to coerce the result to binary.
cbind(df[1:2], sapply(levels(df$c), function(x) as.integer(x == df$c)), df[4])
#   a b Pink Red Rose d
# 1 1 1    0   0    1 2
# 2 2 1    1   0    0 3
# 3 3 2    0   1    0 4
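The question's data is not shown here, so the following is only a minimal sketch. It assumes a data frame df reconstructed from the printed output above (a, b and d numeric, c a factor with levels Pink, Red and Rose) and splits the one-liner into steps so the sapply() part can be inspected on its own. The names dummies and result are illustrative.

```r
# Hypothetical data, reconstructed from the printed output above
df <- data.frame(
  a = 1:3,
  b = c(1, 1, 2),
  c = factor(c("Rose", "Pink", "Red")),
  d = 2:4
)

# One 0/1 column per factor level; sapply() names the columns after the levels
dummies <- sapply(levels(df$c), function(x) as.integer(x == df$c))

# Replace the factor column (position 3) with its indicator columns
result <- cbind(df[1:2], dummies, df[4])
print(result)
```

Because each row has exactly one level, every row of dummies should sum to 1, which is a quick sanity check on larger data.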
But since you have a million rows, you may want to go with data.table.
library(data.table)
setDT(df)[, c(levels(df$c), "c") :=
c(lapply(levels(c), function(x) as.integer(x == c)), .(NULL))]
which gives
df
#    a b d Pink Red Rose
# 1: 1 1 2    0   0    1
# 2: 2 1 3    1   0    0
# 3: 3 2 4    0   1    0
And you can reset the column order if you need to with setcolorder(df, c(1, 2, 4:6, 3)).
R: Converting multiple binary columns into one factor variable whose factors are binary columns
I am guessing that you want to reverse a "one-hot encoding" of a variable. Here is a quick way to do it.
apply(df ,1,\(x) names(which(x == "yes"))) |>
purrr::map_chr(~ifelse(length(.x) == 0, NA_character_, .x))
#+ [1] "v1" "v1" "v3" "v1" "v1" "v2" "v1" "v2" "v3" NA
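For readers who prefer to avoid purrr, the same idea can be sketched in base R. The data frame onehot below is hypothetical, chosen so that it reproduces the output shown above, and vapply() stands in for purrr::map_chr(). The final step turns the result into a single factor, which is an assumption about what the question was after.

```r
# Hypothetical one-hot data with "yes"/"no" entries; row 10 has no "yes"
onehot <- data.frame(
  v1 = c("yes", "yes", "no",  "yes", "yes", "no",  "yes", "no",  "no",  "no"),
  v2 = c("no",  "no",  "no",  "no",  "no",  "yes", "no",  "yes", "no",  "no"),
  v3 = c("no",  "no",  "yes", "no",  "no",  "no",  "no",  "no",  "yes", "no")
)

# For each row, take the name of the first column equal to "yes", or NA if none
decoded <- vapply(
  seq_len(nrow(onehot)),
  function(i) {
    hit <- names(onehot)[which(onehot[i, ] == "yes")]
    if (length(hit) == 0) NA_character_ else hit[1]
  },
  character(1)
)

# Collapse the result into a single factor variable
decoded <- factor(decoded, levels = names(onehot))
print(decoded)
```

If a row could contain more than one "yes", this sketch silently keeps the first match, so that case would need its own rule.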
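A tidyverse approach would be the following. Note that rows without any "yes" (row 10 in this example) are dropped by the filter:
library(dplyr)
library(tidyr)
df |>
  mutate(ID = row_number()) |>
  pivot_longer(cols = c(v1, v2, v3), names_to = "var") |>
  filter(value == "yes")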
##>      ID var   value
##>   <int> <chr> <chr>
##> 1     1 v1    yes
##> 2     2 v1    yes
##> 3     3 v3    yes
##> 4     4 v1    yes
##> 5     5 v1    yes
##> 6     6 v2    yes
##> 7     7 v1    yes
##> 8     8 v2    yes
##> 9     9 v3    yes
Automatically code binary variables as factors?
Below we assume that a column is regarded as binary as long as
- it is not all NA and
- aside from NAs it is made up only of numeric 0 and 1 values.
Note that a column which is entirely 0 and NA, or entirely 1 and NA, is regarded as binary. If that is undesirable, we show how to change the code to require that binary columns contain both 0 and 1.
First define a function is_binary that determines whether a column is to be regarded as binary or not. This function can be changed if you want to change the definition of binary. In particular, change 1:2 to 2 in the code below if a column must have both 0 and 1 in order to be considered binary. Other definitions are possible if needed.
Next apply is_binary to each column, returning a logical vector ok with one component per column that is TRUE if that column is binary and FALSE otherwise.
In the line computing the answer DF2 we apply factor to each binary column using the argument levels = 0:1 to ensure that columns that have only 0's or only 1's still have both levels.
No packages are used.
DF <- data.frame(a = c(0:1, NA), b = 1:3, c = NA, d = 0) # test data frame
is_binary <- function(x) {
x0 <- na.omit(x)
is.numeric(x) && length(unique(x0)) %in% 1:2 && all(x0 %in% 0:1)
}
ok <- sapply(DF, is_binary)
DF2 <- replace(DF, ok, lapply(DF[ok], factor, levels = 0:1))
str(DF2)
## 'data.frame': 3 obs. of 4 variables:
## $ a: Factor w/ 2 levels "0","1": 1 2 NA
## $ b: int 1 2 3
## $ c: logi NA NA NA
## $ d: Factor w/ 2 levels "0","1": 1 1 1
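The stricter definition mentioned above (both 0 and 1 must occur) can be sketched as follows. The names is_binary_strict, ok_strict and DF3 are hypothetical and introduced only for this illustration; the only substantive change is replacing %in% 1:2 with == 2.

```r
# Same test data frame as above
DF <- data.frame(a = c(0:1, NA), b = 1:3, c = NA, d = 0)

# Stricter variant: require that both 0 and 1 actually occur in the column
is_binary_strict <- function(x) {
  x0 <- na.omit(x)
  is.numeric(x) && length(unique(x0)) == 2 && all(x0 %in% 0:1)
}

ok_strict <- sapply(DF, is_binary_strict)
print(ok_strict)

# Only columns passing the strict check are converted to factors
DF3 <- replace(DF, ok_strict, lapply(DF[ok_strict], factor, levels = 0:1))
str(DF3)
```

Under this definition column d, which contains only zeros, stays numeric, while column a is still converted.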
We could alternatively use dplyr with is_binary like this:
library(dplyr)
DF %>% mutate(across(where(is_binary), ~ factor(., levels = 0:1)))
How to transform a factor to a numeric binary variable?
A dplyr approach, handy when there are more than two if-else conditions.
library(dplyr)
df <- read.table(stringsAsFactors = TRUE, header = TRUE, text = "Localisation
A
A
B
A
B
B")
df %>% mutate(Binom = case_when(Localisation == "A" ~ 1,  # condition 1
                                Localisation == "B" ~ 0)) # condition 2
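When there really are only two levels, a plain comparison in base R may be enough. This is a minimal sketch rather than a replacement for case_when(); it rebuilds the same example data with data.frame() and reuses the column name Binom from above.

```r
# Same example data as above, built without read.table()
df <- data.frame(Localisation = factor(c("A", "A", "B", "A", "B", "B")))

# A logical comparison coerced to integer gives 1 for "A" and 0 otherwise
df$Binom <- as.integer(df$Localisation == "A")
print(df)
```

One difference to keep in mind: case_when() yields NA for any level that matches no condition, whereas this comparison maps every level other than "A" to 0.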
Converting R Factors into Binary Matrix Values
Assuming dat is your data frame:
cbind(dat, model.matrix( ~ 0 + C1, dat))
  C1 C2 C3 C1A C1B
1  A  3  5   1   0
2  B  3  4   0   1
3  A  1  1   1   0
This solution works with any number of factor levels and without manually specifying column names.
If you want to exclude the column C1, you could use this command:
cbind(dat[-1], model.matrix( ~ 0 + C1, dat))
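To try this end to end, here is a small self-contained sketch. The data frame dat is hypothetical, reconstructed from the printed output above, and the intermediate names mm and out are illustrative only.

```r
# Hypothetical data frame, reconstructed from the printed output above
dat <- data.frame(
  C1 = factor(c("A", "B", "A")),
  C2 = c(3, 3, 1),
  C3 = c(5, 4, 1)
)

# "~ 0 + C1" drops the intercept, so every level of C1 gets its own 0/1 column
mm <- model.matrix(~ 0 + C1, dat)

# Append the indicator columns and drop the original factor column
out <- cbind(dat[-1], mm)
print(out)
```

The indicator columns are named by pasting the variable name and the level together (C1A, C1B), which is why no column names need to be specified by hand.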