How to implement coalesce efficiently in R
On my machine, using Reduce gives roughly a 5x performance improvement:
coalesce2 <- function(...) {
  Reduce(function(x, y) {
    i <- which(is.na(x))
    x[i] <- y[i]
    x
  },
  list(...))
}
> microbenchmark(coalesce(a,b,c),coalesce2(a,b,c))
Unit: microseconds
expr min lq median uq max neval
coalesce(a, b, c) 97.669 100.7950 102.0120 103.0505 243.438 100
coalesce2(a, b, c) 19.601 21.4055 22.8835 23.8315 45.419 100
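To make the behaviour of the Reduce version concrete, here is a minimal sketch that runs it on three short vectors. The vectors a, b and d are made up for illustration (they are not the ones used in the benchmark), and the sketch assumes all inputs have the same length.

```r
# Reduce-based coalesce, repeated from the answer above
coalesce2 <- function(...) {
  Reduce(function(x, y) {
    i <- which(is.na(x))  # positions still missing in the running result
    x[i] <- y[i]          # fill them from the next vector
    x
  },
  list(...))
}

# Hypothetical inputs, invented for this example
a <- c(1, NA, NA, 4)
b <- c(NA, 2, NA, 40)
d <- c(10, 20, 3, NA)

result <- coalesce2(a, b, d)
print(result)
# [1] 1 2 3 4
```

Each position takes the first non-NA value in argument order, so 40 and 10 are never used because a already supplies a value there.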
How to use Coalesce function on a dataframe
For coalesce to work you need NAs, not blanks. Change the blanks to NA and try:
library(dplyr)
df[df == ''] <- NA
df %>% mutate(RCC = coalesce(RC3, RC2, RC1))
# R1 R2 RC1 RC2 RC3 RCC
#1 15515 515 AW SSSBB KKAJDJHW KKAJDJHW
#2 5156 5156.11- FG <NA> XVVJAKWA XVVJAKWA
#3 65656 415- ZA <NA> <NA> ZA
#4 1566 1455- ZI ZXXQA <NA> ZXXQA
#5 2857 886 <NA> <NA> <NA> <NA>
#6 8888 888 CW CQAER CDDGAJJA CDDGAJJA
#7 65656 777 <NA> <NA> GGGAJTTD GGGAJTTD
#8 1566 666 <NA> KKHDY <NA> KKHDY
#9 65651 4457 <NA> TTQWW BBNMNJJI BBNMNJJI
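The reason blanks have to be converted can be checked in base R alone. The following is only a sketch on a small made-up data frame (the column names mirror the ones above, the values are hypothetical), with nested ifelse standing in for coalesce(RC3, RC2, RC1) so that it runs without dplyr.

```r
# Hypothetical data with blanks instead of NA
df <- data.frame(
  RC1 = c("AW", "FG", ""),
  RC2 = c("SSSBB", "", ""),
  RC3 = c("KKAJDJHW", "", ""),
  stringsAsFactors = FALSE
)

# An empty string is not a missing value, so coalesce would keep it
blank_is_na <- is.na("")
print(blank_is_na)
# [1] FALSE

# Turn blanks into real NAs
df[df == ""] <- NA

# Base R stand-in for coalesce(RC3, RC2, RC1)
df$RCC <- ifelse(!is.na(df$RC3), df$RC3,
                 ifelse(!is.na(df$RC2), df$RC2, df$RC1))
print(df)
```

After the conversion, the second row falls through to RC1 and the third row stays NA because every source column is missing.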
Is there an R function that unifies multiple columns?
We can use coalesce:
library(dplyr)
df <- df %>%
  mutate(C = coalesce(A, B))
Iteratively dplyr::coalesce()
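If the column names follow the pattern something and something.etc, you may try:
library(dplyr)
library(stringr)
library(purrr)  # provides map_df()
df %>%
  split.default(str_remove(names(.), "\\..*")) %>%  # group columns by name prefix
  map_df(~ coalesce(!!! .x))                        # coalesce within each group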
a b c
<dbl> <dbl> <dbl>
1 1 2 3
2 1 2 3
3 1 2 3
Merge two columns of data table on condition
You can try fcoalesce if you are working with data.table:
> library(data.table)
> setDT(df)[, lab3 := fcoalesce(lab2, lab1)][]
lab1 lab2 lab3
1: 5 7 7
2: 8 10 10
3: NA 3 3
4: 9 NA 9
5: NA NA NA
Dealing with values that occur on the same date
One option is coalesce, which returns, for each row, the first non-NA element across the columns given as arguments:
library(dplyr)
df1 %>%
  transmute(Date, A01 = coalesce(A01, A01_CD), A01_CD = NA_real_)
# Date A01 A01_CD
#1 1966/05/07 4.870000 NA
#2 1966/05/08 4.918333 NA
#3 1966/05/09 4.892000 NA
#4 1966/05/10 4.858917 NA
#5 1966/05/11 4.842000 NA
#6 1967/03/18 5.950000 NA
Or in base R, with row/column indexing:
df1$A01 <- df1[-1][cbind(seq_len(nrow(df1)), max.col(!is.na(df1[-1]), 'first'))]
df1$A01
#[1] 4.870000 4.918333 4.892000 4.858917 4.842000 5.950000
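The one-line base R indexing can be hard to read, so here is a sketch that unpacks it step by step. It uses a shortened three-row copy of the data, and the intermediate names (vals, not_na, first_col, idx, picked) are introduced only for this illustration.

```r
# Shortened copy of the example data
df1 <- data.frame(
  Date   = c("1966/05/07", "1966/05/08", "1967/03/18"),
  A01    = c(4.87, 4.918333, NA),
  A01_CD = c(4.87, NA, 5.95)
)

vals <- df1[-1]          # drop the Date column
not_na <- !is.na(vals)   # logical matrix: TRUE where a value is present

# Column index of the first TRUE in each row; ties are resolved by
# taking the first column, which gives coalesce-like priority
first_col <- max.col(not_na, "first")

# Two-column (row, column) index matrix selects one cell per row
idx <- cbind(seq_len(nrow(df1)), first_col)
picked <- vals[idx]
print(picked)
# [1] 4.870000 4.918333 5.950000
```

If a row is missing in every column, max.col still returns the first column, so the picked value for that row is simply NA.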
data
df1 <- structure(list(Date = c("1966/05/07", "1966/05/08", "1966/05/09",
"1966/05/10", "1966/05/11", "1967/03/18"), A01 = c(4.87, 4.918333,
4.892, 4.858917, 4.842, NA), A01_CD = c(4.87, NA, 4.86, NA, NA,
5.95)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "211"))
Coalesce columns in df
coalesce() only works on "real" missing values. In your data, "N/A" is a character string, so you first need to convert it to NA.
library(dplyr)
df %>%
  # convert the "N/A" strings to real NA in every character column
  mutate(across(where(is.character), ~ na_if(.x, "N/A")),
         TotalWarning = coalesce(Primary.Warning.Vertical,
                                 Primary.Warning.Horizontal,
                                 Secondary.Sensor.Warning.Vertical,
                                 Secondary.Sensor.Warning.Horizontal))
# Primary.Warning.Vertical Primary.Warning.Horizontal Secondary.Sensor.Warning.Vertical Secondary.Sensor.Warning.Horizontal TotalWarning
# 1 <NA> 2 <NA> <NA> 2
# 2 <NA> 2 <NA> <NA> 2
# 3 <NA> 1.1 <NA> <NA> 1.1
# 4 <NA> 2 <NA> <NA> 2
# 5 <NA> 2 <NA> <NA> 2
# 6 <NA> 2 <NA> <NA> 2
# 7 <NA> 1.7 <NA> <NA> 1.7
# 8 <NA> 2 <NA> <NA> 2
# 9 <NA> 2 <NA> <NA> 2
# 10 <NA> 2 <NA> <NA> 2
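The variable names are long and tedious to type. To simplify the code, you can also do this:
df %>%
  # cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) replaces it there
  mutate(across(where(is.character), ~ na_if(.x, "N/A")),
         TotalWarning = do.call(coalesce, cur_data()))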
Coalesce two string columns with alternating missing values to one
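You may try pmax. It works here because at most one of the two columns is non-missing in each row; with real NA values, na.rm = TRUE is needed so that the NA is ignored:
df$c <- pmax(df$a, df$b, na.rm = TRUE)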
df
# a b c
# 1 dog <NA> dog
# 2 mouse <NA> mouse
# 3 <NA> cat cat
# 4 bird <NA> bird
...or ifelse:
df$c <- ifelse(is.na(df$a), df$b, df$a)
For more general solutions in cases with more than two columns, you can find several ways to implement coalesce in R here.
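As a quick check of the two approaches, the sketch below runs both on a made-up data frame with real NA values (the data is hypothetical and only mirrors the shape of the example above). It assumes character columns, not factors.

```r
# Hypothetical data: exactly one of a, b is present in each row
df <- data.frame(
  a = c("dog", "mouse", NA, "bird"),
  b = c(NA, NA, "cat", NA),
  stringsAsFactors = FALSE
)

# pmax needs na.rm = TRUE, otherwise any NA in a row yields NA
via_pmax <- pmax(df$a, df$b, na.rm = TRUE)

# ifelse takes a where present and falls back to b
via_ifelse <- ifelse(is.na(df$a), df$b, df$a)

print(via_pmax)
# [1] "dog"   "mouse" "cat"   "bird"
print(identical(via_pmax, via_ifelse))
# [1] TRUE
```

The two only agree because the missing values alternate. If both columns held a value in the same row, pmax would return whichever string sorts last, while ifelse would always prefer a.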