Conditional Merge/Replacement in R

Conditional merge/replacement, by multiple columns

I'm not sure if you would consider this 'smarter', but here is a way to do it with just one join call:

library(dplyr)

left_join(df1, df2, by = c('x1', 'x2')) %>%
  # x3.y comes from df2: use it where a match exists, else keep df1's x3.x
  mutate(x3 = if_else(is.na(x3.y), x3.x, x3.y)) %>%
  select(-x3.y, -x3.x)

  x1 x2 x3
1  1  a xx
2  1  b  b
3  2  a  c
4  2  b zz
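For comparison, here is a minimal base-R sketch of the same idea using merge(), assuming no extra packages. The df1/df2 values are hypothetical, chosen only so the result matches the output above; the question's real data is not shown in this answer. Like left_join, merge(all.x = TRUE) keeps every row of df1, but it sorts by the key columns by default.

```r
# Hypothetical inputs (the question's real df1/df2 are not shown here)
df1 <- data.frame(x1 = c(1, 1, 2, 2), x2 = c("a", "b", "a", "b"),
                  x3 = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
df2 <- data.frame(x1 = c(1, 2), x2 = c("a", "b"),
                  x3 = c("xx", "zz"), stringsAsFactors = FALSE)

# Left join on both key columns; the overlapping x3 columns become x3.x / x3.y
res <- merge(df1, df2, by = c("x1", "x2"), all.x = TRUE, suffixes = c(".x", ".y"))

# Take df2's value where a match exists, otherwise keep df1's
res$x3 <- ifelse(is.na(res$x3.y), res$x3.x, res$x3.y)
res <- res[, c("x1", "x2", "x3")]
res
```

If df2 can contain duplicate (x1, x2) keys, both this and left_join will duplicate the matching df1 rows, so deduplicate df2 first.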

Merge Row into one with condition and replace value in one row with value in the other R

A data.table option

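library(data.table)

setDT(df)[
  # step 1, per C group: if the group has a "Winter" row, copy its A and B
  # values onto the "Summer" rows; otherwise leave the group unchanged
  , c(
      lapply(
        setNames(.(A, B), c("A", "B")),
        function(x) if ("Winter" %in% D) replace(x, D == "Summer", x[D == "Winter"]) else x
      ),
      .(D = D)
    ),
  by = C
][
  # step 2: collapse each group to one row, joining distinct values with ", "
  , lapply(.SD, function(x) toString(unique(x))),
  by = C
][
  # step 3: restore the original column order
  , .SD, .SDcols = names(df)
]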

gives

   A     B        C              D
1: X apple december Winter, Summer
2: Z apple     june Winter, Summer
3: U  pear    march         Summer

Data

> dput(df)
structure(list(A = c("X", "Y", "Z", "W", "U"), B = c("apple",
"pear", "apple", "pear", "pear"), C = c("december", "december",
"june", "june", "march"), D = c("Winter", "Summer", "Winter",
"Summer", "Summer")), class = "data.frame", row.names = c(NA,
-5L))
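The same per-group logic can be sketched in base R with split() and lapply() instead of data.table. This reuses the df from the dput() above and assumes each C group has at most one "Winter" row, which the replace() call above also implicitly relies on.

```r
df <- data.frame(A = c("X", "Y", "Z", "W", "U"),
                 B = c("apple", "pear", "apple", "pear", "pear"),
                 C = c("december", "december", "june", "june", "march"),
                 D = c("Winter", "Summer", "Winter", "Summer", "Summer"),
                 stringsAsFactors = FALSE)

collapse_group <- function(g) {
  # Overwrite A and B on Summer rows with the Winter row's values (assumes one Winter row)
  if ("Winter" %in% g$D) {
    for (col in c("A", "B")) {
      g[[col]][g$D == "Summer"] <- g[[col]][g$D == "Winter"]
    }
  }
  # Collapse every column to one comma-separated string of distinct values
  as.data.frame(lapply(g, function(x) toString(unique(x))), stringsAsFactors = FALSE)
}

# Split by C, keeping groups in order of first appearance
groups <- split(df, factor(df$C, levels = unique(df$C)))
out <- do.call(rbind, lapply(groups, collapse_group))
rownames(out) <- NULL
out
```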

How to conditionally replace R data.table columns upon merge?

We can use the on-based approach (an update join):

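library(data.table)
# update join: overwrite dt1$column1 where index_column matches dt2 (i. = dt2's column)
dt1[dt2, column1 := i.column1, on = .(index_column)]
dt1
#    index_column  column1 column2
# 1:           12      dog     482
# 2:           17      cat     391
# 3:           29  penguin     567
# 4:           34 elephant     182
# 5:           46     bird     121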
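Without data.table, the same update join can be sketched in base R with match(). The dt1/dt2 contents below are hypothetical (the question's tables are not reproduced here), and the sketch assumes index_column is unique in dt2; with duplicate keys, match() simply uses the first one.

```r
# Hypothetical tables: dt1 has gaps in column1, dt2 holds replacement values
dt1 <- data.frame(index_column = c(12, 17, 29, 34, 46),
                  column1 = c("dog", NA, "penguin", NA, "bird"),
                  column2 = c(482, 391, 567, 182, 121),
                  stringsAsFactors = FALSE)
dt2 <- data.frame(index_column = c(17, 34),
                  column1 = c("cat", "elephant"),
                  stringsAsFactors = FALSE)

# Position of each dt1 key in dt2 (NA when the key is absent from dt2)
idx <- match(dt1$index_column, dt2$index_column)
hit <- !is.na(idx)

# Overwrite only the matched rows, as the update join does
dt1$column1[hit] <- dt2$column1[idx[hit]]
dt1
```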

Conditionally merge rows

Here is how we could do it:

Credit to MartinGal for the regex "(?<=[A-Z])[A-Z]+" (upvote!)

  1. Replace empty values with NA.
  2. Where X3 is NA, use lead to pull up X2 from the next row; otherwise keep X3.
  3. Keep only rows where X1 is not NA.
  4. Extract the relevant part of X3 with str_extract and the regex "(?<=[A-Z])[A-Z]+", append it to X2 with str_c, and coalesce the result with X2 so rows without a match keep their original X2.
  5. Remove the extracted string from X3 with str_remove_all, keeping only the relevant part.
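library(dplyr)
library(stringr)

df %>%
  # 1. blank/whitespace-only strings -> NA; 2. fill a missing X3 from the next row's X2
  mutate(across(everything(), ~ sub("^\\s*$", NA, .)),
         X3 = ifelse(is.na(X3), lead(X2), X3)) %>%
  # 3. keep only rows that have an X1
  filter(!is.na(X1)) %>%
  # 4. append the extracted name to X2 (coalesce keeps X2 when nothing matched)
  mutate(X2 = coalesce(str_c(X2, " ", str_extract(X3, "(?<=[A-Z])[A-Z]+")), X2),
         # 5. strip the name from X3
         X3 = str_remove_all(X3, "(?<=[A-Z])[A-Z]+"))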

Output:

   X1                  X2    X3
1 111        House M. Bab     A
2   2        House M. Cac A - C
3 121       Street M. Bak     D
4 121  House M. Aba SMITH     A
5 141 Garden Harris WHITE A - B
6 141 Villa Thomas BURNEY B - D
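The five steps can also be sketched in base R, with regexpr(..., perl = TRUE) standing in for stringr and a shifted vector standing in for dplyr::lead(). The input df is hypothetical, since the question's data is not shown; it assumes the name is glued onto a preceding capital in the continuation row (e.g. "ASMITH"), which is the pattern the regex targets.

```r
# Hypothetical input: each record is followed by a continuation row with no X1
df <- data.frame(
  X1 = c("111", "", "121", "", "141", ""),
  X2 = c("House M. Bab", "A", "House M. Aba", "ASMITH", "Garden Harris", "A - BWHITE"),
  X3 = c("", "", "", "", "", ""),
  stringsAsFactors = FALSE
)
pat <- "(?<=[A-Z])[A-Z]+"  # uppercase run preceded by an uppercase letter (needs perl = TRUE)

# 1. blank / whitespace-only strings -> NA
df[] <- lapply(df, function(x) { x[grepl("^\\s*$", x)] <- NA; x })

# 2. where X3 is NA, take X2 from the next row (base-R stand-in for lead())
next_x2 <- c(df$X2[-1], NA)
df$X3 <- ifelse(is.na(df$X3), next_x2, df$X3)

# 3. keep only rows with a non-missing X1
df <- df[!is.na(df$X1), ]

# 4. extract the name from X3 and append it to X2 when present
m <- regexpr(pat, df$X3, perl = TRUE)
len <- attr(m, "match.length")
hit <- !is.na(m) & m > 0
name <- rep(NA_character_, nrow(df))
name[hit] <- substring(df$X3[hit], m[hit], m[hit] + len[hit] - 1)
df$X2 <- ifelse(is.na(name), df$X2, paste(df$X2, name))

# 5. strip the name from X3
df$X3 <- gsub(pat, "", df$X3, perl = TRUE)
rownames(df) <- NULL
df
```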

How to merge two dataframes in R conditionally (common column, condition)

Here's how to do it with dplyr.

library(dplyr)

inner_join(X[, 1:3], Y, by = "Tab.No") %>%   # pair each survey with every window for its tablet
  mutate(in_window    = Survey.Date >= Survey.Start.Date & Survey.Date <= Survey.End.Date,
         AC.Name      = ifelse(in_window, AC.Name, NA),
         Mandal.Name  = ifelse(in_window, Mandal.Name, NA),
         Village.Name = ifelse(in_window, Village.Name, NA)) %>%
  group_by(Tab.No) %>%
  # keep in-window rows; n() == 1 also keeps a tablet whose join produced a single row
  filter(!is.na(AC.Name) | n() == 1) %>%
  select(Response.No, Tab.No, Survey.Date, AC.Name, Mandal.Name, Village.Name)

result

   Response.No Tab.No Survey.Date   AC.Name   Mandal.Name Village.Name
         (int)  (int)      (date)     (chr)         (chr)        (chr)
 1        9530      1  2015-05-26 Nandigama Chanderlapadu   Punnavalli
 2        6702      1  2015-05-30 Nandigama Chanderlapadu   Kasarabada
 3       26744      1  2015-05-31 Nandigama Chanderlapadu   Kasarabada
 4        8925      1  2015-06-03 Nandigama Chanderlapadu   Kasarabada
 5       20242      1  2015-06-04 Nandigama Chanderlapadu   Kasarabada
 6       21316      1  2015-06-04 Nandigama Chanderlapadu   Kasarabada
 7       28056      1  2015-06-04 Nandigama Chanderlapadu   Kasarabada
 8       12661      1  2015-06-05 Nandigama Chanderlapadu   Kasarabada
 9       17187      1  2015-06-05 Nandigama Chanderlapadu   Kasarabada
10       28795      1  2015-06-05 Nandigama Chanderlapadu   Kasarabada

data

X<-read.table(text="     Response.No Tab.No Survey.Date AC.Name Mandal.Name Village.Name
9530 1 2015-05-26 NA NA NA
6702 1 2015-05-30 NA NA NA
26744 1 2015-05-31 NA NA NA
8925 1 2015-06-03 NA NA NA
20242 1 2015-06-04 NA NA NA
21316 1 2015-06-04 NA NA NA
28056 1 2015-06-04 NA NA NA
12661 1 2015-06-05 NA NA NA
17187 1 2015-06-05 NA NA NA
28795 1 2015-06-05 NA NA NA
", header = TRUE, stringsAsFactors = FALSE)

Y<-read.table(text="AC.Name Mandal.Name Village.Name Tab.No Survey.Start.Date Survey.End.Date
Nandigama Chanderlapadu Punnavalli 1 2015-05-23 2015-05-27
Nandigama Chanderlapadu Kasarabada 1 2015-05-30 2015-06-07
Nandigama Chanderlapadu Kodavatikallu 1 2015-06-09 2015-06-28
Nandigama Chanderlapadu Thurlapadu 1 2015-06-29 2015-07-13
Nandigama Chanderlapadu Chanderlapadu 1 2015-07-14 2015-07-25
Nandigama Chanderlapadu Popuru 2 2015-05-23 2015-05-27
Nandigama Chanderlapadu Kandrapadu 2 2015-05-30 2015-06-08
Nandigama Chanderlapadu Vibhareethalapadu 3 2015-05-30 2015-06-04
Nandigama Chanderlapadu Eturu 3 2015-06-10 2015-06-23
Nandigama Chanderlapadu Bobbillapadu 3 2015-06-26 2015-07-03
", header = TRUE, stringsAsFactors = FALSE)

X$Survey.Date <-as.Date(X$Survey.Date)
Y$Survey.Start.Date <-as.Date(Y$Survey.Start.Date)
Y$Survey.End.Date <-as.Date(Y$Survey.End.Date)
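A base-R alternative is to look up, for each survey row, the Y window that covers its date, which avoids the many-to-many intermediate join. Unlike the dplyr answer, this sketch keeps unmatched surveys with NA names instead of dropping them. It uses a subset of the X and Y rows above plus one hypothetical survey (Response.No 99999, dated 2015-05-28) that falls between two windows.

```r
# Subset of the data above, plus one hypothetical survey (99999) in a window gap
X <- data.frame(Response.No = c(9530L, 6702L, 26744L, 28795L, 99999L),
                Tab.No = 1L,
                Survey.Date = as.Date(c("2015-05-26", "2015-05-30", "2015-05-31",
                                        "2015-06-05", "2015-05-28")))
Y <- data.frame(AC.Name = "Nandigama", Mandal.Name = "Chanderlapadu",
                Village.Name = c("Punnavalli", "Kasarabada", "Kodavatikallu"),
                Tab.No = 1L,
                Survey.Start.Date = as.Date(c("2015-05-23", "2015-05-30", "2015-06-09")),
                Survey.End.Date = as.Date(c("2015-05-27", "2015-06-07", "2015-06-28")),
                stringsAsFactors = FALSE)

# For each survey, the first Y row with the same tablet whose window covers the date
idx <- vapply(seq_len(nrow(X)), function(i) {
  hit <- which(Y$Tab.No == X$Tab.No[i] &
               Y$Survey.Start.Date <= X$Survey.Date[i] &
               X$Survey.Date[i] <= Y$Survey.End.Date)
  if (length(hit) == 0) NA_integer_ else hit[1]
}, integer(1))

# Indexing with NA yields NA, so unmatched surveys keep missing names
X$AC.Name      <- Y$AC.Name[idx]
X$Mandal.Name  <- Y$Mandal.Name[idx]
X$Village.Name <- Y$Village.Name[idx]
X
```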

Merging dataframes and replacing values with multiple conditions in R (1 0 NA)

I think you can get this with a simple pmax (parallel maximum). It most naturally works on matrices, not data frames. Using @R Schifini's data:

pmax(as.matrix(df1), as.matrix(df2), na.rm = TRUE)
#      d1 d2 d3
# [1,]  0  1  1
# [2,]  0  1  0
# [3,]  0  0  0
# [4,]  1  0 NA
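To see how na.rm = TRUE resolves each 1/0/NA combination, here is a sketch with small hypothetical data frames (R Schifini's data is not reproduced here): 1 beats 0, any observed value beats NA, and a cell stays NA only when both inputs are NA.

```r
# Hypothetical 1/0/NA inputs with matching shape and column names
df1 <- data.frame(d1 = c(0, 0, 0, NA), d2 = c(1, NA, 0, 0), d3 = c(NA, 0, 0, NA))
df2 <- data.frame(d1 = c(NA, 0, 0, 1), d2 = c(0, 1, NA, 0), d3 = c(1, NA, 0, NA))

# Element-wise maximum; NA is ignored unless both cells are NA
res <- pmax(as.matrix(df1), as.matrix(df2), na.rm = TRUE)
res
```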

