Left_Join Two Data Frames and Overwrite

Overwrite left_join dplyr to update data

You could do something along the lines of

x %>%
  left_join(y = y, by = c("name", "location")) %>%
  within(., val1.x <- ifelse(!is.na(val1.y), val1.y, val1.x)) %>%
  select(-val1.y)
# # A tibble: 6 x 5
#   name   location val1.x  val2  val3
#   <chr>     <dbl>  <dbl> <int> <int>
# 1 hans          1     10     1     1
# 2 dieter        1      2     2     2
# 3 bohlen        1      3     3     3
# 4 hans          2     10     4     4
# 5 dieter        2     10     5     5
# 6 alf           3      6     6     6

and then rename val1.x back to val1, for example with rename(val1 = val1.x).

Merge data frames and overwrite values

merdat <- merge(dfrm1,dfrm2, by="Date")  # seems self-documenting

# explanation for next line in text below.
merdat$Col2.y[ is.na(merdat$Col2.y) ] <- merdat$Col2.x[ is.na(merdat$Col2.y) ]

Then just rename 'merdat$Col2.y' to 'merdat$Col2' and drop 'merdat$Col2.x'.

In reply to a request for more comments: one way to update only part of a vector is to construct a logical vector for indexing and apply it using "[" to both sides of an assignment. Another way is to use a logical vector only on the LHS of an assignment, and then build the replacement with rep() so that it has length sum(logical.vector). The goal in both instances is for the assigned values to have the same length (and order) as the items being replaced.
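As a rough illustration of the two indexing patterns described above, here is a minimal base R sketch. The vectors old_vals, new_vals and z are made up for the example and do not come from the original question.

```r
# Pattern 1: the same logical index on both sides of the assignment
old_vals <- c(1, NA, 3, NA)
new_vals <- c(10, 20, 30, 40)
missing_idx <- is.na(old_vals)
old_vals[missing_idx] <- new_vals[missing_idx]

# Pattern 2: logical index on the LHS only, replacement built with rep()
z <- c(1, NA, 3, NA)
z[is.na(z)] <- rep(0, sum(is.na(z)))

print(old_vals)
print(z)
```

In both cases the right-hand side has exactly as many elements as there are TRUE values in the index, so nothing is recycled unexpectedly.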

Merge R data frame or data table and overwrite values of multiple columns

You can do this by using dplyr::coalesce, which returns the first non-missing value at each position across a set of vectors.
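To make the behavior of coalesce concrete, here is a small sketch on two plain vectors. The vectors a and b are invented for illustration and assume dplyr is installed.

```r
library(dplyr)

a <- c(1, NA, 3, NA)
b <- c(9, 2, NA, NA)

# At each position, take the value from a unless it is NA, then fall back to b
filled <- coalesce(a, b)
print(filled)
```

A position stays NA only when every input is NA there.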

(EDIT: you can use dplyr::coalesce directly on the data frames also, no need to create the function below. Left it there just for completeness, as a record of the original answer.)

Credit where it's due: this code is mostly from this blog post. It builds a function that takes two data frames and does what you need (taking values from the x data frame if they are present).

coalesce_join <- function(x,
                          y,
                          by,
                          suffix = c(".x", ".y"),
                          join = dplyr::full_join, ...) {
  joined <- join(x, y, by = by, suffix = suffix, ...)
  # names of desired output
  cols <- union(names(x), names(y))

  to_coalesce <- names(joined)[!names(joined) %in% cols]
  suffix_used <- suffix[ifelse(endsWith(to_coalesce, suffix[1]), 1, 2)]
  # remove suffixes and deduplicate
  to_coalesce <- unique(substr(
    to_coalesce,
    1,
    nchar(to_coalesce) - nchar(suffix_used)
  ))

  coalesced <- purrr::map_dfc(to_coalesce, ~dplyr::coalesce(
    joined[[paste0(.x, suffix[1])]],
    joined[[paste0(.x, suffix[2])]]
  ))
  names(coalesced) <- to_coalesce

  dplyr::bind_cols(joined, coalesced)[cols]
}

join data frames and replace one column with another

There are a few cases:

If you always want the value from correctID, just drop the ID column from df.full first:

df.full %>%
  select(-ID) %>%
  left_join(correctID, by = "value")

If correctID isn't complete, and you only want to use it when present:

df.full %>%
  left_join(correctID, by = "value") %>%
  mutate(ID = coalesce(ID.y, ID.x)) %>%
  select(-ID.y, -ID.x)

You can, of course, reverse that in the opposite case (only want to use correctID when df.full$ID is missing).
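A minimal sketch of that reversed case might look like the following. The contents of df.full and correctID here are invented for illustration, since the original data are not shown.

```r
library(dplyr)

# Hypothetical data: df.full has a gap in ID, correctID covers only some values
df.full <- tibble(value = c("a", "b", "c"), ID = c(1L, NA, 3L))
correctID <- tibble(value = c("a", "b"), ID = c(100L, 200L))

result <- df.full %>%
  left_join(correctID, by = "value") %>%
  # Prefer the existing ID (ID.x); use correctID (ID.y) only where it is missing
  mutate(ID = coalesce(ID.x, ID.y)) %>%
  select(-ID.x, -ID.y)

print(result)
```

Only the argument order inside coalesce changes: whichever column comes first wins wherever it is not NA.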

Overwrite values from selected columns and matching rows from one data frame into another, R

The experimental function rows_update (introduced in dplyr version 1.0) does what you want nicely:

rows_update(
  df1,
  ## use only the columns from df2 that you want to update
  ## plus the joining column
  select(df2, field_name, tgp, ends_with("4")),
  by = "field_name"
)
#   field_name A3 A4 B3 B4  tgp
# 1          a 23 56 35 13 1154
# 2          b 35 11 64 64 1200
# 3          c  2 67 87 22  758
# 4          d  5 16 70 12  900

See ?rows_update for details. There is also rows_insert which adds new rows, rows_upsert which adds new rows and updates existing rows, and a couple other options.
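For a self-contained illustration, here is a small sketch with made-up data (the column values below are not the ones from the question). It assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical tables: df2 holds corrections for rows "a" and "c" only
df1 <- tibble(field_name = c("a", "b", "c"),
              A4 = c(1, 2, 3),
              tgp = c(10, 20, 30))
df2 <- tibble(field_name = c("a", "c"),
              A4 = c(100, 300),
              tgp = c(11, 33),
              extra = c(0, 0))

# Pass only the key plus the columns to overwrite; 'extra' is dropped first
updated <- rows_update(df1, select(df2, field_name, A4, tgp), by = "field_name")
print(updated)
```

Rows of df1 with no matching key in df2 (here "b") are left unchanged.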

Join two dataframes and overwrite matching rows [R]

I'd just use %in% to test for Names in b that aren't present in a, and then only rbind() those rows onto a.

rbind(a, b[!b$Name %in% a$Name,])
#    Name Value
# 1   Foo     1
# 2   Moo     2
# 3   Boo     3
# 21  Bar    12
# 31  Bat    13

Combine dataframes and overwrite values in table 1 with all values in table 2

You can remove the rows of x whose SN values also appear in y, then row-bind the two data frames.

rbind(x[!x$SN %in% y$SN,], y)

  SN Age Name
1  1  21 John
2  2  15 Dora
3  3  44 <NA>
4  4 100    B
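The same idea on a small, made-up pair of data frames (the values are hypothetical and chosen so that one SN overlaps) could be sketched in base R as:

```r
# Hypothetical data: SN 3 appears in both tables, SN 4 only in y
x <- data.frame(SN = c(1, 2, 3), Age = c(21, 15, 44), Name = c("John", "Dora", NA))
y <- data.frame(SN = c(3, 4), Age = c(50, 100), Name = c("C", "B"))

# Keep the rows of x that y does not cover, then append all of y
combined <- rbind(x[!x$SN %in% y$SN, ], y)
print(combined)
```

Because every row of y is appended, any overlapping key ends up with y's values in all columns, including NAs that y may contain.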

Can I replace NAs when joining two data frames with dplyr?

coalesce might be something you need. It fills the NA from the first vector with values from the second vector at corresponding positions:

library(dplyr)
df1 %>%
  left_join(df2, by = "fruit") %>%
  mutate(var2 = coalesce(var2.x, var2.y)) %>%
  select(-var2.x, -var2.y)

#     fruit var1 var3 var2
# 1  apples    1   NA    3
# 2 oranges    2    7    5
# 3 bananas    3   NA    6
# 4  grapes    4    8    6

Or use data.table, which does in-place replacing:

library(data.table)
setDT(df1)[setDT(df2), on = "fruit", `:=` (var2 = i.var2, var3 = i.var3)]
df1
#      fruit var1 var2 var3
# 1:  apples    1    3   NA
# 2: oranges    2    5    7
# 3: bananas    3    6   NA
# 4:  grapes    4    6    8
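A stripped-down sketch of the same update join, using invented tables so it can be run on its own (and assuming the data.table package is installed), might look like this:

```r
library(data.table)

# Hypothetical data: df2 supplies a var2 value for "oranges" only;
# "grapes" has no match in df1 and is therefore ignored
df1 <- data.table(fruit = c("apples", "oranges", "bananas"),
                  var1 = 1:3,
                  var2 = c(3, NA, 6))
df2 <- data.table(fruit = c("oranges", "grapes"), var2 = c(5, 9))

# Update join: for rows of df1 matched by df2, overwrite var2 by reference
df1[df2, on = "fruit", var2 := i.var2]
print(df1)
```

Note that := modifies df1 in place, so there is no need to assign the result back, and unmatched rows of df1 keep their original values.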

