Overwrite left_join dplyr to update data
You could do something along the lines of
library(dplyr)
x %>%
  left_join(y = y, by = c("name", "location")) %>%
  within(., val1.x <- ifelse(!is.na(val1.y), val1.y, val1.x)) %>%
  select(-val1.y)
# # A tibble: 6 x 5
#   name   location val1.x  val2  val3
#   <chr>     <dbl>  <dbl> <int> <int>
# 1 hans          1     10     1     1
# 2 dieter        1      2     2     2
# 3 bohlen        1      3     3     3
# 4 hans          2     10     4     4
# 5 dieter        2     10     5     5
# 6 alf           3      6     6     6
and then rename val1.x to val1.
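A runnable sketch of the whole update, including the rename. The x and y tables are hypothetical, chosen only so the result matches the printout above. It swaps within() and ifelse() for mutate() with coalesce(), which gives the same values here because coalesce(val1.y, val1.x) takes val1.y wherever it is not NA.

```r
library(dplyr)

# Hypothetical data shaped to reproduce the printed output above
x <- tibble(
  name = c("hans", "dieter", "bohlen", "hans", "dieter", "alf"),
  location = c(1, 1, 1, 2, 2, 3),
  val1 = c(1, 2, 3, 4, 5, 6),
  val2 = 1:6,
  val3 = 1:6
)
y <- tibble(
  name = c("hans", "hans", "dieter"),
  location = c(1, 2, 2),
  val1 = c(10, 10, 10)
)

updated <- x %>%
  left_join(y, by = c("name", "location")) %>%
  # take y's value where a row matched, otherwise keep x's value
  mutate(val1.x = coalesce(val1.y, val1.x)) %>%
  select(-val1.y) %>%
  rename(val1 = val1.x)
```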
Merge data frames and overwrite values
merdat <- merge(dfrm1,dfrm2, by="Date") # seems self-documenting
# explanation for next line in text below.
merdat$Col2.y[ is.na(merdat$Col2.y) ] <- merdat$Col2.x[ is.na(merdat$Col2.y) ]
Then just rename 'merdat$Col2.y' to 'merdat$Col2' and drop 'merdat$Col2.x'.
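The merge, fill, rename and drop steps together, as a minimal base R sketch. dfrm1 and dfrm2 here are hypothetical stand-ins in which dfrm2$Col2 has gaps that dfrm1$Col2 should fill.

```r
# Hypothetical inputs: dfrm2$Col2 has NAs that dfrm1$Col2 should fill
dfrm1 <- data.frame(Date = as.Date("2020-01-01") + 0:3, Col2 = c(1, 2, 3, 4))
dfrm2 <- data.frame(Date = as.Date("2020-01-01") + 0:3, Col2 = c(10, NA, 30, NA))

# merge() suffixes the shared non-key column as Col2.x / Col2.y
merdat <- merge(dfrm1, dfrm2, by = "Date")
fill <- is.na(merdat$Col2.y)
merdat$Col2.y[fill] <- merdat$Col2.x[fill]

# rename Col2.y to Col2 and drop Col2.x
names(merdat)[names(merdat) == "Col2.y"] <- "Col2"
merdat$Col2.x <- NULL
```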
In reply to a request for more comments: one way to update only part of a vector is to construct a logical vector for indexing and apply it with "[" on both sides of an assignment. Another way is to use the logical vector only on the LHS of the assignment and build the RHS with rep() so that it has the same length as sum(logical.vector). In both cases the goal is for the RHS to have the same length (and order) as the items being replaced.
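Both indexing styles on a small vector, as a sketch; v, donor and need are hypothetical names used only for illustration.

```r
v <- c(5, NA, 7, NA)
donor <- c(50, 60, 70, 80)
need <- is.na(v)

# 1) the same logical index on both sides keeps lengths and order aligned
v1 <- v
v1[need] <- donor[need]

# 2) logical index on the LHS only; the RHS is built with rep()
#    so that its length equals sum(need)
v2 <- v
v2[need] <- rep(0, sum(need))
```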
Merge R data frame or data table and overwrite values of multiple columns
You can do this by using dplyr::coalesce, which returns, position by position, the first non-missing value from its vectors.
(EDIT: you can also use dplyr::coalesce directly on the data frames; there is no need to create the function below. I left it there for completeness, as a record of the original answer.)
Credit where it's due: this code is mostly from this blog post. It builds a function that takes two data frames and does what you need (taking values from the x data frame where they are present).
coalesce_join <- function(x,
                          y,
                          by,
                          suffix = c(".x", ".y"),
                          join = dplyr::full_join, ...) {
  joined <- join(x, y, by = by, suffix = suffix, ...)
  # names of desired output
  cols <- union(names(x), names(y))
  # columns the join suffixed because they exist in both x and y
  to_coalesce <- names(joined)[!names(joined) %in% cols]
  suffix_used <- suffix[ifelse(endsWith(to_coalesce, suffix[1]), 1, 2)]
  # remove suffixes and deduplicate
  to_coalesce <- unique(substr(
    to_coalesce,
    1,
    nchar(to_coalesce) - nchar(suffix_used)
  ))
  # for each shared column, prefer the x value and fall back to the y value
  coalesced <- purrr::map_dfc(to_coalesce, ~dplyr::coalesce(
    joined[[paste0(.x, suffix[1])]],
    joined[[paste0(.x, suffix[2])]]
  ))
  names(coalesced) <- to_coalesce
  # append the coalesced columns, then keep only the desired output columns
  dplyr::bind_cols(joined, coalesced)[cols]
}
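The core idea of coalesce_join, reduced to a plain loop over the shared columns, as a minimal sketch. It assumes a single hypothetical key column id and hypothetical x and y tables; x is preferred and y only fills its gaps.

```r
library(dplyr)

# Hypothetical inputs: x is preferred, y fills its gaps
x <- data.frame(id = 1:3, a = c(1, NA, 3), b = c(NA, 5, NA))
y <- data.frame(id = 1:3, a = c(10, 20, 30), b = c(40, 50, 60))

joined <- full_join(x, y, by = "id", suffix = c(".x", ".y"))
shared <- setdiff(intersect(names(x), names(y)), "id")
for (col in shared) {
  # first non-missing value, preferring x
  joined[[col]] <- coalesce(joined[[paste0(col, ".x")]], joined[[paste0(col, ".y")]])
}
# keep only the unsuffixed output columns
result <- joined[union(names(x), names(y))]
```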
join data frames and replace one column with another
There are a few cases:
If you always want the value from correctID, just drop the ID column from df.full first:
df.full %>%
select(-ID) %>%
left_join(correctID, by = "value")
If correctID isn't complete, and you only want to use it when present:
df.full %>%
left_join(correctID, by = "value") %>%
mutate(ID = coalesce(ID.y, ID.x)) %>%
select(-ID.y, -ID.x)
You can, of course, reverse that in the opposite case (when you only want to use correctID where df.full$ID is missing).
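The reversed case as a sketch with hypothetical df.full and correctID tables: swapping the arguments to coalesce() makes df.full's own ID win, with correctID used only where that ID is NA.

```r
library(dplyr)

# Hypothetical data: df.full has a missing ID; correctID maps value -> ID
df.full <- data.frame(value = c("a", "b", "c"), ID = c(1, NA, 3))
correctID <- data.frame(value = c("a", "b"), ID = c(100, 200))

# prefer df.full$ID; fall back to correctID only where it is missing
result <- df.full %>%
  left_join(correctID, by = "value") %>%
  mutate(ID = coalesce(ID.x, ID.y)) %>%
  select(-ID.x, -ID.y)
```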
Overwrite values from selected columns and matching rows from one data frame into another, R
The experimental function rows_update (introduced in dplyr version 1.0) does what you want nicely:
rows_update(
df1,
## use only the columns from df2 that you want to update
## plus the joining column
select(df2, field_name, tgp, ends_with("4")),
by = "field_name"
)
#   field_name A3 A4 B3 B4  tgp
# 1          a 23 56 35 13 1154
# 2          b 35 11 64 64 1200
# 3          c  2 67 87 22  758
# 4          d  5 16 70 12  900
See ?rows_update for details. There is also rows_insert, which adds new rows, rows_upsert, which adds new rows and updates existing ones, and a couple of other options.
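A small sketch contrasting rows_update and rows_upsert, using hypothetical df1 and df2 tables keyed by field_name. Because "c" is not in df1, rows_update() on the full df2 would raise an error, so it is given only the matching rows; rows_upsert() updates "b" and appends "c".

```r
library(dplyr)

# Hypothetical tables keyed by field_name
df1 <- tibble(field_name = c("a", "b"), A4 = c(1, 2), tgp = c(100, 200))
df2 <- tibble(field_name = c("b", "c"), A4 = c(20, 30), tgp = c(250, 300))

# rows_update(): only keys already present in df1 may be passed
updated <- rows_update(df1, filter(df2, field_name %in% df1$field_name),
                       by = "field_name")

# rows_upsert(): updates existing keys and appends new ones
upserted <- rows_upsert(df1, df2, by = "field_name")
```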
Join two dataframes and overwrite matching rows [R]
I'd just use %in% to test for Names in b that aren't present in a, and then rbind() only those rows onto a.
rbind(a, b[!b$Name %in% a$Name,])
#    Name Value
# 1   Foo     1
# 2   Moo     2
# 3   Boo     3
# 21  Bar    12
# 31  Bat    13
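To try the one-liner, here are hypothetical a and b tables chosen so the result matches the printout above; the Value of 11 for b's Foo row is made up and is discarded because Foo already exists in a.

```r
# Hypothetical inputs; rows of a take precedence over rows of b
a <- data.frame(Name = c("Foo", "Moo", "Boo"), Value = c(1, 2, 3))
b <- data.frame(Name = c("Foo", "Bar", "Bat"), Value = c(11, 12, 13))

# keep all of a, plus only the rows of b whose Name is new
out <- rbind(a, b[!b$Name %in% a$Name, ])
```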
Combine dataframes and overwrite values in table 1 with all values in table 2
You can remove the rows of x whose SN values also appear in y, then row-bind the two data frames.
rbind(x[!x$SN %in% y$SN,], y)
#   SN Age Name
# 1  1  21 John
# 2  2  15 Dora
# 3  3  44 <NA>
# 4  4 100    B
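The same idea in dplyr, swapping the base R subset for anti_join() followed by bind_rows(). The x and y tables are hypothetical, built so that y's row for SN 3 replaces x's.

```r
library(dplyr)

# Hypothetical tables; y's rows replace x's rows with the same SN
x <- data.frame(SN = c(1, 2, 3), Age = c(21, 15, 40), Name = c("John", "Dora", "Ann"))
y <- data.frame(SN = c(3, 4), Age = c(44, 100), Name = c(NA, "B"))

# drop x rows whose SN appears in y, then append all of y
res <- bind_rows(anti_join(x, y, by = "SN"), y)
```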
Can I replace NAs when joining two data frames with dplyr?
coalesce might be what you need. It fills the NAs in the first vector with the values from the second vector at the corresponding positions:
library(dplyr)
df1 %>%
left_join(df2, by = "fruit") %>%
mutate(var2 = coalesce(var2.x, var2.y)) %>%
select(-var2.x, -var2.y)
#     fruit var1 var3 var2
# 1  apples    1   NA    3
# 2 oranges    2    7    5
# 3 bananas    3   NA    6
# 4  grapes    4    8    6
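How coalesce() works on plain vectors, as a tiny sketch: at each position it returns the first non-missing value among its arguments, and it accepts more than two vectors.

```r
library(dplyr)

# position 1 comes from the 2nd vector, position 2 from the 1st,
# position 3 from the 3rd (the first two are NA there)
filled <- coalesce(c(NA, 2, NA), c(1, 20, NA), c(0, 0, 0))
```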
Or use data.table, which does in-place replacing:
library(data.table)
setDT(df1)[setDT(df2), on = "fruit", `:=` (var2 = i.var2, var3 = i.var3)]
df1
#      fruit var1 var2 var3
# 1:  apples    1    3   NA
# 2: oranges    2    5    7
# 3: bananas    3    6   NA
# 4:  grapes    4    6    8
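The data.table update above overwrites var2 on every matched row. If you only want to fill the missing values, a sketch like the following combines the join-update with data.table's fcoalesce(); it assumes a data.table version that provides fcoalesce(), and the df1/df2 contents are hypothetical.

```r
library(data.table)

# Hypothetical inputs: df1 has gaps in var2; df2 supplies values
df1 <- data.table(fruit = c("apples", "oranges", "grapes"), var2 = c(NA, 5, NA))
df2 <- data.table(fruit = c("apples", "grapes"), var2 = c(3, 6))

# on matched rows, var2 is df1's column and i.var2 is df2's;
# existing var2 values are kept, only NAs are filled
df1[df2, on = "fruit", var2 := fcoalesce(var2, i.var2)]
```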