Update/Replace Values in Dataframe with Tidyverse Join

Picking up Alistaire's and Nettle's suggestions and transforming into a working solution

df1 %>% 
  left_join(lookup_df, by = "state_abbrev") %>% 
  mutate(state_name = coalesce(state_name.x, state_name.y)) %>% 
  select(-state_name.x, -state_name.y)

# A tibble: 10 x 3
   state_abbrev value state_name 
   <chr>        <int> <chr>      
 1 AL             671 Alabama    
 2 AK             501 Alaska     
 3 AZ            1030 Arizona    
 4 AR             694 Arkansas   
 5 CA             881 California 
 6 CO             821 Colorado   
 7 CT             742 Connecticut
 8 DE             665 Delaware   
 9 FL             948 Florida    
10 GA             790 Georgia

The OP has stated to prefer a "tidyverse" solution. However, update joins are already available with the data.table package:

library(data.table)
setDT(df1)[setDT(lookup_df), on = "state_abbrev", state_name := i.state_name]
df1

    state_abbrev  state_name value
 1:           AL     Alabama  1103
 2:           AK      Alaska  1036
 3:           AZ     Arizona   811
 4:           AR    Arkansas   604
 5:           CA  California   868
 6:           CO    Colorado  1129
 7:           CT Connecticut   819
 8:           DE    Delaware  1194
 9:           FL     Florida   888
10:           GA     Georgia   501

Benchmark

library(bench)
bm <- press(
  na_share = c(0.1, 0.5, 0.9),
  n_row = length(state.abb) * 2 * c(1, 100, 10000),
  {
    n_na <- na_share * length(state.abb)
    set.seed(1)
    na_idx <- sample(length(state.abb), n_na)
    tmp <- data.table(state_abbrev = state.abb, state_name = state.name)
    lookup_df <-tmp[na_idx] 
    tmp[na_idx, state_name := NA]
    df0 <- as_tibble(tmp[sample(length(state.abb), n_row, TRUE)])
    mark(
      dplyr = {
        df1 <- copy(df0)
        df1 <- df1 %>% 
          left_join(lookup_df, by = "state_abbrev") %>% 
          mutate(state_name = coalesce(state_name.x, state_name.y)) %>% 
          select(-state_name.x, -state_name.y)
        df1
      },
      upd_join = {
        df1 <- copy(df0)
        setDT(df1)[setDT(lookup_df), on = "state_abbrev", state_name := i.state_name]
        df1
      }
    )
  }
)
ggplot2::autoplot(bm)

Sample Image

data.table's upate join is always faster (note the log time scale).

As the update join modifies the data object, a fresh copy is used for each benchmark run.

Replace a subset of a data frame with dplyr join operations

What you describe is a join operation in which you update some values in the original dataset. This is very easy to do with great performance using data.table because of its fast joins and update-by-reference concept (:=).

Here's an example for your toy data:

library(data.table)
setDT(df)             # convert to data.table without copy
setDT(sub_df)         # convert to data.table without copy

# join and update "df" by reference, i.e. without copy 
df[sub_df, on = c("id", "animal"), weight := i.weight]

The data is now updated:

#   id animal weight
#1:  1    dog   23.0
#2:  2    cat    2.2
#3:  3   duck    1.2
#4:  4  fairy    0.2
#5:  5  snake    1.3

You can use setDF to switch back to ordinary data.frame.

Overwrite a specific value in a dataframe, based on matching values

Using mutate and if_else:

library(tidyverse)

df %>% 
mutate(value = if_else(country %in% df2$country & year %in% df2$year, df2$value, value))

Results in:

country year value
1     AUS 2019   100
2     USA 2019   120
3     AUS 2018   500

Overwrite left_join dplyr to update data

You could do something along the lines of

> x %>%
    left_join(y = y, by = c("name", "location")) %>%
    within(., val1.x <- ifelse(!is.na(val1.y), val1.y, val1.x)) %>%
    select(-val1.y)
# # A tibble: 6 x 5
#   name   location val1.x  val2  val3
#   <chr>     <dbl>  <dbl> <int> <int>
# 1 hans          1     10     1     1
# 2 dieter        1      2     2     2
# 3 bohlen        1      3     3     3
# 4 hans          2     10     4     4
# 5 dieter        2     10     5     5
# 6 alf           3      6     6     6

and then rename val1.x.

R::dplyr combine tibbles on matching columns and replace NA values

library(tidyverse)
x <- tibble(name = c("hans", "dieter", "bohlen", "alf"), 
            location = c(NA,NA,1,2), 
            val1 = 1:4)
y <- tibble(name = c("hans", "dieter"), 
            location = c(2,2), 
            val1 = 1:2)


out <- tibble(name = c("hans", "dieter", "bohlen", "alf"), 
              location = c(2,2,1,2), 
              val1 = 1:4)

rows_update(x = x, y = y, by = "name")
#> # A tibble: 4 x 3
#>   name   location  val1
#>   <chr>     <dbl> <int>
#> 1 hans          2     1
#> 2 dieter        2     2
#> 3 bohlen        1     3
#> 4 alf           2     4

^{Created on 2021-09-28 by the reprex package (v2.0.1)}

replace values throughout a tibble

You can use mutate_all with replace:

df = data.frame(x = c(1.2, 0.4, NA, 0.6), y = c(NA, 0.3, 0.992, 0.5))
df %>% mutate_all(~ replace(., . > 0.99 | is.na(.), 0))
#    x   y
#1 0.0 0.0
#2 0.4 0.3
#3 0.0 0.0
#4 0.6 0.5

Or use funs:

df %>% mutate_all(funs(replace(., . > 0.99 | is.na(.), 0)))

If you just need to integrate this as part of the pipe, replace method should work:

df %>% replace(. > 0.99 | is.na(.), 0)

Replace a part of dataframe with new data

In my opinion, the data.table-package is better suited for such a task. Using:

# create a vector with names from 'data2' that are not used to join by
nms <- names(data2)[-1]

# load the 'data.table'-package
library(data.table)

# convert the dataframes to data,table's
setDT(data1)
setDT(data2)

# join and update the column in 'data1' with the matching values from 'data2'
data1[data2, on = 'date', (nms) := mget(paste0('i.',nms))][]

gives:

          date  a  b
 1: 2017-11-02 21 11
 2: 2017-11-03 22 12
 3: 2017-11-04 23 13
 4: 2017-11-05 24 14
 5: 2017-11-06 25 15
 6: 2017-11-07 26 16
 7: 2017-11-08 27 17
 8: 2017-11-09 28 18
 9: 2017-11-10  9 19
10: 2017-11-11 10 20

What this does:

With setDT(data1) you convert the dataframes/tibbles to a data.table.
With data1[data2, on = 'date'] you can do a join the data.table-way.
By adding (nms) := mget(paste0('i.',nms)) to the join, you tell data.table to update the columns in data1 with the columns that are also present in data2 only where the dates match.

As an alternative approach you could also reshape both datasets into long format and then do the join:

library(data.table)
melt(data1, id = 'date')[melt(data2, id = 'date')
                         , on = .(date, variable)
                         , value := i.value
                         ][, dcast(.SD, date ~ variable)]

A translation of this approach to the tidyverse:

library(dplyr)
library(tidyr)

gather(data1, key, value, -1) %>% 
  left_join(., gather(data2, key, value, -1), by = c('date','key')) %>% 
  mutate(value.x = ifelse(!is.na(value.y), value.y, value.x)) %>% 
  select(date, key, value = value.x) %>% 
  spread(key, value)

Both will give you the same output.

Used data:

data1 <- data.frame(date = as.Date("2017-11-1") + c(1:10), a = 1:10, b = 11:20)
data2 <- data.frame(date = as.Date("2017-11-1") + c(1:8), a = 21:28)

R: Updating a data frame with another data frame

merge then aggregate:

aggregate(. ~ Index, data=merge(df1, df2, all=TRUE), na.omit, na.action=na.pass )

#  Index B C A
#1     1 1 1 1
#2     2 2 2 2
#3     3 3 3 3
#4     4 4 5 4
#5     5 4 5 5
#6     6 4 5 6

Or in dplyr speak:

df1 %>% 
    full_join(df2) %>%
    group_by(Index) %>%
    summarise_each(funs(na.omit))

#Joining by: c("Index", "B", "C")
#Source: local data frame [6 x 4]
#
#  Index     A     B     C
#  (dbl) (int) (dbl) (dbl)
#1     1     1     1     1
#2     2     2     2     2
#3     3     3     3     3
#4     4     4     4     5
#5     5     5     4     5
#6     6     6     4     5

Update/Replace Values in Dataframe with Tidyverse Join