Replace Values in Data Frame Based on Other Data Frame in R

Replace values in data frame based on other data frame in R

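Use match() to find the position of each user name in userids$USER, then use those positions to pull the corresponding ID: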

userdata$ID <- userids$ID[match(userdata$ID, userids$USER)]
userdata$FRIENDID <- userids$ID[match(userdata$FRIENDID, userids$USER)]
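To try this out, here is a minimal sketch with made-up data. The contents of userids and userdata below are assumptions for illustration, since the original question's data is not shown. Names without an entry in the lookup table come back as NA.

```r
# Hypothetical lookup table: one row per user name with its numeric ID
userids <- data.frame(USER = c("alice", "bob", "carol"),
                      ID   = c(101, 102, 103),
                      stringsAsFactors = FALSE)

# Hypothetical data whose ID and FRIENDID columns still hold user names
userdata <- data.frame(ID       = c("alice", "bob", "carol"),
                       FRIENDID = c("bob", "carol", "dave"),
                       stringsAsFactors = FALSE)

# match() returns the row position in userids for each name (NA if absent)
userdata$ID       <- userids$ID[match(userdata$ID, userids$USER)]
userdata$FRIENDID <- userids$ID[match(userdata$FRIENDID, userids$USER)]

print(userdata)  # "dave" is not in userids, so that FRIENDID becomes NA
```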

Replace values in a dataframe by values of other dataframe

Call the first sample data frame old_df and the second new_df. It sounds like you want to update rows in new_df with values from old_df, while retaining all non-matching rows in new_df:

library(dplyr)
new_df %>% rows_update(old_df, by = "ID")

Gives:

# A tibble: 9 x 5
     ID a         b c         d
  <dbl> <chr> <dbl> <chr> <dbl>
1     1 hi        1 ri        2
2     2 ho        1 ro        2
3     3 NA       NA NA       NA
4     4 hu        1 ru        2
5     5 ha        1 NA       NA
6     6 NA       NA NA       NA
7     7 he        1 re        2
8    10 hii       1 NA       NA
9    11 hoo       1 roo       2
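Since the question's data is not reproduced here, the following is a small self-contained sketch of rows_update() on invented tibbles (the column values are assumptions). Note that every key in old_df has to exist in new_df, otherwise rows_update() raises an error by default.

```r
library(dplyr)

# Hypothetical data: new_df is the table to update, old_df holds replacement rows
new_df <- tibble(ID = c(1, 2, 3),
                 a  = c("x", "y", "z"),
                 b  = c(10, 20, 30))
old_df <- tibble(ID = c(1, 3),
                 a  = c("hi", "ho"),
                 b  = c(1, 2))

# Rows of new_df whose ID appears in old_df are overwritten; ID 2 is kept as-is
updated <- new_df %>% rows_update(old_df, by = "ID")
print(updated)
```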

R How can I replace values with values of another dataframe?

Sample data:

df = data.frame(a = c(1,1,2,3,3,3), b = rep('val1', 6), c = rep('val2', 6))
df

#   a    b    c
# 1 1 val1 val2
# 2 1 val1 val2
# 3 2 val1 val2
# 4 3 val1 val2
# 5 3 val1 val2
# 6 3 val1 val2

Using dplyr's recode(), you can achieve this:

library(dplyr)
df %>% mutate(a = recode(a, '1' = 'cat', '2' = 'dog', '3' = 'rabbit'))

#        a    b    c
# 1    cat val1 val2
# 2    cat val1 val2
# 3    dog val1 val2
# 4 rabbit val1 val2
# 5 rabbit val1 val2
# 6 rabbit val1 val2
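The recode() call hard-codes the mapping. If the replacement values live in another data frame, one possible alternative is a named-vector lookup in base R. This is a sketch, not part of the original answer; the key data frame and its code and label columns are hypothetical names.

```r
df <- data.frame(a = c(1, 1, 2, 3, 3, 3),
                 b = rep("val1", 6),
                 c = rep("val2", 6),
                 stringsAsFactors = FALSE)

# Hypothetical second data frame holding the old code and its replacement label
key <- data.frame(code  = c(1, 2, 3),
                  label = c("cat", "dog", "rabbit"),
                  stringsAsFactors = FALSE)

# Build a named vector (names are the codes as strings) and index it by name
lookup <- setNames(key$label, key$code)
df$a <- unname(lookup[as.character(df$a)])

print(df)
```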

Replace values in one column based on part of text in another dataframe in R

This seems to be a case for regex_left_join from the fuzzyjoin package. After the regex_left_join, coalesce the columns together so that the first non-NA element is returned for each row:

library(fuzzyjoin)
library(dplyr)
regex_left_join(df1, df2, by = 'Supplier') %>%
  transmute(Supplier = coalesce(New_Supplier, Supplier.x), Value)

Output:

   Supplier Value
1       AAA   100
2       Red   200
3       Red   300
4       DDD   400
5      Blue   200
6      Blue   100
7     Green   200
8       HHH    40
9       III   150
10      JJJ    70
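For readers without fuzzyjoin installed, the same idea can be approximated in base R by looping over the patterns with grepl(). This is only a sketch on invented data (the supplier names below are assumptions), and it differs from regex_left_join in one respect: if several patterns match the same row, the last one wins instead of producing duplicate rows.

```r
# Hypothetical data: df1 has free-text supplier names, df2 maps a pattern to a clean name
df1 <- data.frame(Supplier = c("AAA", "Red Ltd", "Big Blue Co", "DDD"),
                  Value    = c(100, 200, 300, 400),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Supplier     = c("Red", "Blue"),
                  New_Supplier = c("Red", "Blue"),
                  stringsAsFactors = FALSE)

result <- df1
for (i in seq_len(nrow(df2))) {
  # Rows of df1 whose original supplier text matches the i-th pattern
  hit <- grepl(df2$Supplier[i], df1$Supplier)
  result$Supplier[hit] <- df2$New_Supplier[i]
}

print(result)  # non-matching suppliers keep their original name
```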

Replace Dataframe column with another dataframe based on conditions - R

One approach is to join the two data frames and then overwrite VALUE1 only where the condition holds:

library(dplyr)

df1 %>%
  left_join(df2, by = c("ID1", "ID2")) %>%
  mutate(VALUE1.x = ifelse(ID1 == 5 & ID2 < 100, VALUE1.y, VALUE1.x)) %>%
  select(-VALUE1.y) %>%
  rename_with(~ sub("\\.x$", "", .), ends_with(".x"))

   ID1 ID2 VALUE1    NAME   SURNAME
1    1  10    100    Juan     perez
2    2  20    200 Rodrigo     jones
3    3  30    300   Pedro       bla
4    4  40    400   Lucas     lopez
5    5  50     40       d  martinez
6    5 150    100       e rodriguez
7    5 200    200       f     jerez
8    4  99     40       g   dieguez
9    3  10    150       x   gimenez
10   5  25    200       a    mendez
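The same conditional replacement can be sketched in base R by matching on a combined key. The data below is invented for illustration, and the sketch deliberately differs from the ifelse() version in one way: rows that satisfy the condition but have no partner in df2 keep their original VALUE1 instead of becoming NA.

```r
# Hypothetical data with the same key columns as in the answer above
df1 <- data.frame(ID1 = c(1, 5, 5, 5),
                  ID2 = c(10, 50, 150, 25),
                  VALUE1 = c(100, 500, 600, 700))
df2 <- data.frame(ID1 = c(5, 5, 5),
                  ID2 = c(50, 150, 25),
                  VALUE1 = c(40, 100, 200))

# Position of each df1 row's (ID1, ID2) pair in df2, NA when there is no match
idx <- match(paste(df1$ID1, df1$ID2), paste(df2$ID1, df2$ID2))

# Replace only where the condition holds and a matching row exists
cond <- df1$ID1 == 5 & df1$ID2 < 100 & !is.na(idx)
df1$VALUE1[cond] <- df2$VALUE1[idx[cond]]

print(df1)
```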

Replace all values in dataframe using another dataframe as key in R

One option is to match each element against the 'Cell_ID' column of the second dataset and use the resulting positions as an index to return the corresponding 'value' from 'df2':

library(dplyr)
df1 %>%
  mutate(across(everything(), ~ df2$value[match(., df2$Cell_ID)]))

Output:

#  Cell_ID n_1 n_2  n_3 n_4 n_5 n_6  n_7
#1     700   5 900 1000  NA  NA  NA   NA
#2     200   5 100  400 500 700 900 1000
#3     300   5 400  500  NA  NA  NA   NA
#4    1000   5 100  200 400 600 800  300

Another option is to turn df2 into a named vector with deframe() and index it by name:

library(tibble)
df1 %>%
  mutate(across(everything(), ~ deframe(df2)[as.character(.)]))

The base R equivalent is:

df1[] <- lapply(df1, function(x) df2$value[match(x, df2$Cell_ID)])
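As a quick check of the base R version, here is a runnable sketch on a tiny made-up key and data frame (the values are assumptions, not the question's data). Assigning into df1[] keeps the data frame structure while replacing every column.

```r
# Hypothetical key: each Cell_ID maps to a value
df2 <- data.frame(Cell_ID = 1:4,
                  value   = c(100, 200, 300, 400))

# Hypothetical data frame whose cells are all Cell_IDs (NA where empty)
df1 <- data.frame(Cell_ID = c(1, 2),
                  n_1     = c(3, 4),
                  n_2     = c(4, NA))

# Replace every cell by the value looked up in df2; unmatched or NA cells stay NA
df1[] <- lapply(df1, function(x) df2$value[match(x, df2$Cell_ID)])

print(df1)
```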

Replace value in data frame with value from other data frame based on set of conditions

Using the data.table package:

# load the 'data.table' package
library(data.table)

# convert the data.frame's to data.table's
setDT(df1)
setDT(df2)

# update df1 by reference with a join with df2
df1[df2[, correct := 0], on = .(ID, cond, block, correct), msec := i.mean]

which gives:

> df1
   ID cond block correct msec
1: rs    1     2       1  456
2: rs    1     2       0  545
3: rs    2     4       1  756
4: tr    1     2       1  654
5: tr    1     2       1  625
6: tr    2     4       0  765

Note: The above code updates df1 by reference instead of creating a new data frame, which is more memory-efficient. The df2[, correct := 0] step likewise adds a correct column to df2 by reference.
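The input data for this answer is not shown, so here is a reduced sketch with invented tables to make the update join reproducible. The values are assumptions; only rows of df1 with correct == 0 that also match df2 on ID, cond and block receive the mean from df2.

```r
library(data.table)

# Hypothetical data: df1 holds trial-level timings, df2 holds per-group means
df1 <- data.table(ID      = c("rs", "rs", "tr"),
                  cond    = c(1, 1, 2),
                  block   = c(2, 2, 4),
                  correct = c(1, 0, 0),
                  msec    = c(456, 900, 950))
df2 <- data.table(ID    = c("rs", "tr"),
                  cond  = c(1, 2),
                  block = c(2, 4),
                  mean  = c(545, 765))

# Add correct = 0 to df2 so the join only hits incorrect trials,
# then overwrite msec in the matching df1 rows with df2's mean (i.mean)
df1[df2[, correct := 0], on = .(ID, cond, block, correct), msec := i.mean]

print(df1)
```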

Replace specific values based on another dataframe

You could use the join functionality of the data.table package for this:

library(data.table)
setDT(DF1)
setDT(DF2)

DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]

which gives:

> DF1
          date id sales cost city
 1: 06/19/2016  1  9999  101  LON
 2: 06/20/2016  1   150  102  MTL
 3: 06/21/2016  1   151  104  MTL
 4: 06/22/2016  1   152  107  MTL
 5: 06/23/2016  1   155   99  MTL
 6: 06/19/2016  2    84   55   NY
 7: 06/20/2016  2    83   55   NY
 8: 06/21/2016  2    80   56   NY
 9: 06/22/2016  2   777   57   QC
10: 06/23/2016  2   555   58   QC

When you have many columns in both datasets, it is easier to use mget instead of typing all the column names. For the data used in the question it would look like:

DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]

When you want to construct the vector of column names to update beforehand, you could do this as follows:

cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]
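A reduced, self-contained sketch of the mget() variant follows. The tables are invented and smaller than the question's data, and cols is derived here with setdiff() rather than by position, which is an assumption of this example and not part of the original answer.

```r
library(data.table)

# Hypothetical data: DF2 carries corrected sales and city for one (date, id) pair
DF1 <- data.table(date  = c("06/19/2016", "06/20/2016", "06/19/2016"),
                  id    = c(1, 1, 2),
                  sales = c(149, 150, 84),
                  cost  = c(101, 102, 55),
                  city  = c("MTL", "MTL", "NY"))
DF2 <- data.table(date  = "06/19/2016",
                  id    = 1,
                  sales = 9999,
                  city  = "LON")

# Every DF2 column that is not a join key gets copied into DF1
cols <- setdiff(names(DF2), c("date", "id"))
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]

print(DF1)  # only the row matching DF2 on date and id changes
```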

