Substitute DT1.x with DT2.y when DT1.x and DT2.x match in R

Use an update join:

dtMain[statesFile, on=.(state), state := i.stateExpan ]

The i.* prefix indicates that the column comes from the i table in x[i, on=, j]. It is optional here, provided dtMain has no column of its own named stateExpan.

See ?data.table for details.
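A minimal, self-contained sketch of the update join above. The contents of dtMain and statesFile are made up for illustration; only the column names state and stateExpan come from the answer.

```r
library(data.table)

# Hypothetical data: dtMain holds abbreviations, statesFile is the lookup table
dtMain <- data.table(id = 1:4, state = c("CA", "NY", "TX", "ZZ"))
statesFile <- data.table(state = c("CA", "NY", "TX"),
                         stateExpan = c("California", "New York", "Texas"))

# Update join: for rows of dtMain that match statesFile on state,
# overwrite state with the expanded name; unmatched rows ("ZZ") are left alone
dtMain[statesFile, on = .(state), state := i.stateExpan]
dtMain[]
```

Because := works by reference, dtMain itself is modified and no reassignment is needed.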

Match column and rows then replace

I have a long way of doing it with data.frames. If you plan to code in R long term, I would suggest checking out either (i) the dplyr package, part of the tidyverse suite, or (ii) the data.table package. The first has the more popular syntax and ties in nicely with a bunch of useful packages. The second is harder to learn but quicker. For data of your size, though, the speed difference is negligible.

In base data.frames, here is something I hope matches your request. Let me know if I've mistaken anything, or been unclear.

# sellers data eg
dt1 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 1,
                  Overcharging = NA)
# buyers data eg
dt2 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 2,
                  Overcharging = c(1,0,0,1))
# make my current data view
dt <- rbind(dt1, dt2)
dt[]

# split in to two data frames, on the Type column:
dt_split <- split(dt, dt$Type)
dt_split

# move out of list
dt_suffix <- seq_along(dt_split)
dt_names <- sprintf("dt%s", dt_suffix)
for (name in dt_names) {
  assign(name, dt_split[[match(name, dt_names)]])
}
dt1[]
dt2[]

# define the columns in which to match up the buyer to seller
merge_cols <- c("Period", "MatchGroup", "Group")
# define the columns you want to merge, that you know are NA
na_cols <- c("Overcharging")
# now use merge operation, and filter dt2, to pull in only columns you want
# I suggest dropping the na_cols first in dt1, as otherwise it will create two
# columns post-merge: Overcharging.x, Overcharging.y
dt1 <- dt1[, setdiff(names(dt1), na_cols)]
dt1_new <- merge(dt1,
                 dt2[, c(merge_cols, na_cols)], # filter dt2
                 by = merge_cols,               # columns to match on
                 all.x = TRUE)                  # dt1 is x, dt2 is y. Want to keep all of dt1

# if you want to bind them back together, ensure the column order matches, and
# bind e.g.
dt1_new <- dt1_new[, names(dt2)]
dt_final <- rbind(dt1_new, dt2)
dt_final[]

My line of thinking is to split the buyers and sellers into two separate data frames, identify how they join, and migrate the data you need from buyers to sellers. Finally, bind them back together if so desired.

Why is R data.table adding columns to another data table that I did not reference?

Yes, data.table changes its values by reference, and DT2 <- DT1 does not create a copy: both names refer to the same object. If you'd like to retain a copy of the original, you should use copy:

library(data.table)
DT1 <- data.table(x = 1:100)
DT2 <- DT1
identical(DT1, DT2)
#> [1] TRUE
DT1[, y := x + 1]
identical(DT1, DT2)
#> [1] TRUE
DT2 <- copy(DT1)
DT2[, y := x + 2]
identical(DT1, DT2)
#> [1] FALSE
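As a further illustration, here is a small sketch that checks the sharing directly with data.table's address() function. The names DT3, same_object and is_copy are introduced here only for the example.

```r
library(data.table)

DT1 <- data.table(x = 1:3)
DT2 <- DT1          # a second name for the same object, no copy is made
DT3 <- copy(DT1)    # an independent copy

# Compare memory addresses before modifying anything
same_object <- address(DT1) == address(DT2)
is_copy     <- address(DT1) != address(DT3)

DT1[, y := x * 2L]  # modifies DT1 in place, so DT2 sees the new column too
names(DT2)          # includes "y"
names(DT3)          # still only "x"
```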

Update data frame / table from data with similar structure

Of course we need to identify which rows of each column have NAs. There's no getting around it AFAICT. With that in mind, this is what I was able to think of (a variation of @akrun's solution really):

# get DT1's matching indices for each row of DT2, handle multiple matches as well
idx = DT1[DT2, which = TRUE, on = "Categ", mult = "first"]
for (col in c("x", "y")) {
  nas = which(is.na(DT2[[col]]))
  this_idx = idx[nas]
  set(DT2, i = nas, j = col, value = DT1[[col]][this_idx])
}

This assumes identical column names in both data tables.
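To make the loop concrete, here is a self-contained run on made-up data. The values in DT1 and DT2 are hypothetical; only the column names Categ, x and y follow the answer.

```r
library(data.table)

# Hypothetical data: DT1 is the reference table, DT2 has gaps to fill
DT1 <- data.table(Categ = c("a", "b", "c"), x = c(1, 2, 3), y = c(10, 20, 30))
DT2 <- data.table(Categ = c("b", "c", "a"), x = c(NA, 5, NA), y = c(7, NA, NA))

# Row number in DT1 that matches each row of DT2
idx <- DT1[DT2, which = TRUE, on = "Categ", mult = "first"]

for (col in c("x", "y")) {
  nas <- which(is.na(DT2[[col]]))                            # rows of DT2 to fill
  set(DT2, i = nas, j = col, value = DT1[[col]][idx[nas]])   # fill from DT1
}
DT2[]
```

Only the NA cells are overwritten; the existing values (x = 5 and y = 7) are kept.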

Merge with replacement based on multiple non-unique columns

First merge in a way which guarantees all values from the original will be present:

merged = merge(original, update, by = c("x","y"), all.x = TRUE)

Then use dplyr to choose update's values where possible, and original's value otherwise:

library(dplyr)
middle = mutate(merged, value = ifelse(is.na(value.y), value.x, value.y))
final = select(middle, x, y, value)
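The two steps can be tried end to end on a toy example. The data below is invented for illustration, and this sketch swaps ifelse() for dplyr's coalesce(), which returns the first non-missing value; the ifelse() version above should behave the same way here.

```r
library(dplyr)

# Hypothetical data; only the column names x, y and value follow the answer
original <- data.frame(x = c(1, 1, 2), y = c("a", "b", "a"), value = c(10, 20, 30))
update   <- data.frame(x = 1, y = "b", value = 99)

# Keep every row of original; value.y is NA where update has no match
merged <- merge(original, update, by = c("x", "y"), all.x = TRUE)

# Prefer update's value (value.y), fall back to original's (value.x)
final <- merged %>%
  mutate(value = coalesce(value.y, value.x)) %>%
  select(x, y, value)
final
```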

R- combining two data frames by replacing common referenced values

Using data.table, we can join the two data.tables and update y by reference:

library(data.table)   ## version 1.9.6

## Using your original data.frame objects you would use
# dt1 <- as.data.table(df1)
# dt2 <- as.data.table(df2)

dt1 <- data.table(id = c(4,2,3,5,1,7),
                  y = c(12, 65, 7, 878, 1, 122))

dt2 <- data.table(id = c(2,5,1),
                  z = c(90, 16, 22))

dt1[ dt2, on="id", y := z ]
dt1
# id y
# 1: 4 12
# 2: 2 90
# 3: 3 7
# 4: 5 16
# 5: 1 22
# 6: 7 122

You can also specify the join column in the keys (which will work for older versions of data.table):

setkey(dt1, id)
setkey(dt2, id)

dt1[ dt2, y := z ]
dt1

data.table roll nearest left join for single best match (rest to NA)

You can try a proper left update join and assign the desired variables from dt2 explicitly:

library(data.table)

set.seed(42)

timestamp <- sort(rnorm(10, mean = 1, sd = 1))

dt1 <- data.table(
id = letters[1:10],
timestamp = timestamp,
timestamp1 = timestamp,
other1 = 1:10,
other2 = 11:20
)

dt2 <- data.table(
timestamp = timestamp[c(3, 5, 8)] + 0.1,
timestamp2 = timestamp[c(3, 5, 8)] + 0.1,
other3 = c("x", "y", "z"),
other4 = c(333, 444, 555)
)

# left join: leading table on the left
dt1[dt2,
    roll = "nearest",
    on = "timestamp",
    # assign desired values explicitly
    `:=`(other3 = i.other3,
         other4 = i.other4)]
dt1[]
#> id timestamp timestamp1 other1 other2 other3 other4
#> 1: a 0.4353018 0.4353018 1 11 <NA> NA
#> 2: b 0.8938755 0.8938755 2 12 <NA> NA
#> 3: c 0.9053410 0.9053410 3 13 <NA> NA
#> 4: d 0.9372859 0.9372859 4 14 x 333
#> 5: e 1.3631284 1.3631284 5 15 <NA> NA
#> 6: f 1.4042683 1.4042683 6 16 y 444
#> 7: g 1.6328626 1.6328626 7 17 <NA> NA
#> 8: h 2.3709584 2.3709584 8 18 <NA> NA
#> 9: i 2.5115220 2.5115220 9 19 z 555
#> 10: j 3.0184237 3.0184237 10 20 <NA> NA
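The same pattern is easier to follow with round numbers. This is a minimal sketch on hypothetical tables (events and readings are names made up for the example), not part of the original answer:

```r
library(data.table)

# Hypothetical data with round numbers so the nearest match is easy to see
events   <- data.table(time = c(0, 10, 20, 30), id = c("a", "b", "c", "d"))
readings <- data.table(time = c(11, 28), label = c("p", "q"))

# For each row of readings, update only the nearest row of events;
# every other row of events gets NA in the new column
events[readings, on = "time", roll = "nearest", label := i.label]
events[]
```

Here 11 is closest to 10 (row b) and 28 is closest to 30 (row d), so only those two rows receive a label.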

