Substitute DT1.x with DT2.y when DT1.x and DT2.x match in R
Use an update join:
dtMain[statesFile, on=.(state), state := i.stateExpan ]
The i.* prefix indicates that the column comes from the i table in x[i, on=, j]. It is optional here. See ?data.table for details.
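A minimal, self-contained sketch of the update join above. The sample rows are made up for illustration; only the names dtMain, statesFile, state and stateExpan come from the answer.

```r
library(data.table)

# Hypothetical sample data; only the table and column names come from the answer
dtMain <- data.table(id = 1:4, state = c("CA", "NY", "TX", "ZZ"))
statesFile <- data.table(state = c("CA", "NY", "TX"),
                         stateExpan = c("California", "New York", "Texas"))

# Update join: rows of dtMain that match statesFile on state get their
# state overwritten with stateExpan, taken from the i table (statesFile)
dtMain[statesFile, on = .(state), state := i.stateExpan]

dtMain[]
```

Rows of dtMain with no match in statesFile (here "ZZ") are left as they were, since the assignment only touches matched rows.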
Match column and rows then replace
I have a long way of doing it with data.frames. If you plan to code in R long term, I would suggest checking out either (i) the dplyr package, part of the tidyverse suite, or (ii) the data.table package. The first has the more popular syntax and ties in nicely with a bunch of useful packages. The second is harder to learn but faster. For data of your size, though, the speed difference is negligible.
In base data.frames, here is something I hope matches your request. Let me know if I've mistaken anything, or been unclear.
# sellers data eg
dt1 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 1,
Overcharging = NA)
# buyers data eg
dt2 <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 2,
Overcharging = c(1,0,0,1))
# make my current data view
dt <- rbind(dt1, dt2)
dt[]
# split in to two data frames, on the Type column:
dt_split <- split(dt, dt$Type)
dt_split
# move out of list
dt_suffix <- seq_along(dt_split)
dt_names <- sprintf("dt%s", dt_suffix)
for (name in dt_names) {
  assign(name, dt_split[match(name, dt_names)][[1]])
}
dt1[]
dt2[]
# define the columns in which to match up the buyer to seller
merge_cols <- c("Period", "MatchGroup", "Group")
# define the columns you want to merge, that you know are NA
na_cols <- c("Overcharging")
# now use merge operation, and filter dt2, to pull in only columns you want
# I suggest dropping the na_cols first in dt1, as otherwise merge() will create
# two columns post-merge: Overcharging.x and Overcharging.y
dt1 <- dt1[,setdiff(names(dt1), na_cols)]
dt1_new <- merge(dt1,
                 dt2[, c(merge_cols, na_cols)], # filter dt2
                 by = merge_cols,               # columns to match on
                 all.x = TRUE)                  # dt1 is x, dt2 is y. Want to keep all of dt1
# if you want to bind them back together, ensure the column order matches, and
# bind e.g.
dt1_new <- dt1_new[, names(dt2)]
dt_final <- rbind(dt1_new, dt2)
dt_final[]
My line of thinking is to split the buyers and sellers into two separate data frames, identify how they join, and copy the data you need from buyers to sellers. Finally, bind them back together if desired.
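If the merge feels heavy for this, the same idea can be sketched in base R with match() on a pasted key. This is a swapped-in alternative, not the approach used above; the names sellers, buyers and key_cols are hypothetical, and it assumes each key combination appears at most once among the buyers.

```r
# Hypothetical stand-ins for the seller and buyer rows used above
sellers <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 1,
                      Overcharging = NA)
buyers <- data.frame(Period = 1:4, MatchGroup = 73, Group = 1, Type = 2,
                     Overcharging = c(1, 0, 0, 1))

key_cols <- c("Period", "MatchGroup", "Group")

# Build one string key per row from the matching columns
seller_key <- do.call(paste, c(sellers[key_cols], sep = "_"))
buyer_key <- do.call(paste, c(buyers[key_cols], sep = "_"))

# For each seller row, look up the buyer row with the same key and copy its value
sellers$Overcharging <- buyers$Overcharging[match(seller_key, buyer_key)]

sellers
```

Sellers without a matching buyer would get NA, because match() returns NA for keys it cannot find.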
Why is R data.table adding columns to a another data table that I did not reference?
Yes, data.table changes its values by reference, and DT2 <- DT1 does not make a copy: both names refer to the same table. If you'd like to retain a copy of the original, you should use copy:
library(data.table)
DT1 <- data.table(x = 1:100)
DT2 <- DT1
identical(DT1, DT2)
#> [1] TRUE
DT1[, y := x + 1]
identical(DT1, DT2)
#> [1] TRUE
DT2 <- copy(DT1)
DT2[, y := x + 2]
identical(DT1, DT2)
#> [1] FALSE
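One way to see what is going on is to compare memory addresses with data.table's address() helper. This is only an illustrative sketch; DT3, same_object and independent are names introduced here.

```r
library(data.table)

DT1 <- data.table(x = 1:3)
DT2 <- DT1        # plain assignment: both names refer to the same object
DT3 <- copy(DT1)  # copy(): an independent object

same_object <- address(DT1) == address(DT2)
independent <- address(DT1) != address(DT3)

# Adding a column by reference is visible through DT2 but not through DT3
DT1[, y := x * 2L]
names(DT2)
names(DT3)
```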
Update data frame / table from data with similar structure
Of course we need to identify which rows of each column have NAs. There's no getting around it AFAICT. With that in mind, this is what I was able to think of (a variation of @akrun's solution really):
# get DT1's matching indices for each row of DT2, handle multiple matches as well
idx = DT1[DT2, which = TRUE, on = "Categ", mult = "first"]
for (col in c("x", "y")) {
  nas = which(is.na(DT2[[col]]))
  this_idx = idx[nas]
  set(DT2, i = nas, j = col, value = DT1[[col]][this_idx])
}
This assumes identical column names in both data tables.
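To make the loop concrete, here is a small runnable sketch with made-up data. The values in DT1 and DT2 are hypothetical; the column names Categ, x and y follow the snippet above.

```r
library(data.table)

# Hypothetical data: DT1 holds the reference values, DT2 has gaps (NA)
DT1 <- data.table(Categ = c("A", "B", "C"), x = c(1, 2, 3), y = c(10, 20, 30))
DT2 <- data.table(Categ = c("B", "C", "D"), x = c(NA, 5, NA), y = c(7, NA, NA))

# Row of DT1 matching each row of DT2 (NA where there is no match)
idx <- DT1[DT2, which = TRUE, on = "Categ", mult = "first"]

for (col in c("x", "y")) {
  nas <- which(is.na(DT2[[col]]))  # rows of DT2 missing this column
  # Fill only those rows, by reference, from the matching DT1 rows
  set(DT2, i = nas, j = col, value = DT1[[col]][idx[nas]])
}

DT2[]
```

Category "D" has no counterpart in DT1, so its NAs stay NA; values already present in DT2 are never overwritten.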
Merge with replacement based on multiple non-unique columns
First merge in a way which guarantees all values from the original will be present:
merged = merge(original, update, by = c("x","y"), all.x = TRUE)
Then use dplyr to choose update's values where possible, and original's value otherwise:
library(dplyr)
middle = mutate(merged, value = ifelse(is.na(value.y), value.x, value.y))
final = select(middle, x, y, value)
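A self-contained sketch of the same two steps with made-up data. It uses dplyr's coalesce() in place of the ifelse() call, which is a substitution of mine rather than part of the answer; the contents of original and update are hypothetical.

```r
library(dplyr)

# Hypothetical data: update carries new values for some (x, y) pairs
original <- data.frame(x = c(1, 1, 2), y = c("a", "b", "a"),
                       value = c(10, 20, 30))
update <- data.frame(x = c(1, 2), y = c("b", "a"),
                     value = c(99, 77))

# Left merge keeps every row of original; clashing columns get .x/.y suffixes
merged <- merge(original, update, by = c("x", "y"), all.x = TRUE)

# coalesce() takes value.y when it is not NA and falls back to value.x
final <- merged %>%
  mutate(value = coalesce(value.y, value.x)) %>%
  select(x, y, value)

final
```

One caveat under these assumptions: an NA that is deliberately present in update cannot be told apart from "no match", so the original value is kept in that case.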
R- combining two data frames by replacing common referenced values
Using data.table we can join the two data.tables and update y by reference:
library(data.table) ## version 1.9.6
## Using your original data.frame objects you would use
# dt1 <- as.data.table(df1)
# dt2 <- as.data.table(df2)
dt1 <- data.table(id = c(4,2,3,5,1,7),
y = c(12, 65, 7, 878, 1, 122))
dt2 <- data.table(id = c(2,5,1),
z = c(90, 16, 22))
dt1[ dt2, on="id", y := z ]
dt1
#    id   y
# 1:  4  12
# 2:  2  90
# 3:  3   7
# 4:  5  16
# 5:  1  22
# 6:  7 122
You can also specify the join column in the keys (which will work for older versions of data.table):
setkey(dt1, id)
setkey(dt2, id)
dt1[ dt2, y := z ]
dt1
data.table roll nearest left join for single best match (rest to NA)
You can try a proper left update join and assign the desired variables from dt2 explicitly:
library(data.table)
set.seed(42)
timestamp <- sort(rnorm(10, mean = 1, sd = 1))
dt1 <- data.table(
id = letters[1:10],
timestamp = timestamp,
timestamp1 = timestamp,
other1 = 1:10,
other2 = 11:20
)
dt2 <- data.table(
timestamp = timestamp[c(3, 5, 8)] + 0.1,
timestamp2 = timestamp[c(3, 5, 8)] + 0.1,
other3 = c("x", "y", "z"),
other4 = c(333, 444, 555)
)
# left join: leading table on the left
dt1[dt2,
    roll = "nearest",
    on = "timestamp",
    # assign desired values explicitly
    `:=`(other3 = i.other3,
         other4 = i.other4)]
dt1[]
#>     id timestamp timestamp1 other1 other2 other3 other4
#>  1:  a 0.4353018  0.4353018      1     11   <NA>     NA
#>  2:  b 0.8938755  0.8938755      2     12   <NA>     NA
#>  3:  c 0.9053410  0.9053410      3     13   <NA>     NA
#>  4:  d 0.9372859  0.9372859      4     14      x    333
#>  5:  e 1.3631284  1.3631284      5     15   <NA>     NA
#>  6:  f 1.4042683  1.4042683      6     16      y    444
#>  7:  g 1.6328626  1.6328626      7     17   <NA>     NA
#>  8:  h 2.3709584  2.3709584      8     18   <NA>     NA
#>  9:  i 2.5115220  2.5115220      9     19      z    555
#> 10:  j 3.0184237  3.0184237     10     20   <NA>     NA