Update subset of data.table based on join
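The easiest way I can think of is to key by id1 as well, e.g.:
setkey(DT1, id2, id1)
# give DT2 the id1 value to restrict the update to, so the join only matches DT1 rows with id1 == 3
DT2[, id1 := 3]
setkey(DT2, id2, id1)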
# use i.v1 to reference v1 from the i component
DT1[DT2, v1 := i.v1 ]
DT1
id1 id2 v1
1: 2 e 0.7383247
2: 1 g 1.5952808
3: 2 j 0.3295078
4: 3 n 0.0000000
5: 3 s 0.5757814
6: 1 u 0.4874291
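A self-contained sketch of the same idea follows. DT1 and DT2 here are small hypothetical tables made up for illustration, not the question's data. id1 is stored as an integer and DT2 gets 3L so both join columns have the same type.

```r
library(data.table)

# Hypothetical data: DT1 is the table to update, DT2 holds new v1 values by id2
DT1 <- data.table(id1 = c(2L, 1L, 2L, 3L, 3L, 1L),
                  id2 = c("e", "g", "j", "n", "s", "u"),
                  v1  = c(1, 2, 3, 4, 5, 6))
DT2 <- data.table(id2 = c("g", "n"), v1 = c(100, 0))

# Key both tables by (id2, id1); DT2 only carries id1 == 3,
# so only DT1 rows with id1 == 3 can be matched
setkey(DT1, id2, id1)
DT2[, id1 := 3L]
setkey(DT2, id2, id1)

# Update join: i.v1 (v1 from DT2) overwrites v1 in the matched DT1 rows
DT1[DT2, v1 := i.v1]
DT1
```

The g row keeps its value even though g appears in DT2, because its id1 is 1 rather than 3.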
Update a subset of values in data.table column with values from another data.table column
You can use get to grab the i.name variable programmatically in the update join, and stay within standard data.table join operations. Example data and code:
library(data.table)
data <- data.table(snp.gene.key=1:5, dval = letters[1:5])
all_tmp <- data.table(snp.gene.key=1:3, dval=letters[11:13])
setkey(data, snp.gene.key)
setkey(all_tmp, snp.gene.key)
data
# snp.gene.key dval
#1: 1 a
#2: 2 b
#3: 3 c
#4: 4 d
#5: 5 e
Then wrap name in parentheses, (name), on the LHS of the := assignment so it is evaluated to the column name it holds rather than treated as a literal column called name, and use get on the RHS to grab the i. column you want for the update join.
name <- "dval"
data[all_tmp, (name) := get(paste0("i.", name)) ]
data
# snp.gene.key dval
#1: 1 k
#2: 2 l
#3: 3 m
#4: 4 d
#5: 5 e
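For comparison, a sketch of the same programmatic update using an ad hoc on= join instead of setkey(). The toy tables are the ones above; `name` is assumed to hold a column that exists in both tables.

```r
library(data.table)

data    <- data.table(snp.gene.key = 1:5, dval = letters[1:5])
all_tmp <- data.table(snp.gene.key = 1:3, dval = letters[11:13])

name <- "dval"
# (name) on the LHS: assign to the column whose name is stored in `name`
# get("i.dval") on the RHS: fetch the dval column of all_tmp (the i table)
data[all_tmp, on = "snp.gene.key", (name) := get(paste0("i.", name))]
data
```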
R Data Table - join but filter with update
You can create a dummy variable a in DT2, join on both columns a and b, and then update:
DT[DT2[, c(a = 3, .SD)], c := i.c, on = c("a", "b")]
DT
# a b c
#1: 1 1 NA
#2: 2 2 NA
#3: 3 3 10
#4: 1 4 NA
#5: 2 5 NA
#6: 3 6 10
#7: 1 7 NA
#8: 2 8 NA
#9: 3 9 10
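The question's DT and DT2 are not shown, so this self-contained sketch assumes hypothetical inputs chosen to reproduce the output above. The a column is kept numeric so it has the same type as the constant 3 added to DT2.

```r
library(data.table)

# Hypothetical inputs chosen to reproduce the output above
DT  <- data.table(a = rep(c(1, 2, 3), 3), b = 1:9, c = NA_real_)
DT2 <- data.table(b = 1:9, c = 10)

# DT2[, c(a = 3, .SD)] prepends a constant column a = 3 to DT2's columns,
# so joining on both a and b only matches DT rows where a == 3
DT[DT2[, c(a = 3, .SD)], c := i.c, on = c("a", "b")]
DT
```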
assigning a subset of data.table rows and columns by join
You can use the := operator together with the join, as follows.
First, prepare the data:
require(data.table) ## >= 1.9.0
setDT(x) ## converts DF to DT by reference
setDT(y)
setkey(x, key) ## set key column
setkey(y, key)
Now the one-liner:
x[y, c("a", "b") := list(i.a, i.b)]
:= modifies x by reference (in place). The rows to modify are given by the row indices computed from the join in i.
i.a and i.b refer to the a and b columns of i (here y). The i. prefix is what distinguishes them from x's own a and b columns when both tables share column names in a join of the form x[i].
HTH
PS: In your example, y's columns a and b are of type numeric while x's are of type integer, so when you run this on your data you'll get a warning that the types didn't match and a coercion had to take place.
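A self-contained sketch of the one-liner, using hypothetical x and y data frames. Here y's a and b are integers, matching x, so the coercion warning mentioned in the PS does not arise.

```r
library(data.table)

# Hypothetical inputs; y's a and b are integer like x's, so no coercion is needed
x <- data.frame(key = 1:5, a = 1:5, b = 6:10)
y <- data.frame(key = c(2L, 4L), a = c(20L, 40L), b = c(60L, 80L))

setDT(x)          # convert the data.frames to data.tables by reference
setDT(y)
setkey(x, key)
setkey(y, key)

# Rows of x whose key appears in y get a and b replaced by y's values
x[y, c("a", "b") := list(i.a, i.b)]
x
```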
Subset data.table by another data.table without merging all columns
Following @akrun's answer, you can identify the rows in the join and use them to subset the table:
w = sort(DT1[DT2, on=.(A,B), which=TRUE, nomatch=0])
DT1[w]
# A B C
# 1: 1 1 1
# 2: 3 1 3
# 3: 2 3 1
or, more compactly:
DT1[sort(DT1[DT2, on=.(A,B), which=TRUE, nomatch=0])]
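If you want to keep the rows in DT2's order, don't sort. If you want unmatched DT2 rows to show up (as all-NA rows), omit nomatch=0; note that sort() drops NAs by default, so this only has an effect in the unsorted version.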
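A self-contained sketch with hypothetical DT1 and DT2 chosen to reproduce the output above. DT2 carries an extra column D to show that none of DT2's columns end up in the result.

```r
library(data.table)

# Hypothetical inputs: DT2 has an extra column D that should not be merged in
DT1 <- data.table(A = c(1, 3, 2, 2), B = c(1, 1, 3, 2), C = c(1, 3, 1, 5))
DT2 <- data.table(A = c(2, 1, 3), B = c(3, 1, 1), D = c("p", "q", "r"))

# Row numbers of DT1 matching some (A, B) pair in DT2; unmatched DT2 rows are dropped
w <- sort(DT1[DT2, on = .(A, B), which = TRUE, nomatch = 0])
res <- DT1[w]   # only DT1's columns A, B, C
res
```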
Conditional binary join and update by reference using the data.table package
Copying from Arun's updated answer:
TK[venue_id %in% 1:2, New_id := DFT[.SD, New_id]][]
# venue_id DFT_id New_id
# 1: 1 1 3
# 2: 2 1 3
# 3: 1 2 4
# 4: 3 2 9401
# 5: 2 3 2
# 6: 3 3 456
His answer gives the details of what is going on.
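A self-contained sketch, assuming hypothetical TK and DFT tables chosen to reproduce the output above, with DFT keyed on (venue_id, DFT_id). Because DFT is keyed, DFT[.SD, New_id] joins .SD's first two columns to that key and returns DFT's New_id for each selected TK row.

```r
library(data.table)

# Hypothetical inputs chosen to reproduce the output above
TK <- data.table(venue_id = c(1L, 2L, 1L, 3L, 2L, 3L),
                 DFT_id   = c(1L, 1L, 2L, 2L, 3L, 3L),
                 New_id   = c(0L, 0L, 0L, 9401L, 0L, 456L))
DFT <- data.table(venue_id = c(1L, 2L, 1L, 2L),
                  DFT_id   = c(1L, 1L, 2L, 3L),
                  New_id   = c(3L, 3L, 4L, 2L))
setkey(DFT, venue_id, DFT_id)

# For venue_id 1 or 2, .SD holds those rows of TK; joining DFT to it via DFT's key
# returns DFT's New_id for each of them. Rows with venue_id 3 are left untouched.
TK[venue_id %in% 1:2, New_id := DFT[.SD, New_id]][]
```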
Subset-join-replace using R data.table
I think you are looking for:
id_name[is.na(name), name :=
id_name[!is.na(name)][.SD, on=.(id), x.name]
]
id_name[is.na(id), id :=
id_name[!is.na(id)][.SD, on=.(name), mult="first", x.id]
]
id_name
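A self-contained run of the two statements above on a small hypothetical id_name table, where some rows lack a name and one row lacks an id.

```r
library(data.table)

# Hypothetical input: rows 2 and 5 lack a name, row 4 lacks an id
id_name <- data.table(id   = c(1L, 1L, 2L, NA, 3L),
                      name = c("a", NA, "b", "b", NA))

# 1) Fill missing names from another row with the same id and a known name
id_name[is.na(name), name :=
  id_name[!is.na(name)][.SD, on = .(id), x.name]
]
# 2) Fill missing ids from the first row with the same name and a known id
id_name[is.na(id), id :=
  id_name[!is.na(id)][.SD, on = .(name), mult = "first", x.id]
]
id_name
```

Row 5 keeps an NA name because no other row with id 3 has a known name. If several rows with a known name could share an id, the first statement would likely also need mult = "first"; otherwise the join returns more values than there are rows to fill and := stops with a length error.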