Update Subset of data.table Based on Join

Update subset of data.table based on join

The easiest way I can think of is to key by id1 as well. For example:

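# key DT1 by (id2, id1) so the join matches on both columns
setkey(DT1, id2, id1)
# add id1 = 3 to DT2 (by reference), so only DT1 rows with id1 == 3 can match
DT2[, id1 := 3]
setkey(DT2, id2, id1)

# use i.v1 to reference v1 from the i component (DT2)
DT1[DT2, v1 := i.v1]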


DT1
id1 id2 v1
1: 2 e 0.7383247
2: 1 g 1.5952808
3: 2 j 0.3295078
4: 3 n 0.0000000
5: 3 s 0.5757814
6: 1 u 0.4874291
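A self-contained sketch of the same update on hypothetical toy data. As a variation (not the answer's exact code), it uses an ad hoc on= join against a temporary copy of DT2 that carries the constant id1 = 3L, so neither table needs a key and DT2 itself is not modified by reference:

```r
library(data.table)

# Hypothetical toy data in the shape of the question (id1, id2, v1)
DT1 <- data.table(id1 = c(2L, 1L, 2L, 3L, 3L, 1L),
                  id2 = c("e", "g", "j", "n", "s", "u"),
                  v1  = c(0.74, 1.60, 0.33, 9.99, 0.58, 0.49))
DT2 <- data.table(id2 = c("g", "n"), v1 = c(0, 0))

# Temporary copy of DT2 with id1 = 3L, joined on (id2, id1):
# only DT1 rows with id1 == 3 and a matching id2 can be updated
DT1[DT2[, .(id2, id1 = 3L, v1)], v1 := i.v1, on = .(id2, id1)]

DT1[id2 %in% c("g", "n")]
```

Only row n changes; g also appears in DT2, but its id1 is 1, so it keeps its value.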

Update a subset of values in data.table column with values from another data.table column

You can use get() to fetch the i.-prefixed column programmatically in the update join, staying within standard data.table join operations. Example data and code:

library(data.table)
data <- data.table(snp.gene.key=1:5, dval = letters[1:5])
all_tmp <- data.table(snp.gene.key=1:3, dval=letters[11:13])
setkey(data, snp.gene.key)
setkey(all_tmp, snp.gene.key)

data
# snp.gene.key dval
#1: 1 a
#2: 2 b
#3: 3 c
#4: 4 d
#5: 5 e

Then wrap name in parentheses, (name), on the LHS of the := assignment so that it is evaluated (to "dval") rather than treated literally as a column called name, and use get on the RHS to fetch the i.-prefixed column you want for the update join.

name <- "dval"
data[all_tmp, (name) := get(paste0("i.", name)) ]

data
# snp.gene.key dval
#1: 1 k
#2: 2 l
#3: 3 m
#4: 4 d
#5: 5 e
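The same pattern extends to several columns at once. A sketch, assuming a hypothetical second column dval2 in both tables: (cols) on the LHS names the target columns, and mget() fetches the matching i. columns on the RHS. It uses on= instead of setkey() so that it runs on its own:

```r
library(data.table)

# Toy data as above, plus a hypothetical second column dval2
data    <- data.table(snp.gene.key = 1:5, dval = letters[1:5],   dval2 = LETTERS[1:5])
all_tmp <- data.table(snp.gene.key = 1:3, dval = letters[11:13], dval2 = LETTERS[11:13])

# Target columns are given as a character vector; mget() returns a list
# of the corresponding i.-prefixed columns from all_tmp
cols <- c("dval", "dval2")
data[all_tmp, (cols) := mget(paste0("i.", cols)), on = "snp.gene.key"]

data
```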

R Data Table - join but filter with update

You can add a temporary constant column a (here a = 3) to a copy of DT2, join on both columns a and b, and then update:

DT[DT2[, c(a = 3, .SD)], c := i.c, on = c("a", "b")]

DT
# a b c
#1: 1 1 NA
#2: 2 2 NA
#3: 3 3 10
#4: 1 4 NA
#5: 2 5 NA
#6: 3 6 10
#7: 1 7 NA
#8: 2 8 NA
#9: 3 9 10
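A runnable sketch with hypothetical DT and DT2 chosen to reproduce the output above (the question's actual data isn't shown). DT2 matches every value of b, but because the temporary copy carries a = 3, only rows of DT with a == 3 are updated. The c column starts as NA_real_ so that the double values from DT2 go into a double column:

```r
library(data.table)

# Hypothetical data reproducing the printed result above
DT  <- data.table(a = rep(1:3, 3), b = 1:9, c = NA_real_)
DT2 <- data.table(b = 1:9, c = 10)

# c(a = 3, .SD) builds a list (a, b, c), i.e. a copy of DT2 with a = 3;
# DT2 itself is not modified
DT[DT2[, c(a = 3, .SD)], c := i.c, on = c("a", "b")]

DT
```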

assigning a subset of data.table rows and columns by join

You can use the := operator together with the join in a single step:

First, prepare the data:

require(data.table) ## >= 1.9.0
setDT(x) ## converts DF to DT by reference
setDT(y)
setkey(x, key) ## set key column
setkey(y, key)

Now the one-liner:

x[y, c("a", "b") := list(i.a, i.b)]

:= modifies x by reference (in place). The rows modified are the rows of x matched by the join in i.

i.a and i.b are the names data.table provides for referring to i's columns a and b in a join of the form x[i]; the i. prefix is what distinguishes them from x's own a and b when both tables have identically named columns.

HTH

PS: In your example, y's columns a and b are of type numeric (double) while x's are of type integer, so when you run this on your data you'll get a warning that the types didn't match and a coercion had to take place.
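A self-contained sketch of the one-liner. The join column is called id here, a hypothetical stand-in for the question's key column (key is also an argument of data.table(), so it can't be used as a column name there), and y's columns are created as integers with the L suffix so that they match x's types and no coercion is needed:

```r
library(data.table)

# Hypothetical x and y; all columns are integer on both sides
x <- data.table(id = 1:4, a = 1:4, b = 11:14)
y <- data.table(id = c(2L, 4L), a = c(20L, 40L), b = c(120L, 140L))
setkey(x, id)
setkey(y, id)

# Rows of x matched by y get a and b from y, in place
x[y, c("a", "b") := list(i.a, i.b)]

x
```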

Subset data.table by another data.table without merging all columns

Following @akrun's answer, you can identify the rows in the join and use them to subset the table:

w = sort(DT1[DT2, on=.(A,B), which=TRUE, nomatch=0])
DT1[w]

# A B C
# 1: 1 1 1
# 2: 3 1 3
# 3: 2 3 1

Or, more compactly:

DT1[sort(DT1[DT2, on=.(A,B), which=TRUE, nomatch=0])]

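If you want to keep the rows in DT2's order, don't sort. If you also want a placeholder for DT2 rows that have no match, drop nomatch=0: which=TRUE then returns NA for those rows, giving an all-NA row in DT1[w] (sort() would silently drop those NAs).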
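A small sketch with hypothetical data, built so that the final subset matches the output above. DT2 contains a duplicated (A, B) pair, an unmatched pair and an extra column D. Here w follows DT2's order and repeats row 4, so unique() is added (an addition to the answer's code) to avoid returning the same DT1 row twice; D never reaches the result:

```r
library(data.table)

# Hypothetical data
DT1 <- data.table(A = c(1L, 2L, 3L, 2L), B = c(1L, 2L, 1L, 3L), C = c(1L, 2L, 3L, 1L))
DT2 <- data.table(A = c(2L, 1L, 3L, 2L, 9L),
                  B = c(3L, 1L, 1L, 3L, 9L),
                  D = letters[1:5])

# Row numbers of DT1 matched by each row of DT2, in DT2's order;
# the unmatched pair (9, 9) is dropped by nomatch = 0
w <- DT1[DT2, on = .(A, B), which = TRUE, nomatch = 0]
w

# Only DT1's columns, each matched row once, in DT1's order
res <- DT1[sort(unique(w))]
res
```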

Conditional binary join and update by reference using the data.table package

Copied from Arun's updated answer:

TK[venue_id %in% 1:2, New_id := DFT[.SD, New_id]][]
# venue_id DFT_id New_id
# 1: 1 1 3
# 2: 2 1 3
# 3: 1 2 4
# 4: 3 2 9401
# 5: 2 3 2
# 6: 3 3 456

His answer gives the details of what is going on.
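A runnable sketch with hypothetical TK and DFT tables chosen to reproduce the output above. As a variation on Arun's code, it names the join columns with on= instead of relying on DFT's key, and uses x.New_id to make explicit that the value comes from DFT. Rows with venue_id 3 are never looked up, so they keep 9401 and 456:

```r
library(data.table)

# Hypothetical lookup table
DFT <- data.table(venue_id = c(1L, 2L, 1L, 2L),
                  DFT_id   = c(1L, 1L, 2L, 3L),
                  New_id   = c(3L, 3L, 4L, 2L))
# Hypothetical target table; venue 3 rows already have their New_id
TK  <- data.table(venue_id = c(1L, 2L, 1L, 3L, 2L, 3L),
                  DFT_id   = c(1L, 1L, 2L, 2L, 3L, 3L),
                  New_id   = c(NA, NA, NA, 9401L, NA, 456L))

# Subset TK in i, then look up each selected row (.SD) in DFT
TK[venue_id %in% 1:2,
   New_id := DFT[.SD, on = .(venue_id, DFT_id), x.New_id]][]
```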

Subset-join-replace using R data.table

I think you are looking for:

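# fill missing names from rows with the same id that do have a name
id_name[is.na(name), name :=
    id_name[!is.na(name)][.SD, on=.(id), x.name]
]
# fill missing ids from rows with the same name that do have an id
# (mult="first" takes the first match if several rows share the name)
id_name[is.na(id), id :=
    id_name[!is.na(id)][.SD, on=.(name), mult="first", x.id]
]
id_name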
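The same two statements on a small hypothetical id_name table, so the result can be inspected. Rows 2 and 5 get their name from another row with the same id, and row 4 gets id 2 from the first row with an id whose name is b:

```r
library(data.table)

# Hypothetical data: rows 2 and 5 lack a name, row 4 lacks an id
id_name <- data.table(id   = c(1L, 1L, 2L, NA, 2L),
                      name = c("a", NA, "b", "b", NA))

# Look up missing names by id among rows that have a name
id_name[is.na(name), name :=
  id_name[!is.na(name)][.SD, on = .(id), x.name]
]
# Look up missing ids by name among rows that have an id
id_name[is.na(id), id :=
  id_name[!is.na(id)][.SD, on = .(name), mult = "first", x.id]
]

id_name
```

The first statement has no mult=, so it assumes each id has at most one row with a non-missing name; otherwise the lookup returns more values than there are rows to fill and the assignment fails.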

