Keyed lookup on data.table without 'with'
There is an item in the NEWS for 1.8.2 that suggests a ..() syntax will be added at some point, allowing this:
New DT[.(...)] syntax (in the style of package plyr) is identical to
DT[list(...)], DT[J(...)] and DT[data.table(...)]. We plan to add ..(), too, so
that .() and ..() are analogous to the file system's ./ and ../; i.e., .()
evaluates within the frame of DT and ..() in the parent scope.
In the meantime, you can get the variable from the appropriate environment:
dt[J(get('x', envir = parent.frame(3)))]
## x y
## 1: 3 5
## 2: 4 6
or you could eval the whole call to list(x) or J(x):
dt[eval(list(x))]
dt[eval(J(x))]
dt[eval(.(x))]
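For reference, here is a minimal sketch under which the snippets above behave as shown. The data in dt and the outer x are hypothetical reconstructions from the printed output, not from the original question:

```r
library(data.table)

# Hypothetical data matching the printed output above
dt <- data.table(x = 1:4, y = 3:6, key = "x")
x  <- 3:4                 # lookup values living outside dt

# A bare dt[J(x)] would see dt's own column x; wrapping the call in
# eval() makes data.table build J(x) from the calling scope's x instead
res <- dt[eval(J(x))]
res
##    x y
## 1: 3 5
## 2: 4 6
```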
variable usage in data.table
Since you have two variables named b, one inside DT and one outside the scope of DT, we have to go and get b <- 7 from the global environment. We can do that with get().
DT[b == get("b", globalenv())]
# ID a b c
# 1: b 1 7 13
Update: You mention in the comments that the variables are inside a function environment. In that case, you can use parent.frame() instead of globalenv().
f <- function(b, dt) dt[b == get("b", parent.frame(3))]
f(7, DT)
# ID a b c
# 1: b 1 7 13
f(12, DT)
# ID a b c
# 1: c 6 12 18
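To see what parent.frame() is doing here, a plain base-R sketch (outer_fn/inner_fn are illustrative names, not from the answer) shows how it walks back up the call stack. The 3 in parent.frame(3) above accounts for the extra evaluation frames that [.data.table inserts, and that depth is an implementation detail that can shift between data.table versions:

```r
outer_fn <- function() {
  b <- 7
  inner_fn()
}
inner_fn <- function() {
  # parent.frame(1) is outer_fn()'s evaluation frame, where b is 7
  get("b", envir = parent.frame(1))
}
outer_fn()
# [1] 7
```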
Using a key to replace values across a whole data.table
You can use melt
and dcast
:
dcast(
rating[melt(df, id=c("V1", "V2"),value.name = "Rating"), on="Rating"],
V1+V2~variable, value.var = "CreditQuality"
)
Output:
V1 V2 V3 V4 V5 V6 V7 V8 V9
1: XS0041971275 TR.IssuerRating 1 1 1 1 2 2 1
2: XS0043098127 TR.IssuerRating 6 6 6 6 6 6 6
3: XS0285400197 TR.IssuerRating 2 2 2 2 2 2 2
Note: I'm assuming your source data is df, and your Rating data is rating. I see that your frames are already of class data.table.
How to do an X[Y] data.table join, without losing an existing main key on X?
With secondary keys implemented (since v1.9.6) and the recent bug fix on retaining/discarding keys properly (in v1.9.7), you can now do this using on=:
# join
DT[x2y, on="x"] # key is removed as row order gets changed.
# update using joins
DT[x2y, y:=y, on="x"] # key is retained, as row order isn't changed.
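A small self-contained sketch of the two behaviors, with hypothetical tables standing in for DT and x2y:

```r
library(data.table)

DT  <- data.table(x = c("a", "b", "c"), v = 1:3, key = "x")
x2y <- data.table(x = c("b", "c", "a"), y = c(20, 30, 10))

res <- DT[x2y, on = "x"]   # rows come back in x2y's order, so the key is dropped
key(res)                   # NULL

DT[x2y, y := y, on = "x"]  # update join leaves DT's row order alone
key(DT)                    # still "x"
```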
data.table - subsetting based on variable whose name is a column, too
data.table evaluates expressions in the environment of the data.table itself, so you may need to specify where the value should come from:
DT[cyl == get("cyl", envir = parent.frame())]
How to efficiently add a date column from a lookup table, without plyr?
When the columns you want to merge by do not have the same name in both data frames, you need to specify how they should line up. In merge, this is done with the by.x and by.y arguments. You can also use data.table's [ syntax for merging, which uses an on argument.
Whether or not you set a key, either of these will work:
merge(proc, allo, by.x = "Pseudonym", by.y = "pseudonym")
proc[allo, on = .(Pseudonym = pseudonym)]
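For instance, with hypothetical data where the linking column is capitalized in one table and lowercase in the other, the base merge() call lines them up like this:

```r
proc <- data.frame(Pseudonym = c("p1", "p2", "p3"), op = c("A", "B", "C"))
allo <- data.frame(pseudonym = c("p1", "p2", "p3"),
                   date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")))

# by.x names the column in proc, by.y the matching column in allo
merged <- merge(proc, allo, by.x = "Pseudonym", by.y = "pseudonym")
merged
```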
So, what does setkey do? Most importantly, it will speed up any merges involving key columns. As for the merge defaults, we can look at ?data.table::merge, which begins:
...by default, it attempts to merge at first based on the shared key columns, and if there are none, then based on key columns of the first argument x, and if there are none, then based on the common columns between the two data.tables. Set the by, or by.x and by.y arguments explicitly to override this default.
This is different from base::merge, in that base::merge will always try to merge on all shared columns, while data.table::merge will prioritize shared columns that are keyed. Neither will attempt to merge columns with different names unless told to.
data.table := assignments when variable has same name as a column
You can always use get(), which allows you to specify the environment:
dt1[1, a := get("a", envir = .GlobalEnv)]
# a
#1: 18
Or just:
a <- 42
dt1[1, a := .GlobalEnv$a]
# a
#1: 42
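A minimal end-to-end sketch of why the explicit environment is needed (the values 18 and 42 are hypothetical, chosen to match the outputs above):

```r
library(data.table)

a   <- 42L
dt1 <- data.table(a = 18L)

dt1[1, a := a]             # RHS 'a' is dt1's own column, so this is a no-op
after_noop <- dt1$a        # still 18

dt1[1, a := .GlobalEnv$a]  # reach past the column to the global variable
dt1$a                      # 42
```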
Chaining factors for lookup - is this the most efficient way?
You can do a lot better if you use lookup vectors instead of lookup lists. Basically, I changed list() to c(), and then cut out all the as.character bits.
vState <- c("A" = "Alaska", "T" = "Texas", "G" = "Georgia")
vCap <- c("Alaska" = "Juneau", "Texas" = "Austin", "Georgia" = "Atlanta")
vCap[vState[foo]]
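With a hypothetical character input foo, the chain resolves in one pass; an unmatched code simply propagates as NA, which is what the na_if() juggling emulated for the list version. (If foo is a factor, convert with as.character() first, since subsetting by a factor uses its integer codes, not its labels.)

```r
vState <- c("A" = "Alaska", "T" = "Texas", "G" = "Georgia")
vCap   <- c("Alaska" = "Juneau", "Texas" = "Austin", "Georgia" = "Atlanta")

foo  <- c("A", "T", "G", "T")        # hypothetical input codes
caps <- unname(vCap[vState[foo]])    # code -> state -> capital in one chain
caps
# [1] "Juneau"  "Austin"  "Atlanta" "Austin"
```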
Benchmarking methods so far:
microbenchmark::microbenchmark(
recode = foo %>%
dplyr::recode(!!!iState, .default = NA_character_) %>%
dplyr::recode(!!!sCap, .default = NA_character_),
lists = sCap[iState[foo] %>% as.character() %>% na_if("NULL") ] %>% as.character() %>% na_if("NULL"),
lists_no_pipe = na_if(as.character(sCap[na_if(as.character(iState[foo]), "NULL")]), "NULL"),
vectors = unname(vCap[vState[foo]])
)
# Unit: microseconds
# expr min lq mean median uq max neval
# recode 227.1 244.05 305.203 268.05 319.55 591.1 100
# lists 182.2 198.85 244.964 222.10 254.20 562.6 100
# lists_no_pipe 11.4 13.25 17.726 15.45 18.70 64.5 100
# vectors 2.5 3.85 5.269 4.90 6.40 12.9 100
If you want things to be as fast as possible, don't use %>%
- it's extra overhead. If you are doing complicated things, the extra microseconds from piping don't really matter. But in this case, the operations being done are already so quick that the few microseconds of piping actually account for a significant percentage of the execution time.
You may be able to go even faster--especially if your look-up tables are large, by using a join to a keyed data.table
instead.
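A sketch of that join-based approach, using a hypothetical keyed lookup table that collapses the two-step chain into one column. For large inputs the keyed binary search is what makes this scale:

```r
library(data.table)

# Hypothetical pre-joined lookup equivalent to vCap[vState[...]]
lut <- data.table(code    = c("A", "T", "G"),
                  capital = c("Juneau", "Austin", "Atlanta"),
                  key     = "code")

foo <- data.table(code = c("T", "A", "Z"))
res <- lut[foo, capital, on = "code"]  # unmatched codes come back as NA
res
# [1] "Austin" "Juneau" NA
```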