Dynamically build a call to look up multiple columns
In recent versions of data.table (development version 1.14.1 onward), this is much easier with the env argument:
ID[JN, (select) := .list_of_fields,
env=list(.list_of_fields=as.list(paste0('i.', select)))]
OLD solution before 1.14.1
Instead of mget or eval(parse()), it is still possible to build the lookup call directly. While mget is the most user-friendly option, this one is both flexible and actually corresponds to building the j expression.
The solution is wrapped in a batch.lookup helper function that takes a character vector of column names to look up.
library(data.table)
set.seed(1)
ID <- data.table(id = 1:3, meta = rep(1,3), key = "id")
JN <- data.table(idd = sample(ID$id, 3, FALSE), value = sample(letters, 3, FALSE), meta = rep(1,3), key = "idd")
select <- c("value","meta") # my fields to lookup
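batch.lookup = function(x) {
  # builds the call `:=`(x, list(col1 = i.col1, col2 = i.col2, ...))
  as.call(list(
    as.name(":="),
    x,
    as.call(c(
      list(as.name("list")),
      # one i.<col> symbol per column, named after the column
      sapply(x, function(col) as.name(paste0("i.", col)), simplify = FALSE)
    ))
  ))
}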
batch.lookup(select)
#`:=`(c("value", "meta"), list(value = i.value, meta = i.meta))
ID[JN, eval(batch.lookup(select))][]
# id meta value
#1: 1 1 x
#2: 2 1 v
#3: 3 1 f
To be fair, this answer actually addresses the call-construction issue I described as the OP.
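To show what the constructed call does (rather than how it is built), here is a rough Python sketch of the update-join semantics of ID[JN, (cols) := list(i.value, i.meta)]. Matching rows of x receive i's values. Unmatched rows keep existing columns, and new columns are filled with None, which stands in for R's NA. The update_join helper, the list-of-dicts representation and the example data are all hypothetical. They are not the data.table implementation or the values printed above.

```python
def update_join(x, i, on_x, on_i, cols):
    """Copy `cols` from matching rows of `i` into the rows of `x`, in place.

    Sketch of x[i, (cols) := list(i.col1, i.col2, ...)]: each row of x whose
    key matches a row of i gets i's values; for unmatched rows an existing
    column is left alone and a new column is filled with None (R's NA).
    """
    by_key = {row[on_i]: row for row in i}  # assumes keys in i are unique
    for row in x:
        match = by_key.get(row[on_x])
        for col in cols:
            if match is not None:
                row[col] = match[col]
            else:
                row.setdefault(col, None)  # new column -> NA, existing one untouched
    return x


# Hypothetical tables in the shape of ID and JN (not the values printed above)
ID = [{"id": 1, "meta": 1}, {"id": 2, "meta": 1}, {"id": 4, "meta": 1}]
JN = [{"idd": 2, "value": "b", "meta": 9}, {"idd": 1, "value": "a", "meta": 8}]
update_join(ID, JN, on_x="id", on_i="idd", cols=["value", "meta"])
print(ID)
```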
Conditionally Set Multiple Column Values with Dynamic Values Using Loc or Apply
You can use Series.fillna with DataFrame.add_suffix:
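index_modifier = '_Male'
init_index = df.index
# add_suffix renames column labels, so transpose to suffix the row labels instead ('A' -> 'A_Male')
df = df.T.add_suffix(index_modifier).T
# fill only the missing limits, aligning on the suffixed row labels of lookup
# (assign the result back: chained fillna(..., inplace=True) does not update df under copy-on-write)
df['lower_limit'] = df['lower_limit'].fillna(lookup['lower'])
df['upper_limit'] = df['upper_limit'].fillna(lookup['upper'])
# restore the original row labels
df.index = init_index
print(df)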
lower_limit upper_limit
A 3.0 5.0
B 2.0 NaN
C 4.0 6.0
D NaN NaN
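The same fillna-with-lookup idea can be sketched without the double transpose. Build the suffixed keys, reindex lookup to them, then fill only the NaN entries. This is a self-contained sketch. The fill_from_lookup name and both example frames are hypothetical, and lookup is assumed to be indexed by labels such as 'A_Male'. Existing values are never overwritten: A keeps 3.0 even though lookup has 1.0.

```python
import numpy as np
import pandas as pd


def fill_from_lookup(df, lookup, suffix):
    """Fill NaN limits in df from lookup, whose index is df's index + suffix."""
    out = df.copy()
    # Row 'A' reads lookup row 'A' + suffix; labels missing from lookup become NaN
    aligned = lookup.reindex([label + suffix for label in df.index])
    aligned.index = df.index
    # fillna with a Series fills only missing entries, matching on index labels
    out["lower_limit"] = out["lower_limit"].fillna(aligned["lower"])
    out["upper_limit"] = out["upper_limit"].fillna(aligned["upper"])
    return out


# Hypothetical example data
df = pd.DataFrame(
    {"lower_limit": [3.0, np.nan, np.nan, np.nan],
     "upper_limit": [5.0, np.nan, 6.0, np.nan]},
    index=["A", "B", "C", "D"],
)
lookup = pd.DataFrame(
    {"lower": [1.0, 2.0, 4.0], "upper": [9.0, np.nan, 7.0]},
    index=["A_Male", "B_Male", "C_Male"],
)
result = fill_from_lookup(df, lookup, "_Male")
print(result)
```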
Dynamic Lookup array and Column index in Vlookup
The equivalent of your formula, if you have separate sheets for each month, would be:
=VLOOKUP($A2,INDIRECT(TEXT($B$1+COLUMNS($B:B)-1,"mmmm")&"!A:AK"),DAY($B$1+COLUMNS($B:B)-1)+1,FALSE)
starting in B2 and assuming you have the first day of the year in B1.
But it is simpler if the master sheet has a heading in the first row for each date of the year:
=VLOOKUP($A2,INDIRECT(TEXT(B$1,"mmmm")&"!A:AK"),DAY(B$1)+1,FALSE)
This is for sheets called January, February etc. If the names are Jan, Feb etc. then change "mmmm" to "mmm".
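To make the arithmetic in the first formula concrete, here is a small Python sketch. Offset is what COLUMNS($B:B)-1 evaluates to: 0 in column B, 1 in column C, and so on. The sheet name is the month of B1 + offset, and the column index is the day of the month + 1, because column A of each month sheet holds the key. The vlookup_args helper is hypothetical. The "%B"/"%b" month names assume Python's default C locale, which matches English sheet names.

```python
from datetime import date, timedelta


def vlookup_args(first_day, offset, month_format="%B"):
    """Return (range reference, column index) that the first formula would use.

    offset plays the role of COLUMNS($B:B)-1. "%B" gives full month names
    (like "mmmm"); "%b" gives "Jan", "Feb", ... (like "mmm").
    """
    d = first_day + timedelta(days=offset)
    sheet = d.strftime(month_format)  # English names in the default C locale
    # column 1 of A:AK is the key, so day n of the month sits in column n + 1
    return f"{sheet}!A:AK", d.day + 1


print(vlookup_args(date(2023, 1, 1), 45))  # 45 columns right of B2
```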
Fast data.table assign of multiple columns by group from lookup
You can also use lapply:
cols <- paste0("value_", 1:10)
random[lookup, (cols) := lapply(cols, function(x) get(x) * get(paste0("i.", x))), by = .EACHI]
If your dataset is large and you want to see a progress bar for the operation, you can use pblapply:
library(pbapply)
random[lookup, (cols) := pblapply(cols, function(x) get(x) * get(paste0("i.", x))), by = .EACHI ]
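Here is a rough Python sketch of what that join computes. Each row of random whose key matches a row of lookup has every value_k multiplied by lookup's value_k, and rows without a match are left unchanged. The key name "id", the multiply_by_lookup helper and the data are hypothetical, and the sketch assumes one lookup row per key.

```python
def multiply_by_lookup(rows, lookup, key, cols):
    """Multiply each column in cols by the lookup row that shares the key.

    Sketch of random[lookup, (cols) := lapply(cols, function(x)
    get(x) * get(paste0("i.", x))), by = .EACHI]: only rows with a
    matching key are updated; the others are left unchanged.
    """
    factors = {r[key]: r for r in lookup}  # assumes one lookup row per key
    for row in rows:
        factor_row = factors.get(row[key])
        if factor_row is None:
            continue  # no match, so the join does not touch this row
        for col in cols:
            row[col] = row[col] * factor_row[col]
    return rows


cols = [f"value_{k}" for k in range(1, 3)]  # value_1, value_2
data = [
    {"id": "a", "value_1": 1, "value_2": 2},
    {"id": "a", "value_1": 3, "value_2": 4},
    {"id": "z", "value_1": 5, "value_2": 6},
]
lookup = [{"id": "a", "value_1": 10, "value_2": 100}]
multiply_by_lookup(data, lookup, "id", cols)
print(data)
```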
vlookup with multiple columns
INDEX/MATCH is faster than many VLOOKUPs, so I would recommend it. MATCH searches for the row once, and each INDEX then just reads from that row, which reduces the strain that many VLOOKUPs would put on your system.
First find the row that contains what you need in a helper column with MATCH:
=MATCH(A1,'mySheet'!$A:$A,0)
Then use an INDEX on that row number, which you can drag across to populate all your columns:
=INDEX('mySheet'!B:B,$B1)
Your output would be akin to:
ID|Name|Match |Column 1 |Column 2
-------------------------
1|AB |Match1|IndexCol1|IndexCol2
2|CD |Match2|IndexCol1|IndexCol2
3|EF |Match3|IndexCol1|IndexCol2
Also, I'd recommend limiting these ranges to the data itself rather than referencing whole columns, for additional speed gains, e.g.:
=INDEX('mySheet'!B1:B100000,$B1)
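The speed argument can be sketched in Python. With n columns, n VLOOKUPs scan the key column n times. MATCH scans it once, and each INDEX is a direct positional read. The match/index helpers mimic MATCH(..., 0) and INDEX, and the sheet contents are hypothetical.

```python
def match(value, column):
    """MATCH(value, column, 0): 1-based position of the first exact match."""
    for pos, item in enumerate(column, start=1):
        if item == value:
            return pos
    raise LookupError(f"{value!r} not found")  # Excel returns #N/A here


def index(column, pos):
    """INDEX(column, pos): direct 1-based access, no searching."""
    return column[pos - 1]


# Hypothetical lookup sheet: key column A plus two data columns
my_sheet = {
    "A": [1, 2, 3],
    "B": ["AB", "CD", "EF"],
    "C": ["x", "y", "z"],
}

row = match(2, my_sheet["A"])  # the helper column: one search per key
values = [index(my_sheet[c], row) for c in ("B", "C")]  # dragged across: no searches
print(row, values)
```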
Let lookup return an array of multiple rows and multiple columns
You could use the following:
=INDEX(D3:E6,MATCH(H3:H4,C3:C6,0),SEQUENCE(1,2))
Or, if the range may contain blank cells and you don't want them displayed as 0:
=SUBSTITUTE(INDEX(D3:E6,MATCH(H3:H4,C3:C6,0),SEQUENCE(1,2)),"","")
(note that SUBSTITUTE returns every value as text).
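Why does this spill a block? MATCH over the two keys in H3:H4 returns two row numbers, and SEQUENCE(1,2) returns the column numbers {1,2}, so INDEX yields a 2 x 2 array. Below is a Python sketch of that shape logic. None stands for a blank cell. The blank_as_text flag is a hypothetical stand-in for the SUBSTITUTE variant, and the ranges are made up.

```python
def index_match_2d(table, key_column, keys, n_cols, blank_as_text=False):
    """Sketch of INDEX(table, MATCH(keys, key_column, 0), SEQUENCE(1, n_cols)).

    One row number per key times columns 1..n_cols gives a
    len(keys) x n_cols result. Plain INDEX shows blanks (None) as 0;
    the SUBSTITUTE variant turns every value into text, so blanks become "".
    """
    rows = [key_column.index(k) for k in keys]  # ValueError plays the role of #N/A
    result = []
    for r in rows:
        cells = table[r][:n_cols]
        if blank_as_text:
            result.append(["" if v is None else str(v) for v in cells])
        else:
            result.append([0 if v is None else v for v in cells])
    return result


# Hypothetical ranges: C3:C6 holds the keys, D3:E6 the two value columns
keys_c = ["k1", "k2", "k3", "k4"]
values_de = [[10, 11], [20, None], [30, 31], [40, 41]]
print(index_match_2d(values_de, keys_c, ["k3", "k2"], 2))
print(index_match_2d(values_de, keys_c, ["k3", "k2"], 2, blank_as_text=True))
```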
Formula VLOOKUP with dynamic lookup value
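Using R1C1 referencing would be easier here. RC[1] means "same row, one column to the right" of whichever cell receives the formula, and C2:C3 means columns B:C: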
WS1.Cells(4 + j - 1, 3 + ((i - 1) * 9)).FormulaR1C1 = "=VLOOKUP(RC[1],WS2NAME!C2:C3,2,FALSE)"
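To check which cells the loop writes to and what RC[1] resolves to there, here is a small Python sketch. target_cell reproduces the Cells(4 + j - 1, 3 + ((i - 1) * 9)) arithmetic, and resolve_relative turns an R[..]C[..] offset into an A1 address. All three helpers are hypothetical illustrations, not VBA or Excel APIs.

```python
def column_letters(n):
    """1-based column number to A1 letters (1 -> A, 26 -> Z, 27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def target_cell(i, j):
    """(row, column) written by WS1.Cells(4 + j - 1, 3 + ((i - 1) * 9))."""
    return 4 + j - 1, 3 + (i - 1) * 9


def resolve_relative(row, col, d_row=0, d_col=0):
    """A1 address that R[d_row]C[d_col] points to from cell (row, col)."""
    return f"{column_letters(col + d_col)}{row + d_row}"


for i in (1, 2):
    row, col = target_cell(i, 1)
    # the formula lands in column C, L, ... and RC[1] always means the next column
    print(resolve_relative(row, col), "-> RC[1] =", resolve_relative(row, col, d_col=1))
```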
Dynamically build multiple expressions
If you want to dynamically build a function call where you change parameter names, you'll need to build a named list where the names of the list are the parameter names. You'll also need to delay evaluation of your parameters, so you'll want to pass language objects. There are a bunch of different ways to do that, but in this case bquote() is probably the most helpful. For example:
set.seed(15)
n <- rpois(n = 1, lambda = 5)
expressions <- LETTERS[rpois(n = n, lambda = 10)]
times <- runif(n = n, max = 1e-5)
params <- setNames(
lapply(times, function(x) bquote(Sys.sleep(.(x)))),
expressions)
params
# $G
# Sys.sleep(7.06628567539156e-06)
# $H
# Sys.sleep(8.62313656602055e-06)
# $L
# Sys.sleep(8.41785145225003e-06)
# $K
# Sys.sleep(4.47443719021976e-06)
# $F
# Sys.sleep(9.64666954241693e-06)
Once you have this named list of expressions, you can call microbenchmark through the do.call function, which turns your list into actual named arguments:
library(microbenchmark)
do.call("microbenchmark", params)
# expr min lq mean median uq max neval
# G 200 300 312 300 300 1600 100
# H 200 200 478 300 300 19100 100
# L 200 250 288 300 300 500 100
# K 200 200 284 300 300 700 100
# F 200 300 300 300 300 1700 100
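The same pattern has a close Python analogue, shown here as a swapped-in technique rather than anything microbenchmark does. A dict of names to callables plays the role of the named list of expressions. Writing lambda t=t binds each sleep time immediately, much as .(x) in bquote() inserts the value at construction time. Unpacking with ** plays the role of do.call. The benchmark helper is hypothetical, and it reports the fastest of repeat timings in seconds using the standard timeit module.

```python
import random
import time
import timeit


def benchmark(repeat=100, **exprs):
    """Time each named callable; the keyword names label the results."""
    return {name: min(timeit.repeat(fn, number=1, repeat=repeat))
            for name, fn in exprs.items()}


random.seed(15)
names = random.sample("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 5)  # distinct names
times = [random.uniform(0, 1e-5) for _ in names]

# t=t captures the current value now (like bquote's .(x)), avoiding late binding
params = {name: (lambda t=t: time.sleep(t)) for name, t in zip(names, times)}

# ** turns the dict into named arguments, like do.call("microbenchmark", params)
results = benchmark(**params)
print(results)
```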
Dynamically concatenate Columns in R
You can use the do.call(paste0, ...) idiom:
mtcars$lookup <- do.call(paste0, mtcars[c(ColA, ColB, ColC)])
mtcars$lookup
## [1] "216160" "216160" "22.84108" "21.46258" "18.78360" "18.16225" "14.38360"
## [8] "24.44146.7" "22.84140.8" "19.26167.6" "17.86167.6" "16.48275.8" "17.38275.8" "15.28275.8"
## [15] "10.48472" "10.48460" "14.78440" "32.4478.7" "30.4475.7" "33.9471.1" "21.54120.1"
## [22] "15.58318" "15.28304" "13.38350" "19.28400" "27.3479" "264120.3" "30.4495.1"
## [29] "15.88351" "19.76145" "158301" "21.44121"
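Here ColA, ColB and ColC are variables holding column names (the output above corresponds to "mpg", "cyl" and "disp"). Replace c(ColA, ColB, ColC) with a vector of column names or even column positions.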
In the tidyverse, you can also use unite. Try the following to see what it does:
library(tidyverse) ## `unite` comes from the tidyr package, FYI
mtcars %>% unite(output, mpg, cyl, disp, sep = "")
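Both idioms paste the chosen columns together row by row, and they differ only in the separator. Here is a minimal Python sketch of that row-wise concatenation. The paste_rows helper is hypothetical, and the data are the first three mtcars rows typed the way R prints them, because Python's str(21.0) would give "21.0" rather than R's "21".

```python
def paste_rows(table, columns, sep=""):
    """Row-wise string concatenation of the given columns.

    sep="" mirrors do.call(paste0, df[columns]); another sep mirrors
    unite(..., sep = sep). Values are converted with str().
    """
    return [sep.join(str(v) for v in row) for row in zip(*(table[c] for c in columns))]


# First three rows of mtcars, typed the way R prints them (21 rather than 21.0)
mtcars_head = {"mpg": [21, 21, 22.8], "cyl": [6, 6, 4], "disp": [160, 160, 108]}
print(paste_rows(mtcars_head, ["mpg", "cyl", "disp"]))
```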