Dynamically Build Call for Lookup Multiple Columns

In the recent development version of data.table this has been made much easier, using the env argument:

ID[JN, (select) := .list_of_fields,
env=list(.list_of_fields=as.list(paste0('i.', select)))]

Old solution (before 1.14.1)

Instead of mget or eval-parse, it is still possible to build the lookup call. While mget is the most user-friendly option, this one is both flexible and corresponds directly to building the j expression.

The solution is wrapped in a batch.lookup helper function that takes a character vector of the column names to look up.

library(data.table)
set.seed(1)
ID <- data.table(id = 1:3, meta = rep(1, 3), key = "id")
JN <- data.table(idd = sample(ID$id, 3, FALSE), value = sample(letters, 3, FALSE), meta = rep(1, 3), key = "idd")
select <- c("value", "meta") # my fields to lookup

# Build the call `:=`(cols, list(col = i.col, ...)) from a character vector
batch.lookup = function(x) {
  as.call(list(
    as.name(":="),
    x,
    as.call(c(
      list(as.name("list")),
      # named list of symbols i.<col>, one per looked-up column
      sapply(x, function(x) as.name(paste0("i.", x)), simplify = FALSE)
    ))
  ))
}
batch.lookup(select)
#`:=`(c("value", "meta"), list(value = i.value, meta = i.meta))
ID[JN, eval(batch.lookup(select))][]
#    id meta value
# 1:  1    1     x
# 2:  2    1     v
# 3:  3    1     f

To be fair, this answer actually addresses the call-construction issue that I described as the OP.

Conditionally Set Multiple Column Values with Dynamic Values Using Loc or Apply

You can use Series.fillna with DataFrame.add_suffix:

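index_modifier = '_Male'

# Temporarily suffix the row labels so they align with lookup's index
init_index = df.index
df = df.T.add_suffix(index_modifier).T
# Assign the result back; fillna(inplace=True) on a selected column may not update df
df['lower_limit'] = df['lower_limit'].fillna(lookup['lower'])
df['upper_limit'] = df['upper_limit'].fillna(lookup['upper'])
df.index = init_index  # restore the original row labels
print(df)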


   lower_limit  upper_limit
A          3.0          5.0
B          2.0          NaN
C          4.0          6.0
D          NaN          NaN
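
The alignment that this answer relies on can be sketched without pandas. This is only an illustration under assumptions: the `limits` and `lookup` dictionaries and the `fill_from_lookup` helper are hypothetical and not taken from the original question. It shows why the suffix is added before filling: a gap is filled only when the suffixed row label exists in the lookup.

```python
import math

index_modifier = "_Male"

# Hypothetical data: limits keyed by row label, NaN marks a missing value.
limits = {
    "A": {"lower_limit": 3.0, "upper_limit": math.nan},
    "B": {"lower_limit": math.nan, "upper_limit": math.nan},
    "C": {"lower_limit": 4.0, "upper_limit": 6.0},
    "D": {"lower_limit": math.nan, "upper_limit": math.nan},
}
# Hypothetical lookup, keyed by the suffixed row label.
lookup = {
    "A_Male": {"lower": 1.0, "upper": 5.0},
    "B_Male": {"lower": 2.0, "upper": math.nan},
}

def fill_from_lookup(limits, lookup, suffix):
    """Fill NaN limits from the lookup row whose key is label + suffix."""
    column_map = (("lower_limit", "lower"), ("upper_limit", "upper"))
    filled = {}
    for label, row in limits.items():
        source = lookup.get(label + suffix, {})
        new_row = dict(row)
        for column, lookup_column in column_map:
            if math.isnan(new_row[column]):
                # No matching lookup entry leaves the gap as NaN.
                new_row[column] = source.get(lookup_column, math.nan)
        filled[label] = new_row
    return filled

filled = fill_from_lookup(limits, lookup, index_modifier)
for label, row in filled.items():
    print(label, row["lower_limit"], row["upper_limit"])
```

Existing values are never overwritten, which mirrors how fillna only touches missing entries.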

Dynamic Lookup array and Column index in Vlookup

The equivalent of your formula if you have separate sheets for each month would be

=VLOOKUP($A2,INDIRECT(TEXT($B$1+COLUMNS($B:B)-1,"mmmm")&"!A:AK"),DAY($B$1+COLUMNS($B:B)-1)+1,FALSE)

starting in B2 and assuming you have the first day of the year in B1.

But you can make it easier if the first row of the master sheet has a heading for each date of the year:

=VLOOKUP($A2,INDIRECT(TEXT(B$1,"mmmm")&"!A:AK"),DAY(B$1)+1,FALSE)

This is for sheets called January, February etc. If the names are Jan, Feb etc. then change "mmmm" to "mmm".
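
To make the arithmetic in these formulas explicit, here is a minimal Python sketch of how the sheet name and the column index are derived from a date. The `lookup_target` function and the `MONTH_NAMES` tuple are hypothetical names used only for illustration. The sketch assumes, as the formulas do, that column A of each month sheet holds the lookup key, so day 1 sits in column 2.

```python
from datetime import date, timedelta

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

def lookup_target(first_day, column_offset, short_names=False):
    """Return (sheet_name, column_index) for first_day + column_offset days.

    column_offset plays the role of COLUMNS($B:B)-1: 0 in column B, 1 in C, ...
    """
    current = first_day + timedelta(days=column_offset)
    sheet_name = MONTH_NAMES[current.month - 1]   # like TEXT(date, "mmmm")
    if short_names:
        sheet_name = sheet_name[:3]               # like TEXT(date, "mmm")
    return sheet_name, current.day + 1            # like DAY(date) + 1

print(lookup_target(date(2023, 1, 1), 0))    # ('January', 2)
print(lookup_target(date(2023, 1, 1), 31))   # ('February', 2)
```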

Fast data.table assign of multiple columns by group from lookup

You can also use lapply:

cols <- noquote(paste0("value_",1:10))

random[lookup, (cols) := lapply(cols, function(x) get(x) * get(paste0("i.", x))), by = .EACHI]

If your dataset is large and you want a progress bar for the operation, you can use pblapply from the pbapply package:

library(pbapply)

random[lookup, (cols) := pblapply(cols, function(x) get(x) * get(paste0("i.", x))), by = .EACHI ]

vlookup with multiple columns

INDEX works faster than VLOOKUP, so I would recommend using it. It reduces the strain that many VLOOKUPs would put on your system.

First find the row that contains what you need in a helper column with MATCH:

=MATCH(A1,'mySheet'!$A:$A,0)

Then an INDEX using that number, that you can drag across and populate all your columns with:

=INDEX('mySheet'!B:B,$B1)

Your output would be akin to:

ID|Name|Match |Column 1 |Column 2
--|----|------|---------|---------
1 |AB  |Match1|IndexCol1|IndexCol2
2 |CD  |Match2|IndexCol1|IndexCol2
3 |EF  |Match3|IndexCol1|IndexCol2

Also, I'd recommend setting these ranges to cover only the actual data, rather than referencing the whole column, for additional speed gains, e.g.:

=INDEX('mySheet'!B1:B100000,$B1)
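
The idea behind the helper column can be sketched in Python: the key is searched for once, and the resulting row number is reused for every column. The `match` and `index` functions and the `sheet` data below are hypothetical stand-ins for the Excel functions and for 'mySheet', not a description of how Excel works internally.

```python
def match(value, keys):
    """Like MATCH(value, keys, 0): 1-based position of the first exact match."""
    for position, key in enumerate(keys, start=1):
        if key == value:
            return position
    return None  # Excel would show #N/A here

def index(column, position):
    """Like INDEX(column, position) with a 1-based position."""
    return column[position - 1]

# Hypothetical contents of 'mySheet', one list per column.
sheet = {
    "A": [1, 2, 3],
    "B": ["AB", "CD", "EF"],
    "C": [10, 20, 30],
}

row = match(2, sheet["A"])                               # searched once
result = [index(sheet[col], row) for col in ("B", "C")]  # reused per column
print(row, result)  # 2 ['CD', 20]
```

With VLOOKUP in every column, the search in column A would be repeated for each cell instead.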

Let lookup return an array of multi rows and multi columns

You could use the following:
=INDEX(D3:E6,MATCH(H3:H4,C3:C6,0),SEQUENCE(1,2))

Or, if the result may contain blank values that you don't want displayed as null:
=SUBSTITUTE(INDEX(D3:E6,MATCH(H3:H4,C3:C6,0),SEQUENCE(1,2)),"","")
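
As a rough sketch of what the first formula returns, the Python below matches each lookup value against the key column and returns the whole matching row, giving a result with several rows and several columns. The `lookup_rows` function and the sample data are hypothetical.

```python
def lookup_rows(lookup_values, keys, table):
    """Return one full row of `table` per lookup value, matched via `keys`."""
    rows = []
    for value in lookup_values:
        position = keys.index(value)  # raises ValueError when there is no match
        rows.append(list(table[position]))
    return rows

# Hypothetical data: keys play the role of C3:C6, table the role of D3:E6.
keys = ["a", "b", "c", "d"]
table = [[1, "w"], [2, "x"], [3, "y"], [4, "z"]]

result = lookup_rows(["c", "a"], keys, table)
print(result)  # [[3, 'y'], [1, 'w']]
```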

Formula VLOOKUP with dynamic lookup value

Using R1C1 referencing would be easier here:

WS1.Cells(4 + j - 1, 3 + ((i - 1) * 9)).FormulaR1C1 = "=VLOOKUP(RC[1],WS2NAME!C2:C3,2,FALSE)"

Dynamically build multiple expressions

If you want to dynamically build a function call where you change the parameter names, you'll need to build a named list whose names are the parameter names. You'll also need to delay evaluation of your parameters, so you'll want to pass language objects. There are a bunch of different ways to do that, but in this case bquote() is probably the most helpful. For example:

set.seed(15)
n <- rpois(n = 1, lambda = 5)
expressions <- LETTERS[rpois(n = n, lambda = 10)]
times <- runif(n = n, max = 1e-5)

params <- setNames(
lapply(times, function(x) bquote(Sys.sleep(.(x)))),
expressions)
params
# $G
# Sys.sleep(7.06628567539156e-06)
# $H
# Sys.sleep(8.62313656602055e-06)
# $L
# Sys.sleep(8.41785145225003e-06)
# $K
# Sys.sleep(4.47443719021976e-06)
# $F
# Sys.sleep(9.64666954241693e-06)

Once you have this named list of expressions, you can call microbenchmark using the do.call function to turn your list of parameters into actual parameters.

do.call("microbenchmark", params)
#  expr min  lq mean median  uq   max neval
#     G 200 300  312    300 300  1600   100
#     H 200 200  478    300 300 19100   100
#     L 200 250  288    300 300   500   100
#     K 200 200  284    300 300   700   100
#     F 200 300  300    300 300  1700   100

Dynamically concatenate Columns in R

You can use the do.call(paste0, ...) idiom:

mtcars$lookup <- do.call(paste0, mtcars[c(ColA, ColB, ColC)])
mtcars$lookup
## [1] "216160" "216160" "22.84108" "21.46258" "18.78360" "18.16225" "14.38360"
## [8] "24.44146.7" "22.84140.8" "19.26167.6" "17.86167.6" "16.48275.8" "17.38275.8" "15.28275.8"
## [15] "10.48472" "10.48460" "14.78440" "32.4478.7" "30.4475.7" "33.9471.1" "21.54120.1"
## [22] "15.58318" "15.28304" "13.38350" "19.28400" "27.3479" "264120.3" "30.4495.1"
## [29] "15.88351" "19.76145" "158301" "21.44121"

Replace c(ColA, ColB, ColC) with a vector of the column names or even the column positions.


In the "tidyverse", you can also use unite. Try the following to see what it does:

library(tidyverse) ## `unite` comes from the tidyr package, FYI
mtcars %>% unite(output, mpg, cyl, disp, sep = "")

