R data.table apply function to rows using columns as arguments
The best way is to write a vectorized function, but if you can't, then perhaps this will do:
x[, func.text(f1, f2), by = seq_len(nrow(x))]
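As a rough illustration of both routes, here is a minimal sketch. The table x, its columns f1 and f2, and the helper seq_sum are made-up stand-ins (the original func.text is not shown). seq_sum accepts only one pair of values at a time, so it has to be called once per row, while the closed-form version works on whole columns.

```r
library(data.table)

# Hypothetical data and function, for illustration only
x <- data.table(f1 = c(1, 2, 3), f2 = c(3, 4, 5))

# Not vectorized: seq() needs a single 'from' and a single 'to'
seq_sum <- function(from, to) sum(seq(from, to))

# One call per row, grouping by row number
x[, by_row := seq_sum(f1, f2), by = seq_len(nrow(x))]

# Vectorized equivalent (arithmetic series), one call for the whole table
x[, vectorized := (f1 + f2) * (f2 - f1 + 1) / 2]
```

On large tables the grouped call pays the function-call overhead once per row, which is why the vectorized form is usually preferred when it exists.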
Apply function by row in data.table using columns as arguments
You could do something like this:
DF <- read.table(text = " Cycle Tab ID colA colB colC colG high1 high1a
1 0 45513 -233.781 -84.087 -3.141 3740.916 3740.916 colC
2 0 45513 -103.561 -347.382 2900.866 357.071 2900.866 colC
3 0 45513 153.383 4036.636 353.479 -42.736 4036.636 colC
4 0 45513 -147.941 28.994 4354.994 384.945 4354.994 colC
5 0 45513 -89.719 -504.643 1298.476 131.32 1298.476 colC
6 0 45513 -250.11 -30.862 1877.049 -184.772 1877.049 colC", header = TRUE)
library(data.table)
setDT(DF)
maxTwo <- function(x) {
ind <- length(x) - (1:0) #the index is equal for all rows,
#so it could be made a function parameter
#for better efficiency
as.list(sort.int(x, partial = ind)[ind]) #partial sorting
}
DF[, paste0("max", 1:2) := maxTwo(unlist(.SD)),
by = seq_len(nrow(DF)), .SDcols = 4:7]
DF[, diffMax := max2 - max1]
# Cycle Tab ID colA colB colC colG high1 high1a max1 max2 diffMax
#1: 1 0 45513 -233.781 -84.087 -3.141 3740.916 3740.916 colC -3.141 3740.916 3744.057
#2: 2 0 45513 -103.561 -347.382 2900.866 357.071 2900.866 colC 357.071 2900.866 2543.795
#3: 3 0 45513 153.383 4036.636 353.479 -42.736 4036.636 colC 353.479 4036.636 3683.157
#4: 4 0 45513 -147.941 28.994 4354.994 384.945 4354.994 colC 384.945 4354.994 3970.049
#5: 5 0 45513 -89.719 -504.643 1298.476 131.320 1298.476 colC 131.320 1298.476 1167.156
#6: 6 0 45513 -250.110 -30.862 1877.049 -184.772 1877.049 colC -30.862 1877.049 1907.911
However, you'd still be looping over the rows, which means nrow(DF) calls to the function. You could try Rcpp to do the looping in compiled code.
How to apply a different multi-argument function to each row of a data.table?
So we want to do a rowwise calculation and return the result as a new column o.
mapply is definitely the right family of functions, but mapply (and sapply) will simplify their output out of a list before they return it. data.table loves lists. Map is just an expressive shortcut for mapply(..., SIMPLIFY = FALSE), which does not modify the return value.
The following does the calculation we're after, but it's still not quite right (data.table interprets the list output as separate columns):
> dt[, Map(sub, l, '', n)]
apple ball cat
1: I ate I played ate pudding
So we want to go one further and wrap it in a list to get the output we're after:
> dt[, .(Map(sub, l, '', n))]
V1
1: I ate
2: I played
3: ate pudding
Now we can assign this using :=
> dt[, o := Map(sub, l, '', n)]
> dt
l m n o
1: apple 1 I ate apple I ate
2: ball 2 I played ball I played
3: cat 3 cat ate pudding ate pudding
EDIT: As was pointed out, this results in o being a list-column. We can avoid this by using standard mapply, though I tend to prefer the one-size-fits-all approach of Map. (Each row creates a single output, which goes in a list. Regardless of what that output looks like, this will always work, and then we can type-convert at the end.)
dt[, o := mapply(sub, l, '', n)]
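The "type-convert at the end" idea can be sketched as follows. The input table is reconstructed from the printed result above, so treat it as an assumption. The column names o_map and o_mapply are made up for the comparison.

```r
library(data.table)

# Input reconstructed from the printed result above (an assumption)
dt <- data.table(
  l = c("apple", "ball", "cat"),
  m = 1:3,
  n = c("I ate apple", "I played ball", "cat ate pudding")
)

# Map keeps one list element per row; unlist() converts to character at the end
dt[, o_map := unlist(Map(sub, l, "", n))]

# mapply simplifies to a character vector directly
dt[, o_mapply := mapply(sub, l, "", n)]

# Both columns are plain character, not list-columns
sapply(dt[, .(o_map, o_mapply)], class)
```

Note that sub() only removes the matched word, so the results keep the adjacent space (for example a trailing space after "I ate"); trimws() can tidy that up if needed.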
Apply a function to every specified column in a data.table and update by reference
This seems to work:
dt[ , (cols) := lapply(.SD, "*", -1), .SDcols = cols]
The result is
a b d
1: -1 -1 1
2: -2 -2 2
3: -3 -3 3
There are a few tricks here:
- Because there are parentheses in (cols) :=, the result is assigned to the columns specified in cols, instead of to some new variable named "cols".
- .SDcols tells the call that we're only looking at those columns, and allows us to use .SD, the Subset of the Data associated with those columns.
- lapply(.SD, ...) operates on .SD, which is a list of columns (like all data.frames and data.tables). lapply returns a list, so in the end j looks like cols := list(...).
EDIT: Here's another way that is probably faster, as @Arun mentioned:
for (j in cols) set(dt, j = j, value = -dt[[j]])
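To compare the two approaches side by side, here is a small sketch. The input is inferred from the printed result (columns a, b, d holding 1 to 3, with cols naming a and b), which is an assumption. make_dt is a hypothetical helper used only to start each approach from a fresh copy.

```r
library(data.table)

# Input inferred from the printed result (an assumption); numeric columns
make_dt <- function() data.table(a = c(1, 2, 3), b = c(1, 2, 3), d = c(1, 2, 3))
cols <- c("a", "b")

# Approach 1: lapply over .SD, assigned by reference to the columns named in cols
dt1 <- make_dt()
dt1[, (cols) := lapply(.SD, "*", -1), .SDcols = cols]

# Approach 2: set() in a loop, one column at a time
dt2 <- make_dt()
for (j in cols) set(dt2, j = j, value = -dt2[[j]])

# Both leave d untouched and negate a and b
all.equal(dt1, dt2)
```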
Function in data.table with two columns as arguments
One option, if you don't mind adding quotes around the variable names:
fun <- function(DT, fun, ...){
fun_args <- c(...)
DT[,new_col := do.call(fun, setNames(mget(fun_args), names(fun_args)))]
}
fun(DT, fun = function(x, y){y - x}, x = 'col1', y = 'col2')
DT
# col1 col2 new_col
# 1: 1 2 1
# 2: 2 3 1
# 3: 3 4 1
# 4: 4 5 1
Or use .SDcols (same result as above):
fun <- function(DT, fun, ...){
fun_args <- c(...)
DT[, new_col := do.call(fun, setNames(.SD, names(fun_args))),
.SDcols = fun_args]
}
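A self-contained variation on the .SDcols version, as a sketch only: the wrapper name add_col, its new_name parameter, and the column name diff are all made up here, and the sample table is taken from the printed output above.

```r
library(data.table)

DT <- data.table(col1 = 1:4, col2 = 2:5)

# Hypothetical wrapper: 'f' is the function to apply, '...' maps its
# argument names to column names, 'new_name' is the column to create
add_col <- function(DT, f, ..., new_name = "new_col") {
  cols <- c(...)                       # e.g. c(x = "col1", y = "col2")
  DT[, (new_name) := do.call(f, setNames(as.list(.SD), names(cols))),
     .SDcols = unname(cols)]
  invisible(DT)
}

add_col(DT, function(x, y) y - x, x = "col1", y = "col2", new_name = "diff")
DT
```

Giving the wrapper and its function argument different names avoids the shadowing in the original, where both are called fun.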
R data.table - Apply function A to some columns and function B to some others
Here is one way to do it with Map or mapply:
Let's make some toy data first:
dt <- data.table(
variable1 = rnorm(100),
variable2 = rnorm(100),
variable3 = rnorm(100),
variable4 = rnorm(100),
grp = sample(letters[1:5], 100, replace = TRUE)
)
colsToMean <- c("variable1", "variable2")
colsToMax <- c("variable3")
colsToSd <- c("variable4")
Then,
scols <- list(colsToMean, colsToMax, colsToSd)
funs <- rep(c(mean, max, sd), lengths(scols))
# summary
dt[, Map(function(f, x) f(x), funs, .SD), by = grp, .SDcols = unlist(scols)]
# or replace the original values with summary statistics as in OP
dt[, unlist(scols) := Map(function(f, x) f(x), funs, .SD), by = grp, .SDcols = unlist(scols)]
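One detail to watch in the summary form: Map names its result after its first data argument, and an unnamed list of functions carries no names, so the summary columns may come back as V1, V2, and so on. A minimal sketch of a variation that passes .SD first so the column names carry through (smaller toy data, chosen here purely for illustration):

```r
library(data.table)

set.seed(1)
dt <- data.table(
  variable1 = rnorm(20),
  variable2 = rnorm(20),
  variable3 = rnorm(20),
  grp = sample(letters[1:2], 20, replace = TRUE)
)

cols <- c("variable1", "variable2", "variable3")
funs <- list(mean, mean, max)   # one function per column, same order as cols

# .SD goes first so that Map names the results after the columns
res <- dt[, Map(function(x, f) f(x), .SD, funs), by = grp, .SDcols = cols]

# Cross-check one column against a direct computation
check <- dt[, .(m1 = mean(variable1)), by = grp]
all.equal(res$variable1, check$m1)
```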
Another option with GForce on:
scols <- list(colsToMean, colsToMax, colsToSd)
funs <- rep(c('mean', 'max', 'sd'), lengths(scols))
jexp <- paste0('list(', paste0(funs, '(', unlist(scols), ')', collapse = ', '), ')')
dt[, eval(parse(text = jexp)), by = grp, verbose = TRUE]
# Detected that j uses these columns: variable1,variable2,variable3,variable4
# Finding groups using forderv ... 0.000sec
# Finding group sizes from the positions (can be avoided to save RAM) ... 0.000sec
# Getting back original order ... 0.000sec
# lapply optimization is on, j unchanged as 'list(mean(variable1), mean(variable2), max(variable3), sd(variable4))'
# GForce optimized j to 'list(gmean(variable1), gmean(variable2), gmax(variable3), gsd(variable4))'
# Making each group and running j (GForce TRUE) ... 0.000sec
Applying function over data.table and storing results in a list
You can use Map to get the output as a list:
setNames(Map(opt, df$xvalue, df$yvalue), df$ColName)
#$Column1
#[1] 15
#$Column2
#[1] 8
#$Column3
#[1] 6
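Since neither df nor opt is shown in the answer, here is a runnable sketch with stand-ins for both. The values and the body of opt are placeholders and are not meant to reproduce the output above.

```r
# Hypothetical stand-ins for the objects used above
df <- data.frame(
  ColName = c("Column1", "Column2", "Column3"),
  xvalue  = c(1, 2, 3),
  yvalue  = c(10, 20, 30)
)
opt <- function(x, y) x + y   # placeholder; the real opt is not shown

# One call of opt per row, collected in a list named after ColName
res <- setNames(Map(opt, df$xvalue, df$yvalue), df$ColName)
str(res)
```

Map does not name the list here because its first data argument is numeric, which is why setNames is applied afterwards.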
Applying a function to every row on each n number of columns in R
Here is one approach:
Let d be your 3 rows x 2000 columns frame, with column names as.character(1:2000) (see below for generation of fake data). We add a row identifier using .I, then melt the data long, adding grp, a column-group identifier (i.e. identifying the 20 sets of 100). Then apply your function myfunc (see below for a stand-in function for this example) by row and group, and swing wide. (I used stringr::str_pad to add 0 to the front of the group number.)
# add row identifier
d[, row:=.I]
# melt and add col group identifier
dm = melt(d,id.vars = "row",variable.factor = FALSE)[,variable:=as.numeric(variable)][order(variable,row), grp:=rep(1:20, each=300)]
# get the result (180 rows long), applying myfunc to each set of columns, by row
result = dm[, myfunc(value), by=.(row,grp)][,frow:=rep(1:3,times=60)]
# swing wide (3 rows long, 60 columns wide)
dcast(
result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",row)],
frow~v,value.var="V1"
)[, frow:=NULL][]
Output: (first six columns only)
grp01_1 grp01_2 grp01_3 grp02_1 grp02_2 grp02_3
<num> <num> <num> <num> <num> <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687
Input:
d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000) set(d,j=as.character(c), value=runif(3))
The myfunc function (toy example for this answer):
myfunc <- function(x) c(mean(x), var(x), sd(x))