How to define multiple variables with lapply?
General solution
Try outer:
c(outer(1:10, 2:4, Vectorize(function(x, y) x*y)))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
If function is Vectorized already
If the function is already vectorized, as it is here, then we can omit Vectorize:
c(outer(1:10, 2:4, function(x, y) x * y))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
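To see why Vectorize matters in the general case, here is a small sketch with a function that is not vectorized. The name pick_larger is made up for illustration and does not come from the question.

```r
# Hypothetical non-vectorized function: max() collapses its inputs to one value
pick_larger <- function(x, y) max(x, y)

# outer() calls FUN once on the fully expanded vectors, so a length-1
# result cannot be reshaped into a 3 x 2 matrix and the call fails
failed <- inherits(try(outer(1:3, 1:2, pick_larger), silent = TRUE), "try-error")

# Vectorize() wraps the function in mapply(), so it is applied pair by pair
res <- c(outer(1:3, 1:2, Vectorize(pick_larger)))
res
## [1] 1 2 3 2 2 3
```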
Particular example shown in question
In fact, in this particular case the anonymous function shown is the default so this would work:
c(outer(1:10, 2:4))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
Also in this particular case we could use:
c(1:10 %o% 2:4)
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
If input is list X
If your starting point is the list X shown in the question then:
c(outer(X[[1]], X[[2]], Vectorize(function(x, y) x * y)))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
or
c(do.call("outer", c(unname(X), Vectorize(function(x, y) x*y))))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
The simplifications from the prior sections can be applied here as well to shorten the call, if applicable.
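Since X itself is not reproduced in this answer, the following sketch assumes a named two-element list. The element names a and b are placeholders. It illustrates why unname is used: do.call would otherwise pass the list names on as argument names, which outer does not have.

```r
# Assumed stand-in for the list X from the question (names are hypothetical)
X <- list(a = 1:10, b = 2:4)

# unname() makes the elements positional, so they land in outer()'s
# X and Y arguments; the function appended by c() becomes FUN
res <- c(do.call("outer", c(unname(X), Vectorize(function(x, y) x * y))))

# Same values as the direct call
all(res == c(outer(X[[1]], X[[2]])))
```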
R apply function with multiple parameters
Just pass var2 as an extra argument to one of the apply functions.
mylist <- list(a=1,b=2,c=3)
myfxn <- function(var1,var2){
var1*var2
}
var2 <- 2
sapply(mylist,myfxn,var2=var2)
This passes the same var2 to every call of myfxn. If instead you want each call of myfxn to get the 1st/2nd/3rd/etc. element of both mylist and var2, then you're in mapply's domain.
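A minimal sketch of the two cases side by side, reusing mylist and myfxn from above. The vector var2_each is an assumed example input, not something from the question.

```r
mylist <- list(a = 1, b = 2, c = 3)
myfxn <- function(var1, var2) var1 * var2

# Same var2 for every element, as in the sapply call above
same <- sapply(mylist, myfxn, var2 = 2)

# Element-wise pairing: 1 * 10, 2 * 20, 3 * 30
var2_each <- c(10, 20, 30)  # assumed example values
paired <- mapply(myfxn, mylist, var2_each)
paired
##  a  b  c
## 10 40 90
```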
using lapply() with multiple variables
You need to put the mode calculation in the function too.
sapply(data[, 2:ncol(data)], function(x) {
mode <- data$CAG[which.max(x)]
B <- sum(x[data$CAG >= mode])
B/sum(x)
})
## A01 A02
## 1.0000000 0.5882353
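Indexing with which.max(x) is equivalent (at least in this use, where the maximum is unique) to indexing with x == max(x). With tied maxima, which.max returns only the first position.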
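The difference only shows up with ties. A small sketch with made-up vectors (cag stands in for data$CAG, and x and x_tied for one data column):

```r
cag    <- c(10, 11, 12)   # hypothetical CAG values
x      <- c(1, 5, 3)      # unique maximum at position 2
x_tied <- c(5, 1, 5)      # maximum occurs twice

# Unique maximum: both forms select the same element
unique_a <- cag[which.max(x)]
unique_b <- cag[x == max(x)]

# Tied maximum: which.max() keeps the first hit, the logical index keeps both
tied_a <- cag[which.max(x_tied)]
tied_b <- cag[x_tied == max(x_tied)]
```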
Using lapply to create new variables based on multiple conditions and subsets
Here is a base R method that uses ave with lapply. Loop through the columns of the dataset excluding 'cluster', then with ave get the min grouped by 'cluster', subtract the column from it, and assign the list of vectors to new columns:
df[paste0(names(df)[-1], ".var")] <- lapply(df[-1], function(x)
ave(x, df$cluster, FUN = min) - x)
df
# cluster x y x.var y.var
#1 A 3 4 -1 -3
#2 B 4 5 -3 -2
#3 B 1 3 0 0
#4 A 5 1 -3 0
#5 A 2 2 0 -1
#6 B 6 6 -5 -3
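For a runnable version, the input can be rebuilt from the printed output above. This is a sketch in which the data.frame is reconstructed from the cluster, x and y columns shown, and the ave call is the one from the answer.

```r
# Input reconstructed from the first three columns of the printed result
df <- data.frame(cluster = c("A", "B", "B", "A", "A", "B"),
                 x = c(3, 4, 1, 5, 2, 6),
                 y = c(4, 5, 3, 1, 2, 6))

# ave() returns the per-cluster minimum aligned to the original rows,
# so the subtraction is element-wise within each column
df[paste0(names(df)[-1], ".var")] <- lapply(df[-1], function(x)
  ave(x, df$cluster, FUN = min) - x)
df
```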
Applying a function and assigning multiple variables in a single call in R
Use [ extraction for the left-hand-side data.frame rather than $ extraction:
df[,c('NewX2','NewY2')] <- mapply(find.key,
list(df$x, df$y),
list(x2, y2),
SIMPLIFY=FALSE)
# df
# x y NewX2 NewY2
# 1 a e Alpha Epi
# 2 b f Beta OtherY
# 3 c g Other OtherY
# 4 d h Other OtherY
Or, if you don't like writing mapply, you can use Vectorize, which will create an mapply-based function for you to obtain the same result:
find.keys <- Vectorize(find.key, c("x","li"), SIMPLIFY=FALSE)
df[,c('NewX2','NewY2')] <- find.keys(list(df$x, df$y), list(x2, y2))
df
# x y NewX2 NewY2
# 1 a e Alpha Epi
# 2 b f Beta OtherY
# 3 c g Other OtherY
# 4 d h Other OtherY
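Because find.key, x2 and y2 are defined in the question and not repeated here, the following self-contained sketch uses stand-ins: a hypothetical lookup function and two made-up key vectors. It only illustrates the pattern of assigning a list returned by mapply to several columns at once through [.

```r
df <- data.frame(x = c("a", "b", "c"), y = c("e", "f", "g"),
                 stringsAsFactors = FALSE)

# Hypothetical key tables and lookup helper (not the OP's find.key)
x_keys <- c(a = "Alpha", b = "Beta")
y_keys <- c(e = "Epi")
lookup <- function(x, keys) {
  out <- unname(keys[x])        # NA where a value has no key
  out[is.na(out)] <- "Other"
  out
}

# One call per (column, key table) pair; SIMPLIFY = FALSE keeps a list,
# which [<- then spreads across the two new columns
df[, c("NewX", "NewY")] <- mapply(lookup,
                                  list(df$x, df$y),
                                  list(x_keys, y_keys),
                                  SIMPLIFY = FALSE)
df
```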
Use lapply to create new variable over multiple data frames
According to the OP, there are 100 data.frames with identical column names. The OP wants to create a new column in all of the data.frames using exactly the same formula.
This indicates a fundamental flaw in the design of the data structure. I guess no database administrator would create 100 identical tables where only the data contents differ. Instead, they would create one table with an additional column identifying the origin of each row. Then all subsequent operations would be applied to one table instead of being repeated for each of many.
In R, the data.table package has the convenient rbindlist() function which can be used for this purpose:
library(data.table) # CRAN version 1.10.4 used
# get list of data.frames from the given names and
# combine the rows of all data sets into one large data.table
DT <- rbindlist(mget(temp), idcol = "origin")
# now create new column for all rows across all data sets
DT[, ps_true := (1 + exp(-(0.8*w1 - 0.25*w2 + 0.6*w3 -
0.4*w4 - 0.8*w5 - 0.5*w6 + 0.7*w7)))^-1]
DT
origin ARAND w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 ps_true
1: sim_rep1.dat -0.6 -0.5 0.2 -0.7 0.5 2.4 -0.2 -0.9 -1.1 0.3 -0.8 0.0287485
2: sim_rep1.dat -0.2 0.2 0.7 1.0 1.8 -0.2 0.8 0.3 -1.3 -1.6 -0.2 0.4588433
3: sim_rep1.dat 1.6 -0.5 0.7 -0.7 -1.7 0.9 -1.2 -1.0 1.1 -0.3 -2.1 0.2432395
4: sim_rep1.dat 0.1 1.2 -1.3 -0.1 0.3 -0.6 0.4 0.3 0.8 -1.2 -1.7 0.8313184
5: sim_rep1.dat 0.1 0.2 -2.0 0.6 -0.3 0.2 0.2 0.5 -0.9 -0.8 -1.1 0.7738186
---
199996: sim_rep100.dat 0.1 -1.4 1.6 -0.7 -1.0 -0.6 0.8 -0.6 -0.5 -0.4 -0.8 0.1323889
199997: sim_rep100.dat 0.3 1.3 -2.4 -0.7 -0.4 0.0 1.0 -0.2 1.0 -0.1 0.3 0.6769959
199998: sim_rep100.dat 0.3 1.2 0.0 -1.3 -0.8 -0.7 -0.3 0.1 0.9 0.9 -1.3 0.7824498
199999: sim_rep100.dat 0.5 -0.7 0.2 0.5 1.1 -0.3 0.3 -0.5 -0.8 1.9 -0.7 0.2669799
200000: sim_rep100.dat -0.5 1.1 0.8 0.2 -0.6 -0.5 -0.4 1.1 -1.8 0.9 -1.3 0.9175867
DT now consists of 200 k rows. Performance is no cause for concern, as data.table was built to deal with large (and even larger) data efficiently.
The origin of each row can be identified in case the data of the individual data sets need to be treated separately. E.g.,
DT[origin == "sim_rep47.dat"]
origin ARAND w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 ps_true
1: sim_rep47.dat -0.6 -0.5 0.2 -0.7 0.5 2.4 -0.2 -0.9 -1.1 0.3 -0.8 0.0287485
2: sim_rep47.dat -0.2 0.2 0.7 1.0 1.8 -0.2 0.8 0.3 -1.3 -1.6 -0.2 0.4588433
3: sim_rep47.dat 1.6 -0.5 0.7 -0.7 -1.7 0.9 -1.2 -1.0 1.1 -0.3 -2.1 0.2432395
4: sim_rep47.dat 0.1 1.2 -1.3 -0.1 0.3 -0.6 0.4 0.3 0.8 -1.2 -1.7 0.8313184
5: sim_rep47.dat 0.1 0.2 -2.0 0.6 -0.3 0.2 0.2 0.5 -0.9 -0.8 -1.1 0.7738186
---
1996: sim_rep47.dat 0.1 -1.4 1.6 -0.7 -1.0 -0.6 0.8 -0.6 -0.5 -0.4 -0.8 0.1323889
1997: sim_rep47.dat 0.3 1.3 -2.4 -0.7 -0.4 0.0 1.0 -0.2 1.0 -0.1 0.3 0.6769959
1998: sim_rep47.dat 0.3 1.2 0.0 -1.3 -0.8 -0.7 -0.3 0.1 0.9 0.9 -1.3 0.7824498
1999: sim_rep47.dat 0.5 -0.7 0.2 0.5 1.1 -0.3 0.3 -0.5 -0.8 1.9 -0.7 0.2669799
2000: sim_rep47.dat -0.5 1.1 0.8 0.2 -0.6 -0.5 -0.4 1.1 -1.8 0.9 -1.3 0.9175867
extracts all rows belonging to data set sim_rep47.dat.
Data
For test and demonstration, I've created 100 sample data.frames using the code below:
# create vector of file names
temp <- paste0("sim_rep", 1:100, ".dat")
# create one sample data.frame
nr <- 2000L
nc <- 11L
set.seed(123L)
foo <- as.data.frame(matrix(round(rnorm(nr * nc), 1), nrow = nr))
names(foo) <- c("ARAND", paste0("w", 1:10))
str(foo)
# create 100 individually named data.frames by "copying" foo
for (t in temp) assign(t, foo)
# print warning message on using assign
fortunes::fortune(236)
# verify objects have been created
ls()
Addendum: Reading all files at once
The OP has named the single data.frames sim_rep1.dat, sim_rep2.dat, etc., which resemble typical file names. Just in case the OP indeed has 100 files on disk, I would like to suggest a way to read all files at once. Let's suppose all files are stored in one directory.
# path to data directory
data_dir <- file.path("path", "to", "data", "directory")
# create vector of file paths
files <- dir(data_dir, pattern = "sim_rep\\d+\\.dat", full.names = TRUE)
# read all files and create one large data.table
# NB: it might be necessary to add parameters to fread()
# or to use another file reader depending on the file type
DT <- rbindlist(lapply(files, fread), idcol = "origin")
# rename origin to contain the file names without path
DT[, origin := factor(origin, labels = basename(files))]
DT
origin ARAND w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 ps_true
1: sim_rep1.dat -0.6 -0.5 0.2 -0.7 0.5 2.4 -0.2 -0.9 -1.1 0.3 -0.8 0.0287485
2: sim_rep1.dat -0.2 0.2 0.7 1.0 1.8 -0.2 0.8 0.3 -1.3 -1.6 -0.2 0.4588433
3: sim_rep1.dat 1.6 -0.5 0.7 -0.7 -1.7 0.9 -1.2 -1.0 1.1 -0.3 -2.1 0.2432395
4: sim_rep1.dat 0.1 1.2 -1.3 -0.1 0.3 -0.6 0.4 0.3 0.8 -1.2 -1.7 0.8313184
5: sim_rep1.dat 0.1 0.2 -2.0 0.6 -0.3 0.2 0.2 0.5 -0.9 -0.8 -1.1 0.7738186
---
199996: sim_rep99.dat 0.1 -1.4 1.6 -0.7 -1.0 -0.6 0.8 -0.6 -0.5 -0.4 -0.8 0.1323889
199997: sim_rep99.dat 0.3 1.3 -2.4 -0.7 -0.4 0.0 1.0 -0.2 1.0 -0.1 0.3 0.6769959
199998: sim_rep99.dat 0.3 1.2 0.0 -1.3 -0.8 -0.7 -0.3 0.1 0.9 0.9 -1.3 0.7824498
199999: sim_rep99.dat 0.5 -0.7 0.2 0.5 1.1 -0.3 0.3 -0.5 -0.8 1.9 -0.7 0.2669799
200000: sim_rep99.dat -0.5 1.1 0.8 0.2 -0.6 -0.5 -0.4 1.1 -1.8 0.9 -1.3 0.9175867
All data sets are now stored in one large data.table DT consisting of 200 k rows. However, the order of data sets is different as files is sorted alphabetically, i.e.,
head(files)
[1] "./data/sim_rep1.dat" "./data/sim_rep10.dat" "./data/sim_rep100.dat"
[4] "./data/sim_rep11.dat" "./data/sim_rep12.dat" "./data/sim_rep13.dat"
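If the numeric order matters, one possible approach (a sketch, not part of the original answer) is to extract the replicate number from each file name and reorder files before reading. The short files vector below is a made-up stand-in for the result of dir().

```r
# Stand-in for the alphabetically sorted result of dir()
files <- c("./data/sim_rep1.dat", "./data/sim_rep10.dat",
           "./data/sim_rep100.dat", "./data/sim_rep2.dat")

# Pull the digits out of the base name and sort numerically
rep_no <- as.integer(sub("^sim_rep([0-9]+)\\.dat$", "\\1", basename(files)))
files_sorted <- files[order(rep_no)]
files_sorted
```

Passing files_sorted instead of files to lapply(files, fread) would then keep the data sets in numeric order.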
use function on multiple columns (variables) in r
Common parameters to the function need to be passed to ... within lapply. Like this:
lapply(subset(iris, select = -Species), leveneTest, group = iris$Species)
help("lapply") explains that ... is for "optional arguments to FUN" (meaning optional for lapply, not for FUN) and provides lapply(x, quantile, probs = 1:3/4) as an example.
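leveneTest comes from an add-on package, so here is a sketch of the same pattern using only base R. bartlett.test from the stats package is swapped in as a stand-in test; its grouping argument is called g rather than group.

```r
# Every numeric column of iris is passed as the first argument (x);
# g = iris$Species travels through lapply's ... to each call
tests <- lapply(subset(iris, select = -Species), bartlett.test, g = iris$Species)

# One test result per column
names(tests)
p_values <- sapply(tests, function(t) t$p.value)
```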