Is there an R function that applies a function to each pair of columns?
It wouldn't be faster, but you can use outer to simplify the code. It does require a vectorized function, so here I've used Vectorize to make a vectorized version of the function that gets the correlation test p-value between two columns.
df <- data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100))
n <- ncol(df)
corpij <- function(i,j,data) {cor.test(data[,i],data[,j])$p.value}
corp <- Vectorize(corpij, vectorize.args=c("i","j"))
outer(1:n,1:n,corp,data=df)
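Because the p-value matrix is symmetric, each pair only needs to be tested once. The following is a minimal sketch of that idea using combn instead of outer; the seed and the name pmat are assumptions made for illustration, not part of the original answer.

```r
# Sketch: compute each pairwise p-value once and mirror it into a symmetric matrix.
set.seed(1)  # assumed seed, only so the example is reproducible
df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
n <- ncol(df)

# pmat is a hypothetical name for the result matrix
pmat <- matrix(NA_real_, n, n, dimnames = list(names(df), names(df)))
pairs <- combn(n, 2)  # 2 x choose(n, 2) matrix of column index pairs
for (k in seq_len(ncol(pairs))) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  p <- cor.test(df[[i]], df[[j]])$p.value
  pmat[i, j] <- p
  pmat[j, i] <- p
}
pmat  # the diagonal stays NA because a column is not tested against itself
```

This trades the one-line outer call for an explicit loop, which may be worthwhile when the per-pair function is expensive.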
Apply a custom function to pairs of columns in a dataframe
You can use the following solution:
library(dplyr)
library(tidyr)   # provides pivot_wider()
library(tibble)
expand.grid(names(data), names(data)) %>%
rowwise() %>%
mutate(Res = custom_function(data[as.character(Var1)], data[as.character(Var2)])) %>%
pivot_wider(names_from = unique("Var1"), values_from = "Res") %>%
column_to_rownames("Var2")
x y z
x -0.3591433 2.157343 -1.470995
y 2.1573430 4.673829 1.045491
z -1.4709953 1.045491 -2.582847
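For comparison, a similar name-by-name result matrix can be sketched in base R with outer and Vectorize. The custom_function body and the small data frame below are hypothetical stand-ins, since the original does not show them.

```r
# Hypothetical pairwise function: the original custom_function is not shown.
custom_function <- function(a, b) sum(a) + sum(b)

# Hypothetical data with three numeric columns
data <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Vectorize over column names so outer() can call the function pair by pair
pair_fun <- Vectorize(function(i, j) custom_function(data[[i]], data[[j]]))
res <- outer(names(data), names(data), pair_fun)
dimnames(res) <- list(names(data), names(data))
res
```

Using data[[i]] passes plain vectors to the function, whereas data[i] in the pipeline above passes one-column data frames; which one is appropriate depends on what the custom function expects.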
What is the most elegant way to apply a function to multiple pairs of columns in a data.table or data.frame?
1) gv. Using gv from the collapse package, we could do this:
library(collapse)
DT[, (result.cols) := gv(.SD, one.cols) - gv(.SD, two.cols)]
2) gvr. Alternatively, we can use the regex variant of gv to eliminate one.cols and two.cols:
library(collapse)
result.cols <- sub(1, 3, gvr(DT, "1$", "names"))
DT[, (result.cols) := gvr(.SD, "1$") - gvr(.SD, "2$")]
3) across. Using dplyr, we can use across, which eliminates result.cols as well.
library(dplyr)
DT %>%
mutate(across(ends_with("1"), .names="{sub(1,3,.col)}") - across(ends_with("2")))
4) data.table. If we write it like this, it is straightforward in data.table:
DT[, result.cols] <- DT[, ..one.cols] - DT[, ..two.cols]
or
DT[, (result.cols) := .SD[, one.cols, with=FALSE] - .SD[, two.cols, with=FALSE]]
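The snippets above assume that DT, one.cols, two.cols and result.cols already exist. A minimal base R sketch of the same column-pair subtraction on a plain data.frame might look like this; the data and the column names a1, b1, a2, b2 are made up for illustration.

```r
# Hypothetical data: pairs a1/a2 and b1/b2; results go into a3 and b3.
DF <- data.frame(a1 = 1:3, b1 = 4:6, a2 = c(10, 20, 30), b2 = c(1, 1, 1))

one.cols <- c("a1", "b1")
two.cols <- c("a2", "b2")
result.cols <- sub("1$", "3", one.cols)  # "a3" "b3"

# Data frames subtract element-wise, column by column, in the order given
DF[result.cols] <- DF[one.cols] - DF[two.cols]
DF
```

The subtraction pairs columns by position, so one.cols and two.cols need to be in matching order.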
Apply a function to pairs of columns in a loop
Here is an option
lapply(split.default(dframe, sub("\\d+$", "", names(dframe))), cor)
#$a
# a1 a2
#a1 1.0000000 0.1132033
#a2 0.1132033 1.0000000
#$b
# b1 b2
#b1 1.00000000 0.09113974
#b2 0.09113974 1.00000000
#$c
# c1 c2
#c1 1.0000000 -0.2066311
#c2 -0.2066311 1.0000000
We split your data frame column-wise and then iterate over the resulting list with lapply.
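To make the intermediate steps visible, here is a small sketch on made-up data. The data frame contents and the seed are assumptions; only the split.default and lapply calls come from the answer above.

```r
set.seed(42)  # assumed seed, only for reproducibility
# Hypothetical data with two column groups, "a" and "b"
dframe <- data.frame(a1 = rnorm(10), a2 = rnorm(10),
                     b1 = rnorm(10), b2 = rnorm(10))

# Strip trailing digits to get one group label per column
groups <- sub("\\d+$", "", names(dframe))  # "a" "a" "b" "b"

# split.default treats the data frame as a list of columns
pieces <- split.default(dframe, groups)

# One correlation matrix per group
cors <- lapply(pieces, cor)
names(cors)
```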
R data.table: apply a function to all pairs of columns
For pairwise combinations, crossprod is useful here as well.
We only care about whether a value is NA or not:
NAtemp = is.na(temp)
Compare the co-occurrence of NAs:
crossprod(NAtemp)
# M P S
#M 3 2 2
#P 2 3 3
#S 2 3 5
with the number of NAs per column:
colSums(NAtemp)
#M P S
#3 3 5
like:
ans = crossprod(NAtemp) == colSums(NAtemp)
ans
# M P S
#M TRUE FALSE FALSE
#P FALSE TRUE TRUE
#S FALSE FALSE TRUE
And use the convenient as.data.frame.table to format:
subset(as.data.frame(as.table(ans)), Var1 != Var2)
# Var1 Var2 Freq
#2 P M FALSE
#3 S M FALSE
#4 M P FALSE
#6 S P FALSE
#7 M S FALSE
#8 P S TRUE
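The same steps can be tried end to end on a tiny made-up table. The data below is hypothetical and does not reproduce the output shown above. Because colSums(NAtemp) is recycled down the columns of the matrix, ans[i, j] is TRUE when every NA in column i is matched by an NA in column j.

```r
# Hypothetical data: P is NA only in row 1, where M and S are NA too
temp <- data.frame(M = c(NA, 1, NA, 4),
                   P = c(NA, 2, 3, 4),
                   S = c(NA, NA, 3, NA))

NAtemp <- is.na(temp)

# Entry [i, j] counts rows where columns i and j are both NA
co <- crossprod(NAtemp)

# Row i is compared against the NA count of column i
ans <- co == colSums(NAtemp)

# Long format, dropping the trivial self-comparisons
subset(as.data.frame(as.table(ans)), Var1 != Var2)
```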
Apply a function over groups of columns
This may be more generalizable to your situation in that you pass a list of indices. If speed is an issue (large data frame), I'd opt for lapply with do.call rather than sapply:
x <- list(1:3, 4:6)
do.call(cbind, lapply(x, function(i) rowMeans(dat[, i])))
This also works if you only have column names:
x <- list(c('a','b','c'), c('d', 'e', 'f'))
do.call(cbind, lapply(x, function(i) rowMeans(dat[, i])))
EDIT
Just happened to think maybe you want to automate this to do every three columns. I know there's a better way but here it is on a 100 column data set:
dat <- data.frame(matrix(rnorm(16*100), ncol=100))
n <- 1:ncol(dat)
ind <- matrix(c(n, rep(NA, 3 - ncol(dat)%%3)), byrow=TRUE, ncol=3)
ind <- data.frame(t(na.omit(ind)))
do.call(cbind, lapply(ind, function(i) rowMeans(dat[, i])))
EDIT 2
Still not happy with the indexing. I think there's a better/faster way to pass the indexes. Here's a second, though still unsatisfying, method:
n <- 1:ncol(dat)
ind <- data.frame(matrix(c(n, rep(NA, 3 - ncol(dat)%%3)), byrow=F, nrow=3))
nonna <- sapply(ind, function(x) all(!is.na(x)))
ind <- ind[, nonna]
do.call(cbind, lapply(ind, function(i)rowMeans(dat[, i])))
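One possible alternative for the indexing is to build a group label per column and split on it. This is only a sketch, and it behaves differently from the code above in one respect: it keeps the trailing incomplete group (here the single column 100) instead of dropping it. The names grp and block_means and the seed are assumptions.

```r
set.seed(1)  # assumed seed, only for reproducibility
dat <- data.frame(matrix(rnorm(16 * 100), ncol = 100))

# Group label for every column: 1 1 1 2 2 2 ... 33 33 33 34
grp <- ceiling(seq_along(dat) / 3)

# Split the columns by label and take row means within each block
block_means <- sapply(split.default(dat, grp), rowMeans)
dim(block_means)
```

If the incomplete last block should be dropped, the groups with fewer than three columns can be filtered out of the split list before calling rowMeans.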
Apply a function to sequential pairs of columns in R
You can split the data frame and then use mapply:
col<-seq(1,ncol(df),by=2)
mapply(t.test,df[,col],df[,-col],MoreArgs=list(paired=TRUE))
This way, the names of the result will be the names of the odd columns of df.
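If only the p-values are needed, a small wrapper around t.test keeps the result a plain named vector. This is a sketch on made-up data; the column names pre1, post1, pre2, post2 and the seed are assumptions.

```r
set.seed(7)  # assumed seed, only for reproducibility
# Hypothetical data: columns alternate between the two members of each pair
df <- data.frame(pre1 = rnorm(20), post1 = rnorm(20),
                 pre2 = rnorm(20), post2 = rnorm(20))

col <- seq(1, ncol(df), by = 2)  # odd columns: 1, 3

# Pair each odd column with the even column that follows it
pvals <- mapply(function(a, b) t.test(a, b, paired = TRUE)$p.value,
                df[, col], df[, -col])
pvals  # named after the odd columns
```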