Is There an R Function That Applies a Function to Each Pair of Columns

Is there an R function that applies a function to each pair of columns?

It won't be faster than an explicit loop, but you can use outer to simplify the code. outer requires a vectorized function, so here I've used Vectorize to make a vectorized version of the function that returns the p-value of the correlation test between two columns.

df <- data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100))
n <- ncol(df)

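# p-value of the correlation test between columns i and j of data
corpij <- function(i, j, data) cor.test(data[, i], data[, j])$p.value
# outer passes whole index vectors, so vectorize over i and j only
corp <- Vectorize(corpij, vectorize.args = c("i", "j"))
outer(1:n, 1:n, corp, data = df)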
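Because the resulting p-value matrix is symmetric, each unordered pair only needs to be tested once. The following is a minimal sketch of that idea using combn, assuming a numeric data frame like the one above; the names idx, pvals and pmat are illustrative and not part of the original answer.

```r
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))

# All unordered pairs of column indices: a 2 x choose(n, 2) integer matrix
idx <- combn(ncol(df), 2)

# One cor.test per pair
pvals <- apply(idx, 2, function(p) cor.test(df[[p[1]]], df[[p[2]]])$p.value)

# Fill a symmetric matrix labelled by column name; the diagonal stays NA
pmat <- matrix(NA_real_, ncol(df), ncol(df),
               dimnames = list(names(df), names(df)))
pmat[t(idx)] <- pvals           # upper triangle, indexed by (row, column)
pmat[t(idx[2:1, ])] <- pvals    # mirror into the lower triangle
pmat
```

This trades the one-line outer call for roughly half as many cor.test calls, which may matter when there are many columns.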

Apply a custom function to pairs of columns in a dataframe

You can use the following solution, where data is your data frame and custom_function takes two columns:

library(dplyr)
library(tidyr)   # pivot_wider
library(tibble)  # column_to_rownames

expand.grid(names(data), names(data)) %>%
  rowwise() %>%
  mutate(Res = custom_function(data[as.character(Var1)], data[as.character(Var2)])) %>%
  ungroup() %>%
  pivot_wider(names_from = "Var1", values_from = "Res") %>%
  column_to_rownames("Var2")

           x        y         z
x -0.3591433 2.157343 -1.470995
y  2.1573430 4.673829  1.045491
z -1.4709953 1.045491 -2.582847
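The snippet above does not define data or custom_function. As a rough, self-contained sketch of the same idea in base R, the example below uses a made-up data frame and a placeholder custom_function; both are assumptions for illustration only, and the numbers will not match the output shown above.

```r
# Hypothetical inputs: the original answer does not define either of these
set.seed(42)
data <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
custom_function <- function(a, b) sum(a) + sum(b)   # placeholder for illustration

# Apply custom_function to every ordered pair of columns
cols <- names(data)
res <- sapply(cols, function(j) {
  sapply(cols, function(i) custom_function(data[[i]], data[[j]]))
})
res   # 3 x 3 matrix with rows and columns labelled x, y, z
```

Here the columns are passed as plain vectors via [[ ]], whereas the dplyr version passes one-column data frames; which form is appropriate depends on what custom_function expects.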

What is the most elegant way to apply a function to multiple pairs of columns in a data.table or data.frame?

1) gv. Using gv from the collapse package, we could do this:

library(collapse)

DT[, (result.cols) := gv(.SD, one.cols) - gv(.SD, two.cols)]

2) gvr. Alternatively, we can use gvr, the regex variant of gv, to eliminate one.cols and two.cols:

library(collapse)

result.cols <- sub(1, 3, gvr(DT, "1$", "names"))
DT[, (result.cols) := gvr(.SD, "1$") - gvr(.SD, "2$")]

3) across. Using across from dplyr, we can eliminate result.cols as well:

library(dplyr)

DT %>%
  mutate(across(ends_with("1"), .names = "{sub(1, 3, .col)}") - across(ends_with("2")))

4) data.table. Written like this, it is straightforward in data.table:

DT[, result.cols] <- DT[, ..one.cols] - DT[, ..two.cols]

or

DT[, (result.cols) := .SD[, one.cols, with=FALSE] - .SD[, two.cols, with=FALSE]]
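None of the variants above show how DT, one.cols, two.cols and result.cols are set up. The sketch below illustrates the underlying idea on a plain data.frame in base R; the data and the naming scheme (suffix 1 and 2 for the two members of a pair, suffix 3 for the result) are assumptions made for illustration.

```r
# Hypothetical data: names ending in 1 and 2 mark the two members of each pair
df <- data.frame(a1 = 1:3, b1 = 4:6, a2 = c(10, 20, 30), b2 = c(40, 50, 60))

one.cols    <- grep("1$", names(df), value = TRUE)   # "a1" "b1"
two.cols    <- sub("1$", "2", one.cols)              # "a2" "b2"
result.cols <- sub("1$", "3", one.cols)              # "a3" "b3"

# Subtracting two data frames of equal shape works element by element
df[result.cols] <- df[one.cols] - df[two.cols]
df
```

Deriving two.cols from one.cols keeps the pairs aligned by position, which the subtraction relies on.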

Apply a function to pairs of columns in a loop

Here is one option:

lapply(split.default(dframe, sub("\\d+$", "", names(dframe))), cor)
#$a
#          a1        a2
#a1 1.0000000 0.1132033
#a2 0.1132033 1.0000000

#$b
#           b1         b2
#b1 1.00000000 0.09113974
#b2 0.09113974 1.00000000

#$c
#           c1         c2
#c1  1.0000000 -0.2066311
#c2 -0.2066311  1.0000000

We split your data frame column-wise and then iterate over the resulting list with lapply.
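To make the splitting step concrete, here is a small self-contained sketch. The data frame is made up for illustration (the original dframe is not shown), and grp and parts are illustrative names.

```r
# Hypothetical data frame whose column names are a group prefix plus a number
set.seed(123)
dframe <- data.frame(a1 = rnorm(20), a2 = rnorm(20),
                     b1 = rnorm(20), b2 = rnorm(20),
                     c1 = rnorm(20), c2 = rnorm(20))

# Strip trailing digits to get one group label per column
grp <- sub("\\d+$", "", names(dframe))   # "a" "a" "b" "b" "c" "c"

# split.default treats the data frame as a list of columns
parts <- split.default(dframe, grp)
names(parts)     # "a" "b" "c"
names(parts$a)   # "a1" "a2"

# Any function of a two-column data frame can replace cor here
pair_cor <- sapply(parts, function(d) cor(d[[1]], d[[2]]))
pair_cor
```

Calling split.default explicitly matters: plain split on a data frame dispatches to split.data.frame, which splits rows rather than columns.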

R data.table apply function to all pair of columns

For pairwise combinations, crossprod is also useful.

We only care whether a value is NA or not:

NAtemp = is.na(temp)

Compare the co-existence of NAs:

crossprod(NAtemp)
#  M P S
#M 3 2 2
#P 2 3 3
#S 2 3 5

with the number of NAs per column:

colSums(NAtemp)
#M P S
#3 3 5

like this:

ans = crossprod(NAtemp) == colSums(NAtemp)
ans
#      M     P     S
#M  TRUE FALSE FALSE
#P FALSE  TRUE  TRUE
#S FALSE FALSE  TRUE

And use the convenient as.data.frame.table to format:

subset(as.data.frame(as.table(ans)), Var1 != Var2)
#  Var1 Var2  Freq
#2    P    M FALSE
#3    S    M FALSE
#4    M    P FALSE
#6    S    P FALSE
#7    M    S FALSE
#8    P    S  TRUE
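The comparison step relies on vector recycling, which is easy to miss. The sketch below walks through it on a small made-up data frame (the original temp is not shown, so the values here are an assumption chosen for illustration).

```r
# Hypothetical data chosen for illustration; NA marks a missing value
temp <- data.frame(
  M = c(NA, NA, 1, 1, 1, NA, 1, 1),
  P = c(NA, NA, NA, 1, 1, 1, 1, 1),
  S = c(NA, NA, NA, NA, NA, 1, 1, 1)
)
NAtemp <- is.na(temp)

co   <- crossprod(NAtemp)   # co[i, j] = rows where columns i and j are both NA
n_na <- colSums(NAtemp)     # NAs per column; equals diag(co)

# n_na is recycled down the columns of co, so row i is compared with n_na[i]:
# ans[i, j] is TRUE when every NA in column i is matched by an NA in column j
ans <- co == n_na
ans["P", "S"]   # TRUE: P is NA only where S is NA too
ans["S", "P"]   # FALSE: S has NAs in rows where P is observed
```

The result is therefore not symmetric: it answers whether the NAs of the row variable are a subset of the NAs of the column variable.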

Apply a function over groups of columns

This may be more generalizable to your situation in that you pass a list of indices. If speed is an issue (large data frame) I'd opt for lapply with do.call rather than sapply:

x <- list(1:3, 4:6)
do.call(cbind, lapply(x, function(i) rowMeans(dat[, i])))

This also works if you only have column names:

x <- list(c('a','b','c'), c('d', 'e', 'f'))
do.call(cbind, lapply(x, function(i) rowMeans(dat[, i])))

EDIT

Just happened to think maybe you want to automate this to do every three columns. I know there's a better way but here it is on a 100 column data set:

dat <- data.frame(matrix(rnorm(16*100), ncol=100))

n <- 1:ncol(dat)
ind <- matrix(c(n, rep(NA, 3 - ncol(dat)%%3)), byrow=TRUE, ncol=3)
ind <- data.frame(t(na.omit(ind)))
do.call(cbind, lapply(ind, function(i) rowMeans(dat[, i])))

EDIT 2
Still not happy with the indexing. I think there's a better/faster way to pass the indexes. Here's a second, though still unsatisfying, method:

n <- 1:ncol(dat)
ind <- data.frame(matrix(c(n, rep(NA, 3 - ncol(dat)%%3)), byrow=FALSE, nrow=3))
nonna <- sapply(ind, function(x) all(!is.na(x)))
ind <- ind[, nonna]

do.call(cbind, lapply(ind, function(i) rowMeans(dat[, i])))
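One possible way to avoid the NA padding is to build a group label per column and split on it. This is only a sketch of an alternative, not the answer author's method; k, n_full and grp are illustrative names, and like the code above it drops the incomplete trailing group.

```r
set.seed(1)
dat <- data.frame(matrix(rnorm(16 * 100), ncol = 100))

k <- 3
n_full <- ncol(dat) %/% k               # 33 complete groups; column 100 is left over
grp <- rep(seq_len(n_full), each = k)   # 1,1,1,2,2,2,...

# Split the first n_full * k columns by group and take row means within each group
res <- sapply(split.default(dat[seq_along(grp)], grp), rowMeans)
dim(res)   # 16 33
```

Changing k adjusts the group size without touching the rest of the code.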

Apply a function to sequential pairs of columns in R

You can split the dataframe and then use mapply:

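col <- seq(1, ncol(df), by = 2)
mapply(t.test, df[, col], df[, -col], MoreArgs = list(paired = TRUE), SIMPLIFY = FALSE)

With SIMPLIFY = FALSE the result is a list of test objects whose names are the names of the odd columns of df. (With the default SIMPLIFY = TRUE, mapply collapses the results into a matrix whose column names are the odd column names.)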
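A self-contained sketch of this approach, on made-up data with two pre/post column pairs (the column names and values are assumptions for illustration):

```r
# Hypothetical data: columns 1-2 and 3-4 are the two pairs to compare
set.seed(7)
df <- data.frame(a_pre = rnorm(15), a_post = rnorm(15, mean = 0.5),
                 b_pre = rnorm(15), b_post = rnorm(15))

odd <- seq(1, ncol(df), by = 2)

# SIMPLIFY = FALSE keeps one full htest object per pair
tests <- mapply(t.test, df[, odd], df[, -odd],
                MoreArgs = list(paired = TRUE), SIMPLIFY = FALSE)

names(tests)                                      # "a_pre" "b_pre"
pvals <- sapply(tests, function(tt) tt$p.value)   # one p-value per pair
pvals
```

This assumes the columns alternate strictly (first member, second member, first member, ...) and that df has an even number of columns.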


