How to Perform a Pairwise T.Test in R Across Multiple Independent Vectors

How can I perform a pairwise t.test in R across multiple independent vectors?

The advantage of my method below over the one proposed by @ashkan is that it removes duplicates (i.e. either X1 vs X2 or X2 vs X1 appears in the results, not both).

# Generate dummy data
df <- data.frame(matrix(rnorm(100), ncol = 10))
colnames(df) <- paste0("X", 1:10)

# Create combinations of the variables
combinations <- combn(colnames(df),2, simplify = FALSE)

# Run a t-test for each pair of columns
results <- lapply(combinations, function(pair) {
  t.test(df[[pair[1]]], df[[pair[2]]])
})

# Name the list elements for legibility, e.g. "X1 vs. X2"
names(results) <- sapply(combinations, paste, collapse = " vs. ")
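
A list of htest objects is awkward to scan, so it can help to collect the key numbers into one table. The following is a minimal sketch, not part of the original answer: the seed, the names pairs, tests and summary_df, and the choice of Holm adjustment via p.adjust are all assumptions made for illustration.

```r
# Assumption: a fixed seed so the sketch is reproducible
set.seed(42)
df <- data.frame(matrix(rnorm(100), ncol = 10))
colnames(df) <- paste0("X", 1:10)

# All unordered pairs of columns, one t-test per pair
pairs <- combn(colnames(df), 2, simplify = FALSE)
tests <- lapply(pairs, function(p) t.test(df[[p[1]]], df[[p[2]]]))

# One row per comparison (summary_df is a hypothetical name)
summary_df <- data.frame(
  pair      = sapply(pairs, paste, collapse = " vs. "),
  statistic = sapply(tests, function(r) unname(r$statistic)),
  p_value   = sapply(tests, function(r) r$p.value),
  stringsAsFactors = FALSE
)

# Optional: adjust for the 45 comparisons (Holm is an arbitrary choice here)
summary_df$p_adjusted <- p.adjust(summary_df$p_value, method = "holm")
```

With 10 columns this yields choose(10, 2) = 45 rows, one per unordered pair.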

How to perform pairwise operation like `%in%` and set operations for a list of vectors

We could use outer(x, y, FUN). Here x and y need not be numeric vectors or matrices; a list or a "matrix list" is also allowed.

For example, to apply the "%in%" operation pairwise, we use:

z <- outer(lst, lst, FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))
#     vec1      vec2      vec3      vec4
#vec1 Logical,2 Logical,2 Logical,2 Logical,2
#vec2 Logical,3 Logical,3 Logical,3 Logical,3
#vec3 Logical,4 Logical,4 Logical,4 Logical,4
#vec4 Logical,5 Logical,5 Logical,5 Logical,5
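
The list lst is not shown in the text, so here is a self-contained sketch with a hypothetical stand-in. The element values are assumptions; only the names vec1 to vec4 and the lengths 2 to 5 are taken from the printout above.

```r
# Hypothetical stand-in for `lst` (the original values are not given)
lst <- list(
  vec1 = c("a", "b"),
  vec2 = c("c", "d", "e"),
  vec3 = c("a", "b", "c", "f"),
  vec4 = c("g", "h", "a", "i", "j")
)

z <- outer(lst, lst, FUN = Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE))

# z is a 4 x 4 matrix whose cells are logical vectors
dims     <- dim(z)
cell_3_1 <- z[[3, 1]]   # vec3 %in% vec1
cell_1_3 <- z[[1, 3]]   # vec1 %in% vec3
```

Row i corresponds to x[[i]] and column j to y[[j]], so z[[3, 1]] holds vec3 %in% vec1 and has the length of vec3.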

Since "%in%" itself is not vectorized over list elements, we use Vectorize("%in%"). We also need SIMPLIFY = FALSE, so that FUN returns a length-1 list for each pair (x[[i]], y[[j]]). This is important, as outer works like:

y[[4]] | FUN(x[[1]], y[[4]])  FUN(x[[2]], y[[4]])  FUN(x[[3]], y[[4]])  FUN(x[[4]], y[[4]])
y[[3]] | FUN(x[[1]], y[[3]])  FUN(x[[2]], y[[3]])  FUN(x[[3]], y[[3]])  FUN(x[[4]], y[[3]])
y[[2]] | FUN(x[[1]], y[[2]])  FUN(x[[2]], y[[2]])  FUN(x[[3]], y[[2]])  FUN(x[[4]], y[[2]])
y[[1]] | FUN(x[[1]], y[[1]])  FUN(x[[2]], y[[1]])  FUN(x[[3]], y[[1]])  FUN(x[[4]], y[[1]])
         -------------------  -------------------  -------------------  -------------------
               x[[1]]               x[[2]]               x[[3]]               x[[4]]

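outer requires that length(FUN(x, y)) == length(x) * length(y). With SIMPLIFY = FALSE this always holds, because every pair contributes exactly one list element. With the default SIMPLIFY = TRUE it does not necessarily hold, since results of equal length get collapsed into a matrix.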
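
To make the length requirement concrete, here is a small sketch. The list same_len and the helper names in_list and in_simpl are hypothetical; the two rep() calls mimic how outer expands its inputs before calling FUN once.

```r
# Hypothetical list whose vectors all have the same length (2)
same_len <- list(a = 1:2, b = 2:3, c = 3:4)

in_list  <- Vectorize("%in%", SIMPLIFY = FALSE, USE.NAMES = FALSE)
in_simpl <- Vectorize("%in%", SIMPLIFY = TRUE,  USE.NAMES = FALSE)

# Expand the inputs the way outer() does
x <- rep(same_len, times = length(same_len))
y <- rep(same_len, each  = length(same_len))

n_list  <- length(in_list(x, y))   # one list element per pair
n_simpl <- length(in_simpl(x, y))  # results collapsed into a 2 x 9 matrix

# With the simplified result, outer() cannot set the 3 x 3 dimensions
outer_ok <- tryCatch({
  outer(same_len, same_len, FUN = in_simpl)
  TRUE
}, error = function(e) FALSE)
```

Here n_list equals length(x) * length(y) for the 3 x 3 case, while n_simpl does not, which is why the simplified version fails inside outer.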

The result z above is a "matrix list", with class(z) being "matrix" (and also "array" in R 4.0.0 and later), but typeof(z) being "list". Read "Why is this matrix not numeric?" for more.


If we want to further apply some summary function to each element of z, we could use lapply. Here I would offer two examples.

Example 1: Apply any()

Since any(a %in% b) is the same as any(b %in% a), i.e., the operation is symmetric, we only need to work with the lower triangle of z:

lz <- z[lower.tri(z)]

lapply returns an unnamed list, but for readability we want a named list. We may use the matrix index (i, j) as the name:

ind <- which(lower.tri(z), arr.ind = TRUE)
NAME <- paste(ind[,1], ind[,2], sep = ":")
any_lz <- setNames(lapply(lz, any), NAME)

#List of 6
# $ 2:1: logi FALSE
# $ 3:1: logi TRUE
# $ 4:1: logi TRUE
# $ 3:2: logi TRUE
# $ 4:2: logi FALSE
# $ 4:3: logi TRUE

Set operations like intersect, union and setequal are also symmetric operations which we can work with similarly.
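
As an illustration of the symmetric case, the same lower-triangle pattern can be applied to intersect. This is a sketch under assumptions: lst is a hypothetical stand-in for the list used above, and zi and common are names introduced here.

```r
# Hypothetical stand-in for `lst`
lst <- list(
  vec1 = c("a", "b"),
  vec2 = c("c", "d", "e"),
  vec3 = c("a", "b", "c", "f"),
  vec4 = c("g", "h", "a", "i", "j")
)

# Pairwise intersections as a matrix list
zi <- outer(lst, lst, FUN = Vectorize(intersect, SIMPLIFY = FALSE, USE.NAMES = FALSE))

# intersect is symmetric, so the lower triangle is enough
ind    <- which(lower.tri(zi), arr.ind = TRUE)
common <- setNames(zi[lower.tri(zi)], paste(ind[, 1], ind[, 2], sep = ":"))
```

Each element of common holds the shared values for one unordered pair, named by its (row, column) index.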

Example 2: Apply which()

which(a %in% b) is not a symmetric operation, so we have to work with the full matrix.

NAME <- paste(1:nrow(z), rep(1:nrow(z), each = ncol(z)), sep = ":")
which_z <- setNames(lapply(z, which), NAME)

# List of 16
# $ 1:1: int [1:2] 1 2
# $ 2:1: int(0)
# $ 3:1: int [1:2] 1 2
# $ 4:1: int 3
# $ 1:2: int(0)
# $ 2:2: int [1:3] 1 2 3
# ...

Set operations like setdiff are also asymmetric and can be dealt with similarly.
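
For the asymmetric case, a sketch with setdiff over the full matrix might look as follows. Again lst is a hypothetical stand-in, and zd and diffs are names introduced only for this example.

```r
# Hypothetical stand-in for `lst`
lst <- list(
  vec1 = c("a", "b"),
  vec2 = c("c", "d", "e"),
  vec3 = c("a", "b", "c", "f"),
  vec4 = c("g", "h", "a", "i", "j")
)

# Pairwise set differences; cell (i, j) is setdiff(lst[[i]], lst[[j]])
zd <- outer(lst, lst, FUN = Vectorize(setdiff, SIMPLIFY = FALSE, USE.NAMES = FALSE))

# Keep every cell, named "row:column" in column-major order
NAME  <- paste(1:nrow(zd), rep(1:ncol(zd), each = nrow(zd)), sep = ":")
diffs <- setNames(lapply(zd, identity), NAME)
```

The elements "1:3" and "3:1" differ, which shows why the full matrix is needed here.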


Alternatives

Apart from using outer(), we could also use R expressions to obtain the elements of z above, this time as a flat named list. Again, I take the binary operation "%in%" as an example:

op <- "'%in%'"    ## operator

lst_name <- names(lst)
op_call <- paste0(op, "(", lst_name, ", ", rep(lst_name, each = length(lst)), ")")
# [1] "'%in%'(vec1, vec1)" "'%in%'(vec2, vec1)" "'%in%'(vec3, vec1)"
# [4] "'%in%'(vec4, vec1)" "'%in%'(vec1, vec2)" "'%in%'(vec2, vec2)"
# ...

Then we can parse and evaluate these expressions within lst. We may use the combination indices as names of the resulting list:

NAME <- paste(1:length(lst), rep(1:length(lst), each = length(lst)), sep = ":")
z <- setNames(lapply(parse(text = op_call), eval, lst), NAME)

# List of 16
# $ 1:1: logi [1:2] TRUE TRUE
# $ 2:1: logi [1:3] FALSE FALSE FALSE
# $ 3:1: logi [1:4] TRUE TRUE FALSE FALSE
# $ 4:1: logi [1:5] FALSE FALSE TRUE FALSE FALSE
# $ 1:2: logi [1:2] FALSE FALSE
# ...
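
If building and parsing strings feels fragile, a different technique gives the same flat list: iterate over an index grid with Map. This is a sketch, not the approach of the answer above; lst is a hypothetical stand-in and idx and z2 are names introduced here.

```r
# Hypothetical stand-in for `lst`
lst <- list(
  vec1 = c("a", "b"),
  vec2 = c("c", "d", "e"),
  vec3 = c("a", "b", "c", "f"),
  vec4 = c("g", "h", "a", "i", "j")
)

# All (i, j) index pairs; expand.grid varies i fastest, like the NAME vector above
idx <- expand.grid(i = seq_along(lst), j = seq_along(lst))

# One `%in%` call per pair, without parse() or eval()
z2 <- Map(function(i, j) lst[[i]] %in% lst[[j]], idx$i, idx$j)
names(z2) <- paste(idx$i, idx$j, sep = ":")
```

Swapping `%in%` for another binary function inside the anonymous function covers the other operations discussed above.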

R: t test over multiple columns using t.test function

Use select_if to select only the numeric columns, then use purrr::map_df to apply t.test against grp. Finally, use broom::tidy to get the results in a tidy format.

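library(tidyverse)

# grp is not numeric, so select_if() drops it from the piped data;
# it must therefore be visible as a grouping vector where this code runs
res <- test_data %>%
  select_if(is.numeric) %>%
  map_df(~ broom::tidy(t.test(. ~ grp)), .id = 'var')
res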
#> # A tibble: 3 x 11
#>   var   estimate estimate1 estimate2 statistic p.value parameter conf.low
#>   <chr>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
#> 1 a       -0.259      9.78      10.0    -0.587   0.565      16.2    -1.19
#> 2 b        0.154     15.0       14.8     0.169   0.868      15.4    -1.78
#> 3 c       -0.359     20.4       20.7    -0.287   0.778      16.5    -3.00
#> # ... with 3 more variables: conf.high <dbl>, method <chr>,
#> #   alternative <chr>

Created on 2019-03-15 by the reprex package (v0.2.1.9000)
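
For readers without the tidyverse installed, roughly the same table can be sketched in base R. This swaps purrr and broom for lapply and a hand-built data frame; the contents of test_data, the seed and the selected result columns are assumptions, since the original data is not shown.

```r
# Hypothetical data: one grouping column and three numeric columns
set.seed(1)
test_data <- data.frame(
  grp = rep(c("A", "B"), each = 10),
  a   = rnorm(20, mean = 10),
  b   = rnorm(20, mean = 15),
  c   = rnorm(20, mean = 20)
)

# Names of the numeric columns to test against grp
num_cols <- names(test_data)[sapply(test_data, is.numeric)]

# One Welch t-test per numeric column, collected into one data frame
res <- do.call(rbind, lapply(num_cols, function(v) {
  d  <- data.frame(value = test_data[[v]], grp = test_data$grp)
  tt <- t.test(value ~ grp, data = d)
  data.frame(
    var       = v,
    statistic = unname(tt$statistic),
    parameter = unname(tt$parameter),
    p.value   = tt$p.value,
    conf.low  = tt$conf.int[1],
    conf.high = tt$conf.int[2],
    stringsAsFactors = FALSE
  )
}))
```

Passing the grouping column explicitly through data = d avoids relying on grp being found in the calling environment.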


