Perform Multiple Paired T-Tests Based on Groups/Categories

Perform multiple paired t-tests based on groups/categories

One way to do it is to use by:

result <- by(data, data$Product_type, function(x) 
t.test(x$Price_Online, x$Price_Offline, mu=0, alt="two.sided",
paired=TRUE, conf.level=0.99)[c(1:9)])

To get your results in a dataframe, you have to rbind it:

type.convert(as.data.frame(do.call(rbind, result)), as.is=TRUE)
# statistic parameter p.value conf.int estimate null.value stderr alternative method
# A 2.267787 2 0.1514719 -20.25867, 32.25867 6 0 2.645751 two.sided Paired t-test
# B -0.06666667 1 0.9576214 -477.9256, 476.9256 -0.5 0 7.5 two.sided Paired t-test
# C 1.073154 3 0.3618456 -9.996192, 14.496192 2.25 0 2.096624 two.sided Paired t-test

Or, using pipes:

do.call(rbind, result) |> as.data.frame() |> type.convert(as.is=TRUE)


Data

data <- structure(list(Product_type = c("A", "B", "B", "A", "C", "C", 
"C", "A", "C"), Price_Online = c(48L, 29L, 32L, 38L, 32L, 31L,
28L, 47L, 40L), Price_Offline = c(37L, 22L, 40L, 36L, 27L, 35L,
24L, 42L, 36L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9"))

R - Multiple t-tests, multiple groups against different controls

Answering my own question here. I actually figured out that, for my experiments, there wasn't equal variance between the different concentration groups. I ended up splitting them into individual dataframes, then running ANOVA on each separately, getting a list of p-values with Tukey's then recombining the list. Inevitably, there's a multiple comparisons problem I run into with that but it wouldn't have worked individually and I can always apply a separate post-hoc correction after. Code looked a little like the below. I could have used for loops for after the ANOVA steps but I didn't have many concentrations so I just didn't bother with it. It performs ANOVA by all permutations respective to each concentration, but I by selecting [1:4,] during the transform step, it only shows the p-values of the permutations containing Sample A.


##Using melt tutorial from here: https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format
##Using ANOVA tutorial from here: http://www.sthda.com/english/wiki/two-way-anova-test-in-r

setwd("c:/R/STACKOVERFLOW")

library(data.table)
library(ggpubr)


sample <- read.csv("DATA.csv", check.names = FALSE, fileEncoding = 'UTF-8-BOM')

View(sample)

longo <- melt(setDT(sample), id.vars = c("Concentration"), variable.name = "Name")

View(longo)

ggboxplot(longo, x = "Concentration", y = "Output", color = "Name")

ggline(longo, x = "Concentration", y = "Output", color = "Name",
add = c("mean_se", "dotplot"))

##Split these out so ANOVA can be performed on individual concentrations

ConSplit <- split(longo, f = longo$Concentration, drop = TRUE)

list2env(ConSplit,envir=.GlobalEnv)


#16

D16 <- as.data.frame(`16`)

A16<-aov(Output ~ Name, data = D16)

summary(A16)

T16 <- TukeyHSD(A16)

##Whatever, let's just make all of em


#4

D4 <- as.data.frame(`4`)

A4<-aov(Output ~ Name, data = D4)

summary(A4)

T4 <- TukeyHSD(A4)


#2

D2 <- as.data.frame(`2`)

A2<-aov(Output ~ Name, data = D2)

summary(A2)

T2 <- TukeyHSD(A2)


### LET'S OUTPUT THIS IN A USEABLE MANNER

data16 <- as.data.frame(T16[1])
data4 <- as.data.frame(T4[1])
data2 <- as.data.frame(T2[1])

#make dataframes for merge

M16 <- data.frame(Concentration = c(16))
M4 <- data.frame(Concentration = c(4))
M2 <- data.frame(Concentration = c(2))


#merge em

B16 <- transform(merge(data16[1:4,],M16,by=0,all=TRUE), row.names=Row.names, Row.names=NULL)
B4 <- transform(merge(data4[1:4,],M4,by=0,all=TRUE), row.names=Row.names, Row.names=NULL)
B2 <- transform(merge(data2[1:4,],M2,by=0,all=TRUE), row.names=Row.names, Row.names=NULL)

#Stack em

AllTogetherNow <- rbind(B16,B4,B2)

#See em

View(AllTogetherNow)

#Write em

write.csv(AllTogetherNow, "COMBINEDPVALS.csv")


Group by and run multiple t tests in R

Dndata frames can only have certain object classes as column types. A
htest is not one of those.
However, we can store lists as list-columns.
If we adapt the current code to output lists htests as results, we can later extract elements of the tests separately.

library(dplyr)

output <- df %>%
select(-ID) %>%
group_by(Age, Group) %>%
summarize_at(
vars(-group_cols(), -Session),
list(t.test = ~ list(t.test(. ~ Session))))

output

# A tibble: 4 × 15
# Groups: Age [2]
Age Group RHR_t.test HRV_t.test Sleep.Onset_t.test Wake.Onset_t.test Hours.in.Bed_t.test Hours.of.Sleep_t.test Sleep.Disturbance… Latency.min_t.t… Cycles_t.test REM.Sleep.hours…
<chr> <chr> <list> <list> <list> <list> <list> <list> <list> <list> <list> <list>
1 Old Decrease <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest>
2 Old Increase <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest>
3 Young Decrease <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest>
4 Young Increase <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest>

With this output data.frame, we can extract individual tests and values from them as desired:

output$RHR_t.test

[[1]]

Welch Two Sample t-test

data: . by Session
t = -1.8965, df = 188.22, p-value = 0.05942
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
-3.09118590 0.06082897
sample estimates:
mean in group Post mean in group Pre
62.28902 63.80420


[[2]]

Welch Two Sample t-test

data: . by Session
t = -2.6271, df = 226.21, p-value = 0.009199
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
-3.3949577 -0.4848655
sample estimates:
mean in group Post mean in group Pre
57.95946 59.89937


[[3]]

Welch Two Sample t-test

data: . by Session
t = 1.6633, df = 251.75, p-value = 0.0975
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
-0.2074028 2.4611194
sample estimates:
mean in group Post mean in group Pre
60.58255 59.45570


[[4]]

Welch Two Sample t-test

data: . by Session
t = 1.5849, df = 208.4, p-value = 0.1145
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
-0.244287 2.247775
sample estimates:
mean in group Post mean in group Pre
60.23462 59.23288
output$RHR_t.test %>%
map_dbl('p.value')

[1] 0.059424354 0.009199459 0.097497620 0.114502332

We can also convert these lists to user-friendly tibbles with broom::tidy

output %>%
mutate(across(ends_with('t.test'), map, broom::tidy))

# A tibble: 4 × 15
# Groups: Age [2]
Age Group RHR_t.test HRV_t.test Sleep.Onset_t.te… Wake.Onset_t.test Hours.in.Bed_t.t… Hours.of.Sleep_… Sleep.Disturbanc… Latency.min_t.t… Cycles_t.test REM.Sleep.hours…
<chr> <chr> <list> <list> <list> <list> <list> <list> <list> <list> <list> <list>
1 Old Decrease <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
2 Old Increase <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
3 Young Decrease <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
4 Young Increase <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
# … with 3 more variables: Deep.Sleep.hours_t.test <list>, Light.Sleep.hours_t.test <list>, Awake.hours_t.test <list>

To have all tests "statistics", we can do it like this:

tidy_output %>%
mutate(across(ends_with('t.test'), sapply, pull, 'statistic'))

# A tibble: 4 × 15
# Groups: Age [2]
Age Group RHR_t.test HRV_t.test Sleep.Onset_t.test Wake.Onset_t.test Hours.in.Bed_t.test Hours.of.Sleep_t.test Sleep.Disturbance… Latency.min_t.t… Cycles_t.test REM.Sleep.hours…
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Old Decrease -1.90 0.171 0.684 -0.145 -1.01 -1.02 -1.45 3.05 -0.928 -0.906
2 Old Increase -2.63 0.477 -1.66 -1.96 -2.69 -2.69 -1.16 0.0848 -1.76 -1.87
3 Young Decrease 1.66 1.13 0.281 -0.305 0.0509 -0.0320 1.14 -1.34 -0.675 0.672
4 Young Increase 1.58 0.519 0.195 -1.40 -1.48 -1.17 0.397 -0.821 -1.73 0.886
# … with 3 more variables: Deep.Sleep.hours_t.test <dbl>, Light.Sleep.hours_t.test <dbl>, Awake.hours_t.test <dbl>

Perform t-tests by groups

You should separate the Group column into 2 columns, one indicates ID and the other indicates treatment(T) or control(C) groups.

library(dplyr)
library(tidyr)

df2 <- df %>%
separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
mutate(Group = replace_na(Group, "C"))

# > df2
# Cell_line Gene ID Group Values
# 1 A a 1 C 19.00937
# 2 A a 1 C 19.24884
# 3 A a 1 C 17.69836
# 4 A a 1 T 25.38643
# 5 A a 1 T 23.04596
# 6 A a 1 T 24.25100
# ...

Then perform the two sample or paired t-test for each ID:

df2 %>%
group_by(Cell_line, Gene, ID) %>%
group_map(~ t.test(Values ~ Group, .x, paired = TRUE))
Output
[[1]]
Paired t-test

data: Values by Group
t = -6.2599, df = 2, p-value = 0.02458
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-9.407919 -1.743297
sample estimates:
mean difference
-5.575608

[[2]]
Paired t-test

data: Values by Group
t = -8.9412, df = 2, p-value = 0.01228
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-8.261189 -2.893422
sample estimates:
mean difference
-5.577306

[[3]]
Paired t-test

data: Values by Group
t = -1.929, df = 2, p-value = 0.1935
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-11.844963 4.511769
sample estimates:
mean difference
-3.666597


Update

If you want to summarise each group with the p-value of each t-test, try summarise():

df2 %>%
group_by(Cell_line, Gene, ID) %>%
summarise(p.value = t.test(Values ~ Group, paired = TRUE)$p.value) %>%
ungroup()

# # A tibble: 3 × 4
# Cell_line Gene ID p.value
# <chr> <chr> <chr> <dbl>
# 1 A a 1 0.0246
# 2 A a 2 0.0123
# 3 A a 3 0.194


Data
df <- structure(list(Cell_line = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"), Gene = c("a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a"), Group = c("1", "1", "1", "1_T", "1_T", "1_T",
"2", "2", "2", "2_T", "2_T", "2_T", "3", "3", "3", "3_T", "3_T",
"3_T"), Values = c(19.0093682898042, 19.2488407161094, 17.6983554368874,
25.3864281704297, 23.0459637706291, 24.2509958128999, 18.6843799736362,
20.7674389968636, 18.833524600653, 23.2825845151011, 26.1647404821767,
25.5699355732609, 20.820013126065, 20.2674129364223, 21.3344018769664,
22.4175652694876, 22.2066293870532, 28.7974230636024)), row.names = c(NA,
-18L), class = "data.frame")

A compact way to perform multiple pairwise tests (e.g. t-test) with a single variable split in multiple categories in long-format

This is a little trickier and less straightforward than I hoped. You can first figure out the combinations of locations and, to make it a little simpler, save that in a lookup table. I turned that into a long shape with an ID for each pair, which I'll use as a grouping variable on the data.

library(dplyr)
library(tidyr)
library(purrr)

set.seed(111)
# same data creation code

grps <- as.data.frame(t(combn(levels(sample_long$Location), 2))) %>%
mutate(pair = row_number()) %>%
gather(key, value = loc, -pair) %>%
select(-key)

grps
#> pair loc
#> 1 1 Area_A
#> 2 2 Area_A
#> 3 3 Area_B
#> 4 1 Area_B
#> 5 2 Area_C
#> 6 3 Area_C

Joining the lookup to the data frame doubles the rows—that will differ depending on how many levels you're combining. Note also I dropped your ID column since it didn't seem necessary right now. Nest, do the t-test, and tidy the results.

sample_long %>%
select(-id) %>%
inner_join(grps, by = c("Location" = "loc")) %>%
group_by(pair) %>%
nest() %>%
mutate(t_test = map(data, ~t.test(distro ~ Location, data = .)),
tidied = map(t_test, broom::tidy)) %>%
unnest(tidied)
#> # A tibble: 3 x 13
#> pair data t_test estimate estimate1 estimate2 statistic p.value
#> <int> <lis> <list> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 <tib… <htes… -0.921 31.8 32.7 -0.245 0.816
#> 2 2 <tib… <htes… -1.48 31.8 33.3 -0.383 0.716
#> 3 3 <tib… <htes… -0.563 32.7 33.3 -0.305 0.769
#> # … with 5 more variables: parameter <dbl>, conf.low <dbl>,
#> # conf.high <dbl>, method <chr>, alternative <chr>

If you needed to, you could do something to show which locations are in each pair—joining with the lookup table would be one way to do this.

I'm realizing also that you mentioned wanting to use broom functions afterwards, but didn't specify that you need a broom::tidy call. In that case, just drop the last 2 lines.

Multiple paired permutation t-tests using perm.t.test

You can use combn to get all the combinations of data$t value.

combn(levels(data$t), 2, function(x) {
perm.t.test(exparat~t,data = subset(data, t %in% x), nperm=999, paired = T)
}, simplify = FALSE) -> result

result


Related Topics



Leave a reply



Submit