Perform multiple paired t-tests based on groups/categories
One way to do it is to use by
:
result <- by(data, data$Product_type, function(x)
t.test(x$Price_Online, x$Price_Offline, mu=0, alt="two.sided",
paired=TRUE, conf.level=0.99)[c(1:9)])
To get your results in a dataframe, you have to rbind
it:
type.convert(as.data.frame(do.call(rbind, result)), as.is=TRUE)
# statistic parameter p.value conf.int estimate null.value stderr alternative method
# A 2.267787 2 0.1514719 -20.25867, 32.25867 6 0 2.645751 two.sided Paired t-test
# B -0.06666667 1 0.9576214 -477.9256, 476.9256 -0.5 0 7.5 two.sided Paired t-test
# C 1.073154 3 0.3618456 -9.996192, 14.496192 2.25 0 2.096624 two.sided Paired t-test
Or, using pipes:
do.call(rbind, result) |> as.data.frame() |> type.convert(as.is=TRUE)
Data
data <- structure(list(Product_type = c("A", "B", "B", "A", "C", "C",
"C", "A", "C"), Price_Online = c(48L, 29L, 32L, 38L, 32L, 31L,
28L, 47L, 40L), Price_Offline = c(37L, 22L, 40L, 36L, 27L, 35L,
24L, 42L, 36L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9"))
R - Multiple t-tests, multiple groups against different controls
Answering my own question here. I actually figured out that, for my experiments, there wasn't equal variance between the different concentration groups. I ended up splitting them into individual dataframes, then running ANOVA on each separately, getting a list of p-values with Tukey's then recombining the list. Inevitably, there's a multiple comparisons problem I run into with that but it wouldn't have worked individually and I can always apply a separate post-hoc correction after. Code looked a little like the below. I could have used for loops for after the ANOVA steps but I didn't have many concentrations so I just didn't bother with it. It performs ANOVA by all permutations respective to each concentration, but I by selecting [1:4,] during the transform step, it only shows the p-values of the permutations containing Sample A.
##Using melt tutorial from here: https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format
##Using ANOVA tutorial from here: http://www.sthda.com/english/wiki/two-way-anova-test-in-r
setwd("c:/R/STACKOVERFLOW")
library(data.table)
library(ggpubr)
sample <- read.csv("DATA.csv", check.names = FALSE, fileEncoding = 'UTF-8-BOM')
View(sample)
longo <- melt(setDT(sample), id.vars = c("Concentration"), variable.name = "Name")
View(longo)
ggboxplot(longo, x = "Concentration", y = "Output", color = "Name")
ggline(longo, x = "Concentration", y = "Output", color = "Name",
add = c("mean_se", "dotplot"))
##Split these out so ANOVA can be performed on individual concentrations
ConSplit <- split(longo, f = longo$Concentration, drop = TRUE)
list2env(ConSplit,envir=.GlobalEnv)
#16
D16 <- as.data.frame(`16`)
A16<-aov(Output ~ Name, data = D16)
summary(A16)
T16 <- TukeyHSD(A16)
##Whatever, let's just make all of em
#4
D4 <- as.data.frame(`4`)
A4<-aov(Output ~ Name, data = D4)
summary(A4)
T4 <- TukeyHSD(A4)
#2
D2 <- as.data.frame(`2`)
A2<-aov(Output ~ Name, data = D2)
summary(A2)
T2 <- TukeyHSD(A2)
### LET'S OUTPUT THIS IN A USEABLE MANNER
data16 <- as.data.frame(T16[1])
data4 <- as.data.frame(T4[1])
data2 <- as.data.frame(T2[1])
#make dataframes for merge
M16 <- data.frame(Concentration = c(16))
M4 <- data.frame(Concentration = c(4))
M2 <- data.frame(Concentration = c(2))
#merge em
B16 <- transform(merge(data16[1:4,],M16,by=0,all=TRUE), row.names=Row.names, Row.names=NULL)
B4 <- transform(merge(data4[1:4,],M4,by=0,all=TRUE), row.names=Row.names, Row.names=NULL)
B2 <- transform(merge(data2[1:4,],M2,by=0,all=TRUE), row.names=Row.names, Row.names=NULL)
#Stack em
AllTogetherNow <- rbind(B16,B4,B2)
#See em
View(AllTogetherNow)
#Write em
write.csv(AllTogetherNow, "COMBINEDPVALS.csv")
Group by and run multiple t tests in R
Dndata frames can only have certain object classes as column types. Ahtest
is not one of those.
However, we can store lists as list-columns.
If we adapt the current code to output lists htests as results, we can later extract elements of the tests separately.
library(dplyr)
output <- df %>%
select(-ID) %>%
group_by(Age, Group) %>%
summarize_at(
vars(-group_cols(), -Session),
list(t.test = ~ list(t.test(. ~ Session))))
output
# A tibble: 4 × 15
# Groups: Age [2]
Age Group RHR_t.test HRV_t.test Sleep.Onset_t.test Wake.Onset_t.test Hours.in.Bed_t.test Hours.of.Sleep_t.test Sleep.Disturbance… Latency.min_t.t… Cycles_t.test REM.Sleep.hours…
<chr> <chr> <list> <list> <list> <list> <list> <list> <list> <list> <list> <list>
1 Old Decrease <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest>
2 Old Increase <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest>
3 Young Decrease <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest>
4 Young Increase <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest> <htest>
With this output
data.frame, we can extract individual tests and values from them as desired:
output$RHR_t.test
[[1]]
Welch Two Sample t-test
data: . by Session
t = -1.8965, df = 188.22, p-value = 0.05942
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
-3.09118590 0.06082897
sample estimates:
mean in group Post mean in group Pre
62.28902 63.80420
[[2]]
Welch Two Sample t-test
data: . by Session
t = -2.6271, df = 226.21, p-value = 0.009199
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
-3.3949577 -0.4848655
sample estimates:
mean in group Post mean in group Pre
57.95946 59.89937
[[3]]
Welch Two Sample t-test
data: . by Session
t = 1.6633, df = 251.75, p-value = 0.0975
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
-0.2074028 2.4611194
sample estimates:
mean in group Post mean in group Pre
60.58255 59.45570
[[4]]
Welch Two Sample t-test
data: . by Session
t = 1.5849, df = 208.4, p-value = 0.1145
alternative hypothesis: true difference in means between group Post and group Pre is not equal to 0
95 percent confidence interval:
-0.244287 2.247775
sample estimates:
mean in group Post mean in group Pre
60.23462 59.23288
output$RHR_t.test %>%
map_dbl('p.value')
[1] 0.059424354 0.009199459 0.097497620 0.114502332
We can also convert these lists to user-friendly tibbles with broom::tidy
output %>%
mutate(across(ends_with('t.test'), map, broom::tidy))
# A tibble: 4 × 15
# Groups: Age [2]
Age Group RHR_t.test HRV_t.test Sleep.Onset_t.te… Wake.Onset_t.test Hours.in.Bed_t.t… Hours.of.Sleep_… Sleep.Disturbanc… Latency.min_t.t… Cycles_t.test REM.Sleep.hours…
<chr> <chr> <list> <list> <list> <list> <list> <list> <list> <list> <list> <list>
1 Old Decrease <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
2 Old Increase <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
3 Young Decrease <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
4 Young Increase <tibble [1 × 10]> <tibble [1 … <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 × 10]> <tibble [1 × 10… <tibble [1 ×… <tibble [1 × 10…
# … with 3 more variables: Deep.Sleep.hours_t.test <list>, Light.Sleep.hours_t.test <list>, Awake.hours_t.test <list>
To have all tests "statistics", we can do it like this:
tidy_output %>%
mutate(across(ends_with('t.test'), sapply, pull, 'statistic'))
# A tibble: 4 × 15
# Groups: Age [2]
Age Group RHR_t.test HRV_t.test Sleep.Onset_t.test Wake.Onset_t.test Hours.in.Bed_t.test Hours.of.Sleep_t.test Sleep.Disturbance… Latency.min_t.t… Cycles_t.test REM.Sleep.hours…
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Old Decrease -1.90 0.171 0.684 -0.145 -1.01 -1.02 -1.45 3.05 -0.928 -0.906
2 Old Increase -2.63 0.477 -1.66 -1.96 -2.69 -2.69 -1.16 0.0848 -1.76 -1.87
3 Young Decrease 1.66 1.13 0.281 -0.305 0.0509 -0.0320 1.14 -1.34 -0.675 0.672
4 Young Increase 1.58 0.519 0.195 -1.40 -1.48 -1.17 0.397 -0.821 -1.73 0.886
# … with 3 more variables: Deep.Sleep.hours_t.test <dbl>, Light.Sleep.hours_t.test <dbl>, Awake.hours_t.test <dbl>
Perform t-tests by groups
You should separate the Group
column into 2 columns, one indicates ID
and the other indicates treatment(T) or control(C) groups.
library(dplyr)
library(tidyr)
df2 <- df %>%
separate(Group, c("ID", "Group"), sep = "_", fill = "right") %>%
mutate(Group = replace_na(Group, "C"))
# > df2
# Cell_line Gene ID Group Values
# 1 A a 1 C 19.00937
# 2 A a 1 C 19.24884
# 3 A a 1 C 17.69836
# 4 A a 1 T 25.38643
# 5 A a 1 T 23.04596
# 6 A a 1 T 24.25100
# ...
Then perform the two sample or paired t-test for each ID
:
df2 %>%
group_by(Cell_line, Gene, ID) %>%
group_map(~ t.test(Values ~ Group, .x, paired = TRUE))
Output
[[1]]
Paired t-test
data: Values by Group
t = -6.2599, df = 2, p-value = 0.02458
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-9.407919 -1.743297
sample estimates:
mean difference
-5.575608
[[2]]
Paired t-test
data: Values by Group
t = -8.9412, df = 2, p-value = 0.01228
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-8.261189 -2.893422
sample estimates:
mean difference
-5.577306
[[3]]
Paired t-test
data: Values by Group
t = -1.929, df = 2, p-value = 0.1935
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-11.844963 4.511769
sample estimates:
mean difference
-3.666597
Update
If you want to summarise each group with the p-value of each t-test, try summarise()
:
df2 %>%
group_by(Cell_line, Gene, ID) %>%
summarise(p.value = t.test(Values ~ Group, paired = TRUE)$p.value) %>%
ungroup()
# # A tibble: 3 × 4
# Cell_line Gene ID p.value
# <chr> <chr> <chr> <dbl>
# 1 A a 1 0.0246
# 2 A a 2 0.0123
# 3 A a 3 0.194
Data
df <- structure(list(Cell_line = c("A", "A", "A", "A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"), Gene = c("a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a"), Group = c("1", "1", "1", "1_T", "1_T", "1_T",
"2", "2", "2", "2_T", "2_T", "2_T", "3", "3", "3", "3_T", "3_T",
"3_T"), Values = c(19.0093682898042, 19.2488407161094, 17.6983554368874,
25.3864281704297, 23.0459637706291, 24.2509958128999, 18.6843799736362,
20.7674389968636, 18.833524600653, 23.2825845151011, 26.1647404821767,
25.5699355732609, 20.820013126065, 20.2674129364223, 21.3344018769664,
22.4175652694876, 22.2066293870532, 28.7974230636024)), row.names = c(NA,
-18L), class = "data.frame")
A compact way to perform multiple pairwise tests (e.g. t-test) with a single variable split in multiple categories in long-format
This is a little trickier and less straightforward than I hoped. You can first figure out the combinations of locations and, to make it a little simpler, save that in a lookup table. I turned that into a long shape with an ID for each pair, which I'll use as a grouping variable on the data.
library(dplyr)
library(tidyr)
library(purrr)
set.seed(111)
# same data creation code
grps <- as.data.frame(t(combn(levels(sample_long$Location), 2))) %>%
mutate(pair = row_number()) %>%
gather(key, value = loc, -pair) %>%
select(-key)
grps
#> pair loc
#> 1 1 Area_A
#> 2 2 Area_A
#> 3 3 Area_B
#> 4 1 Area_B
#> 5 2 Area_C
#> 6 3 Area_C
Joining the lookup to the data frame doubles the rows—that will differ depending on how many levels you're combining. Note also I dropped your ID column since it didn't seem necessary right now. Nest, do the t-test, and tidy the results.
sample_long %>%
select(-id) %>%
inner_join(grps, by = c("Location" = "loc")) %>%
group_by(pair) %>%
nest() %>%
mutate(t_test = map(data, ~t.test(distro ~ Location, data = .)),
tidied = map(t_test, broom::tidy)) %>%
unnest(tidied)
#> # A tibble: 3 x 13
#> pair data t_test estimate estimate1 estimate2 statistic p.value
#> <int> <lis> <list> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 <tib… <htes… -0.921 31.8 32.7 -0.245 0.816
#> 2 2 <tib… <htes… -1.48 31.8 33.3 -0.383 0.716
#> 3 3 <tib… <htes… -0.563 32.7 33.3 -0.305 0.769
#> # … with 5 more variables: parameter <dbl>, conf.low <dbl>,
#> # conf.high <dbl>, method <chr>, alternative <chr>
If you needed to, you could do something to show which locations are in each pair—joining with the lookup table would be one way to do this.
I'm realizing also that you mentioned wanting to use broom
functions afterwards, but didn't specify that you need a broom::tidy
call. In that case, just drop the last 2 lines.
Multiple paired permutation t-tests using perm.t.test
You can use combn
to get all the combinations of data$t
value.
combn(levels(data$t), 2, function(x) {
perm.t.test(exparat~t,data = subset(data, t %in% x), nperm=999, paired = T)
}, simplify = FALSE) -> result
result
Related Topics
Split the Title Onto Multiple Lines
How Does Cut with Breaks Work in R
How to Specify a Dynamic Position for the Start of Substring
What You Can Do with a Data.Frame That You Can't with a Data.Table
R Color Palettes for Many Data Classes
How to Suppress the Vertical Gridlines in a Ggplot2 Plot
Ggplot2 Bar Plot, No Space Between Bottom of Geom and X Axis Keep Space Above
How to 'Source()' and Continue After an Error
Apply a Ggplot-Function Per Group with Dplyr and Set Title Per Group
How to Delete Rows from a Data.Frame, Based on an External List, Using R
Insert a Blank Row After Each Group of Data
Converting a Character String into a Date in R
R - Markdown Avoiding Package Loading Messages
Simple Way to Subset Spatialpolygonsdataframe (I.E. Delete Polygons) by Attribute in R
How Does One Stop Using Rowwise in Dplyr