T test in R for each row
You can try
lapply(data, function(x) t.test(x[5:6], x[7:8]))
Or modify the for loop so that the results are stored in a preallocated list, 't1':
t1 <- vector('list', length(data))
for (i in seq_along(data)) {
  var1 <- data[[i]][5:6]
  var2 <- data[[i]][7:8]
  t1[[i]] <- t.test(var1, var2)
}
t1
Data:
set.seed(24)
data <- lapply(1:3, function(i) as.data.frame(matrix(sample(0:9,
20*10, replace=TRUE), ncol=10)))
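If only the p-values are needed, they can be pulled out of the list of test objects afterwards. The following is a minimal sketch on the same simulated data; the name `pvals` and the use of unlist() (so that t.test receives plain numeric vectors) are illustrative choices, not part of the answer above.

```r
# Simulated data: a list of three 20 x 10 data frames, as above
set.seed(24)
data <- lapply(1:3, function(i) as.data.frame(matrix(sample(0:9,
    20 * 10, replace = TRUE), ncol = 10)))

# One two-sample t-test per data frame: columns 5-6 versus columns 7-8
t1 <- lapply(data, function(x) t.test(unlist(x[5:6]), unlist(x[7:8])))

# Extract the p-value from each test object into a numeric vector
pvals <- vapply(t1, function(tt) tt$p.value, numeric(1))
pvals
```

vapply() is used instead of sapply() so that the result is guaranteed to be a numeric vector with one entry per data frame.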
One-sample T-test Over Multiple Columns with Multiple mu Values in R
To iterate over every combination of column and mu value and simply print the results of all the t-tests, purrr::cross2 gives you a list of all column/mu combinations, and purrr::map loops over the tests:
library(purrr)
t1 <- rnorm(20, 10, 1)
t2 <- rnorm(20, 10, 1)
t3 <- rnorm(20, 10, 1)
test_data <- data.frame(t1, t2, t3)
onett <- function(data) {
  muvals <- c(24, 51.8, 21.89)
  map(cross2(data, muvals), ~ t.test(.x[[1]], mu = .x[[2]]))
}
onett(test_data)
#> Prints t-test results...
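Note that purrr 1.0.0 deprecated cross2() in favour of tidyr::expand_grid(). As a sketch of the same all-combinations idea without that function, the version below uses base R's expand.grid() and Map(); the names `combos` and `results` are illustrative and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
test_data <- data.frame(t1 = rnorm(20, 10, 1),
                        t2 = rnorm(20, 10, 1),
                        t3 = rnorm(20, 10, 1))
muvals <- c(24, 51.8, 21.89)

# Every column name paired with every mu value: 3 x 3 = 9 rows
combos <- expand.grid(column = names(test_data), mu = muvals,
                      stringsAsFactors = FALSE)

# One one-sample t-test per (column, mu) pair
results <- Map(function(column, mu) t.test(test_data[[column]], mu = mu),
               combos$column, combos$mu)
names(results) <- paste0(combos$column, "_mu", combos$mu)

length(results)
```

Keeping the combinations in a data frame makes it easy to see which test belongs to which column and mu value.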
Edit #1
From your clarification of the question, it looks like map2 is what you need: it iterates simultaneously over two objects of the same length. To wrap this in a function that you pass the data to, I'd suggest something like the following:
library(purrr)
library(dplyr)
library(tidyr)
t1 <- rnorm(20, 10, 1)
t2 <- rnorm(20, 10, 1)
t3 <- rnorm(20, 10, 1)
test_data <- data.frame(t1, t2, t3)
# (It usually works best to pass `muvals` as a function argument rather than rely on a variable in the environment)
onett <- function(data, muvals = c(24, 51.8, 21.89)) {
  map2(data, muvals, function(data, mu) t.test(data, mu = mu))
}
onett(test_data) %>%
  map_dfr(broom::tidy)
#> # A tibble: 3 x 8
#>   estimate statistic  p.value parameter conf.low conf.high method    alternative
#>      <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr>     <chr>
#> 1    10.1      -50.4 1.07e-21        19     9.50      10.7 One Samp~ two.sided
#> 2    10.3     -187.  1.65e-32        19     9.83      10.8 One Samp~ two.sided
#> 3     9.99     -47.8 2.87e-21        19     9.47      10.5 One Samp~ two.sided
The function returns the list of t-test results. You can use broom::tidy to extract all the t statistics, p-values, etc. (shown above), or move that tidying step into the function so that it returns exactly what you need.
Created on 2021-12-04 by the reprex package (v2.0.1)
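As a rough illustration of tidying inside the function, here is a base R sketch that returns one row per column without needing broom. The function name `onett_table` and its output columns are hypothetical, chosen only for this example, and the seed is arbitrary.

```r
# Hypothetical helper: pairs column i with muvals[i] and returns a summary table
onett_table <- function(data, muvals) {
  stopifnot(length(muvals) == ncol(data))
  tests <- Map(function(x, mu) t.test(x, mu = mu), data, muvals)
  data.frame(
    column    = names(data),
    mu        = muvals,
    statistic = vapply(tests, function(tt) tt$statistic, numeric(1),
                       USE.NAMES = FALSE),
    p.value   = vapply(tests, function(tt) tt$p.value, numeric(1),
                       USE.NAMES = FALSE)
  )
}

set.seed(1)  # arbitrary seed for reproducibility
test_data <- data.frame(t1 = rnorm(20, 10, 1),
                        t2 = rnorm(20, 10, 1),
                        t3 = rnorm(20, 10, 1))
res <- onett_table(test_data, muvals = c(24, 51.8, 21.89))
res
```

The length check guards against silent recycling when the number of mu values does not match the number of columns.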
perform t-test on specific columns for each row in data.frame
I think you're just using the wrong member of the apply family.
Can you try this and see if it gives you what you're looking for?
apply(WW_Summary[, c(2, 8)], 1,
      function(temp) unlist(t.test(temp, alternative = c("two.sided"))
                            [c("statistic", "parameter", "p.value", "conf.int")]))
Update
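@dickoa is correct: you're probably doing the wrong calculation here, because the columns hold summary statistics rather than raw observations. Still, the same concept applies, this time with tsum.test (from the BSDA package), which runs the t-test from means, standard deviations and sample sizes: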
data.frame(cbind(WW_Summary[1],
  t(apply(WW_Summary[, c(2:4, 8:10)], 1, function(temp)
    unlist(
      tsum.test(mean.x = temp[[1]], s.x = temp[[2]], n.x = temp[[3]],
                mean.y = temp[[4]], s.y = temp[[5]], n.y = temp[[6]]))
    [c("statistic.t", "parameters.df", "p.value",
       "conf.int1", "conf.int2")]))))
#       Trait       statistic.t    parameters.df              p.value
# 1 Morph PC1  10.7920944667109 1102.17477516966 6.99739270551733e-26
# 2 Morph PC2 -6.40501752763609  1119.8038108643 2.20872274986877e-10
# 3 Morph PC3  -4.8221965806503 1131.93025335657 1.61345381252079e-06
# 4 Morph PC4  5.51685228304417 1116.04949237415 4.28798959831121e-08
# 5    Colour  7.40032254940697 1083.43427031755 2.71950155801888e-13
# 6  Delta15N -17.6468194524627 923.361537684413 2.79180235004071e-60
# 7  Delta13C -3.47262865160519 949.662208494884 0.000538633884372937
#            conf.int1          conf.int2
# 1  0.669552939095012  0.967117133675934
# 2  -0.48878427646537 -0.259544540361528
# 3   -0.2825915783163 -0.119136078036999
# 4  0.119393147514491  0.251194052780122
# 5  0.574117940682135  0.988415271573606
# 6  -2.22952047960588  -1.78325235511202
# 7 -0.645846712936065 -0.179451207860735
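The non-integer degrees of freedom in this output are what a Welch (unequal-variance) test produces. To make the calculation from summary statistics concrete, here is a base R sketch of the Welch formulas, checked against t.test on simulated raw data. The function name `welch_from_summary` and the simulated values are assumptions for illustration; this is not the BSDA implementation.

```r
# Hypothetical helper: Welch two-sample t-test from means, SDs and sample sizes
welch_from_summary <- function(mean_x, sd_x, n_x, mean_y, sd_y, n_y) {
  vx <- sd_x^2 / n_x                      # squared standard error of mean x
  vy <- sd_y^2 / n_y                      # squared standard error of mean y
  t_stat <- (mean_x - mean_y) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (n_x - 1) + vy^2 / (n_y - 1))
  p <- 2 * pt(-abs(t_stat), df)           # two-sided p-value
  c(statistic = t_stat, df = df, p.value = p)
}

# Check against t.test() on raw data (simulated, arbitrary seed)
set.seed(42)
x <- rnorm(30, mean = 1, sd = 2)
y <- rnorm(40, mean = 0, sd = 1)
from_summary <- welch_from_summary(mean(x), sd(x), length(x),
                                   mean(y), sd(y), length(y))
from_raw <- t.test(x, y)  # Welch test is the default (var.equal = FALSE)

all.equal(unname(from_summary["statistic"]), unname(from_raw$statistic))
all.equal(unname(from_summary["p.value"]), from_raw$p.value)
```

Because the function is vectorised, it can also be called once with whole columns of means, standard deviations and sample sizes instead of row by row.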
T-test for multiple rows in R
Something like this?
apply(df,1,function(x){t.test(x[2:21],x[22:41])})
To save the test statistic or p-value in a new column, you could do
df$st <- apply(df, 1, function(x) t.test(x[2:21], x[22:41])$statistic)
or use $p.value in place of $statistic to store the p-value.
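One thing to watch with this pattern: apply() converts a data frame to a matrix first, so a single character column (such as an ID in column 1) turns every value into text. A hedged sketch of one way around that is to drop the non-numeric column before calling apply(); the data frame below is simulated and its column layout (one ID column followed by 20 + 20 measurements) is an assumption.

```r
set.seed(7)  # arbitrary seed for reproducibility
# Assumed layout: an ID column followed by 40 numeric measurements per row
df <- data.frame(id = paste0("gene", 1:5),
                 matrix(rnorm(5 * 40), nrow = 5))

# Work on the numeric part only, so the values stay numeric
num <- as.matrix(df[, -1])

# Columns 1:20 of `num` correspond to columns 2:21 of `df`, and so on
df$statistic <- apply(num, 1, function(x) t.test(x[1:20], x[21:40])$statistic)
df$p.value   <- apply(num, 1, function(x) t.test(x[1:20], x[21:40])$p.value)

df[, c("id", "statistic", "p.value")]
```

If every column of the data frame is already numeric, this extra step is unnecessary.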
trying to perform a t.test for each row and count all rows where p-value is less than 0.05
One option is to loop over the data set and calculate the t-test for each row, although it is not as elegant as an apply-based solution.
set.seed(2112)
DataSample <- matrix(rnorm(24000),nrow=1000)
colnames(DataSample) <- c(paste("Trial",1:12,sep=""),paste("Control",13:24,sep=""))
# initialize vector of stored p-values
pvalue <- rep(0,nrow(DataSample))
for (i in seq_len(nrow(DataSample))) {
  pvalue[i] <- t.test(DataSample[i, 1:12], DataSample[i, 13:24])$p.value
}
# finding number that are significant
sum(pvalue < 0.05)
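For comparison, the same count can be obtained with apply() instead of an explicit loop. This is a minimal sketch on the same simulated matrix; the helper name `row_pvalue` is made up for the example.

```r
set.seed(2112)
DataSample <- matrix(rnorm(24000), nrow = 1000)

# Hypothetical helper: p-value of a two-sample t-test within one row
row_pvalue <- function(row) t.test(row[1:12], row[13:24])$p.value

# Apply the helper to every row (MARGIN = 1) and count the significant ones
pvalue <- apply(DataSample, 1, row_pvalue)
n_significant <- sum(pvalue < 0.05)
n_significant
```

Since the data here are pure noise, roughly 5% of the rows are expected to fall below 0.05 by chance alone.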
Two sample t-test for every individual row in Python
I wrote your output above to two tab-delimited files, read them in below, and added a column to indicate which data frame (table) each row comes from:
import pandas as pd
from scipy.stats import ttest_ind
t1 = pd.read_csv("../t1.csv",names=['V1','V2','V3'],sep="\t")
t1['data'] = 'data1'
t2 = pd.read_csv("../t2.csv",names=['V1','V2','V3'],sep="\t")
t2['data'] = 'data2'
   V1  V2    V3   data
0  T1  X1  0.93  data1
1  T1  X2  0.30  data1
2  T1  X3 -2.90  data1
3  T2  X1  1.30  data1
Then we concatenate them, after which calculating the means is straightforward:
df = pd.concat([t1,t2])
res = df.groupby("V2").apply(lambda x:x['V3'].groupby(x['data']).mean())
data  data1  data2
V2
X1    1.026  1.700
X2    0.180 -0.784
X3    0.340  0.836
The p-value requires a bit more code inside the apply:
res['pvalue'] = df.groupby("V2").apply(lambda x:
    ttest_ind(x[x['data']=="data1"]["V3"], x[x['data']=="data2"]["V3"])[1])
data  data1  data2    pvalue
V2
X1    1.026  1.700  0.316575
X2    0.180 -0.784  0.521615
X3    0.340  0.836  0.657752
You can always call res.reset_index() to get a flat table.