Doing T.Test for Columns for Each Row in Data Set

T test in R for each row

You can try

lapply(data, function(x) t.test(x[5:6], x[7:8]))

Or a modification of the for loop by allocating the results to a list 't1'

t1 <- vector('list', length(data))
for (i in seq_along(data)) {
  var1 <- data[[i]][5:6]   # columns 5 and 6 of the i-th data frame
  var2 <- data[[i]][7:8]   # columns 7 and 8 of the i-th data frame
  t1[[i]] <- t.test(var1, var2)
}

t1

Example data

set.seed(24)
data <- lapply(1:3, function(i) as.data.frame(matrix(sample(0:9,
                 20*10, replace=TRUE), ncol=10)))

One-sample T-test Over Multiple Columns with Multiple mu Values in R

To run a t-test for every combination of column and mu value and simply print all the results, purrr::cross2 gives you a list of all column/mu combinations and purrr::map loops over them:

library(purrr)

t1 <- rnorm(20, 10, 1)
t2 <- rnorm(20, 10, 1)
t3 <- rnorm(20, 10, 1)
test_data <- data.frame(t1, t2, t3)

onett <- function(data) {
  muvals <- c(24, 51.8, 21.89)
  map(cross2(data, muvals), ~ t.test(.x[[1]], mu = .x[[2]]))
}

onett(test_data)
#> Prints t-test results...

Edit #1

From your clarification of the question, it looks like map2 is what you need: it iterates simultaneously over two objects of the same length. To wrap this in a function that you pass the data to, I'd suggest something like the following:

library(purrr)
library(dplyr)
library(tidyr)

t1 <- rnorm(20, 10, 1)
t2 <- rnorm(20, 10, 1)
t3 <- rnorm(20, 10, 1)
test_data <- data.frame(t1, t2, t3)

# (It is usually cleaner to pass `muvals` as a function argument than to rely on the global environment)

onett <- function(data, muvals = c(24, 51.8, 21.89)) {
  map2(data, muvals, function(data, mu) t.test(data, mu = mu))
}

onett(test_data) %>%
  map_dfr(broom::tidy)

#> # A tibble: 3 x 8
#>   estimate statistic  p.value parameter conf.low conf.high method    alternative
#>      <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr>     <chr>
#> 1    10.1      -50.4 1.07e-21        19     9.50      10.7 One Samp~ two.sided
#> 2    10.3     -187.  1.65e-32        19     9.83      10.8 One Samp~ two.sided
#> 3     9.99     -47.8 2.87e-21        19     9.47      10.5 One Samp~ two.sided

The function returns a list of t-test results. You can use broom::tidy to extract the t statistics, p-values etc. (as shown above), or move that step inside the function so that it returns exactly what you need.

Created on 2021-12-04 by the reprex package (v2.0.1)

perform t-test on specific columns for each row in data.frame

I think you're just using the wrong apply family member.

Can you try this and see if it gives you what you're looking for?

apply(WW_Summary[, c(2, 8)], 1,
      function(temp) unlist(t.test(temp, alternative = "two.sided")
                            [c("statistic", "parameter", "p.value", "conf.int")]))

Update

@dickoa is correct: you're probably doing the wrong calculation here. Still, the same concept applies:

library(BSDA)  # provides tsum.test(), a t-test computed from summary statistics

data.frame(cbind(WW_Summary[1],
  t(apply(WW_Summary[, c(2:4, 8:10)], 1, function(temp)
    unlist(
      tsum.test(mean.x = temp[[1]], s.x = temp[[2]], n.x = temp[[3]],
                mean.y = temp[[4]], s.y = temp[[5]], n.y = temp[[6]]))
    [c("statistic.t", "parameters.df", "p.value",
       "conf.int1", "conf.int2")]))))
# Trait statistic.t parameters.df p.value
# 1 Morph PC1 10.7920944667109 1102.17477516966 6.99739270551733e-26
# 2 Morph PC2 -6.40501752763609 1119.8038108643 2.20872274986877e-10
# 3 Morph PC3 -4.8221965806503 1131.93025335657 1.61345381252079e-06
# 4 Morph PC4 5.51685228304417 1116.04949237415 4.28798959831121e-08
# 5 Colour 7.40032254940697 1083.43427031755 2.71950155801888e-13
# 6 Delta15N -17.6468194524627 923.361537684413 2.79180235004071e-60
# 7 Delta13C -3.47262865160519 949.662208494884 0.000538633884372937
# conf.int1 conf.int2
# 1 0.669552939095012 0.967117133675934
# 2 -0.48878427646537 -0.259544540361528
# 3 -0.2825915783163 -0.119136078036999
# 4 0.119393147514491 0.251194052780122
# 5 0.574117940682135 0.988415271573606
# 6 -2.22952047960588 -1.78325235511202
# 7 -0.645846712936065 -0.179451207860735

T-test for multiple rows in R

Something like this?

apply(df,1,function(x){t.test(x[2:21],x[22:41])})

To save the test statistic or p-value in a new column you could do

df$st <- apply(df, 1, function(x) t.test(x[2:21], x[22:41])$statistic)

or replace $statistic with $p.value to store the p-value instead.

trying to perform a t.test for each row and count all rows where p-value is less than 0.05

One option is to loop over the data set and calculate the t-test for each row, although this is not as elegant as an apply-based solution.

set.seed(2112)
DataSample <- matrix(rnorm(24000),nrow=1000)
colnames(DataSample) <- c(paste("Trial",1:12,sep=""),paste("Control",13:24,sep=""))

# initialize vector of stored p-values
pvalue <- rep(0,nrow(DataSample))

for (i in 1:nrow(DataSample)) {
  pvalue[i] <- t.test(DataSample[i, 1:12], DataSample[i, 13:24])$p.value
}
# finding number that are significant
sum(pvalue < 0.05)
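
The explicit loop can also be avoided altogether. As a rough sketch in Python (the language of the last answer on this page), `scipy.stats.ttest_ind` accepts an `axis` argument, so passing the two blocks of columns with `axis=1` tests every row in a single call. The simulated matrix, the seed and the variable names below are illustrative assumptions, and `equal_var=False` is chosen to mirror the Welch test that R's `t.test` runs by default.

```python
import numpy as np
from scipy.stats import ttest_ind

# Simulated data (an assumption for illustration): 1000 rows,
# columns 0-11 hold "trial" values and columns 12-23 hold "control" values.
rng = np.random.default_rng(2112)
data_sample = rng.normal(size=(1000, 24))

trial = data_sample[:, :12]
control = data_sample[:, 12:]

# axis=1 runs one two-sample test per row; equal_var=False requests
# Welch's t-test, which is what R's t.test() does by default.
result = ttest_ind(trial, control, axis=1, equal_var=False)
pvalues = result.pvalue

# Number of rows that are significant at the 5% level
n_significant = int(np.sum(pvalues < 0.05))
print(n_significant)
```

Because the data here are pure noise, roughly 5% of the rows should come out as significant by chance alone.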

Two sample t-test for every individual row in Python

I wrote your output above to two tab-delimited files, read them in below, and added a column to indicate which dataframe (table) each row comes from:

import pandas as pd
from scipy.stats import ttest_ind
t1 = pd.read_csv("../t1.csv",names=['V1','V2','V3'],sep="\t")
t1['data'] = 'data1'
t2 = pd.read_csv("../t2.csv",names=['V1','V2','V3'],sep="\t")
t2['data'] = 'data2'

   V1  V2    V3   data
0  T1  X1  0.93  data1
1  T1  X2  0.30  data1
2  T1  X3 -2.90  data1
3  T2  X1  1.30  data1

Then we concatenate them; calculating the mean is straightforward:

df = pd.concat([t1,t2])
res = df.groupby("V2").apply(lambda x:x['V3'].groupby(x['data']).mean())
data  data1  data2
V2
X1    1.026  1.700
X2    0.180 -0.784
X3    0.340  0.836

The p-value requires a bit more code inside the apply:

res['pvalue'] = df.groupby("V2").apply(lambda x:
    ttest_ind(x[x['data']=="data1"]["V3"], x[x['data']=="data2"]["V3"])[1])
data  data1  data2    pvalue
V2
X1    1.026  1.700  0.316575
X2    0.180 -0.784  0.521615
X3    0.340  0.836  0.657752

You can always call res.reset_index() to turn the index back into a regular column and get a flat table.
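
Since the two input files are not shown, here is a self-contained sketch of the same idea that can be run directly. The numbers are made up for illustration (they are not the data from the question), and the explicit loop over groups is just one possible way to collect the means, test statistics and p-values in a single table.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Made-up measurements (illustrative only): V2 is the feature being
# compared, V3 the measured value, 'data' the table of origin.
t1 = pd.DataFrame({
    "V1": ["T1", "T1", "T2", "T2", "T3", "T3"],
    "V2": ["X1", "X2", "X1", "X2", "X1", "X2"],
    "V3": [0.93, 0.30, 1.30, -0.10, 0.85, 0.34],
})
t1["data"] = "data1"
t2 = pd.DataFrame({
    "V1": ["T1", "T1", "T2", "T2", "T3", "T3"],
    "V2": ["X1", "X2", "X1", "X2", "X1", "X2"],
    "V3": [1.70, -0.78, 2.10, -1.20, 1.30, -0.37],
})
t2["data"] = "data2"
df = pd.concat([t1, t2], ignore_index=True)

rows = []
for name, grp in df.groupby("V2"):
    # Split the values of this feature by the table they came from
    a = grp.loc[grp["data"] == "data1", "V3"]
    b = grp.loc[grp["data"] == "data2", "V3"]
    stat, pvalue = ttest_ind(a, b)  # Student's t-test (equal variances assumed)
    rows.append({"V2": name, "data1": a.mean(), "data2": b.mean(),
                 "statistic": stat, "pvalue": pvalue})

res = pd.DataFrame(rows).set_index("V2")
print(res)
```

Note that `ttest_ind` assumes equal variances by default; pass `equal_var=False` if Welch's test is more appropriate for your data.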


