How to Perform Single Factor Anova in R with Samples Organized by Column

How to perform single factor ANOVA in R with samples organized by column?

You stack them in the long format:

mdat <- stack(mydata)
mdat
values ind
1 1 a
2 3 a
3 4 a
4 6 a
5 8 a
6 3 b
7 6 b
snipped output

> aov( values ~ ind, mdat)
Call:
aov(formula = values ~ ind, data = mdat)

Terms:
ind Residuals
Sum of Squares 18.2 65.6
Deg. of Freedom 3 16

Residual standard error: 2.024846
Estimated effects may be unbalanced

Given the warning it might be safer to use lm:

> anova(lm(values ~ ind, mdat))
Analysis of Variance Table

Response: values
Df Sum Sq Mean Sq F value Pr(>F)
ind 3 18.2 6.0667 1.4797 0.2578
Residuals 16 65.6 4.1000
> summary(lm(values~ind, mdat))

Call:
lm(formula = values ~ ind, data = mdat)

Residuals:
Min 1Q Median 3Q Max
-3.40 -1.25 0.00 0.90 3.60

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.4000 0.9055 4.859 0.000174 ***
indb 0.8000 1.2806 0.625 0.540978
indc -1.2000 1.2806 -0.937 0.362666
indd -1.6000 1.2806 -1.249 0.229491
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.025 on 16 degrees of freedom
Multiple R-squared: 0.2172, Adjusted R-squared: 0.07041
F-statistic: 1.48 on 3 and 16 DF, p-value: 0.2578

And please don't ask me why Excel gives a different answer. Excel has generally been shown to be highly unreliable when it comes to statistics. The onus is on Excel to explain why it doesn't give answers comparable to R.

Edit in response to comments: The Excel Data Analysis Pack ANOVA procedure creates an output but it does not use an Excel function for that process, so when you change the data in the data cells from which it was derived, and then hit F9, or the equivalent menu recalculation command, there will be no change in the output section. This and other sources of user and numerical problems are documented in various pages of David Heiser's efforts at assessing Excel's problems with statistical calculations: http://www.daheiser.info/excel/frontpage.html Heiser started out his efforts which are now at least a decade-long, with the expectation that Microsoft would take responsibility for these errors, but they have consistently ignored his and others' efforts at identifying errors and suggesting better procedures. There was also a 6 section Special Report in the June 2008 issue of "Computational Statistics & Data Analysis" edited by BD McCullough that cover various statistical concerns with Excel.

How to perform single factor ANOVA in R with samples organized by row?

Taking the advice found here, we use stack to produce a dataframe with one value and one indicator variable (ind), then perform the aov:

R1 = c(-5.11,-4.77,-6.64,-6.64,-3.06,3.26,2.34)
R2 = c(-6.64,-6.64,-4.01,0.66,2.7,3.23,3.28)
R3 = c(-2.44,2.49,3.07,-3.65,-6.64,1.64,3.25)

mydata = data.frame(cbind(R1,R2,R3))

mat <- t(mydata)
rownames(mat) <- NULL
colnames(mat) <- letters[seq_len(ncol(mat))]
df <- stack(as.data.frame(mat))

> head(df)
values ind
1 -5.11 a
2 -6.64 a
3 -2.44 a
4 -4.77 b
5 -6.64 b
6 2.49 b

aov(values ~ ind, data = df)

Call:
aov(formula = values ~ ind, data = df)

Terms:
ind Residuals
Sum of Squares 164.5202 179.6335
Deg. of Freedom 6 14

Residual standard error: 3.582033
Estimated effects may be unbalanced

If we need more information, we can also use anova(lm(...)):

test <- anova(lm(values ~ ind, data = df))
summary(test)

Df Sum Sq Mean Sq F value Pr(>F)
Min. : 6 Min. :164.5 Min. :12.83 Min. :2.137 Min. :0.1134
1st Qu.: 8 1st Qu.:168.3 1st Qu.:16.48 1st Qu.:2.137 1st Qu.:0.1134
Median :10 Median :172.1 Median :20.13 Median :2.137 Median :0.1134
Mean :10 Mean :172.1 Mean :20.13 Mean :2.137 Mean :0.1134
3rd Qu.:12 3rd Qu.:175.9 3rd Qu.:23.77 3rd Qu.:2.137 3rd Qu.:0.1134
Max. :14 Max. :179.6 Max. :27.42 Max. :2.137 Max. :0.1134
NA's :1 NA's :1

Edit: The ANOVA won't give you single p-values, but lm will:

summary(lm(values ~ ind, data = df))

Call:
lm(formula = values ~ ind, data = df)

Residuals:
Min 1Q Median 3Q Max
-4.307 -1.797 -0.440 0.550 5.597

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.730 2.068 -2.287 0.0383 *
indb 1.757 2.925 0.601 0.5577
indc 2.203 2.925 0.753 0.4637
indd 1.520 2.925 0.520 0.6114
inde 2.397 2.925 0.819 0.4263
indf 7.440 2.925 2.544 0.0234 *
indg 7.687 2.925 2.628 0.0199 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.582 on 14 degrees of freedom
Multiple R-squared: 0.478, Adjusted R-squared: 0.2543
F-statistic: 2.137 on 6 and 14 DF, p-value: 0.1134

How to do an one way anova with all columns of data fame?

You need first to reshape the data.frame from wide to long format.

The following uses an external package, reshape2, to reshape the data from wide to long format.

molten <- reshape2::melt(D)
head(molten)

model <- lm(value ~ variable, data = molten)
anova(model)

How to perform nested anova by groups using R

If I understood correctly, this example might help you

Libraries

library(tidyverse)
library(broom)

Example

Code

mtcars %>% 
# Create 2 factors variables
mutate(
gear = as.factor(gear),
cyl = as.factor(cyl)
) %>%
# Group and nest data by each level of gear
group_nest(gear) %>%
mutate(
# Apply ANOVA por each level of gear, for drat~cyl
anova = map(.x = data,.f = ~anova(aov(drat~cyl,data = .x))),
# Get results of ANOVA
results = map(anova,tidy)
) %>%
# "Show" the results for each gear level
unnest(results)

Output

# A tibble: 6 x 9
gear data anova term df sumsq meansq statistic p.value
<fct> <list<tibble[,10]>> <list> <chr> <int> <dbl> <dbl> <dbl> <dbl>
1 3 [15 x 10] <anova [2 x 5]> cyl 2 0.414 0.207 3.91 0.0491
2 3 [15 x 10] <anova [2 x 5]> Residuals 12 0.634 0.0529 NA NA
3 4 [12 x 10] <anova [2 x 5]> cyl 1 0.107 0.107 1.10 0.318
4 4 [12 x 10] <anova [2 x 5]> Residuals 10 0.967 0.0967 NA NA
5 5 [5 x 10] <anova [2 x 5]> cyl 2 0.158 0.0790 0.352 0.740
6 5 [5 x 10] <anova [2 x 5]> Residuals 2 0.449 0.225 NA NA

How to perform a two-way mixed ANOVA with three groups, three time points and one dependent variable in multiple columns in R

Welcome to SO! It's looks like you're new, but in the future to get great answers quickly, make sure you include sample data in a format that useable (i.e., the output from dput(head(myData))). Check it out: making R reproducible questions.

I know of two approaches to completing within and between ANOVA. The easier to implement version is from the package ez. The package jmv offers a much more complex write-up but you have an immense amount of control, as well. I'm sure there is more to ez's version, but I haven't worked with that package very much.

I created data to somewhat simulate what you are working with.

library(tidyverse)
library(jmv)
library(ez)

set.seed(35)
df1 <- data.frame(subject = factor(rep(1:15, each = 3)),
time = factor(rep(c(1:3), 15)),
group = factor(rep(1:3, each = 15)),
stim_IVis3 = rnorm(45, .5, .15),
stim_IVis4 = rnorm(45, .51, .125))

head(df1)
summary(df1)

To use ez, it's a pretty straightforward implementation. Although, I couldn't find an option that allowed for multiple dependent variables. Essentially, you would either need to pivot_long or you can use jmv.
For this method, you don't get the post hoc comparisons; the effect size is the generalized η2.

ezANOVA(df1, dv = stim_IVis3, wid = subject, within = time, 
between = group, detailed = T)

# $ANOVA
# Effect DFn DFd SSn SSd F p p<.05 ges
# 1 (Intercept) 1 12 12.58700023 0.3872673 390.0252306 1.616255e-10 * 0.92579702
# 2 group 2 12 0.05853169 0.3872673 0.9068417 4.297644e-01 0.05483656
# 3 time 2 24 0.05417372 0.6215855 1.0458491 3.668654e-01 0.05096178
# 4 group:time 4 24 0.06774352 0.6215855 0.6539102 6.298074e-01 0.06292379
#
# $`Mauchly's Test for Sphericity`
# Effect W p p<.05
# 3 time 0.9914977 0.9541232
# 4 group:time 0.9914977 0.9541232
#
# $`Sphericity Corrections`
# Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05
# 3 time 0.9915694 0.3664558 1.187039 0.3668654
# 4 group:time 0.9915694 0.6286444 1.187039 0.6298074

Now to use jmv, you will need to pivot the data wider. Your within-subject data needs to be in separate columns. Since time is the factor that represents with the within-subject values in the stim... columns, that's what you need to pivot.

df2 <- df1 %>% 
pivot_wider(id_cols = c(subject, group),
names_from = time,
values_from = starts_with("stim"),
names_vary = "fastest")

head(df2)
# # A tibble: 6 × 8
# subject group stim_IVis3_1 stim_IVis3_2 stim_IVis3_3 stim_IVis4_1 stim_IVis4_2 stim_IVis4_3
# <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 0.497 0.505 0.601 0.417 0.776 0.660
# 2 2 1 0.584 0.408 0.737 0.454 0.670 0.650
# 3 3 1 0.741 0.399 0.450 0.306 0.665 0.577
# 4 4 1 0.223 0.450 0.350 0.522 0.342 0.417
# 5 5 1 0.717 0.661 0.581 0.601 0.700 0.686
# 6 6 2 0.534 0.780 0.443 0.420 0.565 0.739

Now you're ready for jmv. bs equates to between-subjects; rm equates to repeated measures.

fit = anovaRM(data = df2, 
ss = "3", # type of SS (1, 2, or 3)
bs = list("group"), # between subjects (links with other parameters this way)
bsTerms = list("group"),
rm = list(list(label = "stim_IVs", # within subjects
levels = c(names(df2)[3:8]))), # could also write c("stim_IVs3_1", "stim_IVs3_2", "stim_IVs3_3")
rmCells = list(list(measure = names(df2)[3], # could also write "stim_IVs3_1"
cell = names(df2)[3]), # the group associated in 'rm' level (I used the column name)
list(measure = names(df2)[4],
cell = names(df2)[4]),
list(measure = names(df2)[5],
cell = names(df2)[5]),
list(measure = names(df2)[6],
cell = names(df2)[6]),
list(measure = names(df2)[7],
cell = names(df2)[7]),
list(measure = names(df2)[8],
cell = names(df2)[8])
),
rmTerms = list("stim_IVs"), # groups variable for repeated/within measures
emMeans = list(list("stim_IVs", "group")), # all grouping vars in the ANOVA (for the em tables)
emmPlots = T, # show em plots
emmTables = T, # show em tables
effectSize = c("omega", "partEta"), # multiple options here, see help
spherTests = T, # use correction
spherCorr = c("none", "GG", "HF"), # multiple options here, see help
leveneTest = T, # check homogeneity (p GREATER than .05 is good)
qq = T, # plot normality validation qq plot
postHoc = list("group",c("group","stim_IVs"),"stim_IVs"),
postHocCorr = "tukey") # use TukeyHSD

If you print fit, you're going to get a ton of information. In addition to the content provided by ezANOVA, this provides the post hoc comparisons of each group, time, dependent variable, and the intermix of each; the results of the statistical assumptions: Levene's for homogeneity and a QQ plot of the residuals for normality; and the estimated marginal means table and a plot of those means.

I realize that you have more fields in your data than what I have here. I believe I've provided enough for you to build off of.

I suggest that you determine what you want to learn from the data and choose the algorithm from there.

If you have any questions, let me know.



Related Topics



Leave a reply



Submit