Get the mean across list of dataframes by rows
A simple way would be to cbind the list and calculate the mean of each row with rowMeans:
rowMeans(do.call(cbind, myLs))
#[1] 5 2 1
We can also use bind_cols from dplyr to combine all the dataframes:
rowMeans(dplyr::bind_cols(myLs))
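As a small self-contained illustration of the cbind approach, here is a sketch with an invented myLs. The original list is not shown, so the two single-column data frames below are assumptions chosen to reproduce the 5 2 1 result.

```r
# Hypothetical input: two data frames with one numeric column each
myLs <- list(
  data.frame(a = c(4, 1, 0)),
  data.frame(a = c(6, 3, 2))
)

# Bind the data frames side by side, then average across each row
combined  <- do.call(cbind, myLs)
row_means <- rowMeans(combined)

print(unname(row_means))
# [1] 5 2 1
```

Note that rowMeans needs every column to be numeric, so any id or label columns would have to be dropped before binding.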
Calculate mean of each row in a large list of dataframes in R
We may bind the list elements into a single data frame and then compute a grouped mean:
library(dplyr)
bind_rows(lst1) %>%
  group_by(id) %>%
  summarise(value_mean = mean(value, na.rm = TRUE), .groups = 'drop')
-output
# A tibble: 3 x 2
id value_mean
<chr> <dbl>
1 id1 0.25
2 id2 0.25
3 id3 0.5
If the datasets have the same dimensions and the 'id' values are in the same order, extract the 'value' column, use Reduce to do an elementwise + and divide by the length of the list:
Reduce(`+`, lapply(lst1, `[[`, "value"))/length(lst1)
[1] 0.25 0.25 0.50
Or a more efficient approach is with dapply/t_list from collapse:
library(collapse)
dapply(t_list(dapply(lst1, `[[`, "value")), fmean)
V1 V2 V3
0.25 0.25 0.50
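The Reduce shortcut only holds when every data frame lists the ids in the same order, so it may be worth checking that first. The sketch below uses base R only and an invented lst1 (the real one is not shown), and cross-checks the elementwise result against a grouped mean from aggregate.

```r
# Hypothetical input: same ids, same order, in both data frames
lst1 <- list(
  data.frame(id = c("id1", "id2", "id3"), value = c(0, 0.5, 1)),
  data.frame(id = c("id1", "id2", "id3"), value = c(0.5, 0, 0))
)

# Confirm that every data frame lists the ids in the same order
same_order <- all(vapply(
  lst1,
  function(d) identical(d$id, lst1[[1]]$id),
  logical(1)
))

# Elementwise sum of the value columns, divided by the number of data frames
elementwise <- Reduce(`+`, lapply(lst1, `[[`, "value")) / length(lst1)

# Order-independent cross-check: stack the rows and average per id
grouped <- aggregate(value ~ id, data = do.call(rbind, lst1), FUN = mean)

print(elementwise)
# [1] 0.25 0.25 0.50
```

If same_order is FALSE, the grouped version is the safer choice because it matches rows by id rather than by position.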
Return a dataframe of averages from a list of dataframes
After the hint from @tom above, the final solution was to combine the list of data frames into a single data frame holding all the data and process it with the tidyverse.
There were a few little tidy ups needed.
- An errant character column from the origin of the data
- A column with data in both upper and lower case
- Avoiding the character columns in the mean calculation
- Then putting the character columns and the mean data frame back together to get it back in the correct order.
So...
Change the format to a single data frame and fix the non-numeric column:
library(dplyr)
library(stringr)
myfiles3 <- myfiles2 %>%
  bind_rows() %>%
  transform(EdgeStepL2 = as.numeric(EdgeStepL2))
Ensure the section names are in uppercase to be consistent:
myfiles3$Section <- str_to_upper(myfiles3$Section)
Calculate the mean of each cell grouped by common values (funs() is deprecated in current dplyr, so across() is used here):
myfiles4 <- myfiles3 %>%
  group_by(Section, Chainage) %>%
  summarise(across(East:Surf.Det, ~ mean(.x, na.rm = TRUE)), .groups = "drop")
Take the first two columns from the first data frame and join the means back onto them to restore the original order:
myfiles5 <- data.frame(myfiles2[[1]][1:2])
myfiles6 <- left_join(myfiles5, myfiles4)
This is not the simple solution I had hoped for, but some advice for the next person who tries this:
Look for NAs (everywhere in the data).
Make sure that all the columns you run the mean (or other function) on are ones you can actually calculate with.
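A quick pre-flight check along those lines might look like the following base R sketch. The data frame is invented for illustration; only the column names are borrowed from the steps above.

```r
# Hypothetical combined data: one numeric column stored as text, one NA
df <- data.frame(
  Section    = c("a", "A", "b"),
  Chainage   = c(0, 0, 10),
  East       = c(1, NA, 3),
  EdgeStepL2 = c("0.1", "0.3", "x"),
  stringsAsFactors = FALSE
)

# Which columns can a mean be calculated on?
is_num <- vapply(df, is.numeric, logical(1))

# How many NAs does each column already contain?
na_counts <- colSums(is.na(df))

# How many extra NAs would a numeric conversion introduce?
converted <- suppressWarnings(as.numeric(df$EdgeStepL2))
new_nas   <- sum(is.na(converted)) - sum(is.na(df$EdgeStepL2))

print(names(df)[!is_num])  # columns to fix or exclude before averaging
print(na_counts)
print(new_nas)
```

Here new_nas is 1, which flags the stray "x" value that as.numeric cannot parse.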
Means from a list of data frames in R
You can use lapply and pass indices as follows:
ids <- seq(3, 54, by = 3)
out <- do.call(rbind, lapply(ids, function(idx) {
  t <- unlist(x[[idx]][, -1])
  c(mean(t), var(t))
}))
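To see what this returns, here is a scaled-down sketch. The list x is invented (six small data frames instead of 54), and the named result columns are an addition for readability.

```r
# Hypothetical input: six data frames with an id column plus two numeric columns
x <- lapply(1:6, function(i) {
  data.frame(id = 1:2, a = c(i, i + 1), b = c(i + 2, i + 3))
})

# Every third element of the list
ids <- seq(3, 6, by = 3)

out <- do.call(rbind, lapply(ids, function(idx) {
  # Drop the first column, then pool the remaining values into one vector
  vals <- unlist(x[[idx]][, -1])
  c(mean = mean(vals), var = var(vals))
}))

print(out)  # one row per selected data frame, columns "mean" and "var"
```

Each row of out summarises one selected data frame, so out has length(ids) rows.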
How do I make a list of data frames?
This isn't related to your question, but you want to use = and not <- within the function call. If you use <-, you'll end up creating variables y1 and y2 in whatever environment you're working in:
d1 <- data.frame(y1 <- c(1, 2, 3), y2 <- c(4, 5, 6))
y1
# [1] 1 2 3
y2
# [1] 4 5 6
This won't have the seemingly desired effect of creating column names in the data frame:
d1
# y1....c.1..2..3. y2....c.4..5..6.
# 1 1 4
# 2 2 5
# 3 3 6
The = operator, on the other hand, will associate your vectors with arguments to data.frame.
As for your question, making a list of data frames is easy:
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))
my.list <- list(d1, d2)
You access the data frames just like you would access any other list element:
my.list[[1]]
# y1 y2
# 1 1 4
# 2 2 5
# 3 3 6
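Giving the list elements names can make them easier to access later. The sketch below is one way to do that. The names first and second are made up for illustration.

```r
d1 <- data.frame(y1 = c(1, 2, 3), y2 = c(4, 5, 6))
d2 <- data.frame(y1 = c(3, 2, 1), y2 = c(6, 5, 4))

# Name the elements when building the list
named.list <- list(first = d1, second = d2)

# Access by name; equivalent to named.list[["second"]][["y1"]]
named.list$second$y1

# Apply a function to every data frame in the list
row.counts <- vapply(named.list, nrow, integer(1))
y1.means   <- sapply(named.list, function(d) mean(d$y1))

print(row.counts)
print(y1.means)
```

Both results carry the element names, which helps keep track of which value came from which data frame.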
Calculate mean for each row across a list of dataframes in R
Using base functions, you could extract all the value columns into a matrix and use row means:
rowMeans(sapply(list, "[[", "value"))
For your sample data, you'd also need to convert to numeric (as below), but I'm hoping your real data has numbers, not factors.
rowMeans(sapply(lapply(list, "[[", "value"), function(x) as.numeric(as.character(x))))
This just gives the values (and assumes the rows are in the right order). You can add the sample names with cbind, e.g., cbind(list[[1]][["sample"]], rowMeans(...)).
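Putting those pieces together, a minimal sketch could look like this. The list dfs and its contents are invented (the original sample data is not shown), and data.frame is used in place of cbind so that the means stay numeric instead of being coerced to character alongside the sample names.

```r
# Hypothetical input: value stored as a factor, rows in the same order
dfs <- list(
  data.frame(sample = c("s1", "s2"), value = factor(c("10", "2"))),
  data.frame(sample = c("s1", "s2"), value = factor(c("20", "4")))
)

# as.numeric() on a factor returns the internal level codes, not the labels,
# so go through as.character() first
vals <- sapply(dfs, function(d) as.numeric(as.character(d$value)))

# One row per sample, one column per data frame; average across columns
result <- data.frame(sample = dfs[[1]]$sample, mean = rowMeans(vals))

print(result)
#   sample mean
# 1     s1   15
# 2     s2    3
```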