How to Merge Multiple Data.Frames and Sum and Average Columns at the Same Time in R

I think your second approach is the way to go, and you can do that with data.table or dplyr.

Here are a few steps using data.table. First, if your data frames are abc, def, ..., stack them with:

DF <- do.call(rbind, list(abc,def,...))

Now you can convert the result into a data.table:

library(data.table)
DT <- data.table(DF)

and then do something like:

DTres <- DT[, .(A = sum(A, na.rm = TRUE), B = sum(B, na.rm = TRUE), C = mean(C, na.rm = TRUE)), by = name]

Check the data.table vignettes to get a better idea of how the package works.
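
The answer mentions dplyr as an alternative but only shows data.table. A minimal dplyr sketch of the same stack-then-summarise idea follows; the data frames abc and def and their values are made up here, since the question's real inputs are not shown, and the column names name, A, B and C are taken from the data.table call.

```r
library(dplyr)

# Hypothetical inputs: the real abc, def, ... are not shown in the question.
abc <- data.frame(name = c("x", "y"), A = c(1, 2),  B = c(10, 20), C = c(4, 6))
def <- data.frame(name = c("x", "y"), A = c(3, NA), B = c(30, 40), C = c(8, 10))

# Stack the frames, then sum A and B and average C within each name.
res <- bind_rows(abc, def) %>%
  group_by(name) %>%
  summarise(A = sum(A, na.rm = TRUE),
            B = sum(B, na.rm = TRUE),
            C = mean(C, na.rm = TRUE))
res
```

Because every column gets its own expression inside summarise, mixing sums and means in one pass needs no extra merge step.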

Combine Multiple Dataframes in R by Average (Mixed datatypes)

Put a ROW ID on your tables

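library(readr)   # read_table
library(tibble)  # rowid_to_column
library(dplyr)   # %>%, bind_rows, group_by, summarise

df_1 <- read_table("A       B       C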
2.3     5       3
12      3       1
0.4     13      2") %>% 
  rowid_to_column("ROW") 

df_2 <- read_table("A       B       C
4.3     23      1
1       7       2
0.4     10      2") %>% 
  rowid_to_column("ROW") 

df_3 <- read_table("A       B       C
1.3      3      3
2.2      4      2
12.4     10     1") %>% 
  rowid_to_column("ROW")

Bind them together in an ensemble

ensamb <- bind_rows(df_1, df_2, df_3)

Group by ROW and then summarise each column with its own method:

ensamb %>% 
  group_by(ROW) %>% 
  summarise(A = mean(A), B = median(B), 
            C = C[which.max(C)])

# A tibble: 3 x 4
    ROW     A     B     C
  <int> <dbl> <dbl> <dbl>
1     1  2.63     5     3
2     2  5.07     4     2
3     3  4.4     10     2

merge multiple data.frames

What about something like this:

l2 <- Reduce(function(x, n) merge(x, l1[[n]], by='nu_pregao', suffixes = c("", n)),
             seq(2, length(l1)), init = l1[[1]])
l2
#>   nu_pregao    pcVar   pcVar2   pcVar3
#> 1      2371 7.224848 4.055709 4.011461
#> 2      2372 2.797704 2.944882 3.679907
#> 3      2373 3.947368 3.507937 4.693034

Final touch for names consistency:

names(l2)[match("pcVar", names(l2))] <- "pcVar1"
l2
#>   nu_pregao   pcVar1   pcVar2   pcVar3
#> 1      2371 7.224848 4.055709 4.011461
#> 2      2372 2.797704 2.944882 3.679907
#> 3      2373 3.947368 3.507937 4.693034

Your data:

l1 <- list(read.table(text = "nu_pregao    pcVar
1       2371 7.224848
45      2372 2.797704
89      2373 3.947368", header = TRUE),

read.table(text = "nu_pregao    pcVar
2       2371 4.055709
46      2372 2.944882
90      2373 3.507937", header = TRUE),

read.table(text = "nu_pregao    pcVar
3       2371 4.011461
47      2372 3.679907
91      2373 4.693034", header = TRUE))
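
Once the per-table columns sit side by side, the sum and the average across them can be added in one step. This is a base R sketch that rebuilds the same l1 values with data.frame() so it runs on its own; the column names pc_sum and pc_mean are made up for illustration.

```r
# Same values as the answer's data, without the original row names.
l1 <- list(
  data.frame(nu_pregao = 2371:2373, pcVar = c(7.224848, 2.797704, 3.947368)),
  data.frame(nu_pregao = 2371:2373, pcVar = c(4.055709, 2.944882, 3.507937)),
  data.frame(nu_pregao = 2371:2373, pcVar = c(4.011461, 3.679907, 4.693034))
)

# Merge as in the answer: columns become pcVar, pcVar2, pcVar3.
l2 <- Reduce(function(x, n) merge(x, l1[[n]], by = "nu_pregao", suffixes = c("", n)),
             seq(2, length(l1)), init = l1[[1]])

# Locate the value columns once, then add row-wise sum and mean.
value_cols <- grep("^pcVar", names(l2))
l2$pc_sum  <- rowSums(l2[value_cols])
l2$pc_mean <- rowMeans(l2[value_cols])
l2
```

The column positions are captured before the new columns are added, so pc_sum is not accidentally included when pc_mean is computed.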

A set of functions over multiple data frames and merge the outputs in R

Basil. Welcome to StackOverflow.

I was wary of lapply when I first started using R, but you should stick with it. It's almost always more efficient than using a for loop. In your particular case, you can put your individual data frames in a list and put the code you run on each one into a function, say myFunc, which takes the data frame you want to process as its argument.

Then you can simply say

allData <- bind_rows(lapply(dataFrameList, myFunc))

Incidentally, your column names make me think your data isn't yet tidy. I'd suggest you spend a little time making it so before you do much else. It will save you a huge amount of effort in the long run.
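
As a rough sketch of the list-plus-function pattern: the myFunc below is invented (the real per-frame code is not shown in the question), the input list is made up, and base do.call(rbind, ...) stands in for bind_rows so the snippet has no package dependency.

```r
# Hypothetical input list; the question's real data frames are not shown.
dataFrameList <- list(
  first  = data.frame(value = c(1, 2, 3)),
  second = data.frame(value = c(10, 20))
)

# Hypothetical per-frame function: returns a one-row summary.
myFunc <- function(df) {
  data.frame(n = nrow(df), total = sum(df$value), average = mean(df$value))
}

# Apply myFunc to every element, then stack the one-row results.
allData <- do.call(rbind, lapply(dataFrameList, myFunc))
allData
```

Iterating over the list itself rather than over 1:length(...) also behaves sensibly when the list is empty.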

Multiply and average data from two data.frames into one column using R

One dplyr option could be:

df1 %>%
 rowwise() %>%
 mutate(new = sum(across(df2$p) * df2$q))

   a         b     c     d   new
   <fct> <dbl> <dbl> <dbl> <dbl>
 1 a      7.17 14.8   8.45 24.9 
 2 a      7.42 19.7   3.97 44.7 
 3 a      5.78 19.2   9.66 29.7 
 4 a      5.09 17.7  12.8  19.3 
 5 a      7.21 12.9   6.24 25.2 
 6 a      2.36 13.7   2.50 27.7 
 7 a      7.26 10.9  10.7  12.0 
 8 a      5.45  6.18 12.8  -4.92
 9 b      5.43 18.2   9.55 27.3 
10 b      4.16 12.1   4.11 23.5
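
To make the arithmetic concrete: for each row of df1, the pipeline takes the columns named in df2$p, multiplies each by the matching weight in df2$q, and adds the products. Here is a base R sketch of the same weighted sum as a matrix product; df1 and df2 are small made-up tables, since the question's data is not reproduced here.

```r
# Made-up data for illustration only.
df1 <- data.frame(a = c("a", "b"), b = c(1, 2), c = c(3, 4), d = c(5, 6),
                  stringsAsFactors = FALSE)
df2 <- data.frame(p = c("b", "d"), q = c(10, 0.5), stringsAsFactors = FALSE)

# Row-wise weighted sum of the columns listed in df2$p: b * 10 + d * 0.5
df1$new <- as.vector(as.matrix(df1[df2$p]) %*% df2$q)
df1
```

Selecting the columns by df2$p keeps them in the same order as the weights, which is what makes the matrix product line up.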

Aggregate multiple columns at once

We can use the formula method of aggregate. The variables on the right-hand side of ~ are the grouping variables, while the . stands for all other variables in 'df1' (from the example, we assume that the mean is needed for every column except the grouping ones). We then specify the dataset and the function (mean).

aggregate(.~id1+id2, df1, mean)

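Or we can use summarise_each from dplyr after grouping (group_by). Note that summarise_each and funs are deprecated in current dplyr releases; the across version further down is the supported form.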

library(dplyr)
df1 %>%
    group_by(id1, id2) %>% 
    summarise_each(funs(mean))

Or use summarise with across (available in dplyr 1.0.0 and later):

df1 %>% 
    group_by(id1, id2) %>%
    summarise(across(starts_with('val'), mean))

Another option is data.table. We convert the 'data.frame' to a 'data.table' with setDT(df1), group by 'id1' and 'id2', loop through the subset of the data.table (.SD), and take the mean of each column.

library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]
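
The calls above apply one function to every value column. When one column should be summed and another averaged in the same result, one possible base R sketch is to aggregate each column separately and merge on the grouping keys. It rebuilds df1 with the same values as the answer's data so that it runs on its own; summing val1 and averaging val2 is an arbitrary choice for illustration.

```r
# Same values as the answer's df1.
df1 <- data.frame(
  id1  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  id2  = c("x", "x", "y", "y", "x", "y", "x", "y"),
  val1 = c(1L, 2L, 3L, 4L, 1L, 4L, 3L, 2L),
  val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 9L, 8L),
  stringsAsFactors = FALSE
)

sums  <- aggregate(val1 ~ id1 + id2, df1, sum)   # sum of val1 per group
means <- aggregate(val2 ~ id1 + id2, df1, mean)  # mean of val2 per group

# Join the two summaries on the grouping columns.
res <- merge(sums, means, by = c("id1", "id2"))
res
```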

data

df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b", 
"b", "b"
), id2 = c("x", "x", "y", "y", "x", "y", "x", "y"), 
val1 = c(1L, 
2L, 3L, 4L, 1L, 4L, 3L, 2L), val2 = c(9L, 4L, 5L, 9L, 7L, 4L, 
9L, 8L)), .Names = c("id1", "id2", "val1", "val2"), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

R: merging columns and the values if they have the same column name

Solution 1

Using split(), lapply(), rowSums(), and do.call()/cbind():

do.call(cbind,lapply(split(seq_len(ncol(df)),names(df)),function(x) rowSums(df[x])));
##      B C U
## [1,] 2 2 1
## [2,] 4 4 2
## [3,] 6 6 3
## [4,] 8 8 4

Solution 2

Replacing the rowSums() call with Reduce()/`+`():

do.call(cbind,lapply(split(seq_len(ncol(df)),names(df)),function(x) Reduce(`+`,df[x])));
##      B C U
## [1,] 2 2 1
## [2,] 4 4 2
## [3,] 6 6 3
## [4,] 8 8 4

Solution 3

Replacing the index vector middleman with splitting the data.frame (as an unclassed list) directly:

do.call(cbind,lapply(split(as.list(df),names(df)),function(x) Reduce(`+`,x)));
##      B C U
## [1,] 2 2 1
## [2,] 4 4 2
## [3,] 6 6 3
## [4,] 8 8 4
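
If an average rather than a sum is wanted for same-named columns, Solution 1 adapts by swapping rowSums() for rowMeans(). This is a small sketch with made-up values (not the df used elsewhere in this answer), so that the means differ visibly from the inputs.

```r
# Made-up data with duplicate column names B and C.
df <- data.frame(B = c(1, 2), C = c(10, 20), U = c(5, 6),
                 B = c(3, 4), C = c(30, 40), check.names = FALSE)

# Group column positions by name, then average each group row by row.
avg <- do.call(cbind, lapply(split(seq_len(ncol(df)), names(df)),
                             function(x) rowMeans(df[x])))
avg
```

Indexing by position (df[x]) matters here, because indexing by a duplicated name would only return the first matching column.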

Benchmarking

library(microbenchmark);

bgoldst1 <- function(df) do.call(cbind,lapply(split(seq_len(ncol(df)),names(df)),function(x) rowSums(df[x])));
bgoldst2 <- function(df) do.call(cbind,lapply(split(seq_len(ncol(df)),names(df)),function(x) Reduce(`+`,df[x])));
bgoldst3 <- function(df) do.call(cbind,lapply(split(as.list(df),names(df)),function(x) Reduce(`+`,x)));
sotos <- function(df) sapply(unique(names(df)), function(i)rowSums(df[names(df) == i]));

df <- data.frame(B=c(1L,2L,3L,4L),C=c(1L,2L,3L,4L),U=c(1L,2L,3L,4L),B=c(1L,2L,3L,4L),C=c(1L,2L,3L,4L),check.names=F);

ex <- bgoldst1(df);
all.equal(ex,sotos(df)[,colnames(ex)]);
## [1] TRUE
all.equal(ex,bgoldst2(df));
## [1] TRUE
all.equal(ex,bgoldst3(df));
## [1] TRUE

microbenchmark(bgoldst1(df),bgoldst2(df),bgoldst3(df),sotos(df));
## Unit: microseconds
##          expr     min       lq     mean   median      uq      max neval
##  bgoldst1(df) 245.473 258.3030 278.9499 272.4155 286.742  641.052   100
##  bgoldst2(df) 156.949 166.3580 184.2206 171.7030 181.539 1042.618   100
##  bgoldst3(df)  82.110  92.5875 100.9138  97.2915 107.128  170.207   100
##     sotos(df) 200.997 211.9030 226.7977 223.6630 235.210  328.010   100

set.seed(1L);
NR <- 1e3L; NC <- 1e3L;
df <- setNames(nm=LETTERS[sample(seq_along(LETTERS),NC,T)],data.frame(replicate(NC,sample(seq_len(NR*3L),NR,T))));

ex <- bgoldst1(df);
all.equal(ex,sotos(df)[,colnames(ex)]);
## [1] TRUE
all.equal(ex,bgoldst2(df));
## [1] TRUE
all.equal(ex,bgoldst3(df));
## [1] TRUE

microbenchmark(bgoldst1(df),bgoldst2(df),bgoldst3(df),sotos(df));
## Unit: milliseconds
##          expr       min        lq      mean    median        uq      max neval
##  bgoldst1(df) 11.070218 11.586182 12.745706 12.870209 13.234997 16.15929   100
##  bgoldst2(df)  4.534402  4.680446  6.161428  6.097900  6.425697 44.83254   100
##  bgoldst3(df)  3.430203  3.555505  5.355128  4.919931  5.219930 41.79279   100
##     sotos(df) 19.953848 21.419628 22.713282 21.829533 22.280279 60.86525   100

sum and average of rows from list of data frames

Instead of the while loop, an option in R would be to get the sum of corresponding elements of list with Reduce and divide by the length of the 'sample_list'

Reduce(`+`, sample_list)/length(sample_list)
#  x
#1  6
#2  7
#3  8
#4  9
#5 10

Or a concise approach is rowMeans after converting it to a single data.frame

rowMeans(do.call(cbind, sample_list))
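
To get the sum and the average together, the Reduce result can be kept and divided afterwards. Here is a self-contained sketch, assuming a sample_list of equally shaped single-column data frames; the list is hypothetical, as the question's actual data is not shown.

```r
# Hypothetical sample_list: three single-column data frames of equal shape.
sample_list <- list(data.frame(x = 1:5), data.frame(x = 6:10), data.frame(x = 11:15))

total   <- Reduce(`+`, sample_list)      # element-wise sum across the list
average <- total / length(sample_list)   # element-wise mean

# Same means via rowMeans; this relies on each frame having a single column.
row_avg <- rowMeans(do.call(cbind, sample_list))
```

Note that the Reduce version propagates any NA into the result, whereas rowMeans accepts na.rm = TRUE if missing values should be skipped.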
