Combine multiple data frames and calculate average
You can do:
library(data.table)
rbindlist(list(JPL.GRACE,GFZ.GRACE,CSR.GRACE))[,lapply(.SD,mean), list(Lon, Lat)]
Explanations:
Your data.frames are put into a list and stacked vertically (row-bound) using rbindlist, which returns a data.table. This works because your data.frames share the same structure (same number and names of columns, same data types). An alternative approach would have been do.call(rbind, list(JPL.GRACE,GFZ.GRACE,CSR.GRACE)).
We then loop over each distinct pair of Lon, Lat. .SD represents the subset of the data.table associated with each pair. You can see it by doing:
dt = rbindlist(list(JPL.GRACE,GFZ.GRACE,CSR.GRACE))
dt[,print(.SD), list(Lon, Lat)]
For each of these .SD, we simply loop over the columns and compute the means.
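For readers working in pandas rather than data.table, the same stack-then-group idea can be sketched as follows. The three small frames and the column name value are hypothetical stand-ins for JPL.GRACE, GFZ.GRACE and CSR.GRACE, which are not shown in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the three GRACE data frames (same columns in each)
jpl = pd.DataFrame({"Lon": [0.5, 1.5], "Lat": [10.0, 10.0], "value": [1.0, 4.0]})
gfz = pd.DataFrame({"Lon": [0.5, 1.5], "Lat": [10.0, 10.0], "value": [2.0, 5.0]})
csr = pd.DataFrame({"Lon": [0.5, 1.5], "Lat": [10.0, 10.0], "value": [3.0, 6.0]})

# Stack the frames row-wise (the counterpart of rbindlist)
stacked = pd.concat([jpl, gfz, csr], ignore_index=True)

# Average every remaining column within each distinct (Lon, Lat) pair
grid_mean = stacked.groupby(["Lon", "Lat"], as_index=False).mean()
print(grid_mean)
```

Each (Lon, Lat) group plays the role of .SD here, and mean() is applied to every non-grouping column.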
Get the mean across multiple Pandas DataFrames
Assuming the two dataframes have the same columns, you could just concatenate them and compute your summary stats on the concatenated frames:
import numpy as np
import pandas as pd
# some random data frames
df1 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))
df2 = pd.DataFrame(dict(x=np.random.randn(100), y=np.random.randint(0, 5, 100)))
# concatenate them
df_concat = pd.concat((df1, df2))
print(df_concat.mean())
# x -0.163044
# y 2.120000
# dtype: float64
print(df_concat.median())
# x -0.192037
# y 2.000000
# dtype: float64
Update
If you want to compute stats across each set of rows with the same index in the two datasets, you can use .groupby() to group the data by row index, then apply the mean, median etc.:
by_row_index = df_concat.groupby(df_concat.index)
df_means = by_row_index.mean()
print(df_means.head())
#           x    y
# 0 -0.850794  1.5
# 1  0.159038  1.5
# 2  0.083278  1.0
# 3 -0.540336  0.5
# 4  0.390954  3.5
This method will work even when your dataframes have unequal numbers of rows - if a particular row index is missing in one of the two dataframes, the mean/median will be computed on the single existing row.
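The unequal-length case can be checked with a small sketch. The frames df_a and df_b below are made up for illustration; row index 2 exists only in df_a, so its "mean" is simply that single value.

```python
import pandas as pd

# Hypothetical frames of different lengths
df_a = pd.DataFrame({"x": [1.0, 2.0, 3.0]})  # row index 0, 1, 2
df_b = pd.DataFrame({"x": [3.0, 6.0]})       # row index 0, 1

# Group the concatenated rows by their original row index and average
row_means = pd.concat([df_a, df_b]).groupby(level=0).mean()
print(row_means)
```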
calculate average over multiple data frames
If I understand you correctly, on a given DB system, in each "iteration" (1...N) you are loading a sequence of DataSets (1,2,3) and running queries on them. It seems like at the end you want to calculate the average time across all iterations, for each DataSet. If so, you need an additional column DataSet in your all_results table that identifies the DataSet. We can add this column as follows:
all_results <- cbind( data.frame( DataSet = rep(1:3,3) ), all_results )
> all_results
  DataSet iteration  lines loadTime query1 query2 query3
1       1         1 100000    120.4    0.5    6.4    1.2
2       2         1 100000    110.1    0.1    5.2    2.1
3       3         1  50000    130.3    0.2    4.3    2.2
4       1         2 100000    120.4    0.1    2.4    1.2
5       2         2 100000    300.2    0.2    4.5    1.4
6       3         2  50000    235.3    0.4    4.2    0.5
7       1         3 100000    233.5    0.7    8.3    6.7
8       2         3 100000    300.1    0.9    0.5    4.4
9       3         3  50000    100.2    0.4    9.2    1.2
Now you can use the ddply function from the plyr package to easily extract the averages for the load and query times for each DataSet.
> ddply(all_results, .(DataSet), colwise(mean, .(loadTime, query1, query2)))
  DataSet loadTime    query1 query2
1       1 158.1000 0.4333333    5.7
2       2 236.8000 0.4000000    3.4
3       3 155.2667 0.3333333    5.9
Incidentally, I highly recommend you look at Hadley Wickham's plyr package for a rich set of data-manipulation functions.
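For comparison, a pandas sketch of the same per-DataSet averaging is given below. It rebuilds the relevant columns of the all_results table shown above by hand; the variable name per_dataset is an illustrative choice.

```python
import pandas as pd

# Values copied from the all_results table above (lines and query3 omitted)
all_results = pd.DataFrame({
    "DataSet":   [1, 2, 3] * 3,
    "iteration": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "loadTime":  [120.4, 110.1, 130.3, 120.4, 300.2, 235.3, 233.5, 300.1, 100.2],
    "query1":    [0.5, 0.1, 0.2, 0.1, 0.2, 0.4, 0.7, 0.9, 0.4],
    "query2":    [6.4, 5.2, 4.3, 2.4, 4.5, 4.2, 8.3, 0.5, 9.2],
})

# Average the timing columns across iterations, separately for each DataSet
per_dataset = all_results.groupby("DataSet")[["loadTime", "query1", "query2"]].mean()
print(per_dataset)
```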
Average Cells of Two or More DataFrames
We can sum the datasets with Reduce and +, then divide by the number of datasets. Keeping the datasets in a list means this works for any number 'n' of them:
dfAvg <- Reduce(`+`, mget(paste0("df", 1:3)))/3
Another option is to convert to an array and then use apply, which also has the option of removing missing values (na.rm = TRUE):
apply(array(unlist(mget(paste0("df", 1:3))), c(dim(df1), 3)), 2, rowMeans, na.rm = TRUE)
As @user20650 mentioned, rowMeans can be applied directly on the array with the dims argument:
rowMeans(array(unlist(mget(paste0("df", 1:3))), c(dim(df1), 3)), dims=2)
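The array approach translates to pandas and NumPy fairly directly. The sketch below uses three made-up frames (df1, df2, df3 with one missing cell) and assumes all frames share the same shape, row order and column order.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with identical shape; one cell is missing in df1
df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, np.nan]})
df2 = pd.DataFrame({"a": [3.0, 4.0], "b": [5.0, 6.0]})
df3 = pd.DataFrame({"a": [5.0, 6.0], "b": [7.0, 8.0]})
frames = [df1, df2, df3]

# Stack into a (rows, cols, n_frames) array, like array(unlist(...)) in R
stacked = np.stack([f.to_numpy() for f in frames], axis=2)

# Average over the last axis, ignoring NaN (the counterpart of na.rm = TRUE)
cell_mean = pd.DataFrame(np.nanmean(stacked, axis=2),
                         index=df1.index, columns=df1.columns)
print(cell_mean)
```

Using np.mean instead of np.nanmean would propagate the missing cell as NaN, which mirrors the Reduce version above.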
How to calculate mean and Sd for multiple data frames in R
Try this:
library(dplyr)
library(tidyr)
#Code
new <- df1 %>% bind_rows(df2, .id = 'id') %>%
  pivot_longer(-id) %>%
  mutate(Var = paste0(name, id)) %>%
  group_by(Var, .drop = FALSE) %>%
  summarise(Mean = mean(value, na.rm = TRUE),
            SD = sd(value, na.rm = TRUE))
#List
List <- split(new,gsub('[a-z]','',new$Var))
List <- lapply(seq_along(List), function(x) {names(List[[x]]) <- paste0(names(List[[x]]), x); List[[x]]})
#Bind
res <- do.call(bind_cols,List)
Output:
# A tibble: 3 x 6
  Var1  Mean1   SD1 Var2  Mean2   SD2
  <chr> <dbl> <dbl> <chr> <dbl> <dbl>
1 mot1     12     0 mot2      3  1.73
2 temp1    12     0 temp2     7  0
3 time1    13     0 time2     4  0
Combine Multiple Dataframes in R by Average (Mixed datatypes)
Put a ROW ID on your tables
library(tidyverse)  # readr, tibble and dplyr supply the functions used below
df_1 <- read_table("A B C
2.3 5 3
12 3 1
0.4 13 2") %>%
rowid_to_column("ROW")
df_2 <- read_table("A B C
4.3 23 1
1 7 2
0.4 10 2") %>%
rowid_to_column("ROW")
df_3 <- read_table("A B C
1.3 3 3
2.2 4 2
12.4 10 1") %>%
rowid_to_column("ROW")
Bind them together in an ensemble
ensamb <- bind_rows(df_1, df_2, df_3)
group_by row and then summarise each column by its own method
ensamb %>%
group_by(ROW) %>%
summarise(A = mean(A), B = median(B),
C = C[which.max(C)])
# A tibble: 3 x 4
    ROW     A     B     C
  <int> <dbl> <dbl> <dbl>
1     1  2.63     5     3
2     2  5.07     4     2
3     3  4.4     10     2
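A pandas sketch of the same per-column strategy uses named aggregation on the shared row index. The three frames reuse the values from the tables above; the names ensemble and combined are illustrative.

```python
import pandas as pd

# Same values as df_1, df_2 and df_3 above; the default row index acts as ROW
df_1 = pd.DataFrame({"A": [2.3, 12, 0.4],   "B": [5, 3, 13],  "C": [3, 1, 2]})
df_2 = pd.DataFrame({"A": [4.3, 1, 0.4],    "B": [23, 7, 10], "C": [1, 2, 2]})
df_3 = pd.DataFrame({"A": [1.3, 2.2, 12.4], "B": [3, 4, 10],  "C": [3, 2, 1]})

# Stack the frames, then combine rows that share an index value
ensemble = pd.concat([df_1, df_2, df_3])
combined = ensemble.groupby(level=0).agg(
    A=("A", "mean"),    # numeric column: average
    B=("B", "median"),  # numeric column: median
    C=("C", "max"),     # same result as C[which.max(C)] in the R code
)
print(combined)
```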
Weighted average across multiple dataframes
If you really want to disregard the string column, and you are certain the two dataframes have the same shape, then you can do this:
sel = ['b', 'c'] # numeric columns
df3 = df1.copy()
df3[sel] = 2/3 * df1[sel] + 1/3 * df2[sel]
On your data, df3 is:
       a    b         c
0  hello  2.0  1.333333
1  hello  1.0  1.000000
However, in the more general case, you may have different sizes and your a column may be relevant. Here is an example:
df1 = pd.DataFrame([["hello", 2, 1], ["world", 1, 1]], columns=["a", "b", "c"])
df2 = pd.DataFrame([["world", 2, 2], ["hello", 1, 1]], columns=["a", "b", "c"])
(2/3 * df1.set_index('a').stack() +
 1/3 * df2.set_index('a').stack()).groupby(level=[0,1]).mean().unstack().reset_index()
# gives:
       a         b         c
0  hello  1.666667  1.000000
1  world  1.333333  1.333333
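The keyed weighted average extends to any number of frames. The following is a minimal sketch, assuming each value of a appears exactly once per frame; the names frames, weights and weighted are hypothetical, and a key missing from any frame would come out as NaN.

```python
import pandas as pd

df1 = pd.DataFrame([["hello", 2, 1], ["world", 1, 1]], columns=["a", "b", "c"])
df2 = pd.DataFrame([["world", 2, 2], ["hello", 1, 1]], columns=["a", "b", "c"])

frames = [df1, df2]
weights = [2 / 3, 1 / 3]  # one weight per frame

# Index each frame by its key so that addition aligns rows on 'a', not on position
weighted = sum(w * f.set_index("a") for w, f in zip(weights, frames)) / sum(weights)
print(weighted.reset_index())
```

Dividing by sum(weights) keeps the result a proper average even when the weights do not add up to 1.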
Calculate mean of multiple dataframes columns stored in List
I noticed two things that are causing you trouble here:
1 - When you subset your list like list[[1:3]], it gets read as list[[c(1, 2, 3)]], which indexes recursively and finds the 3rd entry (21) of the 2nd column (price) in the 1st element (df1) in the list. This is why doing something like list[[1:2]] returns a vector (it's pulling out an entire variable) and why list[[1:4]] returns an error (the list doesn't go 4 levels deep). (answer by @aaron-montgomery, from the comments)
2 - In your last line, you reference a column mean that you've never defined.
If you're trying to get one value that's the mean of all the previous elements, you can nest another loop:
# for each df in list (from the 3rd on), calculate the mean of q over that df and the 2 before it
for (i in 3:length(list)) {
  # inner loop collects the q values of the three data frames
  vals <- c()
  for (j in (i - 2):i) {
    vals <- c(vals, list[[j]]$q)
  }
  list[[i]][["q_mean"]] <- mean(vals)
}
If you want a different value for each row (where row1 is the mean of the previous 2 row1s, etc), you can just do:
for (i in 3:length(list)) {
  list[[i]][["q_mean"]] <- (list[[i - 1]]$q + list[[i - 2]]$q) / 2
}
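The row-wise variant can be sketched in pandas as well. The list frames below is made-up data (five frames with a single q column), and the code assumes every frame has the same row index so that the addition lines up row by row.

```python
import pandas as pd

# Hypothetical list of data frames; frame k holds q = [k + 1, k + 2]
frames = [pd.DataFrame({"q": [float(i), float(i) + 1.0]}) for i in range(1, 6)]

# From the third frame on, store the row-wise mean of q in the two preceding frames
for i in range(2, len(frames)):
    frames[i]["q_mean"] = (frames[i - 1]["q"] + frames[i - 2]["q"]) / 2

print(frames[2])
```

Python lists are zero-based, so range(2, len(frames)) corresponds to 3:length(list) in the R loop.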
Average of multiple dataframes with the same columns and indices
You can use groupby.mean on the index level after concatenating the dataframes:
pd.concat([v1, v2, v3]).groupby(level=0).mean()
            c1        c2        c3
id
ind1  1.333333  2.333333  2.666667
ind2  3.666667  2.333333  3.666667