Merge Several Data.Frames into One Data.Frame With a Loop

Combine multiple Data Frames with WHILE loop

You could collect all the dataframes in a list and then concatenate them.
Before the while loop, create an empty list of dataframes:

list_of_dfs = []

Then, just before index += 1, append final_list to that list:

list_of_dfs.append(final_list)

You probably don't want to append final_list to itself, as in final_list.append(final_list).

Finally, once the loop has finished, concatenate everything in one call:

my_df_of_concern = pd.concat(list_of_dfs, axis=0)

See https://pandas.pydata.org/docs/reference/api/pandas.concat.html
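As a minimal sketch of that pattern: the helper make_chunk and its columns are made up for illustration, and stand in for whatever dataframe each iteration of the original loop produces as final_list.

```python
import pandas as pd

def make_chunk(index):
    # Hypothetical stand-in for the work done in each loop iteration.
    return pd.DataFrame({"batch": [index, index],
                         "value": [index * 10, index * 10 + 1]})

list_of_dfs = []                      # created once, before the loop
index = 0
while index < 3:
    final_list = make_chunk(index)
    list_of_dfs.append(final_list)    # collect this iteration's dataframe
    index += 1

# Stack all collected dataframes row-wise; ignore_index renumbers the rows.
my_df_of_concern = pd.concat(list_of_dfs, axis=0, ignore_index=True)
```

Concatenating once at the end is usually cheaper than growing a dataframe inside the loop, because each concatenation copies the data.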

Merge several data.frames into one data.frame with a loop

You may want to look at the closely related question on Stack Overflow.

I would approach this in two steps: import all the data (with plyr), then merge it together:

filenames <- list.files(path=".../tempDataFolder/", full.names=TRUE)
library(plyr)
import.list <- llply(filenames, read.csv)

That gives you a list of data frames, one per file, which you now need to merge together. There are many ways to do this, but here's one approach (with Reduce):

data <- Reduce(function(x, y) merge(x, y, all = TRUE,
    by = c("COUNTRYNAME", "COUNTRYCODE", "Year")), import.list, accumulate = FALSE)

Alternatively, you can do this with the reshape package if you aren't comfortable with Reduce:

library(reshape)
data <- merge_recurse(import.list)
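For readers doing the same thing in pandas, the fold-with-merge idea might look roughly like the sketch below. The three small dataframes and their indicator columns (gdp, pop, co2) are invented placeholders for the imported files; only the key column names come from the answer above.

```python
from functools import reduce

import pandas as pd

keys = ["COUNTRYNAME", "COUNTRYCODE", "Year"]

# Hypothetical stand-ins for the imported files: same keys, one indicator each.
import_list = [
    pd.DataFrame({"COUNTRYNAME": ["Chile", "Peru"], "COUNTRYCODE": ["CHL", "PER"],
                  "Year": [2000, 2000], "gdp": [1.0, 2.0]}),
    pd.DataFrame({"COUNTRYNAME": ["Chile", "Peru"], "COUNTRYCODE": ["CHL", "PER"],
                  "Year": [2000, 2000], "pop": [15.0, 26.0]}),
    pd.DataFrame({"COUNTRYNAME": ["Chile"], "COUNTRYCODE": ["CHL"],
                  "Year": [2000], "co2": [3.5]}),
]

# how="outer" plays the role of all = TRUE in R's merge: no rows are dropped.
data = reduce(lambda x, y: pd.merge(x, y, how="outer", on=keys), import_list)
```

Rows missing from one file (Peru in the third one here) are kept, with NaN in that file's columns.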

How to create a for loop for combining several data frames and df subsets into one data frame?

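You can define a function that sums all numeric columns of a data.frame, leaves the other columns as NA, and appends the result to the original data frame, followed by an all-NA spacer row:

numericCols = sapply(iris, is.numeric)

func = function(df, numCols) {
  # column sums of the numeric columns only
  iris_sums <- colSums(df[, numCols])
  # one NA per column, then fill in the sums by name
  result <- rep(NA, ncol(df))
  names(result) <- colnames(df)
  result[names(iris_sums)] <- iris_sums
  # original rows, the sums row, then a blank NA row
  rbind(df, result, rep(NA, ncol(df)))
}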

Then we use purrr to map the function over each subset (map_dfr also requires dplyr to be installed):

library(purrr)
split(iris, iris$Species) %>% map_dfr(func, numCols = numericCols)

How to merge for loop output dataframes into one with python?

A vectorized (read "much faster") solution:

a = np.array(dfa['A'].str.split('').str[1:-1].tolist())
b = np.array(dfb['B'].str.split('').str[1:-1].tolist())

dfb[['disB_1', 'disB_2', 'disB_3']] = (a != b[:, None]).sum(axis=2)

Output:

>>> dfb
    B  disB_1  disB_2  disB_3
0  AC       1       2       1
1  BC       2       1       1
2  CC       2       2       0
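To make the broadcasting step easier to follow, here is a small self-contained sketch. The contents of dfa are not given in the answer, so the values below are an assumption, chosen only so that the result matches the output shown above; the characters are split with list() rather than str.split(''), which is a substitution of mine.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: every string has the same length (two characters here).
dfa = pd.DataFrame({"A": ["AA", "BB", "CC"]})
dfb = pd.DataFrame({"B": ["AC", "BC", "CC"]})

# One row per string, one column per character.
a = np.array([list(s) for s in dfa["A"]])   # shape (3, 2)
b = np.array([list(s) for s in dfb["B"]])   # shape (3, 2)

# b[:, None] has shape (3, 1, 2), so the comparison broadcasts to (3, 3, 2):
# entry [i, j, k] is True when character k of dfb row i differs from dfa row j.
mismatches = (a != b[:, None]).sum(axis=2)  # shape (3, 3)

dfb[["disB_1", "disB_2", "disB_3"]] = mismatches
```

Each disB_j column therefore holds the number of differing characters between every string in dfb and the j-th string in dfa.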

Loop for merging multiple dataframes from list of dataframes in R

I'm not a fan of how this ends up with multiple columns with the same name, but that's what you wanted.

You aren't really asking for a merge because that would give 3 x 3 = 9 rows, so I used cbind.

(I changed the name of the list of data.frames to df_list to avoid confusion)

df_list <- list(
  data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
  data.frame(ID = 1, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y')),
  data.frame(ID = 2, b = c('x', 'y', 'z'), c = c('y', 'z', 'x'), d = c('z', 'x', 'y'))
)

for (i in 1:(length(df_list) - 1)) {
  if (NROW(df_list[[i]]) == NROW(df_list[[i + 1]]) &&
      all(df_list[[i]]$ID == df_list[[i + 1]]$ID)) {
    df_list[[i]] <- cbind(df_list[[i]], df_list[[i + 1]][, -1])
    df_list[[i + 1]] <- list()
  }
}
df_list <- df_list[!sapply(df_list, function(x) NROW(x) == 0)]
df_list
[[1]]
  ID b c d b c d
1  1 x y z x y z
2  1 y z x y z x
3  1 z x y z x y

[[2]]
  ID b c d
1  2 x y z
2  2 y z x
3  2 z x y

Iterating over a merge of multiple dataframes

Solution 1:

Use this if the 'value' column exists only in df1 and df2, not in df_master.

dfcon = pd.concat([df1, df2])
df = pd.merge(df_master, dfcon, how='left', on='CAS')

Solution 2:

Use this if the 'value' column is also in df_master (dfcon is built as in Solution 1).

df_master_drop = df_master.drop(columns=['value'])
df_drop = pd.merge(df_master_drop, dfcon, how='left', on='CAS')
df = df_master.combine_first(df_drop)

Notes:
Use dfcon = pd.concat([df1, df2]).drop_duplicates('CAS') if the same CAS appears more than once. This keeps the first row found for each CAS value.
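A small end-to-end sketch of Solution 2 together with the duplicate handling from the note. The three dataframes and their values are made up for illustration; only the column names 'CAS' and 'value' come from the answer.

```python
import pandas as pd

# Hypothetical data: df_master already has a 'value' for one CAS number.
df_master = pd.DataFrame({"CAS": ["50-00-0", "64-17-5", "67-56-1"],
                          "value": [1.0, None, None]})
df1 = pd.DataFrame({"CAS": ["64-17-5"], "value": [2.0]})
df2 = pd.DataFrame({"CAS": ["67-56-1", "64-17-5"], "value": [3.0, 9.0]})

# Stack the lookup tables; keep only the first row seen for each CAS.
dfcon = pd.concat([df1, df2]).drop_duplicates("CAS")

# Merge without the master's own 'value' column, then use the merged
# values only where the master has a gap.
df_master_drop = df_master.drop(columns=["value"])
df_drop = pd.merge(df_master_drop, dfcon, how="left", on="CAS")
df = df_master.combine_first(df_drop)
```

Because combine_first aligns on the index, this relies on the left merge preserving the row order of df_master, which holds here since dfcon has no duplicate CAS values after drop_duplicates.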

Merging multiple dataframe columns into one using for loop

This happens because every iteration overwrites df_merge, so it only ever holds the most recent merge rather than the accumulated result of all of them.

I would suggest initialising df_merge with the first dataframe and then merging each of the others into it, like this:

dfs = [df2, df3, df4, df5, df6]
df_merge = df1
for i in dfs:
    df_merge = pd.merge(df_merge, i, how='left', on='Date')
print("Shape of df_merge = ", df_merge.shape)
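The same loop on a few tiny, made-up dataframes (the column names a, b, c and the dates are assumptions for illustration) shows the columns accumulating across iterations:

```python
import pandas as pd

dates = pd.to_datetime(["2021-01-01", "2021-01-02"])
df1 = pd.DataFrame({"Date": dates, "a": [1, 2]})
df2 = pd.DataFrame({"Date": dates, "b": [3, 4]})
df3 = pd.DataFrame({"Date": dates[:1], "c": [5]})   # only the first date

df_merge = df1                       # start from the first frame
for other in [df2, df3]:
    # Each pass merges into the running result, so columns accumulate.
    df_merge = pd.merge(df_merge, other, how="left", on="Date")

print("Shape of df_merge =", df_merge.shape)
```

With how='left', every row of df1 is kept, and dates missing from a later dataframe simply get NaN in that dataframe's columns.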

Merging multiple data frames in a loop

You can try something like this:

Create the 'x' column with all NA values in your first data.frame:

df[, "x"] <- NA

Use your ID column to name the rows of your first data.frame:

rownames(df) <- df$ID

Then use these row names to fill in the 'x' column, only in the rows that match each of your other datasets. Converting the IDs to character makes sure they are matched against row names rather than row positions:

df[as.character(df1$ID), "x"] <- df1$x

df[as.character(df2$ID), "x"] <- df2$x

This keeps the NA values in the 'x' column for all remaining rows, as in your example.


