Merge 2 Data Frames in a Loop for Each Column in One of Them

Merging multiple dataframe columns into one using for loop

This happens because every result is assigned to df_merge, so df_merge always holds only the latest merge, not the accumulation of all merged dataframes.

I would suggest initializing df_merge with the first dataframe and then merging the others into it, like this:

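import pandas as pd

dfs = [df2, df3, df4, df5, df6]
df_merge = df1  # seed the accumulator with the first frame
for i in dfs:
    # merge each remaining frame into the running result
    df_merge = pd.merge(df_merge, i, how='left', on='Date')
print("Shape of df_merge = ", df_merge.shape)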
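
The accumulate-then-merge pattern can be tried end to end on small made-up frames. The sketch below assumes three hypothetical frames that share a 'Date' column; the column names a, b, c and the values are invented for illustration. functools.reduce is shown only as an equivalent way to write the same fold, not as something the answer itself used.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample frames; only the shared 'Date' key matters here.
df1 = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"], "a": [1, 2]})
df2 = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"], "b": [3, 4]})
df3 = pd.DataFrame({"Date": ["2021-01-01"], "c": [5]})

# Explicit loop: seed the accumulator with df1, then fold in the rest.
df_merge = df1
for frame in [df2, df3]:
    df_merge = pd.merge(df_merge, frame, how="left", on="Date")

# The same fold written with functools.reduce.
df_reduce = reduce(
    lambda left, right: pd.merge(left, right, how="left", on="Date"),
    [df1, df2, df3],
)

print(df_merge.shape)
```

Because every merge is a left join, the result keeps the rows of df1, and dates missing from a later frame show up as NaN in that frame's columns.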

Iterating over a merge of multiple dataframes

Solution 1:

Use this if the 'value' column exists only in df1 and df2, not in df_master.

dfcon = pd.concat([df1, df2])
df = pd.merge(df_master, dfcon, how='left', on='CAS')

Solution 2:

Use this if the 'value' column also exists in df_master. It reuses dfcon from Solution 1.

df_master_drop = df_master.drop(columns=['value'])
df_drop = pd.merge(df_master_drop, dfcon, how='left', on='CAS')
df = df_master.combine_first(df_drop)

Notes:
Use dfcon = pd.concat([df1, df2]).drop_duplicates('CAS') if the same CAS appears more than once. This keeps the first row found for each CAS.
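
To see Solution 2 and the deduplication note working together, here is a minimal sketch on invented data. The CAS strings and values are assumptions made up for the example; df_master is given a default integer index, which combine_first relies on to line the rows up.

```python
import pandas as pd

# Hypothetical data: df_master has gaps in 'value' that df1/df2 can fill.
df_master = pd.DataFrame({"CAS": ["50-00-0", "64-17-5", "67-56-1"],
                          "value": [1.0, None, None]})
df1 = pd.DataFrame({"CAS": ["64-17-5"], "value": [2.0]})
df2 = pd.DataFrame({"CAS": ["67-56-1", "64-17-5"], "value": [3.0, 9.0]})

# Stack the lookup frames; keep the first row seen for each CAS.
dfcon = pd.concat([df1, df2]).drop_duplicates("CAS")

# Merge without the master's own 'value', then fill only the master's gaps.
df_master_drop = df_master.drop(columns=["value"])
df_drop = pd.merge(df_master_drop, dfcon, how="left", on="CAS")
df = df_master.combine_first(df_drop)

print(df)
```

The duplicate entry for 64-17-5 in df2 is discarded because df1 comes first in the concat, and the value already present in df_master is left untouched.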

Loop through columns combining two data frames in R

df1 <- structure(list(itens = c("item1", "item2", "item3", "item4"),
                      sp1 = c(20L, 30L, 30L, 30L),
                      sp2 = c(10L, 15L, 15L, 15L)),
                 class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(itens = c("item7", "item8", "item9"),
                      sp5 = c(20L, 30L, 30L),
                      sp6 = c(10L, 15L, 15L)),
                 class = "data.frame", row.names = c(NA, -3L))

A solution that uses tidyr::pivot_longer and dplyr::full_join instead of a loop:

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(-itens) %>%
  full_join(df2 %>% pivot_longer(-itens)) %>%
  group_by(sps = name) %>%
  summarise(N = n(),
            sum = sum(value))

Returns:

  sps       N   sum
  <chr> <int> <int>
1 sp1       4   110
2 sp2       4    55
3 sp5       3    80
4 sp6       3    40
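
For readers working in pandas rather than R, a rough equivalent of the reshape-and-summarise approach might look like the sketch below. It is an assumption-laden translation, not part of the original answer: melt stands in for pivot_longer, and pd.concat stands in for full_join, which is only equivalent here because the two frames share no rows.

```python
import pandas as pd

df1 = pd.DataFrame({"itens": ["item1", "item2", "item3", "item4"],
                    "sp1": [20, 30, 30, 30],
                    "sp2": [10, 15, 15, 15]})
df2 = pd.DataFrame({"itens": ["item7", "item8", "item9"],
                    "sp5": [20, 30, 30],
                    "sp6": [10, 15, 15]})

# melt() reshapes each frame to long form: one row per (item, column) pair.
long = pd.concat([d.melt(id_vars="itens", var_name="sps") for d in (df1, df2)])

# Count and sum the values for each original column name.
summary = long.groupby("sps")["value"].agg(N="count", sum="sum").reset_index()

print(summary)
```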

How to merge for loop output dataframes into one with python?

A vectorized (read "much faster") solution:

a = np.array(dfa['A'].str.split('').str[1:-1].tolist())
b = np.array(dfb['B'].str.split('').str[1:-1].tolist())

dfb[['disB_1', 'disB_2', 'disB_3']] = (a != b[:, None]).sum(axis=2)

Output:

>>> dfb
    B  disB_1  disB_2  disB_3
0  AC       1       2       1
1  BC       2       1       1
2  CC       2       2       0
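
The broadcasting step is easier to follow with concrete shapes. The sketch below is self-contained and makes two assumptions: the contents of dfa are invented (chosen so the distances match the output shown above), and the strings are split into characters with a plain list comprehension rather than the str.split('') call used in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: equal-length strings in both frames.
dfa = pd.DataFrame({"A": ["AB", "BB", "CC"]})
dfb = pd.DataFrame({"B": ["AC", "BC", "CC"]})

# One row per string, one column per character.
a = np.array([list(s) for s in dfa["A"]])  # shape (3, 2)
b = np.array([list(s) for s in dfb["B"]])  # shape (3, 2)

# b[:, None] has shape (3, 1, 2); comparing it with a broadcasts to (3, 3, 2).
# Summing over the last axis counts mismatched positions for each (b, a) pair.
dist = (a != b[:, None]).sum(axis=2)

dfb[["disB_1", "disB_2", "disB_3"]] = dist
print(dfb)
```

Each entry dist[i, j] is the number of positions where row i of dfb differs from row j of dfa, so the whole pairwise comparison is done without a Python-level loop over pairs.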

Looping through and merging DataFrames with same index, same columns (however a few columns unique to each DataFrame)

I believe computing the column difference is not necessary here. concat alone is enough, because it aligns the columns by name:

df = pd.concat([df, df1, df2], sort=False)
print(df)
        ID   AA  TA    TL    ML    PP
Date
2001  AAPL  1.0  44  50.0   NaN   NaN
2002  AAPL  3.0  33  51.0   NaN   NaN
2003  AAPL  2.0  22  53.0   NaN   NaN
2004  AAPL  5.0  11  76.0   NaN   NaN
2005  AAPL  2.0  33  44.0   NaN   NaN
2006  AAPL  3.0  22  12.0   NaN   NaN
2001  MSFT  3.5  44   NaN  12.0   NaN
2002  MSFT  6.7  33   NaN  15.0   NaN
2003  MSFT  2.3  22   NaN  19.0   NaN
2004  MSFT  5.5  11   NaN  20.0   NaN
2005  MSFT  2.2  33   NaN  43.0   NaN
2006  MSFT  3.2  22   NaN  23.0   NaN
2001  TSLA  3.3  48   NaN   NaN  18.0
2002  TSLA  6.3  38   NaN   NaN  18.0
2003  TSLA  2.6  28   NaN   NaN  18.0
2004  TSLA  5.3  18   NaN   NaN  28.0
2005  TSLA  2.3  38   NaN   NaN  48.0
2006  TSLA  3.3  28   NaN   NaN  28.0
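
A smaller reproduction makes the alignment behaviour easy to check. The two frames below are hypothetical, trimmed to two rows each, with one column unique to each frame.

```python
import pandas as pd

# Hypothetical frames sharing 'ID' and 'AA', each with one unique column.
idx = pd.Index([2001, 2002], name="Date")
df = pd.DataFrame({"ID": "AAPL", "AA": [1.0, 3.0], "TL": [50.0, 51.0]}, index=idx)
df1 = pd.DataFrame({"ID": "MSFT", "AA": [3.5, 6.7], "ML": [12.0, 15.0]}, index=idx)

# concat stacks the rows and takes the union of columns by name;
# cells a frame has no column for are filled with NaN.
out = pd.concat([df, df1], sort=False)

print(out)
```

With sort=False the columns keep their order of first appearance, so the shared columns come first and each frame's unique column is appended after them.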

