Merge 2 Data Frames in a Loop for Each Column in One of Them

Merging multiple dataframe columns into one using a for loop

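This happens because every iteration overwrites df_merge. As a result, df_merge only ever holds the most recent merge, not the accumulated result of all the merges.

I would suggest initializing df_merge with the first dataframe and then merging each remaining dataframe into that running result, like this.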

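import pandas as pd

# Start from df1 and merge every other dataframe into the running result.
dfs = [df2, df3, df4, df5, df6]
df_merge = df1
for df in dfs:
    df_merge = pd.merge(df_merge, df, how='left', on='Date')
print("Shape of df_merge =", df_merge.shape)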
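The same accumulation can also be written without an explicit loop. The following is a minimal sketch using functools.reduce; the three small frames and their column names (a, b, c) are made up for illustration and are not from the original question.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample frames that share a 'Date' key.
df1 = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"], "a": [1, 2]})
df2 = pd.DataFrame({"Date": ["2021-01-01", "2021-01-02"], "b": [3, 4]})
df3 = pd.DataFrame({"Date": ["2021-01-01"], "c": [5]})

# Fold the list into one frame: each step left-merges the next frame
# into the accumulated result, just as the loop above does.
df_merge = reduce(
    lambda left, right: pd.merge(left, right, how="left", on="Date"),
    [df1, df2, df3],
)
print("Shape of df_merge =", df_merge.shape)
```

Because every merge is a left join, the row count follows the first frame, and dates missing from a later frame simply get NaN in that frame's columns.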

Iterating over a merge of multiple dataframes

Solution 1:

Use this if the 'value' column exists only in df1 and df2, not in df_master.

dfcon = pd.concat([df1, df2])
df = pd.merge(df_master, dfcon, how='left', on='CAS')

Solution 2:

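Use this if the 'value' column is also in df_master. Here dfcon is the concatenated frame from Solution 1, and combine_first fills only the missing values in df_master with the merged ones.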

df_master_drop = df_master.drop(columns=['value'])
df_drop = pd.merge(df_master_drop, dfcon, how='left', on='CAS')
df = df_master.combine_first(df_drop)

Notes:
Use dfcon = pd.concat([df1, df2]).drop_duplicates('CAS') if the same CAS appears more than once. This keeps the first occurrence of each CAS value.
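To see how Solution 2 and the duplicate handling fit together, here is a small self-contained sketch. The CAS keys and values are invented for illustration, and it assumes df_master has a default integer index so that combine_first lines the rows up correctly.

```python
import pandas as pd

# Hypothetical data: df_master has one known value and two gaps.
df_master = pd.DataFrame({"CAS": ["a", "b", "c"], "value": [1.0, None, None]})
df1 = pd.DataFrame({"CAS": ["b"], "value": [20.0]})
df2 = pd.DataFrame({"CAS": ["b", "c"], "value": [99.0, 30.0]})

# Keep the first row seen for each CAS so the merge cannot duplicate rows.
dfcon = pd.concat([df1, df2]).drop_duplicates("CAS")

# Merge without the master's own 'value' column, then fill only the gaps.
df_master_drop = df_master.drop(columns=["value"])
df_drop = pd.merge(df_master_drop, dfcon, how="left", on="CAS")
df = df_master.combine_first(df_drop)
print(df)
```

In this sketch the value 99.0 for CAS "b" is discarded because df1 comes first in the concat, and the existing value for CAS "a" is left untouched.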

Loop through columns combining two data frames in R

df1 <- structure(list(itens = c("item1", "item2", "item3", "item4"), 
                      sp1 = c(20L, 30L, 30L, 30L), 
                      sp2 = c(10L, 15L, 15L, 15L)), 
                 class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(itens = c("item7", "item8", "item9"),
                      sp5 = c(20L,  30L, 30L), 
                      sp6 = c(10L, 15L, 15L)), 
                 class = "data.frame", row.names = c(NA,  -3L))

A solution that avoids a loop and uses tidyr::pivot_longer and dplyr::full_join instead:

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_longer(-itens) %>% 
  full_join(df2 %>%  pivot_longer(-itens)) %>% 
  group_by(sps = name) %>% 
  summarise(N = n(),
            sum = sum(value))

Returns:

  sps      N   sum
  <chr> <int> <int>
1 sp1       4   110
2 sp2       4    55
3 sp5       3    80
4 sp6       3    40
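For readers working in pandas rather than R, a rough equivalent of the reshape-then-summarise idea might look like the sketch below. It reuses the sample data above; pd.concat stands in for full_join, which is an assumption that holds here only because the two frames share no rows.

```python
import pandas as pd

df1 = pd.DataFrame({
    "itens": ["item1", "item2", "item3", "item4"],
    "sp1": [20, 30, 30, 30],
    "sp2": [10, 15, 15, 15],
})
df2 = pd.DataFrame({
    "itens": ["item7", "item8", "item9"],
    "sp5": [20, 30, 30],
    "sp6": [10, 15, 15],
})

# Reshape each frame to long form (one row per item/column pair), then stack.
long_df = pd.concat(
    [d.melt(id_vars="itens", var_name="sps") for d in (df1, df2)]
)

# Count the rows and sum the values for each original column.
summary = long_df.groupby("sps")["value"].agg(N="count", sum="sum").reset_index()
print(summary)
```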

How to merge for loop output dataframes into one with python?

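A vectorized (read "much faster") solution that compares the strings character by character with NumPy broadcasting instead of looping over the rows: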

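import numpy as np

# Split every string into its characters: one row per string.
a = np.array(dfa['A'].str.split('').str[1:-1].tolist())
b = np.array(dfb['B'].str.split('').str[1:-1].tolist())

# Compare each B string with each A string and count mismatched positions.
dfb[['disB_1', 'disB_2', 'disB_3']] = (a != b[:, None]).sum(axis=2)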

Output:

>>> dfb
    B  disB_1  disB_2  disB_3
0  AC       1       2       1
1  BC       2       1       1
2  CC       2       2       0
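The broadcasting step is easier to follow with the array shapes written out. The sketch below is self-contained; the contents of dfa are an assumption chosen for illustration (the original dfa is not shown), and list(s) is used to split the strings instead of str.split('').

```python
import numpy as np
import pandas as pd

# Assumed inputs: equal-length strings. The dfa values are illustrative.
dfa = pd.DataFrame({"A": ["AA", "BB", "CC"]})
dfb = pd.DataFrame({"B": ["AC", "BC", "CC"]})

# One row per string, one column per character.
a = np.array([list(s) for s in dfa["A"]])  # shape (3, 2)
b = np.array([list(s) for s in dfb["B"]])  # shape (3, 2)

# b[:, None] has shape (3, 1, 2); comparing it with a broadcasts to (3, 3, 2).
# Summing over the last axis counts mismatched positions per (B, A) pair.
dist = (a != b[:, None]).sum(axis=2)

dfb[["disB_1", "disB_2", "disB_3"]] = dist
print(dfb)
```

Each entry dist[i, j] is the number of positions at which the i-th string of dfb differs from the j-th string of dfa, so the number of disB columns must match the number of rows in dfa.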

Looping through and merging DataFrames with same index, same columns (however a few columns unique to each DataFrame)

I believe computing the column difference is not necessary here. Using concat alone is enough, because it aligns columns by name and fills the missing ones with NaN:

df = pd.concat([df,df1,df2], sort=False)
print(df)
        ID   AA  TA    TL    ML    PP
Date                                 
2001  AAPL  1.0  44  50.0   NaN   NaN
2002  AAPL  3.0  33  51.0   NaN   NaN
2003  AAPL  2.0  22  53.0   NaN   NaN
2004  AAPL  5.0  11  76.0   NaN   NaN
2005  AAPL  2.0  33  44.0   NaN   NaN
2006  AAPL  3.0  22  12.0   NaN   NaN
2001  MSFT  3.5  44   NaN  12.0   NaN
2002  MSFT  6.7  33   NaN  15.0   NaN
2003  MSFT  2.3  22   NaN  19.0   NaN
2004  MSFT  5.5  11   NaN  20.0   NaN
2005  MSFT  2.2  33   NaN  43.0   NaN
2006  MSFT  3.2  22   NaN  23.0   NaN
2001  TSLA  3.3  48   NaN   NaN  18.0
2002  TSLA  6.3  38   NaN   NaN  18.0
2003  TSLA  2.6  28   NaN   NaN  18.0
2004  TSLA  5.3  18   NaN   NaN  28.0
2005  TSLA  2.3  38   NaN   NaN  48.0
2006  TSLA  3.3  28   NaN   NaN  28.0
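A reduced version of the same idea, as a minimal sketch: two small frames with one shared and one unique column each. The frame names df_a and df_b and the trimmed-down values are assumptions made for this example.

```python
import pandas as pd

idx = pd.Index([2001, 2002], name="Date")

# Each frame has the shared columns ID and AA plus one column of its own.
df_a = pd.DataFrame(
    {"ID": ["AAPL", "AAPL"], "AA": [1.0, 3.0], "TL": [50.0, 51.0]}, index=idx
)
df_b = pd.DataFrame(
    {"ID": ["MSFT", "MSFT"], "AA": [3.5, 6.7], "ML": [12.0, 15.0]}, index=idx
)

# concat stacks the rows and aligns columns by name; a column that is
# missing from one frame is filled with NaN for that frame's rows.
out = pd.concat([df_a, df_b], sort=False)
print(out)
```

With sort=False the columns keep their order of first appearance (ID, AA, TL, ML) rather than being sorted alphabetically.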