Merging multiple dataframe columns into one using for loop
This happens because every iteration overwrites df_merge, so it only ever holds the most recent merge, not the accumulated result of all the merges.
I would suggest initializing df_merge with the first dataframe and then merging the others into it, like this:
dfs = [df2, df3, df4, df5, df6]
df_merge = df1  # start from the first dataframe
for i in dfs:
    # merge each remaining dataframe into the running result
    df_merge = pd.merge(df_merge, i, how='left', on='Date')
    print("Shape of df_merge = ", df_merge.shape)
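The same accumulate-as-you-go idea can also be written without an explicit loop. The following is only a sketch using functools.reduce; the three small frames and their column names a, b, c are made up for illustration, and only the 'Date' key comes from the answer above.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample frames sharing a 'Date' key
df1 = pd.DataFrame({"Date": ["2020-01-01", "2020-01-02"], "a": [1, 2]})
df2 = pd.DataFrame({"Date": ["2020-01-01", "2020-01-02"], "b": [3, 4]})
df3 = pd.DataFrame({"Date": ["2020-01-01"], "c": [5]})

# reduce feeds the running result back in as `left` on every step,
# which is what the loop does by reassigning df_merge
df_merge = reduce(
    lambda left, right: pd.merge(left, right, how="left", on="Date"),
    [df1, df2, df3],
)
print("Shape of df_merge =", df_merge.shape)
```

Because every merge is a left join, the result keeps the rows of the first frame and gains one column per merged frame; dates missing from a later frame show up as NaN.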
Iterating over a merge of multiple dataframes
Solution 1:
Use this if the 'value' column exists only in df1 and df2, not in df_master.
dfcon = pd.concat([df1, df2])
df = pd.merge(df_master, dfcon, how='left', on='CAS')
Solution 2:
Use this if the 'value' column also exists in df_master (dfcon is defined as in Solution 1).
df_master_drop = df_master.drop(columns=['value'])
df_drop = pd.merge(df_master_drop, dfcon, how='left', on='CAS')
df = df_master.combine_first(df_drop)
Notes:
Use dfcon = pd.concat([df1, df2]).drop_duplicates('CAS') if the same CAS appears more than once. This keeps the first occurrence of each CAS value.
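To make Solution 2 concrete, here is a small sketch on made-up data. The frame contents are assumptions chosen for illustration; only the names df_master, df1, df2, dfcon, 'CAS' and 'value' come from the answer.

```python
import pandas as pd

# Hypothetical data: df_master already has some values filled in
df_master = pd.DataFrame({"CAS": ["a", "b", "c", "d"],
                          "value": [1.0, None, None, None]})
df1 = pd.DataFrame({"CAS": ["b"], "value": [2.0]})
df2 = pd.DataFrame({"CAS": ["c", "b"], "value": [3.0, 99.0]})

# Stack the lookup frames; keep the first row seen for each CAS
dfcon = pd.concat([df1, df2]).drop_duplicates("CAS")

# Merge on CAS without the master's own 'value' column...
df_master_drop = df_master.drop(columns=["value"])
df_drop = pd.merge(df_master_drop, dfcon, how="left", on="CAS")

# ...then fill only the gaps in df_master from the merged result
df = df_master.combine_first(df_drop)
print(df)
```

Note that combine_first aligns on the index, so this relies on the left merge returning one row per row of df_master, which is why duplicates in dfcon are dropped first.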
Loop through columns combining two data frames in R
df1 <- structure(list(itens = c("item1", "item2", "item3", "item4"),
                      sp1 = c(20L, 30L, 30L, 30L),
                      sp2 = c(10L, 15L, 15L, 15L)),
                 class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(itens = c("item7", "item8", "item9"),
                      sp5 = c(20L, 30L, 30L),
                      sp6 = c(10L, 15L, 15L)),
                 class = "data.frame", row.names = c(NA, -3L))
A solution that avoids a loop, using tidyr::pivot_longer and dplyr::full_join:
library(dplyr)
library(tidyr)
df1 %>%
  pivot_longer(-itens) %>%
  full_join(df2 %>% pivot_longer(-itens)) %>%
  group_by(sps = name) %>%
  summarise(N = n(),
            sum = sum(value))
Returns:
  sps       N   sum
  <chr> <int> <int>
1 sp1       4   110
2 sp2       4    55
3 sp5       3    80
4 sp6       3    40
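For readers working in pandas rather than R, the same reshape-then-summarise idea can be sketched as follows. This is a rough analogue, not a translation of the answer: melt stands in for pivot_longer and concat for the full join, which is an assumption that holds here because the two frames share no rows.

```python
import pandas as pd

# Same data as the R example above
df1 = pd.DataFrame({"itens": ["item1", "item2", "item3", "item4"],
                    "sp1": [20, 30, 30, 30],
                    "sp2": [10, 15, 15, 15]})
df2 = pd.DataFrame({"itens": ["item7", "item8", "item9"],
                    "sp5": [20, 30, 30],
                    "sp6": [10, 15, 15]})

# Reshape each frame to long form (itens, sps, value) and stack them
long = pd.concat(
    [d.melt(id_vars="itens", var_name="sps") for d in (df1, df2)],
    ignore_index=True,
)

# Count and sum the values per species column
summary = long.groupby("sps")["value"].agg(N="count", sum="sum").reset_index()
print(summary)
```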
How to merge for loop output dataframes into one with python?
A vectorized (read "much faster") solution:
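import numpy as np
# str.split('') breaks each string into characters, leaving an empty string at each end; [1:-1] drops them
a = np.array(dfa['A'].str.split('').str[1:-1].tolist())
b = np.array(dfb['B'].str.split('').str[1:-1].tolist())
# broadcasting compares every row of b with every row of a; the sum counts mismatched positions
dfb[['disB_1', 'disB_2', 'disB_3']] = (a != b[:, None]).sum(axis=2)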
Output:
>>> dfb
B disB_1 disB_2 disB_3
0 AC 1 2 1
1 BC 2 1 1
2 CC 2 2 0
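A self-contained sketch of the broadcasting step follows. The contents of dfa are not shown in the answer, so the values 'AA', 'BB', 'CC' below are an assumption, picked because they are consistent with the output above. The characters are split with list() here instead of str.split(''), which is a swapped-in technique rather than the answer's own.

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame({"A": ["AA", "BB", "CC"]})  # assumed input, not from the answer
dfb = pd.DataFrame({"B": ["AC", "BC", "CC"]})

# One row per string, one column per character: shape (3, 2)
a = np.array([list(s) for s in dfa["A"]])
b = np.array([list(s) for s in dfb["B"]])

# b[:, None] has shape (3, 1, 2), so the comparison broadcasts to (3, 3, 2):
# entry [i, j, k] is True when character k of B[i] differs from character k of A[j]
dist = (a != b[:, None]).sum(axis=2)

# One distance column per row of dfa
cols = [f"disB_{j + 1}" for j in range(len(dfa))]
dfb[cols] = dist
print(dfb)
```

This only works when all strings have the same length, since the character lists must form a rectangular array.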
Looping through and merging DataFrames with same index, same columns (however a few columns unique to each DataFrame)
I believe computing the column difference is not necessary here. Just use concat, which aligns the columns correctly:
df = pd.concat([df, df1, df2], sort=False)
print(df)
        ID   AA  TA    TL    ML    PP
Date
2001  AAPL  1.0  44  50.0   NaN   NaN
2002  AAPL  3.0  33  51.0   NaN   NaN
2003  AAPL  2.0  22  53.0   NaN   NaN
2004  AAPL  5.0  11  76.0   NaN   NaN
2005  AAPL  2.0  33  44.0   NaN   NaN
2006  AAPL  3.0  22  12.0   NaN   NaN
2001  MSFT  3.5  44   NaN  12.0   NaN
2002  MSFT  6.7  33   NaN  15.0   NaN
2003  MSFT  2.3  22   NaN  19.0   NaN
2004  MSFT  5.5  11   NaN  20.0   NaN
2005  MSFT  2.2  33   NaN  43.0   NaN
2006  MSFT  3.2  22   NaN  23.0   NaN
2001  TSLA  3.3  48   NaN   NaN  18.0
2002  TSLA  6.3  38   NaN   NaN  18.0
2003  TSLA  2.6  28   NaN   NaN  18.0
2004  TSLA  5.3  18   NaN   NaN  28.0
2005  TSLA  2.3  38   NaN   NaN  48.0
2006  TSLA  3.3  28   NaN   NaN  28.0