Pandas: Merge (Join) Two Data Frames on Multiple Columns

Pass the key columns of each frame as lists to left_on and right_on:

new_df = pd.merge(A_df, B_df, how='left', left_on=['A_c1', 'c2'], right_on=['B_c1', 'c2'])

Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

left_on : label or list, or array-like
    Field names to join on in the left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns.

right_on : label or list, or array-like
    Field names to join on in the right DataFrame, or vector/list of vectors per the left_on docs.
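
As a minimal sketch of the call above, here are two small hypothetical frames. The val_a and val_b columns and all values are invented for illustration; only the key column names come from the answer.

```python
import pandas as pd

# Hypothetical sample data; only the key column names come from the answer.
A_df = pd.DataFrame({
    'A_c1': [1, 1, 2],
    'c2': ['x', 'y', 'x'],
    'val_a': [10, 20, 30],
})
B_df = pd.DataFrame({
    'B_c1': [1, 2, 2],
    'c2': ['x', 'x', 'y'],
    'val_b': [100, 200, 300],
})

# Keys are paired by position: A_c1 with B_c1, and c2 with c2.
new_df = pd.merge(A_df, B_df, how='left',
                  left_on=['A_c1', 'c2'], right_on=['B_c1', 'c2'])
print(new_df)
```

Because how='left' keeps every row of A_df, the (1, 'y') row, which has no partner in B_df, should come back with NaN in the columns taken from B_df.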

Pandas, merging two dataframes on multiple columns, and multiplying result

You could merge and then multiply:

merged = df1.merge(df2, on=['Name', 'Event'])
merged['ResultFactor'] = merged.Factor1 * merged.Factor2
result = merged.drop(['Factor1', 'Factor2'], axis=1)

print(result)

Output

   Name Event  ResultFactor
0  John     A           2.4
1  John     B           1.5
2   Ken     A           3.0
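
The answer does not show df1 and df2, so the following is only a sketch with assumed inputs. The Factor1 and Factor2 values are hypothetical, chosen so that the products match the output above.

```python
import pandas as pd

# Assumed inputs: the original frames are not shown, so these values are hypothetical.
df1 = pd.DataFrame({
    'Name': ['John', 'John', 'Ken'],
    'Event': ['A', 'B', 'A'],
    'Factor1': [1.2, 1.5, 1.5],
})
df2 = pd.DataFrame({
    'Name': ['John', 'John', 'Ken'],
    'Event': ['A', 'B', 'A'],
    'Factor2': [2.0, 1.0, 2.0],
})

# Inner join (the default) on both key columns, then multiply the two factors.
merged = df1.merge(df2, on=['Name', 'Event'])
merged['ResultFactor'] = merged['Factor1'] * merged['Factor2']
result = merged.drop(['Factor1', 'Factor2'], axis=1)
print(result)
```

Since merge defaults to how='inner', any (Name, Event) pair present in only one frame would be dropped from the result.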

Join two dataframes based on two columns

You can use pd.merge() with multiple keys, matching a and b in df1 against a1 and b1 in df2 via left_on and right_on, as follows:

import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()

df1['a'] = [1, 2, 3]
df1['b'] = [2, 4, 6]
df1['c'] = [3, 5, 9]

df2['a1'] = [1, 2]
df2['b1'] = [4, 4]
df2['c1'] = [7, 5]

df3 = pd.merge(df1, df2, left_on=['a', 'b'], right_on=['a1', 'b1'], how='inner')
print(df3)  # df3 has all columns of df1 and df2

#    a  b  c  a1  b1  c1
# 0  2  4  5   2   4   5

df3 = df3.drop(df2.columns, axis=1)  # drop df2's columns: a1 and b1 duplicate the keys (c1 is discarded too)
df3.columns = ['a2', 'b2', 'c3']  # rename the remaining columns as desired
print(df3)

#    a2  b2  c3
# 0   2   4   5

For more information about pd.merge(), please see: https://pandas.pydata.org/docs/reference/api/pandas.merge.html
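
As a possible alternative, not taken from the answer above, the right-hand keys can be renamed first so that a plain on= works and no duplicate key columns appear. This sketch reuses the same sample values and, unlike the code above, keeps c1:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [3, 5, 9]})
df2 = pd.DataFrame({'a1': [1, 2], 'b1': [4, 4], 'c1': [7, 5]})

# Rename df2's key columns to match df1, then join on the shared names.
df2_renamed = df2.rename(columns={'a1': 'a', 'b1': 'b'})
df3 = pd.merge(df1, df2_renamed, on=['a', 'b'], how='inner')
print(df3)
```

Only the (2, 4) key pair exists in both frames, so the result has a single row with columns a, b, c and c1.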

Pandas Merging Multiple Columns at the Same Time Between Two Dataframes

If you already have the empty columns, you can use:

mapping = df_keyword_vol.set_index('Keyword')['Volume']

df_striking.iloc[:, 1::2] = df_striking.iloc[:, ::2].replace(mapping)

Else, if you only have the KWx columns:

df2 = (pd.concat([df, df.replace(mapping)], axis=1)
         .sort_index(axis=1)
      )

Output:

         KW1   KW1     KW2   KW2         KW3  KW3    KW4   KW4        KW5   KW5
0  nectarine  1000   apple   600      banana  450   kiwi  1200  raspberry   400
1    apricot   500  orange   800  grapefruit   10  lemon   150  blueberry   850
2       plum   200    pear  1000      cherry  900  peach   700    berries  1000
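
The input frames are not shown, so here is a self-contained sketch of the same idea, using the first two keyword columns from the output above. It is a variant, not the answer's exact code: it looks volumes up with Series.map instead of replace, and adds a hypothetical _vol suffix so the column names stay unique.

```python
import pandas as pd

# Inputs reconstructed from the output shown above (first two keyword columns only).
df_keyword_vol = pd.DataFrame({
    'Keyword': ['nectarine', 'apricot', 'plum', 'apple', 'orange', 'pear'],
    'Volume': [1000, 500, 200, 600, 800, 1000],
})
df = pd.DataFrame({
    'KW1': ['nectarine', 'apricot', 'plum'],
    'KW2': ['apple', 'orange', 'pear'],
})

# Keyword -> Volume lookup table.
mapping = df_keyword_vol.set_index('Keyword')['Volume']

# Look up every keyword column; the '_vol' suffix is an assumption for this sketch.
volumes = df.apply(lambda col: col.map(mapping)).add_suffix('_vol')

# Sorting the column labels places each volume column right after its keyword column.
df2 = pd.concat([df, volumes], axis=1).sort_index(axis=1)
print(df2)
```

With map, a keyword missing from the lookup table becomes NaN, whereas replace would leave the keyword text in place.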

