Combine Two Pandas Data Frames (Join on a Common Column)

Merge two data frames based on common column values in Pandas

We can merge two data frames in several ways. The most common way in Python is the merge operation in pandas.

import pandas as pd
dfinal = df1.merge(df2, on="movie_title", how="inner")
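
As a minimal sketch of the call above, the following builds two small frames and joins them on the shared column. The frame contents and the year and rating columns are invented for illustration; only movie_title comes from the answer.

```python
import pandas as pd

# Hypothetical sample data; only the "movie_title" key is taken from the answer.
df1 = pd.DataFrame({"movie_title": ["Movie A", "Movie B", "Movie C"],
                    "year": [2001, 2002, 2003]})
df2 = pd.DataFrame({"movie_title": ["Movie B", "Movie C", "Movie D"],
                    "rating": [7.5, 6.0, 8.0]})

# An inner join keeps only the titles that appear in both frames.
dfinal = df1.merge(df2, on="movie_title", how="inner")
print(dfinal)
```

Rows for "Movie A" and "Movie D" are dropped because each appears in only one of the two frames.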

When the key column has a different name in each data frame, specify the left and right column names explicitly. For example, if df1 calls it 'movie_title' and df2 calls it 'movie_name':

dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')
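
A short sketch of the same idea with invented data: after a left_on/right_on merge both key columns remain in the result, so one of them is usually dropped. The year and rating columns are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data; the key is named differently in each frame.
df1 = pd.DataFrame({"movie_title": ["Movie A", "Movie B"], "year": [2001, 2002]})
df2 = pd.DataFrame({"movie_name": ["Movie B", "Movie C"], "rating": [7.5, 6.0]})

dfinal = df1.merge(df2, how="inner", left_on="movie_title", right_on="movie_name")

# Both key columns survive the merge; drop the redundant one.
dfinal = dfinal.drop(columns="movie_name")
print(dfinal)
```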

For finer control over the join, see the pandas documentation for the merge operation.

JOIN two dataframes on common column in python

Use merge:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0
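
The input frames are not shown in the answer. As a sketch, the frames below are a guess reconstructed from the printed output (not the original data); they give the same result when merged.

```python
import pandas as pd

# Assumed inputs, inferred from the printed result; id 4 has no match in df2.
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": list("abcde"),
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1, 2, 3, 5]})

# Left join keeps every row of df1; unmatched rows get NaN in df2's columns.
out = pd.merge(df1, df2, left_on="id", right_on="id1", how="left").drop("id1", axis=1)
print(out)
```

The integer price and rating columns are printed as floats because the unmatched row introduces NaN.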

Another solution is to simply rename the column:

print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id',  how='left'))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

If you need only the price column, the simplest approach is map:

df1['price'] = df1.id.map(df2.set_index('id1')['price'])
print (df1)
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0
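
The map approach works because a Series indexed by the key behaves like a lookup table. The sketch below spells that out with assumed inputs (a guess inferred from the printed output). It also assumes the id1 values are unique, since a lookup with duplicate keys would be ambiguous.

```python
import pandas as pd

# Assumed inputs, inferred from the printed result.
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": list("abcde"),
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1, 2, 3, 5]})

# Build a Series that maps id1 -> price, then look up each id in df1.
lookup = df2.set_index("id1")["price"]
df1["price"] = df1["id"].map(lookup)  # ids missing from the lookup become NaN
print(df1)
```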

Two more solutions:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
         .drop(['id1', 'rating'], axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
         .drop('id1', axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

Pandas: Combine two data-frames with different shape based on one common column

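Use GroupBy.cumcount to create a helper column that numbers the repeated Student_id values within each frame, then do a left join on both Student_id and the helper column: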

df['g'] = df.groupby('Student_id').cumcount()
df1['g'] = df1.groupby('Student_id').cumcount()

df = df.merge(df1, on=['Student_id','g'], how='left').drop('g', axis=1)
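
To see why the helper column matters, the sketch below compares a plain merge on Student_id with the cumcount version. The subject and score columns and all values are invented for the example.

```python
import pandas as pd

# Hypothetical frames of different shape with repeated Student_id values.
df = pd.DataFrame({"Student_id": [1, 1, 2], "subject": ["math", "art", "math"]})
df1 = pd.DataFrame({"Student_id": [1, 1, 2, 2], "score": [90, 80, 70, 60]})

# A plain merge pairs every duplicate with every duplicate (2x2 + 1x2 = 6 rows).
naive = df.merge(df1, on="Student_id", how="left")

# cumcount numbers the occurrences 0, 1, 2, ... within each Student_id.
df["g"] = df.groupby("Student_id").cumcount()
df1["g"] = df1.groupby("Student_id").cumcount()

# Joining on (Student_id, g) pairs the n-th occurrence with the n-th occurrence.
merged = df.merge(df1, on=["Student_id", "g"], how="left").drop("g", axis=1)
print(len(naive), len(merged))
print(merged)
```

With the helper column the result keeps the three rows of the left frame instead of expanding to six.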

pandas: merge (join) two data frames on multiple columns

Try this, passing lists of column names to left_on and right_on:

new_df = pd.merge(A_df, B_df, how='left', left_on=['A_c1','c2'], right_on=['B_c1','c2'])

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

left_on : label or list, or array-like
    Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns.

right_on : label or list, or array-like
    Field names to join on in right DataFrame or vector/list of vectors per left_on docs.
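
A small sketch of a two-column join under assumed data: the frames and the a_val and b_val columns are invented, while the key names follow the answer. Key pairs with the same name on both sides (c2 here) appear once in the result; differently named keys (A_c1, B_c1) are both kept.

```python
import pandas as pd

# Hypothetical frames; only the key column names come from the answer.
A_df = pd.DataFrame({"A_c1": [1, 1, 2], "c2": ["x", "y", "x"], "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": [1, 2, 2], "c2": ["x", "x", "y"], "b_val": [0.1, 0.2, 0.3]})

# A row matches only when BOTH key pairs agree: A_c1 == B_c1 and c2 == c2.
new_df = pd.merge(A_df, B_df, how="left",
                  left_on=["A_c1", "c2"], right_on=["B_c1", "c2"])
print(new_df)
```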

merge two dataframes with some common columns where the combining of the common needs to be a custom function

You can concatenate the data frames and then group by the column names to apply an operation to the identically named columns. In this case you can get away with taking the sum and then casting back to bool to get the OR operation.

import pandas as pd

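# axis must be passed by keyword in pandas 2.0 and later
df = pd.concat([left, right], axis=1)
# Note: groupby(..., axis=1) is deprecated as of pandas 2.1
df.groupby(df.columns, axis=1).sum().astype(bool)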

Output:

        0.0   0.5    0.7
12.5   True  True   True
14.0   True  True  False
15.5  False  True  False

For a less case-specific version, again group by the columns and apply a function to each group along axis=1:

df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(lambda x: x.any(axis=1))
#         0.0   0.5    0.7
# 12.5   True  True   True
# 14.0   True  True  False
# 15.5  False  True  False
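
Newer pandas releases deprecate grouping along axis=1 (since 2.1, as far as I know). One alternative, sketched here rather than taken from the answer, is to transpose so the duplicated column labels become row labels, group on the index, and transpose back. The boolean inputs below are a guess chosen to reproduce the printed output.

```python
import pandas as pd

# Assumed boolean inputs; the answer does not show them.
idx = [12.5, 14.0, 15.5]
left = pd.DataFrame({0.0: [True, True, False], 0.5: [True, False, True]}, index=idx)
right = pd.DataFrame({0.7: [True, False, False], 0.5: [False, True, False]}, index=idx)

df = pd.concat([left, right], axis=1)

# Transpose, OR together the rows that share a label, then transpose back.
combined = df.T.groupby(level=0).any().T
print(combined)
```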

Further, you can define a custom combining function. Here's one which adds twice the left Frame to 4 times the right Frame. If there is only one column, it returns 2x the left frame.

Sample Data

left:

      0.0  0.5
12.5    1   11
14.0    2   17
15.5    3   17

right:

      0.7  0.5
12.5    4    2
14.0    4   -1
15.5    5    5

Code

def my_func(x):
    try:
        res = x.iloc[:, 0]*2 + x.iloc[:, 1]*4
    except IndexError:
        # Only one column carries this label: return 2x the left frame.
        res = x.iloc[:, 0]*2
    return res

df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(my_func)

Output:

      0.0  0.5  0.7
12.5    2   30    8
14.0    4   30    8
15.5    6   54   10

Finally, if you wanted to do this in a consecutive manner, then you should make use of reduce. Here I'll combine 5 DataFrames with the above function. (I'll just repeat the right Frame 4x for the example)

from functools import reduce

def my_comb(df_l, df_r, func):
    """ Concatenate df_l and df_r along axis=1. Apply the
    specified function.
    """
    df = pd.concat([df_l, df_r], axis=1)
    return df.groupby(df.columns, axis=1).apply(func)

reduce(lambda dfl, dfr: my_comb(dfl, dfr, func=my_func), [left, right, right, right, right])
#       0.0  0.5  0.7
# 12.5   16  296  176
# 14.0   32  212  176
# 15.5   48  572  220
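
For pandas versions where grouping along axis=1 is unavailable, the same combination can be sketched without groupby by selecting the columns for each label directly. The helper name combine_by_label is hypothetical and this is not the answer's method, only an equivalent under the sample data shown earlier.

```python
from functools import reduce

import pandas as pd

# Sample data as given in the answer.
idx = [12.5, 14.0, 15.5]
left = pd.DataFrame({0.0: [1, 2, 3], 0.5: [11, 17, 17]}, index=idx)
right = pd.DataFrame({0.7: [4, 4, 5], 0.5: [2, -1, 5]}, index=idx)


def my_func(x):
    """2 * first column + 4 * second column, or 2 * first if there is only one."""
    if x.shape[1] > 1:
        return x.iloc[:, 0] * 2 + x.iloc[:, 1] * 4
    return x.iloc[:, 0] * 2


def combine_by_label(df_l, df_r, func):
    """Hypothetical helper: concatenate side by side, then apply func to
    each group of identically labelled columns."""
    df = pd.concat([df_l, df_r], axis=1)
    labels = sorted(df.columns.unique())
    # df.columns == label is a boolean mask selecting every column with that label.
    return pd.DataFrame({label: func(df.loc[:, df.columns == label]) for label in labels})


once = combine_by_label(left, right, my_func)
chained = reduce(lambda l, r: combine_by_label(l, r, my_func),
                 [left, right, right, right, right])
print(once)
print(chained)
```

Both printed frames should match the outputs quoted in the answer for the single combination and for the five-frame reduce.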

How do I merge/expand two python pandas dataframes, based on one common column but different content?

Use outer merge with sort=True:

print(pd.merge(df1, df2, on='Date', how='outer', sort=True))

Prints:

        Date Name1  Name2
0 2017-06-02   NaN  dfdds
1 2018-08-05   abc    NaN
2 2018-09-17   NaN   hger
3 2019-08-05  cdsx    NaN
4 2020-08-05  sdfs    NaN
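
The input frames are not shown. As a sketch, the frames below are a guess reconstructed from the printed result; they have no dates in common, so the outer join keeps every row from both and sort=True orders the result by the join key.

```python
import pandas as pd

# Assumed inputs, inferred from the printed result.
df1 = pd.DataFrame({"Date": pd.to_datetime(["2018-08-05", "2019-08-05", "2020-08-05"]),
                    "Name1": ["abc", "cdsx", "sdfs"]})
df2 = pd.DataFrame({"Date": pd.to_datetime(["2018-09-17", "2017-06-02"]),
                    "Name2": ["hger", "dfdds"]})

# Outer join keeps all dates from both frames; missing cells become NaN.
merged = pd.merge(df1, df2, on="Date", how="outer", sort=True)
print(merged)
```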

