Merge two data frames based on common column values in Pandas
There are several ways to merge two DataFrames. The most common way in Python is the pandas merge operation:
import pandas as pd
dfinal = df1.merge(df2, on='movie_title', how='inner')
When the shared column has a different name in each DataFrame (say, 'movie_title' in df1 and 'movie_name' in df2), specify the left and right column names explicitly with left_on and right_on:
dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')
For more options, see the pandas documentation for the merge operation.
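A self-contained sketch of the left_on/right_on merge above. The two frames and their values are hypothetical; only the column names movie_title and movie_name come from the text. Because merge keeps both key columns, the redundant one is dropped afterwards.

```python
import pandas as pd

# Hypothetical sample data: the same key is called 'movie_title' in df1
# and 'movie_name' in df2.
df1 = pd.DataFrame({"movie_title": ["Alpha", "Beta", "Gamma"],
                    "year": [2001, 2002, 2003]})
df2 = pd.DataFrame({"movie_name": ["Beta", "Gamma", "Delta"],
                    "rating": [7.0, 8.0, 6.5]})

# Inner join: keep only titles present in both frames.
dfinal = df1.merge(df2, how="inner", left_on="movie_title", right_on="movie_name")

# Both key columns survive the merge; drop the redundant right-hand one.
dfinal = dfinal.drop(columns="movie_name")
print(dfinal)
```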
JOIN two dataframes on common column in python
Use merge:
print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
id name count price rating
0 1 a 10 100.0 1.0
1 2 b 20 200.0 2.0
2 3 c 30 300.0 3.0
3 4 d 40 NaN NaN
4 5 e 50 500.0 5.0
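The answer does not show df1 and df2. The hypothetical frames below are one pair consistent with the printed output (id 4 has no partner in df2), which makes the left merge reproducible:

```python
import pandas as pd

# Hypothetical inputs consistent with the printed output:
# df1 keys its rows by 'id', df2 by 'id1'; id 4 has no match in df2.
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": list("abcde"),
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1, 2, 3, 5]})

# Left join keeps every row of df1; the unmatched row gets NaN,
# which is why price and rating come out as float columns.
out = (pd.merge(df1, df2, left_on="id", right_on="id1", how="left")
         .drop("id1", axis=1))
print(out)
```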
Another solution is to simply rename the column:
print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id', how='left'))
id name count price rating
0 1 a 10 100.0 1.0
1 2 b 20 200.0 2.0
2 3 c 30 300.0 3.0
3 4 d 40 NaN NaN
4 5 e 50 500.0 5.0
If you need only the price column, the simplest approach is map:
df1['price'] = df1.id.map(df2.set_index('id1')['price'])
print (df1)
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
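A runnable version of the map approach, using the same hypothetical df1 and df2 as above. It assumes id1 is unique in df2, since each id is looked up in a Series indexed by id1:

```python
import pandas as pd

# Hypothetical inputs matching the example output.
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": list("abcde"),
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1, 2, 3, 5]})

# Build an id1 -> price lookup Series (id1 assumed unique),
# then translate each id in df1 through it; missing ids become NaN.
lookup = df2.set_index("id1")["price"]
df1["price"] = df1["id"].map(lookup)
print(df1)
```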
Two more solutions:
print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
.drop(['id1', 'rating'], axis=1))
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
.drop('id1', axis=1))
id name count price
0 1 a 10 100.0
1 2 b 20 200.0
2 3 c 30 300.0
3 4 d 40 NaN
4 5 e 50 500.0
Pandas: Combine two data-frames with different shape based on one common column
Use GroupBy.cumcount to build a helper column that numbers repeated Student_id values, then merge on it together with Student_id using a left join:
df['g'] = df.groupby('Student_id').cumcount()
df1['g'] = df1.groupby('Student_id').cumcount()
df = df.merge(df1, on=['Student_id','g'], how='left').drop('g', axis=1)
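A small sketch of the cumcount trick with hypothetical data. Each student's rows are numbered 0, 1, 2, ... within each frame, so the n-th occurrence in df is paired with the n-th occurrence in df1 instead of with every occurrence:

```python
import pandas as pd

# Hypothetical data: students appear a different number of times in each
# frame, so a plain merge on Student_id alone would multiply rows.
df = pd.DataFrame({"Student_id": [1, 1, 2, 3],
                   "score": [90, 85, 70, 60]})
df1 = pd.DataFrame({"Student_id": [1, 2, 2],
                    "grade": ["A", "B", "C"]})

# Number each occurrence of a Student_id (0, 1, 2, ...) within its frame.
df["g"] = df.groupby("Student_id").cumcount()
df1["g"] = df1.groupby("Student_id").cumcount()

# Left join on (Student_id, g), then drop the helper column.
out = df.merge(df1, on=["Student_id", "g"], how="left").drop("g", axis=1)
print(out)
```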
pandas: merge (join) two data frames on multiple columns
Try this:
new_df = pd.merge(A_df, B_df, how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns.
right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per left_on docs.
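A minimal sketch with hypothetical frames: the first key is named differently on each side (A_c1 and B_c1), while the second (c2) has the same name in both. A row matches only when both key pairs are equal:

```python
import pandas as pd

# Hypothetical frames: first key named differently, second key shared.
A_df = pd.DataFrame({"A_c1": ["x", "x", "y"],
                     "c2": [1, 2, 1],
                     "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": ["x", "y"],
                     "c2": [2, 1],
                     "b_val": ["p", "q"]})

# Match on A_c1 == B_c1 AND A_df.c2 == B_df.c2; left join keeps all A rows.
new_df = pd.merge(A_df, B_df, how="left",
                  left_on=["A_c1", "c2"], right_on=["B_c1", "c2"])
print(new_df)
```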
merge two dataframes with some common columns where the combining of the common needs to be a custom function
You can concatenate the DataFrames side by side and then group by the column names to apply an operation to the identically named columns. In this case you can take the sum and then cast back to bool to get a logical OR:
import pandas as pd
df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).sum().astype(bool)
Output:
0.0 0.5 0.7
12.5 True True True
14.0 True True False
15.5 False True False
For a less case-specific approach, again group by the columns and apply a function to each group along axis=1:
df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(lambda x: x.any(axis=1))
# 0.0 0.5 0.7
#12.5 True True True
#14.0 True True False
#15.5 False True False
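DataFrame.groupby(..., axis=1) is deprecated as of pandas 2.1. One alternative that avoids it transposes the frame, groups the rows by label, and transposes back. The boolean frames below are hypothetical, chosen to reproduce the output shown above:

```python
import pandas as pd

# Hypothetical boolean frames sharing the column label 0.5.
idx = [12.5, 14.0, 15.5]
left = pd.DataFrame({0.0: [True, True, False],
                     0.5: [False, True, False]}, index=idx)
right = pd.DataFrame({0.7: [True, False, False],
                      0.5: [True, False, True]}, index=idx)

df = pd.concat([left, right], axis=1)  # label 0.5 now appears twice

# Column labels become the row index after .T; group rows sharing a label,
# take the logical OR of each group, then transpose back.
combined = df.T.groupby(level=0).any().T
print(combined)
```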
You can also define a custom combining function. The one below adds twice the left frame to four times the right frame. If a column label appears in only one frame, it returns twice that single column.
Sample Data
left:
0.0 0.5
12.5 1 11
14.0 2 17
15.5 3 17
right:
0.7 0.5
12.5 4 2
14.0 4 -1
15.5 5 5
Code
def my_func(x):
    try:
        res = x.iloc[:, 0]*2 + x.iloc[:, 1]*4
    except IndexError:
        res = x.iloc[:, 0]*2
    return res
df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(my_func)
Output:
0.0 0.5 0.7
12.5 2 30 8
14.0 4 30 8
15.5 6 54 10
Finally, to combine several DataFrames in sequence, use functools.reduce. Here I combine 5 DataFrames with the function above (repeating the right frame 4 times for the example):
from functools import reduce
def my_comb(df_l, df_r, func):
    """ Concatenate df_l and df_r along axis=1. Apply the
    specified function to each group of identically named columns.
    """
    df = pd.concat([df_l, df_r], axis=1)
    return df.groupby(df.columns, axis=1).apply(func)
reduce(lambda dfl, dfr: my_comb(dfl, dfr, func=my_func), [left, right, right, right, right])
# 0.0 0.5 0.7
#12.5 16 296 176
#14.0 32 212 176
#15.5 48 572 220
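The custom combination and the reduce step can also be written without groupby(..., axis=1). This sketch uses the sample data above; the per-label column loop in my_comb and the column-count check in my_func (instead of try/except) are assumptions of this rewrite, not part of the original answer:

```python
from functools import reduce

import pandas as pd

# Sample data from the answer.
idx = [12.5, 14.0, 15.5]
left = pd.DataFrame({0.0: [1, 2, 3], 0.5: [11, 17, 17]}, index=idx)
right = pd.DataFrame({0.7: [4, 4, 5], 0.5: [2, -1, 5]}, index=idx)


def my_func(x):
    """2 * first column + 4 * second column, or 2 * the only column."""
    if x.shape[1] > 1:
        return x.iloc[:, 0] * 2 + x.iloc[:, 1] * 4
    return x.iloc[:, 0] * 2


def my_comb(df_l, df_r, func):
    """Concatenate side by side, then apply func to each set of
    identically labelled columns (left frame's column comes first)."""
    df = pd.concat([df_l, df_r], axis=1)
    labels = sorted(df.columns.unique())  # sorted, like groupby's output
    return pd.DataFrame({c: func(df.loc[:, df.columns == c]) for c in labels})


result = reduce(lambda dfl, dfr: my_comb(dfl, dfr, func=my_func),
                [left, right, right, right, right])
print(result)
```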
How do I merge/expand two python pandas dataframes, based on one common column but different content?
Use an outer merge with sort=True:
print( pd.merge(df1, df2, on='Date', how='outer', sort=True) )
Prints:
Date Name1 Name2
0 2017-06-02 NaN dfdds
1 2018-08-05 abc NaN
2 2018-09-17 NaN hger
3 2019-08-05 cdsx NaN
4 2020-08-05 sdfs NaN
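A runnable version with hypothetical df1 and df2 consistent with the printed output. The dates are ISO-format strings, so sorting them as text also sorts them chronologically:

```python
import pandas as pd

# Hypothetical inputs consistent with the output above.
df1 = pd.DataFrame({"Date": ["2018-08-05", "2019-08-05", "2020-08-05"],
                    "Name1": ["abc", "cdsx", "sdfs"]})
df2 = pd.DataFrame({"Date": ["2017-06-02", "2018-09-17"],
                    "Name2": ["dfdds", "hger"]})

# Outer join keeps dates from both frames; sort=True orders rows by Date.
out = pd.merge(df1, df2, on="Date", how="outer", sort=True)
print(out)
```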