Combine Two Pandas Data Frames (Join on a Common Column)

Merge two data frames based on common column values in Pandas

We can merge two data frames in several ways. The most common way in Python is the merge operation in pandas.

import pandas as pd

# Keep only the rows whose movie_title appears in both frames
dfinal = df1.merge(df2, on="movie_title", how="inner")
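
To see what the how argument changes, here is a small sketch with made-up frames. The columns rating and gross and all the values are illustrative assumptions, not data from the question.

```python
import pandas as pd

# Hypothetical sample data; only the movie_title column name comes from the text
df1 = pd.DataFrame({"movie_title": ["Alien", "Brazil", "Casablanca"],
                    "rating": [8.5, 7.9, 8.5]})
df2 = pd.DataFrame({"movie_title": ["Brazil", "Casablanca", "Dune"],
                    "gross": [9.9, 10.2, 402.0]})

inner = df1.merge(df2, on="movie_title", how="inner")  # titles present in both frames
left = df1.merge(df2, on="movie_title", how="left")    # every row of df1
outer = df1.merge(df2, on="movie_title", how="outer")  # titles from either frame

print(len(inner), len(left), len(outer))  # 2 3 4
```

Rows without a match on the other side are filled with NaN in the left and outer results.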

When the key column has a different name in each DataFrame, name both columns explicitly with left_on and right_on. For example, if the title is stored as 'movie_title' in df1 and as 'movie_name' in df2:

dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')
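
One thing to be aware of with left_on/right_on is that both key columns end up in the result. A minimal sketch, assuming two tiny frames whose year and gross columns are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data with differently named key columns
df1 = pd.DataFrame({"movie_title": ["Brazil", "Dune"], "year": [1985, 2021]})
df2 = pd.DataFrame({"movie_name": ["Dune", "Brazil"], "gross": [402.0, 9.9]})

merged = df1.merge(df2, how="inner", left_on="movie_title", right_on="movie_name")
print(list(merged.columns))  # ['movie_title', 'year', 'movie_name', 'gross']

# Both key columns hold the same values here, so one of them can be dropped
dfinal = merged.drop(columns="movie_name")
print(dfinal)
```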

For the remaining options, see the pandas documentation for merge.

JOIN two dataframes on common column in python

Use merge:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0
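
The input frames are not shown in the answer. The pair below is an assumption, reconstructed so that it produces the printed result, and it may differ from the asker's real data.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": list("abcde"),
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],      # id 4 is missing on purpose
                    "price": [100, 200, 300, 500],
                    "rating": [1, 2, 3, 5]})

# Left join keeps all rows of df1; the unmatched id 4 gets NaN
result = pd.merge(df1, df2, left_on="id", right_on="id1", how="left").drop("id1", axis=1)
print(result)
```

The integer price and rating columns come back as floats because NaN cannot be stored in a plain integer column.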

Another solution is to rename the column first:

print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id',  how='left'))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

If you need only the price column, the simplest approach is map:

df1['price'] = df1.id.map(df2.set_index('id1')['price'])
print (df1)
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0
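
The map approach works like a dictionary lookup: df2 is turned into a Series keyed by id1, and each id in df1 is looked up in it. A sketch with assumed inputs, which also assumes the id1 values are unique:

```python
import pandas as pd

# Assumed inputs; id1 values are assumed to be unique
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5], "name": list("abcde")})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5], "price": [100.0, 200.0, 300.0, 500.0]})

lookup = df2.set_index("id1")["price"]             # Series: id1 -> price
mapped = df1.assign(price=df1["id"].map(lookup))   # ids without a match become NaN

# The same column obtained with a left merge, for comparison
merged = pd.merge(df1, df2, left_on="id", right_on="id1", how="left").drop("id1", axis=1)
print(mapped)
```

Using assign returns a new frame instead of modifying df1 in place, which is convenient when trying both variants side by side.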

Two more solutions:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
         .drop(['id1', 'rating'], axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
         .drop('id1', axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

Pandas: Combine two data-frames with different shape based on one common column

Use GroupBy.cumcount to create a helper column that numbers the rows within each Student_id, then do a left join on both Student_id and the helper column:

df['g'] = df.groupby('Student_id').cumcount()
df1['g'] = df1.groupby('Student_id').cumcount()

df = df.merge(df1, on=['Student_id','g'], how='left').drop('g', axis=1)
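
To show why the helper column matters, here is a sketch with invented frames (the score and grade columns and their values are assumptions). Without the counter, repeated ids are paired in every combination.

```python
import pandas as pd

# Hypothetical frames: Student_id repeats and the row counts differ
df = pd.DataFrame({"Student_id": [1, 1, 2], "score": [90, 80, 70]})
df1 = pd.DataFrame({"Student_id": [1, 1, 2, 2], "grade": ["A", "B", "C", "D"]})

# A plain merge pairs every left row with every right row of the same id
plain = df.merge(df1, on="Student_id", how="left")

# cumcount numbers the rows within each id (0, 1, ...), so the n-th left row
# of an id is matched only with the n-th right row of that id
left = df.assign(g=df.groupby("Student_id").cumcount())
right = df1.assign(g=df1.groupby("Student_id").cumcount())
paired = left.merge(right, on=["Student_id", "g"], how="left").drop("g", axis=1)

print(len(plain), len(paired))  # 6 3
```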

pandas: merge (join) two data frames on multiple columns

Pass lists of column names to left_on and right_on:

new_df = pd.merge(A_df, B_df,  how='left', left_on=['A_c1','c2'], right_on = ['B_c1','c2'])
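
A runnable sketch of the same call, with invented data (the a_val and b_val columns and all values are assumptions):

```python
import pandas as pd

# Hypothetical frames; only the key column names come from the snippet above
A_df = pd.DataFrame({"A_c1": [1, 2, 3], "c2": ["x", "y", "z"], "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": [1, 2, 3], "c2": ["x", "q", "z"], "b_val": [0.1, 0.2, 0.3]})

# A row matches only when BOTH pairs agree: A_c1 == B_c1 and c2 == c2
new_df = pd.merge(A_df, B_df, how="left",
                  left_on=["A_c1", "c2"], right_on=["B_c1", "c2"])
print(new_df)
```

The second row agrees on the first key (2 == 2) but not on c2 ('y' vs 'q'), so its b_val is NaN.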

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

left_on : label or list, or array-like
    Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns.
right_on : label or list, or array-like
    Field names to join on in right DataFrame or vector/list of vectors per left_on docs.

merge two dataframes with some common columns where the combining of the common needs to be a custom function

You can concatenate the DataFrames and then group by the column names to apply an operation to the identically named columns. In this case you can get away with taking the sum and then casting back to bool, which gives the OR operation.

import pandas as pd

df = pd.concat([left, right], axis=1)
# Note: groupby(..., axis=1) is deprecated since pandas 2.1
df.groupby(df.columns, axis=1).sum().astype(bool)

Output:

        0.0   0.5    0.7
12.5   True  True   True
14.0   True  True  False
15.5  False  True  False

For a less case-specific approach, again group by the columns and apply a function to each group along axis=1:

df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(lambda x: x.any(axis=1))
#        0.0   0.5    0.7
#12.5   True  True   True
#14.0   True  True  False
#15.5  False  True  False
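
On newer pandas versions, where grouping along axis=1 is deprecated, one possible workaround is to transpose, group by the index labels, and transpose back. This is a sketch, not the original answer's method, and the boolean inputs are assumptions chosen to be consistent with the output above.

```python
import pandas as pd

idx = [12.5, 14.0, 15.5]
# Assumed boolean inputs; the original answer does not show them
left = pd.DataFrame({0.0: [True, True, False], 0.5: [True, False, True]}, index=idx)
right = pd.DataFrame({0.7: [True, False, False], 0.5: [False, True, False]}, index=idx)

df = pd.concat([left, right], axis=1)
# Transpose so the duplicated column labels become index labels, OR the rows
# that share a label, then transpose back
result = df.T.groupby(level=0).any().T
print(result)
```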

Further, you can define a custom combining function. Here is one that adds twice the left frame to four times the right frame. If a column appears in only one of the frames, it returns twice that column.

Sample Data

left:

      0.0  0.5
12.5    1   11
14.0    2   17
15.5    3   17

right:

      0.7  0.5
12.5    4    2
14.0    4   -1
15.5    5    5

Code

def my_func(x):
    try:
        res = x.iloc[:, 0]*2 + x.iloc[:, 1]*4
    except IndexError:
        res = x.iloc[:, 0]*2
    return res

df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(my_func)

Output:

      0.0  0.5  0.7
12.5    2   30    8
14.0    4   30    8
15.5    6   54   10

Finally, to combine several frames one after another, use reduce. Here I'll combine 5 DataFrames with the function above (the right frame is simply repeated 4 times for the example).

from functools import reduce

def my_comb(df_l, df_r, func):
    """ Concatenate df_l and df_r along axis=1. Apply the
    specified function.
    """
    df = pd.concat([df_l, df_r], axis=1)
    return df.groupby(df.columns, axis=1).apply(func)

reduce(lambda dfl, dfr: my_comb(dfl, dfr, func=my_func), [left, right, right, right, right])
#      0.0  0.5  0.7
#12.5   16  296  176
#14.0   32  212  176
#15.5   48  572  220
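
The sample frames and both results above can be reproduced end to end with the sketch below. combine_by_label is a hypothetical helper that replaces groupby(axis=1) with an explicit loop over the column labels, so it does not depend on the deprecated axis argument. my_func is rewritten with an explicit column-count check but is meant to compute the same thing.

```python
from functools import reduce

import pandas as pd

idx = [12.5, 14.0, 15.5]
left = pd.DataFrame({0.0: [1, 2, 3], 0.5: [11, 17, 17]}, index=idx)
right = pd.DataFrame({0.7: [4, 4, 5], 0.5: [2, -1, 5]}, index=idx)

def my_func(x):
    # Two columns share the label: 2*first + 4*second; otherwise 2*the only one
    if x.shape[1] > 1:
        return x.iloc[:, 0] * 2 + x.iloc[:, 1] * 4
    return x.iloc[:, 0] * 2

def combine_by_label(df_l, df_r, func):
    # Hypothetical helper: same idea as my_comb, without groupby(axis=1)
    df = pd.concat([df_l, df_r], axis=1)
    labels = sorted(df.columns.unique())
    # Select all columns carrying the same label and hand them to func
    return pd.DataFrame({c: func(df.loc[:, df.columns == c]) for c in labels})

once = combine_by_label(left, right, my_func)
chained = reduce(lambda l, r: combine_by_label(l, r, my_func), [left] + [right] * 4)
print(once)
print(chained)
```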

How do I merge/expand two python pandas dataframes, based on one common column but different content?

Use outer merge with sort=True:

print( pd.merge(df1, df2, on='Date', how='outer', sort=True) )

Prints:

        Date Name1  Name2
0 2017-06-02   NaN  dfdds
1 2018-08-05   abc    NaN
2 2018-09-17   NaN   hger
3 2019-08-05  cdsx    NaN
4 2020-08-05  sdfs    NaN
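
A runnable version with assumed inputs, reconstructed from the printed result. The indicator=True argument is an addition for illustration: it adds a _merge column showing which frame each row came from.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed output
df1 = pd.DataFrame({"Date": pd.to_datetime(["2018-08-05", "2019-08-05", "2020-08-05"]),
                    "Name1": ["abc", "cdsx", "sdfs"]})
df2 = pd.DataFrame({"Date": pd.to_datetime(["2017-06-02", "2018-09-17"]),
                    "Name2": ["dfdds", "hger"]})

# how='outer' keeps dates from both frames; sort=True orders the result by Date
out = pd.merge(df1, df2, on="Date", how="outer", sort=True, indicator=True)
print(out)
```

No date occurs in both frames here, so every row is marked left_only or right_only.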
