Join Two Dataframes on Common Column

Merge two data frames based on common column values in Pandas

We can merge two data frames in several ways. The most common way in Python is the merge operation in pandas.

import pandas as pd
dfinal = df1.merge(df2, on="movie_title", how="inner")

To merge on key columns that have different names in the two data frames, specify the left and right column names explicitly. For example, if the same column is called 'movie_title' in df1 and 'movie_name' in df2:

dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')

For more control over the join, see the pandas documentation for the merge operation.
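As a minimal sketch of the two calls above, assuming two small made-up frames (the sample rows and the `year` and `rating` columns are illustrative and not from the original question):

```python
import pandas as pd

# Hypothetical sample data; only the key column names come from the text above.
df1 = pd.DataFrame({"movie_title": ["Alien", "Heat", "Up"], "year": [1979, 1995, 2009]})
df2 = pd.DataFrame({"movie_title": ["Heat", "Up", "Ran"], "rating": [8.3, 8.3, 8.2]})

# Same key name on both sides: pass it once with `on`.
same_name = df1.merge(df2, on="movie_title", how="inner")

# Different key names: rename one side to simulate the 'movie_name' case,
# then name each key explicitly.
df2_renamed = df2.rename(columns={"movie_title": "movie_name"})
diff_name = df1.merge(df2_renamed, how="inner",
                      left_on="movie_title", right_on="movie_name")

print(same_name)
print(diff_name)
```

With left_on and right_on, both key columns are kept in the result, so you may want to drop one of them afterwards.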

JOIN two dataframes on common column in python

Use merge:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

Another solution is to simply rename the column:

print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id',  how='left'))
   id name  count  price  rating
0   1    a     10  100.0     1.0
1   2    b     20  200.0     2.0
2   3    c     30  300.0     3.0
3   4    d     40    NaN     NaN
4   5    e     50  500.0     5.0

If you need only the price column, the simplest approach is map:

df1['price'] = df1.id.map(df2.set_index('id1')['price'])
print (df1)
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

Two more solutions:

print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
         .drop(['id1', 'rating'], axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0

print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
         .drop('id1', axis=1))
   id name  count  price
0   1    a     10  100.0
1   2    b     20  200.0
2   3    c     30  300.0
3   4    d     40    NaN
4   5    e     50  500.0
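The input frames are not shown above, so here is a self-contained sketch with df1 and df2 reconstructed from the printed results. Treat the exact input values as an assumption.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed outputs above.
df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                    "name": list("abcde"),
                    "count": [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({"id1": [1, 2, 3, 5],
                    "price": [100, 200, 300, 500],
                    "rating": [1, 2, 3, 5]})

# Left join on differently named keys, then drop the duplicate key column.
merged = pd.merge(df1, df2, left_on="id", right_on="id1", how="left").drop("id1", axis=1)

# Lookup of a single column: build an id1 -> price Series and map it onto df1.id.
# assign() returns a new frame, so df1 itself is left unchanged here.
price_lookup = df2.set_index("id1")["price"]
with_price = df1.assign(price=df1["id"].map(price_lookup))

print(merged)
print(with_price)
```

Note that the map approach relies on the id1 values in df2 being unique; with duplicate keys, merge is the safer choice.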

Merging 2 dataframes by common column values under a common column name in R

# set as data.table (converts each data frame by reference)
library(data.table)
setDT(df1)
setDT(df2)

# inner join
df1[df2, on=.(ID), nomatch=0]

Merge multiple dataframes based on a common column

Use merge and reduce

In [86]: from functools import reduce

In [87]: reduce(lambda x,y: pd.merge(x,y, on='Col1', how='outer'), [df1, df2, df3])
Out[87]:
    Col1  Col2  Col3  Col4  Col5  Col6  Col7
0  data1     3     4   7.0   4.0   NaN   NaN
1  data2     4     3   6.0   9.0   5.0   8.0
2  data3     2     3   1.0   4.0   2.0   7.0
3  data4     2     4   NaN   NaN   NaN   NaN
4  data5     1     4   NaN   NaN   5.0   3.0

Details

In [88]: df1
Out[88]:
    Col1  Col2  Col3
0  data1     3     4
1  data2     4     3
2  data3     2     3
3  data4     2     4
4  data5     1     4

In [89]: df2
Out[89]:
    Col1  Col4  Col5
0  data1     7     4
1  data2     6     9
2  data3     1     4

In [90]: df3
Out[90]:
    Col1  Col6  Col7
0  data2     5     8
1  data3     2     7
2  data5     5     3
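For convenience, the session above can be reproduced as a standalone script. The frames are typed in from the printed details; the helper name `merge_on` is just an illustrative choice.

```python
from functools import reduce

import pandas as pd

# Input frames copied from the "Details" listing above.
df1 = pd.DataFrame({"Col1": ["data1", "data2", "data3", "data4", "data5"],
                    "Col2": [3, 4, 2, 2, 1],
                    "Col3": [4, 3, 3, 4, 4]})
df2 = pd.DataFrame({"Col1": ["data1", "data2", "data3"],
                    "Col4": [7, 6, 1],
                    "Col5": [4, 9, 4]})
df3 = pd.DataFrame({"Col1": ["data2", "data3", "data5"],
                    "Col6": [5, 2, 5],
                    "Col7": [8, 7, 3]})


def merge_on(left, right):
    """Outer-join two frames on the shared key column 'Col1'."""
    return pd.merge(left, right, on="Col1", how="outer")


# reduce folds the list pairwise: merge_on(merge_on(df1, df2), df3).
merged = reduce(merge_on, [df1, df2, df3])
print(merged)
```

Columns that pick up missing values during the outer join are converted to float, which is why Col4 to Col7 print as 7.0, 4.0, and so on.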

joining two dataframes on matching values of two common columns R

We may do a left_join

library(dplyr)
library(tidyr)
A %>%
    mutate(week = as.character(week)) %>% 
    left_join(B) %>% 
    mutate(fill = replace_na(fill, 0))

merge two dataframes with some common columns where the combining of the common needs to be a custom function

You can concatenate the dataframes and then group by the column names to apply an operation to the similarly named columns. In this case you can get away with taking the sum and then casting back to bool to get the OR operation.

import pandas as pd

# Note: groupby(..., axis=1) is deprecated as of pandas 2.1.
df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).sum().astype(bool)

Output:

        0.0   0.5    0.7
12.5   True  True   True
14.0   True  True  False
15.5  False  True  False

If you need to do this in a less case-specific manner, again group by the columns and apply a function to each grouped object over axis=1:

df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(lambda x: x.any(axis=1))
#        0.0   0.5    0.7
#12.5   True  True   True
#14.0   True  True  False
#15.5  False  True  False

Further, you can define a custom combining function. Here's one that adds twice the left frame to four times the right frame. If a column name appears in only one frame, it returns twice that column.

Sample Data

left:

      0.0  0.5
12.5    1   11
14.0    2   17
15.5    3   17

right:

      0.7  0.5
12.5    4    2
14.0    4   -1
15.5    5    5

Code

def my_func(x):
    try:
        res = x.iloc[:, 0]*2 + x.iloc[:, 1]*4
    except IndexError:
        res = x.iloc[:, 0]*2
    return res

df = pd.concat([left, right], axis=1)
df.groupby(df.columns, axis=1).apply(my_func)

Output:

      0.0  0.5  0.7
12.5    2   30    8
14.0    4   30    8
15.5    6   54   10

Finally, if you wanted to do this in a consecutive manner, then you should make use of reduce. Here I'll combine 5 DataFrames with the above function. (I'll just repeat the right Frame 4x for the example)

from functools import reduce

def my_comb(df_l, df_r, func):
    """ Concatenate df_l and df_r along axis=1. Apply the
    specified function.
    """
    df = pd.concat([df_l, df_r], axis=1)
    return df.groupby(df.columns, axis=1).apply(func)

reduce(lambda dfl, dfr: my_comb(dfl, dfr, func=my_func), [left, right, right, right, right])
#      0.0  0.5  0.7
#12.5   16  296  176
#14.0   32  212  176
#15.5   48  572  220
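Column-wise groupby (axis=1) is deprecated in pandas 2.1 and later, so here is a hedged sketch of the same idea without it. Instead of groupby, it selects the columns that share each label with a boolean mask and passes them to the combining function. The helper name `combine_columns` is hypothetical; `my_func` and the sample data are taken from the answer above.

```python
from functools import reduce

import pandas as pd

# Sample data from the answer above.
left = pd.DataFrame({0.0: [1, 2, 3], 0.5: [11, 17, 17]}, index=[12.5, 14.0, 15.5])
right = pd.DataFrame({0.7: [4, 4, 5], 0.5: [2, -1, 5]}, index=[12.5, 14.0, 15.5])


def my_func(x):
    """2 * first column + 4 * second column, or 2 * the only column."""
    try:
        res = x.iloc[:, 0] * 2 + x.iloc[:, 1] * 4
    except IndexError:
        res = x.iloc[:, 0] * 2
    return res


def combine_columns(df_l, df_r, func):
    """Concatenate side by side, then apply func to each set of same-named columns."""
    df = pd.concat([df_l, df_r], axis=1)
    labels = sorted(df.columns.unique())
    # A boolean mask always yields a DataFrame, even when only one column matches.
    return pd.DataFrame({label: func(df.loc[:, df.columns == label]) for label in labels})


once = combine_columns(left, right, my_func)
chained = reduce(lambda dfl, dfr: combine_columns(dfl, dfr, my_func),
                 [left, right, right, right, right])
print(once)
print(chained)
```

Sorting the labels mirrors the column order that groupby produces, so the printed frames should line up with the outputs shown above.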

Combine two pandas Data Frames (join on a common column)

You can use merge to combine two dataframes into one:

import pandas as pd
pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer')

where on specifies the field name that exists in both dataframes to join on, and how defines whether it is an inner, outer, left, or right join, with outer using the 'union of keys from both frames (SQL: full outer join).'

Since you have a 'star' column in both dataframes, this will by default create two columns, star_x and star_y, in the combined dataframe. As @DanAllan mentioned for the join method, you can modify the suffixes for merge by passing them as a kwarg. The default is suffixes=('_x', '_y'). If you want something like star_restaurant_id and star_restaurant_review, you can do:

pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer', suffixes=('_restaurant_id', '_restaurant_review'))

The parameters are explained in detail in the pandas documentation for merge.
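To see the effect of suffixes, here is a small sketch. The rows are invented for illustration; only the frame names, 'business_id', and 'star' come from the answer.

```python
import pandas as pd

# Hypothetical rows; both frames share the key 'business_id' and a 'star' column.
restaurant_ids_dataframe = pd.DataFrame({"business_id": ["a1", "b2"],
                                         "star": [4.0, 3.5]})
restaurant_review_frame = pd.DataFrame({"business_id": ["b2", "c3"],
                                        "star": [5, 2]})

combined = pd.merge(restaurant_ids_dataframe, restaurant_review_frame,
                    on="business_id", how="outer",
                    suffixes=("_restaurant_id", "_restaurant_review"))

# The overlapping 'star' column is split into two suffixed columns.
print(combined.columns.tolist())
print(combined)
```

Suffixes are applied only to overlapping non-key columns, so 'business_id' keeps its name.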

Join two DataFrames on common columns only if the difference in a separate column is within range [-n, +n]

We can merge, then perform a query to drop rows not within the range:

(df1.merge(df2, on=['Date', 'BillNo.'])
    .query('abs(Amount_x - Amount_y) <= 5')
    .drop('Amount_x', axis=1))

         Date    BillNo.  Amount_y
0  10/08/2020  ABBCSQ1ZA       876
1  10/16/2020  AA171E1Z0      5491

This works well as long as there is only one row that corresponds to a specific (Date, BillNo) combination in each frame.
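The same filter can be written with a boolean mask and a configurable n. This is a sketch under assumptions: the helper name `merge_within` is hypothetical, and the df1 amounts and the third bill are made up so that one row falls outside the range.

```python
import pandas as pd


def merge_within(df_a, df_b, keys, value, n):
    """Inner-join on keys, keeping rows where the two value columns differ by at most n."""
    merged = df_a.merge(df_b, on=keys, suffixes=("_x", "_y"))
    diff = merged[f"{value}_x"] - merged[f"{value}_y"]
    return merged[diff.abs() <= n]


# Hypothetical inputs; the first two bills mirror the output shown above.
df1 = pd.DataFrame({"Date": ["10/08/2020", "10/16/2020", "10/20/2020"],
                    "BillNo.": ["ABBCSQ1ZA", "AA171E1Z0", "ZZ000X1Y2"],
                    "Amount": [875, 5490, 100]})
df2 = pd.DataFrame({"Date": ["10/08/2020", "10/16/2020", "10/20/2020"],
                    "BillNo.": ["ABBCSQ1ZA", "AA171E1Z0", "ZZ000X1Y2"],
                    "Amount": [876, 5491, 250]})

close = merge_within(df1, df2, keys=["Date", "BillNo."], value="Amount", n=5)
print(close.drop("Amount_x", axis=1))
```

If a key combination can repeat within a frame, merge pairs every matching row with every other, so the filter may keep more rows than intended. Passing validate="one_to_one" to merge is one way to fail early in that situation.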

Merging two dataframes with one common column name

You could accomplish this using an outer join. Here is the code for it:

import pandas as pd

train_id = pd.read_csv("train_id.csv")
train_up = pd.read_csv("train_up")
train_merged = train_id.merge(train_up, on=["ID"], how="outer")
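Since the CSV files are not available here, the following sketch uses small in-memory frames instead. The 'feature_a' and 'feature_b' columns and all values are assumptions. It also passes indicator=True, which adds a _merge column showing where each row came from.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files; only the 'ID' key is from the answer.
train_id = pd.DataFrame({"ID": [1, 2, 3], "feature_a": [0.1, 0.2, 0.3]})
train_up = pd.DataFrame({"ID": [2, 3, 4], "feature_b": ["x", "y", "z"]})

# Outer join keeps IDs from both frames; unmatched cells become NaN.
train_merged = train_id.merge(train_up, on=["ID"], how="outer", indicator=True)
print(train_merged)
```

Rows marked left_only or right_only are the IDs that appear in just one of the two frames.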
