Merge pandas dataframes where one value is between two others
As you say, this is pretty easy in SQL, so why not do it in SQL?
import pandas as pd
import sqlite3
from datetime import datetime
#We'll use firelynx's tables:
presidents = pd.DataFrame({"name": ["Bush", "Obama", "Trump"],
"president_id":[43, 44, 45]})
terms = pd.DataFrame({'start_date': pd.date_range('2001-01-20', periods=5, freq='48M'),
'end_date': pd.date_range('2005-01-21', periods=5, freq='48M'),
'president_id': [43, 43, 44, 44, 45]})
war_declarations = pd.DataFrame({"date": [datetime(2001, 9, 14), datetime(2003, 3, 3)],
"name": ["War in Afghanistan", "Iraq War"]})
#Make the db in memory
conn = sqlite3.connect(':memory:')
#write the tables
terms.to_sql('terms', conn, index=False)
presidents.to_sql('presidents', conn, index=False)
war_declarations.to_sql('wars', conn, index=False)
qry = '''
select
start_date PresTermStart,
end_date PresTermEnd,
wars.date WarStart,
presidents.name Pres
from
terms join wars on
date between start_date and end_date join presidents on
terms.president_id = presidents.president_id
'''
df = pd.read_sql_query(qry, conn)
df:
PresTermStart PresTermEnd WarStart Pres
0 2001-01-31 00:00:00 2005-01-31 00:00:00 2001-09-14 00:00:00 Bush
1 2001-01-31 00:00:00 2005-01-31 00:00:00 2003-03-03 00:00:00 Bush
Merging two dataframes based on a date between two other dates without a common column
Create data and format to datetimes:
df_A = pd.DataFrame({'start_date':['2017-03-27','2017-01-10'],'end_date':['2017-04-20','2017-02-01']})
df_B = pd.DataFrame({'event_date':['2017-01-20','2017-01-27'],'price':[100,200]})
df_A['end_date'] = pd.to_datetime(df_A.end_date)
df_A['start_date'] = pd.to_datetime(df_A.start_date)
df_B['event_date'] = pd.to_datetime(df_B.event_date)
Do a cross join. With pandas 1.2.0+, use how='cross' instead of assigning pseudo keys:
df_merge = df_A.merge(df_B, how='cross')
Otherwise, with pandas < 1.2.0, add a pseudo key column and merge on 'key':
df_A = df_A.assign(key=1)
df_B = df_B.assign(key=1)
df_merge = pd.merge(df_A, df_B, on='key').drop('key',axis=1)
Filter out records that do not meet criteria of event dates between start and end dates:
df_merge = df_merge.query('event_date >= start_date and event_date <= end_date')
Join back to the original date range table and drop the key column if it is present (it only exists on the pandas < 1.2.0 path):
df_out = df_A.merge(df_merge, on=['start_date','end_date'], how='left').fillna('').drop(columns='key', errors='ignore')
print(df_out)
Output:
end_date start_date event_date price
0 2017-04-20 00:00:00 2017-03-27 00:00:00
1 2017-02-01 00:00:00 2017-01-10 00:00:00 2017-01-20 00:00:00 100
2 2017-02-01 00:00:00 2017-01-10 00:00:00 2017-01-27 00:00:00 200
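The cross join, filter, and left join steps above can be sketched as one small function. This is only an illustration: `range_left_join` is a hypothetical helper name, it assumes pandas 1.2.0+ for `how='cross'`, and it leaves unmatched rows as NaT/NaN instead of filling them with empty strings.

```python
import pandas as pd

df_A = pd.DataFrame({
    "start_date": pd.to_datetime(["2017-03-27", "2017-01-10"]),
    "end_date": pd.to_datetime(["2017-04-20", "2017-02-01"]),
})
df_B = pd.DataFrame({
    "event_date": pd.to_datetime(["2017-01-20", "2017-01-27"]),
    "price": [100, 200],
})

def range_left_join(ranges, events):
    """Hypothetical helper: keep every range, attach events that fall inside it."""
    # Every range paired with every event (requires pandas >= 1.2.0).
    crossed = ranges.merge(events, how="cross")
    # Keep pairs where start_date <= event_date <= end_date (inclusive).
    inside = crossed["event_date"].between(crossed["start_date"], crossed["end_date"])
    hits = crossed[inside]
    # Left join so ranges without any event are still present.
    return ranges.merge(hits, on=["start_date", "end_date"], how="left")

df_out = range_left_join(df_A, df_B)
print(df_out)
```

The cross join materialises len(df_A) * len(df_B) rows, so this is only practical while that product fits in memory. It also assumes the (start_date, end_date) pairs in df_A are unique; duplicated ranges would multiply the matched rows.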
Best way to join / merge by range in pandas
Setup
Consider the dataframes A and B
import numpy as np
import pandas as pd
A = pd.DataFrame(dict(
A_id=range(10),
A_value=range(5, 105, 10)
))
B = pd.DataFrame(dict(
B_id=range(5),
B_low=[0, 30, 30, 46, 84],
B_high=[10, 40, 50, 54, 84]
))
A
A_id A_value
0 0 5
1 1 15
2 2 25
3 3 35
4 4 45
5 5 55
6 6 65
7 7 75
8 8 85
9 9 95
B
B_high B_id B_low
0 10 0 0
1 40 1 30
2 50 2 30
3 54 3 46
4 84 4 84
numpy
The ✌easiest✌ way is to use numpy broadcasting. We look for every instance of A_value being greater than or equal to B_low while at the same time A_value is less than or equal to B_high.
a = A.A_value.values
bh = B.B_high.values
bl = B.B_low.values
i, j = np.where((a[:, None] >= bl) & (a[:, None] <= bh))
pd.concat([
A.loc[i, :].reset_index(drop=True),
B.loc[j, :].reset_index(drop=True)
], axis=1)
A_id A_value B_high B_id B_low
0 0 5 10 0 0
1 3 35 40 1 30
2 3 35 50 2 30
3 4 45 50 2 30
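To see what the broadcast comparison produces before it is passed to np.where, here is a minimal sketch on three values and three ranges taken from the tables above. The variable name `mask` is introduced only for illustration.

```python
import numpy as np

a = np.array([5, 35, 45])      # values to place (a subset of A_value)
bl = np.array([0, 30, 30])     # lower bounds (a subset of B_low)
bh = np.array([10, 40, 50])    # upper bounds (a subset of B_high)

# a[:, None] has shape (3, 1); comparing it with a (3,) array broadcasts
# to a (3, 3) boolean matrix: one row per value, one column per range.
mask = (a[:, None] >= bl) & (a[:, None] <= bh)

# Row and column positions of every True cell, i.e. every (value, range) match.
i, j = np.where(mask)
print(mask)
print(i, j)
```

Because the mask holds len(A) * len(B) booleans, memory use grows with the product of the two table sizes.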
To address the comments and give something akin to a left join, I appended the part of A that doesn't match.
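matched = pd.concat([
    A.loc[i, :].reset_index(drop=True),
    B.loc[j, :].reset_index(drop=True)
], axis=1)
# DataFrame.append was removed in pandas 2.0, so concatenate the unmatched rows instead
unmatched = A[~np.isin(np.arange(len(A)), np.unique(i))]
pd.concat([matched, unmatched], ignore_index=True, sort=False)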
A_id A_value B_id B_low B_high
0 0 5 0.0 0.0 10.0
1 3 35 1.0 30.0 40.0
2 3 35 2.0 30.0 50.0
3 4 45 2.0 30.0 50.0
4 1 15 NaN NaN NaN
5 2 25 NaN NaN NaN
6 5 55 NaN NaN NaN
7 6 65 NaN NaN NaN
8 7 75 NaN NaN NaN
9 8 85 NaN NaN NaN
10 9 95 NaN NaN NaN
Value between two values of another df in pandas
If your table isn't too big (this merge creates a cartesian product within each Key1 group), you can merge and then filter:
# Merge on Key1
dfm = df1.merge(df2, on='Key1')
# Filter on value in range of initial and final
df1['Key2'] = dfm.loc[(dfm['Value'] >= dfm['Value Initial']) & (dfm['Value'] <= dfm['Value Final']), 'Key2']
df1
Output:
Value Key1 Key2
0 10 55 Y
1 20 55 Y
2 30 35 Z
3 40 35 Z
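Since the answer does not show df1 or df2, the following is a sketch with made-up data chosen to reproduce the output above. The contents of both frames are assumptions. It also joins the filtered matches back with a left merge rather than assigning by index, which does not depend on the merged frame's row order.

```python
import pandas as pd

# Hypothetical inputs; the original df1 and df2 are not shown in the answer.
df1 = pd.DataFrame({"Value": [10, 20, 30, 40], "Key1": [55, 55, 35, 35]})
df2 = pd.DataFrame({
    "Key1": [55, 55, 35, 35],
    "Value Initial": [0, 26, 0, 26],
    "Value Final": [25, 50, 25, 50],
    "Key2": ["Y", "X", "W", "Z"],
})

# Merge on Key1: each df1 row is paired with every range sharing its Key1.
dfm = df1.merge(df2, on="Key1")

# Keep only pairs where Value lies inside [Value Initial, Value Final].
in_range = dfm["Value"].between(dfm["Value Initial"], dfm["Value Final"])
matches = dfm.loc[in_range, ["Value", "Key1", "Key2"]]

# Attach Key2 to the original rows; rows without a match would get NaN.
result = df1.merge(matches, on=["Value", "Key1"], how="left")
print(result)
```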
How to join two dataframes when only some dates in one dataframe is present between two other dates in other dataframe?
If your start_date and end_date do not overlap, create an interval index and merge your two dataframes:
bins = pd.IntervalIndex.from_arrays(df_A['start_date'],
df_A['end_date'],
closed='both')
out = df_B.assign(interval=pd.cut(df_B['event_date'], bins)) \
.merge(df_A.assign(interval=bins), on='interval', how='left')
print(out[['event_date', 'price', 'start_date']])
# Output:
event_date price start_date
0 2021-04-01 00:06:00 100 2021-04-01
1 2021-05-01 00:03:00 200 2021-05-01
2 2021-05-04 00:00:00 500 NaT
Pandas DataFrame merge between two values instead of matching one
I ended up realizing I was overthinking this. I added a column called merge to both tables, which was just all 1's. Then I can merge on that column and do regular boolean filters on the resulting merged table.
a["merge"] = 1
b["merge"] = 1
c = a.merge(b, on="merge")
then filter on c
Pandas merge two dataframes based on one column from one table lies in between two columns from another table
Quick and dirty way:
countries = []
for i in range(len(df1)):
    ip = df1.loc[i, 'ip']
    # Countries whose [low_ip, high_ip] range contains this ip
    country = df2.query("low_ip <= @ip <= high_ip")['country'].to_numpy()
    if len(country) > 0:
        countries.append(country[0])
    else:
        countries.append('NA')
df1['country'] = countries
print(df1)
ip country
0 0.1 NA
1 2.5 B
2 3.5 A
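For larger tables, the row-by-row loop can be replaced with pd.merge_asof. This is a different technique from the answer's, sketched under assumptions: the df2 values are hypothetical (only the result is shown above), and the ranges are assumed not to overlap.

```python
import pandas as pd

# Hypothetical inputs chosen to reproduce the output above.
df1 = pd.DataFrame({"ip": [0.1, 2.5, 3.5]})
df2 = pd.DataFrame({
    "low_ip": [3.0, 2.0],
    "high_ip": [4.0, 2.9],
    "country": ["A", "B"],
})

# merge_asof needs both sides sorted on their keys; keep the original
# row labels in an "index" column so the result can be aligned back.
left = df1.reset_index().sort_values("ip")
right = df2.sort_values("low_ip")

# For each ip, take the range with the largest low_ip that is <= ip.
m = pd.merge_asof(left, right, left_on="ip", right_on="low_ip")

# Reject candidates whose upper bound is below ip (or that found no range).
m.loc[~(m["ip"] <= m["high_ip"]), "country"] = "NA"

df1["country"] = m.set_index("index")["country"]
print(df1)
```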