Pandas: Join dataframe with condition
Try the following:
import datetime
import pandas as pd

# Load the data into the first dataframe
df1 = pd.DataFrame(data)
# Load the same data into a second dataframe
df2 = pd.DataFrame(data)
# Rename the columns of the second dataframe so they do not clash after the merge
df2.rename(columns={'Reader_ID1': 'Reader_ID1_x', 'SITE_ID1': 'SITE_ID1_x', 'EVENT_TS1': 'EVENT_TS1_x'}, inplace=True)
# Merge the dataframes into another dataframe based on PERSONID and Badge_ID
df3 = pd.merge(df1, df2, how='outer', on=['PERSONID', 'Badge_ID'])
# Use df.loc[] to keep only the rows you want
df3.loc[(df3.Reader_ID1 < df3.Reader_ID1_x)
        & (df3.SITE_ID1 != df3.SITE_ID1_x)
        & (pd.to_datetime(df3['EVENT_TS1']) - pd.to_datetime(df3['EVENT_TS1_x'])
           <= datetime.timedelta(hours=event_time_diff))]
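As a minimal, self-contained sketch of the same merge-then-filter pattern, the snippet below runs on made-up badge events. The sample values and the one-hour event_time_diff are assumptions for illustration, not data from the question, and the time check here uses the absolute difference, which is a variation on the condition above.

```python
import pandas as pd

# Hypothetical badge-swipe events (values invented for illustration)
data = {
    "PERSONID": [1, 1, 2, 2],
    "Badge_ID": [10, 10, 20, 20],
    "Reader_ID1": [100, 200, 300, 400],
    "SITE_ID1": ["A", "B", "C", "C"],
    "EVENT_TS1": pd.to_datetime([
        "2021-01-01 08:00", "2021-01-01 08:30",
        "2021-01-01 09:00", "2021-01-01 09:10",
    ]),
}
event_time_diff = 1  # assumed threshold, in hours

df1 = pd.DataFrame(data)
df2 = df1.rename(columns={
    "Reader_ID1": "Reader_ID1_x",
    "SITE_ID1": "SITE_ID1_x",
    "EVENT_TS1": "EVENT_TS1_x",
})

# Pair every event with every other event of the same person and badge
pairs = pd.merge(df1, df2, how="inner", on=["PERSONID", "Badge_ID"])

# Keep pairs at different sites whose timestamps are close enough
mask = (
    (pairs["Reader_ID1"] < pairs["Reader_ID1_x"])
    & (pairs["SITE_ID1"] != pairs["SITE_ID1_x"])
    & ((pairs["EVENT_TS1"] - pairs["EVENT_TS1_x"]).abs()
       <= pd.Timedelta(hours=event_time_diff))
)
result = pairs.loc[mask].reset_index(drop=True)
print(result)
```

Only the pair for person 1 survives, because person 2's two events happen at the same site.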
Pandas - Merge data frames based on conditions
Your question is a bit confusing: array indexes start from 0, so I think in your example it should be [[0]] and [[1]] instead of [[1]] and [[2]].
You can first concatenate your dataframes to have all names listed, then loop over your columns and update the values where the corresponding array is greater (I added a Z row to df2 to show new rows are being added):
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                    'Age': [3, 8, 4, 2, 5], 'Height': [7, 2, 1, 4, 9]})
df2 = pd.DataFrame({'Name': ['A', 'B', 'D', 'Z'],
                    'Age': [4, 6, 4, 8], 'Height': [3, 9, 2, 7]})
array1 = np.array([1, 5])
array2 = np.array([2, 3])
df1.set_index('Name', inplace=True)
df2.set_index('Name', inplace=True)
# Keep every row of df1 and append the rows that exist only in df2
df3 = pd.concat([df1, df2[~df2.index.isin(df1.index)]])
for i, col in enumerate(df1.columns):
    if array2[i] > array1[i]:
        # Overwrite the matching rows, aligned on the Name index
        df3.loc[df2.index, col] = df2[col]
print(df3)
Note: You have to set Name as the index in order to update the right rows.
Output:
      Age  Height
Name
A       4       7
B       6       2
C       4       1
D       4       4
E       5       9
Z       8       7
If you have more than two dataframes in a list, you'll have to store your arrays in a list as well and iterate over the dataframe list while keeping track of the highest array values in a new array.
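One way the multi-dataframe case might look is sketched below. The three frames, their arrays, and the names frames, arrays, and best are all hypothetical; the only idea taken from the answer is that a column is overwritten when the current frame's array value beats the highest value seen so far.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: three frames indexed by Name, one array per frame
frames = [
    pd.DataFrame({"Name": ["A", "B"], "Age": [3, 8], "Height": [7, 2]}),
    pd.DataFrame({"Name": ["A", "Z"], "Age": [4, 8], "Height": [3, 7]}),
    pd.DataFrame({"Name": ["B", "Z"], "Age": [1, 9], "Height": [5, 6]}),
]
frames = [f.set_index("Name") for f in frames]
arrays = [np.array([1, 5]), np.array([2, 3]), np.array([0, 9])]

result = frames[0].copy()
best = arrays[0].copy()  # highest array value seen so far, per column

for frame, arr in zip(frames[1:], arrays[1:]):
    # Append rows whose Name has not been seen yet
    new_rows = frame[~frame.index.isin(result.index)]
    result = pd.concat([result, new_rows])
    for i, col in enumerate(result.columns):
        if arr[i] > best[i]:
            # This frame wins for this column: overwrite the matching rows
            result.loc[frame.index, col] = frame[col]
            best[i] = arr[i]

print(result)
```

With these inputs the second frame wins for Age and the third wins for Height, so A ends up as (4, 7), B as (8, 5), and Z as (8, 6).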
pandas join dataframes based on conditions
You could filter on your condition after merging the two tables on Country:
output_df = pd.merge(df_pos, df_emp, how='outer', on='Country')
# level_x comes from df_pos and level_y from df_emp
condition = (output_df.level_x - output_df.level_y).between(-1, 1)
output_df = output_df.loc[condition, ['Pos_id', 'Emp_id']]
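For a runnable illustration of this filter, here is a small sketch with invented positions and employees. The column names follow the snippet above, the values are assumptions, and it uses an inner merge because rows without a match in both tables could never satisfy the level condition anyway.

```python
import pandas as pd

# Hypothetical positions and employees (values invented for illustration)
df_pos = pd.DataFrame({
    "Pos_id": [1, 2],
    "Country": ["US", "US"],
    "level": [3, 6],
})
df_emp = pd.DataFrame({
    "Emp_id": [10, 11, 12],
    "Country": ["US", "US", "FR"],
    "level": [2, 5, 3],
})

merged = pd.merge(df_pos, df_emp, how="inner", on="Country")
# level_x comes from df_pos, level_y from df_emp
close = (merged["level_x"] - merged["level_y"]).between(-1, 1)
matches = merged.loc[close, ["Pos_id", "Emp_id"]].reset_index(drop=True)
print(matches)
```

Position 1 pairs with employee 10 and position 2 with employee 11; every other combination differs by more than one level.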
Joining two pandas dataframes based on multiple conditions
You need an inner merge, specifying both merge columns in each case:
res = df_a.merge(df_b, how='inner', left_on=['A', 'B'], right_on=['A', 'B_new'])
print(res)
    A       B    C    D    E   B_new    F
0  x1   Apple  0.3  0.9  0.6   Apple  0.3
1  x1  Orange  0.1  0.5  0.2  Orange  0.1
2  x2   Apple  0.2  0.2  0.1   Apple  0.2
3  x2  Orange  0.3  0.4  0.9  Orange  0.3
4  x2   Mango  0.1  0.2  0.3   Mango  0.1
5  x3  Orange  0.3  0.1  0.2  Orange  0.3
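Since B and B_new hold the same values after such a merge, one of them can be dropped. The sketch below shows this on two small made-up frames; the data is hypothetical and only the column names follow the answer.

```python
import pandas as pd

# Hypothetical frames: the second key is named differently on each side
df_a = pd.DataFrame({
    "A": ["x1", "x1", "x2"],
    "B": ["Apple", "Orange", "Mango"],
    "C": [0.3, 0.1, 0.1],
})
df_b = pd.DataFrame({
    "A": ["x1", "x2", "x2"],
    "B_new": ["Apple", "Mango", "Kiwi"],
    "F": [0.3, 0.1, 0.5],
})

res = df_a.merge(df_b, how="inner", left_on=["A", "B"], right_on=["A", "B_new"])
# B_new duplicates B for every matched row, so it can be removed
res = res.drop(columns="B_new")
print(res)
```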
Pandas: how to make a join based on a condition between two columns of 2 separate dataframes
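A one-liner:
# Compare the merged D_x and D_y columns rather than the original frames,
# whose indexes need not line up with the merged rows
df3 = (pd.merge(df1, df2, on="B")
         .where(lambda m: m.D_x - m.D_y == 1)
         .dropna()
         .reset_index(drop=True))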
Output:
     A           B    C  D_x      E    F    Y  D_y
0  1.0  2015-02-27  1.0  5.0  train  foo  1.0  4.0
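A variant of the same filter uses boolean indexing on the merged frame's own D_x and D_y columns. It avoids the NaN round trip of where and dropna, so integer columns stay integers. The two frames below are hypothetical; only the column names B and D follow the answer.

```python
import pandas as pd

# Hypothetical frames sharing key B, each with its own D column
df1 = pd.DataFrame({"A": [1, 2], "B": ["k1", "k2"], "D": [5, 7]})
df2 = pd.DataFrame({"B": ["k2", "k1"], "D": [3, 4], "Y": [9, 1]})

merged = pd.merge(df1, df2, on="B")  # the D columns become D_x and D_y
# Keep only the rows where the two D values differ by exactly 1
df3 = merged[merged["D_x"] - merged["D_y"] == 1].reset_index(drop=True)
print(df3)
```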
Join pandas dataframes based on different conditions
You can combine merge and query:
(df1.merge(df2, on=['var'], suffixes=['_a','_b'])
.query('date_a > date_b')
)
Output:
   id   var      date_a      date_b
1   1  ABCD  2019-01-01  2017-06-01
2   1  ABCD  2019-01-01  2016-01-01
5   1  ABCD  2017-06-01  2016-01-01
8   1  ABCD  2016-06-01  2016-01-01
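To try the merge-and-query pattern end to end, the following sketch builds two small frames with assumed values. Both have a date column, so the suffixes turn them into date_a and date_b before the query runs.

```python
import pandas as pd

# Hypothetical frames sharing a "var" key, each with its own "date" column
df1 = pd.DataFrame({
    "id": [1, 1],
    "var": ["ABCD", "ABCD"],
    "date": pd.to_datetime(["2019-01-01", "2016-06-01"]),
})
df2 = pd.DataFrame({
    "var": ["ABCD", "ABCD"],
    "date": pd.to_datetime(["2017-06-01", "2016-01-01"]),
})

result = (
    df1.merge(df2, on="var", suffixes=("_a", "_b"))
       .query("date_a > date_b")
)
print(result)
```

The result keeps the merged frame's original row labels, which is why the index can have gaps after filtering; append .reset_index(drop=True) if a clean index is needed.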
Pandas merge by condition
You don't need to create the "next_created" column. Just use merge_asof and then merge:
# Convert the created columns to datetime if needed
df1["created"] = pd.to_datetime(df1["created"])
df2["created"] = pd.to_datetime(df2["created"])
df3 = pd.merge_asof(df2, df1, by='id', on="created")
output = df1.merge(df3.drop("created", axis=1), how="left")
>>> output
      process         type   country   id     created     product
 0     buying  in_progress       usa  022  2021-07-01       apple
 1    selling  in_progress       NaN  022  2021-07-03         NaN
 2  searhicng          end       usa  022  2021-07-04      orange
 3  searhicng          end       usa  022  2021-07-04  watermelow
 4  repairing  in_progress     ghana  011  2021-07-05         NaN
 5  preparing          end     ghana  011  2021-07-09         NaN
 6    selling  in_progress     ghana  011  2021-07-10        qiwi
 7    selling  in_progress     ghana  011  2021-07-10        pear
 8    selling  in_progress     ghana  011  2021-07-10      cherry
 9     buying  in_progress  portugal  011  2021-07-15       apple
10  searching          end  portugal  011  2021-07-17         NaN
11    selling  in_progress  portugal  011  2021-07-19        qiwi
12  searching          end   england  011  2021-07-21      cherry
13  searching          end   england  011  2021-07-21      orange
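One detail worth keeping in mind is that merge_asof expects both frames to be sorted by the "on" column. The sketch below shows the first step on a tiny made-up process log and product list; the values and the names left, right, and matched are assumptions for illustration.

```python
import pandas as pd

# Hypothetical process log (df1) and product events (df2)
df1 = pd.DataFrame({
    "id": ["011", "011", "022"],
    "created": pd.to_datetime(["2021-07-05", "2021-07-10", "2021-07-01"]),
    "process": ["repairing", "selling", "buying"],
})
df2 = pd.DataFrame({
    "id": ["011", "022"],
    "created": pd.to_datetime(["2021-07-12", "2021-07-02"]),
    "product": ["pear", "apple"],
})

# merge_asof requires both frames to be sorted by the "on" key
left = df2.sort_values("created")
right = df1.sort_values("created")

# For each product event, take the latest process at or before it, per id
matched = pd.merge_asof(left, right, on="created", by="id")
print(matched)
```

Here the apple row picks up the buying process for id 022, and the pear row picks up selling, the most recent process for id 011 before 2021-07-12.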