Compare current row value to previous row values
Map the time-like values in the start_time and end_time columns to pandas Timedelta objects, and replace the 00:00:00 values in end_time with 23:59:59 (one day minus one second) so that a midnight end time sorts after every start time:
c = ['start_time', 'end_time']
s, e = df[c].astype(str).apply(pd.to_timedelta).to_numpy().T
e[e == pd.Timedelta(0)] += pd.Timedelta(days=1, seconds=-1)
Then, for each pair of start_time and end_time in the dataframe df, mark intervals that are fully contained within another row's interval as duplicates using numpy broadcasting:
m = (s[:, None] >= s) & (e[:, None] <= e)
np.fill_diagonal(m, False)
df['isDupe'] = (m.any(1) & ~df[c].duplicated(keep=False)).view('i1')
# example 1
start_time end_time isDupe
0 00:12:38 01:00:02 0
1 00:55:14 01:00:02 1
2 01:00:02 01:32:40 0
3 01:00:02 01:08:40 1
4 01:41:22 03:56:23 0
5 18:58:26 19:16:49 0
6 20:12:37 20:52:49 0
7 20:55:16 22:02:50 0
8 22:21:24 22:48:50 0
9 23:11:30 00:00:00 0
# example 2
start_time end_time isDupe
0 13:32:54 13:32:55 0
1 13:36:10 13:50:16 0
2 13:37:54 13:38:14 1
3 13:46:38 13:46:45 1
4 13:48:59 13:49:05 1
5 13:50:16 13:50:20 0
6 14:03:39 14:03:49 0
7 15:36:20 15:36:20 0
8 15:46:47 15:46:47 0
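Putting the steps above together as a self-contained run on the example 1 data (astype('int8') stands in for .view('i1'), which is deprecated in recent pandas):

```python
import numpy as np
import pandas as pd

# data from example 1
df = pd.DataFrame({
    'start_time': ['00:12:38', '00:55:14', '01:00:02', '01:00:02', '01:41:22',
                   '18:58:26', '20:12:37', '20:55:16', '22:21:24', '23:11:30'],
    'end_time':   ['01:00:02', '01:00:02', '01:32:40', '01:08:40', '03:56:23',
                   '19:16:49', '20:52:49', '22:02:50', '22:48:50', '00:00:00'],
})

c = ['start_time', 'end_time']
s, e = df[c].astype(str).apply(pd.to_timedelta).to_numpy().T

# map a midnight end_time to 23:59:59 so it sorts after every start_time
e[e == pd.Timedelta(0)] += pd.Timedelta(days=1, seconds=-1)

# m[i, j] is True when interval i lies fully inside interval j
m = (s[:, None] >= s) & (e[:, None] <= e)
np.fill_diagonal(m, False)

df['isDupe'] = (m.any(1) & ~df[c].duplicated(keep=False)).astype('int8')
print(df['isDupe'].tolist())  # [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
```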
Comparing previous row values in Pandas DataFrame
You need eq with shift:
df['match'] = df.col1.eq(df.col1.shift())
print (df)
col1 match
0 1 False
1 3 False
2 3 True
3 1 False
4 2 False
5 3 False
6 2 False
7 2 True
Or instead of eq use ==, but it is a bit slower on a large DataFrame:
df['match'] = df.col1 == df.col1.shift()
print (df)
col1 match
0 1 False
1 3 False
2 3 True
3 1 False
4 2 False
5 3 False
6 2 False
7 2 True
Timings:
import pandas as pd
data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
df = pd.DataFrame(data, columns=['col1'])
df = pd.concat([df]*10000).reset_index(drop=True)
#[80000 rows x 1 columns]
df['match'] = df.col1 == df.col1.shift()
df['match1'] = df.col1.eq(df.col1.shift())
print (df)
In [208]: %timeit df.col1.eq(df.col1.shift())
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 933 µs per loop
In [209]: %timeit df.col1 == df.col1.shift()
1000 loops, best of 3: 1 ms per loop
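Outside IPython, the same comparison can be timed with the stdlib timeit module (a rough sketch; absolute numbers depend on hardware and pandas version):

```python
import timeit
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})
df = pd.concat([df] * 10000).reset_index(drop=True)  # 80000 rows

# average per-call time over 100 runs for each variant
t_eq = timeit.timeit(lambda: df.col1.eq(df.col1.shift()), number=100)
t_op = timeit.timeit(lambda: df.col1 == df.col1.shift(), number=100)
print(f'.eq(): {t_eq / 100 * 1e6:.0f} us per loop')
print(f'==   : {t_op / 100 * 1e6:.0f} us per loop')
```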
Compare rows and remove previous row in pandas
You can mask the dataframe on two conditions, using (1) .shift() to compare with the previous 'username' and (2) a groupby .diff() to get the difference in 'amount':
#import packages
import pandas as pd
import numpy as np
#create the df
d = {'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
     'amount': [25, 25, 26, 40, 41],
     'verified': ['no', 'yes', 'yes', 'yes', 'yes']}
df = pd.DataFrame.from_dict(d)
#mask the df on two conditions
df[((df['username'].shift() == df['username']) & #keep if above user is the same
(df.groupby('username')['amount'].diff() <= 1))] #keep if difference is less than or equal to 1
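With the sample df above, the mask keeps only the row whose user matches the previous row and whose amount rose by at most 1; inverting it with ~ drops that row instead:

```python
import pandas as pd

df = pd.DataFrame({
    'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
    'amount': [25, 25, 26, 40, 41],
    'verified': ['no', 'yes', 'yes', 'yes', 'yes'],
})

mask = ((df['username'].shift() == df['username'])         # same user as the row above
        & (df.groupby('username')['amount'].diff() <= 1))  # amount rose by at most 1

kept = df[mask]      # only the last bob1 row (41 follows 40)
dropped = df[~mask]  # everything else
```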
Compare previous and next different values in a Pandas column
You can take one value per run of consecutive duplicates and then reindex it back onto the original rows, like:
s = df['col1'] #to ease the code
#where the value is not the same as before
m = s.diff().ne(0)
# one value per run of consecutive duplicates
su = s[m].reset_index(drop=True)
print (su)
# 0 10
# 1 5
# 2 10
# 3 4
# 4 5
# Name: col1, dtype: int64
#create columns in df to align previous and after not equal value
df['col1_after'] = su.reindex(m.cumsum().values).values
df['col1_before'] = su.reindex(m.cumsum().values-2).values
#create col2 where the two previous columns are equal
df['col2'] = df['col1_after'].eq(df['col1_before'])
and you get
print (df)
col1 col1_after col1_before col2
0 10 5.0 NaN False
1 10 5.0 NaN False
2 5 10.0 10.0 True
3 5 10.0 10.0 True
4 5 10.0 10.0 True
5 10 4.0 5.0 False
6 4 5.0 10.0 False
7 4 5.0 10.0 False
8 4 5.0 10.0 False
9 4 5.0 10.0 False
10 4 5.0 10.0 False
11 5 NaN 4.0 False
12 5 NaN 4.0 False
Note: you can do df.drop(['col1_after','col1_before'], axis=1)
to remove the helper columns; I left them here to show what is happening
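For reference, a self-contained run of the steps above, rebuilding col1 from the printed output:

```python
import pandas as pd

df = pd.DataFrame({'col1': [10, 10, 5, 5, 5, 10, 4, 4, 4, 4, 4, 5, 5]})

s = df['col1']
m = s.diff().ne(0)                # True at the start of each run
su = s[m].reset_index(drop=True)  # run values: 10, 5, 10, 4, 5

df['col1_after'] = su.reindex(m.cumsum().values).values       # next run's value
df['col1_before'] = su.reindex(m.cumsum().values - 2).values  # previous run's value
df['col2'] = df['col1_after'].eq(df['col1_before'])
```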
pandas groupby comparing string value with previous row value and spot changes in new columns
Building on the sum you've created (named g), we can groupby the first 2 levels of the index and shift it, then join it back to g. After rename-ing the shifted column, mask the "To" and "From" columns wherever nothing changed or the previous value is NaN. Finally, select the relevant columns and reset the index:
g = df.sort_values(['Date']).groupby(['customer','Good','Date'])['Flavor'].sum()
joined = (g.to_frame().assign(To=g)
           .join(g.groupby(level=[0,1]).shift().to_frame(), lsuffix='', rsuffix='_')
           .rename(columns={'Flavor_':'From'}))
joined.update(joined[['To','From']].mask(joined['From'].isna() | joined['From'].eq(joined['To']), ''))
out = joined[['Flavor','From','To']].reset_index()
Output:
customer Good Date Flavor From To
0 Jason Cookie 2021-12-14 Chocolate
1 Jason Cookie 2022-01-04 Vanilla Chocolate Vanilla
2 Jason Cookie 2022-01-11 Vanilla
3 Jason Cookie 2022-01-18 Strawberry Vanilla Strawberry
4 Molly Ice Cream 2022-01-12 Chocolate
5 Molly Ice Cream 2022-01-15 Vanilla Chocolate Vanilla
6 Molly Ice Cream 2022-01-19 Caramel Vanilla Caramel
7 Molly Ice Cream 2022-01-30 Caramel
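To reproduce this end to end, a sketch that rebuilds a df matching the output above (the column values are inferred from that output):

```python
import pandas as pd

# df rebuilt to match the output shown above
df = pd.DataFrame({
    'customer': ['Jason'] * 4 + ['Molly'] * 4,
    'Good': ['Cookie'] * 4 + ['Ice Cream'] * 4,
    'Date': ['2021-12-14', '2022-01-04', '2022-01-11', '2022-01-18',
             '2022-01-12', '2022-01-15', '2022-01-19', '2022-01-30'],
    'Flavor': ['Chocolate', 'Vanilla', 'Vanilla', 'Strawberry',
               'Chocolate', 'Vanilla', 'Caramel', 'Caramel'],
})

# one flavor per (customer, Good, Date); shift within customer/Good gives the previous flavor
g = df.sort_values(['Date']).groupby(['customer', 'Good', 'Date'])['Flavor'].sum()
joined = (g.to_frame().assign(To=g)
           .join(g.groupby(level=[0, 1]).shift().to_frame(), lsuffix='', rsuffix='_')
           .rename(columns={'Flavor_': 'From'}))

# blank out To/From where nothing changed or there is no previous value
joined.update(joined[['To', 'From']].mask(
    joined['From'].isna() | joined['From'].eq(joined['To']), ''))
out = joined[['Flavor', 'From', 'To']].reset_index()
```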
Comparing values of each row with previous value for unique date
You can try df.shift and np.where:
dataset['new_col'] = np.where(dataset['arr_delay'].shift(-1) < dataset['arr_delay'], 1, 0)
Edit
dataset['new_col'] = 0
for unique in dataset.Date.unique():
    new_df = dataset[dataset.Date == unique].copy()
    new_df['new_col'] = np.where(new_df['arr_delay'].shift(-1) < new_df['arr_delay'], 1, 0)
    dataset.loc[dataset.Date == unique] = new_df
Edit 2: For the expected format try df.pivot:
dataset.pivot(index='Date', columns='Aircraft', values='new_col')
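A minimal end-to-end sketch of the per-date loop and the pivot, using made-up Date/Aircraft/arr_delay values (the column names come from the question):

```python
import numpy as np
import pandas as pd

# hypothetical flight data; Date, Aircraft, arr_delay are assumed column names
dataset = pd.DataFrame({
    'Date': ['2021-01-01'] * 3 + ['2021-01-02'] * 3,
    'Aircraft': ['A', 'B', 'C'] * 2,
    'arr_delay': [10, 5, 7, 3, 8, 2],
})

dataset['new_col'] = 0
for unique in dataset.Date.unique():
    new_df = dataset[dataset.Date == unique].copy()
    # 1 when the next row within the same date has a smaller delay
    new_df['new_col'] = np.where(new_df['arr_delay'].shift(-1) < new_df['arr_delay'], 1, 0)
    dataset.loc[dataset.Date == unique] = new_df

pivoted = dataset.pivot(index='Date', columns='Aircraft', values='new_col')
```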