Python Comparing Previous and Next Row Value

Compare current row value to previous row values

Map the time-like values in the start_time and end_time columns to pandas Timedelta objects, then replace the 00:00:00 values in the end_time column (an end at midnight belongs to the next day) with 1 day minus 1 second, i.e. 23:59:59.

import pandas as pd
import numpy as np

c = ['start_time', 'end_time']
s, e = df[c].astype(str).apply(pd.to_timedelta).to_numpy().T
e[e == pd.Timedelta(0)] += pd.Timedelta(days=1, seconds=-1)  # midnight end -> 23:59:59 next day

Then, for each pair of start_time and end_time in the dataframe df, mark the intervals that are fully contained in another interval using numpy broadcasting:

# m[i, j] is True when interval i lies inside interval j
m = (s[:, None] >= s) & (e[:, None] <= e)
np.fill_diagonal(m, False)
# flag contained intervals, excluding rows that are exact duplicates
df['isDupe'] = (m.any(1) & ~df[c].duplicated(keep=False)).view('i1')


# example 1
  start_time  end_time  isDupe
0   00:12:38  01:00:02       0
1   00:55:14  01:00:02       1
2   01:00:02  01:32:40       0
3   01:00:02  01:08:40       1
4   01:41:22  03:56:23       0
5   18:58:26  19:16:49       0
6   20:12:37  20:52:49       0
7   20:55:16  22:02:50       0
8   22:21:24  22:48:50       0
9   23:11:30  00:00:00       0

# example 2
  start_time  end_time  isDupe
0   13:32:54  13:32:55       0
1   13:36:10  13:50:16       0
2   13:37:54  13:38:14       1
3   13:46:38  13:46:45       1
4   13:48:59  13:49:05       1
5   13:50:16  13:50:20       0
6   14:03:39  14:03:49       0
7   15:36:20  15:36:20       0
8   15:46:47  15:46:47       0
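
Putting it together, a minimal end-to-end sketch; the hypothetical frame below is built from rows 0-3 and 9 of example 1:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'start_time': ['00:12:38', '00:55:14', '01:00:02', '01:00:02', '23:11:30'],
    'end_time':   ['01:00:02', '01:00:02', '01:32:40', '01:08:40', '00:00:00'],
})

c = ['start_time', 'end_time']
s, e = df[c].astype(str).apply(pd.to_timedelta).to_numpy().T
e[e == pd.Timedelta(0)] += pd.Timedelta(days=1, seconds=-1)

m = (s[:, None] >= s) & (e[:, None] <= e)
np.fill_diagonal(m, False)
df['isDupe'] = (m.any(1) & ~df[c].duplicated(keep=False)).view('i1')
print(df)  # rows 1 and 3 are flagged, as in example 1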

Comparing previous row values in Pandas DataFrame

You need eq with shift:

df['match'] = df.col1.eq(df.col1.shift())
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True

Or use == instead of eq, but it is a bit slower on large DataFrames:

df['match'] = df.col1 == df.col1.shift()
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
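
Since the topic also covers the next row, note that shift accepts a negative period; a one-line sketch for comparing each value with the following row:

df['match_next'] = df.col1.eq(df.col1.shift(-1))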

Timings:

import pandas as pd

data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
df = pd.DataFrame(data, columns=['col1'])
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)
#[80000 rows x 1 columns]

df['match'] = df.col1 == df.col1.shift()
df['match1'] = df.col1.eq(df.col1.shift())
print (df)

In [208]: %timeit df.col1.eq(df.col1.shift())
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 933 µs per loop

In [209]: %timeit df.col1 == df.col1.shift()
1000 loops, best of 3: 1 ms per loop

Compare rows and remove previous row in pandas

You can groupby and mask the dataframe on two conditions, using (1) .shift() to compare with the previous 'username' and (2) .diff() to check the difference in 'amount':

#import packages
import pandas as pd
import numpy as np

#create the df
d = {'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
     'amount': [25, 25, 26, 40, 41],
     'verified': ['no', 'yes', 'yes', 'yes', 'yes']}
df = pd.DataFrame.from_dict(d)

#mask the df on two conditions
df[((df['username'].shift() == df['username']) &    #keep if the user above is the same
    (df.groupby('username')['amount'].diff() <= 1))] #keep if the difference is at most 1
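
The expression above keeps the rows matching both conditions; to remove them instead, invert the mask with ~ (a sketch, assuming those matched rows are the ones to drop):

mask = ((df['username'].shift() == df['username']) &
        (df.groupby('username')['amount'].diff() <= 1))
df = df[~mask]  #drop the matched rows, keep the rest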

Compare previous and next different values in a Pandas column

You can keep one value per run of consecutive duplicates and then reindex, like:

s = df['col1'] #to ease the code
#True where the value differs from the previous one
m = s.diff().ne(0)
#keep only the first value of each run of consecutive duplicates
su = s[m].reset_index(drop=True)
print (su)
# 0    10
# 1     5
# 2    10
# 3     4
# 4     5
# Name: col1, dtype: int64

#create columns in df aligning the next and previous distinct values with each row
#m.cumsum() numbers the runs 1, 2, ..., so index i in su is the next distinct value
df['col1_after'] = su.reindex(m.cumsum().values).values
df['col1_before'] = su.reindex(m.cumsum().values - 2).values
#create col2 where the two previous columns are equal
df['col2'] = df['col1_after'].eq(df['col1_before'])

and you get

print (df)
    col1  col1_after  col1_before   col2
0     10         5.0          NaN  False
1     10         5.0          NaN  False
2      5        10.0         10.0   True
3      5        10.0         10.0   True
4      5        10.0         10.0   True
5     10         4.0          5.0  False
6      4         5.0         10.0  False
7      4         5.0         10.0  False
8      4         5.0         10.0  False
9      4         5.0         10.0  False
10     4         5.0         10.0  False
11     5         NaN          4.0  False
12     5         NaN          4.0  False

Note: you can do df.drop(['col1_after','col1_before'], axis=1) to remove the helper columns; they are left in here to show what is happening.
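
For reference, the input frame used above can be reconstructed from the col1 column of the output:

import pandas as pd

df = pd.DataFrame({'col1': [10, 10, 5, 5, 5, 10, 4, 4, 4, 4, 4, 5, 5]})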

pandas groupby comparing string value with previous row value and spot changes in new columns

Building on the sum you've created (named g), we can groupby the first two levels of the index and shift it, then join it back to g. After renaming columns, mask the "To" and "From" columns depending on whether there was any change or the value is NaN. Finally, reset the index to get back a flat DataFrame:

g = df.sort_values(['Date']).groupby(['customer', 'Good', 'Date'])['Flavor'].sum()
joined = (g.to_frame().assign(To=g)
           .join(g.groupby(level=[0, 1]).shift().to_frame(), lsuffix='', rsuffix='_')
           .rename(columns={'Flavor_': 'From'}))
joined.update(joined[['To', 'From']].mask(joined['From'].isna() | joined['From'].eq(joined['To']), ''))
out = joined[['Flavor', 'From', 'To']].reset_index()

Output:

  customer       Good        Date      Flavor       From          To
0    Jason     Cookie  2021-12-14   Chocolate
1    Jason     Cookie  2022-01-04     Vanilla  Chocolate     Vanilla
2    Jason     Cookie  2022-01-11     Vanilla
3    Jason     Cookie  2022-01-18  Strawberry    Vanilla  Strawberry
4    Molly  Ice Cream  2022-01-12   Chocolate
5    Molly  Ice Cream  2022-01-15     Vanilla  Chocolate     Vanilla
6    Molly  Ice Cream  2022-01-19     Caramel    Vanilla     Caramel
7    Molly  Ice Cream  2022-01-30     Caramel
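
For context, a plausible input, inferred from the output above under the assumption of one row per customer, good and date, could be built as:

import pandas as pd

df = pd.DataFrame({
    'customer': ['Jason'] * 4 + ['Molly'] * 4,
    'Good': ['Cookie'] * 4 + ['Ice Cream'] * 4,
    'Date': ['2021-12-14', '2022-01-04', '2022-01-11', '2022-01-18',
             '2022-01-12', '2022-01-15', '2022-01-19', '2022-01-30'],
    'Flavor': ['Chocolate', 'Vanilla', 'Vanilla', 'Strawberry',
               'Chocolate', 'Vanilla', 'Caramel', 'Caramel'],
})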

Comparing values of each row with previous value for unique date

You can try df.shift and np.where:

import numpy as np

dataset['new_col'] = np.where(dataset['arr_delay'].shift(-1) < dataset['arr_delay'], 1, 0)

Edit

dataset['new_col'] = 0
for unique in dataset.Date.unique():
    new_df = dataset[dataset.Date == unique].copy()
    new_df['new_col'] = np.where(new_df['arr_delay'].shift(-1) < new_df['arr_delay'], 1, 0)
    dataset.loc[dataset.Date == unique] = new_df
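
A vectorized alternative to the loop (a sketch, assuming the intent is the same per-date comparison) uses groupby with transform:

dataset['new_col'] = (dataset.groupby('Date')['arr_delay']
                             .transform(lambda s: (s.shift(-1) < s).astype(int)))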

Edit 2

For the expected format, try df.pivot:

dataset.pivot(index='Date', columns='Aircraft', values='new_col')

