Comparing Previous Row Values in Pandas DataFrame

You need eq with shift, which compares each value with the value in the previous row:

df['match'] = df.col1.eq(df.col1.shift())
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
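
The snippet above assumes that df already exists. A minimal self-contained sketch, using the sample values from the Timings section (the name previous is only illustrative), might look like this:

```python
import pandas as pd

# Sample data taken from the Timings section
df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# shift() moves every value down one row, so row i lines up with row i - 1
previous = df['col1'].shift()

# The first row has no predecessor: shift() gives NaN there and eq() returns False
df['match'] = df['col1'].eq(previous)
print(df)
```

Because the first comparison is against NaN, the first row is always False with this approach.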

Alternatively, use == instead of eq, but it is slightly slower on a large DataFrame:

df['match'] = df.col1 == df.col1.shift()
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True

Timings:

import pandas as pd

data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
df = pd.DataFrame(data, columns=['col1'])
print (df)

# repeat the 8 sample rows 10000 times
df = pd.concat([df] * 10000).reset_index(drop=True)
# [80000 rows x 1 columns]

df['match'] = df.col1 == df.col1.shift()
df['match1'] = df.col1.eq(df.col1.shift())
print (df)

In [208]: %timeit df.col1.eq(df.col1.shift())
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 933 µs per loop

In [209]: %timeit df.col1 == df.col1.shift()
1000 loops, best of 3: 1 ms per loop
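
%timeit is an IPython magic, so the lines above only run in IPython or Jupyter. As a rough sketch, a similar measurement in a plain Python script could use the standard-library timeit module. The helper names with_eq and with_operator and the repeat counts are illustrative choices, and the numbers will differ between machines and pandas versions:

```python
import timeit

import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})
df = pd.concat([df] * 10000).reset_index(drop=True)  # 80000 rows


def with_eq():
    return df['col1'].eq(df['col1'].shift())


def with_operator():
    return df['col1'] == df['col1'].shift()


# Take the best of several repeats to reduce noise from other processes
n_calls = 100
best_eq = min(timeit.repeat(with_eq, number=n_calls, repeat=5)) / n_calls
best_op = min(timeit.repeat(with_operator, number=n_calls, repeat=5)) / n_calls

print(f'eq: {best_eq * 1e6:.0f} µs per call')
print(f'==: {best_op * 1e6:.0f} µs per call')
```

Both variants return the same boolean Series, so a difference as small as the one measured above is unlikely to matter outside tight loops.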

Comparing previous row values in Pandas DataFrame in different column

Use shift and between to compare a row with the previous one:

>>> df[0].loc[df[0].between(df[0].shift(), df[1].shift())]
1    1680
2    5000
Name: 0, dtype: int64

Details of shift:

>>> pd.concat([df[0], df.shift()], axis=1)
       0        0        1
0      0      NaN      NaN
1   1680      0.0   4999.0
2   5000   1680.0   7501.0
3  14999   5000.0  10000.0
4  17000  14999.0  16777.0
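
For a runnable version, the input can be reconstructed from the shifted output above. This is only a sketch: the last value of column 1 never appears in the output, so 20000 is a made-up placeholder, and the names lower, upper, inside and result are illustrative.

```python
import pandas as pd

# Columns 0 and 1 reconstructed from the shifted output above;
# the last value of column 1 (20000) is a made-up placeholder
df = pd.DataFrame({
    0: [0, 1680, 5000, 14999, 17000],
    1: [4999, 7501, 10000, 16777, 20000],
})

lower = df[0].shift()  # previous row, column 0
upper = df[1].shift()  # previous row, column 1

# between() is inclusive on both ends by default; comparisons with NaN are False
inside = df[0].between(lower, upper)
result = df.loc[inside, 0]
print(result)
```

The placeholder does not influence the result, because shift() drops the last value of each column.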

Compare rows and remove previous row in pandas

You can mask the DataFrame on two conditions: (1) use .shift() to compare each row's 'username' with the previous row's, and (2) use groupby with .diff() to check the difference in 'amount' within each user.

# import packages
import pandas as pd
import numpy as np

# create the df
d = {'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
     'amount': [25, 25, 26, 40, 41],
     'verified': ['no', 'yes', 'yes', 'yes', 'yes']}
df = pd.DataFrame.from_dict(d)

# mask the df on two conditions
df[((df['username'].shift() == df['username']) &        # keep if above user is the same
    (df.groupby('username')['amount'].diff() <= 1))]    # keep if difference is less than or equal to 1
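
To see why a row passes or fails, it can help to evaluate the two conditions separately. A sketch using the same data (the names same_user_above, small_step, checks and filtered are illustrative, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
    'amount': [25, 25, 26, 40, 41],
    'verified': ['no', 'yes', 'yes', 'yes', 'yes'],
})

# Condition 1: the row directly above belongs to the same user
same_user_above = df['username'].shift() == df['username']

# Condition 2: the difference from that user's previous amount is at most 1
# (the first row of each user has no previous amount, so diff() is NaN and the test is False)
small_step = df.groupby('username')['amount'].diff() <= 1

checks = pd.DataFrame({'same_user_above': same_user_above, 'small_step': small_step})
print(checks)

filtered = df[same_user_above & small_step]
print(filtered)
```

With this sample data only the last row satisfies both conditions.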

Python - compare previous row value and fill upwards when max is reached

Group by stretches of non-decreasing values and use transform to fill each stretch with its last value:

df['Column2'] = (df.groupby(df['Column2'].diff().lt(0).cumsum())['Column2']
                   .transform('last')
                )

output:

   Column1  Column2
0        1        5
1        2        5
2        3        5
3        4        4
4        5        4
5        6        5
6        7        5
7        8        5

Intermediate Series that defines the groups:

df['Column2'].diff().lt(0).cumsum()

0    0
1    0
2    0
3    1
4    1
5    2
6    2
7    2
Name: Column2, dtype: int64
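
The original input is not shown, so the following sketch uses hypothetical Column2 values chosen to reproduce the output and the group labels above (group_id is an illustrative name):

```python
import pandas as pd

# Hypothetical input, chosen so that it reproduces the output shown above
df = pd.DataFrame({
    'Column1': [1, 2, 3, 4, 5, 6, 7, 8],
    'Column2': [3, 4, 5, 3, 4, 3, 4, 5],
})

# A negative diff marks the row where a new stretch starts;
# cumsum() turns those marks into group labels 0, 1, 2, ...
group_id = df['Column2'].diff().lt(0).cumsum()

# Every row takes the last value of its stretch
df['Column2'] = df.groupby(group_id)['Column2'].transform('last')
print(df)
```

Within a stretch that never decreases, the last value is also the maximum, which is why 'last' fills the peak upwards.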

Comparing previous row values in Pandas DataFrame with Condition

The expected output is not fully clear. If you want to have False where the condition of equality on col3 is unmet:

df['col2_alternative1'] = (df['col1'].gt(df['col1'].mul(2).shift())
                           & df['col3'].eq(df['col3'].shift()))

If you want to have <NA> (boolean NA) there instead:

df['col2_alternative2'] = (df['col1'].gt(df['col1'].mul(2).shift())
                           .mask(df['col3'].ne(df['col3'].shift()))
                           .astype('boolean')
                          )

output:

    col1 col3   col2  col2_alternative1  col2_alternative2
0     10   AB  False              False               <NA>
1     30   AB   True               True               True
2     11   AB  False              False              False
3     24   AB   True               True               True
4     22   AB  False              False              False
5     50   AB   True               True               True
6     12   AB  False              False              False
7     10   AB  False              False              False
8     30   AB   True               True               True
9     31   AB  False              False              False
10    32   AB  False              False              False
11    33   AB  False              False              False
12    20   AB  False              False              False
13    41   AC   True              False               <NA>
14    44   AC  False              False              False
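
A self-contained sketch of both alternatives, with col1 and col3 copied from the output table above. The intermediate names bigger, same_group, alt1 and alt2 are illustrative, and ~same_group is used here as an equivalent of the ne comparison in the original snippet:

```python
import pandas as pd

# col1 and col3 copied from the output table above
df = pd.DataFrame({
    'col1': [10, 30, 11, 24, 22, 50, 12, 10, 30, 31, 32, 33, 20, 41, 44],
    'col3': ['AB'] * 13 + ['AC'] * 2,
})

bigger = df['col1'].gt(df['col1'].mul(2).shift())  # more than twice the previous col1
same_group = df['col3'].eq(df['col3'].shift())     # same col3 as the previous row

# Alternative 1: rows whose col3 differs from the previous row become False
alt1 = bigger & same_group

# Alternative 2: those rows become <NA> in a nullable boolean column
alt2 = bigger.mask(~same_group).astype('boolean')

print(pd.DataFrame({'alt1': alt1, 'alt2': alt2}).iloc[[0, 1, 12, 13, 14]])
```

The two alternatives differ only in rows 0 and 13, where there is no previous row with the same col3.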

Comparing value with previous row in Pandas DataFrame

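Compare each value with the value in the next row using Series.gt and Series.shift(-1). The missing value that shift creates for the last row is replaced with -1 so that the last comparison returns True; this works only if all values are positive: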

df['match'] = df['col1'].gt(df['col1'].shift(-1, fill_value=-1))
print (df)

   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3   True
6     2  False
7     2   True

If the last value needs to be True for any DataFrame, whatever its values, set it explicitly:

df['match'] = df['col1'].gt(df['col1'].shift(-1))
df.loc[df.index[-1], 'match'] = True
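
The difference between the two variants only shows up when the data contains values below -1. A small sketch with hypothetical negative values (the names with_fill and explicit are illustrative):

```python
import pandas as pd

# Hypothetical data with negative values, where fill_value=-1 no longer works
df = pd.DataFrame({'col1': [-5, -3, -3, -7]})

# Variant 1: the last row is compared with the filler -1, and -7 > -1 is False
with_fill = df['col1'].gt(df['col1'].shift(-1, fill_value=-1))

# Variant 2: compare with the next row, then set the last row to True explicitly
explicit = df['col1'].gt(df['col1'].shift(-1))
explicit.loc[explicit.index[-1]] = True

print(pd.DataFrame({'col1': df['col1'], 'with_fill': with_fill, 'explicit': explicit}))
```

Only the explicit assignment guarantees True in the last row regardless of the values.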

How to compare values from previous row for % difference

You could add a column (one per mode of transport) that holds the result of your condition:

df['hasDecreasedMarkedly'] = (df['Bus'] - df['Bus'].shift(1))/df['Bus'] >= 0.05
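
The expression above divides by the current row's value. If the change is instead meant relative to the previous row, and a decrease is what should be flagged (both are assumptions about the intent, not something the original states), Series.pct_change is one option. A sketch with hypothetical numbers and illustrative column names:

```python
import pandas as pd

# Hypothetical passenger counts; the 'Bus' column name follows the snippet above
df = pd.DataFrame({'Bus': [100, 90, 94, 80]})

# pct_change() returns (current - previous) / previous; the first row is NaN
df['change'] = df['Bus'].pct_change()

# Flag drops of 5 % or more relative to the previous row
df['droppedByFivePercent'] = df['change'] <= -0.05
print(df)
```

The first row is never flagged, because a comparison with NaN evaluates to False.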

Comparing previous row values of every column in a dataframe

Use:

df = pd.DataFrame({'ID': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'counter': [1, 2, 3, 1, 2, 3, 4],
                   'valueA': [10, 10, 5, 1, 1, 2, 3],
                   'valueB': [1, 1, 1, 2, 3, 4, 4]})

print (df)
  ID  counter  valueA  valueB
0  A        1      10       1
1  A        2      10       1
2  A        3       5       1
3  B        1       1       2
4  B        2       1       3
5  B        3       2       4
6  B        4       3       4

c = ['valueA','valueB']
df[c] = df[c].ne(df[c].groupby(df['ID']).shift().bfill()).astype(int)
print (df)
  ID  counter  valueA  valueB
0  A        1       0       0
1  A        2       0       0
2  A        3       1       0
3  B        1       0       0
4  B        2       0       1
5  B        3       1       1
6  B        4       1       0

For a counter per group I tried this solution, but the output is still different:

df[c] = df[c].ne(df[c].groupby(df['ID']).shift().bfill()).groupby(df['ID']).cumsum()
print (df)
  ID  counter  valueA  valueB
0  A        1       0       0
1  A        2       0       0
2  A        3       1       0
3  B        1       0       0
4  B        2       0       1
5  B        3       1       2
6  B        4       2       2
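
Both variants reassign df[c], so each has to start from the original data. A sketch that computes them side by side without overwriting the input (the names previous, changed, flags and running are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'counter': [1, 2, 3, 1, 2, 3, 4],
    'valueA': [10, 10, 5, 1, 1, 2, 3],
    'valueB': [1, 1, 1, 2, 3, 4, 4],
})
c = ['valueA', 'valueB']

# Previous row within the same ID; the first row of each ID becomes NaN
previous = df[c].groupby(df['ID']).shift()

# bfill() fills that NaN from the next row's shifted value, which is the
# first row's own value as long as the ID has at least two rows
changed = df[c].ne(previous.bfill())

flags = changed.astype(int)                   # 1 where the value differs from the previous row
running = changed.groupby(df['ID']).cumsum()  # number of changes so far within each ID
print(flags)
print(running)
```

The bfill step relies on every ID having at least two rows; for a single-row ID the fill would come from a different group.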

Note: For this to work, the newest pandas version should be installed.


