Comparing previous row values in Pandas DataFrame
You need eq combined with shift:
df['match'] = df.col1.eq(df.col1.shift())
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
Alternatively, use == instead of eq, but it is a bit slower on a large DataFrame:
df['match'] = df.col1 == df.col1.shift()
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
Timings:
import pandas as pd
data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
df = pd.DataFrame(data, columns=['col1'])
# scale up for timing
df = pd.concat([df]*10000).reset_index(drop=True)
# [80000 rows x 1 columns]
df['match'] = df.col1 == df.col1.shift()
df['match1'] = df.col1.eq(df.col1.shift())
print (df)
In [208]: %timeit df.col1.eq(df.col1.shift())
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 933 µs per loop
In [209]: %timeit df.col1 == df.col1.shift()
1000 loops, best of 3: 1 ms per loop
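The snippets above only construct the DataFrame in the timing section; a minimal self-contained version of the eq/shift comparison, using the same sample data:

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# True where a value equals the one in the previous row;
# the first row compares against NaN and is therefore False
df['match'] = df['col1'].eq(df['col1'].shift())

print(df['match'].tolist())
# [False, False, True, False, False, False, False, True]
```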
Comparing previous row values in Pandas DataFrame in different column
Use shift and between to compare a row with the previous one:
>>> df[0].loc[df[0].between(df[0].shift(), df[1].shift())]
1 1680
2 5000
Name: 0, dtype: int64
Details of shift:
>>> pd.concat([df[0], df.shift()], axis=1)
       0        0        1
0      0      NaN      NaN
1   1680      0.0   4999.0
2   5000   1680.0   7501.0
3  14999   5000.0  10000.0
4  17000  14999.0  16777.0
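The original frame is not shown in full; a minimal reproduction with data matching the shifted values above (the last value of column 1 is an assumption, since it is cut off in the output):

```python
import pandas as pd

# Hypothetical data consistent with the shifted values shown above;
# 20000 in column 1 is an assumed placeholder for the truncated last value
df = pd.DataFrame({0: [0, 1680, 5000, 14999, 17000],
                   1: [4999, 7501, 10000, 16777, 20000]})

# Keep rows where column 0 lies between the previous row's column 0 and column 1;
# the first row compares against NaN bounds and is dropped
result = df[0].loc[df[0].between(df[0].shift(), df[1].shift())]
print(result.tolist())  # [1680, 5000]
```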
Compare rows and remove previous row in pandas
You can groupby and mask the DataFrame on two conditions, using (1) .shift() to compare with the previous 'username' and (2) .diff() to handle the difference in 'amount':
#import packages
import pandas as pd
import numpy as np
#create the df
d = {'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
'amount': [25,25,26,40,41],
'verified': ['no','yes','yes','yes','yes']}
df = pd.DataFrame.from_dict(d)
#mask the df on two conditions
df[((df['username'].shift() == df['username']) & #keep if above user is the same
(df.groupby('username')['amount'].diff() <= 1))] #keep if difference is less than or equal to 1
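Running the snippet end to end, only the last row satisfies both conditions; inverting the mask with ~ keeps everything else instead (a sketch, not part of the original answer):

```python
import pandas as pd

d = {'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
     'amount': [25, 25, 26, 40, 41],
     'verified': ['no', 'yes', 'yes', 'yes', 'yes']}
df = pd.DataFrame.from_dict(d)

# condition 1: previous row has the same username (row order, not per group)
# condition 2: per-user difference in amount is at most 1
mask = ((df['username'].shift() == df['username']) &
        (df.groupby('username')['amount'].diff() <= 1))

kept = df[mask]      # rows matching both conditions
dropped = df[~mask]  # the complement
print(kept.index.tolist())  # [4]
```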
Python - compare previous row value and fill upwards when max is reached
groupby the increasing stretches and transform with the last value:
df['Column2'] = (df.groupby(df['Column2'].diff().lt(0).cumsum())['Column2']
.transform('last')
)
output:
   Column1  Column2
0        1        5
1        2        5
2        3        5
3        4        4
4        5        4
5        6        5
6        7        5
7        8        5
The intermediate result used to define the groups:
df['Column2'].diff().lt(0).cumsum()
0 0
1 0
2 0
3 1
4 1
5 2
6 2
7 2
Name: Column2, dtype: int64
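A self-contained reproduction, assuming a hypothetical input Column2 that rises within each stretch (the original input is not shown):

```python
import pandas as pd

# Hypothetical increasing stretches: [3, 4, 5], [2, 4], [1, 3, 5]
df = pd.DataFrame({'Column1': range(1, 9),
                   'Column2': [3, 4, 5, 2, 4, 1, 3, 5]})

# a new group starts whenever the value drops below the previous one
groups = df['Column2'].diff().lt(0).cumsum()
print(groups.tolist())  # [0, 0, 0, 1, 1, 2, 2, 2]

# fill every stretch with its last (i.e. maximum) value
df['Column2'] = df.groupby(groups)['Column2'].transform('last')
print(df['Column2'].tolist())  # [5, 5, 5, 4, 4, 5, 5, 5]
```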
Comparing previous row values in Pandas DataFrame with Condition
The expected output is not fully clear. If you want to have False where the condition of equality on col3 is unmet:
df['col2_alternative1'] = ( df['col1'].gt(df['col1'].mul(2).shift())
& df['col3'].eq(df['col3'].shift()) )
If you want to have <NA> (boolean NA) there instead:
df['col2_alternative2'] = (df['col1'].gt(df['col1'].mul(2).shift())
.mask(df['col3'].ne(df['col3'].shift()))
.astype('boolean')
)
output:
    col1 col3   col2  col2_alternative1  col2_alternative2
0     10   AB  False              False               <NA>
1     30   AB   True               True               True
2     11   AB  False              False              False
3     24   AB   True               True               True
4     22   AB  False              False              False
5     50   AB   True               True               True
6     12   AB  False              False              False
7     10   AB  False              False              False
8     30   AB   True               True               True
9     31   AB  False              False              False
10    32   AB  False              False              False
11    33   AB  False              False              False
12    20   AB  False              False              False
13    41   AC   True              False               <NA>
14    44   AC  False              False              False
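A runnable sketch with the input reconstructed from the output table (col1 and col3 are read off the table; col2 is the questioner's expected column and is omitted):

```python
import pandas as pd

# input reconstructed from the output table above
df = pd.DataFrame({
    'col1': [10, 30, 11, 24, 22, 50, 12, 10, 30, 31, 32, 33, 20, 41, 44],
    'col3': ['AB'] * 13 + ['AC'] * 2,
})

# True where col1 is more than double the previous value
# AND col3 is unchanged; False everywhere else
alt1 = (df['col1'].gt(df['col1'].mul(2).shift())
        & df['col3'].eq(df['col3'].shift()))

# same comparison, but <NA> wherever col3 changes (including the first row)
alt2 = (df['col1'].gt(df['col1'].mul(2).shift())
        .mask(df['col3'].ne(df['col3'].shift()))
        .astype('boolean'))

print(alt1.tolist())
```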
Comparing value with previous row in Pandas DataFrame
Compare shifted values with Series.gt and Series.shift; the last missing value is replaced by -1 so it becomes True, which works if all values are positive:
df['match'] = df['col1'].gt(df['col1'].shift(-1, fill_value=-1))
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3   True
6     2  False
7     2   True
If you need the last value set to True for any DataFrame:
df['match'] = df['col1'].gt(df['col1'].shift(-1))
df.loc[df.index[-1], 'match'] = True
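Both variants can be checked end to end on the sample data (the -1 fill only works because all values are positive):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# compare each value with the NEXT row; the trailing NaN becomes -1,
# so the last row is True whenever the data is positive
df['match'] = df['col1'].gt(df['col1'].shift(-1, fill_value=-1))
print(df['match'].tolist())
# [False, False, True, False, False, True, False, True]

# generic variant: compute normally, then force the last row to True
df['match2'] = df['col1'].gt(df['col1'].shift(-1))
df.loc[df.index[-1], 'match2'] = True
```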
How to compare values from previous row for % difference
You could add a column (per transport mode) holding the value of your condition, measuring the drop relative to the previous row:
prev = df['Bus'].shift(1)
df['hasDecreasedMarkedly'] = (prev - df['Bus']) / prev >= 0.05
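As a sketch with hypothetical ridership numbers (the 'Bus' values are assumptions; the condition flags a drop of at least 5% relative to the previous row):

```python
import pandas as pd

# hypothetical daily counts for the 'Bus' column
df = pd.DataFrame({'Bus': [100, 96, 95, 100, 80]})

# (previous - current) / previous >= 0.05 flags a marked decrease;
# the first row compares against NaN and stays False
prev = df['Bus'].shift(1)
df['hasDecreasedMarkedly'] = (prev - df['Bus']) / prev >= 0.05

print(df['hasDecreasedMarkedly'].tolist())
# [False, False, False, False, True]
```

Equivalently, df['Bus'].pct_change() <= -0.05 expresses the same threshold.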
Comparing previous row values of every column in a dataframe
Use:
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'B', 'B', 'B', 'B'], 'counter': [1, 2, 3, 1, 2, 3, 4], 'valueA': [10, 10, 5, 1, 1, 2, 3], 'valueB': [1, 1, 1, 2, 3, 4, 4]})
print (df)
c = ['valueA','valueB']
df[c] = df[c].ne(df[c].groupby(df['ID']).shift().bfill()).astype(int)
print (df)
  ID  counter  valueA  valueB
0  A        1       0       0
1  A        2       0       0
2  A        3       1       0
3  B        1       0       0
4  B        2       0       1
5  B        3       1       1
6  B        4       1       0
For a counter per group, chain cumsum within each group (note the output is different):
df[c] = df[c].ne(df[c].groupby(df['ID']).shift().bfill()).groupby(df['ID']).cumsum()
print (df)
  ID  counter  valueA  valueB
0  A        1       0       0
1  A        2       0       0
2  A        3       1       0
3  B        1       0       0
4  B        2       0       1
5  B        3       1       2
6  B        4       2       2
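Both variants side by side, computed without overwriting the source columns (a sketch of the same technique; `changed`, `flags`, and `counts` are names introduced here):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'counter': [1, 2, 3, 1, 2, 3, 4],
                   'valueA': [10, 10, 5, 1, 1, 2, 3],
                   'valueB': [1, 1, 1, 2, 3, 4, 4]})
c = ['valueA', 'valueB']

# True where a value differs from the previous row within the same ID;
# bfill fills the NaN each group's shift leaves in its first row
changed = df[c].ne(df[c].groupby(df['ID']).shift().bfill())

flags = changed.astype(int)                  # 0/1 change flags
counts = flags.groupby(df['ID']).cumsum()    # running change count per ID

print(flags['valueA'].tolist())   # [0, 0, 1, 0, 0, 1, 1]
print(counts['valueA'].tolist())  # [0, 0, 1, 0, 0, 1, 2]
```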
Note: this requires a recent pandas version.