Comparing previous row values in Pandas DataFrame
You need eq combined with shift:
df['match'] = df.col1.eq(df.col1.shift())
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
Alternatively, use == instead of eq, but it is a bit slower on a large DataFrame:
df['match'] = df.col1 == df.col1.shift()
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3  False
6     2  False
7     2   True
Timings:
import pandas as pd
data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2]}
df = pd.DataFrame(data, columns=['col1'])
# scale up for timing
df = pd.concat([df]*10000).reset_index(drop=True)
# [80000 rows x 1 columns]
df['match'] = df.col1 == df.col1.shift()
df['match1'] = df.col1.eq(df.col1.shift())
print (df)
In [208]: %timeit df.col1.eq(df.col1.shift())
The slowest run took 4.83 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 933 µs per loop
In [209]: %timeit df.col1 == df.col1.shift()
1000 loops, best of 3: 1 ms per loop
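The snippets above only construct the DataFrame in the timing section; a minimal self-contained version of the eq/shift comparison, using the same sample data:

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# True where a value equals the one in the previous row;
# the first row compares against NaN and is therefore False
df['match'] = df['col1'].eq(df['col1'].shift())

print(df['match'].tolist())
# [False, False, True, False, False, False, False, True]
```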
Comparing previous row values in Pandas DataFrame in different column
Use shift and between to compare a row with the previous one:
>>> df[0].loc[df[0].between(df[0].shift(), df[1].shift())]
1 1680
2 5000
Name: 0, dtype: int64
Details of shift:
>>> pd.concat([df[0], df.shift()], axis=1)
       0        0        1
0      0      NaN      NaN
1   1680      0.0   4999.0
2   5000   1680.0   7501.0
3  14999   5000.0  10000.0
4  17000  14999.0  16777.0
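The original frame is not shown in full; a minimal reproduction with data matching the shifted values above (the last value of column 1 is an assumption, since it is cut off in the output):

```python
import pandas as pd

# Hypothetical data consistent with the shifted values shown above;
# 20000 in column 1 is an assumed placeholder for the truncated last value
df = pd.DataFrame({0: [0, 1680, 5000, 14999, 17000],
                   1: [4999, 7501, 10000, 16777, 20000]})

# Keep rows where column 0 lies between the previous row's column 0 and column 1;
# the first row compares against NaN bounds and is dropped
result = df[0].loc[df[0].between(df[0].shift(), df[1].shift())]
print(result.tolist())  # [1680, 5000]
```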
Compare rows and remove previous row in pandas
You can groupby and mask the DataFrame on two conditions, using (1) .shift() to compare with the previous 'username' and (2) .diff() to handle the difference in 'amount':
#import packages
import pandas as pd
import numpy as np
#create the df
d = {'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
'amount': [25,25,26,40,41],
'verified': ['no','yes','yes','yes','yes']}
df = pd.DataFrame.from_dict(d)
#mask the df on two conditions
df[((df['username'].shift() == df['username']) & #keep if above user is the same
(df.groupby('username')['amount'].diff() <= 1))] #keep if difference is less than or equal to 1
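Running the snippet end to end, only the last row satisfies both conditions; inverting the mask with ~ keeps everything else instead (a sketch, not part of the original answer):

```python
import pandas as pd

d = {'username': ['amy123', 'bob1', 'amy123', 'bob1', 'bob1'],
     'amount': [25, 25, 26, 40, 41],
     'verified': ['no', 'yes', 'yes', 'yes', 'yes']}
df = pd.DataFrame.from_dict(d)

# condition 1: previous row has the same username (row order, not per group)
# condition 2: per-user difference in amount is at most 1
mask = ((df['username'].shift() == df['username']) &
        (df.groupby('username')['amount'].diff() <= 1))

kept = df[mask]      # rows matching both conditions
dropped = df[~mask]  # the complement
print(kept.index.tolist())  # [4]
```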
Python - compare previous row value and fill upwards when max is reached
groupby the increasing stretches and transform with the last value:
df['Column2'] = (df.groupby(df['Column2'].diff().lt(0).cumsum())['Column2']
.transform('last')
)
output:
   Column1  Column2
0        1        5
1        2        5
2        3        5
3        4        4
4        5        4
5        6        5
6        7        5
7        8        5
The intermediate result used to define the groups:
df['Column2'].diff().lt(0).cumsum()
0 0
1 0
2 0
3 1
4 1
5 2
6 2
7 2
Name: Column2, dtype: int64
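A self-contained reproduction, assuming a hypothetical input Column2 that rises within each stretch (the original input is not shown):

```python
import pandas as pd

# Hypothetical increasing stretches: [3, 4, 5], [2, 4], [1, 3, 5]
df = pd.DataFrame({'Column1': range(1, 9),
                   'Column2': [3, 4, 5, 2, 4, 1, 3, 5]})

# a new group starts whenever the value drops below the previous one
groups = df['Column2'].diff().lt(0).cumsum()
print(groups.tolist())  # [0, 0, 0, 1, 1, 2, 2, 2]

# fill every stretch with its last (i.e. maximum) value
df['Column2'] = df.groupby(groups)['Column2'].transform('last')
print(df['Column2'].tolist())  # [5, 5, 5, 4, 4, 5, 5, 5]
```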
Comparing previous row values in Pandas DataFrame with Condition
The expected output is not fully clear. If you want to have False where the condition of equality on col3 is unmet:
df['col2_alternative1'] = ( df['col1'].gt(df['col1'].mul(2).shift())
& df['col3'].eq(df['col3'].shift()) )
If you want to have <NA> (boolean NA) there instead:
df['col2_alternative2'] = (df['col1'].gt(df['col1'].mul(2).shift())
.mask(df['col3'].ne(df['col3'].shift()))
.astype('boolean')
)
output:
    col1 col3   col2  col2_alternative1  col2_alternative2
0     10   AB  False              False               <NA>
1     30   AB   True               True               True
2     11   AB  False              False              False
3     24   AB   True               True               True
4     22   AB  False              False              False
5     50   AB   True               True               True
6     12   AB  False              False              False
7     10   AB  False              False              False
8     30   AB   True               True               True
9     31   AB  False              False              False
10    32   AB  False              False              False
11    33   AB  False              False              False
12    20   AB  False              False              False
13    41   AC   True              False               <NA>
14    44   AC  False              False              False
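A runnable sketch with the input reconstructed from the output table (col1 and col3 are read off the table; col2 is the questioner's expected column and is omitted):

```python
import pandas as pd

# input reconstructed from the output table above
df = pd.DataFrame({
    'col1': [10, 30, 11, 24, 22, 50, 12, 10, 30, 31, 32, 33, 20, 41, 44],
    'col3': ['AB'] * 13 + ['AC'] * 2,
})

# True where col1 is more than double the previous value
# AND col3 is unchanged; False everywhere else
alt1 = (df['col1'].gt(df['col1'].mul(2).shift())
        & df['col3'].eq(df['col3'].shift()))

# same comparison, but <NA> wherever col3 changes (including the first row)
alt2 = (df['col1'].gt(df['col1'].mul(2).shift())
        .mask(df['col3'].ne(df['col3'].shift()))
        .astype('boolean'))

print(alt1.tolist())
```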
Comparing value with previous row in Pandas DataFrame
Compare shifted values with Series.gt and Series.shift; the last missing value is replaced by -1 so it becomes True, which works if all values are positive:
df['match'] = df['col1'].gt(df['col1'].shift(-1, fill_value=-1))
print (df)
   col1  match
0     1  False
1     3  False
2     3   True
3     1  False
4     2  False
5     3   True
6     2  False
7     2   True
If you need the last value set to True for any DataFrame:
df['match'] = df['col1'].gt(df['col1'].shift(-1))
df.loc[df.index[-1], 'match'] = True
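Both variants can be checked end to end on the sample data (the -1 fill only works because all values are positive):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 3, 3, 1, 2, 3, 2, 2]})

# compare each value with the NEXT row; the trailing NaN becomes -1,
# so the last row is True whenever the data is positive
df['match'] = df['col1'].gt(df['col1'].shift(-1, fill_value=-1))
print(df['match'].tolist())
# [False, False, True, False, False, True, False, True]

# generic variant: compute normally, then force the last row to True
df['match2'] = df['col1'].gt(df['col1'].shift(-1))
df.loc[df.index[-1], 'match2'] = True
```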
How to compare values from previous row for % difference
You could add a column (per transport mode) holding the value of your condition, measuring the drop relative to the previous row:
prev = df['Bus'].shift(1)
df['hasDecreasedMarkedly'] = (prev - df['Bus']) / prev >= 0.05
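As a sketch with hypothetical ridership numbers (the 'Bus' values are assumptions; the condition flags a drop of at least 5% relative to the previous row):

```python
import pandas as pd

# hypothetical daily counts for the 'Bus' column
df = pd.DataFrame({'Bus': [100, 96, 95, 100, 80]})

# (previous - current) / previous >= 0.05 flags a marked decrease;
# the first row compares against NaN and stays False
prev = df['Bus'].shift(1)
df['hasDecreasedMarkedly'] = (prev - df['Bus']) / prev >= 0.05

print(df['hasDecreasedMarkedly'].tolist())
# [False, False, False, False, True]
```

Equivalently, df['Bus'].pct_change() <= -0.05 expresses the same threshold.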
Comparing previous row values of every column in a dataframe
Use:
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'B', 'B', 'B', 'B'], 'counter': [1, 2, 3, 1, 2, 3, 4], 'valueA': [10, 10, 5, 1, 1, 2, 3], 'valueB': [1, 1, 1, 2, 3, 4, 4]})
print (df)
c = ['valueA','valueB']
df[c] = df[c].ne(df[c].groupby(df['ID']).shift().bfill()).astype(int)
print (df)
  ID  counter  valueA  valueB
0  A        1       0       0
1  A        2       0       0
2  A        3       1       0
3  B        1       0       0
4  B        2       0       1
5  B        3       1       1
6  B        4       1       0
For a counter per group, chain cumsum within each group (note the output is different):
df[c] = df[c].ne(df[c].groupby(df['ID']).shift().bfill()).groupby(df['ID']).cumsum()
print (df)
  ID  counter  valueA  valueB
0  A        1       0       0
1  A        2       0       0
2  A        3       1       0
3  B        1       0       0
4  B        2       0       1
5  B        3       1       2
6  B        4       2       2
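Both variants side by side, computed without overwriting the source columns (a sketch of the same technique; `changed`, `flags`, and `counts` are names introduced here):

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'counter': [1, 2, 3, 1, 2, 3, 4],
                   'valueA': [10, 10, 5, 1, 1, 2, 3],
                   'valueB': [1, 1, 1, 2, 3, 4, 4]})
c = ['valueA', 'valueB']

# True where a value differs from the previous row within the same ID;
# bfill fills the NaN each group's shift leaves in its first row
changed = df[c].ne(df[c].groupby(df['ID']).shift().bfill())

flags = changed.astype(int)                  # 0/1 change flags
counts = flags.groupby(df['ID']).cumsum()    # running change count per ID

print(flags['valueA'].tolist())   # [0, 0, 1, 0, 0, 1, 1]
print(counts['valueA'].tolist())  # [0, 0, 1, 0, 0, 1, 2]
```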
Note: this requires a recent pandas version.