Pandas Equivalent of Oracle Lead/Lag Function

Pandas equivalent of Oracle Lead/Lag function

Group by the partition column and call shift() on the grouped column. shift(1) corresponds to LAG and shift(-1) to LEAD:

In [15]: df['Data_lagged'] = df.groupby(['Group'])['Data'].shift(1)

In [16]: df
Out[16]:
               Date Group  Data  Data_lagged
2014-05-14 09:10:00     A     1          NaN
2014-05-14 09:20:00     A     2            1
2014-05-14 09:30:00     A     3            2
2014-05-14 09:40:00     A     4            3
2014-05-14 09:50:00     A     5            4
2014-05-14 10:00:00     B     1          NaN
2014-05-14 10:10:00     B     2            1
2014-05-14 10:20:00     B     3            2
2014-05-14 10:30:00     B     4            3

[9 rows x 4 columns]

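To obtain the effect of ORDER BY Date ASC, sort the DataFrame before grouping. The result is aligned back to df by index on assignment, so the original row order of df is preserved:

df['Data_lagged'] = (df.sort_values(by=['Date'], ascending=True)
                       .groupby(['Group'])['Data'].shift(1))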
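
As a rough end-to-end sketch of the above, here is LAG and LEAD side by side on a small frame that is deliberately out of Date order. The sample values and the Data_lead column name are made up for illustration; only Date, Group, Data and Data_lagged come from the answer.

```python
import pandas as pd

# Hypothetical sample data, intentionally not sorted by Date.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-05-14 09:30", "2014-05-14 09:10", "2014-05-14 09:20",
        "2014-05-14 10:10", "2014-05-14 10:00",
    ]),
    "Group": ["A", "A", "A", "B", "B"],
    "Data": [3, 1, 2, 2, 1],
})

# Sort once, then group: this plays the role of
# OVER (PARTITION BY Group ORDER BY Date ASC).
by_group = df.sort_values("Date").groupby("Group")["Data"]

# LAG(Data, 1): value from the previous row within the group.
df["Data_lagged"] = by_group.shift(1)
# LEAD(Data, 1): value from the next row within the group.
df["Data_lead"] = by_group.shift(-1)

# Assignment aligns on the index, so df keeps its original row order.
print(df.sort_values(["Group", "Date"]))
```

The first row of each group has no predecessor and the last has no successor, so those cells come out as NaN, which is also why the shifted columns become float.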

Groupby and lag all columns of a dataframe?

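If I understand correctly, you can group on the index level with level="grp" and shift every column by -1. Note that shift(-1) pulls values from the next row in each group (a lead in SQL terms); use shift(1) if you want the previous row instead: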

>>> shifted = df.groupby(level="grp").shift(-1)
>>> df.join(shifted.rename(columns=lambda x: x+"_lag"))
                col1 col2  col1_lag col2_lag
time       grp
2015-11-20 A       1    a         2        b
2015-11-21 A       2    b         3        c
2015-11-22 A       3    c       NaN      NaN
2015-11-23 B       1    a         2        b
2015-11-24 B       2    b         3        c
2015-11-25 B       3    c       NaN      NaN
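
For anyone who wants to reproduce this, a minimal sketch that rebuilds a frame with the same (time, grp) MultiIndex and attaches both directions at once. The _lead suffix and the use of pd.concat instead of join are my own choices here, not part of the answer.

```python
import pandas as pd

# Rebuild a frame shaped like the one in the output above.
index = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(["2015-11-20", "2015-11-21", "2015-11-22",
                        "2015-11-23", "2015-11-24", "2015-11-25"]),
        ["A", "A", "A", "B", "B", "B"],
    ],
    names=["time", "grp"],
)
df = pd.DataFrame(
    {"col1": [1, 2, 3, 1, 2, 3], "col2": list("abcabc")},
    index=index,
)

grouped = df.groupby(level="grp")

# shift(1) looks back one row per group (lag);
# shift(-1) looks ahead one row per group (lead).
lagged = grouped.shift(1).add_suffix("_lag")
led = grouped.shift(-1).add_suffix("_lead")

# All three frames share the same index, so a column-wise concat lines up.
result = pd.concat([df, lagged, led], axis=1)
print(result)
```

Because the shift is done per group, the first row of B does not pick up the last value of A.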

Is there a similar pandas/numpy function to group_by lead/lag in dplyr with ifelse statements?

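Use transform with idxmax. For each contract, find the first row where the balance is zero and the first row where the maturity date equals the report date, then flag the rows of contracts where those two rows coincide: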

cno = example_data['contract_no']
ob = example_data['outstanding_balance']
md = example_data['maturity_date']
drc = example_data['date_report_created']

i = ob.eq(0).groupby(cno).transform('idxmax')
j = md.eq(drc).groupby(cno).transform('idxmax')

i.eq(j).astype('int8')  # Series.view is deprecated in recent pandas; astype gives the same int8 result

0    1
1    1
2    0
3    0
4    0
5    0
dtype: int8
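
To make the idea easier to follow, here is a small self-contained sketch. The rows in example_data are invented (the question's data is not reproduced here), so the resulting flags differ from the output above; only the column names are taken from the answer.

```python
import pandas as pd

# Hypothetical data: contract 1 is paid off on its maturity date,
# contract 2 is paid off one report after maturity.
example_data = pd.DataFrame({
    "contract_no": [1, 1, 1, 2, 2, 2],
    "outstanding_balance": [100, 0, 0, 50, 25, 0],
    "maturity_date": pd.to_datetime(
        ["2020-03-31"] * 3 + ["2020-06-30"] * 3),
    "date_report_created": pd.to_datetime(
        ["2020-02-29", "2020-03-31", "2020-04-30",
         "2020-05-31", "2020-06-30", "2020-07-31"]),
})

cno = example_data["contract_no"]

# idxmax on a boolean Series gives the index label of the first True,
# and transform broadcasts that label to every row of the group.
first_zero = (example_data["outstanding_balance"].eq(0)
              .groupby(cno).transform("idxmax"))
first_mature = (example_data["maturity_date"]
                .eq(example_data["date_report_created"])
                .groupby(cno).transform("idxmax"))

# 1 where both events fall on the same row of the contract, else 0.
flag = first_zero.eq(first_mature).astype("int8")
print(flag.tolist())
```

One thing to watch: idxmax looks for the first maximum, so a contract where a condition is never True should be checked separately (for example with transform('any')) before trusting the flag.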

Is there an equivalent of SQL GROUP BY ROLLUP in Python pandas?

See the answer to "Pandas Pivot tables row subtotals".

It uses pivot_table() with margins=True, which adds an "All" row and an "All" column holding the totals.

The pivot table is then reshaped with stack() so the subtotals appear as ordinary rows.

It is not as slick as GROUP BY ROLLUP, but it works.
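
A minimal sketch of that approach, assuming a made-up sales table (the column names region, product and amount are hypothetical, not from the linked answer):

```python
import pandas as pd

# Hypothetical data to aggregate.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["x", "y", "x", "y"],
    "amount": [10, 20, 30, 40],
})

# margins=True appends an "All" row and an "All" column with the totals.
table = sales.pivot_table(
    index="region", columns="product",
    values="amount", aggfunc="sum", margins=True,
)

# stack() moves the product columns into the index, producing one row per
# (region, product) pair, with "All" entries acting as the subtotals.
rollup = table.stack()
print(rollup)
```

Unlike SQL ROLLUP, this also yields per-product totals across regions (the ("All", "x") style rows), so the result is closer to CUBE; drop those rows if you only want the ROLLUP hierarchy.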


