Subtracting Values Across Grouped Data Frames in Pandas

Subtract rows in a grouped DataFrame

Try:

subtract = lambda x: x.iloc[0] - (x.iloc[1] if len(x) == 2 else 0)
out = df.groupby(['A', 'B'])['Value'].apply(subtract).reset_index()
print(out)

# Output:
    A   B  Value
0  A1  B1    5.0
1  A1  B2    5.0
2  A2  B1   -2.0
3  A2  B2    1.0
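
The snippet above does not show its input, so here is a minimal self-contained sketch. The data frame is hypothetical, built so that each (A, B) group has one or two rows and the result matches the output shown; the named function subtract is the same logic as the lambda.

```python
import pandas as pd

# Hypothetical input: each (A, B) group holds one or two rows.
df = pd.DataFrame({
    "A": ["A1", "A1", "A1", "A2", "A2", "A2"],
    "B": ["B1", "B1", "B2", "B1", "B1", "B2"],
    "Value": [10.0, 5.0, 5.0, 3.0, 5.0, 1.0],
})

def subtract(x):
    # First row minus second row; a single-row group is returned unchanged.
    return x.iloc[0] - (x.iloc[1] if len(x) == 2 else 0)

out = df.groupby(["A", "B"])["Value"].apply(subtract).reset_index()
print(out)
```

Note that groups with three or more rows also fall into the "else 0" branch, so only their first value is returned.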

How to subtract first and last values in grouped data for all columns in dataset using pandas

Try:

df1 = df[['ID','BDI', 'GAD', 'TSQ']].groupby('ID').agg('first')-df[['ID','BDI', 'GAD', 'TSQ']].groupby('ID').agg('last')
df_final = df1.merge(df[['ID','age']].groupby('ID').agg('first'), on='ID')

Output:

    BDI  GAD  TSQ  age
ID
1    18    2    1   22
2     6    3    4   35

A second option uses apply with a lambda to compute the first-minus-last part; merge the age column afterwards in the same way:

df[['ID','BDI', 'GAD', 'TSQ']].groupby('ID', as_index=False).apply(lambda x: x.groupby('ID').agg('first')-x.groupby('ID').agg('last'))
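
For a runnable version of the first-minus-last idea, the sketch below reuses a single groupby object and the first() and last() methods. The sample data is hypothetical (two visits per ID, with a constant age), chosen to reproduce the output shown above.

```python
import pandas as pd

# Hypothetical data: two rows per ID, age constant within an ID.
df = pd.DataFrame({
    "ID":  [1, 1, 2, 2],
    "age": [22, 22, 35, 35],
    "BDI": [20, 2, 10, 4],
    "GAD": [5, 3, 6, 3],
    "TSQ": [4, 3, 7, 3],
})

scores = ["BDI", "GAD", "TSQ"]
g = df.groupby("ID")

# First-row scores minus last-row scores, per ID.
delta = g[scores].first() - g[scores].last()

# Attach the first age of each ID; both objects are indexed by ID.
df_final = delta.join(g["age"].first())
print(df_final)
```

Building the groupby once avoids selecting and grouping the same columns twice.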

Subtract successive rows in a dataframe grouped by id in pandas (Python)

You can use DataFrameGroupBy.diff:

df['dif'] = df.groupby('id')['day'].diff(-1) * (-1)
print (df)
id day total_amount dif
0 1 2015-07-09 1000 105 days
1 1 2015-10-22 100 21 days
2 1 2015-11-12 200 15 days
3 1 2015-11-27 2392 19 days
4 1 2015-12-16 123 NaT
5 7 2015-07-09 200 0 days
6 7 2015-07-09 1000 49 days
7 7 2015-08-27 100018 90 days
8 7 2015-11-25 1000 NaT
9 8 2015-08-27 1000 102 days
10 8 2015-12-07 10000 42 days
11 8 2016-01-18 796 73 days
12 8 2016-03-31 10000 NaT
13 15 2015-09-10 1500 20 days
14 15 2015-09-30 1000 NaT

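Another solution uses apply with shift. Passing group_keys=False keeps the original row index, so the result aligns with df on assignment in recent pandas versions:

df['diff'] = df.groupby('id', group_keys=False)['day'].apply(lambda x: x.shift(-1) - x)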
print (df)
id day total_amount diff
0 1 2015-07-09 1000 105 days
1 1 2015-10-22 100 21 days
2 1 2015-11-12 200 15 days
3 1 2015-11-27 2392 19 days
4 1 2015-12-16 123 NaT
5 7 2015-07-09 200 0 days
6 7 2015-07-09 1000 49 days
7 7 2015-08-27 100018 90 days
8 7 2015-11-25 1000 NaT
9 8 2015-08-27 1000 102 days
10 8 2015-12-07 10000 42 days
11 8 2016-01-18 796 73 days
12 8 2016-03-31 10000 NaT
13 15 2015-09-10 1500 20 days
14 15 2015-09-30 1000 NaT
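
As a quick check that both approaches compute the same thing, the following sketch uses a few of the rows shown above and compares diff(-1) * -1 with a grouped shift that avoids apply altogether. The variable names by_id, via_diff and via_shift are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 15, 15],
    "day": pd.to_datetime(["2015-07-09", "2015-10-22", "2015-11-12",
                           "2015-09-10", "2015-09-30"]),
})

by_id = df.groupby("id")["day"]

# Next day minus current day within each id, written two ways.
via_diff = by_id.diff(-1) * -1
via_shift = by_id.shift(-1) - df["day"]

print(via_diff)
print(via_diff.equals(via_shift))
```

The last row of every id has no successor, so it is NaT in both results.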

If you need the difference in hours as a number, divide the timedelta by one hour (the result is float, because the NaT rows become NaN):

import numpy as np

df['diff'] = df.groupby('id')['day'].diff(-1) * (-1) / np.timedelta64(1, 'h')
print (df)
id day total_amount diff
0 1 2015-07-09 1000 2520.0
1 1 2015-10-22 100 504.0
2 1 2015-11-12 200 360.0
3 1 2015-11-27 2392 456.0
4 1 2015-12-16 123 NaN
5 7 2015-07-09 200 0.0
6 7 2015-07-09 1000 1176.0
7 7 2015-08-27 100018 2160.0
8 7 2015-11-25 1000 NaN
9 8 2015-08-27 1000 2448.0
10 8 2015-12-07 10000 1008.0
11 8 2016-01-18 796 1752.0
12 8 2016-03-31 10000 NaN
13 15 2015-09-10 1500 480.0
14 15 2015-09-30 1000 NaN

The shift version converts in the same way:

df['diff'] = df.groupby('id', group_keys=False)['day'].apply(lambda x: x.shift(-1) - x) / np.timedelta64(1, 'h')
print (df)
id day total_amount diff
0 1 2015-07-09 1000 2520.0
1 1 2015-10-22 100 504.0
2 1 2015-11-12 200 360.0
3 1 2015-11-27 2392 456.0
4 1 2015-12-16 123 NaN
5 7 2015-07-09 200 0.0
6 7 2015-07-09 1000 1176.0
7 7 2015-08-27 100018 2160.0
8 7 2015-11-25 1000 NaN
9 8 2015-08-27 1000 2448.0
10 8 2015-12-07 10000 1008.0
11 8 2016-01-18 796 1752.0
12 8 2016-03-31 10000 NaN
13 15 2015-09-10 1500 480.0
14 15 2015-09-30 1000 NaN
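
The hour conversion can be tried in isolation. This sketch starts from a small hypothetical Series of timedeltas and shows two equivalent conversions, plus one possible way to get whole hours as integers despite the missing value, using the nullable Int64 dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical gaps between successive rows; the last one is missing.
gaps = pd.Series([pd.Timedelta(days=105), pd.Timedelta(days=21), pd.NaT])

# Two equivalent ways to express a timedelta in hours (float, NaN for NaT).
hours_a = gaps / np.timedelta64(1, "h")
hours_b = gaps.dt.total_seconds() / 3600

# Optional: whole hours as nullable integers, keeping the missing value.
whole_hours = hours_a.round().astype("Int64")
print(hours_a)
print(whole_hours)
```

A plain astype(int) would fail here because NaN cannot be represented in a NumPy integer column.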

How to use pandas to subtract rows of a column based upon data by group

Let's try two steps:

s = df.sort_values(['ID','start_yr']).groupby(['ID'])['amt'].agg(['first','last'])
output = s['last'] - s['first']

Output:

ID
a 20
b 40
dtype: int64
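
A self-contained sketch of the two steps follows. The input frame is hypothetical and deliberately unsorted, to show why sort_values comes first: first and last are taken in row order within each group.

```python
import pandas as pd

# Hypothetical data, not sorted by year.
df = pd.DataFrame({
    "ID": ["a", "b", "a", "b"],
    "start_yr": [2012, 2011, 2010, 2013],
    "amt": [30, 10, 10, 50],
})

# Step 1: earliest and latest amount per ID (after sorting by year).
s = df.sort_values(["ID", "start_yr"]).groupby("ID")["amt"].agg(["first", "last"])

# Step 2: latest minus earliest.
output = s["last"] - s["first"]
print(output)
```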

DataFrame subtract group-wise means

If you use the transform method, e.g.,

means = df.groupby(group, axis=1).transform('mean')

then transform returns a DataFrame of the same shape as df, which makes it easy to subtract the means from df.

You can also pass a sequence, such as group=[1,1,1,2,2,3,3] to df.groupby instead of passing a column name. df.groupby(group, axis=1) will group the columns based on the sequence values. So, for example, to group according to the non-numeric part of each column name, you could use:

import datetime as DT
import numpy as np
import pandas as pd

np.random.seed(2016)
base = DT.date.today()
date_list = [base - DT.timedelta(days=x) for x in range(0, 10)]
df = pd.DataFrame(data=np.random.randint(1, 100, (10, 8)),
                  index=date_list,
                  columns=['a1', 'a2', 'b1', 'a3', 'b2', 'c1', 'c2', 'b3'])

# Group the columns by the non-numeric part of their names: 'a', 'b', 'c'.
group = df.columns.str.extract(r'(\D+)', expand=False)
means = df.groupby(group, axis=1).transform('mean')
result = df - means
print(means)

result is df minus these group means. Printing means shows that every column in a group carries the same per-row value:

            a1  a2  b1  a3  b2  c1  c2  b3
2016-05-18 29 29 53 29 53 23 23 53
2016-05-17 55 55 32 55 32 92 92 32
2016-05-16 59 59 53 59 53 50 50 53
2016-05-15 46 46 30 46 30 55 55 30
2016-05-14 56 56 28 56 28 28 28 28
2016-05-13 34 34 36 34 36 70 70 36
2016-05-12 39 39 64 39 64 48 48 64
2016-05-11 45 45 59 45 59 57 57 59
2016-05-10 55 55 30 55 30 37 37 30
2016-05-09 61 61 59 61 59 59 59 59
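
Grouping with axis=1 is deprecated in recent pandas releases (since 2.1). One possible workaround, sketched below on a small hypothetical frame, is to transpose, group the rows, and transpose back. The group labels are passed as a NumPy array so they are unambiguously treated as grouping values rather than column names.

```python
import pandas as pd

# Hypothetical two-row frame with the same column naming scheme.
df = pd.DataFrame(
    [[1, 3, 10, 5, 20, 7, 9, 30],
     [2, 4, 1, 6, 2, 0, 4, 3]],
    columns=["a1", "a2", "b1", "a3", "b2", "c1", "c2", "b3"],
)

# 'a', 'a', 'b', 'a', 'b', 'c', 'c', 'b'
group = df.columns.str.extract(r"(\D+)", expand=False).to_numpy()

# Transpose so the columns become rows, take group means, transpose back.
means = df.T.groupby(group).transform("mean").T
result = df - means
print(result)
```

In the first row the 'a' columns (1, 3, 5) have mean 3, so they become -2, 0 and 2 after subtraction.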

Subtract a different reference value for each group of rows in pandas

You can map the Nucleus column through a dict of reference values and then subtract with sub:

REF_H = 30
REF_C = 180
d = {'C': REF_C, 'H':REF_H}
df['Delta'] = df.Nucleus.map(d).sub(df['Isotropic Shift'])
print (df)
Atom Number Nucleus Isotropic Shift Delta
0 0 1 C 49.3721 130.6279
1 1 2 C 52.9650 127.0350
2 2 3 C 36.3443 143.6557
3 3 4 C 50.8163 129.1837
4 4 5 C 50.0493 129.9507
5 5 6 C 49.7985 130.2015
6 6 7 H 24.0772 5.9228
7 7 8 H 23.7986 6.2014
8 8 9 H 24.2922 5.7078
9 9 10 H 24.1632 5.8368
10 10 11 H 24.1572 5.8428
11 11 12 C 102.9401 77.0599
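
Here is a minimal runnable sketch of the same mapping, using three of the rows above plus one hypothetical row whose nucleus ("N") has no reference value, to show what happens to unmapped keys.

```python
import pandas as pd

REF = {"C": 180, "H": 30}

df = pd.DataFrame({
    "Nucleus": ["C", "H", "C", "N"],  # "N" is a made-up row with no reference
    "Isotropic Shift": [49.3721, 24.0772, 102.9401, 100.0],
})

# Look up each row's reference value, then subtract the measured shift.
df["Delta"] = df["Nucleus"].map(REF).sub(df["Isotropic Shift"])
print(df)
```

Keys missing from the dict map to NaN, so the Delta for that row is NaN as well rather than raising an error.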

Subtract a row from previous row which has value from previous group in DataFrame

If the values in column E identify unique groups, the steps are: use DataFrameGroupBy.diff to get the change in A within each user; replace the missing values (the first row of each user) with the original A using Series.fillna; keep only the first row of each run of consecutive equal E values using Series.where, with a mask that compares E against its shifted self; forward-fill the gaps with ffill; and finally cast to integers:

df['A1'] = (df.groupby('user')['A'].diff()
              .fillna(df['A'])
              .where(df['E'].ne(df['E'].shift()))
              .ffill()
              .astype(int))
print (df)
A E user A1
0 0 0 1 0
1 12 1 1 12
2 12 1 1 12
3 13 2 1 1
4 15 3 1 2
5 15 3 1 2
6 15 3 1 2
7 19 4 2 19
8 20 5 2 1
9 25 6 2 5
10 25 6 2 5
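
To make the chain easier to follow, this sketch rebuilds the input from the A, E and user columns printed above and runs the same steps one at a time. The intermediate names step and starts are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A":    [0, 12, 12, 13, 15, 15, 15, 19, 20, 25, 25],
    "E":    [0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 6],
    "user": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# Change in A from the previous row of the same user (NaN on a user's first row).
step = df.groupby("user")["A"].diff()

# A user's first row has no predecessor, so it keeps its own A.
step = step.fillna(df["A"])

# True on the first row of each run of equal consecutive E values.
starts = df["E"].ne(df["E"].shift())

# Keep the value at each run start, blank the rest, then carry it forward.
df["A1"] = step.where(starts).ffill().astype(int)
print(df)
```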

