Calculate Difference Between Values in Consecutive Rows by Group

Calculate difference between values in consecutive rows by group

The package data.table can do this fairly quickly, using the shift function.

require(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
#setDT(df) #if df is already a data frame

df[ , diff := value - shift(value), by = group]
# group value diff
#1: 1 10 NA
#2: 1 20 10
#3: 1 25 5
#4: 2 5 NA
#5: 2 10 5
#6: 2 15 5
setDF(df) #if you want to convert back to old data.frame syntax

Or using the lag function in dplyr

df %>%
group_by(group) %>%
mutate(Diff = value - lag(value))
# group value Diff
# <int> <int> <int>
# 1 1 10 NA
# 2 1 20 10
# 3 1 25 5
# 4 2 5 NA
# 5 2 10 5
# 6 2 15 5

For alternatives pre-data.table::shift and pre-dplyr::lag, see edits.

Calculating the difference between consecutive rows by group using dplyr?

Like this:

dat %>% 
group_by(id) %>%
mutate(time.difference = time - lag(time))

pandas groupby dataframes, calculate diffs between consecutive rows

You can use groupby

s2= df.groupby(['cycleID'])['mean'].diff()
s2.dropna(inplace=True)

output

1   -8.453876e-12
3 -1.486037e-11
5 2.482933e-12
7 -3.388330e-12
8 3.000000e-12


UPDATE

d = [[1, 1.5020712104685252e-11],
[1, 6.56683605063102e-12],
[2, 1.3993315187144084e-11],
[2, -8.670502467042485e-13],
[3, 7.0270625256163566e-12],
[3, 9.509995221868016e-12],
[4, 1.2901435995915644e-11],
[4, 9.513106448422182e-12]]

df = pd.DataFrame(d, columns=['cycleID', 'mean'])



df2 = df.groupby(['cycleID']).diff().dropna().rename(columns={'mean': 'difference'})
df2['mean'] = df['mean'].iloc[df2.index]
       difference    mean
1 -8.453876e-12 6.566836e-12
3 -1.486037e-11 -8.670502e-13
5 2.482933e-12 9.509995e-12
7 -3.388330e-12 9.513106e-12

difference between rows within groups pandas

See if this helps:

#replaces any value that contains a string value, with a 0
df['col2'] = pd.to_numeric(df.col2, errors='coerce').fillna(0)
#sorts the column in ascending first and calculates the difference
df['diff']=df.sort_values(['col1','col2'],ascending=[1,1]).groupby('col1').diff()
#display the dataframe after sorting col1 in asc and col2 in desc
df.sort_values(['col1','col2'],ascending=[1,0])

Out:

Sample Image

computing datetime difference among consecutive rows in groupby DataFrame

Use GroupBy.diff and divide by a 1 month timedelta.

df['months'] = df.groupby('name')['date'].diff().div(pd.Timedelta(days=30.44), fill_value=0).round().astype(int)

output

    name       date  months
0 Mark 2018-01-01 0
1 Anne 2018-01-01 0
2 Anne 2018-02-01 1
3 Anne 2018-04-01 2
4 Anne 2018-09-01 5
5 Anne 2019-01-01 4
6 John 2018-02-01 0
7 John 2018-06-01 4
8 John 2019-02-01 8
9 Ethan 2018-03-01 0

Is there a function that will allow to get the difference between rows of the same type?

You can use diff in ave.

df$diff <- ave(df$y, df$x, FUN=function(z) c(diff(z), NA))

df
# x y diff
#1 Jimmy Page 1 1
#2 Jimmy Page 2 1
#3 Jimmy Page 3 1
#4 Jimmy Page 4 NA
#5 John Smith 5 2
#6 John Smith 7 82
#7 John Smith 89 NA
#8 Joe Root 12 22
#9 Joe Root 34 33
#10 Joe Root 67 28
#11 Joe Root 95 9579
#12 Joe Root 9674 NA

Difference between consecutive row groups using data table in R

We can get the diff of 'weight' grouped by by 'Game', multiply by -1 and concatenate both the values.

dt_in[, want := {v1 <- diff(weight); list(c(-v1, v1))} , by = Game]

dt_in
# Game ID weight want
#1: 1 1 150 30
#2: 1 2 120 -30
#3: 2 3 151 -9
#4: 2 4 160 9
#5: 3 5 190 20
#6: 3 6 170 -20
#7: 4 7 170 0
#8: 4 8 170 0

Or a compact option by @Frank

dt_in[, want := c(-1,1)*diff(weight), by=Game]


Related Topics



Leave a reply



Submit