Shift a column of lists in data.table by group

This has come up more than once, so I've gone ahead and added this feature. For the moment you'll have to use the development version, v1.9.7; see the installation instructions.

DT[, foo2 := shift(.(foo), type = "lead"), by = id]
#      foo id foo2
# 1: a,b,c  1  b,c
# 2:   b,c  1   NA
# 3:   a,b  2    a
# 4:     a  2   NA

Just wrap foo for each group in a list. Note that shift() then returns a list of lists, which works well with := as shown above. If you're not adding or updating a column in your data.table (which doesn't make much sense), then you'll have to extract the list element yourself.

DT[, .(foo2 = shift(.(foo), type = "lead")[[1L]]), by = id]
#    id foo2
# 1:  1  b,c
# 2:  1   NA
# 3:  2    a
# 4:  2   NA

shift() is designed to play nicely with data.table's := syntax, since it always returns as many rows as it was given.
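For anyone who wants to run this end to end, here is a minimal sketch. The DT below is a reconstruction from the printed output above, not the asker's actual data, and it assumes a data.table version whose shift() accepts list input (v1.9.7 or later, as noted above).

```r
library(data.table)

# Hypothetical input, rebuilt from the printed output above
DT <- data.table(
  foo = list(c("a", "b", "c"), c("b", "c"), c("a", "b"), "a"),
  id  = c(1L, 1L, 2L, 2L)
)

# Wrapping foo in .() makes shift() treat the whole list column as one
# "column" to shift; the result is a list of length one, which := then
# assigns as the new list column foo2
DT[, foo2 := shift(.(foo), type = "lead"), by = id]

print(DT)
```

The last row of each group has nothing to lead from, so its foo2 entry is filled with NA.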

R data.table: shifting rows of list type

You could use a manual shift, like the following.

x[, m := c(NA_real_, head(l, -1L))]

resulting in

   k   l   m
1: 1 4,5  NA
2: 2 4,5 4,5
3: 3 4,5 4,5
4: 4 4,5 4,5
5: 5 4,5 4,5

For a larger shift, you could roll your own function.

mshift <- function(var, n) c(rep(NA, n), head(var, -n))

Then use it to shift two places.

x[, m := mshift(l, 2)]

which gives, from the original data

   k   l   m
1: 1 4,5  NA
2: 2 4,5  NA
3: 3 4,5 4,5
4: 4 4,5 4,5
5: 5 4,5 4,5

Obviously, this function is very basic and only shifts to the right (down). If you wanted to, you could adjust the function to shift in the opposite direction and add some class checking/matching as well.
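As a rough sketch of that adjustment, the helper below shifts in either direction. The name mshift2 and the sign convention (positive n shifts down, negative n shifts up) are my own assumptions, not part of the original answer, and there is still no class checking.

```r
# Hypothetical two-way variant of mshift():
#   n > 0 shifts down (lag), n < 0 shifts up (lead), n == 0 is a no-op
mshift2 <- function(var, n) {
  if (n == 0L) return(var)
  k <- min(abs(n), length(var))  # never pad more than the length of var
  pad <- rep(NA, k)
  if (n > 0L) {
    c(pad, head(var, -k))        # drop the last k elements, pad at the top
  } else {
    c(tail(var, -k), pad)        # drop the first k elements, pad at the bottom
  }
}

mshift2(1:5, 2)
mshift2(1:5, -1)
mshift2(list(4:5, 4:5, 4:5), 1)  # also works on a list column
```

Because c() returns a list whenever one of its arguments is a list, the same function handles both atomic and list columns.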

Shift multiple columns, each with a different offset

You can use Map to apply a different n to each column:

cols <- setdiff(names(DT), "date")
DT[, (cols) := Map(shift, .SD, seq_along(.SD) - 1L, fill = 0), .SDcols = cols]

> DT
   date a b c d e f
1: 2008 1 0 0 0 0 0
2: 2008 3 5 0 0 0 0
3: 2008 2 6 3 0 0 0
4: 2009 5 8 2 6 0 0
5: 2009 3 5 3 1 9 0
6: 2010 2 3 3 4 5 8
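Here is the same idea as a small self-contained sketch. The three-column table is made up for illustration (the original input isn't shown above); Map() pairs the i-th column of .SD with the offset i - 1, so a stays put, b moves down one row and c moves down two.

```r
library(data.table)

# Hypothetical toy input, not the data from the question
DT <- data.table(date = c(2008, 2008, 2009), a = 1:3, b = 4:6, c = 7:9)

cols <- setdiff(names(DT), "date")

# seq_along(.SD) - 1L gives the offsets 0, 1, 2; fill = 0 is recycled
# across all three calls to shift()
DT[, (cols) := Map(shift, .SD, seq_along(.SD) - 1L, fill = 0), .SDcols = cols]

print(DT)
# a is unchanged (1, 2, 3), b becomes 0, 4, 5 and c becomes 0, 0, 7
```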

Use pandas.shift() within a group

pandas groupby objects have a shift method (DataFrameGroupBy.shift, or SeriesGroupBy.shift once a single column is selected), which shifts the values within each group by n periods, just like the regular DataFrame shift method:

df['prev_value'] = df.groupby('object')['value'].shift()

For the following example dataframe:

print(df)

   object  period  value
0       1       1     24
1       1       2     67
2       1       4     89
3       2       4      5
4       2      23     23

The result would be:

   object  period  value  prev_value
0       1       1     24         NaN
1       1       2     67        24.0
2       1       4     89        67.0
3       2       4      5         NaN
4       2      23     23         5.0

data.table: from long column to list by group

We can wrap the values of each group in a list:

dd <- d[, .(V2 = list(V2)), V1]
head(dd)
#   V1              V2
#1:  c     Z,W,K,G,Q,A
#2:  a       V,X,T,D,K
#3:  w           Z,I,N
#4:  u N,Y,H,U,M,Z,...
#5:  d         G,M,D,B
#6:  q O,Z,K,V,I,X,...

str(dd)
#Classes ‘data.table’ and 'data.frame': 25 obs. of 2 variables:
# $ V1: chr "c" "a" "w" "u" ...
# $ V2:List of 25
# ..$ : chr "Z" "W" "K" "G" ...
# ..$ : chr "V" "X" "T" "D" ...
# ..$ : chr "Z" "I" "N"
# ..$ : chr "N" "Y" "H" "U" ...
# ..$ : chr "G" "M" "D" "B"
# ..
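A minimal sketch with made-up data, in case d isn't at hand. It also shows the reverse step (back to long format), which is my addition rather than part of the original answer.

```r
library(data.table)

# Hypothetical long-format input
d <- data.table(
  V1 = c("a", "a", "b", "c", "c", "c"),
  V2 = c("V", "X", "Z", "W", "K", "G")
)

# One row per group; V2 becomes a list column holding that group's values
dd <- d[, .(V2 = list(V2)), by = V1]

# Going back: unlist each group's list element into rows again
long <- dd[, .(V2 = unlist(V2)), by = V1]

print(dd)
print(long)
```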

Shifting groups of data in a data frame by varying amounts

Join the two dataframes and shift each var by the corresponding shiftnum value.

library(dplyr)

df %>%
  left_join(shift.values, by = 'group') %>%
  group_by(group) %>%
  mutate(var.shift = lag(var, first(shiftnum))) %>%
  ungroup()

#   date       group   var shiftnum var.shift
#   <date>     <chr> <dbl>    <dbl>     <dbl>
# 1 2021-01-01 a     3.66         1    NA
# 2 2021-01-02 a     5.06         1     3.66
# 3 2021-01-03 a     2.07         1     5.06
# 4 2021-01-04 a     7.12         1     2.07
# 5 2021-01-05 a     0.833        1     7.12
# 6 2021-01-01 b     2.88         3    NA
# 7 2021-01-02 b     4.39         3    NA
# 8 2021-01-03 b     6.58         3    NA
# 9 2021-01-04 b     1.47         3     2.88
#10 2021-01-05 b     2.66         3     4.39
#11 2021-01-01 c     2.70         2    NA
#12 2021-01-02 c     2.47         2    NA
#13 2021-01-03 c     2.35         2     2.70
#14 2021-01-04 c     2.95         2     2.47
#15 2021-01-05 c     3.96         2     2.35
#16 2021-01-01 d     4.58         3    NA
#17 2021-01-02 d     0.182        3    NA
#18 2021-01-03 d     1.39         3    NA
#19 2021-01-04 d     1.93         3     4.58
#20 2021-01-05 d     1.73         3     0.182

Remove the shiftnum column from the output if not needed by adding %>% select(-shiftnum).
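For comparison, the same join-then-shift idea can be sketched in data.table. This is a swapped-in alternative to the dplyr pipeline above, not part of the original answer, and the two small tables are hypothetical stand-ins for df and shift.values.

```r
library(data.table)

# Hypothetical stand-ins for df and shift.values
df <- data.table(
  group = rep(c("a", "b"), each = 4),
  var   = c(1, 2, 3, 4, 5, 6, 7, 8)
)
shift.values <- data.table(group = c("a", "b"), shiftnum = c(1L, 3L))

# Update join: copy each group's shiftnum onto the rows of df
df[shift.values, on = "group", shiftnum := i.shiftnum]

# Lag var within each group by that group's own offset
df[, var.shift := shift(var, shiftnum[1L]), by = group]

print(df)
```

shiftnum[1L] plays the role of first(shiftnum) in the dplyr version: the offset is constant within a group, so the first value is enough.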

Different n (offset) for shift within each group

Here's an approach with a rolling join.

First, we subset the data on Action == "A" and on Action == "X" and join the two subsets to each other. We use on = c("Case","Time") to match on Case exactly and then roll on Time; in data.table, you can only roll on the last join condition. We then use roll = Inf to roll forward. In a data.table join, the join column in the result takes its values from the table inside the brackets, so the "A" row's own Time would be lost; that is why we create an extra copy of it called InitialTime.

The rolling join matches every "X" row that comes after an "A" row, not only the first one, so we then keep just the row with the minimum Time for each combination of Case and InitialTime.

library(data.table)
data[Action == "A", .(Case, Action, Time, InitialTime = Time)][
  data[Action == "X", ], on = c("Case", "Time"), roll = Inf][
  , .SD[which.min(Time), .(XTime = Time)], by = .(Case, InitialTime)]
   Case         InitialTime               XTime
1:    1 2020-01-23 12:55:00 2020-04-16 17:50:00
2:    2 2020-01-25 23:04:00 2020-02-12 17:50:00
3:    3 2020-01-26 03:23:00 2020-02-18 21:27:00
4:    3 2020-03-15 03:23:00 2020-03-18 21:27:00
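To see what the rolling join does in isolation, here is a toy sketch with plain numbers instead of timestamps. The tables starts and events are hypothetical and only stand in for the "A" and "X" subsets.

```r
library(data.table)

# Hypothetical "A" rows, with a copy of Time kept as InitialTime
starts <- data.table(Case = c(1L, 1L), Time = c(1, 10), InitialTime = c(1, 10))

# Hypothetical "X" rows
events <- data.table(Case = c(1L, 1L), Time = c(5, 12))

# For each row of events, roll = Inf picks the most recent starts row
# (same Case, Time at or before the event) and carries it forward
res <- starts[events, on = c("Case", "Time"), roll = Inf]

print(res)
# Time in the result holds the event times (5 and 12);
# InitialTime holds the matched start times (1 and 10)
```

Without the InitialTime copy, the start times would not appear in the result at all, because the Time column is filled from events.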

Sample Data

data <- structure(list(Case = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L), Action = structure(c(1L, 2L, 3L, 4L, 1L, 4L, 4L, 1L,
3L, 4L, 1L, 4L), .Label = c("A", "B", "C", "X"), class = "factor"),
Time = structure(c(1579802100, 1580026980, 1580203380, 1587073800,
1580011440, 1581547800, 1581634200, 1580026980, 1582078980,
1582079220, 1584256980, 1584581220), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, -12L), class = c("data.table",
"data.frame"))

