Shift a column of lists in data.table by group
This has come up more than once, so I've gone ahead and added this feature. At the moment you'll have to use the development version, v1.9.7; see the installation instructions.
DT[, foo2 := shift(.(foo), type = "lead"),
by = id]
# foo id foo2
# 1: a,b,c 1 b,c
# 2: b,c 1 NA
# 3: a,b 2 a
# 4: a 2 NA
Just wrap foo in a list, .(foo), for each group. Note that shift() then returns a list of lists, which works well with := as shown above. If you're not adding to or updating your data.table (which doesn't make much sense), then you'll have to extract the list element.
DT[, .(foo2 = shift(.(foo), type="lead")[[1L]]),
by = id]
# id foo2
# 1: 1 b,c
# 2: 1 NA
# 3: 2 a
# 4: 2 NA
shift() is designed to play nicely with data.table's := syntax, since it always returns the same number of rows as its input.
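For anyone who wants to run the example above, here is a minimal sketch. The construction of DT is an assumption reconstructed from the printed output, since the question's original data isn't shown here, and it assumes a data.table version that includes the list support described above.

```r
library(data.table)

# Assumed input, reconstructed from the printed output above.
DT <- data.table(
  foo = list(c("a", "b", "c"), c("b", "c"), c("a", "b"), "a"),
  id  = c(1L, 1L, 2L, 2L)
)

# Wrap the list column in .() so shift() treats it as one column to shift,
# then lead it by one row within each id.
DT[, foo2 := shift(.(foo), type = "lead"), by = id]

# The last row of each group has nothing after it, so it is filled with NA.
print(DT)
```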
R data.table: shifting rows of list type
You could use a manual shift, like the following.
x[, m := c(NA_real_, head(l, -1L))]
resulting in
k l m
1: 1 4,5 NA
2: 2 4,5 4,5
3: 3 4,5 4,5
4: 4 4,5 4,5
5: 5 4,5 4,5
For a larger shift, you could roll your own function.
mshift <- function(var, n) c(rep(NA, n), head(var, -n))
Then use it to shift two places.
x[, m := mshift(l, 2)]
which gives, from the original data
k l m
1: 1 4,5 NA
2: 2 4,5 NA
3: 3 4,5 4,5
4: 4 4,5 4,5
5: 5 4,5 4,5
Obviously, this function is very basic and only shifts to the right (down). If you wanted to, you could adjust the function to shift in the opposite direction and add some class checking/matching as well.
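As a rough sketch of that adjustment, the helper below shifts in either direction. The name mshift2 and its behaviour are illustrative assumptions, not part of data.table: it builds an index with NA in the padded positions, so atomic vectors keep their class (factors, dates), and list inputs get NA in the padded slots. It assumes n is a single whole number; positive n shifts down, negative n shifts up.

```r
# Hypothetical helper, base R only.
mshift2 <- function(var, n = 1L) {
  stopifnot(length(n) == 1L, !is.na(n), n == trunc(n))
  len  <- length(var)
  k    <- min(abs(n), len)            # cannot shift further than the length
  keep <- seq_len(len - k)            # positions that survive the shift
  idx  <- if (n >= 0) {
    c(rep(NA_integer_, k), keep)      # pad at the top, drop from the bottom
  } else {
    c(keep + k, rep(NA_integer_, k))  # drop from the top, pad at the bottom
  }
  out <- var[idx]                     # indexing with NA yields NA (NULL in lists)
  if (is.list(var)) out[is.na(idx)] <- list(NA)
  out
}

mshift2(1:5, 2)    # NA NA 1 2 3
mshift2(1:5, -1)   # 2 3 4 5 NA
```

With the list column from above this would be used as x[, m := mshift2(l, 2)].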
Shift multiple columns, each with a different offset
You can use Map to apply a different n to each column:
cols <- setdiff(names(DT), "date")
DT[, (cols) := Map(shift, .SD, seq_along(.SD) - 1L, fill = 0), .SDcols = cols]
> DT
date a b c d e f
1: 2008 1 0 0 0 0 0
2: 2008 3 5 0 0 0 0
3: 2008 2 6 3 0 0 0
4: 2009 5 8 2 6 0 0
5: 2009 3 5 3 1 9 0
6: 2010 2 3 3 4 5 8
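To make the pairing explicit: Map hands the i-th column and the i-th offset to shift together, so the call is shorthand for one shift per column. A minimal sketch on hypothetical toy data (not the data from the question):

```r
library(data.table)

# Hypothetical toy data; numeric columns so that fill = 0 matches their type.
DT <- data.table(date = c(2008, 2008, 2009),
                 a = c(1, 3, 2), b = c(5, 6, 8), c = c(3, 2, 3))
cols <- setdiff(names(DT), "date")

# What Map() builds: one shift() call per column, each with its own n.
explicit <- DT[, list(a = shift(a, 0L, fill = 0),
                      b = shift(b, 1L, fill = 0),
                      c = shift(c, 2L, fill = 0))]

# The Map() version from the answer, updating the columns in place.
DT[, (cols) := Map(shift, .SD, seq_along(.SD) - 1L, fill = 0), .SDcols = cols]

print(DT)
```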
Use pandas.shift() within a group
Pandas' grouped objects have a groupby.DataFrameGroupBy.shift method, which shifts a specified column within each group by n periods, just like the regular dataframe's shift method:
df['prev_value'] = df.groupby('object')['value'].shift()
For the following example dataframe:
print(df)
object period value
0 1 1 24
1 1 2 67
2 1 4 89
3 2 4 5
4 2 23 23
The result would be:
object period value prev_value
0 1 1 24 NaN
1 1 2 67 24.0
2 1 4 89 67.0
3 2 4 5 NaN
4 2 23 23 5.0
data.table from long column to list by group
We can wrap it in a list:
dd <- d[, .(V2 = list(V2)), V1]
head(dd)
# V1 V2
#1: c Z,W,K,G,Q,A
#2: a V,X,T,D,K
#3: w Z,I,N
#4: u N,Y,H,U,M,Z,...
#5: d G,M,D,B
#6: q O,Z,K,V,I,X,...
str(dd)
#Classes ‘data.table’ and 'data.frame': 25 obs. of 2 variables:
# $ V1: chr "c" "a" "w" "u" ...
# $ V2:List of 25
# ..$ : chr "Z" "W" "K" "G" ...
# ..$ : chr "V" "X" "T" "D" ...
# ..$ : chr "Z" "I" "N"
# ..$ : chr "N" "Y" "H" "U" ...
# ..$ : chr "G" "M" "D" "B"
# ..
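The same idea on a tiny table, as a sketch with made-up data (the original d isn't shown here). Going back to long form with unlist() is included only as a check that nothing was lost.

```r
library(data.table)

# Hypothetical long-format input: one row per (V1, V2) pair.
d <- data.table(V1 = c("a", "a", "b"), V2 = c("X", "Y", "Z"))

# list(V2) collapses each group's values into a single list element.
dd <- d[, .(V2 = list(V2)), by = V1]

# Unlisting by group recovers the long format.
back <- dd[, .(V2 = unlist(V2)), by = V1]

print(dd)
```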
Shifting groups of data in a data frame by varying amounts
Join the two dataframes and shift each var by the corresponding shiftnum value.
library(dplyr)
df %>%
left_join(shift.values, by = 'group') %>%
group_by(group) %>%
mutate(var.shift = lag(var, first(shiftnum))) %>%
ungroup()
# date group var shiftnum var.shift
# <date> <chr> <dbl> <dbl> <dbl>
# 1 2021-01-01 a 3.66 1 NA
# 2 2021-01-02 a 5.06 1 3.66
# 3 2021-01-03 a 2.07 1 5.06
# 4 2021-01-04 a 7.12 1 2.07
# 5 2021-01-05 a 0.833 1 7.12
# 6 2021-01-01 b 2.88 3 NA
# 7 2021-01-02 b 4.39 3 NA
# 8 2021-01-03 b 6.58 3 NA
# 9 2021-01-04 b 1.47 3 2.88
#10 2021-01-05 b 2.66 3 4.39
#11 2021-01-01 c 2.70 2 NA
#12 2021-01-02 c 2.47 2 NA
#13 2021-01-03 c 2.35 2 2.70
#14 2021-01-04 c 2.95 2 2.47
#15 2021-01-05 c 3.96 2 2.35
#16 2021-01-01 d 4.58 3 NA
#17 2021-01-02 d 0.182 3 NA
#18 2021-01-03 d 1.39 3 NA
#19 2021-01-04 d 1.93 3 4.58
#20 2021-01-05 d 1.73 3 0.182
Remove the shiftnum column from the output if not needed by adding %>% select(-shiftnum).
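A small reproducible version of the join-then-lag approach, as a sketch. Both input tables here are made up for illustration and only mirror the column names used in the answer.

```r
library(dplyr)

# Hypothetical inputs with the same column names as in the answer.
df <- tibble(group = rep(c("a", "b"), each = 4),
             var   = c(1, 2, 3, 4, 10, 20, 30, 40))
shift.values <- tibble(group = c("a", "b"), shiftnum = c(1, 2))

out <- df %>%
  left_join(shift.values, by = "group") %>%
  group_by(group) %>%
  # shiftnum is constant within a group, so first() picks the group's offset
  mutate(var.shift = lag(var, first(shiftnum))) %>%
  ungroup()

out$var.shift   # NA 1 2 3 NA NA 10 20
```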
Different n (offset) for shift within each group
Here's an approach with a rolling join.
First, we subset the data on Action == "A" and Action == "X" and join the two subsets onto each other. We use on = c("Case","Time") to match rows on Case exactly and then roll on Time; in data.table, you can only roll on the last join condition. We then use roll = Inf to roll forward. In the join result, the column you roll on takes its values from the table passed inside the brackets (here the "X" rows), so we create an extra copy of the "A" times called InitialTime.
Rolling forward matches every later "X" row to the most recent preceding "A" row, so a single InitialTime can appear with several X times. We therefore keep only the minimum Time for each combination of Case and InitialTime.
library(data.table)
data[Action == "A",.(Case,Action,Time,InitialTime=Time)][
data[Action == "X",], on = c("Case","Time"), roll = Inf][
,.SD[which.min(Time),.(XTime=Time)],by = .(Case,InitialTime)]
Case InitialTime XTime
1: 1 2020-01-23 12:55:00 2020-04-16 17:50:00
2: 2 2020-01-25 23:04:00 2020-02-12 17:50:00
3: 3 2020-01-26 03:23:00 2020-02-18 21:27:00
4: 3 2020-03-15 03:23:00 2020-03-18 21:27:00
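To see the two points from the explanation in isolation (the rolled column taking the inner table's values, and several end rows matching one start), here is a minimal sketch on hypothetical toy data with plain numbers instead of timestamps. The table names starts and ends are illustrative, not from the answer.

```r
library(data.table)

# Hypothetical toy data: two start events and three end events for one case.
starts <- data.table(Case = 1L, Time = c(1, 5), InitialTime = c(1, 5))
ends   <- data.table(Case = 1L, Time = c(3, 4, 8))

# For each row of `ends`, roll = Inf picks the latest start at or before it.
rolled <- starts[ends, on = c("Case", "Time"), roll = Inf]
# rolled$Time now holds the values from `ends` (3, 4, 8), while
# rolled$InitialTime keeps the matched start times (1, 1, 5).

# Ends 3 and 4 both matched the start at 1, so keep the earliest per start.
first_end <- rolled[, .(XTime = min(Time)), by = .(Case, InitialTime)]

print(first_end)
```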
Sample Data
data <- structure(list(Case = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L), Action = structure(c(1L, 2L, 3L, 4L, 1L, 4L, 4L, 1L,
3L, 4L, 1L, 4L), .Label = c("A", "B", "C", "X"), class = "factor"),
Time = structure(c(1579802100, 1580026980, 1580203380, 1587073800,
1580011440, 1581547800, 1581634200, 1580026980, 1582078980,
1582079220, 1584256980, 1584581220), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, -12L), class = c("data.table",
"data.frame"))