Get the Last Row of a Previous Group in Data.Table

You could do

dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]

This results in the following updated dt, where newcol matches your desired column with the unnecessarily long name. ;)

   Product Group LastProductOfPriorGroup newcol
1:       A     1                      NA     NA
2:       B     1                      NA     NA
3:       C     2                       B      B
4:       D     2                       B      B
5:       E     2                       B      B
6:       F     3                       E      E
7:       G     3                       E      E

Let's break the code down from the inside out. I will use ... to denote the accumulated code:

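  • dt[, last(Product), by = Group]$V1 gets the last Product of each group as a character vector, one element per group.
  • shift(...) lags that vector by one position, so each group's slot holds the last value of the group before it and the first slot is NA.
  • dt[, newcol := ...[.GRP], by = Group] groups by Group and uses the internal group counter .GRP to pick that group's slot from the shifted vector.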
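To see the intermediate values, here is a small sketch of those steps. The sample dt is my reconstruction of the table shown above (Product and Group only), and the names last_by_group and shifted are just for illustration.

```r
library(data.table)

# Hypothetical sample data mirroring the table above
dt <- data.table(Product = c("A", "B", "C", "D", "E", "F", "G"),
                 Group   = c(1, 1, 2, 2, 2, 3, 3))

# Step 1: last Product of each group, one element per group
last_by_group <- dt[, last(Product), by = Group]$V1   # "B" "E" "G"

# Step 2: lag by one group; the first group has no predecessor
shifted <- shift(last_by_group)                       # NA "B" "E"

# Step 3: .GRP is 1, 2, 3 for the three groups, so it indexes the shifted vector
dt[, newcol := shifted[.GRP], by = Group]
print(dt)
```

This assumes the rows of each group are already in the order you care about, since last() simply takes whatever row comes last.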

Update: Frank brings up a good point about my code above calculating the shift for every group over and over again. To avoid that, we can use either

shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]

so that we don't calculate the shift for every group. Or, we can take Frank's nice suggestion in the comments and do the following.

dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v] 

Select last row by group for all columns data.table

Last row by group :

DT[, .SD[.N], by="TRADER_ID,EXEC_IDATE"]            # (1)

or, faster (avoid use of .SD where possible, for speed) :

w = DT[, .I[.N], by="TRADER_ID,EXEC_IDATE"][[3]]    # (2)
DT[w]
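As a quick illustration of approach (2), here is a sketch on made-up data. The column PRICE and all the values are hypothetical; only the two grouping column names come from the code above. I use $V1 instead of [[3]] so the lookup does not depend on how many grouping columns there are.

```r
library(data.table)

# Hypothetical trades: three (TRADER_ID, EXEC_IDATE) groups
DT <- data.table(
  TRADER_ID  = c("t1", "t1", "t1", "t2", "t2"),
  EXEC_IDATE = as.IDate(c("2024-01-02", "2024-01-02", "2024-01-03",
                          "2024-01-02", "2024-01-02")),
  PRICE      = c(10, 11, 12, 20, 21)
)

# .I holds the row numbers in DT; .I[.N] is the row number of each group's last row
w <- DT[, .I[.N], by = .(TRADER_ID, EXEC_IDATE)]$V1

# Subset once by row number, keeping all columns
last_rows <- DT[w]
print(last_rows)
```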

Note that the following feature request will make approach (1) as fast as approach (2) :

FR#2330 Optimize .SD[i] query to keep the elegance but make it faster unchanged.

Duplicate last row of every group in a data.table and change the value of one column

We can extract the last row and rbind with the dataset

library(data.table)
library(lubridate)
rbind(setDT(df1), df1[, Date := as.Date(Date, "%Y.%m.%d")][,
.SD[.N], ID][, Date := Date %m+% months(1)])[order(ID)]
# ID Date v3 v4
# 1: 1 2015-01-01 a 5
# 2: 1 2015-02-01 b 5
# 3: 1 2015-03-01 f 1
# 4: 1 2015-04-01 z 5
# 5: 1 2015-05-01 a 2
# 6: 1 2015-06-01 a 2
# 7: 2 2013-03-01 a 6
# 8: 2 2013-04-01 a 2
# 9: 2 2013-05-01 g 13
#10: 2 2013-06-01 a 2
#11: 2 2013-07-01 e 8
#12: 2 2013-08-01 h 9
#13: 2 2013-09-01 h 9
#14: 2 2013-10-01 h 9

Note: It may be better to convert 'Date' to Date class first, so that date arithmetic works on it.

Or as @Frank mentioned, seq.Date from base R can be used to get the sequence of 'month'

rbind(setDT(df1), df1[, Date := as.Date(Date, "%Y.%m.%d")][,
.SD[.N], ID][, Date := seq(Date, length.out = 2, by = 'month')[2], by = ID])[order(ID)]

R: get last row of each group in dataframe

The dplyr package (loaded below as part of the tidyverse) provides slice_tail() for this.

library(tidyverse)

iris %>%
  group_by(Species) %>%
  slice_tail(n = 1)

Select the first and last row by group in a data frame

A plyr solution (tmp is your data frame):

library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
# id d gr mm area
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3

Or with dplyr:

library("dplyr")
tmp %>%
  group_by(id) %>%
  slice(c(1, n())) %>%
  ungroup()
# # A tibble: 10 × 5
# id d gr mm area
# <int> <int> <int> <dbl> <int>
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
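One thing to watch with slice(c(1, n())): in a group with a single row, 1 and n() are the same index, and as far as I know slice() then returns that row twice. A sketch on made-up data (this small tmp is hypothetical, not the data from the question), with unique() as one possible guard:

```r
library(dplyr)

# Hypothetical data: id 15 has three rows, id 21 has only one
tmp <- data.frame(id = c(15, 15, 15, 21),
                  mm = c(3.4, 4.1, 5.5, 4.0))

# First and last row per group; the single-row group appears twice
with_dup <- tmp %>%
  group_by(id) %>%
  slice(c(1, n())) %>%
  ungroup()

# Dropping the repeated index keeps single-row groups to one row
no_dup <- tmp %>%
  group_by(id) %>%
  slice(unique(c(1, n()))) %>%
  ungroup()

print(with_dup)
print(no_dup)
```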

How to get the last row in the table using group by with Order by DESC?

One approach joins to a GROUP BY subquery that finds the largest la_id for each b_id:

SELECT tla1.*, tb.*
FROM tbl_brands tb
INNER JOIN tbl_loader_attachment tla1
    ON tb.b_id = tla1.b_id
INNER JOIN
(
    SELECT b_id, MAX(la_id) AS max_la_id
    FROM tbl_loader_attachment
    GROUP BY b_id
) tla2
    ON tla1.b_id = tla2.b_id AND
       tla1.la_id = tla2.max_la_id;

If you are using MySQL 8+, another option is to use ROW_NUMBER:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY b_id ORDER BY la_id DESC) rn
    FROM tbl_loader_attachment
)

SELECT tla.*, tb.*
FROM tbl_brands tb
INNER JOIN cte tla ON tb.b_id = tla.b_id
WHERE tla.rn = 1;

Get last row of each group in R

You might try:

a %>%
  group_by(ID) %>%
  arrange(NUM) %>%
  slice(n())
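A small check of what this does, on made-up data (the values in a are hypothetical). As far as I understand, arrange() ignores the grouping unless .by_group = TRUE, so the whole frame is sorted by NUM, and slice(n()) then picks the last row within each ID, which is the row with the largest NUM.

```r
library(dplyr)

# Hypothetical data: two IDs with unsorted NUM values
a <- data.frame(ID  = c(1, 1, 2, 2, 2),
                NUM = c(3, 1, 5, 2, 4))

res <- a %>%
  group_by(ID) %>%
  arrange(NUM) %>%   # sorts all rows by NUM; the grouping is kept
  slice(n()) %>%     # last row of each ID in that sorted order
  ungroup()

print(res)
```

If "last" should mean the last row in the original order rather than the largest NUM, drop the arrange() step.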

Compare last row to previous row by group and populate new column

Use a DATA step with the lag function. Make sure your data is sorted by group first, and that the rows within each group are in the correct order. Using arrays will make your code much smaller.

The logic below will compare each row with the previous row. A flag of 1 will be set only if:

  1. It's not the first row of the group
  2. The current value differs from the previous value.

The syntax var = (test logic); is a shortcut to automatically generate dummy flags.

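data want;
    set have;
    by group;

    array var[*] name sport dogname eligibility;
    array lagvar[*] $ lag_name lag_sport lag_dogname lag_eligibility;
    array changeflag[*] N_change S_change D_change E_change;

    /* Each lag() call in the code keeps its own queue, so call it once
       per variable instead of reusing a single call inside the loop */
    lag_name        = lag(name);
    lag_sport       = lag(sport);
    lag_dogname     = lag(dogname);
    lag_eligibility = lag(eligibility);

    do i = 1 to dim(var);
        changeflag[i] = (var[i] NE lagvar[i] AND NOT first.group);
    end;

    drop lag: i;
run;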

Select first and last row from grouped data

There is probably a faster way:

df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  filter(row_number() == 1 | row_number() == n())
