Select Last Row by Group for All Columns Data.Table

Select last row by group for all columns data.table

Last row by group :

DT[, .SD[.N], by="TRADER_ID,EXEC_IDATE"]            # (1)

or, faster (avoid use of .SD where possible, for speed) :

w = DT[, .I[.N], by="TRADER_ID,EXEC_IDATE"][[3]]    # (2)
DT[w]

Note that the following feature request will make approach (1) as fast as approach (2) :

FR#2330 Optimize .SD[i] query to keep the elegance but make it faster unchanged.

Get the last row of a previous group in data.table

You could do

dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]

This results in the following updated dt, where newcol matches your desired column with the unnecessarily long name. ;)

   Product Group LastProductOfPriorGroup newcol
1: A 1 NA NA
2: B 1 NA NA
3: C 2 B B
4: D 2 B B
5: E 2 B B
6: F 3 E E
7: G 3 E E

Let's break the code down from the inside out. I will use ... to denote the accumulated code:

  • dt[, last(Product), by = Group]$V1 is getting the last values from each group as a character vector.
  • shift(...) shifts the character vector in the previous call
  • dt[, newcol := ...[.GRP], by = Group] groups by Group and uses the internal .GRP values for indexing

Update: Frank brings up a good point about my code above calculating the shift for every group over and over again. To avoid that, we can use either

shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]

so that we don't calculate the shift for every group. Or, we can take Frank's nice suggestion in the comments and do the following.

dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v] 

Select the first and last row by group in a data frame

A plyr solution (tmp is your data frame):

library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
# id d gr mm area
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3

Or with dplyr (see also here):

library("dplyr")
tmp %>%
group_by(id) %>%
slice(c(1, n())) %>%
ungroup()
# # A tibble: 10 × 5
# id d gr mm area
# <int> <int> <int> <dbl> <int>
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3

R: get last row of each group in dataframe

Package dplyr has a nice function for doing this.

library(tidyverse)

iris %>%
group_by(Species) %>%
slice_tail(n = 1)

Get last row of each group in R

You might try:

a %>% 
group_by(ID) %>%
arrange(NUM) %>%
slice(n())

How to get the last row in the table using group by with Order by DESC?

One approach uses a GROUP BY query:

SELECT tla1.*, tb.*
FROM tbl_brands tb
INNER JOIN tbl_loader_attachment tla1
ON tb.b_id = tla1.b_id
INNER JOIN
(
SELECT b_id, MAX(la_id) AS max_la_id
FROM tbl_loader_attachment
GROUP BY b_id
) tla2
ON tla1.b_id = tla2.b_id AND
tla1.la_id = tla2.max_la_id;

If you are using MySQL 8+ (or should a future reader of this question be using MySQL 8+), then another option here is to use ROW_NUMBER:

WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY b_id ORDER BY la_id DESC) rn
FROM tbl_loader_attachment
)

SELECT tla.*, tb.*
FROM tbl_brands tb
INNER JOIN cte tla ON tb.b_id = tla.b_id
WHERE tla.rn = 1;

Duplicate last row of every group in a data.table and change the value of one column

We can extract the last row and rbind with the dataset

library(data.table)
library(lubridate)
rbind(setDT(df1), df1[, Date := as.Date(Date, "%Y.%m.%d")][,
.SD[.N], ID][, Date := Date %m+% months(1)])[order(ID)]
# ID Date v3 v4
# 1: 1 2015-01-01 a 5
# 2: 1 2015-02-01 b 5
# 3: 1 2015-03-01 f 1
# 4: 1 2015-04-01 z 5
# 5: 1 2015-05-01 a 2
# 6: 1 2015-06-01 a 2
# 7: 2 2013-03-01 a 6
# 8: 2 2013-04-01 a 2
# 9: 2 2013-05-01 g 13
#10: 2 2013-06-01 a 2
#11: 2 2013-07-01 e 8
#12: 2 2013-08-01 h 9
#13: 2 2013-09-01 h 9
#14: 2 2013-10-01 h 9

Note: It may be better to convert the 'Date' to Date class

Or as @Frank mentioned, seq.Date from base R can be used to get the sequence of 'month'

rbind(setDT(df1), df1[, Date := as.Date(Date, "%Y.%m.%d")][,
.SD[.N], ID][, Date := seq(Date, length.out = 2, by = 'month')[2], by = ID])[order(ID)]

Select first and last row from grouped data

There is probably a faster way:

df %>%
group_by(id) %>%
arrange(stopSequence) %>%
filter(row_number()==1 | row_number()==n())


Related Topics



Leave a reply



Submit