Select Last Row by Group for All Columns Data.Table

Select last row by group for all columns data.table

Last row by group :

DT[, .SD[.N], by="TRADER_ID,EXEC_IDATE"]            # (1)

or, faster (avoid use of .SD where possible, for speed) :

w = DT[, .I[.N], by="TRADER_ID,EXEC_IDATE"][[3]]    # (2)
DT[w]

Note that the following feature request will make approach (1) as fast as approach (2) :

FR#2330 Optimize .SD[i] query to keep the elegance but make it faster unchanged.

Get the last row of a previous group in data.table

You could do

dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]

This results in the following updated dt, where newcol matches your desired column with the unnecessarily long name. ;)

   Product Group LastProductOfPriorGroup newcol
1:       A     1                      NA     NA
2:       B     1                      NA     NA
3:       C     2                       B      B
4:       D     2                       B      B
5:       E     2                       B      B
6:       F     3                       E      E
7:       G     3                       E      E

Let's break the code down from the inside out. I will use ... to denote the accumulated code:

dt[, last(Product), by = Group]$V1 is getting the last values from each group as a character vector.
shift(...) shifts the character vector in the previous call
dt[, newcol := ...[.GRP], by = Group] groups by Group and uses the internal .GRP values for indexing

Update: Frank brings up a good point about my code above calculating the shift for every group over and over again. To avoid that, we can use either

shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]

so that we don't calculate the shift for every group. Or, we can take Frank's nice suggestion in the comments and do the following.

dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v]

Select the first and last row by group in a data frame

A plyr solution (tmp is your data frame):

library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
#    id d gr  mm area
# 1  15 1  2 3.4    1
# 2  15 1  1 5.5    2
# 3  21 1  1 4.0    2
# 4  21 1  2 3.8    2
# 5  22 1  1 4.0    2
# 6  22 1  2 4.6    2
# 7  23 1  1 2.7    2
# 8  23 1  2 3.0    2
# 9  24 1  1 3.0    2
# 10 24 1  2 2.0    3

Or with dplyr (see also here):

library("dplyr")
tmp %>%
group_by(id) %>%
slice(c(1, n())) %>%
ungroup()
# # A tibble: 10 × 5
#       id     d    gr    mm  area
#    <int> <int> <int> <dbl> <int>
# 1     15     1     2   3.4     1
# 2     15     1     1   5.5     2
# 3     21     1     1   4.0     2
# 4     21     1     2   3.8     2
# 5     22     1     1   4.0     2
# 6     22     1     2   4.6     2
# 7     23     1     1   2.7     2
# 8     23     1     2   3.0     2
# 9     24     1     1   3.0     2
# 10    24     1     2   2.0     3

R: get last row of each group in dataframe

Package dplyr has a nice function for doing this.

library(tidyverse)

iris %>% 
    group_by(Species) %>% 
    slice_tail(n = 1)

Get last row of each group in R

You might try:

a %>% 
  group_by(ID) %>% 
  arrange(NUM) %>%  
  slice(n())

How to get the last row in the table using group by with Order by DESC?

One approach uses a GROUP BY query:

SELECT tla1.*, tb.*
FROM tbl_brands tb
INNER JOIN tbl_loader_attachment tla1
    ON tb.b_id = tla1.b_id
INNER JOIN
(
    SELECT b_id, MAX(la_id) AS max_la_id
    FROM tbl_loader_attachment
    GROUP BY b_id
) tla2
    ON tla1.b_id = tla2.b_id AND
       tla1.la_id = tla2.max_la_id;

If you are using MySQL 8+ (or should a future reader of this question be using MySQL 8+), then another option here is to use ROW_NUMBER:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY b_id ORDER BY la_id DESC) rn
    FROM tbl_loader_attachment
)

SELECT tla.*, tb.*
FROM tbl_brands tb
INNER JOIN cte tla ON tb.b_id = tla.b_id
WHERE tla.rn = 1;

Duplicate last row of every group in a data.table and change the value of one column

We can extract the last row and rbind with the dataset

library(data.table)
library(lubridate)
rbind(setDT(df1), df1[, Date := as.Date(Date, "%Y.%m.%d")][,
       .SD[.N], ID][, Date := Date %m+% months(1)])[order(ID)]
#    ID       Date v3 v4
# 1:  1 2015-01-01  a  5
# 2:  1 2015-02-01  b  5
# 3:  1 2015-03-01  f  1
# 4:  1 2015-04-01  z  5
# 5:  1 2015-05-01  a  2
# 6:  1 2015-06-01  a  2
# 7:  2 2013-03-01  a  6
# 8:  2 2013-04-01  a  2
# 9:  2 2013-05-01  g 13
#10:  2 2013-06-01  a  2
#11:  2 2013-07-01  e  8
#12:  2 2013-08-01  h  9
#13:  2 2013-09-01  h  9
#14:  2 2013-10-01  h  9

Note: It may be better to convert the 'Date' to Date class

Or as @Frank mentioned, seq.Date from base R can be used to get the sequence of 'month'

rbind(setDT(df1), df1[, Date := as.Date(Date, "%Y.%m.%d")][,
     .SD[.N], ID][, Date := seq(Date, length.out = 2, by = 'month')[2], by = ID])[order(ID)]

Select first and last row from grouped data

There is probably a faster way:

df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  filter(row_number()==1 | row_number()==n())

Select Last Row by Group for All Columns Data.Table