Select last row by group for all columns data.table
Last row by group :
DT[, .SD[.N], by="TRADER_ID,EXEC_IDATE"] # (1)
or, faster (avoid use of .SD
where possible, for speed) :
w = DT[, .I[.N], by="TRADER_ID,EXEC_IDATE"][[3]] # (2)
DT[w]
Note that the following feature request will make approach (1) as fast as approach (2) :
FR#2330 Optimize .SD[i] query to keep the elegance but make it faster unchanged.
Get the last row of a previous group in data.table
You could do
dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]
This results in the following updated dt
, where newcol
matches your desired column with the unnecessarily long name. ;)
Product Group LastProductOfPriorGroup newcol
1: A 1 NA NA
2: B 1 NA NA
3: C 2 B B
4: D 2 B B
5: E 2 B B
6: F 3 E E
7: G 3 E E
Let's break the code down from the inside out. I will use ...
to denote the accumulated code:
dt[, last(Product), by = Group]$V1
is getting the last values from each group as a character vector.shift(...)
shifts the character vector in the previous calldt[, newcol := ...[.GRP], by = Group]
groups byGroup
and uses the internal.GRP
values for indexing
Update: Frank brings up a good point about my code above calculating the shift for every group over and over again. To avoid that, we can use either
shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]
so that we don't calculate the shift for every group. Or, we can take Frank's nice suggestion in the comments and do the following.
dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v]
Select the first and last row by group in a data frame
A plyr solution (tmp
is your data frame):
library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
# id d gr mm area
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
Or with dplyr (see also here):
library("dplyr")
tmp %>%
group_by(id) %>%
slice(c(1, n())) %>%
ungroup()
# # A tibble: 10 × 5
# id d gr mm area
# <int> <int> <int> <dbl> <int>
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
R: get last row of each group in dataframe
Package dplyr
has a nice function for doing this.
library(tidyverse)
iris %>%
group_by(Species) %>%
slice_tail(n = 1)
Get last row of each group in R
You might try:
a %>%
group_by(ID) %>%
arrange(NUM) %>%
slice(n())
How to get the last row in the table using group by with Order by DESC?
One approach uses a GROUP BY
query:
SELECT tla1.*, tb.*
FROM tbl_brands tb
INNER JOIN tbl_loader_attachment tla1
ON tb.b_id = tla1.b_id
INNER JOIN
(
SELECT b_id, MAX(la_id) AS max_la_id
FROM tbl_loader_attachment
GROUP BY b_id
) tla2
ON tla1.b_id = tla2.b_id AND
tla1.la_id = tla2.max_la_id;
If you are using MySQL 8+ (or should a future reader of this question be using MySQL 8+), then another option here is to use ROW_NUMBER
:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY b_id ORDER BY la_id DESC) rn
FROM tbl_loader_attachment
)
SELECT tla.*, tb.*
FROM tbl_brands tb
INNER JOIN cte tla ON tb.b_id = tla.b_id
WHERE tla.rn = 1;
Duplicate last row of every group in a data.table and change the value of one column
We can extract the last row and rbind with the dataset
library(data.table)
library(lubridate)
rbind(setDT(df1), df1[, Date := as.Date(Date, "%Y.%m.%d")][,
.SD[.N], ID][, Date := Date %m+% months(1)])[order(ID)]
# ID Date v3 v4
# 1: 1 2015-01-01 a 5
# 2: 1 2015-02-01 b 5
# 3: 1 2015-03-01 f 1
# 4: 1 2015-04-01 z 5
# 5: 1 2015-05-01 a 2
# 6: 1 2015-06-01 a 2
# 7: 2 2013-03-01 a 6
# 8: 2 2013-04-01 a 2
# 9: 2 2013-05-01 g 13
#10: 2 2013-06-01 a 2
#11: 2 2013-07-01 e 8
#12: 2 2013-08-01 h 9
#13: 2 2013-09-01 h 9
#14: 2 2013-10-01 h 9
Note: It may be better to convert the 'Date' to Date
class
Or as @Frank mentioned, seq.Date
from base R
can be used to get the sequence of 'month'
rbind(setDT(df1), df1[, Date := as.Date(Date, "%Y.%m.%d")][,
.SD[.N], ID][, Date := seq(Date, length.out = 2, by = 'month')[2], by = ID])[order(ID)]
Select first and last row from grouped data
There is probably a faster way:
df %>%
group_by(id) %>%
arrange(stopSequence) %>%
filter(row_number()==1 | row_number()==n())
Related Topics
Error in Get(As.Character(Fun), Mode = "Function", Envir = Envir)
How to Create a Line Plot with Groups in Base R Without Loops
How to Add Axis Text in This Negative and Positive Bars Differently Using Ggplot2
How to Remove Na Data in Only One Columns
Installing Rcppeigen on Amazon Ec2
Installing R on Osx Big Sur (Edit: and Apple M1) for Use with Rcpp and Openmp
Replace Nas in One Variable with Values from Another Variable
Generate All Combinations, of All Lengths, in R, from a Vector
Extract Name of Data.Frame in R as Character
Draw Multiple Squares with Ggplot
Plot Multiple Datasets with Ggplot
R 3.5 Is Not Available for Linux
Rename Columns in Multiple Dataframes, R