Get Last Row of Each Group in R

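One option is to sort by NUM and then keep the last row of each ID group:

library(dplyr)

a %>%
  group_by(ID) %>%
  arrange(NUM) %>%   # sorts the whole data frame; the grouping is kept
  slice(n())         # n() is the group size, so this picks the last row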
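If "last" means the row with the largest NUM rather than the last position, slice_max() can skip the explicit sort. This is only a minimal sketch, assuming dplyr 1.0.0 or later; the contents of `a` are made up for illustration and are not from the original question.

```r
library(dplyr)

# Hypothetical data, not from the original question
a <- data.frame(ID  = c("x", "x", "y", "y", "y"),
                NUM = c(2, 5, 9, 1, 4))

last_by_num <- a %>%
  group_by(ID) %>%
  # with_ties = FALSE keeps exactly one row per group even if NUM repeats
  slice_max(NUM, n = 1, with_ties = FALSE) %>%
  ungroup()

print(last_by_num)
```

Unlike arrange() followed by slice(n()), this leaves the remaining rows' order out of the picture entirely, which may or may not matter for your data.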

R: get last row of each group in dataframe

The dplyr package (version 1.0.0 or later) provides slice_tail() for this.

library(tidyverse)

iris %>%
group_by(Species) %>%
slice_tail(n = 1)

Select first and last row from grouped data

This keeps the first and last row of each id after sorting by stopSequence (there is probably a faster way):

df %>%
group_by(id) %>%
arrange(stopSequence) %>%
filter(row_number()==1 | row_number()==n())

How to select last N observation from each group in dplyr dataframe?

Since the question is specifically about dplyr:

1) After group_by(), use slice() on the last two values of row_number()

library(tidyverse)
df %>%
group_by(a) %>%
slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups: a [4]
# a b
# <dbl> <dbl>
#1 1 343
#2 1 54
#3 2 55
#4 2 62
#5 3 59
#6 3 -9
#7 4 0
#8 4 -0.5

2) Or use filter from dplyr

df %>% 
group_by(a) %>%
filter(row_number() >= (n() - 1))

3) Or with do and tail (note that do() is superseded in dplyr 1.0.0 and later)

df %>%
group_by(a) %>%
do(tail(., 2))

4) Besides the tidyverse methods, we can also use the compact data.table syntax

library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]

5) Or by from base R

by(df, df$a, FUN = tail, 2)

6) or with aggregate from base R

df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]

7) or with split from base R

do.call(rbind, lapply(split(df, df$a), tail, 2))
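The base R variants differ mainly in what they return: by() gives an object of class "by" (one element per group), while the split() version gives a single data frame. The sketch below runs the split() approach on made-up data and compares it with ave(), another base R option that is not part of the answer above; the data frame and the helper column name `from_end` are assumptions for illustration.

```r
# Hypothetical data: grouping column a, value column b
df <- data.frame(a = c(1, 1, 1, 2, 2, 3),
                 b = c(10, 20, 30, 40, 50, 60))

# split / lapply / rbind: last 2 rows of each group
via_split <- do.call(rbind, lapply(split(df, df$a), tail, 2))

# ave() numbers the rows of each group counting from the end,
# so the last row of a group gets 1, the one before it gets 2, ...
from_end <- ave(seq_len(nrow(df)), df$a,
                FUN = function(i) rev(seq_along(i)))
via_ave <- df[from_end <= 2, ]

print(via_split)
print(via_ave)
```

The ave() version keeps the original row names and row order, whereas the split() version returns rows ordered by group with compound row names.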

How to flag the last row of a data frame group?

You can group_by ID and set Calculate to 0 in the last row of each ID.

library(dplyr)

df %>%
mutate(Calculate = Period * Value) %>%
group_by(ID) %>%
mutate(Calculate = replace(Calculate, n(), 0)) %>%
ungroup()

# ID Period Value Calculate
# <dbl> <dbl> <dbl> <dbl>
#1 1 1 10 10
#2 1 2 12 24
#3 1 3 11 0
#4 5 1 4 4
#5 5 2 6 0
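If an explicit flag column is useful as well, the comparison row_number() == n() marks the last row of each group. A minimal sketch: the input values are reconstructed from the output shown above, and the column name `is_last` is a made-up name for illustration.

```r
library(dplyr)

# Input reconstructed from the printed output above
df <- data.frame(ID     = c(1, 1, 1, 5, 5),
                 Period = c(1, 2, 3, 1, 2),
                 Value  = c(10, 12, 11, 4, 6))

flagged <- df %>%
  group_by(ID) %>%
  mutate(is_last   = row_number() == n(),          # TRUE only on the last row per ID
         Calculate = if_else(is_last, 0, Period * Value)) %>%
  ungroup()

print(flagged)
```

Keeping the flag as its own column makes it easy to reuse for other calculations on the same grouping.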

Select the first and last row by group in a data frame

A plyr solution (tmp is your data frame):

library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
# id d gr mm area
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3

Or with dplyr:

library("dplyr")
tmp %>%
group_by(id) %>%
slice(c(1, n())) %>%
ungroup()
# # A tibble: 10 × 5
# id d gr mm area
# <int> <int> <int> <dbl> <int>
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
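One thing to watch for: when a group has a single row, position 1 and position n() are the same row, so selecting both returns that row twice. A small sketch on made-up data (the values in `tmp` here are hypothetical) shows the effect and one way around it with unique():

```r
library(dplyr)

# Hypothetical data: id 2 has only one row
tmp <- data.frame(id = c(1, 1, 1, 2),
                  mm = c(3.4, 5.5, 4.0, 2.7))

# Single-row groups are returned twice
dup <- tmp %>%
  group_by(id) %>%
  slice(c(1, n())) %>%
  ungroup()

# unique() collapses c(1, 1) to 1, so each row appears once
dedup <- tmp %>%
  group_by(id) %>%
  slice(unique(c(1, n()))) %>%
  ungroup()

print(nrow(dup))
print(nrow(dedup))
```

Whether the duplicate is wanted depends on the use case; for a strict "first row, last row" pair per group it may even be the desired shape.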

Calculating the difference between first and last row in each group

(Assuming dplyr.) This does not assume that date is guaranteed to be in order; if it is, one could also use first(.)/last(.) for the same results. I tend to prefer not trusting order.

If your discount is always 0/1 and you are looking to group by contiguous same-values, then

dat %>%
group_by(discountgrp = cumsum(discount != lag(discount, default = discount[1]))) %>%
summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
# discountgrp change
# <int> <dbl>
# 1 0 -0.871
# 2 1 -0.481

If your discount is instead a categorical value and can exceed 1, then

dat %>%
group_by(discount) %>%
summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
# discount change
# <dbl> <dbl>
# 1 0 -0.871
# 2 1 -0.481

They happen to be the same here, but if the row order were changed such that some of the 1s occurred in the middle of 0s (for instance), then the groups would be different.
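To see why the two groupings can differ, it may help to look at the run identifier on its own. A minimal sketch with a made-up discount vector in which a block of 1s sits between two blocks of 0s (`run_id` is a name chosen for illustration):

```r
library(dplyr)

# Hypothetical values: 1s in the middle of 0s
discount <- c(0, 0, 1, 1, 0, 0)

# A new run starts wherever the value differs from the previous one;
# cumsum() of those change flags numbers the runs 0, 1, 2, ...
run_id <- cumsum(discount != dplyr::lag(discount, default = discount[1]))

print(run_id)                    # run identifier per row
print(length(unique(run_id)))    # number of contiguous runs
print(length(unique(discount)))  # number of distinct values
```

Grouping by `run_id` treats the two blocks of 0s as separate groups, while grouping by `discount` merges them into one.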

Is there an R function to choose the last row of a group of variables?

Assuming you want Start to have the lowest Serial, and End to have the highest Serial:

library(tidyverse)

df <- tribble(~Group, ~State, ~Serial,
1, "Start", 1,
1, "End", 2,
1, "End", 3,
2, "Start", 4,
2, "End", 5,
2, "End", 6,
2, "End", 7)

df %>%
group_by(Group, State) %>%
filter(if_else(State == "Start", Serial == min(Serial), Serial == max(Serial))) %>%
ungroup()

# A tibble: 4 x 3
# Group State Serial
# <dbl> <chr> <dbl>
#1 1 Start 1
#2 1 End 3
#3 2 Start 4
#4 2 End 7

