How to Select the Last N Observations from Each Group in a dplyr Data Frame

How do I select the last N observations from each group in a dplyr data frame?

Since this question is specifically about dplyr, the dplyr options come first, followed by data.table and base R alternatives.

1) After group_by(), use slice() on row_number():

library(tidyverse)
df %>%
  group_by(a) %>%
  slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups: a [4]
# a b
# <dbl> <dbl>
#1 1 343
#2 1 54
#3 2 55
#4 2 62
#5 3 59
#6 3 -9
#7 4 0
#8 4 -0.5
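To run the snippets in this answer you need a df with a grouping column a and a value column b. The sketch below builds a small hypothetical df, chosen so that its last two rows per group match the output above. It also shows slice_tail(), which in dplyr 1.0.0 and later keeps the last n rows of each group directly.

```r
library(dplyr)

# Hypothetical example data: `a` is the grouping column, `b` holds the values.
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4),
  b = c(10, 343, 54, 7, 55, 62, 59, -9, 5, 0, -0.5)
)

# Approach 1 from above: slice the tail of row_number() within each group.
last2_slice <- df %>%
  group_by(a) %>%
  slice(tail(row_number(), 2)) %>%
  ungroup()

# dplyr >= 1.0.0 equivalent: slice_tail() keeps the last n rows per group.
last2_tail <- df %>%
  group_by(a) %>%
  slice_tail(n = 2) %>%
  ungroup()

print(last2_tail)
```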

2) Or use filter() from dplyr:

df %>%
  group_by(a) %>%
  filter(row_number() >= (n() - 1))
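The filter() version hard-codes "last two rows" through n() - 1. A sketch that generalizes it to any N (N here is a hypothetical variable) keeps rows whose position within the group is greater than n() - N; a group with fewer than N rows is returned whole.

```r
library(dplyr)

# Hypothetical data: group 1 has three rows, group 2 has two.
df <- data.frame(a = c(1, 1, 1, 2, 2), b = c(1, 2, 3, 4, 5))

N <- 2  # number of trailing rows to keep per group

res <- df %>%
  group_by(a) %>%
  filter(row_number() > n() - N) %>%
  ungroup()
```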

3) Or with do() and tail(); note that do() is superseded as of dplyr 1.0.0:

df %>%
  group_by(a) %>%
  do(tail(., 2))
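Because do() is superseded, here is a roughly equivalent sketch with group_modify(), which applies tail() to each group's rows and binds the results back together with the group key. The data is hypothetical.

```r
library(dplyr)

# Hypothetical data: group 1 has three rows, group 2 has two.
df <- data.frame(a = c(1, 1, 1, 2, 2), b = c(1, 2, 3, 4, 5))

# group_modify() calls the function once per group; .x is that group's rows
# without the grouping column, and the key `a` is added back afterwards.
res <- df %>%
  group_by(a) %>%
  group_modify(~ tail(.x, 2)) %>%
  ungroup()
```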

4) In addition to the tidyverse methods, we can also use a compact data.table approach:

library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]
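In the data.table one-liner, the inner df[, .I[...], a] returns the row numbers (.I) of the last two rows of each group in a column named V1, and the outer call subsets df by those rows. A sketch on hypothetical data comparing it with the arguably more readable tail(.SD, 2) form:

```r
library(data.table)

# Hypothetical data: group 1 has three rows, group 2 has two.
dt <- data.table(a = c(1, 1, 1, 2, 2), b = c(1, 2, 3, 4, 5))

# Row indices of the last two rows per group, then subset by them.
idx <- dt[, .I[tail(seq_len(.N), 2)], by = a]$V1
res_idx <- dt[idx]

# Same rows, taking the tail of each group's subset of data (.SD).
res_sd <- dt[, tail(.SD, 2), by = a]
```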

5) Or by() from base R:

by(df, df$a, FUN = tail, 2)
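Note that by() returns a list-like "by" object with one element per group rather than a single data frame. A minimal sketch, on hypothetical data, that stacks the pieces back into one data frame with do.call(rbind, ...):

```r
# Hypothetical data: group 1 has three rows, group 2 has two.
df <- data.frame(a = c(1, 1, 1, 2, 2), b = c(1, 2, 3, 4, 5))

pieces <- by(df, df$a, FUN = tail, 2)  # one data frame per group
res <- do.call(rbind, pieces)          # bind the pieces into one data frame
```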

6) Or aggregate() from base R:

df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]
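The aggregate() trick may return the rows in a different order than the original data. As an alternative base R sketch (not part of the original answer), ave() can number each row from the end of its group, which keeps the original row order. The data and N are hypothetical.

```r
# Hypothetical data: group 1 has three rows, group 2 has two.
df <- data.frame(a = c(1, 1, 1, 2, 2), b = c(1, 2, 3, 4, 5))
N <- 2  # number of trailing rows to keep per group

# For each row, its position counted from the end of its group (1 = last row).
from_end <- ave(seq_len(nrow(df)), df$a, FUN = function(i) rev(seq_along(i)))

res <- df[from_end <= N, ]
```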

7) Or split() from base R:

do.call(rbind, lapply(split(df, df$a), tail, 2))

Extract the first and last observation of each group, every time that group appears within a large dataset?

Here is a data.table approach. As @Henrik mentioned in the comments, you can use rleid() to create a new column to group by, instead of grouping by station directly, because the same station value repeats in separate runs; rleid() gives each consecutive run its own id. Then, for each group, select the first row and the last (.N-th) row. Note that unique() is added to handle groups that contain only one row, so that row is not returned twice. I hope this is fast enough for you.

library(data.table)

setDT(df)

df[ , id := rleid(station)][ , .SD[unique(c(1, .N))], by = id]

Output

   id tagID       date station temp depth
1: 1 8272 2020-07-12 4 10 6.14
2: 1 8272 2020-07-13 4 11 21
3: 2 8272 2020-07-13 5 12 23.5
4: 2 8272 2020-07-16 5 10 15.4
5: 3 8272 2020-07-17 6 12 54
6: 3 8272 2020-07-29 6 12 23
7: 4 8272 2020-07-30 4 12 33.3
8: 4 8272 2020-08-04 4 9 32.7
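To see why rleid() and unique() matter, here is a small sketch on hypothetical data where station 4 appears in two separate runs and station 5 forms a run of only one row.

```r
library(data.table)

# Hypothetical data: station 4 appears in two separate runs.
dt <- data.table(row = 1:7, station = c(4, 4, 4, 5, 6, 6, 4))

# rleid() increments whenever station changes: 1 1 1 2 3 3 4
dt[, id := rleid(station)]

# First and last row of each run; unique() avoids duplicating one-row runs.
res <- dt[, .SD[unique(c(1, .N))], by = id]
```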

Select first and last row from grouped data

Here is one approach, though there is probably a faster way:

library(dplyr)
df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  filter(row_number() == 1 | row_number() == n())
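A runnable sketch with hypothetical id/stopSequence data. It uses arrange(..., .by_group = TRUE) so rows are sorted within each id, and it shows that a group with a single row is returned once, because that row is both row 1 and row n().

```r
library(dplyr)

# Hypothetical data: id 1 has three stops (out of order), id 2 has one.
df <- data.frame(
  id = c(1, 1, 1, 2),
  stopSequence = c(3, 1, 2, 1),
  stop = c("c", "a", "b", "z"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(id) %>%
  arrange(stopSequence, .by_group = TRUE) %>%
  filter(row_number() == 1 | row_number() == n()) %>%
  ungroup()
```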

R - select the last n occurrences for each group

You can try slice():

library(dplyr)
df %>%
  group_by(person) %>%
  arrange(date, person) %>%
  slice((n() - 1):n())
# person date
#1 pA 2014-03-01
#2 pA 2014-04-01
#3 pB 2014-02-01
#4 pB 2014-03-01
#5 pC 2014-01-01
#6 pC 2014-02-01

Or, in place of the last step:

do(tail(., 2))

Get last row of each group in R

You might try:

a %>%
  group_by(ID) %>%
  arrange(NUM) %>%
  slice(n())
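Since arrange(NUM) followed by slice(n()) picks the row with the largest NUM in each group, slice_max() (dplyr 1.0.0 and later) is an alternative. A sketch on hypothetical data comparing the two:

```r
library(dplyr)

# Hypothetical data with an ID grouping column and a NUM ordering column.
a <- data.frame(
  ID = c("x", "x", "x", "y", "y"),
  NUM = c(2, 3, 1, 5, 4),
  val = c("p", "q", "r", "s", "t"),
  stringsAsFactors = FALSE
)

# Approach above: sort by NUM, then keep the last row of each group.
res_slice <- a %>%
  group_by(ID) %>%
  arrange(NUM) %>%
  slice(n()) %>%
  ungroup()

# Alternative: the row with the largest NUM per group.
# with_ties = FALSE returns exactly one row even if NUM is tied.
res_max <- a %>%
  group_by(ID) %>%
  slice_max(NUM, n = 1, with_ties = FALSE) %>%
  ungroup()
```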

New dataframe with last 6 rows per group in R

You can use the slice_tail() function in dplyr to get the last n rows of each group. If a group has fewer than 6 rows, all of that group's rows are returned.

library(dplyr)
test1 %>%
  group_by(groupa) %>%
  slice_tail(n = 6) %>%
  ungroup()

# A tibble: 21 x 3
# timea groupa vara
# <dbl> <dbl> <dbl>
# 1 9 1 9
# 2 10 1 1
# 3 11 1 3
# 4 12 1 4
# 5 13 1 8
# 6 14 1 9
# 7 25 2 8
# 8 26 2 3
# 9 27 2 5
#10 28 2 8
# … with 11 more rows
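A minimal sketch of the "fewer rows than n" behaviour, using a hypothetical test1 in which group 2 has only one row and n = 2:

```r
library(dplyr)

# Hypothetical data: group 1 has three rows, group 2 has only one.
test1 <- data.frame(groupa = c(1, 1, 1, 2), vara = 1:4)

# Group 1 contributes its last 2 rows; group 2 contributes its single row.
res <- test1 %>%
  group_by(groupa) %>%
  slice_tail(n = 2) %>%
  ungroup()
```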

Create new columns for last n rows in dataframe by group dplyr

You can use data.table's shift() like this:

library(data.table)
setDT(df)[, c(paste0("p1.",1:3)):=shift(p, 2:0), by=ID][]

Output:

       ID   p p1.1 p1.2 p1.3
1: 11041 2.9 NA NA 2.9
2: 11041 3.6 NA 2.9 3.6
3: 11041 4.8 2.9 3.6 4.8
4: 11041 2.6 3.6 4.8 2.6
5: 11041 2.2 4.8 2.6 2.2
6: 11041 3.9 2.6 2.2 3.9
7: 11042 6.5 NA NA 6.5
8: 11042 2.9 NA 6.5 2.9
9: 11042 1.4 6.5 2.9 1.4
10: 11063 0.7 NA NA 0.7
11: 11063 5.1 NA 0.7 5.1
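The column order comes from shift(): with a vector of offsets it returns a list containing one lagged copy per offset, so 2:0 gives lag 2, lag 1 and the unshifted values. A small sketch using the first three p values of ID 11041:

```r
library(data.table)

# shift() with a vector of offsets returns a list: one lagged copy per offset.
lags <- shift(c(2.9, 3.6, 4.8), n = 2:0)

lags[[1]]  # lag 2: NA NA 2.9
lags[[2]]  # lag 1: NA 2.9 3.6
lags[[3]]  # lag 0: 2.9 3.6 4.8 (the original values)
```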

One way to generalize this is as follows:

1. Make a helper function:
f <- function(v, n) {
  # lags (n-1) down to 0 of v, named like p.1 ... p.n for a column p
  setNames(shift(v, (n - 1):0), paste0(deparse(substitute(v)), ".", 1:n))
}

2. Apply the function by ID; here I use n = 5:
df[, f(p, 5), by = ID]

Output:

       ID p.1 p.2 p.3 p.4 p.5
1: 11041 NA NA NA NA 2.9
2: 11041 NA NA NA 2.9 3.6
3: 11041 NA NA 2.9 3.6 4.8
4: 11041 NA 2.9 3.6 4.8 2.6
5: 11041 2.9 3.6 4.8 2.6 2.2
6: 11041 3.6 4.8 2.6 2.2 3.9
7: 11042 NA NA NA NA 6.5
8: 11042 NA NA NA 6.5 2.9
9: 11042 NA NA 6.5 2.9 1.4
10: 11063 NA NA NA NA 0.7
11: 11063 NA NA NA 0.7 5.1

dplyr - group last n row values


library(dplyr)

# example data
df <- data.frame(Make = c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model = c('camry','corolla','city','accord','compass','wrangler','renegade','x1','x3'),
                 Cnt = c(10, 4, 8, 13, 3, 5, 1, 2, 1), stringsAsFactors = FALSE)

# number of top makes (by volume) to keep; the rest become "others"
row_threshold <- 2

df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  mutate(share = volume / sum(volume) * 100.0) %>%
  arrange(desc(volume)) %>%
  group_by(Make_upd = ifelse(row_number() > row_threshold, "others", Make)) %>%
  summarise(volume = sum(volume),
            share = sum(share))

# # A tibble: 3 x 3
# Make_upd volume share
# <chr> <dbl> <dbl>
# 1 honda 21 44.68085
# 2 others 12 25.53191
# 3 toyota 14 29.78723
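To see which makes end up in "others", here is a sketch of the intermediate ranking step on the same example data (the Model column is omitted because it is not used). After sorting by total volume, only the first row_threshold makes keep their own name.

```r
library(dplyr)

# Same example data as above, without the unused Model column.
df <- data.frame(
  Make = c("toyota", "toyota", "honda", "honda", "jeep", "jeep", "jeep", "accura", "accura"),
  Cnt = c(10, 4, 8, 13, 3, 5, 1, 2, 1),
  stringsAsFactors = FALSE
)
row_threshold <- 2

# Total volume per make, largest first, with the lumped label alongside.
ranked <- df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%
  arrange(desc(volume)) %>%
  mutate(Make_upd = ifelse(row_number() > row_threshold, "others", Make))
```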

R: get last row of each group in dataframe

The dplyr package has a function for exactly this, slice_tail():

library(tidyverse)

iris %>%
  group_by(Species) %>%
  slice_tail(n = 1)
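As a quick check, iris has 50 rows per species, so the last row of each group should be rows 50, 100 and 150. A sketch that verifies this, followed by a base R alternative (not part of the original answer) using duplicated(..., fromLast = TRUE):

```r
library(dplyr)

# Tag each row with its original position so the result is easy to check.
last_rows <- iris %>%
  mutate(row = row_number()) %>%
  group_by(Species) %>%
  slice_tail(n = 1) %>%
  ungroup()

# Base R alternative: keep the last occurrence of each Species.
last_rows_base <- iris[!duplicated(iris$Species, fromLast = TRUE), ]
```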

