How to Select the Last N Observations from Each Group in a dplyr Data Frame

How to select the last N observations from each group in a dplyr data frame?

Since the question is specifically about dplyr, the dplyr options come first.

1) After the group_by, use slice on the tail of row_number()

library(tidyverse)
df %>% 
   group_by(a) %>% 
   slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups:   a [4]
#      a      b
#  <dbl>  <dbl>
#1     1  343  
#2     1   54  
#3     2   55  
#4     2   62  
#5     3   59  
#6     3   -9  
#7     4    0  
#8     4   -0.5

2) Or use filter from dplyr

df %>% 
   group_by(a) %>% 
   filter(row_number() >= (n() - 1))

3) Or with do and tail (do is superseded in recent dplyr versions, so options 1 and 2 are preferable in new code)

df %>%
    group_by(a) %>% 
    do(tail(., 2))

4) In addition to the tidyverse methods, we can also use the compact data.table syntax

library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]

5) Or use by from base R (this returns an object of class by with one element per group, not a single data frame)

by(df, df$a, FUN = tail, 2)

6) or with aggregate from base R

df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]

7) or with split from base R

do.call(rbind, lapply(split(df, df$a), tail, 2))
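
The original df is not shown, so to experiment with the options above you need some data. Below is a minimal sketch with a made-up data frame (the values are assumptions, not the data behind the output shown earlier). It runs option 7 and cross-checks it against a base R variant built on ave, which is not one of the options listed above.

```r
# Hypothetical data: the df used in the answer is not shown.
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 3),
  b = c(10, 20, 30, 40, 50, 60)
)

# Option 7 from the list: split by group, take the tail of each piece, rebind.
via_split <- do.call(rbind, lapply(split(df, df$a), tail, 2))

# Cross-check: rank rows from the end of each group (1 = last row),
# then keep every row whose reverse rank is at most 2.
rev_rank <- ave(seq_len(nrow(df)), df$a, FUN = function(i) rev(seq_along(i)))
via_ave <- df[rev_rank <= 2, ]

print(via_split)
print(all(via_split$b == via_ave$b))
```

Group 3 has a single row here, so it contributes one row instead of two. The ave variant keeps the original row names and row order, while the split variant builds new row names from the group labels.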

Extract the first and last observation of each group, every time that group appears within a large dataset?

Here is a data.table approach. As mentioned by @Henrik in the comments, you can use rleid to create a new column to group by, instead of grouping by station directly, because the same station value appears again later in the data. rleid assigns a new id each time the value of station changes, so every consecutive run of a station becomes its own group. Then, for each group, .SD[unique(c(1, .N))] keeps the first and last rows. unique is needed for groups that contain only one row, so that the row is not returned twice. I hope this is a fast enough solution for you.

library(data.table)

setDT(df)

df[ , id := rleid(station)][ , .SD[unique(c(1, .N))], by = id]

Output

   id tagID       date station temp depth
1:  1  8272 2020-07-12       4   10  6.14
2:  1  8272 2020-07-13       4   11    21
3:  2  8272 2020-07-13       5   12  23.5
4:  2  8272 2020-07-16       5   10  15.4
5:  3  8272 2020-07-17       6   12    54
6:  3  8272 2020-07-29       6   12    23
7:  4  8272 2020-07-30       4   12  33.3
8:  4  8272 2020-08-04       4    9  32.7
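
To see what the run-length ids contribute, here is a base R sketch that mimics the idea without data.table. The station vector is made up for illustration, and the cumsum construction is only a stand-in for rleid, not how data.table implements it.

```r
# Hypothetical station sequence in which station 4 appears in two separate runs.
station <- c(4, 4, 5, 5, 5, 6, 4, 4)

# A new run starts at the first element and wherever the value changes.
run_start <- c(TRUE, station[-1] != station[-length(station)])

# The cumulative count of run starts gives one id per consecutive run.
run_id <- cumsum(run_start)
print(run_id)

# First and last position of each run; unique() keeps one-element runs single.
first_last <- unlist(lapply(
  split(seq_along(station), run_id),
  function(i) unique(c(i[1], i[length(i)]))
))
print(unname(first_last))
```

Grouping by station alone would merge the two runs of station 4 into one group, which is exactly what the id column avoids.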

Select first and last row from grouped data

There is probably a faster way, but this works: sort by stopSequence, then keep the first and last row of each id.

df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  filter(row_number()==1 | row_number()==n())

r - select last n occurrences for each group

You can try slice

library(dplyr)
df %>% 
    group_by(person) %>% 
    arrange(date, person) %>%
    slice((n()-1):n())
#  person       date
#1     pA 2014-03-01
#2     pA 2014-04-01
#3     pB 2014-02-01
#4     pB 2014-03-01
#5     pC 2014-01-01
#6     pC 2014-02-01

Or in place of the last step

   do(tail(., 2))
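
When generalizing from 2 to an arbitrary number of rows, the index arithmetic deserves a second look. The sketch below uses a hypothetical helper, last_k_index, to compare the tail-based positions with a plain arithmetic range when a group is smaller than the requested number of rows.

```r
# Hypothetical helper: row positions of the last k rows in a group of size n.
last_k_index <- function(n, k) tail(seq_len(n), k)

print(last_k_index(5, 2))  # positions 4 and 5
print(last_k_index(1, 2))  # a one-row group yields only position 1

# By contrast, the arithmetic range can leave the valid 1..n range:
n <- 1
k <- 3
naive <- (n - k + 1):n     # -1, 0, 1
print(naive)
```

Base R subsetting rejects a mix of negative and positive positions such as the last result, so the tail-based form is the safer choice whenever some groups may have fewer than k rows.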

Get last row of each group in R

You might try:

a %>% 
  group_by(ID) %>% 
  arrange(NUM) %>%  
  slice(n())

New dataframe with last 6 rows per group in R

You can use the slice_tail function in dplyr to get the last n rows from each group. If a group has fewer than 6 rows, all of its rows are returned.

library(dplyr)
test1 %>% group_by(groupa) %>% slice_tail(n = 6) %>% ungroup()

# A tibble: 21 x 3
#   timea groupa  vara
#   <dbl>  <dbl> <dbl>
# 1     9      1     9
# 2    10      1     1
# 3    11      1     3
# 4    12      1     4
# 5    13      1     8
# 6    14      1     9
# 7    25      2     8
# 8    26      2     3
# 9    27      2     5
#10    28      2     8
# … with 11 more rows
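
The behaviour for short groups can be checked on a small example. The data frame test_small below is made up for this purpose (it is not the test1 from the question); one group has 8 rows and the other only 3.

```r
library(dplyr)

# Hypothetical data: group 1 has 8 rows, group 2 has only 3.
test_small <- data.frame(
  groupa = c(rep(1, 8), rep(2, 3)),
  vara   = 1:11
)

last6 <- test_small %>%
  group_by(groupa) %>%
  slice_tail(n = 6) %>%
  ungroup()

# Rows kept per group: 6 for group 1, all 3 for group 2.
counts <- count(last6, groupa)
print(counts)
```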

Create new columns for last n rows in dataframe by group dplyr

You can use data.table shift like this:

library(data.table)
setDT(df)[, c(paste0("p1.",1:3)):=shift(p, 2:0), by=ID][]

Output:

       ID   p p1.1 p1.2 p1.3
 1: 11041 2.9   NA   NA  2.9
 2: 11041 3.6   NA  2.9  3.6
 3: 11041 4.8  2.9  3.6  4.8
 4: 11041 2.6  3.6  4.8  2.6
 5: 11041 2.2  4.8  2.6  2.2
 6: 11041 3.9  2.6  2.2  3.9
 7: 11042 6.5   NA   NA  6.5
 8: 11042 2.9   NA  6.5  2.9
 9: 11042 1.4  6.5  2.9  1.4
10: 11063 0.7   NA   NA  0.7
11: 11063 5.1   NA  0.7  5.1
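
To make clear what shift(p, 2:0) computes within one group, here is a base R sketch that reproduces the three columns for ID 11042 from the output above. The helper lag_k is hypothetical and only stands in for shift with its default lag behaviour.

```r
# Values of p for ID 11042, taken from the output above.
p <- c(6.5, 2.9, 1.4)

# Hypothetical helper: shift v down by k positions, padding the top with NA.
lag_k <- function(v, k) c(rep(NA, k), v)[seq_along(v)]

# Lags 2, 1 and 0 side by side, matching the columns p1.1, p1.2 and p1.3.
lags <- sapply(2:0, function(k) lag_k(p, k))
colnames(lags) <- paste0("p1.", 1:3)
print(lags)
```

The last column is the unshifted value itself, which is why p1.3 always equals p.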

One way to generalize would be as follows:

Make helper function

f <- function(v,n) {
  setNames(shift(v,(n-1):0), paste0(deparse(substitute(v)),".",1:n))
}

Apply function by ID; here I use n=5

df[, f(p,5), by=ID]

Output:

       ID p.1 p.2 p.3 p.4 p.5
 1: 11041  NA  NA  NA  NA 2.9
 2: 11041  NA  NA  NA 2.9 3.6
 3: 11041  NA  NA 2.9 3.6 4.8
 4: 11041  NA 2.9 3.6 4.8 2.6
 5: 11041 2.9 3.6 4.8 2.6 2.2
 6: 11041 3.6 4.8 2.6 2.2 3.9
 7: 11042  NA  NA  NA  NA 6.5
 8: 11042  NA  NA  NA 6.5 2.9
 9: 11042  NA  NA 6.5 2.9 1.4
10: 11063  NA  NA  NA  NA 0.7
11: 11063  NA  NA  NA 0.7 5.1

dplyr - group last n row values

library(dplyr)

# example data
df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
                 Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
                 Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1), stringsAsFactors = FALSE)

# number of top makes to keep; all remaining makes are pooled into "others"
row_threshold <- 2

df %>%
  group_by(Make) %>%
  summarise(volume = sum(Cnt)) %>%                 # total count per make
  mutate(share = volume / sum(volume) * 100) %>%   # percentage share per make
  arrange(desc(volume)) %>%                        # largest makes first
  # relabel every make below the top row_threshold rows as "others"
  group_by(Make_upd = ifelse(row_number() > row_threshold, "others", Make)) %>%
  summarise(volume = sum(volume),
            share = sum(share))

# # A tibble: 3 x 3
#   Make_upd volume    share
#      <chr>  <dbl>    <dbl>
# 1    honda     21 44.68085
# 2   others     12 25.53191
# 3   toyota     14 29.78723

R: get last row of each group in dataframe

Package dplyr has a nice function for doing this.

library(tidyverse)

iris %>% 
    group_by(Species) %>% 
    slice_tail(n = 1)
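
If you would rather avoid the dependency, a base R sketch for the same iris example is shown below. It relies on duplicated with fromLast = TRUE and is an alternative, not the approach from the answer above.

```r
# duplicated(..., fromLast = TRUE) flags every row except the last occurrence
# of each Species, so negating it selects the last row of each group.
last_rows <- iris[!duplicated(iris$Species, fromLast = TRUE), ]
print(last_rows)
```

This only covers the single-last-row case; for the last n rows per group, slice_tail remains the more direct tool.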
