How to select last N observation from each group in dplyr dataframe?
Since the question is specifically about dplyr:
1) After group_by, use slice on row_number():
library(tidyverse)
df %>%
group_by(a) %>%
slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups: a [4]
# a b
# <dbl> <dbl>
#1 1 343
#2 1 54
#3 2 55
#4 2 62
#5 3 59
#6 3 -9
#7 4 0
#8 4 -0.5
2) Or use filter from dplyr:
df %>%
group_by(a) %>%
filter(row_number() >= (n() - 1))
3) Or use do with tail:
df %>%
group_by(a) %>%
do(tail(., 2))
4) In addition to the tidyverse methods, we can also use the more compact data.table syntax:
library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]
5) Or use by from base R:
by(df, df$a, FUN = tail, 2)
6) Or use aggregate from base R:
df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]
7) Or use split from base R:
do.call(rbind, lapply(split(df, df$a), tail, 2))
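As a quick check that these approaches agree, here is a minimal sketch. The original df is not shown in the answer, so the data frame below is made up for illustration, with a grouping column a and a value column b. Only methods 1, 2 and 7 are compared.

```r
library(dplyr)

# Hypothetical data: the original df is not shown in the answer
df <- data.frame(a = c(1, 1, 1, 2, 2, 3),
                 b = c(10, 20, 30, 40, 50, 60))

# Method 1: slice on the last two row numbers of each group
res_slice <- df %>%
  group_by(a) %>%
  slice(tail(row_number(), 2)) %>%
  ungroup()

# Method 2: filter on the row position within each group
res_filter <- df %>%
  group_by(a) %>%
  filter(row_number() >= (n() - 1)) %>%
  ungroup()

# Method 7 (base R): split by group, take the tail of each piece, bind back
res_base <- do.call(rbind, lapply(split(df, df$a), tail, 2))
```

Group 3 has a single row here, and all three methods keep it rather than dropping the group.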
Extract the first and last observation of each group, every time that group appears within a large dataset?
Here is a data.table approach. As mentioned by @Henrik in the comments, you can use rleid to create a new column to group by, instead of grouping by station directly, because station values get repeated. rleid assigns a new id every time the station value changes. Then, for each group, .SD[unique(c(1, .N))] keeps the first and last rows. Note that unique is added to handle groups that contain only one row. This may be a fast solution for you.
library(data.table)
setDT(df)
df[ , id := rleid(station)][ , .SD[unique(c(1, .N))], by = id]
Output
id tagID date station temp depth
1: 1 8272 2020-07-12 4 10 6.14
2: 1 8272 2020-07-13 4 11 21
3: 2 8272 2020-07-13 5 12 23.5
4: 2 8272 2020-07-16 5 10 15.4
5: 3 8272 2020-07-17 6 12 54
6: 3 8272 2020-07-29 6 12 23
7: 4 8272 2020-07-30 4 12 33.3
8: 4 8272 2020-08-04 4 9 32.7
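To see why rleid is needed, here is a small sketch on made-up data (the column temp and the values are assumptions, not the asker's data). Station 4 appears in two separate runs, and one run has a single row.

```r
library(data.table)

# Hypothetical data: station 4 appears in two separate runs
dt <- data.table(station = c(4, 4, 4, 5, 6, 6, 4, 4),
                 temp    = c(10, 11, 12, 13, 14, 15, 16, 17))

# rleid() increments each time the station value changes,
# so the two runs of station 4 get different ids
dt[, id := rleid(station)]

# Keep the first and last row of each run; unique() avoids
# returning the same row twice when a run has only one observation
res <- dt[, .SD[unique(c(1, .N))], by = id]
```

Grouping by station alone would merge both visits to station 4 into one group, which is what the run-length id avoids.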
Select first and last row from grouped data
There is probably a faster way:
df %>%
group_by(id) %>%
arrange(stopSequence) %>%
filter(row_number()==1 | row_number()==n())
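A minimal sketch of this on made-up data, reusing the column names id and stopSequence from the answer. One assumption to note: .by_group = TRUE is added to arrange here so that rows are sorted within each group and the output stays ordered by id. Without it, arrange ignores the grouping and sorts the whole table, which still leaves each group internally ordered.

```r
library(dplyr)

# Hypothetical data using the column names from the answer
df <- data.frame(id           = c(1, 1, 1, 2, 2, 3),
                 stopSequence = c(3, 1, 2, 2, 1, 1))

res <- df %>%
  group_by(id) %>%
  arrange(stopSequence, .by_group = TRUE) %>%       # sort within each id
  filter(row_number() == 1 | row_number() == n()) %>%  # first or last row
  ungroup()
```

The group with a single row (id 3) is returned once, because both conditions match the same row.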
r - select last n occurrences for each group
You can try slice
library(dplyr)
df %>%
group_by(person) %>%
arrange(date, person) %>%
slice((n()-1):n())
# person date
#1 pA 2014-03-01
#2 pA 2014-04-01
#3 pB 2014-02-01
#4 pB 2014-03-01
#5 pC 2014-01-01
#6 pC 2014-02-01
Or, in place of the last step, use:
do(tail(., 2))
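The following sketch runs this on made-up data with the same column names (person, date) and compares the result with slice_tail(n = 2), which is available in dplyr 1.0.0 and later. Every group here has at least two rows, so the (n()-1):n() index never reaches zero.

```r
library(dplyr)

# Hypothetical data with the column names used in the answer
df <- data.frame(
  person = c("pA", "pA", "pA", "pB", "pB", "pC", "pC", "pC"),
  date   = as.Date(c("2014-04-01", "2014-02-01", "2014-03-01",
                     "2014-03-01", "2014-02-01",
                     "2014-02-01", "2014-01-01", "2013-12-01"))
)

# Sort first so that "last" means "latest date" within each person
sorted <- df %>% arrange(person, date)

# Index-based version from the answer
res_slice <- sorted %>%
  group_by(person) %>%
  slice((n() - 1):n()) %>%
  ungroup()

# Same selection with slice_tail
res_tail <- sorted %>%
  group_by(person) %>%
  slice_tail(n = 2) %>%
  ungroup()
```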
Get last row of each group in R
You might try:
a %>%
group_by(ID) %>%
arrange(NUM) %>%
slice(n())
New dataframe with last 6 rows per group in R
You can use the slice_tail function in dplyr to get the last n rows from each group. If a group has fewer than 6 rows, it will return all the rows for that group.
library(dplyr)
test1 %>% group_by(groupa) %>% slice_tail(n = 6) %>% ungroup
# A tibble: 21 x 3
# timea groupa vara
# <dbl> <dbl> <dbl>
# 1 9 1 9
# 2 10 1 1
# 3 11 1 3
# 4 12 1 4
# 5 13 1 8
# 6 14 1 9
# 7 25 2 8
# 8 26 2 3
# 9 27 2 5
#10 28 2 8
# … with 11 more rows
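A small sketch of the short-group behaviour, on made-up data that reuses the column names from the answer: group 1 has 8 rows and group 2 has only 3.

```r
library(dplyr)

# Hypothetical data: group 1 has 8 rows, group 2 has only 3
test1 <- data.frame(timea  = 1:11,
                    groupa = c(rep(1, 8), rep(2, 3)),
                    vara   = c(5, 3, 9, 1, 3, 4, 8, 9, 8, 3, 5))

last6 <- test1 %>%
  group_by(groupa) %>%
  slice_tail(n = 6) %>%
  ungroup()

# Rows kept per group: 6 for group 1, all 3 for group 2
counts <- table(last6$groupa)
```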
Create new columns for last n rows in dataframe by group dplyr
You can use data.table's shift like this:
library(data.table)
setDT(df)[, c(paste0("p1.",1:3)):=shift(p, 2:0), by=ID][]
Output:
ID p p1.1 p1.2 p1.3
1: 11041 2.9 NA NA 2.9
2: 11041 3.6 NA 2.9 3.6
3: 11041 4.8 2.9 3.6 4.8
4: 11041 2.6 3.6 4.8 2.6
5: 11041 2.2 4.8 2.6 2.2
6: 11041 3.9 2.6 2.2 3.9
7: 11042 6.5 NA NA 6.5
8: 11042 2.9 NA 6.5 2.9
9: 11042 1.4 6.5 2.9 1.4
10: 11063 0.7 NA NA 0.7
11: 11063 5.1 NA 0.7 5.1
One way to generalize would be as follows:
- Make helper function
f <- function(v,n) {
setNames(shift(v,(n-1):0), paste0(deparse(substitute(v)),".",1:n))
}
- Apply the function by ID; here I use n=5
df[, f(p,5), by=ID]
Output:
ID p.1 p.2 p.3 p.4 p.5
1: 11041 NA NA NA NA 2.9
2: 11041 NA NA NA 2.9 3.6
3: 11041 NA NA 2.9 3.6 4.8
4: 11041 NA 2.9 3.6 4.8 2.6
5: 11041 2.9 3.6 4.8 2.6 2.2
6: 11041 3.6 4.8 2.6 2.2 3.9
7: 11042 NA NA NA NA 6.5
8: 11042 NA NA NA 6.5 2.9
9: 11042 NA NA 6.5 2.9 1.4
10: 11063 NA NA NA NA 0.7
11: 11063 NA NA NA 0.7 5.1
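To make the role of shift clearer, here is a minimal sketch on made-up data with the same column names (ID, p). It returns a separate table of lagged columns rather than adding them by reference, and the names p.1 to p.3 are set explicitly for illustration.

```r
library(data.table)

# Hypothetical data with the column names from the answer
df <- data.table(ID = c(1, 1, 1, 1, 2, 2),
                 p  = c(2.9, 3.6, 4.8, 2.6, 6.5, 2.9))

# shift(p, 2:0) returns a list of three vectors:
# p lagged by 2 rows, by 1 row, and by 0 rows (p itself).
# With by = ID the lags never cross a group boundary.
lagged <- df[, shift(p, 2:0), by = ID]
setnames(lagged, c("ID", "p.1", "p.2", "p.3"))
```

Rows that have no earlier value within their ID are filled with NA, which is why the first rows of each group in the outputs above start with NA.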
dplyr - group last n row values
library(dplyr)
# example data
df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1), stringsAsFactors = FALSE)
# number of top rows to keep; the remaining rows are grouped as "others"
row_threshold <- 2
df %>%
group_by(Make) %>%
summarise(volume = sum(Cnt)) %>%
mutate(share=volume/sum(volume)*100.0) %>%
arrange(desc(volume)) %>%
group_by(Make_upd = ifelse(row_number() > row_threshold, "others", Make)) %>%
summarise(volume = sum(volume),
share = sum(share))
# # A tibble: 3 x 3
# Make_upd volume share
# <chr> <dbl> <dbl>
# 1 honda 21 44.68085
# 2 others 12 25.53191
# 3 toyota 14 29.78723
R: get last row of each group in dataframe
The dplyr package has a nice function for doing this, slice_tail:
library(tidyverse)
iris %>%
group_by(Species) %>%
slice_tail(n = 1)