Get last row of each group in R
You might try:
a %>%
group_by(ID) %>%
arrange(NUM) %>%
slice(n())
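If the goal is the row with the largest NUM per ID, slice_max() can express the same thing without a separate sort. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the data frame a is made-up toy data, not the asker's.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's `a`
a <- data.frame(
  ID  = c(1, 1, 2, 2, 2),
  NUM = c(3, 7, 5, 9, 1)
)

# Sort by NUM, then keep the last row of each ID group
last_by_sort <- a %>%
  arrange(NUM) %>%
  group_by(ID) %>%
  slice(n()) %>%
  ungroup()

# Same result in one step: the row with the largest NUM per ID
last_by_max <- a %>%
  group_by(ID) %>%
  slice_max(NUM, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Both results hold NUM = 7 for ID 1 and NUM = 9 for ID 2. with_ties = FALSE keeps exactly one row per group even when the maximum is repeated.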
R: get last row of each group in dataframe
The dplyr package has a nice function for this, slice_tail():
library(tidyverse)
iris %>%
group_by(Species) %>%
slice_tail(n = 1)
Select first and last row from grouped data
There is probably a faster way:
df %>%
group_by(id) %>%
arrange(stopSequence) %>%
filter(row_number()==1 | row_number()==n())
How to select last N observation from each group in dplyr dataframe?
Since the question is specifically about dplyr:
1) After group_by(), use slice() on row_number():
library(tidyverse)
df %>%
group_by(a) %>%
slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups: a [4]
# a b
# <dbl> <dbl>
#1 1 343
#2 1 54
#3 2 55
#4 2 62
#5 3 59
#6 3 -9
#7 4 0
#8 4 -0.5
2) Or use filter() from dplyr:
df %>%
group_by(a) %>%
filter(row_number() >= (n() - 1))
3) Or with do() and tail() (do() is superseded as of dplyr 1.0.0):
df %>%
group_by(a) %>%
do(tail(., 2))
4) In addition to the tidyverse methods, we can also use the more compact data.table syntax:
library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]
5) Or by() from base R:
by(df, df$a, FUN = tail, 2)
6) Or with aggregate() from base R:
df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]
7) Or with split() from base R:
do.call(rbind, lapply(split(df, df$a), tail, 2))
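The options above do not say what happens when a group has fewer rows than requested. As a small base R sketch of option 7 (the data frame below is hypothetical toy data, with group 3 holding a single row), tail() simply returns whatever rows are available:

```r
# Hypothetical toy data; group 3 has only one row
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 3),
  b = c(10, 20, 30, 40, 50, 60)
)

# Last two rows per group via split() + tail(), as in option 7
last_two <- do.call(rbind, lapply(split(df, df$a), tail, 2))

# Drop the "group.row" row names that rbind() creates
rownames(last_two) <- NULL

last_two$b
```

Here group 3 contributes its single row, so the result has five rows rather than six.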
How to flag the last row of a data frame group?
You can group_by() ID and replace the last value of Calculate for each ID with 0.
library(dplyr)
df %>%
mutate(Calculate = Period * Value) %>%
group_by(ID) %>%
mutate(Calculate = replace(Calculate, n(), 0)) %>%
ungroup()
# ID Period Value Calculate
# <dbl> <dbl> <dbl> <dbl>
#1 1 1 10 10
#2 1 2 12 24
#3 1 3 11 0
#4 5 1 4 4
#5 5 2 6 0
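If an explicit flag column is more useful than overwriting the value directly, one possible variant is to mark the last row first and derive Calculate from it. This is only a sketch: the column name is_last is made up for illustration, and the data frame is reconstructed from the output shown above.

```r
library(dplyr)

# Toy data reconstructed from the printed output; not the asker's original object
df <- data.frame(
  ID     = c(1, 1, 1, 5, 5),
  Period = c(1, 2, 3, 1, 2),
  Value  = c(10, 12, 11, 4, 6)
)

flagged <- df %>%
  group_by(ID) %>%
  # TRUE only on the final row of each ID group
  mutate(is_last = row_number() == n()) %>%
  ungroup() %>%
  # Zero out the flagged rows, compute Period * Value elsewhere
  mutate(Calculate = if_else(is_last, 0, Period * Value))
```

Keeping is_last as its own column makes the flag available for later filtering or plotting, at the cost of one extra column.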
Select the first and last row by group in a data frame
A plyr solution (tmp is your data frame):
library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
# id d gr mm area
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
Or with dplyr:
library("dplyr")
tmp %>%
group_by(id) %>%
slice(c(1, n())) %>%
ungroup()
# # A tibble: 10 × 5
# id d gr mm area
# <int> <int> <int> <dbl> <int>
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
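One caveat with slice(c(1, n())): when a group has a single row, the index vector is c(1, 1), so that row is returned twice. A minimal sketch of the difference, using hypothetical toy data in which id 2 has only one row:

```r
library(dplyr)

# Hypothetical toy data; id 2 has a single row
tmp <- data.frame(id = c(1, 1, 1, 2), x = c("a", "b", "c", "d"))

# The lone row of id 2 appears twice
with_dupes <- tmp %>%
  group_by(id) %>%
  slice(c(1, n())) %>%
  ungroup()

# unique() collapses c(1, 1) to 1, so single-row groups appear once
no_dupes <- tmp %>%
  group_by(id) %>%
  slice(unique(c(1, n()))) %>%
  ungroup()
```

with_dupes has four rows and no_dupes has three. Whether the duplicate is wanted depends on whether downstream code expects exactly two rows per group.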
Calculating the difference between first and last row in each group
(Assuming dplyr.) This does not assume that date is guaranteed to be in order; if it is, then one could also use first(.)/last(.) for the same results. I tend to prefer not trusting row order.
If your discount is always 0/1 and you are looking to group by contiguous runs of the same value, then:
dat %>%
group_by(discountgrp = cumsum(discount != lag(discount, default = discount[1]))) %>%
summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
# discountgrp change
# <int> <dbl>
# 1 0 -0.871
# 2 1 -0.481
If your discount is instead a categorical value and can exceed 1, then:
dat %>%
group_by(discount) %>%
summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
# discount change
# <dbl> <dbl>
# 1 0 -0.871
# 2 1 -0.481
They happen to be the same here, but if the row order were changed such that some of the 1s occurred in the middle of the 0s (for instance), then the groups would be different.
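To illustrate why which.min()/which.max() on date is safer than first()/last(), here is a small sketch with made-up data whose rows are deliberately out of date order. The column names follow the answer; the values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with dates deliberately out of order
dat <- data.frame(
  discount = c(0, 0, 0, 1, 1),
  date     = as.Date(c("2020-01-03", "2020-01-01", "2020-01-02",
                       "2020-01-05", "2020-01-04")),
  price    = c(3, 1, 2, 10, 8)
)

res <- dat %>%
  group_by(discount) %>%
  summarize(
    # Looks up the earliest and latest date, so row order does not matter
    by_date  = price[which.max(date)] - price[which.min(date)],
    # Uses whatever rows happen to be first and last in the group
    by_order = last(price) - first(price)
  )
```

With this data, by_date is 2 for both groups, while by_order is -1 and -2, because the first and last rows are not the earliest and latest dates. Sorting with arrange(date) before grouping would make the two columns agree.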
Is there an R function to choose the last row of a group of variables?
Assuming you want Start to have the lowest Serial, and End to have the highest Serial:
library(tidyverse)
df <- tribble(~Group, ~State, ~Serial,
1, "Start", 1,
1, "End", 2,
1, "End", 3,
2, "Start", 4,
2, "End", 5,
2, "End", 6,
2, "End", 7)
df %>%
group_by(Group, State) %>%
# "Start" must match the capitalization used in the State column
filter(if_else(State == "Start", Serial == min(Serial), Serial == max(Serial))) %>%
ungroup()
# A tibble: 4 x 3
#   Group State Serial
#   <dbl> <chr>  <dbl>
# 1     1 Start      1
# 2     1 End        3
# 3     2 Start      4
# 4     2 End        7