Get First and Last Values Per Group - Dplyr Group_By with Last() and First()

Select first and last row from grouped data

There is probably a faster way:

df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  filter(row_number() == 1 | row_number() == n())
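
If you only need the first and last values of a column rather than whole rows, summarize() with first() and last() after an arrange() is a compact alternative. A sketch assuming the same df, id, and stopSequence columns (first_stop/last_stop are just illustrative names):

df %>%
  group_by(id) %>%
  arrange(stopSequence, .by_group = TRUE) %>%   # sort within each id
  summarize(first_stop = first(stopSequence),
            last_stop  = last(stopSequence))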

Use dplyr to take first and last row in a sequence by group

I think that slice should get you close:

df %>%
  group_by(A, C) %>%
  slice(c(1, n()))

gives

      A     B     C
  <chr> <int> <dbl>
1     a     2     0
2     a     6     0
3     a     1     1
4     a    10     1
5     b     1     0
6     b     9     0
7     b     4     1
8     b    10     1

Though this doesn't quite match your expected outcome; n() gives the last row in the group.

After your edit it is clear that you are not looking for the first and last values within an established group (which is what my previous version returned). You want to group by the runs of 1s or 0s. For that, you need one column that checks whether the run of 1s/0s has changed and another that identifies the resulting groups (a cumulative sum of the changes). Then slice works as described before. However, because some of your runs are only one row long, we only include n() when the group has more than one row (otherwise that single row shows up twice).

df %>%
  mutate(groupChanged = (C != lag(C, default = C[1]))
         , toCutBy = cumsum(groupChanged)
  ) %>%
  group_by(toCutBy) %>%
  slice(c(1, ifelse(n() == 1, NA, n())))

Gives

       A     B     C groupChanged toCutBy
   <chr> <int> <dbl>        <lgl>   <int>
 1     a     1     1        FALSE       0
 2     a     2     0         TRUE       1
 3     a     6     0        FALSE       1
 4     a     7     1         TRUE       2
 5     a    10     1        FALSE       2
 6     b     1     0         TRUE       3
 7     b     3     0        FALSE       3
 8     b     4     1         TRUE       4
 9     b     5     0         TRUE       5
10     b     9     0        FALSE       5
11     b    10     1         TRUE       6

If the runs of 1 or 0 must stay within the level in column A, you also need to add a check for a change in column A to the call. In this example, it does not have an effect (so returns exactly the same values), but it may be desirable in other instances.

df %>%
  mutate(groupChanged = (C != lag(C, default = C[1]) |
                           A != lag(A, default = A[1]))
         , toCutBy = cumsum(groupChanged)
  ) %>%
  group_by(toCutBy) %>%
  slice(c(1, ifelse(n() == 1, NA, n())))
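
If a newer dplyr is available (>= 1.1.0, an assumption about your setup), consecutive_id() builds the same run identifier in one step; wrapping the row indices in unique() avoids duplicating one-row runs. A sketch with the same columns:

library(dplyr)  # consecutive_id() needs dplyr >= 1.1.0

df %>%
  group_by(run = consecutive_id(A, C)) %>%   # new id for every run of identical A/C values
  slice(unique(c(1, n()))) %>%               # first and last row; unique() handles 1-row runs
  ungroup()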

How to select the last N observations from each group in a dplyr data frame?

As this is a question specifically about dplyr (a slice_tail() sketch also follows the list below):

1) After the group_by, use slice on the row_number()

library(tidyverse)

df %>%
  group_by(a) %>%
  slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups:   a [4]
#       a      b
#   <dbl>  <dbl>
# 1     1  343
# 2     1   54
# 3     2   55
# 4     2   62
# 5     3   59
# 6     3   -9
# 7     4    0
# 8     4   -0.5

2) Or use filter from dplyr

df %>%
  group_by(a) %>%
  filter(row_number() >= (n() - 1))

3) Or with do and tail

df %>%
  group_by(a) %>%
  do(tail(., 2))

4) In addition to the tidyverse methods, we can also use compact data.table syntax

library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]

5) Or with by from base R

by(df, df$a, FUN = tail, 2)

6) Or with aggregate from base R

df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]

7) Or with split from base R

do.call(rbind, lapply(split(df, df$a), tail, 2))
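
In addition, if you are on dplyr >= 1.0.0 (an assumption about your installation), slice_tail() expresses option 1 more directly:

df %>%
  group_by(a) %>%
  slice_tail(n = 2) %>%   # last 2 rows of each group
  ungroup()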

Paste together first and last value by group

You could do:

library(dplyr)

df %>%
  group_by(group) %>%
  mutate(
    Key = paste(link[1], link[n()], sep = ", ")
  )

Though that wouldn't match your desired output: in your example data frame you have e.g. group 91 with only one value, and the above code would repeat that value (9|10) as both beginning and end.

If you'd like to only display one value in such cases, you can do:

df %>%
  group_by(group) %>%
  mutate(
    Key = case_when(
      n() > 1 ~ paste(link[1], link[n()], sep = ", "),
      TRUE ~ as.character(link)
    )
  )
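
If you want one row per group instead of a repeated column, the same logic works in summarize(); a sketch assuming the same group and link columns and dplyr >= 1.0.0 for the .groups argument:

df %>%
  group_by(group) %>%
  summarize(
    Key = if (n() > 1) paste(first(link), last(link), sep = ", ") else as.character(first(link)),
    .groups = "drop"   # return an ungrouped result
  )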

Calculating the difference between first and last row in each group

(Assuming dplyr.) This does not assume that date is guaranteed to be in order; if it is, one could also use first(.)/last(.) for the same results. I tend to prefer not trusting the order.

If your discount is always 0/1 and you are looking to group by contiguous same-values, then

dat %>%
  group_by(discountgrp = cumsum(discount != lag(discount, default = discount[1]))) %>%
  summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
#   discountgrp change
#         <int>  <dbl>
# 1           0 -0.871
# 2           1 -0.481

If your discount is instead a categorical value and can exceed 1, then

dat %>%
  group_by(discount) %>%
  summarize(change = price[which.max(date)] - price[which.min(date)])
# # A tibble: 2 x 2
#   discount change
#      <dbl>  <dbl>
# 1        0 -0.871
# 2        1 -0.481

They happen to be the same here, but if the row order were changed such that some of the 1s occurred in the middle of 0s (for instance), then the groups would be different.
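
If you do trust (or enforce) the date order, the same summaries can be written with first()/last(), as mentioned above; a sketch assuming the same dat, discount, date, and price columns:

dat %>%
  arrange(date) %>%                              # make row order match date order
  group_by(discount) %>%
  summarize(change = last(price) - first(price))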

Get last row of each group in R

You might try:

a %>%
  group_by(ID) %>%
  arrange(NUM) %>%
  slice(n())
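
With dplyr >= 1.0.0 (an assumption about your version), slice_max() does the same in one step:

a %>%
  group_by(ID) %>%
  slice_max(NUM, n = 1, with_ties = FALSE) %>%   # keep the single row with the largest NUM per ID
  ungroup()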

Select the first and last row by group in a data frame

A plyr solution (tmp is your data frame):

library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
#    id d gr  mm area
# 1  15 1  2 3.4    1
# 2  15 1  1 5.5    2
# 3  21 1  1 4.0    2
# 4  21 1  2 3.8    2
# 5  22 1  1 4.0    2
# 6  22 1  2 4.6    2
# 7  23 1  1 2.7    2
# 8  23 1  2 3.0    2
# 9  24 1  1 3.0    2
# 10 24 1  2 2.0    3

Or with dplyr:

library("dplyr")
tmp %>%
  group_by(id) %>%
  slice(c(1, n())) %>%
  ungroup()
# # A tibble: 10 × 5
#       id     d    gr    mm  area
#    <int> <int> <int> <dbl> <int>
#  1    15     1     2   3.4     1
#  2    15     1     1   5.5     2
#  3    21     1     1   4.0     2
#  4    21     1     2   3.8     2
#  5    22     1     1   4.0     2
#  6    22     1     2   4.6     2
#  7    23     1     1   2.7     2
#  8    23     1     2   3.0     2
#  9    24     1     1   3.0     2
# 10    24     1     2   2.0     3

Extract the first and last observation of each group, every time that group appears within a large dataset?

Here is a data.table approach. As mentioned by @Henrik in the comments, you can use rleid to create a new column to group by, instead of grouping on station directly, since values of station get repeated. Then, for each group, it includes the first and last (.N) rows. Note that unique is added to handle situations where only one row of data is present for a given group. I hope this is a fast solution for you.

library(data.table)

setDT(df)

df[ , id := rleid(station)][ , .SD[unique(c(1, .N))], by = id]

Output

   id tagID       date station temp depth
1:  1  8272 2020-07-12       4   10  6.14
2:  1  8272 2020-07-13       4   11 21
3:  2  8272 2020-07-13       5   12 23.5
4:  2  8272 2020-07-16       5   10 15.4
5:  3  8272 2020-07-17       6   12 54
6:  3  8272 2020-07-29       6   12 23
7:  4  8272 2020-07-30       4   12 33.3
8:  4  8272 2020-08-04       4    9 32.7
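
For comparison, a dplyr translation of the same idea (a sketch reusing the run-change trick from earlier, with the same column names):

library(dplyr)

df %>%
  group_by(id = cumsum(station != lag(station, default = first(station)))) %>%
  slice(unique(c(1, n()))) %>%   # first and last row of each run; unique() handles 1-row runs
  ungroup()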

