Select the First and Last Row by Group in a Data Frame

Select first and last row from grouped data

There is probably a faster way:

df %>%
group_by(id) %>%
arrange(stopSequence) %>%
filter(row_number()==1 | row_number()==n())

Select the first and last row by group in a data frame

A plyr solution (tmp is your data frame):

library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
# id d gr mm area
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3

Or with dplyr (see also here):

library("dplyr")
tmp %>%
group_by(id) %>%
slice(c(1, n())) %>%
ungroup()
# # A tibble: 10 × 5
# id d gr mm area
# <int> <int> <int> <dbl> <int>
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3

Select start from first row and end from last row by group in R

Here is my approach with dplyr and rle():

library(dplyr)

my.data %>%
mutate(rlid = rep(seq_along(rle(value)$values), rle(value)$lengths)) %>%
group_by(rlid, ID) %>%
summarize(value = first(value), start = min(start), end = max(end)) %>%
ungroup() %>%
select(-rlid)

Returns:

# A tibble: 5 x 4
ID value start end
<chr> <dbl> <date> <date>
1 A 1 2020-01-01 2020-05-01
2 A 2 2020-06-03 2020-07-04
3 A 1 2020-07-04 NA
4 B 2 2020-02-02 2020-04-04
5 B 3 2020-04-04 NA

(Data used)

my.data <- structure(list(ID = c("A", "A", "A", "A", "A", "B", "B", "B", "B"), value = c(1, 1, 1, 2, 1, 2, 2, 3, 3), start = structure(c(18262, 18322, 18353, 18416, 18447, 18294, 18324, 18356, 18387), class = "Date"), end = structure(c(18322, 18353, 18383, 18447, NA, 18324, 18356, 18387, NA), class = "Date")), class = "data.frame", row.names = c(NA, -9L))

Get only the first and last rows of each group with pandas

Use groupby, find the head and tail for each group, and concat the two.

g = df.groupby('ID')

(pd.concat([g.head(1), g.tail(1)])
.drop_duplicates()
.sort_values('ID')
.reset_index(drop=True))

Time ID X Y
0 8:00 A 23 100
1 20:00 A 35 220
2 9:00 B 24 110
3 23:00 B 38 250
4 11:00 C 26 130
5 22:00 C 37 240
6 15:00 D 30 170

If you can guarantee each ID group has at least two rows, the drop_duplicates call is not needed.


Details

g.head(1)

Time ID X Y
0 8:00 A 23 100
1 9:00 B 24 110
3 11:00 C 26 130
7 15:00 D 30 170

g.tail(1)

Time ID X Y
7 15:00 D 30 170
12 20:00 A 35 220
14 22:00 C 37 240
15 23:00 B 38 250

pd.concat([g.head(1), g.tail(1)])

Time ID X Y
0 8:00 A 23 100
1 9:00 B 24 110
3 11:00 C 26 130
7 15:00 D 30 170
7 15:00 D 30 170
12 20:00 A 35 220
14 22:00 C 37 240
15 23:00 B 38 250

Select first and last row for each group and take the column value difference in MySQL?

Using a MySQL-8.0/ MariaDB-10.2+ window function:

SELECT symbol,
LAST - FIRST AS price_change
FROM
(SELECT DISTINCT symbol,
first_value(price) OVER w AS FIRST,
last_value(price) OVER w AS LAST
FROM ticks WINDOW w AS (PARTITION BY symbol
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) AS p

ref: fiddle

Comparing first and last row of groupby and creating new value

Not sure if it is effecient enough, but it works well.

def check_status(group):
selected = [False] * len(group)
selected[0] = selected[-1] = True
new_group = group[selected]
new_group['status'] = 'change' if new_group.category.is_unique else 'no change'
return new_group

print(df.groupby('email').apply(check_status).reset_index(drop=True))

Get last row of each group in R

You might try:

a %>% 
group_by(ID) %>%
arrange(NUM) %>%
slice(n())

R: get last row of each group in dataframe

Package dplyr has a nice function for doing this.

library(tidyverse)

iris %>%
group_by(Species) %>%
slice_tail(n = 1)

group by pandas dataframe and select latest in each group

use idxmax in groupby and slice df with loc

df.loc[df.groupby('id').date.idxmax()]

id product date
2 220 6647 2014-10-16
5 826 3380 2015-05-19
8 901 4555 2014-11-01


Related Topics



Leave a reply



Submit