Select first and last row from grouped data
There is probably a faster way:
df %>%
group_by(id) %>%
arrange(stopSequence) %>%
filter(row_number()==1 | row_number()==n())
Select the first and last row by group in a data frame
A plyr solution (tmp
is your data frame):
library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
# id d gr mm area
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
Or with dplyr (see also here):
library("dplyr")
tmp %>%
group_by(id) %>%
slice(c(1, n())) %>%
ungroup()
# # A tibble: 10 × 5
# id d gr mm area
# <int> <int> <int> <dbl> <int>
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
Select start from first row and end from last row by group in R
Here is my approach with dplyr
and rle()
:
library(dplyr)
my.data %>%
mutate(rlid = rep(seq_along(rle(value)$values), rle(value)$lengths)) %>%
group_by(rlid, ID) %>%
summarize(value = first(value), start = min(start), end = max(end)) %>%
ungroup() %>%
select(-rlid)
Returns:
# A tibble: 5 x 4
ID value start end
<chr> <dbl> <date> <date>
1 A 1 2020-01-01 2020-05-01
2 A 2 2020-06-03 2020-07-04
3 A 1 2020-07-04 NA
4 B 2 2020-02-02 2020-04-04
5 B 3 2020-04-04 NA
(Data used)
my.data <- structure(list(ID = c("A", "A", "A", "A", "A", "B", "B", "B", "B"), value = c(1, 1, 1, 2, 1, 2, 2, 3, 3), start = structure(c(18262, 18322, 18353, 18416, 18447, 18294, 18324, 18356, 18387), class = "Date"), end = structure(c(18322, 18353, 18383, 18447, NA, 18324, 18356, 18387, NA), class = "Date")), class = "data.frame", row.names = c(NA, -9L))
Get only the first and last rows of each group with pandas
Use groupby
, find the head
and tail
for each group, and concat
the two.
g = df.groupby('ID')
(pd.concat([g.head(1), g.tail(1)])
.drop_duplicates()
.sort_values('ID')
.reset_index(drop=True))
Time ID X Y
0 8:00 A 23 100
1 20:00 A 35 220
2 9:00 B 24 110
3 23:00 B 38 250
4 11:00 C 26 130
5 22:00 C 37 240
6 15:00 D 30 170
If you can guarantee each ID group has at least two rows, the drop_duplicates
call is not needed.
Details
g.head(1)
Time ID X Y
0 8:00 A 23 100
1 9:00 B 24 110
3 11:00 C 26 130
7 15:00 D 30 170
g.tail(1)
Time ID X Y
7 15:00 D 30 170
12 20:00 A 35 220
14 22:00 C 37 240
15 23:00 B 38 250
pd.concat([g.head(1), g.tail(1)])
Time ID X Y
0 8:00 A 23 100
1 9:00 B 24 110
3 11:00 C 26 130
7 15:00 D 30 170
7 15:00 D 30 170
12 20:00 A 35 220
14 22:00 C 37 240
15 23:00 B 38 250
Select first and last row for each group and take the column value difference in MySQL?
Using a MySQL-8.0/ MariaDB-10.2+ window function:
SELECT symbol,
LAST - FIRST AS price_change
FROM
(SELECT DISTINCT symbol,
first_value(price) OVER w AS FIRST,
last_value(price) OVER w AS LAST
FROM ticks WINDOW w AS (PARTITION BY symbol
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) AS p
ref: fiddle
Comparing first and last row of groupby and creating new value
Not sure if it is effecient enough, but it works well.
def check_status(group):
selected = [False] * len(group)
selected[0] = selected[-1] = True
new_group = group[selected]
new_group['status'] = 'change' if new_group.category.is_unique else 'no change'
return new_group
print(df.groupby('email').apply(check_status).reset_index(drop=True))
Get last row of each group in R
You might try:
a %>%
group_by(ID) %>%
arrange(NUM) %>%
slice(n())
R: get last row of each group in dataframe
Package dplyr
has a nice function for doing this.
library(tidyverse)
iris %>%
group_by(Species) %>%
slice_tail(n = 1)
group by pandas dataframe and select latest in each group
use idxmax
in groupby
and slice df
with loc
df.loc[df.groupby('id').date.idxmax()]
id product date
2 220 6647 2014-10-16
5 826 3380 2015-05-19
8 901 4555 2014-11-01
Related Topics
How to Add a Suffix (Or Prefix) Elements of an Existing List
Converting Data Frame into a List of Lists in R
Conditionally Replace Values of Subset of Rows With Column Name in R Using Only Tidy
Remove Ids With Fewer Than 9 Unique Observations
Remove Space Between Plotted Data and the Axes
Multi-Row X-Axis Labels in Ggplot Line Chart
How to Force a Line Break in Rmarkdown'S Title
Remove Last N Rows in Data Frame With the Arbitrary Number of Rows
R: How to Get the Percentage Change from Two Different Columns
Filter a Data Frame According to Minimum and Maximum Values
Interpreting "Condition Has Length ≫ 1" Warning from 'If' Function
Reorder Bars in Geom_Bar Ggplot2 by Value
In R, How to Get an Object'S Name After It Is Sent to a Function
Apply a Function to Every Specified Column in a Data.Table and Update by Reference