Select the First and Last Row by Group in a Data Frame

Select first and last row from grouped data

There is probably a faster way:

df %>%
  group_by(id) %>%
  arrange(stopSequence) %>%
  filter(row_number()==1 | row_number()==n())

Select the first and last row by group in a data frame

A plyr solution (tmp is your data frame):

library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
#    id d gr  mm area
# 1  15 1  2 3.4    1
# 2  15 1  1 5.5    2
# 3  21 1  1 4.0    2
# 4  21 1  2 3.8    2
# 5  22 1  1 4.0    2
# 6  22 1  2 4.6    2
# 7  23 1  1 2.7    2
# 8  23 1  2 3.0    2
# 9  24 1  1 3.0    2
# 10 24 1  2 2.0    3

Or with dplyr (see also here):

library("dplyr")
tmp %>%
group_by(id) %>%
slice(c(1, n())) %>%
ungroup()
# # A tibble: 10 × 5
#       id     d    gr    mm  area
#    <int> <int> <int> <dbl> <int>
# 1     15     1     2   3.4     1
# 2     15     1     1   5.5     2
# 3     21     1     1   4.0     2
# 4     21     1     2   3.8     2
# 5     22     1     1   4.0     2
# 6     22     1     2   4.6     2
# 7     23     1     1   2.7     2
# 8     23     1     2   3.0     2
# 9     24     1     1   3.0     2
# 10    24     1     2   2.0     3

Select start from first row and end from last row by group in R

Here is my approach with dplyr and rle():

library(dplyr)

my.data %>%
  mutate(rlid = rep(seq_along(rle(value)$values), rle(value)$lengths)) %>%
  group_by(rlid, ID) %>%
  summarize(value = first(value), start = min(start), end = max(end)) %>%
  ungroup() %>%
  select(-rlid)

Returns:

# A tibble: 5 x 4
  ID    value start      end       
  <chr> <dbl> <date>     <date>    
1 A         1 2020-01-01 2020-05-01
2 A         2 2020-06-03 2020-07-04
3 A         1 2020-07-04 NA        
4 B         2 2020-02-02 2020-04-04
5 B         3 2020-04-04 NA

(Data used)

my.data <- structure(list(ID = c("A", "A", "A", "A", "A", "B", "B", "B", "B"), value = c(1, 1, 1, 2, 1, 2, 2, 3, 3), start = structure(c(18262, 18322, 18353, 18416, 18447, 18294, 18324, 18356, 18387), class = "Date"), end = structure(c(18322, 18353, 18383, 18447, NA, 18324, 18356, 18387, NA), class = "Date")), class = "data.frame", row.names = c(NA, -9L))

Get only the first and last rows of each group with pandas

Use groupby, find the head and tail for each group, and concat the two.

g = df.groupby('ID')

(pd.concat([g.head(1), g.tail(1)])
   .drop_duplicates()
   .sort_values('ID')
   .reset_index(drop=True))

    Time ID   X    Y
0   8:00  A  23  100
1  20:00  A  35  220
2   9:00  B  24  110
3  23:00  B  38  250
4  11:00  C  26  130
5  22:00  C  37  240
6  15:00  D  30  170

If you can guarantee each ID group has at least two rows, the drop_duplicates call is not needed.

Details

g.head(1)

    Time ID   X    Y
0   8:00  A  23  100
1   9:00  B  24  110
3  11:00  C  26  130
7  15:00  D  30  170

g.tail(1)

     Time ID   X    Y
7   15:00  D  30  170
12  20:00  A  35  220
14  22:00  C  37  240
15  23:00  B  38  250

pd.concat([g.head(1), g.tail(1)])

     Time ID   X    Y
0    8:00  A  23  100
1    9:00  B  24  110
3   11:00  C  26  130
7   15:00  D  30  170
7   15:00  D  30  170
12  20:00  A  35  220
14  22:00  C  37  240
15  23:00  B  38  250

Select first and last row for each group and take the column value difference in MySQL?

Using a MySQL-8.0/ MariaDB-10.2+ window function:

SELECT symbol,
       LAST - FIRST AS price_change
FROM
  (SELECT DISTINCT symbol,
                   first_value(price) OVER w AS FIRST,
                   last_value(price) OVER w AS LAST
   FROM ticks WINDOW w AS (PARTITION BY symbol
                           ORDER BY date
                           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
   ) AS p

ref: fiddle

Comparing first and last row of groupby and creating new value

Not sure if it is effecient enough, but it works well.

def check_status(group):
    selected = [False] * len(group)
    selected[0] = selected[-1] = True
    new_group = group[selected]
    new_group['status'] = 'change' if new_group.category.is_unique else 'no change'
    return new_group

print(df.groupby('email').apply(check_status).reset_index(drop=True))

Get last row of each group in R

You might try:

a %>% 
  group_by(ID) %>% 
  arrange(NUM) %>%  
  slice(n())

R: get last row of each group in dataframe

Package dplyr has a nice function for doing this.

library(tidyverse)

iris %>% 
    group_by(Species) %>% 
    slice_tail(n = 1)

group by pandas dataframe and select latest in each group

use idxmax in groupby and slice df with loc

df.loc[df.groupby('id').date.idxmax()]

    id  product       date
2  220     6647 2014-10-16
5  826     3380 2015-05-19
8  901     4555 2014-11-01