How to Get Rows, by Group, of Data Frame with Earliest Timestamp

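You could use dplyr: summarise() takes the minimum timestamp per group, and which.min() picks the letter from that same row.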

library(dplyr)
group_by(df, group) %>% summarise(min = min(ts), letter = letter[which.min(ts)])
#   group        min letter
# 1     1 2013-02-01      e
# 2     2 2014-02-11      d
# 3     3 2014-02-11      i
# 4     4 2014-02-02      f

You could also rank the rows within each group by timestamp and keep the first one:

group_by(df, group) %>%
  mutate(rank = row_number(ts)) %>%
  arrange(rank) %>%
  slice(1)

Group by pandas dataframe and select latest in each group

Use idxmax on the grouped date column to get the index label of the latest row in each group, then select those rows with loc:

df.loc[df.groupby('id').date.idxmax()]

    id  product        date
2  220     6647  2014-10-16
5  826     3380  2015-05-19
8  901     4555  2014-11-01
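
For readers who want to run this end to end, here is a minimal sketch. The sample rows are hypothetical (only the column names id, product and date come from the answer above), and it assumes date can be parsed with pd.to_datetime.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "id": [220, 220, 826, 826, 901],
    "product": [6647, 6647, 3380, 3380, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-10-16", "2014-11-11", "2015-05-19", "2014-11-01",
    ]),
})

# idxmax gives, for each id, the index label of the row with the latest date;
# loc then pulls those full rows out of the original frame.
latest = df.loc[df.groupby("id")["date"].idxmax()]
print(latest)
```

Swapping idxmax for idxmin selects the earliest row per group instead. Both rely on the index labels being unique.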

Slice the rows with the earliest datetime

df %>%
  group_by(ID) %>%
  filter(DATETIME_OF_PROCEDURE == min(DATETIME_OF_PROCEDURE))
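
Note that comparing against the group minimum keeps every row that ties for the earliest datetime. A rough pandas analogue of the same filter, as a sketch with hypothetical sample data (only the column names are taken from the snippet above):

```python
import pandas as pd

# Hypothetical sample data; ID 1 has two rows tied for the earliest datetime.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "DATETIME_OF_PROCEDURE": pd.to_datetime([
        "2020-01-05 10:00", "2020-01-03 08:00", "2020-01-03 08:00",
        "2020-02-01 09:00", "2020-01-20 12:00",
    ]),
})

# transform("min") broadcasts each group's minimum back onto its rows,
# so the comparison below is evaluated row by row.
group_min = df.groupby("ID")["DATETIME_OF_PROCEDURE"].transform("min")
earliest = df[df["DATETIME_OF_PROCEDURE"] == group_min]
print(earliest)
```

If exactly one row per group is wanted even when there are ties, an idxmin-based selection is one alternative.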

Pandas dataframe get first row of each group

>>> df.groupby('id').first()
value
id
1 first
2 first
3 first
4 second
5 first
6 first
7 fourth

If you need id as column:

>>> df.groupby('id').first().reset_index()
id value
0 1 first
1 2 first
2 3 first
3 4 second
4 5 first
5 6 first
6 7 fourth

To get n first records, you can use head():

>>> df.groupby('id').head(2).reset_index(drop=True)
id value
0 1 first
1 1 second
2 2 first
3 2 second
4 3 first
5 3 third
6 4 second
7 4 fifth
8 5 first
9 6 first
10 6 second
11 7 fourth
12 7 fifth
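
One caveat about first(): it returns the first non-null value of each column within a group, which is not necessarily the first row. When missing values are possible, head(1) keeps the actual first row. A small sketch with hypothetical data showing the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of id 1 has a missing value.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "value": [np.nan, "second", "first"],
})

# first() skips the missing value and reports "second" for id 1.
first_non_null = df.groupby("id").first()

# head(1) keeps the first row of each group as it is, NaN included.
first_rows = df.groupby("id").head(1)

print(first_non_null)
print(first_rows)
```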

How to select the oldest record of each group in a dataframe using Python?

In pandas, groupby groups the rows by a key, and head(n) called on the grouped object returns the first n rows of each group. If you sort by date first, those are the n oldest rows per group. So in your case, to get the two oldest records for each ticker (use head(1) for only the oldest):

df.sort_values('date').groupby('ticker').head(2)
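
A self-contained sketch of the same idea follows. The tickers, dates and the price column are hypothetical sample data; only the ticker and date column names come from the answer above.

```python
import pandas as pd

# Hypothetical sample data with distinct dates.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "AAA", "BBB", "AAA"],
    "date": pd.to_datetime([
        "2021-03-01", "2021-01-15", "2021-01-10", "2021-02-20", "2021-02-05",
    ]),
    "price": [10.0, 20.0, 11.0, 21.0, 12.0],
})

# Sort oldest first, then keep the leading row(s) of each ticker.
by_date = df.sort_values("date")
oldest = by_date.groupby("ticker").head(1)      # oldest record per ticker
oldest_two = by_date.groupby("ticker").head(2)  # two oldest records per ticker

print(oldest)
print(oldest_two)
```

The result keeps the original index labels and the date-sorted row order, so a further sort_values('ticker') may be useful if the output should be ordered by group.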

Pandas groupby date, select earliest per day

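If the dataframe has a unique index, you can use idxmin on the timestamp column to find the index label of the earliest timestamp for each day, then extract those rows with loc:

df['timestamp'] = pd.to_datetime(df['timestamp'])
# as_index=False is dropped here so that idxmin returns a plain Series of index labels
df.loc[df.groupby(df['timestamp'].dt.date)['timestamp'].idxmin()]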

#    value           timestamp
# 7   Fire 2017-10-03 14:31:55
# 6  Water 2017-10-04 14:32:01
# 5  Water 2017-10-05 14:32:13
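
As a runnable sketch of this approach, assuming a default unique integer index; the rows below are hypothetical sample data, not the asker's dataset:

```python
import pandas as pd

# Hypothetical sample data: two readings on each of two days.
df = pd.DataFrame({
    "value": ["Water", "Fire", "Water", "Fire"],
    "timestamp": [
        "2017-10-03 15:00:00", "2017-10-03 14:31:55",
        "2017-10-04 14:32:01", "2017-10-04 16:00:00",
    ],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Group by calendar day, find the index label of the earliest timestamp
# in each day, then select those rows.
day = df["timestamp"].dt.date
earliest_per_day = df.loc[df.groupby(day)["timestamp"].idxmin()]
print(earliest_per_day)
```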

Pandas dataframe: How to sort groups by the earliest time of a group

We can group by eventid and take the earliest (minimum) time of each group as that group's sort key. That gives data like this:

         time
eventid
1        9:10
2        9:00
3        9:40

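Then we merge this back into the original dataframe and sort by the group's earliest time:

# select the column before calling min(); min('time') would pass 'time' as numeric_only
groups = df.groupby('eventid')[['time']].min()
df = df.merge(groups, on='eventid', suffixes=('', '_right'))
df = df.sort_values('time_right')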
   eventid  time time_right
2        2  9:20       9:00
3        2  9:00       9:00
0        1  9:10       9:10
1        1  9:30       9:10
4        3  9:40       9:40
5        3  9:50       9:40
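
The merge can also be avoided with transform, which writes each group's earliest time onto every row of that group. This is an alternative sketch, not the method from the answer above; the column name group_start is hypothetical, and the times are parsed to datetimes because plain strings such as '10:00' and '9:00' would compare lexicographically.

```python
import pandas as pd

# Sample data mirroring the table above, with zero-padded times.
df = pd.DataFrame({
    "eventid": [1, 1, 2, 2, 3, 3],
    "time": pd.to_datetime(
        ["09:10", "09:30", "09:20", "09:00", "09:40", "09:50"], format="%H:%M"
    ),
})

# Earliest time of each event, repeated on every row of that event.
df["group_start"] = df.groupby("eventid")["time"].transform("min")

# A stable sort keeps the original row order inside each event.
ordered = df.sort_values("group_start", kind="stable")
print(ordered)
```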

Select row with most recent date by group

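You can try parsing the dates with as.Date() and using slice() with which.max() to keep the most recent row in each group: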

library(dplyr)
df %>%
  group_by(ID) %>%
  slice(which.max(as.Date(date, '%m/%d/%Y')))

Data:

df <- data.frame(ID = rep(1:3, each = 3),
                 date = c('02/20/1989', '03/14/2001', '02/25/1990',
                          '04/20/2002', '02/04/2005', '02/01/2008',
                          '08/22/2011', '08/20/2009', '08/25/2010'),
                 stringsAsFactors = FALSE)

