How to Group a Pandas Dataframe by a Defined Time Interval

How to group a pandas dataframe by a defined time interval?

Use base=30 in conjunction with label='right' parameters in pd.Grouper.

Specifying label='right' makes the time-period to start grouping from 6:30 (higher side) and not 5:30.
Also, base is set to 0 by default, hence the need to offset those by 30 to account for the forward propagation of dates.

Suppose, you want to aggregate the first element of every sub-group, then:

df.groupby(pd.Grouper(freq='60Min', base=30, label='right')).first()
# same thing using resample - df.resample('60Min', base=30, label='right').first()

yields:

                           data
index
2017-02-14 06:30:00 11198648.0
2017-02-14 07:30:00 11198650.0
2017-02-14 08:30:00 NaN
2017-02-14 09:30:00 NaN
2017-02-14 10:30:00 NaN
2017-02-14 11:30:00 NaN
2017-02-14 12:30:00 NaN
2017-02-14 13:30:00 NaN
2017-02-14 14:30:00 NaN
2017-02-14 15:30:00 NaN
2017-02-14 16:30:00 NaN
2017-02-14 17:30:00 NaN
2017-02-14 18:30:00 NaN
2017-02-14 19:30:00 NaN
2017-02-14 20:30:00 NaN
2017-02-14 21:30:00 NaN
2017-02-14 22:30:00 NaN
2017-02-14 23:30:00 11207728.0

Pandas group by time interval (5min, 10min, 1day, 1year) and count amount of entries

Are you looking for something like this, for minute intervals:

df.groupby(['point',df.timestamp_local.dt.floor('5Min')]).size()

and this, for month/year

df.groupby(['point', df.timestamp_local.dt.to_period('M')]).size()

Pandas dataframe Group by Time Interval and then ID with sum of Counts

first, a new columns is created where minute and seconds were made to zero, by flooring the hour. Then the Pivot_table gives the required result

df['selected_time_2'] = df['selected time'].astype('datetime64').dt.floor('h').dt.strftime('%m/%d/%YY %I:%M:%S %p')

df.pivot_table(index='id',columns='selected_time_2', values='count', aggfunc='sum').fillna('').reset_index()
selected_time_2     id  05/16/2022Y 02:00:00 PM     05/16/2022Y 03:00:00 PM
0 1 2 2
1 2 1 2
2 3 2
3 4 1 1
4 5 1

EDIT:
To show the columns in 12 hrs format

import datetime

df['selected_time_2'] = df['selected time'].astype('datetime64').dt.floor('h') #.dt.strftime('%m/%d/%Y %I:%M:%S %p').astype('datetime64')

df2 = df.pivot_table(index='id',columns='selected_time_2', values='count', aggfunc='sum').fillna('').reset_index()

df2.columns = [col.strftime('%m/%d/%Y %I:%M:%S %p')
if (isinstance(col, datetime.date) )
else col
for col in df2.columns]
df2

RESULT:

    id  05/16/2022 11:00:00 AM  05/16/2022 12:00:00 PM  05/16/2022 02:00:00 PM  05/16/2022 03:00:00 PM
0 1 1.0 1.0 2.0
1 2 1.0 2.0
2 3 2.0
3 4 1.0 1.0
4 5 1.0

aggfunc added to get the sum, missed the first time around in the response
PS: I'm not sure how to format the result properly, and can take advise on it.

Python GroupBy time interval

I think you need (assuming ts_A is set to DatetimeIndex) GroupBy.mean and omit transform function:

#if not datetimeindex
#df['ts_A'] = pd.to_datetime(df['ts_A'])
#df = df.set_index('ts_A')

df = df_a['value'].groupby([df_a['id_A'],
df_a['course'],
df_a['weight'],
pd.TimeGrouper(freq='30S')]).mean().reset_index()

Or:

df = df_a.groupby(['id_A','course','weight', 
pd.TimeGrouper(freq='30S')])['value'].mean().reset_index()
print (df)
id_A course weight ts_A value
0 id1 cotton 3.5 2017-04-27 01:35:30 150.000000
1 id1 cotton 3.5 2017-04-27 01:36:00 416.666667
2 id1 cotton 3.5 2017-04-27 01:36:30 700.000000
3 id1 cotton 3.5 2017-04-27 01:37:00 950.000000
4 id2 cotton blue 5.0 2017-04-27 02:35:30 150.000000
5 id2 cotton blue 5.0 2017-04-27 02:36:00 450.000000
6 id2 cotton blue 5.0 2017-04-27 02:36:30 520.666667
7 id2 cotton blue 5.0 2017-04-27 02:37:00 610.000000

Solution with resample:

df = df_a.groupby(['id_A','course','weight'])['value'].resample('30S').mean().reset_index()
print (df)
id_A course weight ts_A value
0 id1 cotton 3.5 2017-04-27 01:35:30 150.000000
1 id1 cotton 3.5 2017-04-27 01:36:00 416.666667
2 id1 cotton 3.5 2017-04-27 01:36:30 700.000000
3 id1 cotton 3.5 2017-04-27 01:37:00 950.000000
4 id2 cotton blue 5.0 2017-04-27 02:35:30 150.000000
5 id2 cotton blue 5.0 2017-04-27 02:36:00 450.000000
6 id2 cotton blue 5.0 2017-04-27 02:36:30 520.666667
7 id2 cotton blue 5.0 2017-04-27 02:37:00 610.000000

SETUP:

d = {'weight': {0: 3.5, 1: 3.5, 2: 3.5, 3: 3.5, 4: 3.5, 5: 3.5, 6: 3.5, 7: 3.5, 8: 3.5, 9: 3.5, 10: 5.0, 11: 5.0, 12: 5.0, 13: 5.0, 14: 5.0, 15: 5.0, 16: 5.0, 17: 5.0, 18: 5.0, 19: 5.0}, 'value': {0: 100, 1: 200, 2: 350, 3: 400, 4: 500, 5: 600, 6: 700, 7: 800, 8: 900, 9: 1000, 10: 100, 11: 200, 12: 450, 13: 300, 14: 600, 15: 500, 16: 522, 17: 540, 18: 320, 19: 900}, 'ts_A': {0: '2017-04-27 01:35:40', 1: '2017-04-27 01:35:50', 2: '2017-04-27 01:36:00', 3: '2017-04-27 01:36:10', 4: '2017-04-27 01:36:20', 5: '2017-04-27 01:36:30', 6: '2017-04-27 01:36:40', 7: '2017-04-27 01:36:50', 8: '2017-04-27 01:37:00', 9: '2017-04-27 01:37:10', 10: '2017-04-27 02:35:40', 11: '2017-04-27 02:35:50', 12: '2017-04-27 02:36:00', 13: '2017-04-27 02:36:10', 14: '2017-04-27 02:36:20', 15: '2017-04-27 02:36:30', 16: '2017-04-27 02:36:40', 17: '2017-04-27 02:36:50', 18: '2017-04-27 02:37:00', 19: '2017-04-27 02:37:10'}, 'course': {0: 'cotton', 1: 'cotton', 2: 'cotton', 3: 'cotton', 4: 'cotton', 5: 'cotton', 6: 'cotton', 7: 'cotton', 8: 'cotton', 9: 'cotton', 10: 'cotton blue', 11: 'cotton blue', 12: 'cotton blue', 13: 'cotton blue', 14: 'cotton blue', 15: 'cotton blue', 16: 'cotton blue', 17: 'cotton blue', 18: 'cotton blue', 19: 'cotton blue'}, 'id_A': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id1', 4: 'id1', 5: 'id1', 6: 'id1', 7: 'id1', 8: 'id1', 9: 'id1', 10: 'id2', 11: 'id2', 12: 'id2', 13: 'id2', 14: 'id2', 15: 'id2', 16: 'id2', 17: 'id2', 18: 'id2', 19: 'id2'}}
df_a = pd.DataFrame(d)
df_a['ts_A'] = pd.to_datetime(df_a['ts_A'])
df_a = df_a.set_index('ts_A')
print (df_a)
course id_A value weight
ts_A
2017-04-27 01:35:40 cotton id1 100 3.5
2017-04-27 01:35:50 cotton id1 200 3.5
2017-04-27 01:36:00 cotton id1 350 3.5
2017-04-27 01:36:10 cotton id1 400 3.5
2017-04-27 01:36:20 cotton id1 500 3.5
2017-04-27 01:36:30 cotton id1 600 3.5
2017-04-27 01:36:40 cotton id1 700 3.5
2017-04-27 01:36:50 cotton id1 800 3.5
2017-04-27 01:37:00 cotton id1 900 3.5
2017-04-27 01:37:10 cotton id1 1000 3.5
2017-04-27 02:35:40 cotton blue id2 100 5.0
2017-04-27 02:35:50 cotton blue id2 200 5.0
2017-04-27 02:36:00 cotton blue id2 450 5.0
2017-04-27 02:36:10 cotton blue id2 300 5.0
2017-04-27 02:36:20 cotton blue id2 600 5.0
2017-04-27 02:36:30 cotton blue id2 500 5.0
2017-04-27 02:36:40 cotton blue id2 522 5.0
2017-04-27 02:36:50 cotton blue id2 540 5.0
2017-04-27 02:37:00 cotton blue id2 320 5.0
2017-04-27 02:37:10 cotton blue id2 900 5.0

Group by a pandas dataframe by 15 min time intervals but for the whole day

Add 00:00:00 to minimal time and 23:45:00 to maximal, so in ouput are all expected values:

s = df['time'].agg(['min','max']).dt.normalize().copy()
s['max'] = s['max'] + pd.DateOffset(hours=23, minutes=45)

df = df.append(s.to_frame().assign(Q = 0), ignore_index=True)
print (df)
time Q
0 2019-12-07 09:13:00 10
1 2019-12-07 09:33:00 1
2 2019-12-07 09:41:00 1
3 2019-12-07 10:03:00 6
4 2019-12-07 10:15:00 5
5 2019-12-07 10:37:00 3
6 2019-12-07 10:48:00 15
7 2019-12-07 11:05:00 3
8 2019-12-07 11:16:00 8
9 2019-12-07 11:34:00 5
10 2019-12-07 11:48:00 10
11 2019-12-07 12:01:00 6
12 2019-12-07 12:18:00 7
13 2019-12-07 00:00:00 0
14 2019-12-07 23:45:00 0

And then use your solution, e.g.:

df.groupby(pd.Grouper(key="time", freq="15Min"))['Q'].sum()

If need processing each dates separately - first use your solution and then add misisng Datetimes by Series.reindex:

print (df)
time Q
0 2019-12-07 09:13:00 10
1 2019-12-07 09:33:00 1
2 2019-12-07 09:41:00 1
3 2019-12-07 10:03:00 6
4 2019-12-07 10:15:00 5
5 2019-12-07 10:37:00 3
6 2019-12-07 10:48:00 15
7 2019-12-07 11:05:00 3
8 2019-12-09 11:16:00 8
9 2019-12-09 11:34:00 5
10 2019-12-09 11:48:00 10
11 2019-12-09 12:01:00 6
12 2019-12-09 12:18:00 7

dates = [y for x in df.time.dt.normalize().drop_duplicates()
for y in pd.date_range(x, x + pd.DateOffset(hours=23, minutes=45), freq='15T')]
print (dates[:2])
[Timestamp('2019-12-07 00:00:00', freq='15T'), Timestamp('2019-12-07 00:15:00', freq='15T')]


df = df.groupby(df.time.dt.floor('15T'))["Q"].sum().reindex(dates, fill_value=0)
print (df)
time
2019-12-07 00:00:00 0
2019-12-07 00:15:00 0
2019-12-07 00:30:00 0
2019-12-07 00:45:00 0
2019-12-07 01:00:00 0
..
2019-12-09 22:45:00 0
2019-12-09 23:00:00 0
2019-12-09 23:15:00 0
2019-12-09 23:30:00 0
2019-12-09 23:45:00 0
Name: Q, Length: 192, dtype: int64

How to group by time-interval from bottom to top using Pandas resample functionality?

Maybe you can use the iloc to reverse after resample? I'm not sure if that hinders your further calculations, but it can resample and reverse the set.

Since I do not have access to your exact sample data

Here's how I am testing it:

import yfinance as yf
import pandas as pd
import numpy as np

df = yf.download(tickers = 'BTC-USD',
start = '2022-05-16',
end = '2022-05-17',
interval = '1m',
group_by = 'ticker',
auto_adjust = True).reset_index()

df_1min = df.iloc[110:130,:] #sample timeframe extracted
df_1min.head()

This results in the 1 min df:

1min df

You can then apply resample and the iloc:

conversion = {'Open' : 'first', 
'High' : 'max',
'Low' : 'min',
'Close' : 'last',
'Volume' : 'sum'}

df_1min = df_1min.set_index('Datetime')

df_5min = df_1min.resample('5T').agg(conversion)

df_5min.iloc[::-1].reset_index() #reverse

This results in a reversed df:

reversed and resampled



Related Topics



Leave a reply



Submit