Python Pandas Group by Date Using Datetime Data

resample

df.resample('D', on='Date_Time').mean()

              B
Date_Time
2001-10-01  4.5
2001-10-02  6.0


Grouper

As suggested by @JosephCottam

df.set_index('Date_Time').groupby(pd.Grouper(freq='D')).mean()

              B
Date_Time
2001-10-01  4.5
2001-10-02  6.0


Deprecated uses of TimeGrouper

You can set the index to 'Date_Time' and use pd.TimeGrouper (deprecated since pandas 0.21 and removed in 1.0; use pd.Grouper as shown above):

df.set_index('Date_Time').groupby(pd.TimeGrouper('D')).mean().dropna()

              B
Date_Time
2001-10-01  4.5
2001-10-02  6.0

How to group pandas DataFrame entries by date in a non-unique column

I'm using pandas 0.16.2. This has better performance on my large dataset:

data.groupby(data.date.dt.year)

With the dt accessor, working with fields such as weekofyear, dayofweek, etc. becomes far easier.
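A minimal sketch of the dt accessor on made-up data (note that in newer pandas, weekofyear has been removed in favor of dt.isocalendar().week):

```python
import pandas as pd

# hypothetical data for illustration only
data = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-05", "2015-01-12", "2016-03-01"]),
    "value": [1, 2, 3],
})

# group by calendar year via the .dt accessor
by_year = data.groupby(data["date"].dt.year)["value"].sum()

# the same accessor exposes other calendar fields,
# e.g. day of week (Monday=0) and ISO week number
by_dow = data.groupby(data["date"].dt.dayofweek)["value"].sum()
by_week = data.groupby(data["date"].dt.isocalendar().week)["value"].sum()
```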

Pandas dataframe groupby datetime month

Managed to do it:

b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.groupby(by=[b.index.month, b.index.year])

Or

b.groupby(pd.Grouper(freq='M'))  # update for v0.21+
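Either groupby call above returns a lazy GroupBy object; chaining an aggregation materializes the result. A sketch with made-up data standing in for b.dat:

```python
import pandas as pd

# hypothetical data standing in for b.dat
b = pd.DataFrame({"date": ["1/1/13 9:00AM", "1/15/13 9:00AM", "2/1/13 9:00AM"],
                  "amount": [10, 20, 30]})
b.index = pd.to_datetime(b["date"], format="%m/%d/%y %I:%M%p")

# the groupby itself is lazy; chain an aggregation to get a result
monthly = b.groupby(by=[b.index.month, b.index.year])["amount"].sum()

# or, with a DatetimeIndex, a frequency-based Grouper
# ('M' is month-end; newer pandas spells it 'ME')
monthly2 = b.groupby(pd.Grouper(freq="M"))["amount"].sum()
```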

Grouping by date range (timedelta) with Pandas

You can use a groupby with a custom group:

# convert to datetime
s = pd.to_datetime(df['date'], dayfirst=False)

# set up groups of consecutive dates within ± 3 days
group = (s.groupby(df['user_id'])
          .apply(lambda s: s.diff().abs().gt('3days').cumsum())
         )

# group by ID and the new group, then aggregate
out = (df.groupby(['user_id', group], as_index=False)
         .agg({'date': 'last', 'val': 'sum'})
      )

output:

   user_id     date  val
0        1   1-2-17    3
1        2   1-2-17    2
2        2  1-10-17    1
3        3   1-1-17    1
4        3   2-5-17    8

intermediates (sorted by user_id for clarity):

    user_id     date  val   datetime     diff      abs  >3days  cumsum
0         1   1-1-17    1 2017-01-01      NaT      NaT   False       0
3         1   1-1-17    1 2017-01-01   0 days   0 days   False       0
4         1   1-2-17    1 2017-01-02   1 days   1 days   False       0
1         2   1-1-17    1 2017-01-01      NaT      NaT   False       0
5         2   1-2-17    1 2017-01-02   1 days   1 days   False       0
6         2  1-10-17    1 2017-01-10   8 days   8 days    True       1
2         3   1-1-17    1 2017-01-01      NaT      NaT   False       0
7         3   2-1-17    1 2017-02-01  31 days  31 days    True       1
8         3   2-2-17    1 2017-02-02   1 days   1 days   False       1
9         3   2-3-17    2 2017-02-03   1 days   1 days   False       1
10        3   2-4-17    3 2017-02-04   1 days   1 days   False       1
11        3   2-5-17    1 2017-02-05   1 days   1 days   False       1

How to group by multiple columns and with only date part of datetime in pandas

Here you go:

# df["Date & Time"] = pd.to_datetime(df["Date & Time"])  # If not already datetime
df.set_index(["CELL ID", "Party", df["Date & Time"].dt.date])

Output:

                              LOC          Date & Time
CELL ID Party   Date & Time
10631   3009787 2021-10-01    bwp  2021-10-01 08:20:30
                2021-10-01    bwp  2021-10-01 08:40:50
50987   2275172 2021-10-02    bwp  2021-10-02 07:50:20
                2021-10-02    bwp  2021-10-02 07:23:16
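The set_index call above only reshapes the frame; to actually aggregate per date part, the same key list works with groupby. A sketch with hypothetical data mirroring the question's columns:

```python
import pandas as pd

# hypothetical frame mirroring the question's columns
df = pd.DataFrame({
    "CELL ID": [10631, 10631, 50987],
    "Party": [3009787, 3009787, 2275172],
    "LOC": ["bwp", "bwp", "bwp"],
    "Date & Time": pd.to_datetime(["2021-10-01 08:20:30",
                                   "2021-10-01 08:40:50",
                                   "2021-10-02 07:50:20"]),
})

# group by the named columns plus the date part of the timestamp,
# then count the rows in each group
counts = (df.groupby(["CELL ID", "Party", df["Date & Time"].dt.date])
            .size())
```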

Pandas groupby month and year (date as datetime64[ns]) and summarized by count

You can group by the dt.year and dt.month_name() of the date column:

print(df.groupby([df['date'].dt.year.rename('year'),
                  df['date'].dt.month_name().rename('month')])
        ['rides'].sum().reset_index())

   year    month    rides
0  2019  January  2596765
1  2020    March   880003
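An alternative, assuming the same column layout, is dt.to_period('M'), which keeps year and month in a single sortable key. A sketch with made-up numbers:

```python
import pandas as pd

# hypothetical ride counts for illustration
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-05", "2019-01-20", "2020-03-10"]),
    "rides": [100, 200, 300],
})

# a monthly PeriodIndex sorts chronologically, unlike month names
monthly = df.groupby(df["date"].dt.to_period("M"))["rides"].sum()
```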

group by week in pandas

First, convert the date column with to_datetime and subtract one week, since we want the sum for the week ahead of each date rather than the week before it.

Then use groupby with Grouper by W-MON and aggregate sum:

df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
df = (df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
        .sum()
        .reset_index()
        .sort_values('Date'))
print(df)

     Name       Date  Quantity
0   Apple 2017-07-10        90
3  orange 2017-07-10        20
1   Apple 2017-07-17        30
2  Orange 2017-07-24        40

Pandas groupby a column and sort by date and get only the latest row

If date has higher precedence than content_id, use that fact in sort_values:

out = df.sort_values(['user_id','date','content_id']).groupby(['user_id'])[['content_id','date']].last()

Another possibility is to convert date to datetime and then find each group's latest date's index using groupby + idxmax; then use loc to filter the desired output:

df['date'] = pd.to_datetime(df['date'])
out = df.loc[df.groupby('user_id')['date'].idxmax()]

Output:

         content_id       date
user_id
123              20 2020-10-14
234              19 2021-05-26
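A further option is drop_duplicates(keep='last') after sorting, a different technique that gives the same "latest row per group" result. A sketch on data approximating the question's:

```python
import pandas as pd

# hypothetical data approximating the question's frame
df = pd.DataFrame({
    "user_id": [123, 123, 234],
    "content_id": [10, 20, 19],
    "date": ["2020-10-13", "2020-10-14", "2021-05-26"],
})
df["date"] = pd.to_datetime(df["date"])

# sort chronologically, then keep only the last row per user
out = (df.sort_values(["user_id", "date"])
         .drop_duplicates("user_id", keep="last")
         .set_index("user_id"))
```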

