Python Pandas Group by Date Using Datetime Data

resample

df.resample('D', on='Date_Time').mean()

              B
Date_Time
2001-10-01  4.5
2001-10-02  6.0


Grouper

As suggested by @JosephCottam

df.set_index('Date_Time').groupby(pd.Grouper(freq='D')).mean()

              B
Date_Time
2001-10-01  4.5
2001-10-02  6.0


Deprecated uses of TimeGrouper

You can set the index to 'Date_Time' and use pd.TimeGrouper (deprecated since pandas 0.21 and removed in 1.0; use pd.Grouper as shown above):

df.set_index('Date_Time').groupby(pd.TimeGrouper('D')).mean().dropna()

              B
Date_Time
2001-10-01  4.5
2001-10-02  6.0

How to group pandas DataFrame entries by date in a non-unique column

I'm using pandas 0.16.2. This has better performance on my large dataset:

data.groupby(data.date.dt.year)

With the dt accessor, working with fields such as weekofyear, dayofweek, etc. becomes far easier.
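A minimal sketch of the dt accessor on made-up data (note that in newer pandas, weekofyear has been removed in favor of dt.isocalendar().week):

```python
import pandas as pd

# hypothetical data for illustration only
data = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-05", "2015-01-12", "2016-03-01"]),
    "value": [1, 2, 3],
})

# group by calendar year via the .dt accessor
by_year = data.groupby(data["date"].dt.year)["value"].sum()

# the same accessor exposes other calendar fields,
# e.g. day of week (Monday=0) and ISO week number
by_dow = data.groupby(data["date"].dt.dayofweek)["value"].sum()
by_week = data.groupby(data["date"].dt.isocalendar().week)["value"].sum()
```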

Pandas dataframe groupby datetime month

Managed to do it:

b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.groupby(by=[b.index.month, b.index.year])

Or

b.groupby(pd.Grouper(freq='M'))  # update for v0.21+
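Either groupby call above returns a lazy GroupBy object; chaining an aggregation materializes the result. A sketch with made-up data standing in for b.dat:

```python
import pandas as pd

# hypothetical data standing in for b.dat
b = pd.DataFrame({"date": ["1/1/13 9:00AM", "1/15/13 9:00AM", "2/1/13 9:00AM"],
                  "amount": [10, 20, 30]})
b.index = pd.to_datetime(b["date"], format="%m/%d/%y %I:%M%p")

# the groupby itself is lazy; chain an aggregation to get a result
monthly = b.groupby(by=[b.index.month, b.index.year])["amount"].sum()

# or, with a DatetimeIndex, a frequency-based Grouper
# ('M' is month-end; newer pandas spells it 'ME')
monthly2 = b.groupby(pd.Grouper(freq="M"))["amount"].sum()
```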

Grouping by date range (timedelta) with Pandas

You can use a groupby with a custom group:

# convert to datetime
s = pd.to_datetime(df['date'], dayfirst=False)

# set up groups of consecutive dates within ± 3 days
group = (s.groupby(df['user_id'])
          .apply(lambda s: s.diff().abs().gt('3days').cumsum())
         )

# group by ID and the new group, then aggregate
out = (df.groupby(['user_id', group], as_index=False)
         .agg({'date': 'last', 'val': 'sum'})
      )

output:

   user_id     date  val
0        1   1-2-17    3
1        2   1-2-17    2
2        2  1-10-17    1
3        3   1-1-17    1
4        3   2-5-17    8

intermediates (sorted by user_id for clarity):

    user_id     date  val   datetime     diff      abs  >3days  cumsum
0         1   1-1-17    1 2017-01-01      NaT      NaT   False       0
3         1   1-1-17    1 2017-01-01   0 days   0 days   False       0
4         1   1-2-17    1 2017-01-02   1 days   1 days   False       0
1         2   1-1-17    1 2017-01-01      NaT      NaT   False       0
5         2   1-2-17    1 2017-01-02   1 days   1 days   False       0
6         2  1-10-17    1 2017-01-10   8 days   8 days    True       1
2         3   1-1-17    1 2017-01-01      NaT      NaT   False       0
7         3   2-1-17    1 2017-02-01  31 days  31 days    True       1
8         3   2-2-17    1 2017-02-02   1 days   1 days   False       1
9         3   2-3-17    2 2017-02-03   1 days   1 days   False       1
10        3   2-4-17    3 2017-02-04   1 days   1 days   False       1
11        3   2-5-17    1 2017-02-05   1 days   1 days   False       1

How to group by multiple columns and with only date part of datetime in pandas

Here you go:

# df["Date & Time"] = pd.to_datetime(df["Date & Time"])  # If not already datetime
df.set_index(["CELL ID", "Party", df["Date & Time"].dt.date])

Output:

                              LOC          Date & Time
CELL ID Party   Date & Time
10631   3009787 2021-10-01    bwp  2021-10-01 08:20:30
                2021-10-01    bwp  2021-10-01 08:40:50
50987   2275172 2021-10-02    bwp  2021-10-02 07:50:20
                2021-10-02    bwp  2021-10-02 07:23:16
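The set_index call above only reshapes the frame; to actually aggregate per date part, the same key list works with groupby. A sketch with hypothetical data mirroring the question's columns:

```python
import pandas as pd

# hypothetical frame mirroring the question's columns
df = pd.DataFrame({
    "CELL ID": [10631, 10631, 50987],
    "Party": [3009787, 3009787, 2275172],
    "LOC": ["bwp", "bwp", "bwp"],
    "Date & Time": pd.to_datetime(["2021-10-01 08:20:30",
                                   "2021-10-01 08:40:50",
                                   "2021-10-02 07:50:20"]),
})

# group by the named columns plus the date part of the timestamp,
# then count the rows in each group
counts = (df.groupby(["CELL ID", "Party", df["Date & Time"].dt.date])
            .size())
```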

Pandas groupby month and year (date as datetime64[ns]) and summarized by count

You can group by the dt.year and dt.month_name() of the date column:

print(df.groupby([df['date'].dt.year.rename('year'),
                  df['date'].dt.month_name().rename('month')])
        ['rides'].sum().reset_index())

   year    month    rides
0  2019  January  2596765
1  2020    March   880003
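An alternative, assuming the same column layout, is dt.to_period('M'), which keeps year and month in a single sortable key. A sketch with made-up numbers:

```python
import pandas as pd

# hypothetical ride counts for illustration
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-05", "2019-01-20", "2020-03-10"]),
    "rides": [100, 200, 300],
})

# a monthly PeriodIndex sorts chronologically, unlike month names
monthly = df.groupby(df["date"].dt.to_period("M"))["rides"].sum()
```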

group by week in pandas

First, convert the date column with to_datetime and subtract one week, since we want the sum for the week ahead of each date rather than the week before it.

Then use groupby with Grouper by W-MON and aggregate sum:

df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
df = (df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
        .sum()
        .reset_index()
        .sort_values('Date'))
print(df)

     Name       Date  Quantity
0   Apple 2017-07-10        90
3  orange 2017-07-10        20
1   Apple 2017-07-17        30
2  Orange 2017-07-24        40

Pandas groupby a column and sort by date and get only the latest row

If date has higher precedence than content_id, use that fact in sort_values:

out = df.sort_values(['user_id','date','content_id']).groupby(['user_id'])[['content_id','date']].last()

Another possibility is to convert date to datetime and then find each group's latest date's index using groupby + idxmax; then use loc to filter the desired output:

df['date'] = pd.to_datetime(df['date'])
out = df.loc[df.groupby('user_id')['date'].idxmax()]

Output:

         content_id       date
user_id
123              20 2020-10-14
234              19 2021-05-26
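A further option is drop_duplicates(keep='last') after sorting, a different technique that gives the same "latest row per group" result. A sketch on data approximating the question's:

```python
import pandas as pd

# hypothetical data approximating the question's frame
df = pd.DataFrame({
    "user_id": [123, 123, 234],
    "content_id": [10, 20, 19],
    "date": ["2020-10-13", "2020-10-14", "2021-05-26"],
})
df["date"] = pd.to_datetime(df["date"])

# sort chronologically, then keep only the last row per user
out = (df.sort_values(["user_id", "date"])
         .drop_duplicates("user_id", keep="last")
         .set_index("user_id"))
```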

