Max and Min Date in Pandas Groupby

Max and Min date in pandas groupby

You need to combine the functions that apply to the same column, like this:

In [116]: gb.agg({'sum_col' : np.sum,
     ...:         'date' : [np.min, np.max]})
Out[116]:
                       date             sum_col
                       amin        amax     sum
type weekofyear
A    25          2014-06-22  2014-06-22       1
     26          2014-06-25  2014-06-25       1
     27          2014-07-05  2014-07-05       2
B    26          2014-06-24  2014-06-24       2
     27          2014-07-02  2014-07-02       1
C    26          2014-06-25  2014-06-25       3
     27          2014-07-06  2014-07-06       3
     30          2014-07-27  2014-07-27       1
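For context, gb can be reproduced with a small sketch like the one below. The input values are assumptions reconstructed from the printed output; note that recent pandas versions prefer the string aliases 'min'/'max' (which label the result columns min/max instead of amin/amax) over passing numpy functions.

import pandas as pd

# Hypothetical input reconstructed from the output above
df = pd.DataFrame({
    'type': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'date': pd.to_datetime(['2014-06-22', '2014-06-25', '2014-07-05',
                            '2014-06-24', '2014-07-02', '2014-06-25',
                            '2014-07-06', '2014-07-27']),
    'sum_col': [1, 1, 2, 2, 1, 3, 3, 1],
})
df['weekofyear'] = df['date'].dt.isocalendar().week  # ISO week number

gb = df.groupby(['type', 'weekofyear'])
gb.agg({'sum_col': 'sum', 'date': ['min', 'max']})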

Pandas groupby value and get value of max date and min date

Try sort_values by year; then you can groupby and select first for the value at the earliest year and last for the value at the latest:

g = df.sort_values('year').groupby('item')
out = g['value'].last() - g['value'].first()

Output:

item
A 12
B 20
Name: value, dtype: int64
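For a runnable sketch, here is a hypothetical df matching the output above (item, year and value are the only columns assumed):

import pandas as pd

df = pd.DataFrame({
    'item':  ['A', 'A', 'B', 'B'],
    'year':  [2019, 2021, 2018, 2020],
    'value': [3, 15, 5, 25],
})

g = df.sort_values('year').groupby('item')
out = g['value'].last() - g['value'].first()  # value at max year minus value at min year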

Finding the min and max date from a timeseries range in pandas

I would advise using a groupby on the "Site" column and aggregating each group into a min and max date.

df.groupby("Site").agg({'date': ['min', 'max']})

This will return the min and max date for each site.

I haven't tried out the code, but it should do what you want.
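For reference, a minimal runnable sketch with made-up Site/date values; flattening the resulting MultiIndex columns is optional:

import pandas as pd

# Hypothetical data; the Site names and dates are assumptions
df = pd.DataFrame({
    'Site': ['X', 'X', 'Y', 'Y'],
    'date': pd.to_datetime(['2020-01-01', '2020-03-15',
                            '2020-02-01', '2020-02-20']),
})

res = df.groupby("Site").agg({'date': ['min', 'max']})
res.columns = ['_'.join(c) for c in res.columns]  # date_min, date_max
print(res)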

Pandas group by two fields, pick min date and next max date from other group

Shifting max_date per group

Here max_date is defined as the min_date of the next id per brand.
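For a runnable sketch, assume a minimal input like this (the dates are hypothetical, chosen to reproduce the outputs below):

import pandas as pd

data = pd.DataFrame({
    'model_id': [1, 1, 2],
    'brand': ['nike', 'nike', 'nike'],
    'release_date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03']),
})

Then: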

(data
 .groupby(['model_id', 'brand'])
 .agg(min_date=('release_date', 'min'))
 .assign(max_date=lambda d: d.groupby('brand')['min_date'].shift(-1))
 #.astype(str).to_markdown()  # uncomment for markdown
)

Output:

|             | min_date   | max_date   |
|:------------|:-----------|:-----------|
| (1, 'nike') | 2021-01-01 | 2021-01-03 |
| (2, 'nike') | 2021-01-03 | NaT        |

Previous answer

You need to mask the data afterwards:

(data
 .groupby(['model_id', 'brand'])
 .agg(min_date=('release_date', 'min'), max_date=('release_date', 'max'))
 .assign(max_date=lambda d: d['max_date'].mask(d['max_date'].eq(d['min_date'])))
 #.astype(str).to_markdown()  # uncomment for markdown
)

Output (as markdown):

|             | min_date   | max_date   |
|:------------|:-----------|:-----------|
| (1, 'nike') | 2021-01-01 | 2021-01-02 |
| (2, 'nike') | 2021-01-03 | NaT        |

How to calculate difference between max and min date for each user

Create a new dataframe grouped per user_id, with named columns for the min and max dates, then merge it with the original.

Data input:

import numpy as np
import pandas as pd

# Note: user_id is drawn at random, so the exact output below varies run to run
df = pd.DataFrame({
    "user_id": np.random.randint(10000, 10004, 15, dtype="int32"),
    "purchase_date": pd.date_range(start='2022-01-01', periods=15, freq='8H'),
    "C": pd.Series(1, index=list(range(15)), dtype="float32"),
    "D": np.array([5] * 15, dtype="int32"),
    "E": "foo",
})
df['purchase_date'] = pd.to_datetime(df['purchase_date']).dt.normalize()



Solution:

df_grouped = df.groupby(['user_id']).agg(
    date_min=('purchase_date', 'min'),
    date_max=('purchase_date', 'max')).reset_index()
df_grouped['diff'] = (df_grouped['date_max'] - df_grouped['date_min']).dt.days
df1 = pd.merge(df, df_grouped)
df1

Out:

    user_id purchase_date    C  D    E   date_min   date_max  diff
0     10001    2022-01-01  1.0  5  foo 2022-01-01 2022-01-04     3
1     10001    2022-01-02  1.0  5  foo 2022-01-01 2022-01-04     3
2     10001    2022-01-03  1.0  5  foo 2022-01-01 2022-01-04     3
3     10001    2022-01-04  1.0  5  foo 2022-01-01 2022-01-04     3
4     10000    2022-01-01  1.0  5  foo 2022-01-01 2022-01-04     3
5     10000    2022-01-02  1.0  5  foo 2022-01-01 2022-01-04     3
6     10000    2022-01-03  1.0  5  foo 2022-01-01 2022-01-04     3
7     10000    2022-01-04  1.0  5  foo 2022-01-01 2022-01-04     3
8     10002    2022-01-01  1.0  5  foo 2022-01-01 2022-01-05     4
9     10002    2022-01-02  1.0  5  foo 2022-01-01 2022-01-05     4
10    10002    2022-01-03  1.0  5  foo 2022-01-01 2022-01-05     4
11    10002    2022-01-05  1.0  5  foo 2022-01-01 2022-01-05     4
12    10002    2022-01-05  1.0  5  foo 2022-01-01 2022-01-05     4
13    10003    2022-01-04  1.0  5  foo 2022-01-04 2022-01-05     1
14    10003    2022-01-05  1.0  5  foo 2022-01-04 2022-01-05     1
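If a separate grouped frame and the merge are not needed, the same three columns can be added in place with groupby.transform, which broadcasts each group's aggregate back onto its rows. A sketch using the df above:

g = df.groupby('user_id')['purchase_date']
df['date_min'] = g.transform('min')
df['date_max'] = g.transform('max')
df['diff'] = (df['date_max'] - df['date_min']).dt.days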

Pandas group by on one column with max date on another column python

You can use boolean indexing with groupby and transform:

df_new = df[df.groupby('dealer').date.transform('max') == df['date']]

   invoice_no  dealer  billing_change_previous_month       date
1         100       1                         -41981 2017-01-30
2        5505       2                              0 2017-01-30

The solution works as expected even when there are more than two dealers (addressing the question posted by Ben Smith):

df = pd.DataFrame({
    'invoice_no': [110, 100, 5505, 5635, 10000, 10001],
    'dealer': [1, 1, 2, 2, 3, 3],
    'billing_change_previous_month': [0, -41981, 0, 58730, 9000, 100],
    'date': ['2016-12-31', '2017-01-30', '2017-01-30', '2016-12-31',
             '2019-12-31', '2020-01-31'],
})

df['date'] = pd.to_datetime(df['date'])
df[df.groupby('dealer').date.transform('max') == df['date']]

   invoice_no  dealer  billing_change_previous_month       date
1         100       1                         -41981 2017-01-30
2        5505       2                              0 2017-01-30
5       10001       3                            100 2020-01-31
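Note that transform('max') keeps every row that ties for the latest date within a dealer. If exactly one row per dealer is wanted even with ties, idxmax is an alternative (a sketch; it selects the first row attaining each group's maximum):

df_new = df.loc[df.groupby('dealer')['date'].idxmax()]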

Python Min/Max Dates with Groupby

Starting from your original dataframe, you can build a helper Series with Series.shift to compare each row's Price with the previous one, and use it for grouping; then groupby and agg with min and max, rename the columns, and reset the index.
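A minimal sketch of an input frame, reconstructed from the output below (the values are assumptions):

import pandas as pd

# Hypothetical data: a daily series whose Price runs 3, then 4, then 3 again
df = pd.DataFrame({
    'ds': pd.date_range('2017-01-01', '2017-01-14'),
    'Price': [3] * 5 + [4] * 4 + [3] * 5,
})

Then: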

s = df['Price'].ne(df['Price'].shift()).cumsum()

d = {"min": "start_dt", "max": "end_dt"}
out = (df.groupby([s, 'Price'])['ds'].agg(['min', 'max'])
         .rename(columns=d)
         .droplevel(0).reset_index())


print(out)

   Price   start_dt     end_dt
0      3 2017-01-01 2017-01-05
1      4 2017-01-06 2017-01-09
2      3 2017-01-10 2017-01-14

Pandas Groupby with Agg Min/Max date

In pandas, NaN is used as the missing value and is ignored in most operations, so it's the right one to use. If you're still getting an error, it's probably because you've got datetime.date objects in there (you definitely do have them there; the point is that they're probably what's causing the problems).

For example, if your missing values are "" and your column dtypes are object with internal types of datetime.date, I get:

In [496]: df.groupby("issue").agg({"p_date": "min", "s_date": "max"})
[...]
TypeError: '<=' not supported between instances of 'datetime.date' and 'str'

but if I switch to pandas-native time objects and NaNs, it works:

In [500]: df["p_date"] = pd.to_datetime(df["p_date"])

In [501]: df["s_date"] = pd.to_datetime(df["s_date"])

In [502]: df
Out[502]:
   issue     p_date     s_date
0  issue 2012-11-01        NaT
1  issue 2013-12-09        NaT
2  issue 2014-12-08        NaT
3  issue        NaT 2016-01-13
4  issue 2012-11-01        NaT
5  issue        NaT 2014-03-26
6  issue        NaT 2015-05-29
7  issue 2013-12-18        NaT
8  issue        NaT 2016-01-13

In [503]: df.groupby("issue").agg({"p_date": "min", "s_date": "max"})
Out[503]:
           p_date     s_date
issue
issue  2012-11-01 2016-01-13
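For an end-to-end sketch, here is a hypothetical frame mixing datetime.date objects with "" markers, plus the conversion that fixes it (pd.to_datetime turns empty strings into NaT, which min and max then skip):

import datetime
import pandas as pd

# Hypothetical reproduction: object columns mixing datetime.date with ""
df = pd.DataFrame({
    'issue': ['issue'] * 3,
    'p_date': [datetime.date(2012, 11, 1), datetime.date(2013, 12, 9), ''],
    's_date': ['', '', datetime.date(2016, 1, 13)],
})

# Aggregating the raw object columns raises the TypeError shown above;
# converting first makes min/max work and skip the NaT values
df['p_date'] = pd.to_datetime(df['p_date'])
df['s_date'] = pd.to_datetime(df['s_date'])
df.groupby('issue').agg({'p_date': 'min', 's_date': 'max'})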

Get the difference between max and min for a groupby in pandas and calculate the average

For pandas 0.25+ it is possible to use named aggregations, then subtract and divide the resulting columns.
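First, a hypothetical input frame reconstructed from the printed results below:

import pandas as pd

df = pd.DataFrame({
    'ticker': ['AAPL', 'AAPL', 'GOOG', 'MSFT', 'MSFT'],
    'fy': [2010, 2011, 2010, 2009, 2009],
    'fp': [0, 0, 0, 0, 0],
    'f_date': ['2010-01-01 12:12:34', '2012-01-01 12:12:34',
               '2010-01-01 12:12:34', '2010-01-01 12:12:34',
               '2014-05-01 12:12:34'],
    'rn': [0, 1, 0, 0, 1],
})

Then aggregate, subtract and divide: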

df['f_date'] = pd.to_datetime(df['f_date'])
df = df.groupby(by=['ticker', 'fy', 'fp']).agg(min1=('f_date', 'min'),
                                               max1=('f_date', 'max'),
                                               rn=('rn', 'max'))

df['new'] = df['max1'].sub(df['min1']).div(df['rn'].add(1))
print (df)
                              min1                max1  rn               new
ticker fy   fp
AAPL   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0   0 days 00:00:00
       2011 0  2012-01-01 12:12:34 2012-01-01 12:12:34   1   0 days 00:00:00
GOOG   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0   0 days 00:00:00
MSFT   2009 0  2010-01-01 12:12:34 2014-05-01 12:12:34   1 790 days 12:00:00

Or, if necessary, convert the difference of datetimes (timedeltas) to seconds with Series.dt.total_seconds:

df['new'] = df['max1'].sub(df['min1']).dt.total_seconds().div(df['rn'].add(1))
print (df)
                              min1                max1  rn         new
ticker fy   fp
AAPL   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0         0.0
       2011 0  2012-01-01 12:12:34 2012-01-01 12:12:34   1         0.0
GOOG   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34   0         0.0
MSFT   2009 0  2010-01-01 12:12:34 2014-05-01 12:12:34   1  68299200.0

Solution for older pandas versions:

df['f_date'] = pd.to_datetime(df['f_date'])
df = df.groupby(by=['ticker', 'fy', 'fp']).agg({'f_date': ['min', 'max'],
                                                'rn': 'max'})
df.columns = df.columns.map('_'.join)
df['new'] = df['f_date_max'].sub(df['f_date_min']).div(df['rn_max'].add(1))
print (df)
                        f_date_min          f_date_max  rn_max  \
ticker fy   fp
AAPL   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34       0
       2011 0  2012-01-01 12:12:34 2012-01-01 12:12:34       1
GOOG   2010 0  2010-01-01 12:12:34 2010-01-01 12:12:34       0
MSFT   2009 0  2010-01-01 12:12:34 2014-05-01 12:12:34       1

                            new
ticker fy   fp
AAPL   2010 0   0 days 00:00:00
       2011 0   0 days 00:00:00
GOOG   2010 0   0 days 00:00:00
MSFT   2009 0  790 days 12:00:00

Last, if necessary, convert the MultiIndex to columns:

df = df.reset_index()
print (df)
  ticker    fy  fp          f_date_min          f_date_max  rn_max  \
0   AAPL  2010   0 2010-01-01 12:12:34 2010-01-01 12:12:34       0
1   AAPL  2011   0 2012-01-01 12:12:34 2012-01-01 12:12:34       1
2   GOOG  2010   0 2010-01-01 12:12:34 2010-01-01 12:12:34       0
3   MSFT  2009   0 2010-01-01 12:12:34 2014-05-01 12:12:34       1

                 new
0    0 days 00:00:00
1    0 days 00:00:00
2    0 days 00:00:00
3  790 days 12:00:00

