Pandas Timedelta in Days

You need pandas 0.11 for this (0.11rc1 is out; the final release is probably next week).

In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])

In [10]: df
Out[10]: 
                    0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00

In [11]: df = DataFrame([ Timestamp('20010101'), 
                          Timestamp('20040601') ],columns=['age'])

In [12]: df
Out[12]: 
                  age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00

In [13]: df['today'] = Timestamp('20130419')

In [14]: df['diff'] = df['today']-df['age']

In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)

In [17]: df
Out[17]: 
                  age               today                diff      years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00  12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00   8.887671

The odd apply at the end is needed because timedelta64[ns] scalars are not yet fully supported (in the way Timestamp now wraps datetime64[ns] values); that support is coming in 0.12.
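For readers on a recent pandas release, the same calculation can probably be written without the per-element apply, using the .dt.days accessor. This is a minimal sketch that rebuilds the frame from the session above; dividing by 365 is kept from the original answer and ignores leap years.

```python
import pandas as pd

# Rebuild the example frame from the session above.
df = pd.DataFrame({'age': pd.to_datetime(['2001-01-01', '2004-06-01'])})
df['today'] = pd.Timestamp('2013-04-19')

# Subtracting two datetime columns yields a timedelta64[ns] column.
df['diff'] = df['today'] - df['age']

# .dt.days gives the whole-day count per row; 365 is an approximation.
df['years'] = df['diff'].dt.days / 365

print(df)
```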

Extracting number of days from timedelta column in pandas

IMO, a better idea would be to convert to timedelta and extract the days component.

pd.to_timedelta(df.Aging, errors='coerce').dt.days

0    -84
1    -46
2   -131
3   -131
4   -130
5    -80
Name: Aging, dtype: int64
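The question's data is not shown here, so the following is a small self-contained sketch with a made-up Aging column (the values and the invalid entry are assumptions for illustration). It shows what errors='coerce' does with a string that cannot be parsed.

```python
import pandas as pd

# Hypothetical sample data; the third entry is deliberately unparseable.
aging = pd.Series(['-84 days', '-46 days', 'not a duration'], name='Aging')

# Unparseable strings become NaT instead of raising an error.
as_timedelta = pd.to_timedelta(aging, errors='coerce')

# .dt.days extracts the days component; NaT turns into NaN,
# so the result is float rather than int when any value is missing.
days = as_timedelta.dt.days

print(days)
```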

If you insist on using string methods, you can use str.extract.

pd.to_numeric(
    df.Aging.str.extract('(.*?) days', expand=False),
    errors='coerce')

0    -84
1    -46
2   -131
3   -131
4   -130
5    -80
Name: Aging, dtype: int32

Or, using str.split

pd.to_numeric(df.Aging.str.split(' days').str[0], errors='coerce')

0    -84
1    -46
2   -131
3   -131
4   -130
5    -80
Name: Aging, dtype: int64

Remove the days in the timedelta object

I think you can subtract days converted to timedeltas:

td = pd.to_timedelta(['-1 days +02:45:00','1 days +02:45:00','0 days +02:45:00'])
df = pd.DataFrame({'td': td})

df['td'] = df['td'] - pd.to_timedelta(df['td'].dt.days, unit='d')

print (df.head())

        td
0 02:45:00
1 02:45:00
2 02:45:00

print (type(df.loc[0, 'td']))
<class 'pandas._libs.tslibs.timedeltas.Timedelta'>

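As a possible alternative to subtracting the days explicitly, taking each timedelta modulo one day should give the same time-of-day part, because the modulo follows the same floor convention as the days component. This is a sketch of a different technique, not the method from the answer above; time_part is a name chosen here for illustration.

```python
import pandas as pd

td = pd.to_timedelta(['-1 days +02:45:00', '1 days +02:45:00', '0 days +02:45:00'])
df = pd.DataFrame({'td': td})

# The remainder after dividing by one day is the part below a full day.
# For negative values the remainder is still non-negative (floor semantics).
time_part = df['td'] % pd.Timedelta(days=1)

print(time_part)
```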
Or convert the timedeltas to strings and extract the text between "days" and the decimal point:

df['td'] = df['td'].astype(str).str.extract(r'days (.*?)\.')
print (df.head())
          td
0  +02:45:00
1   02:45:00
2   02:45:00

print (type(df.loc[0, 'td']))
<class 'str'>

Pandas dataframe Timedelta format: with days or with cumulative hours

Why does this change occur?

All of the time strings in the list are shorter than 24 hours, so every value has a days component of 0, and pandas omits it when printing the df. If you change one value, say 12:05:00 to 25:05:00, you get the following output:

         Duration      Cumulative
0 0 days 01:07:37 0 days 01:07:37
1 0 days 13:16:44 0 days 14:24:21
2 0 days 11:09:56 1 days 01:34:17
3 1 days 01:05:00 2 days 02:39:17
4 0 days 01:33:01 2 days 04:12:18

Now that the Duration column contains differing days values, pandas displays the days component for every row.

How can I control it?

You don't have to worry about the difference in output. When you need the individual values, you can use the components attribute, which returns a namedtuple:

print(df['Duration'].iloc[0].components)

output:

Components(days=0, hours=1, minutes=7, seconds=37, milliseconds=0, microseconds=0, nanoseconds=0)
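If the goal is to always show cumulative hours rather than days, one option is to format the values yourself from total_seconds(). The sketch below rebuilds the example column with the 25:05:00 value mentioned above; format_hours is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd

durations = pd.to_timedelta(
    ['01:07:37', '13:16:44', '11:09:56', '25:05:00', '01:33:01'])
df = pd.DataFrame({'Duration': durations})
df['Cumulative'] = df['Duration'].cumsum()


def format_hours(td):
    """Render a non-negative Timedelta as HH:MM:SS with hours beyond 24."""
    total = int(td.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f'{hours:02d}:{minutes:02d}:{seconds:02d}'


# Display-only string column; keep the timedelta column for arithmetic.
df['Cumulative_hours'] = df['Cumulative'].apply(format_hours)

# For a whole column, .dt.components returns one DataFrame column per field.
parts = df['Duration'].dt.components

print(df)
```

The string column is only for display; sorting or adding values should still be done on the timedelta columns.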

Convert timedelta of days into years

Take timedelta(days=5511).days, which returns the number of days as an int, then divide it by 365 to get the number of years: timedelta(days=5511).days / 365.
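Written out as a runnable snippet, that looks roughly like this. The 365.25 variant is an added assumption for averaging over leap years; neither divisor gives exact calendar years.

```python
from datetime import timedelta

delta = timedelta(days=5511)

# .days is a plain int, so ordinary division works.
years = delta.days / 365

# Optional: average year length including leap years (an approximation).
years_leap_adjusted = delta.days / 365.25

print(round(years, 2), round(years_leap_adjusted, 2))
```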

Grouping by date range (timedelta) with Pandas

You can use a groupby with a custom group:

# convert to datetime
s = pd.to_datetime(df['date'], dayfirst=False)
# set up groups of consecutive dates within ± 3 days
group = (s.groupby(df['user_id'])
          .apply(lambda s: s.diff().abs().gt('3days').cumsum())
         )

# group by ID and new group and aggregate
out = (df.groupby(['user_id', group], as_index=False)
         .agg({'date': 'last', 'val': 'sum'})
      )

output:

   user_id     date  val
0        1   1-2-17    3
1        2   1-2-17    2
2        2  1-10-17    1
3        3   1-1-17    1
4        3   2-5-17    8

intermediates (sorted by user_id for clarity):

    user_id     date  val   datetime    diff     abs  >3days  cumsum
0         1   1-1-17    1 2017-01-01     NaT     NaT   False       0
3         1   1-1-17    1 2017-01-01  0 days  0 days   False       0
4         1   1-2-17    1 2017-01-02  1 days  1 days   False       0
1         2   1-1-17    1 2017-01-01     NaT     NaT   False       0
5         2   1-2-17    1 2017-01-02  1 days  1 days   False       0
6         2  1-10-17    1 2017-01-10  8 days  8 days    True       1
2         3   1-1-17    1 2017-01-01     NaT     NaT   False       0
7         3   2-1-17    1 2017-02-01 31 days 31 days    True       1
8         3   2-2-17    1 2017-02-02  1 days  1 days   False       1
9         3   2-3-17    2 2017-02-03  1 days  1 days   False       1
10        3   2-4-17    3 2017-02-04  1 days  1 days   False       1
11        3   2-5-17    1 2017-02-05  1 days  1 days   False       1
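To try the grouping end to end, here is a self-contained sketch that rebuilds the input from the intermediates table above. It uses the groupby diff() and cumsum() methods instead of apply, which is a small variation on the answer's code; the explicit date format and the 'group' name are assumptions made for this example.

```python
import pandas as pd

# Input reconstructed from the intermediates table (original row order).
df = pd.DataFrame({
    'user_id': [1, 2, 3, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    'date': ['1-1-17', '1-1-17', '1-1-17', '1-1-17', '1-2-17', '1-2-17',
             '1-10-17', '2-1-17', '2-2-17', '2-3-17', '2-4-17', '2-5-17'],
    'val': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1],
})

# Parse month-day-year strings.
s = pd.to_datetime(df['date'], format='%m-%d-%y')

# True where the gap to the user's previous date exceeds 3 days
# (the first row per user is NaT, which compares as False).
gap = s.groupby(df['user_id']).diff().abs() > pd.Timedelta(days=3)

# A running count of gaps per user gives each run of dates its own label.
group = gap.astype(int).groupby(df['user_id']).cumsum().rename('group')

out = (df.groupby(['user_id', group], as_index=False)
         .agg({'date': 'last', 'val': 'sum'}))
out = out[['user_id', 'date', 'val']]

print(out)
```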
