Pandas Timedelta in Days

You need pandas 0.11 for this (0.11rc1 is out; the final release is probably next week).

In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])

In [10]: df
Out[10]:
0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00

In [11]: df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ],columns=['age'])

In [12]: df
Out[12]:
age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00

In [13]: df['today'] = Timestamp('20130419')

In [14]: df['diff'] = df['today']-df['age']

In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)

In [17]: df
Out[17]:
                  age               today                diff      years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00  12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00   8.887671

You need this odd apply at the end because there is not yet full support for timedelta64[ns] scalars (comparable to how Timestamps are now used for datetime64[ns]); that support is coming in 0.12.
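On a recent pandas release the apply workaround should no longer be needed, since timedelta columns expose a .dt accessor. The following is a minimal sketch of the same calculation under that assumption; the `days` column name is added here for illustration, and dividing by 365 ignores leap years just as the original does.

```python
import pandas as pd

# Same two dates as in the transcript above.
df = pd.DataFrame({"age": pd.to_datetime(["2001-01-01", "2004-06-01"])})
df["today"] = pd.Timestamp("2013-04-19")

# Subtracting two datetime columns yields a timedelta column.
df["diff"] = df["today"] - df["age"]

# .dt.days extracts the whole-day component as integers.
df["days"] = df["diff"].dt.days

# Approximate years; 365 is the same simplification the original uses.
df["years"] = df["days"] / 365
```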

Extracting number of days from timedelta column in pandas

IMO, a better idea would be to convert to timedelta and extract the days component.

pd.to_timedelta(df.Aging, errors='coerce').dt.days

0 -84
1 -46
2 -131
3 -131
4 -130
5 -80
Name: Aging, dtype: int64

If you insist on using string methods, you can use str.extract.

pd.to_numeric(
    df.Aging.str.extract('(.*?) days', expand=False),
    errors='coerce')

0 -84
1 -46
2 -131
3 -131
4 -130
5 -80
Name: Aging, dtype: int32

Or, using str.split

pd.to_numeric(df.Aging.str.split(' days').str[0], errors='coerce')

0 -84
1 -46
2 -131
3 -131
4 -130
5 -80
Name: Aging, dtype: int64
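The input column is not shown above, so here is a small self-contained sketch that runs all three approaches on a made-up Aging column. The sample strings are an assumption about what the data looks like (values such as "-84 days +00:00:00"); the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample: the original Aging column is not shown in the source.
df = pd.DataFrame({"Aging": [
    "-84 days +00:00:00", "-46 days +00:00:00", "-131 days +00:00:00",
    "-131 days +00:00:00", "-130 days +00:00:00", "-80 days +00:00:00",
]})

# 1) Parse to timedelta, then take the days component.
via_timedelta = pd.to_timedelta(df["Aging"], errors="coerce").dt.days

# 2) Capture everything before " days" with a regular expression.
via_extract = pd.to_numeric(
    df["Aging"].str.extract(r"(.*?) days", expand=False), errors="coerce")

# 3) Split on " days" and keep the first piece.
via_split = pd.to_numeric(df["Aging"].str.split(" days").str[0], errors="coerce")

# errors="coerce" turns unparseable entries into NaT instead of raising.
bad = pd.to_timedelta(pd.Series(["not a duration"]), errors="coerce")
```

All three results should agree on well-formed input. The timedelta route is the only one that validates the whole string rather than just the leading number.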

Remove the days in the timedelta object

I think you can subtract days converted to timedeltas:

td = pd.to_timedelta(['-1 days +02:45:00','1 days +02:45:00','0 days +02:45:00'])
df = pd.DataFrame({'td': td})

df['td'] = df['td'] - pd.to_timedelta(df['td'].dt.days, unit='d')

print (df.head())

td
0 02:45:00
1 02:45:00
2 02:45:00

print (type(df.loc[0, 'td']))
<class 'pandas._libs.tslibs.timedeltas.Timedelta'>

Or convert the timedeltas to strings and extract the text between "days " and the decimal point:

df['td'] = df['td'].astype(str).str.extract(r'days (.*?)\.', expand=False)
print (df.head())
td
0 +02:45:00
1 02:45:00
2 02:45:00

print (type(df.loc[0, 'td']))
<class 'str'>
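A possible third option, not part of the answer above, is to take the timedelta modulo one day, which keeps the result as a Timedelta and avoids string handling. This is a sketch that assumes a pandas version where a timedelta Series supports the % operator with a Timedelta scalar; it is compared against the subtraction approach on the same sample data.

```python
import pandas as pd

td = pd.to_timedelta(["-1 days +02:45:00", "1 days +02:45:00", "0 days +02:45:00"])
df = pd.DataFrame({"td": td})

# Approach from the answer: subtract the whole-day component.
by_subtraction = df["td"] - pd.to_timedelta(df["td"].dt.days, unit="D")

# Alternative sketch: remainder after dividing by one day.
one_day = pd.Timedelta(days=1)
by_modulo = df["td"] % one_day
```

Because the modulo follows floor-division semantics, a negative value such as -1 days +02:45:00 also comes out as 02:45:00, matching the subtraction result.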

Pandas dataframe Timedelta format: with days or with cumulative hours

Why does this change occur?

All values in the list of time strings are less than 24 hours, which means they all have days = 0, so pandas omits the days component when it prints the df. If you change one value, say 12:05:00 to 25:05:00, you get the following output:

         Duration      Cumulative
0 0 days 01:07:37 0 days 01:07:37
1 0 days 13:16:44 0 days 14:24:21
2 0 days 11:09:56 1 days 01:34:17
3 1 days 01:05:00 2 days 02:39:17
4 0 days 01:33:01 2 days 04:12:18

Now that the Duration column contains values with different day counts, pandas displays the days component for every row.

How can I control it?

You don't have to worry about the difference in output. When you need the individual values, use the components attribute of a Timedelta, which returns a namedtuple:

print(df['Duration'].iloc[0].components)

output:

Components(days=0, hours=1, minutes=7, seconds=37, milliseconds=0, microseconds=0, nanoseconds=0)
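If the goal is to display cumulative hours rather than days, one option is to format the values yourself from total_seconds(). The sketch below rebuilds the example data from the table above (with 25:05:00 as the fourth duration); `as_cumulative_hours` is a hypothetical helper name, and it assumes non-negative durations.

```python
import pandas as pd

durations = pd.to_timedelta(
    ["01:07:37", "13:16:44", "11:09:56", "25:05:00", "01:33:01"])
df = pd.DataFrame({"Duration": durations})
df["Cumulative"] = df["Duration"].cumsum()

def as_cumulative_hours(td):
    """Format a non-negative Timedelta as HH:MM:SS with hours beyond 24."""
    total = int(td.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# The result is a list of plain strings, intended for display only.
cumulative_text = df["Cumulative"].apply(as_cumulative_hours).tolist()
```

Keeping the Timedelta column for arithmetic and producing the string form only at display time avoids having to parse the strings again later.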

Convert timedelta of days into years

timedelta(days=5511).days returns the number of days as an int, which you can then divide by 365 to get years: timedelta(days=5511).days / 365.
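As a quick sketch of that arithmetic using only the standard library: dividing by 365 ignores leap days, so an average year length of 365.25 is shown alongside as an assumed alternative, not something the answer above prescribes.

```python
from datetime import timedelta

delta = timedelta(days=5511)

# .days is a plain int, so ordinary division gives a float number of years.
years_365 = delta.days / 365

# Assumed alternative: average year length that accounts for leap years.
years_365_25 = delta.days / 365.25
```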

Grouping by date range (timedelta) with Pandas

You can use a groupby with a custom group:

# convert to datetime
s = pd.to_datetime(df['date'], dayfirst=False)
# set up groups of consecutive dates within ± 3 days
# (group_keys=False keeps the original index so the result aligns with df)
group = (s.groupby(df['user_id'], group_keys=False)
          .apply(lambda s: s.diff().abs().gt('3days').cumsum())
        )

# group by ID and new group and aggregate
out = (df.groupby(['user_id', group], as_index=False)
         .agg({'date': 'last', 'val': 'sum'})
      )

output:

   user_id     date  val
0        1   1-2-17    3
1        2   1-2-17    2
2        2  1-10-17    1
3        3   1-1-17    1
4        3   2-5-17    8

intermediates (sorted by user_id for clarity):

    user_id     date  val    datetime     diff      abs  >3days  cumsum
 0        1   1-1-17    1  2017-01-01      NaT      NaT   False       0
 3        1   1-1-17    1  2017-01-01   0 days   0 days   False       0
 4        1   1-2-17    1  2017-01-02   1 days   1 days   False       0
 1        2   1-1-17    1  2017-01-01      NaT      NaT   False       0
 5        2   1-2-17    1  2017-01-02   1 days   1 days   False       0
 6        2  1-10-17    1  2017-01-10   8 days   8 days    True       1
 2        3   1-1-17    1  2017-01-01      NaT      NaT   False       0
 7        3   2-1-17    1  2017-02-01  31 days  31 days    True       1
 8        3   2-2-17    1  2017-02-02   1 days   1 days   False       1
 9        3   2-3-17    2  2017-02-03   1 days   1 days   False       1
10        3   2-4-17    3  2017-02-04   1 days   1 days   False       1
11        3   2-5-17    1  2017-02-05   1 days   1 days   False       1
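For reference, here is a self-contained sketch that rebuilds the input from the intermediates table and reproduces the output. It is a variation on the answer rather than the same code: it uses groupby diff and cumsum instead of apply, and a temporary `block` column (a hypothetical name) instead of passing an external Series as a grouper. The explicit month-day-year format is an assumption based on the dates shown.

```python
import pandas as pd

# Input reconstructed from the intermediates table (original row order 0..11).
df = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "date": ["1-1-17", "1-1-17", "1-1-17", "1-1-17", "1-2-17", "1-2-17",
             "1-10-17", "2-1-17", "2-2-17", "2-3-17", "2-4-17", "2-5-17"],
    "val": [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1],
})

# Parse the dates; month-day-year order is assumed from the sample.
s = pd.to_datetime(df["date"], format="%m-%d-%y")

# Gap to the previous row of the same user (NaT for each user's first row).
gap = s.groupby(df["user_id"]).diff().abs()

# A gap of more than 3 days starts a new block; the running sum labels blocks.
starts_new_block = gap.gt(pd.Timedelta(days=3)).astype(int)
df["block"] = starts_new_block.groupby(df["user_id"]).cumsum()

# Aggregate per user and block, then drop the helper column.
out = (df.groupby(["user_id", "block"], as_index=False)
         .agg({"date": "last", "val": "sum"})
         .drop(columns="block"))
```

Like the original, this relies on each user's rows already being in chronological order; sort by user and parsed date first if that is not guaranteed.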
