Pandas: Subtracting Two Date Columns and the Result Being an Integer

How about:

df_test['Difference'] = (df_test['First_Date'] - df_test['Second Date']).dt.days

This returns the difference as int if there are no missing values (NaT) and as float if there are any.
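A minimal sketch of that dtype behaviour, using made-up sample dates (the values are an assumption, only the column names come from the snippet above). It also shows the nullable Int64 dtype as one way to keep whole numbers when NaT is present:

```python
import pandas as pd

# Hypothetical sample data; the last First_Date is missing (NaT)
df_test = pd.DataFrame({
    "First_Date": pd.to_datetime(["2021-01-10", "2021-02-01", None]),
    "Second Date": pd.to_datetime(["2021-01-01", "2021-01-15", "2021-03-01"]),
})

delta = df_test["First_Date"] - df_test["Second Date"]

complete = delta.iloc[:2].dt.days  # no NaT in this slice: integer dtype
with_nat = delta.dt.days           # NaT present: float dtype, NaN for the gap

# Nullable integer dtype keeps whole numbers next to <NA>
nullable = with_nat.astype("Int64")

print(complete.dtype, with_nat.dtype, nullable.dtype)
```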

Pandas has rich documentation on time series / date functionality and on time deltas.

How can I subtract two date time values in a Pandas Dataframe

Convert your columns to actual dates first:

df['ALARM_DATE'] = pd.to_datetime(df['ALARM_DATE'])
df['CONT_DATE'] = pd.to_datetime(df['CONT_DATE'])

Or:

df[['ALARM_DATE', 'CONT_DATE']] = df[['ALARM_DATE', 'CONT_DATE']].apply(pd.to_datetime)

Output:

>>> df['CONT_DATE'] - df['ALARM_DATE']
0      5 days
1      3 days
2   -162 days
3      1 days
4      5 days
dtype: timedelta64[ns]
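If some of the strings might not be valid dates, one option is to pass errors='coerce' so they become NaT instead of raising. A small sketch with invented rows (the data below is hypothetical, not the asker's):

```python
import pandas as pd

# Hypothetical rows with dates stored as strings; one value is malformed
df = pd.DataFrame({
    "ALARM_DATE": ["2020-06-01", "2020-06-10", "not a date"],
    "CONT_DATE": ["2020-06-06", "2020-06-13", "2020-07-01"],
})

cols = ["ALARM_DATE", "CONT_DATE"]
# Extra keyword arguments to apply() are forwarded to pd.to_datetime
df[cols] = df[cols].apply(pd.to_datetime, errors="coerce")

# Unparseable input became NaT, so the difference for that row is NaT too
duration = df["CONT_DATE"] - df["ALARM_DATE"]
print(duration)
```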

Pandas - Python - how to subtract two different date columns

I think you need to subtract datetimes, so convert both today's date and the Created_Date column to datetimes first, then use dt.days to convert the timedeltas to days:

import datetime
import pandas as pd
now = datetime.date.today()
today = pd.Timestamp(now)

data['Created_Date'] = pd.to_datetime(data['Created_Date'])
data['Aging'] = today
data['Aging'] = data['Aging'].sub(data['Created_Date'], axis=0).dt.days

The solution can be simplified:

data['Created_Date'] = pd.to_datetime(data['Created_Date'])
data['Aging'] = data['Created_Date'].rsub(today, axis=0).dt.days
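For what it's worth, plain subtraction with the scalar on the left should give the same result as rsub. A sketch with a fixed reference date and invented rows so the numbers are reproducible (both are assumptions, real code would use today's date):

```python
import pandas as pd

# Fixed stand-in for "today" so the result does not change from day to day
today = pd.Timestamp("2021-03-01")

# Hypothetical creation dates
data = pd.DataFrame({"Created_Date": ["2021-02-01", "2021-02-20", "2021-03-01"]})
data["Created_Date"] = pd.to_datetime(data["Created_Date"])

# Scalar minus Series broadcasts over every row
data["Aging"] = (today - data["Created_Date"]).dt.days

# Same computation written with rsub, as in the answer above
same = data["Created_Date"].rsub(today).dt.days
print(data)
```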

Subtract datetime column and get result in seconds pandas

Create datetimes first with to_datetime, then convert the timedeltas to seconds with Series.dt.total_seconds:

df['Datetime'] = pd.to_datetime(df['Date'].astype(str) +' '+df['Time'].astype(str))

df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds()
print (df)
  Name        Date      Time            Datetime     diff
0    A  02/20/2021  12:30:06 2021-02-20 12:30:06      NaN
1    A  02/20/2021  12:30:20 2021-02-20 12:30:20     14.0
2    A  02/21/2021  12:30:20 2021-02-21 12:30:20  86400.0
3    A  02/22/2021  02:30:30 2021-02-22 02:30:30  50410.0
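To try this without the original file, here is a self-contained sketch that rebuilds the frame from the printed rows. The explicit format string is my addition (it assumes month-first dates, which the sample values imply):

```python
import pandas as pd

# Rows copied from the printed output above
df = pd.DataFrame({
    "Name": ["A", "A", "A", "A"],
    "Date": ["02/20/2021", "02/20/2021", "02/21/2021", "02/22/2021"],
    "Time": ["12:30:06", "12:30:20", "12:30:20", "02:30:30"],
})

# An explicit format avoids any day/month guessing
df["Datetime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%m/%d/%Y %H:%M:%S"
)

# diff() runs within each Name group, so the first row of a group is NaT
df["diff"] = df.groupby("Name")["Datetime"].diff().dt.total_seconds()
print(df)
```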

For integer output, use the nullable integer dtype Int64, which supports missing values:

df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds().astype('Int64')
print (df)
  Name        Date      Time            Datetime   diff
0    A  02/20/2021  12:30:06 2021-02-20 12:30:06   <NA>
1    A  02/20/2021  12:30:20 2021-02-20 12:30:20     14
2    A  02/21/2021  12:30:20 2021-02-21 12:30:20  86400
3    A  02/22/2021  02:30:30 2021-02-22 02:30:30  50410

If you need strings such as "14 seconds" instead of floats, add a custom function with Series.map:

df['Datetime'] = pd.to_datetime(df['Date'].astype(str) +' '+df['Time'].astype(str))

f = lambda x: '' if pd.isna(x) else f'{int(x)} seconds'
df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds().map(f)
print (df)
  Name        Date      Time            Datetime           diff
0    A  02/20/2021  12:30:06 2021-02-20 12:30:06
1    A  02/20/2021  12:30:20 2021-02-20 12:30:20     14 seconds
2    A  02/21/2021  12:30:20 2021-02-21 12:30:20  86400 seconds
3    A  02/22/2021  02:30:30 2021-02-22 02:30:30  50410 seconds

Is there a way to subtract two columns containing Quarters and return the integer number of Quarters between them?

This might not be the most elegant way to do this, but it lets you skip defining dates and so on. I made a df for just this problem:

dfq = pd.read_csv(r"C:/users/k_sego/quarter.csv",sep=";")
print(dfq)

which looks like this

   Cohort EndQuarter
0  2015Q1     2015Q1
1  2015Q1     2015Q3
2  2015Q1     2018Q4
3  2015Q1     2019Q2
4  2015Q1     2019Q3
5  2015Q1     2020Q1

I extract the quarter from each column into a new column, keeping track of which column it came from, and do the same for the year. Remember to convert the results to numeric.

dfq['CohortQ'] = dfq.Cohort.str.slice(5,6)
dfq['EndQuarterQ'] = dfq.EndQuarter.str.slice(5,6)
dfq['CohortYear'] = dfq.Cohort.str.slice(0,4)
dfq['EndQuarterYear'] = dfq.EndQuarter.str.slice(0,4)
cols = dfq.columns.drop(['Cohort','EndQuarter'])

dfq[cols] = dfq[cols].apply(pd.to_numeric, errors='coerce')

Now, the difference between the years times 4 gives the number of quarters spanned by whole years; to this you need to add the difference between the quarter numbers within their respective years.

dfq['CountQuarters'] = (dfq['EndQuarterYear']-dfq['CohortYear'])*4 +(dfq['EndQuarterQ']-dfq['CohortQ'])

which gives

   Cohort EndQuarter  CohortQ  EndQuarterQ  CohortYear  EndQuarterYear  CountQuarters
0  2015Q1     2015Q1        1            1        2015            2015              0
1  2015Q1     2015Q3        1            3        2015            2015              2
2  2015Q1     2018Q4        1            4        2015            2018             15
3  2015Q1     2019Q2        1            2        2015            2019             17
4  2015Q1     2019Q3        1            3        2015            2019             18
5  2015Q1     2020Q1        1            1        2015            2020             20
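As an alternative to slicing the strings, pandas can parse values like 2015Q1 as quarterly periods. This is a different technique from the one above, sketched here on the same sample data built inline rather than read from the CSV:

```python
import pandas as pd

# Same sample values as the table above, built inline instead of read from CSV
dfq = pd.DataFrame({
    "Cohort": ["2015Q1"] * 6,
    "EndQuarter": ["2015Q1", "2015Q3", "2018Q4", "2019Q2", "2019Q3", "2020Q1"],
})

# Parse the "YYYYQn" strings as quarterly periods
start = pd.PeriodIndex(dfq["Cohort"], freq="Q")
end = pd.PeriodIndex(dfq["EndQuarter"], freq="Q")

# Same arithmetic as before: whole years times 4 plus the quarter offset
count = (end.year - start.year) * 4 + (end.quarter - start.quarter)
dfq["CountQuarters"] = count.to_numpy()
print(dfq)
```

The year and quarter attributes come back as integers, so no separate to_numeric step is needed.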

Trying to subtract a column of dates to another date

You can use Timestamp.floor to remove the time part, and Series.dt.days to convert the timedeltas to days:

df['datepond']= (pd.to_datetime('today').floor('d') - pd.to_datetime(df['dates'])).dt.days
print (df)
    number       dates  coord  datepond
AC      10  2018-07-10  11.54        97
AC      10  2018-07-11  11.19        96
AN       5  2018-07-12  69.40        95
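Because the result depends on the day the code runs, a reproducible sketch needs a fixed reference date. The date below is an assumption, picked only because it reproduces the numbers shown above; the time of day is arbitrary and is there to show what floor removes:

```python
import pandas as pd

# Frame rebuilt from the printed output above
df = pd.DataFrame(
    {
        "number": [10, 10, 5],
        "dates": ["2018-07-10", "2018-07-11", "2018-07-12"],
        "coord": [11.54, 11.19, 69.40],
    },
    index=["AC", "AC", "AN"],
)

# Assumed stand-in for "today"; floor("D") drops the time part
reference = pd.Timestamp("2018-10-15 09:45").floor("D")

df["datepond"] = (reference - pd.to_datetime(df["dates"])).dt.days
print(df)
```

Without the floor step the subtraction would still give whole days here, but for dates later than the reference the partial day would shift the count.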

Subtracting dates between columns with a condition to only subtract dates within the same year in Python

Python


Fake data:

import pandas as pd

data_1 = pd.DataFrame({
'SV_DATE': pd.to_datetime(['2015/03/05', '2015/03/10', '2016/01/01'])
})

data_2 = pd.DataFrame({
'Launch Date': pd.to_datetime(['2015/03/05', '2015/12/01', '2016/01/01', '2017/01/01']),
'MFG': ['APPLE', 'WINDOWS', 'APPLE', 'WINDOWS']
})

print(data_1)

     SV_DATE
0 2015-03-05
1 2015-03-10
2 2016-01-01

print(data_2)

  Launch Date      MFG
0  2015-03-05    APPLE
1  2015-12-01  WINDOWS
2  2016-01-01    APPLE
3  2017-01-01  WINDOWS

If I got it right, you can filter data_2 (only rows with MFG == "APPLE"), merge both dataframes on Year, calculate the difference between the dates, then check whether it falls inside your desired range (0, 30):

data_1 = data_1.assign(Year = data_1.SV_DATE.dt.year, Index = data_1.index)
data_2 = data_2.assign(Year = data_2['Launch Date'].dt.year).query('MFG=="APPLE"')

data = data_1.merge(data_2, on='Year')
data['Diff'] = (data['SV_DATE'] - data['Launch Date']).dt.days
data['in_target_range'] = data.Diff.between(0,30)

Output:

     SV_DATE  Year  Index Launch Date    MFG  Diff  in_target_range
0 2015-03-05  2015      0  2015-03-05  APPLE     0             True
1 2015-03-10  2015      1  2015-03-05  APPLE     5             True
2 2016-01-01  2016      2  2016-01-01  APPLE     0             True

With this output you can do whatever you want, I suppose. Note that I kept an Index column so you can retrieve the matching rows in data_1 if you'd like to.

R


A similar approach using R:

library(dplyr)

# Fake data
data_1 <- data.frame(SV_DATE = as.Date(c('2015/03/05', '2015/03/10', '2016/01/01')))

data_2 <- data.frame(
  Launch_Date = as.Date(c('2015/03/05', '2015/12/01', '2016/01/01', '2017/01/01')),
  MFG = c('APPLE', 'WINDOWS', 'APPLE', 'WINDOWS')
)

# Merge and filters
data_2 <- data_2 %>%
  mutate(Year = format(Launch_Date, "%Y")) %>%
  filter(MFG=="APPLE")

data <- data_1 %>%
  mutate(Year = format(SV_DATE, "%Y"), Index = 1:nrow(.)) %>%
  inner_join(., mutate(data_2, Year=format(Launch_Date, "%Y")), by = "Year") %>%
  group_by(Year) %>%
  mutate(Diff = as.integer(SV_DATE - Launch_Date)) %>%
  mutate(in_target_range = between(Diff, 0, 30))

which outputs:

# A tibble: 3 x 7
# Groups:   Year [2]
  SV_DATE    Year  Index Launch_Date MFG    Diff in_target_range
  <date>     <chr> <int> <date>      <chr> <int> <lgl>
1 2015-03-05 2015      1 2015-03-05  APPLE     0 TRUE
2 2015-03-10 2015      2 2015-03-05  APPLE     5 TRUE
3 2016-01-01 2016      3 2016-01-01  APPLE     0 TRUE

I don't know what you really want with your launch.ind function, but it might be something like this (?):

low = 0
high = 3

data$AL030 <- data %>%
group_by(SV_DATE) %>%
summarise(launch.ind = sum(ifelse(between(Diff, low, high), 1, 0)), .groups='drop') %>%
mutate(launch.ind = ifelse(launch.ind > 0, 1, 0)) %>%
pull(launch.ind)
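A rough pandas counterpart to the R snippet above, in case it helps. This is only my reading of what launch.ind is meant to do (flag each SV_DATE that has at least one Diff inside the bounds); the frame is rebuilt by hand from the merged output shown earlier:

```python
import pandas as pd

# Merged frame rebuilt by hand from the earlier output (only the needed columns)
data = pd.DataFrame({
    "SV_DATE": pd.to_datetime(["2015-03-05", "2015-03-10", "2016-01-01"]),
    "Diff": [0, 5, 0],
})

low, high = 0, 3  # same bounds as in the R snippet

# Per SV_DATE: 1 if any Diff falls inside [low, high], else 0
flag = (
    data["Diff"].between(low, high)
    .groupby(data["SV_DATE"])
    .any()
    .astype(int)
)

# Map the per-date flag back onto the rows
data["AL030"] = data["SV_DATE"].map(flag)
print(data)
```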

Notes


Although this code works for the fake data I provided, it might not work for you. In any case, I believe it provides some ways to achieve your goal by modifying it.

Also, note that I left in_target_range as boolean in both code chunks, but you can easily change it to integer with .astype(int) and as.integer(...) in Python and R, respectively.


