Pandas: Subtracting Two Date Columns and the Result Being an Integer

How about:

df_test['Difference'] = (df_test['First_Date'] - df_test['Second Date']).dt.days

This returns the difference as an integer if there are no missing values (NaT), and as a float if there are.

Pandas has rich documentation on Time series / date functionality and Time deltas.
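A quick way to check the int-versus-float behaviour described above is the sketch below. The sample dates are made up; only the column names come from the answer. It also shows the nullable Int64 dtype as one way to keep integers when NaT is present.

```python
import pandas as pd

# Hypothetical sample data; the column names mirror the answer above.
df_test = pd.DataFrame({
    "First_Date": pd.to_datetime(["2016-02-09", "2016-01-06"]),
    "Second Date": pd.to_datetime(["2015-11-19", "2015-11-30"]),
})

# No missing values: .dt.days comes back as an integer column.
df_test["Difference"] = (df_test["First_Date"] - df_test["Second Date"]).dt.days
print(df_test["Difference"].dtype)

# Introduce one missing date (NaT): the same expression now yields floats with NaN.
with_nat = df_test.copy()
with_nat.loc[1, "Second Date"] = pd.NaT
with_nat["Difference"] = (with_nat["First_Date"] - with_nat["Second Date"]).dt.days
print(with_nat["Difference"].dtype)

# If integers are required despite gaps, the nullable Int64 dtype keeps them.
print(with_nat["Difference"].astype("Int64"))
```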

How can I subtract two date time values in a Pandas Dataframe

Convert your columns to actual dates first:

df['ALARM_DATE'] = pd.to_datetime(df['ALARM_DATE'])
df['CONT_DATE'] = pd.to_datetime(df['CONT_DATE'])

Or:

df[['ALARM_DATE', 'CONT_DATE']] = df[['ALARM_DATE', 'CONT_DATE']].apply(pd.to_datetime)

Output:

>>> df['CONT_DATE'] - df['ALARM_DATE']
0      5 days
1      3 days
2   -162 days
3      1 days
4      5 days
dtype: timedelta64[ns]
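A self-contained sketch of the same idea, assuming the dates start out as strings (the values are invented for illustration). Adding .dt.days turns the timedelta result into plain integers, which is what the title question asks for:

```python
import pandas as pd

# Hypothetical data: the dates arrive as strings, as they typically do from read_csv.
df = pd.DataFrame({
    "ALARM_DATE": ["2020-07-20", "2020-08-01", "2020-09-10"],
    "CONT_DATE": ["2020-07-25", "2020-08-04", "2020-09-08"],
})

# Convert each column to datetime64 before doing arithmetic on it.
for col in ["ALARM_DATE", "CONT_DATE"]:
    df[col] = pd.to_datetime(df[col])

delta = df["CONT_DATE"] - df["ALARM_DATE"]  # timedelta64[ns] Series
df["DAYS"] = delta.dt.days                   # whole days; negative if CONT_DATE is earlier
print(df)
```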

Pandas - Python - how to subtract two different date columns

I think you need to subtract datetimes, so it is necessary to convert both now and the Created_Date column to datetimes; finally, use dt.days to convert the timedeltas to days:

import datetime
import pandas as pd
now = datetime.date.today()
today = pd.Timestamp(now)

data['Created_Date'] = pd.to_datetime(data['Created_Date'])
data['Aging'] = today
data['Aging'] = data['Aging'].sub(data['Created_Date'], axis=0).dt.days

The solution can be simplified:

data['Created_Date'] = pd.to_datetime(data['Created_Date'])
data['Aging'] = data['Created_Date'].rsub(today, axis=0).dt.days
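A reproducible sketch of the simplified version. It assumes a hard-coded today and invented Created_Date values so the output does not change from day to day; in practice today would come from the current date as in the code above. rsub(today) computes today - Created_Date:

```python
import pandas as pd

# Hypothetical fixed "today" for reproducibility; real code would derive it
# from the current date, e.g. pd.Timestamp.today().normalize().
today = pd.Timestamp("2024-03-15")

data = pd.DataFrame({"Created_Date": ["2024-03-01", "2024-02-15", "2023-12-31"]})
data["Created_Date"] = pd.to_datetime(data["Created_Date"])

# Reflected subtraction: today - Created_Date, then whole days.
data["Aging"] = data["Created_Date"].rsub(today).dt.days

# The same result written the "natural" way round.
assert (today - data["Created_Date"]).dt.days.equals(data["Aging"])
print(data)
```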

Subtract datetime column and get result in seconds pandas

First create datetimes with to_datetime, then convert the timedeltas to seconds with Series.dt.total_seconds:

df['Datetime'] = pd.to_datetime(df['Date'].astype(str) +' '+df['Time'].astype(str))

df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds()
print (df)
  Name        Date      Time            Datetime     diff
0    A  02/20/2021  12:30:06 2021-02-20 12:30:06      NaN
1    A  02/20/2021  12:30:20 2021-02-20 12:30:20     14.0
2    A  02/21/2021  12:30:20 2021-02-21 12:30:20  86400.0
3    A  02/22/2021  02:30:30 2021-02-22 02:30:30  50410.0
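A sketch with a second, hypothetical name added, to show that diff() restarts in each group (the first row of every group is NaN). It also passes an explicit format to to_datetime, an assumption that the dates are always MM/DD/YYYY:

```python
import pandas as pd

# Hypothetical data with two names, so the per-group reset is visible.
df = pd.DataFrame({
    "Name": ["A", "A", "B", "B"],
    "Date": ["02/20/2021", "02/20/2021", "02/20/2021", "02/21/2021"],
    "Time": ["12:30:06", "12:30:20", "08:00:00", "08:00:30"],
})

# An explicit format removes any month/day ambiguity in strings like "02/20/2021".
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"], format="%m/%d/%Y %H:%M:%S")

# diff() runs within each Name, so the first row of every group is NaN.
df["diff"] = df.groupby("Name")["Datetime"].diff().dt.total_seconds()
print(df)
```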

For integers, use the nullable integer dtype Int64, which supports missing values:

df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds().astype('Int64')
print (df)
  Name        Date      Time            Datetime   diff
0    A  02/20/2021  12:30:06 2021-02-20 12:30:06   <NA>
1    A  02/20/2021  12:30:20 2021-02-20 12:30:20     14
2    A  02/21/2021  12:30:20 2021-02-21 12:30:20  86400
3    A  02/22/2021  02:30:30 2021-02-22 02:30:30  50410

If you need labels such as '14 seconds' instead of floats, add a custom function with Series.map:

df['Datetime'] = pd.to_datetime(df['Date'].astype(str) +' '+df['Time'].astype(str))

f = lambda x: '' if pd.isna(x) else f'{int(x)} seconds'
df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds().map(f)
print (df)
  Name        Date      Time            Datetime           diff
0    A  02/20/2021  12:30:06 2021-02-20 12:30:06               
1    A  02/20/2021  12:30:20 2021-02-20 12:30:20     14 seconds
2    A  02/21/2021  12:30:20 2021-02-21 12:30:20  86400 seconds
3    A  02/22/2021  02:30:30 2021-02-22 02:30:30  50410 seconds
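If you would rather avoid the per-element lambda, a vectorized variant is sketched below, assuming pandas 1.0 or later for the nullable Int64 and string dtypes. It uses the diff values from the table above as input:

```python
import pandas as pd

# The diff values from the table above (NaN = first row of the group).
seconds = pd.Series([float("nan"), 14.0, 86400.0, 50410.0])

# Nullable Int64 -> nullable string keeps <NA> through the concatenation,
# so no per-element lambda is needed; fillna("") mimics the empty cell.
labels = (seconds.astype("Int64").astype("string") + " seconds").fillna("")
print(labels.tolist())
```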

Is there a way to subtract two columns containing Quarters and return the integer number of Quarters between them?

This might not be the most elegant way, but it avoids having to define dates and so on. I made a DataFrame for just this problem:

dfq = pd.read_csv(r"C:/users/k_sego/quarter.csv",sep=";")
print(dfq)

which looks like this:

   Cohort EndQuarter
0  2015Q1     2015Q1
1  2015Q1     2015Q3
2  2015Q1     2018Q4
3  2015Q1     2019Q2
4  2015Q1     2019Q3
5  2015Q1     2020Q1

I extract the quarter from each date column into new columns, keeping track of where each one comes from, and do the same for the year. Remember to convert them to numeric.

dfq['CohortQ'] = dfq.Cohort.str.slice(5,6)
dfq['EndQuarterQ'] = dfq.EndQuarter.str.slice(5,6)
dfq['CohortYear'] = dfq.Cohort.str.slice(0,4)
dfq['EndQuarterYear'] = dfq.EndQuarter.str.slice(0,4)
cols = dfq.columns.drop(['Cohort','EndQuarter'])

dfq[cols] = dfq[cols].apply(pd.to_numeric, errors='coerce')

Now, the difference between the years times 4 gives the number of quarters, but to this you need to add the difference between the quarter numbers within those years.

dfq['CountQuarters'] = (dfq['EndQuarterYear']-dfq['CohortYear'])*4 +(dfq['EndQuarterQ']-dfq['CohortQ'])

which gives:

   Cohort EndQuarter  CohortQ  EndQuarterQ  CohortYear  EndQuarterYear  \
0  2015Q1     2015Q1        1            1        2015            2015   
1  2015Q1     2015Q3        1            3        2015            2015   
2  2015Q1     2018Q4        1            4        2015            2018   
3  2015Q1     2019Q2        1            2        2015            2019   
4  2015Q1     2019Q3        1            3        2015            2019   
5  2015Q1     2020Q1        1            1        2015            2020   

   CountQuarters  
0              0  
1              2  
2             15  
3             17  
4             18  
5             20
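For comparison, a different technique (not the one used in this answer): pandas can parse strings like "2015Q1" as quarterly periods with pd.PeriodIndex(..., freq="Q"), and year * 4 + quarter then gives a running quarter count whose difference is the number of quarters. A minimal sketch, with the data built inline instead of read from the CSV:

```python
import pandas as pd

# Same values as the table above, built inline instead of read from CSV.
dfq = pd.DataFrame({
    "Cohort": ["2015Q1"] * 6,
    "EndQuarter": ["2015Q1", "2015Q3", "2018Q4", "2019Q2", "2019Q3", "2020Q1"],
})

def quarter_index(s: pd.Series) -> pd.Index:
    """Map 'YYYYQn' strings to a running quarter count: year * 4 + quarter."""
    p = pd.PeriodIndex(s, freq="Q")  # parses strings such as "2015Q1"
    return p.year * 4 + p.quarter

dfq["CountQuarters"] = quarter_index(dfq["EndQuarter"]) - quarter_index(dfq["Cohort"])
print(dfq)
```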

Trying to subtract a column of dates to another date

You can use Timestamp.floor to remove the time part, and Series.dt.days to convert the timedeltas to days:

df['datepond']= (pd.to_datetime('today').floor('d') - pd.to_datetime(df['dates'])).dt.days
print (df)
    number       dates  coord  datepond
AC      10  2018-07-10  11.54        97
AC      10  2018-07-11  11.19        96
AN       5  2018-07-12  69.40        95
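A reproducible version of the same calculation, assuming a fixed reference timestamp in place of pd.to_datetime('today') (the one below is hypothetical, chosen so the numbers match the output above). floor('d') moves that reference to midnight:

```python
import pandas as pd

df = pd.DataFrame({"dates": ["2018-07-10", "2018-07-11", "2018-07-12"]})

# Hypothetical fixed "now"; in real code this would be pd.to_datetime("today").
now = pd.Timestamp("2018-10-15 16:45:00")

# floor("d") sets the reference to midnight, then .dt.days gives whole days.
df["datepond"] = (now.floor("d") - pd.to_datetime(df["dates"])).dt.days
print(df)
```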

Subtracting dates between columns with a condition to only subtract dates within the same year in Python

Python

Fake data:

import pandas as pd

data_1 = pd.DataFrame({
    'SV_DATE': pd.to_datetime(['2015/03/05', '2015/03/10', '2016/01/01'])
})

data_2 = pd.DataFrame({
    'Launch Date': pd.to_datetime(['2015/03/05', '2015/12/01', '2016/01/01', '2017/01/01']),
    'MFG': ['APPLE', 'WINDOWS', 'APPLE', 'WINDOWS']
})

print(data_1)

     SV_DATE
0 2015-03-05
1 2015-03-10
2 2016-01-01

print(data_2)

  Launch Date      MFG
0  2015-03-05    APPLE
1  2015-12-01  WINDOWS
2  2016-01-01    APPLE
3  2017-01-01  WINDOWS

If I understood correctly, you can filter data_2 (keeping only rows with MFG=="APPLE"), merge both DataFrames on Year, calculate the difference between the dates, and then check whether it falls inside your desired range (0, 30):

data_1 = data_1.assign(Year = data_1.SV_DATE.dt.year, Index = data_1.index)
data_2 = data_2.assign(Year = data_2['Launch Date'].dt.year).query('MFG=="APPLE"')

data = data_1.merge(data_2, on='Year')
data['Diff'] = (data['SV_DATE'] - data['Launch Date']).dt.days  # row-wise difference in days
data['in_target_range'] = data.Diff.between(0,30)

Output:

     SV_DATE  Year  Index Launch Date    MFG  Diff  in_target_range
0 2015-03-05  2015      0  2015-03-05  APPLE     0             True
1 2015-03-10  2015      1  2015-03-05  APPLE     5             True
2 2016-01-01  2016      2  2016-01-01  APPLE     0             True

With this output you can do whatever you need, I suppose. Note that I kept an Index column so you can retrieve the corresponding rows in data_1 if you'd like to.

R

A similar approach using R:

library(dplyr)

# Fake data
data_1 <- data.frame(SV_DATE = as.Date(c('2015/03/05', '2015/03/10', '2016/01/01')))

data_2 <- data.frame (
  Launch_Date = as.Date(c('2015/03/05', '2015/12/01', '2016/01/01', '2017/01/01')),
  MFG = c('APPLE', 'WINDOWS', 'APPLE', 'WINDOWS')
)

# Add a Year column and keep only APPLE rows
data_2 <- data_2 %>%
  mutate(Year = format(Launch_Date, "%Y")) %>%
  filter(MFG=="APPLE")

data <- data_1 %>% 
  mutate(Year = format(SV_DATE, "%Y"), Index = 1:nrow(.)) %>%
  inner_join(data_2, by = "Year") %>%
  group_by(Year) %>%
  mutate(Diff = as.integer(SV_DATE - Launch_Date)) %>%
  mutate(in_target_range = between(Diff, 0, 30))

whose output is:

# A tibble: 3 x 7
# Groups:   Year [2]
  SV_DATE    Year  Index Launch_Date MFG    Diff in_target_range
  <date>     <chr> <int> <date>      <chr> <int> <lgl>          
1 2015-03-05 2015      1 2015-03-05  APPLE     0 TRUE           
2 2015-03-10 2015      2 2015-03-05  APPLE     5 TRUE           
3 2016-01-01 2016      3 2016-01-01  APPLE     0 TRUE

I don't know exactly what you want from your launch.ind function, but it might be something like this:

low = 0
high = 3

data$AL030 <- data %>% 
  group_by(SV_DATE) %>%
  summarise(launch.ind = sum(ifelse(between(Diff, low, high), 1, 0)), .groups='drop') %>%
  mutate(launch.ind = ifelse(launch.ind > 0, 1, 0)) %>%
  pull(launch.ind)
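For readers working in pandas, a rough Python counterpart of the launch.ind sketch, assuming the intended meaning is "1 if any launch on that SV_DATE lies within [low, high] days, else 0". It reuses the fake data from the Python section and also shows the .astype(int) conversion mentioned in the notes:

```python
import pandas as pd

# Same fake data as the Python section.
data_1 = pd.DataFrame({"SV_DATE": pd.to_datetime(["2015/03/05", "2015/03/10", "2016/01/01"])})
data_2 = pd.DataFrame({
    "Launch Date": pd.to_datetime(["2015/03/05", "2015/12/01", "2016/01/01", "2017/01/01"]),
    "MFG": ["APPLE", "WINDOWS", "APPLE", "WINDOWS"],
})

# Keep only APPLE launches and pair each SV_DATE with launches of the same year.
apple = data_2.query('MFG == "APPLE"').assign(Year=lambda d: d["Launch Date"].dt.year)
data = data_1.assign(Year=data_1["SV_DATE"].dt.year).merge(apple, on="Year")
data["Diff"] = (data["SV_DATE"] - data["Launch Date"]).dt.days

low, high = 0, 3  # same illustrative window as the R snippet
# between() is inclusive on both ends, like dplyr::between.
in_window = data["Diff"].between(low, high)
# 1 if any launch matched for that SV_DATE, else 0 (hypothetical reading of launch.ind)
data["AL030"] = in_window.groupby(data["SV_DATE"]).transform("any").astype(int)
print(data)
```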

Notes

Although this code works for the fake data I provided, it might not work for your data as-is. Still, I believe it shows some ways to reach your goal with a few modifications.

Also, note that I left in_target_range as boolean in both code chunks, but you can easily change it to integer with .astype(int) and as.integer(...) in Python and R, respectively.
