Pandas: Subtracting two date columns and the result being an integer
How about:
df_test['Difference'] = (df_test['First_Date'] - df_test['Second Date']).dt.days
This returns the difference as int if there are no missing values (NaT) and as float if there are.
pandas has rich documentation on Time series / date functionality and Time deltas.
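To see the int versus float behaviour described above, here is a minimal sketch. The frames and their values are made up for illustration (and the second column is named Second_Date here); only the .dt.days call comes from the answer.

```python
import pandas as pd

# Hypothetical data; the column names and values are illustrative only.
complete = pd.DataFrame({
    "First_Date": pd.to_datetime(["2021-01-10", "2021-02-01"]),
    "Second_Date": pd.to_datetime(["2021-01-01", "2021-01-15"]),
})
with_gap = complete.copy()
with_gap.loc[1, "Second_Date"] = pd.NaT  # introduce a missing value

days_complete = (complete["First_Date"] - complete["Second_Date"]).dt.days
days_with_gap = (with_gap["First_Date"] - with_gap["Second_Date"]).dt.days

print(days_complete.dtype)  # an integer dtype
print(days_with_gap.dtype)  # a float dtype, because NaN needs a float
```

If an integer column is required even with gaps, casting the result with .astype('Int64') (the nullable integer dtype) is one option.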
How can I subtract two date time values in a Pandas Dataframe
Convert your columns to actual dates first:
df['ALARM_DATE'] = pd.to_datetime(df['ALARM_DATE'])
df['CONT_DATE'] = pd.to_datetime(df['CONT_DATE'])
Or:
df[['ALARM_DATE', 'CONT_DATE']] = df[['ALARM_DATE', 'CONT_DATE']].apply(pd.to_datetime)
Output:
>>> df['CONT_DATE'] - df['ALARM_DATE']
0 5 days
1 3 days
2 -162 days
3 1 days
4 5 days
dtype: timedelta64[ns]
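A self-contained sketch of the same steps, assuming the dates arrive as strings (as they often do after reading a CSV). The sample values are hypothetical and were chosen only to keep the arithmetic easy to check.

```python
import pandas as pd

# Hypothetical sample; the dates start out as plain strings.
df = pd.DataFrame({
    "ALARM_DATE": ["2020-01-01", "2020-03-10"],
    "CONT_DATE": ["2020-01-06", "2020-03-13"],
})

# Subtracting the raw strings would raise a TypeError, so convert first.
df[["ALARM_DATE", "CONT_DATE"]] = df[["ALARM_DATE", "CONT_DATE"]].apply(pd.to_datetime)

duration = df["CONT_DATE"] - df["ALARM_DATE"]  # timedelta Series
duration_days = duration.dt.days               # whole days as numbers
print(duration_days.tolist())
```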
Pandas - Python - how to subtract two different date columns
I think you need to subtract datetimes, so it is necessary to convert both now and the Created_Date column to datetimes. Finally, to convert the timedeltas to days, use dt.days:
import datetime
now = datetime.date.today()
today = pd.Timestamp(now)
data['Created_Date'] = pd.to_datetime(data['Created_Date'])
data['Aging'] = today
data['Aging'] = data['Aging'].sub(data['Created_Date'], axis=0).dt.days
The solution can be simplified:
data['Created_Date'] = pd.to_datetime(data['Created_Date'])
data['Aging'] = data['Created_Date'].rsub(today, axis=0).dt.days
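For comparison, the same result can be written with the plain subtraction operator. In this sketch a fixed reference date stands in for today so the numbers are reproducible, and the Created_Date values are hypothetical.

```python
import pandas as pd

# A fixed reference date stands in for "today" so the result is reproducible.
today = pd.Timestamp("2021-03-01")
data = pd.DataFrame({"Created_Date": ["2021-02-01", "2021-02-27"]})  # hypothetical values
data["Created_Date"] = pd.to_datetime(data["Created_Date"])

# Plain operator form: scalar Timestamp minus a datetime column.
data["Aging"] = (today - data["Created_Date"]).dt.days

# The rsub form computes the same thing: today - Created_Date.
same = data["Created_Date"].rsub(today).dt.days
print(data["Aging"].tolist(), same.tolist())
```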
Subtract datetime column and get result in seconds pandas
Create datetimes first with to_datetime and then convert the timedeltas with Series.dt.total_seconds:
df['Datetime'] = pd.to_datetime(df['Date'].astype(str) +' '+df['Time'].astype(str))
df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds()
print (df)
Name Date Time Datetime diff
0 A 02/20/2021 12:30:06 2021-02-20 12:30:06 NaN
1 A 02/20/2021 12:30:20 2021-02-20 12:30:20 14.0
2 A 02/21/2021 12:30:20 2021-02-21 12:30:20 86400.0
3 A 02/22/2021 02:30:30 2021-02-22 02:30:30 50410.0
To get integers while keeping the missing values, use the nullable integer dtype Int64:
df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds().astype('Int64')
print (df)
Name Date Time Datetime diff
0 A 02/20/2021 12:30:06 2021-02-20 12:30:06 <NA>
1 A 02/20/2021 12:30:20 2021-02-20 12:30:20 14
2 A 02/21/2021 12:30:20 2021-02-21 12:30:20 86400
3 A 02/22/2021 02:30:30 2021-02-22 02:30:30 50410
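The two variants above can be reproduced end to end with the sample rows from the printed output. The explicit format string and the column names diff_float and diff_int are additions for this sketch, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "A", "A"],
    "Date": ["02/20/2021", "02/20/2021", "02/21/2021", "02/22/2021"],
    "Time": ["12:30:06", "12:30:20", "12:30:20", "02:30:30"],
})
# An explicit format avoids any day/month ambiguity (an addition to the original code).
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"], format="%m/%d/%Y %H:%M:%S")

seconds = df.groupby("Name")["Datetime"].diff().dt.total_seconds()
df["diff_float"] = seconds                # float, NaN for the first row of each group
df["diff_int"] = seconds.astype("Int64")  # nullable integer, <NA> instead of NaN
print(df[["Datetime", "diff_float", "diff_int"]])
```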
If you need strings such as "14 seconds" instead of floats, add a custom function with Series.map:
df['Datetime'] = pd.to_datetime(df['Date'].astype(str) +' '+df['Time'].astype(str))
f = lambda x: '' if pd.isna(x) else f'{int(x)} seconds'
df['diff'] = df.groupby('Name')['Datetime'].diff().dt.total_seconds().map(f)
print (df)
Name Date Time Datetime diff
0 A 02/20/2021 12:30:06 2021-02-20 12:30:06
1 A 02/20/2021 12:30:20 2021-02-20 12:30:20 14 seconds
2 A 02/21/2021 12:30:20 2021-02-21 12:30:20 86400 seconds
3 A 02/22/2021 02:30:30 2021-02-22 02:30:30 50410 seconds
Is there a way to subtract two columns containing Quarters and return the integer number of Quarters between them?
This might not be the most elegant way to do this, but you skip having to define dates and so on. I made a df for just this problem:
dfq = pd.read_csv(r"C:/users/k_sego/quarter.csv",sep=";")
print(dfq)
which looks like this
Cohort EndQuarter
0 2015Q1 2015Q1
1 2015Q1 2015Q3
2 2015Q1 2018Q4
3 2015Q1 2019Q2
4 2015Q1 2019Q3
5 2015Q1 2020Q1
I extract the quarter and the year from each date column into new columns, keeping track of which column they come from. Remember to convert them to numeric.
dfq['CohortQ'] = dfq.Cohort.str.slice(5,6)
dfq['EndQuarterQ'] = dfq.EndQuarter.str.slice(5,6)
dfq['CohortYear'] = dfq.Cohort.str.slice(0,4)
dfq['EndQuarterYear'] = dfq.EndQuarter.str.slice(0,4)
cols = dfq.columns.drop(['Cohort','EndQuarter'])
dfq[cols] = dfq[cols].apply(pd.to_numeric, errors='coerce')
Now, the difference between the years times 4 is the number of quarters between them, but to this you need to add the difference between the quarter numbers within their years.
dfq['CountQuarters'] = (dfq['EndQuarterYear']-dfq['CohortYear'])*4 +(dfq['EndQuarterQ']-dfq['CohortQ'])
which gives
Cohort EndQuarter CohortQ EndQuarterQ CohortYear EndQuarterYear \
0 2015Q1 2015Q1 1 1 2015 2015
1 2015Q1 2015Q3 1 3 2015 2015
2 2015Q1 2018Q4 1 4 2015 2018
3 2015Q1 2019Q2 1 2 2015 2019
4 2015Q1 2019Q3 1 3 2015 2019
5 2015Q1 2020Q1 1 1 2015 2020
CountQuarters
0 0
1 2
2 15
3 17
4 18
5 20
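As an alternative to slicing the strings, the 'YYYYQn' values can be parsed as quarterly periods. This is a sketch of a different technique than the one above, using pd.PeriodIndex with freq="Q"; the data is rebuilt inline from the table shown, since the CSV file is not available.

```python
import pandas as pd

dfq = pd.DataFrame({
    "Cohort": ["2015Q1"] * 6,
    "EndQuarter": ["2015Q1", "2015Q3", "2018Q4", "2019Q2", "2019Q3", "2020Q1"],
})

# Parse the 'YYYYQn' strings as quarterly periods.
start = pd.PeriodIndex(dfq["Cohort"], freq="Q")
end = pd.PeriodIndex(dfq["EndQuarter"], freq="Q")

# Same arithmetic as above: four quarters per year plus the quarter offset.
count = (end.year - start.year) * 4 + (end.quarter - start.quarter)
dfq["CountQuarters"] = count.to_numpy()
print(dfq)
```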
Trying to subtract a column of dates to another date
You can try Timestamp.floor to remove the time part, and Series.dt.days to convert the timedeltas to days:
df['datepond'] = (pd.to_datetime('today').floor('D') - pd.to_datetime(df['dates'])).dt.days
print (df)
number dates coord datepond
AC 10 2018-07-10 11.54 97
AC 10 2018-07-11 11.19 96
AN 5 2018-07-12 69.40 95
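To make the effect of floor visible, this sketch replaces 'today' with a fixed timestamp. The timestamp is an assumption, chosen so that the day counts match the output above.

```python
import pandas as pd

df = pd.DataFrame({"dates": ["2018-07-10", "2018-07-11", "2018-07-12"]})
dates = pd.to_datetime(df["dates"])

# Fixed stand-in for "today" (an assumption so the numbers are reproducible).
now = pd.Timestamp("2018-10-15 13:45:00")
midnight = now.floor("D")  # drop the time of day

raw_delta = now - dates         # keeps the 13:45:00 remainder
clean_delta = midnight - dates  # whole days only

df["datepond"] = clean_delta.dt.days
print(raw_delta.iloc[0], "|", clean_delta.iloc[0])
print(df)
```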
Subtracting dates between columns with a condition to only subtract dates within the same year in Python
Python
Fake data:
import pandas as pd
data_1 = pd.DataFrame({
'SV_DATE': pd.to_datetime(['2015/03/05', '2015/03/10', '2016/01/01'])
})
data_2 = pd.DataFrame({
'Launch Date': pd.to_datetime(['2015/03/05', '2015/12/01', '2016/01/01', '2017/01/01']),
'MFG': ['APPLE', 'WINDOWS', 'APPLE', 'WINDOWS']
})
print(data_1)
SV_DATE
0 2015-03-05
1 2015-03-10
2 2016-01-01
print(data_2)
Launch Date MFG
0 2015-03-05 APPLE
1 2015-12-01 WINDOWS
2 2016-01-01 APPLE
3 2017-01-01 WINDOWS
If I got it right, you can filter data_2 (only rows with MFG == APPLE), merge both dataframes by Year, calculate the difference between the dates within each year, then check whether it falls inside your desired range (0, 30):
data_1 = data_1.assign(Year = data_1.SV_DATE.dt.year, Index = data_1.index)
data_2 = data_2.assign(Year = data_2['Launch Date'].dt.year).query('MFG=="APPLE"')
data = data_1.merge(data_2, on='Year')
data['Diff'] = (data['SV_DATE'] - data['Launch Date']).dt.days
data['in_target_range'] = data.Diff.between(0,30)
Output:
SV_DATE Year Index Launch Date MFG Diff in_target_range
0 2015-03-05 2015 0 2015-03-05 APPLE 0 True
1 2015-03-10 2015 1 2015-03-05 APPLE 5 True
2 2016-01-01 2016 2 2016-01-01 APPLE 0 True
With this output you can do whatever you want, I suppose. Note that I kept an Index column so that you can retrieve the corresponding rows in data_1 if you'd like to.
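One way the saved Index column could be used to pull the matching rows back out of data_1 is sketched below. The names left, right and matched_rows are introduced here for illustration, and the date difference is computed by direct subtraction, which is valid because each merged row already pairs dates from the same year.

```python
import pandas as pd

data_1 = pd.DataFrame({"SV_DATE": pd.to_datetime(["2015-03-05", "2015-03-10", "2016-01-01"])})
data_2 = pd.DataFrame({
    "Launch Date": pd.to_datetime(["2015-03-05", "2015-12-01", "2016-01-01", "2017-01-01"]),
    "MFG": ["APPLE", "WINDOWS", "APPLE", "WINDOWS"],
})

left = data_1.assign(Year=data_1["SV_DATE"].dt.year, Index=data_1.index)
right = data_2.assign(Year=data_2["Launch Date"].dt.year).query('MFG == "APPLE"')
data = left.merge(right, on="Year")

# After the merge each row pairs dates from the same year, so subtract directly.
data["Diff"] = (data["SV_DATE"] - data["Launch Date"]).dt.days
data["in_target_range"] = data["Diff"].between(0, 30)

# Use the saved Index values to select the matching rows of data_1.
matched_rows = data_1.loc[data.loc[data["in_target_range"], "Index"].unique()]
print(matched_rows)
```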
R
A similar approach using R:
library(dplyr)
# Fake data
data_1 <- data.frame(SV_DATE = as.Date(c('2015/03/05', '2015/03/10', '2016/01/01')))
data_2 <- data.frame (
Launch_Date = as.Date(c('2015/03/05', '2015/12/01', '2016/01/01', '2017/01/01')),
MFG = c('APPLE', 'WINDOWS', 'APPLE', 'WINDOWS')
)
# Merge and filters
data_2 <- data_2 %>%
mutate(Year = format(Launch_Date, "%Y")) %>%
filter(MFG=="APPLE")
data <- data_1 %>%
mutate(Year = format(SV_DATE, "%Y"), Index = 1:nrow(.)) %>%
inner_join(., mutate(data_2, Year=format(Launch_Date, "%Y")), by = "Year") %>%
group_by(Year) %>%
mutate(Diff = as.integer(SV_DATE - Launch_Date)) %>%
mutate(in_target_range = between(Diff, 0, 30))
whose output is:
# A tibble: 3 x 7
# Groups: Year [2]
SV_DATE Year Index Launch_Date MFG Diff in_target_range
<date> <chr> <int> <date> <chr> <int> <lgl>
1 2015-03-05 2015 1 2015-03-05 APPLE 0 TRUE
2 2015-03-10 2015 2 2015-03-05 APPLE 5 TRUE
3 2016-01-01 2016 3 2016-01-01 APPLE 0 TRUE
I don't know what you really want with your launch.ind function, but it might be something like this (?):
low = 0
high = 3
data$AL030 <- data %>%
group_by(SV_DATE) %>%
summarise(launch.ind = sum(ifelse(between(Diff, low, high), 1, 0)), .groups='drop') %>%
mutate(launch.ind = ifelse(launch.ind > 0, 1, 0)) %>%
pull(launch.ind)
Notes
Although this code works for the fake data I provided, it might not work for you. In any case, I believe it provides some ways to achieve your goal by modifying it.
Also, note that I left in_target_range as boolean in both code chunks, but you can easily change it to integer with .astype(int) and as.integer(...) in Python and R, respectively.