Select DataFrame rows between two dates
There are two possible solutions:
- Use a boolean mask, then use df.loc[mask]
- Set the date column as a DatetimeIndex, then use df.loc[start_date:end_date]
Using a boolean mask:
Ensure df['date'] is a Series with dtype datetime64[ns]:
df['date'] = pd.to_datetime(df['date'])
Make a boolean mask. start_date and end_date can be datetime.datetimes, np.datetime64s, pd.Timestamps, or even datetime strings:
# greater than the start date and smaller than or equal to the end date
mask = (df['date'] > start_date) & (df['date'] <= end_date)
Select the sub-DataFrame:
df.loc[mask]
or re-assign to df
df = df.loc[mask]
For example,
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
print(df.loc[mask])
yields
            0         1         2       date
153  0.208875  0.727656  0.037787 2000-06-02
154  0.750800  0.776498  0.237716 2000-06-03
155  0.812008  0.127338  0.397240 2000-06-04
156  0.639937  0.207359  0.533527 2000-06-05
157  0.416998  0.845658  0.872826 2000-06-06
158  0.440069  0.338690  0.847545 2000-06-07
159  0.202354  0.624833  0.740254 2000-06-08
160  0.465746  0.080888  0.155452 2000-06-09
161  0.858232  0.190321  0.432574 2000-06-10
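The same selection can also be written with Series.between. The following is a minimal sketch, assuming pandas 1.3 or later (where the inclusive parameter accepts 'both', 'neither', 'left' or 'right'); the value column and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per day starting 2000-01-01
df = pd.DataFrame({'value': np.arange(200)})
df['date'] = pd.date_range('2000-01-01', periods=200, freq='D')

start_date, end_date = '2000-06-01', '2000-06-10'

# Manual mask: start excluded, end included
mask_manual = (df['date'] > start_date) & (df['date'] <= end_date)

# Equivalent mask with between(); inclusive='right' keeps only the right end-point
mask_between = df['date'].between(start_date, end_date, inclusive='right')

# The default, inclusive='both', keeps both end-points
both = df.loc[df['date'].between(start_date, end_date)]
right = df.loc[mask_between]

print(len(both), len(right))
```

Spelling out which end-points are included avoids off-by-one surprises when the bounds fall exactly on values in the column.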
Using a DatetimeIndex:
If you are going to do a lot of selections by date, it may be quicker to set the date column as the index first. Then you can select rows by date using df.loc[start_date:end_date].
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
df = df.set_index(['date'])
print(df.loc['2000-6-1':'2000-6-10'])
yields
                   0         1         2
date
2000-06-01  0.040457  0.326594  0.492136    # <- includes start_date
2000-06-02  0.279323  0.877446  0.464523
2000-06-03  0.328068  0.837669  0.608559
2000-06-04  0.107959  0.678297  0.517435
2000-06-05  0.131555  0.418380  0.025725
2000-06-06  0.999961  0.619517  0.206108
2000-06-07  0.129270  0.024533  0.154769
2000-06-08  0.441010  0.741781  0.470402
2000-06-09  0.682101  0.375660  0.009916
2000-06-10  0.754488  0.352293  0.339337
While Python list indexing, e.g. seq[start:end], includes start but not end, Pandas df.loc[start_date : end_date] includes both end-points in the result if they are in the index. Neither start_date nor end_date has to be in the index, however.
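A small sketch of both points, using a made-up, sorted index that holds only every second day, so that some bounds are present in the index and others are not:

```python
import pandas as pd

# Hypothetical index: 2000-06-01, 2000-06-03, 2000-06-05, ...
idx = pd.date_range('2000-06-01', periods=10, freq='2D')
df = pd.DataFrame({'value': range(10)}, index=idx)

# Both bounds are in the index: both end-points are returned
in_index = df.loc['2000-06-03':'2000-06-07']

# Neither bound is in the index: the slice still works on a sorted index
not_in_index = df.loc['2000-06-02':'2000-06-08']

# Plain Python slicing, for comparison, excludes the end position
seq = [0, 1, 2, 3, 4]
py_slice = seq[1:3]

print(in_index)
print(not_in_index)
print(py_slice)
```

Both .loc slices return the rows for June 3, 5 and 7, while seq[1:3] stops before position 3.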
Also note that pd.read_csv has a parse_dates parameter which you could use to parse the date column as datetime64s. Thus, if you use parse_dates, you would not need to use df['date'] = pd.to_datetime(df['date']).
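As a rough illustration of parse_dates, the sketch below reads a small in-memory CSV (the contents are invented for the example) and filters it without a separate pd.to_datetime call:

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would be a file path
csv_text = """date,value
2000-06-01,1
2000-06-05,2
2000-06-12,3
"""

# parse_dates converts the named column to datetime64 while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

mask = (df['date'] > '2000-06-01') & (df['date'] <= '2000-06-10')
selected = df.loc[mask]

print(df.dtypes)
print(selected)
```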
Pandas: Select all data from Pandas DataFrame between two dates
I used the following test DataFrame:
Transaction_date Value
0 2019-12-31 10
1 2020-01-10 10
2 2020-01-15 10
3 2020-01-20 10
4 2020-01-25 10
5 2020-01-28 10
6 2020-01-29 10
7 2020-01-30 10
8 2020-01-31 10
9 2020-02-01 10
10 2020-02-01 10
Start / end dates are:
start_day = '01.01.2020'
end_day = '31.01.2020'
The code is:
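# Convert start / end dates to datetime (the strings are day-first)
start_day = pd.to_datetime(start_day, dayfirst=True)
end_day = pd.to_datetime(end_day, dayfirst=True)
I noticed that you use the datetime module, which I think is bad practice here.
Use the dedicated pandas function (to_datetime) instead.
It recognizes many commonly used date formats, so you often do not need to
specify one yourself. For day-first strings such as these, pass dayfirst=True
(or an explicit format) so that an ambiguous date like '01.02.2020' is not read month-first.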
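Format detection has one common pitfall: strings in which the day and the month are both 12 or less are ambiguous. A short sketch with an invented date string, showing how dayfirst or an explicit format changes the result:

```python
import pandas as pd

ambiguous = '01.02.2020'  # hypothetical input, meant as 1 February 2020

# Default parsing reads ambiguous strings month-first
default_parse = pd.to_datetime(ambiguous)

# dayfirst=True tells pandas to prefer day-first for ambiguous strings
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# An explicit format removes the guesswork entirely
explicit = pd.to_datetime(ambiguous, format='%d.%m.%Y')

print(default_parse, day_first, explicit)
```

Here the default parse yields 2 January 2020, while the other two yield 1 February 2020.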
To get the actual result, run:
df[df['Transaction_date'].between(start_day, end_day)]
The result is:
Transaction_date Value
1 2020-01-10 10
2 2020-01-15 10
3 2020-01-20 10
4 2020-01-25 10
5 2020-01-28 10
6 2020-01-29 10
7 2020-01-30 10
8 2020-01-31 10
So as you can see:
- The date from 2019 has been eliminated.
- The output contains only January dates, to the very end.
- Dates from the beginning of February have also been eliminated.
Repeat my example on your computer. Maybe the source of your problem
(that the result does not contain all dates from January) is somewhere
else?
Extract a subset given two dates from a python dataframe with timezone date format
pd.to_datetime with utc=True
You can pass the optional parameter utc=True to the pd.to_datetime function to convert timezone-aware inputs to UTC. You can then compare the date column with start_date and end_date to subset the df.
Here is a step-by-step example:
print(df)
ID date
0 1 2022-02-03 22:01:12+01:00
1 2 2022-02-04 21:11:21+01:00
2 3 2022-02-05 11:11:21+01:00
3 4 2022-02-07 23:01:12+01:00
4 5 2022-02-07 14:31:14+02:00
5 6 2022-02-08 18:12:01+02:00
6 7 2022-02-09 20:21:02+02:00
7 8 2022-02-11 15:41:25+02:00
8 9 2022-02-15 11:21:27+02:00
# Convert to UTC
df['date'] = pd.to_datetime(df['date'], utc=True)
print(df)
ID date
0 1 2022-02-03 21:01:12+00:00
1 2 2022-02-04 20:11:21+00:00
2 3 2022-02-05 10:11:21+00:00
3 4 2022-02-07 22:01:12+00:00
4 5 2022-02-07 12:31:14+00:00
5 6 2022-02-08 16:12:01+00:00
6 7 2022-02-09 18:21:02+00:00
7 8 2022-02-11 13:41:25+00:00
8 9 2022-02-15 09:21:27+00:00
# Filter the rows with boolean indexing
subset = df[df['date'].between('2022-02-03 21:01:12', '2022-02-07 22:01:11')]
print(subset)
ID date
0 1 2022-02-03 21:01:12+00:00
1 2 2022-02-04 20:11:21+00:00
2 3 2022-02-05 10:11:21+00:00
4 5 2022-02-07 12:31:14+00:00
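If the bounds are known in local time rather than UTC, one option is to build timezone-aware Timestamps and let pandas compare instants. This is a sketch with a reduced, made-up copy of the data; the +02:00 bounds are an assumption chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'date': ['2022-02-03 22:01:12+01:00',
             '2022-02-07 14:31:14+02:00',
             '2022-02-08 18:12:01+02:00'],
})

# Mixed offsets are normalized to a single UTC column
df['date'] = pd.to_datetime(df['date'], utc=True)

# Bounds given in local time (+02:00); tz_convert is only for readability,
# since aware timestamps compare by instant regardless of their offset
start = pd.Timestamp('2022-02-07 00:00:00+02:00').tz_convert('UTC')
end = pd.Timestamp('2022-02-08 00:00:00+02:00').tz_convert('UTC')

subset = df[df['date'].between(start, end)]
print(subset)
```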
How to select dataframe rows between two datetimes?
pd.date_range
i = pd.date_range('2018-01-01 09:00:00', periods=3, freq='1H')
# Result
DatetimeIndex(['2018-01-01 09:00:00', '2018-01-01 10:00:00',
'2018-01-01 11:00:00'],
dtype='datetime64[ns]', freq='H')
to_datetime
df['date'] = pd.to_datetime(df.date)
loc
df.loc[(df.date >= i[0]) & (df.date <= i[-1])]
date msft nok aapl ibm amzn
1 2018-01-01 09:00:00 112 1 143 130 1298
2 2018-01-01 10:00:00 109 10 185 137 1647
3 2018-01-01 11:00:00 98 11 146 105 1331
How to subset a Pandas dataframe by date when the column also contains hours, minutes, seconds, millis, and timezone
You can use Series.between:
out = df[df['timestamp'].between(start_date,end_date)]
Out[219]:
foo timestamp
2 eggs 2022-04-18 06:38:28.928778887-04:00
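When the column carries a time component, a bare date as the upper bound means midnight at the start of that day, so later rows on the same day are excluded. A common workaround is a half-open interval that ends at the start of the next day. The sketch below uses invented rows and assumes a fixed -04:00 offset.

```python
import pandas as pd

# Hypothetical rows around the day of interest
df = pd.DataFrame({
    'foo': ['before', 'inside', 'after'],
    'timestamp': pd.to_datetime([
        '2022-04-17 23:59:59-04:00',
        '2022-04-18 06:38:28-04:00',
        '2022-04-19 00:00:00-04:00',
    ]),
})

# Start of the day of interest and of the following day, in the same offset
day = pd.Timestamp('2022-04-18 00:00:00-04:00')
next_day = day + pd.Timedelta(days=1)

# Half-open interval: day <= timestamp < next_day
mask = (df['timestamp'] >= day) & (df['timestamp'] < next_day)
out = df[mask]
print(out)
```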
Select Data frame between two dates of a date column
First, convert your start and end dates to datetime. Then you can apply multiple conditions inside df.loc. Do not forget to assign the result back to df. Note that the strict comparisons below exclude both bounds:
import pandas as pd
from datetime import datetime
df['date']=pd.to_datetime(df['date'], format='%m/%d/%y')
date1 = datetime.strptime('2013-03-23', '%Y-%m-%d')
date2 = datetime.strptime('2013-03-25', '%Y-%m-%d')
df = df.loc[(df['date']>date1) & (df['date']<date2)]
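To make the effect of the comparison operators concrete, here is a small sketch with made-up rows in the same %m/%d/%y format, comparing strict and inclusive bounds:

```python
from datetime import datetime

import pandas as pd

# Hypothetical data in month/day/two-digit-year format
df = pd.DataFrame({
    'date': ['03/22/13', '03/23/13', '03/24/13', '03/25/13'],
    'value': [1, 2, 3, 4],
})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%y')

date1 = datetime.strptime('2013-03-23', '%Y-%m-%d')
date2 = datetime.strptime('2013-03-25', '%Y-%m-%d')

# Strict comparisons drop rows that fall exactly on either bound
strict = df.loc[(df['date'] > date1) & (df['date'] < date2)]

# Inclusive comparisons keep them
inclusive = df.loc[(df['date'] >= date1) & (df['date'] <= date2)]

print(strict)
print(inclusive)
```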
Pandas: Selecting DataFrame rows between two dates (Datetime Index)
Option 1:
Sample df:
df
a
2015-07-16 07:14:41 12
2015-07-16 07:14:48 34
2015-07-16 07:14:54 65
2015-07-16 07:15:01 34
2015-07-16 07:15:07 23
2015-07-16 07:15:14 1
It looks like you're trying this without .loc (won't work without it):
df.loc['2015-07-16 07:00:00':'2015-07-16 23:00:00']
a
2015-07-16 07:14:41 12
2015-07-16 07:14:48 34
2015-07-16 07:14:54 65
2015-07-16 07:15:01 34
2015-07-16 07:15:07 23
2015-07-16 07:15:14 1
Option 2:
You can use boolean indexing on the index:
df[(df.index.get_level_values(0) >= '2015-07-16 07:00:00') & (df.index.get_level_values(0) <= '2015-07-16 23:00:00')]
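The two options can be compared side by side. The sketch below uses a few invented rows and sorts the index first, on the assumption that label slicing is most predictable on a sorted DatetimeIndex; on a single-level index, df.index can be compared directly.

```python
import pandas as pd

# Hypothetical rows, deliberately out of order
idx = pd.to_datetime(['2015-07-16 07:15:01', '2015-07-16 07:14:41',
                      '2015-07-17 08:00:00', '2015-07-16 07:14:54'])
df = pd.DataFrame({'a': [34, 12, 99, 65]}, index=idx)

# Sort so that label slicing behaves predictably
df = df.sort_index()

start, end = '2015-07-16 07:00:00', '2015-07-16 23:00:00'

# Option 1: label slicing with .loc (both bounds inclusive)
by_slice = df.loc[start:end]

# Option 2: boolean mask built from the index
by_mask = df[(df.index >= start) & (df.index <= end)]

print(by_slice)
```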
R - How to subset a table between two specific dates?
You need to convert the date column in the file to the Date class. For example:
LValley <- read.table("LValley.txt", header=TRUE,dec=",", sep="\t", stringsAsFactors=FALSE)
date1 <- as.Date(LValley$date, "%d.%m.%Y %H:%M")
Test2007 <- subset(LValley, date1>=DATE1 & date1 <=DATE2)
dim(Test2007)
#[1] 6249 4