Subset a Dataframe Between 2 Dates

Select DataFrame rows between two dates

There are two possible solutions:

Use a boolean mask, then use df.loc[mask]
Set the date column as a DatetimeIndex, then use df.loc[start_date:end_date]

Using a boolean mask:

Ensure df['date'] is a Series with dtype datetime64[ns]:

df['date'] = pd.to_datetime(df['date'])

Make a boolean mask. start_date and end_date can be datetime.datetimes,
np.datetime64s, pd.Timestamps, or even datetime strings:

# strictly after start_date, up to and including end_date
mask = (df['date'] > start_date) & (df['date'] <= end_date)

Select the sub-DataFrame:

df.loc[mask]

or re-assign to df

df = df.loc[mask]

For example,

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
print(df.loc[mask])

yields

            0         1         2       date
153  0.208875  0.727656  0.037787 2000-06-02
154  0.750800  0.776498  0.237716 2000-06-03
155  0.812008  0.127338  0.397240 2000-06-04
156  0.639937  0.207359  0.533527 2000-06-05
157  0.416998  0.845658  0.872826 2000-06-06
158  0.440069  0.338690  0.847545 2000-06-07
159  0.202354  0.624833  0.740254 2000-06-08
160  0.465746  0.080888  0.155452 2000-06-09
161  0.858232  0.190321  0.432574 2000-06-10

Using a DatetimeIndex:

If you are going to do a lot of selections by date, it may be quicker to set the
date column as the index first. Then you can select rows by date using
df.loc[start_date:end_date].

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
df = df.set_index(['date'])
print(df.loc['2000-6-1':'2000-6-10'])

yields

                   0         1         2
date                                    
2000-06-01  0.040457  0.326594  0.492136    # <- includes start_date
2000-06-02  0.279323  0.877446  0.464523
2000-06-03  0.328068  0.837669  0.608559
2000-06-04  0.107959  0.678297  0.517435
2000-06-05  0.131555  0.418380  0.025725
2000-06-06  0.999961  0.619517  0.206108
2000-06-07  0.129270  0.024533  0.154769
2000-06-08  0.441010  0.741781  0.470402
2000-06-09  0.682101  0.375660  0.009916
2000-06-10  0.754488  0.352293  0.339337

Python list slicing, e.g. seq[start:end], includes start but not end. In contrast, Pandas df.loc[start_date:end_date] includes both end-points in the result if they are in the index. Neither start_date nor end_date has to be in the index, however.
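The end-point behaviour described above can be checked with a small sketch. The weekly index and the value column are made up for illustration; only the slicing behaviour is the point.

```python
import numpy as np
import pandas as pd

# Illustrative data: one row every 7 days starting 2000-01-01
# (2000-01-01, 2000-01-08, 2000-01-15, 2000-01-22, 2000-01-29, ...)
idx = pd.date_range('2000-01-01', periods=10, freq='7D')
df = pd.DataFrame({'value': np.arange(10)}, index=idx)

# Neither bound is in the index: rows that fall inside the range are returned
inside = df.loc['2000-01-05':'2000-01-25']

# Both bounds are in the index: both end-points are included
inclusive = df.loc['2000-01-08':'2000-01-29']

print(inside)
print(inclusive)
```

The first slice returns the rows for 2000-01-08 through 2000-01-22, and the second also includes 2000-01-29.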

Also note that pd.read_csv has a parse_dates parameter which you could use to parse the date column as datetime64s. Thus, if you use parse_dates, you would not need to use df['date'] = pd.to_datetime(df['date']).
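As a minimal sketch of the parse_dates route, the following reads a small in-memory CSV (the contents are invented for illustration, and io.StringIO stands in for a real file path) and filters it without calling pd.to_datetime:

```python
import io
import pandas as pd

# Made-up CSV contents; in practice this would be a file on disk
csv_text = """date,value
2000-05-31,1
2000-06-01,2
2000-06-05,3
2000-06-11,4
"""

# parse_dates converts the named column to datetime64 while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

mask = (df['date'] >= '2000-06-01') & (df['date'] <= '2000-06-10')
subset = df.loc[mask]
print(subset)
```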

Pandas: Select all data from Pandas DataFrame between two dates

I used the following test DataFrame:

   Transaction_date  Value
0        2019-12-31     10
1        2020-01-10     10
2        2020-01-15     10
3        2020-01-20     10
4        2020-01-25     10
5        2020-01-28     10
6        2020-01-29     10
7        2020-01-30     10
8        2020-01-31     10
9        2020-02-01     10
10       2020-02-01     10

Start / end dates are:

start_day = '01.01.2020'
end_day = '31.01.2020'

The code is:

# Convert start / end dates to datetime (the strings are day-first)
start_day = pd.to_datetime(start_day, dayfirst=True)
end_day = pd.to_datetime(end_day, dayfirst=True)

The question used the datetime module for this conversion. The dedicated
pandas function, pd.to_datetime, is more convenient here. It recognizes many
commonly used date formats, but day-first strings such as '01.01.2020' are
ambiguous, so pass dayfirst=True (or an explicit format) instead of relying on inference.

To get the actual result, run:

df[df['Transaction_date'].between(start_day, end_day)]

The result is:

  Transaction_date  Value
1       2020-01-10     10
2       2020-01-15     10
3       2020-01-20     10
4       2020-01-25     10
5       2020-01-28     10
6       2020-01-29     10
7       2020-01-30     10
8       2020-01-31     10

So as you can see:

The date from 2019 has been eliminated.
The output contains only January dates, to the very end.
Dates from the beginning of February have also been eliminated.

Try running this example on your own machine. If your result still does not
contain all of the January dates, the source of the problem is probably
somewhere else.
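Series.between includes both bounds by default, which is why 2020-01-31 appears in the result above. The sketch below, using a shortened made-up version of the test data, shows how the inclusive parameter changes that. It assumes pandas 1.3 or later, where inclusive accepts the strings 'both', 'neither', 'left' and 'right'.

```python
import pandas as pd

# Shortened, illustrative version of the test DataFrame
df = pd.DataFrame({
    'Transaction_date': pd.to_datetime(
        ['2019-12-31', '2020-01-01', '2020-01-15', '2020-01-31', '2020-02-01']
    ),
    'Value': [10] * 5,
})
start_day = pd.Timestamp('2020-01-01')
end_day = pd.Timestamp('2020-01-31')

dates = df['Transaction_date']
both = df[dates.between(start_day, end_day)]                           # default: start <= x <= end
left_only = df[dates.between(start_day, end_day, inclusive='left')]    # start <= x < end
neither = df[dates.between(start_day, end_day, inclusive='neither')]   # start < x < end

print(both, left_only, neither, sep='\n\n')
```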

Extract a subset given two dates from a python dataframe with timezone date format

`pd.to_datetime` with `utc=True`

You can pass the optional parameter utc=True to pd.to_datetime to convert the timezone-aware inputs to UTC. You can then compare the date column with start_date and end_date to subset the df.

Here is a step-by-step example:

print(df)

   ID                       date
0   1  2022-02-03 22:01:12+01:00
1   2  2022-02-04 21:11:21+01:00
2   3  2022-02-05 11:11:21+01:00
3   4  2022-02-07 23:01:12+01:00
4   5  2022-02-07 14:31:14+02:00
5   6  2022-02-08 18:12:01+02:00
6   7  2022-02-09 20:21:02+02:00
7   8  2022-02-11 15:41:25+02:00
8   9  2022-02-15 11:21:27+02:00

# Convert to UTC
df['date'] = pd.to_datetime(df['date'], utc=True)
print(df)

   ID                      date
0   1 2022-02-03 21:01:12+00:00
1   2 2022-02-04 20:11:21+00:00
2   3 2022-02-05 10:11:21+00:00
3   4 2022-02-07 22:01:12+00:00
4   5 2022-02-07 12:31:14+00:00
5   6 2022-02-08 16:12:01+00:00
6   7 2022-02-09 18:21:02+00:00
7   8 2022-02-11 13:41:25+00:00
8   9 2022-02-15 09:21:27+00:00

# Filter the rows with boolean indexing; the timezone-naive strings are
# interpreted in the column's timezone (UTC here)
subset = df[df['date'].between('2022-02-03 21:01:12', '2022-02-07 22:01:11')]
print(subset)

   ID                      date
0   1 2022-02-03 21:01:12+00:00
1   2 2022-02-04 20:11:21+00:00
2   3 2022-02-05 10:11:21+00:00
4   5 2022-02-07 12:31:14+00:00
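If you prefer not to rely on how strings are interpreted, the bounds can be written as explicit timezone-aware pd.Timestamp objects. This is a sketch on three of the rows above, not a different method:

```python
import pandas as pd

# Three of the rows from the example, with mixed UTC offsets
df = pd.DataFrame({
    'ID': [1, 4, 5],
    'date': ['2022-02-03 22:01:12+01:00',
             '2022-02-07 23:01:12+01:00',
             '2022-02-07 14:31:14+02:00'],
})
df['date'] = pd.to_datetime(df['date'], utc=True)

# Bounds carry their timezone explicitly, so nothing is left to inference
start = pd.Timestamp('2022-02-03 21:01:12', tz='UTC')
end = pd.Timestamp('2022-02-07 22:01:11', tz='UTC')

subset = df[df['date'].between(start, end)]
print(subset)
```

Row 4 falls one second after the end bound once converted to UTC, so it is excluded, as in the output above.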

How to select dataframe rows between two datetimes?

Build the time range with pd.date_range:

i = pd.date_range('2018-01-01 09:00:00', periods=3, freq='1H')

# Result
DatetimeIndex(['2018-01-01 09:00:00', '2018-01-01 10:00:00',
           '2018-01-01 11:00:00'],
          dtype='datetime64[ns]', freq='H')

Convert the date column with to_datetime:

df['date'] = pd.to_datetime(df.date)

Select the rows with loc, using the first and last entries of the range as inclusive bounds:

df.loc[(df.date >= i[0]) & (df.date <= i[-1])]

                 date  msft  nok  aapl  ibm  amzn
1 2018-01-01 09:00:00   112    1   143  130  1298
2 2018-01-01 10:00:00   109   10   185  137  1647
3 2018-01-01 11:00:00    98   11   146  105  1331
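The three steps above can be put together in a self-contained sketch. The DataFrame here is hypothetical (the price column and its values are made up), and the bounds are given as pd.Timestamp objects rather than taken from a date_range:

```python
import pandas as pd

# Hypothetical hourly data; the 'price' column and its values are invented
df = pd.DataFrame({
    'date': pd.date_range('2018-01-01 08:00:00', periods=5, freq='60min'),
    'price': [100, 112, 109, 98, 101],
})

start = pd.Timestamp('2018-01-01 09:00:00')
end = pd.Timestamp('2018-01-01 11:00:00')

# Both bounds are inclusive because of >= and <=
subset = df.loc[(df['date'] >= start) & (df['date'] <= end)]
print(subset)
```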

How to subset a Pandas dataframe by date when the column also contains hours, minutes, seconds, millis, and timezone

You can use Series.between on the timestamp column:

out = df[df['timestamp'].between(start_date,end_date)]

The result is:

    foo                           timestamp
2  eggs 2022-04-18 06:38:28.928778887-04:00

Select Data frame between two dates of a date column

First convert the start and end dates to datetime objects. Then combine the conditions inside df.loc. Remember to assign the result back to df. Note that the strict comparisons below exclude both end-points; use >= and <= to include them.

import pandas as pd
from datetime import datetime

# Parse the column; this format expects strings such as '03/24/13'
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%y')

date1 = datetime.strptime('2013-03-23', '%Y-%m-%d')
date2 = datetime.strptime('2013-03-25', '%Y-%m-%d')

# Keep rows strictly between date1 and date2
df = df.loc[(df['date'] > date1) & (df['date'] < date2)]

Pandas: Selecting DataFrame rows between two dates (Datetime Index)

Option 1:

Sample df:

df
                      a
2015-07-16 07:14:41  12
2015-07-16 07:14:48  34
2015-07-16 07:14:54  65
2015-07-16 07:15:01  34
2015-07-16 07:15:07  23
2015-07-16 07:15:14   1

It looks like you tried this without .loc. Use .loc to slice the rows by label explicitly:

df.loc['2015-07-16 07:00:00':'2015-07-16 23:00:00']
                      a
2015-07-16 07:14:41  12
2015-07-16 07:14:48  34
2015-07-16 07:14:54  65
2015-07-16 07:15:01  34
2015-07-16 07:15:07  23
2015-07-16 07:15:14   1

Option 2:

You can use boolean indexing on the index:

df[(df.index.get_level_values(0) >= '2015-07-16 07:00:00') & (df.index.get_level_values(0) <= '2015-07-16 23:00:00')]
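Both options can be compared on a small frame whose rows are deliberately out of order. This is a sketch using four of the sample rows; sorting before the label slice is a cautious choice, since label slicing is designed for a sorted index, while the boolean mask does not depend on row order.

```python
import pandas as pd

# Four of the sample rows, shuffled on purpose
idx = pd.to_datetime([
    '2015-07-16 07:15:01',
    '2015-07-16 07:14:41',
    '2015-07-16 07:15:14',
    '2015-07-16 07:14:54',
])
df = pd.DataFrame({'a': [34, 12, 1, 65]}, index=idx)

start, end = '2015-07-16 07:14:45', '2015-07-16 07:15:05'

# Option 2: boolean mask on the index; rows keep their original order
masked = df[(df.index >= start) & (df.index <= end)]

# Option 1: label slicing with .loc, after sorting the index
sliced = df.sort_index().loc[start:end]

print(masked)
print(sliced)
```

Both selections contain the same two rows; only their order differs.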

R - How to subset a table between two specific dates?

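You need to convert the date column in the file to class Date before comparing it with the range bounds (DATE1 and DATE2 below are assumed to be Date objects already). For example: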

 LValley <- read.table("LValley.txt", header=TRUE,dec=",", sep="\t", stringsAsFactors=FALSE)

  date1 <- as.Date(LValley$date, "%d.%m.%Y %H:%M")
  Test2007 <- subset(LValley, date1>=DATE1 & date1 <=DATE2)
  dim(Test2007)
   #[1] 6249    4
