Select DataFrame rows between two dates
There are two possible solutions:
- Use a boolean mask, then use df.loc[mask]
- Set the date column as a DatetimeIndex, then use df[start_date : end_date]
Using a boolean mask:
Ensure df['date'] is a Series with dtype datetime64[ns]:
df['date'] = pd.to_datetime(df['date'])
Make a boolean mask. start_date and end_date can be datetime.datetimes, np.datetime64s, pd.Timestamps, or even datetime strings:
# later than the start date and no later than the end date
mask = (df['date'] > start_date) & (df['date'] <= end_date)
Select the sub-DataFrame:
df.loc[mask]
or re-assign to df
df = df.loc[mask]
For example,
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
print(df.loc[mask])
yields
0 1 2 date
153 0.208875 0.727656 0.037787 2000-06-02
154 0.750800 0.776498 0.237716 2000-06-03
155 0.812008 0.127338 0.397240 2000-06-04
156 0.639937 0.207359 0.533527 2000-06-05
157 0.416998 0.845658 0.872826 2000-06-06
158 0.440069 0.338690 0.847545 2000-06-07
159 0.202354 0.624833 0.740254 2000-06-08
160 0.465746 0.080888 0.155452 2000-06-09
161 0.858232 0.190321 0.432574 2000-06-10
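As a small sketch of the claim that the bounds may be strings, datetime.datetime, np.datetime64, or pd.Timestamp values, the snippet below builds the same mask four times and checks that every variant selects the same rows. The frame is a cut-down, made-up version of the example above (only the date column is kept).

```python
import datetime as dt

import numpy as np
import pandas as pd

# Cut-down version of the example frame: 200 consecutive days.
df = pd.DataFrame({"date": pd.date_range("2000-01-01", periods=200, freq="D")})

# Four ways to express the same pair of bounds.
bounds = [
    ("2000-06-01", "2000-06-10"),                                # strings
    (dt.datetime(2000, 6, 1), dt.datetime(2000, 6, 10)),         # datetime.datetime
    (np.datetime64("2000-06-01"), np.datetime64("2000-06-10")),  # np.datetime64
    (pd.Timestamp("2000-06-01"), pd.Timestamp("2000-06-10")),    # pd.Timestamp
]

results = []
for start_date, end_date in bounds:
    # Exclusive at the start, inclusive at the end, as in the mask above.
    mask = (df["date"] > start_date) & (df["date"] <= end_date)
    results.append(df.loc[mask, "date"].tolist())
    print(type(start_date).__name__, len(results[-1]))
```

Each variant should select June 2 through June 10, because the mask is exclusive at the start and inclusive at the end.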
Using a DatetimeIndex:
If you are going to do a lot of selections by date, it may be quicker to set the date column as the index first. Then you can select rows by date using df.loc[start_date:end_date].
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
df = df.set_index(['date'])
print(df.loc['2000-6-1':'2000-6-10'])
yields
0 1 2
date
2000-06-01 0.040457 0.326594 0.492136 # <- includes start_date
2000-06-02 0.279323 0.877446 0.464523
2000-06-03 0.328068 0.837669 0.608559
2000-06-04 0.107959 0.678297 0.517435
2000-06-05 0.131555 0.418380 0.025725
2000-06-06 0.999961 0.619517 0.206108
2000-06-07 0.129270 0.024533 0.154769
2000-06-08 0.441010 0.741781 0.470402
2000-06-09 0.682101 0.375660 0.009916
2000-06-10 0.754488 0.352293 0.339337
While Python list indexing, e.g. seq[start:end], includes start but not end, Pandas df.loc[start_date : end_date] includes both end-points in the result if they are in the index. Neither start_date nor end_date has to be in the index, however.
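To make the end-point behaviour concrete, here is a minimal sketch on a made-up frame whose index holds only every other day. It assumes the index is sorted, which slicing with labels that are missing from the index relies on.

```python
import pandas as pd

# Every other day, so some dates are deliberately missing from the index.
idx = pd.date_range("2000-06-01", periods=6, freq="2D")  # Jun 1, 3, 5, 7, 9, 11
df = pd.DataFrame({"value": range(6)}, index=idx)

# Plain Python slicing excludes the end position.
seq = list(range(6))
py_slice = seq[1:3]

# Both end-points are in the index: both are included.
both = df.loc["2000-06-03":"2000-06-07"]

# Neither end-point is in the index: rows that fall between them are returned.
neither = df.loc["2000-06-02":"2000-06-08"]

print(py_slice)
print(both)
print(neither)
```

Both label slices return the rows for June 3, 5 and 7, while the list slice stops before position 3.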
Also note that pd.read_csv has a parse_dates parameter which you could use to parse the date column as datetime64s. Thus, if you use parse_dates, you would not need to use df['date'] = pd.to_datetime(df['date']).
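A short sketch of the parse_dates route follows. The CSV text is made up for illustration and is read from an in-memory buffer, so no file is needed.

```python
import io

import pandas as pd

# Hypothetical CSV content, standing in for a file on disk.
csv_text = """date,value
2000-05-31,1
2000-06-01,2
2000-06-05,3
2000-06-11,4
"""

# parse_dates converts the named column while reading,
# so no separate pd.to_datetime call is needed afterwards.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df["date"].dtype)

mask = (df["date"] > "2000-06-01") & (df["date"] <= "2000-06-10")
selected = df.loc[mask]
print(selected)
```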
Pandas: Select all data from Pandas DataFrame between two dates
I used the following test DataFrame:
Transaction_date Value
0 2019-12-31 10
1 2020-01-10 10
2 2020-01-15 10
3 2020-01-20 10
4 2020-01-25 10
5 2020-01-28 10
6 2020-01-29 10
7 2020-01-30 10
8 2020-01-31 10
9 2020-02-01 10
10 2020-02-01 10
Start / end dates are:
start_day = '01.01.2020'
end_day = '31.01.2020'
The code is:
# Convert start / end dates to datetime
start_day = pd.to_datetime(start_day)
end_day = pd.to_datetime(end_day)
I noticed that you use the datetime module, which I think is bad practice here.
Use the dedicated pandas function (to_datetime) instead.
It recognizes many commonly used date formats, so there is usually no need to
specify the format yourself. If the day/month order is ambiguous, you can pass
dayfirst=True or an explicit format.
To get the actual result, run:
df[df['Transaction_date'].between(start_day, end_day)]
The result is:
Transaction_date Value
1 2020-01-10 10
2 2020-01-15 10
3 2020-01-20 10
4 2020-01-25 10
5 2020-01-28 10
6 2020-01-29 10
7 2020-01-30 10
8 2020-01-31 10
So as you can see:
- The date from 2019 has been eliminated.
- The output contains only January dates, to the very end.
- Dates from the beginning of February have also been eliminated.
Repeat my example on your computer. Maybe the source of your problem
(that the result does not contain all dates from January) is somewhere
else?
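For anyone who wants to repeat the example, here is a self-contained sketch that rebuilds the test DataFrame shown above. One assumption is added: the start / end strings are treated as day-first, so an explicit format is passed instead of relying on format inference.

```python
import pandas as pd

# Rebuild the test DataFrame from the listing above.
df = pd.DataFrame({
    "Transaction_date": pd.to_datetime([
        "2019-12-31", "2020-01-10", "2020-01-15", "2020-01-20",
        "2020-01-25", "2020-01-28", "2020-01-29", "2020-01-30",
        "2020-01-31", "2020-02-01", "2020-02-01",
    ]),
    "Value": 10,
})

# Assumption: the strings are day.month.year, so the format is spelled out.
start_day = pd.to_datetime("01.01.2020", format="%d.%m.%Y")
end_day = pd.to_datetime("31.01.2020", format="%d.%m.%Y")

# between() is inclusive on both ends by default.
result = df[df["Transaction_date"].between(start_day, end_day)]
print(result)
```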
How to select pandas dataframe rows between two dates without knowing the exact time
You can try boolean masking:
df.loc[(df['TIME'].dt.date > start_date.date()) & (df['TIME'].dt.date < end_date.date())]
Or you can use boolean masking with the between() method, which includes both end-points by default:
df[df['TIME'].dt.date.between(start_date.date(),end_date.date())]
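Below is a small sketch of day-level selection on made-up timestamps. Besides the dt.date version from the answer, it shows a variant based on dt.normalize(), which is a swapped-in alternative that keeps the values as datetime64 rather than Python date objects. The column name TIME comes from the answer; the data and bounds are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps with arbitrary times of day.
df = pd.DataFrame({
    "TIME": pd.to_datetime([
        "2021-03-01 23:59:59",
        "2021-03-02 00:00:00",
        "2021-03-03 15:30:00",
        "2021-03-04 08:15:00",
        "2021-03-05 00:00:01",
    ]),
})

# The time-of-day part of the bounds is ignored below.
start_date = pd.Timestamp("2021-03-02 10:00")
end_date = pd.Timestamp("2021-03-04 01:00")

# Variant from the answer: compare Python date objects.
by_date = df[df["TIME"].dt.date.between(start_date.date(), end_date.date())]

# Alternative: truncate everything to midnight and stay in datetime64.
days = df["TIME"].dt.normalize()
by_normalize = df[days.between(start_date.normalize(), end_date.normalize())]

print(by_normalize)
```

Both variants keep the rows for March 2, 3 and 4, regardless of the time of day in the bounds.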
Select rows between two dates - recent 3 month period -
By following your code, I found that the data type of both 'start_date' and 'end_date' is Series (NOT Timestamp like df['week']). Check by:
type(df['week'][0]), type(start_date)
Then they cannot be compared.
You may try this code:
id = start_date.index[0]
start_date = pd.to_datetime(start_date[id])
Note: "id" stores the index (based on your data, it is 199.)
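The original question's data is not shown here, so the following is only a sketch on a hypothetical weekly frame. It illustrates how a filter can hand back a one-row Series instead of a scalar, and how pulling the single value out makes the comparison work. It uses .iloc[0] rather than an index label lookup, which is a swapped-in way to get the same value.

```python
import pandas as pd

# Hypothetical data: 30 weekly dates in a 'week' column.
df = pd.DataFrame({"week": pd.date_range("2020-01-05", periods=30, freq="7D")})

# A filter like this returns a one-row Series, not a Timestamp.
latest = df.loc[df["week"] == df["week"].max(), "week"]
print(type(latest))

# Extract the scalar, then step back three months.
end_date = pd.to_datetime(latest.iloc[0])
start_date = end_date - pd.DateOffset(months=3)

# Now both sides of the comparison are compatible.
recent = df[(df["week"] >= start_date) & (df["week"] <= end_date)]
print(recent)
```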
Select Pandas dataframe rows between two dates
If your dataframes are not very big, you can simply do the join on a dummy key and then do filtering to narrow it down to what you need. See example below (note that I had to update your example a little bit to have correct date formatting)
import pandas as pd
rates = {'rate': [ 0.974, 0.966, 0.996, 0.998, 0.994, 1.006, 1.042, 1.072, 0.954],
'valid_from': ['31/12/2018','15/01/2019','01/02/2019','01/03/2019','01/04/2019','15/04/2019','01/05/2019','01/06/2019','30/06/2019'],
'valid_to': ['14/01/2019','31/01/2019','28/02/2019','31/03/2019','14/04/2019','30/04/2019','31/05/2019','29/06/2019','31/07/2019']}
df1 = pd.DataFrame(rates)
df1['valid_to'] = pd.to_datetime(df1['valid_to'],format ='%d/%m/%Y')
df1['valid_from'] = pd.to_datetime(df1['valid_from'],format='%d/%m/%Y')
Then your df1 would be:
rate valid_from valid_to
0 0.974 2018-12-31 2019-01-14
1 0.966 2019-01-15 2019-01-31
2 0.996 2019-02-01 2019-02-28
3 0.998 2019-03-01 2019-03-31
4 0.994 2019-04-01 2019-04-14
5 1.006 2019-04-15 2019-04-30
6 1.042 2019-05-01 2019-05-31
7 1.072 2019-06-01 2019-06-29
8 0.954 2019-06-30 2019-07-31
This is your second data frame df2
data = {'date': ['03/01/2019','23/01/2019','27/02/2019','14/03/2019','05/04/2019','30/04/2019','14/06/2019'],
'amount': [200,305,155,67,95,174,236,]}
df2 = pd.DataFrame(data)
df2['date'] = pd.to_datetime(df2['date'],format ='%d/%m/%Y')
Then your df2 would look like the following:
date amount
0 2019-01-03 200
1 2019-01-23 305
2 2019-02-27 155
3 2019-03-14 67
4 2019-04-05 95
5 2019-04-30 174
6 2019-06-14 236
Your solution:
df1['key'] = 1
df2['key'] = 1
df_output = pd.merge(df1, df2, on='key').drop('key',axis=1)
# keep rows whose date falls inside [valid_from, valid_to]; both end-points are inclusive
df_output = df_output[(df_output['date'] >= df_output['valid_from']) & (df_output['date'] <= df_output['valid_to'])]
This is how the result, df_output, would look:
rate valid_from valid_to date amount
0 0.974 2018-12-31 2019-01-14 2019-01-03 200
8 0.966 2019-01-15 2019-01-31 2019-01-23 305
16 0.996 2019-02-01 2019-02-28 2019-02-27 155
24 0.998 2019-03-01 2019-03-31 2019-03-14 67
32 0.994 2019-04-01 2019-04-14 2019-04-05 95
40 1.006 2019-04-15 2019-04-30 2019-04-30 174
55 1.072 2019-06-01 2019-06-29 2019-06-14 236
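If the frames are too large for a join on a dummy key, one alternative worth trying is pd.merge_asof. This is a different technique from the one above, sketched under the assumption that the validity intervals do not overlap: each date is matched to the latest valid_from that is not after it, and rows that fall past valid_to are then dropped.

```python
import pandas as pd

df1 = pd.DataFrame({
    "rate": [0.974, 0.966, 0.996, 0.998, 0.994, 1.006, 1.042, 1.072, 0.954],
    "valid_from": ["31/12/2018", "15/01/2019", "01/02/2019", "01/03/2019", "01/04/2019",
                   "15/04/2019", "01/05/2019", "01/06/2019", "30/06/2019"],
    "valid_to": ["14/01/2019", "31/01/2019", "28/02/2019", "31/03/2019", "14/04/2019",
                 "30/04/2019", "31/05/2019", "29/06/2019", "31/07/2019"],
})
for col in ("valid_from", "valid_to"):
    df1[col] = pd.to_datetime(df1[col], format="%d/%m/%Y")

df2 = pd.DataFrame({
    "date": ["03/01/2019", "23/01/2019", "27/02/2019", "14/03/2019",
             "05/04/2019", "30/04/2019", "14/06/2019"],
    "amount": [200, 305, 155, 67, 95, 174, 236],
})
df2["date"] = pd.to_datetime(df2["date"], format="%d/%m/%Y")

# merge_asof needs both frames sorted on their keys; the default
# direction ("backward") picks the last valid_from <= date.
matched = pd.merge_asof(
    df2.sort_values("date"),
    df1.sort_values("valid_from"),
    left_on="date",
    right_on="valid_from",
)

# Drop matches where the date lies beyond the end of the interval.
matched = matched[matched["date"] <= matched["valid_to"]]
print(matched)
```

On this data the matched rates agree with the dummy-key result, without building the full cross product first.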
How to subset a Pandas dataframe by date when the column also contains hours, minutes, seconds, millis, and timezone
You can use between:
out = df[df['timestamp'].between(start_date,end_date)]
Out[219]:
foo timestamp
2 eggs 2022-04-18 06:38:28.928778887-04:00
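A minimal sketch for the timezone-aware case follows, on made-up rows. The assumptions are that the column is converted to UTC on parsing and that the bounds are created as UTC as well; pandas refuses to order tz-aware values against tz-naive ones, so the bounds need a timezone too.

```python
import pandas as pd

# Hypothetical rows with UTC offsets in the strings; utc=True converts them to UTC.
df = pd.DataFrame({
    "foo": ["spam", "ham", "eggs"],
    "timestamp": pd.to_datetime(
        [
            "2022-04-16 09:00:00-04:00",
            "2022-04-17 12:00:00-04:00",
            "2022-04-18 06:38:28-04:00",
        ],
        utc=True,
    ),
})

# The bounds must be tz-aware as well.
start_date = pd.Timestamp("2022-04-18", tz="UTC")
end_date = pd.Timestamp("2022-04-19", tz="UTC")

out = df[df["timestamp"].between(start_date, end_date)]
print(out)
```

Note that the day boundaries here are UTC midnights. If local days are wanted, the bounds would have to be created in that timezone instead.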
How to select dataframe rows between two datetimes?
pd.date_range
i = pd.date_range('2018-01-01 09:00:00', periods=3, freq='1H')
# Result
DatetimeIndex(['2018-01-01 09:00:00', '2018-01-01 10:00:00',
'2018-01-01 11:00:00'],
dtype='datetime64[ns]', freq='H')
to_datetime
df['date'] = pd.to_datetime(df.date)
loc
df.loc[(df.date >= i[0]) & (df.date <= i[-1])]
date msft nok aapl ibm amzn
1 2018-01-01 09:00:00 112 1 143 130 1298
2 2018-01-01 10:00:00 109 10 185 137 1647
3 2018-01-01 11:00:00 98 11 146 105 1331
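The steps above can be put together into a self-contained sketch. The frame below is made up (a single msft column with invented values), and the hourly frequency is passed as a pd.Timedelta so the snippet does not depend on a particular frequency alias. It also shows the equivalent selection with a DatetimeIndex.

```python
import pandas as pd

# Hypothetical hourly data from 08:00 to 13:00.
dates = pd.date_range("2018-01-01 08:00:00", periods=6, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"date": dates, "msft": [101, 112, 109, 98, 105, 110]})

start = pd.Timestamp("2018-01-01 09:00:00")
end = pd.Timestamp("2018-01-01 11:00:00")

# Boolean mask, inclusive on both ends.
by_mask = df.loc[(df["date"] >= start) & (df["date"] <= end)]

# Same selection with the datetime column as the index.
by_index = df.set_index("date").loc[start:end]

print(by_mask)
print(by_index)
```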
Create a row for each year between two dates
You can use a custom function to compute the range then explode the column:
# Ensure to have datetime
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])
# Create the new column
date_range = lambda x: range(x['date1'].year, x['date2'].year+1)
df = df.assign(year=df.apply(date_range, axis=1)).explode('year', ignore_index=True)
Output:
>>> df
date1 date2 year
0 2018-01-01 2020-01-01 2018
1 2018-01-01 2020-01-01 2019
2 2018-01-01 2020-01-01 2020
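For a version that can be run as-is, here is a sketch with a made-up two-row input. It builds the year lists with a list comprehension instead of apply, which is a swapped-in variant of the same idea, and then explodes them.

```python
import pandas as pd

# Hypothetical input: two date ranges.
df = pd.DataFrame({
    "date1": ["2018-01-01", "2021-06-15"],
    "date2": ["2020-01-01", "2022-02-01"],
})
df["date1"] = pd.to_datetime(df["date1"])
df["date2"] = pd.to_datetime(df["date2"])

# One list of years per row, covering date1.year through date2.year inclusive.
df["year"] = [
    list(range(start.year, end.year + 1))
    for start, end in zip(df["date1"], df["date2"])
]

# One output row per year; ignore_index renumbers the rows from 0.
df = df.explode("year", ignore_index=True)
print(df)
```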
Select rows between two DatetimeIndex dates
Using the query method:
df = pd.read_csv("my_file.csv", index_col=1, parse_dates=True)
In [121]: df.query("'2017-05-30' <= index <= '2017-06-01'")
Out[121]:
vm LoadInt1
time
2017-05-31 10:00:00 abc-webapp-02 3.133333
2017-05-31 10:05:00 abc-webapp-02 0.000000
2017-05-31 10:10:00 abc-webapp-02 0.000000
2017-05-31 10:15:00 abc-webapp-02 0.000000
2017-05-31 10:20:00 abc-webapp-02 0.000000
2017-05-31 10:25:00 abc-webapp-02 0.000000
2017-05-31 10:30:00 abc-webapp-02 0.000000
2017-05-31 10:35:00 abc-webapp-02 0.000000
2017-05-31 10:40:00 abc-webapp-02 0.000000
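Here is a small sketch of the query approach on made-up data, with the bounds passed as local variables via @ instead of being embedded as string literals. One thing to watch: the bounds are compared as exact timestamps, so an end bound of 2017-06-01 means midnight, and later readings on June 1 are left out.

```python
import pandas as pd

# Hypothetical readings on a DatetimeIndex.
idx = pd.to_datetime([
    "2017-05-29 23:55:00",
    "2017-05-31 10:00:00",
    "2017-05-31 10:05:00",
    "2017-06-01 00:00:00",
    "2017-06-01 00:05:00",
])
df = pd.DataFrame({"LoadInt1": [1.0, 3.133333, 0.0, 2.0, 4.0]}, index=idx)

start = pd.Timestamp("2017-05-30")
end = pd.Timestamp("2017-06-01")

# 'index' refers to the frame's index; @name refers to a local variable.
by_query = df.query("@start <= index <= @end")

# Equivalent label slice with the same Timestamp bounds.
by_loc = df.loc[start:end]

print(by_query)
```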