Filtering Date Column in Python

How to filter python pandas dataframe column by date

Let's start with how you read your DataFrame:

df = pd.DataFrame(pd.read_csv("Dates.csv"))

Note that:

  • pd.read_csv already returns a DataFrame,
  • so there is no need to create another DataFrame from the first one.

A simpler approach is: df = pd.read_csv("Dates.csv").

But that is not all. If a column contains dates, convert it to datetime
type as early as possible, ideally when you read the DataFrame. Assuming that
your file contains only the Met By and Date columns (no index column),
the proper way to read it is:

df = pd.read_csv("Dates.csv", parse_dates=[1])
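As a quick check of what parse_dates does, here is a minimal sketch. The file content is made up for illustration and is read from an in-memory string instead of Dates.csv; only the Met By and Date column names come from the question.

```python
import io

import pandas as pd

# Hypothetical content standing in for Dates.csv (assumed layout:
# "Met By" first, "Date" second, no index column).
csv_text = """Met By,Date
Alice,2021-08-06
Bob,2021-08-31
"""

# parse_dates=[1] asks read_csv to parse the column at position 1 as dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[1])

# The Date column should now have a datetime dtype instead of plain strings.
print(df.dtypes)
```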

And now how to filter your DataFrame:

The first hint is not to use the datetime module, as pandas has its own
pd.Timestamp.today and pd.Timedelta.
As the Date column is now of the proper (datetime) type, you don't need any conversions.
Just use:

df[df.Date > pd.Timestamp.today() - pd.Timedelta('30D')]

If you also have future dates and want to filter them out, combine both conditions with & (not and), each in its own parentheses:

df[(df.Date > pd.Timestamp.today() - pd.Timedelta('30D'))
& (df.Date < pd.Timestamp.today())]
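To see the 30-day window filter on concrete data, here is a small sketch. The three rows are hypothetical and built relative to the current time, so the result does not depend on the day you run it.

```python
import pandas as pd

now = pd.Timestamp.today()

# Hypothetical rows: one old, one recent, one in the future.
df = pd.DataFrame({
    "Met By": ["Alice", "Bob", "Carol"],
    "Date": [
        now - pd.Timedelta("60D"),
        now - pd.Timedelta("5D"),
        now + pd.Timedelta("5D"),
    ],
})

cutoff = now - pd.Timedelta("30D")

# Element-wise conditions are combined with &, each wrapped in parentheses.
recent = df[(df.Date > cutoff) & (df.Date < now)]
print(recent)
```

Only the row dated five days ago falls inside the window.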

How to filter by date in column name in pandas

The problem is this statement:

my_df.columns = my_df.columns[:1].tolist() + pd.to_datetime(my_df.columns[1:]).tolist()

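This statement mixes a string label with timestamps, so the column index ends up with object dtype instead of a datetime dtype. Instead, set the 'Cat' column as the index first and then apply pd.to_datetime to the remaining column labels, so that the date columns keep a datetime dtype.

You can then compare my_df.columns directly against the cutoff date: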

from datetime import datetime
import pandas as pd

my_dict = {'Cat': {0: 'A', 1: 'B', 2: 'C'}, 'Fri,01/01/21': {0: 181.0, 1: 359.0, 2: 162.0}, 'Sat,01/02/21': {0: 519.0, 1: 379.0, 2: 419.0}}
my_df = pd.DataFrame(my_dict)
# move the non-date column into the index so only date labels remain as columns
my_df = my_df.set_index('Cat')
my_df.columns = pd.to_datetime(my_df.columns)
test = my_df.loc[:, my_df.columns <= datetime(2021, 1, 1)]

Output:

     2021-01-01
Cat
A         181.0
B         359.0
C         162.0
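The difference between the two column indexes can be checked with a short sketch. The labels here are simplified, hypothetical ISO-style strings rather than the 'Fri,01/01/21' labels from the question, to keep the parsing unambiguous.

```python
import pandas as pd

# Simplified, hypothetical column labels: one name column plus two dates.
cols = pd.Index(["Cat", "2021-01-01", "2021-01-02"])

# Mixing a string label with timestamps yields a generic object-dtype Index.
mixed = pd.Index(cols[:1].tolist() + pd.to_datetime(cols[1:]).tolist())
print(mixed.dtype)

# Converting only the date labels yields a DatetimeIndex,
# which supports element-wise comparison against a date.
dates_only = pd.to_datetime(cols[1:])
mask = dates_only <= pd.Timestamp("2021-01-01")
print(type(dates_only).__name__, mask)
```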

Pandas, filter column for > date

As your column AssessedDate and the constant assessdateprev are of string type rather than datetime type, your existing code actually filters by string comparison, which gives the wrong result.

The string 8/6/2021 8:47:07 AM compares as greater than the string 8/31/2021 00:00 because strings are compared character by character, and at the first differing position '6' on the left is larger than '3' on the right.
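The effect is easy to reproduce with the two values from the question. This is only an illustrative sketch; the variable names are made up.

```python
import pandas as pd

left = "8/6/2021 8:47:07 AM"
right = "8/31/2021 00:00"

# Plain string comparison goes character by character: "8/6" vs "8/3".
as_strings = left > right
print(as_strings)  # True, although Aug 6 is earlier than Aug 31

# After parsing, the comparison is chronological.
left_ts = pd.to_datetime(left, format="%m/%d/%Y %I:%M:%S %p")
right_ts = pd.to_datetime(right, format="%m/%d/%Y %H:%M")
as_datetimes = left_ts > right_ts
print(as_datetimes)  # False
```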

To solve the problem, you have to convert both the column and the date string constant to datetime before comparing them.

You can use pd.to_datetime(), supplying the correct format string in the format= parameter:

  1. Use pd.to_datetime(netnewprocess['AssessedDate'], format='%m/%d/%Y %I:%M:%S %p') in place of netnewprocess['AssessedDate'], and
  2. Use pd.to_datetime('8/31/2021 00:00', format='%m/%d/%Y %H:%M') in place of assessdateprev

to change your code to:

netnewprocess = netnewprocess[(pd.to_datetime(netnewprocess['AssessedDate'], format='%m/%d/%Y %I:%M:%S %p') > pd.to_datetime('8/31/2021 00:00', format='%m/%d/%Y %H:%M'))]

You may find that the code above also works without the format strings. However, supplying them has two advantages: (1) it avoids ambiguity about whether 8/6/2021 is Aug 6 or Jun 8; (2) it can speed up the conversion, because pandas does not have to spend time inferring the date format.
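The ambiguity in point (1) can be shown with a short sketch: the same string parses to two different dates depending on the format string you supply.

```python
import pandas as pd

raw = "8/6/2021"

# Month first: August 6.
as_month_first = pd.to_datetime(raw, format="%m/%d/%Y")
# Day first: June 8.
as_day_first = pd.to_datetime(raw, format="%d/%m/%Y")

print(as_month_first.date())  # 2021-08-06
print(as_day_first.date())    # 2021-06-08
```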

Result:

(I replaced your sample data so that all dates are in 2021, instead of the first half being in 2020.)

print(netnewprocess)

             AssessedDate
9    9/15/2021 1:40:27 PM
136  9/14/2021 4:07:19 PM
146  9/21/2021 4:28:59 PM
185  9/18/2021 2:20:15 PM
200   9/8/2021 9:59:22 AM

Or, better still, if you don't mind converting the column and the date string constant to datetime permanently, you can use:

# convert to datetime first
netnewprocess['AssessedDate'] = pd.to_datetime(netnewprocess['AssessedDate'], format='%m/%d/%Y %I:%M:%S %p')
assessdateprev = pd.to_datetime('8/31/2021 00:00', format='%m/%d/%Y %H:%M')

# Then, you can use your code
netnewprocess = netnewprocess[(netnewprocess['AssessedDate'] > assessdateprev)]

Pandas filter df by date range and condition

IIUC, you want something like this:

#convert the date columns to datetime
df["HireStart"] = pd.to_datetime(df["HireStart"])
df["DCompleteDate"] = pd.to_datetime(df["DCompleteDate"])
df["OffHire"] = pd.to_datetime(df["OffHire"])

#convert inputs to datetime
start_date = pd.to_datetime(start_date, format="%d/%m/%Y")
end_date = pd.to_datetime(end_date, format="%d/%m/%Y")

#select the required rows
output = df[df["HireStart"].le(end_date) & df["DCompleteDate"].fillna(start_date).ge(start_date)]
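Here is a small sketch of how that selection behaves. The rows and the start_date/end_date inputs are hypothetical; only the column names come from the question. The filter keeps rows whose hire started on or before end_date and whose completion date is either missing or on or after start_date.

```python
import pandas as pd

# Hypothetical sample rows; the second hire has no completion date yet.
df = pd.DataFrame({
    "HireStart": ["2021-01-05", "2021-02-10", "2021-03-20"],
    "DCompleteDate": ["2021-01-20", None, "2021-04-01"],
})
df["HireStart"] = pd.to_datetime(df["HireStart"])
df["DCompleteDate"] = pd.to_datetime(df["DCompleteDate"])

# Hypothetical inputs in day/month/year form.
start_date = pd.to_datetime("01/02/2021", format="%d/%m/%Y")  # 1 Feb 2021
end_date = pd.to_datetime("28/02/2021", format="%d/%m/%Y")    # 28 Feb 2021

started_in_time = df["HireStart"].le(end_date)
# Missing completion dates are filled with start_date so they pass the check.
not_done_before_start = df["DCompleteDate"].fillna(start_date).ge(start_date)

output = df[started_in_time & not_done_before_start]
print(output)
```

With this data only the second row is kept: the first hire was completed before the range started, and the third began after the range ended.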

How do I filter by a certain date and hour using Pandas dataframe in python

You need the .dt accessor in the second and third conditions as well, and each condition must be wrapped in ():

newData = data[(data.Datetime.dt.day == data.Datetime.dt.day.max()) & 
(data.Datetime.dt.hour == 9) &
(data.Datetime.dt.minute == 30)]

To extract the day only once, store it in a variable first:

s = data.Datetime.dt.day
newData = data[(s == s.max()) &
(data.Datetime.dt.hour == 9) &
(data.Datetime.dt.minute == 30)]
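A self-contained sketch of the same filter follows. The sample rows and the Price column are hypothetical; only the Datetime column name comes from the question.

```python
import pandas as pd

# Hypothetical intraday rows spread over two days.
data = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-03-01 09:30:00",
        "2021-03-02 09:30:00",
        "2021-03-02 10:00:00",
    ]),
    "Price": [10.0, 11.0, 12.0],
})

dt = data.Datetime.dt

# Keep rows on the latest day of the month present, at exactly 09:30.
newData = data[(dt.day == dt.day.max()) & (dt.hour == 9) & (dt.minute == 30)]
print(newData)
```

Note that .dt.day is the day of the month, so comparing against its maximum only picks the latest date when all rows fall within the same month.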

