How to filter python pandas dataframe column by date
Let's start with how you read your DataFrame:
df = pd.DataFrame(pd.read_csv("Dates.csv"))
Note that pd.read_csv already returns a DataFrame, so there is no need to create another DataFrame from the first one.
A simpler approach is:
df = pd.read_csv("Dates.csv")
But that is not all. If a column contains dates, convert it
to datetime type as early as possible, ideally when you read the DataFrame. Assuming that
your file contains only the Met By and Date columns (no index column),
the proper way to read it is:
df = pd.read_csv("Dates.csv", parse_dates=[1])
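As a minimal sketch of what parse_dates changes, the snippet below reads a small in-memory CSV instead of Dates.csv. The sample rows and the names in them are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for Dates.csv: two columns, "Met By" and "Date".
csv_text = "Met By,Date\nAlice,2021-08-06\nBob,2021-08-31\n"

# Without parse_dates the Date column is read as plain strings.
raw = pd.read_csv(io.StringIO(csv_text))

# parse_dates=[1] converts the column at position 1 (Date) while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[1])

print(raw["Date"].dtype)
print(df["Date"].dtype)
```

Passing the column name, parse_dates=["Date"], should be equivalent here and is easier to read when the column order may change.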
And now how to filter your DataFrame:
The first hint is not to use the datetime module, as pandas has its own
pd.Timestamp.today and pd.Timedelta.
As the Date column is now of proper (datetime) type, you don't need any conversions.
Just use:
df[df.Date > pd.Timestamp.today() - pd.Timedelta('30D')]
If you also have future dates and want to filter them out, combine both conditions with & (not and), each wrapped in parentheses:
df[(df.Date > pd.Timestamp.today() - pd.Timedelta('30D'))
   & (df.Date < pd.Timestamp.today())]
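Here is a small sketch of both filters. It uses a fixed reference timestamp in place of pd.Timestamp.today() so the result is reproducible; the sample dates and the name now are assumptions for illustration. Conditions on a Series are combined with & and each one is wrapped in parentheses.

```python
import pandas as pd

# Hypothetical data: one old date, two recent dates, one future date.
df = pd.DataFrame({
    "Met By": ["A", "B", "C", "D"],
    "Date": pd.to_datetime(["2021-07-01", "2021-08-20", "2021-08-30", "2021-09-15"]),
})

# Fixed stand-in for pd.Timestamp.today(), so the example is deterministic.
now = pd.Timestamp("2021-08-31")
cutoff = now - pd.Timedelta("30D")

# Rows newer than the cutoff (future dates included).
recent = df[df.Date > cutoff]

# Rows newer than the cutoff but not in the future.
recent_past = df[(df.Date > cutoff) & (df.Date < now)]

print(recent)
print(recent_past)
```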
How to filter by date in column name in pandas
The problem is this statement:
my_df.columns = my_df.columns[:1].tolist() + pd.to_datetime(my_df.columns[1:]).tolist()
It mixes the string label 'Cat' with the converted dates, so the column index does not get a datetime dtype. Instead, set the 'Cat' column as the index first and then apply pd.to_datetime to the remaining columns, which keeps the date columns as datetime dtype:
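from datetime import datetime
import pandas as pd

my_dict = {'Cat': {0: 'A', 1: 'B', 2: 'C'}, 'Fri,01/01/21': {0: 181.0, 1: 359.0, 2: 162.0}, 'Sat,01/02/21': {0: 519.0, 1: 379.0, 2: 419.0}}
my_df = pd.DataFrame(my_dict)
# move the non-date column out of the header so only date labels remain
my_df = my_df.set_index('Cat')
my_df.columns = pd.to_datetime(my_df.columns)
# keep only the columns dated on or before 1 January 2021
test = my_df.loc[:, my_df.columns <= datetime(2021, 1, 1)]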
Output:
2021-01-01
Cat
A 181.0
B 359.0
C 162.0
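The same kind of boolean mask on the columns can also select everything from a given date onwards. A sketch, assuming column labels that are already ISO-formatted strings (the labels and values are made up):

```python
import pandas as pd

# Hypothetical frame whose column labels are dates stored as strings.
my_df = pd.DataFrame(
    {"2021-01-01": [181.0, 359.0], "2021-01-02": [519.0, 379.0], "2021-01-03": [1.0, 2.0]},
    index=pd.Index(["A", "B"], name="Cat"),
)

# Convert the labels so the column index is a DatetimeIndex.
my_df.columns = pd.to_datetime(my_df.columns)

# Keep only the columns dated 2 January 2021 or later.
mask = my_df.columns >= pd.Timestamp("2021-01-02")
subset = my_df.loc[:, mask]

print(subset)
```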
Pandas, filter column for > date
As your column AssessedDate and the constant assessdateprev are of string type rather than datetime type, your existing code actually filters by string comparison and gives the wrong result.
When the string 8/6/2021 8:47:07 AM is compared with the string 8/31/2021 00:00, the result is 8/6/2021 8:47:07 AM > 8/31/2021 00:00, because strings are compared character by character and '6' on the left is larger than '3' on the right.
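The difference between the two kinds of comparison can be checked directly with the two values quoted above. This is only an illustrative sketch; the variable names are made up.

```python
import pandas as pd

left = "8/6/2021 8:47:07 AM"
right = "8/31/2021 00:00"

# String comparison goes character by character: "6" sorts after "3".
print(left > right)        # True

# Converted to datetime, 6 August is correctly earlier than 31 August.
left_dt = pd.to_datetime(left, format="%m/%d/%Y %I:%M:%S %p")
right_dt = pd.to_datetime(right, format="%m/%d/%Y %H:%M")
print(left_dt > right_dt)  # False
```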
To solve the problem, convert both the column and the date string constant to datetime before the comparison.
You can use pd.to_datetime() and supply the correct format string in the format= parameter:
- Use pd.to_datetime(netnewprocess['AssessedDate'], format='%m/%d/%Y %I:%M:%S %p') in place of netnewprocess['AssessedDate'], and
- Use pd.to_datetime('8/31/2021 00:00', format='%m/%d/%Y %H:%M') in place of assessdateprev
to change your code to:
netnewprocess = netnewprocess[(pd.to_datetime(netnewprocess['AssessedDate'], format='%m/%d/%Y %I:%M:%S %p') > pd.to_datetime('8/31/2021 00:00', format='%m/%d/%Y %H:%M'))]
You may find that the code above also works without the format strings. However, supplying them has two advantages: (1) it avoids ambiguity over whether 8/6/2021 is Aug 6 or Jun 8; (2) it may speed up the conversion, since pandas does not have to infer the date format.
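To illustrate the ambiguity, the same string parses to two different dates depending on the format string. A short sketch with hypothetical variable names:

```python
import pandas as pd

text = "8/6/2021"

# The same string yields different dates depending on the format string.
month_first = pd.to_datetime(text, format="%m/%d/%Y")
day_first = pd.to_datetime(text, format="%d/%m/%Y")

print(month_first.date())  # 2021-08-06
print(day_first.date())    # 2021-06-08
```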
Result:
(replaced your sample data with all dates in year 2021, instead of the first half in 2020)
print(netnewprocess)
AssessedDate
9 9/15/2021 1:40:27 PM
136 9/14/2021 4:07:19 PM
146 9/21/2021 4:28:59 PM
185 9/18/2021 2:20:15 PM
200 9/8/2021 9:59:22 AM
Or, better still, if you are fine with permanently converting the column and the date string constant to datetime, you can use:
# convert to datetime first
netnewprocess['AssessedDate'] = pd.to_datetime(netnewprocess['AssessedDate'], format='%m/%d/%Y %I:%M:%S %p')
assessdateprev = pd.to_datetime('8/31/2021 00:00', format='%m/%d/%Y %H:%M')
# Then, you can use your code
netnewprocess = netnewprocess[(netnewprocess['AssessedDate'] > assessdateprev)]
Pandas filter df by date range and condition
IIUC, you want something like this:
#convert the date columns to datetime
df["HireStart"] = pd.to_datetime(df["HireStart"])
df["DCompleteDate"] = pd.to_datetime(df["DCompleteDate"])
df["OffHire"] = pd.to_datetime(df["OffHire"])
#convert inputs to datetime
start_date = pd.to_datetime(start_date, format="%d/%m/%Y")
end_date = pd.to_datetime(end_date, format="%d/%m/%Y")
#select the required rows
output = df[df["HireStart"].le(end_date)&df["DCompleteDate"].fillna(start_date).ge(start_date)]
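To see what the selection keeps, here is a self-contained sketch with made-up hire records and input strings; only the column names come from the code above. A row is kept when the hire started on or before end_date and either finished on or after start_date or has no completion date yet.

```python
import pandas as pd

# Hypothetical hire records; a missing DCompleteDate means the hire is still open.
df = pd.DataFrame({
    "HireStart": ["2021-01-05", "2021-02-10", "2021-03-20", "2021-01-15"],
    "DCompleteDate": ["2021-01-20", "2021-03-05", "2021-04-01", None],
})
df["HireStart"] = pd.to_datetime(df["HireStart"])
df["DCompleteDate"] = pd.to_datetime(df["DCompleteDate"])

# Assumed inputs, given as day-first strings.
start_date = pd.to_datetime("01/02/2021", format="%d/%m/%Y")
end_date = pd.to_datetime("28/02/2021", format="%d/%m/%Y")

# Started no later than the end of the range.
started_in_time = df["HireStart"].le(end_date)
# Completed no earlier than the start of the range; open hires count as overlapping.
not_finished_early = df["DCompleteDate"].fillna(start_date).ge(start_date)

output = df[started_in_time & not_finished_early]
print(output)
```

With this data, only the rows at index 1 and 3 remain: the first hire ended before the range and the third started after it.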
How do I filter by a certain date and hour using Pandas dataframe in python
You need the .dt accessor, with () around the second and third conditions:
newData = data[(data.Datetime.dt.day == data.Datetime.dt.day.max()) &
(data.Datetime.dt.hour == 9) &
(data.Datetime.dt.minute == 30)]
To compute the day values only once:
s = data.Datetime.dt.day
newData = data[(s == s.max()) &
(data.Datetime.dt.hour == 9) &
(data.Datetime.dt.minute == 30)]
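A runnable sketch of the same filter on made-up intraday data (the Price column and the timestamps are assumptions for illustration):

```python
import pandas as pd

# Hypothetical intraday timestamps spread over two days.
data = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-03-01 09:30", "2021-03-01 10:00",
        "2021-03-02 09:30", "2021-03-02 09:31",
    ]),
    "Price": [10.0, 10.5, 11.0, 11.2],
})

# Reuse the accessor so each condition stays short.
dt = data["Datetime"].dt
newData = data[(dt.day == dt.day.max()) & (dt.hour == 9) & (dt.minute == 30)]
print(newData)
```

Note that dt.day is the day of the month, so the maximum picks the latest date only when all rows fall within the same month. If the data spans several months, comparing dt.normalize() with its maximum is one possible alternative.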