Can pandas automatically read dates from a CSV file?
You should add parse_dates=True (which parses the index) or parse_dates=['column name'] when reading; that's usually enough to magically parse the dates. But there are always unusual formats that need to be defined manually. In such a case you can also pass a date parser function, which is the most flexible option.
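A minimal sketch of both forms, assuming a small hypothetical CSV held in memory with io.StringIO instead of a real file. A list of column names converts those columns, while parse_dates=True applies to the index, so it is paired with index_col:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a real CSV file
csv_text = "date,value\n2021-12-30,1.1\n2021-12-31,1.2\n"

# parse_dates with a list of column names converts those columns
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df.dtypes)

# parse_dates=True targets the index, so the date column is made the index first
df_idx = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)
print(type(df_idx.index))
```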
Suppose you have a column 'datetime' containing your date strings; then:
import pandas as pd
from datetime import datetime
# Parse every value of the 'datetime' column with an explicit format
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)
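Note that pandas 2.0 deprecated the date_parser argument; read_csv gained a date_format argument that takes the strptime format string directly. A version-independent sketch, assuming a hypothetical CSV with a 'datetime' column, reads the strings first and then converts them with pd.to_datetime and an explicit format:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for `infile`
csv_text = "datetime,value\n2021-06-07 22:00:00,17.0\n2021-06-07 22:15:00,16.88\n"

# Read the column as plain strings first ...
df = pd.read_csv(io.StringIO(csv_text))
# ... then convert with an explicit strptime-style format (no date_parser needed)
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%d %H:%M:%S")
print(df.dtypes)
```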
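This way you can even combine multiple columns into a single datetime column. The following merges a 'date' and a 'time' column into a single 'datetime' column (pandas joins the values with a space before calling the parser, so the format covers both parts):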
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)
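Recent pandas releases (2.2 onward) deprecate combining columns through a dict or nested list in parse_dates. A sketch of doing the merge by hand instead, assuming a hypothetical CSV with separate 'date' and 'time' columns:

```python
import io
import pandas as pd

# Hypothetical CSV with separate 'date' and 'time' columns
csv_text = "date,time,value\n2021-06-07,22:00:00,17.0\n2021-06-07,22:15:00,16.88\n"

df = pd.read_csv(io.StringIO(csv_text))
# Join the two string columns with a space, then parse with one explicit format
df["datetime"] = pd.to_datetime(df["date"] + " " + df["time"], format="%Y-%m-%d %H:%M:%S")
# Drop the source columns, as the dict form of parse_dates does by default
df = df.drop(columns=["date", "time"])
print(df)
```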
You can find the directives (i.e. the letters used for the different format fields) for strptime and strftime on this page.
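A quick illustration of the directives with the standard library, using a made-up date string: strptime turns text into a datetime, and strftime turns it back into text in another layout.

```python
from datetime import datetime

# %d = day, %m = month, %Y = four-digit year, %H = hour (24h), %M = minute
dt = datetime.strptime("30/12/2021 14:05", "%d/%m/%Y %H:%M")

# Format the same value back to text with a different set of directives (%S = seconds)
text = dt.strftime("%Y-%m-%d %H:%M:%S")
print(text)  # 2021-12-30 14:05:00
```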
Reading CSV dates with pandas returns datetime instead of Timestamp
You can specify which date_parser function should be used:
import pandas as pd
data = pd.read_csv('temp.csv',
                   parse_dates=["Local time"],
                   date_parser=pd.Timestamp)
Output:
>>> data
Local time Open High Low Close Volume
0 2014-02-03 02:00:00-02:00 1.37620 1.37882 1.37586 1.37745 5616.0400
1 2014-03-03 02:00:00-03:00 1.37745 1.37928 1.37264 1.37357 136554.6563
2 2014-04-03 02:00:00-02:00 1.37356 1.37820 1.37211 1.37421 124863.8203
>>> type(data['Local time'][0])
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
In my observation, when the timezones differ between individual observations, pandas parses each entry as a separate datetime.
The above should work if you really need to use pd.Timestamp.
Running the above, however, also gives me a FutureWarning, which I researched and found to be harmless for now.
EDIT
After a bit more research:
pandas tries to convert a date-type column to a DatetimeIndex for more efficient datetime-based operations. But for this, pandas needs a common timezone for the entire column.
When explicitly trying to convert to pd.DatetimeIndex:
>>> data
Local time Open High Low Close Volume
0 2014-02-03 02:00:00-02:00 1.37620 1.37882 1.37586 1.37745 5616.0400
1 2014-03-03 02:00:00-03:00 1.37745 1.37928 1.37264 1.37357 136554.6563
2 2014-04-03 02:00:00-04:00 1.37356 1.37820 1.37211 1.37421 124863.8203
>>> pd.DatetimeIndex(data['Local time'])
ValueError: Array must be all same time zone
During handling of the above exception, another exception occurred:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
So when the conversion to DatetimeIndex fails, pandas keeps the column as object dtype internally, and the individual entries are processed as datetime objects.
The documentation recommends specifying utc=True if the timezones in the data differ; every value is then converted to UTC and the time values are shifted accordingly.
From Documentation:
pandas cannot natively represent a column or index with mixed timezones. If your CSV file contains columns with a mixture of timezones, the default result will be an object-dtype column with strings, even with parse_dates.
To parse the mixed-timezone values as a datetime column, pass a partially-applied to_datetime() with utc=True
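Since date_parser is deprecated from pandas 2.0, a sketch that avoids passing a partially-applied to_datetime() is to read the column as strings and convert it afterwards with utc=True. The three rows below are taken from the mixed-offset output shown earlier, trimmed to two columns:

```python
import io
import pandas as pd

# Sample rows with three different UTC offsets
csv_text = (
    "Local time,Open\n"
    "2014-02-03 02:00:00-02:00,1.37620\n"
    "2014-03-03 02:00:00-03:00,1.37745\n"
    "2014-04-03 02:00:00-04:00,1.37356\n"
)

# Read the column as plain strings, then normalize every value to UTC
data = pd.read_csv(io.StringIO(csv_text))
data["Local time"] = pd.to_datetime(data["Local time"], utc=True)
print(data["Local time"].dtype)
```

If a single local zone is needed afterwards, the UTC column can be converted with data["Local time"].dt.tz_convert(...).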
For data that already shares a single timezone, DatetimeIndex works seamlessly:
>>> data
Local time Open High Low Close Volume
0 2014-02-03 02:00:00-02:00 1.37620 1.37882 1.37586 1.37745 5616.0400
1 2014-03-03 02:00:00-02:00 1.37745 1.37928 1.37264 1.37357 136554.6563
2 2014-04-03 02:00:00-02:00 1.37356 1.37820 1.37211 1.37421 124863.8203
>>> pd.DatetimeIndex(data['Local time'])
DatetimeIndex(['2014-02-03 02:00:00-02:00', '2014-03-03 02:00:00-02:00',
'2014-04-03 02:00:00-02:00'],
dtype='datetime64[ns, pytz.FixedOffset(-120)]', name='Local time', freq=None)
>>> type(pd.DatetimeIndex(data['Local time'])[0])
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
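A self-contained sketch of the same-timezone case, reading directly with parse_dates instead of calling pd.DatetimeIndex (sample rows adapted from the output above). The exact dtype string differs between pandas versions, but the column should come out as a timezone-aware datetime64 dtype:

```python
import io
import pandas as pd

# Every row shares the same -02:00 offset
csv_text = (
    "Local time,Open\n"
    "2014-02-03 02:00:00-02:00,1.37620\n"
    "2014-03-03 02:00:00-02:00,1.37745\n"
)

data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Local time"])
# With one common offset the whole column gets a single timezone-aware dtype
print(data["Local time"].dtype)
print(type(data["Local time"].iloc[0]))
```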
References:
- https://pandas.pydata.org/docs/user_guide/io.html#io-csv-mixed-timezones
- https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.html
- https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#parse_dates
Parsing date in pandas.read_csv
Just pass a list of the columns that should be converted to dates to the parse_dates= parameter of pd.read_csv:
>>> df = pd.read_csv('file.csv', parse_dates=['date'])
>>> df
date a b c d
0 2021-12-30 1.1 1.2 1.3 1
>>> df.dtypes
date datetime64[ns]
a float64
b float64
c float64
d int64
How to read data from CSV as date format in Python pandas
Try this:
import pandas as pd
from datetime import datetime
# Month/day/year strings such as 12/30/2021
dateparse = lambda x: datetime.strptime(x, '%m/%d/%Y')
df = pd.read_csv('history.csv', parse_dates=['Month'], date_parser=dateparse)
Pandas changes date format while reading CSV file although format in the file does not change
Use dayfirst=True as a parameter of read_csv:
df = pd.read_csv('Test_Read_Date.csv', sep=';',
parse_dates=['timestamp'], dayfirst=True)
Output:
>>> df
timestamp temperatures
0 2021-06-07 22:00:00 17.00
1 2021-06-07 22:15:00 16.88
2 2021-06-07 22:30:00 16.75
3 2021-06-07 22:45:00 16.63
4 2021-06-07 23:00:00 16.50
... ... ...
9699 2021-09-16 22:45:00 13.25
9700 2021-09-16 23:00:00 13.40
9701 2021-09-16 23:15:00 13.33
9702 2021-09-16 23:30:00 13.25
9703 2021-09-16 23:45:00 13.18
[9704 rows x 2 columns]
>>> df.loc[487:488]
timestamp temperatures
487 2021-06-12 23:45:00 18.38
488 2021-06-13 00:00:00 18.30
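A small reproduction, assuming hypothetical semicolon-separated data written day/month/year. Without dayfirst=True, a value such as 07/06/2021 is read month-first as 6 July; with it, the column is read as 7 June:

```python
import io
import pandas as pd

# Hypothetical day/month/year timestamps, semicolon-separated like the original file
csv_text = (
    "timestamp;temperatures\n"
    "07/06/2021 22:00;17.00\n"
    "12/06/2021 23:45;18.38\n"
    "13/06/2021 00:00;18.30\n"
)

# dayfirst=True makes pandas read the first number as the day
df = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=["timestamp"], dayfirst=True)
print(df)
```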
Can pandas format individual dates in a csv file?
Usually, pd.to_datetime() is smart enough to infer the format on its own. To convert a series or a column of the dataframe to the datetime format, you can use:
df["date"] = pd.to_datetime(df["date"])
You can then convert the series back to a string with the desired format:
df["date"] = df["date"].dt.strftime('%Y-%m-%d')
When working with (multiple) unusual formats you might need to use a different method, see this similar question.
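One possible approach for a column that mixes a few known formats (not necessarily the method from the linked question) is to parse each format separately with errors="coerce" and fill the gaps. The sample values below are made up:

```python
import pandas as pd

# Hypothetical column mixing ISO dates and day/month/year dates
s = pd.Series(["2021-12-30", "05/01/2022", "not a date"])

# Parse each known format on its own; values that don't match become NaT
iso = pd.to_datetime(s, format="%Y-%m-%d", errors="coerce")
dmy = pd.to_datetime(s, format="%d/%m/%Y", errors="coerce")

# Keep the ISO result where it worked, otherwise fall back to day/month/year
parsed = iso.fillna(dmy)
print(parsed)
```

Values that match none of the formats stay NaT, which makes them easy to find with parsed.isna().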