Can pandas automatically read dates from a CSV file?
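You should add parse_dates=True or parse_dates=['column name'] when reading; that is usually enough to parse the dates automatically. Note that parse_dates=True only tries to parse the index, so combine it with index_col; to parse specific columns, pass their names in a list. There are always odd formats that have to be described manually, though. In that case you can also pass a date parser function, which is the most flexible option.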
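A small sketch contrasting the two forms of parse_dates, using a hypothetical in-memory CSV instead of a real file:

```python
import pandas as pd
from io import StringIO

# Hypothetical two-row CSV used only for illustration
csv_text = """date,value
2021-03-01,10
2021-03-02,12
"""

# parse_dates=True parses the index, so pair it with index_col
by_index = pd.read_csv(StringIO(csv_text), index_col='date', parse_dates=True)

# A list of names parses those columns and keeps the default integer index
by_name = pd.read_csv(StringIO(csv_text), parse_dates=['date'])

print(type(by_index.index).__name__)  # DatetimeIndex
print(by_name.dtypes)
```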
Suppose you have a column 'datetime' containing your date strings; then:
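import pandas as pd
from datetime import datetime
# Parse strings such as '2020-01-01 08:30:00' with an explicit format
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
# infile: path to your CSV file
df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)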
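The date_parser argument is deprecated as of pandas 2.0, where read_csv gained a date_format parameter that takes the strptime-style format string directly. A version-independent sketch, assuming a hypothetical sample file, is to read the column as text and convert it afterwards with pd.to_datetime and the same format:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data standing in for `infile`
csv_text = """datetime,value
2020-01-01 08:30:00,1
2020-01-02 17:45:10,2
"""

df = pd.read_csv(StringIO(csv_text))
# Convert after reading, using the same directives as datetime.strptime
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d %H:%M:%S')
print(df.dtypes)
```

An explicit format also makes pandas raise on rows that do not match, instead of silently guessing.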
This way you can even combine multiple columns into a single datetime column. The following merges a 'date' and a 'time' column into a single 'datetime' column:
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)
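Combining columns through a dict in parse_dates was deprecated in pandas 2.2. A sketch of the same merge done after reading, assuming hypothetical 'date' and 'time' columns stored as text:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample with separate date and time columns
csv_text = """date,time,value
2020-01-01,08:30:00,1
2020-01-02,17:45:10,2
"""

df = pd.read_csv(StringIO(csv_text))
# Join the two text columns with a space, then parse them with one format
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'], format='%Y-%m-%d %H:%M:%S')
# Drop the original pieces once they are merged
df = df.drop(columns=['date', 'time'])
print(df)
```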
You can find the directives (i.e. the letters used for the different parts of a format) for strptime and strftime in the Python datetime documentation, under "strftime() and strptime() Format Codes".
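A short illustration of how the directives map onto a date string; the day-first input below is a made-up example:

```python
from datetime import datetime

# %d = day, %m = month, %Y = 4-digit year, %H = hour (24h), %M = minute
stamp = datetime.strptime('09/07/2012 15:04', '%d/%m/%Y %H:%M')

# strftime uses the same directives to turn the datetime back into text
text = stamp.strftime('%Y-%m-%d %H:%M:%S')
print(text)  # 2012-07-09 15:04:00
```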
Pandas reading csv with a datetime period
You can convert the first date of each range with to_datetime and then use Series.dt.to_period:
df['week'] = pd.to_datetime(df['week'].str.split('/').str[0]).dt.to_period('W')
print (df)
   name                   week  content
0   Dan  2012-07-09/2012-07-15      4.0
1   Jim  2012-07-09/2012-07-15      1.0
2   Joe  2012-07-09/2012-07-15      3.0
3   Sam  2012-07-16/2012-07-22     18.0
4   Tom  2012-07-16/2012-07-22      7.0
print (df['week'])
0 2012-07-09/2012-07-15
1 2012-07-09/2012-07-15
2 2012-07-09/2012-07-15
3 2012-07-16/2012-07-22
4 2012-07-16/2012-07-22
Name: week, dtype: period[W-SUN]
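A self-contained sketch of the same conversion, built from a few of the rows shown above; the extra week_start and week_end columns are illustrative additions showing that a weekly period still carries both ends of the range:

```python
import pandas as pd

# Sample frame using values from the output above
df = pd.DataFrame({
    'name': ['Dan', 'Sam'],
    'week': ['2012-07-09/2012-07-15', '2012-07-16/2012-07-22'],
    'content': [4.0, 18.0],
})

# Keep the first date of each range, parse it, and turn it into a weekly period
# ('W' is an alias for W-SUN, i.e. weeks ending on Sunday)
df['week'] = pd.to_datetime(df['week'].str.split('/').str[0]).dt.to_period('W')

# start_time/end_time recover the boundaries of each week as Timestamps
df['week_start'] = df['week'].dt.start_time
df['week_end'] = df['week'].dt.end_time.dt.normalize()  # drop the 23:59:59.999... part
print(df)
```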
If you want to parse the values directly in read_csv, use converters with a lambda function:
import pandas as pd
from io import StringIO
temp="""name;week;content
0;Dan;2012-07-09/2012-07-15;4.0
1;Jim;2012-07-09/2012-07-15;1.0
2;Joe;2012-07-09/2012-07-15;3.0
3;Sam;2012-07-16/2012-07-22;18.0
4;Tom;2012-07-16/2012-07-22;7.0"""
# after testing, replace StringIO(temp) with 'filename.csv'
# the data rows have one more field than the header, so the first field becomes the index
f = lambda x: pd.to_datetime(x.split('/')[0]).to_period('W')
df = pd.read_csv(StringIO(temp), sep=";", converters={'week': f})
print (df)
   name                   week  content
0   Dan  2012-07-09/2012-07-15      4.0
1   Jim  2012-07-09/2012-07-15      1.0
2   Joe  2012-07-09/2012-07-15      3.0
3   Sam  2012-07-16/2012-07-22     18.0
4   Tom  2012-07-16/2012-07-22      7.0
print (df.dtypes)
name object
week period[W-SUN]
content float64
dtype: object
Polars: Specify dtypes for all columns at once in read_csv
Reading all the data in a CSV as any type other than pl.Utf8 will likely fail, with a lot of values ending up null. We can use expressions to declare how we want to deal with those null values.
If you read a CSV with infer_schema_length=0, polars does not know the schema and will read all columns as pl.Utf8, as that is a supertype of all polars types (newer polars releases call this type pl.String and keep pl.Utf8 as an alias).
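When the columns are read as Utf8, we can use expressions to cast all of them. With strict=False, values that cannot be cast become null instead of raising an error:
import polars as pl
df = (pl.read_csv("test.csv", infer_schema_length=0)  # all columns read as Utf8
      .with_columns(pl.all().cast(pl.Int32, strict=False)))  # unparsable values become null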
how to parse a non-string column as datetime with pandas.read_csv
Changing the input data makes it work as you expect: remove the trailing comma at the end of each line.
Data:
TS,secs,degC,Pa,V,V,V,V,V,degC,%
2019-08-29 15:29:02.000,0.000,23.21,97707.95,2.37942,4.06958,1.16183,2.06545,2.16861,22.70,53.70
2019-08-29 15:29:04.000,2.001,23.22,98000.81,2.30359,4.04178,1.15457,2.06375,2.16660,22.70,54.00
and read it like this:
df = pd.read_csv('file.csv', parse_dates=['TS'])
Then df.dtypes gives the desired output:
TS datetime64[ns]
secs float64
degC float64
Pa float64
V float64
V.1 float64
V.2 float64
V.3 float64
V.4 float64
degC.1 float64
% float64
dtype: object
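If the file cannot be edited, the pandas documentation notes that index_col=False forces pandas not to use the first column as the index, for example when every line ends with a delimiter. A sketch on a hypothetical three-column excerpt of the data above:

```python
import pandas as pd
from io import StringIO

# Hypothetical excerpt: the header has 3 names, but each data row ends with an extra comma
data = """TS,secs,degC
2019-08-29 15:29:02.000,0.000,23.21,
2019-08-29 15:29:04.000,2.001,23.22,
"""

# Default: the surplus field makes pandas use the first column as the index,
# so the 'TS' label ends up on the secs values
shifted = pd.read_csv(StringIO(data))
print(shifted)

# index_col=False keeps the labels aligned and drops the empty trailing field
df = pd.read_csv(StringIO(data), index_col=False, parse_dates=['TS'])
print(df.dtypes)
```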