Datetime Dtypes in Pandas Read_Csv

Can pandas automatically read dates from a CSV file?

You should add parse_dates=True or parse_dates=['column name'] when reading; that's usually enough to parse the dates automatically. But there are always unusual formats that need to be handled manually. In such a case you can also supply a date parser function, which is the most flexible approach.
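For the simple case, a minimal sketch (assuming a hypothetical data.csv whose 'datetime' column is in an ISO-like format):

import pandas as pd

# parse_dates takes a list of column names (or indices) to parse as datetimes
df = pd.read_csv('data.csv', parse_dates=['datetime'])
print(df.dtypes)  # the 'datetime' column should show as datetime64[ns]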

For the manual case, suppose you have a column 'datetime' containing your date strings; then:

import pandas as pd
from datetime import datetime

# parse strings like '2019-08-29 15:29:02' into datetime objects
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)
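Note that in newer pandas versions (2.0+), date_parser is deprecated in favor of the date_format argument; a minimal sketch of the equivalent call, assuming the same 'datetime' column and format:

import pandas as pd

# pandas >= 2.0: pass the strptime format string directly instead of a parser function
df = pd.read_csv(infile, parse_dates=['datetime'], date_format='%Y-%m-%d %H:%M:%S')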

With the date_parser approach you can even combine multiple columns into a single datetime column. The following merges a 'date' and a 'time' column into one 'datetime' column:

# the parser receives the 'date' and 'time' strings joined by a space
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)
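For illustration, a self-contained sketch of the column-merging variant, using hypothetical in-memory data with separate 'date' and 'time' columns (same deprecated date_parser approach as above):

import pandas as pd
from io import StringIO
from datetime import datetime

# hypothetical sample with separate date and time columns
temp = """date,time,value
2019-08-29,15:29:02,1.0
2019-08-29,15:29:04,2.0"""

dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv(StringIO(temp), parse_dates={'datetime': ['date', 'time']},
                 date_parser=dateparse)
print(df.dtypes)  # 'datetime' is datetime64[ns], 'value' is float64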

You can find the directives (i.e. the letters used as format codes) for strptime and strftime in the Python documentation.
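As a quick illustration of the directives, a sketch parsing a hypothetical day-first format:

from datetime import datetime

# %d = day, %m = month, %Y = four-digit year, %H:%M = hours:minutes
dt = datetime.strptime('29/08/2019 15:29', '%d/%m/%Y %H:%M')
print(dt)  # 2019-08-29 15:29:00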

Pandas reading csv with a datetime period

You can first convert the values to datetimes with to_datetime and then use Series.dt.to_period:

df['week'] = pd.to_datetime(df['week'].str.split('/').str[0]).dt.to_period('W')

print (df)
  name                   week  content
0  Dan  2012-07-09/2012-07-15      4.0
1  Jim  2012-07-09/2012-07-15      1.0
2  Joe  2012-07-09/2012-07-15      3.0
3  Sam  2012-07-16/2012-07-22     18.0
4  Tom  2012-07-16/2012-07-22      7.0

print (df['week'])
0    2012-07-09/2012-07-15
1    2012-07-09/2012-07-15
2    2012-07-09/2012-07-15
3    2012-07-16/2012-07-22
4    2012-07-16/2012-07-22
Name: week, dtype: period[W-SUN]
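For reference, a self-contained sketch of the same conversion, building the sample frame in memory instead of reading it from disk:

import pandas as pd

df = pd.DataFrame({
    'name': ['Dan', 'Jim', 'Joe', 'Sam', 'Tom'],
    'week': ['2012-07-09/2012-07-15'] * 3 + ['2012-07-16/2012-07-22'] * 2,
    'content': [4.0, 1.0, 3.0, 18.0, 7.0],
})

# keep only the start date of each range and convert it to a weekly period
df['week'] = pd.to_datetime(df['week'].str.split('/').str[0]).dt.to_period('W')
print(df['week'].dtype)  # period[W-SUN]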

If you want to parse the values directly in read_csv, use converters with a lambda function:

import pandas as pd
from io import StringIO

temp="""name;week;content
0;Dan;2012-07-09/2012-07-15;4.0
1;Jim;2012-07-09/2012-07-15;1.0
2;Joe;2012-07-09/2012-07-15;3.0
3;Sam;2012-07-16/2012-07-22;18.0
4;Tom;2012-07-16/2012-07-22;7.0"""
# after testing, replace StringIO(temp) with 'filename.csv'

# keep the start date of each range and convert it to a weekly period
f = lambda x: pd.to_datetime(x.split('/')[0]).to_period('W')
df = pd.read_csv(StringIO(temp), sep=";", converters={'week': f})

print (df)
  name                   week  content
0  Dan  2012-07-09/2012-07-15      4.0
1  Jim  2012-07-09/2012-07-15      1.0
2  Joe  2012-07-09/2012-07-15      3.0
3  Sam  2012-07-16/2012-07-22     18.0
4  Tom  2012-07-16/2012-07-22      7.0

print (df.dtypes)
name              object
week       period[W-SUN]
content          float64
dtype: object

Polars: Specify dtypes for all columns at once in read_csv

Reading all the data in a CSV as any type other than pl.Utf8 will likely fail or leave you with a lot of null values. We can use expressions to declare how we want to deal with those null values.

If you read a CSV with infer_schema_length=0, polars does not infer the schema and reads all columns as pl.Utf8, since that is a supertype of all polars types.

Once the data is read as Utf8, we can use expressions to cast all columns.

import polars as pl

(pl.read_csv("test.csv", infer_schema_length=0)
 .with_columns(pl.all().cast(pl.Int32, strict=False)))
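If different columns need different dtypes, you can cast them individually in the same step. A minimal sketch, assuming a hypothetical test.csv with columns 'a', 'b' and 'c':

import polars as pl

df = (
    pl.read_csv("test.csv", infer_schema_length=0)  # everything comes in as Utf8
      .with_columns(
          pl.col("a").cast(pl.Int32, strict=False),    # hypothetical integer column
          pl.col("b").cast(pl.Float64, strict=False),  # hypothetical float column
          pl.col("c").str.strptime(pl.Date, "%Y-%m-%d", strict=False),  # hypothetical date column
      )
)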

How to parse a non-string column as datetime with pandas.read_csv

Simply fixing the input data works as you expect (remove the trailing comma at the end of each line).

Data:

TS,secs,degC,Pa,V,V,V,V,V,degC,%
2019-08-29 15:29:02.000,0.000,23.21,97707.95,2.37942,4.06958,1.16183,2.06545,2.16861,22.70,53.70
2019-08-29 15:29:04.000,2.001,23.22,98000.81,2.30359,4.04178,1.15457,2.06375,2.16660,22.70,54.00

and read it as shown below:

df=pd.read_csv('file.csv', parse_dates=['TS'])

and df.dtypes gives the desired output:

TS         datetime64[ns]
secs              float64
degC              float64
Pa                float64
V                 float64
V.1               float64
V.2               float64
V.3               float64
V.4               float64
degC.1            float64
%                 float64
dtype: object
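As a follow-up, if you also want the timestamp as the index (e.g. for resampling or date-based slicing), index_col can be combined with parse_dates; a sketch assuming the same file:

import pandas as pd

# parse TS as datetime and use it as the DataFrame index
df = pd.read_csv('file.csv', parse_dates=['TS'], index_col='TS')
print(df.index.dtype)  # datetime64[ns]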

