Pandas Read CSV File with Float Values Results in Weird Rounding and Decimal Digits

Pandas read csv file with float values results in weird rounding and decimal digits

By default, pandas uses a dedicated decimal-to-binary converter that sacrifices some accuracy for speed.

Passing float_precision='round_trip' to read_csv fixes this.

The pandas documentation for the float_precision option of read_csv covers this in more detail.

After processing your data, if you want to save it back to a CSV file, you can pass
float_format="%.nf" (where n is the number of decimal places you want) to to_csv.

A full example:

import pandas as pd

df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
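To see what 'round_trip' does, here is a minimal sketch that parses a small in-memory CSV and checks that every value equals what Python's own float() gives for the same text. The column name val and the sample values are made up for illustration. Whether the default parser would differ on these particular values depends on your pandas version, so the sketch only checks the round-trip behaviour.

```python
import io

import pandas as pd

# Hypothetical sample data: one float column with many significant digits.
csv_text = "val\n-0.41676538151302184\n0.1\n"

# Parse with the round-trip converter instead of the faster default one.
df = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

# Reference values: what Python's float() produces for the same strings.
expected = [float("-0.41676538151302184"), float("0.1")]

matches = df["val"].tolist() == expected
print(matches)
```

If matches is True, the values survived parsing without picking up extra error in the last digits.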

Using pandas to read from .csv file but it's cutting off the decimal places?

I believe your issue is that you're converting the data with .astype(int) when you read it in. That turns every value in the CSV into an int, so the decimal part is discarded and you cannot recover it later with .astype(float). Try not to specify a type on the initial read_csv, as pandas can normally infer the proper types automatically.
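A small sketch of the effect, assuming a made-up price column (the data and names are hypothetical, not from the question):

```python
import io

import pandas as pd

# Hypothetical CSV with decimal values.
csv_text = "price\n1.75\n2.50\n"

# Let pandas infer the dtype: the column comes back as float64.
df = pd.read_csv(io.StringIO(csv_text))
inferred_dtype = str(df["price"].dtype)

# Casting to int drops the fractional part...
truncated = df["price"].astype(int)

# ...and casting back to float cannot restore it.
recovered = truncated.astype(float)

print(inferred_dtype)
print(truncated.tolist())
print(recovered.tolist())
```

Once the values have been through int, the decimals are gone for good, so the fix is to leave the cast out or move it to after any arithmetic that needs the fractions.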

Pandas - compare loaded data to processed data

The issue with floating point numbers is precision. As you guessed, your numbers are very close but not exactly identical:

df.iloc[0,0]
-0.41676538151302184

df2.iloc[0,0]
-0.4167653815130218

with pd.option_context('display.float_format', '{:.20f}'.format):
    display(df2.val.compare(df.val))

                     self                    other
0 -0.41676538151302178203 -0.41676538151302183755

One option is to use numpy.isclose or numpy.allclose, which are designed to test whether numbers are approximately equal. Both accept two parameters, rtol and atol, for setting a custom relative or absolute tolerance.

import numpy as np
np.isclose(df, df2).all()

# or
np.allclose(df, df2)

output: True
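As a sketch of how rtol and atol change the outcome, the example below rebuilds the two values shown above in one-row frames (the frames themselves are constructed here for illustration) and compares them exactly, with the default tolerances, and with two explicit absolute tolerances:

```python
import numpy as np
import pandas as pd

# The two values from above, which differ only in the last digits.
df = pd.DataFrame({"val": [-0.41676538151302184]})
df2 = pd.DataFrame({"val": [-0.4167653815130218]})

# Exact comparison fails because the stored doubles are different.
exactly_equal = df.equals(df2)

# Default tolerances treat them as equal.
close_default = np.allclose(df, df2)

# Absolute tolerance only: 1e-12 is far larger than the gap, so this passes.
close_abs = np.allclose(df, df2, rtol=0.0, atol=1e-12)

# An absolute tolerance smaller than the gap makes the check fail again.
close_too_strict = np.allclose(df, df2, rtol=0.0, atol=1e-18)

print(exactly_equal, close_default, close_abs, close_too_strict)
```

Pick the tolerance from the precision you actually need in your data rather than from the size of the mismatch you happen to observe.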

Pandas changes numbers when reading from Excel

I think this is a duplicate. Please see this question:

Pandas read csv file with float values results in weird rounding and decimal digits

The solution there was to use float_precision='round_trip':
pd.read_csv(source_file, float_precision='round_trip')

float64 with pandas to_csv

As mentioned in the comments, this is a general floating-point problem.

However, you can use the float_format keyword of to_csv to hide it:

df.to_csv('pandasfile.csv', float_format='%.3f')

or, if you don't want 0.0001 to be rounded to zero:

df.to_csv('pandasfile.csv', float_format='%g')

will give you:

Bob,0.085
Alice,0.005

in your output file.

For an explanation of %g, see Format Specification Mini-Language.
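To compare the two format strings side by side, here is a short sketch using a made-up frame that includes the value 0.0001 (the names and scores are hypothetical). Calling to_csv without a path returns the CSV text as a string, which makes the result easy to inspect:

```python
import pandas as pd

# Hypothetical data, including a small value that '%.3f' rounds away.
df = pd.DataFrame({"name": ["Bob", "Alice"], "score": [0.085, 0.0001]})

# Fixed three decimal places: 0.0001 is written as 0.000.
fixed = df.to_csv(index=False, header=False, float_format="%.3f")

# General format: keeps significant digits, so 0.0001 survives.
general = df.to_csv(index=False, header=False, float_format="%g")

print(fixed)
print(general)
```

Note that %g defaults to six significant digits, so it hides floating-point noise but also shortens values that genuinely carry more precision.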
