Pandas read csv file with float values results in weird rounding and decimal digits
Pandas uses a dedicated decimal-to-binary converter that trades some accuracy for speed. Passing float_precision='round_trip' to read_csv fixes this.
See the pandas I/O documentation on floating-point conversion for more detail.
After processing your data, if you want to save it back to a CSV file, you can pass float_format="%.nf" (where n is the number of decimal places) to the corresponding method, to_csv.
A full example:
import pandas as pd
df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
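To check the behaviour without touching real files, here is a minimal sketch that uses in-memory buffers instead of source_file and target_file. The sample values and variable names are made up for illustration. Depending on your pandas version, the default parser may already be accurate enough that you see no difference without round_trip.

```python
import io

import pandas as pd

# Hypothetical sample data; any CSV with a float column would do.
raw_values = ["0.1", "1.1", "-0.41676538151302184", "2.3e-05"]
csv_text = "val\n" + "\n".join(raw_values) + "\n"

# Parse with the round-trip converter so each value matches Python's float().
df_in = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")
matches_python = df_in["val"].tolist() == [float(s) for s in raw_values]
print(matches_python)

# Write back with a fixed number of decimal places (3 here).
buffer = io.StringIO()
df_in.to_csv(buffer, index=False, float_format="%.3f")
written_lines = buffer.getvalue().splitlines()
print(written_lines)
```

Note that "%.3f" rounds the small value 2.3e-05 down to 0.000 in the written output, so pick n with your smallest values in mind.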
Using pandas to read from .csv file but it's cutting off the decimal places?
I believe your issue is that you're converting the data file with .astype(int), which turns everything in the CSV into an int, so you cannot recover the decimals by calling .astype(float) afterwards. Try not to specify a type on the initial read_csv, as pandas can normally infer the proper types automatically.
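As a small illustration of the point above (the column names and values are invented for the example), letting read_csv infer the types keeps the decimals, while a round trip through int discards them for good:

```python
import io

import pandas as pd

# Hypothetical CSV with one float column and one integer column.
csv_text = "price,quantity\n1.5,2\n3.25,4\n"

# No dtype given: pandas infers float64 for price and int64 for quantity.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

# Casting to int truncates the fractional part...
truncated = df["price"].astype(int)
# ...and casting back to float cannot bring it back.
restored = truncated.astype(float)
print(restored.tolist())
```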
Pandas - compare loaded data to processed data
The issue with floating point numbers is precision. As you guessed, your numbers are very close but not exactly identical:
df.iloc[0,0]
-0.41676538151302184
df2.iloc[0,0]
-0.4167653815130218
with pd.option_context('display.float_format', '{:.20f}'.format):
display(df2.val.compare(df.val))
                     self                   other
0 -0.41676538151302178203 -0.41676538151302183755
One option is to use numpy.isclose or numpy.allclose, which are specifically designed to compare nearly equal numbers. Both accept two parameters, rtol and atol, to specify a custom relative or absolute tolerance.
import numpy as np
np.isclose(df, df2).all()
# or
np.allclose(df, df2)
output: True
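To see how rtol and atol change the result, here is a small sketch on plain arrays. The first pair of values is taken from the example above; the other values are made up for illustration. NumPy treats two elements as close when |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08.

```python
import numpy as np

a = np.array([-0.41676538151302184, 1.0, 1000.0])
b = np.array([-0.4167653815130218, 1.0 + 1e-9, 1000.1])

# Default tolerances: the last pair differs by 0.1, which is too much.
default_close = np.isclose(a, b)

# A looser relative tolerance accepts all three pairs.
loose_close = np.isclose(a, b, rtol=1e-3)

# A purely absolute, very tight tolerance only accepts the first pair.
strict_close = np.isclose(a, b, rtol=0.0, atol=1e-12)

print(default_close, loose_close, strict_close)
print(np.allclose(a, b, rtol=1e-3))
```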
Pandas changes numbers when reading from Excel
I think this is a duplicate question; please see the question below:
Pandas read csv file with float values results in weird rounding and decimal digits
The solution was to use float_precision='round_trip':
pd.read_csv(source_file, float_precision='round_trip')
float64 with pandas to_csv
As mentioned in the comments, this is a general floating point problem.
However, you can use the float_format keyword argument of to_csv to hide it:
df.to_csv('pandasfile.csv', float_format='%.3f')
or, if you don't want 0.0001 to be rounded to zero:
df.to_csv('pandasfile.csv', float_format='%g')
will give you:
Bob,0.085
Alice,0.005
in your output file.
For an explanation of %g, see Format Specification Mini-Language.
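The difference between the two formats can be checked with an in-memory buffer. This is only a sketch: the Bob and Alice values come from the output above, while the "Carol" row with 0.0001 and the column names are hypothetical additions to show the rounding case.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Bob", "Alice", "Carol"],  # Carol is a made-up extra row
        "value": [0.085, 0.005, 0.0001],
    }
)


def csv_lines(frame, float_format):
    """Return the CSV rows that to_csv would write, as a list of strings."""
    buffer = io.StringIO()
    frame.to_csv(buffer, index=False, header=False, float_format=float_format)
    return buffer.getvalue().splitlines()


fixed_lines = csv_lines(df, "%.3f")  # fixed three decimal places
general_lines = csv_lines(df, "%g")  # up to six significant digits

print(fixed_lines)
print(general_lines)
```

With "%.3f" the 0.0001 value is written as 0.000, whereas "%g" keeps it as 0.0001 and still drops the floating point noise from the other values.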