Correct number of decimal places reading in a .csv
read.csv is not truncating or rounding; your print.data.frame method is only displaying the values to the precision specified in options(). Try:
print(dfrm, digits=10)
> dfrm <- data.frame(test=-117.2403266)
> print(dfrm)
       test
1 -117.2403
> print(dfrm, digits=10)
          test
1 -117.2403266
Using format as suggested would show that the precision has not been lost, but it returns a character vector, so it might not be suitable for assignment where a numeric value is expected.
Edit of a 2-year-old post: This topic raises the question of how integers larger than .Machine$integer.max # [1] 2147483647 can be imported, since such values can still be stored exactly as 'numeric' (double) values; for doubles the maximum exactly representable integer is 2^53. When these are read in by a scan-based function (as are all of the read.* family), you need to declare them as 'numeric' rather than 'integer':
> str( scan(text="21474836470", what=integer()))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'an integer', got '21474836470'
> str( scan(text="21474836470", what=numeric()))
Read 1 item
num 2.15e+10
> str( read.table(text="21474836470", colClasses="integer"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'an integer', got '21474836470'
> str( read.table(text="21474836470", colClasses="numeric"))
'data.frame': 1 obs. of 1 variable:
$ V1: num 2.15e+10
If you don't specify a type or mode for "what", scan assumes numeric() and succeeds.
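The 2^53 bound mentioned above is the general IEEE-754 double limit, not an R-specific one. A quick stdlib Python check (given here purely as an illustration) confirms where exact integer storage in a double ends:

```python
# IEEE-754 doubles have a 53-bit significand, so every integer up to
# 2**53 is representable exactly; 2**53 + 1 is the first one that is not.
limit = 2 ** 53                          # 9007199254740992

assert float(limit) == limit             # exact
assert float(limit + 1) == float(limit)  # 2**53 + 1 rounds back down
assert float(limit + 2) == limit + 2     # the next even integer is exact again

# The value from the example above, 21474836470, is far below 2**53,
# so storing it as 'numeric' (a double) loses nothing.
assert int(float(21474836470)) == 21474836470
```

This is why declaring the column as 'numeric' is safe for integers of this size, even though 'integer' overflows.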
Pandas read csv file with float values results in weird rounding and decimal digits
Pandas uses a dedicated decimal-to-binary converter that sacrifices some accuracy for speed. Passing float_precision='round_trip' to read_csv fixes this. Check out this page for more detail.
After processing your data, if you want to save it back to a csv file, you can pass float_format="%.nf" to the corresponding method.
A full example:
import pandas as pd
df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
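As a sanity check (pure-stdlib sketch, no pandas required): a correctly rounded string-to-float conversion round-trips exactly, which is the property float_precision='round_trip' restores, while float_format="%.3f" is just printf-style formatting of the stored double on output:

```python
x = 117.2403266

# A correctly rounded str -> float conversion round-trips exactly:
# parsing the printed representation recovers the identical double.
assert float(repr(x)) == x

# float_format="%.3f" corresponds to printf-style rounding of the
# stored double to 3 decimal places on output only; the value in
# memory is untouched.
assert "%.3f" % x == "117.240"
```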
How to read the exact number of decimal digits with readtable from a .csv file?
It is likely that you are running into a precision limitation of the floating point format used internally by MATLAB. MATLAB by default uses doubles to store pretty much all numbers. For an IEEE double you're only going to get about 15 decimal digits.
If you're not planning on performing computations on these numbers an option is to read them in as strings:
opts = detectImportOptions(filename);
opts = setvartype(opts, 'your variable', 'string'); % or 'char'
data = readtable(filename, opts);
If you want to perform computations on these numbers, things are somewhat more difficult. A "built-in" option is to use the variable-precision arithmetic in the Symbolic Math Toolbox, or roll your own implementation. I'd consider whether you really need the extra precision before going down either of these paths.
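The same string-based workaround carries over to other environments. A stdlib Python sketch (with a made-up 21-digit value) shows why keeping the text avoids the double-precision rounding:

```python
from decimal import Decimal

text = "1.23456789012345678901"   # 21 significant digits from a csv cell

# Going through a double rounds to the nearest representable value,
# keeping only ~15-17 significant decimal digits.
as_double = float(text)
assert Decimal(as_double) != Decimal(text)   # precision was lost

# Kept as a string (or an arbitrary-precision Decimal), nothing is lost.
assert str(Decimal(text)) == text
```

decimal.Decimal plays the same role here that variable-precision arithmetic does in MATLAB: exact decimal storage at the cost of slower computation.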
how to specify the digits of numeric values when reading data with read.csv, read_csv or read_excel in R
This isn't an issue with readr. The full data is still in there; R is just not showing it all. The same thing happens when you use base R's read.csv():
library(tidyverse)
df.readr <- read_csv("112.8397456,35.50496106\n112.583984,37.8519194\n112.5826569,37.8602818", col_names = FALSE)
df.base <- read.csv(textConnection("112.8397456,35.50496106\n112.583984,37.8519194\n112.5826569,37.8602818"), header = FALSE)
# By default R shows 7 digits
getOption("digits")
#> [1] 7
# Both CSV files are truncated at 7 digits
df.readr
#> # A tibble: 3 × 2
#> X1 X2
#> <dbl> <dbl>
#> 1 112.8397 35.50496
#> 2 112.5840 37.85192
#> 3 112.5827 37.86028
df.base
#> V1 V2
#> 1 112.8397 35.50496
#> 2 112.5840 37.85192
#> 3 112.5827 37.86028
# Bumping up the digits shows more
options("digits" = 15)
df.readr
#> # A tibble: 3 × 2
#> X1 X2
#> <dbl> <dbl>
#> 1 112.8397456 35.50496106
#> 2 112.5839840 37.85191940
#> 3 112.5826569 37.86028180
df.base
#> V1 V2
#> 1 112.8397456 35.50496106
#> 2 112.5839840 37.85191940
#> 3 112.5826569 37.86028180
Read CSV in R appears to lose accuracy
R does not truncate your data at all. Your data have been read in successfully without losing any precision. The rounding you see on the screen is just the result of print. Try df$v3[1] and you will see what I mean.
Although you can control the number of digits printed via options(digits), there is usually no need to do so.
How to read in numbers with a comma as decimal separator?
When you check ?read.table you will probably find all the answers that you need.
There are two issues with (continental) European csv files:
- What does the c in csv stand for? For standard csv this is a comma (,); for European csv this is a semicolon (;). sep is the corresponding argument in read.table.
- What is the character for the decimal point? For standard csv this is a period (.); for European csv this is a comma (,). dec is the corresponding argument in read.table.
To read standard csv use read.csv; to read European csv use read.csv2. These two functions are just wrappers around read.table that set the appropriate arguments.
If your file does not follow either of these standards, set the arguments manually.
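The same two knobs (field separator and decimal mark) exist in other tools as well. A stdlib Python sketch (with hypothetical sample data) for parsing a European-style csv:

```python
import csv
import io

# European-style csv: ';' as field separator, ',' as decimal mark.
data = io.StringIO("112,84;35,50\n112,58;37,85\n")

rows = []
for row in csv.reader(data, delimiter=";"):
    # Swap the decimal comma for a point before converting to float.
    rows.append([float(cell.replace(",", ".")) for cell in row])

assert rows == [[112.84, 35.5], [112.58, 37.85]]
```

(If you use pandas, read_csv exposes the same pair of arguments as sep and decimal.)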