Correct Number of Decimal Places Reading in a .Csv

read.csv is not truncating or rounding; the print.data.frame method is only displaying the values to the precision specified in options(). Try:

 print(dfrm, digits=10)

> dfrm<- data.frame(test=-117.2403266)
> print(dfrm)
test
1 -117.2403
> print(dfrm, digits=10)
test
1 -117.2403266

Using format as suggested would show that the precision has not been lost, but it returns a character vector, so it may not be suitable for assignment where a numeric value is expected.
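A minimal sketch of that distinction, reusing the dfrm example from above:

```r
dfrm <- data.frame(test = -117.2403266)

# format() shows the full precision, but the result is character, not numeric
fmt <- format(dfrm$test, digits = 10)
class(fmt)        # "character"
fmt               # "-117.2403266"

# the stored numeric value itself is untouched
class(dfrm$test)  # "numeric"
```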

Edit of a two-year-old post: This topic raises the question of how integers larger than .Machine$integer.max #[1] 2147483647 can be imported, since such values can be stored exactly as 'numeric' (double) values, for which the largest exactly representable integer is 2^53. When these are read in by a scan-based function (as all of the read.*-family are), you need to declare them as 'numeric' rather than 'integer':

> str( scan(text="21474836470", what=integer()))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'an integer', got '21474836470'
> str( scan(text="21474836470", what=numeric()))
Read 1 item
num 2.15e+10
> str( read.table(text="21474836470", colClasses="integer"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'an integer', got '21474836470'
> str( read.table(text="21474836470", colClasses="numeric"))
'data.frame': 1 obs. of 1 variable:
$ V1: num 2.15e+10

If you don't specify a type or mode for "what", scan assumes numeric() and succeeds.
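For example, leaving what at its default:

```r
# with no 'what' argument, scan() defaults to double(), so this succeeds
x <- scan(text = "21474836470")
str(x)   # num 2.15e+10

# the value is below 2^53, so it is represented exactly as a double
x == 21474836470
```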

Pandas read csv file with float values results in weird rounding and decimal digits

Pandas uses a dedicated decimal-to-binary converter that trades accuracy for speed.

Passing float_precision='round_trip' to read_csv fixes this.


After processing your data, if you want to save it back to a csv file, you can pass
float_format="%.nf" (where n is the number of decimal places) to the corresponding to_csv method.

A full example:

import pandas as pd

df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places

How to read the exact number of decimal digits with readtable from a .csv file?

It is likely that you are running into a precision limitation of the floating-point format MATLAB uses internally. By default MATLAB stores pretty much all numbers as doubles, and an IEEE double gives you only about 15-16 significant decimal digits.

If you're not planning on performing computations on these numbers an option is to read them in as strings:

opts = detectImportOptions(filename);
opts = setvartype(opts, 'your variable', 'string'); % or 'char'
data = readtable(filename, opts);

If you want to perform computations on these numbers, things are somewhat more difficult. A "built-in" option is the variable-precision arithmetic in the Symbolic Math Toolbox; otherwise you can roll your own implementation. Consider whether you really need the extra precision before going down either of those paths.

How to specify the digits of numeric values when reading data with read.csv, read_csv or read_excel in R

This isn't an issue with readr. The full data is still there; R is just not showing all of it. The same thing happens when you use base R's read.csv():

library(tidyverse)
df.readr <- read_csv("112.8397456,35.50496106\n112.583984,37.8519194\n112.5826569,37.8602818", col_names = FALSE)

df.base <- read.csv(textConnection("112.8397456,35.50496106\n112.583984,37.8519194\n112.5826569,37.8602818"), header = FALSE)

# By default R shows 7 digits
getOption("digits")
#> [1] 7

# Both CSV files are truncated at 7 digits
df.readr
#> # A tibble: 3 × 2
#> X1 X2
#> <dbl> <dbl>
#> 1 112.8397 35.50496
#> 2 112.5840 37.85192
#> 3 112.5827 37.86028
df.base
#> V1 V2
#> 1 112.8397 35.50496
#> 2 112.5840 37.85192
#> 3 112.5827 37.86028

# Bumping up the digits shows more
options("digits" = 15)

df.readr
#> # A tibble: 3 × 2
#> X1 X2
#> <dbl> <dbl>
#> 1 112.8397456 35.50496106
#> 2 112.5839840 37.85191940
#> 3 112.5826569 37.86028180
df.base
#> V1 V2
#> 1 112.8397456 35.50496106
#> 2 112.5839840 37.85191940
#> 3 112.5826569 37.86028180

Read CSV in R appears to lose accuracy

R does not truncate your data at all. Your data have been read in successfully without losing any precision; the rounding you see on the screen is just the result of print. Try df$v3[1] and you will see what I mean.

Although you can control the number of digits printed via options(digits), there is no need to do so.
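If you do want to widen the display for a single call, a small sketch (the value here is made up for illustration):

```r
df <- data.frame(v3 = -117.2403266)  # hypothetical stand-in for the CSV data

print(df)                  # shows -117.2403 under the default 7 digits
print(df, digits = 10)     # shows -117.2403266; nothing was lost on import
sprintf("%.7f", df$v3[1])  # or format one value explicitly
```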

How to read in numbers with a comma as decimal separator?

When you check ?read.table you will probably find all the answers you need.

There are two issues with (continental) European csv files:

  1. What does the c in csv stand for? For standard csv this is a comma (,); for European csv it is a semicolon (;).
    sep is the corresponding argument in read.table.
  2. What is the character for the decimal point? For standard csv this is a period (.); for European csv it is a comma (,).
    dec is the corresponding argument in read.table.

To read standard csv use read.csv; to read European csv use read.csv2. These two functions are just wrappers around read.table that set the appropriate arguments.

If your file does not follow either of these standards set the arguments manually.
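A minimal sketch; the sample data below is made up to show both conventions:

```r
# European-style csv: semicolon separator, comma as decimal point
txt <- "x;y\n1,5;2,25\n3,75;4,5"

df <- read.csv2(text = txt)
df$x   # 1.50 3.75 -- parsed as numeric, not character

# the equivalent read.table call, spelling out the arguments manually
df2 <- read.table(text = txt, header = TRUE, sep = ";", dec = ",")
```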


