How to Read in Numbers With a Comma as Decimal Separator

How to read in numbers with a comma as decimal separator?

If you check ?read.table, you will probably find all the answers you need.

There are two issues with (continental) European csv files:

  1. What does the c in csv stand for? For standard csv this is a comma (,); for European csv it is a semicolon (;).

    sep is the corresponding argument in read.table
  2. What is the character for the decimal point? For standard csv this is a period (.); for European csv it is a comma (,).

    dec is the corresponding argument in read.table

To read standard csv use read.csv, to read European csv use read.csv2. These two functions are just wrappers to read.table that set the appropriate arguments.

If your file does not follow either of these standards, set the sep and dec arguments manually.

R: How to read in numbers with comma as a Dec separator & a Field separator? The two arguments to fread 'dec' and 'sep' are equal (',').

It might be possible to do this with the dec parameter, depending on how you're reading the file in. Here is how I would do it using data.table:

library(data.table)

# read the quoted values as text, then swap the decimal comma for a dot
dat <- fread('"Name", "Age"
"Joe", "1,2"')
dat[, Age := as.numeric(gsub(",", ".", Age))]

# Name Age
# 1: Joe 1.2

data frame with commas as decimal separator

When you read in the .csv file, you can specify the sep and dec parameters based on the file-type:

# assuming file uses ; for separating columns and , for decimal point
# Using base functions
read.csv(filename, sep = ";", dec = ",")

# Using data.table
library(data.table)
fread(filename, sep = ";", dec = ",")

You should attempt to address the source of the issue first; regular expressions and other workarounds should be used only if that fails to give the desired result.

as.numeric with comma decimal separators?

as.numeric(sub(",", ".", Input, fixed = TRUE))

should work.

Parsing numbers with a comma decimal separator in JavaScript

You need to remove the dots used as thousands separators first, then replace the decimal comma with a dot:

function isNumber(n) {
    'use strict';
    n = n.replace(/\./g, '').replace(',', '.');
    return !isNaN(parseFloat(n)) && isFinite(n);
}

Pandas: Read csv with quoted values, comma as decimal separator, and period as digit grouping symbol

What about this?

import pandas

table = pandas.read_csv("data.csv", sep=";", decimal=",")

print(table["Amount"][0]) # -36.37
print(type(table["Amount"][0])) # <class 'numpy.float64'>
print(table["Amount"][0] + 36.37) # 0.0

Pandas automatically detects a number and converts it to numpy.float64.
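For readers without the original data.csv, here is a minimal self-contained sketch of the same call. The sample rows and the column names Name and Amount are made up for illustration; only sep and decimal come from the answer above.

```python
import io

import pandas as pd

# Hypothetical sample data: semicolon-separated, quoted, comma as decimal mark.
raw = 'Name;Amount\n"rent";"-36,37"\n"refund";"12,5"\n'

# io.StringIO stands in for the file on disk.
table = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

print(table["Amount"].dtype)  # float64
print(table["Amount"].sum())
```

The quotes around the values do not prevent the conversion: the column is parsed as floating point, not left as text.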



Edit:

As @bweber discovered, some values in data.csv contained more than 3 digits and used '.' as a digit grouping symbol. For those strings to be parsed as numbers, the grouping symbol must be passed to read_csv() through the thousands argument:

table = pandas.read_csv("data.csv", sep=";", decimal=",", thousands='.')
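If pandas is not an option, roughly the same conversion can be sketched with the standard library alone. The helper names parse_number and read_amounts, the column name Amount, and the sample rows are assumptions made for illustration; this is not how pandas implements it internally.

```python
import csv
import io


def parse_number(text, decimal=",", thousands="."):
    """Convert a string such as '1.234,56' to a float.

    Grouping symbols are dropped first, then the decimal mark
    is replaced by the '.' that float() expects.
    """
    cleaned = text.strip().replace(thousands, "").replace(decimal, ".")
    return float(cleaned)


def read_amounts(handle, column="Amount"):
    """Read one numeric column from a semicolon-separated file object."""
    reader = csv.DictReader(handle, delimiter=";")
    return [parse_number(row[column]) for row in reader]


# Hypothetical sample data standing in for data.csv.
sample = io.StringIO('Name;Amount\n"rent";"-1.036,37"\n"refund";"12,5"\n')
amounts = read_amounts(sample)
print(amounts)  # [-1036.37, 12.5]
```

The order of the two replacements matters: swapping the decimal comma for a dot before removing the grouping dots would delete the decimal mark as well.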

Matlab: How to read in numbers with a comma as decimal separator?

With a test script I found that reading a file with comma decimals takes less than 1.5 times as long as reading the same data with dot decimals. My code would look like:

tmco = {'NumHeaderLines', 1      , ...
        'NumColumns'    , 5      , ...
        'ConvString'    , '%f'   , ...
        'InfoLevel'     , 0      , ...
        'ReadMode'      , 'block', ...
        'ReplaceChar'   , {',.'} } ;

A = txt2mat(filename, tmco{:});

Note the different 'ReplaceChar' value and 'ReadMode' 'block'.

I get the following results for a ~5MB file on my (not too new) machine:

  • txt2mat test comma avg. time: 0.63231
  • txt2mat test dot avg. time: 0.45715
  • textscan test dot avg. time: 0.4787

The full code of my test script:

%% generate sample files

fdot = 'C:\temp\cDot.txt';
fcom = 'C:\temp\cCom.txt';

c = 5; % # columns
r = 100000; % # rows
test = round(1e8*rand(r,c))/1e6;
tdot = sprintf([repmat('%f ', 1,c), '\r\n'], test.');
tdot = ['a header line', char([13,10]), tdot];

tcom = strrep(tdot,'.',',');

% write dot file
fid = fopen(fdot,'w');
fprintf(fid, '%s', tdot);
fclose(fid);
% write comma file
fid = fopen(fcom,'w');
fprintf(fid, '%s', tcom);
fclose(fid);

disp('-----')

%% read back sample files with txt2mat and textscan

% txt2mat-options with comma decimal sep.
tmco = {'NumHeaderLines', 1      , ...
        'NumColumns'    , 5      , ...
        'ConvString'    , '%f'   , ...
        'InfoLevel'     , 0      , ...
        'ReadMode'      , 'block', ...
        'ReplaceChar'   , {',.'} } ;

% txt2mat-options with dot decimal sep.
tmdo = {'NumHeaderLines', 1      , ...
        'NumColumns'    , 5      , ...
        'ConvString'    , '%f'   , ...
        'InfoLevel'     , 0      , ...
        'ReadMode'      , 'block'} ;

% textscan-options
tsco = {'HeaderLines'   , 1      , ...
        'CollectOutput' , true   } ;


A = txt2mat(fcom, tmco{:});
B = txt2mat(fdot, tmdo{:});

fid = fopen(fdot);
C = textscan(fid, repmat('%f',1,c) , tsco{:} );
fclose(fid);
C = C{1};

disp(['txt2mat test comma (1=Ok): ' num2str(isequal(A,test)) ])
disp(['txt2mat test dot (1=Ok): ' num2str(isequal(B,test)) ])
disp(['textscan test dot (1=Ok): ' num2str(isequal(C,test)) ])
disp('-----')

%% speed test

numTest = 20;

% A) txt2mat with comma
tic
for k = 1:numTest
    A = txt2mat(fcom, tmco{:});
    clear A
end
ttmc = toc;
disp(['txt2mat test comma avg. time: ' num2str(ttmc/numTest) ])

% B) txt2mat with dot
tic
for k = 1:numTest
    B = txt2mat(fdot, tmdo{:});
    clear B
end
ttmd = toc;
disp(['txt2mat test dot avg. time: ' num2str(ttmd/numTest) ])

% C) textscan with dot
tic
for k = 1:numTest
    fid = fopen(fdot);
    C = textscan(fid, repmat('%f',1,c) , tsco{:} );
    fclose(fid);
    C = C{1};
    clear C
end
ttsc = toc;
disp(['textscan test dot avg. time: ' num2str(ttsc/numTest) ])
disp('-----')

