How to read in numbers with a comma as decimal separator?
When you check ?read.table you will probably find all the answers you need.
There are two issues with (continental) European csv files:
- What does the c in csv stand for? For standard csv this is a comma (,); for European csv this is a semicolon (;). sep is the corresponding argument in read.table.
- What is the character for the decimal point? For standard csv this is a period (.); for European csv this is a comma (,). dec is the corresponding argument in read.table.
To read standard csv use read.csv; to read European csv use read.csv2. These two functions are just wrappers around read.table that set the appropriate arguments.
If your file does not follow either of these standards set the arguments manually.
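To make the two conventions concrete, here is a minimal Python sketch of what the sep and dec arguments control. The function name parse_row is hypothetical, and it deliberately ignores quoting: it splits a line on the field separator and swaps the decimal character for '.', since Python's float() only understands '.'.

```python
def parse_row(line: str, sep: str, dec: str) -> list[float]:
    """Split one data line on `sep` and parse each field as a float.

    `dec` is the decimal character used in the file. It is replaced by '.'
    before conversion. Quoted fields are NOT handled in this sketch.
    """
    fields = line.strip().split(sep)
    return [float(f.strip().replace(dec, ".")) for f in fields]


# The same two numbers written in both conventions
standard = "1.5,2.25"   # read.csv style:  sep=",", dec="."
european = "1,5;2,25"   # read.csv2 style: sep=";", dec=","

print(parse_row(standard, sep=",", dec="."))  # [1.5, 2.25]
print(parse_row(european, sep=";", dec=","))  # [1.5, 2.25]
```

Passing the wrong pair (for example sep="," on the European line) splits the numbers in the middle, which is the same failure you see when read.csv is used on a read.csv2 file.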
R: How to read in numbers with a comma as both the decimal separator and the field separator? The two fread arguments dec and sep would be equal (',').
It might be possible to do this using the dec parameter, depending on how you're reading the file in. Here is how I would do it using data.table:
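library(data.table)
dat <- fread('"Name", "Age"
"Joe", "1,2"')
# Age is read as the character "1,2": swap the decimal comma for a dot, then convert
dat[, Age := as.numeric(gsub(",", ".", Age))]
dat
# Name Age
# 1: Joe 1.2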
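When the comma is both the field separator and the decimal mark, the file can only be parsed because the numeric fields are quoted. As a rough Python analogue of the data.table approach above (a sketch using the standard csv module, not a port of fread), the reader respects the quotes, skipinitialspace=True drops the blank after each separator, and the decimal comma is replaced afterwards, just as gsub does in the R code.

```python
import csv
import io

# Same sample data as in the R example: quoted fields, ',' as separator AND decimal mark
raw = '"Name", "Age"\n"Joe", "1,2"\n'

# skipinitialspace=True ignores the space after each ',' so that the
# following '"' is recognised as an opening quote
reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
header = next(reader)
rows = [dict(zip(header, rec)) for rec in reader]

# The quoted "1,2" survives as one field; convert its decimal comma to a dot
for row in rows:
    row["Age"] = float(row["Age"].replace(",", "."))

print(rows)  # [{'Name': 'Joe', 'Age': 1.2}]
```

Without the quotes around "1,2" no parser could tell whether the comma separates two fields or two digits, which is why the dec argument alone cannot solve this case.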
data frame with commas as decimal separator
When you read in the .csv file, you can specify the sep and dec parameters based on the file type:
# assuming file uses ; for separating columns and , for decimal point
# Using base functions
read.csv(filename, sep = ";", dec = ",")
# Using data.table
library(data.table)
fread(filename, sep = ";", dec = ",")
You should attempt to address the source of the issue first; regular expressions and other workarounds should be used only if that fails to give the desired result.
as.numeric with comma decimal separators?
as.numeric(sub(",", ".", Input, fixed = TRUE))
should work.
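For comparison, the same idea in Python, as a sketch where inputs is a hypothetical list standing in for the R vector Input. Like sub, str.replace with a count of 1 changes only the first comma in each string, and the match is literal, as with fixed = TRUE.

```python
# Hypothetical input: strings using ',' as the decimal separator
inputs = ["3,14", "0,5", "-2,75", "10"]

# Replace only the first ',' in each string (like R's sub(..., fixed = TRUE)),
# then convert to float (like as.numeric)
values = [float(s.replace(",", ".", 1)) for s in inputs]

print(values)  # [3.14, 0.5, -2.75, 10.0]
```

One difference to keep in mind: as.numeric returns NA (with a warning) for a string it cannot parse, whereas Python's float() raises ValueError.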
Parsing numbers with a comma decimal separator in JavaScript
You need to remove the dots used as thousands separators first, then take care of the decimal comma:
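function isNumber(n) {
  'use strict';
  // Remove every '.' (thousands separator), then turn the decimal ',' into '.'
  n = n.replace(/\./g, '').replace(',', '.');
  // Accept only strings that parse as a finite number
  return !isNaN(parseFloat(n)) && isFinite(n);
}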
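The same two-step cleanup can also return the parsed value instead of a boolean. Below is a Python sketch (the name parse_decimal_comma is hypothetical) that assumes '.' is only ever used as a thousands separator and ',' only as the decimal mark; it returns None for strings that are not finite numbers.

```python
import math
from typing import Optional


def parse_decimal_comma(text: str) -> Optional[float]:
    """Parse a number like '1.234,56' (dot = thousands, comma = decimal).

    Returns None if the cleaned string is not a finite number.
    """
    # Step 1: remove every '.' used as a thousands separator
    cleaned = text.strip().replace(".", "")
    # Step 2: turn the first ',' (the decimal mark) into '.'
    cleaned = cleaned.replace(",", ".", 1)
    try:
        value = float(cleaned)
    except ValueError:
        return None
    # Reject 'inf' and 'nan', similar to the isFinite() check in the JS version
    return value if math.isfinite(value) else None


print(parse_decimal_comma("1.234,56"))  # 1234.56
print(parse_decimal_comma("abc"))       # None
```

The order of the two steps matters: swapping the comma first would leave two dots in "1.234.56", which float() rejects.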
Pandas: Read csv with quoted values, comma as decimal separator, and period as digit grouping symbol
What about this?
import pandas
table = pandas.read_csv("data.csv", sep=";", decimal=",")
print(table["Amount"][0]) # -36.37
print(type(table["Amount"][0])) # <class 'numpy.float64'>
print(table["Amount"][0] + 36.37) # 0.0
Pandas automatically detects a number and converts it to numpy.float64.
Edit:
As @bweber discovered, some values in data.csv contained more than 3 digits and used '.' as a digit grouping symbol. To convert these strings to numbers, the grouping symbol must be passed to the read_csv() method via the thousands argument:
table = pandas.read_csv("data.csv", sep=";", decimal=",", thousands='.')
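If pandas is not available, the same three settings (sep=";", decimal=",", thousands=".") can be approximated with the standard csv module. This is a sketch under assumptions: the sample text and the helper to_float are hypothetical, and it converts only a column named "Amount" rather than detecting numeric columns the way read_csv does.

```python
import csv
import io

# Hypothetical sample: ';' separates fields, values are quoted,
# ',' is the decimal mark and '.' groups thousands
raw = '"Item";"Amount"\n"a";"-36,37"\n"b";"1.234,50"\n'


def to_float(text: str, decimal: str = ",", thousands: str = ".") -> float:
    # Drop the grouping symbol first, then normalise the decimal mark
    return float(text.replace(thousands, "").replace(decimal, "."))


reader = csv.DictReader(io.StringIO(raw), delimiter=";")
amounts = [to_float(row["Amount"]) for row in reader]
print(amounts)  # [-36.37, 1234.5]
```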
Matlab: How to read in numbers with a comma as decimal separator?
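With a test script I've found that reading the file with comma decimal separators is slower than reading the file with dots by a factor of less than 1.5. My code would look like this: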
tmco = {'NumHeaderLines', 1 , ...
'NumColumns' , 5 , ...
'ConvString' , '%f' , ...
'InfoLevel' , 0 , ...
'ReadMode' , 'block', ...
'ReplaceChar' , {',.'} } ;
A = txt2mat(filename, tmco{:});
Note the different 'ReplaceChar' value and 'ReadMode' 'block'.
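The 'ReplaceChar' {',.'} option amounts to swapping every comma for a dot before the numbers are converted. The Python sketch below illustrates that replace-then-parse idea on a whole block of text at once; it is not a description of how txt2mat works internally, and the function name read_comma_block is hypothetical. It assumes, as in the test files of the script below, one header line and whitespace-separated columns, so commas never act as column separators.

```python
def read_comma_block(text: str, header_lines: int = 1) -> list[list[float]]:
    """Swap ',' for '.' in the whole text at once, skip the header lines,
    and parse the whitespace-separated columns as floats."""
    # One pass over the entire block instead of converting field by field
    converted = text.replace(",", ".")
    # splitlines() also handles the '\r\n' line endings used in the test files
    lines = converted.splitlines()[header_lines:]
    return [[float(tok) for tok in line.split()] for line in lines if line.strip()]


sample = "a header line\r\n1,500000 2,250000\r\n3,000000 4,125000\r\n"
print(read_comma_block(sample))  # [[1.5, 2.25], [3.0, 4.125]]
```

Doing a single replacement over the whole buffer is what keeps the overhead modest compared with reading a dot file directly.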
I get the following results for a ~5MB file on my (not too new) machine:
- txt2mat test comma avg. time: 0.63231
- txt2mat test dot avg. time: 0.45715
- textscan test dot avg. time: 0.4787
The full code of my test script:
%% generate sample files
fdot = 'C:\temp\cDot.txt';
fcom = 'C:\temp\cCom.txt';
c = 5; % # columns
r = 100000; % # rows
test = round(1e8*rand(r,c))/1e6;
tdot = sprintf([repmat('%f ', 1,c), '\r\n'], test.'); % '
tdot = ['a header line', char([13,10]), tdot];
tcom = strrep(tdot,'.',',');
% write dot file
fid = fopen(fdot,'w');
fprintf(fid, '%s', tdot);
fclose(fid);
% write comma file
fid = fopen(fcom,'w');
fprintf(fid, '%s', tcom);
fclose(fid);
disp('-----')
%% read back sample files with txt2mat and textscan
% txt2mat-options with comma decimal sep.
tmco = {'NumHeaderLines', 1 , ...
'NumColumns' , 5 , ...
'ConvString' , '%f' , ...
'InfoLevel' , 0 , ...
'ReadMode' , 'block', ...
'ReplaceChar' , {',.'} } ;
% txt2mat-options with dot decimal sep.
tmdo = {'NumHeaderLines', 1 , ...
'NumColumns' , 5 , ...
'ConvString' , '%f' , ...
'InfoLevel' , 0 , ...
'ReadMode' , 'block'} ;
% textscan-options
tsco = {'HeaderLines' , 1 , ...
'CollectOutput' , true } ;
A = txt2mat(fcom, tmco{:});
B = txt2mat(fdot, tmdo{:});
fid = fopen(fdot);
C = textscan(fid, repmat('%f',1,c) , tsco{:} );
fclose(fid);
C = C{1};
disp(['txt2mat test comma (1=Ok): ' num2str(isequal(A,test)) ])
disp(['txt2mat test dot (1=Ok): ' num2str(isequal(B,test)) ])
disp(['textscan test dot (1=Ok): ' num2str(isequal(C,test)) ])
disp('-----')
%% speed test
numTest = 20;
% A) txt2mat with comma
tic
for k = 1:numTest
A = txt2mat(fcom, tmco{:});
clear A
end
ttmc = toc;
disp(['txt2mat test comma avg. time: ' num2str(ttmc/numTest) ])
% B) txt2mat with dot
tic
for k = 1:numTest
B = txt2mat(fdot, tmdo{:});
clear B
end
ttmd = toc;
disp(['txt2mat test dot avg. time: ' num2str(ttmd/numTest) ])
% C) textscan with dot
tic
for k = 1:numTest
fid = fopen(fdot);
C = textscan(fid, repmat('%f',1,c) , tsco{:} );
fclose(fid);
C = C{1};
clear C
end
ttsc = toc;
disp(['textscan test dot avg. time: ' num2str(ttsc/numTest) ])
disp('-----')