How to Declare a Thousand Separator in read.csv

How can I declare a thousand separator in read.csv?

Since the question is tagged "r", I assume this is an R question.
In R, you do not need to do anything special to handle the quoted commas:

> read.csv('t.csv', header=F)
V1 V2 V3 V4
1 Sudan 15,276,000 14,098,000 13,509,000
2 Chad 209000 196000 190000

# if you want to convert them to numbers:
> df <- read.csv('t.csv', header=FALSE, stringsAsFactors=FALSE)
> df$V2 <- as.numeric(gsub(',', '', df$V2))

Read csv with numeric columns containing thousands separator

You should be able to read the data with read.csv. Here is an example:

#write data
write('Date,x,y\n"2015/08/01","71,131","20,390"\n"2015/08/02","81,599","23,273"\n"2015/08/03","79,435","21,654"\n"2015/08/04","80,733","20,924"',"test.csv")

#use "text" rather than "file" in read.csv
#perform regex substitution before using read.csv
#the outer gsub with '(?<=\\d),(\\d{3})(?!\\d)' performs the thousands separator substitution
#the inner gsub replaces all \" with '
read.csv(text=gsub('(?<=\\d),(\\d{3})(?!\\d)',
'\\1',
gsub("\\\"",
"'",
paste0(readLines("test.csv"),collapse="\n")),
perl=TRUE),
header=TRUE,
quote="'",
stringsAsFactors=FALSE)

The result

#        Date     x     y
#1 2015/08/01 71131 20390
#2 2015/08/02 81599 23273
#3 2015/08/03 79435 21654
#4 2015/08/04 80733 20924
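
The lookaround pattern is not specific to R. As a minimal sketch (the helper name strip_grouping and the sample line are made up for illustration), the same substitution can be tried in Python to see which commas it removes:

```python
import re

# Same lookaround pattern as in the R call above: a comma that follows a digit
# and is followed by three digits and then a non-digit is treated as a grouping comma.
GROUPING_COMMA = re.compile(r"(?<=\d),(\d{3})(?!\d)")

def strip_grouping(line):
    # Keep the three digits (group 1) and drop the comma in front of them.
    return GROUPING_COMMA.sub(r"\1", line)

sample = '"2015/08/01","71,131","20,390"'  # hypothetical line in the test.csv layout
print(strip_grouping(sample))
```

Note that the pattern cannot tell a grouping comma from a field delimiter between two unquoted numbers, so it is only safe when the numeric fields are quoted, as they are here.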

pandas reading CSV data formatted with comma for thousands separator

Pass thousands=',' to read_csv so that the commas are treated as thousands separators:

In [27]:
import pandas as pd
import io

t="""id;value
0;123,123
1;221,323,330
2;32,001"""
pd.read_csv(io.StringIO(t), thousands=r',', sep=';')

Out[27]:
id value
0 0 123123
1 1 221323330
2 2 32001
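
The same parameter should also work when the file itself is comma-delimited, as long as the grouped numbers are quoted. A minimal sketch, assuming a made-up sample in that layout:

```python
import io
import pandas as pd

# Hypothetical sample: comma-delimited, with quoted numbers that use ',' for grouping.
csv_text = (
    'Date,x,y\n'
    '"2015/08/01","71,131","20,390"\n'
    '"2015/08/02","81,599","23,273"\n'
)

# The quotes protect the grouping commas from being read as delimiters;
# thousands=',' then lets pandas convert the quoted text to integers.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df.dtypes)
```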

Conflict between thousand separator and date format - pandas.read_csv

I have managed to solve the problem.

from datetime import datetime
import pandas as pd

df = pd.read_csv(filepath, sep=";", header=5, decimal=",", thousands=".", parse_dates=['Datum'], date_parser=lambda x: datetime.strptime(x, '%d.%m.%Y'))
df['Datum'] = df['Datum'].dt.strftime("%d.%m.%Y")

The problem was that the thousands separator is ".", which is also the separator used in the dates. Parsing the date column explicitly and formatting it the way I wanted afterwards solved it, and now everything works.

Appreciate all help!
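
An alternative sketch for the same conflict, in case date_parser is not available (it is deprecated in recent pandas releases): read the date column as plain text so the "." in it is never treated as a grouping symbol, then convert it separately. The sample data and the Betrag column are made up for illustration; only Datum comes from the code above.

```python
import io
import pandas as pd

# Hypothetical sample: ';' delimiter, ',' decimal mark, '.' grouping symbol.
csv_text = "Datum;Betrag\n01.02.2020;1.234,56\n15.03.2020;987,10\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    decimal=",",
    thousands=".",
    dtype={"Datum": str},  # keep dates as text so '.' is not stripped as grouping
)

# Convert the date column on its own, with an explicit day.month.year format.
df["Datum"] = pd.to_datetime(df["Datum"], format="%d.%m.%Y")
print(df)
```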

Use Non breakable space as thousands separator in pandas read_csv function

pd.read_csv supports two parser engines: C and Python. According to the doc,

The C engine is faster while the python engine is currently more feature-complete.

My tests suggest that the C engine, which is the default choice in most cases, can only deal with thousands and decimal separators that are basic ASCII characters ('\x00' - '\x7f'); using '\xa0' as the thousands separator is only supported in the Python engine.

import io
import pandas as pd

data = "0,11;1\xa0279,92;1\xa0324,21;1\xa0302,14;10,65;2\xa0707,77;2\xa0951,71;2\xa0829,40"
df = pd.read_csv(io.StringIO(data), header=None, encoding="iso-8859-1",
sep=';', decimal=',', thousands='\xa0', engine="python")
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 1 non-null float64
1 1 1 non-null float64
2 2 1 non-null float64
3 3 1 non-null float64
4 4 1 non-null float64
5 5 1 non-null float64
6 6 1 non-null float64
7 7 1 non-null float64
dtypes: float64(8)
memory usage: 192.0 bytes
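
If you would rather stay on the default engine, one possible workaround is to clean the values yourself through the converters argument of read_csv. This is only a sketch: parse_number is a made-up helper, and the data is a shortened version of the sample above.

```python
import io
import pandas as pd

def parse_number(text):
    # Drop the non-breaking spaces used for grouping,
    # then switch the decimal comma to a dot before converting.
    return float(text.replace("\xa0", "").replace(",", "."))

data = "0,11;1\xa0279,92;1\xa0324,21"

# With header=None the columns are numbered, so the converter keys are column indices.
df = pd.read_csv(io.StringIO(data), header=None, sep=";",
                 converters={i: parse_number for i in range(3)})
print(df)
```

The converter runs once per cell in Python, so it may be slower than a native separator on large files.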

Most elegant way to load csv with point as thousands separator in R

Adapted from this post: Specify custom Date format for colClasses argument in read.table/read.csv

#some sample data
write.csv(data.frame(a=c("1.234,56", "1.234,56"),
b=c("1.234,56", "1.234,56")),
"test.csv", row.names=FALSE, quote=TRUE)

#define your own numeric class
setClass('myNum')
#define conversion
setAs("character", "myNum",
function(from) as.numeric(gsub(",", "\\.", gsub("\\.", "", from))))

#read data with custom colClasses
read_data = read.csv("test.csv",
stringsAsFactors=FALSE,
colClasses=c("myNum", "myNum"))
#let's try whether this is really a numeric
read_data[1, 1] * 2

#[1] 2469.12

Pandas: Read csv with quoted values, comma as decimal separator, and period as digit grouping symbol

What about this?

import pandas

table = pandas.read_csv("data.csv", sep=";", decimal=",")

print(table["Amount"][0]) # -36.37
print(type(table["Amount"][0])) # <class 'numpy.float64'>
print(table["Amount"][0] + 36.37) # 0.0

Pandas automatically detects a number and converts it to numpy.float64.



Edit:

As @bweber discovered, some values in data.csv have more than 3 digits and use '.' as the digit grouping symbol. For pandas to parse those strings as numbers, the grouping symbol must be passed to the read_csv() method:

table = pandas.read_csv("data.csv", sep=";", decimal=",", thousands='.')

How to read .csv-data containing thousand separators and special handling of zeros (in R)?

If "," is the only separator, i.e. all of the numbers are integers, you can set the dec argument of read.csv2 (or read.csv) to "," and multiply by 1000:

data <- read.csv2(
text = "id ; variable1
1 ; 2,001
1,008 ; 2,001
1,009 ; 2,002
1,01 ; 2,001
1,3 ; 2,0",
sep = ";",
stringsAsFactors = FALSE,
header = TRUE,
dec = "," )


> 1000*data
id variable1
1 1000 2001
2 1008 2001
3 1009 2002
4 1010 2001
5 1300 2000
>

Read CSV file with space as thousand-separator using pandas.read_csv

If you have non-breaking spaces, I would suggest a more aggressive regular expression with str.replace:

df.col1 = df.col1.str.replace(r'[^\d.,e+-]', '', regex=True)\
.str.replace(',', '.').astype(float)

Regex

[       # character group
^ # negation - match any character that is NOT in this group
\d # digit
. # dot
, # comma
e # 'e' - exponent
+- # signs
]
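
To see what the character group keeps and drops, here is a small sketch on plain strings with Python's re module. The helper name to_float and the sample values are made up for illustration.

```python
import re

def to_float(text):
    # Remove everything that is not a digit, dot, comma, 'e', or sign;
    # this drops ordinary and non-breaking spaces alike.
    cleaned = re.sub(r"[^\d.,e+-]", "", text)
    # Treat the remaining comma as the decimal mark.
    return float(cleaned.replace(",", "."))

print(to_float("1\xa0279,92"))
print(to_float("2 707,77"))
```

This assumes the comma is the decimal mark and that '.' is not used for grouping; a value such as "1.234,56" would end up with two dots and fail to convert.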

How to read pandas CSV file with comma separator and comma thousand separator

Try reading it with:

pd.read_csv(myfile, encoding='latin1', quotechar='"')

Each column that contains such values will be read as dtype object.
To convert those columns to float afterwards, use:

df = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce'))

Alternatively you can try:

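pd.read_csv(myfile, encoding='latin1', quotechar='"', on_bad_lines='warn')

This skips malformed lines, so you can see what was omitted from the original CSV and what caused the problem. (on_bad_lines='warn' is the pandas 1.3+ replacement for the older error_bad_lines=False, which was removed in pandas 2.0.)

For each omitted line you'll receive a warning instead of an error.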


