Convert Comma Decimal Separators to Dots Within a DataFrame

Convert comma decimal separators to dots within a DataFrame

pandas.read_csv has a decimal parameter for this (see the read_csv documentation).

For example, try:

import pandas as pd
df = pd.read_csv(Input, delimiter=";", decimal=",")
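A self-contained sketch of the same call, assuming a small semicolon-separated sample held in memory; io.StringIO stands in for the file named by Input, and the column names are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical sample: ';' separates columns, ',' marks the decimals
sample = "name;price\nA;1,5\nB;2,25\n"

df = pd.read_csv(io.StringIO(sample), delimiter=";", decimal=",")
print(df.dtypes)  # price is parsed as float64
print(df)
```

Because the parser handles the comma itself, no string replacement or type conversion is needed afterwards.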

Replace comma with dot in pandas

You need to assign the result of your operation back, as the operation isn't in place. Besides that, you can use apply, or stack and unstack, with the vectorised str.replace to do this more quickly:

In [5]:
df.apply(lambda x: x.str.replace(',','.'))

Out[5]:
          1-8        1-7
H0   0.140711   0.140711
H1     0.0999     0.0999
H2      0.001      0.001
H3   0.140711   0.140711
H4   0.140711   0.140711
H5   0.140711   0.140711
H6          0          0
H7          0          0
H8   0.140711   0.140711
H9   0.140711   0.140711
H10  0.140711  0.1125688
H11  0.140711  0.1125688
H12  0.140711  0.1125688
H13  0.140711  0.1125688
H14  0.140711   0.140711
H15  0.140711   0.140711
H16  0.140711   0.140711
H17  0.140711   0.140711
H18  0.140711   0.140711
H19  0.140711   0.140711
H20  0.140711   0.140711
H21  0.140711   0.140711
H22  0.140711   0.140711
H23  0.140711   0.140711

In [4]:
df.stack().str.replace(',','.').unstack()

Out[4]:
          1-8        1-7
H0   0.140711   0.140711
H1     0.0999     0.0999
H2      0.001      0.001
H3   0.140711   0.140711
H4   0.140711   0.140711
H5   0.140711   0.140711
H6          0          0
H7          0          0
H8   0.140711   0.140711
H9   0.140711   0.140711
H10  0.140711  0.1125688
H11  0.140711  0.1125688
H12  0.140711  0.1125688
H13  0.140711  0.1125688
H14  0.140711   0.140711
H15  0.140711   0.140711
H16  0.140711   0.140711
H17  0.140711   0.140711
H18  0.140711   0.140711
H19  0.140711   0.140711
H20  0.140711   0.140711
H21  0.140711   0.140711
H22  0.140711   0.140711
H23  0.140711   0.140711

The key thing here is to assign the result back:

df = df.stack().str.replace(',','.').unstack()
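After the replacement the values are still strings. A minimal sketch, assuming a small frame shaped like the one above (object columns 1-8 and 1-7, index labels H0, H1, ...; the values are a made-up subset), that assigns the result back and then converts it to floats:

```python
import pandas as pd

# Hypothetical frame of strings that use ',' as the decimal separator
df = pd.DataFrame(
    {"1-8": ["0,140711", "0,0999", "0"], "1-7": ["0,1125688", "0,0999", "0"]},
    index=["H0", "H1", "H6"],
)

# Assign the result back: str.replace returns a new object, it does not work in place
df = df.stack().str.replace(",", ".", regex=False).unstack()

# Convert the cleaned strings to numbers
df = df.astype(float)
print(df.dtypes)
```

regex=False makes the literal replacement explicit, which matters because the default of the regex argument has changed between pandas versions. Note that unstack may reorder the columns, so select them by name afterwards.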

Loop to convert decimal comma (,) into dot (.) to change class of data.frame columns

Attempt 1: gsub does not modify strings in place; you need to assign the result back to df[, i].

df[,i] <- gsub(",", ".", df[ , i])

Attempt 2: Right idea, but x[nm] gives you a data frame, while gsub expects a vector. Use x[, nm] instead; drop = TRUE is optional because it is the default. Also, the arguments of your function are in the wrong order: you want to apply fc over the different values of inx while keeping x = df fixed.

Try:

inx = 1:4
fc <- function(x, inx){
  nm <- names(x)[inx]
  gsub(pattern = ",", replacement = ".", x = x[, nm])
}
sapply(inx, fc, x = df)

This returns a matrix because sapply will try to simplify. If this is not desired, use lapply and wrap it in a data frame.

data.frame(lapply(inx, fc, x = df))

Or do it in one line with an anonymous function. Data frames are fundamentally lists, so you can iterate over the columns with lapply like so:

data.frame(lapply(df, function(x) gsub(",", ".", x, fixed = TRUE)))

data frame with commas as decimal separator

When you read in the .csv file, you can specify the sep and dec parameters to match the file's format:

# assuming file uses ; for separating columns and , for decimal point
# Using base functions
read.csv(filename, sep = ";", dec = ",")

# Using data.table
library(data.table)
fread(filename, sep = ";", dec = ",")

You should try to address the source of the issue first; regular expressions and other workarounds should be used only if that fails to give the desired result.

replace commas with decimal points in DataFrame columns to make them numeric

import re

import pandas as pd

for col in ['b', 'c', 'd']:
    # Replace every comma with a dot, then parse the strings as numbers
    df[col] = pd.to_numeric(df[col].apply(lambda x: re.sub(',', '.', str(x))))
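A vectorised variant of the same loop, sketched under the assumption that columns b, c and d hold comma-decimal strings (the sample frame is hypothetical). Series.str.replace with regex=False does a literal replacement across the whole column, so neither the per-element lambda nor the re module is needed:

```python
import pandas as pd

# Hypothetical frame: 'a' is a label, 'b', 'c' and 'd' hold comma-decimal strings
df = pd.DataFrame({
    "a": ["x", "y"],
    "b": ["1,5", "2,0"],
    "c": ["3,25", "4"],
    "d": ["-0,5", "10,75"],
})

for col in ["b", "c", "d"]:
    # astype(str) mirrors str(x) in the loop; regex=False treats ',' literally
    df[col] = pd.to_numeric(df[col].astype(str).str.replace(",", ".", regex=False))

print(df.dtypes)
```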

Search and replace dots and commas in pandas dataframe

If possible, the best approach is to use the parameters of read_csv:

df = pd.read_csv(file, thousands='.', decimal=',')

If that is not possible, replace should help:

df['col2'] = (df['col2'].replace(r'\.', '', regex=True)  # remove thousands separators
                        .replace(',', '.', regex=True)    # comma -> decimal point
                        .astype(float))
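A runnable sketch of both routes, assuming numbers written like 1.234,5 (dot for thousands, comma for decimals) in a column named col2 of a semicolon-separated file; the sample text is made up:

```python
import io

import pandas as pd

# Hypothetical CSV: ';' between columns, '.' for thousands, ',' for decimals
sample = "col1;col2\na;1.234,5\nb;12,75\n"

# Route 1: let read_csv parse the numbers directly
df1 = pd.read_csv(io.StringIO(sample), sep=";", thousands=".", decimal=",")

# Route 2: read everything as text, then clean the column afterwards
df2 = pd.read_csv(io.StringIO(sample), sep=";", dtype=str)
df2["col2"] = (df2["col2"].replace(r"\.", "", regex=True)  # drop thousands dots
                          .replace(",", ".", regex=True)   # comma -> decimal point
                          .astype(float))

print(df1["col2"].tolist(), df2["col2"].tolist())
```

The order of the two replace calls matters: the thousands dots must be removed before the decimal comma is turned into a dot, otherwise the new decimal point would be deleted too.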

Replacing dot with comma from a dataframe using Python

Where does the dataframe come from - how was it generated? Was it imported from a CSV file?

Your code works if you apply it to columns that are strings, as long as you remember to write df = df.apply(...) and not just df.apply(...), e.g.:

import pandas as pd
df = pd.DataFrame()
df['a'] =['some . text', 'some . other . text']
df = df.apply(lambda x: x.str.replace('.', ',', regex=False))
print(df)

However, you are trying to do this with numbers, not strings.
To be precise, the other question is: what are the dtypes of your dataframe?
If you type

df.dtypes

what's the output?

I presume your columns are numeric and not strings, right? After all, if they are numbers they should be stored as such in your dataframe.

The next question: how are you exporting this table to Excel?

If you are saving a CSV file, pandas' to_csv() method has a decimal argument that lets you specify the decimal separator (typically a dot in the English-speaking world and a comma in many countries in continental Europe). See the to_csv documentation for the full syntax.
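A minimal sketch of to_csv with a comma as the decimal separator. It writes to an in-memory buffer instead of a file, and the column name Open and its values are made up. sep=";" is used so that the decimal commas cannot be confused with the column separator:

```python
import io

import pandas as pd

df = pd.DataFrame({"Open": [1.5, 2.25]})

buf = io.StringIO()
# decimal="," only changes how floats are written; the data stays numeric in pandas
df.to_csv(buf, sep=";", decimal=",", index=False)
print(buf.getvalue())
```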

If you are using the to_excel() method, it shouldn't matter because Excel should treat it internally as a number, and how it displays it (whether with a dot or comma for decimal separator) will typically depend on the options set in your computer.

Please clarify how you are exporting the data and what happens when you open it in Excel: does Excel treat it as a string? Or as a number, but you would like to see a different separator for the decimals?

Also look here for how to change decimal separators in Excel: https://www.officetooltips.com/excel_2016/tips/change_the_decimal_point_to_a_comma_or_vice_versa.html

UPDATE

OP, you still haven't explained where the dataframe comes from. Do you import it from an external source? Do you create or calculate it yourself?
The fact that the columns have dtype object makes me think they are either stored as strings, or that some rows are numeric and some are not.

What happens if you try to convert a column to float?

df['Open'] = df['Open'].astype('float64')

If the entire column should be numeric but it's not, then start by cleansing your data.
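One way to start that cleansing, sketched under the assumption that the problem values are strings in a column named Open (the sample values are hypothetical): pd.to_numeric with errors="coerce" turns anything it cannot parse into NaN, which makes the offending rows easy to list before deciding how to fix them.

```python
import pandas as pd

# Hypothetical object column mixing parseable numbers and problem strings
df = pd.DataFrame({"Open": ["1.5", "2,25", "n/a", "3"]})

converted = pd.to_numeric(df["Open"], errors="coerce")

# Rows that were present but could not be parsed as numbers
bad_rows = df[converted.isna() & df["Open"].notna()]
print(bad_rows)
```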

Second question: what happens when you use Excel to open the file you have just created? If Excel displays a comma, note that the character Excel uses to separate decimals depends on the Windows/Mac/Excel settings, not on how pandas created the file. Have you tried the link I gave above? Can you change how Excel displays decimals? Also, does Excel treat those numbers as numbers or as strings?

Convert commas to dots in txt with python that also contains scientific number formatting

You may try reading the entire file into a Python string, and then doing a global replacement of comma to dot:

with open('nums.csv', 'r') as file:
    # Comma decimals -> dots, spaces between columns -> ';'
    data = file.read().replace(',', '.').replace(' ', ';')

with open("nums_out.csv", "w") as out_file:
    out_file.write(data)

For a more robust solution, in case two columns may be separated by more than one whitespace character, use re.sub:

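import re

with open('nums.csv', 'r') as file:
    data = file.read().replace(',', '.')
# Strip leading spaces/tabs on every line (a fixed-width lookbehind such as (?<=\n|^) is rejected by re)
data = re.sub(r'^[^\S\r\n]+', '', data, flags=re.MULTILINE)
# Replace each run of spaces/tabs after a non-space character with ';'
data = re.sub(r'(?<=\S)[^\S\r\n]+', ';', data)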
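To check the whitespace handling without touching any files, substitutions like these can be wrapped in a small function and run on a string. The function name normalise and the sample text are hypothetical; the sketch assumes the input uses ',' for decimals and spaces or tabs between columns, including values in scientific notation:

```python
import re


def normalise(text: str) -> str:
    """Turn comma decimals into dots and whitespace-separated columns into ';'."""
    text = text.replace(",", ".")
    # Drop spaces/tabs at the start of every line
    text = re.sub(r"^[^\S\r\n]+", "", text, flags=re.MULTILINE)
    # Collapse each run of spaces/tabs that follows a non-space character into ';'
    return re.sub(r"(?<=\S)[^\S\r\n]+", ";", text)


sample = "  1,5   2,0E-3\n3,25\t-4,1e+2\n"
print(normalise(sample))
```

The character class [^\S\r\n] means "whitespace other than a line break", so newlines are preserved and each line of the input stays one row of the output.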