Convert Python Strings into Floats Explicitly Using the Comma or the Point as Separators

Convert Python strings into floats explicitly using the comma or the point as separators

because I don't know the locale settings

You could look that up using the locale module:

>>> import locale
>>> locale.nl_langinfo(locale.RADIXCHAR)
'.'

or, since nl_langinfo() is not available on all systems (notably Windows):

>>> locale.localeconv()['decimal_point']
'.'

Using that, your code could become:

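import locale

# Python starts in the "C" locale; adopt the user's settings first,
# otherwise decimal_point is always '.'.
locale.setlocale(locale.LC_NUMERIC, '')
_locale_radix = locale.localeconv()['decimal_point']

def read_float_with_comma(num):
    if _locale_radix != '.':
        num = num.replace(_locale_radix, ".")
    return float(num)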

Better still, the same module has a conversion function for you, called atof():

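import locale

# Adopt the user's locale; in the default "C" locale atof() expects '.'.
locale.setlocale(locale.LC_NUMERIC, '')

def read_float_with_comma(num):
    return locale.atof(num)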
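
One caveat with atof() is that setlocale() changes process-wide state. The following is a minimal sketch of switching the numeric locale only for the duration of one conversion. The helper names numeric_locale and atof_in_locale are hypothetical, and a locale name such as 'de_DE.UTF-8' only works if that locale is installed on the system.

```python
import locale
from contextlib import contextmanager


@contextmanager
def numeric_locale(name):
    """Temporarily switch LC_NUMERIC, then restore the previous setting."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query only, no change
    try:
        locale.setlocale(locale.LC_NUMERIC, name)  # raises locale.Error if unknown
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


def atof_in_locale(text, name):
    """Parse text with locale.atof() under the named numeric locale."""
    with numeric_locale(name):
        return locale.atof(text)


# The "C" locale always exists and uses '.' as the decimal point.
print(atof_in_locale("3.14", "C"))
```

Because the locale is global to the process, this is not safe to call from several threads at once. If the requested locale is missing, setlocale() raises locale.Error and the previous setting is still restored.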

How can I convert a string with dot and comma into a float in Python

Just remove the commas (here the thousands separators) with replace():

float("123,456.908".replace(',',''))
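
When you know which convention the input uses, the same idea can be wrapped in a small helper that takes the separators explicitly. This is only a sketch; the name to_float and its parameters are hypothetical, not part of any library.

```python
def to_float(text, decimal=".", thousands=","):
    """Convert text to float given explicit decimal and thousands separators."""
    cleaned = text.strip()
    if thousands:
        # Drop grouping characters first so they cannot be mistaken for the radix.
        cleaned = cleaned.replace(thousands, "")
    if decimal != ".":
        cleaned = cleaned.replace(decimal, ".")
    return float(cleaned)


print(to_float("123,456.908"))                              # English style
print(to_float("123.456,908", decimal=",", thousands="."))  # German style
```

Both calls parse to the same value, because the caller states the convention instead of relying on the locale.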

python pandas - generic ways to deal with commas in string to float conversion with astype()

I fixed the problem with the following workaround. It might still break in some cases, but I did not find a way to tell pandas astype() to accept a comma as the decimal separator. If someone has a pandas-only solution, please let me know:

import locale
from datetime import datetime
import pandas as pd

data = {
    "col_str": ["a", "b", "c"],
    "col_int": ["1", "2", "3"],
    "col_float": ["1,2", "3,2342", "97837,8277"],
    "col_float2": ["13,2", "3234,2342", "263,8277"],
    "col_date": [datetime(2020, 8, 1, 0, 3, 4).isoformat(),
                 datetime(2020, 8, 2, 2, 4, 5).isoformat(),
                 datetime(2020, 8, 3, 6, 8, 4).isoformat()
                 ]
}

conversion_dict = {
    "col_str": str,
    "col_int": int,
    "col_float": float,
    "col_float2": float,
    # pandas 2.x rejects the unit-less "datetime64"; give the unit explicitly
    "col_date": "datetime64[ns]"
}

df = pd.DataFrame(data=data)
throw_error = True

try:
    df = df.astype(conversion_dict, errors="raise")
except ValueError as e:
    error_message = str(e).strip().upper()
    error_search = "COULD NOT CONVERT STRING TO FLOAT:"
    # Compare error messages to catch only the string-to-float error, because pandas raises
    # plain ValueErrors that are not datatype specific. This is hacky: error messages can change.
    if error_message[:len(error_search)] == error_search:
        # convert everything else and ignore errors for the float columns
        df = df.astype(conversion_dict, errors="ignore")
        # go over the conversion dict
        for key, value in conversion_dict.items():
            # print(str(key) + ":" + str(value) + ":" + str(df[key].dtype))
            # Only touch convert-to-float columns that are not already float64;
            # without this check, .str.replace() raises on non-string columns.
            if (value == float or value == "float") and df[key].dtype != "float64":
                # df[key].apply(locale.atof) or anything locale related is platform dependent
                # and therefore bad in my opinion
                # locale settings for atof
                # WINDOWS: locale.setlocale(locale.LC_ALL, 'deu_deu')
                # UNIX: locale.setlocale(locale.LC_ALL, 'de_DE')
                df[key] = pd.to_numeric(df[key].str.replace(',', '.'))
    else:
        if throw_error:
            # or do whatever is best suited for your use case
            raise ValueError(str(e))
        else:
            df = df.astype(conversion_dict, errors="ignore")

print(df.dtypes)
print(df)
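
If the data comes from a delimited text file rather than an in-memory DataFrame (an assumption, since the question does not say), pandas can handle the comma at read time through the decimal parameter of read_csv(). The sketch below uses an in-memory string with made-up sample rows in place of a real file.

```python
import io

import pandas as pd

# Semicolon-separated sample data with a comma as the decimal mark.
raw = "col_int;col_float\n1;1,2\n2;3,2342\n3;97837,8277\n"

# decimal="," tells the parser which character marks the fractional part.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")

print(df.dtypes)
print(df)
```

read_csv() also accepts a thousands parameter for inputs that use grouping characters. For columns that are already loaded as strings, the replace-then-to_numeric approach above remains the straightforward route.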

How can I get Python to recognize comma as the decimal point in user input?

You can use the replace method:

prvotna_cena = float(input('Prosim vnesi prvotno ceno:').replace(',','.'))

Locale-independent string to float conversion in python

You can make some assumptions about which character is the thousands separator and which is the decimal point. However, there is one case where you cannot know for sure what to do:

  • Look for the last character that is . or ,. If it occurs more than once, the number does not have a decimal point and that character is the thousands separator
  • If the string contains exactly one of each, the last one is the decimal point
  • If the string contains only one point/comma, you are pretty much out of luck: 123.456 or 123,456 might be the number 123456 or 123.456. However, with a number like 123.45 - i.e. the number of digits after the potential thousands separator not being a multiple of three - you can assume that it's a decimal point.
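
The rules above can be sketched as a single function. The name guess_float is hypothetical, and two choices are assumptions of this sketch rather than part of the answer: the ambiguous case raises ValueError instead of guessing, and a lone separator counts as ambiguous only when exactly three digits follow it.

```python
def guess_float(text):
    """Parse a number whose decimal/thousands convention is unknown."""
    s = text.strip()
    last_dot, last_comma = s.rfind("."), s.rfind(",")
    if last_dot == -1 and last_comma == -1:
        return float(s)  # no separators at all

    # 'last' is whichever separator appears furthest to the right.
    last, other = (".", ",") if last_dot > last_comma else (",", ".")

    if s.count(last) > 1:
        # Repeated separator: it groups thousands, so there is no decimal part.
        if other in s:
            raise ValueError(f"malformed number: {text!r}")
        return float(s.replace(last, ""))

    if other in s:
        # Both kinds present: the last one is the decimal point.
        return float(s.replace(other, "").replace(last, "."))

    # Exactly one separator in the whole string.
    digits_after = len(s) - s.rfind(last) - 1
    if digits_after == 3:
        raise ValueError(f"ambiguous separator in {text!r}")
    return float(s.replace(last, "."))


for sample in ["1.234.567", "1.234,56", "1,234.56", "123,45"]:
    print(sample, "->", guess_float(sample))
```

Raising on inputs such as "123.456" pushes the decision back to the caller, who may know the data source well enough to choose a convention.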

replace dot and comma with each other from a number str in python

You might want to rely on switching the locale of the number, should such an API exist within Python. If you must do the substitution manually, then use re.sub with lambda callback:

import re

amt = '1.233.456.778,00'
output = re.sub(r'[.,]', lambda x: '.' if x.group() == ',' else ',', amt)
print(output) # 1,233,456,778.00

Note that this approach gets around the problem with your current approach, namely that if the string happens to have an @ in it, then your logic would fail.
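
An alternative sketch without regular expressions uses str.translate(), which maps each character independently and therefore also avoids a placeholder character. The names SWAP_SEPARATORS and swap_separators are hypothetical.

```python
# Translation table: '.' becomes ',' and ',' becomes '.' in a single pass.
SWAP_SEPARATORS = str.maketrans(".,", ",.")


def swap_separators(text):
    """Exchange dots and commas in a numeric string."""
    return text.translate(SWAP_SEPARATORS)


amt = "1.233.456.778,00"
swapped = swap_separators(amt)
print(swapped)

# Once the string is in English notation, dropping the commas yields a float.
value = float(swapped.replace(",", ""))
print(value)
```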

How do I parse a string to a float or int?

>>> a = "545.2222"
>>> float(a)
545.2222
>>> int(float(a))
545
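
Note that int(float(a)) truncates the fractional part. If the goal is to keep integers as int and everything else as float, a small helper can try int() first. This is a sketch and the name parse_number is hypothetical.

```python
def parse_number(text):
    """Return an int when the text is an integer literal, else a float."""
    try:
        return int(text)
    except ValueError:
        # Not a plain integer; float() raises ValueError itself if this fails too.
        return float(text)


print(type(parse_number("545")).__name__)
print(type(parse_number("545.2222")).__name__)
```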

How to format numbers floats with comma and dot

As you've noticed, the display option only affects the display. So you need to do an explicit conversion, possibly using locale.format_string() (the older locale.format() was removed in Python 3.12), if you want to actually convert the column to a string.

The locale methods are also somewhat limited in what they can do, so I'd recommend using the Babel module for internationalization and localization. Babel has a richer API and it actually ships localization data you can use (so you don't need to depend on it being available in your O.S.) It also includes data about currencies, so it can do that conversion for you as well.

You can install Babel with:

pip install Babel

And then you can convert your columns to use Brazilian Real currency with:

from babel.numbers import format_currency

df['close'] = df['close'].apply(
    lambda v: format_currency(v, 'BRL', locale='pt_BR'),
)

Or, to convert both "high" and "close" together (on pandas 2.1 and later, DataFrame.map replaces the deprecated applymap):

df[['high', 'close']] = df[['high', 'close']].applymap(
    lambda v: format_currency(v, 'BRL', locale='pt_BR'),
)

If you're generating HTML from the DataFrame (for example, in a Jupyter notebook), you can use the Styling API to apply the format only when rendering the DataFrame, keeping the underlying data as floats and not strings:

df.style.format(
    lambda v: format_currency(v, 'BRL', locale='pt_BR'),
    subset=['high', 'close'],
)
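
If adding a dependency is not an option and only the separators matter (no currency symbol), a locale-free sketch is to format with Python's own format specification and then swap the two characters. The name format_decimal_comma is hypothetical, and this does not reproduce everything Babel does.

```python
# Maps ',' to '.' and '.' to ',' in one pass.
_SWAP = str.maketrans(",.", ".,")


def format_decimal_comma(value, places=2):
    """Format a number with '.' as thousands separator and ',' as decimal mark."""
    english_style = f"{value:,.{places}f}"  # e.g. '1,234,567.89'
    return english_style.translate(_SWAP)


print(format_decimal_comma(1234567.891))
print(format_decimal_comma(-1234.5))
```

The result is a string, so, as with the Babel approach, it is best applied at display time while the underlying column stays numeric.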

