Convert Number Strings With Commas in Pandas Dataframe to Float

Converting values with commas in a pandas dataframe to floats.

Convert 'Date' using to_datetime; for the other column, use str.replace(',', '.') and then cast the type:

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df['Close_y'] = df['Close_y'].str.replace(',','.').astype(float)

replace looks for exact matches of the whole value, while what you're trying to do is replace any occurrence inside the string, which is what str.replace does.
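A minimal sketch of that distinction, with made-up values:

```python
import pandas as pd

s = pd.Series(['1,5', '2,75'])

# Series.replace matches whole cell values, so nothing changes here
print(s.replace(',', '.').tolist())                    # ['1,5', '2,75']

# str.replace substitutes inside each string, which is what we want
print(s.str.replace(',', '.').astype(float).tolist())  # [1.5, 2.75]
```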

Convert Pandas Dataframe to Float with commas and negative numbers

It seems you need to replace , with empty strings:

print (df)
2016-10-31 2,144.78
2016-07-31 2,036.62
2016-04-30 1,916.60
2016-01-31 1,809.40
2015-10-31 1,711.97
2016-01-31 6,667.22
2015-01-31 5,373.59
2014-01-31 4,071.00
2013-01-31 3,050.20
2016-09-30 -0.06
2016-06-30 -1.88
2016-03-31
2015-12-31 -0.13
2015-09-30
2015-12-31 -0.14
2014-12-31 0.07
2013-12-31 0
2012-12-31 0
Name: val, dtype: object
print (pd.to_numeric(df.str.replace(',',''), errors='coerce'))
2016-10-31 2144.78
2016-07-31 2036.62
2016-04-30 1916.60
2016-01-31 1809.40
2015-10-31 1711.97
2016-01-31 6667.22
2015-01-31 5373.59
2014-01-31 4071.00
2013-01-31 3050.20
2016-09-30 -0.06
2016-06-30 -1.88
2016-03-31 NaN
2015-12-31 -0.13
2015-09-30 NaN
2015-12-31 -0.14
2014-12-31 0.07
2013-12-31 0.00
2012-12-31 0.00
Name: val, dtype: float64
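A condensed reproduction of the idea, with a few sample values: errors='coerce' turns anything unparseable, such as empty strings, into NaN.

```python
import pandas as pd

s = pd.Series(['2,144.78', '-0.06', ''], name='val')

# strip thousands separators, then coerce unparseable values to NaN
out = pd.to_numeric(s.str.replace(',', ''), errors='coerce')
print(out.tolist())  # [2144.78, -0.06, nan]
```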

EDIT:

If you use append, it is possible that the dtype of the first df is float and of the second is object, so you need to cast to str first because you get mixed data: e.g. the first rows are floats and the last rows are strings:

print (pd.to_numeric(df.astype(str).str.replace(',',''), errors='coerce'))
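A small sketch of why the astype(str) is needed, using a hypothetical mixed Series:

```python
import pandas as pd

# a mixed Series: floats concatenated with strings -> dtype object
s = pd.concat([pd.Series([-0.06, 0.07]), pd.Series(['2,144.78'])])

# s.str.replace(...) alone would yield NaN for the float elements,
# so normalize everything to str first
out = pd.to_numeric(s.astype(str).str.replace(',', ''), errors='coerce')
print(out.tolist())  # [-0.06, 0.07, 2144.78]
```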

You can also check the types with:

print (df.apply(type))
2016-09-30 <class 'float'>
2016-06-30 <class 'float'>
2015-12-31 <class 'float'>
2014-12-31 <class 'float'>
2014-01-31 <class 'str'>
2013-01-31 <class 'str'>
2016-09-30 <class 'str'>
2016-06-30 <class 'str'>
2016-03-31 <class 'str'>
2015-12-31 <class 'str'>
2015-09-30 <class 'str'>
2015-12-31 <class 'str'>
2014-12-31 <class 'str'>
2013-12-31 <class 'str'>
2012-12-31 <class 'str'>
Name: val, dtype: object

EDIT1:

If you need to apply the solution to all columns of the DataFrame, use apply:

df1 = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce'))
print (df1)
Revenue Other, Net
Date
2016-09-30 24.73 -0.06
2016-06-30 18.73 -1.88
2016-03-31 17.56 NaN
2015-12-31 29.14 -0.13
2015-09-30 22.67 NaN
2015-12-31 95.85 -0.14
2014-12-31 84.58 0.07
2013-12-31 58.33 0.00
2012-12-31 29.63 0.00
2016-09-30 243.91 -0.80
2016-06-30 230.77 -1.12
2016-03-31 216.58 1.32
2015-12-31 206.23 -0.05
2015-09-30 192.82 -0.34
2015-12-31 741.15 -1.37
2014-12-31 556.28 -1.90
2013-12-31 414.51 -1.48
2012-12-31 308.82 0.10
2016-10-31 2144.78 41.98
2016-07-31 2036.62 35.00
2016-04-30 1916.60 -11.66
2016-01-31 1809.40 27.09
2015-10-31 1711.97 -3.44
2016-01-31 6667.22 14.13
2015-01-31 5373.59 -18.69
2014-01-31 4071.00 -4.87
2013-01-31 3050.20 -5.70

print(df1.dtypes)
Revenue float64
Other, Net float64
dtype: object

But if you need to convert only some columns of the DataFrame, use a subset with apply:

cols = ['Revenue', ...]
df[cols] = df[cols].apply(lambda x: pd.to_numeric(x.astype(str)
                                                  .str.replace(',',''), errors='coerce'))
print (df)
Revenue Other, Net
Date
2016-09-30 24.73 -0.06
2016-06-30 18.73 -1.88
2016-03-31 17.56
2015-12-31 29.14 -0.13
2015-09-30 22.67
2015-12-31 95.85 -0.14
2014-12-31 84.58 0.07
2013-12-31 58.33 0
2012-12-31 29.63 0
2016-09-30 243.91 -0.8
2016-06-30 230.77 -1.12
2016-03-31 216.58 1.32
2015-12-31 206.23 -0.05
2015-09-30 192.82 -0.34
2015-12-31 741.15 -1.37
2014-12-31 556.28 -1.9
2013-12-31 414.51 -1.48
2012-12-31 308.82 0.1
2016-10-31 2144.78 41.98
2016-07-31 2036.62 35
2016-04-30 1916.60 -11.66
2016-01-31 1809.40 27.09
2015-10-31 1711.97 -3.44
2016-01-31 6667.22 14.13
2015-01-31 5373.59 -18.69
2014-01-31 4071.00 -4.87
2013-01-31 3050.20 -5.7

print(df.dtypes)
Revenue float64
Other, Net object
dtype: object

EDIT2:

Solution for your bonus problem:

df = pd.DataFrame({'A':['q','e','r'],
                   'B':['4','5','q'],
                   'C':[7,8,9.0],
                   'D':['1,000','3','50,000'],
                   'E':['5','3','6'],
                   'F':['w','e','r']})

print (df)
   A  B    C       D  E  F
0  q  4  7.0   1,000  5  w
1  e  5  8.0       3  3  e
2  r  q  9.0  50,000  6  r
#first apply the original solution
df1 = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce'))
print (df1)
    A    B    C      D  E   F
0 NaN  4.0  7.0   1000  5 NaN
1 NaN  5.0  8.0      3  3 NaN
2 NaN  NaN  9.0  50000  6 NaN

#mask of columns where all values are NaN - these are the string columns
mask = df1.isnull().all()
print (mask)
A     True
B    False
C    False
D    False
E    False
F     True
dtype: bool
#restore the original values in the string columns
df1.loc[:, mask] = df1.loc[:, mask].combine_first(df)
print (df1)
   A    B    C      D  E  F
0  q  4.0  7.0   1000  5  w
1  e  5.0  8.0      3  3  e
2  r  NaN  9.0  50000  6  r

python pandas - generic ways to deal with commas in string to float conversion with astype()

I fixed the problem with the following workaround. It might still break in some cases, but I did not find a way to tell pandas' astype() that a comma is OK. If someone has a solution using pandas only, please let me know:

import locale
from datetime import datetime
import pandas as pd

data = {
    "col_str": ["a", "b", "c"],
    "col_int": ["1", "2", "3"],
    "col_float": ["1,2", "3,2342", "97837,8277"],
    "col_float2": ["13,2", "3234,2342", "263,8277"],
    "col_date": [datetime(2020, 8, 1, 0, 3, 4).isoformat(),
                 datetime(2020, 8, 2, 2, 4, 5).isoformat(),
                 datetime(2020, 8, 3, 6, 8, 4).isoformat()]
}

conversion_dict = {
    "col_str": str,
    "col_int": int,
    "col_float": float,
    "col_float2": float,
    "col_date": "datetime64"
}

df = pd.DataFrame(data=data)
throw_error = True

try:
    df = df.astype(conversion_dict, errors="raise")
except ValueError as e:
    error_message = str(e).strip().upper()
    error_search = "COULD NOT CONVERT STRING TO FLOAT:"
    # compare error messages to catch only the string-to-float error, because pandas
    # raises plain ValueErrors that are not datatype specific. This is somewhat hacky
    # because error messages could change between versions.
    if error_message[:len(error_search)] == error_search:
        # convert everything else and ignore errors for the float columns
        df = df.astype(conversion_dict, errors="ignore")
        # go over the conversion dict
        for key, value in conversion_dict.items():
            # print(str(key) + ":" + str(value) + ":" + str(df[key].dtype))
            # only convert the to-float columns that are not already float64;
            # without this check, .str.replace() raises on non-string columns
            if (value == float or value == "float") and df[key].dtype != "float64":
                # df[key].apply(locale.atof) or anything locale-related is platform
                # dependent and therefore bad in my opinion.
                # locale settings for atof:
                #   WINDOWS: locale.setlocale(locale.LC_ALL, 'deu_deu')
                #   UNIX:    locale.setlocale(locale.LC_ALL, 'de_DE')
                df[key] = pd.to_numeric(df[key].str.replace(',', '.'))
    else:
        if throw_error:
            # or do whatever is best suited for your use case
            raise ValueError(str(e))
        else:
            df = df.astype(conversion_dict, errors="ignore")

print(df.dtypes)
print(df)

Converting string variable with double commas into float?

If you always have 2 decimal digits:

df['min'] = pd.to_numeric(df['min'].str.replace('.', '', regex=False)).div(100)

output (as new column min2 for clarity):

        min     min2
0      9.50     9.50
1     10.00    10.00
2      3.45     3.45
3  1.095.50  1095.50
4     13.25    13.25
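A condensed sketch of the trick, with sample values assumed from the output above: removing every dot leaves an integer number of hundredths, and dividing by 100 restores the decimal point.

```python
import pandas as pd

s = pd.Series(['9.50', '1.095.50'])

# drop every dot, then shift the decimal point back two places
out = pd.to_numeric(s.str.replace('.', '', regex=False)).div(100)
print(out.tolist())  # [9.5, 1095.5]
```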

Pandas convert numbers with a comma instead of the point for the decimal separator from objects to numbers

You can replace , with .:

df['ColumnName'] = pd.to_numeric(df['ColumnName'].str.replace(',', '.'))

On another note, if you read the data with pd.read_csv, there is the option decimal=','.

How can I convert a string with dot and comma into a float in Python

Just remove the , with replace():

float("123,456.908".replace(',',''))

