Converting values with commas in a pandas dataframe to floats.
Convert 'Date' using to_datetime; for the other column, use str.replace(',', '.') and then cast the type:
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df['Close_y'] = df['Close_y'].str.replace(',','.').astype(float)
replace looks for exact (whole-value) matches; what you're trying to do is replace any occurrence within the string, which is what str.replace does.
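The difference between the two methods is easy to miss; a minimal sketch with made-up values:

```python
import pandas as pd

s = pd.Series(['1,5', '2,25'])

# Series.replace matches whole cell values by default,
# so '1,5' is not changed because it does not equal ','
print(s.replace(',', '.'))

# str.replace substitutes every occurrence inside each string
print(s.str.replace(',', '.', regex=False).astype(float))
```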
Convert Pandas Dataframe to Float with commas and negative numbers
It seems you need to replace , with empty strings:
print (df)
2016-10-31 2,144.78
2016-07-31 2,036.62
2016-04-30 1,916.60
2016-01-31 1,809.40
2015-10-31 1,711.97
2016-01-31 6,667.22
2015-01-31 5,373.59
2014-01-31 4,071.00
2013-01-31 3,050.20
2016-09-30 -0.06
2016-06-30 -1.88
2016-03-31
2015-12-31 -0.13
2015-09-30
2015-12-31 -0.14
2014-12-31 0.07
2013-12-31 0
2012-12-31 0
Name: val, dtype: object
print (pd.to_numeric(df.str.replace(',',''), errors='coerce'))
2016-10-31 2144.78
2016-07-31 2036.62
2016-04-30 1916.60
2016-01-31 1809.40
2015-10-31 1711.97
2016-01-31 6667.22
2015-01-31 5373.59
2014-01-31 4071.00
2013-01-31 3050.20
2016-09-30 -0.06
2016-06-30 -1.88
2016-03-31 NaN
2015-12-31 -0.13
2015-09-30 NaN
2015-12-31 -0.14
2014-12-31 0.07
2013-12-31 0.00
2012-12-31 0.00
Name: val, dtype: float64
EDIT:
If you use append, it is possible that the dtype of the first df is float and the second is object, so you need to cast to str first, because you get a mixed Series - e.g. the first rows are floats and the last rows are strings:
print (pd.to_numeric(df.astype(str).str.replace(',',''), errors='coerce'))
You can also check the types with:
print (df.apply(type))
2016-09-30 <class 'float'>
2016-06-30 <class 'float'>
2015-12-31 <class 'float'>
2014-12-31 <class 'float'>
2014-01-31 <class 'str'>
2013-01-31 <class 'str'>
2016-09-30 <class 'str'>
2016-06-30 <class 'str'>
2016-03-31 <class 'str'>
2015-12-31 <class 'str'>
2015-09-30 <class 'str'>
2015-12-31 <class 'str'>
2014-12-31 <class 'str'>
2013-12-31 <class 'str'>
2012-12-31 <class 'str'>
Name: val, dtype: object
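A minimal sketch of that mixed case (built with pd.concat here, since DataFrame.append has been removed in recent pandas; the values are made up):

```python
import pandas as pd

s1 = pd.Series([2144.78, 2036.62])           # already floats
s2 = pd.Series(['1,809.40', '-0.06'])        # still strings
s = pd.concat([s1, s2], ignore_index=True)   # mixed object Series

# without astype(str), .str.replace would turn the float rows into NaN
out = pd.to_numeric(s.astype(str).str.replace(',', '', regex=False),
                    errors='coerce')
print(out)
```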
EDIT1:
If you need to apply the solution to all columns of the DataFrame, use apply:
df1 = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce'))
print (df1)
Revenue Other, Net
Date
2016-09-30 24.73 -0.06
2016-06-30 18.73 -1.88
2016-03-31 17.56 NaN
2015-12-31 29.14 -0.13
2015-09-30 22.67 NaN
2015-12-31 95.85 -0.14
2014-12-31 84.58 0.07
2013-12-31 58.33 0.00
2012-12-31 29.63 0.00
2016-09-30 243.91 -0.80
2016-06-30 230.77 -1.12
2016-03-31 216.58 1.32
2015-12-31 206.23 -0.05
2015-09-30 192.82 -0.34
2015-12-31 741.15 -1.37
2014-12-31 556.28 -1.90
2013-12-31 414.51 -1.48
2012-12-31 308.82 0.10
2016-10-31 2144.78 41.98
2016-07-31 2036.62 35.00
2016-04-30 1916.60 -11.66
2016-01-31 1809.40 27.09
2015-10-31 1711.97 -3.44
2016-01-31 6667.22 14.13
2015-01-31 5373.59 -18.69
2014-01-31 4071.00 -4.87
2013-01-31 3050.20 -5.70
print(df1.dtypes)
Revenue float64
Other, Net float64
dtype: object
But if you need to convert only some columns of the DataFrame, use a subset and apply:
cols = ['Revenue', ...]
df[cols] = df[cols].apply(lambda x: pd.to_numeric(x.astype(str)
                                     .str.replace(',',''), errors='coerce'))
print (df)
Revenue Other, Net
Date
2016-09-30 24.73 -0.06
2016-06-30 18.73 -1.88
2016-03-31 17.56
2015-12-31 29.14 -0.13
2015-09-30 22.67
2015-12-31 95.85 -0.14
2014-12-31 84.58 0.07
2013-12-31 58.33 0
2012-12-31 29.63 0
2016-09-30 243.91 -0.8
2016-06-30 230.77 -1.12
2016-03-31 216.58 1.32
2015-12-31 206.23 -0.05
2015-09-30 192.82 -0.34
2015-12-31 741.15 -1.37
2014-12-31 556.28 -1.9
2013-12-31 414.51 -1.48
2012-12-31 308.82 0.1
2016-10-31 2144.78 41.98
2016-07-31 2036.62 35
2016-04-30 1916.60 -11.66
2016-01-31 1809.40 27.09
2015-10-31 1711.97 -3.44
2016-01-31 6667.22 14.13
2015-01-31 5373.59 -18.69
2014-01-31 4071.00 -4.87
2013-01-31 3050.20 -5.7
print(df.dtypes)
Revenue float64
Other, Net object
dtype: object
EDIT2:
Solution for your bonus problem:
df = pd.DataFrame({'A':['q','e','r'],
                   'B':['4','5','q'],
                   'C':[7,8,9.0],
                   'D':['1,000','3','50,000'],
                   'E':['5','3','6'],
                   'F':['w','e','r']})
print (df)
A B C D E F
0 q 4 7.0 1,000 5 w
1 e 5 8.0 3 3 e
2 r q 9.0 50,000 6 r
#first apply original solution
df1 = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce'))
print (df1)
A B C D E F
0 NaN 4.0 7.0 1000 5 NaN
1 NaN 5.0 8.0 3 3 NaN
2 NaN NaN 9.0 50000 6 NaN
#mask where all columns are NaN - string columns
mask = df1.isnull().all()
print (mask)
A True
B False
C False
D False
E False
F True
dtype: bool
#replace NaN to string columns
df1.loc[:, mask] = df1.loc[:, mask].combine_first(df)
print (df1)
A B C D E F
0 q 4.0 7.0 1000 5 w
1 e 5.0 8.0 3 3 e
2 r NaN 9.0 50000 6 r
python pandas - generic ways to deal with commas in string to float conversion with astype()
I fixed the problem with the following workaround. This might still break in some cases, but I did not find a way to tell pandas' astype() that a comma is OK. If someone has a pandas-only solution, please let me know:
import locale
from datetime import datetime
import pandas as pd
data = {
    "col_str": ["a", "b", "c"],
    "col_int": ["1", "2", "3"],
    "col_float": ["1,2", "3,2342", "97837,8277"],
    "col_float2": ["13,2", "3234,2342", "263,8277"],
    "col_date": [datetime(2020, 8, 1, 0, 3, 4).isoformat(),
                 datetime(2020, 8, 2, 2, 4, 5).isoformat(),
                 datetime(2020, 8, 3, 6, 8, 4).isoformat()
                 ]
}
conversion_dict = {
    "col_str": str,
    "col_int": int,
    "col_float": float,
    "col_float2": float,
    "col_date": "datetime64[ns]"  # pandas >= 2.0 requires an explicit unit
}
df = pd.DataFrame(data=data)
throw_error = True
try:
    df = df.astype(conversion_dict, errors="raise")
except ValueError as e:
    error_message = str(e).strip().upper()
    error_search = "COULD NOT CONVERT STRING TO FLOAT:"
    # compare error messages to only catch the string-to-float error, because pandas only
    # throws ValueErrors which are not datatype specific. This might be quite hacky
    # because error messages could change.
    if error_message[:len(error_search)] == error_search:
        # convert everything else and ignore errors for the float columns
        df = df.astype(conversion_dict, errors="ignore")
        # go over the conversion dict
        for key, value in conversion_dict.items():
            # print(str(key) + ":" + str(value) + ":" + str(df[key].dtype))
            # only apply to convert-to-float columns which are not already the correct
            # pandas type float64; if you don't check for correctly classified types,
            # .str.replace() throws an error
            if (value == float or value == "float") and df[key].dtype != "float64":
                # df[key].apply(locale.atof) or anything locale-related is platform
                # dependent and therefore bad in my opinion
                # locale settings for atof:
                # WINDOWS: locale.setlocale(locale.LC_ALL, 'deu_deu')
                # UNIX: locale.setlocale(locale.LC_ALL, 'de_DE')
                df[key] = pd.to_numeric(df[key].str.replace(',', '.'))
    else:
        if throw_error:
            # or do whatever is best suited for your use case
            raise ValueError(str(e))
        else:
            df = df.astype(conversion_dict, errors="ignore")
print(df.dtypes)
print(df)
Converting string variable with double commas into float?
If you always have 2 decimal digits:
df['min'] = pd.to_numeric(df['min'].str.replace('.', '', regex=False)).div(100)
output (as new column min2 for clarity):
min min2
0 9.50 9.50
1 10.00 10.00
2 3.45 3.45
3 1.095.50 1095.50
4 13.25 13.25
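A self-contained sketch of the same idea, using the sample values from the output above:

```python
import pandas as pd

s = pd.Series(['9.50', '10.00', '3.45', '1.095.50', '13.25'])

# drop every '.', then restore the last two digits as decimals
out = pd.to_numeric(s.str.replace('.', '', regex=False)).div(100)
print(out)
```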
Pandas convert numbers with a comma instead of the point for the decimal separator from objects to numbers
You can replace , with .:
df['ColumnName'] = pd.to_numeric(df['ColumnName'].str.replace(',', '.'))
On another note, if you read the data with pd.read_csv, there's a decimal=',' option.
How can I convert a string with dot and comma into a float in Python
Just remove the , with replace():
float("123,456.908".replace(',',''))