Convert the string 0.12M to 120000 or 0.11K to 110 in pandas dataframe
If a value with no unit should be read as millions (M), use .fillna(10**6) instead of .fillna(1), and process column col1 instead of col:
df['col'] = (df['col1'].replace(r'[KM]+$', '', regex=True).astype(float) *
             df['col1'].str.extract(r'[\d\.]+([KM]+)', expand=False)
                       .fillna(10**6)
                       .replace(['K','M'],[10**3,10**6]).astype(int))
print(df)
    col1     col2        col
0  0.11K      110      110.0
1  1011K  1011000  1011000.0
2  0.12M   120000   120000.0
3      0        0        0.0
4    0.3   300000   300000.0
5   0.02    20000    20000.0
For comparison, your solution from "Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe" uses .fillna(1), so values without a unit are left unscaled (rows 4 and 5):
df['col'] = (df['col1'].replace(r'[KM]+$', '', regex=True).astype(float) *
             df['col1'].str.extract(r'[\d\.]+([KM]+)', expand=False)
                       .fillna(1)
                       .replace(['K','M'],[10**3,10**6]).astype(int))
print(df)
    col1     col2         col
0  0.11K      110      110.00
1  1011K  1011000  1011000.00
2  0.12M   120000   120000.00
3      0        0        0.00
4    0.3   300000        0.30
5   0.02    20000        0.02
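The two variants above differ only in the multiplier used when no unit is present. A minimal sketch of that idea as one function follows; the name scale_suffixed and the parameter default_multiplier are hypothetical, and the sketch assumes every value is a plain number optionally followed by K or M.

```python
import pandas as pd

def scale_suffixed(series, default_multiplier=1):
    """Convert strings such as '0.11K' or '0.12M' to floats.

    Hypothetical helper: values without a K/M suffix are multiplied
    by default_multiplier (1 keeps them as-is, 10**6 reads them as millions).
    """
    multipliers = {"K": 10**3, "M": 10**6}
    # Split each value into its numeric part and an optional K/M suffix.
    parts = series.str.extract(r"^([\d.]+)([KM]?)$")
    number = parts[0].astype(float)
    # Look up the multiplier; an empty suffix is not in the dict and gets the default.
    factor = parts[1].map(multipliers).fillna(default_multiplier)
    return number * factor

df = pd.DataFrame({"col1": ["0.11K", "1011K", "0.12M", "0", "0.3", "0.02"]})
df["as_is"] = scale_suffixed(df["col1"])
df["as_millions"] = scale_suffixed(df["col1"], default_multiplier=10**6)
print(df)
```

Here as_is should correspond to the .fillna(1) variant and as_millions to the .fillna(10**6) variant.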
Applying a function to a dataframe does not work
TL;DR:
What you are looking for is .applymap(). (In pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, which does the same thing.)
Details:
Your method is actually written well and can be used in .apply() as-is for a pandas.Series object. I assume that if you are experiencing issues, it is because you are using it on a pandas.DataFrame, against multiple columns.
In such a case, the argument passed to num_repair is actually of type pandas.Series, which num_repair is not really meant to support.
I can only assume, since the code that uses num_repair isn't given. Consider adding it for the completeness of the question.
If so, you can use it as follows:
df = pd.DataFrame([
    ['1M', '1B', '1TR'],
    ['22M', '22B', '22TR'],
], columns=[1990, 1991, 1992])
df.applymap(num_repair)
output:
       1990         1991            1992
0   1000000   1000000000   1000000000000
1  22000000  22000000000  22000000000000
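Since the question's num_repair is not shown, here is a minimal self-contained sketch with a hypothetical stand-in for it. It assumes each cell is a number followed by M, B, or TR. To stay independent of the pandas version, it applies Series.map column by column instead of calling applymap directly.

```python
import pandas as pd

# Hypothetical stand-in for the question's num_repair, which is not shown.
SUFFIXES = {"TR": 10**12, "B": 10**9, "M": 10**6}

def num_repair(value):
    """Turn one string such as '22TR' into an int; expects a scalar, not a Series."""
    for suffix, factor in SUFFIXES.items():
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * factor)
    return int(float(value))

df = pd.DataFrame(
    [["1M", "1B", "1TR"], ["22M", "22B", "22TR"]],
    columns=[1990, 1991, 1992],
)

# Series.map calls num_repair once per cell, so it only ever sees scalars.
converted = df.apply(lambda column: column.map(num_repair))
print(converted)
```

Passing num_repair straight to df.apply would instead hand it a whole column at a time, which is the failure mode described above.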
Side Note
If you want to apply it to all columns except the country, since the name may contain B / TR / M, you can do the following:
df = pd.DataFrame([
    ['countryM', '1M', '1B', '1TR'],
    ['countryB', '22M', '22B', '22TR'],
], columns=['country', 1990, 1991, 1992])
df.loc[:, df.columns.drop('country')] = df.loc[:, df.columns.drop('country')].applymap(num_repair)
df
output:
    country      1990         1991            1992
0  countryM   1000000   1000000000   1000000000000
1  countryB  22000000  22000000000  22000000000000
How to convert numeric strings such as 200.13K and 1.2M to integer using pandas?
There are a few ways to do this. My favourite is using replace and pd.eval. Assuming "Vol" is a string column, you can do:
df['Vol'].replace({'K': '*1e3', 'M': '*1e6'}, regex=True).map(pd.eval)
0     920810.0
1    1280000.0
2    2190000.0
3     443660.0
4     682810.0
Name: Vol, dtype: float64
Depending on the orders of magnitude you need to support, you can modify the replacement dict as needed.
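A self-contained sketch of the same idea, carried through to integers as the question asks. The sample values and the Vol_int column name are assumptions, since the original data isn't shown. Note that pd.eval evaluates each string as an expression, so this is best kept to trusted data.

```python
import pandas as pd

# Assumed sample data; the original "Vol" column is not shown.
df = pd.DataFrame({"Vol": ["200.13K", "1.2M", "920.81K"]})

# 'K' -> '*1e3' and 'M' -> '*1e6', so '1.2M' becomes the expression '1.2*1e6'.
expressions = df["Vol"].replace({"K": "*1e3", "M": "*1e6"}, regex=True)

# pd.eval evaluates each expression string to a number.
as_float = expressions.map(pd.eval).astype(float)

# Round before casting so floating-point error cannot truncate a value downwards.
df["Vol_int"] = as_float.round().astype("int64")
print(df)
```

To support billions, add an entry such as 'B': '*1e9' to the replacement dict.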
convert a pandas series in a DataFrame from a string (financial abbreviations) to numeric
I'm a fan of this approach:
mapping = dict(K='E3', M='E6', B='E9')
df.assign(Property_Damage=pd.to_numeric(
    df.Property_Damage.replace(mapping, regex=True)))
  EVENT_TYPE  ID  Property_Damage
0      Flood   1           2500.0
1       Hail   2              0.0
2       Fire   3         400000.0
3    Tornado   4           1000.0
4      Flood   5              NaN
5       Fire   6           1000.0
You can get your NaN filled with 0:
mapping = dict(K='E3', M='E6', B='E9')
df.assign(Property_Damage=pd.to_numeric(
    df.Property_Damage.fillna(0).replace(mapping, regex=True)))
  EVENT_TYPE  ID  Property_Damage
0      Flood   1           2500.0
1       Hail   2              0.0
2       Fire   3         400000.0
3    Tornado   4           1000.0
4      Flood   5              0.0
5       Fire   6           1000.0
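The approach works because a suffix rewritten as an exponent (for example '2.5K' to '2.5E3') is ordinary scientific notation, which pd.to_numeric can parse. A minimal runnable sketch follows; the sample frame and the damage column name are assumptions. As a variation on the code above, it fills missing values after the conversion, so the regex only ever sees strings.

```python
import pandas as pd

# Assumed sample data modelled on the printed output; the original frame is not shown.
df = pd.DataFrame({
    "EVENT_TYPE": ["Flood", "Hail", "Fire", "Flood"],
    "Property_Damage": ["2.5K", "0", "0.4M", None],
})

mapping = dict(K="E3", M="E6", B="E9")

# '2.5K' -> '2.5E3'; missing values pass through the replacement untouched.
as_text = df["Property_Damage"].replace(mapping, regex=True)

# errors="coerce" turns anything unparseable into NaN, which is then filled with 0.
df["damage"] = pd.to_numeric(as_text, errors="coerce").fillna(0)
print(df)
```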