Convert the String 2.90K to 2900 or 5.2M to 5200000 in Pandas Dataframe

Convert the string 0.12M to 120000 or 0.11K to 110 in pandas dataframe

If a missing unit should be treated as M, use .fillna(10**6) instead of .fillna(1), and process column col1 instead of col:

# numeric part * multiplier derived from the K/M suffix;
# rows with no suffix get a multiplier of 10**6
df['col'] = (df['col1'].replace(r'[KM]+$', '', regex=True).astype(float) *
             df['col1'].str.extract(r'[\d\.]+([KM]+)', expand=False)
                       .fillna(10**6)
                       .replace(['K','M'], [10**3, 10**6]).astype(int))

print (df)
    col1     col2        col
0  0.11K      110      110.0
1  1011K  1011000  1011000.0
2  0.12M   120000   120000.0
3      0        0        0.0
4    0.3   300000   300000.0
5   0.02    20000    20000.0
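To see why the unitless rows become millions, here is a minimal, self-contained sketch of the intermediate steps (the sample data below is assumed to match the output above):

import pandas as pd

df = pd.DataFrame({'col1': ['0.11K', '1011K', '0.12M', '0', '0.3', '0.02']})

# rows without a K/M suffix extract to NaN ...
suffix = df['col1'].str.extract(r'[\d\.]+([KM]+)', expand=False)
print(suffix.tolist())      # ['K', 'K', 'M', nan, nan, nan]

# ... so fillna(10**6) assigns them a multiplier of 1,000,000,
# which is why 0.3 becomes 300000.0 and 0.02 becomes 20000.0
multiplier = suffix.fillna(10**6).replace(['K', 'M'], [10**3, 10**6]).astype(int)
print(multiplier.tolist())  # [1000, 1000, 1000000, 1000000, 1000000, 1000000]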

Your solution from "Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe" uses .fillna(1), so unitless values are left unchanged:

# same as above, but a missing suffix defaults to a multiplier of 1
df['col'] = (df['col1'].replace(r'[KM]+$', '', regex=True).astype(float) *
             df['col1'].str.extract(r'[\d\.]+([KM]+)', expand=False)
                       .fillna(1)
                       .replace(['K','M'], [10**3, 10**6]).astype(int))

print (df)
    col1     col2        col
0  0.11K      110      110.00
1  1011K  1011000  1011000.00
2  0.12M   120000   120000.00
3      0        0        0.00
4    0.3   300000       0.30
5   0.02    20000       0.02

Applying a function to a dataframe does not work

TL;DR:

What you are looking for is .applymap()

Details:

Your method is actually written well and can be used as-is with .apply() on a pandas.Series object. If you are experiencing issues, I assume it is because you are applying it to a pandas.DataFrame, i.e. across multiple columns at once. In that case, the argument passed to num_repair is itself a pandas.Series, which num_repair is not designed to handle.
I can only assume this, since the code that calls num_repair isn't given. Consider adding it for the completeness of the question.
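For reference, here is a purely hypothetical sketch of what num_repair might look like for M / B / TR suffixed strings (the actual function isn't shown, so this is only an assumption used to make the examples below self-contained):

# hypothetical stand-in -- the asker's real num_repair is not given
def num_repair(value):
    multipliers = {'TR': 10**12, 'B': 10**9, 'M': 10**6}
    for suffix, factor in multipliers.items():
        if isinstance(value, str) and value.endswith(suffix):
            return int(float(value[:-len(suffix)]) * factor)
    return value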

If so, you can use it as follows:

import pandas as pd

df = pd.DataFrame([
    ['1M', '1B', '1TR'],
    ['22M', '22B', '22TR'],
], columns=[1990, 1991, 1992])

# applymap calls num_repair element-wise, so it always receives a scalar
df.applymap(num_repair)

output:


       1990         1991            1992
0   1000000   1000000000   1000000000000
1  22000000  22000000000  22000000000000
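Note that in recent pandas versions (2.1 and later) DataFrame.applymap has been deprecated in favour of the equivalent DataFrame.map, so on newer installs the same call would be:

df.map(num_repair)  # pandas >= 2.1; element-wise, same behaviour as applymap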

Side Note

If you want to apply it to all columns except country (since a country name may itself contain B / TR / M), you can do the following:

df = pd.DataFrame([
    ['countryM', '1M', '1B', '1TR'],
    ['countryB', '22M', '22B', '22TR'],
], columns=['country', 1990, 1991, 1992])

# apply num_repair to every column except 'country'
cols = df.columns.drop('country')
df.loc[:, cols] = df.loc[:, cols].applymap(num_repair)
df

output:

    country      1990         1991            1992
0  countryM   1000000   1000000000   1000000000000
1  countryB  22000000  22000000000  22000000000000

How to convert numeric strings such as 200.13K and 1.2M to integer using pandas?

There are a few ways to do this. My favourite is using replace and pd.eval. Assuming "Vol" is a string column, you can do:

df['Vol'].replace({'K': '*1e3', 'M': '*1e6'}, regex=True).map(pd.eval)

0     920810.0
1    1280000.0
2    2190000.0
3     443660.0
4     682810.0
Name: Vol, dtype: float64

Depending on the orders of magnitude you need to support, you can modify the replacement dict as needed.
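For example, supporting billions only requires one more entry in the dict. A small self-contained sketch (the sample values are assumed, since the original "Vol" data isn't shown):

import pandas as pd

# assumed sample data for illustration
df = pd.DataFrame({'Vol': ['920.81K', '1.28M', '2.19M', '443.66K', '3.5B']})

# 'K' -> '*1e3' turns '920.81K' into the string '920.81*1e3',
# which pd.eval then evaluates to 920810.0
out = df['Vol'].replace({'K': '*1e3', 'M': '*1e6', 'B': '*1e9'},
                        regex=True).map(pd.eval)
print(out)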

convert a pandas series in a DataFrame from a string (financial abbreviations) to numeric

I'm a fan of this approach

mapping = dict(K='E3', M='E6', B='E9')

df.assign(Property_Damage=pd.to_numeric(
    df.Property_Damage.replace(mapping, regex=True)))

  EVENT_TYPE  ID  Property_Damage
0      Flood   1           2500.0
1       Hail   2              0.0
2       Fire   3         400000.0
3    Tornado   4           1000.0
4      Flood   5              NaN
5       Fire   6           1000.0

You can get the NaN filled with 0 by adding .fillna(0) before the replace:

mapping = dict(K='E3', M='E6', B='E9')

df.assign(Property_Damage=pd.to_numeric(
    df.Property_Damage.fillna(0).replace(mapping, regex=True)))

  EVENT_TYPE  ID  Property_Damage
0      Flood   1           2500.0
1       Hail   2              0.0
2       Fire   3         400000.0
3    Tornado   4           1000.0
4      Flood   5              0.0
5       Fire   6           1000.0
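Put together as a self-contained sketch, with sample data assumed to match the output shown above:

import numpy as np
import pandas as pd

# assumed sample frame; only the Property_Damage values matter here
df = pd.DataFrame({
    'EVENT_TYPE': ['Flood', 'Hail', 'Fire', 'Tornado', 'Flood', 'Fire'],
    'ID': [1, 2, 3, 4, 5, 6],
    'Property_Damage': ['2.5K', '0', '400K', '1K', np.nan, '1K'],
})

mapping = dict(K='E3', M='E6', B='E9')

# 'K' -> 'E3' turns '2.5K' into the scientific-notation string '2.5E3',
# which pd.to_numeric parses as 2500.0; the NaN is filled with 0 first
print(df.assign(Property_Damage=pd.to_numeric(
    df.Property_Damage.fillna(0).replace(mapping, regex=True))))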

