How to Remove All Non-Numeric Characters from All the Values in a Particular Column in Pandas Dataframe

Pandas dataframe strip non-numeric characters

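You can use str.replace(r"[a-zA-Z]", '', regex=True) to remove the alphabetic characters. If needed, add more characters to the class to remove those as well. Note that regex=True must be passed explicitly on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.

import pandas as pd

df = pd.read_csv("test.csv", names=['Accuracy', 'Error rate', 'Not classified'])

# Strip letters from each column, writing the result back to the same column
df['Accuracy'] = df['Accuracy'].str.replace(r"[a-zA-Z]", '', regex=True)
df['Error rate'] = df['Error rate'].str.replace(r"[a-zA-Z]", '', regex=True)
df['Not classified'] = df['Not classified'].str.replace(r"[a-zA-Z]", '', regex=True)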
print(df)

DEMO: https://repl.it/@SanyAhmed/EarnestTatteredRepo
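
For readers without test.csv at hand, here is a minimal self-contained sketch of the same idea. The sample values are made up for illustration and stand in for the CSV contents; the loop is just a compact way to apply the same replacement to several columns.

```python
import pandas as pd

# Hypothetical sample data standing in for test.csv
df = pd.DataFrame({
    "Accuracy": ["95.2abc", "88.1x"],
    "Error rate": ["4.8err", "11.9e"],
})

# regex=True makes pandas treat the pattern as a regular expression
for col in ["Accuracy", "Error rate"]:
    df[col] = df[col].str.replace(r"[a-zA-Z]", "", regex=True)

print(df)
```

The columns still hold strings after this step; convert them with pd.to_numeric or astype(float) if you need numbers.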

Remove non-numeric in df column with different datatypes

This should work: df['Volume'] = df['Volume'].str.replace(r'[^0-9.]', '', regex=True)
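
When the column mixes strings with real numbers, the .str accessor returns NaN for the non-string cells. One possible way around that, sketched below under the assumption that casting everything to text first is acceptable, is to call astype(str) before the replacement and then convert with pd.to_numeric. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical column mixing strings and a plain integer
df = pd.DataFrame({"Volume": ["1,200 units", 350, "47.5kg", "n/a"]})

# Cast to str so every cell can be processed, then keep only digits and dots
cleaned = df["Volume"].astype(str).str.replace(r"[^0-9.]", "", regex=True)

# Cells with nothing numeric left (empty strings) are coerced to NaN
df["Volume"] = pd.to_numeric(cleaned, errors="coerce")

print(df)
```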

How can I strip off all non-numeric characters in a Pandas Series

You have two separate problems here:

  • first, extract the digits from each cell of the column
  • second, return a list when a cell contains more than one digit

Just chain both operations:

df[col].str.findall(r'\d').apply(lambda x: x[0] if len(x) == 1 else '' if len(x) == 0 else x)

With your example, it gives:

0         4
1         4
2         4
3    [3, 4]
4         4
5         4
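
The same chain can be tried on a small Series. This is only a sketch: the input values are invented, and the collapse helper is a hypothetical named version of the lambda above, written out for readability.

```python
import pandas as pd

# Hypothetical input: one digit, two digits, and no digits
s = pd.Series(["a4", "b3c4", "xyz"])

def collapse(digits):
    """Return '' for no match, the single digit for one match, else the list."""
    if len(digits) == 0:
        return ""
    if len(digits) == 1:
        return digits[0]
    return digits

# findall(r'\d') yields a list of single-digit strings per cell
result = s.str.findall(r"\d").apply(collapse)

print(result)
```

Note that r'\d' matches one digit at a time, so a value such as "34" would come back as a two-element list; use r'\d+' if you want whole numbers instead.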

How do I remove all non-numerical values from an entire data frame: Debugging

I think the problem is that you need to specify the columns for the replacement, and then convert values that end up empty to NaN (or 0) when nothing numeric remains, as in the second-to-last Size value:

import numpy as np

cols = ['Size','Installs']
df[cols] = df[cols].replace(r'[^\d.]', '', regex=True).replace('', np.nan).astype(float)

print (df)
       Rating  Reviews  Size    Installs  Type  Price
0         4.1      159  19.0     10000.0  Free      0
1         3.9      967  14.0    500000.0  Free      0
2         4.7    87510   8.7   5000000.0  Free      0
3         4.5   215644  25.0  50000000.0  Free      0
4         4.3      967   2.8    100000.0  Free      0
10836     4.5       38  53.0      5000.0  Free      0
10837     5.0        4   3.6       100.0  Free      0
10838     0.0        3   9.5      1000.0  Free      0
10839     4.5      114   NaN      1000.0  Free      0
10840     4.5   398307  19.0  10000000.0  Free      0
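
Here is a small runnable sketch of that approach. The three rows are hypothetical and only imitate the kind of values in the question (size suffixes, thousands separators, a text-only cell).

```python
import numpy as np
import pandas as pd

# Hypothetical rows imitating the Size and Installs columns
df = pd.DataFrame({
    "Size": ["19M", "8.7M", "Varies with device"],
    "Installs": ["10,000+", "5,000,000+", "1,000+"],
})

cols = ["Size", "Installs"]
df[cols] = (
    df[cols]
    .replace(r"[^\d.]", "", regex=True)  # drop everything but digits and dots
    .replace("", np.nan)                 # text-only cells are now empty
    .astype(float)
)

print(df)
```

Use .replace("", 0) instead of np.nan if a zero is a better placeholder for your data.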

Delete all but numerical values in a column(s) using Pandas

Just some extra information: I tried it like this and got the opposite result :D

import pandas as pd
df.replace(to_replace=r'[^a-zA-Z#]', value='', regex=True)

Size Total
0 TB TB
1 G G
2 A A

Since you changed your question, I did it like this; maybe someone has a better answer.

df['Size'] = df['Size'].str.replace(r"[^0-9]+", "", regex=True)
df['Total'] = df['Total'].str.replace(r"[^0-9]+", "", regex=True)
df

output:

Size Total ID
0 110 200 A
1 100 300 B
2 500 700 C
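
A self-contained sketch of this answer follows. The input frame is an assumption pieced together from the two outputs above (numbers followed by a unit); the real data from the question is not shown here. The final astype(int) is an optional extra step, not part of the original answer.

```python
import pandas as pd

# Hypothetical input: numeric values followed by a unit suffix
df = pd.DataFrame({
    "Size": ["110 TB", "100 G", "500 A"],
    "Total": ["200 TB", "300 G", "700 A"],
    "ID": ["A", "B", "C"],
})

for col in ["Size", "Total"]:
    # Remove every run of non-digit characters, then convert to integers
    df[col] = df[col].str.replace(r"[^0-9]+", "", regex=True).astype(int)

print(df)
```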

How do I remove non-numeric values from specific column in pandas?

Those are actually integers, just represented in a different base (base 16, also known as hexadecimal). The int() function takes an optional second argument for the base. We can check if a string consists only of numeric characters, and if so use 10 as the base, 16 otherwise:

df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))
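
To see the base switch in action, here is a short sketch with made-up port strings. The DstPort name comes from the question; the values are hypothetical.

```python
import pandas as pd

# Hypothetical ports: decimal, hex with 0x prefix, hex without prefix
df = pd.DataFrame({"DstPort": ["443", "0x1bb", "1f90"]})

# Digit-only strings are parsed as base 10, everything else as base 16
ports = df["DstPort"].apply(lambda x: int(x, 10 if x.isnumeric() else 16))

print(ports.tolist())
```

One caveat of this heuristic: a hexadecimal value that happens to contain only digits, such as "10", is read as decimal, so it only works when the data makes the base unambiguous.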

