Pandas dataframe strip non-numeric characters
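You can do this with str.replace(r"[a-zA-Z]", '', regex=True), which removes the alphabetic characters. To strip other characters as well, add them to the character class. Note that since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed.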
import pandas as pd
df = pd.read_csv("test.csv", names=['Accuracy', 'Error rate', 'Not classified'])
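# Strip letters from each column (each column is cleaned from its own values)
df['Accuracy'] = df['Accuracy'].str.replace(r"[a-zA-Z]", '', regex=True)
df['Error rate'] = df['Error rate'].str.replace(r"[a-zA-Z]", '', regex=True)
df['Not classified'] = df['Not classified'].str.replace(r"[a-zA-Z]", '', regex=True)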
print(df)
DEMO: https://repl.it/@SanyAhmed/EarnestTatteredRepo
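A minimal self-contained sketch of the same idea, in case test.csv is not at hand. The sample values and the names `cleaned` and `numeric` are made up for illustration; the final conversion with pd.to_numeric is an extra step that the answer above does not include.

```python
import pandas as pd

# Hypothetical sample data standing in for test.csv
df = pd.DataFrame({
    "Accuracy": ["95.5abc", "88x"],
    "Error rate": ["4.5pct", "12y"],
})

# Remove letters from every column; regex=True is required on pandas >= 2.0
cleaned = df.apply(lambda col: col.str.replace(r"[a-zA-Z]", "", regex=True))

# Optional extra step: turn the cleaned strings into numbers (unparsable -> NaN)
numeric = cleaned.apply(pd.to_numeric, errors="coerce")
print(numeric)
```

After the replace step the columns still hold strings, so a conversion like this is needed before doing arithmetic on them.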
Remove non-numeric in df column with different datatypes
This should work: df['Volume'] = df['Volume'].str.replace(r'[^0-9.]', '', regex=True)
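If the column really mixes datatypes (say, integers alongside strings), the .str accessor returns NaN for the non-string cells. A possible workaround, sketched here with invented sample values and a hypothetical Volume_clean column, is to cast to str first and convert back to numbers afterwards:

```python
import pandas as pd

# Hypothetical column mixing strings and a plain integer
df = pd.DataFrame({"Volume": ["1,200", 350, "4.5k", "n/a"]})

# Cast everything to text so .str works on every cell
as_text = df["Volume"].astype(str)

# Keep only digits and dots
stripped = as_text.str.replace(r"[^0-9.]", "", regex=True)

# Cells with nothing numeric left become NaN
df["Volume_clean"] = pd.to_numeric(stripped, errors="coerce")
print(df)
```

Keeping the dot in the character class preserves decimals, but a value containing several dots would no longer parse and would end up as NaN.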
How can I strip off all non-numeric characters in a Pandas Series
You have 2 different problems here:
- first is to extract digits from the column cells
- second is to make a list if you have more than one digit
Just chain both operations:
df[col].str.findall(r'\d').apply(lambda x: x[0] if len(x) == 1 else '' if len(x) == 0 else x)
With your example it gives:
0 4
1 4
2 4
3 [3, 4]
4 4
5 4
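The chained expression above can be tried on made-up data as follows. The Series `s` and the helper name `collapse` are assumptions for illustration; the logic mirrors the lambda in the answer.

```python
import pandas as pd

# Hypothetical input; the original question's data is not shown here
s = pd.Series(["a4", "4b", "3 and 4", "none"])

def collapse(digits):
    """Return '' for no digit, the digit itself for one, the list otherwise."""
    if len(digits) == 0:
        return ""
    if len(digits) == 1:
        return digits[0]
    return digits

# findall yields a list of single-digit strings per cell
result = s.str.findall(r"\d").apply(collapse)
print(result)
```

Note that r"\d" matches one digit at a time, so a cell like "34" would come back as a list of two digits; r"\d+" would keep consecutive digits together.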
How do I remove all non- numerical numbers from entire data frame: Debugging
I think the problem is that you need to specify the columns to clean, and then replace empty strings with NaN or 0 where a value contains nothing numeric, like the second-to-last Size value:
import numpy as np
cols = ['Size','Installs']
# Raw string avoids an invalid-escape warning for \d
df[cols] = df[cols].replace(r'[^\d.]', '', regex=True).replace('', np.nan).astype(float)
print (df)
Rating Reviews Size Installs Type Price
0 4.1 159 19.0 10000.0 Free 0
1 3.9 967 14.0 500000.0 Free 0
2 4.7 87510 8.7 5000000.0 Free 0
3 4.5 215644 25.0 50000000.0 Free 0
4 4.3 967 2.8 100000.0 Free 0
10836 4.5 38 53.0 5000.0 Free 0
10837 5.0 4 3.6 100.0 Free 0
10838 0.0 3 9.5 1000.0 Free 0
10839 4.5 114 NaN 1000.0 Free 0
10840 4.5 398307 19.0 10000000.0 Free 0
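For anyone without the original dataset, here is a small runnable sketch of the same chain. The three rows are invented to resemble the Size and Installs columns above and are not taken from the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical rows imitating the Size / Installs columns
df = pd.DataFrame({
    "Size": ["19M", "8.7M", "Varies with device"],
    "Installs": ["10,000+", "5,000,000+", "1,000+"],
})

cols = ["Size", "Installs"]
df[cols] = (
    df[cols]
    .replace(r"[^\d.]", "", regex=True)  # drop everything but digits and dots
    .replace("", np.nan)                 # cells with nothing left become NaN
    .astype(float)
)
print(df)
```

Swapping np.nan for 0 in the second replace gives the "or 0" variant mentioned in the answer.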
Delete all but numerical values in a column(s) using Pandas
Just as extra information: I tried it this way and got the opposite result, where only the letters are kept :D
import pandas as pd
df.replace(to_replace=r'[^a-zA-Z#]', value='', regex=True)
Size Total
0 TB TB
1 G G
2 A A
Since you changed your question, I did it like this; maybe someone has a better answer.
# Remove every run of non-digit characters
df['Size'] = df['Size'].str.replace(r"[^0-9]+", "", regex=True)
df['Total'] = df['Total'].str.replace(r"[^0-9]+", "", regex=True)
df
output:
Size Total ID
0 110 200 A
1 100 300 B
2 500 700 C
How do I remove non-numeric values from specific column in pandas?
Those are actually integers, just represented in a different base (base 16, also known as hexadecimal). The int() function takes an optional second argument for the base. We can check if a string consists only of numeric characters, and if so use 10 as the base, 16 otherwise:
df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))
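A quick check of that one-liner on invented port values (the sample DataFrame and the name `ports` are assumptions, not taken from the question):

```python
import pandas as pd

# Hypothetical DstPort column mixing decimal and hexadecimal strings
df = pd.DataFrame({"DstPort": ["80", "0x1bb", "1f90"]})

# Digit-only strings are read as base 10, everything else as base 16;
# int() accepts an optional "0x" prefix when the base is 16
ports = df["DstPort"].apply(lambda x: int(x, 10 if x.isnumeric() else 16))
print(ports.tolist())  # [80, 443, 8080]
```

One limitation of this approach: a hexadecimal value that happens to contain only the digits 0-9 is indistinguishable from a decimal one and will be parsed as base 10.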