Drop non-numeric columns from a pandas DataFrame
To avoid using a private method, you can also use select_dtypes, which lets you either include or exclude the dtypes you want.
I ran into it in this post about the exact same problem.
Or in your case, specifically: source.select_dtypes(['number']) or source.select_dtypes([np.number])
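As a rough illustration of the include/exclude choice, here is a minimal sketch. The frame and its column names are made up for the example (the original `source` is not shown), so treat them as assumptions.

```python
import pandas as pd

# Hypothetical frame standing in for `source`; the columns are invented
source = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [9.5, 12.0, 7.25],
    "label": ["a", "b", "c"],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

# Keep only numeric columns
numeric_only = source.select_dtypes(include="number")

# The complement: everything that is not numeric
non_numeric = source.select_dtypes(exclude="number")

print(list(numeric_only.columns))  # ['id', 'price']
print(list(non_numeric.columns))   # ['label', 'when']
```

Both calls return a new DataFrame and leave `source` untouched.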
How do you delete a non-numeric column from an input dataset?
From the docs, you can select just the numeric data by filtering with select_dtypes:
In [5]:
df = pd.DataFrame({'a': np.random.randn(6).astype('f4'), 'b': [True, False] * 3, 'c': [1.0, 2.0] * 3})
df
Out[5]:
          a      b  c
0  0.338710   True  1
1  1.530095  False  2
2 -0.048261   True  1
3 -0.505742  False  2
4  0.729667   True  1
5 -0.634482  False  2
In [15]:
df.select_dtypes(include=[np.number])
Out[15]:
          a  c
0  0.338710  1
1  1.530095  2
2 -0.048261  1
3 -0.505742  2
4  0.729667  1
5 -0.634482  2
You can pass any class from the NumPy dtype hierarchy, for example np.number, np.floating or np.integer.
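To make the hierarchy point concrete, here is a small sketch that selects at different levels of the NumPy type tree. The frame is a made-up one, similar in shape to the example above.

```python
import numpy as np
import pandas as pd

# Made-up frame with one float, one bool and one integer column
df = pd.DataFrame({
    "a": np.array([0.5, 1.5, 2.5], dtype="float32"),
    "b": [True, False, True],
    "c": np.array([1, 2, 3], dtype="int64"),
})

floats = df.select_dtypes(include=[np.floating])  # any float width
ints = df.select_dtypes(include=[np.integer])     # signed and unsigned ints
numbers = df.select_dtypes(include=[np.number])   # floats and ints, not bool

print(list(floats.columns))   # ['a']
print(list(ints.columns))     # ['c']
print(list(numbers.columns))  # ['a', 'c']
```

Note that the boolean column is not matched by np.number, which is why `b` disappears in the output above as well.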
Python: how to drop all the non numeric values from a pandas column?
Use to_numeric with errors='coerce' and Series.notna for filtering by boolean indexing:
df = df[pd.to_numeric(df['Rooms'], errors='coerce').notna()]
print(df)
  Rooms   BFS
0   3.5  4201
1   1.5  4201
4   5.5  4201
5     5  4201
6   4.5  4201
7     3  4201
9     3  4201
If you need numeric values in the output, first assign the converted values back to the same column and then use DataFrame.dropna:
df['Rooms'] = pd.to_numeric(df['Rooms'], errors='coerce')
df = df.dropna(subset=['Rooms'])
print(df)
   Rooms   BFS
0    3.5  4201
1    1.5  4201
4    5.5  4201
5    5.0  4201
6    4.5  4201
7    3.0  4201
9    3.0  4201
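The input frame is not shown in the answer, so here is a self-contained sketch with invented data that runs both variants side by side. The values in `Rooms` are assumptions, chosen only to show the difference in the resulting dtype.

```python
import pandas as pd

# Hypothetical input; the real frame from the question is not shown
df = pd.DataFrame({
    "Rooms": ["3.5", "1.5", "n/a", "5", "studio"],
    "BFS": [4201, 4201, 4201, 4201, 4201],
})

# Values that cannot be parsed become NaN
parsed = pd.to_numeric(df["Rooms"], errors="coerce")

# Variant 1: filter only, Rooms keeps its original string values
filtered = df[parsed.notna()]

# Variant 2: store the parsed numbers, then drop the NaN rows
converted = df.assign(Rooms=parsed).dropna(subset=["Rooms"])

print(filtered["Rooms"].tolist())   # ['3.5', '1.5', '5']
print(converted["Rooms"].tolist())  # [3.5, 1.5, 5.0]
print(converted["Rooms"].dtype)     # float64
```

This is why the first output prints 5 and 3 while the second prints 5.0 and 3.0: only the second variant actually changes the column to floats.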
PySpark: How to drop non-numeric columns from a DataFrame?
First of all, please find here a reference on different PySpark types.
The code below removes the string columns:
# Assumes an active SparkSession named spark
from pyspark.sql.types import StringType

df = spark.createDataFrame([
    (1, "a", "xxx", None, "abc", "xyz", "fgh"),
    (2, "b", None, 3, "abc", "xyz", "fgh"),
    (3, "c", "a23", None, None, "xyz", "fgh")
], ("ID", "flag", "col1", "col2", "col3", "col4", "col5"))

# Keep every column whose type is not StringType
num_cols = [f.name for f in df.schema.fields if not isinstance(f.dataType, StringType)]
df2 = df.select(num_cols)
df2.show()
+---+----+
| ID|col2|
+---+----+
|  1|null|
|  2|   3|
|  3|null|
+---+----+
Alternatively (to be precise), you can replace not isinstance with isinstance and list the types from the link above that you are interested in.
Hope this helps.
Remove pandas dataframe row if one column's element is non-numeric
You can use the 'to_numeric' function of pandas, which by default raises an error when it meets a value it cannot parse, such as a string. Passing errors='coerce' suppresses that error and turns such values into NaN instead. We then use 'notnull' to build a boolean mask that keeps only the rows where the conversion succeeded.
df[pd.to_numeric(df['price'], errors='coerce').notnull()]
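To see the difference between the default behaviour and 'coerce', here is a short sketch with made-up data. The `item` column and the values in `price` are assumptions for illustration.

```python
import pandas as pd

# Invented data: one price is not a number
df = pd.DataFrame({
    "item": ["pen", "ink", "pad"],
    "price": ["1.20", "free", "3"],
})

# Default errors="raise": the bad value aborts the conversion
try:
    pd.to_numeric(df["price"])
    raised = False
except ValueError:
    raised = True

# errors="coerce": the bad value becomes NaN and can be filtered out
mask = pd.to_numeric(df["price"], errors="coerce").notnull()
clean = df[mask]

print(raised)                  # True
print(clean["item"].tolist())  # ['pen', 'pad']
```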