Drop Non-Numeric Columns from a Pandas DataFrame

Drop non-numeric columns from a pandas DataFrame

To avoid using a private method you can also use select_dtypes, where you can either include or exclude the dtypes you want.

I ran into it in a post about exactly the same problem.

Or in your case, specifically:

source.select_dtypes(['number']) or source.select_dtypes([np.number])
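To make the include/exclude choice concrete, here is a minimal sketch. The frame named source and its column names are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical stand-in for `source`; the columns are invented for this sketch.
source = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [9.5, 12.0, 7.25],
    "name": ["a", "b", "c"],
    "in_stock": [True, False, True],
})

# Keep only numeric columns. Booleans are not part of 'number', so in_stock is dropped.
numeric_only = source.select_dtypes(include="number")

# The complement: everything that is not numeric.
non_numeric = source.select_dtypes(exclude="number")

print(list(numeric_only.columns))
print(list(non_numeric.columns))
```

Both calls return new DataFrames and leave source untouched.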

How do you delete a non-numeric column from an input dataset?

From the docs you can just select the numeric data by filtering using select_dtypes:

In [5]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(6).astype('f4'), 'b': [True, False] * 3, 'c': [1.0, 2.0] * 3})
df

Out[5]:
          a      b  c
0  0.338710   True  1
1  1.530095  False  2
2 -0.048261   True  1
3 -0.505742  False  2
4  0.729667   True  1
5 -0.634482  False  2

In [15]:
df.select_dtypes(include=[np.number])

Out[15]:
          a  c
0  0.338710  1
1  1.530095  2
2 -0.048261  1
3 -0.505742  2
4  0.729667  1
5 -0.634482  2

You can pass any type from the NumPy dtype hierarchy, for example np.number, np.integer or np.floating.
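As a rough illustration of how the hierarchy narrows or widens the selection, the sketch below uses a small frame similar to the one above, with an extra integer column d added as an assumption:

```python
import numpy as np
import pandas as pd

# Similar to the frame above, plus a hypothetical int64 column 'd'.
df = pd.DataFrame({
    "a": np.array([0.5, 1.5, -0.5], dtype="f4"),
    "b": [True, False, True],
    "c": [1.0, 2.0, 3.0],
    "d": np.array([1, 2, 3], dtype="int64"),
})

floats = df.select_dtypes(include=[np.floating])  # float32 and float64 columns
ints = df.select_dtypes(include=[np.integer])     # integer columns only
numbers = df.select_dtypes(include=[np.number])   # every numeric column, no bools

print(list(floats.columns), list(ints.columns), list(numbers.columns))
```

The more general the type you pass, the more columns match: np.number covers both np.integer and np.floating.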

Python: how to drop all the non numeric values from a pandas column?

Use to_numeric with errors='coerce' and Series.notna for filtering by boolean indexing:

df = df[pd.to_numeric(df['Rooms'], errors='coerce').notna()]
print(df)
  Rooms   BFS
0   3.5  4201
1   1.5  4201
4   5.5  4201
5     5  4201
6   4.5  4201
7     3  4201
9     3  4201

If you need numeric values in the output, first assign the converted values back to the same column and then use DataFrame.dropna:

df['Rooms'] = pd.to_numeric(df['Rooms'], errors='coerce')
df = df.dropna(subset=['Rooms'])
print(df)
   Rooms   BFS
0    3.5  4201
1    1.5  4201
4    5.5  4201
5    5.0  4201
6    4.5  4201
7    3.0  4201
9    3.0  4201
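Since the input data is not shown above, here is a self-contained sketch of both variants. The sample values are invented for illustration; only the column names Rooms and BFS come from the answer.

```python
import pandas as pd

# Made-up sample: two of the 'Rooms' entries are not numbers.
df = pd.DataFrame({
    "Rooms": ["3.5", "1.5", "n/a", "5", "unknown"],
    "BFS": [4201, 4201, 4201, 4201, 4201],
})

# Unparseable entries become NaN instead of raising an error.
converted = pd.to_numeric(df["Rooms"], errors="coerce")

# Variant 1: filter only. 'Rooms' keeps its original string values.
filtered = df[converted.notna()]

# Variant 2: store the converted values, then drop the NaN rows. 'Rooms' is float64.
cleaned = df.assign(Rooms=converted).dropna(subset=["Rooms"])

print(filtered)
print(cleaned.dtypes)
```

Both variants keep the original index labels, which is why the printed index has gaps; call reset_index(drop=True) if you want it renumbered.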

PySpark: How to drop non-numeric columns from a DataFrame?

First of all, see the PySpark documentation for a reference on the different data types.

The code below removes the string columns:

from pyspark.sql.types import StringType

# Assumes an active SparkSession named spark
df = spark.createDataFrame([
    (1, "a", "xxx", None, "abc", "xyz", "fgh"),
    (2, "b", None, 3, "abc", "xyz", "fgh"),
    (3, "c", "a23", None, None, "xyz", "fgh")
], ("ID", "flag", "col1", "col2", "col3", "col4", "col5"))

# Keep every column whose type is not StringType
num_cols = [f.name for f in df.schema.fields if not isinstance(f.dataType, StringType)]

df2 = df.select(num_cols)
df2.show()

+---+----+
| ID|col2|
+---+----+
|  1|null|
|  2|   3|
|  3|null|
+---+----+

Alternatively, to be more precise, you can replace not isinstance with isinstance and list the types from the reference above that you want to keep.
Hope this helps.

Remove pandas dataframe row if one column's element is non-numeric

You can use pandas' to_numeric function, which by default raises an error when it meets a value it cannot parse, such as a string. Passing errors='coerce' converts those values to NaN instead of raising. We then use notnull to keep only the rows where the conversion succeeded.

df[pd.to_numeric(df['price'], errors='coerce').notnull()]
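To see the difference between the default behaviour and errors='coerce', here is a small sketch on a made-up price column (the values are hypothetical):

```python
import pandas as pd

# Hypothetical data: the last price is not a number.
df = pd.DataFrame({"item": ["x", "y", "z"], "price": ["10", "12.5", "abc"]})

# Default behaviour: an unparseable value raises ValueError.
try:
    pd.to_numeric(df["price"])
    raised = False
except ValueError:
    raised = True

# With errors='coerce' the bad value becomes NaN, so it can be masked out.
coerced = pd.to_numeric(df["price"], errors="coerce")
mask = coerced.notnull()
result = df[mask]

print(raised)
print(result)
```

Note that result still holds the original strings in price; assign coerced back to the column if you need numbers.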

