Remove Non-Numeric Rows in One Column with Pandas

Remove non-numeric rows in one column with pandas

You could use the standard string method isnumeric and apply it to each value in your id column:

import pandas as pd
from io import StringIO

data = """
id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""

df = pd.read_csv(StringIO(data))

In [55]: df
Out[55]:
   id name
0   1    A
1   2    B
2   3    C
3  tt    D
4   4    E
5   5    F
6  de    G

In [56]: df[df.id.apply(lambda x: x.isnumeric())]
Out[56]:
   id name
0   1    A
1   2    B
2   3    C
4   4    E
5   5    F

Or, if you want to use id as the index, you could do:

In [61]: df[df.id.apply(lambda x: x.isnumeric())].set_index('id')
Out[61]:
   name
id
1     A
2     B
3     C
4     E
5     F
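
Note that the kept id values are still strings after filtering; if they should become integers, here is a small sketch of the extra cast (this step is not part of the original answer):

# Keep the numeric rows, then cast the still-string 'id' column to int
clean = df[df.id.apply(lambda x: x.isnumeric())].copy()
clean['id'] = clean['id'].astype(int)
print(clean.dtypes)
# id       int64
# name    object
# dtype: object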

Edit: adding timings.

Although the pd.to_numeric approach avoids the apply method, it is almost two times slower than applying the string isnumeric method to each value of a string column. I also add an option using the vectorized pandas str.isnumeric, which is less typing and still faster than pd.to_numeric. But pd.to_numeric is more general, because it can work with any data type (not only strings).

In [3]: df_big = pd.concat([df]*10000)

In [4]: df_big.shape
Out[4]: (70000, 2)

In [5]: %timeit df_big[df_big.id.apply(lambda x: x.isnumeric())]
15.3 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit df_big[df_big.id.str.isnumeric()]
20.3 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit df_big[pd.to_numeric(df_big['id'], errors='coerce').notnull()]
29.9 ms ± 682 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Remove pandas dataframe row if one column's element is non-numeric

You can use pandas' to_numeric function, which raises an error when a non-numeric value is met. With errors='coerce' that value is forced to null (NaN) instead, and notnull() then filters those rows out:

df[pd.to_numeric(df['price'], errors='coerce').notnull()]
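
A minimal runnable sketch of that filter, using made-up example data with a 'price' column:

import pandas as pd

# Hypothetical example data -- only the 'price' column matters here
df = pd.DataFrame({'item': ['a', 'b', 'c', 'd'],
                   'price': ['10.5', 'N/A', '7', 'unknown']})

# Non-numeric prices become NaN under errors='coerce'; notnull() keeps the rest
print(df[pd.to_numeric(df['price'], errors='coerce').notnull()])
#   item price
# 0    a  10.5
# 2    c     7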

Need to delete non-numeric rows from a dataframe

I ended up doing it this way.

cols = df_append.columns[:-1]
df_append[cols] = df_append[cols].apply(pd.to_numeric, errors='coerce')
df_append = df_append.fillna(0)

That's good enough for my purpose!
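
For context, a self-contained sketch of that same pattern on made-up data, assuming all columns except the last should be numeric (the column names here are hypothetical):

import pandas as pd

# Hypothetical frame: first two columns should be numeric, last one is a label
df_append = pd.DataFrame({'a': ['1', 'x', '3'],
                          'b': ['4', '5', 'y'],
                          'label': ['p', 'q', 'r']})

cols = df_append.columns[:-1]                      # every column except the last
df_append[cols] = df_append[cols].apply(pd.to_numeric, errors='coerce')
df_append = df_append.fillna(0)                    # replace failed conversions with 0
print(df_append)
#      a    b label
# 0  1.0  4.0     p
# 1  0.0  5.0     q
# 2  3.0  0.0     r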

Python Pandas dropping Non numerical rows from columns

I think you need to add isnull to check for NaN values, because your function returns NaN for values that are not numbers. Better and faster is to use the string methods str.isnumeric() or str.isdigit() with boolean indexing:

print(df['Score'].str.isnumeric())
0      NaN
1      NaN
2    False
3      NaN
4    False
Name: Score, dtype: object

print(df['Score'].str.isnumeric().isnull())
0     True
1     True
2    False
3     True
4    False
Name: Score, dtype: bool

print(df[df['Score'].str.isnumeric().isnull()])
   Faggio         Foo Score
0       0         Nis     4
1       1  and stimpy     6
3       1         cab     7

print(df[df['Score'].str.isdigit().isnull()])
   Faggio         Foo Score
0       0         Nis     4
1       1  and stimpy     6
3       1         cab     7

Similar solution with to_numeric and notnull:

print(df[pd.to_numeric(df['Score'], errors='coerce').notnull()])
   Faggio         Foo Score
0       0         Nis     4
1       1  and stimpy     6
3       1         cab     7

Remove non-numeric in df column with different datatypes

This should work (note that newer pandas versions require regex=True for pattern-based replacement): df['Volume'] = df['Volume'].str.replace(r'[^0-9.]', '', regex=True)
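
A short hedged sketch on made-up mixed values: since the column may hold non-string entries, it is cast to str first, stripped of anything that is not a digit or a dot, and then converted back to numbers.

import pandas as pd

# Hypothetical mixed-dtype column: ints, floats and strings with junk characters
df = pd.DataFrame({'Volume': [1200, '3,400', '$78.90', 56.7]})

cleaned = df['Volume'].astype(str).str.replace(r'[^0-9.]', '', regex=True)
df['Volume'] = pd.to_numeric(cleaned, errors='coerce')   # invalid results become NaN
print(df)
#    Volume
# 0  1200.0
# 1  3400.0
# 2    78.9
# 3    56.7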

How to remove non-numeric characters from a column

You can use .str.replace, the pandas Series' vectorized counterpart of re.sub, to remove \D (which matches non-numeric characters); newer pandas versions require regex=True here:

df.column1.str.replace(r'\D', '', regex=True)

0     67512
1      2568
2      5647
3       NaN
4    222674
5     98789
Name: column1, dtype: object
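
If the cleaned column should end up numeric, one possible follow-up (not part of the original answer) is to chain pd.to_numeric on the same column1, so that empty or missing results become NaN:

# Strip non-digit characters, then convert; rows that end up empty become NaN
df['column1'] = pd.to_numeric(
    df.column1.str.replace(r'\D', '', regex=True), errors='coerce')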

Drop rows from a dataframe with a non-numeric index

You may use pd.to_numeric to convert your numbers column to numeric. All non-numeric entries will be coerced to NaN, and you can then just drop those rows.

df = pd.read_csv(file, encoding='cp1252')
df['numbers'] = pd.to_numeric(df['numbers'], errors='coerce')

df = df.dropna(subset=['numbers']).set_index('numbers')
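
A compact sketch of those steps on made-up CSV contents (the file and its data are hypothetical, read from a string here for self-containment):

import pandas as pd
from io import StringIO

# Hypothetical CSV contents standing in for the real file
data = StringIO("numbers,value\n1,a\n2,b\nbad,c\n3,d\n")

df = pd.read_csv(data)
df['numbers'] = pd.to_numeric(df['numbers'], errors='coerce')   # 'bad' -> NaN
df = df.dropna(subset=['numbers']).set_index('numbers')
print(df)
#         value
# numbers
# 1.0         a
# 2.0         b
# 3.0         d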

How do I remove non-numeric values from a specific column in pandas?

Those are actually integers, just represented in a different base (base 16, also known as hexadecimal). The int() function takes an optional second argument for the base. We can check if a string consists only of numeric characters, and if so use 10 as the base, 16 otherwise:

df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))
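
A runnable sketch with a hypothetical DstPort column mixing decimal and hexadecimal strings:

import pandas as pd

# Hypothetical port values: decimal strings and bare hexadecimal strings
df = pd.DataFrame({'DstPort': ['80', '443', '1f90', 'c38']})

# Numeric-looking strings are parsed as base 10, everything else as base 16
ports = df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))
print(ports)
# 0      80
# 1     443
# 2    8080
# 3    3128
# Name: DstPort, dtype: int64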

get non numerical rows in a column pandas python

Use boolean indexing with a mask created by to_numeric + isnull.
Note: this solution does not find or filter numbers saved as strings, like '1' or '22'.

print (pd.to_numeric(df['num'], errors='coerce'))
0   -1.48
1    1.70
2   -6.18
3    0.25
4     NaN
5    0.25
Name: num, dtype: float64

print (pd.to_numeric(df['num'], errors='coerce').isnull())
0    False
1    False
2    False
3    False
4     True
5    False
Name: num, dtype: bool

print (df[pd.to_numeric(df['num'], errors='coerce').isnull()])
  N-D     num unit
4  Q5  sum(d)   UD

Another solution with isinstance and apply:

print (df[df['num'].apply(lambda x: isinstance(x, str))])
  N-D     num unit
4  Q5  sum(d)   UD
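
For completeness, a small sketch of the isinstance variant on made-up data; it only helps when the column genuinely mixes Python numbers and strings (rather than, say, a column read from CSV where every value is a string):

import pandas as pd

# Hypothetical mixed column: real floats plus one leftover string
df = pd.DataFrame({'num': [-1.48, 1.70, 'sum(d)', 0.25]})

# Keep only the rows whose 'num' value is a string, i.e. the non-numeric ones
print(df[df['num'].apply(lambda x: isinstance(x, str))])
#       num
# 2  sum(d)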

