Remove non-numeric rows in one column with pandas
You could use the standard string method isnumeric and apply it to each value in your id column:
import pandas as pd
from io import StringIO
data = """
id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""
df = pd.read_csv(StringIO(data))
In [55]: df
Out[55]:
id name
0 1 A
1 2 B
2 3 C
3 tt D
4 4 E
5 5 F
6 de G
In [56]: df[df.id.apply(lambda x: x.isnumeric())]
Out[56]:
id name
0 1 A
1 2 B
2 3 C
4 4 E
5 5 F
Or, if you want to use id as the index, you could do:
In [61]: df[df.id.apply(lambda x: x.isnumeric())].set_index('id')
Out[61]:
name
id
1 A
2 B
3 C
4 E
5 F
Edit: added timings.
Although the pd.to_numeric version does not use the apply method, it is almost two times slower than applying str.isnumeric to str columns. I also added an option using pandas str.isnumeric, which is less typing and still faster than pd.to_numeric. But pd.to_numeric is more general because it works with any data type (not only strings).
In [3]: df_big = pd.concat([df]*10000)
In [4]: df_big.shape
Out[4]: (70000, 2)
In [5]: %timeit df_big[df_big.id.apply(lambda x: x.isnumeric())]
15.3 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [6]: %timeit df_big[df_big.id.str.isnumeric()]
20.3 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [7]: %timeit df_big[pd.to_numeric(df_big['id'], errors='coerce').notnull()]
29.9 ms ± 682 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
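One caveat that the timings do not show: the two checks do not accept the same values. The sketch below uses a hypothetical all-string Series (not the data from the question) to illustrate that str.isnumeric rejects signed and decimal values, while pd.to_numeric parses them.

```python
import pandas as pd

# Hypothetical sample: every value is a string, some are signed or decimal.
s = pd.Series(["1", "-2", "3.5", "tt", "4"])

# str.isnumeric is True only when every character is a numeric character,
# so the minus sign and the decimal point make "-2" and "3.5" fail.
mask_isnumeric = s.str.isnumeric()

# pd.to_numeric parses signed and decimal values; only "tt" becomes NaN.
mask_to_numeric = pd.to_numeric(s, errors="coerce").notnull()

print(s[mask_isnumeric].tolist())   # ['1', '4']
print(s[mask_to_numeric].tolist())  # ['1', '-2', '3.5', '4']
```

So if the column may hold negative numbers or floats, the slower pd.to_numeric variant is probably the safer choice.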
Remove pandas dataframe row if one column's element is non-numeric
You can use pandas' to_numeric function, which raises an error when it meets a value it cannot parse as a number. Passing errors='coerce' suppresses that error and converts the value to NaN instead. The notnull method then gives a boolean mask that keeps only the rows whose value converted successfully.
df[pd.to_numeric(df['price'], errors='coerce').notnull()]
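A minimal self-contained sketch of that filter, assuming a made-up frame with a string price column (the item names and values are illustrative only). It also shows one possible follow-up, storing the converted numbers, which the original one-liner does not do.

```python
import pandas as pd

# Hypothetical data: "price" was read as text and contains two bad entries.
df = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "price": ["10.5", "n/a", "7", "free"],
})

# Unparseable entries become NaN instead of raising.
numeric_price = pd.to_numeric(df["price"], errors="coerce")

# Keep only rows whose price converted; the column itself is still text here.
filtered = df[numeric_price.notnull()]

# Optional: also replace the text column with the converted numbers.
clean = filtered.assign(price=numeric_price)

print(clean)
```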
Need to delete non-numeric rows from a dataframe
I ended up doing it this way.
cols = df_append.columns[:-1]
df_append[cols] = df_append[cols].apply(pd.to_numeric, errors='coerce')
df_append = df_append.fillna(0)
That's good enough for my purpose!
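Note that this approach replaces non-numeric values with 0 rather than deleting the rows. A runnable sketch, assuming a hypothetical df_append whose last column is a text label that should be left alone; as a small deviation from the snippet above, it fills only the converted columns instead of the whole frame.

```python
import pandas as pd

# Hypothetical frame: two columns that should be numeric, plus a text label.
df_append = pd.DataFrame({
    "a": ["1", "x", "3"],
    "b": ["4.5", "5", "oops"],
    "label": ["keep", "these", "strings"],
})

# Every column except the last one.
cols = df_append.columns[:-1]

# Convert column by column; values that cannot be parsed become NaN.
df_append[cols] = df_append[cols].apply(pd.to_numeric, errors="coerce")

# Replace the NaN placeholders with 0 in the converted columns only.
df_append[cols] = df_append[cols].fillna(0)

print(df_append)
```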
Python Pandas dropping Non numerical rows from columns
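I think you need to add isnull to check for NaN values, because your function returns NaN if the value is not a number. It is better and faster to use the text methods str.isnumeric() and str.isdigit() with boolean indexing. On this column they return NaN for values that are not strings (the actual numbers) and False for the non-numeric strings, so isnull selects the numeric rows: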
print(df['Score'].str.isnumeric())
0 NaN
1 NaN
2 False
3 NaN
4 False
Name: Score, dtype: object
print(df['Score'].str.isnumeric().isnull())
0 True
1 True
2 False
3 True
4 False
Name: Score, dtype: bool
print(df[df['Score'].str.isnumeric().isnull()])
Faggio Foo Score
0 0 Nis 4
1 1 and stimpy 6
3 1 cab 7
print(df[df['Score'].str.isdigit().isnull()])
Faggio Foo Score
0 0 Nis 4
1 1 and stimpy 6
3 1 cab 7
Similar solution with to_numeric and notnull:
print(df[pd.to_numeric(df['Score'], errors='coerce').notnull()])
Faggio Foo Score
0 0 Nis 4
1 1 and stimpy 6
3 1 cab 7
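To make the outputs above reproducible, here is a sketch with a hypothetical mixed-type Score column (real integers next to stray strings); the frame is an assumption, not the asker's data. It shows why isnull works with the .str methods and that the to_numeric mask selects the same rows.

```python
import pandas as pd

# Hypothetical frame: Score mixes real integers with stray strings.
df = pd.DataFrame({
    "Foo": ["Nis", "and stimpy", "d", "cab", "abcd"],
    "Score": [4, 6, "Nis", 7, "stimpy"],
})

# .str methods give NaN for non-string values (the real numbers)
# and False for strings that are not numeric.
is_numeric_str = df["Score"].str.isnumeric()

# Rows where the result is NaN are the ones holding actual numbers.
kept_by_str = df[is_numeric_str.isnull()]

# Same rows via to_numeric; unlike the mask above, this one would also
# keep a numeric string such as "5".
kept_by_to_numeric = df[pd.to_numeric(df["Score"], errors="coerce").notnull()]

print(kept_by_str)
```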
Remove non-numeric in df column with different datatypes
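This should work (on pandas 2.0 and later, regex=True is required for the pattern to be treated as a regular expression): df['Volume'] = df['Volume'].str.replace(r'[^0-9.]', '', regex=True)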
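A small sketch of that replacement followed by a conversion step, assuming a hypothetical Volume column of messy strings. The to_numeric call is an addition, not part of the answer; without it the cleaned column still holds text.

```python
import pandas as pd

# Hypothetical messy values.
df = pd.DataFrame({"Volume": ["1,200", "$35.50", "n/a", "480 units"]})

# Strip everything except digits and the decimal point.
cleaned = df["Volume"].str.replace(r"[^0-9.]", "", regex=True)

# "n/a" is reduced to an empty string, which ends up as NaN.
df["Volume"] = pd.to_numeric(cleaned, errors="coerce")

print(df)
```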
How to remove non numeric characters from an column
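You can use .str.replace, the pandas Series vectorized counterpart of the re.sub method, to remove \D (which matches any non-digit character). Use a raw string for the pattern and, on pandas 2.0 and later, pass regex=True:
df.column1.str.replace(r'\D', '', regex=True)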
0 67512
1 2568
2 5647
3 NaN
4 222674
5 98789
Name: column1, dtype: object
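The input that produced this output is not shown, so the sketch below assumes a hypothetical column1 that would give the same result. It illustrates that a missing value passes through .str.replace as NaN rather than raising.

```python
import numpy as np
import pandas as pd

# Hypothetical input: digits mixed with other characters, one missing value.
column1 = pd.Series(
    ["67-512", "25a68", "5647", np.nan, "222,674", "98789"],
    name="column1",
)

# Remove every non-digit character; the missing entry stays NaN.
digits_only = column1.str.replace(r"\D", "", regex=True)

print(digits_only)
```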
Drop rows from a dataframe with a non-numeric index
You may use pd.to_numeric to convert your numbers column to numeric. All non-numeric entries will be coerced to NaN, and you can then just drop those rows.
df = pd.read_csv(file, encoding='cp1252')
df['numbers'] = pd.to_numeric(df['numbers'], errors='coerce')
df = df.dropna(subset=['numbers']).set_index('numbers')
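Here is a self-contained variant that reads from an in-memory string instead of a file, so it can be run as is. The CSV content and the value column are made up for illustration; the astype step is an optional extra, since coercion leaves the column as float.

```python
from io import StringIO

import pandas as pd

# Hypothetical file content with one non-numeric entry in "numbers".
data = """numbers,value
1,a
2,b
total,c
3,d
"""

df = pd.read_csv(StringIO(data))

# "total" cannot be parsed and becomes NaN.
df["numbers"] = pd.to_numeric(df["numbers"], errors="coerce")

# Drop the NaN rows, restore an integer dtype, then use the column as index.
df = (
    df.dropna(subset=["numbers"])
      .astype({"numbers": int})
      .set_index("numbers")
)

print(df)
```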
How do I remove non-numeric values from specific column in pandas?
Those are actually integers, just represented in a different base (base 16, also known as hexadecimal). The int() function takes an optional second argument for the base. We can check if a string consists only of numeric characters, and if so use 10 as the base, 16 otherwise:
df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))
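A quick runnable check of that idea, assuming a hypothetical DstPort column of strings. One limitation to keep in mind: a hexadecimal value written with digits only (for example "10") is indistinguishable from a decimal one and will be read as base 10.

```python
import pandas as pd

# Hypothetical ports: two decimal strings and two hexadecimal strings.
df = pd.DataFrame({"DstPort": ["80", "443", "0x1bb", "1f90"]})

# Digits-only strings are parsed as base 10, everything else as base 16.
# int() accepts an optional "0x" prefix when the base is 16.
ports = df.DstPort.apply(lambda x: int(x, 10 if x.isnumeric() else 16))

print(ports.tolist())  # [80, 443, 443, 8080]
```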
get non numerical rows in a column pandas python
Use boolean indexing with a mask created by to_numeric + isnull.
Note: This solution does not find or filter numbers saved as strings, like '1' or '22'.
print (pd.to_numeric(df['num'], errors='coerce'))
0 -1.48
1 1.70
2 -6.18
3 0.25
4 NaN
5 0.25
Name: num, dtype: float64
print (pd.to_numeric(df['num'], errors='coerce').isnull())
0 False
1 False
2 False
3 False
4 True
5 False
Name: num, dtype: bool
print (df[pd.to_numeric(df['num'], errors='coerce').isnull()])
N-D num unit
4 Q5 sum(d) UD
Another solution with isinstance and apply:
print (df[df['num'].apply(lambda x: isinstance(x, str))])
N-D num unit
4 Q5 sum(d) UD
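The two masks differ exactly on numbers saved as strings, which the following sketch tries to make visible. The frame is hypothetical: it resembles the output above but adds one numeric string, '22', that is not in the original data.

```python
import pandas as pd

# Hypothetical frame: floats, one numeric string, one non-numeric string.
df = pd.DataFrame({
    "N-D": ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"],
    "num": [-1.48, 1.70, "22", 0.25, "sum(d)", 0.25],
})

# to_numeric parses "22" successfully, so only "sum(d)" is flagged.
not_parseable = df[pd.to_numeric(df["num"], errors="coerce").isnull()]

# isinstance looks at the Python type, so both strings are flagged.
stored_as_str = df[df["num"].apply(lambda x: isinstance(x, str))]

print(not_parseable)
print(stored_as_str)
```

Which mask is right depends on whether '22' should count as numeric for your purposes.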