delete rows containing numeric values in strings from pandas dataframe
In your case, I think it's better to use simple indexing rather than drop. For example:
>>> df
text type
0 abc b
1 abc123 a
2 cde a
3 abc1.2.3 b
4 1.2.3 a
5 xyz a
6 abc123 a
7 9999 a
8 5text a
9 text a
>>> df[~df.text.str.contains(r'[0-9]')]
text type
0 abc b
2 cde a
5 xyz a
9 text a
That selects only the rows whose text contains no digits.
To explain:
df.text.str.contains(r'[0-9]')
returns a boolean Series that is True wherever the string contains at least one digit:
0 False
1 True
2 False
3 True
4 True
5 False
6 True
7 True
8 True
9 False
You can then negate this mask with ~ and use it to index your DataFrame, which keeps only the rows where it is False.
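A self-contained version of the example above may help if you want to run it. It rebuilds the same ten-row frame from the printed output and applies the same mask:

```python
import pandas as pd

# Rebuild the frame shown above
df = pd.DataFrame({
    'text': ['abc', 'abc123', 'cde', 'abc1.2.3', '1.2.3',
             'xyz', 'abc123', '9999', '5text', 'text'],
    'type': ['b', 'a', 'a', 'b', 'a', 'a', 'a', 'a', 'a', 'a'],
})

# True wherever the string contains at least one digit
has_digit = df['text'].str.contains(r'[0-9]')

# ~ flips the mask, so only rows without any digit are kept
no_digits = df[~has_digit]
print(no_digits)
```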
Remove rows from pandas dataframe if string has 'only numbers'
If the values only ever use the ASCII digits 0-9 (note that isdigit() also accepts some non-ASCII digits, such as superscript '²'):
df = df[~df['question_stemmed'].str.isdigit()]
If we also need to treat other Unicode numeric characters (fractions such as '½', or numerals from other scripts) as numbers:
df = df[~df['question_stemmed'].str.isnumeric()]
The pandas string methods call the corresponding Python str methods on each element. See "What's the difference between str.isdigit, isnumeric and isdecimal in python?" for an explanation of how these functions work.
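Python's str.isdigit() is not ASCII-only: it also accepts characters such as superscript digits and digits from other scripts. The sketch below uses hypothetical sample values to show where isdecimal, isdigit, isnumeric and a strict ASCII regex disagree. It assumes pandas 1.1 or later, which is where str.fullmatch was added:

```python
import pandas as pd

# Hypothetical sample values chosen to show where the methods disagree:
# '\u00b2' is superscript two, '\u00bd' is the fraction one half,
# '\u0663' is ARABIC-INDIC DIGIT THREE
s = pd.Series(['123', '12a', '\u00b2', '\u00bd', '\u0663', '3.5'])

checks = pd.DataFrame({
    'isdecimal': s.str.isdecimal(),
    'isdigit': s.str.isdigit(),
    'isnumeric': s.str.isnumeric(),
    # Strict ASCII-only test: the whole string is made of 0-9
    'ascii_0_9': s.str.fullmatch(r'[0-9]+'),
})
print(checks)

# Drop rows that are numbers-only by the strict ASCII definition
df = pd.DataFrame({'question_stemmed': s})
df = df[~df['question_stemmed'].str.fullmatch(r'[0-9]+')]
```

Note that '3.5' fails all four tests, because the dot is not a digit.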
Remove rows where column value type is string Pandas
Use convert_objects with the parameter convert_numeric=True; this will coerce any non-numeric values to NaN:
In [24]:
df = pd.DataFrame({'a': [0.1,0.5,'jasdh', 9.0]})
df
Out[24]:
a
0 0.1
1 0.5
2 jasdh
3 9
In [27]:
df.convert_objects(convert_numeric=True)
Out[27]:
a
0 0.1
1 0.5
2 NaN
3 9.0
You can then drop them:
In [29]:
df.convert_objects(convert_numeric=True).dropna()
Out[29]:
a
0 0.1
1 0.5
3 9.0
UPDATE
Since version 0.17.0 this method is deprecated (it has since been removed from pandas), so you need to use to_numeric instead. Unfortunately, to_numeric operates on a Series rather than a whole DataFrame, so the equivalent code is now:
df.apply(lambda x: pd.to_numeric(x, errors='coerce')).dropna()
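Because to_numeric returns a new Series, another option is to use it only to build a mask, so the original values and any other columns are kept unchanged. This is a sketch; the 'label' column is a hypothetical addition to the example frame:

```python
import pandas as pd

# Example frame from above, plus a hypothetical 'label' column
df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0], 'label': ['x', 'y', 'z', 'w']})

# Coerce only column 'a'; entries that cannot be parsed become NaN
numeric_a = pd.to_numeric(df['a'], errors='coerce')

# Keep rows where the coercion succeeded; other columns are left untouched
kept = df[numeric_a.notna()]
print(kept)
```

Unlike the apply(...).dropna() version, this only checks column 'a', so NaN values in other columns do not cause rows to be dropped.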
Removing rows with digits and strings in pandas dataframe
Using pandas.Series.str.contains with a regex.
A simpler regex, but it would also match a row like '123 456', because both '3 ' and ' 4' satisfy the pattern:
df[df.col1.str.contains(r'\d\D|\D\d')]
col1
3 C96305407PLA
4 P0116711
This addresses the shortcoming of the regex above by only matching when a digit is directly followed by a letter, or a letter is directly followed by a digit:
df[df.col1.str.contains(r'(?i)\d[a-z]|[a-z]\d')]
col1
3 C96305407PLA
4 P0116711
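The two patterns above can be compared side by side. This sketch uses hypothetical input values, since only the two matching rows appear in the output above. It uses case=False, which has the same effect as the inline (?i) flag:

```python
import pandas as pd

# Hypothetical values; only the last two appear in the output above
df = pd.DataFrame({'col1': ['12345', 'ABCDE', '123 456', 'C96305407PLA', 'P0116711']})

# Looser pattern: any digit next to any non-digit (spaces count)
loose = df[df.col1.str.contains(r'\d\D|\D\d')]

# Stricter pattern: a digit directly next to a letter, ignoring case
strict = df[df.col1.str.contains(r'\d[a-z]|[a-z]\d', case=False)]

print(loose)
print(strict)
```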
Python Pandas Remove Rows that has Numbers (not float nor int but like 1.2.3)
Convert to str, then use a regex to keep only the rows whose value consists entirely of digits and dots:
df=pd.DataFrame({'id':[1,2,3],'value':['3.3.4 text','3.4.5',3.2]})
df=df.astype(str)
df[df['value'].str.contains(r'^[\d.]+$')]
This gives:
id value
1 2 3.4.5
2 3 3.2
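A variant of the approach above converts only the 'value' column to str, so 'id' keeps its integer dtype. It also uses str.fullmatch, which tests the whole string just as the ^...$ anchors do (this assumes pandas 1.1 or later). The second mask reads the question title literally and drops any row that contains a dotted number such as 1.2.3. That stricter pattern is a hypothetical addition and was not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'value': ['3.3.4 text', '3.4.5', 3.2]})

# Convert only the 'value' column to str so 'id' keeps its integer dtype
values = df['value'].astype(str)

# Same test as r'^[\d.]+$': the whole string is digits and dots
only_digits_and_dots = values.str.fullmatch(r'[\d.]+')
kept = df[only_digits_and_dots]

# Hypothetical stricter reading: drop rows containing a number with
# two or more dots, like 1.2.3
has_dotted_version = values.str.contains(r'\d+(?:\.\d+){2,}')
no_versions = df[~has_dotted_version]

print(kept)
print(no_versions)
```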
Select rows which contain numeric substrings in Pandas
You can use boolean indexing with a str.contains() regex:
^0E - starts with 0E
\d{2}$ - ends with 2 digits
\d{2}[A-Z]$ - ends with 2 digits and 1 capital letter
col = ... # target column
mask = df[col].str.contains(r'^0E|\d{2}$|\d{2}[A-Z]$')
df = df.loc[~mask]
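The snippet above can be run end to end with some data. In this sketch the values are hypothetical, and the column name 'code' stands in for the target column that the answer leaves elided:

```python
import pandas as pd

# Hypothetical data; 'code' stands in for the elided target column
df = pd.DataFrame({'code': ['0EX55', 'ABC12', 'QR34Z', 'PLAIN', 'A1B', 'XY1']})
col = 'code'

# starts with 0E, or ends with 2 digits, or ends with 2 digits + 1 capital letter
mask = df[col].str.contains(r'^0E|\d{2}$|\d{2}[A-Z]$')

# keep only the rows that match none of the alternatives
df = df.loc[~mask]
print(df)
```

'A1B' and 'XY1' survive because each has only a single digit before the end of the string.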
Remove rows from DataFrame that contain numbers from 0 to 9
You can use the vectorised contains with the regex pattern \d to check whether each string contains any digits, which gives a boolean mask, and then use ~ to negate it:
In [173]:
df[~df['Testvalue'].str.contains(r'\d')]
Out[173]:
Testvalue
2 water
Here contains generates the following boolean mask:
In [174]:
df['Testvalue'].str.contains(r'\d')
Out[174]:
0 True
1 True
2 False
Name: Testvalue, dtype: bool
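If the column has missing values, it helps to set the na parameter of str.contains explicitly. With an object-dtype column, the mask otherwise gets NaN for those rows, and negating it with ~ or indexing with it can then raise an error. This sketch uses hypothetical values in a column named 'Testvalue':

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({'Testvalue': ['abc1', '2x', 'water', None]})

# na=False treats missing values as "no digit", so ~ keeps those rows
mask_keep_missing = df['Testvalue'].str.contains(r'\d', na=False)
kept_with_missing = df[~mask_keep_missing]

# na=True treats missing values as "has a digit", so ~ drops them too
mask_drop_missing = df['Testvalue'].str.contains(r'\d', na=True)
kept_without_missing = df[~mask_drop_missing]

print(kept_with_missing)
print(kept_without_missing)
```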
Delete rows of a pandas data frame having string values in python 3.4.1
The way I would approach this is to try converting the columns to int with a user-defined function that uses try/except. This handles values that cannot be coerced to an int, which get set to NaN instead. Then drop the row that has an empty Date value. For some reason, that value actually had a length of 1 when I tested this with your data; with yours, a length of 0 may work instead.
In [42]:
import numpy as np
import pandas as pd

# simple function to try to convert the type, returns NaN if the value cannot be coerced
def func(x):
    try:
        return int(x)
    except ValueError:
        return np.nan

# assign multiple columns
df['Pro_L_1'], df['Pro_L_3'], df['Sale'] = df['Pro_L_1'].apply(func), df['Pro_L_3'].apply(func), df['Sale'].apply(func)
# drop the 'empty' date row, take a copy() so we don't get a warning
df = df.loc[df['Date'].str.len() > 1].copy()
# convert the string to a datetime; in the pandas version used here, the empty row would otherwise have been set to today's date
df['Date'] = pd.to_datetime(df['Date'])
# now convert the remaining non-date columns to a numeric dtype
# (originally df.convert_objects(convert_numeric=True), which has since been removed from pandas)
num_cols = df.columns.drop('Date')
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')
# check the dtypes
df.dtypes
Out[42]:
Geo_L_1 int64
Geo_L_2 int64
Geo_L_3 int64
Pro_L_1 float64
Pro_L_2 float64
Pro_L_3 float64
Date datetime64[ns]
Sale float64
dtype: object
In [43]:
# display the current situation
df
Out[43]:
Geo_L_1 Geo_L_2 Geo_L_3 Pro_L_1 Pro_L_2 Pro_L_3 Date Sale
0 1 2 3 129 1 5193316745 2012-01-01 9
1 1 2 3 129 1 5193316745 2013-01-01 NaN
3 1 2 3 129 NaN 5193316745 2012-01-10 10
4 1 2 3 129 1 5193316745 2013-01-10 4
5 1 2 3 NaN 1 5193316745 2014-01-10 6
6 1 2 3 129 1 5193316745 2012-01-11 4
7 1 2 3 129 1 NaN 2013-01-11 2
8 1 2 3 129 1 5193316745 2014-01-11 6
9 1 2 3 129 1 5193316745 2012-01-12 NaN
10 1 2 3 129 1 5193316745 2013-01-12 5
In [44]:
# drop the rows
df.dropna()
Out[44]:
Geo_L_1 Geo_L_2 Geo_L_3 Pro_L_1 Pro_L_2 Pro_L_3 Date Sale
0 1 2 3 129 1 5193316745 2012-01-01 9
4 1 2 3 129 1 5193316745 2013-01-10 4
6 1 2 3 129 1 5193316745 2012-01-11 4
8 1 2 3 129 1 5193316745 2014-01-11 6
10 1 2 3 129 1 5193316745 2013-01-12 5
For the last step, assign the result back: df = df.dropna()
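The per-value try/except step can be tried on its own. In this sketch, to_int_or_nan is a hypothetical renaming of func that also catches TypeError (an added assumption), and the raw values stand in for one of the string columns above. The sketch also compares the function with the vectorised pd.to_numeric(..., errors='coerce'):

```python
import numpy as np
import pandas as pd

def to_int_or_nan(x):
    """Return int(x), or NaN if x cannot be coerced to an int."""
    try:
        return int(x)
    except (ValueError, TypeError):
        return np.nan

# Hypothetical raw values standing in for one of the string columns above
raw = pd.Series(['9', '', '10', 'abc'], name='Sale')

# Element-wise try/except, as in the answer
per_value = raw.apply(to_int_or_nan)

# Vectorised alternative: entries that cannot be parsed become NaN
vectorised = pd.to_numeric(raw, errors='coerce')

df = pd.DataFrame({'Sale': per_value})
df = df.dropna()  # assign the result back
print(df)
```

The two approaches agree on this data but not in general. For example, int('9.5') raises ValueError and gives NaN, while to_numeric parses '9.5' as 9.5.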