Filter All Rows That Do Not Contain Letters (Alpha) in 'Pandas'

I think you'd need str.contains to filter values that contain letters, by means of boolean indexing:

df = df[df['Col A.'].str.contains('[A-Za-z]')]
print (df)
       Col A.
2         dog
3     cat 583
4  rabbit 444
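As a runnable sketch of the approach (the sample values are assumed, modeled on the printed output; the column name `'Col A.'` comes from the question):

```python
import pandas as pd

# Hypothetical sample data modeled on the printed output above
df = pd.DataFrame({'Col A.': ['12', '5', 'dog', 'cat 583', 'rabbit 444']})

# Keep only rows whose value contains at least one letter
filtered = df[df['Col A.'].str.contains('[A-Za-z]')]
print(filtered)
```

Rows holding purely numeric strings drop out because the regex finds no letter to match.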

If there are NaN values, you can pass the na parameter:

df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print (df)
       Col A.
3         dog
4     cat 583
5  rabbit 444
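A self-contained check of the na=False behavior (sample data assumed):

```python
import pandas as pd
import numpy as np

# Hypothetical column mixing NaN with the question's values
df = pd.DataFrame({'Col A.': [np.nan, '5', 'dog', 'cat 583', 'rabbit 444']})

# Without na=False, str.contains returns NaN for missing values, and a
# mask containing NaN raises an error when used for boolean indexing;
# na=False treats NaN as "no match", so those rows are simply dropped.
filtered = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print(filtered)
```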

Filtering out rows with non-alphanumeric characters

You're looking for str.isalpha:

df[df['my_column'].str.isalpha()]

   some_col my_column
0         1      some
1         2      word

A similar method is str.isalnum, if you want to retain letters and digits.
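A sketch contrasting the two methods (the frame below is hypothetical, modeled on the printed output, with extra rows added so the difference shows):

```python
import pandas as pd

# Hypothetical data: 'word3' has a digit, 'w?rd' has punctuation
df = pd.DataFrame({'some_col': [1, 2, 3, 4],
                   'my_column': ['some', 'word', 'word3', 'w?rd']})

only_letters = df[df['my_column'].str.isalpha()]    # letters only
letters_digits = df[df['my_column'].str.isalnum()]  # letters and digits
```

isalpha rejects 'word3', while isalnum keeps it; both reject 'w?rd'.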

If you want to handle letters and whitespace characters, use

df[df['my_column'].str.replace(' ', '').str.isalpha()]

   some_col my_column
0         1      some
1         2      word

Lastly, if you are looking to remove punctuation as a whole, I've written a Q&A here which might be a useful read: Fast punctuation removal with pandas

How to remove non-alpha-numeric characters from strings within a dataframe column in Python?

Use str.replace:

  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object

To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need:

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object
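Putting both replacements together in runnable form (regex=True is spelled out, as recent pandas versions require it for regex patterns):

```python
import pandas as pd

# Sample column from the question
df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

# Keep letters only: strip everything outside [a-zA-Z]
letters = df['strings'].str.replace('[^a-zA-Z]', '', regex=True)

# Keep letters and digits: \W strips every non-word character
alnum = df['strings'].str.replace(r'\W', '', regex=True)
```

Note that \w also matches the underscore, so '_' would survive the second variant.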

Choose empty rows only from a dataframe

See below for a table summarizing various methods to identify null/False/empty elements in pandas.

Depending on what other values you can have in "Alpha", you could use:

Keep only spaces (and empty strings):

df[df['Alpha'].str.fullmatch(r'\s*')]

Keep all "falsy" values (empty string, 0, None, ...):

df[~df['Alpha'].astype(bool)]

       Date FN  AuM Alpha
0  01012021  A   10
1  01012021  B   20
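A runnable sketch of both filters (the frame is hypothetical, modeled on the printed output; str.fullmatch and astype(bool) are standard idioms for the two cases named above):

```python
import pandas as pd

# Hypothetical frame: 'Alpha' is empty, a single space, and a real value
df = pd.DataFrame({'Date': ['01012021', '01012021', '02012021'],
                   'FN': ['A', 'B', 'C'],
                   'AuM': [10, 20, 30],
                   'Alpha': ['', ' ', 'x']})

# Keep rows whose Alpha is only whitespace (the empty string matches too,
# since \s* allows zero characters)
spaces = df[df['Alpha'].str.fullmatch(r'\s*')]

# Keep rows with a "falsy" Alpha: the empty string is falsy,
# but a single space is truthy, so only row 0 survives
falsy = df[~df['Alpha'].astype(bool)]
```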

Table of empty/falsy/zero-length values

import pandas as pd

df = pd.DataFrame({'data': [1, 'abc', True, 0, '', None,
                            float('nan'), False, [],
                            {}, set(), tuple([])]})

df['type'] = df['data'].apply(lambda x: type(x).__name__)
df['isna'] = df['data'].isna().map({True: 'X', False: ''})
df['isnull'] = df['data'].isnull().map({True: 'X', False: ''})
df['~bool'] = (~df['data'].astype(bool)).map({True: 'X', False: ''})
df['len'] = df['data'].str.len().convert_dtypes()
df['len==0'] = df['data'].str.len().eq(0).map({True: 'X', False: ''})


      data      type isna isnull ~bool   len len==0
0        1       int                    <NA>
1      abc       str                       3
2     True      bool                    <NA>
3        0       int                  X  <NA>
4                str                  X     0      X
5     None  NoneType    X      X     X  <NA>
6      NaN     float    X      X        <NA>
7    False      bool                  X  <NA>
8       []      list                  X     0      X
9       {}      dict                  X     0      X
10      {}       set                  X     0      X
11      ()     tuple                  X     0      X

Remove all alphanumeric words from a string using pandas

You can use

data_df = pd.DataFrame({'Vendor': ['2fvRE-Ku89lkRVJ44QQFN ABACUS LABS, INC', 'abc123 CAT LABS, INC']})
data_df['Vendor'].str.replace(r'^(?:[A-Za-z-]+\d|[\d-]+[A-Za-z])[\w-]*\s*', '', regex=True)
# 0    ABACUS LABS, INC
# 1       CAT LABS, INC
# Name: Vendor, dtype: object

See the regex demo.

Regex details

  • ^ - start of string
  • (?:[A-Za-z-]+\d|[\d-]+[A-Za-z]) - either one or more letters/hyphens followed by a digit, or one or more digits/hyphens followed by a letter
  • [\w-]* - zero or more word or - chars
  • \s* - zero or more whitespace chars.
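Tracing the pattern end to end (sample strings reused from the answer):

```python
import pandas as pd

data_df = pd.DataFrame({'Vendor': ['2fvRE-Ku89lkRVJ44QQFN ABACUS LABS, INC',
                                   'abc123 CAT LABS, INC']})

# '2fvRE-...' matches via the digits-then-letter branch;
# 'abc123' matches via the letters-then-digit branch.
# In both cases the whole leading token plus trailing space is removed.
cleaned = data_df['Vendor'].str.replace(
    r'^(?:[A-Za-z-]+\d|[\d-]+[A-Za-z])[\w-]*\s*', '', regex=True)
```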

Filter pandas dataframe by two columns where one column is a list

You can use apply():

print(x[x.apply(lambda row: row.Letter in row.Alpha, axis=1)])
#        Alpha Letter
# 0     [a, z]      a
# 2          c      c
# 4  [e, q, m]      e
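As a self-contained sketch (the frame is reconstructed from the commented output, with the filtered-out rows assumed; note that `in` also works when Alpha holds a plain string, as in row 2):

```python
import pandas as pd

# Hypothetical frame: 'Alpha' holds either a list of letters or a single letter
x = pd.DataFrame({'Alpha': [['a', 'z'], ['b'], 'c', ['d'], ['e', 'q', 'm']],
                  'Letter': ['a', 'x', 'c', 'y', 'e']})

# axis=1 passes each row to the lambda, so both columns are visible at once
result = x[x.apply(lambda row: row.Letter in row.Alpha, axis=1)]
```

apply with axis=1 is slow on large frames, but it is the simplest way to compare two columns element-wise when one holds lists.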

Pandas iterate through rows to see if value is alphanumeric

You can use a list comprehension:

df['col1'] = [''.join([i for i in x if i.isalpha()]) for x in df['col1']]


0    Hi
1    Hi
2    hi
3    Hi

If you have NaN or float values, blank them out first by converting them to empty strings:

df.loc[pd.to_numeric(df['col1'], errors='coerce').notnull(), 'col1'] = ''
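Combining both steps into one runnable sketch (sample data assumed; the fillna('') line is my addition, since NaN itself coerces to NaN and therefore slips past the to_numeric mask):

```python
import pandas as pd
import numpy as np

# Hypothetical column mixing strings, NaN and a float
df = pd.DataFrame({'col1': ['Hi1', 'Hi&', np.nan, 3.5, 'hi']})

# Blank out numeric values (3.5 coerces successfully, so it is replaced)
df.loc[pd.to_numeric(df['col1'], errors='coerce').notnull(), 'col1'] = ''

# NaN coerces to NaN, which notnull() misses - fill it separately
# (an assumed addition, not part of the quoted snippet)
df['col1'] = df['col1'].fillna('')

# Now every value is a string, so keep only the alphabetic characters
df['col1'] = [''.join([c for c in x if c.isalpha()]) for x in df['col1']]
```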
