Filter All Rows That Do Not Contain Letters (Alpha) in 'Pandas'


Use str.contains with a character class to build a boolean mask, then keep the rows that contain letters via boolean indexing:

df = df[df['Col A.'].str.contains('[A-Za-z]')]
print(df)
       Col A.
2         dog
3     cat 583
4  rabbit 444

If the column contains NaN values, pass na=False so that missing entries are treated as non-matches:

df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print(df)
       Col A.
3         dog
4     cat 583
5  rabbit 444
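
As a minimal, self-contained sketch of the na=False variant: the sample data below is invented for illustration (digit-only strings plus one missing value), and the names mask and letters_only are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: digit-only strings, mixed strings and a missing value.
df = pd.DataFrame({'Col A.': ['123', '4 56', 'dog', np.nan, 'cat 583', 'rabbit 444']})

# na=False makes the missing entry count as "no match", so the mask is purely boolean.
mask = df['Col A.'].str.contains('[A-Za-z]', na=False)

# Boolean indexing keeps only the rows whose value contains at least one ASCII letter.
letters_only = df[mask]
print(letters_only)
```

The original index labels are preserved, so the surviving rows keep the positions they had before filtering.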

Filtering out rows with non-alphanumeric characters

You're looking for str.isalpha:

df[df.my_column.str.isalpha()]

   some_col my_column
0         1      some
1         2      word

A similar method is str.isalnum, if you want to retain letters and digits.

If you want to keep rows that contain only word characters (letters, digits, underscores) and whitespace, use:

df[~df.my_column.str.contains(r'[^\w\s]')]

   some_col my_column
0         1      some
1         2      word

Lastly, if you are looking to remove punctuation as a whole, I've written a Q&A here which might be a useful read: Fast punctuation removal with pandas
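
To compare the three filters side by side, here is a small sketch. The sample rows are invented for illustration, and the variable names only_letters, letters_digits and no_punct are hypothetical.

```python
import pandas as pd

# Hypothetical sample data covering letters, digits, whitespace and punctuation.
df = pd.DataFrame({
    'some_col': [1, 2, 3, 4, 5],
    'my_column': ['some', 'word', 'abc123', 'two words', 'mix3d!'],
})

# str.isalpha: every character must be a letter.
only_letters = df[df['my_column'].str.isalpha()]

# str.isalnum: every character must be a letter or a digit.
letters_digits = df[df['my_column'].str.isalnum()]

# Negated str.contains: drop rows with any character that is neither
# a word character (\w) nor whitespace (\s).
no_punct = df[~df['my_column'].str.contains(r'[^\w\s]')]

print(only_letters, letters_digits, no_punct, sep='\n\n')
```

Each filter is slightly more permissive than the previous one, so it is worth choosing based on whether digits and spaces should survive.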

How to remove non-alpha-numeric characters from strings within a dataframe column in Python?

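Use str.replace with regex=True (since pandas 2.0 the default is regex=False, which would treat the pattern as a literal string).

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object

To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need:

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object

Note that \W also leaves underscores in place, because _ counts as a word character.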
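
A runnable sketch of the replacements, assuming a recent pandas version where regex=True has to be passed explicitly. The third sample string and the stricter [^a-zA-Z0-9] pattern are additions for illustration, showing how \W treats underscores.

```python
import pandas as pd

# First two values come from the question; the third is a hypothetical extra
# that contains an underscore and a hyphen.
df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c', 'x_y-z']})

# Keep ASCII letters only.
letters = df['strings'].str.replace(r'[^a-zA-Z]', '', regex=True)

# \W removes non-word characters, but the underscore is a word character.
word_chars = df['strings'].str.replace(r'\W', '', regex=True)

# Strict ASCII alphanumerics: underscores are removed as well.
alnum = df['strings'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)

print(pd.DataFrame({'letters': letters, 'word_chars': word_chars, 'alnum': alnum}))
```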

Choose empty rows only from a dataframe

See below for a table summarizing various methods to identify null/False/empty elements in pandas.

Depending on what other values you can have in "Alpha" you could use:

Keep only empty strings

df2.loc[df2['Alpha'].eq('')]

Keep all "Falsy" values

df2.loc[~df2['Alpha'].astype(bool)]

output:

       Date FN  AuM Alpha
0  01012021  A   10
1  01012021  B   20
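
The difference between the two selections shows up once the column also holds None. This is a sketch on invented data: rows C and D are hypothetical, and the column is forced to object dtype as an assumption so that None is stored as-is.

```python
import pandas as pd

# Hypothetical frame: 'Alpha' holds empty strings, a real value and None.
df2 = pd.DataFrame({
    'Date': ['01012021'] * 4,
    'FN': ['A', 'B', 'C', 'D'],
    'AuM': [10, 20, 30, 40],
    'Alpha': pd.Series(['', '', 'x', None], dtype=object),
})

# Exact match on the empty string: None is not selected.
empty_strings = df2.loc[df2['Alpha'].eq('')]

# Truthiness test: both '' and None convert to False, so both are selected.
falsy = df2.loc[~df2['Alpha'].astype(bool)]

print(empty_strings)
print(falsy)
```

Keep in mind that NaN is truthy in Python, so the astype(bool) approach does not select NaN entries.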

Table of empty/falsy/zero-length values

import pandas as pd

df = pd.DataFrame({'data': [1, 'abc', True, 0, '', None,
                            float('nan'), False, [],
                            {}, set(), tuple([])]
                   })

df['type'] = df['data'].apply(lambda x: type(x).__name__)
df['isna'] = df['data'].isna().map({True: 'X', False: ''})
df['isnull'] = df['data'].isnull().map({True: 'X', False: ''})
df['~bool'] = (~df['data'].astype(bool)).map({True: 'X', False: ''})
df['len'] = df['data'].str.len().convert_dtypes()
df['len==0'] = df['data'].str.len().eq(0).map({True: 'X', False: ''})
print(df)

output:

     data      type isna isnull ~bool   len len==0
0       1       int                    <NA>
1     abc       str                       3
2    True      bool                    <NA>
3       0       int                 X  <NA>
4               str                 X     0      X
5    None  NoneType    X      X     X  <NA>
6     NaN     float    X      X        <NA>
7   False      bool                 X  <NA>
8      []      list                 X     0      X
9      {}      dict                 X     0      X
10     {}       set                 X     0      X
11     ()     tuple                 X     0      X

Remove all alphanumeric words from a string using pandas

You can use

data_df = pd.DataFrame({'Vendor': ['2fvRE-Ku89lkRVJ44QQFN ABACUS LABS, INC', 'abc123 CAT LABS, INC']})
data_df['Vendor'].str.replace(r'^(?:[A-Za-z-]+\d|[\d-]+[A-Za-z])[\w-]*\s*', '', regex=True)
# => 0    ABACUS LABS, INC
#    1       CAT LABS, INC
#    Name: Vendor, dtype: object

See the regex demo.

Regex details

  • ^ - start of string
  • (?:[A-Za-z-]+\d|[\d-]+[A-Za-z]) - either one or more letters/dashes followed by a digit, or one or more digits/dashes followed by a letter
  • [\w-]* - zero or more word or - chars
  • \s* - zero or more whitespace chars.
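
To see which leading tokens the pattern described above removes, here is a short sketch. The first two vendor strings come from the answer; the last two are hypothetical additions that should be left untouched.

```python
import pandas as pd

# Pattern from the answer: a leading token that mixes letters and digits.
pattern = r'^(?:[A-Za-z-]+\d|[\d-]+[A-Za-z])[\w-]*\s*'

vendors = pd.Series([
    '2fvRE-Ku89lkRVJ44QQFN ABACUS LABS, INC',  # starts with digit, then letter
    'abc123 CAT LABS, INC',                    # starts with letters, then digit
    'ABACUS LABS, INC',                        # letters only: no match
    '12345 ACME',                              # digits only: no match
])

# regex=True is required for the pattern to be interpreted as a regular expression.
cleaned = vendors.str.replace(pattern, '', regex=True)
print(cleaned.tolist())
```

Because both alternatives require a letter and a digit in the first token, purely alphabetic or purely numeric leading words are kept.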

Filter pandas dataframe by two columns where one column is a list

You can use apply():

print(x[x.apply(lambda row: row.Letter in row.Alpha, axis=1)])
#        Alpha Letter
# 0     [a, z]      a
# 2          c      c
# 4  [e, q, m]      e
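
A self-contained version of the apply() filter might look like the sketch below. The frame x is reconstructed from the output shown above; the non-matching rows 1 and 3 are hypothetical, since the original input is not shown.

```python
import pandas as pd

# Rows 0, 2 and 4 mirror the output above; rows 1 and 3 are invented non-matches.
x = pd.DataFrame({
    'Alpha': [['a', 'z'], ['b', 'y'], 'c', 'd', ['e', 'q', 'm']],
    'Letter': ['a', 'x', 'c', 'w', 'e'],
})

# axis=1 passes each row to the lambda; `in` tests list membership for lists
# and substring containment for plain strings.
mask = x.apply(lambda row: row['Letter'] in row['Alpha'], axis=1)

matches = x[mask]
print(matches)
```

Since `in` performs a substring test on strings, a cell such as 'cat' would also match the letter 'c'; wrapping scalar cells in a list first avoids that if exact matching is needed.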

Pandas: iterate through rows to see if value is alphanumeric

You can use a list comprehension:

df['col1'] = [''.join([i for i in x if i.isalpha()]) for x in df['col1']]

print(df)

  col1
0   Hi
1   Hi
2   hi
3   Hi

If the column contains NaN or numeric values, convert them to empty strings first, because the comprehension can only iterate over strings:

df.loc[df['col1'].isna() | pd.to_numeric(df['col1'], errors='coerce').notna(), 'col1'] = ''
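
Putting both steps together, a sketch on invented data could look like this. It assumes that missing and numeric entries should both end up as empty strings, and the names is_missing and is_numeric are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing strings, a missing value and a float.
df = pd.DataFrame({'col1': ['Hi!', 'H1i', np.nan, 3.5, 'hi?']}, dtype=object)

# Step 1: blank out anything that is missing or parses as a number,
# so that every remaining cell is a string.
is_missing = df['col1'].isna()
is_numeric = pd.to_numeric(df['col1'], errors='coerce').notna()
df.loc[is_missing | is_numeric, 'col1'] = ''

# Step 2: keep only the alphabetic characters of each string.
df['col1'] = [''.join(ch for ch in value if ch.isalpha()) for value in df['col1']]

print(df)
```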

