Filter all rows that do not contain letters (alpha) in pandas
I think you'd need str.contains to filter values that contain letters by means of boolean indexing:
df = df[df['Col A.'].str.contains('[A-Za-z]')]
print (df)
Col A.
2 dog
3 cat 583
4 rabbit 444
If there are NaN values, you can pass the na parameter:
df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print (df)
Col A.
3 dog
4 cat 583
5 rabbit 444
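A self-contained sketch of the same filter. The sample data in 'Col A.' is hypothetical, chosen only to include values without letters and a NaN; na=False makes the NaN row count as "no match", so the mask is purely boolean.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: some values contain letters, some do not, one is NaN
df = pd.DataFrame({'Col A.': ['123', '45 67', 'dog', 'cat 583', 'rabbit 444', np.nan]})

# True where at least one ASCII letter occurs; NaN is treated as False
mask = df['Col A.'].str.contains('[A-Za-z]', na=False)
filtered = df[mask]
print(filtered)
```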
Filtering out rows with non-alphanumeric characters
You're looking for str.isalpha:
df[df.my_column.str.isalpha()]
some_col my_column
0 1 some
1 2 word
A similar method is str.isalnum, if you want to retain letters and digits.
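A small sketch comparing str.isalpha and str.isalnum on the same column; the DataFrame contents are hypothetical. Both methods return False for strings containing spaces or punctuation.

```python
import pandas as pd

# Hypothetical column mixing letters, digits, spaces and punctuation
df = pd.DataFrame({'some_col': [1, 2, 3, 4, 5],
                   'my_column': ['some', 'word', 'abc123', 'two words', 'hey!']})

# str.isalpha: every character is a letter (and the string is non-empty)
letters_only = df[df['my_column'].str.isalpha()]

# str.isalnum: every character is a letter or a digit
letters_and_digits = df[df['my_column'].str.isalnum()]

print(letters_only)
print(letters_and_digits)
```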
If you also want to allow whitespace characters, use the following (note that \w matches letters, digits and underscores, so those pass as well):
df[~df.my_column.str.contains(r'[^\w\s]')]
some_col my_column
0 1 some
1 2 word
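Because \w also matches digits and underscores, the contains(r'[^\w\s]') filter keeps strings such as 'abc 123'. If only letters and whitespace should survive, one alternative (not part of the original answer) is str.fullmatch, available since pandas 1.1. The sample values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'my_column': ['some', 'two words', 'abc 123', 'snake_case', 'hey!']})

# Rows with no character outside \w (letters, digits, underscore) and \s
word_or_space = df[~df['my_column'].str.contains(r'[^\w\s]')]

# Stricter: the whole string must consist of ASCII letters and whitespace
letters_or_space = df[df['my_column'].str.fullmatch(r'[A-Za-z\s]+')]

print(word_or_space)
print(letters_or_space)
```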
Lastly, if you are looking to remove punctuation as a whole, I've written a Q&A here which might be a useful read: Fast punctuation removal with pandas
How to remove non-alpha-numeric characters from strings within a dataframe column in Python?
Use str.replace.
df
strings
0 a#bc1!
1 a(b$c
df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0 abc
1 abc
Name: strings, dtype: object
To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need:
df.strings.str.replace(r'\W', '', regex=True)
0 abc1
1 abc
Name: strings, dtype: object
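Since pandas 2.0, Series.str.replace treats the pattern as a literal string unless regex=True is passed (earlier versions defaulted to regex=True). A self-contained version of both replacements, rebuilding the two-row frame from the answer:

```python
import pandas as pd

df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

# Remove every character that is not an ASCII letter
letters = df['strings'].str.replace(r'[^a-zA-Z]', '', regex=True)

# Remove every non-word character; \w keeps letters, digits and underscores
alnum = df['strings'].str.replace(r'\W', '', regex=True)

print(letters.tolist())  # ['abc', 'abc']
print(alnum.tolist())    # ['abc1', 'abc']
```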
Choose empty rows only from a dataframe
See below for a table summarizing various methods to identify null/False/empty elements in pandas.
Depending on what other values you can have in "Alpha", you could use:
Keep only empty strings
df2.loc[df2['Alpha'].eq('')]
Keep all "Falsy" values
df2.loc[~df2['Alpha'].astype(bool)]
output:
Date FN AuM Alpha
0 01012021 A 10
1 01012021 B 20
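A minimal sketch contrasting the two filters. The df2 below is hypothetical, shaped like the question's frame; Alpha is kept as object dtype so that astype(bool) applies Python truthiness to each element (None is falsy, so only the second filter keeps that row).

```python
import pandas as pd

# Hypothetical frame shaped like the question's df2
df2 = pd.DataFrame({'Date': ['01012021'] * 4,
                    'FN': ['A', 'B', 'C', 'D'],
                    'AuM': [10, 20, 30, 40],
                    'Alpha': pd.Series(['', '', 'x', None], dtype=object)})

# Only rows whose Alpha is exactly the empty string
empty_only = df2.loc[df2['Alpha'].eq('')]

# Rows whose Alpha is falsy: '', None, 0, False, empty containers, ...
falsy = df2.loc[~df2['Alpha'].astype(bool)]

print(empty_only)
print(falsy)
```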
table of empty/falsy/zero-length values
import pandas as pd

df = pd.DataFrame({'data': [1, 'abc', True, 0, '', None,
                            float('nan'), False, [],
                            {}, set(), tuple([])]
                   })
df['type'] = df['data'].apply(lambda x: type(x).__name__)
df['isna'] = df['data'].isna().map({True: 'X', False: ''})
df['isnull'] = df['data'].isnull().map({True: 'X', False: ''})
df['~bool'] = (~df['data'].astype(bool)).map({True: 'X', False: ''})
df['len'] = df['data'].str.len().convert_dtypes()
df['len==0'] = df['data'].str.len().eq(0).map({True: 'X', False: ''})
print(df)
output:
    data     type isna isnull ~bool  len len==0
 0     1      int                   <NA>
 1   abc      str                      3
 2  True     bool                   <NA>
 3     0      int                 X <NA>
 4            str                 X    0      X
 5  None NoneType    X      X     X <NA>
 6   NaN    float    X      X       <NA>
 7 False     bool                 X <NA>
 8    []     list                 X    0      X
 9    {}     dict                 X    0      X
10    {}      set                 X    0      X
11    ()    tuple                 X    0      X
Remove all alphanumeric words from a string using pandas
You can use
import pandas as pd

data_df = pd.DataFrame({'Vendor': ['2fvRE-Ku89lkRVJ44QQFN ABACUS LABS, INC', 'abc123 CAT LABS, INC']})
data_df['Vendor'].str.replace(r'^(?:[A-Za-z-]+\d|[\d-]+[A-Za-z])[\w-]*\s*', '', regex=True)
# => 0 ABACUS LABS, INC
# 1 CAT LABS, INC
# Name: Vendor, dtype: object
Regex details
^ - start of string
(?:[A-Za-z-]+\d|[\d-]+[A-Za-z]) - either one or more letters/dashes followed by a digit, or one or more digits/dashes followed by a letter
[\w-]* - zero or more word or - chars
\s* - zero or more whitespace chars.
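To try the pattern outside pandas, the same regex can be exercised with Python's re module. The first two strings come from the answer; 'ACME LABS, INC' is a hypothetical vendor with no mixed letter/digit prefix, which the pattern should leave untouched.

```python
import re

# Same pattern as above: a leading token mixing letters/dashes and digits,
# plus any whitespace that follows it
PATTERN = re.compile(r'^(?:[A-Za-z-]+\d|[\d-]+[A-Za-z])[\w-]*\s*')

samples = ['2fvRE-Ku89lkRVJ44QQFN ABACUS LABS, INC',
           'abc123 CAT LABS, INC',
           'ACME LABS, INC']  # hypothetical: no code-like prefix

# ^ anchors at the start, so at most one leading token is removed per string
cleaned = [PATTERN.sub('', s) for s in samples]
print(cleaned)
```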
filter pandas dataframe by two columns where one column is a list
You can use apply():
print(x[x.apply(lambda row: row.Letter in row.Alpha, axis=1)])
# Alpha Letter
#0 [a, z] a
#2 c c
#4 [e, q, m] e
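A self-contained sketch of the row-wise check. The frame x is hypothetical, built so that it reproduces the output above; the `in` operator works both for list cells (membership) and string cells (substring test).

```python
import pandas as pd

# Hypothetical data: Alpha holds either a list or a single string
x = pd.DataFrame({'Alpha': [['a', 'z'], ['b'], 'c', ['d', 'x'], ['e', 'q', 'm']],
                  'Letter': ['a', 'c', 'c', 'y', 'e']})

# axis=1 passes each row as a Series, so row.Letter / row.Alpha are cell values
mask = x.apply(lambda row: row.Letter in row.Alpha, axis=1)
print(x[mask])
```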
pandas iterate through rows to see if value is alphanumeric
You can use a list comprehension:
df['col1'] = [''.join([i for i in x if i.isalpha()]) for x in df['col1']]
print(df)
col1
0 Hi
1 Hi
2 hi
3 Hi
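If you have NaN or float values, remove them first by converting them to empty strings. pd.to_numeric leaves NaN as NaN, so missing values are matched separately with isna():
df.loc[pd.to_numeric(df['col1'], errors='coerce').notnull() | df['col1'].isna(), 'col1'] = ''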
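A self-contained sketch combining both steps, with a hypothetical col1 that includes a float and a NaN. Note that pd.to_numeric(..., errors='coerce') returns NaN for a NaN input, so notnull() alone would not catch missing values; the mask below adds isna() for that reason.

```python
import numpy as np
import pandas as pd

# Hypothetical column with text, a NaN and a float
df = pd.DataFrame({'col1': ['Hi!', 'H1i', 'hi 2', np.nan, 3.5]}, dtype=object)

# Blank out values that parse as numbers, plus missing values
not_text = pd.to_numeric(df['col1'], errors='coerce').notnull() | df['col1'].isna()
df.loc[not_text, 'col1'] = ''

# Keep only alphabetic characters in each string
df['col1'] = [''.join(ch for ch in s if ch.isalpha()) for s in df['col1']]
print(df)
```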