re.sub erroring with Expected string or bytes-like object
As you stated in the comments, some of the values appear to be floats, not strings. You need to convert them to strings before passing them to re.sub. The simplest way is to change location to str(location) when calling re.sub. It doesn't hurt to do this even if the value is already a str.
letters_only = re.sub("[^a-zA-Z]", # Search for all non-letters
" ", # Replace all non-letters with spaces
str(location))
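One caveat worth checking: str() turns a missing value such as NaN into the literal text "nan", which then passes through the regex as ordinary letters. A minimal sketch, assuming a float NaN should become an empty string instead (the helper name letters_only_from is made up for illustration):

```python
import math
import re

def letters_only_from(value):
    """Hypothetical helper: coerce value to str, then replace non-letters with spaces."""
    # Assumption: a float NaN stands for a missing location, so map it to "".
    if isinstance(value, float) and math.isnan(value):
        return ""
    return re.sub("[^a-zA-Z]", " ", str(value))

print(repr(letters_only_from("New-York, NY")))
print(repr(letters_only_from(42.5)))
print(repr(letters_only_from(float("nan"))))
```

Whether a missing value should map to "" or be dropped altogether depends on what happens to letters_only afterwards.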
TypeError expected string or bytes-like object - Pycharm
The arguments to re.search must be a string or a bytes-like object. Your "sponsor" column contains a NaN, which is interpreted as a float, so in that iteration approval is neither a string nor bytes. This is why you get that TypeError.
Write this code to see that in action:
for item in list(set(covid_approval_polls["sponsor"])):
    print(item, type(item))
To solve this, you can either skip the re.search call when pd.isna() is true for the value, or replace the NaNs in the DataFrame with an empty string "".
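Both options can be sketched without pandas. This is a plain-Python illustration, not the asker's code: the sponsors list and the pattern "News" are invented sample data, and the isinstance check stands in for pd.isna() on a DataFrame column.

```python
import re

# Invented sample values; float("nan") mimics the NaN stored for a missing cell.
sponsors = ["ABC News", float("nan"), "Fox News", "Ipsos"]

# Option 1: skip anything that is not a string before calling re.search.
matches_skip = [
    s for s in sponsors
    if isinstance(s, str) and re.search("News", s)
]

# Option 2: replace missing values with "" first, then search every item.
filled = [s if isinstance(s, str) else "" for s in sponsors]
matches_filled = [s for s in filled if re.search("News", s)]

print(matches_skip)
print(matches_filled)
```

Either way, re.search only ever receives strings, so the TypeError cannot occur.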
TypeError: expected string or bytes-like object Regular expression removing special characters
You are trying to apply a regular expression to a list object.
If your goal is to use this regex on every item of the list, call re.sub on each item instead:
import re
def replace_func(item):
    return re.sub(r'\W+', '', item)
train['content'] = train['content'].map(lambda x: [replace_func(item) for item in x])
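The difference between passing the list and passing its items can be seen on a plain list. A small sketch with invented sample strings:

```python
import re

def replace_func(item):
    # Remove every run of non-word characters from one string.
    return re.sub(r'\W+', '', item)

row = ["hello, world!", "foo-bar"]  # invented sample cell holding a list

error_message = None
try:
    replace_func(row)  # passing the whole list raises TypeError
except TypeError as exc:
    error_message = str(exc)

# Passing each string separately works.
cleaned = [replace_func(item) for item in row]
print(error_message)
print(cleaned)
```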
TypeError: expected string or bytes-like object while trying to replace consecutive white spaces with a single space in all entries of a DataFrame
You need DataFrame.applymap for element-wise processing, because both functions work with scalars (in pandas 2.1 and later the same method is available as DataFrame.map):
df = df.applymap(lambda s: re.sub(r'\s+', ' ', s))
print(df)
   col1   col2
0  a--b    e f
1   c d  g---h
df = df.applymap(lambda s: ' '.join(s.split()))
print(df)
   col1   col2
0  a--b    e f
1   c d  g---h
DataFrame.transform passes each column to the function as a Series, not as individual scalars, so it failed.
You can rewrite the second solution with Series.str.split and Series.str.join, which operate on whole columns (Series):
def f(x):
    # test - processing column
    # print(x)
    return x.str.split().str.join(' ')

df = df.transform(f)
print(df)
   col1   col2
0  a--b    e f
1   c d  g---h
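The two scalar functions used with applymap above agree on interior whitespace but not at the ends of a string. A short sketch on an invented sample string (the function names are only for illustration):

```python
import re

def collapse_with_regex(s):
    # Replace each run of whitespace with one space; whitespace at the
    # start or end of the string is reduced to a single space, not removed.
    return re.sub(r'\s+', ' ', s)

def collapse_with_split(s):
    # split() with no arguments discards leading and trailing whitespace.
    return ' '.join(s.split())

sample = "  a--b   c\t d  "
print(repr(collapse_with_regex(sample)))
print(repr(collapse_with_split(sample)))
```

If the data may carry leading or trailing whitespace, the split/join form also trims it, which may or may not be what you want.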
Regular Expressions expected string or bytes like object
f.read is a method, so you need to call it: f.read(). Note that .read() reads the entire file; you may want to iterate over the file line by line instead.
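A minimal sketch of the line-by-line variant. The file contents and the pattern id=(\d+) are invented for illustration; the temporary file only exists to keep the example self-contained.

```python
import os
import re
import tempfile

# Write a small throwaway file so the example can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("id=12\nname=x\nid=7\n")
    path = tmp.name

ids = []
with open(path) as f:
    # f.read (no parentheses) is a bound method, which the re functions reject;
    # f.read() would return the whole file as one string.
    for line in f:  # iterate line by line instead of loading everything
        ids.extend(re.findall(r"id=(\d+)", line))

os.remove(path)
print(ids)
```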
Getting the error: expected string or bytes-like object when using re split method
A dirty workaround: read the binary content and convert it to a string:
raw_hypno_single = [x for x in str(f.read()).split('Sleep stage',1)][1:]
Then split on the sleep stage and movement markers, as suggested in OP1:
raw_hypno =re.split(r"Sleep stage|Movement time", raw_hypno_single[0])
The full code is:
import re

file = 'edfx\\SC4002EC-Hypnogram.edf'  # Please change according to the location of your file
with open(file, mode='rb') as f:
    raw_hypno_single = [x for x in str(f.read()).split('Sleep stage', 1)][1:]
raw_hypno = re.split(r"Sleep stage|Movement time", raw_hypno_single[0])
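Note that str() on a bytes object yields its repr, including the b'...' wrapper and escape sequences. An alternative sketch, assuming it is acceptable to decode the file as latin-1 (an assumption, not something the question confirms); the sample bytes are invented and do not come from a real EDF file:

```python
import re

# Invented stand-in for the bytes returned by f.read() in 'rb' mode.
raw = b"header\x14Sleep stage W\x14\x00Sleep stage 1\x14\x00Movement time\x14"

# str(raw) would produce the repr text starting with b'...; decode instead.
text = raw.decode("latin-1")  # latin-1 maps every byte to exactly one character

# Keep everything after the first 'Sleep stage', then split on either marker.
after_first = text.split("Sleep stage", 1)[1]
parts = re.split(r"Sleep stage|Movement time", after_first)
print(parts)
```

Another option is to keep the data as bytes and use a bytes pattern, for example re.split(rb"Sleep stage|Movement time", raw), since re accepts bytes as long as the pattern and the input are both bytes.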
TypeError: expected string or bytes-like object for list in list
You're getting TypeError:
----> 4 if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:
here because row['email'] is a list, not a string, so you can't apply re.findall, which expects a string.
Now, it seems your particular problem can be solved without even iterating over dataframe rows. Try:
emails = emaildf['email'].explode()
emails = pd.Series(np.where(emails.str.contains("gmail|hotmail|yahoo|msn").replace(np.nan, False), emails, np.nan), index=emails.index)
emails = emails.groupby(emails.index).apply(lambda x: [y for y in x if pd.notna(y)]).apply(lambda x: x if len(x)>1 else (x[0] if len(x)==1 else np.nan))
df['Private_Email'] = np.where(pd.notna(emails), emails, 'No Gmail/Hotmail/MSN/Yahoo domains found.')
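For comparison, the same filtering on a single cell can be sketched in plain Python by testing each address in the list rather than the list itself. The sample addresses and the helper name private_emails are invented for illustration:

```python
import re

PRIVATE = re.compile(r"gmail|hotmail|yahoo|msn")

def private_emails(cell):
    """Hypothetical helper: return the addresses in cell that match a private domain."""
    # Accept either one string or a list of strings.
    addresses = [cell] if isinstance(cell, str) else list(cell)
    # Apply the regex to each string, never to the list as a whole.
    return [a for a in addresses if isinstance(a, str) and PRIVATE.search(a)]

print(private_emails(["a@gmail.com", "b@corp.example"]))
print(private_emails("c@corp.example"))
```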