re.sub Erroring with "Expected String or Bytes-Like Object"

re.sub erroring with Expected string or bytes-like object

As you stated in the comments, some of the values appear to be floats, not strings. You need to convert them to strings before passing them to re.sub. The simplest way is to replace location with str(location) in the re.sub call. It does no harm to do this even when the value is already a str.

letters_only = re.sub("[^a-zA-Z]",  # Search for all non-letters
" ", # Replace all non-letters with spaces
str(location))
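As a minimal sketch of the failure and the fix, assume a hypothetical `locations` list in which a missing value shows up as a float NaN (the sample values and the helper name `letters_only` are made up for illustration):

```python
import re

# Hypothetical sample: a missing value appears as a float NaN.
locations = ["New York, NY", float("nan"), "Paris 75001"]

def letters_only(value):
    # str() makes the call safe for floats and leaves real strings unchanged.
    return re.sub("[^a-zA-Z]", " ", str(value))

# Passing the float directly reproduces the TypeError.
try:
    re.sub("[^a-zA-Z]", " ", locations[1])
    raised = False
except TypeError:
    raised = True

cleaned = [letters_only(loc) for loc in locations]
print(raised, cleaned)
```

Keep in mind that str(float("nan")) is the text "nan", so missing values end up as the literal word "nan" unless you filter them out first.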

TypeError expected string or bytes-like object - Pycharm

The string argument of re.search must be a str or bytes object. Your "sponsor" column contains a NaN, which is interpreted as a float, so in that iteration the value passed to re.search is neither a string nor bytes. This is why you get the TypeError.

Run this code to see the types involved:

for item in list(set(covid_approval_polls["sponsor"])):
    print(item, type(item))

To solve this, you can either skip the re.search call when pd.isna() is true for the value, or replace the NaNs in the DataFrame with the empty string "".
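The skip-the-missing-value idea could look roughly like this. It is a sketch in plain Python, where float("nan") stands in for a pandas NaN and the sponsor values, the pattern, and the helper name `mentions_news` are all hypothetical:

```python
import re

# Hypothetical sponsor values; float("nan") stands in for a pandas NaN.
sponsors = ["CNN", float("nan"), "Fox News"]

def mentions_news(value):
    # NaN is a float, so anything that is not a str is skipped
    # instead of being handed to re.search.
    if not isinstance(value, str):
        return False
    return re.search(r"News", value) is not None

flags = [mentions_news(s) for s in sponsors]
print(flags)
```

With a real DataFrame, the same guard can be written with pd.isna(), or the column can be cleaned up front with fillna("").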

TypeError: expected string or bytes-like object Regular expression removing special characters

You are trying to apply a regular expression to a list object.

If your goal is to use this regex on every item of the list, apply re.sub to each item in the list:

import re
def replace_func(item):
    return re.sub(r'\W+', '', item)

train['content'] = train['content'].map(lambda x: [replace_func(item) for item in x])
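To see the per-item approach without a DataFrame, here is a small sketch on a single hypothetical cell that holds a list of strings (the sample values are invented):

```python
import re

def replace_func(item):
    # Remove every run of non-word characters from one string.
    return re.sub(r"\W+", "", item)

# Hypothetical cell value: a list of strings rather than one string.
cell = ["hello, world!", "foo-bar", "a_b c"]

# Passing the whole list reproduces the TypeError.
try:
    re.sub(r"\W+", "", cell)
    failed = False
except TypeError:
    failed = True

# Applying the function item by item works.
cleaned = [replace_func(item) for item in cell]
print(failed, cleaned)
```

Note that \W does not match the underscore, so "a_b c" keeps its underscore.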

TypeError: expected string or bytes-like object while trying to replace consecutive white spaces with a single space in all entries of a DataFrame

You need DataFrame.applymap for element-wise processing, because both functions work with scalars (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

df = df.applymap(lambda s: re.sub(r'\s+', ' ', s))
print(df)
col1 col2
0 a--b e f
1 c d g---h


df = df.applymap(lambda s: ' '.join(s.split()))
print(df)
col1 col2
0 a--b e f
1 c d g---h
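The two scalar functions are not quite interchangeable at the edges of a string. A small sketch on a hypothetical input with leading and trailing whitespace shows the difference:

```python
import re

# Hypothetical input with leading, trailing, and mixed whitespace.
s = "  a--b   e \t f  "

# Collapses each whitespace run to one space, including the runs at both ends.
via_regex = re.sub(r"\s+", " ", s)

# split() with no arguments drops leading and trailing whitespace entirely.
via_split = " ".join(s.split())

print(repr(via_regex), repr(via_split))
```

If the surrounding space matters, pick the variant accordingly, or call .strip() on the regex result.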

DataFrame.transform passes each column to the function as a Series rather than as individual scalars, which is why it failed.

You can rewrite the second solution with Series.str.split and Series.str.join so that it processes whole columns (Series):

def f(x):
    # x is a whole column (Series); uncomment to inspect it
    # print(x)
    return x.str.split().str.join(' ')

df = df.transform(f)
print (df)

col1 col2
0 a--b e f
1 c d g---h

Regular Expressions expected string or bytes like object

f.read is a method, so you need to call it: f.read()

Note that .read() reads the entire file at once; you may want to iterate over it line by line instead.
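A short sketch of all three cases, using a throwaway temporary file with made-up contents and a hypothetical `id=(\d+)` pattern:

```python
import os
import re
import tempfile

# Write a small hypothetical file to read back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("id=1\nname=x\nid=22\n")
    path = tmp.name

# Passing the bound method f.read (no parentheses) raises the TypeError.
with open(path) as f:
    try:
        re.findall(r"id=(\d+)", f.read)
        raised = False
    except TypeError:
        raised = True

# Calling the method returns a string, which the regex accepts.
with open(path) as f:
    whole = re.findall(r"id=(\d+)", f.read())

# Iterating line by line avoids loading the whole file at once.
ids = []
with open(path) as f:
    for line in f:
        ids.extend(re.findall(r"id=(\d+)", line))

os.remove(path)
print(raised, whole, ids)
```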

Getting the error: expected string or bytes-like object when using re split method

Dirty workaround

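Read the binary content and convert it to a string. Note that str() on a bytes object yields its b'...' representation rather than decoded text, which is why this is only a workaround: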

raw_hypno_single = [x for x in str(f.read()).split('Sleep stage',1)][1:]

Then split on the sleep stage and movement time markers:

raw_hypno = re.split(r"Sleep stage|Movement time", raw_hypno_single[0])

The full code is:

import re

file = 'edfx\\SC4002EC-Hypnogram.edf'  # Please change according to the location of your file
with open(file, mode='rb') as f:
    # str() turns the bytes into their b'...' text representation
    raw_hypno_single = [x for x in str(f.read()).split('Sleep stage', 1)][1:]
raw_hypno = re.split(r"Sleep stage|Movement time", raw_hypno_single[0])
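A possibly cleaner alternative to str(f.read()) is to stay in bytes and use a bytes pattern, or to decode before splitting. The sketch below uses a made-up bytes value rather than a real EDF file, and the latin-1 encoding is an assumption, not something the file format is known to require:

```python
import re

# Hypothetical stand-in for bytes read from a file opened with mode="rb".
raw = b"header Sleep stage W\x14 Sleep stage 1\x14 Movement time\x14 end"

# Option 1: keep everything as bytes and use a bytes pattern.
parts_bytes = re.split(rb"Sleep stage|Movement time", raw)

# Option 2: decode first, then use an ordinary str pattern.
# latin-1 maps every byte to a character, so decoding cannot fail;
# whether it is the right encoding for a given file is an assumption.
text = raw.decode("latin-1")
parts_text = re.split(r"Sleep stage|Movement time", text)

# Mixing a str pattern with bytes data also raises a TypeError.
try:
    re.split(r"Sleep stage", raw)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(parts_bytes, parts_text, mixed_ok)
```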

TypeError: expected string or bytes-like object for list in list

You're getting the TypeError on this line:

----> 4         if len(re.findall("gmail|hotmail|yahoo|msn", row['email'])) > 0:

This happens because row['email'] is a list, not a string, and re.findall expects a string.

Now, it seems your particular problem can be solved without even iterating over dataframe rows. Try:

emails = emaildf['email'].explode()
emails = pd.Series(np.where(emails.str.contains("gmail|hotmail|yahoo|msn").replace(np.nan, False), emails, np.nan), index=emails.index)
emails = emails.groupby(emails.index).apply(lambda x: [y for y in x if pd.notna(y)]).apply(lambda x: x if len(x)>1 else (x[0] if len(x)==1 else np.nan))
df['Private_Email'] = np.where(pd.notna(emails), emails, 'No Gmail/Hotmail/MSN/Yahoo domains found.')
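For comparison, the row-by-row check that raised the error can also be made to work by testing each address inside the list. This is a plain-Python sketch with invented rows and a hypothetical helper `private_emails`, not a replacement for the vectorized version:

```python
import re

PATTERN = re.compile(r"gmail|hotmail|yahoo|msn")

# Hypothetical rows: each 'email' cell is a list, possibly empty.
rows = [
    ["a@gmail.com", "a@corp.example"],
    ["b@corp.example"],
    [],
]

def private_emails(addresses):
    # Apply the regex to each string in the list, never to the list itself.
    return [addr for addr in addresses if PATTERN.search(addr)]

result = [private_emails(r) for r in rows]
print(result)
```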

