TypeError: can't use a string pattern on a bytes-like object in re.findall()
You want to convert html (a bytes-like object) into a string using .decode, e.g. html = response.read().decode('utf-8').
See Convert bytes to a Python String
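As a minimal sketch of that fix, the snippet below uses a hard-coded bytes literal in place of what response.read() would return. The page content and the variable name raw_html are made up for illustration, and the page is assumed to be UTF-8 encoded.

```python
import re

# Stand-in for the bytes that response.read() would return (hypothetical page).
raw_html = b"<html><head><title>Example</title></head></html>"

# Decode once, then search with an ordinary str pattern.
html = raw_html.decode("utf-8")
titles = re.findall(r"<title>(.*?)</title>", html)
print(titles)
```

If the page uses a different encoding, the name passed to .decode would need to change accordingly.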
Error: cannot use a string pattern on a bytes-like object
Python 3 distinguishes "bytes" and "string" types; this is especially important for Unicode strings, where each character may be more than one byte, depending on the character and the encoding.
Regular expressions can work on either, but it has to be consistent — searching for bytes within bytes, or strings within strings.
Depending on what you need, there are two solutions:
Decode the output variable before searching in it; for instance, with: output_text = output.decode('utf-8')
This depends on the encoding that you are using; UTF-8 is the most common these days.
The matched group will be a string.
Search with bytes by adding a b prefix to the regular expression. A regular expression should also use the r prefix, so it becomes: re.search(br"(Profile\s*:\s)(.*)", output)
The matched group will be a bytes object.
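The two options can be compared side by side. This is only a sketch: the content of output is invented here (the original question's data is not shown), and UTF-8 is assumed for the decode step.

```python
import re

# Hypothetical command output, as bytes.
output = b"Profile : default\n"

# Option 1: decode first, then search with a str pattern.
output_text = output.decode("utf-8")
text_match = re.search(r"(Profile\s*:\s)(.*)", output_text)

# Option 2: keep the bytes and search with a bytes pattern.
bytes_match = re.search(br"(Profile\s*:\s)(.*)", output)

print(text_match.group(2))   # a str
print(bytes_match.group(2))  # a bytes object
```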
Can't use a string pattern on a bytes-like object - python's re error
The problem is that you're mixing bytes and text strings. You should either decode your data into a text string (unicode), e.g. data.decode('utf-8'), or use a bytes object for the pattern, e.g. re.findall(b"[A-Za-z]") (note the leading b before the string literal).
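The mismatch fails in both directions, which the following sketch tries to show. The sample data is invented for illustration; the error messages are captured rather than printed so the script runs to the end.

```python
import re

data = b"abc 123"  # hypothetical sample data

# A str pattern on bytes raises TypeError.
try:
    re.findall("[A-Za-z]", data)
except TypeError as exc:
    str_on_bytes_error = str(exc)

# A bytes pattern on a str raises TypeError as well.
try:
    re.findall(b"[A-Za-z]", data.decode("utf-8"))
except TypeError as exc:
    bytes_on_str_error = str(exc)

# With matching types the search works, and the results have the input's type.
letters_bytes = re.findall(b"[A-Za-z]", data)
letters_text = re.findall("[A-Za-z]", data.decode("utf-8"))
```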
re.search().TypeError: cannot use a string pattern on a bytes-like object
re needs bytes patterns (not string patterns) to search bytes-like objects. Prepend a b to your search pattern like so: b'<title>(.*?)</title>'
cannot use a bytes pattern on a string-like object with agent.request
On Python 3 sys.argv is a list of str. However, Agent.request accepts a value of type bytes as its 2nd argument. Since sys.argv[1] is a value of type str, something goes wrong somewhere in the implementation and you get this obscure exception.
If you encode sys.argv[1] to bytes (e.g. sys.argv[1].encode("ascii")) and pass the result to agent.request then you'll get past this error.
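A small sketch of the encoding step on its own, without calling Agent.request. The argv list below is a stand-in for sys.argv, and the URL in it is made up for illustration.

```python
# Stand-in for sys.argv; on Python 3 every entry is a str.
argv = ["script.py", "http://example.com/"]

# Encode the argument to bytes before handing it to an API that expects bytes.
url = argv[1].encode("ascii")
print(type(url).__name__, url)
```

Note that .encode("ascii") raises UnicodeEncodeError if the argument contains non-ASCII characters.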
TypeError: cannot use a string pattern on a bytes-like object using re.findall()
response.text will give you a str, not bytes, but response.content will give you bytes.
Choose the type you want to use and use it consistently.
re will handle bytes if the regular expression is bytes as well.
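One way to stay consistent is to normalize the input to a single type at the boundary and use str patterns everywhere after that. The helper to_text below is hypothetical (it is not part of re or any library), and UTF-8 is an assumed default.

```python
import re

def to_text(data, encoding="utf-8"):
    """Return data as str, decoding it first if it is bytes or bytearray."""
    if isinstance(data, (bytes, bytearray)):
        return data.decode(encoding)
    return data

# The same str pattern now works whichever type arrives.
numbers_from_bytes = re.findall(r"\d+", to_text(b"a1 b22"))
numbers_from_text = re.findall(r"\d+", to_text("a1 b22"))
```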
cannot use a string pattern on a bytes-like object (Python)
return re.findall('(?:href=")(.*?)"', response.content)
response.content in this case is of type bytes. So either use response.text, which gives you plain text that you can process the way you are doing now, or, in case you want to continue down the binary road, see: Regular expression parsing a binary file?
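Both routes can be sketched without a network call. Here a SimpleNamespace stands in for the response object, exposing only the .content (bytes) and .text (str) attributes discussed above; the page markup is invented, and the pattern is a simplified form of the one in the question.

```python
import re
from types import SimpleNamespace

# Stand-in for a response: .content is bytes, .text is str (hypothetical page).
page = b'<a href="/one">1</a> <a href="/two">2</a>'
response = SimpleNamespace(content=page, text=page.decode("utf-8"))

# Text route: str pattern on response.text.
links_text = re.findall(r'href="(.*?)"', response.text)

# Binary route: bytes pattern on response.content.
links_bytes = re.findall(rb'href="(.*?)"', response.content)
```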