TypeError: Can't Use a String Pattern on a Bytes-Like Object in re.findall()

TypeError: can't use a string pattern on a bytes-like object in re.findall()

You want to convert html (a bytes-like object) into a string using .decode(), e.g. html = response.read().decode('utf-8').

See Convert bytes to a Python String
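As a minimal sketch of that fix: the bytes literal below is a made-up stand-in for what response.read() returns, so the example runs without a network connection. The pattern and the page content are assumptions for illustration only.

```python
import re

# Hypothetical stand-in for response.read(); a real HTTP response body is bytes like this.
raw_html = b"<html><head><title>Example page</title></head></html>"

# Decode once, right after reading, so everything downstream works with str.
html = raw_html.decode("utf-8")

# A str pattern now matches a str subject, so no TypeError is raised.
titles = re.findall(r"<title>(.*?)</title>", html)
print(titles)
```

If the page is not UTF-8, the decode call needs the page's actual encoding instead.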

Error: cannot use a string pattern on a bytes-like object

Python 3 distinguishes "bytes" and "string" types; this is especially important for Unicode strings, where each character may be more than one byte, depending on the character and the encoding.

Regular expressions can work on either, but the pattern and the data being searched must be the same type: bytes within bytes, or strings within strings.
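A small sketch may make both points concrete. The sample text is an arbitrary choice; it first compares character count with byte count, then triggers the error in each direction by mixing the two types.

```python
import re

text = "caf\u00e9"              # 4 characters, the last one is non-ASCII
data = text.encode("utf-8")     # the same text as UTF-8 bytes

char_count = len(text)          # counts characters
byte_count = len(data)          # counts bytes; the accented letter takes two

# Mixing types fails in both directions with a TypeError.
errors = []
for pattern, subject in ((r"\w+", data), (rb"\w+", text)):
    try:
        re.findall(pattern, subject)
    except TypeError as exc:
        errors.append(str(exc))

print(char_count, byte_count)
for message in errors:
    print(message)
```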

Depending on what you need, there are two solutions:

  • Decode the output variable before searching in it; for instance, with: output_text = output.decode('utf-8')

    This depends on the encoding that you are using; UTF-8 is the most common these days.

    The matched group will be a string.

  • Search with bytes by adding a b prefix to the regular expression. A regular expression should also use the r prefix, so it becomes: re.search(br"(Profile\s*:\s)(.*)", output)

    The matched group will be a bytes object.
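The two options can be sketched side by side. The content of output here is invented for illustration; in the original question it would come from a subprocess or similar source that returns bytes.

```python
import re

# Hypothetical command output, as bytes.
output = b"Profile : default\nRegion : eu-west\n"

# Option 1: decode first, then search with a str pattern.
output_text = output.decode("utf-8")
text_match = re.search(r"(Profile\s*:\s)(.*)", output_text)
text_value = text_match.group(2)    # a str

# Option 2: keep the bytes and search with a bytes pattern (note the br prefix).
bytes_match = re.search(br"(Profile\s*:\s)(.*)", output)
bytes_value = bytes_match.group(2)  # a bytes object

print(repr(text_value), repr(bytes_value))
```

Decoding is usually the more convenient choice when the result is going to be printed or compared with ordinary strings; staying in bytes avoids having to know the encoding.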

Can't use a string pattern on a bytes-like object - python's re error

The problem is that you're mixing bytes and text strings. You should either decode your data into a text string (unicode), e.g. data.decode('utf-8'), or use a bytes object for the pattern, e.g. re.findall(b"[A-Za-z]", data) (note the leading b before the string literal).

re.search().TypeError: cannot use a string pattern on a bytes-like object

re needs bytes patterns (not string patterns) to search bytes-like objects. Prefix your search pattern with a b, like so: b'<title>(.*?)</title>'
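A short sketch of the bytes-pattern route, assuming a made-up page body. Because the match comes back as bytes, it is decoded afterwards if text is needed.

```python
import re

# Hypothetical page body as bytes; \xc3\xa9 is the UTF-8 encoding of an accented e.
page = b"<html><head><title>Caf\xc3\xa9 menu</title></head></html>"

# Bytes pattern on a bytes subject: the types agree, so this works.
match = re.search(b"<title>(.*?)</title>", page)
title_bytes = match.group(1)               # still a bytes object

# Decode only the captured part when a str is needed.
title_text = title_bytes.decode("utf-8")
print(title_text)
```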

cannot use a bytes pattern on a string-like object with agent.request

On Python 3, sys.argv is a list of str. However, Agent.request accepts a value of type bytes as its 2nd argument. Since sys.argv[1] is a value of type str, something goes wrong somewhere in the implementation and you get this obscure exception.

If you encode sys.argv[1] to bytes (e.g. sys.argv[1].encode("ascii")) and pass the result to agent.request, then you'll get past this error.
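The encoding step on its own can be sketched as follows. The helper name and the sample argument list are hypothetical, and the Twisted call is only mentioned in a comment so that the example runs without Twisted installed.

```python
import sys


def url_argument_as_bytes(argv):
    """Return the first command-line argument encoded as ASCII bytes.

    Hypothetical helper for illustration; raises UnicodeEncodeError if the
    argument contains non-ASCII characters.
    """
    return argv[1].encode("ascii")


# In a real script this would be url_argument_as_bytes(sys.argv).
url = url_argument_as_bytes(["fetch.py", "http://example.com/"])
print(url)

# The bytes value is what would then be handed to the agent,
# for example as the second argument of agent.request(b"GET", url).
```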

TypeError: cannot use a string pattern on a bytes-like object using re.findall()

response.text will give you a str, whereas response.content will give you bytes.

Choose the type you want to use and use it consistently.

re will handle bytes if the regular expression is bytes as well.
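To illustrate using one type consistently, here is a sketch with a hypothetical FakeResponse class that mimics only the text and content attributes described above. It is not the real response class and exists only so the example runs offline.

```python
import re


class FakeResponse:
    """Hypothetical stand-in exposing .content (bytes) and .text (str)."""

    def __init__(self, content, encoding="utf-8"):
        self.content = content
        self.text = content.decode(encoding)


response = FakeResponse(b'<a href="/a">A</a> <a href="/b">B</a>')

# str pattern with response.text: the matches are str.
links_from_text = re.findall(r'href="(.*?)"', response.text)

# bytes pattern with response.content: the matches are bytes.
links_from_bytes = re.findall(rb'href="(.*?)"', response.content)

print(links_from_text)
print(links_from_bytes)
```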

cannot use a string pattern on a bytes-like object (Python)

return re.findall('(?:href=")(.*?)"', response.content)

response.content in this case is of type bytes. So either you use response.text, which gives you a str that you can process the way you are doing now, or you can check this out:

Regular expression parsing a binary file?

In case you want to continue down the binary road.

Cheers
