Python 3 TypeError: must be str, not bytes with sys.stdout.write()
Python 3 handles strings a bit differently. Originally there was just one type
for strings: str. When Unicode gained traction in the '90s, the new unicode type
was added to handle Unicode without breaking pre-existing code [1]. This is
effectively the same as str but with multibyte support.
In Python 3 there are two different types:
- The bytes type. This is just a sequence of bytes; Python doesn't know
anything about how to interpret it as characters.
- The str type. This is a sequence of Unicode characters; Python knows how the
underlying data maps to characters.
- The separate unicode type was dropped. str now supports Unicode.
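A minimal sketch of the difference, runnable in any Python 3 interpreter. The variable names are only for illustration:

```python
# The same text as str (characters) and as bytes (UTF-8 encoded).
text = 'h€llo'
data = text.encode('utf-8')

# str counts characters; bytes counts bytes ('€' takes three in UTF-8).
text_length = len(text)
data_length = len(data)

# Indexing str yields a one-character str; indexing bytes yields an int.
first_char = text[0]
first_byte = data[0]

# Mixing the two raises TypeError instead of guessing an encoding.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(text_length, data_length, first_char, first_byte, mixed_ok)
```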
In Python 2 implicitly assuming an encoding could cause a lot of problems; you
could end up using the wrong encoding, or the data may not have an encoding at
all (e.g. it’s a PNG image).
Explicitly telling Python which encoding to use (or explicitly telling it to
guess) is often a lot better and much more in line with the "Python philosophy"
of "explicit is better than implicit".
This change is incompatible with Python 2 as many return values have changed,
leading to subtle problems like this one; it's probably the main reason why
Python 3 adoption has been so slow. Since Python doesn't have static typing [2],
it's impossible to change this automatically with a script (such as the bundled
2to3).
- You can convert str to bytes with bytes('h€llo', 'utf-8'); this should
produce b'h\xe2\x82\xacllo'. Note how one character was converted to three
bytes.
- You can convert bytes to str with b'h\xe2\x82\xacllo'.decode('utf-8').
Of course, UTF-8 may not be the correct character set in your case, so be sure
to use the correct one.
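The two conversions can be tried as a round trip. This sketch also shows what tends to happen when the codec does not match; latin-1 and ascii are picked here only as examples of a wrong choice:

```python
text = 'h€llo'

# str -> bytes: both spellings are equivalent.
encoded = text.encode('utf-8')
assert encoded == bytes(text, 'utf-8')

# bytes -> str with the matching codec restores the original text.
decoded = encoded.decode('utf-8')

# Decoding with a different codec does not fail here, but yields mojibake.
wrong = encoded.decode('latin-1')

# Strict ASCII decoding rejects the non-ASCII bytes outright.
try:
    encoded.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# ascii() keeps the output printable regardless of the terminal encoding.
print(ascii(encoded), ascii(decoded), ascii(wrong), ascii_failed)
```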
In your specific piece of code, nextline is of type bytes, not str: reading
stdout and stdin from subprocess changed in Python 3 from str to bytes. This is
because Python can't be sure which encoding this uses. It probably uses the same
as sys.stdin.encoding (the encoding of your system), but it can't be sure.
You need to replace:
sys.stdout.write(nextline)
with:
sys.stdout.write(nextline.decode('utf-8'))
or maybe:
sys.stdout.write(nextline.decode(sys.stdout.encoding))
You will also need to modify if nextline == '' to if nextline == b'' since:
>>> '' == b''
False
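Putting these changes together, a read loop of this kind might look like the sketch below. The original loop is not shown here, so the function name stream_output, the child command, and the UTF-8 default are assumptions made for illustration:

```python
import subprocess
import sys


def stream_output(cmd, encoding='utf-8'):
    """Run cmd, echo its stdout line by line, and return the decoded lines."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    lines = []
    while True:
        nextline = process.stdout.readline()  # bytes in Python 3
        if nextline == b'':                   # empty bytes means end of stream
            break
        text = nextline.decode(encoding)      # bytes -> str before writing
        sys.stdout.write(text)
        lines.append(text)
    process.wait()
    return lines


# Hypothetical child process: the current interpreter printing two lines.
lines = stream_output([sys.executable, '-c', "print('hello'); print('world')"])
```

Comparing against b'' matters: with '' the loop would never see the end of the stream, because a bytes object is never equal to a str.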
Also see the Python 3 ChangeLog, PEP 358, and PEP 3112.
[1] There are some neat tricks you can do with ASCII that you can't do with multibyte character sets; the most famous example is the "xor with space to switch case" (e.g. chr(ord('a') ^ ord(' ')) == 'A') and "set 6th bit to make a control character" (e.g. ord('\t') + ord('@') == ord('I')). ASCII was designed in a time when manipulating individual bits was an operation with a non-negligible performance impact.
[2] Yes, you can use function annotations, but it's a comparatively new feature and little used.
builtins.TypeError: must be str, not bytes
The outfile should be in binary mode.
outFile = open('output.xml', 'wb')
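A small sketch of why the mode matters, assuming the data to be written is already bytes. The payload and the temporary path are made up for illustration:

```python
import os
import tempfile

# Hypothetical payload: bytes, as a serializer might return them.
payload = '<root>h€llo</root>'.encode('utf-8')

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output.xml')

# Text mode ('w') accepts only str, so writing bytes raises TypeError.
try:
    with open(path, 'w') as out_file:
        out_file.write(payload)
    text_mode_failed = False
except TypeError:
    text_mode_failed = True

# Binary mode ('wb') accepts bytes and writes them unchanged.
with open(path, 'wb') as out_file:
    out_file.write(payload)

with open(path, 'rb') as in_file:
    round_trip = in_file.read()

print(text_mode_failed, round_trip == payload)
```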
Getting TypeError must be str not bytes with echo command
You could convert the strings to bytes as well and then decode them:
import base64

bytes_string = b'echo "' + base64.b64encode(b'Hello World') + b'" | base64 -d'
print(bytes_string.decode('utf-8'))
# prints: echo "SGVsbG8gV29ybGQ=" | base64 -d
python3.6 - TypeError: write() argument must be str, not bytes - but no files involved
You're opening the subprocess without an encoding parameter set, so the streams are binary streams (which is an excellently sane default, considering that something like GhostScript could output a binary PDF on stdout).
Do
process = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    shell=True,
    encoding='utf-8',
    errors='strict',  # could be ignore or replace too, `strict` is the default
)
if you'd like the streams to be wrapped in a UTF-8 decoder so you get strings out of them, not bytes. Of course, this implies you know the output data is always UTF-8.
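The effect of the encoding parameter can be checked side by side. In this sketch the child command is a stand-in (the current interpreter printing one line), not the command from the question:

```python
import subprocess
import sys

# Hypothetical child command: the current interpreter printing one line.
cmd = [sys.executable, '-c', "print('hello from child')"]

# Without encoding=..., the pipes yield bytes.
raw = subprocess.Popen(cmd, stdout=subprocess.PIPE)
raw_out, _ = raw.communicate()

# With encoding=..., the pipes are wrapped in a decoder and yield str.
decoded = subprocess.Popen(cmd, stdout=subprocess.PIPE, encoding='utf-8')
decoded_out, _ = decoded.communicate()

print(type(raw_out).__name__, type(decoded_out).__name__)
```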
Python builtins.TypeError: must be str, not bytes - Twisted LineReceiver.sendLine()
Your code should run fine in Python/Anaconda 2 but not in 3, unless you've left some code out. For Python 3, use either:
self.sendLine(b"What's your name?")
self.sendLine("What's your name?".encode('utf8'))
As you can see in the docs for LineReceiver.sendLine, the argument must be of type bytes.
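Both spellings produce the same bytes, which can be checked without Twisted installed. The helper ensure_bytes below is hypothetical and not part of Twisted; it only sketches one way to normalise a value before handing it to a bytes-only API:

```python
def ensure_bytes(line, encoding='utf-8'):
    """Return line as bytes, encoding it first if it is a str."""
    if isinstance(line, str):
        return line.encode(encoding)
    return line


literal = b"What's your name?"
encoded = "What's your name?".encode('utf8')

print(literal == encoded, ensure_bytes("What's your name?") == literal)
```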