How to Set sys.stdout Encoding in Python 3

How to set sys.stdout encoding in Python 3?

Since Python 3.7 you can change the encoding of standard streams with reconfigure():

import sys
sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.
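As a rough illustration of the errors parameter, the sketch below calls reconfigure() on an in-memory TextIOWrapper instead of the real sys.stdout, so the bytes that come out can be inspected. The helper name encode_with_errors and the sample string are made up for this example; the same keyword arguments should apply to sys.stdout on Python 3.7 or later.

```python
import io


def encode_with_errors(text, encoding, errors):
    """Write text through a reconfigured TextIOWrapper and return the raw bytes.

    Hypothetical helper for illustration; not part of the standard library.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", newline="")
    # Same call as sys.stdout.reconfigure(...), available since Python 3.7.
    stream.reconfigure(encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    # "caf\u00e9" contains one character that ASCII cannot represent.
    for handler in ("replace", "backslashreplace", "xmlcharrefreplace"):
        print(handler, encode_with_errors("caf\u00e9", "ascii", handler))
```

With the default errors='strict', the same write would raise UnicodeEncodeError instead of substituting a replacement.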

Setting stdout UTF8 encoding with Python3

After further research, the solution is to use SetEnv PYTHONIOENCODING utf8 in .htaccess files, as detailed here: mod_cgi + utf8 + Python3 produces no output.

For other processes, it may be useful to put PYTHONIOENCODING=utf8 in /etc/environment for persistence (I'm not sure whether this works for every process that could call a Python script).
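One way to check whether the variable actually reaches a given Python process is to start a child interpreter with it set and ask the child what sys.stdout.encoding is. This is only a sketch: the function name child_stdout_encoding is invented here, and the environment is set by hand rather than through Apache or /etc/environment.

```python
import os
import subprocess
import sys


def child_stdout_encoding(value):
    """Return sys.stdout.encoding as reported by a child Python process.

    Hypothetical helper: runs the current interpreter with
    PYTHONIOENCODING set to the given value.
    """
    env = dict(os.environ, PYTHONIOENCODING=value)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_stdout_encoding("utf8"))
```

The exact spelling of the reported name (for example utf8 versus utf-8) may vary between Python versions, so comparing through codecs.lookup() is safer than comparing strings.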

Setting the correct encoding when piping stdout in Python

Your code works when run in a script because Python encodes the output to whatever encoding your terminal application is using. In Python 2, which this answer and its code target, if you are piping you must encode it yourself.

A rule of thumb is: Always use Unicode internally. Decode what you receive, and encode what you send.

# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')

Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8, making everything uppercase in between.

import sys
for line in sys.stdin:
    # Decode what you receive:
    line = line.decode('iso8859-1')

    # Work with Unicode internally:
    line = line.upper()

    # Encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)
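The snippet above is Python 2 code. A possible Python 3 take on the same decode, process, encode pattern is sketched below. It works on binary streams so that the encodings are chosen explicitly; the function name latin1_to_upper_utf8 is an assumption made for this example, not something from the original answer.

```python
import io
import sys


def latin1_to_upper_utf8(src, dst):
    """Copy lines from binary stream src to binary stream dst.

    Hypothetical helper: decodes ISO-8859-1, uppercases, encodes UTF-8.
    """
    for raw_line in src:
        # Decode what you receive:
        text = raw_line.decode("iso8859-1")
        # Work with Unicode (str) internally:
        text = text.upper()
        # Encode what you send:
        dst.write(text.encode("utf-8"))


if __name__ == "__main__":
    # The .buffer attributes expose the binary streams behind stdin/stdout.
    latin1_to_upper_utf8(sys.stdin.buffer, sys.stdout.buffer)
```

Passing the streams in as arguments keeps the function usable with io.BytesIO objects, which makes it easy to try without a pipe.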

Setting the system default encoding is a bad idea, because some modules and libraries you use can rely on the fact it is ASCII. Don't do it.

Change encoding of stdin / stdout at runtime in Python 3

Actually, TextIOWrapper does produce bytes. It takes Unicode strings and writes them to the underlying binary stream as bytes in a particular encoding. To change sys.stdout to use a particular encoding in a script, here's an example:

Python 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u5000')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\dev\python32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u5000' in position 0: character maps to <undefined>
>>> import io
>>> import sys
>>> sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8')
>>> print('\u5000')
倀

(my terminal isn't UTF-8)
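For use in a script rather than at the prompt, the re-wrapping step could be packaged along these lines. This is a sketch with a made-up helper name (rewrap), and it makes one choice that differs from the session above: it calls detach() on the old wrapper, so the old text stream can no longer be used afterwards.

```python
import io
import sys


def rewrap(text_stream, encoding, errors="strict"):
    """Return a new TextIOWrapper over text_stream's binary buffer.

    Hypothetical helper. The old wrapper is flushed and detached, so
    only the returned stream should be used from then on.
    """
    text_stream.flush()
    binary = text_stream.detach()
    return io.TextIOWrapper(
        binary, encoding=encoding, errors=errors, line_buffering=True
    )


if __name__ == "__main__":
    sys.stdout = rewrap(sys.stdout, "utf-8")
    print("\u5000")
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') achieves much the same effect without replacing the stream object.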

sys.stdout.buffer accesses the raw byte stream. You can also use the following to write to stdout in a particular encoding:

sys.stdout.buffer.write('\u5000'.encode('utf8'))

How do I specify the encoding in a print() call?

print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)

print() calls file.write(), and file defaults to sys.stdout. sys.stdout is a file object whose write() method encodes strings according to its encoding property. If you reconfigure that property it'll change how strings are encoded when printed:

sys.stdout.reconfigure(encoding='latin-1')

Alternatively, you could encode the string yourself and then write the bytes to stdout's underlying binary buffer.

sys.stdout.buffer.write("<some text>".encode('latin-1'))

Beware that buffer is not guaranteed to exist on every text stream. As the documentation puts it: "This is not part of the TextIOBase API and may not exist in some implementations."
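Since buffer may be missing (io.StringIO, for instance, has none), code that writes pre-encoded bytes can check for it first. The following is a minimal sketch under that assumption; write_encoded is a hypothetical helper, and falling back to a plain text write is just one possible policy.

```python
import io
import sys


def write_encoded(stream, text, encoding):
    """Write text to stream as bytes in the given encoding when possible.

    Hypothetical helper. Returns True if the bytes went to stream.buffer,
    False if the stream has no buffer and text was written as str instead.
    """
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # No binary layer available: let the stream handle the text itself.
        stream.write(text)
        return False
    # Flush pending text first so output order is preserved.
    stream.flush()
    buffer.write(text.encode(encoding))
    buffer.flush()
    return True


if __name__ == "__main__":
    write_encoded(sys.stdout, "<some text>\n", "latin-1")
```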

Writing unicode strings via sys.stdout in Python

It's not clear to me why you wouldn't be able to use print; but assuming you can't, yes, the approach looks right to me.


