How to set sys.stdout encoding in Python 3?
Since Python 3.7 you can change the encoding of standard streams with reconfigure():
sys.stdout.reconfigure(encoding='utf-8')
You can also modify how encoding errors are handled by adding an errors parameter.
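As a minimal sketch of the errors parameter, the snippet below calls reconfigure() on a stand-in stream instead of the real sys.stdout, so the resulting bytes can be inspected. The function name demo_reconfigure and the in-memory BytesIO target are assumptions made for illustration; on a real program you would call sys.stdout.reconfigure(...) directly.

```python
import io


def demo_reconfigure():
    """Show how reconfigure() changes both the encoding and the error handler."""
    raw = io.BytesIO()  # stand-in for the binary stream behind sys.stdout
    stream = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

    # Switch to ASCII and escape anything that cannot be encoded
    # instead of raising UnicodeEncodeError.
    stream.reconfigure(encoding="ascii", errors="backslashreplace")

    stream.write("åäö\n")
    stream.flush()
    return raw.getvalue()
```

With errors="backslashreplace" the three non-ASCII letters come out as escape sequences; other handlers such as "replace" or "ignore" can be passed the same way.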
Setting stdout UTF8 encoding with Python3
After further research, the solution is to use SetEnv PYTHONIOENCODING utf8 in .htaccess files, as detailed here: mod_cgi + utf8 + Python3 produces no output.
For other processes it might be worth putting PYTHONIOENCODING=utf8 in /etc/environment for persistence (not sure if it does the job for all processes that could call a Python script).
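One way to check what PYTHONIOENCODING does, independent of Apache or /etc/environment, is to start a child interpreter with the variable set and look at the bytes it writes. This is only a sketch: the helper name child_stdout_bytes is hypothetical, and it assumes sys.executable points at a working Python 3 interpreter.

```python
import os
import subprocess
import sys


def child_stdout_bytes(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and return what it wrote."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # The child prints U+00E9; the escape keeps the child's source ASCII-only.
    result = subprocess.run(
        [sys.executable, "-c", r"print('\xe9')"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.strip()
```

Calling child_stdout_bytes("utf8") and child_stdout_bytes("latin-1") should return different byte strings for the same character, which shows the variable is picked up by the child's sys.stdout.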
Setting the correct encoding when piping stdout in Python
Your code works when run in a script because Python encodes the output to whatever encoding your terminal application is using. If you are piping, you must encode it yourself.
A rule of thumb is: Always use Unicode internally. Decode what you receive, and encode what you send.
# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')
Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8, making everything uppercase in between.
import sys
for line in sys.stdin:
    # Decode what you receive:
    line = line.decode('iso8859-1')
    # Work with Unicode internally:
    line = line.upper()
    # Encode what you send:
    line = line.encode('utf-8')
    sys.stdout.write(line)
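The examples above use Python 2 syntax (the print statement, and str objects with a decode() method). A rough Python 3 sketch of the same decode, process, encode rule is shown below; it works on binary streams so that the encodings are chosen explicitly. The function name convert and its parameters are assumptions for illustration, not part of the original answer.

```python
import io


def convert(src, dst, src_encoding="iso8859-1", dst_encoding="utf-8"):
    """Copy lines from a binary source to a binary destination, uppercased."""
    for raw_line in src:
        # Decode what you receive:
        text = raw_line.decode(src_encoding)
        # Work with Unicode internally:
        text = text.upper()
        # Encode what you send:
        dst.write(text.encode(dst_encoding))


# As a command-line filter this would be called as:
#     convert(sys.stdin.buffer, sys.stdout.buffer)
```

Passing the streams in as arguments also makes the function easy to try with io.BytesIO objects instead of real pipes.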
Setting the system default encoding is a bad idea, because some modules and libraries you use can rely on the fact that it is ASCII. Don't do it.
Change encoding of stdin / stdout at runtime in Python 3
Actually, TextIOWrapper does produce bytes: it takes a Unicode string and writes a byte string in a particular encoding to the underlying binary stream. To change sys.stdout to use a particular encoding in a script, here's an example:
Python 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u5000')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\dev\python32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u5000' in position 0: character maps to <undefined>
>>> import io
>>> import sys
>>> sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8')
>>> print('\u5000')
倀
(my terminal isn't UTF-8)
sys.stdout.buffer accesses the raw byte stream. You can also use the following to write to stdout in a particular encoding:
sys.stdout.buffer.write('\u5000'.encode('utf8'))
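To see which bytes the wrapper approach actually emits, the sketch below wraps an in-memory binary stream the same way the session above wraps sys.stdout.buffer. The helper name encode_via_wrapper and the BytesIO stand-in are assumptions for illustration.

```python
import io


def encode_via_wrapper(text, encoding="utf8"):
    """Print text through a TextIOWrapper and return the bytes it produced."""
    binary = io.BytesIO()  # stand-in for sys.stdout.buffer
    wrapper = io.TextIOWrapper(binary, encoding=encoding, newline="\n")
    print(text, file=wrapper)
    wrapper.flush()  # push buffered text down to the binary stream
    return binary.getvalue()
```

For '\u5000' with UTF-8 this yields the three-byte sequence E5 80 80 followed by a newline, the same bytes that '\u5000'.encode('utf8') produces.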
How do I specify the encoding in a print() statement?
print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False)
print() calls file.write(), and file defaults to sys.stdout. sys.stdout is a file object whose write() method encodes strings according to its encoding property. If you reconfigure that property it'll change how strings are encoded when printed:
sys.stdout.reconfigure(encoding='latin-1')
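The claim that print() only calls file.write() with text can be checked with a small file-like object. This is a sketch: RecordingFile and printed_text are hypothetical names introduced here, not part of any library.

```python
class RecordingFile:
    """Hypothetical file-like object that records every write() call."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        # print() hands over str objects; any encoding is the file's job.
        self.chunks.append(text)
        return len(text)


def printed_text(*objects, **kwargs):
    """Return the text that print() passes to file.write()."""
    recorder = RecordingFile()
    print(*objects, file=recorder, **kwargs)
    return "".join(recorder.chunks)
```

Because print() never sees bytes, the encoding has to be set on the file object itself rather than passed to print().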
Alternatively, you could encode the string yourself and then write the bytes to stdout's underlying binary buffer.
sys.stdout.buffer.write("<some text>".encode('latin-1'))
Beware that buffer is not guaranteed to exist on every text stream: "This is not part of the TextIOBase API and may not exist in some implementations."
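Given that caveat, one defensive pattern is to look the attribute up with getattr() and fall back to a plain text write when it is missing. The helper below is only a sketch under that assumption; write_encoded is a hypothetical name, and the fallback simply lets the stream apply its own encoding.

```python
import io


def write_encoded(stream, text, encoding="latin-1"):
    """Write text as bytes in the given encoding if the stream exposes a buffer."""
    buffer = getattr(stream, "buffer", None)
    if buffer is not None:
        # Flush pending text first so output order is preserved.
        stream.flush()
        buffer.write(text.encode(encoding))
        buffer.flush()
    else:
        # No binary layer (e.g. io.StringIO): write the text unchanged.
        stream.write(text)
```

In a script this would typically be called as write_encoded(sys.stdout, "<some text>").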
Writing unicode strings via sys.stdout in Python
It's not clear to me why you wouldn't be able to use print, but assuming so, yes, the approach looks right to me.