How to print UTF-8 encoded text to the console in Python 3?
How to print UTF-8 encoded text to the console in Python < 3?
print u"some unicode text \N{EURO SIGN}"
print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')
i.e., if you have a Unicode string then print it directly. If you have
a bytestring then convert it to Unicode first.
Your locale settings (LANG, LC_CTYPE) indicate a utf-8 locale and
therefore (in theory) you could print a utf-8 bytestring directly and it
should be displayed correctly in your terminal (if terminal settings
are consistent with the locale settings and they should be) but you
should avoid it: do not hardcode the character encoding of your
environment inside your script; print Unicode directly instead.
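In Python 3 the same rule can be sketched as a small helper: text passes through unchanged and bytes are decoded before printing. The helper name `to_text` and its `utf-8` default are assumptions made for illustration; they are not part of the original answer.

```python
def to_text(value, encoding="utf-8"):
    """Return str unchanged; decode bytes using the given (assumed) encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def main():
    unicode_text = "some unicode text \N{EURO SIGN}"
    utf8_bytes = b"some utf-8 encoded bytestring \xe2\x82\xac"
    # Both values are str by the time they reach print(), so print() can
    # encode them for whatever the console actually uses.
    print(to_text(unicode_text))
    print(to_text(utf8_bytes))


if __name__ == "__main__":
    main()
```

The encoding is only named where the bytes come in, never where the text goes out, which is the point of "print Unicode directly".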
There are many wrong assumptions in your question.
You do not need to set PYTHONIOENCODING with your locale settings to print Unicode to the terminal. A utf-8 locale supports all Unicode characters, i.e., it works as is.
You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may
break if some code (that you do not control) does need to print bytes
and/or it may break while
printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING
envvar are enough. Also, if you need to replace sys.stdout
then use io.TextIOWrapper() instead of the codecs module, as the win-unicode-console package does.
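A minimal sketch of the io.TextIOWrapper() approach, written for Python 3. An in-memory `io.BytesIO` stands in for the real binary stream so the result can be inspected; the helper name `make_text_writer` and the `utf-8` default are assumptions, and which stream to wrap in a real program depends on your Python version and platform.

```python
import io


def make_text_writer(binary_stream, encoding="utf-8"):
    """Wrap a binary stream so that str written to it is encoded on the way out."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, line_buffering=True)


# In-memory stand-in for the underlying byte stream of a console or pipe.
raw = io.BytesIO()
writer = make_text_writer(raw)
writer.write("price: 10 \N{EURO SIGN}\n")
writer.flush()

# The euro sign arrives in the byte stream as its three UTF-8 bytes.
encoded = raw.getvalue()
```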
sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING
should change sys.getdefaultencoding() is incorrect. You should
check sys.stdout.encoding instead.
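To see that these are separate values, a small diagnostic along the following lines may help. It is a sketch for Python 3; the function name `describe_encodings` and the dictionary keys are made up for illustration.

```python
import locale
import sys


def describe_encodings():
    """Collect the three encodings that are often confused with each other."""
    return {
        # What print() actually uses for this stream (may be missing if
        # sys.stdout has been replaced by an object without the attribute).
        "stdout": getattr(sys.stdout, "encoding", None),
        # The interpreter-wide default; not derived from the locale.
        "default": sys.getdefaultencoding(),
        # What the locale settings (LANG, LC_CTYPE) suggest.
        "locale": locale.getpreferredencoding(False),
    }


for name, value in describe_encodings().items():
    print(name, value)
```

Running it once normally and once with stdout piped (for example through `cat`) shows which of the three values reacts to the environment.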
sys.getdefaultencoding() is not used when you print to the
console. It may be used as a fallback on Python 2 if stdout is
redirected to a file/pipe unless PYTHONIOENCODING is set:
$ python2 -c'import sys; print(sys.stdout.encoding)'
UTF-8
$ python2 -c'import sys; print(sys.stdout.encoding)' | cat
None
$ PYTHONIOENCODING=utf8 python2 -c'import sys; print(sys.stdout.encoding)' | cat
utf8
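The same experiment can be scripted from Python 3 with a child interpreter whose stdout is a pipe. This is a sketch under assumptions: `child_stdout_encoding` is a hypothetical helper, the child is whatever `sys.executable` points to, and a Python 3 child typically reports a locale-derived encoding rather than None when PYTHONIOENCODING is unset.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Run a child interpreter with piped stdout; return its sys.stdout.encoding."""
    env = dict(os.environ)
    if io_encoding is None:
        env.pop("PYTHONIOENCODING", None)
    else:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


reported = child_stdout_encoding("utf8")
# Normalise spelling differences such as "utf8" versus "utf-8".
normalised = codecs.lookup(reported).name
```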
Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your
data silently and/or break 3rd-party modules that do not expect
it. Remember sys.getdefaultencoding() is used to convert bytestrings
(str) to/from unicode in Python 2 implicitly e.g., "a" + u"b". See also
the quote in @mesilliac's answer.
How to make python 3 print() utf8
Clarification:
TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is not UTF-8; it is a Unicode string in Python 3.X.
TestText2 = TestText.encode('utf8') # this is a UTF-8-encoded byte string.
To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:
import sys
sys.stdout.buffer.write(TestText2)
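The same idea can be wrapped in a helper that accepts any binary stream. This is only a sketch: `write_utf8` is a hypothetical name, and an `io.BytesIO` is used here in place of `sys.stdout.buffer` so the bytes can be inspected.

```python
import io


def write_utf8(text, binary_stream):
    """Encode text as UTF-8 and write the bytes to a binary stream."""
    data = text.encode("utf-8")
    binary_stream.write(data)
    binary_stream.flush()
    return len(data)


text = "Test - āĀēĒčČ..šŠūŪžŽ\n"
sink = io.BytesIO()  # in a real script, pass sys.stdout.buffer instead
written = write_utf8(text, sink)
```

Because the bytes bypass the text layer, the console's own encoding is never consulted; whether the characters display correctly still depends on the terminal.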
Printing UTF-8 characters in Python 3 to the web
Thanks to the feedback from users here, I was able to piece together a solution:
- The Content-Type line must include charset=utf-8.
- Apache's configuration file must include SetEnv LANG en_US.UTF-8.
A great debugging tool was to print the value of sys.stdout.encoding; it should return "UTF-8", not "ANSI_X3.4-1968".
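As a rough sketch of the first point, a CGI-style response can be assembled as bytes so the declared charset and the actual encoding always agree. The function name `build_cgi_response` and the `text/html` media type are assumptions for illustration.

```python
def build_cgi_response(body_text):
    """Return header plus body as bytes, with the charset declared as UTF-8."""
    header = "Content-Type: text/html; charset=utf-8\r\n\r\n"
    return header.encode("ascii") + body_text.encode("utf-8")


response = build_cgi_response("<p>caf\N{LATIN SMALL LETTER E WITH ACUTE}</p>")
# In a CGI script the bytes would then be written with
# sys.stdout.buffer.write(response).
```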
printing UTF-8 in Python 3 using Sublime Text 3
The answer was actually in the question linked in your question - PYTHONIOENCODING needs to be set to "utf-8". However, since OS X is silly and doesn't pick up on environment variables set in Terminal or via .bashrc or similar files, this won't work in the way indicated in the answer to the other question. Instead, you need to pass that environment variable to Sublime.
Luckily, ST3 build systems (I don't know about ST2) have the "env" option. This is a dictionary of keys and values passed to exec.py, which is responsible for running build systems without the "target" option set. As discussed in our comments above, I indicated that your sample program worked fine on a UTF-8-encoded text file containing non-ASCII characters when run with ST3 (Build 3122) on Linux, but not with the same version run on OS X. All that was necessary to get it to run was to change the build system to include this line:
"env": {"PYTHONIOENCODING": "utf8"},
I saved the build system, hit ⌘B, and the program ran fine.
BTW, if you'd like to read exec.py, or Packages/Python/Python.sublime-build, or any other file packed up in a .sublime-package archive, install PackageResourceViewer via Package Control. Use the "Open Resource" option in the Command Palette to pick individual files, or "Extract Package" (both are preceded by "PackageResourceViewer:", or prv using fuzzy search) to extract an entire package to your Packages folder, which is accessed by selecting Sublime Text → Preferences → Browse Packages… (just Preferences → Browse Packages… on other operating systems). It is located on your hard drive in the following location:
- Linux: ~/.config/sublime-text-3/Packages
- OS X: ~/Library/Application Support/Sublime Text 3/Packages
- Windows Regular Install: C:\Users\YourUserName\AppData\Roaming\Sublime Text 3\Packages
- Windows Portable Install: InstallationFolder\Sublime Text 3\Data\Packages
Once files are saved to your Packages folder (if you just view them via the "Open Resource" option and close without changing or saving them, they won't be), they will override the identically-named file contained within the .sublime-package archive. So, for instance, if you want to edit the default Python.sublime-build file in the Python package, your changes will be saved as Packages/Python/Python.sublime-build, and when you choose the Python build system from the menu, it will only use your version.
Printing utf8 strings in Sublime Text's console with Windows
I have found a possible fix: add the encoding parameter in the Python.sublime-build file:
{
"cmd": ["python", "-u", "$file"],
"file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
"selector": "source.python",
"encoding": "cp1252",
...
Note: "encoding": "latin1" seems to work as well, but - I don't know why - "encoding": "utf8" does not work, even if the .py file is UTF8, even if Python 3 uses UTF8, etc. Mystery!
Edit: This works now:
{
"cmd": ["python", "-u", "$file"],
"file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
"selector": "source.python",
"encoding": "utf8",
"env": {"PYTHONIOENCODING": "utf-8", "LANG": "en_US.UTF-8"},
}
Linked topics:
Setting the correct encoding when piping stdout in Python (and this answer in particular)
How to change the preferred encoding in Sublime Text 3 for MacOS (for the env trick)
Python 3 - print utf-8 encoded data into console (not \x00(\x00A\x04 )
That's not UTF-8.
3>> b"\x00(\x00A\x04>\x042\x04<\x045\x04A".decode('utf-16be')
'(Aовмес'
Note that "utf-16be" was chosen based on your sample data; it is more likely to be UTF-16LE instead.
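Decoding the sample with both byte orders makes the difference visible. This is a sketch using only the bytes shown above; the variable names are arbitrary.

```python
sample = b"\x00(\x00A\x04>\x042\x04<\x045\x04A"

# Big-endian: the high byte of each 16-bit unit comes first, which matches
# the sample and yields ASCII plus Cyrillic letters.
as_be = sample.decode("utf-16-be")

# Little-endian: the same bytes read the other way round decode without an
# error here, but to unrelated code points.
as_le = sample.decode("utf-16-le")

print(ascii(as_be))
print(ascii(as_le))
```

If real data comes out looking like the second result, trying the other byte order is a reasonable first check.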
Unicode (UTF-8) reading and writing to files in Python
In the notation u'Capit\xe1n\n' (should be just 'Capit\xe1n\n' in 3.x, and must be in 3.0 and 3.1), the \xe1 represents just one character. \x is an escape sequence, indicating that e1 is in hexadecimal.
Writing Capit\xc3\xa1n into the file in a text editor means that it actually contains the literal text \xc3\xa1. Those are 8 bytes and the code reads them all. We can see this by displaying the result:
# Python 3.x - reading the file as bytes rather than text,
# to ensure we see the raw data
>>> open('f2', 'rb').read()
b'Capit\\xc3\\xa1n\n'
# Python 2.x
>>> open('f2').read()
'Capit\\xc3\\xa1n\n'
Instead, just input characters like á in the editor, which should then handle the conversion to UTF-8 and save it.
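What the editor does on save can be imitated from Python 3 to confirm which bytes end up on disk. This is a sketch that writes to a temporary directory; the file name f2 is borrowed from the example above, and everything else is an assumption for illustration.

```python
import os
import tempfile

text = "Capit\N{LATIN SMALL LETTER A WITH ACUTE}n\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "f2")

    # Write real characters; the file object performs the UTF-8 encoding.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)

    # Reading as bytes shows two bytes for the accented letter,
    # not the eight bytes of a typed-out escape sequence.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading as text with the same encoding restores the original string.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
```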
In 2.x, a string that actually contains these backslash-escape sequences can be decoded using the string_escape codec:
# Python 2.x
>>> print 'Capit\\xc3\\xa1n\n'.decode('string_escape')
Capitán
The result is a str that is encoded in UTF-8, where the accented character is represented by the two bytes that were written as \\xc3\\xa1 in the original string. To get a unicode result, decode again with UTF-8.
In 3.x, the string_escape codec is replaced with unicode_escape, and it is strictly enforced that we can only encode from a str to bytes, and decode from bytes to str. unicode_escape needs to start with a bytes in order to process the escape sequences (the other way around, it adds them); and then it will treat the resulting \xc3 and \xa1 as character escapes rather than byte escapes. As a result, we have to do a bit more work:
# Python 3.x
>>> 'Capit\\xc3\\xa1n\n'.encode('ascii').decode('unicode_escape').encode('latin-1').decode('utf-8')
'Capitán\n'
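The chain above can be unpacked into a helper with one step per line, which makes it easier to see what each conversion contributes. The name `unescape_utf8` is hypothetical, and the sketch assumes the input is ASCII-only text whose \x escapes spell out UTF-8 bytes.

```python
def unescape_utf8(escaped_text):
    """Turn text containing literal \\xNN byte escapes into the intended str."""
    # 1. str -> bytes, so that unicode_escape has bytes to decode.
    as_bytes = escaped_text.encode("ascii")
    # 2. Process the escapes; each \xNN becomes the character U+00NN.
    as_chars = as_bytes.decode("unicode_escape")
    # 3. Map each character U+00NN back to the single byte NN.
    utf8_bytes = as_chars.encode("latin-1")
    # 4. Finally interpret those bytes as UTF-8.
    return utf8_bytes.decode("utf-8")


result = unescape_utf8("Capit\\xc3\\xa1n\n")
```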