Why Should We Not Use Sys.Setdefaultencoding("Utf-8") in a Py Script

Why should we NOT use sys.setdefaultencoding(utf-8) in a py script?

As per the documentation: This allows you to switch from the default ASCII to other encodings such as UTF-8, which the Python runtime will use whenever it has to decode a string buffer to unicode.

This function is only available at Python start-up time, when Python scans the environment. It has to be called in a system-wide module, sitecustomize.py, After this module has been evaluated, the setdefaultencoding() function is removed from the sys module.

The only way to actually use it is with a reload hack that brings the attribute back.

Also, the use of sys.setdefaultencoding() has always been discouraged, and it has become a no-op in py3k. The encoding of py3k is hard-wired to "utf-8" and changing it raises an error.

I suggest some pointers for reading:

  • http://blog.ianbicking.org/illusive-setdefaultencoding.html
  • http://nedbatchelder.com/blog/200401/printing_unicode_from_python.html
  • http://www.diveintopython3.net/strings.html#one-ring-to-rule-them-all
  • http://boodebr.org/main/python/all-about-python-and-unicode
  • http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python

Python Not Accepting UTF-8 Coding

You can set it to utf-8 as:

import sys
reload(sys)
sys.setdefaultencoding("utf8")

Changing default encoding of Python?

Here is a simpler method (hack) that gives you back the setdefaultencoding() function that was deleted from sys:

import sys
# sys.setdefaultencoding() does not exist, here!
reload(sys) # Reload does the trick!
sys.setdefaultencoding('UTF8')

(Note for Python 3.4+: reload() is in the importlib library.)

This is not a safe thing to do, though: this is obviously a hack, since sys.setdefaultencoding() is purposely removed from sys when Python starts. Reenabling it and changing the default encoding can break code that relies on ASCII being the default (this code can be third-party, which would generally make fixing it impossible or dangerous).

PS: This hack doesn't seem to work with Python 3.9 anymore.

Persist UTF-8 as Default Encoding

Please take a look into site.py library - it is the place where sys.setdefaultencoding happens. You could, I think, modify or substitute this module in order to make it permanent on your machine. Here is some of it's source code, comments explains something:

def setencoding():
"""Set the string encoding used by the Unicode implementation. The
default is 'ascii', but if you're willing to experiment, you can
change this."""

encoding = "ascii" # Default value set by _PyUnicode_Init()
if 0:
# Enable to support locale aware default string encodings.
import locale
loc = locale.getdefaultlocale()
if loc[1]:
encoding = loc[1]
if 0:
# Enable to switch off string to Unicode coercion and implicit
# Unicode to string conversion.
encoding = "undefined"
if encoding != "ascii":
# On Non-Unicode builds this will raise an AttributeError...
sys.setdefaultencoding(encoding) # Needs Python Unicode build !

Full source https://hg.python.org/cpython/file/2.7/Lib/site.py.

This is the place where they delete the sys.setdefaultencoding function, if you were wondering:

def main():

...

# Remove sys.setdefaultencoding() so that users cannot change the
# encoding after initialization. The test for presence is needed when
# this module is run as a script, because this code is executed twice.
if hasattr(sys, "setdefaultencoding"):
del sys.setdefaultencoding

Why i can't change my python default encoding?

You need to read following:

  1. Why should we NOT use sys.setdefaultencoding("utf-8") in a py script?
  2. How to print UTF-8 encoded text to the console in Python < 3?

I recommend using # -*- coding: utf-8 -*- to top of your .py file.

Python3 utf-8 decode issue

The problem is with the print() expression, not with the decode() method.
If you look closely, the raised exception is a UnicodeEncodeError, not a -DecodeError.

Whenever you use the print() function, Python converts its arguments to a str and subsequently encodes the result to bytes, which are sent to the terminal (or whatever Python is run in).
The codec which is used for encoding (eg. UTF-8 or ASCII) depends on the environment.
In an ideal case,

  • the codec which Python uses is compatible with the one which the terminal expects, so the characters are displayed correctly (otherwise you get mojibake like "é" instead of "é");
  • the codec used covers a range of characters that is sufficient for your needs (such as UTF-8 or UTF-16, which contain all characters).

In your case, the second condition isn't met for the Linux docker you mention: the encoding used is ASCII, which only supports characters found on an old English typewriter.
These are a few options to address this problem:

  • Set environment variables: on Linux, Python's encoding defaults depend on this (at least partially). In my experience, this is a bit of a trial and error; setting LC_ALL to something containing "UTF-8" worked for me once. You'll have to put them in start-up script for the shell your terminal runs, eg. .bashrc.
  • Re-encode STDOUT, like so:

    sys.stdout = open(sys.stdout.buffer.fileno(), 'w', encoding='utf8')

    The encoding used has to match the one of the terminal.

  • Encode the strings yourself and send them to the binary buffer underlying sys.stdout, eg. sys.stdout.buffer.write("é".encode('utf8')). This is of course much more boilerplate than print("é"). Again, the encoding used has to match the one of the terminal.
  • Avoid print() altogether. Use open(fn, encoding=...) for output, the logging module for progress info – depending on how interactive your script is, this might be worthwhile (admittedly, you'll probably face the same encoding problem when writing to STDERR with the logging module).

There might be other options, but I doubt that there are nicer ones.

python change character encoding to utf_8

Can you check encoding by following method:

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>>

If encoding is ascii then set to utf-8

  1. open following file(I am using Python 2.7):

    /usr/lib/python2.7/sitecustomize.py

  2. then update following to utf-8

    sys.setdefaultencoding("utf-8")

[Edit 2]

Can you add following in tour code(at start) and then check:-

>>> try:
... import apport_python_hook
... except ImportError:
... pass
... else:
... apport_python_hook.install()
...
>>> import sys
>>>
>>> sys.setdefaultencoding("utf-8")
>>>
>>>


Related Topics



Leave a reply



Submit