How to Solve UnicodeDecodeError in Python 3.6

How to solve UnicodeDecodeError in Python 3.6?

It sounds like your locale is broken and that you have another bytes-to-Unicode issue. What you did for Python 2.7 is a hack that only masked the real problem (there's a reason why you have to reload sys to make it work).

To check your locale, run the locale command from the command line. Its output should look something like this:

LANG=en_GB.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=

The locale settings depend on LANG being set properly: any LC_* variable that is not set itself falls back to LANG. Python effectively uses the locale to work out which encoding to use when writing to stdout. If it can't work it out, it defaults to ASCII.
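To see which values Python is actually picking up, a small read-only diagnostic along these lines may help. It is a sketch, not part of the original answer; encoding_report is a hypothetical helper name, and it only reports settings without changing anything.

```python
import locale
import os
import sys

def encoding_report():
    """Collect the settings that decide which codec Python uses for text I/O."""
    return {
        # Encoding of the stream attached to stdout (None if stdout was replaced
        # by an object without an encoding, such as io.StringIO).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default used by open() in text mode when encoding= is omitted.
        "preferred": locale.getpreferredencoding(False),
        # Locale environment variables; unset ones are reported as None.
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
    }

for name, value in encoding_report().items():
    print(f"{name}: {value}")
```

If "preferred" or "stdout" shows ANSI_X3.4-1968 or ascii, Python is falling back to ASCII as described above.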

You should first attempt to fix your locale. If the locale command reports errors, make sure you've installed the correct language pack for your region.

If all else fails, you can work around the problem in Python by setting the environment variable PYTHONIOENCODING=UTF-8. Use this only as a last resort, as you'll be masking problems once again.
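A minimal sketch of what PYTHONIOENCODING changes, using only the standard library: it starts a child interpreter (run_child is a hypothetical helper name) that prints a non-ASCII word, once with PYTHONIOENCODING=ascii to mimic an ASCII-only setup and once with PYTHONIOENCODING=UTF-8.

```python
import os
import subprocess
import sys

def run_child(io_encoding):
    """Run a child Python that prints 'café' with PYTHONIOENCODING set to
    io_encoding; return (exit code, raw stdout bytes)."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        # The child's code uses the escape \u00e9, so the command line stays ASCII.
        [sys.executable, "-c", "print('caf\\u00e9')"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # keep the child's traceback out of our output
    )
    return result.returncode, result.stdout

bad_code, _ = run_child("ascii")          # child dies with UnicodeEncodeError
good_code, good_out = run_child("UTF-8")  # child writes UTF-8 encoded bytes
print(bad_code != 0, good_code, good_out)
```

The variable only changes how the standard streams are encoded; it does not touch open() defaults, which is one reason it can mask rather than fix the underlying locale problem.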

If Python is still throwing an error after setting PYTHONIOENCODING, please update your question with the stack trace. Chances are you've got an implicit conversion going on.

Switching to Python 3 causing UnicodeDecodeError

Python 3 decodes text files when reading them and encodes them when writing. The default encoding is taken from locale.getpreferredencoding(False), which evidently returns 'ASCII' for your setup. See the documentation of the open() function:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.

Instead of relying on a system setting, you should open your text files using an explicit codec:

currentFile = open(filename, 'rt', encoding='latin1')

where you set the encoding parameter to match the file you are reading.
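As an illustration, here is a sketch that writes a small Latin-1 file and then reads it back with an explicit codec. The file content and the read_text helper are hypothetical; the point is that the result no longer depends on the machine's locale.

```python
import os
import tempfile

def read_text(path, encoding):
    """Read a whole text file with an explicit codec instead of the locale default."""
    with open(path, "rt", encoding=encoding) as f:
        return f.read()

# Hypothetical sample file: "café" saved as Latin-1, where "é" is the single byte 0xE9.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("café\n".encode("latin1"))

try:
    text = read_text(path, "latin1")   # codec matches how the file was written
    try:
        read_text(path, "utf-8")       # wrong codec for these bytes
        utf8_failed = False
    except UnicodeDecodeError:
        # 0xE9 starts a multi-byte UTF-8 sequence, but "\n" is not a valid continuation byte.
        utf8_failed = True
finally:
    os.remove(path)

print(ascii(text), utf8_failed)  # 'caf\xe9\n' True
```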

Python 3 uses UTF-8 as the default encoding for source code.

The same applies to writing to a writable text file: data written will be encoded, and if you rely on the system encoding you are liable to get UnicodeEncodeError exceptions unless you explicitly set a suitable codec. Which codec to use when writing depends on what text you are writing and what you plan to do with the file afterward.
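A sketch of the writing side. It assumes an ASCII locale can be simulated by passing encoding="ascii" explicitly; write_text and the sample string are hypothetical.

```python
import os
import tempfile

def write_text(path, data, encoding):
    """Write data to path, encoding it with an explicitly chosen codec."""
    with open(path, "wt", encoding=encoding) as f:
        f.write(data)

data = "naïve café"
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    try:
        write_text(path, data, "ascii")  # what an ASCII locale default would do
        ascii_ok = True
    except UnicodeEncodeError:
        ascii_ok = False                 # "ï" and "é" have no ASCII encoding
    write_text(path, data, "utf-8")      # explicit codec that can encode these characters
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(ascii_ok, raw)  # False b'na\xc3\xafve caf\xc3\xa9'
```

UTF-8 is used here only because it can represent the sample text; as noted above, the right codec depends on who reads the file afterward.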

You may want to read up on Python 3 and Unicode in the Unicode HOWTO, which covers both source code encoding and reading and writing Unicode data.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)

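You can decode with the "ignore" error handler, as in some_bytes.decode("utf-8", "ignore"), to simply skip bytes that cannot be decoded; note that this silently throws those bytes away. Another way of handling it is to wrap the decoding in a try/except block.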
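Both options can be sketched on a hypothetical byte string that has 0xef at position 1, which reproduces the error message from the question when decoded as ASCII:

```python
data = b"a\xefb"  # hypothetical input with byte 0xef at position 1

# Option 1: the "ignore" error handler silently drops undecodable bytes (lossy).
lossy = data.decode("utf-8", "ignore")
print(ascii(lossy))  # 'ab'

# Option 2: catch the exception and choose a fallback explicitly.
try:
    text = data.decode("ascii")
except UnicodeDecodeError as exc:
    # Prints the same message as in the question; exc.start/exc.end delimit the bad bytes.
    print(exc, exc.start, exc.end)
    # Fallback: "replace" substitutes U+FFFD for undecodable input instead of dropping it.
    text = data.decode("utf-8", "replace")
print(ascii(text))  # 'a\ufffdb'
```

ascii() is used for printing so the example itself cannot fail on an ASCII-only terminal.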

Either way, you'd probably want to read the Python documentation on Unicode.

Python 3.6.7 UnicodeEncodeError on a CentOS 7 machine

I finally got rid of this issue as follows.

I observed the issue described in the question under two different circumstances.

First scenario: with all the settings posted in the question (all language-related encodings set to UTF-8), it started working after our production server was restarted, without any other changes. I still don't know why it failed before the restart and worked afterward.

Second scenario: all LC_* variables were set to POSIX in our client environment. Many of the solutions I went through suggested changing LANG or LC_ALL to UTF-8, but changing all the locale settings can cause problems with other locale-based behavior, such as date and time conversion.

Fix: I changed only LC_CTYPE to a UTF-8 locale, in our case en_US.UTF-8:

export LC_CTYPE="en_US.UTF-8"

and it worked.
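To confirm from Python that the new LC_CTYPE is picked up, a sketch like the following could be run (child_preferred_encoding is a hypothetical helper). It assumes the en_US.UTF-8 locale is installed; if it is not, the child falls back to another locale and reports a different encoding. LC_ALL is removed from the child's environment because it takes precedence over LC_CTYPE.

```python
import os
import subprocess
import sys

def child_preferred_encoding(lc_ctype):
    """Return locale.getpreferredencoding(False) as reported by a fresh Python
    process started with LC_CTYPE=lc_ctype."""
    env = dict(os.environ)
    env.pop("LC_ALL", None)      # LC_ALL overrides LC_CTYPE, so it must be unset
    env.pop("PYTHONUTF8", None)  # Python 3.7+ UTF-8 mode would bypass the locale
    env["LC_CTYPE"] = lc_ctype
    result = subprocess.run(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding(False))"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()

# Assumes en_US.UTF-8 is installed; expect a UTF-8 codec name if it is.
print(child_preferred_encoding("en_US.UTF-8"))
```

LANG and the other LC_* variables are left as they are, which matches the approach above of changing only the character-type category.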

for line in... results in UnicodeDecodeError: 'utf-8' codec can't decode byte

As suggested by Mark Ransom, I found the right encoding for that file: it was "ISO-8859-1". Replacing open("u.item", encoding="utf-8") with open("u.item", encoding="ISO-8859-1") solves the problem.
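A sketch of the same for line in ... loop against a hypothetical stand-in for u.item (the real file's contents are not known here). It also shows why ISO-8859-1 never raises: every byte value maps to exactly one character, so the risk shifts from an exception to possibly wrong characters if the file is really in some other encoding.

```python
import os
import tempfile

# ISO-8859-1 maps each of the 256 byte values to one code point, so decoding
# with it never raises and round-trips losslessly.
assert bytes(range(256)).decode("ISO-8859-1").encode("ISO-8859-1") == bytes(range(256))

# Hypothetical stand-in for u.item: 0xe9 is "é" in ISO-8859-1 but is not
# valid UTF-8 when followed by "\n".
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"1|Caf\xe9\n2|Plain ASCII\n")

try:
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                pass
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    with open(path, encoding="ISO-8859-1") as f:
        lines = [line.rstrip("\n") for line in f]
finally:
    os.remove(path)

print(utf8_ok, ascii(lines))  # False ['1|Caf\xe9', '2|Plain ASCII']
```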


