UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c
http://docs.python.org/howto/unicode.html#the-unicode-type
text = unicode(raw, errors='replace')
or
text = unicode(raw, errors='ignore')
Note: errors='ignore' strips out the offending characters and returns the string without them, while errors='replace' substitutes the replacement character U+FFFD for each of them.
For me, stripping is the ideal case, since I'm using it as protection against non-ASCII input, which my application does not allow.
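The unicode() built-in exists only in Python 2. In Python 3 the same error handlers can be passed to bytes.decode(). A minimal sketch, using a made-up sample byte string, of how the two handlers differ:

```python
# Hypothetical sample: "caf", then a stray byte 0x9c that is not valid UTF-8 here.
raw = b"caf\x9c!"

replaced = raw.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD
ignored = raw.decode("utf-8", errors="ignore")    # bad byte is silently dropped

print(repr(replaced))
print(repr(ignored))
```

With 'replace' you can still see where data was damaged; with 'ignore' that information is lost.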
Alternatively, use the open method from the codecs module to read in the file:
import codecs
with codecs.open(file_name, 'r', encoding='utf-8', errors='ignore') as fdata:
    contents = fdata.read()
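In Python 3 the built-in open() takes the same encoding and errors arguments, so the codecs module is not strictly needed. A small sketch, where the temporary file and its contents are made up for the demo:

```python
import os
import tempfile

# Create a throwaway file containing one invalid UTF-8 byte (0x9c) for the demo.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"line one\nbad \x9c byte\n")
    file_name = tmp.name

# Built-in open() accepts encoding= and errors= just like codecs.open().
with open(file_name, "r", encoding="utf-8", errors="ignore") as fdata:
    contents = fdata.read()

os.remove(file_name)
print(repr(contents))
```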
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte
The error occurs because the dictionary contains a string with a non-ASCII character that can't be encoded or decoded. One simple way to avoid this error is to encode such strings with the encode() function as follows (where a is the string containing the non-ASCII character):
a.encode('utf-8').strip()
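Note that encode() only helps when a is already a Unicode string. If a holds raw bytes in some other encoding, those bytes have to be decoded with that encoding first. A sketch, assuming (the question does not say) that the 0xa5 byte came from Latin-1 text:

```python
# Assumption: the value arrived as Latin-1 bytes; 0xa5 is the yen sign in Latin-1.
raw_value = b"\xa5 100"

text = raw_value.decode("latin-1")          # bytes -> str using the source encoding
utf8_bytes = text.encode("utf-8").strip()   # str -> UTF-8 bytes, as in the line above

print(repr(text))
print(utf8_bytes)
```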
error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Python tries to convert a byte array (a bytes object, which it assumes to be a UTF-8-encoded string) to a Unicode string (str). This process is, of course, a decoding according to UTF-8 rules. When it tries this, it encounters a byte sequence that is not allowed in UTF-8-encoded strings (namely this 0xff at position 0).
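To see this in isolation, the exception object itself tells you which byte failed and where. A leading 0xff is often, though not always, the first byte of a UTF-16 byte order mark, so the sketch below uses made-up UTF-16 data as input:

```python
# Hypothetical input: "hi" encoded as UTF-16 with a little-endian byte order mark.
data = b"\xff\xfeh\x00i\x00"

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    position = exc.start              # index of the first undecodable byte
    bad_byte = exc.object[position]   # the byte value itself
    print(f"byte 0x{bad_byte:02x} at position {position}: {exc.reason}")

# Decoding with the encoding the data was actually written in succeeds.
text = data.decode("utf-16")
print(text)
```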
Since you did not provide any code we could look at, we can only guess at the rest.
From the stack trace we can assume that the triggering action was reading from a file (contents = open(path).read()). I propose recoding it like this:
with open(path, 'rb') as f:
    contents = f.read()
The b in the mode specifier of open() states that the file shall be treated as binary, so contents will remain a bytes object. No decoding attempt will happen this way.
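If you do need text in the end, one option is to read the bytes first and then try a few candidate encodings yourself. This is only a sketch: read_text_with_fallback and its list of encodings are hypothetical, and since latin-1 accepts any byte sequence, a successful decode does not prove the guess was right.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "utf-16", "latin-1")):
    """Hypothetical helper: read raw bytes, then try each encoding in order."""
    with open(path, "rb") as f:
        raw = f.read()
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {path!r}")


# Demo: a throwaway UTF-16 file, which fails as UTF-8 on its first byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("héllo".encode("utf-16"))

contents, used = read_text_with_fallback(tmp.name)
os.remove(tmp.name)
print(used, contents)
```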
UnicodeDecodeError: ‘utf-8’ can’t decode byte 0x90 in position 4024984: invalid start byte
Use encoding="unicode_escape" instead of encoding="utf-8".
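Be aware of what this codec does before relying on it: unicode_escape treats every non-escape byte as Latin-1 and also interprets backslash sequences. It therefore never raises on a byte like 0x90, but real UTF-8 text comes out garbled. A small sketch with made-up sample data:

```python
utf8_bytes = "café".encode("utf-8")   # b'caf\xc3\xa9'

as_escape = utf8_bytes.decode("unicode_escape")
as_latin1 = utf8_bytes.decode("latin-1")

print(repr(as_escape))          # the two UTF-8 bytes become two separate characters
print(as_escape == as_latin1)

# Backslash sequences in the data are interpreted too: here \t becomes a tab.
path_text = b"C:\\temp".decode("unicode_escape")
print(repr(path_text))
```

So this is a way to silence the error, not to recover the original characters.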
UnicodeDecodeError: 'utf-8' codec can't decode byte
You can also change the engine parameter to 'python':
coffeeStore = pd.read_csv("/content/CoffeeStore.xlsx", header=None, names=col_names, engine='python')
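One thing worth checking in a case like this: a file with an .xlsx extension is normally an Excel workbook, which is a ZIP container rather than text, so no text encoding will decode it and pandas offers pd.read_excel for such files. A rough standard-library check is sketched below; looks_like_excel_workbook is a hypothetical helper, and it is only a heuristic because any ZIP archive passes it.

```python
import os
import shutil
import tempfile
import zipfile


def looks_like_excel_workbook(path):
    """Hypothetical helper: True if the file is a ZIP container, as .xlsx files are."""
    return zipfile.is_zipfile(path)


# Demo with two throwaway files: a plain CSV and a ZIP archive standing in for a workbook.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
zip_path = os.path.join(tmp_dir, "data.xlsx")

with open(csv_path, "w", encoding="utf-8") as f:
    f.write("name,price\nlatte,3\n")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("dummy.txt", "placeholder")

csv_is_workbook = looks_like_excel_workbook(csv_path)
zip_is_workbook = looks_like_excel_workbook(zip_path)
shutil.rmtree(tmp_dir)
print(csv_is_workbook, zip_is_workbook)
```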
For a more detailed explanation of Unicode, UTF-8, etc., read this legendary blog post