Python Unicode Equal Comparison Failed

Python unicode equal comparison failed

You may use the == operator to compare unicode objects for equality.

>>> s1 = u'Hello'
>>> s2 = unicode("Hello")
>>> type(s1), type(s2)
(<type 'unicode'>, <type 'unicode'>)
>>> s1==s2
True
>>>
>>> s3='Hello'.decode('utf-8')
>>> type(s3)
<type 'unicode'>
>>> s1==s3
True
>>>

But, your error message indicates that you aren't comparing unicode objects. You are probably comparing a unicode object to a str object, like so:

>>> u'Hello' == 'Hello'
True
>>> u'Hello' == '\x81\x01'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

See how I have attempted to compare a unicode object against a str whose bytes cannot be decoded with the default ASCII codec (it is not valid UTF-8 either).

Your program, I suppose, is comparing unicode objects with str objects, and the contents of one of those str objects cannot be decoded as ASCII. This is likely the result of you (the programmer) losing track of which variable holds unicode, which holds UTF-8 encoded bytes, and which holds raw bytes read from a file.

I recommend http://nedbatchelder.com/text/unipain.html, especially the advice to create a "Unicode Sandwich."
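
As a rough illustration of the sandwich idea, here is a minimal sketch: decode at the input edge, work only with text in the middle, and encode at the output edge. The function name uppercase_file, the file names, and the UTF-8 encoding are all assumptions made for the example, not anything taken from the question.

```python
import io
import os
import shutil
import tempfile


def uppercase_file(src, dst, encoding="utf-8"):
    """Hypothetical helper: copy src to dst in upper case, text-only inside."""
    # Input edge: io.open decodes the file's bytes into text while reading.
    with io.open(src, "r", encoding=encoding) as handle:
        text = handle.read()
    # Middle: only text objects here, so comparisons never mix types.
    has_penguin = u"ping\xfcino" in text
    # Output edge: io.open encodes the text back into bytes while writing.
    with io.open(dst, "w", encoding=encoding) as handle:
        handle.write(text.upper())
    return has_penguin


# Demo with made-up data: the input file holds raw UTF-8 bytes.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "in.txt")
dst = os.path.join(workdir, "out.txt")
with io.open(src, "wb") as raw:
    raw.write(u"ping\xfcino".encode("utf-8"))

found = uppercase_file(src, dst)
with io.open(dst, "rb") as raw:
    out_bytes = raw.read()
shutil.rmtree(workdir)
```

io.open behaves the same way on Python 2.6+ and Python 3, which is why the sketch uses it instead of the built-in open().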

How can I fix this warning: UnicodeWarning: Unicode equal comparison failed

You are mixing Unicode strings and bytestrings. Python 2 will try to decode the bytestring (as ASCII) when making comparisons, and when that fails you'll get a warning:

>>> u'å', u'å'.encode('utf8')
(u'\xe5', '\xc3\xa5')
>>> 'å' == u'å'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

Don't mix Unicode strings and bytestrings. Decode your text data as early as possible, and only compare Unicode objects. In the above example, the bytestring is UTF-8 encoded, so decoding as UTF-8 first would resolve the warning.
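
A small sketch of that fix, using the same character as the example above. The variable names are made up for illustration, and the bytes are written with escapes so the snippet does not depend on the source file's encoding.

```python
raw = b"\xc3\xa5"   # UTF-8 bytes, the same value as u'\xe5'.encode('utf8')
text = u"\xe5"      # the Unicode string to compare against

# Decode once, at the boundary, using the encoding the bytes really have.
decoded = raw.decode("utf-8")

# Now both operands are Unicode text, so no implicit ASCII decoding happens.
same = (decoded == text)
```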

For your example code, BeautifulSoup is (correctly) producing Unicode text. You'll have to decode your CSV data; see "Read and Write CSV files including unicode with Python 2.7" for solutions, or decode those two strings manually with str.decode() calls.
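
If you go the manual route, something along these lines could decode each row as it comes out of the csv reader. This is only a sketch: decode_row is a hypothetical helper, and UTF-8 is an assumption about how your CSV file is encoded, so swap in the real encoding.

```python
def decode_row(row, encoding="utf-8"):
    """Return the row with every bytestring cell decoded to Unicode text."""
    # Cells that are already text are passed through untouched.
    return [cell.decode(encoding) if isinstance(cell, bytes) else cell
            for cell in row]


# Made-up row, shaped like what Python 2's csv.reader yields (bytestrings).
row = [b"Fachbeitr\xc3\xa4ge", b"267"]
decoded = decode_row(row)
```

After this step every cell can be compared safely against the Unicode text that BeautifulSoup returns.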

UnicodeWarning: Unicode equal comparison failed

From the comments discussion:

Check the type of both strings in this kind of scenario. In this specific case, one is of type unicode and the other is of type str. Converting the str to unicode before comparing the two resolves the issue.

Happy Coding :) @uzdisral
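
A possible way to do that check and conversion in one place is sketched below. Both helper names are invented for the example, and the UTF-8 default is an assumption about the byte string's encoding.

```python
def describe(value):
    """Return the type name and repr, to make a unicode/bytes mix visible."""
    return "%s: %r" % (type(value).__name__, value)


def equal_as_text(left, right, encoding="utf-8"):
    """Compare two values as text, decoding any bytestring operand first."""
    if isinstance(left, bytes):
        left = left.decode(encoding)
    if isinstance(right, bytes):
        right = right.decode(encoding)
    return left == right


# Print both operands first to see which one is the bytestring.
print(describe(u"caf\xe9"))
print(describe(b"caf\xc3\xa9"))

result = equal_as_text(u"caf\xe9", b"caf\xc3\xa9")
```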

Python Unicode Warning: Unicode equal comparison failed to convert both arguments to Unicode

I can't say from your example exactly what is wrong, but this error occurs in Python 2.X when comparing a Unicode string to a byte string. Python 2.X attempts to implicitly convert the byte string to Unicode using the default ascii codec. If that fails, due to the byte string containing non-ASCII bytes, that warning occurs:

Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'pingüino' == 'pingüino'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
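
To make the mechanism concrete, here is a rough simulation of what Python 2 does behind the scenes. It is not the interpreter's actual code: py2_style_equal is a hypothetical function that only mimics the behaviour described above.

```python
import warnings


def py2_style_equal(text, raw):
    """Mimic how Python 2 compares a unicode object with a byte string."""
    try:
        # Python 2 implicitly decodes the byte string with the ASCII codec.
        decoded = raw.decode("ascii")
    except UnicodeDecodeError:
        # On failure it warns and treats the two operands as unequal.
        warnings.warn(
            "Unicode equal comparison failed to convert both arguments "
            "to Unicode - interpreting them as being unequal",
            UnicodeWarning,
        )
        return False
    return text == decoded


ascii_case = py2_style_equal(u"Hello", b"Hello")            # pure ASCII bytes
latin_case = py2_style_equal(u"ping\xfcino", b"ping\xfcino")  # byte 0xFC is not ASCII
```

The first call succeeds because every byte is ASCII; the second one warns and returns False even though the bytes are a perfectly good Latin-1 encoding of the same word.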

Python 3.X reduces the confusion by not allowing non-ASCII characters in a byte string literal:

Python 3.3.5 (v3.3.5:62cf4e77f785, Mar  9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 'pingüino' == b'pingüino'
File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.

Instead the programmer must be more explicit, comparing bytes to bytes or Unicode to Unicode, or providing the appropriate conversion:

>>> 'pingüino' == b'ping\xfcino'.decode('latin1')
True
>>> 'pingüino'.encode('latin1') == b'ping\xfcino'
True
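
One thing worth keeping in mind on Python 3: a mixed comparison no longer warns by default, it simply evaluates to False (running the interpreter with the -b flag makes it emit a BytesWarning). A short Python 3 sketch, with variable names chosen only for illustration:

```python
text = "ping\xfcino"             # str holding 'pingüino'
raw = text.encode("latin1")      # bytes: b'ping\xfcino'

# str versus bytes: Python 3 does no implicit decoding, so these never match.
mixed = (text == raw)

# Explicit conversion, as in the examples above, compares like with like.
fixed = (text == raw.decode("latin1"))
```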

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode

urllib.quote() does not handle unicode objects properly in Python 2. To work around this, call .encode() on the URL when you read it (or on the variable you read from the database), for example url = url.encode('utf-8'). With that change:

import urllib
from urlparse import urlsplit

url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
# Encode to a byte string first so urllib only ever sees str, never unicode.
url = url.encode('utf-8')
scheme, host_port, path, query, fragment = urlsplit(url)
# Undo any existing percent-encoding, then re-quote, keeping "%" and "/" as-is.
path = urllib.unquote(path)
path = urllib.quote(path, safe="%/")

and then your output for the path variable will be:

>>> path
'/Dienste/Fachbeitr%C3%A4ge.aspx'

Does this work?
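
For comparison, on Python 3 the same round trip might look like the sketch below. It assumes you have moved to Python 3, where these functions live in urllib.parse and accept text directly, so no manual .encode() step is needed.

```python
from urllib.parse import quote, unquote, urlsplit

url = "http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
parts = urlsplit(url)

# unquote() decodes the percent-escapes as UTF-8 by default.
decoded_path = unquote(parts.path)

# quote() encodes non-ASCII characters as UTF-8 and leaves "%" and "/" alone.
path = quote(decoded_path, safe="%/")
```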

SQLAlchemy showing a UnicodeWarning: Unicode equal comparison failed

I have not tested this yet, but the problem seems to be that you are comparing Client.email, which is a str, with the user input, which you convert to unicode in the clean_str function.

One possible solution is to convert the user input to a Python str rather than unicode. However, if you need the input to be unicode for some reason, this might lead to problems.

Another solution, which I recommend, is to have SQLAlchemy convert the String type to unicode. Note that convert_unicode is a parameter of the String type, not of Column, so modify your model as:

email = db.Column(db.String(250, convert_unicode=True), nullable=False, index=True)

This will basically tell SQLAlchemy to convert the value returned from the database to unicode. You can read more about it here:
http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.String.params.convert_unicode


