Python unicode equal comparison failed
You may use the == operator to compare unicode objects for equality.
>>> s1 = u'Hello'
>>> s2 = unicode("Hello")
>>> type(s1), type(s2)
(<type 'unicode'>, <type 'unicode'>)
>>> s1==s2
True
>>>
>>> s3='Hello'.decode('utf-8')
>>> type(s3)
<type 'unicode'>
>>> s1==s3
True
>>>
But your error message indicates that you aren't comparing two unicode objects. You are probably comparing a unicode object to a str object, like so:
>>> u'Hello' == 'Hello'
True
>>> u'Hello' == '\x81\x01'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
See how I attempted to compare a unicode object against a str whose bytes cannot be decoded. Python 2 tries to decode the str with the default ASCII codec before comparing, and '\x81\x01' is not valid ASCII (it is not valid UTF-8 either).
Your program, I suppose, is comparing unicode objects with str objects, and the contents of one of those str objects cannot be decoded that way. This is likely the result of you (the programmer) not knowing which variable holds unicode, which variable holds UTF-8 bytes, and which variable holds raw bytes read in from a file.
I recommend http://nedbatchelder.com/text/unipain.html, especially the advice to create a "Unicode Sandwich."
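A minimal sketch of the "Unicode Sandwich" idea, in case it helps: decode bytes once when they enter the program, work only with text in the middle, and encode once on the way out. The helper names read_text and write_text, the file names, and the choice of UTF-8 are assumptions made for illustration, not something taken from your code.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Input edge of the sandwich: bytes on disk become text exactly once."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def write_text(path, text, encoding="utf-8"):
    """Output edge of the sandwich: text becomes bytes exactly once."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


# Set up a sample input file containing UTF-8 bytes (hypothetical data).
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "in.txt")
out_path = os.path.join(tmp_dir, "out.txt")
with io.open(in_path, "wb") as handle:
    handle.write(u"ping\u00fcino\n".encode("utf-8"))

# Middle of the sandwich: everything here is text, so == is safe.
text = read_text(in_path)
word = text.strip()
matches = word == u"ping\u00fcino"

# Encode only when the data leaves the program.
write_text(out_path, word.upper())
with io.open(out_path, "rb") as handle:
    raw = handle.read()
```

Because every comparison happens between two text values, the implicit ASCII decode that triggers the warning never runs.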
How can I fix this warning: UnicodeWarning: Unicode equal comparison failed
You are mixing Unicode strings and bytestrings. Python 2 will try to decode the bytestring (as ASCII) when making comparisons, and when that fails you'll get a warning:
>>> u'å', u'å'.encode('utf8')
(u'\xe5', '\xc3\xa5')
>>> 'å' == u'å'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
Don't mix Unicode strings and bytestrings. Decode your text data as early as possible, and only compare Unicode objects. In the above example, the bytestring is UTF-8 encoded, so decoding as UTF-8 first would resolve the warning.
For your example code, BeautifulSoup is (correctly) producing Unicode text. You'll have to decode your CSV data; see Read and Write CSV files including unicode with Python 2.7 for solutions, or decode those two strings manually with str.decode() calls.
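One way to do the manual decoding is a small helper that turns any byte string into text before it is compared. This is only a sketch: to_text is a hypothetical helper name, the sample values stand in for your scraped and CSV data, and it assumes the CSV bytes are UTF-8 encoded.

```python
def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding byte strings with the given codec."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


scraped = u"\u00e5"       # stands in for text produced by BeautifulSoup
from_csv = b"\xc3\xa5"    # the same character as raw UTF-8 bytes from a CSV file

# Both sides are text by the time they are compared.
same = to_text(from_csv) == to_text(scraped)
```

Values that are already text pass through unchanged, so the helper can be applied to both sides without checking first.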
UnicodeWarning: Unicode equal comparison failed
From the comments discussion:
Check the type of both strings in this kind of scenario. In this specific case, one is of type unicode and the other is of type str. Converting the str to unicode before comparing the two strings resolves the issue.
Happy Coding :) @uzdisral
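If it is not obvious which comparison mixes the two types, one option is to promote UnicodeWarning to an exception so the traceback points at the offending line. The sketch below uses the standard warnings module; raises_unicode_warning and emit_demo_warning are hypothetical names, and the demo emits the warning explicitly because Python 3 no longer issues it for str/bytes comparisons.

```python
import warnings


def raises_unicode_warning(func):
    """Call func with UnicodeWarning promoted to an error; report whether it fired."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", UnicodeWarning)
        try:
            func()
        except UnicodeWarning:
            return True
    return False


def emit_demo_warning():
    # Stand-in for a Python 2 comparison such as u'\xe5' == '\xc3\xa5'.
    warnings.warn("Unicode equal comparison failed", UnicodeWarning)


fired = raises_unicode_warning(emit_demo_warning)
quiet = raises_unicode_warning(lambda: u"Hello" == u"Hello")
```

Once the traceback shows the line, printing type() of both operands there tells you which one still needs decoding.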
Python Unicode Warning: Unicode equal comparison failed to convert both arguments to Unicode
I can't say from your example exactly what is wrong, but this warning occurs in Python 2.X when comparing a Unicode string to a byte string. Python 2.X attempts to implicitly convert the byte string to Unicode using the default ascii codec. If that fails because the byte string contains non-ASCII bytes, this warning occurs:
Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'pingüino' == 'pingüino'
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
Python 3.X reduces the confusion by not allowing non-ASCII characters in a byte string literal:
Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 'pingüino' == b'pingüino'
File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
Instead the programmer must be more explicit, comparing bytes to bytes or Unicode to Unicode, or providing the appropriate conversion:
>>> 'pingüino' == b'ping\xfcino'.decode('latin1')
True
>>> 'pingüino'.encode('latin1') == b'ping\xfcino'
True
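The conversion only helps if it names the codec the bytes were actually written with. A short sketch of what happens otherwise (the variable names are made up for illustration):

```python
text = u"ping\u00fcino"

latin1_bytes = text.encode("latin-1")   # one byte for the u-umlaut: \xfc
utf8_bytes = text.encode("utf-8")       # two bytes for it: \xc3\xbc

# Decoding with the codec that produced the bytes round-trips cleanly.
right_codec = utf8_bytes.decode("utf-8") == text

# Decoding UTF-8 bytes as Latin-1 succeeds but yields different text.
wrong_codec = utf8_bytes.decode("latin-1") == text

# Decoding Latin-1 bytes as UTF-8 fails outright, since \xfc is not valid there.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

So a silent False from a comparison can also mean the wrong codec was used, not just that a decode step is missing.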
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode
urllib.quote() does not handle Unicode objects properly. To get around this, you can call the .encode() method on the URL when reading it (or on the variable you read from the database). So run url = url.encode('utf-8'). With this you get:
import urllib
from urlparse import urlsplit
url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
url = url.encode('utf-8')
scheme, host_port, path, query, fragment = urlsplit(url)
path = urllib.unquote(path)
path = urllib.quote(path,safe="%/")
and then your output for the path variable will be:
>>> path
'/Dienste/Fachbeitr%C3%A4ge.aspx'
Does this work?
SQLAlchemy showing a UnicodeWarning: Unicode equal comparison failed
I have not tested this yet, but it seems the problem is that you are comparing Client.email, which is a str, to the user input, which you convert to unicode in the clean_str function.
One possible solution is to convert the user input to a Python str rather than unicode. But if you need the input to be unicode for some reason, this might lead to problems.
Another solution, which I recommend, is to have SQLAlchemy convert the String type to unicode. Modify your model as follows:
email = db.Column(db.String(250), nullable=False, index=True, convert_unicode=True)
This tells SQLAlchemy to convert the value returned from the database to unicode. You can read more about it here:
http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.String.params.convert_unicode