Is There a Description of How _Cmp_ Works for Dict Objects in Python 2

Is there a description of how __cmp__ works for dict objects in Python 2?

If you are asking how comparing dictionaries works, it is this:

  • To compare dicts A and B, first compare their lengths. If they are unequal, then return cmp(len(A), len(B)).
  • Next, find the key adiff in A that is the smallest key for which adiff not in B or A[adiff] != B[adiff]. (If there is no such key, the dicts are equal.)
  • Also find the smallest key bdiff in B for which bdiff not in A or A[bdiff] != B[bdiff].
  • If adiff != bdiff, then return cmp(adiff, bdiff). Else return cmp(A[adiff], B[bdiff]).

In pseudo-code:

def smallest_diff_key(A, B):
"""return the smallest key adiff in A such that adiff not in B or A[adiff] != B[bdiff]"""
diff_keys = [k for k in A if k not in B or A[k] != B[k]]
return min(diff_keys)

def dict_cmp(A, B):
if len(A) != len(B):
return cmp(len(A), len(B))
try:
adiff = smallest_diff_key(A, B)
except ValueError:
# No difference.
return 0
bdiff = smallest_diff_key(B, A)
if adiff != bdiff:
return cmp(adiff, bdiff)
return cmp(A[adiff], b[bdiff])

This is translated from the 2.6.3 implementation in dictobject.c.

Examining the working of cmp function for dictionaries

Most of the programming language implement the comparison of strings according to dictionary order (the way that words are ordered in a dictionary), i.e. compare the character's value one by one and return the first difference.

If the keys themselves are different, the return values are actually depends on the implementation. You can find more information from here: Is there a description of how __cmp__ works for dict objects in Python 2?. However it is not recommended to rely on this feature in your code.

Use Python 2 Dict Comparison in Python 3

If you want the same behavior as earlier versions of Python 2.x in both 2.7 (which uses an arbitrary sort order instead) and 3.x (which refuses to sort dicts), Ned Batchelder's answer to a question about how sorting dicts works gets you part of the way there, but not all the way.


First, it gives you an old-style cmp function, not a new-style key function. Fortunately, both 2.7 and 3.x have functools.cmp_to_key to solve that. (You could of course instead rewrite the code as a key function, but that may make it harder to see any differences between the posted code and your code…)


More importantly, it not only doesn't do the same thing in 2.7 and 3.x, it doesn't even work in 2.7 and 3.x. To understand why, look at the code:

def smallest_diff_key(A, B):
"""return the smallest key adiff in A such that A[adiff] != B[bdiff]"""
diff_keys = [k for k in A if A.get(k) != B.get(k)]
return min(diff_keys)

def dict_cmp(A, B):
if len(A) != len(B):
return cmp(len(A), len(B))
adiff = smallest_diff_key(A, B)
bdiff = smallest_diff_key(B, A)
if adiff != bdiff:
return cmp(adiff, bdiff)
return cmp(A[adiff], b[bdiff])

Notice that it's calling cmp on the mismatched values.

If the dicts can contain other dicts, that's relying on the fact that cmp(d1, d2) is going to end up calling this function… which is obviously not true in newer Python.

On top of that, in 3.x cmp doesn't even exist anymore.

Also, this relies on the fact that any value can be compared with any other value—you might get back arbitrary results, but you won't get an exception. That was true (except in a few rare cases) in 2.x, but it's not true in 3.x. That may not be a problem for you if you don't want to compare dicts with non-comparable values (e.g., if it's OK for {1: 2} < {1: 'b'} to raise an exception), but otherwise, it is.

And of course if you don't want arbitrary results for dict comparison, do you really want arbitrary results for value comparisons?

The solution to all three problems is simple: you have to replace cmp, instead of calling it. So, something like this:

def mycmp(A, B):
if isinstance(A, dict) and isinstance(B, dict):
return dict_cmp(A, B)
try:
return A < B
except TypeError:
# what goes here depends on how far you want to go for consistency

If you want the exact rules for comparison of objects of different types that 2.7 used, they're documented, so you can implement them. But if you don't need that much detail, you can write something simpler here (or maybe even just not trap the TypeError, if the exception mentioned above is acceptable).

What is the meaning of for Python dictionaries?

Quoting from comparison docs,

Tuples and Lists

Tuples and lists are compared lexicographically using comparison of corresponding elements. This means that to compare equal, each element must compare equal and the two sequences must be of the same type and have the same length.

If not equal, the sequences are ordered the same as their first differing elements. For example, cmp([1,2,x], [1,2,y]) returns the same as cmp(x,y). If the corresponding element does not exist, the shorter sequence is ordered first (for example, [1,2] < [1,2,3]).

Dictionaries

Mappings (dictionaries) compare equal if and only if their sorted (key, value) lists compare equal. (The implementation computes this efficiently, without constructing lists or sorting.) Outcomes other than equality are resolved consistently, but are not otherwise defined. (Earlier versions of Python [prior to 2.7.6] used lexicographic comparison of the sorted (key, value) lists, but this was very expensive for the common case of comparing for equality. An even earlier version of Python compared dictionaries by identity only, but this caused surprises because people expected to be able to test a dictionary for emptiness by comparing it to {}.)

Also, find this part of the documentation, which specifically comparing sequence types with themselves and other types,

Sequence objects may be compared to other objects with the same sequence type. The comparison uses lexicographical ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one. Lexicographical ordering for strings uses the ASCII ordering for individual characters.

Note that comparing objects of different types is legal. The outcome is deterministic but arbitrary: the types are ordered by their name. Thus, a list is always smaller than a string, a string is always smaller than a tuple, etc. (The rules for comparing objects of different types should not be relied upon; they may change in a future version of the language.) Mixed numeric types are compared according to their numeric value, so 0 equals 0.0, etc.

Actual dictionary comparison, as per Python 2.7 source code, goes like this

  1. Compare the length of keys first. (-1 is returned if first has lesser keys, 1 if second has lesser keys)

  2. If they are the same, then it tries to find a key for which either the key is missing in the other or different (this is called as characterizing the dict)

  3. It does the step 2, either ways, both a, b and b, a. If either of them is empty, then both the dictionaries are assumed to be equal.

  4. Now, the differences we got from characterizing the dictionaries will be compared to get the actual comparison result.

Why can't I use the method __cmp__ in Python 3 as for Python 2?

You need to provide the rich comparison methods for ordering in Python 3, which are __lt__, __gt__, __le__, __ge__, __eq__, and __ne__. See also: PEP 207 -- Rich Comparisons.

__cmp__ is no longer used.


More specifically, __lt__ takes self and other as arguments, and needs to return whether self is less than other. For example:

class Point(object):
...
def __lt__(self, other):
return ((self.x < other.x) and (self.y < other.y))

(This isn't a sensible comparison implementation, but it's hard to tell what you were going for.)

So if you have the following situation:

p1 = Point(1, 2)
p2 = Point(3, 4)

p1 < p2

This will be equivalent to:

p1.__lt__(p2)

which would return True.

__eq__ would return True if the points are equal and False otherwise. The other methods work analogously.


If you use the functools.total_ordering decorator, you only need to implement e.g. the __lt__ and __eq__ methods:

from functools import total_ordering

@total_ordering
class Point(object):
def __lt__(self, other):
...

def __eq__(self, other):
...

Comparing two dictionaries and checking how many (key, value) pairs are equal

If you want to know how many values match in both the dictionaries, you should have said that :)

Maybe something like this:

shared_items = {k: x[k] for k in x if k in y and x[k] == y[k]}
print(len(shared_items))

how to compare two objects in class using __cmp__ method in Python?

You have to compare self.value to other.value:

def __cmp__(self,other):
if self.value < other.value:
return -1
elif self.value > other.value:
return 1
else:return 0

otherwise you're comparing an integer to an object

compare two dictionaries

Your syntax is correct. In your case the choice is taken as dictionary 1 and week is taken as dictionary 2. cmp function is not defined in python 3 so you are getting an error . If you use same code in python 2 then your code will run without any error.

Here is the code in python 2.7:

>>> choice = {'fav': ['biryani', 'chow mein', 'tikka']}
>>> week = {'cook': ['rice', 'pulses', 'pualo', 'biryani']}
>>> cmp(choice, week)
1

How to implement __cmp__() and __hash__() for my class?

I'd not use __cmp__ and stick to using __eq__ instead. For hashing that is enough and you don't want to extend to being sortable here. Moreover, __cmp__ has been removed from Python 3 in favour of the rich comparison methods (__eq__, __lt__, __gt__, etc.).

Next, your __eq__ should return True when the member tuples are equal:

def __eq__(self, other):
if not isinstance(other, ThisClass):
return NotImplemented
return self.member_tuple == other.member_tuple

Returning the NotImplemented singleton when the type of the other object is not the same is good practice because that'll delegate the equality test to the other object; if it doesn't implement __eq__ or also returns NotImplemented Python will fall back to the standard id() test.

Your __hash__ implementation is spot-on.

Because a hash is not meant to be unique (it is just a means to pick a slot in the hash table), equality is then used to determine if a matching key is already present or if a hash collision has taken place. As such __eq__ (or __cmp__ if __eq__ is missing) is not called if the slot to which the object is being hashed is empty.

This does mean that if two objects are considered equal (a.__eq__(b) returns True), then their hash values must be equal too. Otherwise you could end up with a corrupted dictionary, as Python will no longer be able to determine if a key is already present in the hash table.

If both your __eq__ and __hash__ methods are delegating their duties to the self.member_tuple attribute, you are maintaining that property; you can trust the basic tuple type to have implemented this correctly.

See the glossary definition of hashable and the object.__hash__() documentation. If you are curious, I've written about how the dict and set types work internally:

  • Why is the order in dictionaries and sets arbitrary?
  • Overriding Python's Hashing Function in Dictionary


Related Topics



Leave a reply



Submit