Will Ordereddict Become Redundant in Python 3.7

Will OrderedDict become redundant in Python 3.7?

No it won't become redundant in Python 3.7 because OrderedDict is not just a dict that retains insertion order, it also offers an order dependent method, OrderedDict.move_to_end(), and supports reversed() iteration*.

Moreover, equality comparisons with OrderedDict are order sensitive and this is still not the case for dict in Python 3.7, for example:

>>> OrderedDict([(1,1), (2,2)]) == OrderedDict([(2,2), (1,1)]) 
False
>>> dict([(1,1), (2,2)]) == dict([(2,2), (1,1)])
True

Two relevant questions here and here.

* Support for reversed() iteration of regular Python dict is added for Python 3.8, see issue33462

Does dictionary, json (dumps, loads), defaultdict preserve the order from python 3.7?

Python dicts

Since python 3.7 the implementation of dict is ordered as a part of the specification, not as a side effect of the implementation, so it can be used reliably. OrderedDict is still useful for several reasons:

  • backward compatibility (if you need to support older python versions)
  • reorder keys with move_to_end and pop_item
  • clearer intent: if you rely on you dict keys being ordered, an OrderedDict makes it evident, but you do lose some performance
  • the order of the keys is considered when comparing OrderedDicts, and ignored for dicts (thanks to @ShadowRanger for the notice):

For example:

from collections import OrderedDict

one_dict = {"a": 1, "b": 2}
another_dict = {"b": 2, "a": 1}

one_ordered_dict = OrderedDict(a=1, b=2)
another_ordered_dict = OrderedDict(b=2, a=1)

print(one_dict == another_dict)
# True
print(one_ordered_dict == another_ordered_dict)
# False

If any of those requirements are needed, OrderedDict can help you, else you can just use a regular dict.

Json dump/load

Regarding the json.dump, this is a different question:

As far as I can tell, the JSONEncoder iterates
over the keys, so it should not change the order.

You can force a lexicographic ordering if you want:

>>> json.dumps({'b': 1, 'a': 2}, sort_keys=True)
'{"a": 2, "b": 1}'

Regarding the loading, I don't see why the order would not be preserved, but I admit I'm getting lost in the source code for the decoder and scanner.

If you want, you can use:

>>> json.loads('{"b": 2, "a": 1}', object_pairs_hook=OrderedDict)
OrderedDict([('b', 2), ('a', 1)])

Nested dicts

The above applies for nested dicts as well: _iterencode_dict is called recursively, and the keys are sorted for all dicts.

Indeed:

>>> json.dumps({'b': 1, 'a': {'d': 1, 'c': 3}}, sort_keys=True)
'{"a": {"c": 3, "d": 1}, "b": 1}'

Cheers!

Difference between dictionary and OrderedDict

As of Python 3.7, a new improvement to the dict built-in is:

the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.

This means there is no real need for OrderedDict anymore . They are almost the same.



Some minor details to consider...

Here are some comparisons between Python 3.7+ dict and OrderedDict:

from collections import OrderedDict

d = {'b': 1, 'a': 2}
od = OrderedDict([('b', 1), ('a', 2)])

# they are equal with content and order
assert d == od
assert list(d.items()) == list(od.items())
assert repr(dict(od)) == repr(d)

Obviously, there is a difference between the string representation of the two object, with the dict object in more natural and compact form.

str(d)  # {'b': 1, 'a': 2}
str(od) # OrderedDict([('b', 1), ('a', 2)])

As for different methods between the two, this question can be answered with set theory:

d_set = set(dir(d))
od_set = set(dir(od))
od_set.difference(d_set)
# {'__dict__', '__reversed__', 'move_to_end'} for Python 3.7
# {'__dict__', 'move_to_end'} for Python 3.8+

This means OrderedDict has at most two features that dict does not have built-in, but work-arounds are shown here:

Workaround for __reversed__ / reversed()

No workaround is really needed for Python 3.8+, which fixed this issue. OrderedDict can be "reversed", which simply reverses the keys (not the whole dictionary):

reversed(od)        # <odict_iterator at 0x7fc03f119888>
list(reversed(od)) # ['a', 'b']

# with Python 3.7:
reversed(d) # TypeError: 'dict' object is not reversible
list(reversed(list(d.keys()))) # ['a', 'b']

# with Python 3.8+:
reversed(d) # <dict_reversekeyiterator at 0x16caf9d2a90>
list(reversed(d)) # ['a', 'b']

To properly reverse a whole dictionary using Python 3.7+:

dict(reversed(list(d.items())))  # {'a': 2, 'b': 1}

Workaround for move_to_end

OrderedDict has a move_to_end method, which is simple to implement:

od.move_to_end('b')  # now it is: OrderedDict([('a', 2), ('b', 1)])

d['b'] = d.pop('b') # now it is: {'a': 2, 'b': 1}

Are dictionaries ordered in Python 3.6+?

Are dictionaries ordered in Python 3.6+?

They are insertion ordered[1].

As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).

As of Python 3.7, this is a guaranteed language feature, not merely an implementation detail. From a python-dev message by GvR:

Make it so. "Dict keeps insertion order" is the ruling. Thanks!

This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.



How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?

Essentially, by keeping two arrays.

  • The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).

  • The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t(1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds)

In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size full for performance reasons. (and the empty space still had PyDictKeyEntry size!).

This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t (X depending on dict size) 2/3 * dk_sizes full is kept. The empty space changed from type PyDictKeyEntry to intX_t.

So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.

You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.


In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as [keyhash, key, value]:

entries = [['--', '--', '--'],
[-8522787127447073495, 'barry', 'green'],
['--', '--', '--'],
['--', '--', '--'],
['--', '--', '--'],
[-9092791511155847987, 'timmy', 'red'],
['--', '--', '--'],
[-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices =  [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
[-8522787127447073495, 'barry', 'green'],
[-6480567542315338377, 'guido', 'blue']]

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.




[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the `dict` object *doesn't provide*. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (`==`, `!=`). `dict`s currently don't offer any of those behaviors/methods.




[2]: The new dictionary implementations performs better **memory wise** by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.


Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.

Are there any reasons not to use an OrderedDict?

OrderedDict is a subclass of dict, and needs more memory to keep track of the order in which keys are added. This isn't trivial. The implementation adds a second dict under the covers, and a doubly-linked list of all the keys (that's the part that remembers the order), and a bunch of weakref proxies. It's not a lot slower, but at least doubles the memory over using a plain dict.

But if it's appropriate, use it! That's why it's there :-)

How it works

The base dict is just an ordinary dict mapping keys to values - it's not "ordered" at all. When a <key, value> pair is added, the key is appended to a list. The list is the part that remembers the order.

But if this were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list.

So it's a doubly-linked list instead. That makes deleting a key constant (O(1)) time. But we still need to find the doubly-linked list node belonging to the key. To make that operation O(1) time too, a second - hidden - dict maps keys to nodes in the doubly-linked list.

So adding a new <key, value> pair requires adding the pair to the base dict, creating a new doubly-linked list node to hold the key, appending that new node to the doubly-linked list, and mapping the key to that new node in the hidden dict. A bit over twice as much work, but still O(1) (expected case) time overall.

Similarly, deleting a key that's present is also a bit over twice as much work but O(1) expected time overall: use the hidden dict to find the key's doubly-linked list node, delete that node from the list, and remove the key from both dicts.

Etc. It's quite efficient.

Does Python have an ordered set?

There is an ordered set (possible new link) recipe for this which is referred to from the Python 2 Documentation. This runs on Py2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.

OrderedSet([1, 2, 3])

This is a MutableSet, so the signature for .union doesn't match that of set, but since it includes __or__ something similar can easily be added:

@staticmethod
def union(*sets):
union = OrderedSet()
union.union(*sets)
return union

def union(self, *sets):
for set in sets:
self |= set

Differences between Dictionary orders in Python 3.6 and Python 3.5

TLDR:

To replicate the hash-based ordering, you must take an explicitly ordered dict and impose your own ordering.

from collections import OrderedDict

def cpy35dict(source_dict):
"""Apply hash-ordering to a dictionary"""
return OrderedDict( # ensure stable ordering
sorted( # order items by key hash
source_dict.items(),
key=lambda item: hash(item[0])
)
)

This replicates the ordering used in CPython up to Python 3.5.
Note that the Python language specification makes no guarantees on order prior to Python 3.7 - you should also use this in Python 3.5 and before if you insist on the ordering.


There are basically three types of order for dict-like containers:

  • no order: There is no specific order to items. Every access may return a different order.
  • specific order: There is a well-defined order to items. Every access returns the same order.
  • arbitrary order: The ordering is not defined. It may or may not use a specific order.

As per the Python specification, dict has arbitrary order up to Python 3.6 and insertion order since Python 3.7.

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6. [Mapping Types — dict¶]

However, arbitrary order does not exclude specific order. It basically means "whatever the implementation feels like". Different implementations use different schemes to implement dict.

  • PyPy uses insertion order since Python 2.7/3.2

  • CPython 3.5- uses the ordering of the underlying hash function. Since several types have salted hashes, this means you get different order on each invocation.

  • CPython 3.6 uses the ordering by which items are inserted. This is explicitly an implementation detail.

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon [What's new in Python 3.6]

In other words, code for Python 3.6 and earlier should make no assumptions about ordering of dict. Only code for Python 3.7 and later should make assumptions about ordering of dict.

Python OrderedDict not keeping element order

Your problem is that you are constructing a dict to give the initial data to the OrderedDict - this dict doesn't store any order, so the order is lost before it gets to the OrderedDict.

The solution is to build from an ordered data type - the easiest being a list of tuples:

>>> from collections import OrderedDict
>>> od = OrderedDict([((0, 0), [2]), ((0, 1), [1, 9]), ((0, 2), [1, 5, 9])])
>>> od
OrderedDict([((0, 0), [2]), ((0, 1), [1, 9]), ((0, 2), [1, 5, 9])])

It's worth noting that this is why OrderedDict uses the syntax it does for it's string representation - string representations should try to be valid Python code to reproduce the object where possible, and that's why the output uses a list of tuples instead of a dict.

Edit: As of Python 3.6, kwargs is ordered, so you can use keyword arguments instead, provided you are on an up-to-date Python version.

As of 3.7, this is also true for dicts (it was for CPython in 3.6, but the language spec didn't specify it, so using OrderedDict was still required for compatibility). This means if you can assume a 3.7+ environment, you can often drop OrderedDict altogether, or construct one from a regular dict if you need a specific feature (e.g: order to matter for equality).

Why don't Python sets preserve insertion order?

Sets and dicts are optimized for different use-cases. The primary use of a set is fast membership testing, which is order agnostic. For dicts, cost of the lookup is the most critical operation, and the key is more likely to be present. With sets, the presence or absence of an element is not known in advance, and so the set implementation needs to optimize for both the found and not-found case. Also, some optimizations for common set operations such as union and intersection make it difficult to retain set ordering without degrading performance.

While both data structures are hash based, it's a common misconception that sets are just implemented as dicts with null values. Even before the compact dict implementation in CPython 3.6, the set and dict implementations already differed significantly, with little code reuse. For example, dicts use randomized probing, but sets use a combination of linear probing and open addressing, to improve cache locality. The initial linear probe (default 9 steps in CPython) will check a series of adjacent key/hash pairs, improving performance by reducing the cost of hash collision handling - consecutive memory access is cheaper than scattered probes.

  • dictobject.c - master, v3.5.9
  • setobject.c - master, v3.5.9
  • issue18771 - changeset to reduce the cost of hash collisions for set objects in Python 3.4.

It would be possible in theory to change CPython's set implementation to be similar to the compact dict, but in practice there are drawbacks, and notable core developers were opposed to making such a change.

Sets remain unordered. (Why? The usage patterns are different. Also, different implementation.)

– Guido van Rossum

Sets use a different algorithm that isn't as amendable to retaining insertion order.
Set-to-set operations lose their flexibility and optimizations if order is required. Set mathematics are defined in terms of unordered sets. In short, set ordering isn't in the immediate future.

– Raymond Hettinger

A detailed discussion about whether to compactify sets for 3.7, and why it was decided against, can be found in the python-dev mailing lists.

In summary, the main points are: different usage patterns (insertion ordering dicts such as **kwargs is useful, less so for sets), space savings for compacting sets are less significant (because there are only key + hash arrays to densify, as opposed to key + hash + value arrays), and the aforementioned linear probing optimization which sets currently use is incompatible with a compact implementation.

I will reproduce Raymond's post below which covers the most important points.

On Sep 14, 2016, at 3:50 PM, Eric Snow wrote:

Then, I'll do same to sets.

Unless I've misunderstood, Raymond was opposed to making a similar
change to set.

That's right. Here are a few thoughts on the subject before people
starting running wild.

  • For the compact dict, the space savings was a net win with the additional space consumed by the indices and the overallocation for
    the key/value/hash arrays being more than offset by the improved
    density of key/value/hash arrays. However for sets, the net was much
    less favorable because we still need the indices and overallocation
    but can only offset the space cost by densifying only two of the three
    arrays. In other words, compacting makes more sense when you have
    wasted space for keys, values, and hashes. If you lose one of those
    three, it stops being compelling.

  • The use pattern for sets is different from dicts. The former has more hit or miss lookups. The latter tends to have fewer missing key
    lookups. Also, some of the optimizations for the set-to-set operations
    make it difficult to retain set ordering without impacting
    performance.

  • I pursued alternative path to improve set performance. Instead of compacting (which wasn't much of space win and incurred the cost of an
    additional indirection), I added linear probing to reduce the cost of
    collisions and improve cache performance. This improvement is
    incompatible with the compacting approach I advocated for
    dictionaries.

  • For now, the ordering side-effect on dictionaries is non-guaranteed, so it is premature to start insisting the sets become ordered as well.
    The docs already link to a recipe for creating an OrderedSet (
    https://code.activestate.com/recipes/576694/ ) but it seems like the
    uptake has been nearly zero. Also, now that Eric Snow has given us a
    fast OrderedDict, it is easier than ever to build an OrderedSet from
    MutableSet and OrderedDict, but again I haven't observed any real
    interest because typical set-to-set data analytics don't really need
    or care about ordering. Likewise, the primary use of fast membership
    testings is order agnostic.

  • That said, I do think there is room to add alternative set implementations to PyPI. In particular, there are some interesting
    special cases for orderable data where set-to-set operations can be
    sped-up by comparing entire ranges of keys (see
    https://code.activestate.com/recipes/230113-implementation-of-sets-using-sorted-lists
    for a starting point). IIRC, PyPI already has code for set-like bloom
    filters and cuckoo hashing.

  • I understanding that it is exciting to have a major block of code accepted into the Python core but that shouldn't open to floodgates to
    engaging in more major rewrites of other datatypes unless we're sure
    that it is warranted.

– Raymond Hettinger

From [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered, Sept 2016.



Related Topics



Leave a reply



Submit