Are Dictionaries Ordered in Python 3.6+

Are dictionaries ordered in Python 3.6+?

Are dictionaries ordered in Python 3.6+?

They are insertion ordered[1].

As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).

As of Python 3.7, this is a guaranteed language feature, not merely an implementation detail. From a python-dev message by GvR:

Make it so. "Dict keeps insertion order" is the ruling. Thanks!

This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.



How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?

Essentially, by keeping two arrays.

  • The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).

  • The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t(1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds)

In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size full for performance reasons. (and the empty space still had PyDictKeyEntry size!).

This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t (X depending on dict size) 2/3 * dk_sizes full is kept. The empty space changed from type PyDictKeyEntry to intX_t.

So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.

You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.


In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as [keyhash, key, value]:

entries = [['--', '--', '--'],
[-8522787127447073495, 'barry', 'green'],
['--', '--', '--'],
['--', '--', '--'],
['--', '--', '--'],
[-9092791511155847987, 'timmy', 'red'],
['--', '--', '--'],
[-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices =  [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
[-8522787127447073495, 'barry', 'green'],
[-6480567542315338377, 'guido', 'blue']]

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.




[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the `dict` object *doesn't provide*. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (`==`, `!=`). `dict`s currently don't offer any of those behaviors/methods.




[2]: The new dictionary implementations performs better **memory wise** by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.


Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.

Differences between Dictionary orders in Python 3.6 and Python 3.5

TLDR:

To replicate the hash-based ordering, you must take an explicitly ordered dict and impose your own ordering.

from collections import OrderedDict

def cpy35dict(source_dict):
"""Apply hash-ordering to a dictionary"""
return OrderedDict( # ensure stable ordering
sorted( # order items by key hash
source_dict.items(),
key=lambda item: hash(item[0])
)
)

This replicates the ordering used in CPython up to Python 3.5.
Note that the Python language specification makes no guarantees on order prior to Python 3.7 - you should also use this in Python 3.5 and before if you insist on the ordering.


There are basically three types of order for dict-like containers:

  • no order: There is no specific order to items. Every access may return a different order.
  • specific order: There is a well-defined order to items. Every access returns the same order.
  • arbitrary order: The ordering is not defined. It may or may not use a specific order.

As per the Python specification, dict has arbitrary order up to Python 3.6 and insertion order since Python 3.7.

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6. [Mapping Types — dict¶]

However, arbitrary order does not exclude specific order. It basically means "whatever the implementation feels like". Different implementations use different schemes to implement dict.

  • PyPy uses insertion order since Python 2.7/3.2

  • CPython 3.5- uses the ordering of the underlying hash function. Since several types have salted hashes, this means you get different order on each invocation.

  • CPython 3.6 uses the ordering by which items are inserted. This is explicitly an implementation detail.

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon [What's new in Python 3.6]

In other words, code for Python 3.6 and earlier should make no assumptions about ordering of dict. Only code for Python 3.7 and later should make assumptions about ordering of dict.

Safe to assume (and teach) that a Python dict will stay ordered?

Formerly, in Python 3.6, this was an unofficial feature of the implementation of dict. As of 3.7, it is now an official (specced) feature of dict.

That officialness allows you to teach it this way about Python dict (but not about "dictionaries" (maps, hashes, etc) in general).

Practically speaking, the feature cannot be removed without significant backwards compatibility issues, and so won't be.

The remaining reasons to use OrderedDict involve a few additional capabilities like move_to_end(), and a few others, and dealing with the recentness of this spec: For example, if you want to run code in earlier versions of Python, or communicate orderedness to people or software libraries that are not
aware of this new feature.

Python 3.6+: Equality of two dictionaries with same keys but in different order

Dictionaries are hash tables, order is not supposed to matter. In python 3.6+ dictionaries are in insertion order but that is just how they are implemented. Order doesn't matter for equality. If you want order to matter in equality use an OrderedDict.

from collections import OrderedDict

d1 = OrderedDict({'foo':123, 'bar':789})
d2 = OrderedDict({'bar':789, 'foo':123})

print(d1 == d2) # False

Dict order in python 3.6 vs older

Dictionaries in Python 3.6 are ordered, but that feature is considered an implementation detail that you shouldn't rely upon (except in a few specific cases like **kwargs). If you do require a specific order, you should use collections.OrderedDict instead. You can construct one using a list of key, value tuples that are in the desired order:

from collections import OrderedDict

finaldict = OrderedDict([('Visitor Team', visitor_team),
('Visitor Rating', visitor_rating),
('Home Team', home_team),
('Home Rating', home_rating),
('Expected Winner', expected_winner),
('Margin', expected_winner_diff),
])

An OrderedDict works just like a normal dict in most respects, other than having a different repr and a few additional methods. You can read more about it in the docs.

In Python 3.6+ you'd also be able to use keyword arguments to the constructor if your key strings were valid identifiers (e.g. OrderedDict(Margin=expected_winner_diff)). Unlike the ordering of normal dicts, the order of keywords is guaranteed to be preserved (not an implementation detail). That's not backwards compatible though (and can't work for your non-identifier keys anyway).

But it's probably worth considering that if you need a very specific order for your data, a dictionary may not be the best type to use to store it in. I see the tabulate function you're using comes from a library, and according to the docs, it accepts many different formats of data. I'd probably just pass it a list of column data, and give it the headers separately:

data = [visitor_team, visitor_rating, home_team,
home_rating, expected_winner, expected_winner_diff]

headers = ["Visitor Team", "Visitor Rating", "Home Team",
"Home Rating", "Expected Winner", "Margin"]

print(tabulate(data, headers=headers, floatfmt=".2f", tablefmt="fancy_grid"))

(Note, I've not actually tested that code, since I don't have the tabulate library on my system. But it should at least be close to working.)

Will OrderedDict become redundant in Python 3.7?

No it won't become redundant in Python 3.7 because OrderedDict is not just a dict that retains insertion order, it also offers an order dependent method, OrderedDict.move_to_end(), and supports reversed() iteration*.

Moreover, equality comparisons with OrderedDict are order sensitive and this is still not the case for dict in Python 3.7, for example:

>>> OrderedDict([(1,1), (2,2)]) == OrderedDict([(2,2), (1,1)]) 
False
>>> dict([(1,1), (2,2)]) == dict([(2,2), (1,1)])
True

Two relevant questions here and here.

* Support for reversed() iteration of regular Python dict is added for Python 3.8, see issue33462



Related Topics



Leave a reply



Submit