Are dictionaries ordered in Python 3.6+?
Are dictionaries ordered in Python 3.6+?
They are insertion ordered[1].
As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict
if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).
As of Python 3.7, this is a guaranteed language feature, not merely an implementation detail. From a python-dev message by GvR:
Make it so. "Dict keeps insertion order" is the ruling. Thanks!
This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.
How does the Python
3.6
dictionary implementation perform better[2] than the older one while preserving element order?
Essentially, by keeping two arrays.
The first array,
dk_entries
, holds the entries (of typePyDictKeyEntry
) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).The second,
dk_indices
, holds the indices for thedk_entries
array (that is, values that indicate the position of the corresponding entry indk_entries
). This array acts as the hash table. When a key is hashed it leads to one of the indices stored indk_indices
and the corresponding entry is fetched by indexingdk_entries
. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from typeint8_t
(1
byte) toint32_t
/int64_t
(4
/8
bytes) on32
/64
bit builds)
In the previous implementation, a sparse array of type PyDictKeyEntry
and size dk_size
had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size
full for performance reasons. (and the empty space still had PyDictKeyEntry
size!).
This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t
(X
depending on dict size) 2/3 * dk_size
s full is kept. The empty space changed from type PyDictKeyEntry
to intX_t
.
So, obviously, creating a sparse array of type PyDictKeyEntry
is much more memory demanding than a sparse array for storing int
s.
You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.
In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is currently stored as [keyhash, key, value]:
entries = [['--', '--', '--'],
[-8522787127447073495, 'barry', 'green'],
['--', '--', '--'],
['--', '--', '--'],
['--', '--', '--'],
[-9092791511155847987, 'timmy', 'red'],
['--', '--', '--'],
[-6480567542315338377, 'guido', 'blue']]Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
[-8522787127447073495, 'barry', 'green'],
[-6480567542315338377, 'guido', 'blue']]
As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.
[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the `dict` object *doesn't provide*. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (`==`, `!=`). `dict`s currently don't offer any of those behaviors/methods.
[2]: The new dictionary implementations performs better **memory wise** by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.
Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.
Differences between Dictionary orders in Python 3.6 and Python 3.5
TLDR:
To replicate the hash-based ordering, you must take an explicitly ordered dict
and impose your own ordering.
from collections import OrderedDict
def cpy35dict(source_dict):
"""Apply hash-ordering to a dictionary"""
return OrderedDict( # ensure stable ordering
sorted( # order items by key hash
source_dict.items(),
key=lambda item: hash(item[0])
)
)
This replicates the ordering used in CPython up to Python 3.5.
Note that the Python language specification makes no guarantees on order prior to Python 3.7 - you should also use this in Python 3.5 and before if you insist on the ordering.
There are basically three types of order for dict
-like containers:
- no order: There is no specific order to items. Every access may return a different order.
- specific order: There is a well-defined order to items. Every access returns the same order.
- arbitrary order: The ordering is not defined. It may or may not use a specific order.
As per the Python specification, dict
has arbitrary order up to Python 3.6 and insertion order since Python 3.7.
Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6. [Mapping Types — dict¶]
However, arbitrary order does not exclude specific order. It basically means "whatever the implementation feels like". Different implementations use different schemes to implement dict
.
PyPy uses insertion order since Python 2.7/3.2
CPython 3.5- uses the ordering of the underlying hash function. Since several types have salted hashes, this means you get different order on each invocation.
CPython 3.6 uses the ordering by which items are inserted. This is explicitly an implementation detail.
The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon [What's new in Python 3.6]
In other words, code for Python 3.6 and earlier should make no assumptions about ordering of dict
. Only code for Python 3.7 and later should make assumptions about ordering of dict
.
Safe to assume (and teach) that a Python dict will stay ordered?
Formerly, in Python 3.6, this was an unofficial feature of the implementation of dict
. As of 3.7, it is now an official (specced) feature of dict
.
That officialness allows you to teach it this way about Python dict
(but not about "dictionaries" (maps, hashes, etc) in general).
Practically speaking, the feature cannot be removed without significant backwards compatibility issues, and so won't be.
The remaining reasons to use OrderedDict
involve a few additional capabilities like move_to_end()
, and a few others, and dealing with the recentness of this spec: For example, if you want to run code in earlier versions of Python, or communicate orderedness to people or software libraries that are not
aware of this new feature.
Python 3.6+: Equality of two dictionaries with same keys but in different order
Dictionaries are hash tables, order is not supposed to matter. In python 3.6+ dictionaries are in insertion order but that is just how they are implemented. Order doesn't matter for equality. If you want order to matter in equality use an OrderedDict.
from collections import OrderedDict
d1 = OrderedDict({'foo':123, 'bar':789})
d2 = OrderedDict({'bar':789, 'foo':123})
print(d1 == d2) # False
Dict order in python 3.6 vs older
Dictionaries in Python 3.6 are ordered, but that feature is considered an implementation detail that you shouldn't rely upon (except in a few specific cases like **kwargs
). If you do require a specific order, you should use collections.OrderedDict
instead. You can construct one using a list of key, value
tuples that are in the desired order:
from collections import OrderedDict
finaldict = OrderedDict([('Visitor Team', visitor_team),
('Visitor Rating', visitor_rating),
('Home Team', home_team),
('Home Rating', home_rating),
('Expected Winner', expected_winner),
('Margin', expected_winner_diff),
])
An OrderedDict
works just like a normal dict
in most respects, other than having a different repr
and a few additional methods. You can read more about it in the docs.
In Python 3.6+ you'd also be able to use keyword arguments to the constructor if your key strings were valid identifiers (e.g. OrderedDict(Margin=expected_winner_diff)
). Unlike the ordering of normal dict
s, the order of keywords is guaranteed to be preserved (not an implementation detail). That's not backwards compatible though (and can't work for your non-identifier keys anyway).
But it's probably worth considering that if you need a very specific order for your data, a dictionary may not be the best type to use to store it in. I see the tabulate
function you're using comes from a library, and according to the docs, it accepts many different formats of data. I'd probably just pass it a list of column data, and give it the headers separately:
data = [visitor_team, visitor_rating, home_team,
home_rating, expected_winner, expected_winner_diff]
headers = ["Visitor Team", "Visitor Rating", "Home Team",
"Home Rating", "Expected Winner", "Margin"]
print(tabulate(data, headers=headers, floatfmt=".2f", tablefmt="fancy_grid"))
(Note, I've not actually tested that code, since I don't have the tabulate
library on my system. But it should at least be close to working.)
Will OrderedDict become redundant in Python 3.7?
No it won't become redundant in Python 3.7 because OrderedDict
is not just a dict
that retains insertion order, it also offers an order dependent method, OrderedDict.move_to_end()
, and supports reversed()
iteration*.
Moreover, equality comparisons with OrderedDict
are order sensitive and this is still not the case for dict
in Python 3.7, for example:
>>> OrderedDict([(1,1), (2,2)]) == OrderedDict([(2,2), (1,1)])
False
>>> dict([(1,1), (2,2)]) == dict([(2,2), (1,1)])
True
Two relevant questions here and here.
* Support for reversed()
iteration of regular Python dict
is added for Python 3.8, see issue33462
Related Topics
Using Pip3: Module "Importlib._Bootstrap" Has No Attribute "Sourcefileloader"
Tkinter.Photoimage Doesn't Not Support Png Image
Pythonpath Not Working For Sudo on Gnu/Linux (Works For Root)
How to Remove Items from a List While Iterating
Relative Imports For the Billionth Time
Syntax Error on Print With Python 3
Convert a String Representation of a Dictionary to a Dictionary
Best Way to Structure a Tkinter Application
Why Do Backslashes Appear Twice
How to Change the Environment of a Parent Process in Python
Call Python Script from Bash With Argument
Is There a Python Equivalent to Java'S Awt Robot Class
How to Unnest (Explode) a Column in a Pandas Dataframe, into Multiple Rows