Will OrderedDict become redundant in Python 3.7?
No it won't become redundant in Python 3.7 because OrderedDict
is not just a dict
that retains insertion order, it also offers an order dependent method, OrderedDict.move_to_end()
, and supports reversed()
iteration*.
Moreover, equality comparisons with OrderedDict
are order sensitive and this is still not the case for dict
in Python 3.7, for example:
>>> OrderedDict([(1,1), (2,2)]) == OrderedDict([(2,2), (1,1)])
False
>>> dict([(1,1), (2,2)]) == dict([(2,2), (1,1)])
True
Two relevant questions here and here.
* Support for reversed()
iteration of regular Python dict
is added for Python 3.8, see issue33462
Does dictionary, json (dumps, loads), defaultdict preserve the order from python 3.7?
Python dicts
Since python 3.7 the implementation of dict is ordered as a part of the specification, not as a side effect of the implementation, so it can be used reliably. OrderedDict is still useful for several reasons:
- backward compatibility (if you need to support older python versions)
- reorder keys with move_to_end and pop_item
- clearer intent: if you rely on you dict keys being ordered, an OrderedDict makes it evident, but you do lose some performance
- the order of the keys is considered when comparing OrderedDicts, and ignored for dicts (thanks to @ShadowRanger for the notice):
For example:
from collections import OrderedDict
one_dict = {"a": 1, "b": 2}
another_dict = {"b": 2, "a": 1}
one_ordered_dict = OrderedDict(a=1, b=2)
another_ordered_dict = OrderedDict(b=2, a=1)
print(one_dict == another_dict)
# True
print(one_ordered_dict == another_ordered_dict)
# False
If any of those requirements are needed, OrderedDict can help you, else you can just use a regular dict.
Json dump/load
Regarding the json.dump
, this is a different question:
As far as I can tell, the JSONEncoder
iterates
over the keys, so it should not change the order.
You can force a lexicographic ordering if you want:
>>> json.dumps({'b': 1, 'a': 2}, sort_keys=True)
'{"a": 2, "b": 1}'
Regarding the loading, I don't see why the order would not be preserved, but I admit I'm getting lost in the source code for the decoder and scanner.
If you want, you can use:
>>> json.loads('{"b": 2, "a": 1}', object_pairs_hook=OrderedDict)
OrderedDict([('b', 2), ('a', 1)])
Nested dicts
The above applies for nested dicts as well: _iterencode_dict
is called recursively, and the keys are sorted for all dicts.
Indeed:
>>> json.dumps({'b': 1, 'a': {'d': 1, 'c': 3}}, sort_keys=True)
'{"a": {"c": 3, "d": 1}, "b": 1}'
Cheers!
Difference between dictionary and OrderedDict
As of Python 3.7, a new improvement to the dict
built-in is:
the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.
This means there is no real need for OrderedDict
anymore . They are almost the same.
Some minor details to consider...
Here are some comparisons between Python 3.7+ dict
and OrderedDict
:
from collections import OrderedDict
d = {'b': 1, 'a': 2}
od = OrderedDict([('b', 1), ('a', 2)])
# they are equal with content and order
assert d == od
assert list(d.items()) == list(od.items())
assert repr(dict(od)) == repr(d)
Obviously, there is a difference between the string representation of the two object, with the dict
object in more natural and compact form.
str(d) # {'b': 1, 'a': 2}
str(od) # OrderedDict([('b', 1), ('a', 2)])
As for different methods between the two, this question can be answered with set theory:
d_set = set(dir(d))
od_set = set(dir(od))
od_set.difference(d_set)
# {'__dict__', '__reversed__', 'move_to_end'} for Python 3.7
# {'__dict__', 'move_to_end'} for Python 3.8+
This means OrderedDict
has at most two features that dict
does not have built-in, but work-arounds are shown here:
Workaround for __reversed__
/ reversed()
No workaround is really needed for Python 3.8+, which fixed this issue. OrderedDict
can be "reversed", which simply reverses the keys (not the whole dictionary):
reversed(od) # <odict_iterator at 0x7fc03f119888>
list(reversed(od)) # ['a', 'b']
# with Python 3.7:
reversed(d) # TypeError: 'dict' object is not reversible
list(reversed(list(d.keys()))) # ['a', 'b']
# with Python 3.8+:
reversed(d) # <dict_reversekeyiterator at 0x16caf9d2a90>
list(reversed(d)) # ['a', 'b']
To properly reverse a whole dictionary using Python 3.7+:
dict(reversed(list(d.items()))) # {'a': 2, 'b': 1}
Workaround for move_to_end
OrderedDict
has a move_to_end
method, which is simple to implement:
od.move_to_end('b') # now it is: OrderedDict([('a', 2), ('b', 1)])
d['b'] = d.pop('b') # now it is: {'a': 2, 'b': 1}
Are dictionaries ordered in Python 3.6+?
Are dictionaries ordered in Python 3.6+?
They are insertion ordered[1].
As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict
if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).
As of Python 3.7, this is a guaranteed language feature, not merely an implementation detail. From a python-dev message by GvR:
Make it so. "Dict keeps insertion order" is the ruling. Thanks!
This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.
How does the Python
3.6
dictionary implementation perform better[2] than the older one while preserving element order?
Essentially, by keeping two arrays.
The first array,
dk_entries
, holds the entries (of typePyDictKeyEntry
) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).The second,
dk_indices
, holds the indices for thedk_entries
array (that is, values that indicate the position of the corresponding entry indk_entries
). This array acts as the hash table. When a key is hashed it leads to one of the indices stored indk_indices
and the corresponding entry is fetched by indexingdk_entries
. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from typeint8_t
(1
byte) toint32_t
/int64_t
(4
/8
bytes) on32
/64
bit builds)
In the previous implementation, a sparse array of type PyDictKeyEntry
and size dk_size
had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size
full for performance reasons. (and the empty space still had PyDictKeyEntry
size!).
This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t
(X
depending on dict size) 2/3 * dk_size
s full is kept. The empty space changed from type PyDictKeyEntry
to intX_t
.
So, obviously, creating a sparse array of type PyDictKeyEntry
is much more memory demanding than a sparse array for storing int
s.
You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.
In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is currently stored as [keyhash, key, value]:
entries = [['--', '--', '--'],
[-8522787127447073495, 'barry', 'green'],
['--', '--', '--'],
['--', '--', '--'],
['--', '--', '--'],
[-9092791511155847987, 'timmy', 'red'],
['--', '--', '--'],
[-6480567542315338377, 'guido', 'blue']]Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
[-8522787127447073495, 'barry', 'green'],
[-6480567542315338377, 'guido', 'blue']]
As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.
[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the `dict` object *doesn't provide*. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (`==`, `!=`). `dict`s currently don't offer any of those behaviors/methods.
[2]: The new dictionary implementations performs better **memory wise** by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.
Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.
Are there any reasons not to use an OrderedDict?
OrderedDict
is a subclass of dict
, and needs more memory to keep track of the order in which keys are added. This isn't trivial. The implementation adds a second dict
under the covers, and a doubly-linked list of all the keys (that's the part that remembers the order), and a bunch of weakref proxies. It's not a lot slower, but at least doubles the memory over using a plain dict
.
But if it's appropriate, use it! That's why it's there :-)
How it works
The base dict is just an ordinary dict mapping keys to values - it's not "ordered" at all. When a <key, value>
pair is added, the key
is appended to a list. The list is the part that remembers the order.
But if this were a Python list, deleting a key would take O(n)
time twice over: O(n)
time to find the key in the list, and O(n)
time to remove the key from the list.
So it's a doubly-linked list instead. That makes deleting a key constant (O(1)
) time. But we still need to find the doubly-linked list node belonging to the key. To make that operation O(1)
time too, a second - hidden - dict maps keys to nodes in the doubly-linked list.
So adding a new <key, value>
pair requires adding the pair to the base dict, creating a new doubly-linked list node to hold the key, appending that new node to the doubly-linked list, and mapping the key to that new node in the hidden dict. A bit over twice as much work, but still O(1)
(expected case) time overall.
Similarly, deleting a key that's present is also a bit over twice as much work but O(1)
expected time overall: use the hidden dict to find the key's doubly-linked list node, delete that node from the list, and remove the key from both dicts.
Etc. It's quite efficient.
Does Python have an ordered set?
There is an ordered set (possible new link) recipe for this which is referred to from the Python 2 Documentation. This runs on Py2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.
OrderedSet([1, 2, 3])
This is a MutableSet, so the signature for .union
doesn't match that of set, but since it includes __or__
something similar can easily be added:
@staticmethod
def union(*sets):
union = OrderedSet()
union.union(*sets)
return union
def union(self, *sets):
for set in sets:
self |= set
Differences between Dictionary orders in Python 3.6 and Python 3.5
TLDR:
To replicate the hash-based ordering, you must take an explicitly ordered dict
and impose your own ordering.
from collections import OrderedDict
def cpy35dict(source_dict):
"""Apply hash-ordering to a dictionary"""
return OrderedDict( # ensure stable ordering
sorted( # order items by key hash
source_dict.items(),
key=lambda item: hash(item[0])
)
)
This replicates the ordering used in CPython up to Python 3.5.
Note that the Python language specification makes no guarantees on order prior to Python 3.7 - you should also use this in Python 3.5 and before if you insist on the ordering.
There are basically three types of order for dict
-like containers:
- no order: There is no specific order to items. Every access may return a different order.
- specific order: There is a well-defined order to items. Every access returns the same order.
- arbitrary order: The ordering is not defined. It may or may not use a specific order.
As per the Python specification, dict
has arbitrary order up to Python 3.6 and insertion order since Python 3.7.
Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6. [Mapping Types — dict¶]
However, arbitrary order does not exclude specific order. It basically means "whatever the implementation feels like". Different implementations use different schemes to implement dict
.
PyPy uses insertion order since Python 2.7/3.2
CPython 3.5- uses the ordering of the underlying hash function. Since several types have salted hashes, this means you get different order on each invocation.
CPython 3.6 uses the ordering by which items are inserted. This is explicitly an implementation detail.
The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon [What's new in Python 3.6]
In other words, code for Python 3.6 and earlier should make no assumptions about ordering of dict
. Only code for Python 3.7 and later should make assumptions about ordering of dict
.
Python OrderedDict not keeping element order
Your problem is that you are constructing a dict
to give the initial data to the OrderedDict
- this dict
doesn't store any order, so the order is lost before it gets to the OrderedDict
.
The solution is to build from an ordered data type - the easiest being a list
of tuple
s:
>>> from collections import OrderedDict
>>> od = OrderedDict([((0, 0), [2]), ((0, 1), [1, 9]), ((0, 2), [1, 5, 9])])
>>> od
OrderedDict([((0, 0), [2]), ((0, 1), [1, 9]), ((0, 2), [1, 5, 9])])
It's worth noting that this is why OrderedDict
uses the syntax it does for it's string representation - string representations should try to be valid Python code to reproduce the object where possible, and that's why the output uses a list of tuples instead of a dict.
Edit: As of Python 3.6, kwargs
is ordered, so you can use keyword arguments instead, provided you are on an up-to-date Python version.
As of 3.7, this is also true for dict
s (it was for CPython in 3.6, but the language spec didn't specify it, so using OrderedDict
was still required for compatibility). This means if you can assume a 3.7+ environment, you can often drop OrderedDict
altogether, or construct one from a regular dict
if you need a specific feature (e.g: order to matter for equality).
Why don't Python sets preserve insertion order?
Sets and dicts are optimized for different use-cases. The primary use of a set is fast membership testing, which is order agnostic. For dicts, cost of the lookup is the most critical operation, and the key is more likely to be present. With sets, the presence or absence of an element is not known in advance, and so the set implementation needs to optimize for both the found and not-found case. Also, some optimizations for common set operations such as union and intersection make it difficult to retain set ordering without degrading performance.
While both data structures are hash based, it's a common misconception that sets are just implemented as dicts with null values. Even before the compact dict implementation in CPython 3.6, the set and dict implementations already differed significantly, with little code reuse. For example, dicts use randomized probing, but sets use a combination of linear probing and open addressing, to improve cache locality. The initial linear probe (default 9 steps in CPython) will check a series of adjacent key/hash pairs, improving performance by reducing the cost of hash collision handling - consecutive memory access is cheaper than scattered probes.
dictobject.c
- master, v3.5.9setobject.c
- master, v3.5.9- issue18771 - changeset to reduce the cost of hash collisions for set objects in Python 3.4.
It would be possible in theory to change CPython's set implementation to be similar to the compact dict, but in practice there are drawbacks, and notable core developers were opposed to making such a change.
Sets remain unordered. (Why? The usage patterns are different. Also, different implementation.)
– Guido van Rossum
Sets use a different algorithm that isn't as amendable to retaining insertion order.
Set-to-set operations lose their flexibility and optimizations if order is required. Set mathematics are defined in terms of unordered sets. In short, set ordering isn't in the immediate future.
– Raymond Hettinger
A detailed discussion about whether to compactify sets for 3.7, and why it was decided against, can be found in the python-dev mailing lists.
In summary, the main points are: different usage patterns (insertion ordering dicts such as **kwargs is useful, less so for sets), space savings for compacting sets are less significant (because there are only key + hash arrays to densify, as opposed to key + hash + value arrays), and the aforementioned linear probing optimization which sets currently use is incompatible with a compact implementation.
I will reproduce Raymond's post below which covers the most important points.
On Sep 14, 2016, at 3:50 PM, Eric Snow wrote:
Then, I'll do same to sets.
Unless I've misunderstood, Raymond was opposed to making a similar
change to set.That's right. Here are a few thoughts on the subject before people
starting running wild.
For the compact dict, the space savings was a net win with the additional space consumed by the indices and the overallocation for
the key/value/hash arrays being more than offset by the improved
density of key/value/hash arrays. However for sets, the net was much
less favorable because we still need the indices and overallocation
but can only offset the space cost by densifying only two of the three
arrays. In other words, compacting makes more sense when you have
wasted space for keys, values, and hashes. If you lose one of those
three, it stops being compelling.The use pattern for sets is different from dicts. The former has more hit or miss lookups. The latter tends to have fewer missing key
lookups. Also, some of the optimizations for the set-to-set operations
make it difficult to retain set ordering without impacting
performance.I pursued alternative path to improve set performance. Instead of compacting (which wasn't much of space win and incurred the cost of an
additional indirection), I added linear probing to reduce the cost of
collisions and improve cache performance. This improvement is
incompatible with the compacting approach I advocated for
dictionaries.For now, the ordering side-effect on dictionaries is non-guaranteed, so it is premature to start insisting the sets become ordered as well.
The docs already link to a recipe for creating an OrderedSet (
https://code.activestate.com/recipes/576694/ ) but it seems like the
uptake has been nearly zero. Also, now that Eric Snow has given us a
fast OrderedDict, it is easier than ever to build an OrderedSet from
MutableSet and OrderedDict, but again I haven't observed any real
interest because typical set-to-set data analytics don't really need
or care about ordering. Likewise, the primary use of fast membership
testings is order agnostic.That said, I do think there is room to add alternative set implementations to PyPI. In particular, there are some interesting
special cases for orderable data where set-to-set operations can be
sped-up by comparing entire ranges of keys (see
https://code.activestate.com/recipes/230113-implementation-of-sets-using-sorted-lists
for a starting point). IIRC, PyPI already has code for set-like bloom
filters and cuckoo hashing.I understanding that it is exciting to have a major block of code accepted into the Python core but that shouldn't open to floodgates to
engaging in more major rewrites of other datatypes unless we're sure
that it is warranted.
– Raymond Hettinger
From [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered, Sept 2016.
Related Topics
Password Authentication in Python Paramiko Fails, But Same Credentials Work in Ssh/Sftp Client
Python Decorator Handling Docstrings
Efficient Way to Remove Keys with Empty Strings from a Dict
Multithreaded Web Server in Python
How to Use If/Else in a Dictionary Comprehension
Anaconda/Conda - Install a Specific Package Version
Python Pandas - Missing Required Dependencies ['Numpy'] 1
Multiplying Across in a Numpy Array
What Is the Best Approach to Change Primary Keys in an Existing Django App
Passing a Function to Re.Sub in Python
Write() Versus Writelines() and Concatenated Strings
Python Library 'Unittest': Generate Multiple Tests Programmatically
Using Numpy Vectorize on Functions That Return Vectors
How to Merge Multiple Lists into One List in Python
How to Convert Escaped Characters
How to Read Datetime Back from SQLite as a Datetime Instead of String in Python