Why is python ordering my dictionary like so?
For older versions of Python, the real question should be “why not?” An unordered dictionary is usually implemented as a hash table where the order of elements is well-defined but not immediately obvious (the Python documentation used to state this). Your observations match the rules of a hash table perfectly: an apparently arbitrary, but constant, order.
Python has since changed its dict implementation to preserve the order of insertion, and this is guaranteed as of Python 3.7. The implementation therefore no longer constitutes a pure hash table (but a hash table is still used in its implementation).
Why is the order in dictionaries and sets arbitrary?
Note: This answer was written before the implementation of the dict type changed, in Python 3.6. Most of the implementation details in this answer still apply, but the listing order of keys in dictionaries is no longer determined by hash values. The set implementation remains unchanged.
The order is not arbitrary, but depends on the insertion and deletion history of the dictionary or set, as well as on the specific Python implementation. For the remainder of this answer, for 'dictionary', you can also read 'set'; sets are implemented as dictionaries with just keys and no values.
Keys are hashed, and hash values are assigned to slots in a dynamic table (it can grow or shrink as needed). That mapping process can lead to collisions, meaning that a key has to be placed in another slot, chosen based on what is already there.
Listing the contents loops over the slots, and so keys are listed in the order they currently reside in the table.
Take the keys 'foo' and 'bar', for example, and let's assume the table size is 8 slots. In Python 2.7, hash('foo') is -4177197833195190597 and hash('bar') is 327024216814240868. Modulo 8, that means these two keys go in slots 3 and 4:
>>> hash('foo')
-4177197833195190597
>>> hash('foo') % 8
3
>>> hash('bar')
327024216814240868
>>> hash('bar') % 8
4
This informs their listing order:
>>> {'bar': None, 'foo': None}
{'foo': None, 'bar': None}
All slots except 3 and 4 are empty. Looping over the table first lists slot 3, then slot 4, so 'foo' is listed before 'bar'.
'bar' and 'baz', however, have hash values that are exactly 8 apart and thus map to the exact same slot, 4:
>>> hash('bar')
327024216814240868
>>> hash('baz')
327024216814240876
>>> hash('bar') % 8
4
>>> hash('baz') % 8
4
Their order now depends on which key was slotted first; the second key will have to be moved to a next slot:
>>> {'baz': None, 'bar': None}
{'bar': None, 'baz': None}
>>> {'bar': None, 'baz': None}
{'baz': None, 'bar': None}
The table order differs here, because one or the other key was slotted first.
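The slot mechanics described above can be sketched with a toy table. This is a minimal illustration, not CPython's code: the ToyTable class and listing_order helper are hypothetical names, the table never resizes, and it uses plain linear probing, whereas CPython uses a different probe sequence (so the real order of colliding keys can come out the other way round, as in the session above). Small integers are used as keys because they hash to themselves, which keeps the slots predictable.

```python
class ToyTable:
    """Toy open-addressing hash table with linear probing (not CPython's probe sequence)."""

    def __init__(self, size=8):
        self.slots = [None] * size  # None marks an empty slot; no resizing, keep it sparse

    def add(self, key):
        i = hash(key) % len(self.slots)      # preferred slot
        while self.slots[i] is not None:     # collision: try the next slot
            if self.slots[i] == key:
                return                       # key already stored
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

    def keys(self):
        # Listing walks the slots in table order, skipping empty ones.
        return [k for k in self.slots if k is not None]


def listing_order(*keys):
    """Insert the keys in the given order and return the resulting listing order."""
    table = ToyTable()
    for key in keys:
        table.add(key)
    return table.keys()


print(listing_order(4, 3))    # [3, 4]: no collision, slot order wins
print(listing_order(4, 12))   # [4, 12]: both prefer slot 4, 4 got there first
print(listing_order(12, 4))   # [12, 4]: same keys, other insertion order
```

The last two calls store the same keys, yet list them differently, which is the history dependence the answer describes.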
The technical name for the underlying structure used by CPython (the most commonly used Python implementation) is a hash table, one that uses open addressing. If you are curious, and understand C well enough, take a look at the C implementation for all the (well documented) details. You could also watch this PyCon 2010 presentation by Brandon Rhodes about how the CPython dict works, or pick up a copy of Beautiful Code, which includes a chapter on the implementation written by Andrew Kuchling.
Note that as of Python 3.3, a random hash seed is used as well, making hash collisions unpredictable to prevent certain types of denial of service (where an attacker renders a Python server unresponsive by causing mass hash collisions). This means that the order of a given dictionary or set is then also dependent on the random hash seed for the current Python invocation.
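One way to observe the seed's effect is to start fresh interpreters with the PYTHONHASHSEED environment variable set explicitly. The sketch below assumes Python 3.7 or newer (for capture_output); hash_with_seed is a hypothetical helper name, not part of any library.

```python
import os
import subprocess
import sys


def hash_with_seed(text, seed):
    """Return hash(text) as computed by a fresh interpreter run with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Same seed: the string hash is reproducible across interpreter runs.
print(hash_with_seed("foo", 1) == hash_with_seed("foo", 1))

# Different seeds: the hash, and with it the order of a set, will usually differ.
print(hash_with_seed("foo", 1), hash_with_seed("foo", 2))
```

Without a fixed seed, each invocation picks its own, so set order for string keys should not be expected to repeat between runs.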
Other implementations are free to use a different structure for dictionaries, as long as they satisfy the documented Python interface for them, but I believe that all implementations so far use a variation of the hash table.
CPython 3.6 introduces a new dict implementation that maintains insertion order, and is faster and more memory efficient to boot. Rather than keep a large sparse table where each row references the stored hash value, and the key and value objects, the new implementation adds a smaller hash array that only references indices in a separate 'dense' table (one that only contains as many rows as there are actual key-value pairs), and it is the dense table that happens to list the contained items in order. See the proposal to Python-Dev for more details. Note that in Python 3.6 this is considered an implementation detail; Python-the-language does not specify that other implementations have to retain order. This changed in Python 3.7, where this detail was elevated to be a language specification; for any implementation to be properly compatible with Python 3.7 or newer it must copy this order-preserving behaviour. And to be explicit: this change doesn't apply to sets, as sets already have a 'small' hash structure.
Python 2.7 and newer also provide an OrderedDict class, a subclass of dict that adds an additional data structure to record key order. At the price of some speed and extra memory, this class remembers in what order you inserted keys; listing keys, values or items will then do so in that order. It uses a doubly-linked list stored in an additional dictionary to keep the order up-to-date efficiently. See the post by Raymond Hettinger outlining the idea. OrderedDict objects have other advantages, such as being re-orderable.
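A short sketch of what "re-orderable" means in practice, assuming Python 3.7 or newer (move_to_end is not available on Python 2.7):

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

od.move_to_end("a")              # re-order: push 'a' to the back
print(list(od))                  # ['b', 'c', 'a']

od.move_to_end("c", last=False)  # ...or pull 'c' to the front
print(list(od))                  # ['c', 'b', 'a']

first = od.popitem(last=False)   # pop from the front instead of the back
print(first)                     # ('c', 3)

# Equality between two OrderedDicts is order-sensitive; between plain dicts it is not.
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False
print(dict(a=1, b=2) == dict(b=2, a=1))                # True
```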
If you want an ordered set, you can install the oset package; it works on Python 2.5 and up.
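On Python 3.7 and newer, where dict order is guaranteed, the keys of a plain dict can stand in for a simple ordered set when all you need is de-duplication in first-seen order. This is a sketch of that alternative, not the oset package's API; ordered_unique is a hypothetical helper name.

```python
def ordered_unique(items):
    """Return the distinct items in first-seen order, using dict keys as an ordered set."""
    return list(dict.fromkeys(items))


print(ordered_unique(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
```

It does not provide set operations such as union or intersection; a dedicated ordered-set type is still the better fit for those.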
strange dictionary iteration order
Before Python 3.7, a Python dict was not guaranteed to be ordered. For performance reasons, it was more efficient for the implementation to forget the order in which you added items.
As the documentation for those versions stated:
Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.
If you need an ordered dictionary, you can use OrderedDict.
Are dictionaries ordered in Python 3.6+?
They are insertion ordered[1].
As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).
As of Python 3.7, this is a guaranteed language feature, not merely an implementation detail. From a python-dev message by GvR:
Make it so. "Dict keeps insertion order" is the ruling. Thanks!
This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.
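A quick way to see what "insertion ordered" covers, assuming Python 3.7 or newer:

```python
d = {}
d["timmy"] = "red"
d["barry"] = "green"
d["guido"] = "blue"
print(list(d))            # ['timmy', 'barry', 'guido']

d["timmy"] = "pink"       # updating a value keeps the key's position
print(list(d))            # ['timmy', 'barry', 'guido']

del d["timmy"]
d["timmy"] = "red"        # deleting and re-inserting moves the key to the end
print(list(d))            # ['barry', 'guido', 'timmy']
```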
How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?
Essentially, by keeping two arrays.
The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append-only array where new items are always inserted at the end (insertion order).
The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t (1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds).
In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size full for performance reasons (and the empty space still had PyDictKeyEntry size!).
This is not the case now since only the required entries are stored (those that have been inserted), and a sparse array of type intX_t (X depending on dict size), at most 2/3 * dk_size full, is kept. The empty space changed from type PyDictKeyEntry to intX_t.
So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.
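The saving can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic under stated assumptions, not a measurement: it assumes a 64-bit build where one entry holds a hash, a key pointer and a value pointer at 8 bytes each, the index-width thresholds are chosen for illustration, and it counts only the entries actually in use. All function names are hypothetical.

```python
ENTRY_BYTES = 24  # assumption: hash + key pointer + value pointer, 8 bytes each


def index_bytes(table_size):
    """Width of one index slot; the thresholds are an assumption for illustration."""
    if table_size <= 0xFF:
        return 1
    if table_size <= 0xFFFF:
        return 2
    if table_size <= 0xFFFFFFFF:
        return 4
    return 8


def old_layout_bytes(table_size):
    # One full-size entry per slot, whether the slot is used or not.
    return table_size * ENTRY_BYTES


def new_layout_bytes(table_size, n_items):
    # Sparse array of small indices plus one dense entry per stored item.
    return table_size * index_bytes(table_size) + n_items * ENTRY_BYTES


print(old_layout_bytes(8))      # 192
print(new_layout_bytes(8, 3))   # 80
```

For a table of 8 slots holding 3 items, the empty slots cost 1 byte each instead of 24, which is where the difference comes from.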
If you are interested, you can read the full conversation on Python-Dev regarding this feature; it is a good read.
In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is currently stored as [keyhash, key, value]:
entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
As the visualization from the original proposal shows, the old layout leaves a lot of space essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness to where it's really required: the indices.
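The two-array layout can be sketched in pure Python. This is a toy model under loud assumptions, not CPython's implementation: CompactDict is a hypothetical name, the table has a fixed size, there is no deletion or resizing, and collisions are resolved with simple linear probing rather than CPython's probe sequence.

```python
class CompactDict:
    """Toy version of the indices + entries layout (fixed size, no deletion, no resizing)."""

    def __init__(self, size=8):
        self.indices = [None] * size   # sparse: slot -> position in self.entries
        self.entries = []              # dense: [hash, key, value] in insertion order

    def _find_slot(self, key, key_hash):
        # Linear probing for simplicity; stops at the key's slot or the first empty one.
        i = key_hash % len(self.indices)
        while self.indices[i] is not None:
            if self.entries[self.indices[i]][1] == key:
                return i
            i = (i + 1) % len(self.indices)
        return i

    def __setitem__(self, key, value):
        key_hash = hash(key)
        slot = self._find_slot(key, key_hash)
        if self.indices[slot] is None:
            self.indices[slot] = len(self.entries)       # new key: append to the dense table
            self.entries.append([key_hash, key, value])
        else:
            self.entries[self.indices[slot]][2] = value  # existing key: overwrite in place

    def __getitem__(self, key):
        slot = self._find_slot(key, hash(key))
        if self.indices[slot] is None:
            raise KeyError(key)
        return self.entries[self.indices[slot]][2]

    def keys(self):
        # Iteration reads the dense table, so it follows insertion order.
        return [entry[1] for entry in self.entries]


d = CompactDict()
d["timmy"] = "red"
d["barry"] = "green"
d["guido"] = "blue"
print(d.keys())     # ['timmy', 'barry', 'guido']
print(d["barry"])   # green
```

Which slots of indices get filled depends on the string hashes of the current run, but the listing order does not, because it never looks at the sparse array.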
[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the `dict` object *doesn't provide*. OrderedDicts provide order-sensitive methods and, mainly, order-sensitive equality tests (`==`, `!=`). Plain `dict`s have supported `reversed()` since Python 3.8, but they still don't offer the other behaviors/methods.
[2]: The new dictionary implementation performs better **memory wise** by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic: there are places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.
Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.
how reliable is python’s dictionary ordering?
Python >=3.7
Dictionary order is guaranteed to be insertion order.
Python <3.7
In terms of the language definition, no, you cannot rely on stable ordering, because it is not promised in the language definition.
Now, it might be that over the short and medium term you will find that this ordering is stable, and this makes sense: computers are deterministic, so it's reasonable to expect the same results from one iteration of the experiment to the next. (However, since they are complex systems, these deterministic machines might still produce unexpected results, because you don't know all the determining factors.) This reasoning does not extend to the long term, which is what you should be programming to, because the language implementation is free to choose any means of ordering those keys that it likes, and to change that choice at any time, as long as the implementation is consistent with the language definition. This means that programs depending on some order remaining stable are subject to breakage if run under different implementations, and they are subject to breakage when the implementation is updated.
This is not a place you want to be, therefore you should not make any assumptions about the stability of ordering of dictionary keys.
That being said, if you are only concerned about stability across the lifetime of one running instance of Python, then this seems like a safe gamble (again, computers are deterministic), but still a gamble. Test carefully against cases rather more complex than the ones you're expecting to encounter, and then decide whether that chopping block looks like a comfortable place to rest your neck.
Why items order in a dictionary changed in Python?
May I know why this is happening?
It is because of the way dicts are organized internally.
In short, this works via a hash table which puts the keys into buckets according to their hash() value.
If I use dict.keys() to extract the keys from a dictionary and iterate it in an order that I suppose it to be, will that cause dismatch problem?
That depends on how you do it.
k = sorted(d)  # sorted list of the dictionary's keys
for i in k:
    print(i, d[i])  # visit the items in sorted key order
should work exactly the way you want it to.