Why Is Dictionary Ordering Non-Deterministic

Why is dictionary ordering non-deterministic?


Update: In Python 3.6, dict has a new implementation which preserves insertion order. From Python 3.7, this order-preserving behaviour is guaranteed:

the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.


This is the result of a security fix from 2012, which was enabled by default in Python 3.3 (scroll down to "Security improvements").

From the announcement:

Hash randomization causes the iteration order of dicts and sets to be
unpredictable and differ across Python runs. Python has never guaranteed
iteration order of keys in a dict or set, and applications are advised to never
rely on it. Historically, dict iteration order has not changed very often across
releases and has always remained consistent between successive executions of
Python. Thus, some existing applications may be relying on dict or set ordering.
Because of this and the fact that many Python applications which don't accept
untrusted input are not vulnerable to this attack, in all stable Python releases
mentioned here, HASH RANDOMIZATION IS DISABLED BY DEFAULT.

As noted above, the last, capitalized bit is no longer true in Python 3.3.

See also: object.__hash__() documentation ("Note" sidebar).

If absolutely necessary, you can disable hash randomization in versions of Python affected by this behaviour by setting the PYTHONHASHSEED environment variable to 0.


Your counterexample:

list({str(i): i for i in range(10)}.keys())

… does not in fact always give the same result in Python 3.3, although the number of different orderings is limited due to the way hash collisions are handled:

$ for x in {0..999}
> do
> python3.3 -c "print(list({str(i): i for i in range(10)}.keys()))"
> done | sort | uniq -c
61 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
73 ['1', '0', '3', '2', '5', '4', '7', '6', '9', '8']
62 ['2', '3', '0', '1', '6', '7', '4', '5', '8', '9']
59 ['3', '2', '1', '0', '7', '6', '5', '4', '9', '8']
58 ['4', '5', '6', '7', '0', '1', '2', '3', '8', '9']
55 ['5', '4', '7', '6', '1', '0', '3', '2', '9', '8']
62 ['6', '7', '4', '5', '2', '3', '0', '1', '8', '9']
63 ['7', '6', '5', '4', '3', '2', '1', '0', '9', '8']
60 ['8', '9', '0', '1', '2', '3', '4', '5', '6', '7']
66 ['8', '9', '2', '3', '0', '1', '6', '7', '4', '5']
65 ['8', '9', '4', '5', '6', '7', '0', '1', '2', '3']
53 ['8', '9', '6', '7', '4', '5', '2', '3', '0', '1']
62 ['9', '8', '1', '0', '3', '2', '5', '4', '7', '6']
52 ['9', '8', '3', '2', '1', '0', '7', '6', '5', '4']
73 ['9', '8', '5', '4', '7', '6', '1', '0', '3', '2']
76 ['9', '8', '7', '6', '5', '4', '3', '2', '1', '0']

As noted at the beginning of this answer, that's no longer the case in Python 3.6:

$ for x in {0..999}
> do
> python3.6 -c "print(list({str(i): i for i in range(10)}.keys()))"
> done | sort | uniq -c
1000 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

Why is Python 3.3+ dict ordering not only undefined, but variable?

There is a note about this here: https://docs.python.org/3/reference/datamodel.html#object.__hash__

Here is the note:

Note: By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).


See also PYTHONHASHSEED.


Changed in version 3.3: Hash randomization is enabled by default.

how reliable is python’s dictionary ordering?

Python >3.7

Dictionary order is guaranteed to be insertion order.

Python <3.7

In terms of the language definition, no you cannot rely on stable ordering, because it is not promised in the language definition.

Now, it might be that over the short- and medium-term you will find that this ordering is stable, and this makes sense: computers are deterministic, so it's reasonable to expect the same results from one iteration of the experiment to the next. (however, since they are complex systems, this nondeterministic machine might still produce unexpected results, since you don't know the factors that are determinant) However, this reasoning does not extend to the long-term, which is what you should be programming to, because the language implementation is free to choose any means of ordering those keys that it likes, and to change that choice at any time, as long as the implementation is consistent with the language definition. This means that programs depending on some order remaining stable are subject to breakage if run under different implementations, and they are subject to breakage when the implementation is updated.
This is not a place you want to be, therefore you should not make any assumptions about the stability of ordering of dictionary keys.

That being said, if you are only concerned about stability just across the lifetime of one running instance of python then this seems like a safe gamble - again, computers are deterministic - but still a gamble. Test carefully against cases rather more complex than the ones you're expecting to encounter, and then decide whether that chopping block looks like a comfortable place to rest your neck.

What ordering does dict.keys() and dict.values() guarantee?

The general rules:

  1. Before talking about what is guaranteed and what isn't, even if some ordering seems to be "guaranteed", it isn't. You should not rely on it. It is considered bad practice, and could lead to nasty bugs.
  2. d.keys(), d.values(), and d.items() all return the elements in a respective order. The order should be treated as arbitrary (no assumptions should be made about it). (docs)
  3. consecutive calls to d.keys(), d.values(), and d.items() are "stable", in the sense they are guaranteed to preserve the order of previous calls (assuming no insertion/deletion happens between the calls).
  4. Since CPython's V3.6, dict has been reimplemented, and it now preserves insertion order. This was not the goal of the change, but a side effect, and it is NOT part of the python spec, only a detail of the CPython implementation. See point #1 above: relying on this is bad practice and should not be done. Anyways, you should avoid writing CPython-specific code.
  5. In Python2, order is deterministic (i.e. creating a dict twice in the same way will result with the same order). In Python <3.6, it is no longer deterministic, so you can't rely on that either (I'm not sure if this non-determinism is part of the spec or just a CPython implementation detail).

EDIT: added point #5, thanks to @AndyHayden's comment.

Why is a Dictionary not ordered?

Well, for one thing it's not clear whether you expect this to be insertion-order or key-order. For example, what would you expect the result to be if you wrote:

var test = new Dictionary<int, string>();
test.Add(3, "three");
test.Add(2, "two");
test.Add(1, "one");
test.Add(0, "zero");

Console.WriteLine(test.ElementAt(0).Value);

Would you expect "three" or "zero"?

As it happens, I think the current implementation preserves insertion ordering so long as you never delete anything - but you must not rely on this. It's an implementation detail, and that could change in the future.

Deletions also affect this. For example, what would you expect the result of this program to be?

using System;
using System.Collections.Generic;

class Test
{
static void Main()
{
var test = new Dictionary<int, string>();
test.Add(3, "three");
test.Add(2, "two");
test.Add(1, "one");
test.Add(0, "zero");

test.Remove(2);
test.Add(5, "five");

foreach (var pair in test)
{
Console.WriteLine(pair.Key);
}
}
}

It's actually (on my box) 3, 5, 1, 0. The new entry for 5 has used the vacated entry previously used by 2. That's not going to be guaranteed either though.

Rehashing (when the dictionary's underlying storage needs to be expanded) could affect things... all kinds of things do.

Just don't treat it as an ordered collection. It's not designed for that. Even if it happens to work now, you're relying on undocumented behaviour which goes against the purpose of the class.

Is the order of a Python dictionary guaranteed over iterations?

Python 3.1 has a collections.OrderedDict class that can be used for this purpose. It's very efficient, too: "Big-O running times for all methods are the same as for regular dictionaries."

The code for OrderedDict itself is compatible with Python 2.x, though some inherited methods (from the _abcoll module) do use Python 3-only features. However, they can be modified to 2.x code with minimal effort.



Related Topics



Leave a reply



Submit