Does Python Have an Ordered Set

Does Python have an ordered set?

There is an ordered set recipe for this, which is linked from the Python 2 documentation. It runs on Python 2.6 or later and 3.0 or later without any modifications. The interface is almost exactly the same as a normal set, except that initialisation should be done with a list.

OrderedSet([1, 2, 3])

This is a MutableSet, so the signature for .union doesn't match that of set, but since it includes __or__ something similar can easily be added:

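# Methods to add to the OrderedSet class:

def union(self, *sets):
    # Like set.union: return a new OrderedSet and leave self unchanged.
    result = OrderedSet(self)
    result.update(*sets)
    return result

def update(self, *sets):
    # In-place union; MutableSet supplies |= (__ior__) on top of add().
    for other in sets:
        self |= other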
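To make the idea concrete, here is a minimal sketch of an insertion-ordered set built on MutableSet. It is not the recipe referred to above: it stores its elements as the keys of a plain dict, which assumes Python 3.7 or later (where dict insertion order is guaranteed). The class name and the `_items` attribute are illustrative choices only.

```python
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    """Minimal insertion-ordered set backed by a dict (illustrative only)."""

    def __init__(self, iterable=()):
        # dict keys keep insertion order (guaranteed since Python 3.7).
        self._items = dict.fromkeys(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        self._items[item] = None

    def discard(self, item):
        self._items.pop(item, None)

    def union(self, *others):
        # Like set.union: return a new set and leave self unchanged.
        result = OrderedSet(self)
        for other in others:
            result |= other  # __ior__ is inherited from MutableSet
        return result

    def __repr__(self):
        return f"OrderedSet({list(self._items)})"


s = OrderedSet([3, 1, 2, 3])
u = s.union([5, 1], [4])
print(s, u)
```

MutableSet only requires the five abstract methods (`__contains__`, `__iter__`, `__len__`, `add`, `discard`); operators such as `|`, `&` and `|=` are inherited from it.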

How to get ordered set?

Python doesn't have an OrderedSet; usually we fake it with an OrderedDict.

For example:

>>> from collections import OrderedDict
>>> s = "mathematics"
>>> alpha = "abcdefghiklmnopqrstuvwxyz"
>>> d = OrderedDict.fromkeys(s+alpha)
>>> d
OrderedDict([('m', None), ('a', None), ('t', None), ('h', None), ('e', None), ('i', None), ('c', None), ('s', None), ('b', None), ('d', None), ('f', None), ('g', None), ('k', None), ('l', None), ('n', None), ('o', None), ('p', None), ('q', None), ('r', None), ('u', None), ('v', None), ('w', None), ('x', None), ('y', None), ('z', None)])
>>> ''.join(d)
'matheicsbdfgklnopqruvwxyz'

This doesn't quite work as well as an OrderedSet would, but is often close enough for government work.
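On Python 3.7 and later, where the language guarantees that a plain dict preserves insertion order, the same trick should work without importing OrderedDict. A small sketch reusing the strings from the example above:

```python
s = "mathematics"
alpha = "abcdefghiklmnopqrstuvwxyz"  # same alphabet as in the example above

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
d = dict.fromkeys(s + alpha)
key = "".join(d)
print(key)
```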

Why don't Python sets preserve insertion order?

Sets and dicts are optimized for different use cases. The primary use of a set is fast membership testing, which is order agnostic. For dicts, lookup is the most critical operation, and the key is more likely to be present. With sets, the presence or absence of an element is not known in advance, so the set implementation needs to optimize for both the found and the not-found case. Also, some optimizations for common set operations such as union and intersection make it difficult to retain set ordering without degrading performance.

While both data structures are hash based, it's a common misconception that sets are just implemented as dicts with null values. Even before the compact dict implementation in CPython 3.6, the set and dict implementations already differed significantly, with little code reuse. For example, dicts use pseudo-random probing, whereas sets combine an initial run of linear probing with pseudo-random probing to improve cache locality. The initial linear probe (default 9 steps in CPython) checks a series of adjacent key/hash pairs, which reduces the cost of handling hash collisions, because consecutive memory accesses are cheaper than scattered probes.

  • dictobject.c - master, v3.5.9
  • setobject.c - master, v3.5.9
  • issue18771 - changeset to reduce the cost of hash collisions for set objects in Python 3.4.
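The probing strategy described above can be sketched as a toy generator of candidate slots. This is a simplified illustration, not CPython's code: the shift constant, the jump formula and the wrap-around at the end of the table are assumptions made here, and `probe_sequence` is a hypothetical name. It expects a non-negative hash and a table size that is a power of two.

```python
from itertools import islice

LINEAR_PROBES = 9   # figure quoted in the text above
PERTURB_SHIFT = 5   # assumed here for illustration


def probe_sequence(h, mask):
    """Yield candidate slots: a run of adjacent slots, then a scattered jump."""
    perturb = h
    i = h & mask
    while True:
        # The home slot plus LINEAR_PROBES neighbours (wrapping at the end).
        for step in range(LINEAR_PROBES + 1):
            yield (i + step) & mask
        # Then jump elsewhere, mixing in the higher bits of the hash.
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + 1 + perturb) & mask


# Hash 3 in a 64-slot table: ten adjacent slots, then a jump.
first = list(islice(probe_sequence(3, 63), 11))
print(first)
```

The adjacent slots are what make collisions cheap: they tend to sit in the same cache line, whereas each scattered jump is likely to touch a different one.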

It would be possible in theory to change CPython's set implementation to be similar to the compact dict, but in practice there are drawbacks, and notable core developers were opposed to making such a change.

Sets remain unordered. (Why? The usage patterns are different. Also, different implementation.)

– Guido van Rossum

Sets use a different algorithm that isn't as amendable to retaining insertion order.
Set-to-set operations lose their flexibility and optimizations if order is required. Set mathematics are defined in terms of unordered sets. In short, set ordering isn't in the immediate future.

– Raymond Hettinger

A detailed discussion about whether to compactify sets for 3.7, and why it was decided against, can be found in the python-dev mailing lists.

In summary, the main points are: different usage patterns (insertion ordering is useful for dicts such as **kwargs, less so for sets); smaller space savings from compacting sets (because there are only key + hash arrays to densify, as opposed to key + hash + value arrays); and the fact that the linear probing optimization which sets currently use is incompatible with a compact implementation.
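To show what "compact" means here, the following is a toy sketch of the layout: a sparse index table pointing into a dense list of entries. Insertion order comes from the dense list. It is an illustration under simplifying assumptions (fixed capacity, no deletion, no resizing, plain linear probing) and is not how CPython implements either type; `CompactSet` is a hypothetical name.

```python
class CompactSet:
    """Toy compact hash set: sparse index table + dense entry list."""

    def __init__(self, size=8):
        # size is assumed to be a power of two so that `& mask` works.
        self.indices = [None] * size   # sparse: slot -> position in entries
        self.entries = []              # dense: (hash, key) in insertion order

    def _slot(self, key):
        mask = len(self.indices) - 1
        i = hash(key) & mask
        while True:
            pos = self.indices[i]
            if pos is None or self.entries[pos][1] == key:
                return i
            i = (i + 1) & mask         # plain linear probing

    def add(self, key):
        i = self._slot(key)
        if self.indices[i] is None:
            # Keep one slot free so that _slot always terminates.
            if len(self.entries) >= len(self.indices) - 1:
                raise OverflowError("toy table is full (no resizing)")
            self.indices[i] = len(self.entries)
            self.entries.append((hash(key), key))

    def __contains__(self, key):
        return self.indices[self._slot(key)] is not None

    def __iter__(self):
        # Iterating the dense list yields keys in insertion order.
        return (key for _, key in self.entries)


cs = CompactSet()
for k in [3, 1, 2, 3]:
    cs.add(k)
print(list(cs))
```

Every lookup goes through `indices` and then `entries`, which is the additional indirection mentioned in the post quoted below.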

I will reproduce Raymond's post below which covers the most important points.

On Sep 14, 2016, at 3:50 PM, Eric Snow wrote:

Then, I'll do same to sets.

Unless I've misunderstood, Raymond was opposed to making a similar
change to set.

That's right. Here are a few thoughts on the subject before people
starting running wild.

  • For the compact dict, the space savings was a net win with the additional space consumed by the indices and the overallocation for
    the key/value/hash arrays being more than offset by the improved
    density of key/value/hash arrays. However for sets, the net was much
    less favorable because we still need the indices and overallocation
    but can only offset the space cost by densifying only two of the three
    arrays. In other words, compacting makes more sense when you have
    wasted space for keys, values, and hashes. If you lose one of those
    three, it stops being compelling.

  • The use pattern for sets is different from dicts. The former has more hit or miss lookups. The latter tends to have fewer missing key
    lookups. Also, some of the optimizations for the set-to-set operations
    make it difficult to retain set ordering without impacting
    performance.

  • I pursued alternative path to improve set performance. Instead of compacting (which wasn't much of space win and incurred the cost of an
    additional indirection), I added linear probing to reduce the cost of
    collisions and improve cache performance. This improvement is
    incompatible with the compacting approach I advocated for
    dictionaries.

  • For now, the ordering side-effect on dictionaries is non-guaranteed, so it is premature to start insisting the sets become ordered as well.
    The docs already link to a recipe for creating an OrderedSet (
    https://code.activestate.com/recipes/576694/ ) but it seems like the
    uptake has been nearly zero. Also, now that Eric Snow has given us a
    fast OrderedDict, it is easier than ever to build an OrderedSet from
    MutableSet and OrderedDict, but again I haven't observed any real
    interest because typical set-to-set data analytics don't really need
    or care about ordering. Likewise, the primary use of fast membership
    testings is order agnostic.

  • That said, I do think there is room to add alternative set implementations to PyPI. In particular, there are some interesting
    special cases for orderable data where set-to-set operations can be
    sped-up by comparing entire ranges of keys (see
    https://code.activestate.com/recipes/230113-implementation-of-sets-using-sorted-lists
    for a starting point). IIRC, PyPI already has code for set-like bloom
    filters and cuckoo hashing.

  • I understanding that it is exciting to have a major block of code accepted into the Python core but that shouldn't open to floodgates to
    engaging in more major rewrites of other datatypes unless we're sure
    that it is warranted.

– Raymond Hettinger

From [Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered, Sept 2016.

Why are python sets sorted in ascending order?

The order correlates to the hash of the object, size of the set, binary representation of the number, insertion order and other implementation parameters. It is completely arbitrary and shouldn't be relied upon:

>>> st = {3, 1, 2,4,9,124124,124124124124,123,12,41,15,}
>>> st
{1, 2, 3, 4, 9, 41, 12, 15, 124124, 123, 124124124124}
>>> st.pop()
1
>>> st.pop()
2
>>> st.pop()
3
>>> st.pop()
4
>>> st.pop()
9
>>> st.pop()
41
>>> st.pop()
12
>>> {1, 41, 12}
{1, 12, 41}
>>> {1, 9, 41, 12}
{1, 12, 9, 41} # Looks like 9 wants to go after 12.
>>> hash(9)
9
>>> hash(12)
12
>>> hash(41)
41
>>> {1, 2, 3, 4, 9, 41, 12}
{1, 2, 3, 4, 9, 12, 41} # 12 before 41
>>> {1, 2, 3, 4, 9, 41, 12, 15} # add 15 at the end
{1, 2, 3, 4, 9, 41, 12, 15} # 12 after 41
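As a rough illustration of why the hash and the table size both matter, the sketch below computes the "home" slot of each value in a hypothetical 8-slot table. The table size is an assumption; real table sizes and collision handling are implementation details that vary between versions, so this does not reproduce the exact outputs above.

```python
TABLE_SIZE = 8            # hypothetical table size (a power of two)
mask = TABLE_SIZE - 1

values = [1, 9, 41, 12]
# Small ints hash to themselves, so the home slot is just value & mask.
home_slots = {v: hash(v) & mask for v in values}
print(home_slots)

# 1, 9 and 41 all want the same slot; where the losers end up depends on
# the probing rule, which is why their relative order can look arbitrary.

# Equality ignores order entirely; use sorted() when output order matters.
same = {1, 41, 12} == {12, 1, 41}
ordered = sorted({1, 9, 41, 12})
print(same, ordered)
```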

Python: order in a set of numbers

Sets in Python are not ordered collections; the same is true of the default hash-based set types in most other languages.

Sets are usually implemented with hash tables, so iteration order reflects where each element's hash code places it in the table rather than the natural order of the elements.

If you need order, consider using a list.
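If you want the order of a list but also want duplicates removed, one common approach is to keep a list for the order and a set for fast membership tests. A minimal sketch (the helper name `unique_in_order` is made up for this example):

```python
def unique_in_order(items):
    """Return the items with duplicates dropped, keeping first-seen order."""
    seen = set()      # fast membership testing
    result = []       # preserves order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


numbers = unique_in_order([3, 1, 3, 2, 1])
print(numbers)
```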

Why doesn't a set in Python have a sort() method?

Short answer, from the Python docs:

https://docs.python.org/3.4/tutorial/datastructures.html#sets

A set is an unordered collection with no duplicate elements.

https://docs.python.org/2/library/sets.html

Since sets only define partial ordering (subset relationships), the output of the list.sort() method is undefined for lists of sets.
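The partial ordering mentioned in that quote can be seen directly: `<` on sets means "is a proper subset of", so two sets can be unequal while neither is less than the other. A short sketch, with a key function as one possible way to get a well-defined sort for a list of sets:

```python
a, b, c = {1, 2}, {1, 2, 3}, {4}

subset = a < b                # a is a proper subset of b
unrelated = (a < c, c < a)    # neither is a subset of the other
equal = a == c
print(subset, unrelated, equal)

# A key function that maps each set to a sorted list gives a total order.
by_elements = sorted([c, b, a], key=sorted)
print(by_elements)
```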


Long answer, from Fluent Python, Chapter 3, "Dictionaries and Sets":

Understanding how Python dictionaries and sets are implemented using hash tables is helpful to make sense of their strengths and limitations.

#4: Key ordering depends on insertion order
#5: Adding items to a dict may change the order of existing keys

Sorting a set of values

From a comment:

I want to sort each set.

That's easy. For any set s (or anything else iterable), sorted(s) returns a list of the elements of s in sorted order:

>>> s = set(['0.000000000', '0.009518000', '10.277200999', '0.030810999', '0.018384000', '4.918560000'])
>>> sorted(s)
['0.000000000', '0.009518000', '0.018384000', '0.030810999', '10.277200999', '4.918560000']

Note that sorted is giving you a list, not a set. That's because the whole point of a set, both in mathematics and in almost every programming language,* is that it's not ordered: the sets {1, 2} and {2, 1} are the same set.


You probably don't really want to sort those elements as strings, but as numbers (so 4.918560000 will come before 10.277200999 rather than after).

The best solution is most likely to store the numbers as numbers rather than strings in the first place. But if not, you just need to use a key function:

>>> sorted(s, key=float)
['0.000000000', '0.009518000', '0.018384000', '0.030810999', '4.918560000', '10.277200999']
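If you would rather follow the first suggestion and work with numbers instead of strings, one option is to convert while sorting. A small sketch using the same data:

```python
s = {'0.000000000', '0.009518000', '10.277200999',
     '0.030810999', '0.018384000', '4.918560000'}

# Convert each string to a float, then sort numerically.
numbers = sorted(float(x) for x in s)
print(numbers)
```

Note that this drops the original string formatting (trailing zeros), which `sorted(s, key=float)` keeps.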

For more information, see the Sorting HOWTO in the official docs.


* See the comments for exceptions.


