Accessing the List While Being Sorted

Accessing the list while being sorted

Looking at the source code (of CPython, maybe different behaviour for other implementations) the strange output of your script becomes obvious:

/* The list is temporarily made empty, so that mutations performed
* by comparison functions can't affect the slice of memory we're
* sorting (allowing mutations during sorting is a core-dump
* factory, since ob_item may change).
*/
saved_ob_size = Py_SIZE(self);
saved_ob_item = self->ob_item;
saved_allocated = self->allocated;
Py_SET_SIZE(self, 0);

The comment says it all: When you begin sorting, the list is emptied. Well, it is "empty" in the eye of an external observer.

I quite like the term "core-dump factory".


Compare also:

b = ['b','e','f','d','c','g','a']
f = 'check this'

def m(i):
print i, b, f
return None

b = sorted(b, key= m)
print b

Should you sort a list when getting or setting it?

Sorting the list at every acccess is a bad idea. You have to have a flag which you set when the collection is modified. Only if this flag is set, you need to sort and then reset the flag.

But the best is if you have a data structure which is per definition always sorted. That means, if you insert a new element, the element is automatically inserted at the right index, thus keeping the collection sorted.

I don't know which platform / framework you are using. I know .NET provides a SortedList class which manages that kind of insertion-sort algorithm for you.

Sorting list based on values from another list

Shortest Code

[x for _, x in sorted(zip(Y, X))]

Example:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0, 1, 1, 0, 1, 2, 2, 0, 1]

Z = [x for _,x in sorted(zip(Y,X))]
print(Z) # ["a", "d", "h", "b", "c", "e", "i", "f", "g"]

Generally Speaking

[x for _, x in sorted(zip(Y, X), key=lambda pair: pair[0])]

Explained:

  1. zip the two lists.
  2. create a new, sorted list based on the zip using sorted().
  3. using a list comprehension extract the first elements of each pair from the sorted, zipped list.

For more information on how to set\use the key parameter as well as the sorted function in general, take a look at this.



List appears to be empty during sorting

From the listobject.c source code:

/* The list is temporarily made empty, so that mutations performed
* by comparison functions can't affect the slice of memory we're
* sorting (allowing mutations during sorting is a core-dump
* factory, since ob_item may change).
*/

and from the Mutable Sequence Types documentation:

CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python 2.3 and newer makes the list appear empty for the duration, and raises ValueError if it can detect that the list has been mutated during a sort.

You could zip a and b instead:

b[:] = [bval for (aval, bval) in sorted(zip(a, b))]

Sort a list by the number of occurrences of the elements in the list

This is by design and intentional. CPython temporarily "disallows" access to the list while the list is being sorted in place, the behavior is documented here:

CPython implementation detail: While a list is being sorted, the
effect of attempting to mutate, or even inspect, the list is
undefined.
The C implementation of Python makes the list appear empty
for the duration, and raises ValueError if it can detect that the list
has been mutated during a sort.

You can inspect that by printing A inside the key function - you'll get an empty list:

In [2]: def key_function(x):
...: print(A, x)
...: return A.count(x)
...:

In [3]: A.sort(key=key_function)
([], 2)
([], 1)
([], 3)
([], 4)
([], 2)
([], 2)
([], 3)

But, if you do that for sorted():

In [4]: sorted(A, key=key_function)
([2, 1, 3, 4, 2, 2, 3], 2)
([2, 1, 3, 4, 2, 2, 3], 1)
([2, 1, 3, 4, 2, 2, 3], 3)
([2, 1, 3, 4, 2, 2, 3], 4)
([2, 1, 3, 4, 2, 2, 3], 2)
([2, 1, 3, 4, 2, 2, 3], 2)
([2, 1, 3, 4, 2, 2, 3], 3)
Out[4]: [1, 4, 3, 3, 2, 2, 2]

It is also documented inside the sort() implementation:

/* The list is temporarily made empty, so that mutations performed
* by comparison functions can't affect the slice of memory we're
* sorting (allowing mutations during sorting is a core-dump
* factory, since ob_item may change).
*/.


Related Topics



Leave a reply



Submit