Is Using Random and OrderBy a Good Shuffle Algorithm?

Is using Random and OrderBy a good shuffle algorithm?

It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.
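In code, that approach looks something like this (a minimal sketch; rng stands in for whatever Random instance is to hand):

var shuffled = collection.OrderBy(x => rng.Next()); // O(n log n): give every element a random key, then sort by the keys

Note that LINQ's OrderBy is stable, so any duplicate keys fall back on the original element order rather than on randomness.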

I prefer Durstenfeld's variant of the Fisher-Yates shuffle which swaps elements.

Implementing a simple Shuffle extension method would basically consist of calling ToList or ToArray on the input then using an existing implementation of Fisher-Yates. (Pass in the Random as a parameter to make life generally nicer.) There are plenty of implementations around... I've probably got one in an answer somewhere.

The nice thing about such an extension method is that it would then be very clear to the reader what you're actually trying to do.

EDIT: Here's a simple implementation (no error checking!):

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    // Note i > 0 to avoid final pointless iteration
    for (int i = elements.Length - 1; i > 0; i--)
    {
        // Swap element "i" with a random earlier element (or itself)
        int swapIndex = rng.Next(i + 1);
        T tmp = elements[i];
        elements[i] = elements[swapIndex];
        elements[swapIndex] = tmp;
    }
    // Lazily yield (avoiding aliasing issues etc)
    foreach (T element in elements)
    {
        yield return element;
    }
}

EDIT: Comments on performance below reminded me that we can actually return the elements as we shuffle them:

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    for (int i = elements.Length - 1; i >= 0; i--)
    {
        // Swap element "i" with a random earlier element (or itself)...
        // except we don't really need to swap it fully, as we can
        // return it immediately, and afterwards it's irrelevant.
        int swapIndex = rng.Next(i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
    }
}

This will now only do as much work as it needs to.
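That matters when the caller only wants a few elements. For example (hypothetical usage):

// Only runs five iterations of the shuffle loop: the initial ToArray
// copy is still O(n), but only five swaps and five rng.Next calls occur.
var sample = items.Shuffle(rng).Take(5).ToList();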

Note that in both cases, you need to be careful about the instance of Random you use, as:

  • Creating two instances of Random at roughly the same time will yield the same sequence of random numbers (when used in the same way)
  • Random isn't thread-safe.

I have an article on Random which goes into more detail on these issues and provides solutions.
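One common pattern, sketched here rather than quoted from the article (it needs using System.Threading):

// One Random per thread, with distinct seeds, so concurrent callers
// neither share state nor see identical sequences.
private static int seed = Environment.TickCount;
private static readonly ThreadLocal<Random> threadRng =
    new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref seed)));

// Callers then use threadRng.Value, e.g. items.Shuffle(threadRng.Value).

On .NET 6 or later, passing Random.Shared is a simpler thread-safe option.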

Why is this simple shuffle algorithm — sorting by random() — biased?

JavaScript doesn't specify a specific algorithm for sort, and this shuffling algorithm can give very biased results depending on the specific sorting algorithm in use. Below, I describe some simple, well-known sorting algorithms that give very biased results; I demonstrate that Firefox and Chrome both give very biased results for an array of length 4; and I give a general argument for why any sorting algorithm will give biased results (though not necessarily as biased as these explicit examples).


Example #1 — selection sort. In selection sort, we first find the least element and put it at index 0, then find the second-least element and put it at index 1, and so on. The important thing to note is that, with the comparison function () => Math.random() - 0.5, each argument to the comparison has an equal chance of being considered "less". So if you find the least element by iterating over the array and comparing each element to the previously-least element, then there's a 50% chance that you'll deem the last element to be the least, a 25% chance that you'll deem the second-to-last element to be the least, a 12.5% chance that you'll deem the third-to-last element to be the least, etc., therefore giving a biased distribution of which element goes first.
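That "find the least element" pass is easy to check empirically. Here's a quick Monte Carlo sketch (written in C# like most of this page, since each comparison is just a fair coin flip):

var rng = new Random();
var wins = new int[4];
for (int trial = 0; trial < 1_000_000; trial++)
{
    // Model one "find the least" scan over four elements: each
    // comparison against the current least is a 50/50 coin flip.
    int least = 0;
    for (int i = 1; i < 4; i++)
    {
        if (rng.Next(2) == 0)
        {
            least = i;
        }
    }
    wins[least]++;
}
// Prints roughly 125000, 125000, 250000, 500000: indices 0 and 1
// win 12.5% of the time each, index 2 wins 25%, index 3 wins 50%.
Console.WriteLine(string.Join(", ", wins));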


Example #2 — insertion sort. In insertion sort, we build up a "sorted" section of the array by taking each element in turn and inserting it into the right place in that sorted section (shifting all greater elements by one to make room for it). This means that the last element has a 50% chance of being deemed the least, a 25% chance of being deemed the second-least, a 12.5% chance of being deemed the third-least, etc.


Examples #3 and #4 — whatever Firefox and Chrome use for a four-element array.

Now, realistically, I wouldn't expect any implementation of sort to use exactly selection sort or insertion sort, since there are other algorithms that are much more efficient for large inputs. But sophisticated modern sorting algorithms, such as Timsort, incorporate multiple different sorting algorithms, adaptively choosing between them based on the size and characteristics of the input (or of parts of the input, since they can combine these algorithms in sophisticated ways). So, as an experiment, I tried out this shuffle algorithm on the array [1, 2, 3, 4] — a short enough array that it seemed likely that the sort implementation might just use insertion sort for the whole array.

Here's the code I used:

const counts = {};
for (let i = 0; i < 1_000_000; ++i) {
  const permutation = [1, 2, 3, 4].sort(() => Math.random() - 0.5).join('');
  counts[permutation] = (counts[permutation] || 0) + 1;
}

const result = [];
for (let permutation in counts) {
  result.push(permutation + ': ' + counts[permutation]);
}

result.join('\n')

I tried this in both Firefox and Chrome.

In Firefox, I got results like this:

1234: 125747
1243: 62365
1324: 62299
1342: 31003
1423: 31320
1432: 15635
2134: 125380
2143: 62216
2314: 62615
2341: 31255
2413: 31509
2431: 15608
3124: 62377
3142: 31166
3214: 62194
3241: 31293
3412: 15631
3421: 15782
4123: 31056
4132: 15672
4213: 31231
4231: 15319
4312: 15727
4321: 15600

which doesn't match what I'd expect from insertion sort, so it must be doing something different, but regardless, it shows very clear biases. Some permutations occur 1/64 of the time (15,625 times out of a million, plus/minus random noise), some occur 1/32 of the time (31,250), some occur 1/16 of the time (62,500), and some occur 1/8 of the time (125,000); so some permutations are eight times as common as others.

In Chrome, I got results like this:

1234: 187029
1243: 62380
1324: 15409
1342: 15679
1423: 62476
1432: 15368
2134: 31280
2143: 31291
2314: 15683
2341: 15482
2413: 31482
2431: 15732
3124: 15786
3142: 15692
3214: 47186
3241: 47092
3412: 15509
3421: 46600
4123: 62825
4132: 15595
4213: 31091
4231: 15763
4312: 15624
4321: 171946

which also doesn't match what I'd expect from insertion sort, and is a bit more complicated than the distribution in Firefox (I think I see some 3/16ths (187,500) and 3/64ths (46,875) in there?), but is actually even more heavily biased, with a factor of twelve difference between the most and least common permutations.


Example #5 — any deterministic sorting algorithm. I've given various examples above of fairly extreme biases; but really, any sorting algorithm would be expected to produce some bias, because if the algorithm does worst-case k comparisons on an array of length n, and each comparison has a 50–50 split, then the probability of any given permutation has to be a multiple of 1/2^k, whereas an unbiased shuffler has to give each permutation the probability 1/n!, which won't be a multiple of 1/2^k if n ≥ 3 (because then n! will be a multiple of 3).

That said, I should acknowledge that these biases might be small enough that they don't matter; after all, even 1.0 / 3.0 doesn't compute exactly 1/3, but rather rounds it to a binary approximation. Of more direct relevance, a typical implementation of Math.random() holds 64 or 128 bits of internal state, and since 2^64 < 21! and 2^128 < 35!, it doesn't even have 21! or 35! distinct internal states; so no algorithm that uses Math.random() to shuffle an array of 21 or 35 or more elements can possibly produce each permutation with nonzero probability. So I suppose that some bias is inevitable!


Even if you're using a sort implementation that gives results that you consider good enough, there's just no reason to do it this way, since a Fisher–Yates shuffle is simple to code and is faster than any comparison-based sorting algorithm.



But I made a simple script to create an empirical probability distribution of the index that the last element of the array ends up at after the shuffle: […]

Note that there can potentially be subtler biases such that not all permutations are equally likely even if the last element is equally likely to end up at any position. Even if the sort implementation is held fixed, you'd want to do a more thorough analysis (probably including a look at its source code) before relying on this shuffling algorithm giving unbiased results.

Is there a performance difference between these two algorithms for shuffling an IEnumerable?

The second version will probably be a bit slower because of RemoveAt. A List<T> is really an array that grows when you add elements to it, and as such, insertion and removal in the middle are slow (in fact, MSDN states that RemoveAt is an O(n) operation).
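The question's two versions aren't reproduced here, but the RemoveAt-based pattern generally looks something like this (a hypothetical reconstruction, not the asker's exact code):

public static IEnumerable<T> SlowShuffle<T>(this IList<T> list, Random rng)
{
    // Each RemoveAt shifts every later element one slot to the left,
    // so the shuffle as a whole is O(n^2) instead of O(n).
    while (list.Count > 0)
    {
        int index = rng.Next(list.Count);
        yield return list[index];
        list.RemoveAt(index);
    }
}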

Anyway, the best would be to simply use a profiler to compare both methods.

Running Time of Random Sort

This algorithm is called bogosort. It is an instance of a class of algorithms called Las Vegas algorithms: randomized algorithms that always guarantee correct results but make no guarantees about the computing resources they use.

The time complexity of bogosort cannot be expressed directly in Bachmann-Landau notation because of its probabilistic nature. However, we can make a statement about its expected time complexity: the expected time complexity of bogosort is O(n·n!).
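For illustration, here is a minimal bogosort sketch in C# (assuming an in-place Fisher-Yates reshuffle and a linear sortedness check; an expected n! attempts at O(n) each gives the O(n·n!) above):

static void BogoSort(int[] array, Random rng)
{
    while (!IsSorted(array))
    {
        // Reshuffle uniformly at random and try again.
        for (int i = array.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (array[i], array[j]) = (array[j], array[i]);
        }
    }
}

static bool IsSorted(int[] array)
{
    for (int i = 1; i < array.Length; i++)
    {
        if (array[i - 1] > array[i]) return false;
    }
    return true;
}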

Most efficient way to randomly sort (Shuffle) a list of integers in C#

A good linear-time shuffling algorithm is the Fisher-Yates shuffle.

One problem you'll find with your proposed algorithm is that as you near the end of the shuffle, your loop will spend a lot of time looking for randomly chosen elements that have not yet been swapped. This may take an indeterminate amount of time once it gets to the last element to swap.

Also, it looks like your algorithm will never terminate if there are an odd number of elements to sort.

How to implement lazy shuffling of Lists in C#?

Since the original list can be modified, here is a very simple and efficient solution, based on this answer:

public static IEnumerable<T> Shuffle<T>(this IList<T> list, Random rng)
{
    for (int i = list.Count - 1; i >= 0; i--)
    {
        // Pick a random element from the not-yet-yielded prefix, yield it,
        // then move the last unyielded element into the vacated slot.
        int swapIndex = rng.Next(i + 1);
        yield return list[swapIndex];
        list[swapIndex] = list[i];
    }
}
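For example (hypothetical usage; note that enumerating the result scrambles the underlying list as a side effect):

var rng = new Random();
var deck = Enumerable.Range(0, 52).ToList();
foreach (int card in deck.Shuffle(rng))
{
    Console.WriteLine(card);
}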

linq: order by random

http://msdn.microsoft.com/en-us/library/system.guid.newguid.aspx

return (from examQ in idb.Exam_Question_Int_Tbl
        where examQ.Exam_Tbl_ID == exam_id
        select examQ).OrderBy(x => Guid.NewGuid()).Take(50);

If this is LINQ-to-SQL you could simply add an ORDER BY NEWID() to your SELECT statement.

As commented, it might be better to use an algorithm like the Fisher-Yates shuffle; here is an implementation: https://stackoverflow.com/a/375446/284240

Why does using Random in Sort cause an "Unable to sort because the IComparer.Compare() method returns inconsistent results" error?

Because, as the error says, Random is not consistent. You must have a comparer that always returns the same result when given the same parameters; otherwise the sort will not be consistent.

Knuth has a random shuffle algorithm which works like an insertion sort, except that you swap each value with a randomly chosen location in the existing array.
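That description matches the "inside-out" variant of the Fisher-Yates shuffle. Here is a sketch (the attribution to Knuth is this answer's claim):

// Builds a shuffled copy: each incoming element goes to a random
// slot among those filled so far, and the displaced value moves
// to the end.
static int[] ShuffledCopy(int[] source, Random rng)
{
    int[] result = new int[source.Length];
    for (int i = 0; i < source.Length; i++)
    {
        int j = rng.Next(i + 1);
        result[i] = result[j];
        result[j] = source[i];
    }
    return result;
}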


