Java's Collections.Shuffle Is Doing What

Java's Collections.shuffle is doing what?

Yes, you can look at the code; it basically does a Fisher-Yates shuffle. Here it is (thanks OpenJDK, and yay for open source :-P):

public static void shuffle(List<?> list, Random rnd) {
    int size = list.size();
    if (size < SHUFFLE_THRESHOLD || list instanceof RandomAccess) {
        for (int i=size; i>1; i--)
            swap(list, i-1, rnd.nextInt(i));
    } else {
        Object arr[] = list.toArray();

        // Shuffle array
        for (int i=size; i>1; i--)
            swap(arr, i-1, rnd.nextInt(i));

        // Dump array back into list
        ListIterator it = list.listIterator();
        for (int i=0; i<arr.length; i++) {
            it.next();
            it.set(arr[i]);
        }
    }
}

The swap method:

 private static void swap(Object[] x, int a, int b) {
    Object t = x[a];
    x[a] = x[b];
    x[b] = t;
}

Why does the Collections.shuffle() algorithm work better than my implementation

Collections.Shuffle() does a Fisher-Yates shuffle. It's a more evenly distributed form of shuffling, and does not reshuffle what might have previously been shuffled already, as opposed to your algorithm.

What your algorithm does (also the known as the naive implementation) is that it will randomly pick any array index and shuffle it over, meaning there's a high chance of it picking the same index that has previously been shuffled already.

The Fisher-Yates shuffle (also known as the Donald Knuth Shuffle) is an unbiased algorithm that shuffles items in the array in an equally likely probability. It avoids the chance if it 'moving' the same objects twice.

Here's a good explanation of the Fisher Yates Shuffle by our own Jeff Atwood at Coding Horror, he discusses the naive implementation of the random shuffle versus the Fisher Yates shuffle.

See also this SO question about Java's implementation. It mentions what you asked. You can also look at the source code if you'd like, as mentioned there. I found it by Googling Collections.shuffle() in the top 5 links.

To further discuss this, it's always a good idea to use the Fisher-Yates shuffle compared to the naive implementations, especially in implementations that require a higher level of randomness (such as shuffle poker cards) to avoid introducing odds and unfair play. It wouldn't be a good thing if games of chance where implemented based on our naive implementation, as the bias leads to what you've observed, where the same permutation seems to show up more often than the others.

Lastly, as user @jmruc mentioned, here's a very nice tutorial on visualizing algorithms, it contains the Fisher-Yates shuffle, as well as other algorithms, all beautifully presented. Might help you wrap your head around the concepts if you're more of a visualizer: Visualizing Algorithms by Mike Bostock

What does Collections.shuffle function do

I don't think you're really asking about what Collections.shuffle does in general: I think you're asking why Collections.shuffle affects the output:

You are creating a load of tasks of two kinds: one of them increments a value ("kind 1"); the other increments a value and sometimes prints a message ("kind 2"). These are put into a list, and executed sequentially.

If you put an equal number of these two kinds of tasks into a list, shuffle them, and execute them sequentially, the item at a position is going to be a "kind 1" task 50% of the time, and a "kind 2" task the rest of the time.

Since "kind 2" tasks only prints for a specific value, you will only get something printed if it is a "kind 2" task at position 2 * size - 1, which will be 50% of the time.

If you don't shuffle them, but add "kind 1" and "kind 2" in turn, the item at position 2 * size - 1 will always be a "kind 2", so it will always print.

Does Collections.shuffle() randomize list in place?

Yes, the shuffle method of Collections indeed shuffles the ArrayList in place, that also holds for any other Collection you pass to the method. That is also the case for many other methods there, like sort.

From the official documentation:

This implementation traverses the list backwards, from the last element up to the second, repeatedly swapping a randomly selected element into the "current position". Elements are randomly selected from the portion of the list that runs from the first element to the current position, inclusive.

And also, very important:

If the specified list does not implement the RandomAccess interface and is large, this implementation dumps the specified list into an array before shuffling it, and dumps the shuffled array back into the list.

This would, for example, be the case for a LinkedList.

There is also a variant that takes a Random object, if you want to have more control over the seeding for example, also see the official documentation.

To answer your last question, yes it is a good idea to use that method if you want to have a random permutation of a given collection. The implementation is pretty efficient. However for some special collections you could write a more efficient method, if that is really necessary.

For this I'd recommend to take a look at the implementation of Collections#shuffle as seen here.

How to shuffle a Collection?

It's not really possible - the Collection abstraction does not define an order, for example a set is Collection, and ordering is not defined on sets, thus shuffling them doesn't make sense.

You should convert your Collection to a list (if it is not a list already) and then shuffle it. See also: How to convert a Collection to List?

How to use Java Collections.shuffle() on a Scala array?

java.util.Collections.shuffle(java.util.Arrays.asList(a:_*))

For the above to work correctly, a's element type has to be a subclass of scala.AnyRef (equivalent to java.lang.Object) because Arrays.asList() uses the array passed in as the backing store for the result java.util.List and java.util.List can contain only object references (not primitive values).*

That is also the reason why Collections.shuffle() which shuffles the passed-in java.util.List actually shuffled the array.*

*: See the note below

For example:

scala> val a = Array[java.lang.Integer](1, 2, 3) // note the type parameter                  
a: Array[java.lang.Integer] = Array(1, 2, 3)

scala> java.util.Collections.shuffle(java.util.Arrays.asList(a:_*))

scala> a
res43: Array[java.lang.Integer] = Array(1, 3, 2)

scala> java.util.Collections.shuffle(java.util.Arrays.asList(a:_*))

scala> a
res45: Array[java.lang.Integer] = Array(1, 2, 3)

Note: Scala 2.7.5 is used for the above code snippets. Scala 2.8.0 exhibits different behaviors as Daniel demonstrated. To be on the safe side, do not depend on the fact that the array gets shuffled but instead the list that is returned from Arrays.asList() gets shuffled.

scala> val a = Array[java.lang.Integer](1, 2, 3)
a: Array[java.lang.Integer] = Array(1, 2, 3)

scala> val b = java.util.Arrays.asList(a:_*)
b: java.util.List[java.lang.Integer] = [1, 2, 3]

scala> java.util.Collections.shuffle(b)

scala> b
res50: java.util.List[java.lang.Integer] = [2, 1, 3]

scala> java.util.Collections.shuffle(b)

scala> b
res52: java.util.List[java.lang.Integer] = [3, 1, 2]

Java's Collections.Shuffle Is Doing What