Why Does Collections.Sort Use Mergesort But Arrays.Sort Does Not

Why does Collections.sort use Mergesort but Arrays.sort does not?

The API guarantees a stable sorting which Quicksort doesn’t offer. However, when sorting primitive values by their natural order you won’t notice a difference as primitive values have no identity. Therefore, Quicksort can used for primitive arrays and will be used when it is considered more efficient¹.

For objects you may notice, when objects with different identity which are deemed equal according to their equals implementation or the provided Comparator change their order. Therefore, Quicksort is not an option. So a variant of MergeSort is used, the current Java versions use TimSort. This applies to both, Arrays.sort and Collections.sort, though with Java 8, the List itself may override the sort algorithms.

¹ The efficiency advantage of Quicksort is needing less memory when done in-place. But it has a dramatic worst case performance and can’t exploit runs of pre-sorted data in an array, which TimSort does.

Therefore, the sorting algorithms were reworked from version to version, while staying in the now-misleadingly named class DualPivotQuicksort. Also, the documentation didn’t catch up, which shows, that it is a bad idea in general, to name an internally used algorithm in a specification, when not necessary.

The current situation (including Java 8 to Java 11) is as follows:

Generally, the sorting methods for primitive arrays will use Quicksort only under certain circumstances. For larger arrays, they will try to identify runs of pre-sorted data first, like TimSort does, and will merge them when the number of runs does not exceed a certain threshold. Otherwise they will fall back to Quicksort, but with an implementation that will fall back to Insertion sort for small ranges, which does not only affect small arrays, but also quick sort’s recursion.
sort(char[],…) and sort(short[],…) add another special case, to use Counting sort for arrays whose length exceeds a certain threshold
Likewise, sort(byte[],…) will use Counting sort, but with a much smaller threshold, which creates the biggest contrast to the documentation, as sort(byte[],…) never uses Quicksort. It only uses Insertion sort for small arrays and Counting sort otherwise.

Why is Collections.sort() much slower than Arrays.sort()?

So, there are two different methods with totally different algorithms here:

Arrays.sort(int[]) uses a dual-pivot quicksort algorithm.

Collections.sort(List<T>) calls list.sort(null) which in turn calls Arrays.sort(T[]). This uses a Timsort algorithm.

So, let's compare Arrays.sort(int[]) and Arrays.sort(T[]).

T[] is a boxed array so there is one extra level of indirection: for each call, you have to unwrap Integer. This certainly leads to an overhead. On the other hand, int[] is a primitive array so all elements are available "immediately".
TimSort is a variation of a classic mergesort algorithm. It is faster than mergesort but it still slower than quicksort because
- quicksort has fewer data movements on random data
- quicksort requires O(log(n)) extra space while TimSort requires O(n) to provide stability which also leads to an overhead.

Why Collections.sort uses merge sort instead of quicksort?

Highly likely from Josh Bloch §:

I did write these methods, so I suppose I'm qualified to answer. It is
true that there is no single best sorting algorithm. QuickSort has
two major deficiencies when compared to mergesort:

It's not stable (as parsifal noted).

It doesn't guarantee n log n performance; it can degrade to quadratic performance on pathological inputs.

Stability is a non-issue for primitive types, as there is no notion of
identity as distinct from (value) equality. And the possibility of
quadratic behavior was deemed not to be a problem in practice for
Bentely and McIlroy's implementation (or subsequently for Dual Pivot
Quicksort), which is why these QuickSort variants were used for
the primitive sorts.

Stability is a big deal when sorting arbitrary objects. For example,
suppose you have objects representing email messages, and you sort
them first by date, then by sender. You expect them to be sorted by
date within each sender, but that will only be true if the sort is
stable. That's why we elected to provide a stable sort (Merge Sort)
to sort object references. (Techincally speaking, multiple sequential
stable sorts result in a lexicographic ordering on the keys in the
reverse order of the sorts: the final sort determines the most
significant subkey.)

It's a nice side benefit that Merge Sort guarantees n log n (time)
performance no matter what the input. Of course there is a down side:
quick sort is an "in place" sort: it requies only log n external space
(to maintain the call stack). Merge, sort, on the other hand,
requires O(n) external space. The TimSort variant (introduced in Java
SE 6) requires substantially less space (O(k)) if the input array is
nearly sorted.

Also, the following is relevant:

The algorithm used by java.util.Arrays.sort and (indirectly) by
java.util.Collections.sort to sort object references is a "modified
mergesort (in which the merge is omitted if the highest element in the
low sublist is less than the lowest element in the high sublist)." It
is a reasonably fast stable sort that guarantees O(n log n)
performance and requires O(n) extra space. In its day (it was written
in 1997 by Joshua Bloch), it was a fine choice, but today but we can
do much better.

Since 2003, Python's list sort has used an algorithm known as timsort
(after Tim Peters, who wrote it). It is a stable, adaptive, iterative
mergesort that requires far fewer than n log(n) comparisons when
running on partially sorted arrays, while offering performance
comparable to a traditional mergesort when run on random arrays. Like
all proper mergesorts timsort is stable and runs in O(n log n) time
(worst case). In the worst case, timsort requires temporary storage
space for n/2 object references; in the best case, it requires only a
small constant amount of space. Contrast this with the current
implementation, which always requires extra space for n object
references, and beats n log n only on nearly sorted lists.

Timsort is described in detail here:
http://svn.python.org/projects/python/trunk/Objects/listsort.txt.

Tim Peters's original implementation is written in C. Joshua Bloch
ported it from C to Java and end tested, benchmarked, and tuned the
resulting code extensively. The resulting code is a drop-in
replacement for java.util.Arrays.sort. On highly ordered data, this
code can run up to 25 times as fast as the current implementation (on
the HotSpot server VM). On random data, the speeds of the old and new
implementations are comparable. For very short lists, the new
implementation is substantially faster that the old even on random
data (because it avoids unnecessary data copying).

Also, see Is Java 7 using Tim Sort for the Method Arrays.Sort?.

There isn't a single "best" choice. As with many other things, it's about tradeoffs.

Why does Java's Arrays.sort method use two different sorting algorithms for different types?

The most likely reason: quicksort is not stable, i.e. equal entries can change their relative position during the sort; among other things, this means that if you sort an already sorted array, it may not stay unchanged.

Since primitive types have no identity (there is no way to distinguish two ints with the same value), this does not matter for them. But for reference types, it could cause problems for some applications. Therefore, a stable merge sort is used for those.

OTOH, a reason not to use the (guaranteed n*log(n)) stable merge sort for primitive types might be that it requires making a clone of the array. For reference types, where the referred objects usually take up far more memory than the array of references, this generally does not matter. But for primitive types, cloning the array outright doubles the memory usage.

Collections vs Arrays regarding sort()

Well, besides operating on different stuff (Collections.sort operates on a List, and Arrays.sort operates on an array), java.util.Collections.sort() simply calls java.util.Arrays.sort() to do the heavy lifting.

Also, for what it's worth, notice that Arrays.sort runs a merge sort.

Inbuilt java sorting algorithms for array/linked lists that uses Quicksort

Would it be a way to reverse the test-procedure?

Modify your sorting algorithm to work with an array and use Arrays.sort() to match it.

Both classes – ArrayList and LinkedList – are having a toArray() function.

Arrays.sort() calls java.util.ComparableTimSort.sort() internally.

Why Arrays.sort is quicksort algorithm, why not another sort algorithm?

Quicksort has O(n log n) average and O(n^2) worst case performance, that is the best "average case" a sort algorithm can be, there are other sort algorithms that have this performance, but quicksort tends to perform better than most.

See: http://en.wikipedia.org/wiki/Quicksort

Arrays.parallelSort vs Collections.sort

If one runs the program in TryJava8, I get the correctly sorted array. I think you probably printed the input (fruits) instead of the output (arrayFruits). This said, you opened an interesting topic, since in general, you are right a sorting algorithm does not guarantee the full order. In general for large arrays, if two elements are equivalent, but not the same (for instance a different pointer to an equivalent record), the algorithms do not guarantee a specific order. This said ties are in general broken differently by different algorithms.

A compare method should satisfy the order relation constraints:

An order relation should be:

reflexive: every item should be equal to itself (you better return 0 I guess)
asymmetrical: if A is less than or equal to B and B is less than or equal to A, A and B are equal.
transitive: if A is less than or equal to B and B is less than or equal to C, A is less than or equal to C.

Most of the sorting algorithms assume this constraints implicitly (they don't check them) and thus offer a O(n log n) time complexity. If the condition does not hold, depending on the implementation of the algorithm, one obtains different results.

Since a parallel sort uses the MergeSort algorithm, and a default sort uses the QuickSort algorithm, the two algorithms have different behavior.

A relevant topic: most sorting algorithms are not stable. Say two items are "equal", then it is not guaranteed that if A was placed before A' in the original array, A will be placed before A' in the resulting array.

Drawback of using Arrays.sort() method in Java instead of QuickSort or MergeSort

Unless you really need (proven by measurements) a more efficient implementation, and can come up with one, it always makes much more sense to use Arrays.sort. It is efficient and well-tested. It rarely makes any sense at all to implement Quicksort or Merge Sort, as these are generic sorting algorithms. It might make sense to implement a custom sorting algorithm that uses some properties of your data that Java has no way of using (counting sort and bucket sort are nice examples).

As for Arrays.sort, Oracle implementation works as follows:

For int, long, float and double there are essentially three algorithms: Insertion Sort, Dual-Pivot Stable Quicksort and Merge Sort.

1.1. If the array is large enough (currently larger than 286 elements) and it's not “almost sorted” (determined dynamically), then Merge Sort is used.

1.2. Otherwise, if the array is very small (currently less than 47 elements), then Insertion Sort is used.

1.3. Otherwise (if the array is not that small, but either smaller than 286 elements or mostly sorted), Dual-Pivot Quicksort is used.

For short and char, there are Insertion Sort, Dual-Pivot Stable Quicksort and Counting Sort.

2.1. If the array is larger than a certain threshold (currently 3200 elements), then Counting Sort is used.

2.2. Otherwise, similarly to larger types, Insertion Sort or Dual-Pivot Stable Quicksort is used (using the threshold of 47 elements).

For bytes, it's like for shorts, but Counting Sort is used when there are more than 29 elements (since it doesn't require that much memory).
For reference types, things are complicated.

4.1. If the java.util.Arrays.useLegacyMergeSort property is set, then some sort of legacy Merge Sort is used (surprisingly), which falls back to Insertion Sort on really small arrays (less than 7 elements).

4.2. Otherwise, TimSort is used (Tim Pieter's list sort for Python), which is something similar to Merge Sort, but for arrays of less than 32 objects no merging is performed.

The point of the above is that Java people really did their research instead of blindly implementing a random sorting algorithm, hoping that everyone will be happy with it.

In my experience, I find Arrays.sort to be extremely efficient. I can think of two reasons to implement a custom algorithm: the aforementioned case using certain data properties, and sorting data that comes in different “parallel” arrays that we can't merge into one array of composite objects for whatever reason (performance, or perhaps we just don't have control over the code that produced those arrays).

Why Does Collections.Sort Use Mergesort But Arrays.Sort Does Not