Is Java 7 Using Tim Sort for the Method Arrays.Sort

In which version of JDK7, MergeSort was replaced with TimSort in Collections.sort method?

TimSort has been used in Java 7 right from the release. To be more specific, it has been added in Java 7, beta 70.

It might be worth noting that this is still a modified merge sort, just having more modifications being more efficient than the modified merge sort that has been used in previous version.

Why does Java's Arrays.sort method use two different sorting algorithms for different types?

The most likely reason: quicksort is not stable, i.e. equal entries can change their relative position during the sort; among other things, this means that if you sort an already sorted array, it may not stay unchanged.

Since primitive types have no identity (there is no way to distinguish two ints with the same value), this does not matter for them. But for reference types, it could cause problems for some applications. Therefore, a stable merge sort is used for those.

OTOH, a reason not to use the (guaranteed n*log(n)) stable merge sort for primitive types might be that it requires making a clone of the array. For reference types, where the referred objects usually take up far more memory than the array of references, this generally does not matter. But for primitive types, cloning the array outright doubles the memory usage.

Why does Collections.sort use Mergesort but Arrays.sort does not?

The API guarantees a stable sorting which Quicksort doesn’t offer. However, when sorting primitive values by their natural order you won’t notice a difference as primitive values have no identity. Therefore, Quicksort can used for primitive arrays and will be used when it is considered more efficient¹.

For objects you may notice, when objects with different identity which are deemed equal according to their equals implementation or the provided Comparator change their order. Therefore, Quicksort is not an option. So a variant of MergeSort is used, the current Java versions use TimSort. This applies to both, Arrays.sort and Collections.sort, though with Java 8, the List itself may override the sort algorithms.


¹ The efficiency advantage of Quicksort is needing less memory when done in-place. But it has a dramatic worst case performance and can’t exploit runs of pre-sorted data in an array, which TimSort does.

Therefore, the sorting algorithms were reworked from version to version, while staying in the now-misleadingly named class DualPivotQuicksort. Also, the documentation didn’t catch up, which shows, that it is a bad idea in general, to name an internally used algorithm in a specification, when not necessary.

The current situation (including Java 8 to Java 11) is as follows:

  • Generally, the sorting methods for primitive arrays will use Quicksort only under certain circumstances. For larger arrays, they will try to identify runs of pre-sorted data first, like TimSort does, and will merge them when the number of runs does not exceed a certain threshold. Otherwise they will fall back to Quicksort, but with an implementation that will fall back to Insertion sort for small ranges, which does not only affect small arrays, but also quick sort’s recursion.
  • sort(char[],…) and sort(short[],…) add another special case, to use Counting sort for arrays whose length exceeds a certain threshold
  • Likewise, sort(byte[],…) will use Counting sort, but with a much smaller threshold, which creates the biggest contrast to the documentation, as sort(byte[],…) never uses Quicksort. It only uses Insertion sort for small arrays and Counting sort otherwise.

What is the time complexity of Collections#sort method in Java?

This depends on the version of Java you use. But in the end, the Big-O time complexity is still O(N*log(N)).

For Java 6, it's a modified version of mergesort. Check the description here: Collections#sort for Java 6

The sorting algorithm is a modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist). This algorithm offers guaranteed n log(n) performance. The specified list must be modifiable, but need not be resizable. This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.

For Java 7, it was improved: Collections#sort for Java 7 due to enhancement. Note that TimSort has a best case of O(N) and proves to be faster than the previous implementation.

Implementation note: This implementation is a stable, adaptive, iterative mergesort that requires far fewer than n lg(n) comparisons when the input array is partially sorted, while offering the performance of a traditional mergesort when the input array is randomly ordered. If the input array is nearly sorted, the implementation requires approximately n comparisons. Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays.

The implementation takes equal advantage of ascending and descending order in its input array, and can take advantage of ascending and descending order in different parts of the same input array. It is well-suited to merging two or more sorted arrays: simply concatenate the arrays and sort the resulting array.

The implementation was adapted from Tim Peters's list sort for Python ( TimSort). It uses techiques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993.

This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.



Is this a good method for sorting an ArrayList of 10^6?

In theory, it is enough to use. But this makes me wonder why would you have to sort the data in memory. If the data comes from a database, then sort it there using an indexed column/field, otherwise check if you know some characteristics of the field you will use for sorting and if you may use a O(N) time complexity algorithm like Bucket Sort or Radix Sort. When there's no other way, use Collections#sort.

What is the sorting algorithm for Java

Beginning with version 7, Oracle's Java implementation is using Timsort for object arrays bigger than 10 elements, and Insertion sort for arrays with less than that number of elements. The same considerations apply for both Arrays.sort() and Collections.sort(). In older versions of Java, Merge sort was used instead of Timsort.

Other implementations of the language (other than Oracle's) might use a different sorting algorithm, as this is not mandated by the specification. Quoting Collections' documentation:

The documentation for the polymorphic algorithms contained in this class generally includes a brief description of the implementation. Such descriptions should be regarded as implementation notes, rather than parts of the specification. Implementors should feel free to substitute other algorithms, so long as the specification itself is adhered to. (For example, the algorithm used by sort does not have to be a mergesort, but it does have to be stable.)

For sorting numeric primitives, JDK 7 uses "dual pivot quicksort".

Arrays.sort (Comparator) - Java 6 vs Java 7

Apart from a different sorting algorithm, which is only relevant internally, there are no difference between the final result of Array.sort(E, Comparator). So, no, there is not difference "introduced in Java7 for Comparator".

Just change your implementation to

import java.util.Comparator;
public class CompareTester {
int i;
public static class Inside implements Comparator<CompareTester> {
@Override
public int compare(CompareTester o1, CompareTester o2) {
return o1.i - o2.i;
}
}
}

and compare the final resulting array instead and you'll see that there is no difference between Java 6 and 7.

Edit

If you want to avoid using a Comparator, then make your CompareTester class implement Comparable<T> instead (Java 7 doc here);

import java.lang.Comparable;
public class CompareTester implements Comparable<CompareTester> {
int i;

public int compareTo(CompareTester o) {
return this.i - o.i;
}
}

Then, you don't need anything special with Array.sort :

Arrays.sort(numbers);

Why does this sort in Java 1.8

Java 8 uses a modied merge sort. The key line it uses is

// From TimSort.binarySort
while (left < right) {
int mid = (left + right) >>> 1;
if (c.compare(pivot, a[mid]) < 0) // compares for less than 0.
right = mid;
else
left = mid + 1;
}

Note: it only cares about whether you return -1 or 0 (more specifically is < 0, true or false)

Your comparator is the same as

return i1 < i2 ? -1 : 0;

so in all the ways that matter for this code it's correct.

Note: if you change the code like this

return i1 > i2 ? +1 : 0;

It doesn't sort anything.



Related Topics



Leave a reply



Submit