algorithm for nth_element
It's called a selection algorithm and wikipedia has a decent page on it: http://en.wikipedia.org/wiki/Selection_algorithm
Also read about order statistics: http://en.wikipedia.org/wiki/Order_statistic
How is nth_element Implemented?
You asked two questions, the titular one
How is nth_element Implemented?
Which you already answered:
There are a lot of claims on StackOverflow and elsewhere that nth_element is O(n) and that it is typically implemented with Introselect.
Which I also can confirm from looking at my stdlib implementation. (More on this later.)
And the one where you don't understand the answer:
How can an algorithm switch between QSort and Median-of-Medians?
Lets have a look at pseudo code that I extracted from my stdlib:
nth_element(first, nth, last)
{
if (first == last || nth == last)
return;
introselect(first, nth, last, log2(last - first) * 2);
}
introselect(first, nth, last, depth_limit)
{
while (last - first > 3)
{
if (depth_limit == 0)
{
// [NOTE by editor] This should be median-of-medians instead.
// [NOTE by editor] See Azmisov's comment below
heap_select(first, nth + 1, last);
// Place the nth largest element in its final position.
iter_swap(first, nth);
return;
}
--depth_limit;
cut = unguarded_partition_pivot(first, last);
if (cut <= nth)
first = cut;
else
last = cut;
}
insertion_sort(first, last);
}
Without getting into details about the referenced functions heap_select
and unguarded_partition_pivot
we can clearly see, that nth_element
gives introselect 2 * log2(size)
subdivision steps (twice as much as needed by quickselect in the best case) until heap_select
kicks in and solves the problem for good.
What is nth_element and what does it do exactly? and how to implement it
So where is nth element here?
The n-th element is the 2
at index 2
because thats what you asked for when you passed begin()+2
.
The element pointed at by nth is changed to whatever element would occur in that position if [first, last) was sorted.
This means that, if the vector was sorted, the order of elements would be
0 1 2 3 4 5 6 7 8 9
^--- begin() + 2
You asked to have 3rd largest element at index 2 (3rd position), and thats what the algorithm does.
In addition it puts all elements smaller in the front and all elements larger in the back:
!(*j < *i)
(for the first version, orcomp(*j, *i) == false
for the second version) is met for any i in the range [first, nth) and for any j in the range [nth, last).
Let's use indices rather than iterators, then for any i < 2
and any j > 2
it holds that v[i] < v[j]
. In other words, 1
and 0
are both smaller than any element in 2 3 6 7 8 5 4 9
.
nth_element implementations complexities
The expected running time is O(N)
The worst case running time for most implemententations is O(N * N) because most implementations use QuickSelect and it could be that QuickSelect runs into bad partitions.
That is true for Microsoft VS2008, VS2010 & VS2012.
Now with the new ISO C++ 2011 standard, the complexity for std::sort has been tightened up - it is guaranteed to be O(N * log N) and has no worse case as David Musser's IntroSort is used: - use QuickSort and if parts of the array experience bad partitioning, swap to heapsort.
Ideally exactly the same should apply std::nth_element but the ISO C++ 2011 standard has not tightened up the complexity requirements. So std::nth_element could be O(N * N) in the worst case. This could be because in David Musser's original paper (see here) he did not mention what algorithm should be swapped to if QuickSelect goes bad.
In the worst case, the median-of-medians using groups of 5 could be used (I have seen a paper the recommended groups of 7 but cannot find it). So a quality implementation of std::nth_element could use QuickSelect and swap to median-of-medians if partitioning goes bad. This would guarantee O(N) behaviour. QuickSelect can be improved by using sampling making the worst case unlikely but not impossible.
Towards understanding nth_element
Try:
std::nth_element (myvector.begin(), myvector.begin()+ 4, myvector.end());
instead of :
std::nth_element (myvector.begin() + 1,
myvector.begin() + 4,
myvector.begin() + 5);
You are shuffling and then calling nth_element
for only a subsequence. This way you cannot what the returned element should be. If you do it for the whole sequence, then the answer is 5.
Std::nth_element is not giving the correct element
The variable end
is not correctly pointing to the end of the vector. Hence, you only perform nth_element()
on the first 2 elements. Since the variable median
value is 1, the point with greater x value will be returned.
You can use points.end()
instead of points.begin() + end
.
nth_element on a vector of structs c++
I suppose that that call to nth_element
produced a compile error, possibly one of those famously incomprehensible C++ template errors. But there should have been some sort of clue at least.
The second argument -- points.size() + points.size()/2
-- is not an iterator into the container; it's an integer. You meant points.begin() + points.size()/2
.
What is an interesting use-case for C++ nth_element algorithm?
A typical use case is when one wants to avoid sorting the whole set because it is large. If you just need the top n, then nth_element is used to partition the set. The first partition then contains the top n elements.
nth_element implementation based on modified quick_sort, not working as expected
In case if anybody tries to implement nth_element based on Hoar partition:
use 2 pointers i,j and maintain loop invariant:
for k = 0..i-1 a[k] <= X and for z = j+1 .. n-1 a[z] >= X
after the loop, the array might be partitioned in 2 ways:
...| j | i |...
or
...| j | X | i |...
all elements(left part) with idx in [L,j] <= X
with idx(right part) in [i,R] >= X
based on the number of elements in left/right part recur on the required path
std::random_device dev;
std::mt19937 rng(dev());
long long kth(vector<long long>& a, int L, int R, int k) {
if(L>=R) return a[L];
int i = L,j = R;
std::uniform_int_distribution<std::mt19937::result_type> dist(L,R);
long long X = a[dist(rng)];
while(i <= j) {
while(a[i] < X) ++i;
while(a[j] > X) --j;
if(i <= j) {
swap(a[i], a[j]);
++i;
--j;
}
}
if(k <= j)
return kth(a, L,j,k);
if (k >=i)
return kth(a,i,R,k);
return X;
}
Related Topics
Should I Return an Rvalue Reference (By Std::Move'Ing)
Best Way to Start a Thread as a Member of a C++ Class
Error Lnk2005: New and Delete Already Defined in Libcmtd.Lib(New.Obj)
Zero Initialization and Static Initialization of Local Scope Static Variable
How to Create a Type List (For Variadic Templates) That Contains N-Times the Same Type
How to Test If a Constexpr Function Is Evaluated at Compile Time
Using C++11 Range-Based for Loop Correctly in Qt
Is There a Shorter Way to Write Compound 'If' Conditions
Typeid() Returns Extra Characters in G++
Differencebetween Static_Cast and Implicit_Cast
Use #Ifdefs and #Define to Optionally Turn a Function Call into a Comment